## STATS 131 - Group Project
Created by Kaixin Wang
### Datasets:
* GDP_and_GDP_Per_Capita.csv (1)
* Expenditure_on_Health.csv (2)
* Production_Trade_and_Supply_of_Energy.csv (3)
* Internet_Usage.csv (4)
* Public_Expenditure_on_Education.csv (5)
* Tourist_Visitors_Arrival_and_Expenditure.csv (6)
* GDP_on_R_D.csv (7)
* Exchange_Rates.csv (8)
* Consumer_Price_Index.csv (9)

### Variables:
* response: CPI (9)
* predictors:
    - GDP (1)
    - expenditure on health (2)
    - energy usage (3)
    - Internet usage (4)
    - expenditure on education (5)
    - expenditure on tourism (6)
    - expenditure on science & technology (7)
    - exchange rate (8)
    
### Dataset sources:
United Nations: http://data.un.org/
- National accounts (1)
    - GDP and GDP per capita
- Nutrition and health (2)
    - Health expenditure
- Energy (3)
    - Energy production, trade and consumption
- Communication (4)
    - Internet usage
- Education (5)
    - Public expenditure on education
- Science and technology (7)
    - Human resources in R & D
- Finance (8)
    - Exchange rates
- Price and production indices (9)
    - Consumer price indices
   
### Objective:
To predict a country's CPI (consumer price index) from predictors that reflect aspects of economic growth.
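
As a rough illustration of this objective, one possible approach is ordinary least squares regression of CPI on the predictors. The sketch below is an assumption about the modeling step, not the project's confirmed method: `fit_ols` is a hypothetical helper, and the data are synthetic placeholders rather than UN figures.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares coefficients (intercept first) for y ~ X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic placeholder data (NOT real UN figures): two predictors and
# a response that is an exact linear function of them.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))
y_demo = 100.0 + 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1]
coef_demo = fit_ols(X_demo, y_demo)  # intercept, then one slope per predictor
```

With the real data, each row of `X` would presumably be one country-year and each column one of the eight predictors listed above.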

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
CPI = pd.read_csv("Consumer_Price_Index.csv", encoding ="ISO-8859-1")
CPI.head()
CPI.Series.unique()

FileNotFoundError: File b'Consumer_Price_Index.csv' does not exist

In [None]:
health = pd.read_csv("Expenditure_on_Health.csv", encoding ="ISO-8859-1")
health.head()
health.Series.unique()
health = health.loc[health.Series == 'Current health expenditure (% of GDP)']
health.head()

In [None]:
GDP = pd.read_csv("GDP_and_GDP_Per_Capita.csv", encoding ="ISO-8859-1")
GDP.Series.unique()
gdp = GDP.loc[GDP.Series == "GDP per capita (US dollars)"]

In [None]:
energy = pd.read_csv("Production_Trade_and_Supply_of_Energy.csv", encoding ="ISO-8859-1" )
energy.Series.unique()
energy = energy.loc[energy.Series == "Primary energy production (petajoules)"]

In [None]:
internet = pd.read_csv("Internet_Usage.csv", encoding ="ISO-8859-1" )

In [None]:
education = pd.read_csv("Public_Expenditure_on_Education.csv", encoding ="ISO-8859-1" )
education.Series.unique()
education.loc[education.Series == 'Current expenditure other than staff compensation as % of total expenditure in public institutions (%)']
education = education.loc[education.Series == "Public expenditure on education (% of government expenditure)"]

In [None]:
tourism = pd.read_csv("Tourist_Visitors_Arrival_and_Expenditure.csv", encoding ="ISO-8859-1" )
tourism.Series.unique()
tourism = tourism.loc[tourism.Series == "Tourism expenditure (millions of US dollars)"]

In [None]:
technology = pd.read_csv("GDP_on_R_D.csv", encoding ="ISO-8859-1" )
technology.Series.unique()
tech = technology.loc[technology.Series == 'Gross domestic expenditure on R & D: as a percentage of GDP (%)']

In [None]:
rates = pd.read_csv("Exchange_Rates.csv", encoding ="ISO-8859-1" )
rates.Series.unique()
rates = rates.loc[rates.Series == "Exchange rates: period average (national currency per US dollar)"]
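
The cells above repeat the same two steps for every dataset: read the CSV, then keep the rows for one `Series` value. A minimal sketch of a helper that bundles those steps is shown below. `load_series` is a hypothetical name, and the in-memory CSV is made-up data that only assumes a `Series` column like the one the cells above filter on.

```python
import io
import pandas as pd

def load_series(source, series_name, encoding="ISO-8859-1"):
    """Read a long-format CSV and keep only the rows for one Series value."""
    df = pd.read_csv(source, encoding=encoding)
    subset = df.loc[df["Series"] == series_name]
    if subset.empty:
        # Fail loudly on a misspelled series name instead of returning nothing.
        raise ValueError(f"no rows found for series {series_name!r}")
    return subset

# Toy in-memory CSV standing in for a real file (values are made up).
toy_csv = io.BytesIO(
    (
        "Country,Year,Series,Value\n"
        "A,2010,Series one,1.0\n"
        "A,2010,Series two,2.0\n"
        "B,2010,Series one,3.0\n"
    ).encode("ISO-8859-1")
)
toy = load_series(toy_csv, "Series one")
```

Raising on an empty result is a design choice: a typo in a long series label would otherwise silently produce an empty frame.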

In [None]:
gdp.pivot(index = "Year", columns = "Country", values = "Value").head()

In [None]:
energy.pivot(index = "Year", columns = "Country", values = "Value").head()

In [None]:
education.pivot(index = "Year", columns = "Country", values = "Value").head()

In [None]:
tourism.pivot(index = "Year", columns = "Country", values = "Value").head()

In [None]:
internet.pivot(index = "Year", columns = "Country", values = "Value").head()

In [None]:
tech.pivot(index = "Year", columns = "Country", values = "Value").head()

In [None]:
rates.pivot(index = "Year", columns = "Country", values = "Value").head()
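
To make the reshaping in the cells above concrete, the sketch below pivots a small long-format frame into a Year by Country table. The data are made up; only the column names (`Year`, `Country`, `Value`) follow the cells above. It also shows that `pivot` raises `ValueError` when a (Year, Country) pair appears more than once, which is why each dataset needs to be reduced to a single series first.

```python
import pandas as pd

# Toy long-format data (made-up values) with the same column layout.
long_df = pd.DataFrame({
    "Year": [2010, 2010, 2015, 2015],
    "Country": ["A", "B", "A", "B"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})
# One row per year, one column per country.
wide_df = long_df.pivot(index="Year", columns="Country", values="Value")

# A repeated (Year, Country) pair makes the reshape ambiguous,
# so pandas raises ValueError instead of picking a value.
dup_df = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
try:
    dup_df.pivot(index="Year", columns="Country", values="Value")
    pivot_failed = False
except ValueError:
    pivot_failed = True
```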

In [None]:
np.random.seed(1000)
ids = GDP.ID.unique()
ids  # unique country IDs

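# np.random.choice samples with replacement by default,
# so fewer than 50 distinct IDs may be drawn
samples = np.random.choice(ids, 50)
samples # randomly selected country IDs

s = gdp.loc[gdp.ID.isin(samples)]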
s.Country.unique()
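
The sampling step above can be sketched in isolation to show the effect of the `replace` argument. `ids_demo` is a hypothetical stand-in for the real country IDs. Whether the project intended sampling with or without replacement is not stated, so both variants are shown.

```python
import numpy as np

ids_demo = np.arange(200)  # hypothetical stand-in for the unique country IDs

np.random.seed(1000)
with_repl = np.random.choice(ids_demo, 50)                    # default: replace=True
without_repl = np.random.choice(ids_demo, 50, replace=False)  # all draws distinct

# Count how many distinct IDs each variant actually produced.
n_distinct_with = len(np.unique(with_repl))
n_distinct_without = len(np.unique(without_repl))
```

Passing `replace=False` guarantees 50 distinct IDs, while the default may return fewer because the same ID can be drawn twice.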

In [None]:
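# Note: CPI and internet were not filtered to a single Series above;
# pivot raises ValueError if any (Year, Country) pair repeats.
CPI = CPI.pivot(index = "Year", columns = "Country", values = "Value")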
GDP = gdp.pivot(index = "Year", columns = "Country", values = "Value")
energy = energy.pivot(index = "Year", columns = "Country", values = "Value")
health = health.pivot(index = "Year", columns = "Country", values = "Value")
education = education.pivot(index = "Year", columns = "Country", values = "Value")
tech = tech.pivot(index = "Year", columns = "Country", values = "Value")
internet = internet.pivot(index = "Year", columns = "Country", values = "Value")
rates = rates.pivot(index = "Year", columns = "Country", values = "Value")
tourism = tourism.pivot(index = "Year", columns = "Country", values = "Value")

In [None]:
names = s.Country.unique()
names

In [None]:
# Build one table for the first sampled country, one column per indicator.
# Each column is renamed before joining, so no join suffixes are needed.
country = names[0]
table = GDP[[country]].add_suffix("GDP")
table = table.join(energy[[country]].add_suffix("Energy"))
table = table.join(tech[[country]].add_suffix("Tech"))
table = table.join(education[[country]].add_suffix("Education"))
table = table.join(rates[[country]].add_suffix("Rates"))
table = table.join(internet[[country]].add_suffix("Internet"))
table = table.join(tourism[[country]].add_suffix("Tourism"))
table = table.join(health[[country]].add_suffix("Health"))
table

In [None]:
# Same table for another sampled country (index 14 in names).
country = names[14]
table = GDP[[country]].add_suffix("GDP")
table = table.join(energy[[country]].add_suffix("Energy"))
table = table.join(tech[[country]].add_suffix("Tech"))
table = table.join(education[[country]].add_suffix("Education"))
table = table.join(rates[[country]].add_suffix("Rates"))
table = table.join(internet[[country]].add_suffix("Internet"))
table = table.join(tourism[[country]].add_suffix("Tourism"))
table = table.join(health[[country]].add_suffix("Health"))
table
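
Since the same table is built for more than one country, the assembly step could be wrapped in a function. The sketch below is one possible way to do that, not the notebook's own method: `build_country_table` is a hypothetical helper, and the toy frames hold made-up values. Unlike the chained joins above, `pd.concat` keeps the union of all years, so years missing from one indicator show up as NaN rather than being dropped.

```python
import pandas as pd

def build_country_table(indicators, country):
    """Combine one country's column from each wide table into a single frame.

    `indicators` maps a label (e.g. "GDP") to a Year x Country DataFrame.
    """
    columns = {label: frame[country] for label, frame in indicators.items()}
    # concat aligns on the Year index; years absent from a table become NaN.
    return pd.concat(columns, axis=1)

# Toy wide tables with made-up values.
gdp_demo = pd.DataFrame({"A": [1.0, 2.0]}, index=pd.Index([2010, 2015], name="Year"))
energy_demo = pd.DataFrame({"A": [5.0]}, index=pd.Index([2015], name="Year"))
table_demo = build_country_table({"GDP": gdp_demo, "Energy": energy_demo}, "A")
```

Under these assumptions, a call such as `build_country_table({"GDP": GDP, "Energy": energy}, names[0])` would produce plain indicator names as column labels.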