In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.style.use('classic')
plt.rcParams['figure.facecolor'] = 'white'

# Homework 3

**Instructions:** Complete the notebook below. Download the completed notebook in HTML format. Upload assignment using Canvas.

## Exercise: Income and Growth Across Countries

The data file `cross_country_gdp_per_capita.csv` contains annual data on GDP per capita for over 100 countries and is available here: https://raw.githubusercontent.com/letsgoexploring/economic-data/master/cross-country-production/csv/cross_country_gdp_per_capita.csv. The data were constructed from the Penn World Table (https://www.rug.nl/ggdc/productivity/pwt/).

In this exercise, you will:

1. Analyze some basic facts about GDP per capita across the world.
2. Compute the average annual growth rate of GDP per capita for each country in the data.
3. Compute a linear regression of average GDP per capita growth on GDP per capita in the initial year.

Follow the instructions in the following cells.

### Part (a)

In [None]:
# Create a variable called 'data' that stores the data in the file 'cross_country_gdp_per_capita.csv' in a DataFrame
# Make sure that 'Year', the left-most column of the data, is set to be the index column


# Print the first 5 rows of data


In [None]:
# Each column contains data for a specific country. Print the number of countries (e.g., the length of data.columns)


In [None]:
# Each row contains data for a specific year. Print the number of years (e.g., the length of data.index)


In [None]:
# Print the values of the initial year and the final year in the data



In [None]:
# Create a variable called 'data_initial' that is equal to the first row of the data DataFrame


# Create a variable called 'data_final' that is equal to the last row of the data DataFrame


In [None]:
# Use the sort_values() method of data_initial to sort the Series in ascending order.


# Print the name and GDP per capita of the countries with the 10 lowest GDPs per capita in the initial year of the data



In [None]:
# Use the sort_values() method of data_final to sort the Series in ascending order.


# Print the name and GDP per capita of the countries with the 10 lowest GDPs per capita in the final year of the data



In [None]:
# Print the names of the countries that are among the 10 with lowest GDPs per capita in BOTH the initial year and 
# the final year. If you can't think of an efficient way to do it, just print manually. E.g.:
#     print('Country 1')
#     print('Country 2')
#     Etc.




In [None]:
# Print the name and GDP per capita of the countries with the 10 highest GDPs per capita in the initial year of the data



In [None]:
# Print the name and GDP per capita of the countries with the 10 highest GDPs per capita in the final year of the data



In [None]:
# Print the names of the countries that are among the 10 with highest GDPs per capita in BOTH the initial year
# and the final year




### Part (b)

Let $y_t$ denote GDP per capita for some country in some year $t$ and let $g$ denote the average annual growth rate of GDP per capita between years 0 and $T$. $g$ is defined by:
\begin{align}
y_T & = (1+g)^T y_0
\end{align}
which implies:
\begin{align}
g & = \left(\frac{y_T}{y_0}\right)^{1/T} - 1
\end{align}
We can use this equation to compute the average growth rate of GDP per capita for each country in our data. Note that for our data, $T$ is equal to `len(data.index)-1`.
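As a quick check of the formula, here is a minimal sketch with made-up numbers. The values of `y_0`, `y_T`, and `T` below are hypothetical and do not come from the data file.

```python
# Hypothetical values chosen only for illustration (not from the data file)
y_0 = 1000.0   # GDP per capita in year 0
y_T = 2000.0   # GDP per capita in year T
T = 10         # number of years between the two observations

# Average annual growth rate: g = (y_T / y_0)^(1/T) - 1
g = (y_T / y_0) ** (1 / T) - 1

# Check: compounding y_0 at rate g for T years should recover y_T
y_T_check = (1 + g) ** T * y_0

print(round(g, 4))  # prints 0.0718
```

In this example, doubling GDP per capita over 10 years corresponds to average growth of about 7.2 percent per year.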

In [None]:
# Create a variable called 'growth_rates' that contains the average growth rate of each country in the data.
# NOTE: You do not need to re-sort the variables data_initial and data_final. Pandas aligns Series by index
# label in arithmetic operations, so each country's values are matched even if the two Series are ordered differently.


# Use the sort_values() method of growth_rates to sort the Series in ascending order.


In [None]:
# Print the average of the average growth rate of GDP per capita for all countries rounded to 4 decimal places


In [None]:
# Print the standard deviation of the growth rate of GDP per capita for all countries rounded to 4 decimal places


In [None]:
# Print the name and growth rates of the countries with the 10 lowest growth rates of GDP per capita



In [None]:
# Print the name and growth rates of the countries with the 10 highest growth rates of GDP per capita



In [None]:
# Print the names of the countries that were BOTH among the 10 with lowest GDPs per capita in the initial year and 
# among the 10 with the highest growth of GDP per capita.
# If you can't think of an efficient way to do it, just print manually. E.g.:
#     print('Country 1')
#     print('Country 2')
#     Etc.




In [None]:
# Use the sort_index() method of data_initial to alphabetize the index [e.g., data_initial = data_initial.sort_index()]


# Use the sort_index() method of growth_rates to alphabetize the index


# Construct a scatter plot with:
#     1. GDP per capita in the initial year on the horizontal axis
#     2. Average growth rate of GDP per capita on the vertical axis
#     3. Size of scatter plot markers at least 50
#     4. Opacity of scatter plot markers no greater than 0.5
#     5. x-axis limit: [0,20000]
#     6. Suitable title and labels for the axes




**Question**

1. Describe in words the relationship between GDP per capita in the initial year and average growth of GDP per capita. How is the relationship different for low income countries and high income countries?

**Answer**

1.  

### Part (c)

In this part you will use OLS to estimate the relationship between GDP per capita in the initial year and average growth of GDP per capita.

\begin{align}
g_i & = \beta_0 + \beta_1 y_i + \epsilon_i,
\end{align}

where $g_i$ is the average annual growth rate of country $i$, $y_i$ is country $i$'s initial GDP per capita, and $\epsilon_i$ is the residual of the regression.

In [None]:


# Create a variable called 'y' that is equal to the average growth rates of the countries


# Create a variable called 'x' that is equal to the GDP per capita of the countries in the initial year


# Use the Statsmodels function add_constant() to add a constant column to the variable x


# Create a variable called 'model' equal to the output of the Statsmodels OLS function regressing y upon x


# Create a variable called 'results' that stores the fitted model (i.e., the output of the fit() method of model)


# Print the summary of results generated by the summary() method of results


**Questions**

1. Interpret the estimate for the coefficient on $y_i$.
2. Interpret the estimate for the constant. Where did we see this number earlier?
3. Interpret the R-squared of the regression.

**Answers**

1.  

2.  

3.  

## Exercise: Income and Growth Across US States

The data file `state_income_data.csv` contains annual data on income per capita for the 48 continental states, Washington D.C., and the United States as a whole and is available here: https://raw.githubusercontent.com/letsgoexploring/economic-data/master/us-convergence/csv/state_income_data.csv. 

In this exercise, you will:

1. Analyze some basic facts about income per capita among US states.
2. Compute the average annual growth rate of income per capita for each state in the data.
3. Compute a linear regression of average income per capita growth on income per capita in the initial year.

Follow the instructions in the following cells.

### Part (a)

In [None]:
# Create a variable called 'state_df' that stores the data in the file 'state_income_data.csv' in a DataFrame
# Make sure that 'Year', the left-most column of the data, is set to be the index column


# Print the first 5 rows of state_df


The DataFrame method `dropna()` returns a DataFrame that omits rows that contain missing values (represented as NaN, which stands for *not a number*). The following command will remove the rows with missing values from the DataFrame `df`:

    df = df.dropna()
    
The DataFrame method `drop()` returns a DataFrame that omits rows or columns with a given label. The following command will remove the column named `'LABEL'` from the DataFrame `df`:   
   
    df = df.drop('LABEL',axis=1)
    
The argument `axis=1` means to look among the columns for `'LABEL'`. If we had wanted to remove a row with the index value `'LABEL'`, then the axis argument would have been: `axis=0`.
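The two methods can be tried on a small toy DataFrame first. This is only a sketch: the column names, index values, and numbers below are placeholders invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy DataFrame with made-up values; 'A', 'B', and 'LABEL' are placeholder column names
df = pd.DataFrame(
    {'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, 6.0], 'LABEL': [7.0, 8.0, 9.0]},
    index=[2000, 2001, 2002],
)

# dropna() returns a new DataFrame without the row for 2001, which has a NaN in column 'A'
df = df.dropna()

# drop() with axis=1 returns a new DataFrame without the column 'LABEL'
df = df.drop('LABEL', axis=1)

print(df)
```

Both methods return a new DataFrame rather than modifying the original in place, which is why each result is assigned back to `df`.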

In [None]:
# Remove the rows with missing data from state_df


# Remove the column named 'United States' from state_df


# Set state_df equal to itself divided by 1000 so that its units are in thousands of dollars


# Print the first 5 rows of state_df


In [None]:
# Create a variable called 'state_growth_rates' that contains the average growth rate of each state in the data.




In [None]:
# Construct a scatter plot with:
#     1. Income per capita in the initial year on the horizontal axis
#     2. Average growth rate of income per capita on the vertical axis
#     3. Size of scatter plot markers at least 50
#     4. Opacity of scatter plot markers no greater than 0.5
#     5. y-axis limit: [0.015,0.035]
#     6. Suitable title and labels for the axes




### Part (b)

Like you did for the country data above, use OLS to estimate the relationship between income per capita in the initial year and average growth of income per capita.

\begin{align}
g_i & = \beta_0 + \beta_1 y_i + \epsilon_i,
\end{align}

where $g_i$ is the average annual growth rate of state $i$, $y_i$ is state $i$'s initial income per capita, and $\epsilon_i$ is the residual of the regression.

In [None]:
# Create a variable called 'y' that is equal to the average growth rates of the states


# Create a variable called 'x' that is equal to the income per capita of the states in the initial year


# Use the Statsmodels function add_constant() to add a constant column to the variable x


# Create a variable called 'model' equal to the output of the Statsmodels OLS function regressing y upon x


# Create a variable called 'results' that stores the fitted model (i.e., the output of the fit() method of model)


# Print the summary of results generated by the summary() method of results


**Questions**

1. Interpret the estimate for the coefficient on $y_i$.
2. Interpret the R-squared of the regression.
3. Why do you think that initial income and average growth are so strongly correlated in the state data but not so in the country data? (If you're not sure, refresh your knowledge about the difference between *conditional* and *unconditional* convergence: https://en.wikipedia.org/wiki/Convergence_(economics)#Types_of_Convergence)

**Answers**

1.  

2.  

3.  