In [42]:
import pandas as pd
import seaborn as sns
import statsmodels.formula.api as smf
import statsmodels.api as sm2
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn import metrics 
from sklearn.model_selection  import train_test_split
import numpy as np
import statsmodels.api as sm

%matplotlib inline

In [43]:
df= pd.read_csv('Macro_Economic.csv', header=0)
df.head()

Unnamed: 0,year,quarter,realgdp,realcons,realinv,realgovt,realdpi,cpi,m1,tbilrate,unemp,pop,infl,realint
0,1959,1,2710.349,1707.4,286.898,470.045,1886.9,28.98,139.7,2.82,5.8,177.146,0.0,0.0
1,1959,2,2778.801,1733.7,310.859,481.301,1919.7,29.15,141.7,3.08,5.1,177.83,2.34,0.74
2,1959,3,2775.488,1751.8,289.226,491.26,1916.4,29.35,140.5,3.82,5.3,178.657,2.74,1.09
3,1959,4,2785.204,1753.7,299.356,484.052,1931.3,29.37,140.0,4.33,5.6,179.386,0.27,4.06
4,1960,1,2847.699,1770.5,331.722,462.199,1955.5,29.54,139.6,3.5,5.2,180.007,2.31,1.19


#### We use a two-tailed test because the alternative hypothesis is that the two means differ in either direction.

#### Null Hypothesis: the mean unemployment rate of the 1970s equals the mean unemployment rate of the 1980s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean unemployment rate of the 1970s differs from that of the 1980s (𝜇1≠𝜇2)

In [44]:
df_1970=df[df['year']>=1970]
df_1970s=df_1970[df_1970['year']<1980]
df_1980=df[df['year']>= 1980]
df_1980s=df_1980[df_1980['year']< 1990]
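The two-step filtering above can also be written as a single range check. The following is a minimal sketch, not the notebook's own code: the helper name `decade_slice` and the small `demo` frame (with made-up `unemp` values) are hypothetical and only stand in for `df`.

```python
import pandas as pd

def decade_slice(frame: pd.DataFrame, start_year: int) -> pd.DataFrame:
    """Return the rows whose 'year' falls in [start_year, start_year + 9]."""
    # Series.between is inclusive on both ends by default.
    return frame[frame["year"].between(start_year, start_year + 9)]

# Tiny illustrative frame (values are invented) with the same 'year' column name.
demo = pd.DataFrame({"year": [1969, 1970, 1979, 1980, 1989, 1990],
                     "unemp": [3.5, 4.2, 6.0, 6.3, 5.4, 5.3]})

demo_1970s = decade_slice(demo, 1970)
demo_1980s = decade_slice(demo, 1980)
print(demo_1970s["year"].tolist())
print(demo_1980s["year"].tolist())
```

Applied to the real data, `decade_slice(df, 1970)` would select the same rows as `df_1970s`.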

In [45]:
n1 = len(df_1970s)
mu1 = df_1970s['unemp'].mean()
sd1 = df_1970s['unemp'].std()

(n1, mu1, sd1)

(40, 6.215000000000001, 1.1612438070823798)

In [46]:
n2 = len(df_1980s)
mu2 = df_1980s['unemp'].mean()
sd2 = df_1980s['unemp'].std()

(n2, mu2, sd2)

(40, 7.277500000000001, 1.4834968218748634)

In [47]:
sm.stats.ztest(df_1970s['unemp'].dropna(), df_1980s['unemp'].dropna(),alternative='two-sided')

(-3.5668975776190295, 0.0003612325858571891)

Since the p-value (about 0.0004) is below the 0.05 significance level, we reject the null hypothesis. The mean unemployment rates of the 1970s and 1980s differ significantly.
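The summary statistics `(n, mean, sd)` computed above are enough to reproduce the test statistic by hand. The sketch below assumes that `sm.stats.ztest` pools the two sample variances (its default `usevar='pooled'` with `ddof=1`); the function name `pooled_ztest` is hypothetical and introduced only for illustration.

```python
import math

def pooled_ztest(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample z-test from summary statistics, using a pooled variance."""
    # Pooled variance: weight each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    z = (mean1 - mean2) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Unemployment rate: 1970s vs 1980s, values copied from the cell outputs.
z, p = pooled_ztest(40, 6.215000000000001, 1.1612438070823798,
                    40, 7.277500000000001, 1.4834968218748634)
print(z, p)
```

Under that assumption, the printed pair should agree with the `ztest` output for this comparison up to floating-point rounding.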

#### Null Hypothesis: the mean inflation rate of the 1970s equals the mean inflation rate of the 1980s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean inflation rate of the 1970s differs from that of the 1980s (𝜇1≠𝜇2)

In [48]:
n1 = len(df_1970s)
mu1 = df_1970s['infl'].mean()
sd1 = df_1970s['infl'].std()

(n1, mu1, sd1)

(40, 7.217000000000001, 3.4328594644354586)

In [49]:
n2 = len(df_1980s)
mu2 = df_1980s['infl'].mean()
sd2 = df_1980s['infl'].std()

(n2, mu2, sd2)

(40, 4.91375, 3.3730391739840155)

In [50]:
sm.stats.ztest(df_1970s['infl'].dropna(), df_1980s['infl'].dropna(),alternative='two-sided')

(3.0268006705863875, 0.002471568809149722)

Since the p-value (about 0.0025) is below the 0.05 significance level, we reject the null hypothesis. The mean inflation rates of the 1970s and 1980s differ significantly.

#### Null Hypothesis: the mean unemployment rate of the 1970s equals the mean unemployment rate of the 1990s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean unemployment rate of the 1970s differs from that of the 1990s (𝜇1≠𝜇2)

In [51]:
df_1990=df[df['year']>= 1990]
df_1990s=df_1990[df_1990['year']< 2000]

In [52]:
n1 = len(df_1970s)
mu1 = df_1970s['unemp'].mean()
sd1 = df_1970s['unemp'].std()

(n1, mu1, sd1)

(40, 6.215000000000001, 1.1612438070823798)

In [53]:
n2 = len(df_1990s)
mu2 = df_1990s['unemp'].mean()
sd2 = df_1990s['unemp'].std()

(n2, mu2, sd2)

(40, 5.764999999999999, 1.0586759994421673)

In [54]:
sm.stats.ztest(df_1970s['unemp'].dropna(), df_1990s['unemp'].dropna(),alternative='two-sided')

(1.8111614243268799, 0.07011586759826452)

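Since the p-value (about 0.070) is above the 0.05 significance level, we cannot reject the null hypothesis. The data do not show a statistically significant difference between the mean unemployment rates of the 1970s and 1990s, which is not the same as showing that the two means are equal.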
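A confidence interval for the difference in means makes this borderline result easier to read. The sketch below uses the same pooled normal approximation assumed for the z-test and the summary statistics reported above; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Unemployment-rate summary statistics copied from the cell outputs.
n1, mean1, sd1 = 40, 6.215000000000001, 1.1612438070823798   # 1970s
n2, mean2, sd2 = 40, 5.764999999999999, 1.0586759994421673   # 1990s

# Pooled standard error of the difference in means.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

diff = mean1 - mean2
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
ci_low, ci_high = diff - z_crit * std_err, diff + z_crit * std_err
print(diff, ci_low, ci_high)
```

The interval runs from roughly -0.04 to 0.94 percentage points. It contains zero, which is consistent with failing to reject the null hypothesis, but it also contains differences of nearly a full point.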

#### Null Hypothesis: the mean inflation rate of the 1970s equals the mean inflation rate of the 1990s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean inflation rate of the 1970s differs from that of the 1990s (𝜇1≠𝜇2)

In [55]:
n1 = len(df_1970s)
mu1 = df_1970s['infl'].mean()
sd1 = df_1970s['infl'].std()

(n1, mu1, sd1)

(40, 7.217000000000001, 3.4328594644354586)

In [56]:
n2 = len(df_1990s)
mu2 = df_1990s['infl'].mean()
sd2 = df_1990s['infl'].std()

(n2, mu2, sd2)

(40, 2.8347499999999988, 1.3152614783419518)

In [57]:
sm.stats.ztest(df_1970s['infl'].dropna(), df_1990s['infl'].dropna(),alternative='two-sided')

(7.539250415391108, 4.726805289824926e-14)

Since the p-value (about 4.7e-14) is below the 0.05 significance level, we reject the null hypothesis. The mean inflation rates of the 1970s and 1990s differ significantly.

#### Null Hypothesis: the mean unemployment rate of the 1970s equals the mean unemployment rate of the 2000s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean unemployment rate of the 1970s differs from that of the 2000s (𝜇1≠𝜇2)

In [58]:
df_2000=df[df['year']>= 2000]
df_2000s=df_2000[df_2000['year']< 2010]

In [59]:
n1 = len(df_1970s)
mu1 = df_1970s['unemp'].mean()
sd1 = df_1970s['unemp'].std()

(n1, mu1, sd1)

(40, 6.215000000000001, 1.1612438070823798)

In [60]:
n2 = len(df_2000s)
mu2 = df_2000s['unemp'].mean()
sd2 = df_2000s['unemp'].std()

(n2, mu2, sd2)

(39, 5.415384615384615, 1.267724538720015)

In [61]:
sm.stats.ztest(df_1970s['unemp'].dropna(), df_2000s['unemp'].dropna(),alternative='two-sided')

(2.9246106122154587, 0.0034488748334937746)

Since the p-value (about 0.0034) is below the 0.05 significance level, we reject the null hypothesis. The mean unemployment rates of the 1970s and 2000s differ significantly.

#### Null Hypothesis: the mean inflation rate of the 1970s equals the mean inflation rate of the 2000s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean inflation rate of the 1970s differs from that of the 2000s (𝜇1≠𝜇2)

In [62]:
n1 = len(df_1970s)
mu1 = df_1970s['infl'].mean()
sd1 = df_1970s['infl'].std()

(n1, mu1, sd1)

(40, 7.217000000000001, 3.4328594644354586)

In [63]:
n2 = len(df_2000s)
mu2 = df_2000s['infl'].mean()
sd2 = df_2000s['infl'].std()

(n2, mu2, sd2)

(39, 2.517179487179487, 2.887107411195274)

In [64]:
sm.stats.ztest(df_1970s['infl'].dropna(), df_2000s['infl'].dropna(),alternative='two-sided')

(6.5773213826325785, 4.7899872582103316e-11)

Since the p-value (about 4.8e-11) is below the 0.05 significance level, we reject the null hypothesis. The mean inflation rates of the 1970s and 2000s differ significantly.

#### Null Hypothesis: the mean unemployment rate of the 1980s equals the mean unemployment rate of the 1990s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean unemployment rate of the 1980s differs from that of the 1990s (𝜇1≠𝜇2)

In [65]:
n1 = len(df_1980s)
mu1 = df_1980s['unemp'].mean()
sd1 = df_1980s['unemp'].std()

(n1, mu1, sd1)

(40, 7.277500000000001, 1.4834968218748634)

In [66]:
n2 = len(df_1990s)
mu2 = df_1990s['unemp'].mean()
sd2 = df_1990s['unemp'].std()

(n2, mu2, sd2)

(40, 5.764999999999999, 1.0586759994421673)

In [67]:
sm.stats.ztest(df_1980s['unemp'].dropna(), df_1990s['unemp'].dropna(),alternative='two-sided')

(5.248732974660642, 1.531488727659354e-07)

Since the p-value (about 1.5e-07) is below the 0.05 significance level, we reject the null hypothesis. The mean unemployment rates of the 1980s and 1990s differ significantly.

#### Null Hypothesis: the mean inflation rate of the 1980s equals the mean inflation rate of the 1990s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean inflation rate of the 1980s differs from that of the 1990s (𝜇1≠𝜇2)

In [68]:
n1 = len(df_1980s)
mu1 = df_1980s['infl'].mean()
sd1 = df_1980s['infl'].std()

(n1, mu1, sd1)

(40, 4.91375, 3.3730391739840155)

In [69]:
n2 = len(df_1990s)
mu2 = df_1990s['infl'].mean()
sd2 = df_1990s['infl'].std()

(n2, mu2, sd2)

(40, 2.8347499999999988, 1.3152614783419518)

In [70]:
sm.stats.ztest(df_1980s['infl'].dropna(), df_1990s['infl'].dropna(),alternative='two-sided')

(3.6318488461389364, 0.0002813979370050539)

Since the p-value (about 0.0003) is below the 0.05 significance level, we reject the null hypothesis. The mean inflation rates of the 1980s and 1990s differ significantly.

#### Null Hypothesis: the mean unemployment rate of the 1980s equals the mean unemployment rate of the 2000s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean unemployment rate of the 1980s differs from that of the 2000s (𝜇1≠𝜇2)

In [71]:
n1 = len(df_1980s)
mu1 = df_1980s['unemp'].mean()
sd1 = df_1980s['unemp'].std()

(n1, mu1, sd1)

(40, 7.277500000000001, 1.4834968218748634)

In [72]:
n2 = len(df_2000s)
mu2 = df_2000s['unemp'].mean()
sd2 = df_2000s['unemp'].std()

(n2, mu2, sd2)

(39, 5.415384615384615, 1.267724538720015)

In [73]:
sm.stats.ztest(df_1980s['unemp'].dropna(), df_2000s['unemp'].dropna(),alternative='two-sided')

(5.990858181979272, 2.0873660574981933e-09)

Since the p-value (about 2.1e-09) is below the 0.05 significance level, we reject the null hypothesis. The mean unemployment rates of the 1980s and 2000s differ significantly.

#### Null Hypothesis: the mean inflation rate of the 1980s equals the mean inflation rate of the 2000s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean inflation rate of the 1980s differs from that of the 2000s (𝜇1≠𝜇2)

In [74]:
n1 = len(df_1980s)
mu1 = df_1980s['infl'].mean()
sd1 = df_1980s['infl'].std()

(n1, mu1, sd1)

(40, 4.91375, 3.3730391739840155)

In [75]:
n2 = len(df_2000s)
mu2 = df_2000s['infl'].mean()
sd2 = df_2000s['infl'].std()

(n2, mu2, sd2)

(39, 2.517179487179487, 2.887107411195274)

In [76]:
sm.stats.ztest(df_1980s['infl'].dropna(), df_2000s['infl'].dropna(),alternative='two-sided')

(3.388794605058735, 0.000702005729110477)

Since the p-value (about 0.0007) is below the 0.05 significance level, we reject the null hypothesis. The mean inflation rates of the 1980s and 2000s differ significantly.

#### Null Hypothesis: the mean unemployment rate of the 1990s equals the mean unemployment rate of the 2000s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean unemployment rate of the 1990s differs from that of the 2000s (𝜇1≠𝜇2)

In [77]:
n1 = len(df_1990s)
mu1 = df_1990s['unemp'].mean()
sd1 = df_1990s['unemp'].std()

(n1, mu1, sd1)

(40, 5.764999999999999, 1.0586759994421673)

In [78]:
n2 = len(df_2000s)
mu2 = df_2000s['unemp'].mean()
sd2 = df_2000s['unemp'].std()

(n2, mu2, sd2)

(39, 5.415384615384615, 1.267724538720015)

In [79]:
sm.stats.ztest(df_1990s['unemp'].dropna(), df_2000s['unemp'].dropna(),alternative='two-sided')

(1.3318085165848554, 0.18292311866562538)

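Since the p-value (about 0.183) is above the 0.05 significance level, we cannot reject the null hypothesis. The data do not show a statistically significant difference between the mean unemployment rates of the 1990s and 2000s, which is not the same as showing that the two means are equal.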

#### Null Hypothesis: the mean inflation rate of the 1990s equals the mean inflation rate of the 2000s (𝜇1=𝜇2)
#### Alternative Hypothesis: the mean inflation rate of the 1990s differs from that of the 2000s (𝜇1≠𝜇2)

In [80]:
n1 = len(df_1990s)
mu1 = df_1990s['infl'].mean()
sd1 = df_1990s['infl'].std()

(n1, mu1, sd1)

(40, 2.8347499999999988, 1.3152614783419518)

In [81]:
n2 = len(df_2000s)
mu2 = df_2000s['infl'].mean()
sd2 = df_2000s['infl'].std()

(n2, mu2, sd2)

(39, 2.517179487179487, 2.887107411195274)

In [82]:
sm.stats.ztest(df_1990s['infl'].dropna(), df_2000s['infl'].dropna(),alternative='two-sided')

(0.6317557018702965, 0.5275465239094845)

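Since the p-value (about 0.528) is above the 0.05 significance level, we cannot reject the null hypothesis. The data do not show a statistically significant difference between the mean inflation rates of the 1990s and 2000s, which is not the same as showing that the two means are equal.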
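The twelve decisions can be collected in one place. The sketch below is only a summary of the p-values already reported in the cell outputs; the dictionary layout and the names `p_values` and `not_rejected` are illustrative choices, not part of the original analysis.

```python
ALPHA = 0.05  # significance level used throughout the notebook

# (variable, decade A, decade B) -> two-sided p-value copied from the outputs.
p_values = {
    ("unemp", "1970s", "1980s"): 0.0003612325858571891,
    ("infl",  "1970s", "1980s"): 0.002471568809149722,
    ("unemp", "1970s", "1990s"): 0.07011586759826452,
    ("infl",  "1970s", "1990s"): 4.726805289824926e-14,
    ("unemp", "1970s", "2000s"): 0.0034488748334937746,
    ("infl",  "1970s", "2000s"): 4.7899872582103316e-11,
    ("unemp", "1980s", "1990s"): 1.531488727659354e-07,
    ("infl",  "1980s", "1990s"): 0.0002813979370050539,
    ("unemp", "1980s", "2000s"): 2.0873660574981933e-09,
    ("infl",  "1980s", "2000s"): 0.000702005729110477,
    ("unemp", "1990s", "2000s"): 0.18292311866562538,
    ("infl",  "1990s", "2000s"): 0.5275465239094845,
}

# Comparisons where the null hypothesis of equal means is not rejected.
not_rejected = [key for key, p in p_values.items() if p >= ALPHA]

for (variable, first, second), p in p_values.items():
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{variable:5s} {first} vs {second}: p = {p:.4g} -> {decision}")
```

Nine of the twelve comparisons reject the null hypothesis at the 0.05 level. The three that do not are unemployment for the 1970s vs 1990s, and both unemployment and inflation for the 1990s vs 2000s.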