# Econometrics Project (BGD708)

#### Authors:
Alban Pereira (alban.pereira@telecom-paris.fr)

Laurent Gayraud (laurent.gayraud@telecom-paris.fr)

## Imports

```python
import pandas as pd
import numpy as np
```

## PART 1 - CROSS-SECTION DATA

This part uses the dataset HPRICE2.RAW described in HPRICE2.DES.

#### Q1. State the fundamental hypothesis under which the Ordinary Least Squares (OLS) estimators are unbiased.

#### Q2. Show that under this assumption the OLS estimators are indeed unbiased.
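A sketch of the standard argument in matrix notation, assuming the linear model $y = X\beta + u$ with $X$ of full column rank and the zero conditional mean condition $E[u \mid X] = 0$ (this notation is an assumption and may differ from the one used in the course):

$$
\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u
$$

$$
E[\hat{\beta} \mid X] = \beta + (X'X)^{-1}X'\,E[u \mid X] = \beta
$$

Taking expectations over $X$ (law of iterated expectations) then gives $E[\hat{\beta}] = \beta$.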

#### Q3. Explain the sample selection bias with an example from the course.

#### Q4. Explain the omitted variable bias with an example from the course.

#### Q5. Explain the problem of multicollinearity. Is it a problem in this dataset?

#### Q6. Create three categories of <i>nox</i> levels (low, medium, high), corresponding to the following percentiles: 0%-25%, 26%-74%, 75%-100%.

#### Q7. Compute for each category of <i>nox</i> level the average median price and comment on your results.
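One possible way to build the categories of question 6 and the group averages of question 7 is sketched below. It runs on synthetic data, not on HPRICE2: the numbers and the `nox_cat` and `mean_price` names are assumptions made for illustration, and the real column names should be checked against HPRICE2.DES.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for HPRICE2 (values are made up, for illustration only).
rng = np.random.default_rng(0)
df = pd.DataFrame({"nox": rng.uniform(3.8, 8.7, size=200)})
df["price"] = 40000 - 3000 * df["nox"] + rng.normal(0, 2000, size=200)

# Cut points at the 25th and 75th percentiles of nox.
q25, q75 = df["nox"].quantile([0.25, 0.75])
df["nox_cat"] = pd.cut(
    df["nox"],
    bins=[-np.inf, q25, q75, np.inf],
    labels=["low", "medium", "high"],
)

# Average price within each nox category.
mean_price = df.groupby("nox_cat", observed=True)["price"].mean()
print(mean_price)
```

Because `pd.cut` uses right-closed intervals by default, an observation exactly at the 25th percentile falls in the "low" group here; the boundary convention is a design choice.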

#### Q8. Produce a scatter plot with the variable <i>price</i> on the y-axis and the variable <i>nox</i> on the x-axis. Is this a ceteris paribus effect?

#### Q9. Run a regression of <i>price</i> on a <i>constant</i>, <i>crime</i>, <i>nox</i>, <i>rooms</i>, <i>proptax</i>. Comment on the histogram of the residuals. Interpret all coefficients.
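A minimal sketch of this kind of regression using only NumPy, on synthetic regressors standing in for <i>crime</i>, <i>nox</i>, <i>rooms</i> and <i>proptax</i>. The coefficient values in `beta_true` are hypothetical, and the standard errors assume homoskedastic errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
# Synthetic regressors standing in for crime, nox, rooms, proptax.
X_raw = rng.normal(size=(n, 4))
X = np.column_stack([np.ones(n), X_raw])            # add the constant
beta_true = np.array([2.0, -0.5, -1.0, 3.0, -0.2])  # hypothetical values
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS estimate: minimise ||y - X b||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Classical standard errors from sigma^2 (X'X)^{-1}.
k = X.shape[1]
sigma2 = resid @ resid / (n - k)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se

# Histogram counts of the residuals (pass them to a plotting library to draw the figure).
counts, edges = np.histogram(resid, bins=20)
```

The same `beta_hat`, `se` and `t_stats` should be reported by a regression package, for instance `statsmodels.api.OLS(y, X).fit()`, if one is available.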

#### Q10. Run a regression of <i>lprice</i> on a <i>constant</i>, <i>crime</i>, <i>nox</i>, <i>rooms</i>, <i>proptax</i>. Interpret all coefficients.

#### Q11. Run a regression of <i>lprice</i> on a <i>constant</i>, <i>crime</i>, <i>lnox</i>, <i>rooms</i>, <i>lproptax</i>. Interpret all coefficients.

#### Q12. In the specification of question 9, test the hypothesis H0: $\beta_{nox}$ = 0 vs. H1: $\beta_{nox}$ > 0 at the 1% level.

#### Q13. In the specification of question 9, test the hypothesis H0: $\beta_{nox}$ = 0 vs. H1: $\beta_{nox}$ ≠ 0 at the 1% level using the p-value of the test.

#### Q14. In the specification of question 9, test the hypothesis H0: $\beta_{crime}$ = $\beta_{proptax}$ at the 10% level.

#### Q15. In the specification of question 9, test the hypothesis H0: $\beta_{nox}$ = 0, $\beta_{proptax}$ = 0 at the 10% level.

#### Q16. In the specification of question 9, test the hypothesis H0: $\beta_{nox}$ = -500, $\beta_{proptax}$ = -100 at the 10% level using the p-value of the test.
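Linear restrictions such as those in questions 14 to 16 can all be written as $R\beta = r$ and tested with an F statistic. The sketch below assumes homoskedastic errors and uses synthetic data; the helper `wald_f_test` and the coefficient values are hypothetical.

```python
import numpy as np

def wald_f_test(X, y, R, r):
    """F statistic for H0: R beta = r in the OLS regression of y on X."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / (n - k)
    diff = R @ beta_hat - r
    q = R.shape[0]  # number of restrictions
    F = diff @ np.linalg.solve(R @ XtX_inv @ R.T, diff) / (q * sigma2)
    return F, q, n - k

rng = np.random.default_rng(2)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
# Columns: const, crime, nox, rooms, proptax (synthetic; coefficients are made up).
beta_true = np.array([1.0, -0.3, -0.8, 2.0, -0.3])
y = X @ beta_true + rng.normal(size=n)

# One restriction: beta_crime - beta_proptax = 0 (true in this simulation).
R_eq = np.array([[0.0, 1.0, 0.0, 0.0, -1.0]])
F_eq, q_eq, df_eq = wald_f_test(X, y, R_eq, np.array([0.0]))

# Two restrictions: beta_nox = 0 and beta_proptax = 0 (false in this simulation).
R_joint = np.array([[0.0, 0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0, 1.0]])
F_joint, q_joint, df_joint = wald_f_test(X, y, R_joint, np.zeros(2))
```

Under $H_{0}$ the statistic follows an F distribution with `q` and `n - k` degrees of freedom; if SciPy is installed, `scipy.stats.f.sf(F, q, n - k)` returns the p-value. Non-zero targets such as those of question 16 only change the vector `r`.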

#### Q17. In the specification of question 9, test the hypothesis that all coefficients are the same for observations with low levels of <i>nox</i> vs. medium and high levels of <i>nox</i>. 
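One common way to test equality of all coefficients across two groups is a Chow test, which compares the pooled sum of squared residuals with the sum over the two separate regressions. The sketch below assumes that approach and homoskedastic errors; the data are synthetic and the break in the rooms coefficient is hypothetical.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return resid @ resid

def chow_f(X, y, in_group1):
    """Chow F statistic for equal coefficients across two groups."""
    n, k = X.shape
    ssr_pooled = ssr(X, y)
    ssr_split = ssr(X[in_group1], y[in_group1]) + ssr(X[~in_group1], y[~in_group1])
    F = ((ssr_pooled - ssr_split) / k) / (ssr_split / (n - 2 * k))
    return F, k, n - 2 * k

rng = np.random.default_rng(3)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
# Columns: const, crime, nox, rooms, proptax (synthetic stand-ins).
low_nox = X[:, 2] <= np.quantile(X[:, 2], 0.25)
# Hypothetical break: the rooms coefficient is 2 for low nox and 4 otherwise.
beta_rooms = np.where(low_nox, 2.0, 4.0)
y = (1.0 - 0.3 * X[:, 1] - 0.8 * X[:, 2] + beta_rooms * X[:, 3]
     - 0.2 * X[:, 4] + rng.normal(size=n))

F, df_num, df_den = chow_f(X, y, low_nox)
```

The statistic is compared with an F distribution with `k` and `n - 2k` degrees of freedom. The same function applies to a split by period, with `in_group1` marking the first sub-sample.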

#### Q18. Repeat the test of question 17 but now assuming that only the coefficients of <i>nox</i> and <i>proptax</i> can change between the two groups of observations. State and test $H_{0}$.

## PART 2 - HETEROSKEDASTICITY

#### Q19. Explain the problem of heteroskedasticity with an example from the course.

#### Q20. In the specification of question 9, test the hypothesis of no heteroskedasticity of linear form, i.e. in the regression of $u^{2}$ on constant, crime, nox, rooms, proptax, test $H_{0}$: $\delta_{crime}$ = $\delta_{nox}$ = $\delta_{rooms}$ = $\delta_{proptax}$ = 0, where the coefficients $\delta_{k}$ (k = crime, nox, rooms, proptax) are associated with the corresponding explanatory variables.
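The auxiliary regression described here (squared residuals on the original regressors) is usually called a Breusch-Pagan test. A minimal sketch on synthetic data follows; the helper `bp_test` is hypothetical, and the error standard deviation is made proportional to the first regressor so that heteroskedasticity is present by construction.

```python
import numpy as np

def bp_test(X, y):
    """F and LM statistics for H0: all slopes are zero in the regression of u^2 on X."""
    n, k = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta_hat) ** 2
    delta_hat, *_ = np.linalg.lstsq(X, u2, rcond=None)
    fitted = X @ delta_hat
    r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    q = k - 1  # number of slopes tested (the constant is excluded)
    F = (r2 / q) / ((1 - r2) / (n - k))
    LM = n * r2
    return F, LM, r2

rng = np.random.default_rng(4)
n = 1000
x1 = rng.uniform(0.5, 3.0, size=n)
X = np.column_stack([np.ones(n), x1, rng.normal(size=(n, 3))])
u = rng.normal(size=n) * x1  # sd(u) grows with x1
y = X @ np.array([1.0, 0.5, -0.5, 2.0, -0.2]) + u

F, LM, r2 = bp_test(X, y)
```

Under $H_{0}$, `F` is compared with an F distribution with `q` and `n - k` degrees of freedom, and `LM` with a chi-squared distribution with `q` degrees of freedom.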

#### Q21. In the specification of question 10, test the hypothesis of no heteroskedasticity of linear form.

#### Q22. In the specification of question 11, test the hypothesis of no heteroskedasticity of linear form.

#### Q23. Comment on the differences between your results for questions 20, 21, and 22.

#### Q24. Regardless of the results of the test of question 22, identify the most significant variable causing heteroskedasticity using the Student's t-statistics and run a WLS regression with the identified variable as the weight.
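A minimal sketch of weighted least squares, assuming the error variance is proportional to a known positive quantity `h` built from the identified variable. How `h` is formed from that variable (the variable itself, its square, and so on) is an assumption to be justified; here the synthetic data are generated with a variance proportional to `x1 ** 2`.

```python
import numpy as np

def wls(X, y, h):
    """WLS assuming Var(u_i) is proportional to h_i > 0: OLS on data scaled by 1/sqrt(h_i)."""
    w = 1.0 / np.sqrt(h)
    beta_hat, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta_hat

rng = np.random.default_rng(9)
n = 500
x1 = rng.uniform(0.5, 3.0, size=n)
X = np.column_stack([np.ones(n), x1])
# sd(u) = x1, so Var(u) is proportional to x1 ** 2 (by construction).
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * x1

beta_wls = wls(X, y, h=x1 ** 2)
```

Dividing each row by $\sqrt{h_i}$ makes the transformed errors homoskedastic, so ordinary OLS inference applies to the transformed regression.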

## PART 3 - TIME SERIES DATA

This part uses the threecenturies_v2.3 dataset. From the A1 worksheet, import real GDP at market prices, the unemployment rate, and consumer price inflation for the period 1900-2000 into Python.

#### Q25. Define strict and weak stationarity.

#### Q26. Explain ergodicity and state the ergodic theorem. Illustrate with an example.

#### Q27. Why do we need both stationarity and ergodicity? 

#### Q28. Explain “spurious regression”.

#### Q29. Make all time series stationary by computing the difference between the original variable and a moving average of order 10. 
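A sketch of this transformation with pandas, on a synthetic trending series. Whether the moving average should be trailing (as here) or centred is an assumption, and the series name `gdp` is a placeholder for the imported columns.

```python
import numpy as np
import pandas as pd

years = pd.Index(range(1900, 2001), name="year")
rng = np.random.default_rng(5)
# Synthetic series with a linear trend (values are made up).
gdp = pd.Series(
    100 + 2.0 * np.arange(len(years)) + rng.normal(0, 1, len(years)),
    index=years,
)

# Trailing 10-year mean; the first 9 years have no full window and are NaN.
ma10 = gdp.rolling(window=10).mean()
gdp_detrended = (gdp - ma10).dropna()
```

With a trailing window of 10, the first nine observations are lost, so a 1900-2000 series starts in 1909 after the transformation.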

#### Q30. Using the original dataset, test the unit root hypothesis for all variables.
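A minimal sketch of the simplest Dickey-Fuller regression (constant, no trend, no lagged differences), written with NumPy on synthetic series. The helper `dickey_fuller_t` is hypothetical. Its statistic does not follow a Student distribution under the unit root hypothesis and must be compared with Dickey-Fuller critical values (roughly -2.86 at the 5% level asymptotically for this specification).

```python
import numpy as np

def dickey_fuller_t(y):
    """t statistic on rho in: diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta_hat, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta_hat
    sigma2 = resid @ resid / (len(dy) - 2)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta_hat[1] / se[1]

rng = np.random.default_rng(6)
shocks = rng.normal(size=200)
random_walk = np.cumsum(shocks)  # has a unit root by construction
white_noise = shocks             # stationary by construction

t_rw = dickey_fuller_t(random_walk)
t_wn = dickey_fuller_t(white_noise)
```

A strongly negative statistic, as expected for `white_noise`, rejects the unit root. In practice the augmented version in `statsmodels.tsa.stattools.adfuller`, which adds lagged differences and reports p-values, is likely the more convenient tool.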

#### Q31. Transform all variables so that they are stationary, using your answer to either question 29 or question 30.

#### Q32. Explain the difference between ACF and PACF. 
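To make the distinction concrete, the sketch below computes both functions with NumPy on a simulated AR(1) series. The ACF at lag k is the sample correlation between $x_t$ and $x_{t-k}$; the PACF at lag k is taken here as the last coefficient in a regression of $x_t$ on its first k lags, which is one of several equivalent estimators. The helper names `acf` and `pacf` are hypothetical.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations at lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([1.0] + [(x[k:] @ x[:-k]) / denom for k in range(1, nlags + 1)])

def pacf(x, nlags):
    """Partial autocorrelations at lags 0..nlags via successive OLS regressions."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    out = [1.0]
    for k in range(1, nlags + 1):
        target = x[k:]
        # Columns are x_{t-1}, ..., x_{t-k}, aligned with target x_t.
        lags = np.column_stack([x[k - j: len(x) - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(lags, target, rcond=None)
        out.append(coef[-1])
    return np.array(out)

# Simulated AR(1) with coefficient 0.7 (hypothetical value).
rng = np.random.default_rng(10)
n = 2000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()

acf_vals = acf(x, 5)
pacf_vals = pacf(x, 5)
```

For an AR(1) process the ACF decays geometrically while the PACF is close to zero beyond lag 1, which is the pattern these two arrays should show.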

#### Q33. Plot and comment on the ACF and PACF of all variables. 

#### Q34. Explain the principle of parsimony and its relationship with Ockham’s razor using the theory of information criteria.

#### Q35. Explain the problem of autocorrelation of the errors.

#### Q36. Using only stationary variables, run a regression of GDP on constant, unemployment and inflation and test the hypothesis of no autocorrelation of the errors.
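One classical diagnostic for first-order autocorrelation is the Durbin-Watson statistic, sketched below on synthetic data with AR(1) errors. Choosing Durbin-Watson rather than, say, a Breusch-Godfrey test is an assumption; the regressors and coefficient values are made up.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: close to 2 when residuals show no first-order autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(7)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # const + 2 synthetic regressors

# AR(1) errors with rho = 0.8 (hypothetical value).
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + e[t]
y = X @ np.array([1.0, 0.5, -0.5]) + u

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
dw = durbin_watson(resid)

# First-order autocorrelation of the residuals; dw is roughly 2 * (1 - rho_hat).
rho_hat = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
```

Values of `dw` well below 2 point to positive autocorrelation; the formal decision uses the Durbin-Watson bounds tables.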

#### Q37. Regardless of your answer to question 36, correct autocorrelation with GLS. Comment on your results.
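A minimal sketch of one feasible GLS step, assuming the errors follow an AR(1) process (a Cochrane-Orcutt style quasi-differencing). The AR(1) form, the helper names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat, y - X @ beta_hat

def cochrane_orcutt_step(X, y):
    """One feasible-GLS step for AR(1) errors u_t = rho * u_{t-1} + e_t."""
    _, resid = ols(X, y)
    rho_hat = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
    # Quasi-difference every column (including the constant) and the response.
    X_star = X[1:] - rho_hat * X[:-1]
    y_star = y[1:] - rho_hat * y[:-1]
    beta_gls, resid_star = ols(X_star, y_star)
    return beta_gls, rho_hat, resid_star

rng = np.random.default_rng(7)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + e[t]  # AR(1) errors, rho = 0.8 (hypothetical)
y = X @ np.array([1.0, 0.5, -0.5]) + u

beta_gls, rho_hat, resid_star = cochrane_orcutt_step(X, y)
```

Because the constant column is quasi-differenced along with the other regressors, `beta_gls` stays in the original parameterisation. The first observation is dropped by the transformation, and the step can be iterated until `rho_hat` stabilises.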

#### Q38. For all variables, construct their lag 1 and lag 2 variables.
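Lagged variables can be built with `shift` in pandas, as sketched below on a synthetic yearly frame. The column names `gdp`, `unemp` and `infl` are placeholders for the imported series.

```python
import numpy as np
import pandas as pd

years = pd.Index(range(1900, 2001), name="year")
rng = np.random.default_rng(8)
df = pd.DataFrame(
    rng.normal(size=(len(years), 3)),
    index=years,
    columns=["gdp", "unemp", "infl"],  # hypothetical column names
)

# shift(k) moves values down by k rows, so row t holds the value from year t - k.
for col in ["gdp", "unemp", "infl"]:
    for lag in (1, 2):
        df[f"{col}_lag{lag}"] = df[col].shift(lag)

# Rows without a full set of lags contain NaN and are dropped before estimation.
df_lagged = df.dropna()
```

Using two lags removes the first two rows, so 101 yearly observations leave 99 usable ones here; with series that were already shortened by a stationarity transformation, the count is lower accordingly.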

#### Q39. Run a regression of GDP on constant, lag 1 unemployment, lag 2 unemployment, lag 1 inflation, lag 2 inflation. What is the number of observations and why?

#### Q40. State and test the no-Granger causality hypothesis of unemployment on GDP at the 1% level.

#### Q41. Divide the sample into two groups: 1900-1960 and 1961-2000. Test the stability of coefficients between the two periods.

#### Q42. Test the structural breakpoint using a trim ratio of 30% at the 1% level.