# Finance Research Practicum Python Diagnostic

This is a basic Python diagnostic designed to assess your familiarity with Python. You will likely come across similar problems during FRP. Feel free to use online resources or any other materials, but you are strongly advised to work through it yourself.

## Part 0: Set up

Import the necessary libraries.

In [None]:
# If this is your first time, you may need to install the necessary packages
!pip install pandas numpy statsmodels

In [None]:
import pandas as pd
import numpy as np
import os
import statsmodels.api as sm
# ...

## Part 1: Working with Data

A large amount of your time will be spent cleansing and manipulating data. This part assesses your familiarity with manipulating data types and creating new ones.

1. Please download daily data for the S&P 500 and the 10-Year Constant Maturity Rate, along with Seasonally Adjusted Quarterly GDP Growth, from 2009-01-01 onward. Load all three datasets into Python. Links to all three are below:


- https://fred.stlouisfed.org/series/DGS10
- https://fred.stlouisfed.org/series/SP500
- https://fred.stlouisfed.org/series/A191RP1Q027SBEA
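
As a rough illustration of the loading step, the sketch below reads a small in-memory CSV with `pd.read_csv`. The toy text and the column names (`DATE` plus the series ID) are assumptions about what a FRED download looks like; check the header of your own file, since it may differ.

```python
import io
import pandas as pd

# Toy stand-in for a FRED CSV download. The column names ("DATE" and the
# series ID) are an assumption; the header of a real download may differ.
csv_text = """DATE,DGS10
2009-01-01,.
2009-01-02,2.46
2009-01-05,2.49
"""

# With a real file, pass a path instead of the StringIO buffer, for example
# os.path.join(DATA_DIR, "DGS10.csv") (a hypothetical file name).
df_example = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE"])

# The date column is parsed as datetime; the price column is still text
# because of the "." placeholder on the first row.
print(df_example.dtypes)
```

Parsing the dates at load time makes the later merge and quarterly aggregation steps simpler.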


In [None]:
### Load Dataset here
DATA_DIR = 
df_DGS10 = 
df_SP500 = 
df_GDP = 

In [None]:
df_DGS10.head()

In [None]:
df_SP500.head()

In [None]:
df_GDP.head()

2.	This data needs to be cleansed. Some days have a price level of “.”, which causes pandas to read the column as an object type instead of a numeric type. Please:


- Remove all rows with price “.” in the DGS10 & SP500 datasets

- Reformat all price columns to type numeric

- Rescale the GDP dataset’s returns from percent to decimal form (divide everything by 100)

- Rename the second column of the GDP dataset to “GDPReturn”
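
One possible way to approach these four steps is sketched below on toy frames. The column names are assumptions modelled on the FRED series IDs, and the GDP numbers are made up for illustration only.

```python
import pandas as pd

# Toy frames; column names are assumptions and the values are illustrative.
df_rates = pd.DataFrame({
    "DATE": pd.to_datetime(["2009-01-01", "2009-01-02", "2009-01-05"]),
    "DGS10": [".", "2.46", "2.49"],
})
df_gdp_toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2009-01-01", "2009-04-01"]),
    "A191RP1Q027SBEA": [-4.5, -1.2],
})

# Drop rows whose price is the placeholder ".", then convert to numeric.
df_rates = df_rates[df_rates["DGS10"] != "."].copy()
df_rates["DGS10"] = pd.to_numeric(df_rates["DGS10"])

# Convert percent to decimal, then rename the second column by position.
second_col = df_gdp_toy.columns[1]
df_gdp_toy[second_col] = df_gdp_toy[second_col] / 100
df_gdp_toy = df_gdp_toy.rename(columns={second_col: "GDPReturn"})

print(df_rates.dtypes)
print(df_gdp_toy)
```

An alternative for the first two steps is `pd.to_numeric(..., errors="coerce")` followed by `dropna`, which turns any non-numeric entry into `NaN` rather than only the “.” placeholder.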


In [None]:
# Remove all rows with price “.” in the DGS10 & SP500 datasets


In [None]:
# Reformat all price columns to type numeric


In [None]:
# Rescale the GDP dataset’s returns from percent to decimal form (divide everything by 100)


In [None]:
# Rename the second column of the GDP dataset to “GDPReturn”


3.	It’s not a good idea to work with level data, so let’s transform the data. Please compute the daily returns of both the S&P and the 10 Yr CMT, and create new columns called “SP_Return” and “CMT_Return” respectively. The first row’s return should be NA.
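
A minimal sketch of the return calculation, assuming simple (not log) returns and a toy price column named `SP500`:

```python
import pandas as pd

# Toy price levels; the column name "SP500" is an assumption.
prices = pd.DataFrame(
    {"SP500": [100.0, 102.0, 99.96]},
    index=pd.to_datetime(["2009-01-02", "2009-01-05", "2009-01-06"]),
)

# pct_change computes (p_t / p_{t-1}) - 1; the first row has no
# predecessor, so its return is NaN.
prices["SP_Return"] = prices["SP500"].pct_change()

print(prices)
```

The same call applied to the CMT column gives the percentage change in the yield, which is how “CMT_Return” is interpreted in this sketch.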

4.	Merge the two dataframes into a master dataframe. Please keep only the rows for which both dataframes have price data. Also remove the first row, since it has no return data.
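
The merge can be sketched with an inner join on the date column, which keeps only dates present in both frames. The frames below are toy data and the `DATE` column name is an assumption.

```python
import pandas as pd

# Toy return frames with partially overlapping dates.
left = pd.DataFrame({
    "DATE": pd.to_datetime(["2009-01-02", "2009-01-05", "2009-01-06"]),
    "SP_Return": [float("nan"), 0.02, -0.02],
})
right = pd.DataFrame({
    "DATE": pd.to_datetime(["2009-01-02", "2009-01-05", "2009-01-07"]),
    "CMT_Return": [float("nan"), 0.01, 0.03],
})

# how="inner" keeps only dates that appear in both frames.
merged = left.merge(right, on="DATE", how="inner")

# Drop the first row, whose returns are NaN by construction.
merged = merged.iloc[1:].reset_index(drop=True)

print(merged)
```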


In [None]:
df_master = 

5.	You’ll notice we have a period mismatch: GDP returns are quarterly, but S&P and CMT returns are daily. Please create a final table containing quarterly GDP returns and quarterly S&P and CMT returns. Use the dates from the quarterly GDP dataset. Also remove the first row, since it has no return data.
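
One way to bridge the frequency mismatch is to compound the daily simple returns within each calendar quarter and label each quarter by its start date. This sketch assumes the GDP dataset dates each observation at the start of its quarter; verify that against your own file. All values are toy data.

```python
import pandas as pd

# Toy daily returns spanning two quarters.
daily = pd.DataFrame(
    {"SP_Return": [0.10, -0.05, 0.02, 0.03]},
    index=pd.to_datetime(
        ["2009-01-15", "2009-03-20", "2009-04-10", "2009-06-30"]
    ),
)

# Compound within each quarter: prod(1 + r) - 1.
quarter = daily.index.to_period("Q")
quarterly = (1 + daily).groupby(quarter).prod() - 1

# Relabel each quarter by its first day (assumed GDP date convention).
quarterly.index = quarterly.index.to_timestamp()

# Toy quarterly GDP returns, indexed by quarter start.
gdp = pd.DataFrame(
    {"GDPReturn": [-0.045, -0.012]},
    index=pd.to_datetime(["2009-01-01", "2009-04-01"]),
)

combined = gdp.join(quarterly, how="inner")
print(combined)
```

Compounding is one reasonable choice for simple returns; an alternative is to take the last price level in each quarter and apply `pct_change` to those quarterly levels.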



In [None]:
df_master_Q = 

## Part 2: Understanding your data

1.	Provide the following information:


- Min, Max, 1st & 3rd quartile, Mean, Median of both return columns
- On which days did the Max/Min returns occur for each column?
- Bonus points: What happened on these days to justify the returns?
- Correlation between the two return columns
- Standard deviation of both columns
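
The statistics listed above map onto a handful of pandas calls, sketched here on toy returns indexed by date (the values are illustrative only):

```python
import pandas as pd

# Toy return data indexed by date.
rets = pd.DataFrame(
    {
        "SP_Return": [0.01, -0.03, 0.02, 0.00],
        "CMT_Return": [0.02, -0.01, 0.04, -0.02],
    },
    index=pd.to_datetime(
        ["2009-01-05", "2009-01-06", "2009-01-07", "2009-01-08"]
    ),
)

# describe() reports count, mean, std, min, 25%, 50% (median), 75%, max.
summary = rets.describe()

# idxmax / idxmin return the index label (here, the date) of each extreme.
max_days = rets.idxmax()
min_days = rets.idxmin()

# Pearson correlation between the two columns.
corr = rets["SP_Return"].corr(rets["CMT_Return"])

# Sample standard deviation (ddof=1 by default in pandas).
stds = rets.std()

print(summary)
print(max_days, min_days, corr, stds, sep="\n")
```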


In [None]:
# Min, Max, 1st & 3rd quartile, Mean, Median of both return columns


In [None]:
# On which days did the Max/Min returns occur for each column?

# Bonus points: What happened on these days to justify the returns?

In [None]:
# Correlation between the two return columns


In [None]:
# Standard deviation of both columns


## Part 3: Modelling & Analytics

1.	Let’s see whether there is any predictability between the two columns. Please run a regression in which the explanatory variable (X) is the CMT return and the response variable (Y) is the S&P return. Include an intercept term, and assign your regression results to a variable.


- Provide the coefficients
- Run a t-test and provide t-statistics on both coefficients (and p-values)
- What about R squared, Adjusted R Squared?
- Run an F-test and provide the F-statistic along with its p-value


2.	At the 90% and 95% confidence levels (that is, the 10% and 5% significance levels), can bond returns explain the S&P?

In [None]:
# 90% is ok, 95% is not

3.	Is the regression model suitable for modelling this phenomenon compared to just an intercept term? (Hint: F-Test)

In [None]:
# Yes.

4. Now run a regression in which the explanatory variables (X) are the CMT return and GDP growth, and the response variable (Y) is the S&P return. Your regression will be on a quarterly basis. Include an intercept term as well!
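
For the two-regressor case, one option is the `statsmodels` formula interface, which adds the intercept automatically. The sketch below uses synthetic quarterly data with arbitrary coefficients, and the column names follow the ones requested earlier in this diagnostic.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic quarterly data; all coefficients here are arbitrary.
rng = np.random.default_rng(1)
n = 40
q = pd.DataFrame({
    "CMT_Return": rng.normal(0, 0.05, n),
    "GDPReturn": rng.normal(0.005, 0.01, n),
})
q["SP_Return"] = (
    0.01 + 0.3 * q["CMT_Return"] + 1.5 * q["GDPReturn"]
    + rng.normal(0, 0.03, n)
)

# The formula "y ~ x1 + x2" includes an intercept term by default.
results_q = smf.ols("SP_Return ~ CMT_Return + GDPReturn", data=q).fit()

print(results_q.params)
print(results_q.pvalues)
```

The same fit can be obtained with `sm.OLS` and `sm.add_constant` by passing both regressor columns as X.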