# Big Data in Finance
## Part IV: CRSP and Compustat
### Homework I: Due April 6, 2022, by 11:59 pm.

**Due:** Thursday, April 6, 2022, by 11:59 pm. Late submission will *not* be accepted. 

**Goal**: Explore the information in CRSP and Compustat.

**Delivery**: Please upload to Canvas a .zip file containing your notebook in both .ipynb and .html formats.


# Description

The year 2020 was remarkable for U.S. stock market investors. The S&P 500-stock index, the most widely watched gauge, finished the year up more than 16 percent. The Dow Jones industrial average and the tech-heavy Nasdaq gained 7.25 percent and 43.6 percent, respectively. The Dow and S&P 500 finished at record levels despite the public health and economic crises.

Let's look at what happened with stock returns using the data downloaded from CRSP and Compustat.

1. What were the cumulative average returns (market-cap weighted and equal-weighted) of CRSP stocks in 2020? Plot the cumulative monthly returns and report how much an investor would have had by the end of 2020 had she invested $1 in CRSP stocks at the beginning of 2020. How much would the same investor have had if she had entered the market at the end of March, the month in which stock volatility was highest? What were the 10 top-performing stocks in 2020? Some things to be aware of:

    a. You should only consider ordinary common stocks, i.e., shrcd equal to 10 or 11.
    
    b. You should only consider stocks listed in one of the three main stock exchanges, i.e., exchcd equal to 1 (NYSE), 2 (NYSE MKT) or 3 (NASDAQ).
    
    c. Think about which market capitalization you should use when calculating the portfolio weights.


2. How much would an investor have had by the end of 2020 had she invested $1 in a TECH portfolio (market-cap weighted and equal-weighted) at the beginning of 2020? How much would the same investor have had if she had entered the market at the end of March? Plot the cumulative returns from the beginning of the year. What were the 10 top-performing tech stocks in 2020? Some things to be aware of:

    a. The closest counterpart to tech firms in Fama and French's 12-industry classification is the Business Equipment industry, which includes computers, software, and electronic equipment. These are firms with SIC codes in the following ranges: 3570-3579, 3660-3692, 3694-3699, 3810-3829, and 7370-7379.
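The SIC screen in item 2 can be written as a boolean mask over a column of SIC codes. This is a minimal sketch, assuming the codes sit in a column such as CRSP's `siccd` (from `crsp.msenames`); the names `BUSEQ_SIC_RANGES` and `is_tech` are hypothetical.

```python
import pandas as pd

# SIC ranges for the Business Equipment industry, as listed in the assignment (inclusive bounds)
BUSEQ_SIC_RANGES = [(3570, 3579), (3660, 3692), (3694, 3699), (3810, 3829), (7370, 7379)]

def is_tech(sic: pd.Series) -> pd.Series:
    """Return True where the SIC code falls in any Business Equipment range."""
    # Coerce strings such as "3571" to numbers; unparseable or missing codes become NaN
    sic = pd.to_numeric(sic, errors="coerce")
    mask = pd.Series(False, index=sic.index)
    for low, high in BUSEQ_SIC_RANGES:
        # between() includes both endpoints by default; NaN compares as False
        mask |= sic.between(low, high)
    return mask
```

For instance, `tech = crsp[is_tech(crsp["siccd"])]` keeps only tech firms, and the resulting subset can go through the same portfolio calculations as in item 1.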


3. How much would an investor have had by the end of 2020 had she instead invested $1 in a portfolio (market-cap weighted and equal-weighted) of the stocks of the 10 most profitable companies at the beginning of 2020? Use operating profitability normalized by book equity (OPBE), as defined in the lecture notes, as the measure of profitability. How much would the same investor have had if she had entered the market at the end of March? Plot the cumulative returns from the beginning of the year. What were the 10 most profitable companies in 2020? Some things to be aware of:

    a. Publicly listed companies must report their financials within 91 days after the end of the fiscal period, and most companies report on the deadline. Take this reporting lag into account when merging financial results with stock returns.
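One way to respect the reporting lag is to date each fiscal-period record by when it becomes public (`datadate` plus 91 days) and then, for every stock-month, take the latest record that is already public; `pandas.merge_asof` with `direction="backward"` does exactly this. This is a sketch under assumptions: the fundamentals are assumed to be already linked from `gvkey` to `permno` (for example, through the CRSP/Compustat Merged link table), `opbe` is a hypothetical precomputed column holding OPBE as defined in the lecture notes, and the helper names are hypothetical.

```python
import pandas as pd

def attach_fundamentals(returns: pd.DataFrame, funda: pd.DataFrame,
                        lag_days: int = 91) -> pd.DataFrame:
    """Attach to each (permno, date) row the latest fundamentals public by that date.

    returns: columns permno, date, ret
    funda:   columns permno, datadate, opbe (opbe is a hypothetical precomputed column)
    """
    f = funda.copy()
    # Treat a fiscal-period record as public lag_days after the period end
    f["avail_date"] = f["datadate"] + pd.Timedelta(days=lag_days)
    # merge_asof requires both frames to be sorted on their time keys
    f = f.sort_values("avail_date")
    r = returns.sort_values("date")
    return pd.merge_asof(r, f, left_on="date", right_on="avail_date",
                         by="permno", direction="backward")

def most_profitable(funda: pd.DataFrame, as_of: str, n: int = 10,
                    lag_days: int = 91) -> pd.DataFrame:
    """The n firms with the highest opbe among records public by `as_of`."""
    public = funda[funda["datadate"] + pd.Timedelta(days=lag_days) <= pd.Timestamp(as_of)]
    # Keep each firm's most recent public record
    latest = public.sort_values("datadate").groupby("permno").tail(1)
    return latest.nlargest(n, "opbe")
```

Under these assumptions, `most_profitable(funda, "2020-01-01")` would select the firms for the portfolio formed at the beginning of 2020; a fiscal year ending on 2019-12-31 would only count from 2020-03-31 onward.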


4. Compare your results from (2) and (3). Which portfolio performed better? Do the results make sense?

# Code 

## Packages

Below is some code to help you get started. Make sure you have installed all required packages; use `conda install` or `pip install` if you are missing any of them.
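A quick way to see which of the packages imported below are missing is to ask Python's import system whether each module can be found, without actually importing it. This is a sketch; the helper name `missing_packages` is hypothetical, and the module list simply mirrors the imports in the next cell.

```python
import importlib.util

# Modules imported in the notebook (module names can differ from pip/conda package names)
REQUIRED = ["matplotlib", "numpy", "pandas", "wrds"]

def missing_packages(names=REQUIRED):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Missing:", missing_packages() or "none")
```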

```python
# Packages
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import wrds
from datetime import datetime, timedelta

# Display setup: show all rows of a DataFrame
pd.set_option("display.max_rows", None)
```

## WRDS Connection

Make sure you can connect to the WRDS server. You should configure the wrds connector beforehand; please check the syllabus for details.


```python
# Set up WRDS connection
db = wrds.Connection(wrds_username='khardnett')  # make sure to change the username
```

## Determine the libraries available at WRDS

```python
# List all libraries in WRDS
libs = db.list_libraries()
print(type(libs))  # libs is a list
pd.DataFrame({'libraries': libs}).sort_values('libraries')  # convert libs to a DataFrame for a better display
```

## Determine the datasets within a given library

```python
compd_tables = db.list_tables(library="comp")  # compd: Compustat daily update
pd.DataFrame({'tables': compd_tables})  # convert the list of tables to a DataFrame for a better display
```

## Determine the column headers (variables) within a given dataset

```python
db.describe_table(library="comp", table="funda")
```

## Example of submitting a SQL query to import data

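```python
# Pull 10 rows of annual fundamentals; the indfmt/datafmt/popsrc/consol filters are the
# usual screens that avoid duplicate records for the same firm and fiscal period
comp = db.raw_sql("SELECT datadate, fyear, gvkey, conm, at, ebit, che FROM comp.funda WHERE indfmt='INDL' AND datafmt='STD' AND popsrc='D' AND consol='C' LIMIT 10;", date_cols=['datadate'])
comp
```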
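The same `raw_sql` pattern can pull the CRSP monthly data needed for the questions above. The sketch below only builds the query string; it assumes the WRDS tables `crsp.msf` (monthly returns, prices, shares outstanding) and `crsp.msenames` (share code, exchange code and SIC code, valid between `namedt` and `nameendt`), and the function name `crsp_monthly_query` is hypothetical. Joining on the name-history dates applies the `shrcd`/`exchcd` screens as of each month rather than as of today.

```python
def crsp_monthly_query(start: str, end: str) -> str:
    """Build SQL for CRSP monthly stock data with codes valid at each month-end."""
    # start and end are ISO date strings such as "2019-12-01"; they are inserted verbatim,
    # so only pass trusted values
    return f"""
        SELECT a.permno, a.date, a.ret, a.prc, a.shrout,
               b.shrcd, b.exchcd, b.siccd
        FROM crsp.msf AS a
        JOIN crsp.msenames AS b
          ON a.permno = b.permno
         AND b.namedt <= a.date AND a.date <= b.nameendt
        WHERE a.date BETWEEN '{start}' AND '{end}'
          AND b.shrcd IN (10, 11)
          AND b.exchcd IN (1, 2, 3)
    """
```

With a live connection, `crsp = db.raw_sql(crsp_monthly_query("2019-12-01", "2020-12-31"), date_cols=["date"])` would return the panel; starting in December 2019 leaves room for the previous month's market cap.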