# 00 - Raw Data Download
## Description

This notebook downloads daily stock file data from CRSP and writes output tables containing the following variables:
- date
- permno as unique identifier
- mcap as shares outstanding times price
- return
- intraday extreme value volatility estimate $\bar{\sigma}^{2}_{i,t} = {0.3607}(p_{i,t}^{high}-p_{i,t}^{low})^{2}$ based on Parkinson (1980), where $p_{i,t}$ is the logarithm of the dollar price
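The two derived variables in the list above can be sketched as follows. This is a minimal illustration on toy data, not the code in `query.py`: the column names (`prc`, `shrout`, `askhi`, `bidlo`) are assumed to follow the CRSP daily stock file, and taking the absolute price is an assumption based on CRSP's convention of storing bid/ask-average prices with a negative sign.

```python
import numpy as np
import pandas as pd

# Toy daily data; column names are assumed to match the CRSP daily stock file.
df = pd.DataFrame({
    "permno": [10001, 10001],
    "prc": [10.0, -10.5],      # a negative price marks a bid/ask average in CRSP
    "shrout": [1000, 1000],    # shares outstanding, in thousands
    "askhi": [10.2, 10.8],     # daily high
    "bidlo": [9.8, 10.1],      # daily low
})

# Market capitalization: shares outstanding times the absolute price
# (in thousands of dollars, because shrout is in thousands).
df["mcap"] = df["shrout"] * df["prc"].abs()

# Parkinson (1980) variance estimate from the log high-low range.
PARKINSON_SCALE = 1.0 / (4.0 * np.log(2.0))  # approximately 0.3607
df["var"] = PARKINSON_SCALE * (np.log(df["askhi"]) - np.log(df["bidlo"])) ** 2
```

The constant 0.3607 in the formula is $1/(4\ln 2)$, which is why the sketch computes it rather than hard-coding it.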

Additionally, the following data is downloaded:
- Fama-French Factor data
- SPDR S&P 500 ETF Trust ("SPY")

Most of the code that performs these steps lives in the `query.py` module.

## Imports

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import wrds
import pandas as pd
import numpy as np
import datetime as dt
import sys
sys.path.append('../')
import src



## Set up WRDS Connection

In [3]:
wrds_conn = wrds.Connection(wrds_username='felixbru')
# wrds_conn.create_pgpass_file()  # run once to store credentials
# wrds_conn.close()  # close the connection when finished

Loading library list...
Done


### Explore database

In [4]:
libraries = wrds_conn.list_libraries()
library = 'crsp'

In [5]:
library_tables = wrds_conn.list_tables(library=library)
table = 'dsf'

In [6]:
table_description = wrds_conn.describe_table(library=library, table=table)

Approximately 98253050 rows in crsp.dsf.


## Download CRSP data

### Daily stock data

EXCHCD (exchange code):
- 1: NYSE
- 2: NYSE MKT
- 3: NASDAQ

SHRCD (share code):
- 10: Ordinary common share, no special status found
- 11: Ordinary common share, no special status necessary
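The codes above presumably define the sample filter applied in the download. A hedged sketch of how such a filter could be written as a SQL string is shown below. The function `build_crsp_year_query` is hypothetical, and the table and column names (`crsp.dsf`, `crsp.dsenames`, `namedt`, `nameendt`) are assumptions about the WRDS schema, not taken from `query.py`.

```python
def build_crsp_year_query(year, exchcds=(1, 2, 3), shrcds=(10, 11)):
    """Return a SQL string selecting one calendar year of daily CRSP data.

    Hypothetical helper: table and column names are assumptions about the
    WRDS schema and may differ from what query.py actually uses.
    """
    exch_list = ", ".join(str(int(c)) for c in exchcds)
    shr_list = ", ".join(str(int(c)) for c in shrcds)
    return f"""
        SELECT a.permno, a.date, a.ret, a.prc, a.shrout, a.askhi, a.bidlo
        FROM crsp.dsf AS a
        LEFT JOIN crsp.dsenames AS b
            ON a.permno = b.permno
            AND b.namedt <= a.date
            AND a.date <= b.nameendt
        WHERE a.date BETWEEN '{int(year)}-01-01' AND '{int(year)}-12-31'
            AND b.exchcd IN ({exch_list})
            AND b.shrcd IN ({shr_list})
    """

# Build the query for a single year
query = build_crsp_year_query(1993)
```

With an open connection, a string like this could be passed to `wrds_conn.raw_sql(query, date_cols=['date'])`. Joining on the name-history date range matters because exchange and share codes can change over a security's life.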

In [8]:
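for year in range(1993, 2021):  # or: range(1960, 2020)
    # Download one calendar year of daily CRSP data and store it as a pickle
    df = src.query.download_crsp_year(wrds_conn, year)
    df.to_pickle(path='../data/raw/crsp_{}.pkl'.format(year))
    # Report progress every five years
    if year % 5 == 0:
        print('    Year {} done.'.format(year))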

collected 30.43 MB on 2020-11-17 15:39:17.591604 in 10 seconds
    Year 1960 done.
collected 30.80 MB on 2020-11-17 15:39:41.387838 in 23 seconds
collected 52.42 MB on 2020-11-17 15:39:54.909751 in 13 seconds
collected 81.53 MB on 2020-11-17 15:40:06.664087 in 11 seconds
collected 84.00 MB on 2020-11-17 15:40:18.797482 in 11 seconds
collected 85.48 MB on 2020-11-17 15:40:31.168864 in 11 seconds
    Year 1965 done.
collected 86.52 MB on 2020-11-17 15:40:43.794753 in 11 seconds
collected 86.54 MB on 2020-11-17 15:40:57.197068 in 12 seconds
collected 77.79 MB on 2020-11-17 15:41:09.503186 in 11 seconds
collected 88.92 MB on 2020-11-17 15:41:22.666796 in 12 seconds
collected 94.51 MB on 2020-11-17 15:41:38.805180 in 15 seconds
    Year 1970 done.
collected 97.04 MB on 2020-11-17 15:41:57.360970 in 17 seconds
collected 105.79 MB on 2020-11-17 15:42:15.943796 in 17 seconds
collected 216.77 MB on 2020-11-17 15:42:51.015076 in 33 seconds
collected 205.99 MB on 2020-11-17 15:43:34.172300 in 34 
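Because the loop above stores one pickle per year, later steps need to stitch the files back together. The helper below is a minimal sketch of that step. `load_crsp_years` is a hypothetical name, and the file pattern `crsp_{year}.pkl` is taken from the loop above.

```python
from pathlib import Path

import pandas as pd


def load_crsp_years(data_dir, years):
    """Concatenate yearly CRSP pickles from data_dir, skipping missing files."""
    frames = []
    for year in years:
        path = Path(data_dir) / f"crsp_{year}.pkl"
        if path.exists():
            frames.append(pd.read_pickle(path))
    # Return an empty frame if none of the requested years is on disk
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames)
```

For example, `load_crsp_years('../data/raw', range(1993, 2021))` would return all downloaded years as a single DataFrame, memory permitting.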

### Delisting Returns

In [8]:
df_delist = src.query.download_delisting(wrds_conn)
df_delist.to_pickle(path='../data/raw/delisting.pkl')

collected 1.93 MB on 2021-02-03 15:19:27.638045 in 1 seconds


### Descriptive Data

In [9]:
df_descriptive = src.query.download_descriptive(wrds_conn)
df_descriptive.to_pickle(path='../data/raw/descriptive.pkl')

collected 24.70 MB on 2021-02-03 15:19:37.808350 in 5 seconds


## Download Fama-French factor data

### SQL Query

In [10]:
df_ff = src.query.download_famafrench(wrds_conn)
df_ff.to_pickle(path='../data/raw/ff_factors.pkl')

collected 1.99 MB on 2021-02-03 15:19:39.892072 in 1 seconds


## Download SPDR S&P 500 ETF (SPY) data

In [11]:
df_spy = src.query.download_SPY(wrds_conn)
df_spy.to_pickle(path='../data/raw/spy.pkl')

collected 0.51 MB on 2021-02-03 15:19:40.618329 in 0 seconds
