# 00 - Raw Data Download
## Description

This notebook downloads the CRSP daily stock file and writes output tables containing the following variables:
- date
- permno as unique identifier
- mcap as shares outstanding times price
- return
- intraday extreme value volatility estimate $\bar{\sigma}_{i,t} = \sqrt{0.3607}\,(p_{i,t}^{high}-p_{i,t}^{low})$ based on Parkinson (1980), where $p_{i,t}$ is the logarithm of the dollar price
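
As a rough illustration of the two derived variables above, the following sketch computes market capitalization and the Parkinson estimate on a toy table. The column names `prc`, `shrout`, `askhi` and `bidlo` follow the CRSP daily stock file, but treating them as the inputs used by `src.crsp` is an assumption, as is taking the absolute value of the price (CRSP reports a negative price when it is a bid/ask average). The helper names are hypothetical.

```python
import numpy as np
import pandas as pd

# Parkinson (1980) scaling constant: 1 / (4 ln 2), approximately 0.3607.
PARKINSON_SCALE = 1.0 / (4.0 * np.log(2.0))


def parkinson_volatility(high, low):
    """Range-based volatility: sqrt(0.3607) * (log(high) - log(low))."""
    log_range = np.log(high) - np.log(low)
    return np.sqrt(PARKINSON_SCALE) * log_range


def market_cap(price, shares_outstanding):
    """Shares outstanding times price.

    Assumption: the absolute price is used, since a negative CRSP price
    marks a bid/ask average rather than a negative value.
    """
    return price.abs() * shares_outstanding


# Toy rows for illustration only, not actual CRSP data.
sample = pd.DataFrame({
    "prc": [10.0, -20.0],
    "shrout": [1000.0, 500.0],
    "askhi": [10.5, 21.0],
    "bidlo": [9.5, 19.0],
})
sample["mcap"] = market_cap(sample["prc"], sample["shrout"])
sample["vola"] = parkinson_volatility(sample["askhi"], sample["bidlo"])
```

Working with the log range directly means the estimate is independent of the price level, so it can be compared across stocks.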


## TO DO
- Same permco can have multiple permno

## Imports

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import wrds
import pandas as pd
import numpy as np
import datetime as dt
import sys
sys.path.append('../')
import src



## Set up WRDS Connection

In [3]:
wrds_conn = wrds.Connection(wrds_username='felixbru')
# wrds_conn.create_pgpass_file()
# wrds_conn.close()

Loading library list...
Done


### Explore database

In [4]:
libraries = wrds_conn.list_libraries()
library = 'crsp'

In [5]:
library_tables = wrds_conn.list_tables(library=library)
table = 'dsf'

In [6]:
table_description = wrds_conn.describe_table(library=library, table=table)

Approximately 96285900 rows in crsp.dsf.


## Download CRSP data

### Daily stock data

EXCHCD:
- 1: NYSE
- 2: NYSE MKT
- 3: NASDAQ

SHRCD:
- 10: Ordinary common share, no special status found
- 11: Ordinary common share, no special status necessary
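
The codes listed above can be applied as a simple filter. This is a minimal sketch, assuming a DataFrame with lowercase `exchcd` and `shrcd` columns; whether `src.crsp.download_crsp_year` filters in SQL or in pandas is not shown in this notebook, and the helper name and sample rows are hypothetical.

```python
import pandas as pd

# Codes taken from the lists above.
EXCHANGE_CODES = {1: "NYSE", 2: "NYSE MKT", 3: "NASDAQ"}
SHARE_CODES = [10, 11]


def filter_common_stocks(df):
    """Keep ordinary common shares listed on NYSE, NYSE MKT or NASDAQ."""
    mask = df["exchcd"].isin(list(EXCHANGE_CODES)) & df["shrcd"].isin(SHARE_CODES)
    return df.loc[mask].reset_index(drop=True)


# Toy rows for illustration only; the permno values are made up.
sample = pd.DataFrame({
    "permno": [10001, 10002, 10003, 10004],
    "exchcd": [1, 3, 4, 2],
    "shrcd": [11, 10, 11, 73],
})
filtered = filter_common_stocks(sample)
```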

In [9]:
for year in range(1960, 2020):
    df = src.crsp.download_crsp_year(wrds_conn, year)
    df.to_pickle(path='../data/raw/crsp_{}.pkl'.format(year))
    if year % 5 == 0:
        print('    Year {} done.'.format(year))

collected 28.26 MB on 2020-10-30 16:37:46.824557 in 15 seconds
    Year 1960 done.
collected 28.60 MB on 2020-10-30 16:37:58.097793 in 9 seconds
collected 49.32 MB on 2020-10-30 16:38:07.544762 in 8 seconds
collected 69.59 MB on 2020-10-30 16:38:19.989478 in 11 seconds
collected 71.70 MB on 2020-10-30 16:38:32.395927 in 11 seconds
collected 72.97 MB on 2020-10-30 16:38:45.134327 in 11 seconds
    Year 1965 done.
collected 73.85 MB on 2020-10-30 16:38:58.045991 in 11 seconds
collected 73.87 MB on 2020-10-30 16:39:11.121583 in 12 seconds
collected 66.40 MB on 2020-10-30 16:39:23.336135 in 11 seconds
collected 75.90 MB on 2020-10-30 16:39:36.234802 in 12 seconds
collected 80.67 MB on 2020-10-30 16:39:50.111423 in 13 seconds
    Year 1970 done.
collected 82.83 MB on 2020-10-30 16:40:04.940015 in 13 seconds
collected 90.28 MB on 2020-10-30 16:40:22.209802 in 16 seconds
collected 184.62 MB on 2020-10-30 16:40:55.716186 in 31 seconds
collected 175.75 MB on 2020-10-30 16:41:38.511124 in 25 sec
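
The yearly files written by the loop above can later be read back and stacked into one table. The sketch below assumes the `crsp_{year}.pkl` naming used in the loop; the helper `load_crsp_years` is hypothetical and not part of `src`, and the demo writes toy files to a temporary directory rather than touching `../data/raw/`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_crsp_years(data_dir, years):
    """Read crsp_{year}.pkl for each year and concatenate the results.

    Years without a file are skipped silently.
    """
    frames = []
    for year in years:
        path = Path(data_dir) / "crsp_{}.pkl".format(year)
        if path.exists():
            frames.append(pd.read_pickle(path))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Demo with toy data in a temporary directory (not actual CRSP data).
with tempfile.TemporaryDirectory() as tmp:
    for year in (1960, 1961):
        toy = pd.DataFrame({
            "permno": [10001],
            "date": [pd.Timestamp(year=year, month=1, day=4)],
            "ret": [0.01],
        })
        toy.to_pickle(Path(tmp) / "crsp_{}.pkl".format(year))
    # 1962 has no file and is skipped.
    combined = load_crsp_years(tmp, range(1960, 1963))
```

Storing one file per year keeps each download small enough to retry on failure, at the cost of this extra concatenation step.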

### Delisting Returns

In [10]:
df_delist = src.crsp.download_delisting(wrds_conn)
df_delist.to_pickle(path='../data/raw/delisting.pkl')

collected 1.88 MB on 2020-10-30 17:15:19.775020 in 0 seconds
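
Delisting returns are usually compounded with the regular return on the delisting date in a later processing step. The following is a sketch of one common convention, not necessarily the one used in this project. It assumes the delisting table carries `permno`, `dlstdt` and `dlret` columns as in the CRSP delisting file; the helper name and sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def add_delisting_returns(df_daily, df_delist):
    """Left-join delisting returns and compound them with daily returns."""
    delist = df_delist.rename(columns={"dlstdt": "date"})[["permno", "date", "dlret"]]
    merged = df_daily.merge(delist, on=["permno", "date"], how="left")
    # Treat a missing component as zero, but keep NaN if both are missing.
    both_missing = merged["ret"].isna() & merged["dlret"].isna()
    ret = merged["ret"].fillna(0.0)
    dlret = merged["dlret"].fillna(0.0)
    merged["ret_adj"] = ((1 + ret) * (1 + dlret) - 1).mask(both_missing)
    return merged


# Toy rows for illustration only, not actual CRSP data.
dates = pd.to_datetime(["2000-01-03", "2000-01-04"])
df_daily_toy = pd.DataFrame({"permno": [10001, 10001], "date": dates, "ret": [0.01, -0.02]})
df_delist_toy = pd.DataFrame({"permno": [10001], "dlstdt": [dates[1]], "dlret": [-0.30]})
adjusted = add_delisting_returns(df_daily_toy, df_delist_toy)
```

Because of the left join, a delisting date without a matching row in the daily table is dropped here; handling that case would need an outer join or an appended row.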


### Descriptive Data

In [11]:
df_descriptive = src.crsp.download_descriptive(wrds_conn)
df_descriptive.to_pickle(path='../data/raw/descriptive.pkl')

collected 24.02 MB on 2020-10-30 17:15:24.385993 in 2 seconds


## Download FF data

### SQL Query

In [12]:
df_ff = src.crsp.download_famafrench(wrds_conn)
df_ff.to_pickle(path='../data/raw/ff_factors.pkl')

collected 1.99 MB on 2020-10-30 17:15:29.723783 in 0 seconds


## SPDR Trust SPY Index data

In [13]:
df_spy = src.crsp.download_SPY(wrds_conn)
df_spy.to_pickle(path='../data/raw/spy.pkl')

collected 0.43 MB on 2020-10-30 17:15:32.656684 in 0 seconds
