# Download and store data

This notebook downloads the Merval stock prices, the company metadata, and the S&P Merval index that we use throughout the book, and stores them in a local HDF5 store.

## Imports & Settings

In [40]:
import warnings
warnings.filterwarnings('ignore')

In [41]:
from pathlib import Path

import numpy as np
import pandas as pd
import yfinance as yf

pd.set_option('display.expand_frame_repr', False)

## Set Data Store path

Modify this path if you want to store the data elsewhere, and update the other notebooks accordingly.

In [42]:
DATA_STORE = Path('assets.h5')
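Instead of editing the path in every notebook, you could read it from an environment variable so that all notebooks resolve the same location. A minimal sketch; the variable name `ASSETS_STORE` and the helper `get_data_store` are hypothetical and not part of the book's code:

```python
import os
from pathlib import Path


def get_data_store(default: str = 'assets.h5', env_var: str = 'ASSETS_STORE') -> Path:
    """Return the HDF5 store path, preferring the (hypothetical) environment variable."""
    # Fall back to the original relative path when the variable is unset or empty
    raw = os.environ.get(env_var) or default
    # expanduser() lets users write paths such as '~/data/assets.h5'
    return Path(raw).expanduser()


DATA_STORE = get_data_store()
```

With the variable unset, this resolves to the same `assets.h5` as the cell above.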

## Yahoo Finance Prices

We use the Yahoo Finance API for Python (`yfinance`) to download the stock prices. See https://github.com/roboadvisor-fiuba/zipline-bundles for how to build the `merval_stock_prices.csv` file that the next cell reads.

In [43]:
# Long-format prices: one row per (date, ticker) pair
df = (pd.read_csv('merval_stock_prices.csv',
                  parse_dates=['date'],
                  index_col=['date', 'ticker'])
      .sort_index())

# print(df.info(show_counts=True))
with pd.HDFStore(DATA_STORE) as store:
    store.put('merval/prices', df)
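The CSV read above is expected in long format, with one row per (`date`, `ticker`) pair. If you build it yourself from a multi-ticker `yf.download` call rather than with the linked repository, a reshape along these lines may work. This is a sketch under assumptions: the helper name is hypothetical, the input is assumed to have (field, ticker) column levels, and the lower-case column names are assumed to match the CSV header:

```python
import pandas as pd


def to_long_format(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape (field, ticker) columns into rows indexed by (date, ticker).

    Assumes `wide` looks like multi-ticker yfinance output: a DatetimeIndex of
    dates and a two-level column index whose first level is the price field
    ('Open', 'Close', ...) and whose second level is the ticker.
    """
    long = wide.stack(level=1)  # move tickers from the columns into the row index
    long.index.names = ['date', 'ticker']
    # Lower-case field names (assumed CSV convention), e.g. 'Adj Close' -> 'adj_close'
    long.columns = [str(c).lower().replace(' ', '_') for c in long.columns]
    return long.sort_index()
```

Writing `to_long_format(wide).to_csv('merval_stock_prices.csv')` stores `date` and `ticker` as regular columns, which is what `index_col=['date', 'ticker']` expects when the file is read back.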

### Prices Metadata

In [44]:
df = pd.read_csv('merval_stock_metadata.csv')

with pd.HDFStore(DATA_STORE) as store:
    store.put('merval/stocks', df)

## S&P Merval Prices

The following code downloads historical prices of the S&P Merval index (ticker `^MERV`) from Yahoo Finance.

In [45]:
# Fetch data
merval_data = yf.download('^MERV')

# Display the fetched data
print(merval_data.head())

                  Open        High         Low       Close   Adj Close  Volume
Date                                                                          
1996-10-08  579.400024  592.700012  579.400024  590.099976  590.099976       0
1996-10-09  590.099976  590.799988  582.200012  583.099976  583.099976       0
1996-10-10  583.200012  584.599976  578.299988  584.599976  584.599976       0
1996-10-11  584.900024  585.900024  582.400024  584.299988  584.299988       0
1996-10-14  584.299988  584.299988  580.599976  584.000000  584.000000       0

In [46]:
with pd.HDFStore(DATA_STORE) as store:
    store.put('spmerval', merval_data)
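Depending on the installed yfinance version, `yf.download` may return a two-level column index (field, ticker) even for a single symbol such as `^MERV`, instead of the flat columns shown in the output above. A defensive sketch that flattens the columns before storing; the helper name `flatten_single_ticker` is hypothetical:

```python
import pandas as pd


def flatten_single_ticker(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop the ticker level that some yfinance versions add to the columns.

    Columns like ('Close', '^MERV') become 'Close'; frames whose columns are
    already flat are returned unchanged.
    """
    if isinstance(frame.columns, pd.MultiIndex):
        # Work on a copy so the caller's frame keeps its original columns
        frame = frame.copy()
        # Keep only the price-field level (assumed to be the outermost one)
        frame.columns = frame.columns.get_level_values(0)
    return frame
```

In the cell above, this would be used as `store.put('spmerval', flatten_single_ticker(merval_data))`.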

## Metadata on traded companies

In [47]:
df = pd.read_csv('merval_stock_metadata.csv')
df.head()

| Unnamed: 0 | ticker | name | lastsale | marketcap | ipoyear | sector | industry |
|---:|---|---|---:|---:|---:|---|---|
| 0 | ALUA.BA | Aluar Aluminio Argentino S.A.I.C. | 1011.5 | 2833599954944 | 1969 | Basic Materials | Aluminum |
| 1 | BBAR.BA | Banco BBVA Argentina S.A. | 4630.25 | 2874222575616 | 1969 | Financial Services | Banks - Regional |
| 2 | BMA.BA | Banco Macro S.A. | 8030.7 | 5232722444288 | 1969 | Financial Services | Banks - Regional |
| 3 | BYMA.BA | Bolsas y Mercados Argentinos S.A. | 342.5 | 1296250044416 | 1969 | Financial Services | Capital Markets |
| 4 | GGAL.BA | Grupo Financiero Galicia S.A. | 4318.1 | 6422496018432 | 1969 | Financial Services | Banks - Regional |


Inspect the distribution of market capitalizations, by decile:

In [48]:
df.marketcap.describe(percentiles=np.arange(.1, 1, .1).round(1)).apply(lambda x: f'{int(x):,d}')

count                    21
mean      2,522,867,641,324
std       2,725,183,336,981
min             506,059,360
10%         270,654,996,480
20%         688,237,248,512
30%         800,530,366,464
40%         989,219,782,656
50%       1,296,250,044,416
60%       2,833,599,954,944
70%       3,218,686,869,504
80%       3,688,529,133,568
90%       5,232,722,444,288
max      11,426,198,454,272
Name: marketcap, dtype: object
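The one-liner above combines two steps: rounding the `np.arange` decile cut points to strip floating-point noise, and formatting each statistic as an integer with comma thousands separators. A self-contained sketch on toy numbers (hypothetical values, not the Merval data) unpacks it:

```python
import numpy as np
import pandas as pd

# Decile cut points 0.1 ... 0.9; round(1) removes float noise such as 0.30000000000000004
percentiles = np.arange(.1, 1, .1).round(1)


def format_thousands(x: float) -> str:
    """Render a number as an integer with comma thousands separators."""
    return f'{int(x):,d}'


# Toy market caps (hypothetical values)
caps = pd.Series([5e8, 1.2e12, 2.8e12, 6.4e12, 1.1e13], name='marketcap')
summary = caps.describe(percentiles=percentiles).apply(format_thousands)
print(summary)
```

Formatting turns the numeric summary into strings, which is why the output above reports `dtype: object`; keep the unformatted `describe()` result if you need the numbers for further computation.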

Store the metadata indexed by ticker (this replaces the `merval/stocks` key written earlier):

In [49]:
with pd.HDFStore(DATA_STORE) as store:
    store.put('merval/stocks', df.set_index('ticker'))
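Downstream, the two tables can be combined on the shared `ticker` key, for example to attach each stock's sector to its daily prices. A sketch on toy frames, assuming the price table is indexed by (`date`, `ticker`) and the metadata by `ticker`, as stored above; in a notebook you would load them with `pd.read_hdf(DATA_STORE, 'merval/prices')` and `pd.read_hdf(DATA_STORE, 'merval/stocks')`. The helper `add_sector` is hypothetical:

```python
import pandas as pd


def add_sector(prices: pd.DataFrame, stocks: pd.DataFrame) -> pd.DataFrame:
    """Attach the `sector` column from ticker-indexed metadata to (date, ticker) prices."""
    merged = (prices.reset_index()
              .merge(stocks[['sector']], left_on='ticker', right_index=True, how='left'))
    return merged.set_index(['date', 'ticker']).sort_index()


# Toy inputs shaped like 'merval/prices' and 'merval/stocks' (values are made up)
prices = pd.DataFrame(
    {'date': pd.to_datetime(['2024-01-02', '2024-01-02']),
     'ticker': ['ALUA.BA', 'GGAL.BA'],
     'close': [100.0, 200.0]}).set_index(['date', 'ticker'])
stocks = pd.DataFrame(
    {'ticker': ['ALUA.BA', 'GGAL.BA'],
     'sector': ['Basic Materials', 'Financial Services']}).set_index('ticker')

with_sector = add_sector(prices, stocks)
print(with_sector)
```

A left merge keeps every price row; tickers missing from the metadata get `NaN` in `sector` instead of being dropped.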