# Tabular Data

## DataFrames

### Exercise 6
Load two years of stock data for every ticker in the current S&P 500. Perform the following:  

a. ‘Stack’ the table to get it into row form, and demonstrate ‘unstack’ to get it back into the original form.  
b. Melt and unmelt the table (using pivot), as shown in the lecture.  
c. Demonstrate usage of pivot_table.  
d. Create a derived table (a.k.a. a ‘summary table’) containing the total volume (in millions) per year for each ticker.  
e. Create a derived table containing the total volume (in millions) of each ticker.  
f. Add a column to the original table containing the daily return per ticker (using the above formula).  
g. Add a column to the original table containing the 1-week (5-day) moving average daily return, per ticker.  
h. Add a column to the original table containing the rolling cumulative sum of volume per ticker.  

In [6]:
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

import numpy as np

In [2]:
import pandas as pd
import yfinance as yf

# Read the S&P 500 constituents table from Wikipedia
payload = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
first_table = payload[0]   # current constituents
second_table = payload[1]  # second table on the page (not used below)
gspcSummary = first_table
gspcSummary  # Display the Wikipedia table of S&P 500 constituents

# Store the table so it can be reused in other notebooks
dataReference = gspcSummary
%store dataReference
del dataReference

gspcList = list(gspcSummary.Symbol)  # Put all the symbols into a list

# Remove the symbols for which yfinance returned no data
gspcList.remove('BRK.B')
gspcList.remove('BF.B')

# Download the price history into a table with MultiIndex columns (ticker, attribute)
tickers = yf.Tickers(gspcList)
hist1 = tickers.history(group_by='ticker', period='max')  # period='max' loads the full history, not only two years
hist1.head()

,Symbol,Security,SEC filings,GICS Sector,GICS Sub-Industry,Headquarters Location,Date first added,CIK,Founded
0,MMM,3M Company,reports,Industrials,Industrial Conglomerates,"St. Paul, Minnesota",1976-08-09,66740,1902
1,ABT,Abbott Laboratories,reports,Health Care,Health Care Equipment,"North Chicago, Illinois",1964-03-31,1800,1888
2,ABBV,AbbVie Inc.,reports,Health Care,Pharmaceuticals,"North Chicago, Illinois",2012-12-31,1551152,2013 (1888)
3,ABMD,ABIOMED Inc,reports,Health Care,Health Care Equipment,"Danvers, Massachusetts",2018-05-31,815094,1981
4,ACN,Accenture plc,reports,Information Technology,IT Consulting & Other Services,"Dublin, Ireland",2011-07-06,1467373,1989
...,...,...,...,...,...,...,...,...,...
500,YUM,Yum! Brands Inc,reports,Consumer Discretionary,Restaurants,"Louisville, Kentucky",1997-10-06,1041061,1997
501,ZBRA,Zebra Technologies,reports,Information Technology,Electronic Equipment & Instruments,"Lincolnshire, Illinois",2019-12-23,877212,1969
502,ZBH,Zimmer Biomet,reports,Health Care,Health Care Equipment,"Warsaw, Indiana",2001-08-07,1136869,1927
503,ZION,Zions Bancorp,reports,Financials,Regional Banks,"Salt Lake City, Utah",2001-06-22,109380,1873


Stored 'dataReference' (DataFrame)
[*********************100%***********************]  503 of 503 completed


,FTI,FTI,FTI,FTI,FTI,FTI,FTI,UAA,UAA,UAA,...,KLAC,KLAC,KLAC,TGT,TGT,TGT,TGT,TGT,TGT,TGT
Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Open,High,Low,...,Volume,Dividends,Stock Splits,Open,High,Low,Close,Volume,Dividends,Stock Splits
1962-01-02,,,,,,,,,,,...,,,,,,,,,,
1962-01-03,,,,,,,,,,,...,,,,,,,,,,
1962-01-04,,,,,,,,,,,...,,,,,,,,,,
1962-01-05,,,,,,,,,,,...,,,,,,,,,,
1962-01-08,,,,,,,,,,,...,,,,,,,,,,
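
The two symbols removed above both contain a dot. A possible alternative to dropping them is to rewrite the share-class separator before downloading. The sketch below assumes that Yahoo Finance expects a hyphen in place of the dot (for example `BRK-B`), which this notebook does not verify; `to_yahoo_symbol` is a hypothetical helper, not part of yfinance.

```python
# Hypothetical helper: rewrite share-class symbols such as 'BRK.B' to the
# hyphenated form 'BRK-B'. Assumption: Yahoo Finance uses the hyphenated form.
def to_yahoo_symbol(symbol: str) -> str:
    return symbol.replace(".", "-")


symbols = ["MMM", "BRK.B", "BF.B"]
yahoo_symbols = [to_yahoo_symbol(s) for s in symbols]
print(yahoo_symbols)
```

If the assumption holds, the converted list could be passed to `yf.Tickers` instead of removing the two entries.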


#### a. ‘Stack’ the table to get it into row form, and demonstrate ‘unstack’ to get it back into the original form.

In [20]:
hist = hist1

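# Stack the ticker level of the columns into rows: one row per (Date, Ticker)
hist.stack(level=0).reset_index().rename(columns={'level_1': 'Ticker'})

# Unstack: move Ticker back into the columns.
# The column levels come back ordered (attribute, ticker); appending
# .swaplevel(axis=1).sort_index(axis=1) would restore the (ticker, attribute) nesting.
hist.stack(level=0).unstack()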

,Date,Ticker,Close,Dividends,High,Low,Open,Stock Splits,Volume
0,1962-01-02,BA,0.187037,0.0000,0.190310,0.187037,0.190310,0.0,352200.0
1,1962-01-02,CAT,0.134703,0.0000,0.136015,0.133391,0.134703,0.0,163200.0
2,1962-01-02,CVX,0.050512,0.0000,0.050512,0.049714,0.000000,0.0,105600.0
3,1962-01-02,DIS,0.058360,0.0000,0.060318,0.058360,0.058360,0.0,817400.0
4,1962-01-02,DTE,0.484267,0.0000,0.489250,0.484267,0.000000,0.0,1600.0
...,...,...,...,...,...,...,...,...,...
3902790,2020-10-21,ZION,30.059999,0.0000,31.110001,30.040001,30.330000,0.0,1874900.0
3902791,2020-10-21,ZTS,161.179993,0.0000,162.130005,159.500000,159.500000,0.0,1450900.0
3902792,2020-10-22,CL,,0.4400,,,,0.0,
3902793,2020-10-22,PG,,0.7907,,,,0.0,


,Close,Close,Close,Close,Close,Close,Close,Close,Close,Close,...,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume
Date,A,AAL,AAP,AAPL,ABBV,ABC,ABMD,ABT,ACN,ADBE,...,XLNX,XOM,XRAY,XRX,XYL,YUM,ZBH,ZBRA,ZION,ZTS
1962-01-02,,,,,,,,,,,...,,902400.0,,,,,,,,
1962-01-03,,,,,,,,,,,...,,1200000.0,,,,,,,,
1962-01-04,,,,,,,,,,,...,,1088000.0,,,,,,,,
1962-01-05,,,,,,,,,,,...,,1222400.0,,,,,,,,
1962-01-08,,,,,,,,,,,...,,1388800.0,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-10-16,106.699997,12.46,154.990005,119.019997,86.269997,99.519997,286.480011,109.669998,230.050003,502.820007,...,3517900.0,22627200.0,830000.0,2051800.0,1506300.0,2282000.0,770300.0,218800.0,1831200.0,1631000.0
2020-10-19,105.489998,12.56,153.250000,115.980003,84.309998,97.800003,281.470001,108.040001,226.240005,495.200012,...,2689300.0,23645800.0,512300.0,2096700.0,779700.0,1087900.0,1012000.0,347600.0,1515600.0,1271100.0
2020-10-20,105.610001,12.80,153.020004,117.510002,84.250000,97.989998,277.529999,108.360001,230.389999,494.579987,...,3298600.0,23040600.0,679200.0,2140800.0,778800.0,1112400.0,585400.0,211100.0,2618700.0,2145700.0
2020-10-21,104.830002,12.75,154.380005,116.870003,82.889999,97.730003,272.600006,105.930000,233.100006,495.959991,...,3778200.0,28588900.0,906500.0,2131800.0,883700.0,1060600.0,510800.0,233100.0,1874900.0,1450900.0
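
The stack/unstack round trip is easier to follow on a small table. The sketch below uses made-up tickers (`AAA`, `BBB`) and values, shaped like the yfinance output with (ticker, attribute) columns. It assumes the goal is to recover the original column nesting, which takes a `swaplevel` after `unstack`.

```python
import pandas as pd

# Toy wide table: columns are (ticker, attribute), rows are dates.
dates = pd.to_datetime(["2020-01-02", "2020-01-03"])
columns = pd.MultiIndex.from_product([["AAA", "BBB"], ["Close", "Volume"]])
wide = pd.DataFrame(
    [[10.0, 100.0, 20.0, 200.0],
     [11.0, 110.0, 19.0, 210.0]],
    index=dates,
    columns=columns,
)
wide.index.name = "Date"

# Stack the ticker level (level 0) into the row index: one row per (Date, ticker).
stacked = wide.stack(level=0)

# Unstack the ticker level again; the columns come back as (attribute, ticker),
# so swap the two levels and sort to restore the (ticker, attribute) nesting.
restored = stacked.unstack(level=1).swaplevel(0, 1, axis=1).sort_index(axis=1)

print(stacked)
print(restored)
```

On the toy table the columns are already sorted, so `restored` has the same column order as `wide`. On the real download the original ticker order is not alphabetical, so only the nesting, not the exact column order, is recovered this way.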


#### b. Melt and unmelt the table (using pivot), as shown in the lecture.

In [22]:
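# Stack the table into long form: one row per (Date, Ticker)
hist = hist.stack(level=0).reset_index().rename(columns={'level_1': 'Ticker'})

# Melt: one row per (Date, Ticker, Attribute).
# The extra reset_index() adds an 'index' column, which is melted as an attribute as well.
melted = hist.reset_index().melt(id_vars=['Date', 'Ticker'], var_name='Attribute', value_name='Value')
melted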

,Date,Ticker,Attribute,Value
0,1962-01-02,BA,index,0.0
1,1962-01-02,CAT,index,1.0
2,1962-01-02,CVX,index,2.0
3,1962-01-02,DIS,index,3.0
4,1962-01-02,DTE,index,4.0
...,...,...,...,...
31222355,2020-10-21,ZION,Volume,1874900.0
31222356,2020-10-21,ZTS,Volume,1450900.0
31222357,2020-10-22,CL,Volume,
31222358,2020-10-22,PG,Volume,


In [None]:
# Unmelt: pivot the Attribute values back into columns, reversing the melt
pivoted = melted.pivot(index=['Date', 'Ticker'], columns='Attribute', values='Value').reset_index()
pivoted
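
A small melt/unmelt round trip, as a sketch on made-up data. It assumes a pandas version in which `DataFrame.pivot` accepts a list for `index`, and it melts the long table directly so that no extra `index` attribute is created.

```python
import pandas as pd

# Toy long table: one row per (Date, Ticker).
long = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"]),
    "Ticker": ["AAA", "BBB", "AAA", "BBB"],
    "Close": [10.0, 20.0, 11.0, 19.0],
    "Volume": [100.0, 200.0, 110.0, 210.0],
})

# Melt: every non-id column becomes an (Attribute, Value) pair.
melted = long.melt(id_vars=["Date", "Ticker"], var_name="Attribute", value_name="Value")

# Unmelt: pivot the attributes back into columns and flatten the index.
unmelted = melted.pivot(index=["Date", "Ticker"], columns="Attribute", values="Value").reset_index()
unmelted.columns.name = None  # drop the leftover 'Attribute' label on the columns

print(melted)
print(unmelted)
```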

#### c. Demonstrate usage of pivot_table.

In [23]:
# Mean Open and Close price per ticker
hist.pivot_table(index='Ticker', values=['Open', 'Close'], aggfunc='mean')

Ticker,Close,Open
A,33.393732,33.377945
AAL,25.418013,25.443896
AAP,78.317532,78.326006
AAPL,8.851328,8.847963
ABBV,58.983881,58.977947
...,...,...
YUM,32.619909,32.603828
ZBH,77.200981,77.174655
ZBRA,49.554521,49.531407
ZION,22.482914,22.482102
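
`pivot_table` can also spread a second key across the columns. The sketch below, on made-up data, computes the mean close per ticker and year; the `Year` column is an assumption added for the example.

```python
import pandas as pd

# Toy long table with a precomputed Year column.
long = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Year": [2019, 2019, 2020, 2019, 2020, 2020],
    "Close": [10.0, 12.0, 14.0, 20.0, 22.0, 24.0],
})

# Rows: tickers; columns: years; cells: mean Close of the matching rows.
mean_close = long.pivot_table(index="Ticker", columns="Year", values="Close", aggfunc="mean")
print(mean_close)
```

Unlike `pivot`, `pivot_table` aggregates duplicate (index, column) combinations, which is why an `aggfunc` is needed.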


#### d. Create a derived table (a.k.a. a ‘summary table’) containing the total volume (in millions) per year for each ticker.

In [24]:
# Total volume per ticker and calendar year, converted to millions
annualVolume = hist.groupby([hist['Ticker'], hist['Date'].dt.year]).agg({'Volume': 'sum'}) * 1e-6
annualVolume.head()

Ticker,Date,Volume
A,1999,172.1985
A,2000,1068.0283
A,2001,821.1521
A,2002,932.4779
A,2003,817.6457
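
The same aggregation on a toy table, as a sketch that also names the year key and the unit explicitly. The `Year` and `Volume (millions)` labels are choices made for this example, not names used in the notebook.

```python
import pandas as pd

# Toy long table spanning a year boundary.
long = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2019-12-30", "2019-12-31", "2020-01-02", "2019-12-31", "2020-01-02"]
    ),
    "Ticker": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Volume": [1_000_000, 2_000_000, 4_000_000, 500_000, 1_500_000],
})

# Group by ticker and calendar year, sum the volume, and express it in millions.
annual = (
    long.groupby(["Ticker", long["Date"].dt.year.rename("Year")])["Volume"]
    .sum()
    .div(1e6)
    .rename("Volume (millions)")
    .reset_index()
)
print(annual)
```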


#### e. Create a derived table containing the total volume (in millions) of each ticker.

In [25]:
# Total volume per ticker over the whole history, converted to millions
totalVolume = hist.groupby(by=['Ticker']).agg({'Volume': 'sum'}) * 1e-6
totalVolume.head()

Ticker,Volume
A,19151.39
AAL,40053.14
AAP,5762.488
AAPL,3422474.0
ABBV,15777.59


#### f. Add a column to the original table containing the daily return per ticker (using the above formula).

In [26]:
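# Return relative to the previous row: (Close - previous Close) / previous Close.
# shift(1) is not grouped by Ticker here, so the previous row usually belongs to another ticker.
hist['Daily Return'] = (hist['Close'] - hist['Close'].shift(1)) / hist['Close'].shift(1)
hist.head()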

,Date,Ticker,Close,Dividends,High,Low,Open,Stock Splits,Volume,Daily Return
0,1962-01-02,BA,0.187037,0.0,0.19031,0.187037,0.19031,0.0,352200.0,
1,1962-01-02,CAT,0.134703,0.0,0.136015,0.133391,0.134703,0.0,163200.0,-0.279808
2,1962-01-02,CVX,0.050512,0.0,0.050512,0.049714,0.0,0.0,105600.0,-0.625011
3,1962-01-02,DIS,0.05836,0.0,0.060318,0.05836,0.05836,0.0,817400.0,0.155362
4,1962-01-02,DTE,0.484267,0.0,0.48925,0.484267,0.0,0.0,1600.0,7.297973
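
In a long table whose rows interleave tickers, a per-ticker daily return needs the shift to happen inside each `Ticker` group. A minimal sketch on made-up data, using the same formula, (Close - previous Close) / previous Close:

```python
import pandas as pd

# Toy long table ordered by date, with tickers interleaved.
long = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"]),
    "Ticker": ["AAA", "BBB", "AAA", "BBB"],
    "Close": [10.0, 20.0, 11.0, 19.0],
})

# Previous close of the same ticker (NaN on each ticker's first day).
long = long.sort_values(["Ticker", "Date"])
prev_close = long.groupby("Ticker")["Close"].shift(1)

long["Daily Return"] = (long["Close"] - prev_close) / prev_close
print(long)
```

Each ticker's first row has no previous close, so its return is `NaN` rather than a comparison with a different ticker.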


#### g. Add a column to the original table containing the 1-week (5-day) moving average daily return, per ticker.

In [27]:
# 5-row rolling mean of 'Daily Return'; the window is not grouped by Ticker here
hist['1-week MA Daily Return'] = hist['Daily Return'].rolling(5).mean()
hist.tail()

,Date,Ticker,Close,Dividends,High,Low,Open,Stock Splits,Volume,Daily Return,1-week MA Daily Return
3902790,2020-10-21,ZION,30.059999,0.0,31.110001,30.040001,30.33,0.0,1874900.0,-0.897005,0.86585
3902791,2020-10-21,ZTS,161.179993,0.0,162.130005,159.5,159.5,0.0,1450900.0,4.361943,1.00917
3902792,2020-10-22,CL,,0.44,,,,0.0,,,
3902793,2020-10-22,PG,,0.7907,,,,0.0,,,
3902794,2020-10-22,PNR,,0.19,,,,0.0,,,
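
For the moving average to cover five trading days of one ticker, the rolling window has to be applied within each `Ticker` group. A sketch on made-up returns, assuming the rows of each ticker are already in date order:

```python
import pandas as pd

# Toy daily returns: six days for each of two tickers.
long = pd.DataFrame({
    "Ticker": ["AAA"] * 6 + ["BBB"] * 6,
    "Daily Return": [0.01, 0.02, 0.03, 0.04, 0.05, 0.06,
                     0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
})

# 5-day rolling mean computed separately for every ticker; the first four
# rows of each ticker are NaN because the window is not yet full.
long["1-week MA Daily Return"] = (
    long.groupby("Ticker")["Daily Return"]
    .transform(lambda s: s.rolling(5).mean())
)
print(long)
```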


#### h. Add a column to the original table containing the rolling cumulative sum of volume per ticker.

In [28]:
# Running total of Volume over all rows; cumsum() is not grouped by Ticker here
hist['Rolling Cumulative Volume'] = hist['Volume'].cumsum()
hist

# Store the table so it can be reused in other notebooks
dataDaily = hist
%store dataDaily
del dataDaily

,Date,Ticker,Close,Dividends,High,Low,Open,Stock Splits,Volume,Daily Return,1-week MA Daily Return,Rolling Cumulative Volume
0,1962-01-02,BA,0.187037,0.0000,0.190310,0.187037,0.190310,0.0,352200.0,,,3.522000e+05
1,1962-01-02,CAT,0.134703,0.0000,0.136015,0.133391,0.134703,0.0,163200.0,-0.279808,,5.154000e+05
2,1962-01-02,CVX,0.050512,0.0000,0.050512,0.049714,0.000000,0.0,105600.0,-0.625011,,6.210000e+05
3,1962-01-02,DIS,0.058360,0.0000,0.060318,0.058360,0.058360,0.0,817400.0,0.155362,,1.438400e+06
4,1962-01-02,DTE,0.484267,0.0000,0.489250,0.484267,0.000000,0.0,1600.0,7.297973,,1.440000e+06
...,...,...,...,...,...,...,...,...,...,...,...,...
3902790,2020-10-21,ZION,30.059999,0.0000,31.110001,30.040001,30.330000,0.0,1874900.0,-0.897005,0.86585,1.804966e+13
3902791,2020-10-21,ZTS,161.179993,0.0000,162.130005,159.500000,159.500000,0.0,1450900.0,4.361943,1.00917,1.804966e+13
3902792,2020-10-22,CL,,0.4400,,,,0.0,,,,
3902793,2020-10-22,PG,,0.7907,,,,0.0,,,,


Stored 'dataDaily' (DataFrame)
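
A cumulative volume per ticker restarts the running total for every ticker, which `groupby(...).cumsum()` does while keeping the original row order. A minimal sketch on made-up data; the column name `Cumulative Volume` is chosen for the example:

```python
import pandas as pd

# Toy long table with tickers interleaved by date.
long = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03", "2020-01-06"]
    ),
    "Ticker": ["AAA", "BBB", "AAA", "BBB", "AAA"],
    "Volume": [100, 200, 110, 210, 120],
})

# Running total of Volume within each ticker, aligned to the original rows.
long["Cumulative Volume"] = long.groupby("Ticker")["Volume"].cumsum()
print(long)
```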
