# Tabular Data

## DataFrames

### Exercise 6
Load two years of stock data for every ticker in the current S&P 500. Perform the following:  

a. ‘Stack’ the table to get it into row form, and demonstrate ‘unstack’ to get it back into the original form.  
b. Melt and unmelt the table (using pivot), as shown in the lecture.  
c. Demonstrate usage of pivot_table.  
d. Create a derived table (a.k.a a ‘summary table’), containing the total volume (in millions) per year, of each ticker.  
e. Create a derived table containing the total volume (in millions) of each ticker.  
f. Add a column to the original table containing the daily return per ticker (using the above formula).  
g. Add a column to the original table containing the 1-week (5-day) moving average daily return, per ticker.  
h. Add a column to the original table containing the rolling cumulative sum of volume per ticker.  

In [1]:
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

import numpy as np

In [2]:
import pandas as pd
import yfinance as yf

# Get the table of all stocks in the S&P 500 from Wikipedia
payload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
first_table = payload[0]
second_table = payload[1]
gspcSummary = first_table
gspcSummary  # Display the wiki table of S&P500

# To repurpose this table in other notebooks
dataReference = gspcSummary
%store dataReference
del dataReference

gspcList = list(gspcSummary.Symbol)  # Put all the symbol into a list
gspcList

# Remove invalid symbol with no data from yf
gspcList.remove('BRK.B')  
gspcList.remove('BF.B')

# Download data into a multiindex table
tickers = yf.Tickers(gspcList)
hist = tickers.history(group_by='ticker', period='max')
hist.head()

Unnamed: 0,Symbol,Security,SEC filings,GICS Sector,GICS Sub-Industry,Headquarters Location,Date first added,CIK,Founded
0,MMM,3M Company,reports,Industrials,Industrial Conglomerates,"St. Paul, Minnesota",1976-08-09,66740,1902
1,ABT,Abbott Laboratories,reports,Health Care,Health Care Equipment,"North Chicago, Illinois",1964-03-31,1800,1888
2,ABBV,AbbVie Inc.,reports,Health Care,Pharmaceuticals,"North Chicago, Illinois",2012-12-31,1551152,2013 (1888)
3,ABMD,ABIOMED Inc,reports,Health Care,Health Care Equipment,"Danvers, Massachusetts",2018-05-31,815094,1981
4,ACN,Accenture plc,reports,Information Technology,IT Consulting & Other Services,"Dublin, Ireland",2011-07-06,1467373,1989
...,...,...,...,...,...,...,...,...,...
500,YUM,Yum! Brands Inc,reports,Consumer Discretionary,Restaurants,"Louisville, Kentucky",1997-10-06,1041061,1997
501,ZBRA,Zebra Technologies,reports,Information Technology,Electronic Equipment & Instruments,"Lincolnshire, Illinois",2019-12-23,877212,1969
502,ZBH,Zimmer Biomet Holdings,reports,Health Care,Health Care Equipment,"Warsaw, Indiana",2001-08-07,1136869,1927
503,ZION,Zions Bancorp,reports,Financials,Regional Banks,"Salt Lake City, Utah",2001-06-22,109380,1873


Stored 'dataReference' (DataFrame)


['MMM',
 'ABT',
 'ABBV',
 'ABMD',
 'ACN',
 'ATVI',
 'ADBE',
 'AMD',
 'AAP',
 'AES',
 'AFL',
 'A',
 'APD',
 'AKAM',
 'ALK',
 'ALB',
 'ARE',
 'ALXN',
 'ALGN',
 'ALLE',
 'LNT',
 'ALL',
 'GOOGL',
 'GOOG',
 'MO',
 'AMZN',
 'AMCR',
 'AEE',
 'AAL',
 'AEP',
 'AXP',
 'AIG',
 'AMT',
 'AWK',
 'AMP',
 'ABC',
 'AME',
 'AMGN',
 'APH',
 'ADI',
 'ANSS',
 'ANTM',
 'AON',
 'AOS',
 'APA',
 'AIV',
 'AAPL',
 'AMAT',
 'APTV',
 'ADM',
 'ANET',
 'AJG',
 'AIZ',
 'T',
 'ATO',
 'ADSK',
 'ADP',
 'AZO',
 'AVB',
 'AVY',
 'BKR',
 'BLL',
 'BAC',
 'BK',
 'BAX',
 'BDX',
 'BRK.B',
 'BBY',
 'BIO',
 'BIIB',
 'BLK',
 'BA',
 'BKNG',
 'BWA',
 'BXP',
 'BSX',
 'BMY',
 'AVGO',
 'BR',
 'BF.B',
 'CHRW',
 'COG',
 'CDNS',
 'CPB',
 'COF',
 'CAH',
 'KMX',
 'CCL',
 'CARR',
 'CTLT',
 'CAT',
 'CBOE',
 'CBRE',
 'CDW',
 'CE',
 'CNC',
 'CNP',
 'CERN',
 'CF',
 'SCHW',
 'CHTR',
 'CVX',
 'CMG',
 'CB',
 'CHD',
 'CI',
 'CINF',
 'CTAS',
 'CSCO',
 'C',
 'CFG',
 'CTXS',
 'CLX',
 'CME',
 'CMS',
 'KO',
 'CTSH',
 'CL',
 'CMCSA',
 'CMA',
 'CAG',
 'CXO

[*********************100%***********************]  503 of 503 completed


Unnamed: 0_level_0,CHTR,CHTR,CHTR,CHTR,CHTR,CHTR,CHTR,XLNX,XLNX,XLNX,...,STT,STT,STT,ITW,ITW,ITW,ITW,ITW,ITW,ITW
Unnamed: 0_level_1,Open,High,Low,Close,Volume,Dividends,Stock Splits,Open,High,Low,...,Volume,Dividends,Stock Splits,Open,High,Low,Close,Volume,Dividends,Stock Splits
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
1962-01-02,,,,,,,,,,,...,,,,,,,,,,
1962-01-03,,,,,,,,,,,...,,,,,,,,,,
1962-01-04,,,,,,,,,,,...,,,,,,,,,,
1962-01-05,,,,,,,,,,,...,,,,,,,,,,
1962-01-08,,,,,,,,,,,...,,,,,,,,,,


#### a. ‘Stack’ the table to get it into row form, and demonstrate ‘unstack’ to get it back into the original form.

In [3]:
# Stack the table
hist = hist.stack(level=0).reset_index().rename(columns = {'level_1':'Ticker'})

# To repurpose this table in other notebooks
dataDaily = hist
%store dataDaily
del dataDaily

# Unstack the table
hist.stack(level=0).unstack()

Stored 'dataDaily' (DataFrame)


Unnamed: 0,Date,Ticker,Close,Dividends,High,Low,Open,Stock Splits,Volume
0,1962-01-02,BA,0.187037,0,0.19031,0.187037,0.19031,0,352200
1,1962-01-02,CAT,0.134703,0,0.136015,0.133391,0.134703,0,163200
2,1962-01-02,CVX,0.050512,0,0.050512,0.0497139,0,0,105600
3,1962-01-02,DIS,0.0583597,0,0.0603181,0.0583597,0.0583597,0,817400
4,1962-01-02,DTE,0.484267,0,0.48925,0.484267,0,0,1600
...,...,...,...,...,...,...,...,...,...
3902285,2020-10-20,ZBRA,294.83,0,300,292.65,293.24,0,211100
3902286,2020-10-20,ZION,30.52,0,31.58,30.01,31.58,0,2.6187e+06
3902287,2020-10-20,ZTS,159.69,0.2,161.54,159.12,160.28,0,2.1457e+06
3902288,2020-10-21,APA,,0.025,,,,0,


#### b. Melt and unmelt the table (using pivot), as shown in the lecture.

In [12]:
# Stack the table
#hist = hist.stack(level=0).reset_index().rename(columns = {'level_1':'Ticker'})
#hist

# Melt
melted = hist.reset_index().melt(id_vars=['Date', 'Ticker'], var_name='Attribute', value_name='Value')
melted

Unnamed: 0,Date,Ticker,Attribute,Value
0,1962-01-02,BA,index,0.000000e+00
1,1962-01-02,CAT,index,1.000000e+00
2,1962-01-02,CVX,index,2.000000e+00
3,1962-01-02,DIS,index,3.000000e+00
4,1962-01-02,DTE,index,4.000000e+00
...,...,...,...,...
42925185,2020-10-20,ZBRA,Rolling Cumulative Volume,1.804766e+13
42925186,2020-10-20,ZION,Rolling Cumulative Volume,1.804766e+13
42925187,2020-10-20,ZTS,Rolling Cumulative Volume,1.804767e+13
42925188,2020-10-21,APA,Rolling Cumulative Volume,


In [5]:
#pivoted = melted.pivot(index=['Date','Ticker'], columns='Attribute', values='Value')  # Reverse melt state to pivot state
#pivoted = pivoted['Value'].reset_index()
#pivoted

#### c. Demonstrate usage of pivot_table.

In [13]:
hist.pivot_table(index = ['Ticker'], columns=[], values=['Open', 'Close'], aggfunc=np.mean)

Unnamed: 0_level_0,Close,Open
Ticker,Unnamed: 1_level_1,Unnamed: 2_level_1
A,33.380161,33.364164
AAL,25.421353,25.447258
AAP,78.301539,78.309676
AAPL,8.840578,8.837234
ABBV,58.971715,58.965427
...,...,...
YUM,32.608476,32.592698
ZBH,77.187632,77.161423
ZBRA,49.521563,49.497979
ZION,22.482174,22.481335


#### d. Create a derived table (a.k.a a ‘summary table’), containing the total volume (in millions) per year, of each ticker.

In [14]:
annualVolume = hist.groupby([hist['Ticker'], hist['Date'].dt.year]).agg({'Volume':np.sum}).apply(lambda x:x*1e-6)
annualVolume.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Volume
Ticker,Date,Unnamed: 2_level_1
A,1999,172.1985
A,2000,1068.0283
A,2001,821.1521
A,2002,932.4779
A,2003,817.6457


#### e. Create a derived table containing the total volume (in millions) of each ticker.

In [15]:
totalVolume = hist.groupby(by=['Ticker']).agg({'Volume':np.sum}).apply(lambda x:x*1e-6)
totalVolume.head()

Unnamed: 0_level_0,Volume
Ticker,Unnamed: 1_level_1
A,19150.48
AAL,40015.56
AAP,5761.603
AAPL,3422472.0
ABBV,15767.82


#### f. Add a column to the original table containing the daily return per ticker (using the above formula).

In [16]:
hist['Daily Return'] = (hist['Close'] - hist['Close'].shift(1)) / hist['Close'].shift(1)
hist.head()

Unnamed: 0,Date,Ticker,Close,Dividends,High,Low,Open,Stock Splits,Volume,Daily Return,1-week MA Daily Return,Rolling Cumulative Volume
0,1962-01-02,BA,0.187037,0.0,0.19031,0.187037,0.19031,0.0,352200.0,,,352200.0
1,1962-01-02,CAT,0.134703,0.0,0.136015,0.133391,0.134703,0.0,163200.0,-0.279808,,515400.0
2,1962-01-02,CVX,0.050512,0.0,0.050512,0.049714,0.0,0.0,105600.0,-0.625011,,621000.0
3,1962-01-02,DIS,0.05836,0.0,0.060318,0.05836,0.05836,0.0,817400.0,0.155362,,1438400.0
4,1962-01-02,DTE,0.484267,0.0,0.48925,0.484267,0.0,0.0,1600.0,7.297973,,1440000.0


#### g. Add a column to the original table containing the 1-week (5-day) moving average daily return, per ticker.

In [17]:
hist['1-week MA Daily Return'] = hist['Daily Return'].rolling(5).mean()
hist.tail()

Unnamed: 0,Date,Ticker,Close,Dividends,High,Low,Open,Stock Splits,Volume,Daily Return,1-week MA Daily Return,Rolling Cumulative Volume
3902285,2020-10-20,ZBRA,294.829987,0.0,300.0,292.649994,293.23999,0.0,211100.0,1.077731,0.940676,18047660000000.0
3902286,2020-10-20,ZION,30.52,0.0,31.58,30.01,31.58,0.0,2618700.0,-0.896483,0.873659,18047660000000.0
3902287,2020-10-20,ZTS,159.690002,0.2,161.539993,159.119995,160.279999,0.0,2145700.0,4.232307,0.987126,18047670000000.0
3902288,2020-10-21,APA,,0.025,,,,0.0,,,,
3902289,2020-10-21,CVS,,0.5,,,,0.0,,,,


#### h. Add a column to the original table containing the rolling cumulative sum of volume per ticker.

In [18]:
hist['Rolling Cumulative Volume'] = hist['Volume'].cumsum()
hist

Unnamed: 0,Date,Ticker,Close,Dividends,High,Low,Open,Stock Splits,Volume,Daily Return,1-week MA Daily Return,Rolling Cumulative Volume
0,1962-01-02,BA,0.187037,0.000,0.190310,0.187037,0.190310,0.0,352200.0,,,3.522000e+05
1,1962-01-02,CAT,0.134703,0.000,0.136015,0.133391,0.134703,0.0,163200.0,-0.279808,,5.154000e+05
2,1962-01-02,CVX,0.050512,0.000,0.050512,0.049714,0.000000,0.0,105600.0,-0.625011,,6.210000e+05
3,1962-01-02,DIS,0.058360,0.000,0.060318,0.058360,0.058360,0.0,817400.0,0.155362,,1.438400e+06
4,1962-01-02,DTE,0.484267,0.000,0.489250,0.484267,0.000000,0.0,1600.0,7.297973,,1.440000e+06
...,...,...,...,...,...,...,...,...,...,...,...,...
3902285,2020-10-20,ZBRA,294.829987,0.000,300.000000,292.649994,293.239990,0.0,211100.0,1.077731,0.940676,1.804766e+13
3902286,2020-10-20,ZION,30.520000,0.000,31.580000,30.010000,31.580000,0.0,2618700.0,-0.896483,0.873659,1.804766e+13
3902287,2020-10-20,ZTS,159.690002,0.200,161.539993,159.119995,160.279999,0.0,2145700.0,4.232307,0.987126,1.804767e+13
3902288,2020-10-21,APA,,0.025,,,,0.0,,,,
