### Working with Financial Data

Publicly traded companies are classified into one of 11 industry sectors defined by GICS, the Global Industry Classification Standard. Investors use these sectors to design and construct portfolios, compare companies, evaluate industries, and separate regional trends from global ones.

The 11 GICS sectors are:

* Communication services
* Consumer discretionary
* Consumer staples
* Energy
* Financials
* Health care
* Industrials
* Information technology
* Materials
* Real estate
* Utilities


In this notebook we'll work with financial data, grouped by sector, to analyze trends in the companies and sectors that make up the S&P 500.


Begin by loading the GICS sector mappings, which are provided in the file `gics.csv`.


In [1]:
import pandas as pd
gicsDF = pd.read_csv('../08-Combining-Merging/gics.csv')
# Lowercase copy of the sector name, used as a case-insensitive join key
gicsDF['name_lower'] = gicsDF['Name'].str.lower()
print(gicsDF)

    Code                    Name              name_lower
0     50  Communication services  communication services
1     25  Consumer discretionary  consumer discretionary
2     30        Consumer staples        consumer staples
3     10                  Energy                  energy
4     40              Financials              financials
5     35             Health care             health care
6     20             Industrials             industrials
7     45  Information technology  information technology
8     15               Materials               materials
9     60             Real estate             real estate
10    55               Utilities               utilities
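Because the GICS table has only 11 rows, a plain dictionary lookup with `Series.map` is another way to attach sector codes. The sketch below rebuilds the table from the codes shown above (instead of reading `gics.csv`) and maps a few sector names spelled as they appear in the S&P 500 file; a name with no match would come back as `NaN`.

```python
import pandas as pd

# The GICS codes and names listed above, rebuilt as a small table
gics = pd.DataFrame({
    "Code": [50, 25, 30, 10, 40, 35, 20, 45, 15, 60, 55],
    "Name": ["Communication services", "Consumer discretionary", "Consumer staples",
             "Energy", "Financials", "Health care", "Industrials",
             "Information technology", "Materials", "Real estate", "Utilities"],
})

# Lowercased sector name -> code, so "Health Care" and "Health care" both match
code_by_sector = dict(zip(gics["Name"].str.lower(), gics["Code"]))

# Sector names as spelled in the S&P 500 file
sectors = pd.Series(["Industrials", "Health Care", "Information Technology"])
codes = sectors.str.lower().map(code_by_sector)
print(codes.tolist())
```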


Load the S&P 500 sector mappings from the file `sp500-sector.csv` and take a look at the data.

In [2]:
snpDF = pd.read_csv('../08-Combining-Merging/sp500-sector.csv')
# Lowercase sector name, matching the join key built from gics.csv
snpDF['sector_lower'] = snpDF['Sector'].str.lower()
print(snpDF.head())

  Symbol                 Name                  Sector            sector_lower
0    MMM           3M Company             Industrials             industrials
1    AOS      A.O. Smith Corp             Industrials             industrials
2    ABT  Abbott Laboratories             Health Care             health care
3   ABBV          AbbVie Inc.             Health Care             health care
4    ACN        Accenture plc  Information Technology  information technology


Create a dataframe that pairs each S&P 500 stock symbol with its GICS sector code. For example, 3M (symbol `MMM`) is in the Industrials sector, which corresponds to code 20.

In [3]:
combDF = pd.merge(snpDF, gicsDF, left_on='sector_lower', right_on='name_lower', how='left')
# Both inputs have a 'Name' column, so merge suffixes them: Name_x (company) and Name_y (sector)
combDF = combDF.rename(columns={'Name_x': 'Name'})
# Drop the GICS sector name and the two lowercase helper keys
combDF = combDF.drop(columns=['Name_y', 'name_lower', 'sector_lower'])
print(combDF.head())

  Symbol                 Name                  Sector  Code
0    MMM           3M Company             Industrials  20.0
1    AOS      A.O. Smith Corp             Industrials  20.0
2    ABT  Abbott Laboratories             Health Care  35.0
3   ABBV          AbbVie Inc.             Health Care  35.0
4    ACN        Accenture plc  Information Technology  45.0
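The `Code` column prints as `20.0` rather than `20`. In pandas, an integer column that receives a missing value during a left merge is converted to float, so this suggests that at least one sector name in `sp500-sector.csv` found no match in `gics.csv`. A minimal sketch of how to check, on made-up rows (the symbol `XYZ` and its sector are hypothetical): `indicator=True` flags the unmatched rows, and the nullable `Int64` dtype keeps the codes as integers.

```python
import pandas as pd

# Hypothetical miniature inputs; XYZ is a made-up symbol whose sector has no GICS match here
companies = pd.DataFrame({
    "Symbol": ["MMM", "ABT", "XYZ"],
    "Sector": ["Industrials", "Health Care", "Telecommunication Services"],
})
gics = pd.DataFrame({"Code": [20, 35], "Name": ["Industrials", "Health care"]})

# Normalize the join keys so "Health Care" matches "Health care"
companies["sector_key"] = companies["Sector"].str.lower()
gics["sector_key"] = gics["Name"].str.lower()

merged = pd.merge(
    companies,
    gics[["sector_key", "Code"]],
    on="sector_key",
    how="left",
    indicator=True,  # adds a _merge column: "both" or "left_only"
)

# Rows whose sector found no GICS code
unmatched = merged.loc[merged["_merge"] == "left_only", ["Symbol", "Sector"]]
print(unmatched)

# Nullable integers: matched codes stay integers, missing ones become <NA>
merged["Code"] = merged["Code"].astype("Int64")
merged = merged.drop(columns=["sector_key", "_merge"])
print(merged)
```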


Working with strings in pandas can be much slower than working with integers and floats, so securities are typically represented by a security id, a numerical identifier for each security. The ids are provided in the file `sp500-ids.csv`. Load them into a dataframe, then generate a new dataframe that contains the columns `Security_id` and `Sector`.

In [4]:
idsDF = pd.read_csv('../08-Combining-Merging/sp500-ids.csv')
print(idsDF.head())
print('------')
secDF = pd.merge(idsDF, combDF[['Symbol', 'Sector']], on='Symbol')
print(secDF.head())

  Symbol  Security_id
0    MMM            1
1    AOS            2
2    ABT            3
3   ABBV            4
4    ACN            5
------
  Symbol  Security_id                  Sector
0    MMM            1             Industrials
1    AOS            2             Industrials
2    ABT            3             Health Care
3   ABBV            4             Health Care
4    ACN            5  Information Technology
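Memory footprint is only a rough proxy for speed, but it shows why repeated strings are costly. The sketch below uses a hypothetical column of 100,000 repeated tickers and compares storing them as Python strings (`object` dtype), as pandas `category` codes, and as integer ids (the ids for these five tickers are taken from the output above).

```python
import pandas as pd

# Hypothetical long column: a few tickers repeated, as in a multi-year price history
symbols = pd.Series(["MMM", "AOS", "ABT", "ABBV", "ACN"] * 20_000)

# object dtype: one pointer per row plus a Python string object per row
as_object = symbols.memory_usage(deep=True)
# category dtype: small integer codes per row plus the 5 distinct strings once
as_category = symbols.astype("category").memory_usage(deep=True)
# integer ids, as in sp500-ids.csv: 8 bytes per row
ids = {"MMM": 1, "AOS": 2, "ABT": 3, "ABBV": 4, "ACN": 5}
as_int = symbols.map(ids).memory_usage(deep=True)

print(f"object:   {as_object:,} bytes")
print(f"category: {as_category:,} bytes")
print(f"int ids:  {as_int:,} bytes")
```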


Load the file `sp500-5yr.csv` into a dataframe. It contains five years' worth of historical prices for securities in the S&P 500. Convert the symbols into security ids, and bring in the sector column as well.

In [5]:
# The dates are ISO formatted (YYYY-MM-DD), so parse_dates converts them directly;
# pd.datetime has been removed from recent pandas, and date_parser is deprecated.
yrDF = pd.read_csv('../08-Combining-Merging/sp500-5yr.csv', parse_dates=['date'])
print(yrDF.head())
print('-------')
secYrDF = pd.merge(yrDF, secDF, left_on='symbol', right_on='Symbol')
secYrDF.drop('Symbol', axis=1, inplace=True)
print(secYrDF.head())
print(secYrDF.tail())

        date   open   high    low  close    volume symbol
0 2013-02-08  15.07  15.12  14.63  14.75   8407500    AAL
1 2013-02-11  14.89  15.01  14.26  14.46   8882000    AAL
2 2013-02-12  14.45  14.51  14.10  14.27   8126000    AAL
3 2013-02-13  14.30  14.94  14.25  14.66  10259500    AAL
4 2013-02-14  14.94  14.96  13.16  13.99  31879900    AAL
-------
        date   open   high    low  close    volume symbol  Security_id  \
0 2013-02-08  15.07  15.12  14.63  14.75   8407500    AAL           33   
1 2013-02-11  14.89  15.01  14.26  14.46   8882000    AAL           33   
2 2013-02-12  14.45  14.51  14.10  14.27   8126000    AAL           33   
3 2013-02-13  14.30  14.94  14.25  14.66  10259500    AAL           33   
4 2013-02-14  14.94  14.96  13.16  13.99  31879900    AAL           33   

        Sector  
0  Industrials  
1  Industrials  
2  Industrials  
3  Industrials  
4  Industrials  
             date   open   high    low  close   volume symbol  Security_id  \
610222 2018-02-01  
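ISO-formatted dates such as `2013-02-08` can be parsed by `read_csv` itself through `parse_dates`, with no custom parser. A minimal sketch on a few rows copied from the output above, which also finds the start and end dates and a calendar-aware one-year cutoff using `pd.DateOffset`:

```python
import io

import pandas as pd

# A few rows in the same layout as sp500-5yr.csv (values copied from the output above)
csv_text = """date,open,high,low,close,volume,symbol
2013-02-08,15.07,15.12,14.63,14.75,8407500,AAL
2013-02-11,14.89,15.01,14.26,14.46,8882000,AAL
2013-02-12,14.45,14.51,14.10,14.27,8126000,AAL
"""

# parse_dates recognizes the ISO format without a custom parser
prices = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(prices["date"].dtype)

# Start and end of the data
start, end = prices["date"].min(), prices["date"].max()
print(start.date(), end.date())

# One calendar year before the newest date (handles leap years, unlike 365 days)
one_year_start = end - pd.DateOffset(years=1)
print(one_year_start.date())
```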

Determine which sectors had the best performance over the past year, the past three years, and the past five years. Note the start and end dates of the dataset.

In [6]:
# Volume-weighted sum of intraday moves: close - open is positive on days a stock rises.
# Overnight gaps (previous close to next open) are not counted.
secYrDF['perf'] = (secYrDF['close'] - secYrDF['open']) * secYrDF['volume']

import datetime
newest_date = secYrDF['date'].max()
yearago = newest_date - datetime.timedelta(days=365)
threeago = newest_date - datetime.timedelta(days=365*3)
fiveago = newest_date - datetime.timedelta(days=365*5)

oneDF = secYrDF.loc[secYrDF['date'] >= yearago]
threeDF = secYrDF.loc[secYrDF['date'] >= threeago]
fiveDF = secYrDF.loc[secYrDF['date'] >= fiveago]

# Total perf per sector within each window
oneSec = oneDF.groupby('Sector')[['perf']].sum()
threeSec = threeDF.groupby('Sector')[['perf']].sum()
fiveSec = fiveDF.groupby('Sector')[['perf']].sum()

# Each window is ranked by its own maximum
print("1yr:", oneSec.loc[oneSec.idxmax()])
print("3yr:", threeSec.loc[threeSec.idxmax()])
print("5yr:", fiveSec.loc[fiveSec.idxmax()])


For each sector, determine the top performer and the bottom performer for the 1-year, 3-year, and 5-year windows.

In [7]:
# Total perf per security over the past year
oneSymbol = oneDF.groupby('symbol')[['perf']].sum()
print("Best:", oneSymbol.loc[oneSymbol.idxmax()])
print("Worst:", oneSymbol.loc[oneSymbol.idxmin()])

Best:                 perf
symbol              
NFLX    8.140149e+08
Worst:                 perf
symbol              
GE     -1.328622e+09

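A volume-weighted sum of daily moves grows with trading volume and ignores overnight price changes, so it tends to favor heavily traded stocks rather than the ones that gained the most. One common alternative is each security's price return from its first to its last close in the window. The sketch below uses hypothetical closes for four made-up tickers (`AAA` to `DDD`), assumes equal-weighted sector averages, and ignores dividends; it reports the best sector and the top and bottom security within each sector for the 1-, 3-, and 5-year windows.

```python
import pandas as pd

# Hypothetical closing prices for four made-up tickers in two sectors (not real data)
dates = pd.to_datetime(["2015-01-02", "2017-01-03", "2018-01-02", "2018-02-01"])
closes = {
    ("AAA", "Energy"): [10.0, 12.0, 11.0, 13.2],
    ("BBB", "Energy"): [20.0, 18.0, 19.0, 15.2],
    ("CCC", "Utilities"): [30.0, 33.0, 36.0, 39.6],
    ("DDD", "Utilities"): [40.0, 40.0, 44.0, 44.0],
}
prices = pd.DataFrame(
    [
        {"date": d, "symbol": sym, "Sector": sector, "close": c}
        for (sym, sector), values in closes.items()
        for d, c in zip(dates, values)
    ]
)


def window_returns(df, years):
    """Return of each security from its first to its last close within the window."""
    start = df["date"].max() - pd.DateOffset(years=years)
    window = df[df["date"] >= start].sort_values("date")
    grouped = window.groupby(["Sector", "symbol"])["close"]
    return grouped.last() / grouped.first() - 1


results = {}
for years in (1, 3, 5):
    returns = window_returns(prices, years)  # Series indexed by (Sector, symbol)
    by_sector = returns.groupby(level="Sector")
    sector_returns = by_sector.mean()  # equal-weighted sector return
    best = by_sector.idxmax()  # (Sector, symbol) label of each sector's top security
    worst = by_sector.idxmin()  # (Sector, symbol) label of each sector's bottom security
    results[years] = (sector_returns, best, worst)

    print(f"{years}yr best sector: {sector_returns.idxmax()}")
    for sector in sector_returns.index:
        print(f"  {sector}: top {best[sector][1]}, bottom {worst[sector][1]}")
```

On the real data, `prices` would correspond to `secYrDF`, which already has the `date`, `symbol`, `Sector`, and `close` columns.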

Imagine two portfolios: one made up of the securities with even ids and one made up of those with odd ids. Which portfolio performed better over the 1-, 3-, and 5-year windows?

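Given each security's return for a window (for example, first close to last close), splitting by id parity is a `groupby` on `Security_id % 2`. A minimal sketch with hypothetical returns for ids 1 to 6, assuming each portfolio is equal-weighted:

```python
import pandas as pd

# Hypothetical window returns keyed by security id (not real data)
returns = pd.Series(
    {1: 0.10, 2: -0.05, 3: 0.20, 4: 0.15, 5: 0.00, 6: 0.05}, name="return"
)
returns.index.name = "Security_id"

# Assign each security to a portfolio by the parity of its id
parity = pd.Series(
    ["even" if sid % 2 == 0 else "odd" for sid in returns.index],
    index=returns.index,
    name="portfolio",
)

# Equal-weighted portfolio return: the mean of its members' returns
portfolio_returns = returns.groupby(parity).mean()
print(portfolio_returns)
print("Better portfolio:", portfolio_returns.idxmax())
```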

Combine the two portfolios into a single portfolio. What was the overall performance of the S&P 500 over the same time periods?

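Pooling both portfolios and averaging over every member gives the combined equal-weighted return, which equals the two portfolio averages weighted by how many securities each holds. The S&P 500 itself is weighted by float-adjusted market capitalization, so an equal-weighted figure like this will generally differ from the index's reported return. A sketch with hypothetical returns:

```python
import pandas as pd

# Hypothetical window returns for the odd-id and even-id portfolios (not real data)
odd = pd.Series({1: 0.10, 3: 0.20, 5: 0.00})
even = pd.Series({2: -0.05, 4: 0.15, 6: 0.05, 8: 0.25})

# Combined portfolio: all members pooled, then averaged with equal weights
combined = pd.concat([odd, even]).sort_index()
overall = combined.mean()

# The same figure from the two portfolio averages, weighted by member count
from_parts = (odd.mean() * len(odd) + even.mean() * len(even)) / (len(odd) + len(even))

print(f"combined: {overall:.4f}, from parts: {from_parts:.4f}")
```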