# Pandas Exercise

When working on real-world data tasks, you'll quickly realize that a large portion of your time is spent manipulating raw data into a form that you can actually work with, a process often called *data munging* or *data wrangling*. Different programming languages offer different methods and packages for this task, with varying degrees of ease. Luckily for us, Python has an excellent one called pandas, which we will use in this exercise.

In [1]:
# Import pandas
import pandas as pd

# Imports for stock prices
import datetime as dt
import pandas_datareader.data as web

## Time Series
In many situations you may not know the relationship between two variables, but you do know that there ought to be one. Take, for example, the daily prices of beef and grain. It is reasonable to assume that there exists *some* relationship between the two, perhaps even a causal one. However, because the phenomenon is complex and involves a vast number of underlying latent variables (fuel price, politics, famine, etc.), you have little hope of uncovering that relationship in a reasonable amount of time. What you do know is that these two variables *are* related in time and may exhibit some pattern that repeats itself over time. Identifying these kinds of patterns is called Time Series Analysis, and data sequenced so that each data point corresponds to a unique point in time is called a Time Series. The canonical example of a Time Series is, of course, stock market data, which is what we will be using for this exercise.
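To make the idea concrete, here is a minimal sketch of two time series in pandas. The prices are invented purely for illustration (they are not real market data), and the names `beef`, `grain`, and `daily_change` are hypothetical. It only shows that each value is tied to a unique timestamp and that a shared date index lets the two series be compared over time.

```python
import pandas as pd

# Synthetic prices, invented for illustration only (not real market data).
dates = pd.date_range("2017-01-02", periods=5, freq="D")
beef = pd.Series([4.10, 4.12, 4.08, 4.15, 4.20], index=dates, name="beef")
grain = pd.Series([2.00, 2.03, 1.99, 2.05, 2.08], index=dates, name="grain")

# In a time series, each observation belongs to exactly one point in time.
assert beef.index.is_unique

# Because both series share the same date index, they line up row by row.
prices = pd.concat([beef, grain], axis=1)

# Day-over-day changes, and how strongly the two sets of changes move together.
daily_change = prices.diff().dropna()
correlation = daily_change["beef"].corr(daily_change["grain"])

print(prices)
print(correlation)
```

A correlation like this says nothing about causation; it is only a first look at whether the two series move together in time.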

Do the following exercises.

1. Create a `start` and `end` `datetime` object, starting at a date of your choosing and ending today.
1. For three stocks of your choosing, put their symbols into a list and use pandas to [retrieve their data](http://pandas-datareader.readthedocs.io/en/latest/remote_data.html) from Google for the time frame you created in part (1).  Print the results.
1. Create a DataFrame called `stock_open` for the open prices of the stocks you retrieved in part (2).  Print the first few rows.
1. Compute the total, average, and maximum price for each stock weekly.
1. For each stock, return the weeks for which the opening stock price was greater than the yearly daily average.

In [2]:
# Question 1
start = dt.datetime(2017, 1, 1)
end = dt.datetime.today()

In [3]:
# Question 2
# AAPL = Apple, LUV = Southwest Airlines, TGT = Target Corporation
stocks = ['AAPL', 'LUV', 'TGT']
data = web.DataReader(stocks, 'google', start, end)
data

<class 'pandas.core.panel.Panel'>
Dimensions: 5 (items) x 142 (major_axis) x 3 (minor_axis)
Items axis: Open to Volume
Major_axis axis: 2017-01-03 00:00:00 to 2017-07-26 00:00:00
Minor_axis axis: AAPL to TGT
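The result above is a `Panel`, a three-dimensional container that was removed from later pandas releases. As a rough sketch of how the same items-by-date-by-symbol layout can be held in an ordinary DataFrame, the example below uses two-level columns. The values are synthetic and the names `wide`, `fields`, and `open_prices` are hypothetical; it does not reproduce what any particular data source returns.

```python
import numpy as np
import pandas as pd

# Synthetic values, for illustration only.
dates = pd.date_range("2017-01-03", periods=3, freq="B")
fields = ["Open", "Close"]
symbols = ["AAPL", "LUV", "TGT"]

# Two-level columns: the outer level is the price field, the inner level the symbol.
columns = pd.MultiIndex.from_product([fields, symbols], names=["field", "symbol"])
values = np.arange(len(dates) * len(columns), dtype=float).reshape(len(dates), -1)
wide = pd.DataFrame(values, index=dates, columns=columns)

# Selecting the outer level plays the role of data['Open'] on a Panel:
# it returns a DataFrame with one column per symbol.
open_prices = wide["Open"]
print(open_prices)
```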

In [4]:
# Question 3
stock_open = data['Open']
stock_open.head()

              AAPL    LUV    TGT
Date                            
2017-01-03  115.80  50.40  72.66
2017-01-04  115.85  50.25  73.10
2017-01-05  115.92  51.44  72.33
2017-01-06  116.78  50.49  71.98
2017-01-09  117.95  50.00  71.55


In [5]:
# Question 4
# Compute the total, average, and maximum price for each stock weekly.
# Group by both week and year so the same week number in different years is not mixed
stock_open['week'] = stock_open.index.week
stock_open['year'] = stock_open.index.year

# Totals
totals_by_week = stock_open.groupby(['week', 'year']).sum()
print("Total:")
print(totals_by_week.head())

# Averages
means_by_week = stock_open.groupby(['week', 'year']).mean()
print("\nAverage:")
print(means_by_week.head())

# Maximums
max_by_week = stock_open.groupby(['week', 'year']).max()
print("\nMaximum:")
print(max_by_week.head())

Total:
             AAPL     LUV     TGT
week year                        
1    2017  464.35  202.58  290.07
2    2017  593.47  255.00  357.11
3    2017  478.19  201.54  270.48
4    2017  603.78  255.72  322.05
5    2017  625.40  262.01  319.25

Average:
               AAPL     LUV      TGT
week year                           
1    2017  116.0875  50.645  72.5175
2    2017  118.6940  51.000  71.4220
3    2017  119.5475  50.385  67.6200
4    2017  120.7560  51.144  64.4100
5    2017  125.0800  52.402  63.8500

Maximum:
             AAPL    LUV    TGT
week year                      
1    2017  116.78  51.44  73.10
2    2017  119.11  51.77  71.68
3    2017  120.45  50.88  70.76
4    2017  122.14  54.08  64.85
5    2017  128.31  53.00  63.98
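The same weekly totals, averages, and maximums can also be sketched with `resample`, which bins rows by calendar week directly from the date index instead of adding helper columns. The example below rebuilds only the five rows printed for Question 3, so the first bin matches week 1 above while the second bin holds a single day; `sample_open` and `weekly` are hypothetical names.

```python
import pandas as pd

# First five rows of stock_open, copied from the Question 3 output.
dates = pd.to_datetime(
    ["2017-01-03", "2017-01-04", "2017-01-05", "2017-01-06", "2017-01-09"]
)
sample_open = pd.DataFrame(
    {
        "AAPL": [115.80, 115.85, 115.92, 116.78, 117.95],
        "LUV": [50.40, 50.25, 51.44, 50.49, 50.00],
        "TGT": [72.66, 73.10, 72.33, 71.98, 71.55],
    },
    index=dates,
)

# "W" bins end on Sunday, so each bin holds one Monday-to-Friday trading week.
# agg() computes all three statistics in one pass, giving (stock, statistic) columns.
weekly = sample_open.resample("W").agg(["sum", "mean", "max"])
print(weekly.round(4))
```

Each row is labelled with the Sunday that closes the week, so the year is part of the label and no separate `year` column is needed.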


In [6]:
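# Question 5
# For each stock, keep the weeks whose average opening price exceeds
# that stock's average daily opening price over the whole period
above_average = []
for stock in stocks:
    average = stock_open[stock].mean()
    above_average.append(means_by_week[stock][means_by_week[stock] > average])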
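As a small, self-contained illustration of the same comparison, the sketch below uses synthetic prices for two made-up tickers (`AAA` and `BBB`) so the expected weeks are easy to check by hand. It compares every weekly mean against the stock's overall daily mean in one vectorized step and stores the results in a dictionary keyed by ticker; all names here are hypothetical.

```python
import numpy as np
import pandas as pd

# Three full trading weeks of synthetic prices (illustration only).
dates = pd.date_range("2017-01-02", periods=15, freq="B")
daily_open = pd.DataFrame(
    {
        "AAA": np.repeat([10.0, 20.0, 30.0], 5),  # rises week over week
        "BBB": np.repeat([30.0, 10.0, 20.0], 5),  # highest in the first week
    },
    index=dates,
)

# Average daily opening price of each stock over the whole period.
overall_mean = daily_open.mean()

# Average opening price of each stock per calendar week.
weekly_mean = daily_open.resample("W").mean()

# Compare each column of weekly means with that column's overall mean.
is_above = weekly_mean.gt(overall_mean, axis="columns")

# Keep, per stock, only the weeks that beat the overall mean.
weeks_above = {
    name: weekly_mean.loc[is_above[name], name] for name in daily_open.columns
}

for name, weeks in weeks_above.items():
    print(name)
    print(weeks)
```

Keying the results by ticker avoids having to remember which position in a list belongs to which stock.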