# Data Dive, Part II: Exploring Data

The second part of today's exercise looks at stock data downloaded from [Yahoo Finance](https://finance.yahoo.com/lookup). Stock returns can translate into enormous swings in wealth, and have long been the subject of statistical analysis. Today we'll look at just a handful of properties of stock returns. The file linked below includes all available data for four stocks, Apple (AAPL), Facebook (FB), General Electric (GE), and IBM (IBM), and one index, the Dow Jones Industrial Average (DJIA).

In [None]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt


#### Download Raw Data

In [None]:
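# Load the price history for the four stocks and the index, indexed by date.
raw_df = pd.read_csv('https://grantmlong.com/data/stocks.csv')
raw_df.set_index('Date', inplace=True)
print('Raw rows: %i' % raw_df.shape[0])
# Keep only the dates on which every series has a value.
price_df = raw_df.dropna(axis=0)
print('Full rows: %i' % price_df.shape[0])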


#### Transform Prices to Returns

In [None]:
# Percentage change from one row to the next; the first row has no prior price and is dropped.
return_df = price_df.pct_change(periods=1).dropna(axis=0)
return_df.tail(10)

### Part I: Visualize Returns

Build histograms of the returns for each of the stocks. What do the distributions of these returns look like?

Which stock has the highest average return? Which is the most volatile?

1. Identify and plot the summary statistics that answer each of these questions.
2. Are there other plots that might also be useful in illustrating these concepts?
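
One possible starting point for the questions above is sketched below. It is only a sketch under stated assumptions: `demo_returns` is a synthetic stand-in (made-up numbers, hypothetical column names `A`, `B`, `C`) so the cell runs on its own, and the standard deviation is used as the measure of volatility, which is one common choice rather than the only one. In the notebook you would pass `return_df` instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for return_df (hypothetical numbers, not real stock data).
rng = np.random.default_rng(0)
demo_returns = pd.DataFrame(
    rng.normal(loc=[0.0010, 0.0005, 0.0002], scale=[0.02, 0.03, 0.01], size=(1000, 3)),
    columns=["A", "B", "C"],
)


def summarize_returns(returns):
    """Return the mean and standard deviation of each column, one row per column."""
    return returns.agg(["mean", "std"]).T


summary = summarize_returns(demo_returns)
print(summary.sort_values("mean", ascending=False))

# One histogram per column on a shared x-axis so the spreads are comparable.
axes = demo_returns.hist(bins=50, sharex=True, figsize=(10, 6))

# Bar charts of the two summary statistics.
fig, (ax_mean, ax_std) = plt.subplots(1, 2, figsize=(10, 3))
summary["mean"].plot.bar(ax=ax_mean, title="Average return")
summary["std"].plot.bar(ax=ax_std, title="Volatility (std. dev.)")
fig.tight_layout()
```

A box plot (`demo_returns.plot.box()`) is another option for comparing spread and tails side by side.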

### Part II: Confidence Intervals

Based on this data, find the most you would lose on a \$10,000 investment with 95 percent and 99 percent confidence.
* Is it fair to call this a confidence interval?
* How else might you calculate such a confidence interval?
* [Time permitting] For the stocks with more data available, how does the inclusion of the historical returns change things?
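
Two ways of approaching the loss question are sketched below: an empirical quantile of past returns, and a threshold that assumes returns are normally distributed. Both are illustrations rather than the intended answer. The helper names `historical_loss` and `normal_loss` are hypothetical, the horizon is taken to be a single period (the exercise does not state one), and `demo_returns` is synthetic data standing in for `return_df`.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd


def historical_loss(returns, investment=10_000, confidence=0.95):
    """Loss exceeded in only (1 - confidence) of past periods (empirical quantile)."""
    worst_return = returns.quantile(1 - confidence)
    return -worst_return * investment


def normal_loss(returns, investment=10_000, confidence=0.95):
    """Same threshold, but assuming returns follow a normal distribution."""
    z = NormalDist().inv_cdf(1 - confidence)  # about -1.645 at 95 percent
    return -(returns.mean() + z * returns.std()) * investment


# Synthetic stand-in for return_df (hypothetical numbers, not real stock data).
rng = np.random.default_rng(1)
demo_returns = pd.DataFrame(
    rng.normal(0.0, 0.02, size=(5000, 2)), columns=["A", "B"]
)

hist_95 = historical_loss(demo_returns, confidence=0.95)
hist_99 = historical_loss(demo_returns, confidence=0.99)
norm_95 = normal_loss(demo_returns, confidence=0.95)

print(pd.DataFrame({"hist 95%": hist_95, "hist 99%": hist_99, "normal 95%": norm_95}).round(2))
```

On real returns the two methods can disagree noticeably in the tails, which is one reason to compare them at the 99 percent level as well.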

### Part III: Correlation

Which of the four stocks are most correlated with each other?
* Why might this be the case?
* Can we visualize these correlations?
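
A correlation matrix and a heatmap are one way to look at these questions. The sketch below uses a synthetic `demo_returns` frame (hypothetical columns `A`, `B`, `C`, where `A` and `B` share a common factor by construction) so that it runs on its own; in the notebook you would start from `return_df.corr()`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for return_df: A and B share a common factor, C does not.
rng = np.random.default_rng(2)
common = rng.normal(0.0, 0.01, size=1000)
demo_returns = pd.DataFrame({
    "A": common + rng.normal(0.0, 0.003, size=1000),
    "B": common + rng.normal(0.0, 0.003, size=1000),
    "C": rng.normal(0.0, 0.01, size=1000),
})

# Pairwise Pearson correlations between the columns.
corr = demo_returns.corr()

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()
top_pair = pairs.idxmax()
print("Most correlated pair:", top_pair, round(pairs.max(), 3))

# Heatmap of the correlation matrix on a fixed -1 to 1 color scale.
fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Correlation")
```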

Which of the four stocks is most correlated with the broader market index?
* How might we use the data?
* If we regress these returns against the market, which stock has the largest slope, and which has the largest intercept?
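
The regression question can be explored with an ordinary least-squares line fit of each stock's returns on the index's returns. The sketch below uses `numpy.polyfit` for the fit, which is one simple option among several. The helper `market_regression`, the column name `MKT`, and the synthetic data are all assumptions made for illustration; the index column in `return_df` may be named differently.

```python
import numpy as np
import pandas as pd


def market_regression(returns, market_col):
    """Fit stock = intercept + slope * market for every non-market column."""
    rows = {}
    for col in returns.columns.drop(market_col):
        # polyfit with deg=1 returns the coefficients highest power first.
        slope, intercept = np.polyfit(returns[market_col], returns[col], deg=1)
        rows[col] = {"slope": slope, "intercept": intercept}
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic stand-in for return_df with known slopes of 1.5 and 0.5.
rng = np.random.default_rng(3)
market = rng.normal(0.0, 0.01, size=2000)
demo_returns = pd.DataFrame({
    "MKT": market,
    "HIGH": 0.0002 + 1.5 * market + rng.normal(0.0, 0.005, size=2000),
    "LOW": 0.0001 + 0.5 * market + rng.normal(0.0, 0.005, size=2000),
})

fits = market_regression(demo_returns, market_col="MKT")
print(fits.round(4))
print("Largest slope:", fits["slope"].idxmax())
```

In finance the slope of such a fit is commonly called beta and the intercept alpha.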

## Bonus Round: Make a case, using data, for investing in one of these four stocks.