# Solutions

## About the Data
In this notebook, we will be working with 2018 stock data for Facebook, Apple, Amazon, Netflix, and Google (obtained using the [`stock_analysis` package](https://github.com/stefmolin/stock-analysis)).

## Setup

In [1]:
import pandas as pd

## Exercise 1
We want to look at data for the FAANG stocks (Facebook, Apple, Amazon, Netflix, and Google), but we were given each as a separate CSV file. Make them into a single file and store the dataframe of the FAANG data as `faang`:
1. Read each file in.
2. Add a column to each dataframe indicating the ticker it is for.
3. Append them together into a single dataframe.
4. Save the result to a CSV file.

In [2]:
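# read each file, tag it with its ticker, and collect the pieces
dfs = []
for ticker in ['fb', 'aapl', 'amzn', 'nflx', 'goog']:
    df = pd.read_csv(f'../../ch_03/exercises/{ticker}.csv')
    # make the ticker the first column
    df.insert(0, 'ticker', ticker.upper())
    dfs.append(df)

# DataFrame.append() was removed in pandas 2.0, so combine with pd.concat()
faang = pd.concat(dfs)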

faang.to_csv('faang.csv', index=False)

## Exercise 2
With `faang`, use type conversion to change the `date` column to datetime and the `volume` column to integers. Then sort by `date` and `ticker`.

In [3]:
faang = faang.assign(
    date=pd.to_datetime(faang.date),
    volume=faang.volume.astype(int)
).sort_values(
    ['date', 'ticker']
)

In [4]:
faang.head()

,ticker,date,open,high,low,close,volume
0,AAPL,2018-01-02,166.9271,169.0264,166.0442,168.9872,25555934
0,AMZN,2018-01-02,1172.0,1190.0,1170.51,1189.01,2694494
0,FB,2018-01-02,177.68,181.58,177.55,181.42,18151903
0,GOOG,2018-01-02,1048.34,1066.94,1045.23,1065.0,1237564
0,NFLX,2018-01-02,196.1,201.65,195.42,201.07,10966889
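
To see the type conversion and sort in isolation, here is a minimal sketch on a toy dataframe. The rows and the name `toy` are made up for illustration and are not part of the FAANG data; only the `assign()`, `pd.to_datetime()`, `astype()`, and `sort_values()` calls mirror the solution above.

```python
import pandas as pd

# Toy rows standing in for the CSV data (values are illustrative only)
toy = pd.DataFrame({
    'ticker': ['FB', 'AAPL', 'FB'],
    'date': ['2018-01-03', '2018-01-02', '2018-01-02'],  # strings, as read from a CSV
    'volume': [100.0, 250.0, 175.0],                     # floats, to be cast to int
})

converted = toy.assign(
    date=pd.to_datetime(toy.date),
    volume=toy.volume.astype(int)
).sort_values(['date', 'ticker'])

# Inspect the resulting column types
print(converted.dtypes)
```

Checking `dtypes` after the conversion is a quick way to confirm that `date` is now a datetime column and `volume` an integer column before sorting on them.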


## Exercise 3
Find the 7 rows with the highest value for `volume`.

In [5]:
faang.nlargest(7, 'volume')

,ticker,date,open,high,low,close,volume
142,FB,2018-07-26,174.89,180.13,173.75,176.26,169803668
53,FB,2018-03-20,167.47,170.2,161.95,168.15,129851768
57,FB,2018-03-26,160.82,161.1,149.02,160.06,126116634
54,FB,2018-03-21,164.8,173.4,163.3,169.39,106598834
182,AAPL,2018-09-21,219.0727,219.6482,215.6097,215.9768,96246748
245,AAPL,2018-12-21,156.1901,157.4845,148.9909,150.0862,95744384
212,AAPL,2018-11-02,207.9295,211.9978,203.8414,205.8755,91328654
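
As a rough illustration of what `nlargest()` does, the sketch below compares it with sorting in descending order and taking the first rows. The dataframe `toy` and its values are hypothetical, and the two approaches are only guaranteed to match row for row when there are no ties in the column.

```python
import pandas as pd

# Hypothetical data with no tied volumes
toy = pd.DataFrame({
    'ticker': list('ABCDE'),
    'volume': [30, 10, 50, 20, 40],
})

# The approach used in the solution
top3 = toy.nlargest(3, 'volume')

# An equivalent formulation: sort descending, then keep the first 3 rows
via_sort = toy.sort_values('volume', ascending=False).head(3)

print(top3)
```

Both versions keep the original index labels, which is why the index values in the output above are not consecutive.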


## Exercise 4
Right now, the data is somewhere between long and wide format. Use `melt()` to make it completely long format.

In [6]:
melted_faang = faang.melt(
    id_vars=['ticker', 'date'], 
    value_vars=['open', 'high', 'low', 'close', 'volume']
)
melted_faang.head()

,ticker,date,variable,value
0,AAPL,2018-01-02,open,166.9271
1,AMZN,2018-01-02,open,1172.0
2,FB,2018-01-02,open,177.68
3,GOOG,2018-01-02,open,1048.34
4,NFLX,2018-01-02,open,196.1
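
One way to check that nothing was lost in the melt is to pivot the long data back to wide format. This is a sketch on a made-up two-row dataframe (`toy`, with only `open` and `close`), assuming each `ticker`/`date` pair appears once, which `pivot()` requires.

```python
import pandas as pd

# Made-up wide-format rows (values are illustrative only)
toy = pd.DataFrame({
    'ticker': ['AAPL', 'FB'],
    'date': pd.to_datetime(['2018-01-02', '2018-01-02']),
    'open': [10.0, 20.0],
    'close': [11.0, 19.0],
})

# Long format: one row per ticker, date, and measurement
melted = toy.melt(id_vars=['ticker', 'date'], value_vars=['open', 'close'])

# Back to wide format: one column per measurement again
wide = melted.pivot(
    index=['ticker', 'date'], columns='variable', values='value'
).reset_index()

print(melted)
print(wide)
```

The long version has one row per measurement (2 rows × 2 measurements = 4 rows here), while the pivoted version has one row per `ticker`/`date` pair again.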


## Exercise 5
Suppose we found out there was a glitch in how the data was recorded on July 26, 2018. How should we handle this?

> Because this is a large data set (about a year of daily data), it is tempting to simply drop that date and interpolate. However, some preliminary research on that date for the FAANG stocks reveals that FB took a huge tumble that day (it is also the row with the highest volume in Exercise 3). Had we interpolated, we would have missed the magnitude of the drop.
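
To make the risk concrete, here is a small sketch of how linear interpolation smooths over a one-day drop. The closing prices are made up for illustration and are not the actual FB values; the names `close`, `dropped`, and `interpolated` are hypothetical as well.

```python
import numpy as np
import pandas as pd

glitch_day = pd.Timestamp('2018-07-26')

# Made-up closing prices around the date in question (not real FB data)
close = pd.Series(
    [217.0, 176.0, 175.0],
    index=pd.to_datetime(['2018-07-25', '2018-07-26', '2018-07-27']),
    name='close',
)

# Discard the suspect day and fill it by linear interpolation
dropped = close.copy()
dropped.loc[glitch_day] = np.nan
interpolated = dropped.interpolate()

# Day-over-day change with the recorded value vs. the interpolated value
actual_change = close.pct_change().loc[glitch_day]
interp_change = interpolated.pct_change().loc[glitch_day]

print(f'recorded: {actual_change:.1%}, interpolated: {interp_change:.1%}')
```

With these toy numbers, the interpolated series spreads the decline over two days, so the single-day change on July 26 looks much smaller than the recorded one. This is why it is worth checking what actually happened on a date before replacing its values.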