# Solutions

## Setup

```python
import pandas as pd
```

## Exercise 1
We want to look at data for the FAANG stocks (Facebook, Apple, Amazon, Netflix, and Google), but we were given each as a separate CSV file (from nasdaq.com). Make them into a single file and store the dataframe of the FAANG data as `faang`:
1. Read each file in.
2. Add a column to each dataframe indicating the ticker it is for.
3. Append them together into a single dataframe.
4. Save the result to a CSV file.

```python
frames = []
for ticker in ['fb', 'aapl', 'amzn', 'nflx', 'goog']:
    df = pd.read_csv(f'../FAANG/{ticker}.csv')
    # make the ticker the first column
    df.insert(0, 'ticker', ticker.upper())
    frames.append(df)

# DataFrame.append() was removed in pandas 2.0; concatenate the list once instead
faang = pd.concat(frames)

faang.to_csv('faang.csv', index=False)
```
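As an alternative to inserting the column inside the loop, `pd.concat()` can build the `ticker` column from its `keys` argument. The sketch below is self-contained: it uses two tiny, hypothetical dataframes with made-up values in place of the CSV files (with the real data, each value would come from `pd.read_csv`), and the names `per_ticker` and `row` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-ins for the per-ticker CSV files (made-up values)
per_ticker = {
    'fb': pd.DataFrame({'date': ['2018-01-18', '2018-01-19'], 'close': [10.0, 11.0]}),
    'aapl': pd.DataFrame({'date': ['2018-01-18', '2018-01-19'], 'close': [20.0, 21.0]}),
}

faang = (
    pd.concat(
        list(per_ticker.values()),
        keys=[ticker.upper() for ticker in per_ticker],  # one key per input frame
        names=['ticker', 'row'],  # names for the two levels of the resulting MultiIndex
    )
    .reset_index(level='ticker')  # move the keys into a column; reset_index puts it first
    .reset_index(drop=True)  # discard the per-file row numbers
)
print(faang)
```

This avoids mutating each dataframe, and `reset_index(drop=True)` yields a clean 0 to n-1 index, whereas the loop above keeps each file's own row numbers.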

## Exercise 2
With `faang`, use type conversion to change the `date` column to datetime and the `volume` column to integers. Then sort by `date` and `ticker`.

```python
faang = faang.assign(
    date=pd.to_datetime(faang.date),
    volume=faang.volume.astype(int)
).sort_values(
    ['date', 'ticker']
)
```
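The same conversions can be checked on a tiny, hypothetical frame where `date` and `volume` arrive as strings (made-up values). Note that `sort_values()` keeps the original index labels rather than renumbering the rows. One caveat worth knowing: `astype(int)` raises if the column contains missing values, whereas pandas' nullable `'Int64'` dtype keeps them as `<NA>`.

```python
import pandas as pd

# Hypothetical rows: dates and volumes arrive as strings
raw = pd.DataFrame({
    'ticker': ['FB', 'AAPL', 'FB'],
    'date': ['2018-01-19', '2018-01-18', '2018-01-18'],
    'volume': ['200', '100', '300'],
})

converted = raw.assign(
    date=pd.to_datetime(raw.date),
    volume=raw.volume.astype(int),
).sort_values(['date', 'ticker'])  # index labels are kept, not renumbered
print(converted.dtypes)

# A float column with a NaN cannot be cast to a plain integer dtype...
gap = pd.Series([1.0, None])
try:
    gap.astype(int)
    raised = False
except ValueError:  # NaN has no integer representation
    raised = True

# ...but the nullable integer dtype stores the gap as <NA>
nullable = gap.astype('Int64')
print(nullable)
```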

## Exercise 3
Find the 7 rows with the highest value for `volume`.

```python
faang.nlargest(7, 'volume')
```

|     | ticker | date       | close  | volume    | open   | high   | low    |
|----:|--------|------------|-------:|----------:|-------:|-------:|-------:|
| 121 | FB     | 2018-07-26 | 176.26 | 169059900 | 174.89 | 180.13 | 173.75 |
| 210 | FB     | 2018-03-20 | 168.15 | 129654100 | 167.47 | 170.20 | 161.95 |
| 206 | FB     | 2018-03-26 | 160.06 | 125971800 | 160.82 | 161.10 | 149.02 |
| 209 | FB     | 2018-03-21 | 169.39 | 106166700 | 164.80 | 173.40 | 163.30 |
| 81  | AAPL   | 2018-09-21 | 217.66 | 95584080  | 220.78 | 221.36 | 217.29 |
| 18  | AAPL   | 2018-12-21 | 150.73 | 95497900  | 156.86 | 158.16 | 149.63 |
| 11  | AAPL   | 2019-01-03 | 142.19 | 91106840  | 143.98 | 145.72 | 142.00 |
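`nlargest()` returns the same rows as sorting in descending order and taking the head, but is documented as the more performant option. Its `keep` parameter controls ties at the cutoff: `'first'` (the default) keeps the earliest occurrence, while `'all'` keeps every tied row and may return more than n rows. A small sketch on made-up volumes with a deliberate tie:

```python
import pandas as pd

# Hypothetical volumes; C and D tie at the cutoff for n=2
df = pd.DataFrame({
    'ticker': ['A', 'B', 'C', 'D'],
    'volume': [50, 90, 70, 70],
})

top = df.nlargest(2, 'volume')  # keep='first': the earlier tied row (C) wins
same = df.sort_values('volume', ascending=False).head(2)  # equivalent, usually slower
with_ties = df.nlargest(2, 'volume', keep='all')  # both tied rows kept, so 3 rows
print(top, same, with_ties, sep='\n\n')
```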


## Exercise 4
Right now, the data is somewhere between long and wide format. Use `melt()` to make it completely long format.

```python
melted_faang = faang.melt(
    id_vars=['ticker', 'date'],
    value_vars=['open', 'high', 'low', 'close', 'volume']
)
melted_faang.head()
```

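|   | ticker | date       | variable | value   |
|--:|--------|------------|----------|--------:|
| 0 | AAPL   | 2018-01-18 | open     | 179.37  |
| 1 | AMZN   | 2018-01-18 | open     | 1293.95 |
| 2 | FB     | 2018-01-18 | open     | 178.13  |
| 3 | GOOG   | 2018-01-18 | open     | 1131.41 |
| 4 | NFLX   | 2018-01-18 | open     | 220.34  |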
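`melt()` can be undone with `pivot()`. The sketch below uses a made-up two-row frame, melts it to long format, and pivots it back to one column per variable. Because `melt()` stacks every measurement into a single `value` column, melting integer `volume` alongside float prices as above stores the volumes as floats in that column.

```python
import pandas as pd

# Hypothetical wide-format data (made-up values)
wide = pd.DataFrame({
    'ticker': ['AAPL', 'FB'],
    'date': ['2018-01-18', '2018-01-18'],
    'open': [1.0, 3.0],
    'close': [2.0, 4.0],
})

# Long format: one row per (ticker, date, variable)
long_df = wide.melt(id_vars=['ticker', 'date'], value_vars=['open', 'close'])

# Back to wide: pivot spreads 'variable' into columns again
back = (
    long_df.pivot(index=['ticker', 'date'], columns='variable', values='value')
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover 'variable' label on the columns
    .loc[:, ['ticker', 'date', 'open', 'close']]  # pivot sorts columns; restore order
)
print(long_df)
print(back)
```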


## Exercise 5
Suppose we found out there was a glitch in how the data was recorded on July 26, 2018. How should we handle this?

> Given that this is a large data set (about a year of daily prices), we might be tempted to just drop that date and interpolate. However, some preliminary research on that date for the FAANG stocks reveals that FB took a huge tumble that day; it is also the highest-volume row in the data (see Exercise 3). If we had interpolated, we would have missed the magnitude of the drop.
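To make the point concrete, the sketch below uses hypothetical closing prices (not the real FB data) with a sharp drop on the middle date, blanks that date as if its record had been discarded, and refills it with `interpolate(method='time')`. The `'time'` method weights by the actual gap between dates, which matters for trading data with weekend gaps; the default `'linear'` treats rows as equally spaced.

```python
import pandas as pd

# Hypothetical closing prices (made-up values): a sharp drop on 2018-07-26
close = pd.Series(
    [100.0, 101.0, 80.0, 79.0],
    index=pd.to_datetime(['2018-07-24', '2018-07-25', '2018-07-26', '2018-07-27']),
)
glitch_day = pd.Timestamp('2018-07-26')
prev_day = pd.Timestamp('2018-07-25')

# Pretend the glitched record was dropped, then fill it by interpolation
glitched = close.copy()
glitched.loc[glitch_day] = float('nan')
filled = glitched.interpolate(method='time')

# Compare the one-day return on the glitch date
actual_return = close.loc[glitch_day] / close.loc[prev_day] - 1
interpolated_return = filled.loc[glitch_day] / filled.loc[prev_day] - 1
print(f'actual: {actual_return:.1%}, interpolated: {interpolated_return:.1%}')
```

The interpolated value lands halfway between the neighbouring closes, so the apparent one-day drop is roughly half of the made-up "true" drop.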