# Solutions

## About the Data
In this notebook, we will be working with 2018 stock data for Facebook, Apple, Amazon, Netflix, and Google (obtained using the [`stock_analysis` package](https://github.com/stefmolin/stock-analysis)).

## Setup

```python
import pandas as pd
```

## Exercise 1
We want to look at data for the FAANG stocks (Facebook, Apple, Amazon, Netflix, and Google), but we were given each as a separate CSV file. Make them into a single file and store the dataframe of the FAANG data as `faang`:
1. Read each file in.
2. Add a column to each dataframe indicating the ticker it is for.
3. Append them together into a single dataframe.
4. Save the result to a CSV file.

```python
import pandas as pd

dfs = []
for ticker in ['fb', 'aapl', 'amzn', 'nflx', 'goog']:
    df = pd.read_csv(f'../../ch_03/exercises/{ticker}.csv')
    # make the ticker the first column
    df.insert(0, 'ticker', ticker.upper())
    dfs.append(df)

# DataFrame.append() was removed in pandas 2.0; pd.concat() stacks the dataframes instead
faang = pd.concat(dfs)
faang.to_csv('faang.csv', index=False)
```
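You can try the same read, tag, and combine pattern without the chapter's CSV files. This minimal sketch uses two tiny in-memory stand-ins: `io.StringIO` plays the role of the files on disk, and the prices and volumes are made up. It also shows what happens to the row labels when the pieces are combined with `pd.concat()`.

```python
import io

import pandas as pd

# hypothetical in-memory stand-ins for two of the per-ticker CSV files (made-up values)
csv_text = {
    'fb': 'date,close,volume\n2018-01-02,10.0,100\n2018-01-03,11.0,200\n',
    'aapl': 'date,close,volume\n2018-01-02,20.0,300\n2018-01-03,21.0,400\n',
}

dfs = []
for ticker, text in csv_text.items():
    df = pd.read_csv(io.StringIO(text))
    df.insert(0, 'ticker', ticker.upper())  # ticker becomes the first column
    dfs.append(df)

# stack the pieces vertically; each piece keeps its own 0..n-1 row labels
faang_demo = pd.concat(dfs)
print(faang_demo)
print(faang_demo.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True renumbers the rows 0..n-1 instead
print(pd.concat(dfs, ignore_index=True).index.tolist())  # [0, 1, 2, 3]
```

The repeated labels explain why every row in the Exercise 2 output has index 0. They also explain why the Exercise 3 output shows each row's position within its original file. These labels do no harm here, because the CSV is written with `index=False`.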

## Exercise 2
With `faang`, use type conversion to change the `date` column to datetime and the `volume` column to integers. Then sort by `date` and `ticker`.

```python
faang = faang.assign(
    date=pd.to_datetime(faang.date),
    volume=faang.volume.astype(int)
).sort_values(
    ['date', 'ticker']
)

faang.head()
```

|   | ticker | date | high | low | open | close | volume |
|---|---|---|---|---|---|---|---|
| 0 | AAPL | 2018-01-02 | 43.075001 | 42.314999 | 42.540001 | 43.064999 | 102223600 |
| 0 | AMZN | 2018-01-02 | 1190.0 | 1170.51001 | 1172.0 | 1189.01001 | 2694500 |
| 0 | FB | 2018-01-02 | 181.580002 | 177.550003 | 177.679993 | 181.419998 | 18151900 |
| 0 | GOOG | 2018-01-02 | 1066.939941 | 1045.22998 | 1048.339966 | 1065.0 | 1237600 |
| 0 | NFLX | 2018-01-02 | 201.649994 | 195.419998 | 196.100006 | 201.070007 | 10966900 |
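Here is a minimal sketch of the same conversions on a small hypothetical dataframe. In it, dates arrive as strings and volumes as floats such as `300.0`; that float representation is an assumption made for illustration. The sketch shows the dtypes before and after conversion, and that `sort_values(['date', 'ticker'])` orders by date first and uses the ticker only to break ties.

```python
import io

import pandas as pd

# hypothetical data: dates are strings and volumes are written as floats
raw = pd.read_csv(io.StringIO(
    'ticker,date,volume\n'
    'FB,2018-01-03,300.0\n'
    'AAPL,2018-01-03,400.0\n'
    'FB,2018-01-02,100.0\n'
    'AAPL,2018-01-02,200.0\n'
))
print(raw.dtypes)  # date holds strings, volume is float64

converted = raw.assign(
    date=pd.to_datetime(raw.date),  # strings -> datetime64
    volume=raw.volume.astype(int),  # float -> integer (any fractional part is truncated)
).sort_values(['date', 'ticker'])   # date first, ticker breaks ties

print(converted.dtypes)
print(converted)
```

Note that `astype(int)` raises an error if the column contains missing values. The nullable `'Int64'` dtype is one option in that case.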


## Exercise 3
Find the 7 rows with the highest value for `volume`.

```python
faang.nlargest(7, 'volume')
```

|   | ticker | date | high | low | open | close | volume |
|---|---|---|---|---|---|---|---|
| 182 | AAPL | 2018-09-21 | 55.34 | 54.322498 | 55.195 | 54.415001 | 384986800 |
| 245 | AAPL | 2018-12-21 | 39.540001 | 37.407501 | 39.215 | 37.682499 | 382978400 |
| 212 | AAPL | 2018-11-02 | 53.412498 | 51.357498 | 52.387501 | 51.869999 | 365314800 |
| 22 | AAPL | 2018-02-02 | 41.700001 | 40.025002 | 41.5 | 40.125 | 346375200 |
| 23 | AAPL | 2018-02-05 | 40.970001 | 39.0 | 39.775002 | 39.122501 | 290954000 |
| 27 | AAPL | 2018-02-09 | 39.4725 | 37.560001 | 39.267502 | 39.102501 | 282690400 |
| 24 | AAPL | 2018-02-06 | 40.93 | 38.5 | 38.7075 | 40.7575 | 272975200 |
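The pandas documentation describes `nlargest()` as equivalent to `sort_values(..., ascending=False).head(n)`, but more performant. This small sketch checks that equivalence on hypothetical volumes with no ties. When there are ties, `nlargest()` uses its `keep` parameter (`'first'` by default; `'last'` and `'all'` are the alternatives) to decide which rows survive.

```python
import pandas as pd

# hypothetical volumes, just to compare the two approaches
df = pd.DataFrame({
    'ticker': ['A', 'B', 'C', 'D', 'E'],
    'volume': [50, 10, 40, 30, 20],
})

top3 = df.nlargest(3, 'volume')
top3_sorted = df.sort_values('volume', ascending=False).head(3)

print(top3)
# same rows, same order, same index labels
print(top3.equals(top3_sorted))  # True
```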


## Exercise 4
Right now, the data is somewhere between long and wide format. Use `melt()` to make it completely long format.

```python
melted_faang = faang.melt(
    id_vars=['ticker', 'date'],
    value_vars=['open', 'high', 'low', 'close', 'volume']
)
melted_faang.head()
```

|   | ticker | date | variable | value |
|---|---|---|---|---|
| 0 | AAPL | 2018-01-02 | open | 42.540001 |
| 1 | AMZN | 2018-01-02 | open | 1172.0 |
| 2 | FB | 2018-01-02 | open | 177.679993 |
| 3 | GOOG | 2018-01-02 | open | 1048.339966 |
| 4 | NFLX | 2018-01-02 | open | 196.100006 |
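To see what `melt()` does to the shape of the data, here is a sketch on a hypothetical two-row wide dataframe with made-up prices. In the melted result, each `(ticker, date)` pair gets one row per melted column, so the row count is multiplied by the number of `value_vars`. `pivot()` reverses the operation; passing a list to `index` requires pandas 1.1 or later.

```python
import pandas as pd

# hypothetical wide data: one row per ticker per date
wide = pd.DataFrame({
    'ticker': ['AAPL', 'FB'],
    'date': ['2018-01-02', '2018-01-02'],
    'open': [1.0, 2.0],
    'close': [1.5, 2.5],
})

long = wide.melt(id_vars=['ticker', 'date'], value_vars=['open', 'close'])
print(long)
# one row per (ticker, date, variable): 2 rows x 2 measurements = 4 rows
print(len(long))  # 4

# pivot() turns each 'variable' back into its own column
back = long.pivot(index=['ticker', 'date'], columns='variable', values='value')
print(back)
```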


## Exercise 5
Suppose we found out there was a glitch in how the data was recorded on July 26, 2018. How should we handle this?

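> Given that this is a fairly large data set (about a year of daily prices), we might be tempted to just drop that date and interpolate. However, some preliminary research on that date for the FAANG stocks reveals that FB took a huge tumble that day. Interpolating would have replaced that day's prices with values between those of the neighboring trading days, so we would have missed the magnitude of the drop. Before discarding a suspicious value as a glitch, it is worth checking whether it reflects a real event.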
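The effect of interpolating over a real drop can be shown with a few hypothetical closing prices. These are made-up numbers, not FB's actual 2018 prices. The sketch marks July 26 as missing, fills it with linear `interpolate()`, and compares the largest one-day percentage change before and after.

```python
import numpy as np
import pandas as pd

# hypothetical closing prices with a sharp one-day drop on 2018-07-26
close = pd.Series(
    [100.0, 100.0, 80.0, 80.0],
    index=pd.to_datetime(['2018-07-24', '2018-07-25', '2018-07-26', '2018-07-27']),
)
glitch_day = pd.Timestamp('2018-07-26')

# treat the "glitchy" day as missing and fill it by linear interpolation
patched = close.copy()
patched.loc[glitch_day] = np.nan
patched = patched.interpolate()

print(patched.loc[glitch_day])  # 90.0, halfway between the neighbors
# the single 20% drop is now smeared across two smaller daily declines
print(close.pct_change().min(), patched.pct_change().min())
```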