# Solutions

## About the Data
In this notebook, we will be working with 2 datasets:
- 2018 stock data for Facebook, Apple, Amazon, Netflix, and Google (obtained using the [`stock_analysis` package](https://github.com/stefmolin/stock-analysis))
- Earthquake data from September 18, 2018 through October 13, 2018 (obtained from the US Geological Survey (USGS) using the [USGS API](https://earthquake.usgs.gov/fdsnws/event/1/))

## Setup

In [1]:
import pandas as pd
import numpy as np

quakes = pd.read_csv('../../ch_04/exercises/earthquakes.csv')
faang = pd.read_csv('../../ch_04/exercises/faang.csv', index_col='date', parse_dates=True)

## Exercise 1
With the `exercises/earthquakes.csv` file, select all the earthquakes in Japan with a `magType` of `mb` and a magnitude of 4.9 or greater.

In [2]:
quakes.query(
    "parsed_place == 'Japan' and magType == 'mb' and mag >= 4.9"
)[['mag', 'magType', 'place']]

|      | mag | magType | place |
|------|-----|---------|-------|
| 1563 | 4.9 | mb | 293km ESE of Iwo Jima, Japan |
| 2576 | 5.4 | mb | 37km E of Tomakomai, Japan |
| 3072 | 4.9 | mb | 15km ENE of Hasaki, Japan |
| 3632 | 4.9 | mb | 53km ESE of Hitachi, Japan |

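`query()` evaluates the expression string against the dataframe's columns, so it selects the same rows as ordinary boolean indexing. The sketch below checks this on a small made-up dataframe (hypothetical values, not taken from `earthquakes.csv`) that reuses the three column names from the exercise. In the boolean-mask form, each comparison needs its own parentheses because `&` binds more tightly than `==` and `>=` in Python.

```python
import pandas as pd

# Hypothetical toy data reusing a few column names from earthquakes.csv
toy_quakes = pd.DataFrame({
    'mag': [4.9, 5.4, 4.5, 5.0],
    'magType': ['mb', 'mb', 'mb', 'ml'],
    'parsed_place': ['Japan', 'Japan', 'Japan', 'Chile'],
})

# Selection written as a query string
via_query = toy_quakes.query(
    "parsed_place == 'Japan' and magType == 'mb' and mag >= 4.9"
)

# Same selection written as a boolean mask; each comparison is parenthesized
mask = (
    (toy_quakes.parsed_place == 'Japan')
    & (toy_quakes.magType == 'mb')
    & (toy_quakes.mag >= 4.9)
)
via_mask = toy_quakes[mask]

print(via_query.index.tolist())  # [0, 1]
```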

## Exercise 2
For earthquakes with a `magType` of `ml`, create a bin for each whole number of magnitude (for example, the first bin is 0-1, the second is 1-2, and so on) and count how many earthquakes fall into each bin.

In [3]:
quakes.query("magType == 'ml'").assign(
    mag_bin=lambda x: pd.cut(x.mag, np.arange(0, 10))
).mag_bin.value_counts()

(1, 2]    3105
(0, 1]    2207
(2, 3]     862
(3, 4]     122
(4, 5]       2
(5, 6]       1
(8, 9]       0
(7, 8]       0
(6, 7]       0
Name: mag_bin, dtype: int64

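`pd.cut()` builds right-closed intervals by default, so with edges from `np.arange(0, 10)` a magnitude of exactly 0 (or a negative value, should the data contain any) is left out of every bin and becomes `NaN`. A minimal sketch on made-up magnitudes: `include_lowest=True` pulls 0 into the first bin, and `value_counts(sort=False)` lists the bins in interval order rather than by count.

```python
import numpy as np
import pandas as pd

# Made-up magnitudes for illustration only
mags = pd.Series([0.0, 0.5, 1.0, 1.7, 2.3, -0.4])

# Default right-closed bins (0, 1], (1, 2], ...: 0.0 and -0.4 fall outside and become NaN
default_bins = pd.cut(mags, np.arange(0, 10))

# include_lowest=True also closes the left edge of the first bin, so 0.0 is kept
inclusive_bins = pd.cut(mags, np.arange(0, 10), include_lowest=True)

# sort=False lists bins in interval order (empty bins included) instead of by count
counts = inclusive_bins.value_counts(sort=False)
print(counts)
```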
## Exercise 3
Using the `exercises/faang.csv` file, group by the ticker and resample to monthly frequency. Aggregate the open and close prices with the mean, the high price with the max, the low price with the min, and the volume with the sum.

In [4]:
faang.groupby('ticker').resample('1M').agg(  # pandas >= 2.2 prefers '1ME' for month end
    {
        'open': 'mean',
        'high': 'max',
        'low': 'min',
        'close': 'mean',
        'volume': 'sum'
    }
)

| ticker | date | open | high | low | close | volume |
|--------|------|------|------|-----|-------|--------|
| AAPL | 2018-01-31 | 43.505357 | 45.025002 | 41.174999 | 43.501309 | 2638718000.0 |
| AAPL | 2018-02-28 | 41.819079 | 45.154999 | 37.560001 | 41.909737 | 3711577000.0 |
| AAPL | 2018-03-31 | 43.761786 | 45.875 | 41.235001 | 43.624048 | 2854911000.0 |
| AAPL | 2018-04-30 | 42.44131 | 44.735001 | 40.157501 | 42.458572 | 2664617000.0 |
| AAPL | 2018-05-31 | 46.239091 | 47.592499 | 41.317501 | 46.384205 | 2483905000.0 |
| AAPL | 2018-06-30 | 47.180119 | 48.549999 | 45.182499 | 47.155357 | 2110498000.0 |
| AAPL | 2018-07-31 | 47.549048 | 48.990002 | 45.855 | 47.577857 | 1574766000.0 |
| AAPL | 2018-08-31 | 53.121739 | 57.217499 | 49.327499 | 53.336522 | 2801276000.0 |
| AAPL | 2018-09-30 | 55.582763 | 57.4175 | 53.825001 | 55.518421 | 2715888000.0 |
| AAPL | 2018-10-31 | 55.3 | 58.3675 | 51.522499 | 55.211413 | 3158994000.0 |

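To see the shape of the result without the CSV, here is a sketch on a hypothetical two-ticker dataframe (tickers `AAA` and `BBB` and all prices are made up) spanning a month boundary. It passes a `pd.offsets.MonthEnd()` object instead of a string alias, which sidesteps the `'M'` to `'ME'` alias change in pandas 2.2, and uses string aggregation names in place of NumPy functions.

```python
import pandas as pd

# Hypothetical daily OHLCV rows for two made-up tickers
dates = pd.to_datetime(['2018-01-30', '2018-01-31', '2018-02-01'] * 2)
toy = pd.DataFrame({
    'ticker': ['AAA'] * 3 + ['BBB'] * 3,
    'open': [10.0, 12.0, 11.0, 20.0, 22.0, 21.0],
    'high': [11.0, 13.0, 12.0, 21.0, 23.0, 22.0],
    'low': [9.0, 11.0, 10.0, 19.0, 21.0, 20.0],
    'close': [10.5, 12.5, 11.5, 20.5, 22.5, 21.5],
    'volume': [100, 200, 300, 400, 500, 600],
}, index=pd.DatetimeIndex(dates, name='date'))

# One row per (ticker, month end); each column gets its own aggregation
monthly = toy.groupby('ticker').resample(pd.offsets.MonthEnd()).agg({
    'open': 'mean',
    'high': 'max',
    'low': 'min',
    'close': 'mean',
    'volume': 'sum',
})
print(monthly)
```

The result is indexed by `(ticker, date)`, with each month labeled by its last calendar day, matching the layout of the exercise output.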

## Exercise 4
Build a crosstab with the earthquake data between the `tsunami` column and the `magType` column. Rather than showing the frequency count, show the maximum magnitude that was observed for each combination.

In [5]:
pd.crosstab(quakes.tsunami, quakes.magType, values=quakes.mag, aggfunc='max')

| tsunami \ magType | mb | mb_lg | md | mh | ml | ms_20 | mw | mwb | mwr | mww |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.6 | 3.5 | 4.11 | 1.1 | 4.2 |  | 3.83 | 5.8 | 4.8 | 6.0 |
| 1 | 6.1 |  |  |  | 5.1 | 5.7 | 4.41 |  |  | 7.5 |

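Passing `values` and `aggfunc` turns `crosstab()` from a frequency count into an aggregate, producing the same kind of table as `pivot_table()`. Combinations with no earthquakes have nothing to aggregate and come out as `NaN` (the blank cells in the output above). A sketch on made-up rows that reuse the exercise's column names:

```python
import pandas as pd

# Made-up rows standing in for the tsunami, magType, and mag columns
toy = pd.DataFrame({
    'tsunami': [0, 0, 1, 1, 0],
    'magType': ['mb', 'ml', 'mb', 'mww', 'mb'],
    'mag': [4.5, 2.1, 6.1, 7.5, 5.6],
})

# crosstab with values= and aggfunc= reports the max magnitude, not a count
ct = pd.crosstab(toy.tsunami, toy.magType, values=toy.mag, aggfunc='max')

# The same table built with pivot_table
pt = toy.pivot_table(index='tsunami', columns='magType', values='mag', aggfunc='max')
print(ct)
```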

## Exercise 5
Calculate the rolling 60-day aggregations of the OHLC and volume data by ticker for the FAANG data. Use the same aggregations as in Exercise 3.

In [6]:
faang.groupby('ticker').rolling('60D').agg(
    {
        'open': 'mean',
        'high': 'max',
        'low': 'min',
        'close': 'mean',
        'volume': 'sum'
    }
)

| ticker | date | open | high | low | close | volume |
|--------|------|------|------|-----|-------|--------|
| AAPL | 2018-01-02 | 42.540001 | 43.075001 | 42.314999 | 43.064999 | 102223600.0 |
| AAPL | 2018-01-03 | 42.836250 | 43.637501 | 42.314999 | 43.061249 | 220295200.0 |
| AAPL | 2018-01-04 | 42.935833 | 43.637501 | 42.314999 | 43.126666 | 310033600.0 |
| AAPL | 2018-01-05 | 43.041875 | 43.842499 | 42.314999 | 43.282499 | 404673600.0 |
| AAPL | 2018-01-08 | 43.151000 | 43.902500 | 42.314999 | 43.343500 | 486944800.0 |
| ... | ... | ... | ... | ... | ... | ... |
| NFLX | 2018-12-24 | 283.509251 | 332.049988 | 233.679993 | 281.931750 | 525657600.0 |
| NFLX | 2018-12-26 | 281.844501 | 332.049988 | 231.229996 | 280.777750 | 520444300.0 |
| NFLX | 2018-12-27 | 281.070489 | 332.049988 | 231.229996 | 280.162927 | 532679500.0 |
| NFLX | 2018-12-28 | 279.916342 | 332.049988 | 231.229996 | 279.461464 | 521973500.0 |

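Because `'60D'` is an offset rather than a row count, each window covers the trailing 60 calendar days up to and including the current date. It therefore spans a varying number of trading days, and the first rows get partial windows instead of `NaN`, since offset-based windows default to `min_periods=1`. A sketch contrasting the two window types on made-up data, using a 3-day window to keep it small:

```python
import pandas as pd

# Made-up closing prices on irregularly spaced dates (gaps like weekends)
close = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-05', '2018-01-08']),
)

# Offset window: rows in (t - 3 days, t]; min_periods defaults to 1, so no NaN
time_window = close.rolling('3D').sum()

# Count window: exactly the last 2 rows; the first row lacks a full window, so NaN
count_window = close.rolling(2).sum()
print(pd.DataFrame({'3D': time_window, '2 rows': count_window}))
```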

## Exercise 6
Create a pivot table of the FAANG data that compares the stocks.

In [7]:
faang.pivot_table(index='ticker')

| ticker | close | high | low | open | volume |
|--------|-------|------|-----|------|--------|
| AAPL | 47.263357 | 47.748526 | 46.795877 | 47.277859 | 136080300.0 |
| AMZN | 1641.726176 | 1662.839839 | 1619.840519 | 1644.072709 | 5648994.0 |
| FB | 171.510956 | 173.613347 | 169.303148 | 171.472948 | 27658600.0 |
| GOOG | 1113.225134 | 1125.777606 | 1101.001658 | 1113.554101 | 1741965.0 |
| NFLX | 319.290319 | 325.219322 | 313.18733 | 319.620558 | 11469620.0 |

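When only `index` is given, `pivot_table()` averages every remaining numeric column because its default `aggfunc` is `'mean'`, which is the same as grouping by ticker and taking the mean. A sketch on made-up prices for two hypothetical tickers:

```python
import pandas as pd

# Hypothetical prices for two made-up tickers
toy = pd.DataFrame({
    'ticker': ['AAA', 'AAA', 'BBB', 'BBB'],
    'close': [10.0, 12.0, 20.0, 30.0],
    'volume': [100, 300, 1000, 3000],
})

# pivot_table's default aggfunc is 'mean'
pivoted = toy.pivot_table(index='ticker')

# Equivalent groupby formulation
grouped = toy.groupby('ticker').mean()
print(pivoted)
```

Passing `aggfunc` (for example `'median'` or `['min', 'max']`) changes the statistic used to compare the stocks.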

## Exercise 7
Calculate the Z-scores of Netflix's data (ticker: NFLX).

In [8]:
faang.query("ticker == 'NFLX'").drop(columns='ticker').apply(
    lambda x: x.sub(x.mean()).div(x.std())
).head()

| date | high | low | open | close | volume |
|------|------|-----|------|-------|--------|
| 2018-01-02 | -2.515825 | -2.410229 | -2.500752 | -2.416646 | -0.088638 |
| 2018-01-03 | -2.422985 | -2.285796 | -2.380291 | -2.335287 | -0.507472 |
| 2018-01-04 | -2.405883 | -2.234631 | -2.296272 | -2.323431 | -0.959154 |
| 2018-01-05 | -2.345415 | -2.20209 | -2.275014 | -2.234304 | -0.782205 |
| 2018-01-08 | -2.294923 | -2.143761 | -2.218933 | -2.192194 | -1.03839 |

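The lambda computes `(x - x.mean()) / x.std()` for each column, where `std()` is the sample standard deviation (`ddof=1`) by default. Since arithmetic between a dataframe and a Series aligns the Series index with the dataframe's columns, the `apply()` can also be written as one vectorized expression. A sketch on made-up numbers:

```python
import pandas as pd

# Made-up values standing in for NFLX's numeric columns
toy = pd.DataFrame({'close': [1.0, 2.0, 3.0], 'volume': [10.0, 20.0, 60.0]})

# Column-wise z-scores with apply, as in the exercise
z_apply = toy.apply(lambda x: x.sub(x.mean()).div(x.std()))

# Vectorized: toy.mean() and toy.std() are Series indexed by column name
z_vec = (toy - toy.mean()) / toy.std()
print(z_vec)
```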

## Exercise 8
Adding event descriptions:
1. Create a dataframe with three columns: ticker, date, and event.
    1. ticker will be 'FB'.
    2. date will be datetimes ['2018-07-25', '2018-03-19', '2018-03-20']
    3. event will be ['Disappointing user growth announced after close.', 'Cambridge Analytica story', 'FTC investigation'].
2. Merge this data with the FAANG data using an outer join.

In [9]:
events = pd.DataFrame({
    'ticker': 'FB',
    'date': pd.to_datetime(
         ['2018-07-25', '2018-03-19', '2018-03-20']
    ), 'event': [
         'Disappointing user growth announced after close.',
         'Cambridge Analytica story',
         'FTC investigation'
    ]
}).set_index(['date', 'ticker'])
faang.reset_index().set_index(['date', 'ticker']).join(
    events, how='outer'
).sample(10, random_state=0)

| date | ticker | high | low | open | close | volume | event |
|------|--------|------|-----|------|-------|--------|-------|
| 2018-01-03 | AAPL | 43.637501 | 42.990002 | 43.1325 | 43.057499 | 118071600.0 |  |
| 2018-05-23 | NFLX | 345.0 | 328.089996 | 329.040009 | 344.720001 | 10049100.0 |  |
| 2018-01-17 | FB | 179.320007 | 175.800003 | 179.259995 | 177.600006 | 27992400.0 |  |
| 2018-10-17 | AMZN | 1845.0 | 1807.0 | 1842.790039 | 1831.72998 | 5295200.0 |  |
| 2018-02-26 | AMZN | 1522.839966 | 1507.0 | 1509.199951 | 1521.949951 | 4955000.0 |  |
| 2018-01-05 | GOOG | 1104.25 | 1092.0 | 1094.0 | 1102.22998 | 1279100.0 |  |
| 2018-04-04 | FB | 155.559998 | 150.509995 | 152.029999 | 155.100006 | 49885600.0 |  |
| 2018-05-30 | AMZN | 1626.0 | 1612.930054 | 1618.099976 | 1624.890015 | 2907400.0 |  |
| 2018-04-17 | NFLX | 338.619995 | 323.769989 | 329.660004 | 336.059998 | 33866500.0 |  |
| 2018-06-15 | AMZN | 1720.869995 | 1708.52002 | 1714.0 | 1715.969971 | 4777600.0 |  |

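An outer join keeps every row from both sides, so trading days without an event get `NaN` in the `event` column (the blank cells in the sample above), and an event on a date with no price data would still appear, with `NaN` prices. The same join can be written with `merge()` on columns instead of `join()` on the index. A sketch on made-up rows (hypothetical tickers `AAA` and `BBB`), where `indicator=True` labels each row as `both`, `left_only`, or `right_only`:

```python
import pandas as pd

# Hypothetical price rows for two made-up tickers
toy_prices = pd.DataFrame({
    'date': pd.to_datetime(['2018-03-19', '2018-03-20', '2018-03-19']),
    'ticker': ['AAA', 'AAA', 'BBB'],
    'close': [10.0, 11.0, 20.0],
})

# Hypothetical events: one matches a price row, one falls on a date with no prices
toy_events = pd.DataFrame({
    'date': pd.to_datetime(['2018-03-20', '2018-03-24']),
    'ticker': ['AAA', 'AAA'],
    'event': ['made-up announcement', 'made-up weekend news'],
})

# Outer merge on the key columns; indicator=True adds a categorical _merge column
merged = toy_prices.merge(
    toy_events, on=['date', 'ticker'], how='outer', indicator=True
)
print(merged)
```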

## Exercise 9
Use the `transform()` method on the FAANG data to represent all the values in terms of the first date in the data. To do so, divide each ticker's values by that ticker's values on the first date. This is referred to as an index, and the first date is the base. [More information](https://ec.europa.eu/eurostat/statistics-explained/index.php/Beginners:Statistical_concept_-_Index_and_base_year).

In [10]:
faang = faang.reset_index().set_index(['ticker', 'date'])
(faang / faang.groupby(level=['ticker']).transform('first')).head(10)

| ticker | date | high | low | open | close | volume |
|--------|------|------|-----|------|-------|--------|
| FB | 2018-01-02 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| FB | 2018-01-03 | 1.017623 | 1.02129 | 1.023638 | 1.017914 | 0.930294 |
| FB | 2018-01-04 | 1.025498 | 1.036891 | 1.040635 | 1.01604 | 0.764708 |
| FB | 2018-01-05 | 1.029298 | 1.041566 | 1.044518 | 1.029931 | 0.747828 |
| FB | 2018-01-08 | 1.040313 | 1.049451 | 1.053579 | 1.037813 | 0.99134 |
| FB | 2018-01-09 | 1.039762 | 1.053788 | 1.062022 | 1.035553 | 0.682744 |
| FB | 2018-01-10 | 1.034751 | 1.045508 | 1.052116 | 1.035387 | 0.580099 |
| FB | 2018-01-11 | 1.037559 | 1.055365 | 1.060333 | 1.035002 | 0.528242 |
| FB | 2018-01-12 | 0.999449 | 0.999155 | 1.002139 | 0.9887 | 4.272352 |
| FB | 2018-01-16 | 1.000936 | 1.00276 | 1.021499 | 0.983298 | 1.993389 |

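`transform('first')` returns a frame with the same index as its input, where every row holds the first (non-null) value of its ticker for each column, so the division lines up row by row. Each value becomes its ratio to the base date: 1.0 on the base date, 1.1 for 10% above it. A sketch on made-up prices for two hypothetical tickers:

```python
import pandas as pd

# Hypothetical prices for two made-up tickers, indexed like the exercise
toy = pd.DataFrame({
    'ticker': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'date': pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-04',
                            '2018-01-02', '2018-01-03']),
    'close': [50.0, 55.0, 45.0, 200.0, 210.0],
}).set_index(['ticker', 'date'])

# Same shape as toy: each row carries its ticker's first close
base = toy.groupby(level='ticker').transform('first')

# Index relative to each ticker's base date
indexed = toy / base
print(indexed)
```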