# Exercise 4 - OCC Volume Report

The purpose of this exercise is to practice `.groupby()` and `.agg()` by performing additional analysis on our OCC volume data. The exercise is fairly straightforward, so you should be able to complete it even if you haven't finished the OCC volume tutorial.

#### 1) Import the `pandas` and `numpy` packages.

In [1]:
import numpy as np
import pandas as pd

#### 2) Read in the CSV file called `occ_option_volume_201808.csv` and call the resulting `DataFrame` `df_occ`. This report consists of option volume data for August 2018.

In [21]:
df_occ = pd.read_csv('../data/occ_option_volume_201808.csv')
df_occ.head()

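|   | quantity | underlying | symbol | actype | porc | exchange | actdate |
|---|---|---|---|---|---|---|---|
| 0 | 5850 | ABX | 1ABX | C | C | AMEX | 08/02/2018 |
| 1 | 3050 | ABX | 1ABX | C | C | AMEX | 08/16/2018 |
| 2 | 3050 | ABX | 1ABX | F | C | AMEX | 08/16/2018 |
| 3 | 22600 | AGI | 1AGI | C | C | ARCA | 08/13/2018 |
| 4 | 7600 | AGI | 1AGI | M | C | ARCA | 08/13/2018 |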


#### 3) Create a custom aggregation function called `volume_millions()`. The aggregation function takes as its argument a vector of numbers called `quantity`. The function will: 1) take the sum of `quantity`; 2) divide the sum by 1 million; 3) round the result to two decimal places.

In [9]:
def volume_millions(quantity):
    return round(np.sum(quantity) / 1000000, 2)
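
A quick way to sanity-check `volume_millions()` before using it inside `.agg()` is to call it directly. The sketch below repeats the definition so it runs on its own; the first input is invented, while the second reuses the first three `quantity` values from the `head()` output in step 2.

```python
import numpy as np
import pandas as pd

def volume_millions(quantity):
    # Sum the contracts, convert to millions, round to two decimals
    return round(np.sum(quantity) / 1000000, 2)

# Works on a plain list, a NumPy array, or a pandas Series
print(volume_millions([1_500_000, 2_345_678]))        # → 3.85
print(volume_millions(np.array([5850, 3050, 3050])))  # → 0.01
print(volume_millions(pd.Series([], dtype='int64')))  # → 0.0 (empty input sums to 0)
```

Because the function only needs something `np.sum()` can reduce, `.agg()` can hand it each group's `quantity` Series unchanged.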

#### 4) Use the `.groupby()` method, along with our function `volume_millions()`, to create a `DataFrame` that shows the total volume by exchange for August 2018. Sort the `DataFrame` from largest to smallest volume.

In [20]:
df_occ.groupby(['exchange'])['quantity'].agg([volume_millions]).reset_index()\
    .sort_values('volume_millions', ascending=False)

|    | exchange | volume_millions |
|---|---|---|
| 5 | CBOE | 202.54 |
| 14 | PHLX | 120.61 |
| 1 | ARCA | 82.07 |
| 8 | ISE | 71.82 |
| 13 | NSDQ | 66.01 |
| 2 | BATS | 64.79 |
| 0 | AMEX | 62.2 |
| 11 | MPRL | 40.74 |
| 7 | GEM | 37.26 |
| 10 | MIAX | 34.54 |
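
The same exchange ranking can also be written with pandas' named aggregation, which names the output column directly instead of relying on the function's `__name__`. This is a minimal sketch on invented quantities (the exchange names are borrowed from the output above); `by_exchange` and `toy` are hypothetical names, and `ignore_index=True` simply renumbers the sorted rows.

```python
import numpy as np
import pandas as pd

def volume_millions(quantity):
    return round(np.sum(quantity) / 1000000, 2)

# Hypothetical toy rows; quantities are invented, not from the OCC file
toy = pd.DataFrame({
    'exchange': ['CBOE', 'PHLX', 'CBOE', 'ARCA', 'PHLX'],
    'quantity': [2_000_000, 1_000_000, 1_500_000, 750_000, 250_000],
})

# Named aggregation: keyword = output column, value = (input column, function)
by_exchange = (
    toy.groupby('exchange', as_index=False)
       .agg(volume_millions=('quantity', volume_millions))
       .sort_values('volume_millions', ascending=False, ignore_index=True)
)
print(by_exchange)
```

With `as_index=False` the grouping key stays a regular column, so no `.reset_index()` call is needed.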


#### 5) Use `pd.to_datetime()` to convert the `actdate` column of `df_occ` to a datetime type.

In [22]:
df_occ['actdate'] = pd.to_datetime(df_occ['actdate'], format='%m/%d/%Y')
df_occ.dtypes

quantity               int64
underlying            object
symbol                object
actype                object
porc                  object
exchange              object
actdate       datetime64[ns]
dtype: object
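
As an alternative, `pd.read_csv()` can parse the date column while reading through its `parse_dates` argument. The sketch below inlines the first three rows of the `head()` output so it runs without the data file (`csv_text`, `df_a` and `df_b` are illustrative names) and checks that both approaches yield the same dates.

```python
import io
import pandas as pd

# First three rows of the head() output, inlined so the sketch is self-contained
csv_text = """quantity,underlying,symbol,actype,porc,exchange,actdate
5850,ABX,1ABX,C,C,AMEX,08/02/2018
3050,ABX,1ABX,C,C,AMEX,08/16/2018
3050,ABX,1ABX,F,C,AMEX,08/16/2018
"""

# Option A: read as text, then convert with an explicit month/day/year format
df_a = pd.read_csv(io.StringIO(csv_text))
df_a['actdate'] = pd.to_datetime(df_a['actdate'], format='%m/%d/%Y')

# Option B: let read_csv parse the column (month-first is the default)
df_b = pd.read_csv(io.StringIO(csv_text), parse_dates=['actdate'])

print(df_b.dtypes)
```

The explicit `format=` in option A removes any ambiguity between month-first and day-first dates, which is why it is the safer choice for this exercise.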

#### 6) Use a `.groupby()` to calculate daily option volume (in millions of contracts).

In [17]:
df_occ_daily = \
    df_occ.groupby(['actdate'])['quantity'].agg([volume_millions]).reset_index()
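
Because `actdate` is now a datetime column, the daily result can be enriched with the `.dt` accessor. The sketch below runs the same groupby on a few invented rows (hypothetical quantities, deliberately out of date order) and adds the weekday name; `.groupby()` sorts its keys by default, so the days come out in chronological order.

```python
import numpy as np
import pandas as pd

def volume_millions(quantity):
    return round(np.sum(quantity) / 1000000, 2)

# Hypothetical toy rows, not taken from the OCC file
toy = pd.DataFrame({
    'quantity': [1_200_000, 800_000, 2_500_000, 500_000],
    'actdate': pd.to_datetime(
        ['08/02/2018', '08/01/2018', '08/02/2018', '08/01/2018'], format='%m/%d/%Y'
    ),
})

# One row per trading day; group keys are sorted ascending
toy_daily = toy.groupby(['actdate'])['quantity'].agg([volume_millions]).reset_index()

# Weekday name from the datetime column
toy_daily['weekday'] = toy_daily['actdate'].dt.day_name()
print(toy_daily)
```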

#### 7) Use `DataFrame` masking to find the day with the largest daily volume in August 2018.

In [18]:
df_occ_daily[df_occ_daily.volume_millions == np.max(df_occ_daily.volume_millions)]

|    | actdate | volume_millions |
|---|---|---|
| 10 | 2018-08-15 | 52.58 |
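
Masking is not the only way to find the busiest day, and the alternatives differ when there is a tie. This sketch uses hypothetical daily totals with a deliberate tie to compare the mask against `.idxmax()` and `.nlargest()`.

```python
import pandas as pd

# Hypothetical daily totals with a tie for the maximum
daily = pd.DataFrame({
    'actdate': pd.to_datetime(['2018-08-14', '2018-08-15', '2018-08-16']),
    'volume_millions': [40.0, 55.0, 55.0],
})

# Boolean mask: keeps every row equal to the maximum (both tied days)
by_mask = daily[daily['volume_millions'] == daily['volume_millions'].max()]

# idxmax: index label of the FIRST maximum only; double brackets keep a DataFrame
by_idxmax = daily.loc[[daily['volume_millions'].idxmax()]]

# nlargest: first tied row by default, all tied rows with keep='all'
top_first = daily.nlargest(1, 'volume_millions')
top_all = daily.nlargest(1, 'volume_millions', keep='all')
```

The mask is the most transparent choice when ties matter; `.idxmax()` is shorter when a single row is enough.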


#### 8) There are three different account types: customer (`C`), firm (`F`), and market-maker (`M`). Calculate the total volume for each account type (in millions of contracts).

In [19]:
df_occ.groupby(['actype'])['quantity'].agg([volume_millions]).reset_index()

|   | actype | volume_millions |
|---|---|---|
| 0 | C | 367.44 |
| 1 | F | 111.76 |
| 2 | M | 373.28 |
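
The one-letter `actype` codes can be replaced with readable labels and each type's share of total volume added. This is a sketch on invented rows; the `ACTYPE_LABELS` mapping is an assumption based on the three account types named in step 8, and `by_type` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def volume_millions(quantity):
    return round(np.sum(quantity) / 1000000, 2)

# Assumed code-to-label mapping for the three account types
ACTYPE_LABELS = {'C': 'customer', 'F': 'firm', 'M': 'market-maker'}

# Hypothetical toy rows, not taken from the OCC file
toy = pd.DataFrame({
    'actype': ['C', 'M', 'F', 'C', 'M'],
    'quantity': [1_000_000, 1_500_000, 500_000, 1_000_000, 1_000_000],
})

by_type = toy.groupby(['actype'])['quantity'].agg([volume_millions]).reset_index()
by_type['account_type'] = by_type['actype'].map(ACTYPE_LABELS)

# Percent of total; computed from the rounded millions, so it can be slightly off on real data
by_type['pct_of_total'] = (
    100 * by_type['volume_millions'] / by_type['volume_millions'].sum()
).round(1)
print(by_type)
```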
