# Tutorial 13 - Bar Charts with Pandas

The purpose of this tutorial is to demonstrate how to generate bar charts with the `pandas` built-in `.plot()` method.

We apply this technique to the task of visualizing monthly PNLs for the `spy_2018_call_pnl.csv` data set.

### Load Packages

Let's begin by loading the packages we need:

In [None]:
##> import numpy as np
##> import pandas as pd
##> %matplotlib inline




**Knowledge Challenge:** What is the purpose of this line of code in the above cell: `%matplotlib inline`?

### Reading-In Data

Next, let's read in the data from the CSV file:

In [None]:
##> df_pnl = pd.read_csv('../data/spy_2018_call_pnl.csv')
##> df_pnl.head()




This is the same data set as Exercise 06.  It consists of daily PNLs from 12 different SPY short call trades throughout 2018.

### Wrangling

First, we need to convert the `expiration` and `data_date` columns to `datetime` using the `pd.to_datetime()` function.

In [None]:
##> df_pnl['expiration'] = pd.to_datetime(df_pnl['expiration'])
##> df_pnl['data_date'] = pd.to_datetime(df_pnl['data_date'])
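
As a quick check of what the conversion does, here is a minimal sketch on a tiny made-up `DataFrame` (the name `df_demo` and its dates are illustrative assumptions, not values from the CSV file). It simply confirms that the column holds `datetime` values after the call:

```python
import pandas as pd

# Made-up dates for illustration only; not taken from spy_2018_call_pnl.csv.
df_demo = pd.DataFrame({"data_date": ["2018-01-02", "2018-01-03"]})
print(df_demo["data_date"].dtype)  # a text dtype before conversion

df_demo["data_date"] = pd.to_datetime(df_demo["data_date"])
print(df_demo["data_date"].dtype)  # a datetime64 dtype after conversion

# pandas provides a helper to test for datetime columns.
print(pd.api.types.is_datetime64_any_dtype(df_demo["data_date"]))
```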




We are interested in total PNL, which is the sum of the option PNL and the delta-hedge PNL.  Let's add a column called `dly_tot_pnl` which captures this logic.

In [None]:
##> df_pnl['dly_tot_pnl'] = df_pnl['dly_opt_pnl'] + df_pnl['dly_dh_pnl']



As the final step of our wrangling, let's extract the year and month of the expiration date, as this is what we will use for grouping.

In [None]:
##> df_pnl['year'] = df_pnl['expiration'].dt.year
##> df_pnl['month'] = df_pnl['expiration'].dt.month
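
The `.dt` accessor is only available on `datetime` columns, which is why the conversion step came first. Here is a small sketch using two made-up expiration dates (the `expirations` series is an assumption for illustration):

```python
import pandas as pd

# Two made-up expiration dates, converted to datetime first.
expirations = pd.to_datetime(pd.Series(["2018-01-19", "2018-02-16"]))

years = expirations.dt.year    # integer year of each date
months = expirations.dt.month  # integer month (1 = January)

print(years.tolist())   # [2018, 2018]
print(months.tolist())  # [1, 2]
```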




### `groupby()` and `agg()`

We are interested in graphing the PNLs by expiration, so let's sum up the total PNL by the year and month of the expiration:

In [None]:
##> df_monthly = \
##>     df_pnl.groupby(['year', 'month'])['dly_tot_pnl'].agg(['sum']).reset_index() # 'sum' string avoids the np.sum FutureWarning in recent pandas
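
To see what this chain produces, here is a minimal sketch on synthetic data (the `df_demo` frame and its numbers are made up for illustration). Passing a list to `agg()` yields a `DataFrame` whose aggregated column is named after the function, here `sum`, and `reset_index()` turns the group keys back into ordinary columns:

```python
import pandas as pd

# Synthetic daily PNLs (made-up numbers) spread over two expiration months.
df_demo = pd.DataFrame({
    "year": [2018, 2018, 2018, 2018],
    "month": [1, 1, 2, 2],
    "dly_tot_pnl": [0.5, -0.25, 1.0, 0.25],
})

df_demo_monthly = (
    df_demo.groupby(["year", "month"])["dly_tot_pnl"]
    .agg(["sum"])    # list form gives a column named 'sum'
    .reset_index()   # group keys become regular columns again
)

print(df_demo_monthly.columns.tolist())  # ['year', 'month', 'sum']
print(df_demo_monthly["sum"].tolist())   # [0.25, 1.25]
```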




We're ultimately going to use the `month` column to graph the data, so let's set it as the index.

In [None]:
##> df_monthly.set_index(['month'], inplace=True)



Before we proceed to graphing, let's change the name of the aggregated pnl column to something more meaningful.

In [None]:
##> df_monthly.rename(columns={'sum':'monthly_pnl'}, inplace=True)



### Visualizing the Data

Creating a simple bar graph of the pnls in `df_monthly` can be done easily with a single line of code:

In [None]:
##> df_monthly['monthly_pnl'].plot(kind='bar');



While the above graph may be fine for EDA purposes, it still leaves much to be desired, especially if our intention is to share it with a broader audience. 


The following code makes several modifications to improve its appearance:

In [None]:
##> ax =\
##>     df_monthly['monthly_pnl'].\
##>         plot(
##>             kind='bar'
##>             , color='k' # 'k' is black; looks gray once alpha is applied
##>             , grid=True # adding a grid
##>             , alpha=0.75 # translucence
##>             , width=0.8 # increasing the width of the bars
##>             , title='Monthly PNL for SPY Calls'
##>             , figsize=(8, 4) # modifying the figure size
##>         );
##> 
##> ax.set_xlabel("Month"); # x axis label
##> ax.set_ylabel("PNL");   # y axis label
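
The reason we can call `set_xlabel()` and `set_ylabel()` afterwards is that `.plot()` returns a `matplotlib` `Axes` object. The sketch below illustrates this on a made-up series (`demo_pnl` and its values are assumptions for illustration). The `matplotlib.use("Agg")` line is only there so the snippet runs outside a notebook; omit it when working in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Made-up monthly PNLs indexed by month number, for illustration only.
demo_pnl = pd.Series([1.5, -2.0, 0.75], index=[1, 2, 3], name="monthly_pnl")

# .plot() hands back the Axes it drew on, so we can keep customizing it.
ax = demo_pnl.plot(kind="bar", color="k", alpha=0.75, title="Demo PNL")
ax.set_xlabel("Month")
ax.set_ylabel("PNL")

print(type(ax).__name__)  # the class name of the returned Axes object
print(len(ax.patches))    # one rectangle per bar
```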




**Code Challenge:** Google and try to find out how to create a *horizontal* bar graph using `pandas`.

### A Few Words About Visualization

Visualizing data can be an effective way of communicating results to others or exploring data on your own.  The benefit of visualization becomes self-evident when we can convey a particular result more quickly and more viscerally with a graph than with a table of numbers.

This is nicely illustrated by comparing our bar graph to the original `DataFrame` of data.  Consider the following question: 

*What were the two worst PNL months for these SPY calls?*

Do you find it easier to answer the question using the bar graph or the table?  Explain why.

In [None]:
##> ax = \
##>     df_monthly['monthly_pnl'].\
##>         plot(
##>             kind='bar'
##>             , color='k' # 'k' is black; looks gray once alpha is applied
##>             , grid=True # adding a grid
##>             , alpha=0.75 # translucence
##>             , width=0.8 # increasing the width of the bars
##>             , title='Monthly PNL for SPY Calls'
##>             , figsize=(8, 4) # modifying the figure size
##>         );
##> 
##> ax.set_xlabel("Month"); # x axis label
##> ax.set_ylabel("PNL");   # y axis label




In [None]:
##> df_monthly



### Related Reading

*P4DA* - 8.2 - Plotting Functions in `pandas` 