# Python Jumpstart

The purpose of this tutorial is to introduce *Jupyter Notebook* files, and to give a glimpse of how to use them to work with financial data. 

In particular, we will visualize stock index data to observe the leverage effect: when the market suffers losses, prices become more volatile.

### What is a Notebook?

This file, the one you are currently interacting with, is a Jupyter Notebook.

The notebook format lets you combine prose, computer code, code output (including plots), and mathematical notation in a single document. Notebooks have proven to be a convenient and productive programming environment for data analysis.

For those of you familiar with R, a Jupyter Notebook is similar in functionality to R Markdown notebooks.

Behind the scenes of a Jupyter Notebook is a *kernel* that is responsible for executing computations.  The kernel can live locally on your machine or on a remote server.

### Code Cells

A notebook is structured as a sequence of *cells*. There are two main kinds of cells: 1) code cells, which contain code; 2) markdown cells, which contain Markdown text and LaTeX math.

The cell below is a code cell. Try typing in the code that is commented out, then press **shift + enter** to run it.

In [None]:
##> from IPython.display import Image
##> Image("not_ethical.png")




### Edit Mode vs Command Mode

There are two modes in a notebook: 1) **edit** mode; 2) **command** mode.  

In **edit** mode you are *inside* a cell and you can edit the contents of the cell.  

In **command** mode, you are *outside* the cells and you can navigate between them.  

### Keyboard Shortcuts

Here are some of my favorite keyboard shortcuts:

enter edit mode: **enter**

enter command mode: **esc**

navigate up: **k**

navigate down: **j**

insert cell above: **a**

insert cell below: **b**

delete cell: **d, d** (press **d** twice)

switch to code cell: **y**

switch to markdown cell: **m**

execute and stay on current cell: **ctrl + enter**

execute and move down a cell: **shift + enter**

### Drop-Down Menus

Here are a few of the drop-down menu functions that I use frequently:

*Kernel > Restart Kernel and Clear All Outputs*

*Kernel > Restart Kernel and Run All Cells*

*Run > Run All Above Selected Cell*

### Importing Packages

Much of the power and convenience of Python as a data analysis tool comes from its ecosystem of freely available third-party packages.

Here are the packages that we will be using in this tutorial:

`numpy` - efficient vector and matrix computations

`pandas` - working with `DataFrames`

`pandas_datareader` - reading data from Yahoo Finance

The following code imports these packages and assigns them each an alias.

In [None]:
##> import numpy as np
##> import pandas as pd
##> import pandas_datareader as pdr




### Reading-In Stock Data into a `DataFrame`

Let's begin by reading in 5 years of SPY price data from Yahoo Finance.  

SPY is an ETF that tracks the performance of the S&P 500 stock index.

In [None]:
##> df_spy = pdr.get_data_yahoo('SPY', start='2014-01-01', end='2019-01-01')
##> df_spy = df_spy.round(2)
##> df_spy.head()
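If the download above fails (Yahoo Finance has changed its interface several times, which has broken `pdr.get_data_yahoo()` in some versions of `pandas_datareader`), you can still follow along with synthetic data. The sketch below builds a random-walk price series laid out the way we assume the Yahoo reader returns it: columns `High`, `Low`, `Open`, `Close`, `Volume`, `Adj Close`, indexed by `Date`. The helper `make_synthetic_prices` and every number it produces are hypothetical; these are not real SPY prices.

```python
import numpy as np
import pandas as pd


def make_synthetic_prices(start="2014-01-01", end="2019-01-01", seed=0):
    """Hypothetical helper: random-walk prices in the (assumed) layout of
    pandas_datareader's Yahoo output. These are NOT real SPY prices."""
    rng = np.random.default_rng(seed)
    # Business days only, with the index named "Date"
    dates = pd.bdate_range(start=start, end=end, name="Date")
    # Daily returns drawn from a normal distribution (an arbitrary assumption)
    rets = rng.normal(loc=0.0003, scale=0.01, size=len(dates))
    close = 180.0 * np.cumprod(1.0 + rets)
    df = pd.DataFrame(
        {
            "High": close * 1.005,
            "Low": close * 0.995,
            "Open": close,
            "Close": close,
            "Volume": rng.integers(50_000_000, 150_000_000, size=len(dates)),
            "Adj Close": close,
        },
        index=dates,
    )
    return df.round(2)


df_demo = make_synthetic_prices()
print(df_demo.head())
```

Assigning the result to `df_spy` instead of `df_demo` lets the remaining cells run unchanged, although the plots will of course not show real market behavior.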




Our stock data now lives in the variable called `df_spy`, which is a `pandas` data structure known as a `DataFrame`. We can confirm this with Python's built-in `type()` function:

In [None]:
##> type(df_spy)



### `DataFrame` Index

In `pandas`, a `DataFrame` always has an index. For `df_spy`, the trading dates form the index, which is named `Date`.

In [None]:
##> df_spy.index



I don't use indices very much, so let's turn the `Date` index into a regular column. Notice that passing `inplace=True` modifies `df_spy` in place instead of returning a new `DataFrame`.

In [None]:
##> df_spy.reset_index(inplace=True)
##> df_spy




Notice that `df_spy` still has an index; it is now just a sequence of integers (a `RangeIndex`).

In [None]:
##> df_spy.index
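To see what `reset_index()` does without downloading anything, here is a small sketch on a hypothetical three-row `DataFrame` (`df_demo`) that mimics the structure of `df_spy`: a price column indexed by dates named `Date`.

```python
import pandas as pd

# A tiny DataFrame indexed by dates, mimicking the structure of df_spy
df_demo = pd.DataFrame(
    {"Adj Close": [100.0, 101.0, 99.5]},
    index=pd.DatetimeIndex(["2014-01-02", "2014-01-03", "2014-01-06"], name="Date"),
)
print(type(df_demo.index).__name__)  # DatetimeIndex

# reset_index() moves the index into an ordinary column named after it ("Date")
# and replaces it with a default integer RangeIndex
df_demo.reset_index(inplace=True)
print(list(df_demo.columns))          # ['Date', 'Adj Close']
print(type(df_demo.index).__name__)  # RangeIndex
```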



### A Bit of Cleaning

As a matter of preference, I like my column names in snake case: lowercase, with underscores instead of spaces (so `Adj Close` becomes `adj_close`).

In [None]:
##> df_spy.columns = df_spy.columns.str.lower().str.replace(' ','_')
##> df_spy.head()
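The renaming works because `DataFrame.columns` is an `Index`, which supports the same vectorized `.str` string methods as a `Series`. A minimal sketch on an empty hypothetical `DataFrame` whose column names we assume match the Yahoo output:

```python
import pandas as pd

# Column names in the style we assume Yahoo Finance uses
df_demo = pd.DataFrame(columns=["High", "Low", "Open", "Close", "Volume", "Adj Close"])

# .str.lower() lowercases each name; .str.replace(" ", "_") swaps spaces for underscores
df_demo.columns = df_demo.columns.str.lower().str.replace(" ", "_")
print(list(df_demo.columns))
# ['high', 'low', 'open', 'close', 'volume', 'adj_close']
```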




Let's also remove the columns that we won't need.  We first create a `list` of the column names that we want to get rid of and then we use the `DataFrame.drop()` method.

In [None]:
##> lst_cols = ['high', 'low', 'open', 'close', 'volume',]
##> df_spy.drop(columns=lst_cols, inplace=True)
##> df_spy.head()




Notice the trailing comma after `'volume'` in `lst_cols`: Python allows a trailing comma after the last element of a list, so it is not an issue.
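A quick check of both points, using a small hypothetical `df_demo`: the trailing comma leaves the list unchanged, and `drop(columns=...)` removes exactly the listed columns. The one place a trailing comma does change meaning is a one-element tuple, shown on the last line.

```python
import pandas as pd

# A trailing comma after the last element does not change the list
assert ['high', 'low', 'volume',] == ['high', 'low', 'volume']

df_demo = pd.DataFrame(
    {"high": [1.0], "low": [0.5], "volume": [100], "adj_close": [0.9]}
)
lst_cols = ['high', 'low', 'volume',]
# drop(columns=...) removes the named columns; inplace=True modifies df_demo directly
df_demo.drop(columns=lst_cols, inplace=True)
print(list(df_demo.columns))  # ['adj_close']

# (1,) is a one-element tuple, while (1) is just the integer 1
print(type((1,)).__name__, type((1)).__name__)  # tuple int
```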

### `Series`

You can isolate the columns of a `DataFrame` with square brackets as follows:

In [None]:
##> df_spy['adj_close']



Each column of a `DataFrame` is a `pandas` data structure called a `Series`.

In [None]:
##> type(df_spy['adj_close'])
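Whether you get a `Series` or a `DataFrame` depends on what goes inside the square brackets: a single column name gives a `Series`, while a list of names (double brackets) gives a `DataFrame`, even if the list has only one element. A short sketch on a hypothetical two-row `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"date": ["2014-01-02", "2014-01-03"], "adj_close": [100.0, 101.0]}
)

col = df_demo["adj_close"]     # single brackets -> Series
sub = df_demo[["adj_close"]]   # double brackets (a list of names) -> DataFrame

print(type(col).__name__)  # Series
print(type(sub).__name__)  # DataFrame
print(col.name)            # adj_close
```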



### `numpy` and `ndarray`s

Python is a general-purpose programming language and was not created specifically for scientific computing. One of the foundational packages that makes Python well suited to scientific computing is `numpy`, which provides, among other features, an array type called `ndarray`. One of the benefits of `ndarray`s is that they allow for efficient vector and matrix computations.

The `.values` attribute of a `Series` is a `numpy.ndarray`. This is one sense in which `pandas` is *built on top of* `numpy`.

In [None]:
##> df_spy['adj_close'].values



In [None]:
##> type(df_spy['adj_close'].values)
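The sketch below, on a hypothetical three-price `Series`, shows the `ndarray` behind it and the kind of vectorized arithmetic that makes `numpy` efficient: operations apply elementwise to the whole array with no explicit Python loop. It also uses `.to_numpy()`, which the `pandas` documentation recommends over `.values` in newer code.

```python
import numpy as np
import pandas as pd

s = pd.Series([100.0, 102.0, 101.0], name="adj_close")

arr = s.values         # the attribute used above
arr2 = s.to_numpy()    # the method recommended by the pandas docs
print(type(arr).__name__, type(arr2).__name__)  # ndarray ndarray

# Vectorized arithmetic: applied to every element at once
print(arr * 2)  # [200. 204. 202.]

# Elementwise division of shifted slices: each price over the previous one, minus 1
daily_ret = arr[1:] / arr[:-1] - 1
print(daily_ret)
```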



### `Series` Built-In Methods

`Series` have a variety of built-in methods that provide convenient summarization and modification functionality.  For example, you can `.sum()` all the elements of the `Series`.

In [None]:
##> df_spy['adj_close'].sum()



Next, we calculate the standard deviation of all the elements of the `Series`.

In [None]:
##> df_spy['adj_close'].std()
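One detail worth knowing: `pandas` and `numpy` use different defaults for the standard deviation. `Series.std()` computes the sample standard deviation (`ddof=1`, dividing by $n-1$), while `np.std()` computes the population version (`ddof=0`, dividing by $n$). The sketch below uses four hypothetical values whose squared deviations from the mean sum to 5.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# pandas: sample standard deviation, divides by n - 1
print(s.std())                       # about 1.291, i.e. sqrt(5 / 3)
# numpy: population standard deviation, divides by n
print(np.std(s.to_numpy()))          # about 1.118, i.e. sqrt(5 / 4)
# Passing ddof explicitly makes numpy agree with pandas
print(np.std(s.to_numpy(), ddof=1))  # about 1.291
```

For a few hundred daily returns the difference is tiny, but it explains small discrepancies if you mix the two libraries.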



The `.shift()` built-in method moves each value down by one row (by default), leaving `NaN` in the first position. It will be useful for calculating returns in the next section.

In [None]:
##> df_spy['adj_close'].shift()
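On a hypothetical three-element `Series` the effect of `.shift()` is easy to see. The `periods` argument defaults to 1; a negative value shifts upward instead.

```python
import pandas as pd

s = pd.Series([100.0, 102.0, 101.0])

# shift() moves every value down one row; the first position has
# no predecessor, so it becomes NaN
print(s.shift().tolist())    # [nan, 100.0, 102.0]
# A negative period shifts values up; the last position becomes NaN
print(s.shift(-1).tolist())  # [102.0, 101.0, nan]
```

Lining each price up with the previous day's price in the same row is exactly what makes the return calculation a one-liner.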



### Calculating Daily Returns

Our analysis of the leverage effect will involve the daily returns for all the days in `df_spy`. Let's calculate those now.

Recall that the return of a stock on day $t$ is defined as $r_{t} = \frac{S_{t}}{S_{t-1}} - 1$, where $S_{t}$ is the stock price at the end of day $t$. Here we use the adjusted close price, which accounts for dividends and splits.

Here is a vectorized approach to calculating all the daily returns in a single line of code. The first row has no previous price, so its return is `NaN`.

In [None]:
##> df_spy['ret'] = df_spy['adj_close'] / df_spy['adj_close'].shift(1) - 1
##> df_spy.head()



Notice that we can create a new column of a `DataFrame` by assigning to a new column name with square-bracket syntax.
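As a sanity check, the sketch below computes returns for three hypothetical prices three ways: the vectorized `shift` formula used above, the built-in `Series.pct_change()` method (which performs the same division for a series without missing prices), and an explicit loop included only for comparison. All three should agree.

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 99.96], name="adj_close")

# Vectorized, as in the cell above: S_t / S_{t-1} - 1
ret_vec = prices / prices.shift(1) - 1

# Built-in equivalent for a series with no missing prices
ret_pct = prices.pct_change()

# Explicit loop, for comparison only (much slower on large data)
ret_loop = [float("nan")] + [
    prices.iloc[i] / prices.iloc[i - 1] - 1 for i in range(1, len(prices))
]

print(ret_vec.round(4).tolist())  # [nan, 0.02, -0.02]
```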

### Visualizing Adjusted Close Prices

Python has a variety of packages that can be used for visualization. For this tutorial, we will focus on the built-in plotting capabilities of `pandas`. These capabilities are built on top of the `matplotlib` package, which is the foundation of much of Python's visualization ecosystem.

`DataFrames` have a built-in `.plot()` method that makes creating simple line graphs quite easy.

In [None]:
##> df_spy.plot(x='date', y='adj_close');



To make this graph more presentable, we can pass a few more arguments to `.plot()` and then label the axes:

In [None]:
##> # Customize the title, grid, line color ('k' = black), transparency, and size
##> ax = df_spy.plot(
##>     x='date',
##>     y='adj_close',
##>     title='SPY: 2014-2018',
##>     grid=True,
##>     style='k',
##>     alpha=0.75,
##>     figsize=(9, 4),
##> )
##> ax.set_xlabel('Trade Date');
##> ax.set_ylabel('Adjusted Close Price');




Notice that the `ax` variable created above is a `matplotlib` `Axes` object, which is what allows us to customize the plot further with methods like `.set_xlabel()`.

In [None]:
##> type(ax)



### Visualizing Returns

`pandas` also lets us plot two different columns of a `DataFrame` simultaneously, in separate subplots of a single graph. Here is what that code looks like:

In [None]:
##> df_spy.plot(x='date', y=['adj_close', 'ret',], subplots=True, style='k', alpha=0.75, figsize=(9, 8), grid=True);



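The `ret` graph above is a bit of a hack, but plots like it are used all the time in finance to demonstrate volatility clustering: large returns (of either sign) tend to be followed by large returns, and small returns by small returns.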

Notice that whenever there is a sharp drop in the `adj_close` price graph, the magnitude of the nearby returns becomes large. In contrast, during periods of steady growth (e.g., all of 2017), the magnitude of the returns is small.

### Calculating Realized Volatility

Realized volatility is defined as the standard deviation of the daily returns; it measures how variable the stock price has been. By convention this quantity is annualized by multiplying it by $\sqrt{252}$, since there are roughly 252 trading days in a year.
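The $\sqrt{252}$ factor rests on a simplifying assumption: daily returns are independent with a common variance $\sigma^2_{\text{daily}}$ (volatility clustering shows real returns only approximately satisfy this). Treating the annual return as the sum of 252 daily returns (exact for log returns, approximate for simple returns), the variances add:

```latex
\operatorname{Var}\Big(\sum_{t=1}^{252} r_t\Big)
  = \sum_{t=1}^{252} \operatorname{Var}(r_t)
  = 252\,\sigma^2_{\text{daily}}
\quad\Longrightarrow\quad
\sigma_{\text{annual}} = \sqrt{252}\,\sigma_{\text{daily}}
```

For example, a daily standard deviation of 1% corresponds to an annualized volatility of about $\sqrt{252} \times 1\% \approx 15.9\%$.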

The following vectorized code calculates a rolling 42-day (roughly two-month) realized volatility for our SPY price data.

In [None]:
##> df_spy['ret'].rolling(42).std() * np.sqrt(252)
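The sketch below, on 100 hypothetical random daily returns, shows two properties of `rolling(42).std()`. First, by default a value is produced only once the window holds 42 non-missing returns, so with a leading `NaN` (as in `df_spy['ret']`) the first 42 results are `NaN`. Second, each value is just the annualized sample standard deviation of the trailing 42 returns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical daily returns; the first entry is NaN, as in df_spy['ret']
ret = pd.Series(rng.normal(0.0, 0.01, size=100))
ret.iloc[0] = np.nan

window = 42  # roughly two months of trading days (about 21 per month)
realized_vol = ret.rolling(window).std() * np.sqrt(252)

# Positions 1..42 form the first complete window, so index 42 is the first value
print(realized_vol.first_valid_index())  # 42

# The last value matches the annualized sample std (ddof=1) of the last 42 returns
manual = np.std(ret.iloc[-window:].to_numpy(), ddof=1) * np.sqrt(252)
print(realized_vol.iloc[-1], manual)
```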



Let's add these realized volatility calculations to `df_spy` as a new column with the following code:

In [None]:
##> df_spy['realized_vol'] = df_spy['ret'].rolling(42).std() * np.sqrt(252)
##> df_spy




### Visualizing Realized Volatility

We can easily add `realized_vol` to our graph with the following code:

In [None]:
##> df_spy.plot(x = 'date', y = ['adj_close','ret','realized_vol',], subplots=True, style='k', alpha=0.75, figsize=(9, 12), grid=True);



This graph is an excellent illustration of the leverage effect.  When SPY suffers losses, there is a spike in realized volatility, which is to say that the magnitude of the nearby returns increases.

## Further Reading

*Python Data Science Handbook* - Jake VanderPlas

*Python for Finance* - Yves Hilpisch

*Python for Data Analysis* - Wes McKinney

*Automate the Boring Stuff with Python* - Al Sweigart