# Tutorial 06 - Calculating Returns

In this tutorial we start to perform some finance-related calculations.

In particular, we will calculate daily returns for SPY during the month of December 2018.

In order to do this we will first need to discuss two preliminary topics:

1. conditional statements: `if`-`else`

2. iterating with `for` loops.

After we calculate returns with a `for` loop, we will show how to perform the same calculation in a *vectorized* manner, which is considered best practice in data analysis.

## Loading Packages

Let's begin by loading the packages we will need.

In [1]:
##> import numpy as np
##> import pandas as pd




## Conditional Statements: `if` - `else`

Conditional statements are ubiquitous in programming and data analysis.  


Here is a toy example to demonstrate the syntax of `if` statements in Python:

In [2]:
##> payoff = np.nan
##> strike = 50
##> upx = 45
##> if (upx - strike) > 0:
##>     payoff = upx - strike
##> else:
##>     payoff = 0
##>     
##> print(payoff)




**Code Challenge:** Try modifying `upx` above and below `strike` and rerunning the code.

**Question:** The above code represents the payoff of a vanilla option.  Is it a put or a call?
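
The `if`-`else` block above can also be written more compactly. The following is a minimal sketch that reuses the toy values from that cell; the names `payoff_expr` and `payoff_max` are illustrative and do not appear elsewhere in this tutorial.

```python
strike = 50
upx = 45

# conditional expression: a one-line equivalent of the if-else block
payoff_expr = upx - strike if (upx - strike) > 0 else 0

# for this particular payoff, the built-in max() gives the same result
payoff_max = max(upx - strike, 0)

print(payoff_expr, payoff_max)
```

The multi-line `if`-`else` form is usually easier to read when each branch does more than assign a single value.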

## Reading-In Data

In order to discuss `for` loops we will need a `DataFrame` to play with; let's read in our data set of December 2018 SPY prices.

In [3]:
##> df_spy = pd.read_csv('../data/spy_dec_2018.csv')
##> df_spy.head()




## Iteration: `for` loops

A `for` loop allows you to iterate through a block of code a fixed number of times.

#### iterating through a list

Here is a toy example to demonstrate how to use a for-loop to iterate through the contents of a `list`:

In [4]:
##> L = [np.pi, True, 'SPY']
##> for ix in L:
##>     print(ix)




#### iterating over a set of integers with `range()`

A very common pattern is to iterate through a consecutive set of non-negative integers.

In Python, this is usually accomplished by using a `range` object.  A `range` object is a built-in Python type that creates a sequence of integers on the fly, as you need them. 


The benefit of using a `range` vs a `list` or `tuple` is that you don't create all the integers at once; this is helpful because if the set of integers is very large, it can take up a lot of memory.

In [5]:
##> for ix in range(10):
##>     print(ix)
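
To make the memory point about `range` concrete, here is a small sketch that compares the size of a `range` object with the size of the equivalent `list`. It uses `sys.getsizeof()`, which reports the size of the container itself; the exact byte counts depend on your Python build, so none are quoted here.

```python
import sys

n = 1_000_000
r = range(n)      # lazy: stores only start, stop, and step
lst = list(r)     # eager: stores a reference to every integer

print(sys.getsizeof(r))    # small, and does not grow with n
print(sys.getsizeof(lst))  # grows with n

# both yield the same values when iterated
print(sum(r) == sum(lst))
```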




#### iterating through the rows of a `DataFrame`

A pattern that is often utilized in financial data analysis is iterating through a `DataFrame`.

In [6]:
##> for ix in df_spy.index:
##>     print(df_spy.at[ix, 'date'])




The effect of the above code was to iterate through `df_spy` row by row, and to print the date column for each row.  

Notice that we use the indexer attribute `.at[]` - it is similar to `.loc[]`, but `.at[]` can only access a single value (one row label and one column label) at a time.
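
The difference between `.at[]` and `.loc[]` can be seen on a small example. This is a sketch using a made-up `DataFrame` called `df_toy` (hypothetical dates and prices, not the SPY file used in this tutorial).

```python
import pandas as pd

# hypothetical toy data, not the SPY data set
df_toy = pd.DataFrame({
    'date': ['2018-12-03', '2018-12-04', '2018-12-06'],
    'close': [100.0, 102.0, 99.96],
})

single_value = df_toy.at[1, 'close']     # one row label, one column label -> scalar
same_value = df_toy.loc[1, 'close']      # .loc[] can also return a scalar
several_rows = df_toy.loc[0:1, 'close']  # .loc[] also accepts slices -> Series

print(single_value)
print(several_rows)
```

Note that label slices in `.loc[]` include both endpoints, so `0:1` selects two rows.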

## Calculating Returns - `for` loop

We can now put together our tools of conditional execution and iteration to calculate the daily returns of SPY during the month of December 2018.

Our code will encapsulate the following two principles about returns:

1. On the first day in our data set, the return is undefined and will be set to `NaN`.

2. On all subsequent days the simple return is defined as `(curr_price / prev_price) - 1`. 
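
Before working with the `DataFrame`, the two principles can be sketched on a plain Python `list`. The prices below are hypothetical and were chosen so that the returns come out to roughly +2% and -2%.

```python
# hypothetical closing prices for three consecutive trading days
prices = [100.0, 102.0, 99.96]

returns = []
for i in range(len(prices)):
    if i == 0:
        # principle 1: no previous price, so the return is undefined
        returns.append(float('nan'))
    else:
        # principle 2: simple return relative to the previous price
        returns.append(prices[i] / prices[i - 1] - 1)

print(returns)
```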

Let's begin with a couple of preliminary steps. 

First, we will sort the `DataFrame` by the `date` column because our algorithm for calculating returns requires the prices to be in the correct order.

In [7]:
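##> df_spy.sort_values('date', axis = 0, ascending = True, inplace = True)
##> # reset the index so the labels run 0, 1, 2, ... in date order; the loop below relies on this
##> df_spy.reset_index(drop = True, inplace = True)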
##> df_spy




Second, let's add a `return` column to `df_spy` and initialize it with `NaN` values.  `NaN` values represent missing numerical values. 

In [8]:
##> df_spy['return'] = np.nan
##> df_spy




Now we are ready to complete the task of calculating the daily returns.

The following code iterates through `df_spy` and performs a different calculation depending on whether it's the first date in the dataset.

In [9]:
##> for ix in df_spy.index:
##>     if ix > 0:
##>         # grabbing prices from data frame
##>         curr_price = df_spy.at[ix, 'close']
##>         prev_price = df_spy.at[ix - 1, 'close']
##>         df_spy.at[ix, 'return'] = (curr_price / prev_price) - 1 # principle 2
##>     else:
##>         # for the first price, just set return to NaN
##>         df_spy.at[ix, 'return'] = np.nan # principle 1
##>         
##> df_spy




**Coding Challenge:** Calculate the one-month return over the entirety of December 2018. (Bonus points if you can do it without using division; hint - `np.prod()` calculates the product of an array of numbers.)

## Calculating Returns - Vectorized

For a variety of reasons, writing `for` loops is often not considered best practice.

That being said, don't confuse this to mean that you should *never* write a `for` loop.  In many situations, there isn't any harm in writing a `for` loop.  And in some situations, a `for` loop is necessary, or is the best way to keep careful track of what exactly you are doing.  I empower you to write `for` loops if they work.

However, our returns calculation is a good example of a situation in which there is a better alternative to writing a `for` loop. In this final section of the tutorial we will show a more Pythonic way of doing this via *vectorized* code.

Let's begin by separating out the `close` column from `df_spy`, calling it `ser_close`.  Recall that `ser_close` is a `pandas.Series` object.

Next we are going to apply the `.shift(1)` method to it, which has the effect of *pushing* down the elements of the `Series`.

In [10]:
##> ser_close = df_spy['close']
##> print(ser_close.head())
##> print(ser_close.shift(1).head())




Using `ser_close.shift(1)` we can calculate the returns in a single line.

In [11]:
##> df_spy['return_vec'] = (ser_close / ser_close.shift(1)) - 1
##> df_spy
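
As a sanity check, the loop and vectorized calculations above should agree. The sketch below verifies this on a made-up `DataFrame` (`df_toy`, with hypothetical prices) rather than on the SPY data, and also compares against the `pandas` method `Series.pct_change()`, which performs the same calculation.

```python
import numpy as np
import pandas as pd

# hypothetical toy prices, not the SPY data set
df_toy = pd.DataFrame({'close': [100.0, 102.0, 99.96, 101.0]})

# loop version (first row stays NaN)
df_toy['return'] = np.nan
for ix in df_toy.index:
    if ix > 0:
        df_toy.at[ix, 'return'] = df_toy.at[ix, 'close'] / df_toy.at[ix - 1, 'close'] - 1

# vectorized version with .shift(1)
df_toy['return_vec'] = df_toy['close'] / df_toy['close'].shift(1) - 1

# pandas built-in for the same calculation
df_toy['return_pct'] = df_toy['close'].pct_change()

# equal_nan=True treats the NaN in the first row of each column as matching
loop_matches_vec = np.allclose(df_toy['return'], df_toy['return_vec'], equal_nan=True)
loop_matches_pct = np.allclose(df_toy['return'], df_toy['return_pct'], equal_nan=True)
print(loop_matches_vec, loop_matches_pct)
```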




**Code Challenge:** Add a column called `ret_adj` to `df_spy` that contains the returns using the `adjusted` prices.  Use a vectorized approach. 

**Discussion:** Do you like the vectorized approach more or less than the `for` loop above?  (This is an opinion question, so there is no right or wrong answer.)

Final remarks on `for` loops:  There are a few considerations to keep in mind when choosing whether to use a `for` loop or not.

1. Is there a big difference in performance one way or the other?
2. Is one method easier for me to implement (given my current state of knowledge)?
3. Does one help me write code that's more organized or readable?

If performance is not a concern, and it very often is not, then do whatever makes your life easier (which is essentially a combination of #2 and #3).
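
If you do want to check consideration #1 for yourself, one rough way is to time both approaches. The sketch below does this on synthetic prices (a seeded random series, not real market data) using `time.perf_counter()`. The timings depend on your machine, so no numbers are quoted here.

```python
import time

import numpy as np
import pandas as pd

# synthetic, strictly positive prices; not real market data
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({'close': 100 * np.exp(rng.normal(0, 0.01, n).cumsum())})

# for-loop version
start = time.perf_counter()
df['return'] = np.nan
for ix in df.index:
    if ix > 0:
        df.at[ix, 'return'] = df.at[ix, 'close'] / df.at[ix - 1, 'close'] - 1
loop_seconds = time.perf_counter() - start

# vectorized version
start = time.perf_counter()
df['return_vec'] = df['close'] / df['close'].shift(1) - 1
vec_seconds = time.perf_counter() - start

print(f'loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.4f}s')
```

For a data set as small as one month of daily prices, either approach finishes almost instantly, which is why readability is often the deciding factor.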

## Related Reading

*WTP* - 8 - Control Flow

*WTP* - 11 - Iterators

*PDSH* - 2.5 - Computation on Arrays: Broadcasting

*RFF* - 5.4 - Returns