# Tutorial 8 - The `DataFrame.apply()` Method 

We have already discussed how to add a new column to a `DataFrame` that is a simple function of existing columns.  

Suppose the situation is a little more complicated, and the column we want to add is a custom (user-defined) function of existing columns.

In this tutorial we discuss two ways of doing this:

1. A `for` loop

2. `DataFrame.apply()`

We will use a simple finance task to motivate these two techniques: calculating the payoffs of expiring options.

## Loading Packages

Let's begin by loading the packages that we will need.

In [1]:
import numpy as np
import pandas as pd

## Reading-In Data

Next, let's read in a data file called `spy_expiring_option.csv`. 

This data set consists of 21 different options on `SPY` that expire on November 16, 2018.  

The `upx` column is the settlement price of `SPY` on that day, and it will be used to calculate the payoff of each of these options.

In [2]:
df_opt = pd.read_csv("../data/spy_expiring_option.csv")
df_opt.head()

Unnamed: 0,underlying,upx,type,expiration,data_date,strike
0,SPY,273.730011,put,2018-11-16,2018-11-16,270.0
1,SPY,273.730011,put,2018-11-16,2018-11-16,270.5
2,SPY,273.730011,put,2018-11-16,2018-11-16,271.0
3,SPY,273.730011,put,2018-11-16,2018-11-16,271.5
4,SPY,273.730011,put,2018-11-16,2018-11-16,272.0
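
The `Unnamed: 0` column in the output above is what `pd.read_csv()` produces when the first column of a file has no header, which typically happens when a `DataFrame` was saved together with its index. The following is a minimal sketch of how `index_col=0` avoids the extra column, assuming that is what happened with this file. The in-memory `csv_text` is a hypothetical two-row stand-in, not the actual data file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: the first column has an empty
# header, as it would if the row labels had been written out with the data.
csv_text = """,underlying,upx,type,expiration,data_date,strike
0,SPY,273.730011,put,2018-11-16,2018-11-16,270.0
1,SPY,273.730011,put,2018-11-16,2018-11-16,270.5
"""

# Default read: the unnamed first column is kept as data and labeled "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df_default.columns.tolist())
print(df_indexed.columns.tolist())
```

Either form works for the rest of this tutorial, since none of the calculations use that column.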


## Initializing Payoff Column

Our ultimate objective is to add a column of option payoffs to `df_opt`.  We are going to accomplish this task using two different methods.  

As a first step, let's add two columns to `df_opt`, one for each method, and initialize both with `np.nan`, a special floating-point value that represents missing numerical data.

In [3]:
df_opt['payoff_loop'] = np.nan
df_opt['payoff_apply'] = np.nan
df_opt.head()

Unnamed: 0,underlying,upx,type,expiration,data_date,strike,payoff_loop,payoff_apply
0,SPY,273.730011,put,2018-11-16,2018-11-16,270.0,,
1,SPY,273.730011,put,2018-11-16,2018-11-16,270.5,,
2,SPY,273.730011,put,2018-11-16,2018-11-16,271.0,,
3,SPY,273.730011,put,2018-11-16,2018-11-16,271.5,,
4,SPY,273.730011,put,2018-11-16,2018-11-16,272.0,,
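
To make the behavior of `np.nan` a bit more concrete, here is a small sketch on a hypothetical two-row `df_demo` (not the tutorial data). It shows that `np.nan` is an ordinary Python `float`, that a column filled with it gets the `float64` dtype, and that missing values are detected with `.isna()` rather than with `==`.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame, used only for illustration
df_demo = pd.DataFrame({'strike': [270.0, 270.5]})
df_demo['payoff'] = np.nan  # the scalar is broadcast to every row

nan_is_float = isinstance(np.nan, float)    # np.nan is a float value
payoff_dtype = df_demo['payoff'].dtype      # so the new column is float64
nan_equals_itself = (np.nan == np.nan)      # NaN never compares equal, even to itself
all_missing = df_demo['payoff'].isna().all()  # .isna() is the reliable check

print(nan_is_float, payoff_dtype, nan_equals_itself, all_missing)
```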


## Defining the `option_payoff()` Function

In the previous tutorial we defined an option payoff function.  Let's recycle that code here:

In [4]:
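def option_payoff(cp, strike, upx):
    # cp is the option type: 'call' or 'put'
    if cp == 'call':
        payoff = max(upx - strike, 0)
    elif cp == 'put':
        payoff = max(strike - upx, 0)
    else:
        # without this branch, an unrecognized type would leave payoff undefined
        raise ValueError(f"unknown option type: {cp}")

    return payoff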

Whenever I create a function, I like to test it out on a few values to make sure that it works as I expect.

In [5]:
print(option_payoff('call', 100, 150))
print(option_payoff('put', 100, 150))

50
0
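
A few more edge cases are worth checking in the same spirit. The sketch below restates the function so that the block runs on its own; the `ValueError` guard for unrecognized option types is an assumption of this sketch. It checks an at-the-money option, an in-the-money put, and an invalid option type (`'straddle'` is just an arbitrary example value).

```python
def option_payoff(cp, strike, upx):
    if cp == 'call':
        payoff = max(upx - strike, 0)
    elif cp == 'put':
        payoff = max(strike - upx, 0)
    else:
        # assumption of this sketch: reject anything other than 'call' or 'put'
        raise ValueError(f"unknown option type: {cp}")
    return payoff

# at-the-money: the underlying equals the strike, so both payoffs are zero
atm_call = option_payoff('call', 100, 100)
atm_put = option_payoff('put', 100, 100)

# in-the-money put: the strike is above the underlying price
itm_put = option_payoff('put', 150, 100)

# an unrecognized option type raises instead of returning a payoff
try:
    option_payoff('straddle', 100, 150)
    bad_type_rejected = False
except ValueError:
    bad_type_rejected = True

print(atm_call, atm_put, itm_put, bad_type_rejected)  # 0 0 50 True
```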


## Calculate `option_payoff` via `for` loop

Let's iterate through `df_opt` with a `for` loop and calculate the payoffs one by one.

In [6]:
for ix in df_opt.index:
    
    # grabbing data from dataframe
    opt_type = df_opt.at[ix, 'type']
    strike = df_opt.at[ix, 'strike']
    upx = df_opt.at[ix, 'upx']
    
    # calculating payoff
    payoff = option_payoff(opt_type, strike, upx)
    
    # putting payoff in dataframe
    df_opt.at[ix, 'payoff_loop'] = payoff
    
    
df_opt

Unnamed: 0,underlying,upx,type,expiration,data_date,strike,payoff_loop,payoff_apply
0,SPY,273.730011,put,2018-11-16,2018-11-16,270.0,0.0,
1,SPY,273.730011,put,2018-11-16,2018-11-16,270.5,0.0,
2,SPY,273.730011,put,2018-11-16,2018-11-16,271.0,0.0,
3,SPY,273.730011,put,2018-11-16,2018-11-16,271.5,0.0,
4,SPY,273.730011,put,2018-11-16,2018-11-16,272.0,0.0,
5,SPY,273.730011,put,2018-11-16,2018-11-16,272.5,0.0,
6,SPY,273.730011,put,2018-11-16,2018-11-16,273.0,0.0,
7,SPY,273.730011,put,2018-11-16,2018-11-16,273.5,0.0,
8,SPY,273.730011,put,2018-11-16,2018-11-16,274.0,0.269989,
9,SPY,273.730011,put,2018-11-16,2018-11-16,274.5,0.769989,
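
As a quick sanity check on the table above, the put payoffs for the three strikes around the settlement price can be recomputed by hand. This sketch uses only the `upx` value and strikes shown in the output, and rounds to six decimals to match the displayed precision.

```python
upx = 273.730011  # settlement price from the upx column

# put payoff is max(strike - upx, 0) for each strike
checked = {}
for strike in (273.5, 274.0, 274.5):
    checked[strike] = round(max(strike - upx, 0.0), 6)

print(checked)
```

The put struck at 273.5 is out of the money and pays nothing, while the 274.0 and 274.5 puts pay the difference between the strike and `upx`, in line with rows 7 through 9.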


## Calculate `option_payoff` via `.apply()`

The `DataFrame.apply()` method allows us to perform these calculations without explicitly iterating through `df_opt` with a `for` loop.

In order to make use of `.apply()`, we will have to construct our custom payoff function slightly differently.

In [7]:
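def opt_pay(row):
    # row is one row of the DataFrame, passed in by .apply() as a Series
    cp = row['type']
    strike = row['strike']
    upx = row['upx']

    if cp == 'call':
        payoff = max(upx - strike, 0)
    elif cp == 'put':
        payoff = max(strike - upx, 0)
    else:
        # without this branch, an unrecognized type would leave payoff undefined
        raise ValueError(f"unknown option type: {cp}")

    return payoff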
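
Rewriting the function body is not the only option. As a possible alternative, the three-argument `option_payoff()` can be reused by wrapping it in a small `lambda` that unpacks the row. The sketch below runs on a hypothetical two-row `df_demo` and restates `option_payoff()` so that the block is self-contained; the `ValueError` guard is an assumption of this sketch.

```python
import pandas as pd

def option_payoff(cp, strike, upx):
    if cp == 'call':
        payoff = max(upx - strike, 0)
    elif cp == 'put':
        payoff = max(strike - upx, 0)
    else:
        # assumption of this sketch: reject anything other than 'call' or 'put'
        raise ValueError(f"unknown option type: {cp}")
    return payoff

# Hypothetical two-row frame with one put and one call
df_demo = pd.DataFrame({
    'type': ['put', 'call'],
    'strike': [274.0, 270.0],
    'upx': [273.730011, 273.730011],
})

# The lambda receives each row as a Series and passes its fields along
df_demo['payoff'] = df_demo.apply(
    lambda row: option_payoff(row['type'], row['strike'], row['upx']),
    axis=1,
)

print(df_demo)
```

The advantage of this form is that the payoff logic lives in one place and can still be called directly with plain values.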

We can use `.apply()` to calculate the payoffs in a single line of code.

In [8]:
df_opt['payoff_apply'] = df_opt.apply(opt_pay, axis = 1)
df_opt

Unnamed: 0,underlying,upx,type,expiration,data_date,strike,payoff_loop,payoff_apply
0,SPY,273.730011,put,2018-11-16,2018-11-16,270.0,0.0,0.0
1,SPY,273.730011,put,2018-11-16,2018-11-16,270.5,0.0,0.0
2,SPY,273.730011,put,2018-11-16,2018-11-16,271.0,0.0,0.0
3,SPY,273.730011,put,2018-11-16,2018-11-16,271.5,0.0,0.0
4,SPY,273.730011,put,2018-11-16,2018-11-16,272.0,0.0,0.0
5,SPY,273.730011,put,2018-11-16,2018-11-16,272.5,0.0,0.0
6,SPY,273.730011,put,2018-11-16,2018-11-16,273.0,0.0,0.0
7,SPY,273.730011,put,2018-11-16,2018-11-16,273.5,0.0,0.0
8,SPY,273.730011,put,2018-11-16,2018-11-16,274.0,0.269989,0.269989
9,SPY,273.730011,put,2018-11-16,2018-11-16,274.5,0.769989,0.769989
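
The `axis` argument controls what gets passed to the function. The following sketch, on a hypothetical two-row `df_demo`, contrasts the two settings: with `axis=1` the function is called once per row, and with `axis=0` (the default) it is called once per column.

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the axis argument
df_demo = pd.DataFrame({'strike': [270.0, 274.0], 'upx': [273.5, 273.5]})

# axis=1: each call receives one row as a Series indexed by column name,
# so the result has one value per row
row_result = df_demo.apply(lambda row: row['strike'] - row['upx'], axis=1)

# axis=0: each call receives one column as a Series indexed by row label,
# so the result has one value per column
col_result = df_demo.apply(lambda col: col.max(), axis=0)

print(row_result.index.tolist())  # row labels
print(col_result.index.tolist())  # column names
```

This is why `opt_pay()` can look up `row['type']`, `row['strike']`, and `row['upx']`: with `axis=1`, the column names become the index of each row that is passed in.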


**Code Challenge:** Add a column to `df_opt` that indicates whether `upx` is greater than or less than `strike`.  Do this by writing a custom function and then using `DataFrame.apply()`.

## Related Reading

*WTP* - 8 - Control Flow

*WTP* - 9 - Defining Functions

*LOD* - 2 - Options 101 