# Functions and `DataFrame.apply()` Method 

We have already discussed how to add a new column to a `DataFrame` that is a simple function of existing columns.  

Suppose the situation is a little more complicated, and the column we want to add is a custom (user-defined) function of existing columns.

In this tutorial we discuss two ways of doing this:

1. A `for` loop

2. `DataFrame.apply()`

We will use a finance task to motivate these two techniques: calculating the payoffs of expiring options.

### Defining Functions

Defining functions in Python is straightforward.  The syntax is simply `def function_name(arguments):`, followed by an indented function body.  The following function squares a number.

In [1]:
##> def square(x):
##>     sq = x ** 2
##>     return sq




Let's verify that our function works:

In [2]:
##> print(square(2))
##> print(square(5))




**Code Challenge:** Write a `cube()` function that cubes a number, and along the way, verify that indentation is required after the `def` statement.

### Option Payoff Function

Let's now write a more financially interesting function.

Options are insurance contracts that are written on top of an underlying stock, much like car insurance is written *on top* of your car.  There are two types of options: *puts* and *calls*.  Put options protect you from the stock price going too low, while call options protect you from the stock price going too high.  Both types have a feature called a *strike* price, which acts much like the deductible of your car insurance.  Options expire at some point in the future, and each option has a payoff (payout) at the time of expiration.

Let $K$ be the strike price of an option, and let $S_T$ be the price of its underlying at the time of expiration.  Then the payoff of each type of option is as follows:

1. **call**: $\max(S_T - K, 0)$

2. **put**: $\max(K - S_T, 0)$

We can codify this as follows:

In [3]:
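##> def option_payoff(cp, strike, upx):
##>     if cp == 'call':
##>         payoff = max(upx - strike, 0)
##>     elif cp == 'put':
##>         payoff = max(strike - upx, 0)
##>     else:
##>         # fail loudly instead of reaching `return` with `payoff` unassigned
##>         raise ValueError(f"cp must be 'call' or 'put', got {cp!r}")
##>     
##>     return payoff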




Let's verify that our function works:

In [4]:
##> print(option_payoff("call", 100, 110))
##> print(option_payoff("put", 100, 110))
##> print(option_payoff("call", 100, 90))
##> print(option_payoff("put", 100, 90))
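
The same four cases can also be written as explicit checks against the payoff formulas. With a strike of 100, a call pays 10 and a put pays 0 when the underlying settles at 110, and the reverse holds when it settles at 90. The sketch below repeats `option_payoff` (with a guard for unrecognized option types) so that it runs on its own:

```python
def option_payoff(cp, strike, upx):
    """Payoff at expiration of a call or a put."""
    if cp == 'call':
        payoff = max(upx - strike, 0)
    elif cp == 'put':
        payoff = max(strike - upx, 0)
    else:
        # guard against labels other than 'call' / 'put'
        raise ValueError(f"cp must be 'call' or 'put', got {cp!r}")
    return payoff

# strike 100, underlying settles at 110: only the call pays out
assert option_payoff("call", 100, 110) == 10
assert option_payoff("put", 100, 110) == 0

# strike 100, underlying settles at 90: only the put pays out
assert option_payoff("call", 100, 90) == 0
assert option_payoff("put", 100, 90) == 10

print("all payoff checks passed")
```

An `assert` stays silent when the condition holds and raises an `AssertionError` otherwise, so reaching the final `print` means every case matched.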




### Loading Packages

Let's now load the packages that we will need.

In [5]:
##> import numpy as np
##> import pandas as pd




### Reading-In Data

Next, let's read in a data file called `spy_expiring_option.csv`. 

This data set consists of 21 different options on `SPY` that expired on November 16, 2018.  

The `upx` column is the settle price of `SPY` from that day, and it will be used to calculate the payoff of each of these options.

In [6]:
##> df_opt = pd.read_csv("spy_expiring_option.csv")
##> df_opt = df_opt.round(2)
##> df_opt.head()
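
If the CSV file is not at hand, the remaining steps can still be tried on a small stand-in `DataFrame`. The sketch below assumes only the three columns used later in this tutorial (`type`, `strike`, `upx`). The name `df_demo` and every value in it are made up for illustration and are not taken from the real data set.

```python
import pandas as pd

# hypothetical stand-in data: NOT the contents of the real CSV file
df_demo = pd.DataFrame({
    'type':   ['call', 'call', 'put', 'put'],
    'strike': [270.0, 280.0, 270.0, 280.0],
    'upx':    [275.0, 275.0, 275.0, 275.0],
})

# same rounding step as applied to df_opt; non-numeric columns are left alone
df_demo = df_demo.round(2)
print(df_demo.head())
```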




### Initializing Payoff Columns

Our ultimate objective is to add a column of option payoffs to `df_opt`.  We are going to accomplish this task using two different methods: (1) a `for` loop; (2) the `DataFrame.apply()` method.  

As a first step, let's add two columns to `df_opt`, one for each method, and initialize them both with `np.nan`, a special floating-point value that represents missing numerical data.

In [7]:
##> df_opt['payoff_loop'] = np.nan
##> df_opt['payoff_apply'] = np.nan
##> df_opt.head()
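
A quick look at how `np.nan` behaves may be useful here. The sketch below uses a hypothetical two-row frame named `df_demo` (the values are made up). It shows that `np.nan` is an ordinary Python `float`, that it never compares equal to itself, and that a column filled with it gets a floating-point dtype.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'strike': [270.0, 280.0]})  # hypothetical values
df_demo['payoff_loop'] = np.nan                     # the scalar is broadcast to every row

print(type(np.nan))                           # the Python type of np.nan
print(np.nan == np.nan)                       # NaN compared with itself
print(df_demo['payoff_loop'].dtype)           # dtype of the new column
print(df_demo['payoff_loop'].isna().all())    # isna() is the reliable test for missing values
```

Because `np.nan == np.nan` is `False`, missing values should be detected with `.isna()` rather than with `==`.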




### Calculate `option_payoff` via `for` loop

Let's iterate through `df_opt` with a `for` loop and calculate the payoffs one by one.  Notice that we are using the `.at` indexer, which is designed to read or write a single value identified by a row label and a column label.

In [8]:
##> for ix in df_opt.index:
##>     
##>     # grabbing data from dataframe
##>     opt_type = df_opt.at[ix, 'type']
##>     strike = df_opt.at[ix, 'strike']
##>     upx = df_opt.at[ix, 'upx']
##>     
##>     # calculating payoff
##>     payoff = option_payoff(opt_type, strike, upx)
##>     
##>     # writing payoff to dataframe
##>     df_opt.at[ix, 'payoff_loop'] = payoff
##>       
##> df_opt
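
To check the loop on data small enough to verify by hand, the sketch below runs the same pattern on a hypothetical four-row frame. The name `df_demo` and its values are made up for illustration; only the column names match `df_opt`.

```python
import numpy as np
import pandas as pd

def option_payoff(cp, strike, upx):
    """Payoff at expiration of a call or a put."""
    if cp == 'call':
        return max(upx - strike, 0)
    elif cp == 'put':
        return max(strike - upx, 0)
    raise ValueError(f"cp must be 'call' or 'put', got {cp!r}")

# hypothetical data: two calls and two puts, underlying settles at 275
df_demo = pd.DataFrame({
    'type':   ['call', 'call', 'put', 'put'],
    'strike': [270.0, 280.0, 270.0, 280.0],
    'upx':    [275.0, 275.0, 275.0, 275.0],
})
df_demo['payoff_loop'] = np.nan

for ix in df_demo.index:
    # .at reads one cell per call: (row label, column label)
    payoff = option_payoff(
        df_demo.at[ix, 'type'],
        df_demo.at[ix, 'strike'],
        df_demo.at[ix, 'upx'],
    )
    # .at also writes one cell
    df_demo.at[ix, 'payoff_loop'] = payoff

print(df_demo)
```

By the payoff formulas, the 270 call and the 280 put should each pay 5, and the other two options should pay 0.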




### Calculate `opt_pay` via `.apply()`

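The `DataFrame.apply()` method allows us to perform these calculations without explicitly iterating through `df_opt` with a `for` loop.  It calls a user-defined function on every row (or column) for us.  The function is still called once per row behind the scenes, so this is a convenience rather than true vectorization.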

In order to make use of `.apply()`, we will have to construct our custom payoff function slightly differently.  The following `opt_pay()` function expects as its argument the row of a `DataFrame`:

In [9]:
##> def opt_pay(row):
##>     # reading function inputs from DataFrame row
##>     cp = row['type']
##>     strike = row['strike']
##>     upx = row['upx']
##>     
##>     # option payoff logic
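##>     if cp == 'call':
##>         payoff = max(upx - strike, 0)
##>     elif cp == 'put':
##>         payoff = max(strike - upx, 0)
##>     else:
##>         # fail loudly instead of reaching `return` with `payoff` unassigned
##>         raise ValueError(f"type must be 'call' or 'put', got {cp!r}")
##>     
##>     return payoff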




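We can use `.apply()` to calculate the payoffs in a single line of code.  Setting `axis=1` tells pandas to pass each row of `df_opt` to `opt_pay`; the default, `axis=0`, would pass each column instead.

In [10]:
##> df_opt['payoff_apply'] = df_opt.apply(opt_pay, axis=1)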
##> df_opt
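
Two variations on this step may be worth seeing side by side. The first reuses the scalar `option_payoff` function through a `lambda`, so no row-based rewrite is needed. The second computes the payoffs with whole-column NumPy operations (`np.where` and `np.maximum`), which avoids calling a Python function for each row. This is a sketch on a hypothetical frame `df_demo` with made-up values; the column name `payoff_np` is also an assumption, and the NumPy version treats every row that is not labeled `'call'` as a put.

```python
import numpy as np
import pandas as pd

def option_payoff(cp, strike, upx):
    """Payoff at expiration of a call or a put."""
    if cp == 'call':
        return max(upx - strike, 0)
    elif cp == 'put':
        return max(strike - upx, 0)
    raise ValueError(f"cp must be 'call' or 'put', got {cp!r}")

# hypothetical data: two calls and two puts, underlying settles at 275
df_demo = pd.DataFrame({
    'type':   ['call', 'call', 'put', 'put'],
    'strike': [270.0, 280.0, 270.0, 280.0],
    'upx':    [275.0, 275.0, 275.0, 275.0],
})

# variation 1: reuse the scalar function, one call per row
df_demo['payoff_apply'] = df_demo.apply(
    lambda row: option_payoff(row['type'], row['strike'], row['upx']),
    axis=1,
)

# variation 2: whole-column arithmetic, no per-row function call
# (assumption: any row not labeled 'call' is a put)
is_call = df_demo['type'] == 'call'
signed_diff = np.where(
    is_call,
    df_demo['upx'] - df_demo['strike'],   # call: S_T - K
    df_demo['strike'] - df_demo['upx'],   # put:  K - S_T
)
df_demo['payoff_np'] = np.maximum(signed_diff, 0)

print(df_demo)
print((df_demo['payoff_apply'] == df_demo['payoff_np']).all())
```

On a frame this small the difference is negligible, but the column-based version generally scales better as the number of rows grows, while `.apply()` remains the more flexible choice for logic that is hard to express with array operations.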




**Code Challenge:** Add a column to `df_opt` that identifies if the `upx` is bigger or smaller than `strike`.  Do this by writing a custom function and then using `DataFrame.apply()`.

### Related Reading

*WTP* - 8 - Control Flow

*WTP* - 9 - Defining Functions