## The Apply Function

- What if we want to add a new column where each value is derived from the other values in its row?

- Example: model the interaction between x1 and x2 as the product `x1 * x2`

- We use the apply function!

`df['x1x2'] = df.apply(lambda row: row['x1'] * row['x2'], axis=1)`

- Pass in `axis=1` so the function is applied to each row; with the default, `axis=0`, it is applied to each column instead

- Think of it like Python's `map` function: the function is called once per row and the results are collected into a new Series

- If you're not familiar with lambda, this is equivalent:

```
def get_interaction(row):
    return row['x1']*row['x2']
```

`df['x1x2'] = df.apply(get_interaction, axis=1)`

- The function you pass in takes one argument: the row

- The `apply` call produces the same result as this explicit loop:

```
interactions = []

for idx, row in df.iterrows():
    x1x2 = row['x1']*row['x2']
    interactions.append(x1x2)

df['x1x2'] = interactions
```

- Avoid writing this in practice: looping over rows in Python with `iterrows` is very slow
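For an elementwise product like this one, pandas can also multiply the two columns directly, which avoids calling a Python function once per row. The following is a minimal sketch, assuming a small made-up DataFrame with columns `x1` and `x2` (the values are illustrative only, not from the original material). It checks that `apply`, the explicit loop, and direct column arithmetic all produce the same values:

```python
import pandas as pd

# Hypothetical toy data, just for illustration
df = pd.DataFrame({"x1": [1, 2, 3], "x2": [10, 20, 30]})

# 1. apply with axis=1: the lambda receives one row at a time
via_apply = df.apply(lambda row: row["x1"] * row["x2"], axis=1)

# 2. explicit loop over rows with iterrows
via_loop = []
for idx, row in df.iterrows():
    via_loop.append(row["x1"] * row["x2"])

# 3. column arithmetic: pandas multiplies the columns element by element
via_columns = df["x1"] * df["x2"]

# All three approaches agree
print(via_apply.tolist() == via_loop == via_columns.tolist())

df["x1x2"] = via_columns
```

`apply` with `axis=1` is still the more general tool: it is useful when the per-row logic cannot be written as simple column arithmetic.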

In [1]:
import pandas as pd

# skipfooter is not supported by the default C parser engine,
# so we pass in engine="python"
df = pd.read_csv("international-airline-passengers.csv", engine="python", skipfooter=3)

# change column names
df.columns = ["month", "passengers"]
df.columns

# add a column named 'ones' where every value is 1
df['ones'] = 1

In [2]:
from datetime import datetime

datetime.strptime("1949-05", "%Y-%m")

datetime.datetime(1949, 5, 1, 0, 0)

In [3]:
df['dt'] = df.apply(lambda row: datetime.strptime(row['month'], "%Y-%m"), axis=1)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 144 entries, 0 to 143
Data columns (total 4 columns):
month         144 non-null object
passengers    144 non-null int64
ones          144 non-null int64
dt            144 non-null datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 4.6+ KB
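The same date parsing can also be done on the whole column at once with `pd.to_datetime`, which accepts a `format` string. This is a sketch under assumptions, not part of the original notebook: the three-row DataFrame below is a hypothetical stand-in for the `month` column of the airline data.

```python
import pandas as pd
from datetime import datetime

# Hypothetical stand-in for the "month" column of the airline data
df = pd.DataFrame({"month": ["1949-01", "1949-02", "1949-03"]})

# Row-wise parsing with apply, as in the cell above
dt_apply = df.apply(lambda row: datetime.strptime(row["month"], "%Y-%m"), axis=1)

# Whole-column parsing with pd.to_datetime and the same format string
dt_column = pd.to_datetime(df["month"], format="%Y-%m")

# Both approaches yield the same timestamps
print((dt_apply == dt_column).all())

df["dt"] = dt_column
```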
