## `DataFrame.apply()`

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("./data/DRYAD_Data_Johansson_Updated2.1.csv")

In [3]:
df['Time of Detection']

0           13:32:00
1            9:20:01
2            9:48:05
3           20:38:11
4           22:02:55
            ...     
43601     9:09:39 AM
43602    10:01:37 AM
43603    10:19:58 AM
43604     4:41:13 PM
43605    11:06:12 PM
Name: Time of Detection, Length: 43606, dtype: object

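Notice that the `dtype` is `object`: the values are plain strings, and they mix 24-hour times (`13:32:00`) with 12-hour AM/PM times (`9:09:39 AM`). We really want the column to be a `datetime`.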

In [4]:
df['Time of Detection'] = df['Time of Detection'].apply(lambda d: pd.to_datetime(d))

In [5]:
df['Time of Detection']

0       2023-10-14 13:32:00
1       2023-10-14 09:20:01
2       2023-10-14 09:48:05
3       2023-10-14 20:38:11
4       2023-10-14 22:02:55
                ...        
43601   2023-10-14 09:09:39
43602   2023-10-14 10:01:37
43603   2023-10-14 10:19:58
43604   2023-10-14 16:41:13
43605   2023-10-14 23:06:12
Name: Time of Detection, Length: 43606, dtype: datetime64[ns]

Of course, another way to do it is to pass the whole column to `pd.to_datetime` instead of calling it once per value through `apply` (and it is MUCH faster):

In [6]:
df['Time of Detection'] = pd.to_datetime(df['Time of Detection'])
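
To check that the two approaches agree, here is a minimal sketch on a small made-up Series. The sample values and the explicit `format` string are assumptions for illustration, not taken from the dataset; the real column mixes 24-hour and AM/PM strings, so it may need different handling.

```python
import pandas as pd

# Hypothetical sample times, all in one 24-hour format
times = pd.Series(["13:32:00", "09:20:01", "20:38:11"])

# Element-wise: pd.to_datetime is called once per value
via_apply = times.apply(lambda d: pd.to_datetime(d, format="%H:%M:%S"))

# Vectorized: the whole Series is converted in a single call
vectorized = pd.to_datetime(times, format="%H:%M:%S")

# Check that both approaches yield the same timestamps
print((via_apply == vectorized).all())
```

The vectorized call avoids the per-element Python function call, which is where most of the speed difference comes from.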

## `apply` with functions

You can also use `apply` with functions you write yourself.

In [7]:
df.columns

Index(['Yard', 'Date of Detection', 'Time of Detection', 'Species Detected',
       'Number of the Species Detected in the Photo', 'Sunset Time',
       'Sunrise Time', 'Number of Bird Feeders', 'Area of Garden (m^2)',
       'Fence Type', 'Poultry Presence', 'Relative Abundance of Dogs',
       'Relative Abundance of Predators',
       'Forest Cover within 400m of camera (km^2)',
       'Open Land Cover within 400m of camera (km^2)',
       'Agricultural Land Cover within 400m of camera (km^2)',
       'Developed Land Cover within 400m of camera (km^2)',
       'Maximum Housing Unit Density within a 400m buffer (Houses/km^2)',
       'Year', 'Simpson Diversity', 'Richness', 'Mesopredator Richness',
       'Meso_Diversity', 'Herb_Richness', 'Herb_Div',
       'Forest Cover within 1.5km of camera (km^2)',
       'Open Land Cover within 1.5km of camera (km^2)',
       'Agricultural Land Cover within 1.5km of camera (km^2)',
       'Developed Land Cover within 1.5km of camera (km^2) (High

Let's say we want to convert the `"Area of Garden (m^2)"` column from m^2 to ft^2.

To convert, we need the function:

$$
f(m) = 10.7639104167 \times m
$$

A garden of 10 $m^2$ is, in $ft^2$:

$$
f(10) = 10.7639104167 \times 10 = 107.639104167 \ \text{ft}^2
$$

Let's write a function to do this:

In [8]:
def msq2ftsq(msq):
    return 10.7639104167 * msq

In [9]:
msq2ftsq(10)

107.639104167

In [10]:
msq2ftsq(3.2)

34.44451333344

Let's make a new column, `Area of Garden (ft^2)`, by applying the function to every value in the m^2 column:

In [11]:
df["Area of Garden (ft^2)"] = df["Area of Garden (m^2)"].apply(msq2ftsq) 

In [12]:
df[["Area of Garden (ft^2)", "Area of Garden (m^2)"]]

| | Area of Garden (ft^2) | Area of Garden (m^2) |
|---|---|---|
| 0 | 50.052183 | 4.65 |
| 1 | 50.052183 | 4.65 |
| 2 | 50.052183 | 4.65 |
| 3 | 50.052183 | 4.65 |
| 4 | 50.052183 | 4.65 |
| ... | ... | ... |
| 43601 | 0.000000 | 0.00 |
| 43602 | 0.000000 | 0.00 |
| 43603 | 0.000000 | 0.00 |
| 43604 | 0.000000 | 0.00 |
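
For simple arithmetic like this, `apply` is not strictly needed: multiplying the column by the constant does the same work in one vectorized step. A minimal sketch, using a small made-up DataFrame (the values are illustrative assumptions) and the same `msq2ftsq` function:

```python
import pandas as pd

def msq2ftsq(msq):
    """Convert square meters to square feet."""
    return 10.7639104167 * msq

# Hypothetical garden areas in m^2
gardens = pd.DataFrame({"Area of Garden (m^2)": [4.65, 10.0, 0.0]})

# Element-wise: msq2ftsq is called once per value
via_apply = gardens["Area of Garden (m^2)"].apply(msq2ftsq)

# Vectorized: the whole column is multiplied at once
vectorized = gardens["Area of Garden (m^2)"] * 10.7639104167

print(via_apply.round(6).tolist())
print(vectorized.round(6).tolist())
```

`apply` becomes the better fit once the function contains logic that cannot be written as plain column arithmetic.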


## Another example

In [13]:
df['Time of Detection'][0]

Timestamp('2023-10-14 13:32:00')

In [14]:
df['Time of Detection'][0].hour

13

In [15]:
df['Time of Detection'][0].minute

32

Now let's say we want to create a new column `Hour of Detection`:

In [16]:
df['Time of Detection'].apply(lambda t: t.hour)

0        13
1         9
2         9
3        20
4        22
         ..
43601     9
43602    10
43603    10
43604    16
43605    23
Name: Time of Detection, Length: 43606, dtype: int64

In [17]:
df['Hour of Detection'] = df['Time of Detection'].apply(lambda t: t.hour)
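
For datetime columns, pandas also exposes the `.dt` accessor, which returns the same component for every row without `apply`. A minimal sketch on made-up timestamps (the sample values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical detection times
detections = pd.Series(pd.to_datetime([
    "2023-10-14 13:32:00",
    "2023-10-14 09:20:01",
    "2023-10-14 22:02:55",
]))

# Element-wise: read .hour from each Timestamp
hours_apply = detections.apply(lambda t: t.hour)

# Vectorized: the .dt accessor extracts the hour for the whole column
hours_dt = detections.dt.hour

print(hours_dt.tolist())
```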

In [18]:
df.columns

Index(['Yard', 'Date of Detection', 'Time of Detection', 'Species Detected',
       'Number of the Species Detected in the Photo', 'Sunset Time',
       'Sunrise Time', 'Number of Bird Feeders', 'Area of Garden (m^2)',
       'Fence Type', 'Poultry Presence', 'Relative Abundance of Dogs',
       'Relative Abundance of Predators',
       'Forest Cover within 400m of camera (km^2)',
       'Open Land Cover within 400m of camera (km^2)',
       'Agricultural Land Cover within 400m of camera (km^2)',
       'Developed Land Cover within 400m of camera (km^2)',
       'Maximum Housing Unit Density within a 400m buffer (Houses/km^2)',
       'Year', 'Simpson Diversity', 'Richness', 'Mesopredator Richness',
       'Meso_Diversity', 'Herb_Richness', 'Herb_Div',
       'Forest Cover within 1.5km of camera (km^2)',
       'Open Land Cover within 1.5km of camera (km^2)',
       'Agricultural Land Cover within 1.5km of camera (km^2)',
       'Developed Land Cover within 1.5km of camera (km^2) (High

## Putting it all together with `groupby`

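Like in SQL, you can group by multiple attributes at once, which is VERY useful. Here each group is one (species, hour) pair, and `count()` reports the number of non-null values in every other column for that group.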

In [19]:
df.groupby(['Species Detected', 'Hour of Detection']).count()

| Species Detected | Hour of Detection | Yard | Date of Detection | Time of Detection | Number of the Species Detected in the Photo | Sunset Time | Sunrise Time | Number of Bird Feeders | Area of Garden (m^2) | Fence Type | Poultry Presence | ... | Precipitation 7 | Precipitation 8 | Precipitation 9 | Precipitation 10 | Precipitation 11 | Precipitation 12 | Precipitation 13 | Precipitation 14 | Precipitation 15 | Area of Garden (ft^2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45 Opossum | 0 | 136 | 136 | 136 | 136 | 136 | 136 | 136 | 136 | 136 | 136 | ... | 132 | 132 | 132 | 132 | 132 | 132 | 132 | 132 | 132 | 136 |
| 45 Opossum | 1 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | ... | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 122 | 126 |
| 45 Opossum | 2 | 141 | 141 | 141 | 141 | 141 | 141 | 141 | 141 | 141 | 141 | ... | 140 | 140 | 140 | 140 | 140 | 140 | 140 | 140 | 140 | 141 |
| 45 Opossum | 3 | 140 | 140 | 140 | 140 | 140 | 140 | 140 | 140 | 140 | 140 | ... | 137 | 137 | 137 | 137 | 137 | 137 | 137 | 137 | 137 | 140 |
| 45 Opossum | 4 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | 126 | ... | 125 | 125 | 125 | 125 | 125 | 125 | 125 | 125 | 125 | 126 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| White-tailed Deer | 19 | 508 | 508 | 508 | 508 | 508 | 508 | 508 | 508 | 508 | 508 | ... | 480 | 480 | 480 | 480 | 480 | 480 | 480 | 480 | 480 | 508 |
| White-tailed Deer | 20 | 418 | 418 | 418 | 418 | 418 | 418 | 418 | 418 | 418 | 418 | ... | 406 | 406 | 406 | 406 | 406 | 406 | 406 | 406 | 406 | 418 |
| White-tailed Deer | 21 | 282 | 282 | 282 | 282 | 282 | 282 | 282 | 282 | 282 | 282 | ... | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 274 | 282 |
| White-tailed Deer | 22 | 273 | 273 | 273 | 273 | 273 | 273 | 273 | 273 | 273 | 273 | ... | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 269 | 273 |


In [20]:
df.groupby(['Species Detected', 'Hour of Detection']).count().loc[:,'Yard']

Species Detected   Hour of Detection
45 Opossum         0                    136
                   1                    126
                   2                    141
                   3                    140
                   4                    126
                                       ... 
White-tailed Deer  19                   508
                   20                   418
                   21                   282
                   22                   273
                   23                   298
Name: Yard, Length: 447, dtype: int64
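
Counting every column and then keeping only `Yard` works, but if the goal is just the number of rows per group, `groupby(...).size()` gets there directly. A minimal sketch on a made-up table (the column values are illustrative assumptions). Note that `count()` skips missing values while `size()` counts all rows, so the two agree only when the selected column has no missing entries.

```python
import pandas as pd

# Hypothetical detections; every row has a Yard value
obs = pd.DataFrame({
    "Species Detected": ["Deer", "Deer", "Bird", "Deer", "Bird"],
    "Hour of Detection": [4, 4, 9, 20, 9],
    "Yard": ["A", "B", "A", "A", "C"],
})

keys = ["Species Detected", "Hour of Detection"]

# Approach from the text: count non-null values, then keep one column
via_count = obs.groupby(keys).count().loc[:, "Yard"]

# Alternative: number of rows in each group
via_size = obs.groupby(keys).size()

print(via_size)
```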

In [21]:
df.groupby(['Species Detected', 'Hour of Detection']).count().loc[:,'Yard'].unstack()

| Species Detected / Hour of Detection | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45 Opossum | 136.0 | 126.0 | 141.0 | 140.0 | 126.0 | 74.0 | 16.0 | NaN | 2.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | 10.0 | 32.0 | 131.0 | 161.0 | 147.0 |
| Beaver | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Bird | 11.0 | 10.0 | 3.0 | 13.0 | 13.0 | 99.0 | 263.0 | 362.0 | 425.0 | 462.0 | ... | 538.0 | 523.0 | 466.0 | 361.0 | 365.0 | 226.0 | 42.0 | 5.0 | 7.0 | 12.0 |
| Black Bear | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN |
| Bobcat | 1.0 | 2.0 | 3.0 | NaN | 2.0 | 1.0 | 2.0 | 2.0 | NaN | NaN | ... | NaN | NaN | 2.0 | NaN | 2.0 | NaN | 4.0 | NaN | 2.0 | NaN |
| Chipmunk | NaN | NaN | 2.0 | NaN | NaN | 23.0 | 64.0 | 78.0 | 54.0 | 38.0 | ... | 38.0 | 34.0 | 33.0 | 26.0 | 33.0 | 15.0 | 1.0 | NaN | 1.0 | NaN |
| Cottontail | 35.0 | 29.0 | 29.0 | 21.0 | 29.0 | 61.0 | 48.0 | 53.0 | 45.0 | 45.0 | ... | 16.0 | 40.0 | 46.0 | 52.0 | 60.0 | 85.0 | 67.0 | 42.0 | 36.0 | 29.0 |
| Coyote | 52.0 | 39.0 | 32.0 | 38.0 | 32.0 | 21.0 | 10.0 | 4.0 | 3.0 | 4.0 | ... | NaN | 1.0 | 6.0 | 4.0 | 2.0 | 12.0 | 29.0 | 43.0 | 43.0 | 43.0 |
| Domestic Cat | 119.0 | 126.0 | 129.0 | 139.0 | 177.0 | 172.0 | 150.0 | 121.0 | 123.0 | 117.0 | ... | 78.0 | 68.0 | 67.0 | 96.0 | 110.0 | 129.0 | 161.0 | 168.0 | 167.0 | 138.0 |
| Domestic Dog | 18.0 | 9.0 | 10.0 | 16.0 | 27.0 | 83.0 | 109.0 | 115.0 | 115.0 | 106.0 | ... | 60.0 | 89.0 | 114.0 | 114.0 | 105.0 | 107.0 | 86.0 | 67.0 | 31.0 | 25.0 |


In [22]:
df.groupby(['Species Detected', 'Hour of Detection']).count().loc[:,'Yard'].unstack().fillna(0)

| Species Detected / Hour of Detection | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45 Opossum | 136.0 | 126.0 | 141.0 | 140.0 | 126.0 | 74.0 | 16.0 | 0.0 | 2.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 32.0 | 131.0 | 161.0 | 147.0 |
| Beaver | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Bird | 11.0 | 10.0 | 3.0 | 13.0 | 13.0 | 99.0 | 263.0 | 362.0 | 425.0 | 462.0 | ... | 538.0 | 523.0 | 466.0 | 361.0 | 365.0 | 226.0 | 42.0 | 5.0 | 7.0 | 12.0 |
| Black Bear | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 |
| Bobcat | 1.0 | 2.0 | 3.0 | 0.0 | 2.0 | 1.0 | 2.0 | 2.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 2.0 | 0.0 | 2.0 | 0.0 | 4.0 | 0.0 | 2.0 | 0.0 |
| Chipmunk | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 23.0 | 64.0 | 78.0 | 54.0 | 38.0 | ... | 38.0 | 34.0 | 33.0 | 26.0 | 33.0 | 15.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| Cottontail | 35.0 | 29.0 | 29.0 | 21.0 | 29.0 | 61.0 | 48.0 | 53.0 | 45.0 | 45.0 | ... | 16.0 | 40.0 | 46.0 | 52.0 | 60.0 | 85.0 | 67.0 | 42.0 | 36.0 | 29.0 |
| Coyote | 52.0 | 39.0 | 32.0 | 38.0 | 32.0 | 21.0 | 10.0 | 4.0 | 3.0 | 4.0 | ... | 0.0 | 1.0 | 6.0 | 4.0 | 2.0 | 12.0 | 29.0 | 43.0 | 43.0 | 43.0 |
| Domestic Cat | 119.0 | 126.0 | 129.0 | 139.0 | 177.0 | 172.0 | 150.0 | 121.0 | 123.0 | 117.0 | ... | 78.0 | 68.0 | 67.0 | 96.0 | 110.0 | 129.0 | 161.0 | 168.0 | 167.0 | 138.0 |
| Domestic Dog | 18.0 | 9.0 | 10.0 | 16.0 | 27.0 | 83.0 | 109.0 | 115.0 | 115.0 | 106.0 | ... | 60.0 | 89.0 | 114.0 | 114.0 | 105.0 | 107.0 | 86.0 | 67.0 | 31.0 | 25.0 |


In [23]:
df_tmp = df.groupby(['Species Detected', 'Hour of Detection']).count().loc[:,'Yard'].unstack().fillna(0)
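
The counts in the filled table display as floats (`136.0`) because the missing cells force a float dtype during `unstack()`, and `fillna(0)` does not convert them back. If integer counts are preferred, `unstack` accepts a `fill_value` argument. A minimal sketch on made-up data (the names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical detections: not every species appears in every hour
obs = pd.DataFrame({
    "Species Detected": ["Deer", "Deer", "Bird", "Deer", "Bird"],
    "Hour of Detection": [4, 4, 9, 20, 9],
})

# Rows per (species, hour) pair
counts = obs.groupby(["Species Detected", "Hour of Detection"]).size()

# Reshape hours into columns, filling absent pairs with 0 as we go
table = counts.unstack(fill_value=0)

print(table)
```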

In [24]:
df_tmp.iloc[:,4].sort_values()

Species Detected
Fox Squirrel               0.0
Beaver                     0.0
River Otter                0.0
Black Bear                 0.0
Chipmunk                   0.0
Mink                       0.0
Bobcat                     2.0
Striped Skunk              2.0
Groundhog                  3.0
Gray Fox                   9.0
Gray Squirrel             12.0
Bird                      13.0
Person                    15.0
Mouse                     18.0
Domestic Dog              27.0
Cottontail                29.0
Coyote                    32.0
Red Fox                   56.0
Virginia Opossum          89.0
Nine-banded Armadillo     95.0
Rat                      111.0
45 Opossum               126.0
Domestic Cat             177.0
White-tailed Deer        332.0
Raccoon                  427.0
Name: 4, dtype: float64
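
`iloc[:, 4]` selects the fifth column by position. In the output above that column is named `4`, so position and hour label happen to line up. Selecting by label is more robust when some hours are absent from the table. A minimal sketch with a made-up table (the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical species-by-hour table with only hours 4, 9 and 20 present
table = pd.DataFrame(
    {4: [0, 2], 9: [2, 0], 20: [0, 1]},
    index=pd.Index(["Bird", "Deer"], name="Species Detected"),
)

# By position: the second column, which here is hour 9
second_col = table.iloc[:, 1]

# By label: the column whose name is 4, wherever it sits
hour_4 = table.loc[:, 4].sort_values()

print(hour_4)
```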