## 9.1: Archive Exploration

This dataset comes from capture-recapture studies of snowshoe hares at five locales in the Tanana Valley, from Tok in the east to Clear in the west, conducted from 1999 to 2002. It contains no sensitive data. It was published in 2017 by the Bonanza Creek LTER.

Kielland, K., F.S. Chapin, R.W. Ruess, and Bonanza Creek LTER. 2017. Snowshoe hare physical data in Bonanza Creek Experimental Forest: 1999-Present ver 22. Environmental Data Initiative. https://doi.org/10.6073/pasta/03dce4856d79b91557d8e6ce2cbcdc14 (Accessed 2023-10-19).

![Snowshoe hare](https://en.wikipedia.org/wiki/File:Snowshoe_Hare,_Shirleys_Bay.jpg)

In [16]:
import pandas as pd
import numpy as np

hares = pd.read_csv("https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-bnz.55.22&entityid=f01f5d71be949b8c700b6ecd1c42c701")

hares.head()

            date      time    grid trap       l_ear r_ear  sex  age  weight  \
0     11/26/1998       NaN  bonrip   1A  414D096A08   NaN  NaN  NaN  1370.0   
1     11/26/1998       NaN  bonrip   2C  414D320671   NaN    M  NaN  1430.0   
2     11/26/1998       NaN  bonrip   2D  414D103E3A   NaN    M  NaN  1430.0   
3     11/26/1998       NaN  bonrip   2E  414D262D43   NaN  NaN  NaN  1490.0   
4     11/26/1998       NaN  bonrip   3B  414D2B4B58   NaN  NaN  NaN  1710.0   

      hindft notes  b
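The `date` column is read as plain text in month/day/year form. To check the period the records cover, the strings can be parsed into timestamps first. The sketch below uses a handful of date strings copied from the output above rather than the full file, and the variable names `sample_dates` and `parsed` are illustrative only.

```python
import pandas as pd

# Date strings copied by hand from the displayed rows (not the full dataset).
sample_dates = pd.Series(["11/26/1998", "8/8/2002", "8/7/2002", "8/6/2002"])

# Parse month/day/year strings; single-digit months and days are accepted.
parsed = pd.to_datetime(sample_dates, format="%m/%d/%Y")

# Earliest and latest dates in the sample.
print(parsed.min().date(), parsed.max().date())
```

On the real data the same call would be applied to `hares['date']`, assuming every entry follows this format.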

In [13]:
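# Count each raw code, keeping missing values in the tally
sex_value_count = hares.sex.value_counts(dropna=False)

# List the distinct raw codes recorded in the 'sex' column
unique_sex = hares.sex.unique()

unique_sex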

array([nan, 'M', 'F', '?', 'F?', 'M?', 'pf', 'm', 'f', 'f?', 'm?', 'f ',
       'm '], dtype=object)
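Several of these codes differ only in letter case or trailing whitespace. As a minimal sketch of how they collapse once normalized, the block below rebuilds the list of codes by hand (copied from the output above) and applies `str.strip` and `str.lower`. The names `raw_codes` and `normalized` are illustrative, not part of the original notebook.

```python
import pandas as pd

# The raw codes from the output above, copied by hand for illustration.
raw_codes = pd.Series(
    [None, "M", "F", "?", "F?", "M?", "pf", "m", "f", "f?", "m?", "f ", "m "]
)

# Strip surrounding whitespace and lowercase, so 'F', 'f' and 'f ' compare equal.
normalized = raw_codes.str.strip().str.lower()

# Distinct codes that remain after normalization (missing values excluded).
print(sorted(normalized.dropna().unique()))
```

The uncertain codes (`?`, `f?`, `m?`, `pf`) stay distinct here; how to treat them is a separate decision.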

In [20]:
# Define conditions for categorizing sex (case-insensitive exact match):
# 1. 'sex', converted to uppercase, equals 'F'
# 2. 'sex', converted to uppercase, equals 'M'
# Codes such as 'f ', 'F?' or 'pf' match neither condition and fall through to the default.
conditions = [
    hares['sex'].str.upper() == 'F',
    hares['sex'].str.upper() == 'M'
]

# Define the corresponding choices for the conditions:
# 1. If the condition for 'F' is met, assign 'female'
# 2. If the condition for 'M' is met, assign 'male'
choices = ['female', 'male']

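# Create the new column with np.select.
# Note: the choices are strings, so the np.nan default is stored as the
# string 'nan' (see the output below), not as a true missing value.
# Newer NumPy releases may instead raise a TypeError for this mixed-type default.
hares['sex_simple'] = np.select(conditions, choices, default=np.nan)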

print(hares.sex_simple.unique())

['nan' 'male' 'female']
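The first entry printed above is the string `'nan'`, not a true missing value. One possible alternative, sketched here under stated assumptions, normalizes the codes and recodes them with `Series.map`, which leaves unmatched entries as real missing values. The frame `toy` and the dictionary `recode` are hypothetical stand-ins for `hares` and are not part of the original notebook. Unlike the cell above, this version also counts `'f '` and `'m '` (trailing space) as female and male, which is a design choice that would change the group sizes.

```python
import pandas as pd

# Hypothetical toy frame; the codes are a subset of those seen in hares.sex.
toy = pd.DataFrame({"sex": [None, "M", "F", "f ", "m?", "pf"]})

# Mapping from normalized code to category; anything else becomes missing.
recode = {"f": "female", "m": "male"}

# Normalize, then map; unmatched or missing codes yield NaN, not the string 'nan'.
toy["sex_simple"] = toy["sex"].str.strip().str.lower().map(recode)

# Number of rows left without a category.
print(toy["sex_simple"].isna().sum())
```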


In [21]:
# Mean weight per category; the 'nan' group collects every record not matched as 'F' or 'M'
mean_weight_by_sex = hares.groupby('sex_simple')['weight'].mean()

print(mean_weight_by_sex)

sex_simple
female    1366.920372
male      1352.145553
nan       1176.511111
Name: weight, dtype: float64
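A group mean is easier to interpret next to the number of records behind it. The sketch below reports both with `agg`, using a small made-up frame (the weights are hypothetical, not taken from the dataset). It also assumes the uncategorized rows hold real missing values rather than the string `'nan'`, in which case `groupby` drops them unless `dropna=False` is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the weights are invented for illustration only.
toy = pd.DataFrame({
    "sex_simple": ["female", "female", "male", np.nan],
    "weight": [1400.0, 1300.0, 1350.0, 1200.0],
})

# Mean weight and record count per group; dropna=False keeps the missing-sex group.
summary = toy.groupby("sex_simple", dropna=False)["weight"].agg(["mean", "count"])

print(summary)
```

Leaving out `dropna=False` would restrict the summary to the female and male groups.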
