# Week 3 Bonanza Creek Experimental Forest

The following data are capture and release survey records for snowshoe hares in the Bonanza Creek Experimental Forest.


Citation: Kielland, K., F.S. Chapin, R.W. Ruess, and Bonanza Creek LTER. 2017. Snowshoe hare physical data in Bonanza Creek Experimental Forest: 1999-Present ver 22. Environmental Data Initiative. https://doi.org/10.6073/pasta/03dce4856d79b91557d8e6ce2cbcdc14 (Accessed 2025-10-17).

![snowshoe_hare](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg/1452px-SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg?20170313021652)

Copyright: Publicly available

In [6]:
# Import libraries 
import pandas as pd 
import numpy as np

In [7]:
# Read in data 
hares = pd.read_csv("https://pasta.lternet.edu/package/data/eml/knb-lter-bnz/55/22/f01f5d71be949b8c700b6ecd1c42c701")

In [9]:
hares.head()

Unnamed: 0,date,time,grid,trap,l_ear,r_ear,sex,age,weight,hindft,notes,b_key,session_id,study
0,11/26/1998,,bonrip,1A,414D096A08,,,,1370.0,160.0,,917.0,51,Population
1,11/26/1998,,bonrip,2C,414D320671,,M,,1430.0,,,936.0,51,Population
2,11/26/1998,,bonrip,2D,414D103E3A,,M,,1430.0,,,921.0,51,Population
3,11/26/1998,,bonrip,2E,414D262D43,,,,1490.0,135.0,,931.0,51,Population
4,11/26/1998,,bonrip,3B,414D2B4B58,,,,1710.0,150.0,,933.0,51,Population


In [10]:
hares.dtypes

date           object
time           object
grid           object
trap           object
l_ear          object
r_ear          object
sex            object
age            object
weight        float64
hindft        float64
notes          object
b_key         float64
session_id      int64
study          object
dtype: object

In [11]:
hares.shape

(3380, 14)

In [12]:
hares.columns

Index(['date', 'time', 'grid', 'trap', 'l_ear', 'r_ear', 'sex', 'age',
       'weight', 'hindft', 'notes', 'b_key', 'session_id', 'study'],
      dtype='object')

In [13]:
hares.trap.unique()

array(['1A', '2C', '2D', '2E', '3B', '3D', '4A', '4B', '4C', '4E', '5A',
       '5C', '5D', '5E', '10C', '1C', '1E', '2A', '2B', '3C', '3E', '5B',
       '6A', '6B', '6C', '7B', '7C', '7E', '8A', '8B', '8E', '9A', '9D',
       '1D', '6E', '7D', '8C', '8D', '9B', '3A', '10B', '1B', '7A', '9E',
       '4D', '10A', '6D', '9C', '10D', '10E', '10b', '2a', '2b', '2d',
       '3b', '4a', '4c', '4e', '5b', '6c', '7a', '7b', '7d', '7e', '8e',
       '9a', '1b', '2c', '2e', '3c', '1e', '3e', '5d', '3d', '4d', '7c',
       '8c', '10c', '1c', '1d', '9d', '5e', '6a', '8a', '8b', '6b', '10e',
       '6e', nan, '4b', '5c', '9c', '10a', '5a', '9b', '9e', '6d', '1a',
       '3a', '10d', '8d', '4f', '5f', '3f', '2f', '2g', '5g', '4g', '1g',
       '7f', '6f', '6g', '3g', '4c ', '4e ', '1e ', '1b ', '2b ', '6b ',
       '2c ', '5c ', '4b '], dtype=object)

In [14]:
hares.isna().sum()

date             0
time          3116
grid             0
trap            12
l_ear           48
r_ear          169
sex            352
age           2111
weight         535
hindft        1747
notes         3137
b_key           47
session_id       0
study          163
dtype: int64

In [18]:
# Check maximum weight of hare 
hares["weight"].max()

2365.0

In [19]:
# Check min weight of hare 
hares["weight"].min()

0.0

In [20]:
# Hare min hindfoot size 
hares["hindft"].min()

60.0

In [21]:
# Hare max hindfoot size 
hares["hindft"].max()

160.0

# Look into categorical variables

In [24]:
hares["sex"].unique()

array([nan, 'M', 'F', '?', 'F?', 'M?', 'pf', 'm', 'f', 'f?', 'm?', 'f ',
       'm '], dtype=object)

In [26]:
hares["notes"].unique()

array([nan, 'No right ear tag', 'Escapee', 'Mortality', 'Mortality ',
       'Old tag lost in L ear',
       'Bunny escaped before second ear tag was added',
       'Rabbit too bloody, released', 'R Front Foot Injured',
       'L Hind Leg Injured',
       'Left Front Foot Injured by Mink. Mink Still Around, Not Shy',
       'Injured Bunny, Released, No Tags', 'Died after release',
       'Dead in trap', 'Dead', 'non-pregnant',
       'pregnant (2 peanut sized babies)', 'pregnant', 'Pregnant',
       'Pregnant; last collar was chewed off',
       '149.074 recapture; collar loose, removed and replaced; non-pregnant',
       'previous collar was chewed off',
       '149.013 came off/removed; replaced',
       '149.033 recapture; collar loose, removed and replaced',
       'previous collar fell off',
       'collar previously chewed off (put back on the same bunny!)',
       'collar broke off, caught in cage', 'dead in trap',
       '149.754 recapture; no VHF signal, removed and replaced',

## Study question: 

Is there a positive relationship between snowshoe hare weight and hindfoot size?
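One way to start on this question is a Pearson correlation between the two measurements. The sketch below is only an illustration, not the analysis carried out in this notebook: `toy` is a small made-up frame that borrows the `weight` and `hindft` column names, and `weight_hindft_corr` is a hypothetical helper. Treating weights of 0 as missing is an assumption.

```python
import numpy as np
import pandas as pd

def weight_hindft_corr(df: pd.DataFrame) -> float:
    """Pearson correlation between weight and hindfoot length.

    Rows with a missing measurement are dropped. Weights of 0 are also
    dropped (assumption: a 0 g hare is a placeholder, not a measurement).
    """
    pair = df.loc[df["weight"] > 0, ["weight", "hindft"]].dropna()
    return pair["weight"].corr(pair["hindft"])

# Made-up values for illustration only (not taken from the survey)
toy = pd.DataFrame({
    "weight": [900.0, 1200.0, 1500.0, np.nan, 0.0],
    "hindft": [110.0, 125.0, 140.0, 130.0, 120.0],
})

r = weight_hindft_corr(toy)
print(round(r, 3))
```

A coefficient near +1 would point to a positive linear relationship; on the real data, `weight_hindft_corr(hares)` would be the equivalent call.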

| sex | definition         |
| --- | ------------------ |
| M   | Male               |
| m   | Male               |
| m?  | Male not confirmed |
| F   | Female             |
| f   | Female             |
| pf  | unknown            |
| ?   | unknown            |
| F?  | unknown            |
| M?  | unknown            |

In [33]:
# Count of each sex code (NaN is excluded by default)
hares["sex"].value_counts()

sex
F     1161
M      730
f      556
m      515
?       40
F?      10
f        4
m        4
f?       3
M?       2
m?       2
pf       1
Name: count, dtype: int64

In [36]:
# Same counts, this time including missing values (NaN)
hares["sex"].value_counts(dropna = False)

sex
F      1161
M       730
f       556
m       515
NaN     352
?        40
F?       10
f         4
m         4
f?        3
M?        2
m?        2
pf        1
Name: count, dtype: int64

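Why might the sex column contain entries other than male or female?

One likely reason is inconsistent data entry: the same value appears in upper and lower case and with trailing spaces (`F`, `f`, `f `). Codes ending in `?` suggest the surveyor could not confirm the sex in the field, and surveyors may not have been trained to use a single set of codes.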
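A quick check of how much of this variety is purely typographic: the sketch below takes the codes listed by `hares["sex"].unique()` above and counts how many distinct codes remain once case and surrounding whitespace are unified. The names `raw_codes` and `normalized` are introduced here for illustration.

```python
import pandas as pd

# Sex codes as reported by hares["sex"].unique() (NaN omitted)
raw_codes = pd.Series(
    ["M", "F", "?", "F?", "M?", "pf", "m", "f", "f?", "m?", "f ", "m "]
)

# Remove surrounding whitespace and unify letter case
normalized = raw_codes.str.strip().str.upper()

print(raw_codes.nunique(), "raw codes ->", normalized.nunique(), "after normalizing")
print(sorted(normalized.unique()))
```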


Are there duplicate rows in the data frame?
- `hares.duplicated()` flags four rows as exact repeats of an earlier row. This might be because the same capture was accidentally recorded more than once.

In [49]:
hares[hares.duplicated()]

Unnamed: 0,date,time,grid,trap,l_ear,r_ear,sex,age,weight,hindft,notes,b_key,session_id,study
2893,7/1/2011,,bonbs,10a,,,,,,,juvenile,,23,Population
2894,7/1/2011,,bonbs,10a,,,,,,,juvenile,,23,Population
2895,7/1/2011,,bonbs,10a,,,,,,,juvenile,,23,Population
3071,9/11/2012,,bonbs,10d,b2834,b2835,f,j,840.0,114.0,,838.0,31,Population
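By default `duplicated()` marks only the repeats and leaves the first occurrence of each group unmarked, so the original rows do not appear in the output above. The sketch below contrasts that default with `keep=False` and with `drop_duplicates()`; `records` is a made-up frame that only imitates the shape of the capture data.

```python
import pandas as pd

# Made-up capture records: rows 1 and 2 repeat row 0 exactly
records = pd.DataFrame({
    "date": ["7/1/2011", "7/1/2011", "7/1/2011", "9/11/2012"],
    "grid": ["bonbs", "bonbs", "bonbs", "bonbs"],
    "trap": ["10a", "10a", "10a", "10d"],
})

# Default keep="first": only the repeats are flagged, not the first occurrence
repeats = records[records.duplicated()]

# keep=False flags every member of a duplicated group, first occurrence included
all_copies = records[records.duplicated(keep=False)]

# drop_duplicates keeps one row per group
deduped = records.drop_duplicates()

print(len(repeats), len(all_copies), len(deduped))
```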


## Step-by-step instructions for wrangling the data so that the sex column contains only male and female

1. Create a series containing all string values that correspond to male and female.
2. Use a function that takes this series and reassigns each value in the sex column to its correct label.
3. This would assign M to values like M?, m?, m_ and F to values like F?, f?, f_, etc., while ? can be used for all values that are neither explicitly male nor female.
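The steps above can be sketched with a dictionary lookup instead of `np.select`. This is one possible approach under stated assumptions, not the method used in the cells that follow: `sex` is a made-up sample, `labels` is a hypothetical mapping, and the sketch follows the table above in treating queried codes such as `M?` as unknown. Stripping the `?` before the lookup would instead fold them into male and female, as step 3 describes.

```python
import numpy as np
import pandas as pd

# Made-up sample covering the kinds of codes seen in the sex column
sex = pd.Series(["M", "m", "m ", "M?", "F", "f ", "f?", "?", "pf", np.nan])

# Step 1: the codes that count as explicitly male or female
labels = {"m": "male", "f": "female"}

# Step 2: normalize whitespace and case, then look each code up;
# codes without an entry in `labels` come back as NaN
cleaned = sex.str.strip().str.lower()
sex_simple = cleaned.map(labels)

# Step 3: anything not explicitly male or female becomes "unknown"
sex_simple = sex_simple.fillna("unknown")

print(sex_simple.value_counts())
```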

In [54]:
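# np.select is a NumPy function, not a Series method, so call it directly
sex_clean = hares["sex"].str.strip()
hares["sex_simple"] = np.select(
    condlist=[sex_clean == "F"],
    choicelist=["female"],
    default="unknown")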

In [58]:
# Set conditions to select
# Note: "m_" and "f_" match nothing in the data, so the trailing-space
# codes "m " and "f " fall through to the default

conditions = [hares["sex"].isin(["m", "M", "m_"]),
              hares["sex"].isin(["f", "F", "f_"])]

choices = ["male", "female"]

default = "unknown"

hares["sex_simple"] = np.select(conditions, choices, default=default)

In [61]:
hares["sex_simple"].value_counts()

sex_simple
female     1717
male       1245
unknown     418
Name: count, dtype: int64

In [63]:
hares.sex.unique()
x = hares.sex

In [70]:
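condition_list = [x.isin(["M", "m", "m_"]), x.isin(["F", "f", "f_"])]
output_list = ["male", "female"]

# The choices are strings, so the np.nan default ends up as the string "nan"
# (see the group label in the output below), not a true missing value
hares["sex_simple"] = np.select(condition_list, output_list, default = np.nan)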

# Calculate mean weights

In [71]:
hares.groupby(by = "sex_simple")["weight"].mean()

sex_simple
female    1366.920372
male      1352.145553
nan       1176.511111
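The `nan` row above is an ordinary group whose label is the text "nan", not a missing key, which is why `groupby` did not drop it. A minimal sketch of one alternative, assuming made-up data in `toy`: `Series.map` leaves unmatched codes as real missing values, and `groupby` then excludes them unless `dropna=False` is passed.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "sex": ["M", "m", "F", "f", "?", np.nan],
    "weight": [1400.0, 1300.0, 1500.0, 1300.0, 1000.0, 1200.0],
})

# Codes without an entry in the mapping become real NaN values
toy["sex_simple"] = toy["sex"].map(
    {"M": "male", "m": "male", "F": "female", "f": "female"}
)

# groupby drops NaN keys by default, so only female and male remain
means = toy.groupby("sex_simple")["weight"].mean()
print(means)

# dropna=False keeps the missing group, under a real NaN key
means_with_missing = toy.groupby("sex_simple", dropna=False)["weight"].mean()
print(means_with_missing)
```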
Name: weight, dtype: float64