## 1. Archive Exploration

Take some time to look through the dataset's description in EDI and click around. Discuss the following questions with your team:

What is this data about?

Snowshoe hare physical data in Bonanza Creek Experimental Forest.

During what time frame were the observations in the dataset collected?

1998 - 2012 (the earliest records in the data are dated 11/26/1998)
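The time frame can also be checked directly against the data. This is a minimal sketch that assumes every `date` value uses the month/day/year layout seen in the first rows (e.g. `11/26/1998`). Values that do not parse are counted instead of being silently dropped. The function name `observation_range` and the toy dates are illustrative only. In the notebook you would pass `snowshoe` instead of `toy`.

```python
import pandas as pd


def observation_range(df: pd.DataFrame) -> tuple:
    """Return (earliest date, latest date, number of unparsable dates)."""
    # Assumed format: month/day/year, as in '11/26/1998'
    dates = pd.to_datetime(df["date"], format="%m/%d/%Y", errors="coerce")
    # errors="coerce" turns unparsable strings into NaT, so count them explicitly
    n_unparsed = int(dates.isna().sum() - df["date"].isna().sum())
    # min() and max() skip NaT values
    return dates.min(), dates.max(), n_unparsed


# Toy values for illustration only; replace `toy` with `snowshoe` in the notebook
toy = pd.DataFrame({"date": ["11/26/1998", "3/14/2005", "not a date", "7/2/2010"]})
start, end, bad = observation_range(toy)
print(start.date(), end.date(), bad)
```

If `bad` is non-zero on the real data, inspect those rows before trusting the reported range.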

Does the dataset contain sensitive data?

No

Is there a publication associated with this dataset?

Flora, B.K. 2002. Comparison of snowshoe hare populations in Interior. M.S. Thesis. University of Alaska Fairbanks. Fairbanks, AK, USA.


In your notebook: use a markdown cell to add a brief description of the dataset, including a citation, date of access, and a link to the archive.

brief description: Capture-recapture studies of snowshoe hares at five locales in the Tanana Valley, Alaska.

citation: Kielland, K., F.S. Chapin, R.W. Ruess, and Bonanza Creek LTER. 2017. Snowshoe hare physical data in Bonanza Creek Experimental Forest: 1999-Present ver 22. Environmental Data Initiative. https://doi.org/10.6073/pasta/03dce4856d79b91557d8e6ce2cbcdc14 (Accessed 2025-10-17).

date of access: 10/16/2025

link: https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-bnz.55.22

## 2. Adding an image 

![Snowshoe hare (photograph by Alan Schmierer)](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg/1452px-SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg?20170313021652)

## 3. Data loading and preliminary exploration

```python
import pandas as pd
import numpy as np
```

```python
url = 'https://pasta.lternet.edu/package/data/eml/knb-lter-bnz/55/22/f01f5d71be949b8c700b6ecd1c42c701'
snowshoe = pd.read_csv(url)
```

```python
snowshoe.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3380 entries, 0 to 3379
Data columns (total 14 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   date        3380 non-null   object 
 1   time        264 non-null    object 
 2   grid        3380 non-null   object 
 3   trap        3368 non-null   object 
 4   l_ear       3332 non-null   object 
 5   r_ear       3211 non-null   object 
 6   sex         3028 non-null   object 
 7   age         1269 non-null   object 
 8   weight      2845 non-null   float64
 9   hindft      1633 non-null   float64
 10  notes       243 non-null    object 
 11  b_key       3333 non-null   float64
 12  session_id  3380 non-null   int64  
 13  study       3217 non-null   object 
dtypes: float64(3), int64(1), object(10)
memory usage: 369.8+ KB
```


```python
snowshoe.shape
```

```
(3380, 14)
```

```python
snowshoe.isna().sum()
```

```
date             0
time          3116
grid             0
trap            12
l_ear           48
r_ear          169
sex            352
age           2111
weight         535
hindft        1747
notes         3137
b_key           47
session_id       0
study          163
dtype: int64
```

```python
snowshoe[['weight', 'hindft']].min()
```

```
weight     0.0
hindft    60.0
dtype: float64
```

```python
snowshoe[['weight', 'hindft']].max()
```

```
weight    2365.0
hindft     160.0
dtype: float64
```
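A minimum weight of 0.0 is not plausible for a live hare. It is most likely a recording error or a placeholder, which is worth confirming against the metadata. Below is a minimal sketch of how such values could be counted and then converted to missing, so they do not drag down later means. The toy values are illustrative, and the cutoff (anything at or below 0) is an assumption.

```python
import numpy as np
import pandas as pd

# Toy values for illustration; in the notebook, use `snowshoe` instead of `toy`
toy = pd.DataFrame({"weight": [1370.0, 0.0, np.nan, 1430.0]})

# Count weights that cannot be real measurements (assumed cutoff: <= 0)
n_zero = int((toy["weight"] <= 0).sum())
print(f"{n_zero} implausible weight(s)")

# Series.mask replaces values where the condition is True with NaN
toy["weight"] = toy["weight"].mask(toy["weight"] <= 0)
print(toy["weight"].min())  # smallest remaining weight
```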

```python
snowshoe['notes'].unique()
```

```
array([nan, 'No right ear tag', 'Escapee', 'Mortality', 'Mortality ',
       'Old tag lost in L ear',
       'Bunny escaped before second ear tag was added',
       'Rabbit too bloody, released', 'R Front Foot Injured',
       'L Hind Leg Injured',
       'Left Front Foot Injured by Mink. Mink Still Around, Not Shy',
       'Injured Bunny, Released, No Tags', 'Died after release',
       'Dead in trap', 'Dead', 'non-pregnant',
       'pregnant (2 peanut sized babies)', 'pregnant', 'Pregnant',
       'Pregnant; last collar was chewed off',
       '149.074 recapture; collar loose, removed and replaced; non-pregnant',
       'previous collar was chewed off',
       '149.013 came off/removed; replaced',
       '149.033 recapture; collar loose, removed and replaced',
       'previous collar fell off',
       'collar previously chewed off (put back on the same bunny!)',
       'collar broke off, caught in cage', 'dead in trap',
       '149.754 recapture; no VHF signal, removed and replaced',
```
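The `notes` column has the same kind of inconsistency as `sex`:

- `'Mortality'` and `'Mortality '` differ only by a trailing space.
- `'pregnant'`/`'Pregnant'` and `'Dead in trap'`/`'dead in trap'` differ only by case.

Below is a sketch of normalizing these free-text notes before counting them, using a few of the values listed above.

```python
import numpy as np
import pandas as pd

# A few values copied from the notes output above, plus a missing value
notes = pd.Series(["Mortality", "Mortality ", "pregnant", "Pregnant",
                   "Dead in trap", "dead in trap", np.nan])

# Strip surrounding whitespace and lowercase; NaN passes through unchanged
notes_clean = notes.str.strip().str.lower()

print(notes.nunique(), "->", notes_clean.nunique())
print(notes_clean.value_counts())
```

This only merges spelling variants. Grouping notes by meaning (for example `'Dead'` vs `'Died after release'`) would still need a manual mapping.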

Is there an association between weight and hind foot length (`hindft`)?
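One way to start answering this is a correlation coefficient computed only on rows where both measurements are present. This matters because `hindft` is missing for about half of the records, according to `isna().sum()` above. In this minimal sketch:

- Pearson's r measures linear association.
- Spearman's rho measures monotonic association.

The toy rows reuse the first five `weight`/`hindft` values from the `head()` output. They are far too few to say anything about the real relationship, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def weight_hindft_association(df: pd.DataFrame) -> tuple:
    """Return (number of paired rows, Pearson r, Spearman rho)."""
    # Keep only rows where both measurements were recorded
    paired = df[["weight", "hindft"]].dropna()
    pearson = paired["weight"].corr(paired["hindft"])
    # DataFrame.corr computes Spearman's rho without needing SciPy
    spearman = paired.corr(method="spearman").loc["weight", "hindft"]
    return len(paired), pearson, spearman


# First five rows of the head() output; use `snowshoe` in the notebook
toy = pd.DataFrame({
    "weight": [1370.0, 1430.0, 1430.0, 1490.0, 1710.0],
    "hindft": [160.0, np.nan, np.nan, 135.0, 150.0],
})
n, r, rho = weight_hindft_association(toy)
print(n, round(r, 3), round(rho, 3))
```

Check a scatter plot alongside the coefficient, for example `snowshoe.plot.scatter(x='weight', y='hindft')`, which requires matplotlib. Outliers such as zero weights can distort Pearson's r.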

## 4. Detecting messy values 


| Sex Value   | Description          |
| ----------- | -------------------- |
| m           | male                 |
| f           | female               |
| m?          | male not confirmed   |
| f?          | female not confirmed |


```python
snowshoe['sex'].value_counts(dropna=False)
```

```
sex
F      1161
M       730
f       556
m       515
NaN     352
?        40
F?       10
f         4
m         4
f?        3
M?        2
m?        2
pf        1
Name: count, dtype: int64
```

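The values in the `sex` column do not match the codes defined in the metadata. The data mix uppercase and lowercase codes (`F`/`f`, `M`/`m`, `F?`/`f?`, `M?`/`m?`) and include `?` and `pf`, which the metadata does not define. This suggests the researchers did not follow a shared convention when recording sex. Some values also appear twice (`f` and `m` each have two rows). This points to variants that look identical but differ by invisible characters such as whitespace.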
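The duplicated `f` and `m` rows in the counts above cannot be identical strings, because `value_counts()` returns one row per distinct value. The most likely explanation is invisible whitespace such as a trailing space, although the exact hidden characters in the real data are an assumption until checked. Below is a sketch of how to make them visible, using toy values.

```python
import numpy as np
import pandas as pd

# Toy values for illustration; 'f ' and 'm ' carry a trailing space
sex = pd.Series(["F", "f", "f ", "m", "m ", "?", "F?", np.nan])

# repr() wraps each value in quotes, so surrounding spaces become visible
print(sex.dropna().map(repr).value_counts())

# A value contains surrounding whitespace if stripping changes it
has_space = sex.notna() & (sex != sex.str.strip())
print(sex[has_space].tolist())
```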

```python
# Confirm suspicions
snowshoe['sex'].nunique()
```

```
12
```

## 5. Brainstorm

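a. Step-by-step instructions for wrangling the hare `sex` column:
1) Merge `m` and `M`, including any variant that differs only by surrounding whitespace, into `male`.
2) Merge `f` and `F`, including any variant that differs only by surrounding whitespace, into `female`.
3) Replace the unconfirmed and undefined codes (`m?`, `M?`, `f?`, `F?`, `?`, `pf`) with missing values rather than deleting the rows, so their weights remain available for other analyses.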
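The three steps can be sketched as one small function. This illustration uses `Series.map` rather than the `np.select` approach in the next section, and the name `clean_sex_codes` is hypothetical. It assumes the only variants of `m` and `f` differ in case and surrounding whitespace.

```python
import numpy as np
import pandas as pd


def clean_sex_codes(sex: pd.Series) -> pd.Series:
    """Map raw sex codes to 'male'/'female'; every other code becomes NaN."""
    # Steps 1 and 2: 'M', 'm', and 'm ' (likewise for f) collapse to one code
    normalized = sex.str.strip().str.lower()
    # Step 3: codes missing from the dict ('m?', 'f?', '?', 'pf') map to NaN
    return normalized.map({"m": "male", "f": "female"})


raw = pd.Series(["M", "m ", "F", "f", "m?", "F?", "?", "pf", np.nan])
print(clean_sex_codes(raw).value_counts(dropna=False))
```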

## 6. Clean Values 

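```python
# Normalize case and surrounding whitespace so that 'M', 'm', and 'm ' compare equal
sex = snowshoe['sex'].str.strip().str.lower()

conditions = [
    sex == 'm',  # all male variants
    sex == 'f',  # all female variants
]

choices = ['male', 'female']

# Unmatched codes (NaN, '?', 'm?', 'f?', 'pf') become None, which pandas treats as missing.
# A default of np.nan does not mix cleanly with string choices and can end up as the string 'nan'.
snowshoe['sex_clean'] = np.select(conditions, choices, default=None)
```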
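Before computing statistics on `sex_clean`, it is worth confirming that every raw code landed where intended. The sketch below applies a normalize-then-`np.select` mapping with a `None` default to toy codes. It then cross-tabulates raw against cleaned values with `pd.crosstab`. In the notebook you would cross-tabulate `snowshoe['sex']` against `snowshoe['sex_clean']` instead.

```python
import numpy as np
import pandas as pd

# Toy raw codes standing in for snowshoe['sex']
toy = pd.DataFrame({"sex": ["M", "m", "m ", "F", "f", "f?", "?", "pf", np.nan]})

# Normalize, then map with np.select; None marks unmatched codes as missing
sex = toy["sex"].str.strip().str.lower()
toy["sex_clean"] = np.select([sex == "m", sex == "f"], ["male", "female"], default=None)

# One row per raw code, one column per cleaned label; fill NA so nothing is dropped
check = pd.crosstab(toy["sex"].fillna("<NA>"), toy["sex_clean"].fillna("<missing>"))
print(check)
```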



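```python
snowshoe.head()
```

|   | date | time | grid | trap | l_ear | r_ear | sex | age | weight | hindft | notes | b_key | session_id | study | sex_clean |
|---|------|------|------|------|-------|-------|-----|-----|--------|--------|-------|-------|------------|-------|-----------|
| 0 | 11/26/1998 |  | bonrip | 1A | 414D096A08 |  |  |  | 1370.0 | 160.0 |  | 917.0 | 51 | Population |  |
| 1 | 11/26/1998 |  | bonrip | 2C | 414D320671 |  | M |  | 1430.0 |  |  | 936.0 | 51 | Population | male |
| 2 | 11/26/1998 |  | bonrip | 2D | 414D103E3A |  | M |  | 1430.0 |  |  | 921.0 | 51 | Population | male |
| 3 | 11/26/1998 |  | bonrip | 2E | 414D262D43 |  |  |  | 1490.0 | 135.0 |  | 931.0 | 51 | Population |  |
| 4 | 11/26/1998 |  | bonrip | 3B | 414D2B4B58 |  |  |  | 1710.0 | 150.0 |  | 933.0 | 51 | Population |  |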


## 7. Calculate mean weight 

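```python
snowshoe.groupby('sex_clean')['weight'].mean()
```

```
sex_clean
nan    1346.081547
Name: weight, dtype: float64
```

This output appears to be stale. It contains only a `nan` group, even though the `head()` output above shows `male` values in `sex_clean`. Because `groupby` drops truly missing keys by default, a group labeled `nan` also suggests the missing values had been stored as the string `'nan'`. Re-running this cell after the cleaning step should produce separate means for `male` and `female`.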
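Reporting the number of weighed hares next to each mean makes it easier to judge how reliable each group mean is, especially with 535 missing weights. Below is a sketch using `agg` on toy values, which are made up for illustration.

```python
import pandas as pd

# Toy values for illustration only; use `snowshoe` in the notebook
toy = pd.DataFrame({
    "sex_clean": ["male", "male", "female", "female", None],
    "weight": [1430.0, 1430.0, 1370.0, None, 1490.0],
})

# groupby drops rows whose key is missing (dropna=True by default);
# "count" counts non-missing weights, so it can be smaller than the group size
summary = toy.groupby("sex_clean")["weight"].agg(["mean", "count"])
print(summary)
```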

## 8. Collect your code and explain results 