# 1. Archive exploration

i. What is this data about?: Snowshoe hares in the Tanana Valley of Alaska.

ii. During what time frame were the observations in the dataset collected?: 1999 - 2002

iii. Does the dataset contain sensitive data?: No.

iv. Is there a publication associated with this dataset?:

**Description**: The dataset contains capture-recapture data for a keystone prey species, the snowshoe hare. The data were collected in the Tanana Valley, from Tok in the east to Clear in the west, to understand population fluctuations and gather quantitative data on population densities in Alaska.

**Citation**: 

Kielland, K., F.S. Chapin, R.W. Ruess, and Bonanza Creek LTER. 2017. Snowshoe hare physical data in Bonanza Creek Experimental Forest: 1999-Present ver 22. Environmental Data Initiative. https://doi.org/10.6073/pasta/03dce4856d79b91557d8e6ce2cbcdc14 (Accessed 2025-10-17).

**Date of access**: October 16, 2025

**Link**: https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-bnz.55.22


![Photograph by Alan Schmierer](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg/1452px-SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg?20170313021652)
Photograph by Alan Schmierer

No licensing required.

# 3. Data loading and preliminary exploration

In [1]:
import pandas as pd

hares = pd.read_csv('https://pasta.lternet.edu/package/data/eml/knb-lter-bnz/55/22/f01f5d71be949b8c700b6ecd1c42c701')

In [3]:
# Using the shape attribute to get the dimensions (rows, columns) of the dataframe
hares.shape

(3380, 14)

In [None]:
# Using dtypes to find the type of data in each column.
hares.dtypes

date           object
time           object
grid           object
trap           object
l_ear          object
r_ear          object
sex            object
age            object
weight        float64
hindft        float64
notes          object
b_key         float64
session_id      int64
study          object
dtype: object

In [6]:
# Determining if there are any NA values in the columns.
hares.isna().sum()

date             0
time          3116
grid             0
trap            12
l_ear           48
r_ear          169
sex            352
age           2111
weight         535
hindft        1747
notes         3137
b_key           47
session_id       0
study          163
dtype: int64

In [12]:
hares[['weight', 'hindft']].describe()

|       | weight      | hindft     |
| ----- | ----------- | ---------- |
| count | 2845.0      | 1633.0     |
| mean  | 1346.081547 | 130.872627 |
| std   | 345.160112  | 16.155295  |
| min   | 0.0         | 60.0       |
| 25%   | 1180.0      | 128.0      |
| 50%   | 1400.0      | 135.0      |
| 75%   | 1580.0      | 140.0      |
| max   | 2365.0      | 160.0      |


In [14]:
hares['sex'].unique()

array([nan, 'M', 'F', '?', 'F?', 'M?', 'pf', 'm', 'f', 'f?', 'm?', 'f ',
       'm '], dtype=object)

In [16]:
hares['notes'].unique()

array([nan, 'No right ear tag', 'Escapee', 'Mortality', 'Mortality ',
       'Old tag lost in L ear',
       'Bunny escaped before second ear tag was added',
       'Rabbit too bloody, released', 'R Front Foot Injured',
       'L Hind Leg Injured',
       'Left Front Foot Injured by Mink. Mink Still Around, Not Shy',
       'Injured Bunny, Released, No Tags', 'Died after release',
       'Dead in trap', 'Dead', 'non-pregnant',
       'pregnant (2 peanut sized babies)', 'pregnant', 'Pregnant',
       'Pregnant; last collar was chewed off',
       '149.074 recapture; collar loose, removed and replaced; non-pregnant',
       'previous collar was chewed off',
       '149.013 came off/removed; replaced',
       '149.033 recapture; collar loose, removed and replaced',
       'previous collar fell off',
       'collar previously chewed off (put back on the same bunny!)',
       'collar broke off, caught in cage', 'dead in trap',
       '149.754 recapture; no VHF signal, removed and replaced',

In [17]:
hares['trap'].unique()

array(['1A', '2C', '2D', '2E', '3B', '3D', '4A', '4B', '4C', '4E', '5A',
       '5C', '5D', '5E', '10C', '1C', '1E', '2A', '2B', '3C', '3E', '5B',
       '6A', '6B', '6C', '7B', '7C', '7E', '8A', '8B', '8E', '9A', '9D',
       '1D', '6E', '7D', '8C', '8D', '9B', '3A', '10B', '1B', '7A', '9E',
       '4D', '10A', '6D', '9C', '10D', '10E', '10b', '2a', '2b', '2d',
       '3b', '4a', '4c', '4e', '5b', '6c', '7a', '7b', '7d', '7e', '8e',
       '9a', '1b', '2c', '2e', '3c', '1e', '3e', '5d', '3d', '4d', '7c',
       '8c', '10c', '1c', '1d', '9d', '5e', '6a', '8a', '8b', '6b', '10e',
       '6e', nan, '4b', '5c', '9c', '10a', '5a', '9b', '9e', '6d', '1a',
       '3a', '10d', '8d', '4f', '5f', '3f', '2f', '2g', '5g', '4g', '1g',
       '7f', '6f', '6g', '3g', '4c ', '4e ', '1e ', '1b ', '2b ', '6b ',
       '2c ', '5c ', '4b '], dtype=object)

In [18]:
hares['study'].unique()

array(['Population', 'Collar', nan, 'Metabolic', 'Metabolic/Collar'],
      dtype=object)

**Exploratory question**: Within population studies, how does average weight vary by trap?

In [28]:
hares_pop = hares[hares['study'] == 'Population']
# Average every numeric column by trap, then sort by mean weight (heaviest first)
hares_pop_mean_wt = hares_pop.groupby('trap').mean(numeric_only=True).sort_values('weight', ascending=False)
hares_pop_mean_wt

| trap | weight      | hindft     | b_key      | session_id |
| ---- | ----------- | ---------- | ---------- | ---------- |
| 7f   | 1640.000000 | NaN        | 734.250000 | 35.750000  |
| 5c   | 1540.000000 | NaN        | 63.000000  | 64.000000  |
| 7D   | 1535.000000 | 125.500000 | 388.777778 | 64.833333  |
| 4f   | 1534.166667 | 142.000000 | 487.125000 | 38.375000  |
| 9C   | 1525.000000 | 131.777778 | 433.400000 | 69.866667  |
| ...  | ...         | ...        | ...        | ...        |
| 4e   | 565.000000  | 90.000000  | 72.000000  | 50.000000  |
| 6b   | 450.000000  | 96.000000  | 76.000000  | 50.000000  |
| 4c   | 427.500000  | 87.500000  | 68.500000  | 50.000000  |
| 3g   | NaN         | NaN        | 975.000000 | 37.000000  |
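The `trap` values listed earlier include labels that differ only by case or trailing whitespace (for example `'4C'`, `'4c'`, and `'4c '`), so grouping on the raw column can split what may be a single trap across several groups. The sketch below shows one possible way to normalize the labels before averaging. It assumes, without confirmation from the dataset metadata, that such variants refer to the same trap. The `sample` dataframe and the `trap_clean` column are hypothetical stand-ins for illustration, not part of the dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for `hares`; the values are illustrative only
sample = pd.DataFrame({
    "study": ["Population", "Population", "Population", "Population", "Collar"],
    "trap": ["4C", "4c", "4c ", "7f", "4C"],
    "weight": [1400.0, 1200.0, 1000.0, 1640.0, 900.0],
})

# Keep only Population records, then normalize trap labels:
# strip surrounding whitespace and convert to upper case
pop = sample[sample["study"] == "Population"].copy()
pop["trap_clean"] = pop["trap"].str.strip().str.upper()

# Mean weight and number of weighed captures per normalized trap label
trap_summary = (
    pop.groupby("trap_clean")["weight"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(trap_summary)
```

Reporting the count next to the mean helps show when an average rests on only a handful of captures, which may be the case for some traps at the extremes of the table above.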


# 4. Detecting messy values

| Sex         | Description             |
| ----------- | -----------             |
| m           | male                    |
| f           | female                  |
| m?          | male not confirmed      |
| f?          | female not confirmed    |

In [None]:
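# .str.lower() and .str.strip() normalize case and whitespace, which were causing
# duplicate categories in 'sex'; '?' and 'pf' are dropped because they are not in the table above
hares['sex'].str.lower().str.strip().value_counts().drop(['?', 'pf'])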

sex
f     1721
m     1249
f?      13
m?       4
Name: count, dtype: int64
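The counts above drop the unexpected codes from the summary but leave the column itself unchanged. As a minimal sketch of one possible cleaning approach, the raw codes can be normalized and then mapped to two confirmed categories, with everything else set to missing. Treating the unconfirmed codes (`m?`, `f?`) as missing is an assumption made here for illustration, not a decision documented by the dataset. The names `raw_sex`, `sex_map`, and `sex_clean` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample built from the unique 'sex' values shown earlier
raw_sex = pd.Series([np.nan, "M", "F", "?", "F?", "M?", "pf", "m", "f", "f ", "m "])

# Normalize whitespace and case before matching
normalized = raw_sex.str.strip().str.lower()

# Map recognized codes to confirmed categories; any code not in the
# dictionary (including 'm?', 'f?', '?', 'pf') becomes NaN
sex_map = {"m": "male", "f": "female"}
sex_clean = normalized.map(sex_map)

print(sex_clean.value_counts(dropna=False))
```

If unconfirmed observations should be retained instead, the mapping could be extended with entries for `m?` and `f?`.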

# 5. Brainstorm

# 6. Clean values

# 7. Calculate mean weight

# 8. Collect your code and explain your results