# Section 3: Snowshoe hares at Bonanza Creek Experimental Forest

## 1. Archive exploration
i. This is data from capture-recapture studies of snowshoe hares in the Bonanza Riparian forest.
ii. This data was collected between 1999-06-01 and 2012-09-14.
iii. This data is not sensitive.
iv. I do not see a related publication.

## 2. Adding an image
![Snowshoe hare (Lepus americanus) at Quoddy Head, Washington County, Maine](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg/1089px-SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg?20170313021652)

No licensing required

## 3. Data loading and preliminary exploration


In [1]:
import pandas as pd
import numpy as np

hares = pd.read_csv("https://pasta.lternet.edu/package/data/eml/knb-lter-bnz/55/22/f01f5d71be949b8c700b6ecd1c42c701")

In [2]:
# Find dimensions of dataframe
hares.shape

(3380, 14)

In [3]:
# Display data types
hares.dtypes

date           object
time           object
grid           object
trap           object
l_ear          object
r_ear          object
sex            object
age            object
weight        float64
hindft        float64
notes          object
b_key         float64
session_id      int64
study          object
dtype: object

In [4]:
# Explore na values
hares.isna().sum()

date             0
time          3116
grid             0
trap            12
l_ear           48
r_ear          169
sex            352
age           2111
weight         535
hindft        1747
notes         3137
b_key           47
session_id       0
study          163
dtype: int64
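
Raw counts are easier to compare as shares of the table: `time`, for example, is missing in 3116 of 3380 rows, about 92%. A minimal sketch of that calculation, using a small made-up frame (`demo` and its values are hypothetical stand-ins, not the hares data):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the hares table (values are made up)
demo = pd.DataFrame({
    "date": ["1999-06-01", "1999-06-02", "1999-06-03", "1999-06-04"],
    "time": [np.nan, np.nan, np.nan, "08:30"],
    "weight": [1370.0, np.nan, 1430.0, 1250.0],
})

# isna() gives a boolean frame; the column-wise mean is the fraction missing
missing_share = demo.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to the real table, the same expression would be `hares.isna().mean()`.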

In [5]:
# Summary statistics (including min and max) for weight and hind foot measurements
hares[["weight", "hindft"]].describe()

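|       | weight | hindft |
| ----------- | ----------- | ----------- |
| count | 2845.0 | 1633.0 |
| mean | 1346.081547 | 130.872627 |
| std | 345.160112 | 16.155295 |
| min | 0.0 | 60.0 |
| 25% | 1180.0 | 128.0 |
| 50% | 1400.0 | 135.0 |
| 75% | 1580.0 | 140.0 |
| max | 2365.0 | 160.0 |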
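
The summary reports a minimum weight of 0.0, which is not a plausible body weight. One option, assumed here rather than taken from the dataset documentation, is to treat non-positive weights as missing before computing statistics. The frame `demo` and the column name `weight_clean` are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical weights, including a zero and an existing missing value
demo = pd.DataFrame({"weight": [0.0, 1180.0, 1400.0, np.nan, 1580.0]})

# Assumption: a weight of 0 is a placeholder, not a real measurement.
# mask() replaces values where the condition is True with NaN.
demo["weight_clean"] = demo["weight"].mask(demo["weight"] <= 0)

print(demo["weight_clean"].describe())
```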


In [6]:
# Explore unique values
hares['sex'].unique()

array([nan, 'M', 'F', '?', 'F?', 'M?', 'pf', 'm', 'f', 'f?', 'm?', 'f ',
       'm '], dtype=object)

In [7]:
hares['trap'].unique().size

122

In [8]:
hares['grid'].unique()

array(['bonrip', 'bonbs', 'Bonbs', 'bonmat', 'bonmat '], dtype=object)
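
The five labels above differ only by capitalization (`'Bonbs'`) and trailing whitespace (`'bonmat '`). A short sketch of how stripping and lowercasing would collapse them, using the printed labels as input (the names `grid` and `grid_clean` are illustrative):

```python
import pandas as pd

# The five raw labels reported by hares['grid'].unique()
grid = pd.Series(['bonrip', 'bonbs', 'Bonbs', 'bonmat', 'bonmat '])

# Remove surrounding whitespace, then lowercase
grid_clean = grid.str.strip().str.lower()
print(grid_clean.unique())
```

On the real table this would be `hares['grid'].str.strip().str.lower()`.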

In [9]:
hares['study'].unique()

array(['Population', 'Collar', nan, 'Metabolic', 'Metabolic/Collar'],
      dtype=object)

Exploratory question: Within population studies, how does the average weight of hares vary by trap?

In [10]:
hares_pop = hares[hares['study'] == 'Population']
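# mean() takes numeric_only, not a column name: this averages every numeric column per trap, then sorts by weight
hares_pop_mean_wt = hares_pop.groupby('trap').mean(numeric_only = True).sort_values('weight', ascending = False)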
hares_pop_mean_wt


| trap | weight | hindft | b_key | session_id |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| 7f | 1640.000000 | NaN | 734.250000 | 35.750000 |
| 5c | 1540.000000 | NaN | 63.000000 | 64.000000 |
| 7D | 1535.000000 | 125.500000 | 388.777778 | 64.833333 |
| 4f | 1534.166667 | 142.000000 | 487.125000 | 38.375000 |
| 9C | 1525.000000 | 131.777778 | 433.400000 | 69.866667 |
| ... | ... | ... | ... | ... |
| 4e | 565.000000 | 90.000000 | 72.000000 | 50.000000 |
| 6b | 450.000000 | 96.000000 | 76.000000 | 50.000000 |
| 4c | 427.500000 | 87.500000 | 68.500000 | 50.000000 |
| 3g | NaN | NaN | 975.000000 | 37.000000 |
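
Two things could affect this ranking. Trap labels mix upper and lower case (`7D`, `7f`), which may split one trap into two groups, and a trap mean may rest on very few captures. The sketch below normalizes the labels and reports the number of weighed captures next to each mean. The rows in `demo` are made up, and `trap_clean`, `mean_weight`, and `n_weighed` are illustrative names:

```python
import pandas as pd

# Hypothetical captures; labels mix case the way the real trap column does
demo = pd.DataFrame({
    "trap": ["7f", "7F", "4c", "4c", "3g"],
    "weight": [1600.0, 1680.0, 400.0, 455.0, None],
})

# Normalize labels so '7f' and '7F' fall into the same group
demo["trap_clean"] = demo["trap"].str.strip().str.lower()

# Select only the weight column, then compute mean and non-missing count per trap
summary = (
    demo.groupby("trap_clean")["weight"]
    .agg(mean_weight="mean", n_weighed="count")
    .sort_values("mean_weight", ascending=False)
)
print(summary)
```

Selecting `['weight']` before aggregating also avoids averaging identifier columns such as `b_key` and `session_id`, whose means have no biological meaning.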


## 4. Detecting messy values
| Code      | Definition |
| ----------- | ----------- |
| m      | male       |
| f   | female        |
| m?   | male not confirmed |
| f?   | female not confirmed       |

In [11]:
# .str.lower() and .str.strip() collapse the case and whitespace variants that were causing duplicate sex values;
# the codes "?" and "pf" are dropped from the counts
hares['sex'].str.lower().str.strip().value_counts().drop(["?", "pf"])

sex
f     1721
m     1249
f?      13
m?       4
Name: count, dtype: int64

In [14]:
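# Caution: 'm_' and 'f_' never occur in the data, so the trailing-space codes 'm ' and 'f ' are not matched here
condition_list = [hares.sex.isin(['M', 'm', 'm_']), hares.sex.isin(['F', 'f', 'f_'])]
output_list = ['male', 'female']

# Rows matching neither condition receive the default; because the choices are strings,
# np.nan ends up as the string 'nan' rather than a true missing value
new_column = np.select(condition_list, output_list, default = np.nan)

hares['sex_simple'] = new_column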
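
An alternative to `np.select` that keeps unmatched rows as real missing values is to normalize the codes and then use `Series.map` with a dictionary. This is a sketch under the assumption that only confirmed codes should be labeled. It runs on the codes printed by `hares['sex'].unique()`, and the dictionary name `confirmed` is illustrative:

```python
import numpy as np
import pandas as pd

# The raw codes reported by hares['sex'].unique()
sex = pd.Series([np.nan, 'M', 'F', '?', 'F?', 'M?', 'pf', 'm', 'f', 'f?', 'm?', 'f ', 'm '])

# Only confirmed codes are mapped; map() leaves every other value as NaN
confirmed = {"m": "male", "f": "female"}
sex_simple = sex.str.strip().str.lower().map(confirmed)

print(sex_simple.value_counts(dropna=False))
```

Because the unmatched rows are true `NaN` values, a later `groupby` drops them by default instead of reporting a group labeled `nan`.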

In [15]:
sex_mapping = {"m":"male", "m?": "male", "f":"female", "f?": "female", "?":np.nan, "pf":np.nan}
hares['sex_simple2'] = (hares['sex'].dropna().str.lower().str.strip().replace(sex_mapping))

hares['sex_simple2'].value_counts()

sex_simple2
female    1734
male      1253
Name: count, dtype: int64

## 7. Calculate mean weight
a. Use groupby() to calculate the mean weight by sex using the new column.

b. Write a full sentence explaining the results you obtained. Don’t forget to include units.

In [16]:
hares.groupby("sex_simple")['weight'].mean()

sex_simple
female    1366.920372
male      1352.145553
nan       1176.511111
Name: weight, dtype: float64

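The mean weight of female hares is 1366.92 grams and the mean weight of male hares is 1352.15 grams, so females are heavier than males by approximately 15 grams on average. The `nan` group, which collects hares whose sex was missing or not matched by the conditions used to build `sex_simple`, has a lower mean weight of 1176.51 grams.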
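
The size of the gap can be checked directly from the two group means printed above. A small sketch, with the values copied from the output and illustrative variable names:

```python
import pandas as pd

# Group means copied from the groupby output (grams)
mean_weight = pd.Series({"female": 1366.920372, "male": 1352.145553})

# Absolute difference in grams and difference relative to the male mean
diff = mean_weight["female"] - mean_weight["male"]
relative = 100 * diff / mean_weight["male"]

print(round(diff, 2), "g")
print(round(relative, 1), "%")
```

The difference is small relative to the overall standard deviation of 345.16 grams reported by `describe()`, so it should be read as a descriptive summary rather than evidence of a meaningful difference between the sexes.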