URL: https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-bnz.55.22
citation:
    Kielland, K., F.S. Chapin, R.W. Ruess, and Bonanza Creek LTER. 2017. Snowshoe hare physical data in Bonanza Creek Experimental Forest: 1999-Present ver 22. Environmental Data Initiative. https://doi.org/10.6073/pasta/03dce4856d79b91557d8e6ce2cbcdc14 (Accessed 2025-10-17).
Date of access: 10/17/2025

These data were collected from 1999 to 2002 and cover snowshoe hare densities and associated metadata from landscape-scale observations in Alaska. Data were collected at 5 sites in the Tanana Valley. There is no publication associated with this dataset.


![image of a snowshoe hare](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg/1089px-SNOWSHOE_HARE_%28Lepus_americanus%29_%285-28-2015%29_quoddy_head%2C_washington_co%2C_maine_-01_%2818988734889%29.jpg?20170313021652)

Image credit: Alan Schmierer, Set 72157600401137773, ID 18988734889. Original title: SNOWSHOE HARE (Lepus americanus) (5-28-2015) quoddy head, washington co, maine -01

# 3. Data loading and preliminary exploration

In [1]:
import pandas as pd
import numpy as np

URL = "https://pasta.lternet.edu/package/data/eml/knb-lter-bnz/55/22/f01f5d71be949b8c700b6ecd1c42c701"
hares = pd.read_csv(URL)

In [10]:
# Do some data exploration
# Note: without parentheses, `hares.info` is not called; it only displays the bound method

hares.info

<bound method DataFrame.info of             date      time    grid trap       l_ear r_ear  sex  age  weight  \
0     11/26/1998       NaN  bonrip   1A  414D096A08   NaN  NaN  NaN  1370.0   
1     11/26/1998       NaN  bonrip   2C  414D320671   NaN    M  NaN  1430.0   
2     11/26/1998       NaN  bonrip   2D  414D103E3A   NaN    M  NaN  1430.0   
3     11/26/1998       NaN  bonrip   2E  414D262D43   NaN  NaN  NaN  1490.0   
4     11/26/1998       NaN  bonrip   3B  414D2B4B58   NaN  NaN  NaN  1710.0   
...          ...       ...     ...  ...         ...   ...  ...  ...     ...   
3375    8/8/2002  18:00:00  bonrip  1b         1201  1202  NaN  NaN  1400.0   
3376    8/8/2002   6:00:00  bonrip  4b         1201  1202  NaN  NaN     NaN   
3377    8/7/2002       NaN  bonrip   4b        1217  1218  NaN  NaN  1000.0   
3378    8/8/2002       NaN  bonrip   6d        1217  1218  NaN  NaN   990.0   
3379    8/6/2002       NaN  bonrip   4b        1058  1060    M  NaN  1460.0   

      hindft notes 

In [16]:
hares.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3380 entries, 0 to 3379
Data columns (total 14 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   date        3380 non-null   object 
 1   time        264 non-null    object 
 2   grid        3380 non-null   object 
 3   trap        3368 non-null   object 
 4   l_ear       3332 non-null   object 
 5   r_ear       3211 non-null   object 
 6   sex         3028 non-null   object 
 7   age         1269 non-null   object 
 8   weight      2845 non-null   float64
 9   hindft      1633 non-null   float64
 10  notes       243 non-null    object 
 11  b_key       3333 non-null   float64
 12  session_id  3380 non-null   int64  
 13  study       3217 non-null   object 
dtypes: float64(3), int64(1), object(10)
memory usage: 369.8+ KB


In [19]:
print(hares['weight'].max())
print(hares['weight'].min())

print(hares['hindft'].max())
print(hares['hindft'].min())

2365.0
0.0
160.0
60.0


In [24]:
hares['l_ear'].unique()

array(['414D096A08', '414D320671', '414D103E3A', ..., '1827', '1825',
       '1215'], dtype=object)

# 4. Detecting messy values

In [55]:
# Find the unique values in the sex column
hares['sex'].unique()

array([nan, 'M', 'F', '?', 'F?', 'M?', 'pf', 'm', 'f', 'f?', 'm?', 'f ',
       'm '], dtype=object)

### Table of allowed values for sex

| code | definition |
| --- | --- |
| m | male |
| f | female |
| m? | male unconfirmed |

In [54]:
# Count observations for each value in the sex column (groupby drops NaN by default)
hares.groupby('sex').size()

sex
?       40
F     1161
F?      10
M      730
M?       2
f      556
f        4
f?       3
m      515
m        4
m?       2
pf       1
dtype: int64

In [48]:
# hares.value_counts?

# dropna controls whether NaN values are excluded from the counts
# The default is True, so NaN is left out unless dropna=False is passed

In [53]:
# Count each value in the sex column, including NaN

hares['sex'].value_counts(dropna = False)

sex
F      1161
M       730
f       556
m       515
NaN     352
?        40
F?       10
f         4
m         4
f?        3
M?        2
m?        2
pf        1
Name: count, dtype: int64

In [56]:
hares['sex'].nunique()

12

The values in the `sex` column do not match the metadata: many observations use codes the metadata does not allow, including uppercase letters, trailing whitespace, and `?`. This could be caused by differences in sampling or data entry and by the lack of a consistent protocol.
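The mismatch described above can also be checked programmatically. The following is a minimal sketch, assuming the allowed codes are exactly those in the metadata table (`m`, `f`, `m?`); the small `sex` Series is a hypothetical stand-in for `hares['sex']`, built from codes seen in the output above.

```python
import pandas as pd

# Codes permitted by the metadata table (assumed to be the complete list)
allowed = {"m", "f", "m?"}

# Hypothetical stand-in for hares['sex'], using codes observed in the data
sex = pd.Series(["M", "f", "f ", "m?", "?", "pf", None])

# Flag non-missing entries whose code is not in the allowed set
invalid = sex.notna() & ~sex.isin(allowed)

# List the offending codes and how often each occurs
print(sex[invalid].value_counts())
```

Swapping the stand-in for `hares['sex']` should list every code that needs recoding, without scanning the `unique()` output by eye.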

# 5. Brainstorm

How to wrangle the `sex` column: recode the observed values so they match `m`, `f`, or `m?`.

- `m_` (that is, `m` with a trailing space), `M?`, and `M` become `m`
- `f_` (`f` with a trailing space), `F?`, `f?`, `F`, and `pf` become `f`
- `?` becomes NaN
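One way to express the recoding plan above is a dictionary lookup with `Series.map`. This is only a sketch of the brainstormed rules, not the approach used in the next section: the `recode` dictionary and the toy `sex` Series are hypothetical, and codes that are already valid (`m`, `f`, `m?`) are assumed to stay unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for hares['sex']
sex = pd.Series(["M", "m ", "M?", "m?", "F", "f ", "F?", "f?", "pf", "?", np.nan])

# Recoding rules from the brainstorm list; valid codes map to themselves
recode = {
    "m": "m", "m ": "m", "M": "m", "M?": "m",
    "f": "f", "f ": "f", "F": "f", "F?": "f", "f?": "f", "pf": "f",
    "m?": "m?",
}

# Series.map returns NaN for any value missing from the dictionary,
# so '?' and existing NaN entries both end up as NaN
sex_clean = sex.map(recode)
print(sex_clean.value_counts(dropna=False))
```

Because unmapped values become real NaN rather than a string, later `groupby` calls leave them out by default.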

# 6. Clean Values

Use `numpy.select()` to create a new column of correct values
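As a quick illustration of how `numpy.select()` works, the sketch below pairs each condition with the choice at the same position, so the order of the two lists has to agree. The toy `sex` Series and the `"unknown"` default are assumptions made for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for hares['sex']
sex = pd.Series(["M", "f", "F", "m ", "?", None])

# condlist[i] is paired with choicelist[i]: first males, then females
condlist = [sex.isin(["M", "m", "m "]), sex.isin(["F", "f", "f "])]
choicelist = ["male", "female"]

# Rows matching no condition receive the default;
# a string default keeps the result a plain string array
sex_simple = np.select(condlist, choicelist, default="unknown")
print(sex_simple)
```

If the two lists are written in different orders, the labels are silently swapped, so it is worth comparing the resulting counts against the raw value counts.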

In [60]:
x = hares.sex
# Conditions and choices are paired by position: male codes first, then female codes
condlist = (x.isin(['M', 'm', 'm ']), x.isin(['F', 'f', 'f ']))
choicelist = ['male', 'female']
sex_simple = np.select(condlist, choicelist, default = np.nan)

In [61]:
# np.select?

In [62]:
hares['sex_simple'] = sex_simple

In [63]:
hares.sex_simple.value_counts()

sex_simple
female    1721
male      1249
nan        410
Name: count, dtype: int64

In [64]:
# Calculate mean weight by sex
hares.groupby(by = 'sex_simple')['weight'].mean()

sex_simple
female    1365.164792
male      1349.935542
nan       1193.364055
Name: weight, dtype: float64