# Gator hunt

The Florida Fish and Wildlife Conservation Commission keeps track of [gators killed by hunters](http://myfwc.com/wildlifehabitats/managed/alligator/harvest/data-export/). A cut of this data lives in `../data/gators.csv`.

Let's take a look.
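
Here's a minimal sketch of the load step. The column names and rows are invented for illustration and may not match the real file, and an in-memory string stands in for `../data/gators.csv` so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for ../data/gators.csv.
# Column names and rows are made up and may differ from the real file.
sample_csv = io.StringIO(
    "Year,Carcass Size,Harvest Date\n"
    "2014,9 ft. 2 in.,09-12-2014\n"
    "2015,11 ft. 0 in.,10-03-2015\n"
)

# With the real file, this would be pd.read_csv('../data/gators.csv')
df = pd.read_csv(sample_csv)

# Peek at the first few rows
print(df.head())
```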

In [None]:
# import pandas


In [None]:
# read in the CSV


In [None]:
# check the output with `head()`


### Check it out

First, let's take a look at our data and examine some of the column values that we might be interested in analyzing. We're already starting to think about the questions we want this data to help us answer.
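
As a rough sketch of that first look, here are the three inspection tools on a toy data frame. The `Carcass Size` column name and all of the values are assumptions for illustration, not the real data.

```python
import pandas as pd

# Toy data -- the column names and values are assumptions, not the real file
df = pd.DataFrame({
    'Year': [2014, 2014, 2015, 2016],
    'Carcass Size': ['9 ft. 2 in.', '7 ft. 11 in.', '11 ft. 0 in.', '9 ft. 2 in.'],
})

# Column names, data types and non-null counts
df.info()

# How many records per year? The index shows the range of years present
year_counts = df['Year'].value_counts()
print(year_counts)

# The distinct carcass size strings, to eyeball the pattern
sizes = df['Carcass Size'].unique()
print(sizes)
```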

In [None]:
# get the info()


In [None]:
# use value_counts() --> what's the `Year` range?


In [None]:
# let's also peep the unique() carcass size values to get the pattern


### Come up with a list of questions

- What's the longest gator in our data?
- Average length by year?
- How many gators are killed by month?

### Write a function to calculate gator length in inches

Right now, the value for the gator's length is a string following this pattern: `{} ft. {} in.`.

Let's create a new column to get the gator's length in a constant, numeric value: inches.

We're going to write a function to do these steps:
- Given a row of data, capture the feet and inch values in the carcass size column -- we can split the string on 'ft.' and clean up each piece from there
- Multiply feet by 12
- Add that number to the inch value
- `return` the result

We'll apply this function to each row of the data frame using the [`.apply()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) method.

👉 Learn more about functions in [this notebook](../reference/Functions.ipynb).

👉 Learn more about using the `apply()` method in [this notebook](../reference/Using%20the%20apply%20method%20in%20pandas.ipynb).

👉 Learn more about string methods like `split()` in [this notebook](../reference/Python%20data%20types%20and%20basic%20syntax.ipynb#String-methods).
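
The steps above could be sketched like this on toy data. The function name `get_length_in` and the `Carcass Size` column name are assumptions, and the sketch assumes every value matches the `{} ft. {} in.` pattern exactly; messier real-world values would need some error handling.

```python
import pandas as pd


def get_length_in(row):
    '''Given a row of gator data, parse out gator length in inches'''

    # get the carcass size string ('Carcass Size' is an assumed column name)
    carcass_size = row['Carcass Size']

    # split on 'ft.' -- '9 ft. 2 in.' becomes ['9 ', ' 2 in.']
    pieces = carcass_size.split('ft.')

    # first item is the number of feet: strip whitespace, coerce to integer
    feet = int(pieces[0].strip())

    # second item is the inches: drop 'in.', strip whitespace, coerce to integer
    inches = int(pieces[1].replace('in.', '').strip())

    # return inches plus ft*12
    return inches + feet * 12


# Toy data -- values are made up for illustration
df = pd.DataFrame({'Carcass Size': ['9 ft. 2 in.', '11 ft. 0 in.', '7 ft. 11 in.']})

# axis=1 hands the function one row at a time
df['length_in'] = df.apply(get_length_in, axis=1)

# longest gators first
print(df.sort_values('length_in', ascending=False).head())
```

Passing `axis=1` matters here: without it, `apply()` would hand the function whole columns instead of rows.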

In [None]:
# define a function to calculate inches from the length string

    '''Given a row of gator data, parse out gator length in inches'''

    # get the carcass size string

    
    # split on 'ft.'

    
    # grab the first item in the resulting list [0] - the number of feet
    # strip whitespace
    # coerce to integer

    
    # get the second item [1] in that list - the number of inches
    # replace 'in.' with nothing
    # strip whitespace
    # coerce to integer

    
    # return inches plus ft*12


In [None]:
# apply our new function, specifying axis=1
# for row-wise application


In [None]:
# check the output with head()


In [None]:
# sort by length descending on our new column, check it out with head()


### Count by year

Our friend `value_counts()` is _on it_.

In [None]:
# value_counts() on year column


### Average length by year

To get the average length of gators by year, we'll run a [pivot table](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html).

👉 For more details on creating pivot tables, [see this notebook](../reference/Grouping%20data%20in%20pandas.ipynb#pivot_table()).
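
A minimal sketch of that pivot on toy data, assuming the new numeric column is called `length_in` and the year column is `Year` (the values are made up):

```python
import pandas as pd

# Toy data -- values are invented for illustration
df = pd.DataFrame({
    'Year': [2014, 2014, 2015, 2015],
    'length_in': [110, 90, 132, 100],
})

# One row per year, holding the mean of length_in for that year
avg_by_year = pd.pivot_table(
    df,
    values='length_in',
    index='Year',
    aggfunc='mean',
)

print(avg_by_year)
```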

In [None]:
# get average length harvested by year
# pivot table values are inch length column
# index is Year
# aggfunc is 'mean'


### Treating dates as dates

This data includes the date on which each gator was killed, but the date values are stored as strings. If we want to do some time-based analysis -- comparing the gator hunt by month, say -- we'd want to work with native dates.

The dates are formatted month-day-year, so let's use the [`to_datetime()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html) function to convert them into native date objects. We'll tell pandas to use the [correct date specification](http://strftime.org/) and to coerce errors to null values rather than throw a giant exception.

👉 For more information on handling dates in pandas, [see this notebook](../reference/Date%20and%20time%20data%20types.ipynb#Working-with-dates-in-pandas).
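
Here's a small sketch of the conversion on toy data. The `Harvest Date` column name is an assumption, and the third value is deliberately bad to show what coercing errors does.

```python
import pandas as pd

# Toy data -- 'Harvest Date' is an assumed column name; the last value is junk on purpose
df = pd.DataFrame({'Harvest Date': ['09-12-2014', '10-03-2015', 'not a date']})

# Parse month-day-year strings; anything unparseable becomes NaT (null) instead of raising
df['harvest_date_clean'] = pd.to_datetime(
    df['Harvest Date'],
    format='%m-%d-%Y',
    errors='coerce',
)

print(df)

# The new column should show up as a datetime type
print(df.dtypes)
```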

In [None]:
# new column, harvest_date_clean
# to datetime, pattern is '%m-%d-%Y'
# coerce errors


In [None]:
# check the results with head()


If you want to double-check that the data type is correct, you can access the `dtypes` attribute.

In [None]:
# check dtypes


### Gator hunt by month

[According to](http://myfwc.com/media/310257/Alligator-processors.pdf) the Florida Fish and Wildlife Conservation Commission, the gator hunt season is in the fall:

![gatorhunt](../img/gatorhunt.png "gatorhunt")

Let's look at the totals by month:
- Create a new column for the month using `apply()` with a [lambda expression](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions) -- we'll access the `month` attribute of the date
- Do value counts by month

👉 For more information on using lambda expressions, [see this notebook](../reference/Functions.ipynb#Lambda-expressions).
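
Those two steps might look something like this on toy data, assuming the cleaned date column is called `harvest_date_clean` (the dates are made up):

```python
import pandas as pd

# Toy data -- dates are invented for illustration
df = pd.DataFrame({
    'harvest_date_clean': pd.to_datetime(['2014-09-12', '2014-10-03', '2015-09-20'])
})

# Pull the month number off each date with a lambda expression
df['month'] = df['harvest_date_clean'].apply(lambda x: x.month)

# Count records per month, ordered by month number rather than by count
month_counts = df['month'].value_counts().sort_index()

print(month_counts)
```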

In [None]:
# new column, month, extracted from our clean harvest date column
# in a lambda expression


In [None]:
# get unique() month values


In [None]:
# value_counts() on month, sort_index()


What if we wanted to get a count by month _by year_? Pivot tables to the rescue, again.

We'll provide the `pivot_table` function with five things:
- `df` specifies what data frame we're pivoting
- `index='month'` specifies the column we're grouping on
- `columns='Year'` specifies the column whose values become the pivot table's column headers
- `aggfunc='count'` tells pandas how to aggregate the data -- we want to count the values
- `values='length_in'` specifies the column of data to apply the aggregation to -- we're going to count up every record of a carcass that has a length
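
Putting those five pieces together on toy data could look like this (the values are made up; the column names follow the ones used above):

```python
import pandas as pd

# Toy data -- values are invented for illustration
df = pd.DataFrame({
    'Year': [2014, 2014, 2014, 2015, 2015],
    'month': [9, 9, 10, 9, 11],
    'length_in': [110, 90, 132, 100, 95],
})

# Rows are months, columns are years, cells count the records with a length
by_month_by_year = pd.pivot_table(
    df,
    index='month',
    columns='Year',
    values='length_in',
    aggfunc='count',
)

print(by_month_by_year)
```

Month/year combinations with no records come back as `NaN` rather than `0`.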

In [None]:
# pivot to get by month by year


All those `NaN`s mixed in with our numbers give me the fantods. Let's use the `.fillna()` method to replace them with `0`.
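
A quick sketch of `.fillna()` on a small hand-built table of counts. The numbers are made up, and `counts` is just a stand-in name for a pivot table with gaps.

```python
import pandas as pd

# Hand-built stand-in for a month-by-year pivot table with gaps (values are made up)
counts = pd.DataFrame(
    {2014: [2.0, 1.0, float('nan')], 2015: [1.0, float('nan'), 1.0]},
    index=[9, 10, 11],
)

# Replace every NaN with 0; fillna() returns a new data frame
filled = counts.fillna(0)

print(filled)
```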

In [None]:
# fillna(0)


In [None]:
# what else?