Let's start with a brief introduction to Pandas before creating some graphs with Plotly. We'll explore a simple data set from the state of Delaware's Open Data portal. Specifically, we'll be using the [Aerial Waterfowl Survey Data](https://data.delaware.gov/Energy-and-Environment/Aerial-Waterfowl-Survey-Data/bxyv-7mgn). 

Tips for those new to Python:
* Lines starting with '#' are comments
* Run the cells in order!

In [None]:
# Run this if you need to install the requirements (Prefixing with ! runs it as a shell command).
#!pip install -r requirements.txt

In [None]:
# Import the libraries we're using
import pandas as pd

In [None]:
# Load the waterfowl data into a dataframe
url = "https://data.delaware.gov/api/views/bxyv-7mgn/rows.csv?accessType=DOWNLOAD"
waterfowl_df = pd.read_csv(url)

In [None]:
# Check the number of rows and columns against the data on the portal!
print(len(waterfowl_df), len(waterfowl_df.columns))

In [None]:
# Look at the first few rows of data. Compare to the data on the data portal!
waterfowl_df.head()

In [None]:
# Look at the last few rows:
waterfowl_df.tail()

In [None]:
# Pandas has a handy describe() function.
# count is the number of non-missing values in each column (missing values show up as NaN, "Not a Number").
# Look at the mean, median (50%), and max.
waterfowl_df.describe()
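
To see what `count` means in the `describe()` output, here is a minimal sketch on a tiny made-up frame. The column names and values are hypothetical, not taken from the survey. `count` reports non-missing values only, so a column that contains NaN shows a count smaller than the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the real survey: one missing value in 'Mallard'
toy_df = pd.DataFrame({
    "Mallard": [10, np.nan, 30, 40],
    "Canada Goose": [5, 15, 25, 35],
})

summary = toy_df.describe()

# count skips NaN, so 'Mallard' has fewer values than the frame has rows
print(summary.loc["count"])
# The 50% row is the median of the non-missing values
print(summary.loc["50%"])
```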

In [None]:
# Let's sum the numeric columns to help choose which birds to look at
# (numeric_only=True skips text columns such as Month)
waterfowl_df.sum(numeric_only=True)

In [None]:
# Let's look at the number of rows for each year
# (count() reports the non-missing values in each column)
waterfowl_df.groupby('Year').count()
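
One caveat with `groupby(...).count()`: it counts non-missing values column by column, so it matches the number of rows only for columns with no missing data. If the goal is rows per group, `size()` may be the more direct tool. A small sketch, assuming a made-up frame with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two years, one missing count in 2011
toy_df = pd.DataFrame({
    "Year": [2010, 2010, 2011, 2011, 2011],
    "Mallard": [10, 20, np.nan, 40, 50],
})

# count(): non-missing values per column, per year
per_column = toy_df.groupby("Year").count()

# size(): number of rows per year, regardless of missing values
rows_per_year = toy_df.groupby("Year").size()

print(per_column)
print(rows_per_year)
```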

In [None]:
# ***********************

# Copy the previous command, and paste it below.
# Before running, edit it to get the sum by year


In [None]:
# Most years have 44 rows, but there are definitely some discrepancies.
# Let's look at the January counts for each year
waterfowl_df_january = waterfowl_df[waterfowl_df['Month']=='January']
waterfowl_df_january.groupby('Year').count()

In [None]:
# Through 2010, January has 11 observations per year.
# From 2011 on, it has 14. Let's look at 2010 and 2011.
# (Build the mask from waterfowl_df_january itself so the indexes match.)
waterfowl_df_january[waterfowl_df_january['Year'].isin([2010, 2011])]
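
When filtering an already-filtered frame, it is usually safest to build the boolean mask from that same frame, so the mask and the data share an index. The sketch below illustrates this on made-up data (the values are hypothetical) and shows an equivalent one-step filter that combines both conditions with `&`.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey table
toy_df = pd.DataFrame({
    "Year": [2009, 2010, 2010, 2011, 2011],
    "Month": ["January", "January", "November", "January", "December"],
})

january = toy_df[toy_df["Month"] == "January"]

# Build the mask from the filtered frame so its index matches exactly
january_2010_2011 = january[january["Year"].isin([2010, 2011])]

# Equivalent single step: combine both conditions on the original frame
combined = toy_df[(toy_df["Month"] == "January") & toy_df["Year"].isin([2010, 2011])]

print(january_2010_2011)
print(combined.equals(january_2010_2011))
```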

In [None]:
# ***********************

# 2011 has three observations with 'Time Period' set to 'Late'.
# Remove the observations where 'Time Period' is 'Late'
# (in other words, keep the observations where 'Time Period' does not equal (!=) 'Late')
waterfowl_df_january_sub = waterfowl_df_january[waterfowl_df_january['Time Period']!='Late']

# Now check the counts by year again
waterfowl_df_january_sub.groupby('Year').count()

In [None]:
# Let's check unit counts
waterfowl_df_january_sub.groupby('Unit').count()

### Done Part 1
We now have 11 observations for each year, and each of the 11 observations has a different unit number. I think we now have data we can safely compare from year to year.
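
That conclusion can also be checked in code rather than by eye. The sketch below is only an illustration: it uses a small made-up frame with the same `Year`, `Unit`, and `Time Period` column names as the cells above, and the values (including the 'Early' label) are hypothetical. It tests that every year has the same number of rows and that each row within a year belongs to a distinct unit.

```python
import pandas as pd

# Hypothetical toy data: 2011 has an extra 'Late' observation for unit 1
toy_df = pd.DataFrame({
    "Year": [2010, 2010, 2011, 2011, 2011],
    "Unit": [1, 2, 1, 2, 1],
    "Time Period": ["Early", "Early", "Early", "Early", "Late"],
})

# Keep only the observations that are not 'Late'
toy_sub = toy_df[toy_df["Time Period"] != "Late"]

rows_per_year = toy_sub.groupby("Year").size()
units_per_year = toy_sub.groupby("Year")["Unit"].nunique()

# Every year has the same number of rows, and each row is a distinct unit
same_rows_every_year = rows_per_year.nunique() == 1
units_are_distinct = (rows_per_year == units_per_year).all()
print(same_rows_every_year, units_are_distinct)
```

The same two checks could be run on `waterfowl_df_january_sub` in place of `toy_sub`.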