# eBird geocoding project

My ultimate goal is to visualize the checklist locations in a person's eBird data. For now, I'm using this notebook to practice manipulating the data with pandas; after that I'll move on to the geocoding and visualization parts.

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

Import the eBird data and view some basic info as a sanity check.

In [None]:
df = pd.read_csv('StevenSelfeBirdData.csv')

In [None]:
# info() prints its summary directly and returns None, so wrapping it in print() would add a stray "None"
df.info()

The only columns I care about right now are: Submission ID (a unique ID for a checklist), state/province, county, location ID, location, latitude, longitude, and date. The raw file has one row per species observation, so a checklist appears once for every species on it. If I drop all other columns and then deduplicate on Submission ID, I should be left with one row per checklist, showing each location and when I visited it.

In [None]:
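# Positions 0 and 5-11 are the columns listed above (this relies on the export's column order).
# .copy() makes an independent frame, so adding a column later cannot trigger a SettingWithCopyWarning.
df_locations = df.iloc[:, [0,5,6,7,8,9,10,11]].drop_duplicates(subset='Submission ID').copy()
print(df_locations.head())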
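
Selecting by position only works while the export keeps its column order, so selecting by name may be more robust. The sketch below runs on a tiny made-up frame, not the real file. The headers `State/Province` and `County` are assumptions (the other names appear elsewhere in this notebook), so check them against `df.columns` before reusing the list.

```python
import pandas as pd

# Tiny made-up sample standing in for the eBird export (values are illustrative only).
sample = pd.DataFrame({
    "Submission ID": ["S1", "S1", "S2", "S3"],
    "Common Name": ["Mallard", "Canada Goose", "Mallard", "Osprey"],
    "State/Province": ["US-WA", "US-WA", "US-WA", "US-OR"],  # assumed header name
    "County": ["King", "King", "King", "Lane"],              # assumed header name
    "Location ID": ["L1", "L1", "L1", "L2"],
    "Location": ["Park A", "Park A", "Park A", "Lake B"],
    "Latitude": [47.6, 47.6, 47.6, 44.0],
    "Longitude": [-122.3, -122.3, -122.3, -123.1],
    "Date": ["2020-05-01", "2020-05-01", "2021-06-02", "2021-07-03"],
})

# Columns to keep, by name instead of by position.
keep = [
    "Submission ID", "State/Province", "County", "Location ID",
    "Location", "Latitude", "Longitude", "Date",
]

# One row per checklist: species-level rows sharing a Submission ID collapse to the first.
sample_locations = sample[keep].drop_duplicates(subset="Submission ID").copy()
print(sample_locations)
```

If a name in `keep` is missing from the frame, pandas raises a `KeyError`, which is easier to notice than a positional selection silently picking the wrong column.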

View the most visited locations, just for fun.

In [None]:
# The defaults already count raw occurrences, sort descending, and drop NaN
df_locations['Location'].value_counts()

Add a new column for the year, taken from the first four characters of the date.

In [None]:
# The first four characters of the date string are the year (kept as a string)
df_locations['Year'] = df_locations['Date'].str[:4]
print(df_locations.sort_values('Location ID').head())
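
Slicing the first four characters assumes every date is a string that starts with a four-digit year. An alternative is to parse the column and read the year from the resulting datetimes. This is a minimal sketch on made-up dates; the `%Y-%m-%d` format is an assumption about the export, and the year comes out as an integer rather than a string.

```python
import pandas as pd

# Made-up dates in the assumed YYYY-MM-DD format.
dates = pd.DataFrame({"Date": ["2019-12-31", "2020-01-01", "2021-07-03"]})

# Parse to datetimes, then pull out the year component as an integer.
dates["Year"] = pd.to_datetime(dates["Date"], format="%Y-%m-%d").dt.year
print(dates)
```

With an explicit `format`, a date that does not match raises an error instead of silently producing a wrong year.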

Create two new dataframes: one with number of checklists per location per year, and one with the data about each location. Then we'll combine those two to get the final dataframe.

*There is almost certainly a more efficient way to do this with pandas, but I can't figure it out.*

In [None]:
df_loc_count = df_locations.groupby(["Location ID","Year"], as_index=False)["Submission ID"].count()
print(df_loc_count.sort_values('Location ID').head(10))
print(df_loc_count.shape)

In [None]:
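# Positions within df_locations: Location ID, Location, state/province, county, latitude, longitude
df_loc_data = df_locations.iloc[:, [3,4,1,2,5,6]].drop_duplicates(subset='Location ID')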
print(df_loc_data.sort_values('Location ID').head(10))
print(df_loc_data.shape)

In [None]:
# Merge df_loc_count (count of checklists per location) with df_loc_data (geo data about each location)
df_merge2 = pd.merge(df_loc_count, df_loc_data, how="left", on="Location ID")
df_merge2 = df_merge2.rename(columns={'Submission ID': 'Checklists'}).sort_values(['Year','Location ID'])
print(df_merge2.head(10))
print(df_merge2.shape)
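
One possible shortcut for the count-then-merge steps above is a single `groupby` with named aggregation, which counts checklists and carries the location details along in one pass. This is a sketch on made-up data rather than a drop-in replacement. It assumes each Location ID always has the same name and coordinates, so taking the `first` value per group matches what the deduplicate-and-merge approach produces.

```python
import pandas as pd

# Made-up, already-deduplicated checklist rows (one row per Submission ID).
visits = pd.DataFrame({
    "Submission ID": ["S1", "S2", "S3", "S4"],
    "Location ID": ["L1", "L1", "L1", "L2"],
    "Location": ["Park A", "Park A", "Park A", "Lake B"],
    "Latitude": [47.6, 47.6, 47.6, 44.0],
    "Longitude": [-122.3, -122.3, -122.3, -123.1],
    "Year": ["2020", "2020", "2021", "2021"],
})

# Count checklists per (Year, Location ID) and keep the location details in the same step.
summary = (
    visits.groupby(["Year", "Location ID"], as_index=False)
    .agg(
        Checklists=("Submission ID", "count"),
        Location=("Location", "first"),
        Latitude=("Latitude", "first"),
        Longitude=("Longitude", "first"),
    )
)
print(summary)
```

The result is already sorted by year and location ID, because `groupby` sorts its keys by default.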

In [None]:
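# Marker size uses the square root of the checklist count so heavily visited
# locations do not dwarf the rest; color shows the raw count.
fig = px.scatter_geo(df_merge2,
                     lat='Latitude',
                     lon='Longitude',
                     title='ebird checklists',
                     size=df_merge2['Checklists']**0.5,
                     hover_name='Location',
                     color='Checklists',
                     animation_frame='Year',
                     height=600)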
fig.show()

In [None]:
# Styling for world map
fig.update_geos(projection_type="natural earth",
               showcountries=True,
               showsubunits=True)
fig.show()

In [None]:
# Save figure as html file
fig.write_html("world_map.html")

In [None]:
# Styling for US map
# TODO: update_geos modifies fig in place, so the world-map styling above carries over.
# For now, re-run the cell that creates fig before running this cell; find a better way to handle that.
fig.update_geos(scope='usa',
                showcountries=True,
                showsubunits=True
               )
fig.show()

In [None]:
# Save figure as html file
fig.write_html("usa_map.html")