# MyEBird location map

This notebook will guide you through creating map visualizations of the locations where you have submitted eBird checklists.

To use this notebook, you will need to download your data from eBird.  That data should be in CSV format (which is how it comes when you download it from eBird), and the file should be in the same folder as this Jupyter notebook.

### Setting up and importing our data

The first thing we need to do is import the Python libraries needed for this project.

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

Next, we'll import your eBird data into a dataframe that we can work with.  

Change the filename in the cell below to match the filename of your eBird data.

In [None]:
# Import data and create dataframe
df = pd.read_csv('MyEBirdData_2023-12-30.csv')

In [None]:
# Print info about the dataframe, to check it looks correct (This step is optional, but helps to make sure the data imported okay)
print(df.info())
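
If you'd like a more explicit check than reading the `df.info()` output, here is a minimal sketch that looks for the columns this notebook relies on. It runs on a tiny made-up sample instead of your real file. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names used only for illustration, and the column names are assumed from the list in the next section, so compare them with your own export.

```python
import io
import pandas as pd

# Column names this notebook relies on (assumed; compare with your own export)
REQUIRED_COLUMNS = [
    "Submission ID", "State/Province", "County", "Location ID",
    "Location", "Latitude", "Longitude", "Date",
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the dataframe."""
    return [name for name in required if name not in frame.columns]

# Tiny made-up sample standing in for the real CSV file
sample_csv = io.StringIO(
    "Submission ID,Common Name,State/Province,County,Location ID,"
    "Location,Latitude,Longitude,Date\n"
    "S1,Mallard,US-NY,Kings,L1,Prospect Park,40.66,-73.97,2023-05-01\n"
)
sample_df = pd.read_csv(sample_csv)

# An empty list means every required column was found
missing = missing_columns(sample_df)
print(missing)
```

To run the same check on your own data, you could call `missing_columns(df)` after the import cell.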

### Data preparation

The eBird data contains a lot of information that we don't need.  The only columns we care about are: Submission ID (a unique ID for a checklist), State/Province, County, Location ID, Location, Latitude, Longitude, and Date.  So we will create a new dataframe ("df_locations") that only includes those columns.

We also only want one row for each checklist.  The eBird data includes a row for every species recorded in each checklist, but we are only interested in the checklists themselves.  So we will drop duplicates based on the Submission ID column, so that the dataframe only includes one row for each checklist.

In [None]:
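# Create new df with desired columns, and drop duplicates based on Submission ID
# (Columns are picked by position, so this assumes the column order of the eBird download hasn't been changed)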
df_locations = df.iloc[:, [0,5,6,7,8,9,10,11]].drop_duplicates(subset='Submission ID')

# Display first 5 rows (again, this is optional, but it helps see the data that we have)
print(df_locations.head())
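
The cell above picks columns by position, which only works if the columns are in the expected order. As a sketch of an alternative, the same step can select columns by name. The data below is made up, and the column names are assumed to match the headers in your export, so adjust them if yours differ.

```python
import pandas as pd

# Made-up rows: two species on checklist S1, one species on checklist S2
sample = pd.DataFrame({
    "Submission ID": ["S1", "S1", "S2"],
    "Common Name": ["Mallard", "Canada Goose", "Mallard"],
    "State/Province": ["US-NY", "US-NY", "US-NY"],
    "County": ["Kings", "Kings", "Queens"],
    "Location ID": ["L1", "L1", "L2"],
    "Location": ["Prospect Park", "Prospect Park", "Jamaica Bay"],
    "Latitude": [40.66, 40.66, 40.62],
    "Longitude": [-73.97, -73.97, -73.82],
    "Date": ["2023-05-01", "2023-05-01", "2023-06-10"],
})

# Columns to keep, listed by name instead of by position
wanted = ["Submission ID", "State/Province", "County", "Location ID",
          "Location", "Latitude", "Longitude", "Date"]

# Keep only those columns, then keep one row per checklist
sample_locations = sample[wanted].drop_duplicates(subset="Submission ID")
print(sample_locations)
```

Selecting by name raises a `KeyError` if a column is missing, which is usually easier to diagnose than silently picking up the wrong column.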

Now that we have the data about checklists, including locations and dates, we can start seeing what our most visited birding locations are.

Let's view the most visited locations by counting the number of times each location name appears in df_locations.

In [None]:
# Count number of occurrences of each location and sort by number of occurrences
df_locations['Location'].value_counts()

We're going to want to separate the checklists by year, but the existing data only has a column with the full date.  Let's create a new column called "Year" by taking the first four characters of the Date column.

In [None]:
# Add new column to df_locations by using the first four characters of the Date column
df_locations['Year'] = df_locations['Date'].str[:4]

# Again, print the first five rows to check that the new column looks correct
print(df_locations.sort_values('Location ID').head())
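
Slicing the first four characters keeps the year as text. If you'd rather have the dates validated and the year stored as a number, a sketch using `pd.to_datetime` is shown below. The sample dates are made up, and the `%Y-%m-%d` format is an assumption, so check it against the Date column in your own file.

```python
import pandas as pd

# Made-up dates in the assumed YYYY-MM-DD layout
dates = pd.Series(["2021-03-14", "2023-12-30"])

# Slicing, as in the cell above: the year stays a string
year_text = dates.str[:4]

# Parsing: raises an error on malformed dates and yields an integer year
year_parsed = pd.to_datetime(dates, format="%Y-%m-%d").dt.year

print(year_text.tolist())
print(year_parsed.tolist())
```

Either version works as the animation frame for the maps later on; the parsed version just fails loudly if a date is not in the expected format.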

Okay, this is where things start to get complicated.  What we want is a list of the locations, with all of their location data (like state, county, latitude, and longitude), plus a new column with the number of checklists at that location in each year.

We're going to create two new dataframes: one with just the location ID, year, and number of checklists at that location per year (we'll call this one "df_loc_count"); and one with all of the data about each location (we'll call this one "df_loc_data").  Then we'll combine those two to get the final dataframe, using the location ID as the matching point.

*There is almost certainly a more efficient way to do this with pandas, but I haven't been able to figure it out.*

In [None]:
# Create df_loc_count by using groupby on the Location ID and Year columns
df_loc_count = df_locations.groupby(["Location ID","Year"], as_index=False)["Submission ID"].count()

# Rename the "Submission ID" column to "Checklists", because it now includes the number of checklists at each location per year
df_loc_count = df_loc_count.rename(columns={'Submission ID': 'Checklists'})

# Print the first 10 rows and overall shape of the dataframe to check that everything looks right
print(df_loc_count.sort_values('Location ID').head(10))
print(df_loc_count.shape)

In [None]:
# Create df_loc_data by selecting the desired columns and deduping on Location ID so that we have one row per location
df_loc_data = df_locations.iloc[:, [3,4,1,2,5,6]].drop_duplicates(subset='Location ID')

# As always, display the first 10 rows and overall shape of the dataframe to confirm
print(df_loc_data.sort_values('Location ID').head(10))
print(df_loc_data.shape)

In [None]:
# Merge df_loc_count (count of checklists per location) with df_loc_data (geo data about each location)
df_merge2 = pd.merge(df_loc_count, df_loc_data, how="left", on="Location ID")
df_merge2 = df_merge2.sort_values(['Year','Location ID'])

# You know the drill: display some info to confirm that things look right
print(df_merge2.head(10))
print(df_merge2.shape)
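
One possible shortcut for the three cells above, offered as a sketch and not as a tested replacement: a single `groupby` with named aggregation can count the checklists and carry the location details along in one step. The data and the `summary` name below are made up for illustration.

```python
import pandas as pd

# Made-up checklists: three at L1 (two in 2022, one in 2023), one at L2
checklists = pd.DataFrame({
    "Submission ID": ["S1", "S2", "S3", "S4"],
    "Location ID": ["L1", "L1", "L2", "L1"],
    "Location": ["Prospect Park", "Prospect Park", "Jamaica Bay", "Prospect Park"],
    "Latitude": [40.66, 40.66, 40.62, 40.66],
    "Longitude": [-73.97, -73.97, -73.82, -73.97],
    "Year": ["2022", "2022", "2022", "2023"],
})

# Count checklists per location and year, keeping the first value seen
# for each descriptive column within that group
summary = (
    checklists
    .groupby(["Location ID", "Year"], as_index=False)
    .agg(
        Checklists=("Submission ID", "count"),
        Location=("Location", "first"),
        Latitude=("Latitude", "first"),
        Longitude=("Longitude", "first"),
    )
    .sort_values(["Year", "Location ID"])
)
print(summary)
```

One difference to be aware of: this takes the location details from the first row within each location and year, while the merge above takes them from the first row for the location overall. The two only differ if a location's name or coordinates vary between rows of your data.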

### Let's make some maps!

Now that we've got the data in the format we want, we can start making some visualizations.

We'll start with just a global map of all of the locations where you have birded, separated by year.  The color and size of each dot correspond to how many checklists you submitted at that location in that year (the size grows with the square root of the count).

Press play to see the map animate and show your birding over time, or select a year to see the map for that year.  You can also zoom in, and hover over a dot to see information about that location.

In [None]:
# Create world map figure
fig_world = px.scatter_geo(df_merge2, lat='Latitude', lon='Longitude',
                     title='eBird checklists',
                     size=df_merge2['Checklists']**0.5,  # square root keeps busy locations from dwarfing the rest
                     hover_name='Location', color='Checklists',
                     animation_frame='Year', 
                     height=600)

# Styling for world map
fig_world.update_geos(projection_type="natural earth",
               showcountries=True,
               showsubunits=True)
fig_world.show()
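
The map cell sizes each dot by the square root of the checklist count instead of the raw count. A small sketch with made-up counts shows the effect: the square root compresses the range, so a location with hundreds of checklists does not dwarf one with a single visit.

```python
import pandas as pd

# Made-up checklist counts for four locations
counts = pd.Series([1, 4, 25, 400], name="Checklists")

# Same transformation as the size argument in the map cell
marker_size = counts ** 0.5

# Compare how far apart the largest and smallest values are, before and after
raw_ratio = counts.max() / counts.min()
size_ratio = marker_size.max() / marker_size.min()
print(raw_ratio, size_ratio)
```

Here the busiest location has 400 times the checklists of the quietest one, but its size value is only about 20 times larger.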

If you want to save that map, run the cell below to save it as an HTML file.

In [None]:
# Save world map figure as html file
fig_world.write_html("world_map.html")

I live in the US, so I'd like to see just my birding data for that country (since it's where I do most of my birding).  So the cell below creates another map, just scoped to the US.

In [None]:
# Create US map figure
fig_usa = px.scatter_geo(df_merge2, lat='Latitude', lon='Longitude',
                     title='eBird checklists',
                     size=df_merge2['Checklists']**0.5,  # square root keeps busy locations from dwarfing the rest
                     hover_name='Location', color='Checklists',
                     animation_frame='Year', 
                     height=600)

# Styling for US map
fig_usa.update_geos(scope='usa',
                showcountries=True,
                showsubunits=True)
fig_usa.show()

Again, if you want to save that map, the cell below will save it as an HTML file.

In [None]:
# Save US map figure as html file
fig_usa.write_html("usa_map.html")

🐦 And that's it!  I hope you enjoy seeing a visual representation of your birding!