# Black Skimmer (Rynchops niger) Data Exploration of US Gulf Coast Populations

## About this Project
This project analyzes population trends of the Black Skimmer along the US Gulf Coast, from Texas to Florida. Using data collected between 2010 and 2023, my goal is to identify shifts in the distribution and abundance of these birds over that period. To do this, I will merge two datasets: checklists submitted to the Cornell University eBird database, and species counts from aerial surveys of the Gulf Coast barrier islands, a collaborative effort led by The Water Institute. Combining the two should give a more complete picture of avian populations along the Gulf Coast than either dataset could provide alone.

I chose the Black Skimmer for two reasons. First, I am personally fascinated by this species and have spent many early mornings watching these birds skim across the water's surface to catch fish. Second, and more importantly, the Black Skimmer's situation is precarious: its nesting habitat, coastal sandy or gravelly bars, faces substantial threats from pollution and coastal erosion.


## About the Data

### Cornell University eBird Data

The eBird dataset, sourced from The Cornell Lab of Ornithology, comprises checklist submissions from January 2010 to November 2023 in Texas, Louisiana, Mississippi, Alabama, and Florida. It records each submitted checklist on which a Black Skimmer was reported. I also obtained the sampling effort data for the same time frame, which makes it possible to analyze both where Black Skimmers were observed and where checklists were submitted without one.

Citation: eBird Basic Dataset. Version: EBD_relNov-2023. Cornell Lab of Ornithology, Ithaca, New York. Nov 2023.

[More Info: https://ebird.org/about](https://ebird.org/about)

### The Water Institute Aerial Survey Data

The second dataset was sourced from The Water Institute's ["Avian Data Monitoring Portal"](https://experience.arcgis.com/experience/010503b4c64b4ff6a7f3570220a53647/page/Project-Information/). It comprises aerial surveys conducted via fixed-wing aircraft along the Gulf Coast from 2010 to 2021. These surveys aimed to count nests and nesting pairs of various avian species, and their spatial extent varied from year to year. The surveys were conducted primarily in May and June, with exceptions in 2010 due to airspace constraints during spill response and delayed identification of survey needs in certain areas. Nests were counted and categorized by species and nesting status. The data serve to assess distribution trends, relative abundance, nest quantities, and breeding statuses of these avian populations.

[More Info: "2010-2021 COLONIAL WATERBIRD DATA READ-ME"](https://twi-aviandata.s3.amazonaws.com/avian_monitoring/dotting_information/ColibriReadMe_2010-2021_Final.pdf)


## Prepping the notebook

Below are the required libraries for this project. I installed them into a conda environment using the conda package manager, or the pip package manager when a library was not available in conda. You may also install them from this notebook by removing the # symbol in front of the install commands.

Once we have installed the necessary libraries into our environment we will import them into the notebook.

In [1]:
# Required Libraries to install in your environment
#!pip install pandas
#!pip install geopandas
#!pip install shapely
#!pip install simplekml


In [5]:
# Below are the library imports needed for this project
import pandas as pd
import numpy as np
import geopandas as gpd
import glob
from shapely.geometry import Point, Polygon
import simplekml



## Importing the files into our notebook

We're working with multiple files that need to be imported, cleaned and then combined.

The first set are all of the eBird checklists from 01-2010 to 11-2023 in TX, LA, MS, AL and FL that contain an observation of a Black Skimmer.

The second set is the sampling event data from the same time frame and locations. eBird provides this sampling data alongside the checklist data. It will help us understand when Black Skimmers were NOT observed on submitted checklists.

The third dataset is The Water Institute data discussed above.
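
As a rough illustration of how the sampling event data can reveal non-detections, here is a minimal sketch using toy DataFrames rather than the real files. It assumes both tables share the `SAMPLING EVENT IDENTIFIER` column and that there is at most one skimmer row per checklist; the `blsk_detected` column name is my own invention for this example.

```python
import pandas as pd

# Toy stand-ins for the two eBird tables (made-up values, not real data).
sampling = pd.DataFrame({
    "SAMPLING EVENT IDENTIFIER": ["S1", "S2", "S3"],
    "OBSERVATION DATE": ["2015-05-01", "2015-05-01", "2015-05-02"],
})
observations = pd.DataFrame({
    "SAMPLING EVENT IDENTIFIER": ["S1", "S3"],
    "OBSERVATION COUNT": ["12", "X"],
})

# Left-join so every checklist is kept, whether or not it has a skimmer record.
merged = sampling.merge(
    observations,
    on="SAMPLING EVENT IDENTIFIER",
    how="left",
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# A checklist found in both tables had a Black Skimmer reported on it.
merged["blsk_detected"] = merged["_merge"].eq("both")
merged = merged.drop(columns="_merge")

print(merged)
```

A non-detection is only meaningful when the observer reported every species they saw, so in practice this step would probably be limited to complete checklists first.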

In [8]:
# importing eBird data into the notebook

blsk_ebird_file_paths = glob.glob('data_files/ebird_blsk_data/*.txt') # using glob to make a list for blsk observation data

sample_data_file_paths = glob.glob('data_files/ebird_sampling_data/*.txt') # using glob to make a list for sample data to gauge overall bird observation effort

# importing the water institute data into the notebook

twi_file_path = 'data_files/the_water_institute_data/the_water_institute_20102021.csv'
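
One thing to watch for: `glob.glob` returns an empty list when a pattern matches nothing, and it does not guarantee any particular order. The sketch below shows one way to guard against both. `list_data_files` is a hypothetical helper of my own, not part of any library.

```python
import glob


def list_data_files(pattern):
    """Return the paths matching `pattern` in sorted order.

    Raises FileNotFoundError when nothing matches, so a wrong folder name
    is caught here instead of later when the files are read.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files matched pattern: {pattern}")
    return paths
```

It could be used in place of the bare `glob.glob` calls, for example `list_data_files('data_files/ebird_blsk_data/*.txt')`.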

In [12]:
# Converting the ebird black skimmer observation data files into Pandas DataFrames for cleanup process

# ebird blsk dataframe
ebird_blsk_dfs = [] # initializing a list to store the separate dataframes before combining
for fp in blsk_ebird_file_paths: # a for loop to loop through the file path list, create a dataframe, then add the dataframe to the above list
    df = pd.read_csv(fp, sep='\t', low_memory=False)
    ebird_blsk_dfs.append(df)

ebird_blsk_df = pd.concat(ebird_blsk_dfs, ignore_index=True) #combine all the dataframes into one single dataframe

In [15]:
# Converting the ebird sample data files into Pandas DataFrames for cleanup process

ebird_sample_dfs = [] # initializing a list to store the separate dataframes before combining
for fp in sample_data_file_paths: # a for loop to loop through the file path list, create a dataframe, then add the dataframe to the above list
    df = pd.read_csv(fp, sep='\t', low_memory=False)
    ebird_sample_dfs.append(df)

ebird_sample_df = pd.concat(ebird_sample_dfs, ignore_index=True) #combine all the dataframes into one single dataframe
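
The sampling data is large, so it may be worth loading only the columns that are needed. Below is a minimal sketch of that idea using a tiny in-memory sample instead of a real file. The rows are made up, and the choice of which columns to keep is only an assumption for illustration.

```python
import io

import pandas as pd

# A tiny tab-separated sample standing in for one eBird sampling file.
sample_text = (
    "STATE\tLATITUDE\tLONGITUDE\tOBSERVATION DATE\t"
    "SAMPLING EVENT IDENTIFIER\tLOCALITY\n"
    "Texas\t29.3\t-94.8\t2015-05-01\tS1\tExample Beach\n"
    "Florida\t27.7\t-82.7\t2016-06-10\tS2\tExample Park\n"
)

# Hypothetical subset of columns to keep; adjust to the analysis at hand.
keep_cols = [
    "STATE",
    "LATITUDE",
    "LONGITUDE",
    "OBSERVATION DATE",
    "SAMPLING EVENT IDENTIFIER",
]

slim_df = pd.read_csv(
    io.StringIO(sample_text),
    sep="\t",
    usecols=keep_cols,                 # skip columns we do not need
    dtype={"STATE": "category"},       # repeated strings are cheaper as categories
    parse_dates=["OBSERVATION DATE"],  # parse dates while reading
)

slim_df.info()
```

With the real files, `io.StringIO(sample_text)` would be replaced by the file path inside the loop.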

In [19]:
# Converting The Water Institute data file into a Pandas DataFrame

twi_df = pd.read_csv(twi_file_path, low_memory=False)

## Initial Exploration

Below is an initial exploration of our datasets. My goal is to create a PLAN to do the following:
* Reduce the size by removing unnecessary columns
* Decide which columns need a change of datatype
* Create a column naming convention to align all the datasets
* Decide if I need to restrict the data to a smaller geographic area
* Prep the datasets to be combined into a single dataset
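
To make the naming and datatype steps above more concrete, here is a small sketch on a toy DataFrame. The snake_case convention and the `to_snake_case` helper are just one possible choice, not a decision I have made yet. Note that eBird uses "X" in the count column when a species was present but not counted, which is why that column loads as text.

```python
import pandas as pd


def to_snake_case(name):
    """Lowercase a column name and join its words with underscores."""
    return "_".join(name.strip().lower().split())


# Toy data with made-up values, using column names from the eBird files.
toy = pd.DataFrame({
    "OBSERVATION COUNT": ["12", "X", "3"],
    "OBSERVATION DATE": ["2015-05-01", "2015-05-02", "2016-06-10"],
    "LATITUDE": [29.3, 30.2, 27.7],
})

# Apply the naming convention to every column.
toy = toy.rename(columns=to_snake_case)

# errors="coerce" turns the non-numeric "X" entries into NaN instead of raising.
toy["observation_count"] = pd.to_numeric(toy["observation_count"], errors="coerce")

# Convert the date strings to proper datetime values.
toy["observation_date"] = pd.to_datetime(toy["observation_date"])

print(toy.dtypes)
```

Coercing "X" to NaN keeps those rows as detections with an unknown count. Whether to keep or drop them is a separate decision for the cleanup stage.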

In [16]:
# exploring the columns and data types for the ebird_blsk_df
ebird_blsk_df.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 202675 entries, 0 to 202674
Data columns (total 50 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   GLOBAL UNIQUE IDENTIFIER    202675 non-null  object 
 1   LAST EDITED DATE            202675 non-null  object 
 2   TAXONOMIC ORDER             202675 non-null  int64  
 3   CATEGORY                    202675 non-null  object 
 4   TAXON CONCEPT ID            202675 non-null  object 
 5   COMMON NAME                 202675 non-null  object 
 6   SCIENTIFIC NAME             202675 non-null  object 
 7   SUBSPECIES COMMON NAME      2686 non-null    object 
 8   SUBSPECIES SCIENTIFIC NAME  2686 non-null    object 
 9   EXOTIC CODE                 0 non-null       float64
 10  OBSERVATION COUNT           202675 non-null  object 
 11  BREEDING CODE               1427 non-null    object 
 12  BREEDING CATEGORY           1427 non-null    object 
 13  BEHAVIOR CODE 

In [17]:
ebird_sample_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8456498 entries, 0 to 8456497
Data columns (total 31 columns):
 #   Column                     Dtype  
---  ------                     -----  
 0   LAST EDITED DATE           object 
 1   COUNTRY                    object 
 2   COUNTRY CODE               object 
 3   STATE                      object 
 4   STATE CODE                 object 
 5   COUNTY                     object 
 6   COUNTY CODE                object 
 7   IBA CODE                   object 
 8   BCR CODE                   float64
 9   USFWS CODE                 object 
 10  ATLAS BLOCK                float64
 11  LOCALITY                   object 
 12  LOCALITY ID                object 
 13  LOCALITY TYPE              object 
 14  LATITUDE                   float64
 15  LONGITUDE                  float64
 16  OBSERVATION DATE           object 
 17  TIME OBSERVATIONS STARTED  object 
 18  OBSERVER ID                object 
 19  SAMPLING EVENT IDENTIFIER  object 
 20  PR