# FWS Endangered Species Data

### Data Source:

The endangered species data was collected from the FWS Data Explorer: https://ecos.fws.gov/ecp/report/adhocCreator?catalogId=species&reportId=species&columns=%2Fspecies@cn,sn,status,desc,listing_date&sort=%2Fspecies@cn%20asc;%2Fspecies@sn%20asc

### Module Imports:

In [1]:
# Import the necessary modules
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
sns.set(rc={'figure.figsize': (15, 8)}, color_codes=True)

### Data Cleaning:

In [2]:
df = pd.read_csv('FWS_Species_Data_Explorer.csv')

In [3]:
# check data types and column names
print(df.dtypes)

df.columns

Common Name             object
Scientific Name         object
Scientific Name_url     object
ESA Listing Status      object
Entity Description      object
ESA Listing Date        object
Foreign or Domestic     object
Inverted Common Name    object
Species Group           object
Status Category         object
dtype: object


Index(['Common Name', 'Scientific Name', 'Scientific Name_url',
       'ESA Listing Status', 'Entity Description', 'ESA Listing Date',
       'Foreign or Domestic', 'Inverted Common Name', 'Species Group',
       'Status Category'],
      dtype='object')

In [4]:
# Check how the dataframe looks
df.head()

Unnamed: 0,Common Name,Scientific Name,Scientific Name_url,ESA Listing Status,Entity Description,ESA Listing Date,Foreign or Domestic,Inverted Common Name,Species Group,Status Category
0,Aaa water treader bug,Cavaticovelia aaa,https://ecos.fws.gov/ecp/species/8146,Species of Concern,Wherever found,,Domestic,"Bug, Aaa water treader",Insects,Not Listed
1,Aalbu's cave pseudoscorpion,Archeolarca aalbui,https://ecos.fws.gov/ecp/species/6406,Species of Concern,Wherever found,,Domestic,"Pseudoscorpion, Aalbu's cave",Arachnids,Not Listed
2,Aardhals springsnail,Pyrgulopsis aardahli,https://ecos.fws.gov/ecp/species/4789,Species of Concern,Wherever found,,Domestic,"Springsnail, Aardhals",Snails,Not Listed
3,Aase's onion,Allium aaseae,https://ecos.fws.gov/ecp/species/2608,Species of Concern,Wherever found,,Domestic,"Onion, Aase's",Flowering Plants,Not Listed
4,Abajo daisy,Erigeron abajoensis,https://ecos.fws.gov/ecp/species/6559,Resolved Taxon,Wherever found,,Domestic,"Daisy, Abajo",Flowering Plants,Not Listed


Dropping extra columns and renaming columns for consistency across dataframes:

In [5]:
# Drop the columns that we will not use for the analysis
df = df.drop(['Scientific Name_url'], axis = 1)

# Convert the listing date column to datetime
df["ESA Listing Date"] = pd.to_datetime(df["ESA Listing Date"])

# Rename the remaining columns for easier access
df = df.rename(columns={'Entity Description':'Area',
                        'Species Group':'Group',
                        'ESA Listing Status':'Status',
                        'Foreign or Domestic':'Location',
                        'ESA Listing Date':'Listing Date'
                        })
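
The date conversion above relies on pandas inferring the date format. As a minimal sketch, assuming the raw dates look like `MM-DD-YYYY` (I have not confirmed the export's format, and the toy values below are illustrative, not rows from the file), an explicit `format` together with `errors='coerce'` turns anything unparseable into `NaT` instead of raising:

```python
import pandas as pd

# Toy values standing in for the raw 'ESA Listing Date' column (illustrative only)
raw = pd.Series(['03-11-1967', None, 'not a date'])

# The format string is an assumption about the export, not a confirmed fact
parsed = pd.to_datetime(raw, format='%m-%d-%Y', errors='coerce')

# Missing and unparseable entries both end up as NaT
print(parsed)
print(parsed.isna().sum())
```

Counting `NaT` values before and after the conversion is a quick way to see whether any non-empty dates were lost to parsing.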

Converting the status names to the FWS short codes, to stay consistent with the rest of the project:

In [6]:
# Convert the long-format ESA Listing Status to its short code,
# according to the FWS status codes
# Source: https://ecos.fws.gov/ecp0/html/db-status.html
status_codes = {
    'Species of Concern':'SC','Resolved Taxon':'RT','Endangered':'E', 
    'Threatened':'T','Under Review':'UR','Status Undefined':'SU',
    'Experimental Population, Non-Essential':'EXPN, XN','Recovery':'R',
    'Proposed Threatened':'PT','Candidate':'C',
    'Proposed Endangered':'PE','Not Listed':'NL',
    'Similarity of Appearance (Threatened)':'SAT','Extinction':'D3A',
    'Original Data in Error - Taxonomic Revision':'DR', 
    'Original Data in Error - New Information Discovered':'DP',
    'Original Data in Error - Not a listable entity':'DNS',
    'Original Data in Error - Act Amendment':'DA',
    'Original Data in Error - Erroneous Data':'DO',
    'Proposed Similarity of Appearance (Threatened)':'PSAT, PT',
    'Pre-Act Delisted Taxon':'Unlist',
    'Emergency Listing, Endangered':'EmE',
}


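# Replace each long status name with its short code
# (raises KeyError if a status is missing from status_codes)
df['Status'] = df['Status'].apply(lambda x: status_codes[x])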
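
The dictionary lookup above raises a `KeyError` as soon as the data contains a status that is not in `status_codes`. A minimal sketch of checking for such values first, using a toy frame and a trimmed dictionary (`toy`, `toy_codes`, and the `'Some New Status'` value are all hypothetical, for illustration only):

```python
import pandas as pd

# Trimmed stand-in for status_codes and a toy frame (illustrative only)
toy_codes = {'Species of Concern': 'SC', 'Resolved Taxon': 'RT', 'Endangered': 'E'}
toy = pd.DataFrame({'Status': ['Species of Concern', 'Endangered', 'Some New Status']})

# Collect any status strings that have no short code
unmapped = sorted(set(toy['Status'].dropna()) - set(toy_codes))
print(unmapped)

# Series.map leaves unmapped values as NaN instead of raising KeyError
toy['Status Code'] = toy['Status'].map(toy_codes)
print(toy)
```

Whether a hard failure or a `NaN` is preferable depends on the analysis. The hard failure makes a new status impossible to miss, while `map` keeps the notebook running and leaves the gap visible in the null counts.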

In [7]:
# checking the cleaning process
print(df.shape)
df.head()

(10318, 9)


Unnamed: 0,Common Name,Scientific Name,Status,Area,Listing Date,Location,Inverted Common Name,Group,Status Category
0,Aaa water treader bug,Cavaticovelia aaa,SC,Wherever found,NaT,Domestic,"Bug, Aaa water treader",Insects,Not Listed
1,Aalbu's cave pseudoscorpion,Archeolarca aalbui,SC,Wherever found,NaT,Domestic,"Pseudoscorpion, Aalbu's cave",Arachnids,Not Listed
2,Aardhals springsnail,Pyrgulopsis aardahli,SC,Wherever found,NaT,Domestic,"Springsnail, Aardhals",Snails,Not Listed
3,Aase's onion,Allium aaseae,SC,Wherever found,NaT,Domestic,"Onion, Aase's",Flowering Plants,Not Listed
4,Abajo daisy,Erigeron abajoensis,RT,Wherever found,NaT,Domestic,"Daisy, Abajo",Flowering Plants,Not Listed


In [8]:
# Change the order of the displayed columns
df = df[['Group','Status','Scientific Name','Common Name',
         'Location','Status Category', 'Listing Date',
         'Inverted Common Name','Area']]

# Sort the values by 'Group' and reset the index
df = df.sort_values(by='Group').reset_index(drop=True)

df.head()

Unnamed: 0,Group,Status,Scientific Name,Common Name,Location,Status Category,Listing Date,Inverted Common Name,Area
0,Algae,E,Isogomphodon oxyrhynchus,Daggernose Shark,Foreign,Listed,NaT,"Shark, Daggernose",
1,Amphibians,UR,Desmognathus abditus,Cumberland Dusky salamander,Domestic,"Petitioned for Listing, Under Review",NaT,"salamander, Cumberland Dusky",Wherever found
2,Amphibians,SC,Plethodon elongatus,Del Norte salamander,Domestic,Not Listed,NaT,"Salamander, Del Norte",Wherever found
3,Amphibians,SC,Eurycea aquatica,Dark-sided salamander,Domestic,Not Listed,NaT,"Salamander, dark-sided",Wherever found
4,Amphibians,RT,Bufo retiformis,Sonoran green toad,Both Domestic and Foreign,Not Listed,NaT,"Toad, Sonoran green",Wherever found


Checking the null values: because many of the species are not listed under the ESA, I expect null values, especially in the listing date column.

In [9]:
print(df.isnull().sum())

Group                      0
Status                     0
Scientific Name            0
Common Name                0
Location                  72
Status Category           13
Listing Date            7771
Inverted Common Name       0
Area                     371
dtype: int64
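
To see whether the missing listing dates really come from unlisted species, the nulls can be counted per status category. This is a minimal sketch on a toy frame (the rows are made up for illustration and are not taken from the file); the column names match the renamed dataframe:

```python
import pandas as pd

# Toy frame with the same column names as the cleaned dataframe (illustrative only)
toy = pd.DataFrame({
    'Status Category': ['Not Listed', 'Not Listed', 'Listed', 'Listed'],
    'Listing Date': pd.to_datetime([None, None, '1967-03-11', None]),
})

# Count missing listing dates within each status category
missing_by_category = (
    toy['Listing Date'].isna()
    .groupby(toy['Status Category'])
    .sum()
)
print(missing_by_category)
```

Any missing dates that show up under a listed category would be worth a closer look, since those are not explained by the species being unlisted.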


Now the dataframe is ready for analysis. I save the cleaned copy (without subtotals) to a new CSV file:

In [10]:
df.to_csv('esa_species.csv', index=False)
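
A CSV file does not store data types, so the listing dates come back as plain strings unless they are parsed again on load. A minimal sketch of the round trip, using a toy frame and a temporary file (the file name `toy_species.csv` and the rows are hypothetical):

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for the cleaned dataframe (illustrative only)
toy = pd.DataFrame({
    'Group': ['Insects', 'Snails'],
    'Listing Date': pd.to_datetime(['1967-03-11', None]),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_species.csv')
    # index=False keeps the row index out of the file, as in the cell above
    toy.to_csv(path, index=False)
    # parse_dates restores the datetime dtype when reading back
    reloaded = pd.read_csv(path, parse_dates=['Listing Date'])

print(reloaded.dtypes)
```

The same `parse_dates=['Listing Date']` argument should apply when `esa_species.csv` is loaded in a later notebook.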