# ESA Recovery Plans

### Data Source:

Endangered species data were collected from the U.S. Fish and Wildlife Service (FWS) Data Explorer: https://ecos.fws.gov/ecp/report/adhocCreator?catalogId=species&reportId=recoveryDocs&columns=%2FrecoveryDocs@comname,sciname,pop_abbrev,group_text,plan_title,datesort,plan_stage,species_status,lead_region,recovery_region_name

### Module Imports:

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
sns.set_theme(rc={'figure.figsize': (15, 8)}, color_codes=True)

### Data Cleaning:

In [2]:
df = pd.read_csv('Species_With_Recovery_Documents_Data_Explorer.csv')

In [3]:
# check data types and column names
print(df.dtypes)
df.columns

Species Common Name             object
Species Scientific Name         object
Species Scientific Name_url     object
Species Population              object
ECOS Species Group              object
Recovery Document Date          object
Recovery Document Stage         object
ESA Listing Status              object
Lead Region                      int64
Region Name                     object
Number of Recovery Actions     float64
dtype: object


Index(['Species Common Name', 'Species Scientific Name',
       'Species Scientific Name_url', 'Species Population',
       'ECOS Species Group', 'Recovery Document Date',
       'Recovery Document Stage', 'ESA Listing Status', 'Lead Region',
       'Region Name', 'Number of Recovery Actions'],
      dtype='object')
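`Number of Recovery Actions` is reported as `float64` even though it holds counts. A likely reason is that the column contains missing values (the null check further down reports 305 of them), and NumPy's `int64` cannot represent `NaN`, so pandas upcasts the column to float. If whole-number display matters, pandas' nullable `Int64` extension type is one option. This is a sketch using three counts from the preview plus one hypothetical missing entry, not a step the original analysis performs:

```python
import numpy as np
import pandas as pd

# Counts from the preview rows plus one hypothetical missing value
actions = pd.Series([116.0, 329.0, 14.0, np.nan], name="Number of Recovery Actions")
print(actions.dtype)  # float64: the NaN forces a floating-point dtype

# The nullable integer dtype keeps whole numbers and represents the gap as <NA>
actions_int = actions.astype("Int64")
print(actions_int.dtype)
print(actions_int.tolist())
```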

In [4]:
# display the first five rows of the dataframe
df.head()

| | Species Common Name | Species Scientific Name | Species Scientific Name_url | Species Population | ECOS Species Group | Recovery Document Date | Recovery Document Stage | ESA Listing Status | Lead Region | Region Name | Number of Recovery Actions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Indiana bat | Myotis sodalis | https://ecos.fws.gov/ecp/species/5949 | Wherever found | Mammals | 2007/04/16 | Draft Revision 1 | E | 3 | Great Lakes-Big Rivers Region | 116.0 |
| 1 | Grizzly bear | Ursus arctos horribilis | https://ecos.fws.gov/ecp/species/7642 | U.S.A., conterminous (lower 48) States, except... | Mammals | 1993/09/10 | Final Revision 1 | T | 6 | Mountain-Prairie Region | 329.0 |
| 2 | Columbian white-tailed deer | Odocoileus virginianus leucurus | https://ecos.fws.gov/ecp/species/154 | Columbia River DPS | Mammals | 1983/06/14 | Final Revision 1 | T | 1 | Pacific Region | 14.0 |
| 3 | Black-footed ferret | Mustela nigripes | https://ecos.fws.gov/ecp/species/6953 | Wherever found, except where listed as an expe... | Mammals | 2013/12/23 | Final Revision 2 | E | 6 | Mountain-Prairie Region | 84.0 |
| 4 | San Joaquin kit fox | Vulpes macrotis mutica | https://ecos.fws.gov/ecp/species/2873 | wherever found | Mammals | 1998/09/30 | Final | E | 8 | California/Nevada Region | 84.0 |


Dropping unneeded columns, converting the document date to a datetime, and renaming the remaining columns for consistency across dataframes:

In [5]:
# Drop the columns that we will not use for the analysis
df = df.drop(['Species Scientific Name_url', 'Lead Region',
              'Recovery Document Stage'], axis=1)
# Convert the document date from text (YYYY/MM/DD) to datetime
df['Recovery Document Date'] = pd.to_datetime(df['Recovery Document Date'])
# Rename the remaining columns for easier access
df = df.rename(columns={'Species Common Name':'Common Name',
                        'Species Scientific Name':'Scientific Name',
                        'Species Population':'Area',
                        'Region Name':'Region',
                        'ECOS Species Group':'Group',
                        'ESA Listing Status':'Status',
                        'Number of Recovery Actions':'Recovery Actions'
                        })
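The raw dates use a `YYYY/MM/DD` layout (e.g. `2007/04/16`). `pd.to_datetime` infers this on its own, but passing an explicit `format` makes the expectation visible and raises an error on any row that deviates. A minimal sketch on two rows copied from the preview, assuming every date in the full file follows the same layout:

```python
import pandas as pd

# Two rows copied from the df.head() preview, limited to the columns used here
raw = pd.DataFrame({
    "Species Common Name": ["Indiana bat", "Grizzly bear"],
    "Recovery Document Date": ["2007/04/16", "1993/09/10"],
    "Lead Region": [3, 6],
})

clean = raw.drop(columns=["Lead Region"])  # same effect as drop([...], axis=1)
# An explicit format raises a ValueError if any date deviates from YYYY/MM/DD
clean["Recovery Document Date"] = pd.to_datetime(
    clean["Recovery Document Date"], format="%Y/%m/%d"
)
clean = clean.rename(columns={"Species Common Name": "Common Name"})
print(clean.dtypes)
```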

In [6]:
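# Normalize text case so that values differing only in capitalization
# (e.g. 'Wherever found' vs. 'wherever found') compare as equal
df['Scientific Name'] = df['Scientific Name'].str.lower()
df['Common Name'] = df['Common Name'].str.lower()
df['Area'] = df['Area'].str.lower()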
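The preview shows the same `Area` value capitalized differently (`Wherever found` in row 0, `wherever found` in row 4). `duplicated()` compares values exactly, so rows that differ only in case are not flagged until the text is normalized. A sketch with hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows that differ only in capitalization, plus a missing Area
rows = pd.DataFrame({
    "Common Name": ["Indiana bat", "indiana bat", "grizzly bear"],
    "Area": ["Wherever found", "wherever found", np.nan],
})

before = rows.duplicated().sum()  # exact comparison: case matters

# Series.str.lower leaves NaN untouched, so missing Area values stay missing
for col in ["Common Name", "Area"]:
    rows[col] = rows[col].str.lower()

after = rows.duplicated().sum()
print(before, after)
```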

Checking for duplicate rows:

In [7]:
print(df.shape)
df.duplicated().sum()

(1788, 8)


11
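With its default `keep='first'`, `duplicated()` flags only the second and later copies of a row, so the 11 above counts surplus rows rather than distinct duplicated records. To inspect every copy before dropping anything, `keep=False` marks all members of each duplicate group. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical frame: the first record appears three times, the second once
demo = pd.DataFrame({
    "Common Name": ["indiana bat", "indiana bat", "indiana bat", "grizzly bear"],
    "Status": ["E", "E", "E", "T"],
})

extra_copies = demo.duplicated().sum()  # default keep='first': later copies only
all_copies = demo[demo.duplicated(keep=False)]  # every row in a duplicate group
print(extra_copies)
print(all_copies)
```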

In [8]:
# keep the first occurrence of each exact duplicate row (the defaults)
df = df.drop_duplicates()
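Because `drop_duplicates()` applies the same `keep='first'` rule as `duplicated()`, it should remove exactly the rows that were counted: 1788 - 11 = 1777. A small assertion can make that check explicit; the helper name `dedupe_checked` is hypothetical and the data below is made up:

```python
import pandas as pd

def dedupe_checked(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and confirm the expected number were removed."""
    expected_removed = int(frame.duplicated().sum())
    deduped = frame.drop_duplicates()
    assert len(frame) - len(deduped) == expected_removed
    return deduped

# Hypothetical frame with one exact duplicate row
demo = pd.DataFrame({"Common Name": ["a", "a", "b"], "Status": ["E", "E", "T"]})
result = dedupe_checked(demo)
print(len(demo), len(result))
```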

In [9]:
# check that the duplicate rows were removed
print(df.shape)
df.head()

(1777, 8)


| | Common Name | Scientific Name | Area | Group | Recovery Document Date | Status | Region | Recovery Actions |
|---|---|---|---|---|---|---|---|---|
| 0 | indiana bat | myotis sodalis | wherever found | Mammals | 2007-04-16 | E | Great Lakes-Big Rivers Region | 116.0 |
| 1 | grizzly bear | ursus arctos horribilis | u.s.a., conterminous (lower 48) states, except... | Mammals | 1993-09-10 | T | Mountain-Prairie Region | 329.0 |
| 2 | columbian white-tailed deer | odocoileus virginianus leucurus | columbia river dps | Mammals | 1983-06-14 | T | Pacific Region | 14.0 |
| 3 | black-footed ferret | mustela nigripes | wherever found, except where listed as an expe... | Mammals | 2013-12-23 | E | Mountain-Prairie Region | 84.0 |
| 4 | san joaquin kit fox | vulpes macrotis mutica | wherever found | Mammals | 1998-09-30 | E | California/Nevada Region | 84.0 |


Checking for null values. Because many of the recovery plans are not finished, I expect some values to be missing:

In [10]:
print(df.isnull().sum())

Common Name                 0
Scientific Name             0
Area                       94
Group                       0
Recovery Document Date      0
Status                      0
Region                      0
Recovery Actions          305
dtype: int64
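Raw counts are easier to judge as shares of the 1777 remaining rows: 305 / 1777 ≈ 17.2% of rows lack a `Recovery Actions` value, and 94 / 1777 ≈ 5.3% lack an `Area`. `isnull().mean()` computes these fractions directly, because each null counts as 1 and each non-null as 0. A sketch on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 2 of 4 Recovery Actions and 1 of 4 Area values missing
demo = pd.DataFrame({
    "Area": ["wherever found", np.nan, "columbia river dps", "wherever found"],
    "Recovery Actions": [116.0, np.nan, 14.0, np.nan],
})

# The mean of a boolean mask is the fraction of True values; scale to percent
null_share = demo.isnull().mean().mul(100).round(1)
print(null_share)
```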


The dataframe is now ready for analysis, so I write the cleaned copy (without subtotals) to a new CSV file:

In [11]:
df.to_csv('esa_recovery_plans.csv', index=False)
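A CSV file stores only text, so the `datetime64` type assigned to `Recovery Document Date` does not survive the export. When the cleaned file is loaded again, the `parse_dates` argument of `pd.read_csv` restores it. The sketch below uses an in-memory buffer in place of `esa_recovery_plans.csv` so it runs on its own:

```python
import io

import pandas as pd

cleaned = pd.DataFrame({
    "Common Name": ["indiana bat", "grizzly bear"],
    "Recovery Document Date": pd.to_datetime(["2007-04-16", "1993-09-10"]),
})

buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)  # same call pattern as the export above

# Without parse_dates the column comes back as plain strings
buffer.seek(0)
plain = pd.read_csv(buffer)

# With parse_dates the datetime dtype is restored
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["Recovery Document Date"])
print(plain.dtypes)
print(restored.dtypes)
```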