# FY 2018 Endangered Species Expenditures

### Data Source:

The 2018 endangered species expenditures data was collected from pages 8-104 of the following public report:
https://www.fws.gov/sites/default/files/documents/endangered-and-threatened-species-expenditures-fiscal-year-2018.pdf

Note: This does not include expenditures for land acquisition.

### Module Imports:

In [1]:
import pandas as pd
import numpy as np
import tabula as tb
import re
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
sns.set(rc = {'figure.figsize':(15,8)}, color_codes=True)

### Data Cleaning:

First, I converted Table 1 in the PDF to a CSV file. The file can be found in the files section of the repository.

In [None]:
page_list = list(range(8, 105))
file = 'endangered-and-threatened-species-expenditures-fiscal-year-2018.pdf'
tb.convert_into(file, "expenditures2018.csv", pages = page_list, output_format ="csv", stream = True)


In [None]:
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv('expenditures2018.csv', on_bad_lines='skip')
df.head()

In [None]:
df.tail()

After viewing a snapshot of the data, I realized that there were inconsistencies in the columns and rows that needed to be cleaned manually. The cleaned file can also be found in the files section.

In [2]:
# after manual cleaning, the data was saved as 'cleaned_expenditures2018.csv'
df2018 = pd.read_csv('cleaned_expenditures2018.csv')
print(df2018.columns)
df2018.head()

Index(['Group Name', 'Rank', 'Species ', 'Status', 'FWS Total', 'Other Fed',
       'Federal Total', 'States Total', 'Species Total'],
      dtype='object')


Unnamed: 0,Group Name,Rank,Species,Status,FWS Total,Other Fed,Federal Total,States Total,Species Total
0,Mammals,158.0,"Bat, Florida bonneted (Eumops floridanus) - Wh...",E,"$424,000","$366,607","$790,607",$0,"$790,607"
1,Mammals,143.0,"Bat, gray (Myotis grisescens) - Wherever found",E,"$397,924","$522,119","$920,043","$25,800","$945,843"
2,Mammals,93.0,"Bat, Hawaiian hoary (Lasiurus cinereus semotus...",E,"$616,841","$1,121,167","$1,738,008",$0,"$1,738,008"
3,Mammals,50.0,"Bat, Indiana (Myotis sodalis) - Wherever found",E,"$1,820,373","$3,471,152","$5,291,525","$191,727","$5,483,252"
4,Mammals,606.0,"Bat, Mariana fruit (=Mariana flying fox) - Whe...",E,"$42,842",$0,"$42,842",$0,"$42,842"


Renaming columns for consistency across dataframes & splitting combined columns:

In [3]:
df2018 = df2018.rename(columns={'Species ':'Species',
                                'Group Name':'Group',
                                'FWS Total':'FWS 2018',
                                'Other Fed':'Other Fed 2018',
                                'States Total':'States 2018', 
                                'Species Total':'Total 2018'})
# split column and add new columns to df
df2018[['Inverted Common Name','Scientific Name',
        'Noname1', 'Noname2', 'Noname3', 'Noname4' ]] = df2018['Species'].str.split('(', expand=True)
df2018[['Scientific Name','Area', 'Noname5', 'Noname6', 'Noname7']] = df2018['Scientific Name'].str.split('-', expand=True)

#drop extra columns
df2018 = df2018.drop(['Rank','Federal Total','Species','Noname1',
                      'Noname2','Noname3','Noname4','Noname5',
                      'Noname6','Noname7'], axis = 1)
df2018.head()

Unnamed: 0,Group,Status,FWS 2018,Other Fed 2018,States 2018,Total 2018,Inverted Common Name,Scientific Name,Area
0,Mammals,E,"$424,000","$366,607",$0,"$790,607","Bat, Florida bonneted",Eumops floridanus),Wherever found
1,Mammals,E,"$397,924","$522,119","$25,800","$945,843","Bat, gray",Myotis grisescens),Wherever found
2,Mammals,E,"$616,841","$1,121,167",$0,"$1,738,008","Bat, Hawaiian hoary",Lasiurus cinereus semotus),Wherever found
3,Mammals,E,"$1,820,373","$3,471,152","$191,727","$5,483,252","Bat, Indiana",Myotis sodalis),Wherever found
4,Mammals,E,"$42,842",$0,$0,"$42,842","Bat, Mariana fruit",=Mariana flying fox),Wherever found
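Splitting on `(` and `-` works for most rows, but a species string with a synonym in parentheses, such as the Mariana fruit bat above, leaves the synonym in the `Scientific Name` column. A minimal sketch of a regex-based alternative follows. The helper `split_species` is hypothetical and not part of the original notebook, and it assumes that synonyms always start with `=` and that the area follows the first ` - ` separator. The second input string is made up for illustration.

```python
import re

def split_species(species):
    """Split a raw species string into (common name, scientific name, area).

    Hypothetical helper: assumes synonyms are written as "(=...)" and that
    the area, when present, follows the first " - " separator.
    """
    head, sep, area = species.partition(" - ")
    area = area.strip() if sep else None
    # The common name is everything before the first opening parenthesis.
    common = head.split("(", 1)[0].strip()
    # Collect every parenthesised group, then skip synonym groups ("=...").
    groups = re.findall(r"\(([^()]*)\)", head)
    names = [g.strip() for g in groups if not g.strip().startswith("=")]
    scientific = names[-1] if names else None
    return common, scientific, area

print(split_species("Bat, gray (Myotis grisescens) - Wherever found"))
# Made-up string with a synonym group, for illustration only.
print(split_species("Toad, example (=example synonym) (Genus species)"))
```

Applied to the notebook's data, this could be mapped over the `Species` column, for example with `df2018['Species'].apply(split_species)`, before the lowercasing step.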


In [4]:
df2018['Scientific Name'] = df2018['Scientific Name'].str.lower()
df2018['Inverted Common Name'] = df2018['Inverted Common Name'].str.lower()
df2018['Area'] = df2018['Area'].str.lower()
df2018.head()

Unnamed: 0,Group,Status,FWS 2018,Other Fed 2018,States 2018,Total 2018,Inverted Common Name,Scientific Name,Area
0,Mammals,E,"$424,000","$366,607",$0,"$790,607","bat, florida bonneted",eumops floridanus),wherever found
1,Mammals,E,"$397,924","$522,119","$25,800","$945,843","bat, gray",myotis grisescens),wherever found
2,Mammals,E,"$616,841","$1,121,167",$0,"$1,738,008","bat, hawaiian hoary",lasiurus cinereus semotus),wherever found
3,Mammals,E,"$1,820,373","$3,471,152","$191,727","$5,483,252","bat, indiana",myotis sodalis),wherever found
4,Mammals,E,"$42,842",$0,$0,"$42,842","bat, mariana fruit",=mariana flying fox),wherever found


To perform EDA, I needed to remove symbols and change data types:

In [5]:
#remove unnecessary symbols
df2018['Scientific Name'] = df2018['Scientific Name'].str.replace('[()=]', '', regex=True)

#changed data type to integer for analysis by stripping '$' and ','
for col in ['Total 2018', 'States 2018', 'FWS 2018', 'Other Fed 2018']:
    df2018[col] = df2018[col].str.replace('[$,]', '', regex=True).astype(int)

# display the dataframe
df2018.head()

Unnamed: 0,Group,Status,FWS 2018,Other Fed 2018,States 2018,Total 2018,Inverted Common Name,Scientific Name,Area
0,Mammals,E,424000,366607,0,790607,"bat, florida bonneted",eumops floridanus,wherever found
1,Mammals,E,397924,522119,25800,945843,"bat, gray",myotis grisescens,wherever found
2,Mammals,E,616841,1121167,0,1738008,"bat, hawaiian hoary",lasiurus cinereus semotus,wherever found
3,Mammals,E,1820373,3471152,191727,5483252,"bat, indiana",myotis sodalis,wherever found
4,Mammals,E,42842,0,0,42842,"bat, mariana fruit",mariana flying fox,wherever found
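A quick consistency check is possible at this point: in the five rows shown, `Total 2018` equals the sum of the FWS, other federal, and state columns. The sketch below parses the original dollar strings and verifies that relationship for those rows. The helper names `parse_dollars` and `find_mismatches` are hypothetical, and whether the identity holds for every row of the report is an assumption that the check itself would test.

```python
def parse_dollars(text):
    """Convert a string such as "$1,820,373" to the integer 1820373."""
    return int(text.replace("$", "").replace(",", ""))

# (common name, FWS, other federal, states, total) taken from the head() output.
rows = [
    ("Bat, Florida bonneted", "$424,000", "$366,607", "$0", "$790,607"),
    ("Bat, gray", "$397,924", "$522,119", "$25,800", "$945,843"),
    ("Bat, Hawaiian hoary", "$616,841", "$1,121,167", "$0", "$1,738,008"),
    ("Bat, Indiana", "$1,820,373", "$3,471,152", "$191,727", "$5,483,252"),
    ("Bat, Mariana fruit", "$42,842", "$0", "$0", "$42,842"),
]

def find_mismatches(rows):
    """Return the names of rows whose parts do not add up to the total."""
    bad = []
    for name, fws, other_fed, states, total in rows:
        parts = parse_dollars(fws) + parse_dollars(other_fed) + parse_dollars(states)
        if parts != parse_dollars(total):
            bad.append(name)
    return bad

print(find_mismatches(rows))
```

On the full dataframe, the same check is a single comparison of `Total 2018` against the sum of the other three expenditure columns.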


In [6]:
print(df2018.shape)
# checking the stats for the expenditures
df2018.describe()

(1762, 9)


Unnamed: 0,FWS 2018,Other Fed 2018,States 2018,Total 2018
count,1762.0,1762.0,1762.0,1762.0
mean,262021.8,1843339.0,213213.7,2318567.0
std,3996781.0,32180260.0,4236960.0,39743920.0
min,0.0,0.0,0.0,20.0
25%,4500.0,0.0,0.0,7568.0
50%,9146.5,1588.0,0.0,18328.0
75%,37300.0,31491.75,0.0,102527.8
max,154791000.0,1127319000.0,125747400.0,1407857000.0


In [7]:
df2018.groupby("Group").size()

Group
Amphibians                       38
Amphibians subtotal               1
Arachnids                        11
Arachnids subtotal                1
Birds                           110
Birds subtotal                    1
Clams                           112
Clams subtotal                    1
Conifers and Cycads               4
Conifers and Cycads subtotal      1
Corals                           16
Corals subtotal                   1
Crustaceans                      26
Crustaceans subtotal              1
Ferns and Allies                 38
Ferns and Allies subtotal         1
Fishes                          180
Fishes subtotal                   1
Flowering Plants                889
Flowering Plants subtotal         1
Insects                          88
Insects subtotal                  1
Lichens                           2
Lichens subtotal                  1
Mammals                         121
Mammals subtotal                  1
Multi-species subtotal            1
Reptiles              

In [8]:
#Changing the order of the columns displayed
df2018 = df2018[['Group','Status','Scientific Name','Inverted Common Name',
        'FWS 2018','Other Fed 2018','States 2018','Total 2018','Area']]

#sorting the values by 'Group' and resetting the index
#(reassigning instead of inplace=True avoids modifying a slice in place)
df2018 = df2018.sort_values(by=['Group']).reset_index(drop=True)

df2018.head()

Unnamed: 0,Group,Status,Scientific Name,Inverted Common Name,FWS 2018,Other Fed 2018,States 2018,Total 2018,Area
0,Amphibians,T,peltophryne lemur,"toad, puerto rican crested",5000,142491,0,147491,wherever found
1,Amphibians,E,bufo hemiophrys baxteri,"toad, wyoming",331300,14500,15251,361051,wherever found
2,Amphibians,E,rana muscosa,"frog, mountain yellow-legged",76115,60706,0,136821,southern california dps
3,Amphibians,E,bufo houstonensis,"toad, houston",111000,132007,4387,247394,wherever found
4,Amphibians,E,arroyo southwestern,"toad, arroyo",257341,737609,0,994950,


Now that the index is reset, I can use the group count totals from groupby("Group").size() to drop the group subtotal rows that could skew my analysis.
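The index positions of the subtotal rows follow from the group sizes: after sorting, each group's rows are contiguous and its subtotal row comes directly after them, so each position is a running sum. The sketch below reproduces that arithmetic. It assumes the alphabetical sort places every `<group> subtotal` row immediately after its group, and it takes the Reptiles and Snails counts (60 and 50) from the `value_counts()` output later in the notebook. `Multi-species` is entered with zero species rows because only its subtotal row appears.

```python
# (group, number of species rows), in the alphabetical order produced by the sort.
group_sizes = [
    ("Amphibians", 38), ("Arachnids", 11), ("Birds", 110), ("Clams", 112),
    ("Conifers and Cycads", 4), ("Corals", 16), ("Crustaceans", 26),
    ("Ferns and Allies", 38), ("Fishes", 180), ("Flowering Plants", 889),
    ("Insects", 88), ("Lichens", 2), ("Mammals", 121), ("Multi-species", 0),
    ("Reptiles", 60), ("Snails", 50),
]

def subtotal_positions(group_sizes):
    """Return the row index of each group's subtotal row."""
    positions = []
    next_row = 0
    for _, size in group_sizes:
        next_row += size            # skip over the group's species rows
        positions.append(next_row)  # the subtotal row sits right after them
        next_row += 1               # move past the subtotal row
    return positions

print(subtotal_positions(group_sizes))
```

This yields the first sixteen indices passed to `drop`; the seventeenth, 1761, is the last row of the 1,762-row dataframe and is not derived here.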

In [9]:
# to preserve df2018, I made an explicit copy
df2018_copy = df2018.copy()

# performing the drop on the copy only
df2018_copy = df2018_copy.drop(index=[
    38,50,161,274,279,296,323,362,543,
    1433,1522,1525,1647,1648,1709,1760,1761]) 

# checking to make sure that only the subtotals were dropped
df2018_copy['Group'].value_counts()

Flowering Plants       889
Fishes                 180
Mammals                121
Clams                  112
Birds                  110
Insects                 88
Reptiles                60
Snails                  50
Amphibians              38
Ferns and Allies        38
Crustaceans             26
Corals                  16
Arachnids               11
Conifers and Cycads      4
Lichens                  2
Name: Group, dtype: int64
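Hard-coded index positions only stay valid while the row order is unchanged. A minimal sketch of a pattern-based alternative is shown below, using a toy frame in place of `df2018`. It assumes every summary row has the word `subtotal` in its `Group` value; any other summary row (for example a grand total) would need its own condition. The subtotal figure in the toy frame is simply the sum of the two toy rows, not a value from the report.

```python
import pandas as pd

# Toy stand-in for df2018: two Amphibians rows from the output above
# plus an illustrative subtotal row.
toy = pd.DataFrame({
    "Group": ["Amphibians", "Amphibians", "Amphibians subtotal"],
    "Total 2018": [147491, 361051, 147491 + 361051],
})

# Flag rows whose Group contains "subtotal", then keep everything else.
is_subtotal = toy["Group"].str.contains("subtotal", case=False, na=False)
species_only = toy.loc[~is_subtotal].reset_index(drop=True)

print(species_only)
```

Filtering on the label rather than the position keeps working if rows are added, removed, or sorted differently.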

Now that the subtotal rows are dropped, I check for null values:

In [10]:
#Checking the null values.
print(df2018_copy.isnull().sum())


Group                     0
Status                    0
Scientific Name           0
Inverted Common Name      0
FWS 2018                  0
Other Fed 2018            0
States 2018               0
Total 2018                0
Area                    139
dtype: int64


In [11]:
#checking display before exporting
df2018_copy.head()

Unnamed: 0,Group,Status,Scientific Name,Inverted Common Name,FWS 2018,Other Fed 2018,States 2018,Total 2018,Area
0,Amphibians,T,peltophryne lemur,"toad, puerto rican crested",5000,142491,0,147491,wherever found
1,Amphibians,E,bufo hemiophrys baxteri,"toad, wyoming",331300,14500,15251,361051,wherever found
2,Amphibians,E,rana muscosa,"frog, mountain yellow-legged",76115,60706,0,136821,southern california dps
3,Amphibians,E,bufo houstonensis,"toad, houston",111000,132007,4387,247394,wherever found
4,Amphibians,E,arroyo southwestern,"toad, arroyo",257341,737609,0,994950,


Now the dataframe is ready for analysis. I export the copy (without subtotals) to a new CSV file.

In [12]:
df2018_copy.to_csv('esa_expenditures2018.csv', index=False)