# FY 2017 Endangered Species Expenditures

### Data Source:

The FY 2017 endangered species expenditure data was taken from Table 1 (pages 8-57) of the following public report:
https://www.fws.gov/sites/default/files/documents/endangered-species-expenditures-report-fiscal-year-2017.pdf

Note: This does not include expenditures for land acquisition.

### Module Imports:

In [1]:
import pandas as pd
import numpy as np
import tabula as tb
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
sns.set(rc = {'figure.figsize':(15,8)}, color_codes=True)

### Data Cleaning

First, I converted Table 1 in the PDF to a CSV file with tabula:

In [2]:
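# Table 1 spans pages 8-57 of the report (range() excludes its stop value)
page_list = list(range(8, 58))
file = 'endangered-species-expenditures-report-fiscal-year-2017.pdf'
# stream=True uses tabula's stream mode, which detects columns from whitespace rather than ruling lines
tb.convert_into(file, "expenditures2017.csv", output_format="csv", pages=page_list, stream=True)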
df = pd.read_csv('expenditures2017.csv')
df.head()

Unnamed: 0,Group Name,Rank,Species (50 CFR Part 17),Status,FWS Total,Unnamed: 5,Other Fed,Unnamed: 7,Federal Total,States Total,Unnamed: 10,Species Total
0,Mammals,1426,Addax (Addax nasomaculatus) - Wherever found,E,$0,,$200,,$200,$0,,$200
1,,1040,"Ass, Asian wild (Equus hemionus) - Wherever found",E,$0,,"$12,500",,"$12,500",$0,,"$12,500"
2,,1423,Babirusa (Babyrousa babyrussa) - Wherever found,E,$0,,$300,,$300,$0,,$300
3,,165,"Bat, Florida bonneted (Eumops floridanus) - Wh...",E,"$369,500",,"$399,334",,"$768,834","$33,965",,"$802,799"
4,,133,"Bat, gray (Myotis grisescens) - Wherever found",E,"$300,780",,"$646,723",,"$947,503","$127,190",,"$1,074,693"


In [3]:
df.tail()

Unnamed: 0,Group Name,Rank,Species (50 CFR Part 17),Status,FWS Total,Unnamed: 5,Other Fed,Unnamed: 7,Federal Total,States Total,Unnamed: 10,Species Total
2663,,1013.0,"Lichen, rock gnome (Gymnoderma lineare) - Wher...",E,"$12,200","$1,100","$13,300",$0,"$13,300",,,
2664,Lichens subtotal,,,,"$14,200","$82,900","$97,100","$1,900","$99,000",,,
2665,Multi-species,,,,"$3,625,640","$77,472,044","$81,097,684","$8,704,696","$89,802,380",,,
2666,Multi-species subtotal,,,,"$3,625,640","$77,472,044","$81,097,684","$8,704,696","$89,802,380",,,
2667,Total Species Expenditures,,,,"$188,504,080","$1,047,794,289","$1,236,298,369","$53,074,754","$1,289,373,123",,,


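After viewing a snapshot of the data, I found inconsistencies in the columns and rows that needed to be cleaned manually: 'Group Name' is filled in only on the first row of each group, on some rows the dollar values spill into the empty 'Unnamed' columns (compare row 2663 in the tail with the rows in the head), and group subtotal and total rows are mixed in with the species rows.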
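I did this cleaning by hand. As a hedged sketch of how part of it could be scripted instead, the snippet below realigns rows whose values spilled into the spacer columns and then forward-fills the group name. It assumes the only misalignment is the one visible in row 2663 of the tail (a non-empty 'Unnamed: 5' column); the toy frame borrows values from the head and tail above and is not the full extract.

```python
import numpy as np
import pandas as pd

# Toy rows shaped like the raw tabula output (Species column omitted).
# Row 2 copies the values of tail row 2663, which spilled into the 'Unnamed' columns.
raw = pd.DataFrame({
    'Group Name':    ['Mammals', np.nan, 'Lichens'],
    'FWS Total':     ['$0', '$0', '$12,200'],
    'Unnamed: 5':    [np.nan, np.nan, '$1,100'],
    'Other Fed':     ['$200', '$12,500', '$13,300'],
    'Unnamed: 7':    [np.nan, np.nan, '$0'],
    'Federal Total': ['$200', '$12,500', '$13,300'],
    'States Total':  ['$0', '$0', np.nan],
    'Unnamed: 10':   [np.nan, np.nan, np.nan],
    'Species Total': ['$200', '$12,500', np.nan],
})

# Assumption: a non-empty 'Unnamed: 5' marks a row whose values shifted into the spacer columns.
shifted = raw['Unnamed: 5'].notna()
src = ['Unnamed: 5', 'Other Fed', 'Unnamed: 7', 'Federal Total']
dst = ['Other Fed', 'Federal Total', 'States Total', 'Species Total']
# .to_numpy() drops the column labels so pandas assigns by position, not by name.
raw.loc[shifted, dst] = raw.loc[shifted, src].to_numpy()

# The spacer columns are no longer needed once the values are realigned.
raw = raw.drop(columns=['Unnamed: 5', 'Unnamed: 7', 'Unnamed: 10'])

# The group name only appears on the first row of each group, so carry it downward.
raw['Group Name'] = raw['Group Name'].ffill()
print(raw)
```

The realignment runs before the forward-fill so that each step only touches the columns it is responsible for; rows that do not match the assumed pattern are left unchanged.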

In [4]:
# after manual cleaning, the data was saved as 'cleaned_expenditures2017.csv'
df2017 = pd.read_csv('cleaned_expenditures2017.csv')
print(df2017.columns)
df2017.head()

Index(['Group Name', 'Rank', 'Species', 'Status', 'FWS Total', 'Other Fed',
       'Federal Total', 'States Total', 'Species Total'],
      dtype='object')


Unnamed: 0,Group Name,Rank,Species,Status,FWS Total,Other Fed,Federal Total,States Total,Species Total
0,Mammals,1426.0,Addax (Addax nasomaculatus) - Wherever found,E,$0,$200,$200,$0,$200
1,Mammals,1040.0,"Ass, Asian wild (Equus hemionus) - Wherever found",E,$0,"$12,500","$12,500",$0,"$12,500"
2,Mammals,1423.0,Babirusa (Babyrousa babyrussa) - Wherever found,E,$0,$300,$300,$0,$300
3,Mammals,165.0,"Bat, Florida bonneted (Eumops floridanus) - Wh...",E,"$369,500","$399,334","$768,834","$33,965","$802,799"
4,Mammals,133.0,"Bat, gray (Myotis grisescens) - Wherever found",E,"$300,780","$646,723","$947,503","$127,190","$1,074,693"


Next, I renamed the columns for consistency across dataframes and split the combined 'Species' column into the inverted common name, scientific name, and listed area:

In [5]:
df2017 = df2017.rename(columns={'Group Name':'Group',
                                'FWS Total':'FWS 2017',
                                'Other Fed':'Other Fed 2017',
                                'States Total':'States 2017', 
                                'Species Total':'Total 2017'
                               })
# split column and add new columns to df
df2017[['Inverted Common Name','Scientific Name',
        'Noname1', 'Noname2', 'Noname3', 'Noname4', 'Noname5']] = df2017['Species'].str.split('(', expand=True)
df2017[['Scientific Name','Area', 'Noname6']] = df2017['Scientific Name'].str.split('-', expand=True)

#drop extra columns
df2017 = df2017.drop(['Rank','Species','Federal Total','Noname1','Noname2','Noname3',
                     'Noname4', 'Noname5', 'Noname6'], axis = 1)

In [6]:
df2017.head()

Unnamed: 0,Group,Status,FWS 2017,Other Fed 2017,States 2017,Total 2017,Inverted Common Name,Scientific Name,Area
0,Mammals,E,$0,$200,$0,$200,Addax,Addax nasomaculatus),Wherever found
1,Mammals,E,$0,"$12,500",$0,"$12,500","Ass, Asian wild",Equus hemionus),Wherever found
2,Mammals,E,$0,$300,$0,$300,Babirusa,Babyrousa babyrussa),Wherever found
3,Mammals,E,"$369,500","$399,334","$33,965","$802,799","Bat, Florida bonneted",Eumops floridanus),Wherever found
4,Mammals,E,"$300,780","$646,723","$127,190","$1,074,693","Bat, gray",Myotis grisescens),Wherever found
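The str.split approach above depends on every entry having exactly the shape "Common name (Scientific name) - Area". The sorted output further down suggests two entry shapes break it: an alternate name in "(=...)" ends up in 'Scientific Name' (the arroyo toad row, which also has an empty 'Area'), and parentheses inside the listed area appear to be split off and discarded (three California tiger salamander rows all end up with the Area "U.S.A."). As a hedged alternative, the sketch below parses the column with one regular expression via str.extract. The pattern SPECIES_RE and the sample strings are my own assumptions about the report's format, not taken from it, so any row that does not match comes back as NaN for manual review.

```python
import pandas as pd

# Hypothetical pattern for entries shaped like
# "Common name [(=alternate name)] (Scientific name) - Listed area".
SPECIES_RE = (
    r'^(?P<common>[^(]*?)\s*'            # inverted common name, up to the first "("
    r'(?:\(=(?P<alt_name>[^)]*)\)\s*)?'  # optional alternate name such as "(=arroyo southwestern)"
    r'\((?P<scientific>[^()]+)\)\s*'     # scientific name inside parentheses
    r'-\s*(?P<area>.*)$'                 # the rest is the listed area, which may contain "-" or "()"
)

# Illustrative strings in the report's apparent format; the last has no scientific name.
species = pd.Series([
    'Addax (Addax nasomaculatus) - Wherever found',
    'Toad, arroyo (=arroyo southwestern) (Anaxyrus californicus) - Wherever found',
    'Salamander, California tiger (Ambystoma californiense) - U.S.A. (CA - Santa Barbara County)',
    'Multi-species',
])

# str.extract returns one column per named group; rows that do not match become NaN.
parts = species.str.extract(SPECIES_RE)
print(parts)
```

Applied before 'Species' is dropped, df2017['Species'].str.extract(SPECIES_RE) would return all four name columns in one step, without the stray ")" that the next cell has to remove.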


To perform EDA, I needed to remove the leftover symbols and convert the dollar columns to integers:

In [7]:
#remove leftover parentheses/equals signs and surrounding whitespace from the split columns
df2017['Scientific Name'] = df2017['Scientific Name'].str.replace('[()=]', '', regex=True).str.strip()
df2017['Inverted Common Name'] = df2017['Inverted Common Name'].str.strip()
df2017['Area'] = df2017['Area'].str.strip()

#strip "$" and "," from the dollar columns and convert them to integers for analysis
money_cols = ['FWS 2017', 'Other Fed 2017', 'States 2017', 'Total 2017']
for col in money_cols:
    df2017[col] = df2017[col].str.replace('[$,]', '', regex=True).astype(int)

# display the dataframe
df2017.head()

Unnamed: 0,Group,Status,FWS 2017,Other Fed 2017,States 2017,Total 2017,Inverted Common Name,Scientific Name,Area
0,Mammals,E,0,200,0,200,Addax,Addax nasomaculatus,Wherever found
1,Mammals,E,0,12500,0,12500,"Ass, Asian wild",Equus hemionus,Wherever found
2,Mammals,E,0,300,0,300,Babirusa,Babyrousa babyrussa,Wherever found
3,Mammals,E,369500,399334,33965,802799,"Bat, Florida bonneted",Eumops floridanus,Wherever found
4,Mammals,E,300780,646723,127190,1074693,"Bat, gray",Myotis grisescens,Wherever found
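astype(int) raises a ValueError on the first cell it cannot parse, which stops the whole conversion. A hedged alternative, sketched below with a hypothetical helper dollars_to_int and made-up sample strings, uses pd.to_numeric(errors='coerce') so that unparseable cells become missing values that can be listed and fixed by hand; the nullable 'Int64' dtype keeps the column integer-typed despite those gaps.

```python
import pandas as pd

def dollars_to_int(col: pd.Series) -> pd.Series:
    """Convert strings like "$1,074,693" to nullable integers; unparseable cells become <NA>."""
    cleaned = col.str.replace(r'[$,]', '', regex=True).str.strip()
    # errors='coerce' turns anything that is still not a number into NaN instead of raising
    return pd.to_numeric(cleaned, errors='coerce').astype('Int64')

# Hypothetical sample, including one value that cannot be parsed.
sample = pd.Series(['$0', '$12,500', '$1,074,693', 'n/a'])
converted = dollars_to_int(sample)

# The original strings behind any missing values, for manual review.
bad = sample[converted.isna()]
print(converted.tolist())
print(bad.tolist())
```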


In [8]:
print(df2017.shape)
# checking the stats for the expenditures
df2017.describe()

(1792, 9)


Unnamed: 0,FWS 2017,Other Fed 2017,States 2017,Total 2017
count,1792.0,1792.0,1792.0,1792.0
mean,315442.1,1754120.0,89393.1,2158415.0
std,4857363.0,30612370.0,1460260.0,36349900.0
min,0.0,0.0,0.0,20.0
25%,4500.0,0.0,0.0,8425.5
50%,10077.0,2513.0,0.0,22446.0
75%,39656.75,53240.0,0.0,138377.8
max,188504100.0,1047794000.0,53074750.0,1289373000.0


In [9]:
df2017.groupby("Group").size()

Group
Amphibians                       37
Amphibians subtotal               1
Arachnids                        12
Arachnids subtotal                1
Birds                           112
Birds subtotal                    1
Clams                           108
Clams subtotal                    1
Conifers and Cycads               4
Conifers and Cycads subtotal      1
Corals                           15
Corals subtotal                   1
Crustaceans                      27
Crustaceans subtotal              1
Ferns and Allies                 37
Ferns and Allies subtotal         1
Fishes                          176
Fishes subtotal                   1
Flowering Plants                883
Flowering Plants subtotal         1
Insects                          85
Insects subtotal                  1
Lichens                           2
Lichens subtotal                  1
Mammals                         157
Mammals subtotal                  1
Multi-species                     1
Multi-species subtotal
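Because the report gives a subtotal for every group, those rows can double as a check on the PDF extraction before they are dropped. The sketch below, on a toy frame with hypothetical numbers, sums the species rows in each group and compares the result with the reported subtotal; a non-zero difference may point at a row that was mangled or lost during extraction.

```python
import pandas as pd

# Toy data in the same layout as df2017 at this point (values are hypothetical).
toy = pd.DataFrame({
    'Group': ['Lichens', 'Lichens', 'Lichens subtotal', 'Corals', 'Corals subtotal'],
    'Total 2017': [100, 200, 300, 500, 600],
})

is_subtotal = toy['Group'].str.endswith(' subtotal')

# Sum the species rows per group ...
species_sums = toy[~is_subtotal].groupby('Group')['Total 2017'].sum()

# ... and line them up with the reported subtotal for the same group.
reported = (toy[is_subtotal]
            .assign(Group=lambda d: d['Group'].str.replace(r' subtotal$', '', regex=True))
            .set_index('Group')['Total 2017'])

# The DataFrame constructor aligns both Series on the group name.
check = pd.DataFrame({'summed': species_sums, 'reported': reported})
check['difference'] = check['summed'] - check['reported']
print(check[check['difference'] != 0])
```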

In [10]:
#Changing the order of the columns displayed
df2017 = df2017[['Group','Status','Scientific Name','Inverted Common Name',
        'FWS 2017','Other Fed 2017','States 2017','Total 2017','Area']]

#sorting the values by 'Group' and resetting the index (drop=True discards the old index)
df2017 = df2017.sort_values(by=['Group']).reset_index(drop=True)

df2017.head()

Unnamed: 0,Group,Status,Scientific Name,Inverted Common Name,FWS 2017,Other Fed 2017,States 2017,Total 2017,Area
0,Amphibians,E,arroyo southwestern,"Toad, arroyo",245087,375272,0,620359,
1,Amphibians,T,Ambystoma californiense,"Salamander, California tiger",737957,1052249,0,1790206,U.S.A.
2,Amphibians,E,Ambystoma californiense,"Salamander, California tiger",570447,52156,0,622603,U.S.A.
3,Amphibians,E,Ambystoma californiense,"Salamander, California tiger",541608,100600,0,642208,U.S.A.
4,Amphibians,E,Eurycea sosorum,"Salamander, Barton Springs",15000,300,0,15300,Wherever found


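Now that the rows are sorted by group and the index is reset, the group counts from groupby("Group").size() give the position of every non-species row. For example, the 37 amphibian rows occupy indices 0-36, so "Amphibians subtotal" sits at index 37. I used these positions to drop the group subtotal rows, the Multi-species row and its subtotal, and the final report-wide total row, all of which would skew the analysis (the max values in the describe() output above are the report-wide totals).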

In [11]:
# to preserve df2017, work on an explicit copy (plain assignment would only create a second reference)
df2017_copy = df2017.copy()

# performing the drop on the copy only
df2017_copy = df2017_copy.drop(index=[
    37,50,163,272,277,293,321,359,536,
    1420,1506,1509,1667,1668,1669,1740,1790,1791])

# checking to make sure that only the subtotals were dropped
df2017_copy['Group'].value_counts()


Flowering Plants       883
Fishes                 176
Mammals                157
Birds                  112
Clams                  108
Insects                 85
Reptiles                70
Snails                  49
Amphibians              37
Ferns and Allies        37
Crustaceans             27
Corals                  15
Arachnids               12
Conifers and Cycads      4
Lichens                  2
Name: Group, dtype: int64
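The hard-coded positions above only hold for this exact sort order and row count. A hedged, label-based alternative is to build a mask from the 'Group' values instead. The sketch below uses a toy frame with hypothetical numbers and assumes the report-wide total row keeps the label "Total Species Expenditures" from the raw extract; under that assumption it removes the same kinds of rows as the positional drop.

```python
import pandas as pd

# Toy frame with 'Group' labels like those in df2017 (values are hypothetical).
toy = pd.DataFrame({
    'Group': ['Amphibians', 'Amphibians subtotal', 'Multi-species',
              'Multi-species subtotal', 'Snails', 'Snails subtotal',
              'Total Species Expenditures'],
    'Total 2017': [10, 10, 5, 5, 7, 7, 22],
})

# Drop every subtotal, the Multi-species row, and the grand total, wherever they sit.
non_species = (toy['Group'].str.endswith(' subtotal')
               | toy['Group'].isin(['Multi-species', 'Total Species Expenditures']))
species_only = toy.loc[~non_species].reset_index(drop=True)
print(species_only)
```

Because the mask depends only on the labels, it keeps working if the sort order changes or rows are added.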

The dataframe is now ready for analysis, so I saved the copy (without the subtotal and total rows) to a new CSV file:

In [12]:
df2017_copy.to_csv('esa_expenditures2017.csv', index=False)