## Jonathan Bunch
## Project Milestone 4
## 23 May, 2021
---
For this milestone I will be working with the 'BISON' API provided by the USGS.
This API provides occurrence data for animal species, and I will extract the number of observations of the
wood stork (Mycteria americana) in each county of Florida. The wood stork is considered an indicator species for
habitat health.
https://bison.usgs.gov/#api

In [10]:
import pandas as pd
import requests

# This function sends the HTTP request and handles any connection errors that may come up.  On success it returns a
# dictionary containing the contents of the JSON file returned by the API call.  If the request fails, it prints a
# message and returns None.
def send_request(address):
    print('Attempting to connect to API...')
    try:
        data_request = requests.get(address, timeout=30)
        data_request.raise_for_status()
    except requests.exceptions.HTTPError:
        print("Unfortunately there was an error with your request (error type: HTTP Error). Please try again.")
    except requests.exceptions.ConnectionError:
        print("Unfortunately there was an error with your request (error type: Connection Error). Please try again.")
    except requests.exceptions.Timeout:
        print("Unfortunately there was an error with your request (error type: Timeout Error). Please try again.")
    except requests.exceptions.RequestException:
        print("Unfortunately there was an error with your request (error type: Other). Please try again.")
    else:
        data_json = data_request.json()
        if data_json is None:
            # An empty (null) JSON body is treated the same way as a failed request.
            return None
        else:
            print('API connection successful! Retrieving data...')
            return data_json


# Choose the species and create the url.
species = 'Mycteria%20americana'
url = f'https://bison.usgs.gov/api/search.json?species={species}&type=scientific_name&state=Florida&start=0&count=1000'

# Send the request and create a data frame from the "counties" data.
wood_stork_data = send_request(url)
ws_df = pd.DataFrame(wood_stork_data['counties']['data'])
ws_df

Attempting to connect to API...
API connection successful! Retrieving data...


Unnamed: 0,12071,12073,12031,12075,12033,12077,12119,12035,12079,12111,...,12109,12101,12023,12067,12103,12069,12105,12027,12107,12029
total,12979,3040,5620,636,15,8,376,528,50,1218,...,5765,3731,221,19,10760,1125,7643,481,115,48
name,Lee County,Leon County,Duval County,Levy County,Escambia County,Liberty County,Sumter County,Flagler County,Madison County,St. Lucie County,...,St. Johns County,Pasco County,Columbia County,Lafayette County,Pinellas County,Lake County,Polk County,DeSoto County,Putnam County,Dixie County
state,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,...,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida
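
The query string in the cell above is written out by hand, with the space in the species name pre-encoded as `%20`. As a rough sketch, the standard library can assemble the same string from a dictionary, which may make it easier to swap in another species later. The helper name `build_bison_url` and its arguments are illustrative assumptions, not part of the original notebook or of the BISON API itself.

```python
from urllib.parse import quote, urlencode

BASE_URL = 'https://bison.usgs.gov/api/search.json'


def build_bison_url(species, state, start=0, count=1000):
    """Hypothetical helper: assemble the search URL from plain-text arguments."""
    params = {
        'species': species,
        'type': 'scientific_name',
        'state': state,
        'start': start,
        'count': count,
    }
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    return f'{BASE_URL}?{urlencode(params, quote_via=quote)}'


url = build_bison_url('Mycteria americana', 'Florida')
print(url)
```

The resulting string can be passed to `send_request` in the same way as the hand-written URL.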


The column names are currently the county FIPS codes. I will change the column names to the county names, but first
I want to change the formatting of the county names.

First, I will remove the " County" after the name of each county.

In [11]:
ws_df.iloc[1, :] = ws_df.iloc[1, :].apply(lambda x: x.replace(" County", ""))
ws_df

Unnamed: 0,12071,12073,12031,12075,12033,12077,12119,12035,12079,12111,...,12109,12101,12023,12067,12103,12069,12105,12027,12107,12029
total,12979,3040,5620,636,15,8,376,528,50,1218,...,5765,3731,221,19,10760,1125,7643,481,115,48
name,Lee,Leon,Duval,Levy,Escambia,Liberty,Sumter,Flagler,Madison,St. Lucie,...,St. Johns,Pasco,Columbia,Lafayette,Pinellas,Lake,Polk,DeSoto,Putnam,Dixie
state,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,...,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida


Next, I will convert the county names to lowercase.

In [12]:
ws_df.iloc[1, :] = ws_df.iloc[1, :].apply(lambda x: x.lower())
ws_df

Unnamed: 0,12071,12073,12031,12075,12033,12077,12119,12035,12079,12111,...,12109,12101,12023,12067,12103,12069,12105,12027,12107,12029
total,12979,3040,5620,636,15,8,376,528,50,1218,...,5765,3731,221,19,10760,1125,7643,481,115,48
name,lee,leon,duval,levy,escambia,liberty,sumter,flagler,madison,st. lucie,...,st. johns,pasco,columbia,lafayette,pinellas,lake,polk,desoto,putnam,dixie
state,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,...,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida


Next, I will remove the periods used to abbreviate some names.

In [13]:
ws_df.iloc[1, :] = ws_df.iloc[1, :].apply(lambda x: x.replace(".", ""))
ws_df

Unnamed: 0,12071,12073,12031,12075,12033,12077,12119,12035,12079,12111,...,12109,12101,12023,12067,12103,12069,12105,12027,12107,12029
total,12979,3040,5620,636,15,8,376,528,50,1218,...,5765,3731,221,19,10760,1125,7643,481,115,48
name,lee,leon,duval,levy,escambia,liberty,sumter,flagler,madison,st lucie,...,st johns,pasco,columbia,lafayette,pinellas,lake,polk,desoto,putnam,dixie
state,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,...,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida


I will also replace any spaces with underscores.

In [14]:
ws_df.iloc[1, :] = ws_df.iloc[1, :].apply(lambda x: x.replace(" ", "_"))
ws_df

Unnamed: 0,12071,12073,12031,12075,12033,12077,12119,12035,12079,12111,...,12109,12101,12023,12067,12103,12069,12105,12027,12107,12029
total,12979,3040,5620,636,15,8,376,528,50,1218,...,5765,3731,221,19,10760,1125,7643,481,115,48
name,lee,leon,duval,levy,escambia,liberty,sumter,flagler,madison,st_lucie,...,st_johns,pasco,columbia,lafayette,pinellas,lake,polk,desoto,putnam,dixie
state,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,...,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida
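
The four cleaning steps above (strip " County", lowercase, drop periods, replace spaces with underscores) can also be collected into a single function, which may be convenient if the same procedure is repeated for other species. This is a minimal sketch; the function name `normalize_county_name` is an assumption introduced here for illustration.

```python
def normalize_county_name(name):
    """Apply the same four string steps as the cells above, in the same order."""
    name = name.replace(" County", "")  # drop the trailing " County"
    name = name.lower()                 # lowercase everything
    name = name.replace(".", "")        # remove abbreviation periods
    name = name.replace(" ", "_")       # spaces become underscores
    return name


for raw in ["Lee County", "St. Lucie County", "DeSoto County"]:
    print(raw, "->", normalize_county_name(raw))
```

Applied to the data frame, this would look something like `ws_df.loc['name'] = ws_df.loc['name'].map(normalize_county_name)`. Note that hyphens are left untouched, so a name such as "Miami-Dade County" would become `miami-dade`.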


Now I can use the "name" row to rename the columns.

In [15]:
ws_df = ws_df.rename(columns=ws_df.iloc[1, :])
ws_df

Unnamed: 0,lee,leon,duval,levy,escambia,liberty,sumter,flagler,madison,st_lucie,...,st_johns,pasco,columbia,lafayette,pinellas,lake,polk,desoto,putnam,dixie
total,12979,3040,5620,636,15,8,376,528,50,1218,...,5765,3731,221,19,10760,1125,7643,481,115,48
name,lee,leon,duval,levy,escambia,liberty,sumter,flagler,madison,st_lucie,...,st_johns,pasco,columbia,lafayette,pinellas,lake,polk,desoto,putnam,dixie
state,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,...,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida,Florida


Now I can drop the "state" and "name" rows.

In [16]:
ws_df = ws_df.drop(labels=['state', 'name'], axis=0)
ws_df

Unnamed: 0,lee,leon,duval,levy,escambia,liberty,sumter,flagler,madison,st_lucie,...,st_johns,pasco,columbia,lafayette,pinellas,lake,polk,desoto,putnam,dixie
total,12979,3040,5620,636,15,8,376,528,50,1218,...,5765,3731,221,19,10760,1125,7643,481,115,48


Finally, I will rename the index label from "total" to "occurrences".

In [17]:
ws_df = ws_df.rename(index={'total': 'occurrences'})
ws_df

Unnamed: 0,lee,leon,duval,levy,escambia,liberty,sumter,flagler,madison,st_lucie,...,st_johns,pasco,columbia,lafayette,pinellas,lake,polk,desoto,putnam,dixie
occurrences,12979,3040,5620,636,15,8,376,528,50,1218,...,5765,3731,221,19,10760,1125,7643,481,115,48


Now I should be able to integrate these data with my other sources without too much further processing.  I may try
repeating the same procedure with other species for comparison as well.
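
As one possible way to prepare for that integration, the sketch below turns the "counties" block into one record per county, keeping the county code as a key. It assumes the response has the nested shape implied by the first output above (county code mapped to "total", "name", and "state"); the function name `counties_to_records` and the choice of integer totals in the sample are assumptions for illustration. It also returns an empty list when the request failed, rather than raising an error.

```python
def counties_to_records(response):
    """Hypothetical helper: one dict per county from the 'counties' block."""
    # A failed request (None) or a response without county data yields no records.
    if not response or 'counties' not in response:
        return []
    records = []
    for code, info in response['counties']['data'].items():
        records.append({
            'county_code': code,
            'county': info['name'],
            'occurrences': info['total'],
        })
    return records


# Two counties taken from the output shown earlier, in the assumed response shape.
sample = {'counties': {'data': {
    '12071': {'total': 12979, 'name': 'Lee County', 'state': 'Florida'},
    '12073': {'total': 3040, 'name': 'Leon County', 'state': 'Florida'},
}}}

records = counties_to_records(sample)
print(records)
```

Passing `records` to `pd.DataFrame` would give one row per county, a layout that is usually easier to merge with other county-level sources than a single wide row.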