# NAS Module Demonstration

In [6]:
%load_ext autoreload
%autoreload 2


In [7]:
import numpy as np
import pandas as pd
import flbs_ais.nas as nas

---
## Search for a species

The following code takes a species' scientific name (genus and species) and searches the NAS database for matching records. The goal is to obtain a species ID, which the other functions in this module require. The first function, *species_search*, returns a list of Python dictionaries, one per matching record, each holding the fields for that record. The second function, *species_string*, formats that list as a readable string.

In [8]:
search_result = nas.species_search("Oncorhynchus", "mykiss")
search_string = nas.species_string(search_result)
print(search_string)

speciesID: 910
itis_tsn: 161989
group: Fishes
family: Salmonidae
genus: Oncorhynchus
species: mykiss
subspecies: 
variety: 
authority: (Walbaum, 1792)
common_name: Rainbow Trout
native_exotic: Native
Fresh/Marine/Brackish: Freshwater-Marine

speciesID: 911
itis_tsn: 553422
group: Fishes
family: Salmonidae
genus: Oncorhynchus
species: mykiss
subspecies: irideus
variety: 
authority: (Gibbons, 1955)
common_name: coastal rainbow trout
native_exotic: Native
Fresh/Marine/Brackish: Freshwater-Marine

speciesID: 913
itis_tsn: None
group: Fishes
family: Salmonidae
genus: Oncorhynchus
species: mykiss
subspecies: kamloops strain
variety: 
authority: 
common_name: Kamloops trout
native_exotic: Native
Fresh/Marine/Brackish: Freshwater-Marine

speciesID: 914
itis_tsn: None
group: Fishes
family: Salmonidae
genus: Oncorhynchus
species: mykiss
subspecies: ssp.
variety: 
authority: 
common_name: redband trout
native_exotic: Native
Fresh/Marine/Brackish: Freshwater-Marine
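
The search above returns several records that share a genus and species, so a caller usually has to pick one record's ID before going further. The following is a minimal sketch of that step, assuming each dictionary returned by *species_search* uses the same key names as the labels printed above (`speciesID`, `subspecies`, `common_name`). The sample records are trimmed copies of that output, and `find_species_id` is a hypothetical helper, not part of the module.

```python
# Trimmed copies of the records printed above; the key names are assumed
# to match the printed labels.
sample_results = [
    {"speciesID": 910, "subspecies": "", "common_name": "Rainbow Trout"},
    {"speciesID": 911, "subspecies": "irideus", "common_name": "coastal rainbow trout"},
    {"speciesID": 913, "subspecies": "kamloops strain", "common_name": "Kamloops trout"},
    {"speciesID": 914, "subspecies": "ssp.", "common_name": "redband trout"},
]


def find_species_id(results, common_name):
    """Return the speciesID whose common name matches (case-insensitive), or None."""
    wanted = common_name.strip().lower()
    for record in results:
        if record.get("common_name", "").lower() == wanted:
            return record["speciesID"]
    return None


redband_id = find_species_id(sample_results, "Redband Trout")
print(redband_id)
```

Matching on the full common name rather than a substring keeps "rainbow trout" from also matching "coastal rainbow trout".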




---
## Get a pandas dataframe for a species ID

Once a species ID is known, the *getdf* function returns a pandas dataframe containing the occurrence data for that species in the NAS database. The *keep_columns* parameter selects which columns to keep in the dataframe. The *limit* parameter caps the number of rows returned; a value of -1 returns all available rows. An API key is required to retrieve more than 100 rows.

To get the list of accepted column names, use *get_header*. All column names are lowercase with no spaces.

In [9]:
print("\nList of accepted column names for the keep_columns parameter:\n")
name_list = nas.get_header()
# Print the column names as a single comma-separated line
print(', '.join(name_list))

print("\n\nResults from an API query placed into a pandas dataframe:\n")
redband_trout_df = nas.getdf(914, keep_columns=None, limit=20, api_key=None)
redband_trout_df.head()


List of accepted column names for the keep_columns parameter:

specimennumber, speciesid, group, family, genus, species, scientificname, commonname, country, state, county, locality, decimallatitude, decimallongitude, latlongsource, latlongaccuracy, drainagename, centroidtype, huc8name, huc8, huc10name, huc10, huc12name, huc12, date, year, month, day, status, comments, recordtype, disposal, museumcatnumber, freshmarineintro, references


Results from an API query placed into a pandas dataframe:



| | specimennumber | speciesid | group | family | genus | species | scientificname | commonname | country | state | ... | year | month | day | status | comments | recordtype | disposal | museumcatnumber | freshmarineintro | references |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43833 | 914 | Fishes | Salmonidae | Oncorhynchus | mykiss | Oncorhynchus mykiss | redband trout | | Texas | ... | 1983 | | | stocked | | Literature | | | Freshwater | [{'key': 360, 'refType': 'Report', 'year': 199... |
| 1 | 43834 | 914 | Fishes | Salmonidae | Oncorhynchus | mykiss | Oncorhynchus mykiss | redband trout | | Texas | ... | 1986 | | | stocked | | Literature | | | Freshwater | [{'key': 360, 'refType': 'Report', 'year': 199... |
| 2 | 43835 | 914 | Fishes | Salmonidae | Oncorhynchus | mykiss | Oncorhynchus mykiss | redband trout | | Texas | ... | 1992 | | | unknown | | Literature | | | Freshwater | [{'key': 360, 'refType': 'Report', 'year': 199... |
| 3 | 290383 | 914 | Fishes | Salmonidae | Oncorhynchus | mykiss | Oncorhynchus mykiss | redband trout | | Montana | ... | 1982 | 8.0 | 19.0 | stocked | | Literature | | | Freshwater | [{'key': 24224, 'refType': 'Database', 'year':... |
| 4 | 290385 | 914 | Fishes | Salmonidae | Oncorhynchus | mykiss | Oncorhynchus mykiss | redband trout | | Montana | ... | 1982 | 9.0 | 8.0 | stocked | | Literature | | | Freshwater | [{'key': 24224, 'refType': 'Database', 'year':... |
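
Because the accepted column names are all lowercase with no spaces, it can help to normalize and check a requested list before passing it as `keep_columns`. The following is a minimal sketch under that assumption; `normalize_columns` is a hypothetical helper, not part of the module, and `accepted` is a hand-copied subset of the names printed by *get_header* above.

```python
# Subset of the accepted names printed by get_header above.
accepted = ["specimennumber", "speciesid", "state", "decimallatitude",
            "decimallongitude", "year", "status"]


def normalize_columns(requested, accepted_names):
    """Lower-case each name, strip spaces, and reject anything not accepted."""
    normalized = [name.replace(" ", "").lower() for name in requested]
    unknown = [name for name in normalized if name not in accepted_names]
    if unknown:
        raise ValueError(f"Unknown column name(s): {unknown}")
    return normalized


keep = normalize_columns(["State", "Year", "Decimal Latitude"], accepted)
print(keep)
```

In practice the full list returned by `nas.get_header()` would replace `accepted`, and the resulting `keep` list could be passed as the `keep_columns` argument.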


---
## Processing NAS CSV files into a pandas dataframe

A CSV file containing similar data can be downloaded from the NAS website, but its format differs slightly from the data returned by an API request. The *process_csv_df* function converts such a CSV file into a pandas dataframe. Like *getdf*, *process_csv_df* accepts a list of column names to keep.

It takes neither an API key nor a limit, since it only processes a file that has already been downloaded.

In [11]:
print("\n\nResults from a CSV file placed into a pandas dataframe:\n")
nas_csv_df = nas.process_csv_df("../demo/NAS_data_914.csv", keep_columns=None)
nas_csv_df.head()



Results from a CSV file placed into a pandas dataframe:



| | specimennumber | speciesid | group | family | genus | species | scientificname | commonname | country | state | ... | year | month | day | status | comments | recordtype | disposal | museumcatnumber | freshmarineintro | references |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 292742 | 914 | Fishes | Salmonidae | | | Oncorhynchus mykiss ssp. | redband trout | United States of America | MT | ... | 1996 | 8.0 | 20.0 | stocked | | Literature | | | Freshwater | [{'key': 24224, 'type': 'Database', 'date': 20... |
| 1 | 292709 | 914 | Fishes | Salmonidae | | | Oncorhynchus mykiss ssp. | redband trout | United States of America | MT | ... | 2002 | 8.0 | 30.0 | stocked | | Literature | | | Freshwater | [{'key': 24224, 'type': 'Database', 'date': 20... |
| 2 | 292718 | 914 | Fishes | Salmonidae | | | Oncorhynchus mykiss ssp. | redband trout | United States of America | MT | ... | 2007 | 7.0 | 20.0 | stocked | | Literature | | | Freshwater | [{'key': 24224, 'type': 'Database', 'date': 20... |
| 3 | 292735 | 914 | Fishes | Salmonidae | | | Oncorhynchus mykiss ssp. | redband trout | United States of America | MT | ... | 1993 | 9.0 | 1.0 | stocked | 9/1/1993-9/2/1993 | Literature | | | Freshwater | [{'key': 24224, 'type': 'Database', 'date': 20... |
| 4 | 292737 | 914 | Fishes | Salmonidae | | | Oncorhynchus mykiss ssp. | redband trout | United States of America | MT | ... | 1993 | 9.0 | 1.0 | stocked | | Literature | | | Freshwater | [{'key': 24224, 'type': 'Database', 'date': 20... |


In [12]:
print(list(nas_csv_df.columns))

['specimennumber', 'speciesid', 'group', 'family', 'genus', 'species', 'scientificname', 'commonname', 'country', 'state', 'county', 'locality', 'decimallatitude', 'decimallongitude', 'latlongsource', 'latlongaccuracy', 'drainagename', 'centroidtype', 'huc8name', 'huc8', 'huc10name', 'huc10', 'huc12name', 'huc12', 'date', 'year', 'month', 'day', 'status', 'comments', 'recordtype', 'disposal', 'museumcatnumber', 'freshmarineintro', 'references']
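
Although both sources end up with the same column names, the values are not always in the same form. In the rows shown above, the API result spells out state names ("Texas", "Montana") while the CSV result uses abbreviations ("MT"). If the two dataframes need to be compared or combined, one option is to bring the state values to a common form first. The following is a minimal sketch; `STATE_NAMES` and `full_state_name` are hypothetical names, and the lookup covers only the states that appear in the outputs above.

```python
# Hypothetical lookup covering only the states seen in the outputs above.
STATE_NAMES = {"MT": "Montana", "TX": "Texas"}


def full_state_name(value):
    """Expand a known two-letter abbreviation; leave other values unchanged."""
    if isinstance(value, str):
        return STATE_NAMES.get(value.strip().upper(), value)
    # Non-string values (such as missing entries) pass through untouched.
    return value


print([full_state_name(s) for s in ["MT", "Montana", "Texas", None]])
```

With a dataframe, the same function could be applied through pandas' `Series.map`, for example `nas_csv_df["state"].map(full_state_name)`.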
