# DART_API

The Columbia Basin Research website, hosted by the University of Washington School of Aquatic & Fishery Sciences, provides access to a variety of datasets on the Columbia River.

The Data Access in Real Time (DART) project provides several query pages. There does not appear to be a formal API, but one of these pages, http://www.cbr.washington.edu/dart/query/adult_graph_text, generates a URL that downloads the query results in CSV format.

This notebook builds a simple API-like interface to download data for this project.

 - [Dissecting the URL](#Dissecting-the-URL)
 - [Notebook setup](#Notebook-setup)
 - [Downloading data](#Downloading-data)
 - [Output files](#Output-files)

## Dissecting the URL


The example below requests daily sockeye counts for 2019, with one column per location and one row per day of the year.

The query used these selections:
 - Output format: CSV
 - Year: 2019
 - Project: BON, JDA, MCN, PRD, RIS, RRH, TDA, WAN, WEL
 - Species: Sockeye
 - 10 Year: No Selection
 - River Data: No Selection

Note: newlines and indentation have been inserted for readability; the real URL is a single line. In the parameter names, `%5B%5D` is the URL-encoded form of `[]` (so `loc%5B%5D` is `loc[]`), and in the dates `%2F` is an encoded `/`.

    http://www.cbr.washington.edu/dart/cs/php/rpt/mg.php?sc=1&mgconfig=adult
        &outputFormat=csv        # csv = locations in columns; csvSingle = one data point per row
        &year%5B%5D=2019

        &loc%5B%5D=BON&loc%5B%5D=JDA&loc%5B%5D=MCN
        &loc%5B%5D=PRD&loc%5B%5D=RIS&loc%5B%5D=RRH
        &loc%5B%5D=TDA&loc%5B%5D=WAN&loc%5B%5D=WEL

        &ftype%5B%5D=fb          # species code: fb = sockeye
        &data%5B%5D=
        &data%5B%5D=
        &startdate=1%2F1
        &enddate=12%2F31
        &avgyear=0
        &sumAttribute=none
        &consolidate=1
        &zeros=1
        &grid=1
        &y1min=0
        &y1max=
        &y2min=
        &y2max=
        &size=medium
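
One way to double-check the dissection above is to let Python's standard `urllib.parse` module split the query string back into (name, value) pairs. The sketch below joins the 2019 sockeye URL back into one string and decodes it; `keep_blank_values=True` keeps empty parameters such as `y1max=`. The variable names are only for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# The 2019 sockeye query from above, joined back into a single URL
url = ('http://www.cbr.washington.edu/dart/cs/php/rpt/mg.php?sc=1&mgconfig=adult'
       '&outputFormat=csv&year%5B%5D=2019'
       '&loc%5B%5D=BON&loc%5B%5D=JDA&loc%5B%5D=MCN'
       '&loc%5B%5D=PRD&loc%5B%5D=RIS&loc%5B%5D=RRH'
       '&loc%5B%5D=TDA&loc%5B%5D=WAN&loc%5B%5D=WEL'
       '&ftype%5B%5D=fb&data%5B%5D=&data%5B%5D='
       '&startdate=1%2F1&enddate=12%2F31&avgyear=0&sumAttribute=none'
       '&consolidate=1&zeros=1&grid=1&y1min=0&y1max=&y2min=&y2max=&size=medium')

# parse_qsl decodes %5B%5D -> [] and %2F -> /, and keeps repeated names in order
params = parse_qsl(urlsplit(url).query, keep_blank_values=True)

for name, value in params:
    print(f'{name:15} {value!r}')

# Collect the values of the repeated 'loc[]' parameter
locations = [value for name, value in params if name == 'loc[]']
print(locations)
```

Because `loc[]` appears nine times, a plain `dict` would keep only the last location; a list of pairs preserves all of them.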

In [1]:
# Example: 2018, 4 species (chinook, coho, sockeye, pink), all 9 locations, tidy (csvSingle) output
#
# http://www.cbr.washington.edu/dart/cs/php/rpt/mg.php?sc=1&mgconfig=adult&outputFormat=csvSingle
#    &year%5B%5D=2018&loc%5B%5D=BON&loc%5B%5D=JDA&loc%5B%5D=MCN&loc%5B%5D=PRD&loc%5B%5D=RIS&loc%5B%5D=RRH
#    &loc%5B%5D=TDA&loc%5B%5D=WAN&loc%5B%5D=WEL
#    &ftype%5B%5D=fc&ftype%5B%5D=fk&ftype%5B%5D=fb&ftype%5B%5D=fp
#    &data%5B%5D=&data%5B%5D=&startdate=1%2F1&enddate=12%2F31&avgyear=0&sumAttribute=none&consolidate=1&zeros=1&grid=1&y1min=0&y1max=&y2min=&y2max=&size=medium

## Notebook setup

In [2]:
import requests
import csv

import numpy as np
import pandas as pd

In [3]:
# global variables

# DART codes for fish species
species_dict = {'sockeye':'fb',
                'chinook':'fc',
                'coho':'fk',
                'chum':'fe',
                'pink':'fp'}
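
Because the query builder looks species up in `species_dict` directly, a misspelled name surfaces only as a bare `KeyError`. A small helper such as the hypothetical `species_codes` below (not part of the original notebook) normalizes case and lists the valid names in its error message. The dictionary is repeated so the cell runs on its own.

```python
# DART codes for fish species (copied from the cell above)
species_dict = {'sockeye': 'fb',
                'chinook': 'fc',
                'coho': 'fk',
                'chum': 'fe',
                'pink': 'fp'}

def species_codes(species_list):
    '''Hypothetical helper: map species names to DART ftype codes with a clear error.'''
    codes = []
    for name in species_list:
        key = name.strip().lower()          # accept 'Sockeye', ' coho ', etc.
        if key not in species_dict:
            raise ValueError(f'unknown species {name!r}; expected one of {sorted(species_dict)}')
        codes.append(species_dict[key])
    return codes

print(species_codes(['Sockeye', 'coho']))   # ['fb', 'fk']
```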

In [4]:
def construct_query(year, species_list, single=True):
    '''Construct a DART adult-passage query URL for one year and a list of species.

    species_list: species names that are keys of species_dict, e.g. ['sockeye', 'coho']
    single:       True requests 'csvSingle' (tidy, one value per row),
                  False requests 'csv' (one column per location)

    The locations (all 9 projects), the date range (1/1 to 12/31) and the plot options are hard coded.
    '''
    q_base   = 'http://www.cbr.washington.edu/dart/cs/php/rpt/mg.php?sc=1&mgconfig=adult'

    if single:
        q_format = '&outputFormat=csvSingle'
    else:
        q_format = '&outputFormat=csv'
    q_year   = '&year%5B%5D=' + str(year)
    q_loc    = '&loc%5B%5D=BON&loc%5B%5D=JDA&loc%5B%5D=MCN&loc%5B%5D=PRD&loc%5B%5D=RIS' + \
               '&loc%5B%5D=RRH&loc%5B%5D=TDA&loc%5B%5D=WAN&loc%5B%5D=WEL'

    # one '&ftype[]=<code>' parameter per requested species
    q_fish = ''.join('&ftype%5B%5D=' + species_dict[species] for species in species_list)
    q_tail = '&data%5B%5D=&data%5B%5D=&startdate=1%2F1&enddate=12%2F31&avgyear=0&sumAttribute=none' + \
             '&consolidate=1&zeros=1&grid=1&y1min=0&y1max=&y2min=&y2max=&size=medium'

    final_url = q_base + q_format + q_year + q_loc + q_fish + q_tail
    return final_url
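
String concatenation works, but every parameter has to be percent-encoded by hand. An alternative sketch, the hypothetical `build_query` (not the notebook's function), lists the parameters as (name, value) pairs and lets `urllib.parse.urlencode` do the encoding; repeated names such as `loc[]` are simply repeated pairs. It also turns the locations into an argument. The parameter set mirrors `construct_query`; whether DART accepts other location subsets is untested here.

```python
from urllib.parse import urlencode

BASE_URL = 'http://www.cbr.washington.edu/dart/cs/php/rpt/mg.php'

SPECIES_CODES = {'sockeye': 'fb', 'chinook': 'fc', 'coho': 'fk', 'chum': 'fe', 'pink': 'fp'}

ALL_LOCATIONS = ('BON', 'JDA', 'MCN', 'PRD', 'RIS', 'RRH', 'TDA', 'WAN', 'WEL')

def build_query(year, species_list, single=True, locations=ALL_LOCATIONS):
    '''Hypothetical alternative to construct_query that relies on urlencode.'''
    params = [('sc', 1), ('mgconfig', 'adult'),
              ('outputFormat', 'csvSingle' if single else 'csv'),
              ('year[]', year)]
    params += [('loc[]', loc) for loc in locations]
    params += [('ftype[]', SPECIES_CODES[s]) for s in species_list]
    params += [('data[]', ''), ('data[]', ''),
               ('startdate', '1/1'), ('enddate', '12/31'),
               ('avgyear', 0), ('sumAttribute', 'none'), ('consolidate', 1),
               ('zeros', 1), ('grid', 1), ('y1min', 0), ('y1max', ''),
               ('y2min', ''), ('y2max', ''), ('size', 'medium')]
    # urlencode percent-encodes '[' -> %5B, ']' -> %5D and '/' -> %2F
    return BASE_URL + '?' + urlencode(params)

print(build_query(2019, ['sockeye'], single=False))
```

With these defaults, `build_query(2019, ['sockeye'], single=False)` yields the same string as `construct_query(2019, ['sockeye'], single=False)`, so the two are interchangeable for the queries used in this notebook.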

In [5]:
def fetch_data(CSV_URL):
    '''Download a DART CSV report and return it as a list of rows (each row a list of strings).'''
    # Based on https://stackoverflow.com/questions/35371043/use-python-requests-to-download-csv

    with requests.Session() as s:
        download = s.get(CSV_URL)
        download.raise_for_status()   # fail loudly on HTTP errors instead of parsing an error page

        decoded_content = download.content.decode('utf-8')

        cr = csv.reader(decoded_content.splitlines(), delimiter=',')
        my_list = list(cr)
    return my_list
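
Looping over two decades of reports sends many requests to one server, so a single transient error partway through would abort the whole download. One option, sketched here as an assumption rather than something the notebook does, is a `requests.Session` whose `HTTPAdapter` uses a urllib3 `Retry` policy, plus a timeout on each call. `make_session` and `fetch_rows` are hypothetical names, and the retry settings are arbitrary.

```python
import csv

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries=3, backoff=1.0):
    '''Hypothetical: a Session that retries failed connections and 5xx responses.'''
    retry = Retry(total=total_retries,
                  backoff_factor=backoff,                 # exponential backoff between attempts
                  status_forcelist=[500, 502, 503, 504])  # server errors worth retrying
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

def fetch_rows(session, url, timeout=60):
    '''Hypothetical counterpart of fetch_data that reuses one session for many URLs.'''
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return list(csv.reader(response.content.decode('utf-8').splitlines()))
```

Creating the session once, outside the year loop, also lets `requests` reuse the underlying connection between downloads.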

In [6]:
# Example .head() if 'csv' is requested
#     mm/dd  2019:JDA:Sock (fish/day)  2019:BON:Sock (fish/day)
# 0   1/1    0                         0
# 1   1/2    0                         0


# Example .head() if 'csvSingle' is requested         (also known as 'tidy' data)
#      year   mm-dd   location  parameter  unit       datatype         value
# 0    2019   1-1     JDA       Chin       fish/day   Adult Passage    0
# 1    2019   1-2     JDA       Chin       fish/day   Adult Passage    0
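
The two layouts hold the same numbers; the tidy form simply has one row per (day, location) pair. The sketch below uses a small made-up wide table shaped like the `csv` example after `construct_df` has renamed its columns to `Date` plus location codes, and reshapes it with `DataFrame.melt`. The values are illustrative only.

```python
import pandas as pd

# Made-up wide data: one column per location, one row per day
wide = pd.DataFrame({'Date': ['1/1/2019', '1/2/2019'],
                     'JDA': ['0', '3'],
                     'BON': ['5', '0']})

# melt turns each location column into rows of (Date, location, value)
tidy = wide.melt(id_vars='Date', var_name='location', value_name='value')
print(tidy)
```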

In [7]:
def construct_df(data_as_list, single=True):
    '''Convert the downloaded rows to a pandas DataFrame and reformat the columns.

    The first row is skipped, the second row supplies the column names, and the last
    7 rows (download timestamp and similar information) are dropped.
    '''
    # construct the initial dataframe without the trailing 7 information rows
    df = pd.DataFrame(data_as_list[2:-7], columns=data_as_list[1])

    if single:
        # construct a date column, e.g. '2019' + '-' + '1-1' -> '2019-1-1'
        df['Date'] = df['year'] + '-' + df['mm-dd']
    else:
        # extract year from column location name ' 2019:MCN:Sock (fish/day)' to '2019'
        year = df.columns[1].split(':')[0].strip()

        # convert column location names from ' 2019:MCN:Sock (fish/day)' to 'MCN'
        locations = [df.columns[i].split(':')[1] for i in range(1, len(df.columns))]

        # replace the 'mm/dd' column name with 'Date', then convert '1/1' etc. to '1/1/YYYY'
        df.columns = ['Date'] + locations
        df['Date'] = df['Date'] + '/' + year

    return df
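
`construct_df` leaves every column as text, so `Date` is a string such as `'2019-1-1'` and `value` is a string too. Before plotting or summing, a conversion step along these lines may help. The frame below is made up and only mirrors the tidy column names. With `errors='coerce'`, anything unparseable (a blank value, or a date that does not exist in that year) becomes `NaN`/`NaT` instead of raising.

```python
import pandas as pd

# Made-up rows in the tidy layout produced by construct_df(..., single=True)
df = pd.DataFrame({'year': ['2019', '2019', '2019'],
                   'mm-dd': ['1-1', '1-2', '2-29'],
                   'location': ['JDA', 'JDA', 'JDA'],
                   'parameter': ['Chin', 'Chin', 'Chin'],
                   'value': ['0', '12', '']})
df['Date'] = df['year'] + '-' + df['mm-dd']

# '%Y-%m-%d' also accepts months and days without leading zeros, e.g. '2019-1-1';
# 2019-2-29 is not a real date, so it becomes NaT
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d', errors='coerce')

# '' cannot be parsed as a number, so it becomes NaN
df['value'] = pd.to_numeric(df['value'], errors='coerce')

print(df.dtypes)
```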

## Downloading data

**Example - single species**

In [8]:
# # Example: Single species, single year, all 9 Columbia locations, WIDE format

# my_url = construct_query(2019,['sockeye'],single=False)
# data_as_list = fetch_data(my_url)
# df_sockeye_2019 = construct_df(data_as_list,single=False) # works with non-tidy data

# df_sockeye_2019.head()

In [9]:
# df_sockeye_2019.tail()

**Example - several species**

In [10]:
# # Example: 5 species, single year, all 9 Columbia locations, TIDY format 

# my_url = construct_query(2019,['sockeye','coho','chinook','chum','pink'],single=True)

# data_as_list = fetch_data(my_url)
# df = construct_df(data_as_list,single=True)

# df.head()

In [11]:
# df['parameter'].value_counts()
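
Going the other way, a tidy download covering several species can be narrowed to one species (the `parameter` column holds labels such as `Chin`) and spread back into one column per location. A sketch with a made-up frame; `pivot_table` with `aggfunc='sum'` is used instead of `pivot` so that any duplicate (Date, location) rows would be added together rather than raising, which is a defensive choice, not something observed in the DART data.

```python
import pandas as pd

# Made-up tidy rows: two species at two dams over two days
tidy = pd.DataFrame({'Date': ['2019-1-1', '2019-1-2'] * 4,
                     'location': ['BON', 'BON', 'JDA', 'JDA'] * 2,
                     'parameter': ['Chin'] * 4 + ['Sock'] * 4,
                     'value': [1, 2, 3, 4, 10, 20, 30, 40]})

# Keep one species, then spread the locations into columns
chinook = tidy[tidy['parameter'] == 'Chin']
wide = chinook.pivot_table(index='Date', columns='location',
                           values='value', aggfunc='sum')
print(wide)
```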

**Project Data - Fish Counts**

In [12]:
years_of_interest = np.arange(1999,2021,1)
years_of_interest

array([1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
       2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020])

In [13]:
species_of_interest = ['sockeye','coho','chinook','chum','pink']

In [14]:
df = pd.DataFrame(columns=['year', 'mm-dd', 'location', 'parameter', 'unit', 'datatype', 'value',
                           'Date'])  # accumulator for all years

In [15]:
df.shape

(0, 8)

In [16]:
# fetch data for each year
for year in years_of_interest:
    my_url = construct_query(year,species_of_interest,single=True)
    data_as_list = fetch_data(my_url)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, construct_df(data_as_list, single=True)], ignore_index=True)
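
Growing `df` one year at a time (whether with `DataFrame.append`, which was deprecated in pandas 1.4 and removed in 2.0, or with a `pd.concat` call per iteration) copies the accumulated frame on every pass. A common alternative is to collect each year's frame in a list and call `pd.concat` once at the end. In the sketch below, `fake_year_frame` is a hypothetical stand-in for `construct_df(fetch_data(construct_query(...)))` so that the cell runs without network access.

```python
import pandas as pd

def fake_year_frame(year):
    '''Hypothetical stand-in for one year of tidy DART data (two made-up rows).'''
    return pd.DataFrame({'year': [str(year)] * 2,
                         'mm-dd': ['1-1', '1-2'],
                         'location': ['BON', 'BON'],
                         'value': ['0', '0']})

frames = []                        # one DataFrame per year
for year in range(1999, 2021):     # 22 years, matching years_of_interest
    frames.append(fake_year_frame(year))

# A single concat at the end; ignore_index gives a fresh 0..n-1 index
df = pd.concat(frames, ignore_index=True)
print(df.shape)   # (44, 4)
```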

In [17]:
# Save fish data to file
df.to_csv('../data/DART_csv_files/salmon5_20years.csv',index=False)
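
`to_csv` does not create missing folders, so the save fails if `../data/DART_csv_files/` does not exist yet. A sketch that creates the folder first with `pathlib`; to run anywhere it writes under a temporary directory, and the one-row frame is made up.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({'year': ['2019'], 'location': ['BON'], 'value': ['0']})  # made-up data

with tempfile.TemporaryDirectory() as tmp:
    # Same relative layout as the notebook, rooted in a temporary folder
    out_path = Path(tmp) / 'data' / 'DART_csv_files' / 'salmon5_20years.csv'
    out_path.parent.mkdir(parents=True, exist_ok=True)   # create any missing folders
    df.to_csv(out_path, index=False)

    # Read the file back as strings to confirm the round trip
    round_trip = pd.read_csv(out_path, dtype=str)

print(round_trip)
```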

In [18]:
df.shape

(292585, 8)