# Library Usage in Seattle, 2005-2020

## API Calls

This notebook uses the functions in [api_caller.py](functions/api_caller.py) as a framework for querying the API for data in a specific date range.

I originally downloaded the data on December 15, 2020, so this notebook walks through collecting the remainder of 2020 (December 15 through December 31).

In [1]:
# standard dataframe libraries
import pandas as pd
pd.set_option('display.max_columns', 50)
import numpy as np

# api libraries
from sodapy import Socrata
import json

# custom functions
from functions.data_cleaning import *
from functions.api_caller import *

# reload functions/libraries when edited
%load_ext autoreload
%autoreload 2

# ignore warnings
import warnings
warnings.filterwarnings('ignore')

In [2]:
# parse api credentials
file_path = '/Users/p.szymo/Documents/code_world/projects/library_usage_seattle/data/api_keys.json'

with open(file_path, 'r') as json_file:
    api_dict = json.load(json_file)
    
api_token = api_dict['api_token']
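
The credentials cell above reads the token from an absolute path on one machine. As a minimal sketch of a more portable variant, the helper below resolves the path from an argument, then an environment variable, then a project-relative default. The function name `load_api_token`, the environment variable `SEATTLE_API_KEYS`, and the default `data/api_keys.json` are all assumptions for illustration, not part of the original project.

```python
import json
import os
from pathlib import Path


def load_api_token(path=None, env_var="SEATTLE_API_KEYS"):
    """Return the 'api_token' value from a JSON credentials file.

    `env_var` is a hypothetical environment variable that may hold the path;
    the relative default assumes the notebook runs from the project root.
    """
    key_path = Path(path or os.environ.get(env_var, "data/api_keys.json"))
    with key_path.open("r") as json_file:
        return json.load(json_file)["api_token"]
```

Calling `load_api_token()` from the project root would then stand in for the `open`/`json.load` lines, provided the file sits at the assumed location.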

In [3]:
# define several variables for api function

# data-specific url code
url_addon_code = '5src-czff'

# personal api token: api_token was already set in the previous cell

# name of date column
date_column = 'checkoutdatetime'

# date to start collecting data
begin_date = '2020-12-15'

# date to stop collecting data (non-inclusive)
end_date = '2021-01-01'

In [4]:
# call api
results_df = api_date_caller(
    url_addon_code,
    api_token,
    date_column,
    begin_date,
    end_date,
)

# check shape
results_df.shape

(77882, 10)
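
The body of `api_date_caller` is not shown in this notebook, so the following is only a sketch of how a date-range query of this kind might work: a SoQL filter with an inclusive start and an exclusive end, plus limit/offset paging. The helper names `build_date_where` and `fetch_all`, the page size, and the date-only literal format are assumptions.

```python
def build_date_where(date_column, begin_date, end_date):
    """Build a SoQL filter: begin_date inclusive, end_date exclusive."""
    return f"{date_column} >= '{begin_date}' AND {date_column} < '{end_date}'"


def fetch_all(fetch_page, where, page_size=50000):
    """Collect every row by paging until a short (or empty) page comes back.

    `fetch_page` is any callable accepting where/limit/offset/order keywords
    and returning a list of row dicts.
    """
    rows, offset = [], 0
    while True:
        # a stable sort order keeps pages from overlapping between requests
        page = fetch_page(where=where, limit=page_size, offset=offset, order=":id")
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


# Hypothetical wiring with sodapy (not executed here; the domain is an assumption):
#   client = Socrata("data.seattle.gov", api_token)
#   where = build_date_where(date_column, begin_date, end_date)
#   rows = fetch_all(lambda **kw: client.get(url_addon_code, **kw), where)
#   results_df = pd.DataFrame.from_records(rows)
```

Keeping the page fetcher as a parameter means the paging logic can be exercised with a fake function, without network access or credentials.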

In [5]:
# take a look
results_df.head()

| | id | checkoutyear | bibnumber | itembarcode | itemtype | collection | callnumber | itemtitle | subjects | checkoutdatetime |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 202012150923000010099923236 | 2020 | 3486549 | 10099923236 | acbk | cafic | FIC GHOSH 2019 | Gun Island | Booksellers and bookselling Fiction, Self real... | 2020-12-15T09:23:00.000 |
| 1 | 202012150923000010088730089 | 2020 | 2163686 | 10088730089 | jcbk | ncfic | J HUNTER | Into the wild | Cats Juvenile fiction, Feral cats Juvenile fic... | 2020-12-15T09:23:00.000 |
| 2 | 202012150925000010090618306 | 2020 | 2800147 | 10090618306 | acbk | cacomic | 741.5973 W678F17 2012 | Fables 17 Inherit the wind | Comic books strips etc United States, Comic bo... | 2020-12-15T09:25:00.000 |
| 3 | 202012150925000010101360443 | 2020 | 3149052 | 10101360443 | acbk | cacomic | 741.5973 M8344N 2016 | Nameless | Adventure and adventurers Comic books strips e... | 2020-12-15T09:25:00.000 |
| 4 | 202012150925000010090494013 | 2020 | 2698178 | 10090494013 | acbk | cacomic | 741.5973 W678F15 2011 | Fables 15 Rose Red | Comic books strips etc, Fairy tales Comic book... | 2020-12-15T09:25:00.000 |


In [6]:
# columns to subset on (to match work in 01_data_cleaning.ipynb notebook)
cols = ['collection', 'itemtitle', 'subjects', 'checkoutdatetime']

# rename columns (to match work in 01_data_cleaning.ipynb notebook)
new_col_names = ['collection', 'title', 'subjects', 'date']
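
The two lists above only make sense if they are paired by position. Assuming that is how `data_transformer` consumes them (its body is not shown here), the pairing can be made explicit with a dictionary:

```python
cols = ['collection', 'itemtitle', 'subjects', 'checkoutdatetime']
new_col_names = ['collection', 'title', 'subjects', 'date']

# pair each original column name with its replacement, by position
rename_map = dict(zip(cols, new_col_names))
print(rename_map)
```

A mapping like this can be passed to `DataFrame.rename(columns=...)`, which avoids silent mislabeling if the two lists ever drift out of order.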

In [7]:
# clean and merge data from data dictionary
results_transformed = data_transformer(
    results_df,
    'data/data_dictionary.csv',
    usecols=cols,
    rename=new_col_names,
    dt_format='%Y-%m-%dT%H:%M:%S.%f'
)

# check shape
results_transformed.shape

(77882, 7)
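
The `dt_format` string passed above matches the timestamps returned by the API, such as `2020-12-15T09:23:00.000`. As a small standard-library sketch of what that format parses, and assuming the transformer then keeps only the calendar date (the `date` column contains `datetime.date` values), the helper name `to_date` is hypothetical:

```python
from datetime import datetime

DT_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def to_date(timestamp, dt_format=DT_FORMAT):
    """Parse an API timestamp string and keep only the calendar date."""
    return datetime.strptime(timestamp, dt_format).date()


print(to_date("2020-12-15T09:23:00.000"))  # 2020-12-15
```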

In [8]:
# confirm dates
results_transformed.date.unique()

array([datetime.date(2020, 12, 15), datetime.date(2020, 12, 16),
       datetime.date(2020, 12, 17), datetime.date(2020, 12, 18),
       datetime.date(2020, 12, 19), datetime.date(2020, 12, 20),
       datetime.date(2020, 12, 21), datetime.date(2020, 12, 22),
       datetime.date(2020, 12, 23), datetime.date(2020, 12, 26),
       datetime.date(2020, 12, 27), datetime.date(2020, 12, 28),
       datetime.date(2020, 12, 29), datetime.date(2020, 12, 30),
       datetime.date(2020, 12, 31)], dtype=object)
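
The array above has no entries for December 24 or December 25. Rather than scanning the output by eye, the gaps can be listed programmatically. This is a sketch with a hypothetical helper, `missing_dates`; the source does not say why those two days are absent (holiday closures are one plausible explanation).

```python
from datetime import date, timedelta


def missing_dates(observed, begin, end):
    """Return the dates in [begin, end) that do not appear in `observed`."""
    seen = set(observed)
    n_days = (end - begin).days
    all_days = (begin + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in seen]


# dates copied from the output above: December 15-23 and 26-31
observed = [date(2020, 12, d) for d in list(range(15, 24)) + list(range(26, 32))]
gaps = missing_dates(observed, date(2020, 12, 15), date(2021, 1, 1))
print(gaps)  # [datetime.date(2020, 12, 24), datetime.date(2020, 12, 25)]
```

In the notebook, `observed` would be `results_transformed.date.unique()`.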

In [10]:
# load final part of big dataset
df_final_part = pd.read_pickle('data/seattle_lib_11.pkl', compression='gzip')

# check shape
df_final_part.shape

(6503843, 7)

In [11]:
# combine with new results
df = pd.concat([df_final_part, results_transformed], ignore_index=True)

# check shape
df.shape

(6581725, 7)

In [13]:
# take a look
df.tail()

| | title | subjects | date | format_group | format_subgroup | category_group | age_group |
|---|---|---|---|---|---|---|---|
| 6581720 | ILLM Italian diary | | 2020-12-23 | Other | | Interlibrary Loan | Adult |
| 6581721 | ILLM Italian diary | | 2020-12-29 | Other | | Interlibrary Loan | Adult |
| 6581722 | ILLM Toronto eats 100 signature recipes from t... | | 2020-12-30 | Other | | Interlibrary Loan | Adult |
| 6581723 | ILLM Mad about the house how to decorate your ... | | 2020-12-31 | Other | | Interlibrary Loan | Adult |
| 6581724 | ILLM Borderline narcissistic and schizoid adap... | | 2020-12-31 | Other | | Interlibrary Loan | Adult |


In [14]:
# save (overwrites the file loaded in cell 10)
df.to_pickle('data/seattle_lib_11.pkl', compression='gzip')
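
Because the save step writes to the same `data/seattle_lib_11.pkl` file that was loaded earlier, re-running the load, combine, and save cells would append the December rows a second time. One possible guard, sketched below with a hypothetical helper `append_if_new`, skips the append when the new rows do not start strictly after the existing data ends. It assumes both frames have a `date` column of comparable values with no missing entries.

```python
import datetime as dt

import pandas as pd


def append_if_new(existing, new, date_col="date"):
    """Concatenate `new` onto `existing` unless their date ranges overlap."""
    if not existing.empty and not new.empty:
        if new[date_col].min() <= existing[date_col].max():
            # overlap suggests these rows were already appended; leave as-is
            return existing
    return pd.concat([existing, new], ignore_index=True)


old = pd.DataFrame({"title": ["a", "b"],
                    "date": [dt.date(2020, 12, 13), dt.date(2020, 12, 14)]})
new = pd.DataFrame({"title": ["c"], "date": [dt.date(2020, 12, 15)]})

once = append_if_new(old, new)
twice = append_if_new(once, new)  # second call is a no-op
print(len(once), len(twice))  # 3 3
```

Returning the existing frame unchanged keeps the cell safe to re-run; raising an error instead would be a reasonable alternative if a silent skip is undesirable.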