## DSTEP20 // Application Programming Interface (API)

<small> January 17, 2020 </small>

In many contexts, API is a "catch-all" term for the syntax that a piece of software exposes to its users.  For example, [this is the pandas API documentation](https://pandas.pydata.org/pandas-docs/stable/reference/index.html).

In the context of web-based interaction, API refers specifically to the syntax used to interact with web resources, including **data sets**.

By way of example, let's look at the Socrata Open Data API.



### SODA: the Socrata Open Data API

<img src="https://dev.socrata.com/img/snuffleupacan.png" width=150px>

[Socrata](https://www.tylertech.com/products/socrata) is currently the most popular backend software used by governments to serve data <small> (though recall that Minneapolis was using an Esri backend) </small>.  It provides functionality to

- send requests for data access
- download data

First things first, let's install sodapy, a Python client for working with Socrata backends:

In [0]:
!pip install sodapy

Collecting sodapy
  Downloading https://files.pythonhosted.org/packages/64/06/6144b36a4b4470bef1fb17d7b98b82a202b5e918f7e0a2c123004f73ca07/sodapy-2.0.0-py2.py3-none-any.whl
Installing collected packages: sodapy
Successfully installed sodapy-2.0.0


There are three fundamental variables you need when accessing data via web-based APIs:

- the domain (or endpoint)
- dataset identifier
- query parameters

Let's use [NYC's 311](https://www1.nyc.gov/311/) call-in line complaint data as an example.  First, we need to [locate the **domain** and **dataset identifier**](https://www.google.com/search?source=hp&ei=YWuWXNKLH6WD5wKj0Iko&q=nyc+open+data+311&btnK=Google+Search&oq=nyc+open+data+311&gs_l=psy-ab.3..0.1433.5206..5328...2.0..1.90.1431.20......0....1..gws-wiz.....0..35i39j0i67j0i131j0i131i20i263j0i20i263j0i22i10i30j0i22i30j38.oYv5S6vyfto).  Our **query parameters** in this case will be a simple limit on the number of records that we get.
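To make the three pieces concrete, here is a minimal sketch of how they might combine into a request URL.  The helper name `build_soda_url` is hypothetical, and the `/resource/<id>.json` path and `$`-prefixed parameter names are our understanding of the SODA convention, so treat them as assumptions to check against the Socrata docs.  The block only builds the string and does not contact the server.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, **params):
    """Assemble a SODA-style request URL (hypothetical helper, for illustration)."""
    base = f"https://{domain}/resource/{dataset_id}.json"
    # -- assumption: query parameters are written with a leading "$"
    query = urlencode({f"${key}": val for key, val in params.items()}, safe="$")
    return f"{base}?{query}" if query else base


# -- domain + dataset identifier + one query parameter
url = build_soda_url("data.cityofnewyork.us", "erm2-nwe9", limit=5)
print(url)
```

A client library such as sodapy takes the same three inputs and handles the URL construction and the request for us.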

In [0]:
# -- import useful functionality
import numpy as np
import pandas as pd
import sodapy

In [0]:
# -- set the domain, dataset identifier, and record limit
dom  = "data.cityofnewyork.us"
dsid = "erm2-nwe9"
lim  = 50000

# -- open the "client"
#    nb, for the moment, this is for public data with no access token
client = sodapy.Socrata(dom, None, timeout=120)

# -- pull the data from the domain into a list of dictionaries
result = client.get(dsid, limit=lim)

# -- convert to dataframe
data = pd.DataFrame.from_records(result)
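
The conversion step is worth seeing on a tiny scale.  The sketch below uses two hand-written records (not a real API response) that mimic the list-of-dictionaries shape returned by `client.get`.  As far as we know, the JSON response simply leaves out fields that have no value, so the second record is written without a `bbl` key on that assumption; pandas fills the gap with `NaN`.

```python
import pandas as pd

# -- hand-written records mimicking the shape of the API response
records = [
    {"unique_key": "45397049", "agency": "NYPD", "bbl": "1020080006"},
    {"unique_key": "45399033", "agency": "NYPD"},  # no "bbl" key at all
]

# -- the union of all keys becomes the columns; missing keys become NaN
demo = pd.DataFrame.from_records(records)

print(demo)
print(demo.isna().sum())
```

This is one reason it pays to count `NaN` values right after loading.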



In [0]:
# -- as always, print the data, column names, number of NaN values, etc.
display(data)
print("")
print(data.columns)
print("")
print(data.isna().sum(axis=0))

Unnamed: 0,unique_key,created_date,agency,agency_name,complaint_type,descriptor,location_type,incident_zip,incident_address,street_name,cross_street_1,cross_street_2,intersection_street_1,intersection_street_2,city,landmark,status,community_board,bbl,borough,x_coordinate_state_plane,y_coordinate_state_plane,open_data_channel_type,park_facility_name,park_borough,latitude,longitude,location,:@computed_region_efsh_h5xi,:@computed_region_f5dn_yrer,:@computed_region_yeji_bk3q,:@computed_region_92fq_4b7q,:@computed_region_sbqj_enih,taxi_pick_up_location,closed_date,resolution_description,resolution_action_updated_date,address_type,facility_type,due_date,bridge_highway_direction,road_ramp,bridge_highway_segment,taxi_company_borough,vehicle_type,bridge_highway_name
0,45397049,2020-01-16T02:01:19.000,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,Residential Building/House,10030,141 WEST 139 STREET,WEST 139 STREET,LENOX AVENUE,ADAM CLAYTON POWELL JR BOULEVARD,LENOX AVENUE,ADAM CLAYTON POWELL JR BOULEVARD,NEW YORK,WEST 139 STREET,In Progress,10 MANHATTAN,1020080006,MANHATTAN,1000518,237156,PHONE,Unspecified,MANHATTAN,40.81759914207825,-73.94122655946431,"{'latitude': '40.81759914207825', 'longitude':...",12427,18,4,36,20,,,,,,,,,,,,,
1,45393008,2020-01-16T02:00:57.000,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,,10468,2320 AQUEDUCT AVENUE,AQUEDUCT AVENUE,EVELYN PLACE,NORTH STREET,EVELYN PLACE,NORTH STREET,BRONX,AQUEDUCT AVENUE,In Progress,07 BRONX,2032090021,BRONX,1010295,252759,PHONE,Unspecified,BRONX,40.86040140051718,-73.90584339745936,"{'latitude': '40.86040140051718', 'longitude':...",11606,24,5,29,34,,,,,,,,,,,,,
2,45394106,2020-01-16T02:00:46.000,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10456,1481 WASHINGTON AVENUE,WASHINGTON AVENUE,ST PAULS PLACE,EAST 171 STREET,ST PAULS PLACE,EAST 171 STREET,BRONX,WASHINGTON AVENUE,In Progress,03 BRONX,2029020036,BRONX,1010934,244328,ONLINE,Unspecified,BRONX,40.83725890119578,-73.90356687170548,"{'latitude': '40.83725890119578', 'longitude':...",10934,34,5,42,25,,,,,,,,,,,,,
3,45397614,2020-01-16T02:00:14.000,NYPD,New York City Police Department,Blocked Driveway,No Access,,10456,1240 COLLEGE AVENUE,COLLEGE AVENUE,EAST 168 STREET,EAST 169 STREET,EAST 168 STREET,EAST 169 STREET,BRONX,COLLEGE AVENUE,In Progress,04 BRONX,2024360008,BRONX,1008423,243135,PHONE,Unspecified,BRONX,40.83399169289827,-73.91264565082275,"{'latitude': '40.83399169289827', 'longitude':...",10934,50,5,42,27,,,,,,,,,,,,,
4,45394020,2020-01-16T01:59:56.000,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,Residential Building/House,10039,208 WEST 151 STREET,WEST 151 STREET,ADAM CLAYTON POWELL JR BOULEVARD,MACOMBS PLACE,ADAM CLAYTON POWELL JR BOULEVARD,MACOMBS PLACE,NEW YORK,WEST 151 STREET,In Progress,10 MANHATTAN,1020360038,MANHATTAN,1001791,240054,PHONE,Unspecified,MANHATTAN,40.82555089484149,-73.93661985907333,"{'latitude': '40.82555089484149', 'longitude':...",13097,18,4,36,20,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49995,45333795,2020-01-07T04:03:44.000,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Talking,Street/Sidewalk,10032,498 WEST 158 STREET,WEST 158 STREET,ST NICHOLAS AVENUE,AMSTERDAM AVENUE,ST NICHOLAS AVENUE,AMSTERDAM AVENUE,NEW YORK,WEST 158 STREET,Closed,12 MANHATTAN,1021080059,MANHATTAN,1000454,242921,PHONE,Unspecified,MANHATTAN,40.833422541343964,-73.94144385735437,"{'latitude': '40.833422541343964', 'longitude'...",13090,47,4,23,21,,2020-01-07T04:16:15.000,The Police Department responded to the complai...,2020-01-07T09:16:23.000,,,,,,,,,
49996,45335838,2020-01-07T04:01:31.000,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,,11225,270 CROWN STREET,CROWN STREET,DEARBORN COURT,NOSTRAND AVENUE,DEARBORN COURT,NOSTRAND AVENUE,BROOKLYN,CROWN STREET,Closed,09 BROOKLYN,3012960028,BROOKLYN,997741,182034,PHONE,Unspecified,BROOKLYN,40.66630773210683,-73.95136992983514,"{'latitude': '40.66630773210683', 'longitude':...",13509,17,2,48,44,,2020-01-07T07:31:31.000,The Police Department responded to the complai...,2020-01-07T12:31:33.000,,,,,,,,,
49997,45334239,2020-01-07T04:01:13.000,NYPD,New York City Police Department,Illegal Parking,Double Parked Blocking Traffic,Street/Sidewalk,10469,3048 WILSON AVENUE,WILSON AVENUE,ADEE AVENUE,BURKE AVENUE,ADEE AVENUE,BURKE AVENUE,BRONX,WILSON AVENUE,Closed,11 BRONX,2045890030,BRONX,1026028,256305,MOBILE,Unspecified,BRONX,40.870073645711294,-73.84894421538367,"{'latitude': '40.870073645711294', 'longitude'...",11607,59,5,2,32,,2020-01-07T10:14:02.000,The Police Department responded to the complai...,2020-01-07T15:14:08.000,,,,,,,,,
49998,45328018,2020-01-07T04:00:41.000,DOB,Department of Buildings,Building/Use,Zoning - Non-Conforming/Illegal Vehicle Storage,,10308,11 IRONWOOD STREET,IRONWOOD STREET,,,,,STATEN ISLAND,,Open,03 STATEN ISLAND,5045750073,STATEN ISLAND,943434,143827,UNKNOWN,Unspecified,STATEN ISLAND,40.561353761970956,-74.14689552385457,"{'latitude': '40.561353761970956', 'longitude'...",10695,15,1,9,76,,2020-01-07T04:00:41.000,Your Service Request has been submitted to the...,2020-01-07T00:00:00.000,ADDRESS,,,,,,,,



Index(['unique_key', 'created_date', 'agency', 'agency_name', 'complaint_type',
       'descriptor', 'location_type', 'incident_zip', 'incident_address',
       'street_name', 'cross_street_1', 'cross_street_2',
       'intersection_street_1', 'intersection_street_2', 'city', 'landmark',
       'status', 'community_board', 'bbl', 'borough',
       'x_coordinate_state_plane', 'y_coordinate_state_plane',
       'open_data_channel_type', 'park_facility_name', 'park_borough',
       'latitude', 'longitude', 'location', ':@computed_region_efsh_h5xi',
       ':@computed_region_f5dn_yrer', ':@computed_region_yeji_bk3q',
       ':@computed_region_92fq_4b7q', ':@computed_region_sbqj_enih',
       'taxi_pick_up_location', 'closed_date', 'resolution_description',
       'resolution_action_updated_date', 'address_type', 'facility_type',
       'due_date', 'bridge_highway_direction', 'road_ramp',
       'bridge_highway_segment', 'taxi_company_borough', 'vehicle_type',
       'bridge_highway_name']

As with direct URL querying, we can filter rows with a `where` clause and subselect columns with `select`:

In [0]:
# -- pull four columns of NYPD records into a list of dictionaries
result_sel = client.get(dsid, limit=lim, select="borough,bbl,agency,unique_key",
                        where="agency='NYPD'")

# -- convert to dataframe
sub = pd.DataFrame.from_records(result_sel)

Let's check if we're getting the same records each time:

In [0]:
# -- first print the first 10 rows of the sub selection
print(sub[:10])

# -- now we'll recreate the selection and print the first 10
ind = data["agency"] == "NYPD"

print("")
print(data[["agency", "bbl", "borough", "unique_key"]][ind][:10])

     borough agency unique_key         bbl
0  MANHATTAN   NYPD   45399033         NaN
1   BROOKLYN   NYPD   45398371  3018630026
2  MANHATTAN   NYPD   45398124  1010200038
3  MANHATTAN   NYPD   45396335  1016730006
4     QUEENS   NYPD   45397059  4027040093
5   BROOKLYN   NYPD   45397752  3065860057
6   BROOKLYN   NYPD   45399099         NaN
7     QUEENS   NYPD   45399078  4136430029
8   BROOKLYN   NYPD   45399395  3009497504
9   BROOKLYN   NYPD   45396629  3047800026

  agency         bbl    borough unique_key
0   NYPD  1020080006  MANHATTAN   45397049
1   NYPD  2032090021      BRONX   45393008
2   NYPD  2029020036      BRONX   45394106
3   NYPD  2024360008      BRONX   45397614
4   NYPD  1020360038  MANHATTAN   45394020
5   NYPD  2031150005      BRONX   45398062
6   NYPD  2055760092      BRONX   45398569
7   NYPD  2024360008      BRONX   45396636
8   NYPD  2029020036      BRONX   45399116
9   NYPD  3065030083   BROOKLYN   45396971


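**Note that we are not getting exactly the same records!**  The two requests are not equivalent: the first takes `lim` records across all agencies and filters for NYPD afterwards, while the second asks the server for `lim` NYPD records.  Neither request specifies a sort order, and the data set is updated regularly, so some mismatch is to be expected.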
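Comparing printouts by eye works for ten rows but not for thousands.  As a sketch of a more systematic check, the block below takes the `unique_key` values from the two printouts above and uses an outer merge with `indicator=True` to label each key by where it appears.  The frame names `left` and `right` are just for illustration.

```python
import pandas as pd

# -- unique_key values copied from the two printouts above
sub_keys = ["45399033", "45398371", "45398124", "45396335", "45397059",
            "45397752", "45399099", "45399078", "45399395", "45396629"]
data_keys = ["45397049", "45393008", "45394106", "45397614", "45394020",
             "45398062", "45398569", "45396636", "45399116", "45396971"]

left = pd.DataFrame({"unique_key": sub_keys})
right = pd.DataFrame({"unique_key": data_keys})

# -- "_merge" is "both", "left_only", or "right_only" for each key
merged = left.merge(right, on="unique_key", how="outer", indicator=True)

print(merged["_merge"].value_counts())
```

With the full data frames, the same merge on `sub` and `data[ind]` would show how many records the two selections actually share.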

Let's try one more request to demonstrate working with timestamps (note: sodapy accepts [SoQL](https://dev.socrata.com/docs/queries/)-like queries):

In [0]:
# -- put together a more complex query
aquery = "agency='NYPD'"
dquery = "created_date between '2015-03-26T00:00:00' and '2015-03-27T00:00:00'"
fullq  = aquery + " and " + dquery

# -- pull the data from the domain into a list of dictionaries
result3 = client.get(dsid, limit=lim,
                     select="borough,bbl,agency,unique_key,created_date",
                     where=fullq)

# -- convert to dataframe
data3 = pd.DataFrame.from_records(result3)
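
Typing timestamps by hand is error-prone, so here is a small sketch that builds the same kind of date clause from a Python `date`.  The helper name `day_window` is hypothetical.  One caveat, assuming `between` is inclusive at both ends as in standard SQL: a record stamped exactly at midnight of the following day would also match.

```python
from datetime import date, datetime, timedelta


def day_window(column, day):
    """Build a 'between' clause covering one calendar day (hypothetical helper)."""
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    # -- isoformat() gives e.g. 2015-03-26T00:00:00
    return f"{column} between '{start.isoformat()}' and '{end.isoformat()}'"


dquery_demo = day_window("created_date", date(2015, 3, 26))
print(dquery_demo)
```

The resulting string can be joined to other clauses with `" and "`, just like `dquery` above.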

In [0]:
display(data3)

Unnamed: 0,borough,agency,unique_key,created_date,bbl
0,QUEENS,NYPD,30263290,2015-03-26T00:00:33.000,
1,MANHATTAN,NYPD,30259913,2015-03-26T00:00:38.000,1016800030
2,BRONX,NYPD,30261009,2015-03-26T00:03:36.000,2023240001
3,BRONX,NYPD,30257805,2015-03-26T00:05:21.000,2023240001
4,BRONX,NYPD,30262656,2015-03-26T00:06:17.000,
...,...,...,...,...,...
1176,BROOKLYN,NYPD,30263782,2015-03-26T23:54:01.000,3068260048
1177,QUEENS,NYPD,30267577,2015-03-26T23:55:42.000,4083510048
1178,MANHATTAN,NYPD,30265772,2015-03-26T23:55:47.000,1020270001
1179,BRONX,NYPD,30268419,2015-03-26T23:56:24.000,


**Note that, because we restricted the query to a single day, our limit (50,000) exceeds the number of matching records (1,181), so we get all of them in this case.**
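When a query matches more records than the limit, one option is to request the data in pages.  The sketch below is a hypothetical helper, `fetch_all`, that relies on the `limit`, `offset`, and `order` keyword arguments of `client.get`.  Ordering by the system field `:id` to keep pages stable is an assumption based on our reading of the Socrata paging guidance, and the page size of 1000 is arbitrary.

```python
def fetch_all(client, dataset_id, page_size=1000, order=":id", **soql):
    """Collect every matching record by requesting successive pages."""
    records = []
    offset = 0

    while True:
        # -- a fixed sort order keeps page boundaries consistent
        page = client.get(dataset_id, limit=page_size, offset=offset,
                          order=order, **soql)
        records.extend(page)

        # -- a short (or empty) page means we have reached the end
        if len(page) < page_size:
            break
        offset += page_size

    return records

# -- usage with the client defined earlier (requires network access):
#    result_all = fetch_all(client, dsid, where=fullq)
```

Because the helper only calls `client.get`, it can be tried out against any object that offers a compatible `get` method.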

In [0]:
data3.isna().sum()

borough           0
agency            0
unique_key        0
created_date      0
bbl             196
dtype: int64

In [0]:
data3.isna().sum(axis=0)

borough           0
agency            0
unique_key        0
created_date      0
bbl             196
dtype: int64

In [0]:
data3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1181 entries, 0 to 1180
Data columns (total 5 columns):
borough         1181 non-null object
agency          1181 non-null object
unique_key      1181 non-null object
created_date    1181 non-null object
bbl             985 non-null object
dtypes: object(5)
memory usage: 46.3+ KB
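
Every column above has dtype `object` because the values arrive as strings.  To actually work with the timestamps, we would convert them first.  The sketch below does this on three rows copied from the output above; `demo3` is an illustrative name.  Converting `bbl` to a number is shown only to demonstrate `pd.to_numeric`; since it is an identifier, leaving it as a string may be the better choice.

```python
import pandas as pd

# -- three rows copied from the output above; all values are strings or missing
demo3 = pd.DataFrame({
    "created_date": ["2015-03-26T00:00:33.000",
                     "2015-03-26T00:00:38.000",
                     "2015-03-26T23:56:24.000"],
    "bbl": [None, "1016800030", None],
})

# -- parse the ISO-formatted strings into datetimes
demo3["created_date"] = pd.to_datetime(demo3["created_date"])

# -- convert to numbers, with missing values becoming NaN
demo3["bbl"] = pd.to_numeric(demo3["bbl"], errors="coerce")

print(demo3.dtypes)
print(demo3["created_date"].dt.hour.tolist())
```

Once `created_date` is a datetime column, the `.dt` accessor gives access to the hour, the day of week, and so on.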
