# Web Scraping Intro, Part 2: APIs

An Application Programming Interface, or **API**, is a structured way to retrieve data from a website. Using an API is safer and easier than something like web scraping, since what you get back is already in a usable format. Many organizations provide APIs, including:
- Government organizations ([US Government](https://www.data.gov/developers/apis))
- Large companies ([Twitter API](https://developer.twitter.com/en/docs))
- News organizations ([NYT API](https://developer.nytimes.com/))
- And [many more](https://github.com/public-apis/public-apis)

If you search Google for "how to use an api in python", you get back many articles walking through how to use an API. It is a well-documented and useful tool to be familiar with.

We can use the `requests` library to retrieve data from an API.

In [1]:
import requests

## Using the data.nashville.gov API

The Nashville Open Data Portal provides an API for retrieving data.

Let's look at the traffic accidents data: https://data.nashville.gov/Police/Traffic-Accidents/6v6w-hpcw

Notice that there is an API button in the upper right corner. data.nashville.gov makes many of its datasets available through the Socrata Open Data API (SODA).

Click the API button and choose the **CSV** endpoint. Copy the url and paste it below.

In [2]:
url = 'https://data.nashville.gov/resource/6v6w-hpcw.csv?'

We can send a GET request to this url to fetch the associated csv.

In [3]:
r = requests.get(url)

Let's see what is returned.

In [4]:
print(r.text[:1000])

"accident_number","date_and_time","number_of_motor_vehicles","number_of_injuries","number_of_fatalities","property_damage","hit_and_run","reporting_officer","collision_type","collision_type_description","weather","weather_description","illuaccidemination","illumination_description","harmfulcodes","harmfuldescriptions","street_address","city","state","zip","rpa","precinct","lat","long","mapped_location"
"VU 130097461","2013-08-13T00:00:00.000","2","0","0",,"true","494","3",,"1","NO ADVERSE CONDITIONS","1","DAYLIGHT","12;14","MOTOR VEHICLE IN TRANSPORT;PARKED MOTOR VEHICLE","POWELL AVE","NASHVILLE","TN","37204","8525","MIDTOW","36.1099","-86.7644","POINT (-86.7644 36.1099)"
"20400639785","2014-07-06T12:45:00.000","2.0000","0.0000","0.0000",,"false","473187","5","SIDESWIPE - SAME DIRECTION","21","CLEAR","1","DAYLIGHT","12","MOTOR VEHICLE IN TRANSPORT","MM 93 0 I65 N","MADISON","TN","37115","17040","MADISO","0.0000","0.0000","POINT (0 0)"
"20200662011","2020-10-20T21:00:00.000","1","0","0"

It is a string formatted like a csv file. If we want to convert this to a dataframe, we can wrap it in a `StringIO` object, which lets `read_csv` treat the string as if it were an open file.

In [5]:
import pandas as pd
from io import StringIO

In [6]:
crashes = pd.read_csv(StringIO(r.text))
crashes.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,property_damage,hit_and_run,reporting_officer,collision_type,collision_type_description,...,harmfuldescriptions,street_address,city,state,zip,rpa,precinct,lat,long,mapped_location
0,VU 130097461,2013-08-13T00:00:00.000,2.0,0.0,0.0,,True,494,3,,...,MOTOR VEHICLE IN TRANSPORT;PARKED MOTOR VEHICLE,POWELL AVE,NASHVILLE,TN,37204,8525,MIDTOW,36.1099,-86.7644,POINT (-86.7644 36.1099)
1,20400639785,2014-07-06T12:45:00.000,2.0,0.0,0.0,,False,473187,5,SIDESWIPE - SAME DIRECTION,...,MOTOR VEHICLE IN TRANSPORT,MM 93 0 I65 N,MADISON,TN,37115,17040,MADISO,0.0,0.0,POINT (0 0)
2,20200662011,2020-10-20T21:00:00.000,1.0,0.0,0.0,,False,151132,0,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,...,DEER (ANIMAL),PAWNEE TRL & MOHAWK TRL,MADISON,TN,37115,1619,MADISO,36.231,-86.657,POINT (-86.657 36.231)
3,20200661991,2020-10-20T20:50:00.000,2.0,0.0,0.0,,False,299243,11,Front to Rear,...,MOTOR VEHICLE IN TRANSPORT,BRICK CHURCH PKE & EWING DR,NASHVILLE,TN,37207,2001,MADISO,36.2316,-86.7815,POINT (-86.7815 36.2316)
4,20200661969,2020-10-20T20:00:00.000,2.0,1.0,0.0,,False,384420,4,ANGLE,...,MOTOR VEHICLE IN TRANSPORT,DIVISION ST & 8TH AVS,NASHVILLE,TN,37203,4011,CENTRA,36.1498,-86.7801,POINT (-86.7801 36.1498)
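
To see what `StringIO` contributes, here is a minimal sketch that runs without a network call. The `sample` string is a hand-trimmed stand-in for `r.text` (three columns from the first two rows shown above), and the standard-library `csv` module takes the place of pandas so nothing extra needs to be installed. The same `StringIO(sample)` object could be handed to `pd.read_csv` instead.

```python
import csv
from io import StringIO

# Hand-trimmed stand-in for r.text (an assumption for illustration only).
sample = (
    '"accident_number","hit_and_run","zip"\n'
    '"VU 130097461","true","37204"\n'
    '"20400639785","false","37115"\n'
)

# StringIO wraps the string in an in-memory, file-like object,
# so any reader that expects an open file can consume it.
buffer = StringIO(sample)
rows = list(csv.DictReader(buffer))

print(len(rows))
print(rows[0]["accident_number"])
```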


Another option is to pass the url itself to `read_csv`, which then downloads and parses the data in one step.

In [7]:
crashes = pd.read_csv(r.url)
crashes.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,property_damage,hit_and_run,reporting_officer,collision_type,collision_type_description,...,harmfuldescriptions,street_address,city,state,zip,rpa,precinct,lat,long,mapped_location
0,VU 130097461,2013-08-13T00:00:00.000,2.0,0.0,0.0,,True,494,3,,...,MOTOR VEHICLE IN TRANSPORT;PARKED MOTOR VEHICLE,POWELL AVE,NASHVILLE,TN,37204,8525,MIDTOW,36.1099,-86.7644,POINT (-86.7644 36.1099)
1,20400639785,2014-07-06T12:45:00.000,2.0,0.0,0.0,,False,473187,5,SIDESWIPE - SAME DIRECTION,...,MOTOR VEHICLE IN TRANSPORT,MM 93 0 I65 N,MADISON,TN,37115,17040,MADISO,0.0,0.0,POINT (0 0)
2,20200662011,2020-10-20T21:00:00.000,1.0,0.0,0.0,,False,151132,0,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,...,DEER (ANIMAL),PAWNEE TRL & MOHAWK TRL,MADISON,TN,37115,1619,MADISO,36.231,-86.657,POINT (-86.657 36.231)
3,20200661991,2020-10-20T20:50:00.000,2.0,0.0,0.0,,False,299243,11,Front to Rear,...,MOTOR VEHICLE IN TRANSPORT,BRICK CHURCH PKE & EWING DR,NASHVILLE,TN,37207,2001,MADISO,36.2316,-86.7815,POINT (-86.7815 36.2316)
4,20200661969,2020-10-20T20:00:00.000,2.0,1.0,0.0,,False,384420,4,ANGLE,...,MOTOR VEHICLE IN TRANSPORT,DIVISION ST & 8TH AVS,NASHVILLE,TN,37203,4011,CENTRA,36.1498,-86.7801,POINT (-86.7801 36.1498)


Finally, we can save the text as a csv file and then read it back in using pandas:

In [8]:
with open('crashes.csv', 'w') as fi:
    fi.write(r.text)

In [9]:
crashes = pd.read_csv('crashes.csv')
crashes.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,property_damage,hit_and_run,reporting_officer,collision_type,collision_type_description,...,harmfuldescriptions,street_address,city,state,zip,rpa,precinct,lat,long,mapped_location
0,VU 130097461,2013-08-13T00:00:00.000,2.0,0.0,0.0,,True,494,3,,...,MOTOR VEHICLE IN TRANSPORT;PARKED MOTOR VEHICLE,POWELL AVE,NASHVILLE,TN,37204,8525,MIDTOW,36.1099,-86.7644,POINT (-86.7644 36.1099)
1,20400639785,2014-07-06T12:45:00.000,2.0,0.0,0.0,,False,473187,5,SIDESWIPE - SAME DIRECTION,...,MOTOR VEHICLE IN TRANSPORT,MM 93 0 I65 N,MADISON,TN,37115,17040,MADISO,0.0,0.0,POINT (0 0)
2,20200662011,2020-10-20T21:00:00.000,1.0,0.0,0.0,,False,151132,0,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,...,DEER (ANIMAL),PAWNEE TRL & MOHAWK TRL,MADISON,TN,37115,1619,MADISO,36.231,-86.657,POINT (-86.657 36.231)
3,20200661991,2020-10-20T20:50:00.000,2.0,0.0,0.0,,False,299243,11,Front to Rear,...,MOTOR VEHICLE IN TRANSPORT,BRICK CHURCH PKE & EWING DR,NASHVILLE,TN,37207,2001,MADISO,36.2316,-86.7815,POINT (-86.7815 36.2316)
4,20200661969,2020-10-20T20:00:00.000,2.0,1.0,0.0,,False,384420,4,ANGLE,...,MOTOR VEHICLE IN TRANSPORT,DIVISION ST & 8TH AVS,NASHVILLE,TN,37203,4011,CENTRA,36.1498,-86.7801,POINT (-86.7801 36.1498)
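
The write-then-read round trip can be tried offline as well. This sketch assumes a tiny hand-written `text` in place of `r.text` and writes to a temporary directory rather than to `crashes.csv`. Passing `newline=""` and an explicit `encoding` to `open` is a precaution that the cell above does not take: it keeps Python from translating line endings and from falling back on a platform-specific encoding.

```python
import csv
import os
import tempfile

# Stand-in for r.text (hypothetical sample, not real API output).
text = '"accident_number","zip"\n"20200662011","37115"\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crashes_sample.csv")

    # newline="" writes the text exactly as received.
    with open(path, "w", newline="", encoding="utf-8") as fi:
        fi.write(text)

    # Read the file back to confirm nothing was lost.
    with open(path, newline="", encoding="utf-8") as fi:
        saved_rows = list(csv.DictReader(fi))

print(saved_rows)
```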


We can also request only a subset of the data. One way to do so is by adding parameters to the url.

For example, let's say we want to look for hit and run crashes that happened near the Nashville Software School (zip code 37217). We can encode these filters as a dictionary.

See the [documentation](https://dev.socrata.com/foundry/data.nashville.gov/6v6w-hpcw) for more information about what parameters you can pass in.

In [10]:
payload = {
    'zip': '37217',
    'hit_and_run': 'True'
}

Then pass this dictionary to `requests.get` using the `params` argument.

In [11]:
r = requests.get(url=url, params=payload)

If you want to inspect the resulting url, you can access the response's `url` attribute. You can see how the parameters we created are tacked onto the original url.

In [12]:
print(r.url)

https://data.nashville.gov/resource/6v6w-hpcw.csv?zip=37217&hit_and_run=True
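
The query string above is ordinary URL encoding of the dictionary. As a rough illustration (using the standard library's `urllib.parse` rather than `requests` itself, so no request is sent), the same string can be built by hand. The name `base_url` is introduced here only for the example.

```python
from urllib.parse import urlencode

base_url = "https://data.nashville.gov/resource/6v6w-hpcw.csv"
payload = {"zip": "37217", "hit_and_run": "True"}

# urlencode joins key=value pairs with "&" and escapes unsafe characters.
query = urlencode(payload)
full_url = base_url + "?" + query

print(full_url)
```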


In [13]:
crashes = pd.read_csv(StringIO(r.text))
crashes.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,property_damage,hit_and_run,reporting_officer,collision_type,collision_type_description,...,harmfuldescriptions,street_address,city,state,zip,rpa,precinct,lat,long,mapped_location
0,20200604308,2020-09-20T20:14:00.000,2.0,0.0,0.0,,True,330413,4,ANGLE,...,MOTOR VEHICLE IN TRANSPORT,EDGE O LAKE DR & MURFREESBORO PKE,NASHVILLE,TN,37217,8939.0,HERMIT,36.0797,-86.6443,POINT (-86.6443 36.0797)
1,20200615890,2020-09-26T12:18:00.000,2.0,0.0,0.0,,True,161946,11,Front to Rear,...,PARKED MOTOR VEHICLE,REGENTS PARK CIR & BAYSWATER CIR,NASHVILLE,TN,37217,8853.0,SOUTH,36.0798,-86.6656,POINT (-86.6656 36.0798)
2,20200618252,2020-09-27T22:30:00.000,2.0,0.0,0.0,,True,256569,11,Front to Rear,...,MOTOR VEHICLE IN TRANSPORT,BELL RD & EDGE O LAKE DR,NASHVILLE,TN,37217,8963.0,HERMIT,36.0794,-86.6326,POINT (-86.6326 36.0794)
3,20200625499,2020-10-01T12:03:00.000,2.0,2.0,0.0,,True,414731,4,ANGLE,...,MOTOR VEHICLE IN TRANSPORT,BOWWOOD CT & VULTEE BLVD,NASHVILLE,TN,37217,8811.0,SOUTH,36.1255,-86.71,POINT (-86.71 36.1255)
4,20200627725,2020-10-02T12:42:00.000,2.0,0.0,0.0,,True,476332,5,SIDESWIPE - SAME DIRECTION,...,MOTOR VEHICLE IN TRANSPORT,MURFREESBORO PKE & MILLWOOD DR,NASHVILLE,TN,37217,8815.0,HERMIT,36.1295,-86.7147,POINT (-86.7147 36.1295)


We are somewhat limited in how we can filter our request by just using parameters. For more complicated types of queries, check out [SoQL](https://dev.socrata.com/docs/queries/), the Socrata Query Language. SoQL has many similarities to SQL.

A request using SoQL might look like this:

In [14]:
url = "https://data.nashville.gov/resource/6v6w-hpcw.csv?$where=date_and_time between '2019-01-10T12:00:00' and '2020-01-10T14:00:00' AND number_of_injuries > 0&$limit=2000"

In [15]:
r = requests.get(url)

In [16]:
crashes = pd.read_csv(StringIO(r.text))
crashes.head()

Unnamed: 0,accident_number,date_and_time,number_of_motor_vehicles,number_of_injuries,number_of_fatalities,property_damage,hit_and_run,reporting_officer,collision_type,collision_type_description,...,harmfuldescriptions,street_address,city,state,zip,rpa,precinct,lat,long,mapped_location
0,20190024859,2019-01-10T12:28:00.000,1.0,1.0,0.0,True,False,179602,0,NOT COLLISION W/MOTOR VEHICLE-TRANSPORT,...,DITCH,WHITES CREEK PKE & JACKMAN RD,JOELTON,TN,37080,2311.0,NORTH,36.3575,-86.8826,POINT (-86.8826 36.3575)
1,20190024986,2019-01-10T13:28:00.000,2.0,2.0,0.0,True,False,226136,4,ANGLE,...,MOTOR VEHICLE IN TRANSPORT,WEST END AV & 32ND AVS,NASHVILLE,TN,37212,5817.0,MIDTOW,36.1421,-86.8161,POINT (-86.8161 36.1421)
2,20190025257,2019-01-10T13:42:00.000,2.0,1.0,0.0,,True,256411,11,Front to Rear,...,MOTOR VEHICLE IN TRANSPORT,DR D B TODD JR BLVD & OSAGE ST,NASHVILLE,TN,37208,4455.0,NORTH,36.1771,-86.8098,POINT (-86.8098 36.1771)
3,20190025009,2019-01-10T13:50:00.000,3.0,1.0,0.0,,False,716886,4,ANGLE,...,MOTOR VEHICLE IN TRANSPORT,BELL RD & CEDAR POINTE PKWY,ANTIOCH,TN,37013,8753.0,SOUTH,36.0449,-86.6671,POINT (-86.6671 36.0449)
4,20190025269,2019-01-10T15:10:00.000,2.0,1.0,0.0,,False,330411,11,Front to Rear,...,MOTOR VEHICLE IN TRANSPORT,I24 W ENT RAMP & I 40,NASHVILLE,TN,37217,8818.0,HERMIT,36.1397,-86.7275,POINT (-86.7275 36.1397)
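
Hand-writing a SoQL url means dealing with spaces and quotes inside the query string. One alternative, sketched here as an assumption rather than the approach used above, is to put the `$where` and `$limit` clauses in a dictionary, the same way `payload` was built earlier, and let the encoding be handled for you. The sketch uses `urllib.parse` so it runs offline. The `soql` dictionary (a name chosen for this example) could equally be passed to `requests.get` through `params`.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = "https://data.nashville.gov/resource/6v6w-hpcw.csv"

# SoQL clauses as dictionary entries; the keys start with "$".
soql = {
    "$where": (
        "date_and_time between '2019-01-10T12:00:00' "
        "and '2020-01-10T14:00:00' AND number_of_injuries > 0"
    ),
    "$limit": 2000,
}

encoded_url = base_url + "?" + urlencode(soql)
print(encoded_url)

# Decoding the query string recovers the original clauses,
# which shows that the escaping is lossless.
decoded = parse_qs(urlsplit(encoded_url).query)
print(decoded["$where"][0])
```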
