# Data Download

To make API calls in Python, we use the `requests` library. Before we start coding, we need to read the documentation to find out whether any configuration has to be set up beforehand.

Below is an excerpt from the Motor Vehicle Collisions - Crashes (MVCC) [API documentation](https://dev.socrata.com/foundry/data.cityofnewyork.us/h9gi-nx95).
#### START
### API Documentation 

#### Getting Started
All communication with the API is done through HTTPS, and errors are communicated through HTTP response codes. Available response types include JSON (including GeoJSON), XML, and CSV, which are selectable by the "extension" (.json, etc.) on the API endpoint or through content negotiation with HTTP Accept headers.


#### Tokens
All requests should include an app token that identifies your application, and each application should have its own unique app token. A limited number of requests can be made without an app token, but they are subject to much lower throttling limits than requests that do include one. With an app token, your application is guaranteed access to its own pool of requests. If you don't have an app token yet, you will need to sign up for one.

Once you have an app token, you can include it with your request either by using the X-App-Token HTTP header, or by passing it via the $$app_token parameter on your URL.

#### END
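The two mechanisms in the excerpt, choosing the response type through the endpoint extension and passing the app token as the `$$app_token` URL parameter, can be sketched with the standard library alone. The helper name `build_url` and the mapping of keyword arguments to SoQL parameters are illustrative assumptions, not part of the Socrata API; the `Accept`-header alternative for content negotiation is not shown.

```python
from urllib.parse import urlencode

# Base resource path for the Motor Vehicle Collisions - Crashes dataset (h9gi-nx95).
BASE = "https://data.cityofnewyork.us/resource/h9gi-nx95"
# Response types listed in the documentation excerpt.
FORMATS = {"json", "geojson", "xml", "csv"}


def build_url(fmt="json", app_token=None, **soql):
    """Build a request URL (hypothetical helper).

    The response type is chosen by the endpoint extension; keyword arguments
    become SoQL parameters (limit=5 -> $limit=5), and the optional app token
    is passed as the $$app_token URL parameter.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = {f"${key}": value for key, value in soql.items()}
    if app_token:
        params["$$app_token"] = app_token
    # Keep '$' and ':' readable in the query string instead of percent-encoding them.
    query = urlencode(params, safe="$:")
    return f"{BASE}.{fmt}" + (f"?{query}" if query else "")
```

For example, `build_url("csv", limit=5)` returns `https://data.cityofnewyork.us/resource/h9gi-nx95.csv?$limit=5`.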

**The above tells us a few important points:**
1. All API calls are made over HTTPS, and errors are communicated through HTTP response codes, which tell the client what happened. Codes in the 2xx range (e.g. 200) indicate success, codes in the 4xx range (e.g. 400, 401, 403) indicate a problem with the request, and 5xx codes indicate a server-side error.

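2. We should pass `{'X-App-Token': APP_TOKEN}` as the headers of our GET requests (alternatively, the token can be passed through the `$$app_token` URL parameter). The documentation does not say where the app token's secret goes, so only the public token appears to be needed. The header itself is optional: a limited number of requests can still be made without a token, but at much lower throttling limits.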
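A minimal sketch of a single request that sends the token in the `X-App-Token` header and turns 4xx/5xx responses into exceptions. `APP_TOKEN` is a placeholder, and `auth_headers` and `fetch_json` are hypothetical helper names; omitting the header when no token is configured reflects the observation above that the token appears to be optional.

```python
import requests

ENDPOINT = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"
APP_TOKEN = None  # placeholder: set this to your own app token string


def auth_headers(app_token=APP_TOKEN):
    # Only the app token is sent; the docs excerpt does not mention the secret.
    return {"X-App-Token": app_token} if app_token else {}


def fetch_json(params=None, app_token=APP_TOKEN, timeout=60):
    """GET the endpoint and return the parsed JSON body."""
    response = requests.get(
        ENDPOINT, params=params, headers=auth_headers(app_token), timeout=timeout
    )
    # Raises requests.HTTPError for 4xx/5xx codes; 2xx responses return normally.
    response.raise_for_status()
    return response.json()
```

Usage: `fetch_json({"$limit": 5})` should return a list of up to five records as dictionaries.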

### Things to Know

There are several ways to query the underlying data: a basic cURL request, the NYC OpenData [Socrata API](https://dev.socrata.com/docs/queries/), or SQL against the Google [bigquery-public-data project](https://cloud.google.com/bigquery/public-data).

1. The Socrata API documentation tells us that we need to use [Paging Through Data](https://dev.socrata.com/docs/paging.html) to pull all ~1.8 million records from the table, because the API returns only 1,000 records per request by default. Paging lets us set an offset, which tells the API where in the result set to start returning records. The data must also be explicitly ordered (for example by the system field `:id`); otherwise the results are not guaranteed to be stable as we page through the dataset.
2. The dataset has about 1.83 million rows (at the time of writing).
3. There are several noticeable data quality issues, which we discuss in this notebook.
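Rather than hard-coding the row count, it can be requested from the API with a SoQL aggregate, and the number of pages follows from it. This is a sketch: the alias `total`, the helper names, and the 50,000-record page size are our own choices. Socrata's JSON output appears to return aggregate values as strings, which is why the count is converted with `int`.

```python
import requests

ENDPOINT = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"


def parse_count(payload):
    """Extract the row count from a response shaped like [{"total": "1830000"}]."""
    return int(payload[0]["total"])


def fetch_row_count(timeout=60):
    # An aggregate $select returns a single row that contains the count.
    params = {"$select": "count(*) AS total"}
    response = requests.get(ENDPOINT, params=params, timeout=timeout)
    response.raise_for_status()
    return parse_count(response.json())


def pages_needed(total_rows, limit=50_000):
    # Each request returns at most `limit` rows, so round up (integer arithmetic).
    return (total_rows + limit - 1) // limit
```

Usage: `pages_needed(fetch_row_count())` gives the number of paged requests required for the whole table.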

```python
import requests
from tqdm import tqdm
from common.utilities import decorators
```

### Step 1: Query the Data

We create a function that uses the `$limit` and `$offset` parameters to build one request URL per page of results. Fetching each URL and concatenating the results gives us the full dataset as a single pandas DataFrame.

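```python
API_LIMIT = 50_000      # records per page; an assumed value, not taken from the docs
TOTAL_ROWS = 1_830_000  # approximate row count noted above; update it if the dataset grows

@decorators.timeit
def create_urls(total_rows=TOTAL_ROWS, limit=API_LIMIT):
    # We start with an offset of 0 and increment it by the number of records
    # returned per page (API_LIMIT). Ordering by :id keeps the pages stable.
    base = 'https://data.cityofnewyork.us/resource/h9gi-nx95.json'
    return [
        f'{base}?$limit={limit}&$offset={offset}&$order=:id'
        for offset in range(0, total_rows, limit)
    ]
```
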
```python
limit = 5
# Quick sanity check: print the first few generated URLs.
for url in create_urls()[:limit]:
    print(url)
```
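Building URLs is only half of the job. A sketch of a loop that actually downloads every page follows; it stops at the first page shorter than the page size instead of relying on a precomputed row count. The names `fetch_page` and `download_all`, the 50,000-record page size, and ordering by the system field `:id` are assumptions of this sketch. The `fetch` argument is injectable so the paging logic can be exercised without network access.

```python
import requests

ENDPOINT = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"


def fetch_page(offset, limit, app_token=None, timeout=120):
    """Fetch one page of records, ordered by :id so page boundaries are stable."""
    params = {"$limit": limit, "$offset": offset, "$order": ":id"}
    headers = {"X-App-Token": app_token} if app_token else {}
    response = requests.get(ENDPOINT, params=params, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.json()


def download_all(fetch=fetch_page, limit=50_000):
    """Page through the dataset until a page comes back shorter than `limit`."""
    records, offset = [], 0
    while True:
        page = fetch(offset, limit)
        records.extend(page)
        if len(page) < limit:
            # A short (or empty) page means there are no more records.
            return records
        offset += limit
```

Turning the result into the single DataFrame described in Step 1 is then `pd.DataFrame(download_all())`, assuming pandas is imported as `pd`.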