# **Motor Vehicle Collisions - Vehicles in New York City**

## NYC Open Data API - powered by Socrata

API documentation available here: https://dev.socrata.com/foundry/data.cityofnewyork.us/bm4k-52h4 
New York City open data source: https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Vehicles/bm4k-52h4 

### Data description
The Motor Vehicle Collisions vehicle table contains details on each vehicle involved in the crash. Each row represents a motor vehicle involved in a crash. The data in this table goes back to April 2016 when crash reporting switched to an electronic system.

The Motor Vehicle Collisions data tables contain information from all police-reported motor vehicle collisions in NYC. The police report (MV-104AN) must be filled out for collisions where someone is injured or killed, or where there is at least $1,000 worth of damage (https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/documents/nyoverlaymv-104anrev052004.pdf). Note that the data is preliminary and subject to change when MV-104AN forms are amended based on revised crash details.

### Libraries and Packages
Make sure to install these packages before running:

In [5]:
# !pip install pandas
# !pip install sodapy



Now import them:

In [1]:
import pandas as pd
from sodapy import Socrata
from datetime import datetime
import json
from requests.exceptions import ReadTimeout

### Generate an App Token
All requests should include an app token that identifies your application, and each application should have its own unique app token. A limited number of requests can be made without an app token, but they are subject to much lower throttling limits than requests that include one. With an app token, your application is guaranteed access to its own pool of requests. If you don't have an app token yet, sign up for one using the link below.

Once you have an app token, you can include it with your request either by using the *X-App-Token HTTP header*, or by passing it via the *$$app_token* parameter on your URL.
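As a rough illustration of these two options, the sketch below builds (but does not send) the request pieces for each one using only the standard library. The resource URL and the helper names `with_header` and `with_url_parameter` are assumptions made for this example; they are only relevant when calling the API directly, since the `Socrata` client used in this notebook takes the token as an argument.

```python
from urllib.parse import urlencode

# Assumed resource endpoint for this dataset (usual Socrata URL pattern)
BASE_URL = "https://data.cityofnewyork.us/resource/bm4k-52h4.json"


def with_header(token):
    """Option 1: send the token in the X-App-Token HTTP header."""
    return BASE_URL, {"X-App-Token": token}


def with_url_parameter(token):
    """Option 2: pass the token via the $$app_token URL parameter."""
    query = urlencode({"$$app_token": token})  # '$' is percent-encoded as %24
    return f"{BASE_URL}?{query}", {}


url, headers = with_url_parameter("YOUR_APP_TOKEN")
print(url)
```

The header form keeps the token out of URLs that may end up in logs, which is one reason to prefer it when both are available.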

Here you can get your own token: https://data.cityofnewyork.us/profile/edit/developer_settings 

In [2]:
# Prompt for credentials (password and token input is hidden)
from getpass import getpass

user = input('Insert your email: ')
password = getpass('Insert your password: ')
token = getpass('Insert your app token: ') 

### Client Authentication
Initialize the `Socrata` client.

In [3]:
client = Socrata("data.cityofnewyork.us",
                  token,
                  username=user,
                  password=password)

### Get Data
Define the main variables.

In [4]:
# Variables to keep
vars_to_select = "unique_id, collision_id, vehicle_occupants, driver_license_status, pre_crash, contributing_factor_1"
dataset_id = "bm4k-52h4"
# Define start and end dates
start_date_str = '2018-01-01T00:00:00'
end_date_str = '2018-12-31T23:59:59'

# Convert dates to datetime objects
start_date = datetime.strptime(start_date_str, '%Y-%m-%dT%H:%M:%S')
end_date = datetime.strptime(end_date_str, '%Y-%m-%dT%H:%M:%S')

# Format dates for SoQL query
start_date_formatted = start_date.strftime('%Y-%m-%dT%H:%M:%S')
end_date_formatted = end_date.strftime('%Y-%m-%dT%H:%M:%S')

The next cell retrieves data from the Socrata dataset using `sodapy`, a Python client for the Socrata Open Data API. The request targets the dataset identified by `dataset_id` and uses SoQL clauses to select the columns listed in `vars_to_select` and to restrict rows to the interval between `start_date_formatted` and `end_date_formatted`. The client's `get` method returns the JSON response parsed into a list of dictionaries, one per row. Because the request can time out, it is retried up to `max_attempts` times.
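The date filter can be sketched as a small helper that assembles the SoQL `where` clause. `build_where` is a hypothetical name introduced here for illustration and is not part of `sodapy`; the default column `crash_date` is the one used in this notebook's query.

```python
from datetime import datetime


def build_where(start, end, column="crash_date"):
    """Return a SoQL WHERE clause keeping rows with start <= column <= end."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    return f"{column} >= '{start.strftime(fmt)}' AND {column} <= '{end.strftime(fmt)}'"


where_clause = build_where(datetime(2018, 1, 1), datetime(2018, 12, 31, 23, 59, 59))
print(where_clause)
# crash_date >= '2018-01-01T00:00:00' AND crash_date <= '2018-12-31T23:59:59'
```

The resulting string can be passed as the `where` argument of `client.get`.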

In [5]:
# Maximum number of attempts
max_attempts = 5

# Initialize attempts counter
attempts = 0

# Execute the loop until the maximum number of attempts is reached or the timeout error is resolved
while attempts < max_attempts:
    try:
        # Retrieve data from the dataset using the Socrata client
        results = client.get(dataset_id, 
                             select=vars_to_select, 
                             where=f"crash_date >= '{start_date_formatted}' AND crash_date <= '{end_date_formatted}'", 
                             limit=600000)
        # Exit the loop if the query is successful
        break
    except ReadTimeout:
        # Handle the timeout error
        print("Read timeout. Retrying...")
        attempts += 1  # Increment the attempts counter

if attempts == max_attempts:
    print(f"Maximum number of attempts ({max_attempts}) reached. Unable to complete the query. Try again.")

Read timeout. Retrying...
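If a single large request keeps timing out, one alternative is to fetch the rows in smaller pages. The sketch below is a generic paging loop, not the approach used in this notebook: `fetch_all` and `fetch_page` are hypothetical names, and the page size of 50,000 is an arbitrary assumption. It relies only on the `limit` and `offset` arguments that `client.get` accepts.

```python
def fetch_all(fetch_page, page_size=50000):
    """Collect every row by requesting successive pages until one comes back short."""
    rows, offset = [], 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        rows.extend(page)
        if len(page) < page_size:  # a short (or empty) page means nothing is left
            return rows
        offset += page_size


# Possible usage with the client defined above (assumes a stable sort order):
# results = fetch_all(
#     lambda limit, offset: client.get(dataset_id,
#                                      select=vars_to_select,
#                                      where=where_clause,
#                                      order="unique_id",
#                                      limit=limit,
#                                      offset=offset))
```

Sorting on a unique column such as `unique_id` keeps pages from overlapping or skipping rows between requests.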


In [6]:
# Convert the results into a pandas DataFrame
data = pd.DataFrame(results)
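Values in the JSON response typically arrive as strings, so numeric columns may need converting before analysis. The sketch below uses made-up sample rows, not real API output, and assumes `vehicle_occupants` is the column of interest; `errors="coerce"` turns missing or unparseable values into `NaN`.

```python
import pandas as pd

# Made-up rows shaped like the API response (values as strings)
sample = [
    {"unique_id": "1", "vehicle_occupants": "2"},
    {"unique_id": "2", "vehicle_occupants": "1"},
    {"unique_id": "3"},  # a field may be absent when the value is missing
]

df = pd.DataFrame(sample)
# Convert the string column to numbers; missing values become NaN
df["vehicle_occupants"] = pd.to_numeric(df["vehicle_occupants"], errors="coerce")
print(df.dtypes)
```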

### Export
Export the pandas DataFrame to a local CSV file.

In [8]:
# Export data without automatic row identifier
data.to_csv('Vehicles.csv', index=False)