# New York City 311 Data: Motor Vehicle Crashes

### This information comes from the website:
#### https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95

#### The data may be explored in its entirety at this site. 
#### The import for this assignment will be limited to 500,000 rows, matching the `$limit` used in the ingestion cell. 
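As a rough sketch of how the row limit can be expressed, the request URL can be assembled from Socrata query parameters instead of being typed by hand. The helper name `build_query_url` is hypothetical and only for illustration; `$limit` and `$where` are the SoQL parameters assumed here, and the filter expression is just an example.

```python
from urllib.parse import urlencode

# CSV endpoint for the Motor Vehicle Collisions dataset (same resource id as above).
BASE_URL = "https://data.cityofnewyork.us/resource/h9gi-nx95.csv"


def build_query_url(limit, where=None):
    """Return the CSV URL with a row limit and an optional SoQL filter.

    Hypothetical helper: not part of pandas or of any Socrata client.
    """
    params = {"$limit": limit}
    if where is not None:
        params["$where"] = where
    # safe="$" keeps the leading "$" of the parameter names readable.
    return f"{BASE_URL}?{urlencode(params, safe='$')}"


url = build_query_url(500000)
filtered_url = build_query_url(10, where="number_of_persons_killed > 0")
```

Either URL can then be passed to `pd.read_csv`. Keeping the limit in one argument makes it harder for the text and the code to drift apart.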

## Overview of Data

#### The data on the site, and therefore this dataset, consists of information from police reports on crashes involving injuries, deaths, or more than $1,000 worth of damage. These records are publicly available at the 311 Open Data Source. The site: https://nycopendata.socrata.com/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9. 

## Description

#### Each row represents an individual crash occurrence. There are 29 columns, each describing an aspect of the crash, such as the location (latitude and longitude), the vehicle types involved, and the number of persons injured or killed. This dataset was first made publicly available on May 7, 2014 and is updated daily with information from the New York Police Department. 

## Ingestion of Data

#### I will ingest the data through pandas and limit the import to 500,000 rows. I would like to look only at crashes involving Sedans.
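The Sedan restriction mentioned above could be sketched as follows. This is only an illustration on a tiny hand-made frame: the column name `vehicle_type_code1` is an assumption about how the API names the first vehicle-type field, and `only_sedans` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
import pandas as pd

# Tiny stand-in for the crash table (made-up rows, for illustration only).
sample = pd.DataFrame({
    "collision_id": [1, 2, 3, 4],
    "vehicle_type_code1": ["Sedan", "Taxi", " SEDAN", None],
    "number_of_persons_injured": [0, 1, 2, 0],
})


def only_sedans(df, column="vehicle_type_code1"):
    """Keep rows whose vehicle type is 'sedan', ignoring case and padding.

    Rows with a missing vehicle type compare as False and are dropped.
    """
    normalized = df[column].str.strip().str.casefold()
    return df[normalized == "sedan"]


sedans = only_sedans(sample)
```

Normalizing case and whitespace first is a defensive choice, since free-text vehicle types are often entered inconsistently. Once the real data is loaded, the same call would be `only_sedans(vehicles)`.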


In [None]:
import numpy as np
import pandas as pd
import datetime as dt
import sys
# Import from IPython.display; the IPython.core.display path is deprecated for these names.
from IPython.display import display, HTML

In [None]:

vehicles = pd.read_csv("https://data.cityofnewyork.us/resource/h9gi-nx95.csv?$limit=500000")



In [None]:
data_dict = pd.read_excel("https://data.cityofnewyork.us/api/views/h9gi-nx95/files/2e58023a-21a6-4c76-b9e8-0101bf7509ca?download=true&filename=MVCollisionsDataDictionary.xlsx",
                         sheet_name='Column Info')
data_dict.head()

### Look at the dataset and shape of this information

In [None]:
print(vehicles.shape)  # (number of rows, number of columns)
vehicles.head()

#### Set display options so that all columns of the dataframe, and up to 500 rows, are shown. 

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 500)
vehicles.head()

#### Look at the types of data in the set and overall information contained.

In [None]:
pd.options.display.max_info_rows = 5000000
vehicles.info()

#### Drop columns with information we will not use.


In [None]:
vehicles1 = vehicles.drop(columns=['on_street_name', 'off_street_name'])
vehicles1

In [None]:
import seaborn as sns
sns.set()
import matplotlib.pyplot as plt
ax = sns.scatterplot(x="number_of_persons_injured", y="borough", data=vehicles1)
ax
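
A numeric companion to the scatter plot above can make the per-borough picture easier to read. The sketch below uses a small made-up frame with the same two column names (`borough`, `number_of_persons_injured`); the values are illustrative, not taken from the dataset.

```python
import pandas as pd

# Made-up rows for illustration; a missing borough is kept as its own group.
sample = pd.DataFrame({
    "borough": ["BROOKLYN", "QUEENS", "BROOKLYN", None, "QUEENS"],
    "number_of_persons_injured": [1, 0, 2, 1, 4],
})

# Count of crashes, total injuries, and worst single crash per borough.
summary = (
    sample.groupby("borough", dropna=False)["number_of_persons_injured"]
    .agg(["count", "sum", "max"])
)
print(summary)
```

With the real data, replacing `sample` with `vehicles1` should give the same table for the imported rows. `dropna=False` keeps crashes that have no borough recorded instead of silently discarding them.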

In [None]:
!pip install plotly

In [None]:
import plotly.express as px

# An error here may be due to too many vehicle types, and therefore too many required colors.
# Use `color` only for a column with a small number of categories and no NaN values.
# For categorical variables (such as vehicle type), you will probably want to
# supply an explicit color map.

fig = px.scatter_3d(vehicles1, x='borough', y='number_of_persons_injured', z='number_of_persons_killed')
fig.show()
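
One way to act on the comments in the cell above is to reduce a high-cardinality column to a few groups with no missing values before using it for color. This is a minimal sketch under assumptions: `collapse_categories` is a hypothetical helper, and the labels "Other" and "Unknown" are arbitrary choices.

```python
import pandas as pd


def collapse_categories(series, top_n=5, other="Other", missing="Unknown"):
    """Keep the top_n most frequent values and lump everything else together.

    Missing values are first labeled `missing`, so the result has no NaN.
    """
    filled = series.fillna(missing)
    top = filled.value_counts().nlargest(top_n).index
    # Values outside the top_n (possibly including `missing`) become `other`.
    return filled.where(filled.isin(top), other)


# Made-up vehicle types for illustration.
types = pd.Series(["Sedan", "Sedan", "Taxi", None, "Bus", "Sedan", "Taxi"])
grouped = collapse_categories(types, top_n=2)
```

The collapsed result could be stored as a new column (for example a hypothetical `vehicle_group`) and passed as `color="vehicle_group"` to `px.scatter_3d`, which keeps the legend short.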

In [None]:
import plotly.express as px

# Plotly's built-in gapminder sample data (not the crash data), used to try out an animated map.
gapminder = px.data.gapminder()
fig = px.scatter_geo(gapminder, locations="iso_alpha", color="continent",
                     hover_name="country", size="pop",
                     animation_frame="year",
                     projection="natural earth")
fig.show()
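
To point a map like the one above at the crash data instead of the gapminder sample, the coordinates would likely need cleaning first. The sketch below assumes the columns are named `latitude` and `longitude` and that zero coordinates, if present, are placeholders and not real locations; `valid_coordinates` is a hypothetical helper and the rows are made up.

```python
import pandas as pd


def valid_coordinates(df, lat="latitude", lon="longitude"):
    """Drop rows with missing coordinates or a zero latitude/longitude."""
    coords = df.dropna(subset=[lat, lon])
    nonzero = (coords[lat] != 0) & (coords[lon] != 0)
    return coords[nonzero]


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "latitude": [40.7, None, 0.0, 40.6],
    "longitude": [-73.9, -73.8, 0.0, -74.0],
    "number_of_persons_injured": [1, 0, 2, 0],
})
mappable = valid_coordinates(sample)
```

The cleaned frame could then be passed to `px.scatter_geo` using its `lat` and `lon` arguments in place of `locations`.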