# Loading Datasets

This notebook shows an example of how to load a dataset.
It assumes you found the dataset using the techniques shown in `finding_datasets.ipynb`.
The basic steps it demonstrates are:
1. Find available datasets with `opd.datasets.query`
2. Create a data source using `opd.Source` and information from the previous step.
3. Find the available table types, and the years available for a table type, using `get_tables_types` and `get_years`.
4. Load a table type for a given year using `load`.
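
Steps 3 and 4 can be combined into one small helper that checks availability before loading. This is only a sketch: `load_year` is a hypothetical name, not part of `openpolicedata`, and it assumes `src` behaves like the `opd.Source` object used in the cells below (that is, it provides `get_tables_types()`, `get_years(table_type=...)` and `load(date=..., table_type=...)`).

```python
def load_year(src, table_type, year):
    """Load one year of one table type, validating both first.

    `src` is assumed to behave like the opd.Source object used in this
    notebook; load_year itself is a hypothetical helper, not a library call.
    """
    # Step 3a: confirm the requested table type exists for this source
    available_types = src.get_tables_types()
    if table_type not in available_types:
        raise ValueError(
            f"{table_type!r} is not available; choose from {available_types}"
        )

    # Step 3b: confirm the requested year exists for that table type
    available_years = src.get_years(table_type=table_type)
    if year not in available_years:
        raise ValueError(
            f"{year} is not available for {table_type!r}; choose from {available_years}"
        )

    # Step 4: load the data
    return src.load(date=year, table_type=table_type)
```

With a source created as in step 2, a call would look like `t = load_year(src, 'TRAFFIC STOPS', 2021)`. Failing early with a clear message avoids a network request for a table or year that the source does not list.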

In [1]:
import openpolicedata as opd

In [2]:
# We will load Montgomery County, Maryland traffic stop data. First, show our dataset options.
df = opd.datasets.query(table_type='TRAFFIC STOPS', state="Maryland")
df.head()

Unnamed: 0,State,SourceName,Agency,AgencyFull,TableType,coverage_start,coverage_end,last_coverage_check,Description,source_url,readme,URL,Year,DataType,date_field,dataset_id,agency_field,min_version,query
479,Maryland,Maryland,MULTIPLE,,TRAFFIC STOPS,2007-01-01,2014-03-31,01/10/2024,Standardized stop data from the Stanford Open ...,https://openpolicing.stanford.edu/data/,https://github.com/stanford-policylab/opp/blob...,https://stacks.stanford.edu/file/druid:yg821jf...,MULTIPLE,CSV,date,,department_name,,
485,Maryland,Montgomery County,Montgomery County,Montgomery County Police Department,TRAFFIC STOPS,2012-06-07,2024-05-09,05/10/2024,This dataset contains traffic violation inform...,https://data.montgomerycountymd.gov/Public-Saf...,,data.montgomerycountymd.gov,MULTIPLE,Socrata,date_of_stop,4mse-ku6q,,,


In [3]:
# To access the data, create a source from a source name (usually a police department name). The optional state input resolves ambiguities between sources that share a name.
# Using the Maryland results from the cell above, we pass the SourceName "Montgomery County" as source_name.

src = opd.Source(source_name="Montgomery County", state="Maryland")
src.datasets.head()

Unnamed: 0,State,SourceName,Agency,AgencyFull,TableType,coverage_start,coverage_end,last_coverage_check,Description,source_url,readme,URL,Year,DataType,date_field,dataset_id,agency_field,min_version,query
480,Maryland,Montgomery County,Montgomery County,Montgomery County Police Department,COMPLAINTS,2013-10-24,2024-05-06,05/10/2024,This dataset contains allegations brought to t...,https://data.montgomerycountymd.gov/Public-Saf...,,data.montgomerycountymd.gov,MULTIPLE,Socrata,created_dt,usip-62e2,,,
481,Maryland,Montgomery County,Montgomery County,Montgomery County Police Department,CRASHES - INCIDENTS,2015-12-20,2024-01-03,05/10/2024,general information about each collision and d...,https://data.montgomerycountymd.gov/Public-Saf...,,data.montgomerycountymd.gov,MULTIPLE,Socrata,crash_date_time,bhju-22kf,,0.4,
482,Maryland,Montgomery County,Montgomery County,Montgomery County Police Department,CRASHES - NONMOTORIST,2015-03-23,2023-12-31,05/10/2024,information on non-motorists (pedestrians and ...,https://data.montgomerycountymd.gov/Public-Saf...,,data.montgomerycountymd.gov,MULTIPLE,Socrata,crash_date_time,n7fk-dce5,,0.5,
483,Maryland,Montgomery County,Montgomery County,Montgomery County Police Department,CRASHES - SUBJECTS,2015-06-30,2024-01-03,05/10/2024,information on motor vehicle operators (driver...,https://data.montgomerycountymd.gov/Public-Saf...,,data.montgomerycountymd.gov,MULTIPLE,Socrata,crash_date_time,mmzv-x632,,0.4,
484,Maryland,Montgomery County,Montgomery County,Montgomery County Police Department,INCIDENTS,2017-04-02,2024-05-10,05/10/2024,list of Police Dispatched Incidents records,https://data.montgomerycountymd.gov/Public-Saf...,,data.montgomerycountymd.gov,MULTIPLE,Socrata,start_time,98cc-bc7d,,,


In [4]:
# Find out what types of data are available from this source
types = src.get_tables_types()

print(types)

['COMPLAINTS', 'CRASHES - INCIDENTS', 'CRASHES - NONMOTORIST', 'CRASHES - SUBJECTS', 'INCIDENTS', 'TRAFFIC STOPS']


In [5]:
# Find out what years are available for a table type.
# Note: types[0] is 'COMPLAINTS'. Pass table_type='TRAFFIC STOPS' to check the stops table instead.
# If you do not have an app token set up, you may see the message: "WARNING:root:Requests made without an app_token will be subject to strict throttling limits." This is normal.
years = src.get_years(table_type=types[0])
print(years)

[2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
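
In the list printed by cell 4, `types[0]` is `'COMPLAINTS'`, so the years above describe the complaints table. A minimal sketch of selecting a table type by name rather than by position follows. The list is copied from that output so the snippet runs without a network connection, and the final call is left as a comment because it needs the `src` object created earlier.

```python
# Table types as printed by src.get_tables_types() in this notebook
types = ['COMPLAINTS', 'CRASHES - INCIDENTS', 'CRASHES - NONMOTORIST',
         'CRASHES - SUBJECTS', 'INCIDENTS', 'TRAFFIC STOPS']

# Positional indexing depends on the order of the list
first_type = types[0]
print(first_type)  # COMPLAINTS

# Selecting by name makes the intent explicit and fails loudly if absent
stops_type = 'TRAFFIC STOPS'
if stops_type not in types:
    raise ValueError(f"{stops_type!r} not offered by this source: {types}")
print(types.index(stops_type))  # 5

# With a live source, the years for the stops table would then be:
# years = src.get_years(table_type=stops_type)
```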


In [6]:
# Load traffic stop data for 2021
t = src.load(date=2021, table_type='TRAFFIC STOPS')

In [7]:
# The loaded table is stored in the table attribute as a pandas DataFrame (https://pandas.pydata.org/docs/user_guide/10min.html#min)
# Show the first 5 rows of the table
t.table.head(n=5)
# Now you are ready to analyze the data in t.table.

Unnamed: 0,geometry,seq_id,date_of_stop,time_of_stop,agency,subagency,description,location,latitude,longitude,...,driver_state,dl_state,arrest_type,search_conducted,search_outcome,search_reason_for_stop,search_disposition,search_reason,search_type,search_arrest_reason
0,POINT (-77.13047 39.01268),f08d0293-6ade-4802-84c1-4b7b1a707245,2021-01-01,03:12:00,MCP,"2nd District, Bethesda",RECKLESS DRIVING VEHICLE IN WANTON AND WILLFUL...,IFO 9609 SINGLETON DR,39.0126813333333,-77.130466,...,MD,MD,A - Marked Patrol,,,,,,,
1,POINT (-77.13047 39.01268),f08d0293-6ade-4802-84c1-4b7b1a707245,2021-01-01,03:12:00,MCP,"2nd District, Bethesda",FAILURE OF VEH. DRIVER IN ACCIDENT TO LOCATE A...,IFO 9609 SINGLETON DR,39.0126813333333,-77.130466,...,MD,MD,A - Marked Patrol,,,,,,,
2,POINT (-77.13047 39.01268),f08d0293-6ade-4802-84c1-4b7b1a707245,2021-01-01,03:12:00,MCP,"2nd District, Bethesda",NEGLIGENT DRIVING VEHICLE IN CARELESS AND IMPR...,IFO 9609 SINGLETON DR,39.0126813333333,-77.130466,...,MD,MD,A - Marked Patrol,,,,,,,
3,POINT (-77.13047 39.01268),f08d0293-6ade-4802-84c1-4b7b1a707245,2021-01-01,03:12:00,MCP,"2nd District, Bethesda",FAILURE OF VEH. DRIVER TO STOP AFTER UNATTENDE...,IFO 9609 SINGLETON DR,39.0126813333333,-77.130466,...,MD,MD,A - Marked Patrol,,,,,,,
4,POINT (-77.13047 39.01268),f08d0293-6ade-4802-84c1-4b7b1a707245,2021-01-01,03:12:00,MCP,"2nd District, Bethesda",FAILURE OF VEH. DRIVER INVOLVED IN ACCIDENT TO...,IFO 9609 SINGLETON DR,39.0126813333333,-77.130466,...,MD,MD,A - Marked Patrol,,,,,,,
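
The five rows shown above share one `seq_id`, date, time, and location and differ only in `description`, which suggests that each row is a violation and that `seq_id` identifies the stop. Under that assumption (inferred from this sample, not confirmed by the dataset documentation), counting stops means counting distinct `seq_id` values rather than rows. The sketch below uses a small made-up stand-in for `t.table` so it runs offline; the `stop-a` and `stop-b` identifiers and the description labels are hypothetical.

```python
import pandas as pd

# Stand-in for t.table with only the two columns needed here.
# All values are made up for illustration.
table = pd.DataFrame({
    "seq_id": ["stop-a", "stop-a", "stop-a", "stop-b", "stop-b"],
    "description": ["violation 1", "violation 2", "violation 3",
                    "violation 1", "violation 2"],
})

# Number of rows (violations, under the assumption above)
n_rows = len(table)

# Number of distinct stops
n_stops = table["seq_id"].nunique()

# Violations recorded per stop
violations_per_stop = table.groupby("seq_id").size()

print(n_rows, n_stops)                # 5 2
print(violations_per_stop.to_dict())  # {'stop-a': 3, 'stop-b': 2}
```

On the real data, the same three expressions would be applied to `t.table` in place of `table`.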
