# Getting Started
Welcome to the quickstart guide for OpenPoliceData (OPD)! Here, you should find all you need to learn the basics of OPD.

* **New to Python?**: Check out the free [First Python Notebook](https://firstpythonnotebook.org/) course.

* **Questions or Comments?**: If you have questions or comments about anything related to installing or using OPD, please reach out on our [discussion board](https://github.com/openpolicedata/openpolicedata/discussions).


## Installation
Install OPD with pip from [PyPI](https://pypi.org/project/openpolicedata/):

```bash
pip install openpolicedata
```
For installation in a Jupyter Notebook, replace `pip` with `%pip`. 
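
To confirm that the install succeeded, one option is to ask Python for the installed version. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example and is not part of OPD.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if OPD is installed, otherwise None
print(installed_version("openpolicedata"))
```

If this prints `None`, the package is not visible to the Python environment you are running in, which often means pip installed it into a different environment.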

See the [installation guide](installation.rst) for advanced installation options, including how to install [GeoPandas](https://geopandas.org/) alongside OPD to enable geospatial analysis of data loaded by OPD.

## Import
To use OPD, you must always start by importing it into your Python code:

```python
import openpolicedata as opd
```

We recommend shortening `openpolicedata` to `opd` to make your code more readable.

## The Basics
OPD provides access to over 300 police datasets with just 2 simple lines of code:

```python
# Load traffic stops data from Louisville for the year 2022.
src = opd.Source("Louisville")
tbl = src.load_from_url(2022, table_type="TRAFFIC STOPS")
```

The `table` attribute of the returned object contains the loaded data as a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), so it can be analyzed with [pandas' simple and powerful capabilities](https://pandas.pydata.org/docs/user_guide/10min.html).

```python
# View the 1st 5 rows with pandas' head function
tbl.table.head()
```

| | TYPE_OF_STOP | CITATION_CONTROL_NUMBER | ACTIVITY_RESULTS | OFFICER_GENDER | OFFICER_RACE | OFFICER_AGE_RANGE | ACTIVITY_DATE | ACTIVITY_TIME | ACTIVITY_LOCATION | ACTIVITY_DIVISION | ACTIVITY_BEAT | DRIVER_GENDER | DRIVER_RACE | DRIVER_AGE_RANGE | NUMBER_OF_PASSENGERS | WAS_VEHCILE_SEARCHED | REASON_FOR_SEARCH | ObjectId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | COMPLAINT/CRIMINAL VIOLATION | DU03293 | CITATION ISSUED | M | WHITE | 21 - 30 | 01/02/2022 | 21:44 | M ST ... | 4TH DIVISION | BEAT 4 | M | WHITE | 26 - 30 | 2 | YES | 0 | 1 |
| 1 | COMPLAINT/CRIMINAL VIOLATION | DV75866 | CITATION ISSUED | M | WHITE | 51 - 60 | 07/21/2022 | 02:00 | KEEGAN WAY ... | 7TH DIVISION | BEAT 1 | M | HISPANIC | 16 - 19 | 1 | YES | 4 | 2 |
| 2 | COMPLAINT/CRIMINAL VIOLATION | DV87754 | CITATION ISSUED | M | WHITE | 51 - 60 | 07/21/2022 | 02:00 | KEEGAN WAY ... | 7TH DIVISION | BEAT 1 | M | HISPANIC | 16 - 19 | 1 | NO | 0 | 3 |
| 3 | COMPLAINT/CRIMINAL VIOLATION | DW19051 | CITATION ISSUED | M | WHITE | 21 - 30 | 01/25/2022 | 11:23 | 4500 BLOCK SOUTHERN PKWY | 4TH DIVISION | BEAT 6 | M | WHITE | 20 - 25 | 0 | YES | 4 | 4 |
| 4 | COMPLAINT/CRIMINAL VIOLATION | DX65321 | CITATION ISSUED | M | WHITE | 31 - 40 | 01/13/2022 | 05:30 | PRESTON HWY @ OUTER LOOP ... | 7TH DIVISION | BEAT 6 | M | WHITE | 51 - 60 | 1 | YES | 3 | 5 |
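
As one example of such an analysis, the sketch below counts stops by driver race and filters for stops where the vehicle was searched. To keep it runnable without a download, it builds a small stand-in DataFrame (named `sample` here) from two columns of the five rows shown above. With real data, `tbl.table` would take the place of `sample`.

```python
import pandas as pd

# Stand-in for tbl.table: two columns copied from the five rows shown above
sample = pd.DataFrame(
    {
        "DRIVER_RACE": ["WHITE", "HISPANIC", "HISPANIC", "WHITE", "WHITE"],
        "WAS_VEHCILE_SEARCHED": ["YES", "YES", "NO", "YES", "YES"],
    }
)

# Count the number of stops for each driver race
stops_by_race = sample["DRIVER_RACE"].value_counts()
print(stops_by_race)

# Keep only the stops where the vehicle was searched
searched = sample[sample["WAS_VEHCILE_SEARCHED"] == "YES"]
print(len(searched))
```

Note that the column name `WAS_VEHCILE_SEARCHED` is spelled as it appears in the source data, so it must be typed the same way when selecting the column.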


## Finding Datasets
OPD provides the `datasets` module for querying what datasets are available in OPD. To get all available datasets, query the source table with no inputs:

```python
all_datasets = opd.datasets.query()
all_datasets.head()
```

| | State | SourceName | Agency | AgencyFull | TableType | coverage_start | coverage_end | last_coverage_check | Description | source_url | readme | URL | Year | DataType | date_field | dataset_id | agency_field | min_version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Arizona | Gilbert | Gilbert | Gilbert Police Department | CALLS FOR SERVICE | 2006-11-15 | 2023-05-14 | 05/15/2023 | | https://data.gilbertaz.gov/maps/2dcb4c20c9a444... | | https://maps.gilbertaz.gov/arcgis/rest/service... | MULTIPLE | ArcGIS | EventDate | | | |
| 1 | Arizona | Gilbert | Gilbert | Gilbert Police Department | EMPLOYEE | NaT | NaT | 05/15/2023 | A data set of all employees that have previous... | https://data.gilbertaz.gov/datasets/TOG::gilbe... | | https://services1.arcgis.com/JLuzSHjNrLL4Okwb/... | NONE | ArcGIS | | | | |
| 2 | Arizona | Gilbert | Gilbert | Gilbert Police Department | STOPS | 2008-01-01 | 2018-05-23 | 05/15/2023 | Standardized stop data from the Stanford Open ... | https://openpolicing.stanford.edu/data/ | https://github.com/stanford-policylab/opp/blob... | https://stacks.stanford.edu/file/druid:yg821jf... | MULTIPLE | CSV | date | | | |
| 3 | Arizona | Mesa | Mesa | Mesa Police Department | CALLS FOR SERVICE | 2017-01-01 | 2023-05-12 | 05/15/2023 | | https://data.mesaaz.gov/Police/Police-Computer... | | data.mesaaz.gov | MULTIPLE | Socrata | creation_datetime | ex94-c5ad | | |
| 4 | Arizona | Mesa | Mesa | Mesa Police Department | INCIDENTS | 2016-01-01 | 2023-03-31 | 05/15/2023 | Incidents based on initial police reports take... | https://data.mesaaz.gov/Police/Police-Incident... | | data.mesaaz.gov | MULTIPLE | Socrata | report_date | 39rt-2rfj | | |


The source table provides the information needed to create sources and load data as well as background information. It is a DataFrame that can be filtered with [pandas filtering operations](https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html#min-tut-03-subset). Key information includes:

 * **State**: Optionally used when creating a `Source` to distinguish ambiguous sources (i.e. the same city name in different states).
 * **SourceName**: Original source of the data (typically a shortened name for a police department). Used when creating a `Source`.
 * **Agency**: Shortened agency / police department name. Typically the same as SourceName. However, it may be `MULTIPLE` if a dataset contains data for multiple agencies.
 * **TableType**: Type of data (TRAFFIC STOPS, USE OF FORCE, etc.). Used when loading data.
 * **coverage_start**: Start date of the data contained in the dataset. Combined with `coverage_end`, this determines the years available for this dataset when loading data. NOTE: Agencies often store their data in separate datasets for different years, so one table type may be spread across multiple datasets, one for each year of data.
 * **coverage_end**: End date of the data contained in the dataset as of the most recent check. Combined with `coverage_start`, this determines the years available for this dataset when loading data. If the data has been updated by the dataset owner since the date in `last_coverage_check`, more recent years may be available. The note under **coverage_start** about data split across yearly datasets applies here as well.
 * **source_url**: Homepage for the dataset.
 * **readme**: Direct URL for the data dictionary containing definitions of columns, etc. If empty, the `source_url` may also contain a data dictionary.
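
Filtering the source table with pandas might look like the following. This is a minimal sketch: `sample` is a stand-in holding three columns from the five rows shown above, so the example runs without OPD. In practice, `all_datasets` would be filtered the same way.

```python
import pandas as pd

# Stand-in for all_datasets: three columns copied from the rows shown above
sample = pd.DataFrame(
    {
        "State": ["Arizona"] * 5,
        "SourceName": ["Gilbert", "Gilbert", "Gilbert", "Mesa", "Mesa"],
        "TableType": [
            "CALLS FOR SERVICE",
            "EMPLOYEE",
            "STOPS",
            "CALLS FOR SERVICE",
            "INCIDENTS",
        ],
    }
)

# Boolean masks select rows; they can be combined with & (and) and | (or)
is_calls = sample["TableType"] == "CALLS FOR SERVICE"
is_mesa = sample["SourceName"] == "Mesa"

calls_for_service = sample[is_calls]
mesa_calls = sample[is_calls & is_mesa]

print(calls_for_service["SourceName"].tolist())
print(len(mesa_calls))
```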

With its optional inputs, `query` can be used to filter for desired data. Here is a very specific query using all optional inputs:

```python
ds = opd.datasets.query(source_name="Menlo Park", state="California", agency="Menlo Park", table_type="CALLS FOR SERVICE")
ds
```

| | State | SourceName | Agency | AgencyFull | TableType | coverage_start | coverage_end | last_coverage_check | Description | source_url | readme | URL | Year | DataType | date_field | dataset_id | agency_field | min_version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 73 | California | Menlo Park | Menlo Park | Menlo Park Police Department | CALLS FOR SERVICE | 2018-01-01 | 2018-12-31 | 05/15/2023 | | https://data.menlopark.org/datasets/4036c27030... | https://data.menlopark.org/datasets/4036c27030... | https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/... | 2018 | ArcGIS | | | | |
| 74 | California | Menlo Park | Menlo Park | Menlo Park Police Department | CALLS FOR SERVICE | 2019-01-01 | 2019-12-31 | 05/15/2023 | | https://data.menlopark.org/datasets/e88877f5d9... | https://data.menlopark.org/datasets/e88877f5d9... | https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/... | 2019 | ArcGIS | | | | |
| 75 | California | Menlo Park | Menlo Park | Menlo Park Police Department | CALLS FOR SERVICE | 2020-01-01 | 2020-12-31 | 05/15/2023 | | https://data.menlopark.org/datasets/510eb69337... | https://data.menlopark.org/datasets/510eb69337... | https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/... | 2020 | ArcGIS | | | | |
| 76 | California | Menlo Park | Menlo Park | Menlo Park Police Department | CALLS FOR SERVICE | 2021-01-01 | 2021-12-31 | 05/15/2023 | | https://data.menlopark.org/datasets/4c04a71c71... | https://data.menlopark.org/datasets/4c04a71c71... | https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/... | 2021 | ArcGIS | | | | |


`get_table_types` finds the table types available in OPD. Here, we use the optional `contains` input to get only the table types containing the word "STOPS":

```python
table_types = opd.datasets.get_table_types(contains="STOPS")
table_types
```

['PEDESTRIAN STOPS', 'STOPS', 'TRAFFIC STOPS']

## Loading Data
The `Source` class is used to explore datasets and load data. We first need to create a source, which we can then use to view all datasets from that source. Let's create a source for Columbia, South Carolina. We need to specify the state because OPD has datasets for cities named Columbia in more than one state.

```python
src = opd.Source("Columbia", state="South Carolina")
src.datasets
```

| | State | SourceName | Agency | AgencyFull | TableType | coverage_start | coverage_end | last_coverage_check | Description | source_url | readme | URL | Year | DataType | date_field | dataset_id | agency_field | min_version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 744 | South Carolina | Columbia | Columbia | Columbia Police Department | ARRESTS | 2016-01-01 | 2022-03-31 | 05/15/2023 | | https://coc-colacitygis.opendata.arcgis.com/da... | | https://services1.arcgis.com/Mnt8FoJcogKtoVBs/... | MULTIPLE | ArcGIS | Arrest_Date | | | 0.2 |
| 745 | South Carolina | Columbia | Columbia | Columbia Police Department | FIELD CONTACTS | 2016-01-01 | 2022-01-01 | 05/15/2023 | Field Interview is a collection of data result... | https://coc-colacitygis.opendata.arcgis.com/da... | | https://services1.arcgis.com/Mnt8FoJcogKtoVBs/... | MULTIPLE | ArcGIS | TOC | | | |


To get a list of available table types:

```python
src.get_tables_types()
```

['ARRESTS', 'FIELD CONTACTS']

You can get the number of records for a dataset using `get_count`. Let's get the number of records in the year 2022 for the FIELD CONTACTS dataset.

```python
src.get_count(2022, "FIELD CONTACTS")
```

2382

You can find which years are available for a given table type:

```python
src.get_years(table_type="FIELD CONTACTS")
```

[2016, 2017, 2018, 2019, 2020, 2021, 2022]
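
These years are consistent with the `coverage_start` and `coverage_end` values listed for FIELD CONTACTS in `src.datasets` above (2016-01-01 and 2022-01-01). The relationship can be sketched with the standard library alone. Here, `years_covered` is a hypothetical helper written for this illustration, not how OPD computes the list.

```python
from datetime import date


def years_covered(coverage_start, coverage_end):
    """List every calendar year touched by the inclusive date range."""
    return list(range(coverage_start.year, coverage_end.year + 1))


# Coverage dates for Columbia's FIELD CONTACTS dataset, taken from src.datasets
print(years_covered(date(2016, 1, 1), date(2022, 1, 1)))
# prints [2016, 2017, 2018, 2019, 2020, 2021, 2022]
```

Keep in mind that `coverage_end` reflects the last time coverage was checked, so `get_years` is the better source of truth when the two disagree.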

Now, let's load in some field contacts data for 2022.

```python
tbl = src.load_from_url(2022, "FIELD CONTACTS")
tbl
```

state: South Carolina,
source_name: Columbia,
agency: Columbia,
table_type: TableType.FIELD_CONTACTS,
year: 2022,
description: Field Interview is a collection of data resulting from citizen contact related to suspicious activity.,
url: https://services1.arcgis.com/Mnt8FoJcogKtoVBs/arcgis/rest/services/FieldInterview/FeatureServer/0,
source_url: https://coc-colacitygis.opendata.arcgis.com/datasets/ColaCityGIS::field-interview-1-1-2016-3-31-2022/about

The loaded data is contained in a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) in the `table` attribute.

```python
tbl.table.head(2)
```

| | OBJECTID | Case_Num | TOC | Address | City | Zip | State | Age | Race | Sex | Contact_Type | Year | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25351 | 220000108 | 2022-01-01 21:47:00 | 12XX Main St | | 29201 | | 32 | W | M | Field Interview | 2022.0 | POINT (1989801.776 788862.968) |
| 1 | 25350 | 220000161 | 2022-01-02 15:05:00 | 21XX Main St | | 29201 | | 29 | B | M | Field Interview | 2022.0 | POINT (1988210.189 793174.093) |


Data can be saved locally as CSV files. This allows you to:

 * Open the data using the software of your choice
 * Re-open the data in OPD from a local copy

```python
tbl.to_csv()
new_src = opd.Source("Columbia", state="South Carolina")
new_tbl = new_src.load_from_csv(2022, table_type="FIELD CONTACTS")
new_tbl.table.head(2)
```

| | OBJECTID | Case_Num | TOC | Address | City | Zip | State | Age | Race | Sex | Contact_Type | Year | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25351 | 220000108.0 | 2022-01-01 21:47:00 | 12XX Main St | | 29201 | | 32.0 | W | M | Field Interview | 2022.0 | POINT (1989801.7762467265 788862.9678477645) |
| 1 | 25350 | 220000161.0 | 2022-01-02 15:05:00 | 21XX Main St | | 29201 | | 29.0 | B | M | Field Interview | 2022.0 | POINT (1988210.189304456 793174.0931758583) |
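
Since the saved file is plain CSV, it can be read by other tools as well. Below is a minimal sketch using Python's built-in `csv` module. The file name `field_contacts_sample.csv` and its contents (a few columns from the two rows above) are stand-ins so that the example runs on its own; the name OPD gives the saved file is not assumed here.

```python
import csv
import os
import tempfile

# Stand-in CSV text: a few columns copied from the two rows shown above
csv_text = (
    "OBJECTID,Case_Num,TOC,Zip,Race,Sex,Contact_Type\n"
    "25351,220000108,2022-01-01 21:47:00,29201,W,M,Field Interview\n"
    "25350,220000161,2022-01-02 15:05:00,29201,B,M,Field Interview\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name used only for this example
    csv_path = os.path.join(tmp_dir, "field_contacts_sample.csv")
    with open(csv_path, "w", newline="") as f:
        f.write(csv_text)

    # Read each row into a dictionary keyed by column name
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))

print(len(rows))
print(rows[0]["Contact_Type"])
```

The `csv` module returns every value as a string, so numeric or date columns need to be converted explicitly if you read the file this way.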


Some datasets contain data for every agency in a state. In this case, you may want to know which agencies are available. Optionally, you can request only the agencies whose names contain a given string, such as "Arlington":

```python
src = opd.Source("Virginia")
agencies = src.get_agencies(table_type="STOPS", partial_name="Arlington")
agencies
```

['Arlington County Police Department', "Arlington County Sheriff's Office"]

We may also want to load data from only a specific agency:
```python
tbl = src.load_from_url(2022, table_type="STOPS", agency="Arlington County Police Department")
```