# Dataset Specification and Date Filtering Guide
When interacting with a dataset (`load`, `get_count`, etc.), users generally must: 
1. Specify a single dataset
2. Define the date range of the data to request from that dataset

In most cases, the basics in the [Getting Started Guide](./index.ipynb) are sufficient. This guide defines the full scope of allowed date range specifications and explains how to specify a single dataset in the rare case where `table_type` and `date` alone do not uniquely identify one.

Input parameters for dataset specification when loading data are:

| Parameter      | When To Use      | Related Dataset Column(s)* | Examples |
| ------------- | ------------- | ------------- | ------------- |
| `table_type` | Always | TableType | 'USE OF FORCE', 'STOPS' |
| `date` | Always | YEAR and coverage_start/coverage_end | 2025, ['2025-01-10', '2025-03-30'] |
| `url` | Multiple datasets match table_type and date<br>and URLs are unique | URL | full or partial URL from URL<br>column |
| `id` |  Multiple datasets match table_type and date<br>and dataset IDs are unique | dataset_id | 'ex94-c5ad' |

\* Where to find valid values in the datasets table

This guide describes how to handle the following cases:

* [Data Specification](#data-specification)
  * [Basic Data Specification (this will work in most cases)](#basic-data-specification)
  * [Advanced Data Specification (this will always work)](#advanced-data-specification)
* [Date Filtering](#date-filtering)
  * [Example: Request Entire N/A Date Dataset](#example-request-entire-na-date-dataset)
  * [Example: Request Entire Multi-Year Dataset](#example-request-entire-multi-year-dataset)
  * [Example: Request Entire Year of Multi-Year Dataset](#example-request-entire-year-of-multi-year-dataset)
  * [Example: Request Entire Single-Year Dataset](#example-request-entire-single-year-dataset)
  * [Example: Request Date Range](#example-request-date-range)
  * [Example: Request Start of Year to Date](#example-request-start-of-year-to-date)
  * [Example: Request Date to End of Year](#example-request-date-to-end-of-year)
  * [Example: Request Year Range](#example-request-year-range)

## Data Specification
The information needed to specify a dataset can be found in the datasets table for a source (see the [Getting Started Guide](./index.ipynb) for how to view all datasets):

In [2]:
import openpolicedata as opd
src = opd.Source('Phoenix')
src.datasets.tail(2)  # pandas tail() returns only the last 2 rows of the datasets table

Unnamed: 0,State,SourceName,Agency,AgencyFull,TableType,coverage_start,coverage_end,last_coverage_check,Year,agency_originated,...,source_url,readme,URL,DataType,date_field,dataset_id,agency_field,min_version,py_min_version,query
25,Arizona,Phoenix,Phoenix,Phoenix Police Department,TRAFFIC CITATIONS,2018-01-01,2025-05-30,06/01/2025,MULTIPLE,,...,https://www.phoenixopendata.com/dataset/citations,https://phoenixopendata.com/datastore/dictiona...,https://www.phoenixopendata.com,CKAN,TICK_DATE,7725bbf3-7829-4f57-8cc2-0faac51b90de,,0.6,,
26,Arizona,Phoenix,Phoenix,Phoenix Police Department,USE OF FORCE,2017-04-07,2025-02-19,06/01/2025,MULTIPLE,,...,https://www.phoenixopendata.com/dataset/ouof,https://phoenixopendata.com/datastore/dictiona...,https://www.phoenixopendata.com,CKAN,INC_DATE,c79b2135-e936-439e-a8a3-79e61d4518d2,,0.6,,


### Basic Data Specification
In most cases, data can be requested using a table type and date range. The following requests Phoenix use of force data for all of 2024.

In [3]:
table = src.load(table_type='USE OF FORCE', date=2024)

USE OF FORCE is an available table type in Phoenix's datasets. The Year for this dataset is MULTIPLE, indicating that it contains multiple years. Other possible values for Year in the datasets table are a numeric year or NONE (i.e. the dataset is not well-defined by time). The coverage_start/coverage_end columns indicate the timeframe the dataset covered at the last check (newer data is sometimes available). Here, the coverage indicates that data is available for 2024.

### Advanced Data Specification
In some cases, multiple datasets exist for a table type and date range. Asheville has multiple use of force datasets:

In [4]:
src = opd.Source("Asheville")
uof_datasets = src.datasets[src.datasets['TableType']=='USE OF FORCE']
uof_datasets

Unnamed: 0,State,SourceName,Agency,AgencyFull,TableType,coverage_start,coverage_end,last_coverage_check,Year,agency_originated,...,source_url,readme,URL,DataType,date_field,dataset_id,agency_field,min_version,py_min_version,query
1158,North Carolina,Asheville,Asheville,Asheville Police Department,USE OF FORCE,2018-04-12,2020-12-26,05/10/2024,MULTIPLE,,...,https://data-avl.opendata.arcgis.com/datasets/...,https://docs.google.com/document/d/1sScS5Jez1w...,https://services.arcgis.com/aJ16ENn1AaqdFlqx/a...,ArcGIS,date_occurred,,,0.7,,
1159,North Carolina,Asheville,Asheville,Asheville Police Department,USE OF FORCE,2020-12-16,2025-03-30,06/01/2025,MULTIPLE,,...,https://data-avl.opendata.arcgis.com/datasets/...,https://docs.google.com/document/d/1sScS5Jez1w...,https://services.arcgis.com/aJ16ENn1AaqdFlqx/a...,ArcGIS,occurred_date,,,0.7,,


One starts in 2018 and ends on 2020-12-26, and the other starts on 2020-12-16 and continues through the present. Data from 2018-2019 or 2021-present can be requested using the [basic specification](#basic-data-specification). However, a request for 2020 data is ambiguous because both datasets contain data from that year. In this rare case, additional information from the `URL` or `dataset_id` columns is needed to uniquely identify a dataset.

In [5]:
table = src.load(table_type='USE OF FORCE', date=2020, url=uof_datasets['URL'].iloc[1]) # Request data from the 2nd use of force dataset
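The ambiguity for 2020 can be checked directly from the coverage columns. The following is a minimal sketch using only the standard library. The helper `datasets_covering_year` and the `index` key are hypothetical names introduced for illustration and are not part of openpolicedata. The coverage dates are copied from the two Asheville rows in the table above.

```python
from datetime import date

# Coverage windows copied from the two Asheville USE OF FORCE rows above.
# The dict layout and the "index" key are assumptions made for this sketch.
asheville_uof = [
    {"index": 1158, "coverage_start": "2018-04-12", "coverage_end": "2020-12-26"},
    {"index": 1159, "coverage_start": "2020-12-16", "coverage_end": "2025-03-30"},
]

def datasets_covering_year(datasets, year):
    """Hypothetical helper: return the index of every dataset whose
    coverage window overlaps any day of the given year."""
    year_start, year_end = date(year, 1, 1), date(year, 12, 31)
    matches = []
    for ds in datasets:
        start = date.fromisoformat(ds["coverage_start"])
        end = date.fromisoformat(ds["coverage_end"])
        # Two intervals overlap when each one starts before the other ends
        if start <= year_end and end >= year_start:
            matches.append(ds["index"])
    return matches

for year in (2019, 2020, 2021):
    matches = datasets_covering_year(asheville_uof, year)
    status = "ambiguous" if len(matches) > 1 else "unique"
    print(year, matches, status)
```

Only 2020 overlaps both coverage windows, which is why it is the only year in this example that needs `url` or `id`.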

If both the `url` and `id` inputs are used, there will never be any ambiguity:

In [6]:
table = src.load(table_type='USE OF FORCE', date=2020, url=uof_datasets['URL'].iloc[1], id=uof_datasets['dataset_id'].iloc[1]) 
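The "When To Use" rules from the parameter table can be sketched as a small decision function. This is an illustration only: `extra_params_needed` is a hypothetical helper, not an openpolicedata function, and the URLs and IDs in the sample rows are made-up placeholder values.

```python
def extra_params_needed(matches):
    """Hypothetical helper: given the datasets that match a table type and
    date (as dicts with 'URL' and 'dataset_id' keys), return which extra
    load() inputs would distinguish them."""
    if len(matches) <= 1:
        return []  # table_type and date are already sufficient
    urls = [m["URL"] for m in matches]
    ids = [m["dataset_id"] for m in matches]
    if len(set(urls)) == len(urls):
        return ["url"]  # every URL is different
    if None not in ids and len(set(ids)) == len(ids):
        return ["id"]  # every dataset ID is present and different
    return ["url", "id"]  # fall back to supplying both

# Placeholder rows (assumed values, not taken from a real datasets table)
same_url = [
    {"URL": "https://example.com/portal", "dataset_id": "aaaa-1111"},
    {"URL": "https://example.com/portal", "dataset_id": "bbbb-2222"},
]
different_urls = [
    {"URL": "https://example.com/old", "dataset_id": None},
    {"URL": "https://example.com/new", "dataset_id": None},
]

print(extra_params_needed(same_url))
print(extra_params_needed(different_urls))
```

Checking `url` first is an arbitrary choice in this sketch. Either input works whenever its column is unique among the matching datasets.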

## Date Filtering
The `date` input to `load`, `get_count`, and other functions can take several forms, which are defined in the table below. **TIP**: Inputting the value in the Year column as the `date` will always request the whole dataset.

| Value in Year Column | `date` Input Type | Example | Requested Date Range |
| ------------- | ------------- | ------------- | ------------- |
| NONE | 'NONE' | 'NONE' | Entire dataset where date is N/A |
| MULTIPLE | 'MULTIPLE' | 'MULTIPLE' | Entire multi-year dataset |
| A single year (e.g. 2025) | a year | 2024 | Entire annual dataset |
| MULTIPLE | a year | 2024 | All data from a year |
| MULTIPLE *or* a single year (e.g. 2025) | date range | ['2024-05-01', '2024-06-15'] | All data from the 1st date through the 2nd date |
| MULTIPLE *or* a single year (e.g. 2025) | mixed range (1) | [2024, '2024-06-15'] | All data from Jan. 1 of the year through the date |
| MULTIPLE *or* a single year (e.g. 2025) | mixed range (2) | ['2024-06-15', 2024] | All data from the date through Dec. 31 of the year |
| MULTIPLE | year range | [2022, 2024] | All data from Jan. 1 of the 1st year through Dec. 31 of the 2nd year |
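The mapping in the table above can be expressed as a short function. This is a minimal sketch of the rules as the table states them, not the library's actual implementation. The names `resolve_date_range`, `_to_start`, and `_to_end` are hypothetical, and returning `None` for a whole-dataset request is a convention chosen for this example.

```python
from datetime import date

def _to_start(value):
    """A year used as the first bound means Jan. 1 of that year."""
    if isinstance(value, int):
        return date(value, 1, 1)
    return date.fromisoformat(value)

def _to_end(value):
    """A year used as the second bound means Dec. 31 of that year."""
    if isinstance(value, int):
        return date(value, 12, 31)
    return date.fromisoformat(value)

def resolve_date_range(date_input):
    """Hypothetical helper: return (start, end) for a `date` input,
    or None when the entire dataset is requested."""
    if date_input in ("NONE", "MULTIPLE"):
        return None  # whole dataset, no date filtering
    if isinstance(date_input, int):
        return date(date_input, 1, 1), date(date_input, 12, 31)
    first, second = date_input  # any two-element range
    return _to_start(first), _to_end(second)

# The example inputs from the table above
examples = [
    "NONE",
    "MULTIPLE",
    2024,
    ["2024-05-01", "2024-06-15"],
    [2024, "2024-06-15"],
    ["2024-06-15", 2024],
    [2022, 2024],
]
for example in examples:
    resolved = resolve_date_range(example)
    if resolved is None:
        print(example, "-> entire dataset")
    else:
        print(example, "->", resolved[0].isoformat(), "through", resolved[1].isoformat())
```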

### Example: Request Entire N/A Date Dataset


In [19]:
src = opd.Source('Louisville')
table = src.load('EMPLOYEE', 'NONE')
print(f'{len(table.table)} results found')

1700 results found




### Example: Request Entire Multi-Year Dataset

In [24]:
src = opd.Source('Minneapolis')
table = src.load('OFFICER-INVOLVED SHOOTINGS', 'MULTIPLE')
print(f'{len(table.table)} results found')

90 results found


### Example: Request Entire Year of Multi-Year Dataset

In [25]:
table = src.load('OFFICER-INVOLVED SHOOTINGS', 2020)
print(f'{len(table.table)} results found')

6 results found


### Example: Request Entire Single-Year Dataset

In [22]:
src = opd.Source('Oakland')
table = src.load('USE OF FORCE', 2024)
print(f'{len(table.table)} results found')

5031 results found


### Example: Request Date Range

In [23]:
src = opd.Source('Minneapolis')
table = src.load('OFFICER-INVOLVED SHOOTINGS', ['2019-12-15', '2020-05-30'])
print(f'{len(table.table)} results found')

10 results found


### Example: Request Start of Year to Date

In [26]:
table = src.load('OFFICER-INVOLVED SHOOTINGS', [2019, '2020-05-30'])
print(f'{len(table.table)} results found')

12 results found


### Example: Request Date to End of Year

In [28]:
table = src.load('OFFICER-INVOLVED SHOOTINGS', ['2020-05-30', 2020])
print(f'{len(table.table)} results found')

6 results found


### Example: Request Year Range

In [29]:
table = src.load('OFFICER-INVOLVED SHOOTINGS', [2019, 2020])
print(f'{len(table.table)} results found')

16 results found
