# Report Part 7: Partial Reporters

The goal of this report is to highlight agencies which show particular patterns in reporting incidents, specifically agencies which report in some months and not others.

The reporting agency types are defined as follows:

**Full Reporters**: Agencies which reported for all months.

**Late Starters**: Agencies which started reporting some month after January and reported consistently for the remaining months.

**Quitters**: Agencies which stopped reporting some time in the year but consistently reported in all months before the month they stopped.

**Inconsistent**: Agencies with some other pattern of reporting/non-reporting.
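
The four definitions above can also be expressed as rules on the 12-character monthly pattern string (1 = reported, 0 = did not report). The following is a minimal sketch, not the notebook's own implementation (which uses a lookup table); the function name `classify_pattern` is hypothetical. It assumes that an agency with no reported months falls into the "inconsistent" bucket, which is what the lookup table's fallback does.

```python
def classify_pattern(pattern: str) -> str:
    """Classify a monthly pattern such as '000-111-111-111' (1 = reported)."""
    months = pattern.replace("-", "")
    if len(months) != 12 or set(months) - {"0", "1"}:
        raise ValueError(f"unexpected pattern: {pattern!r}")
    reported = months.count("1")
    if reported == 12:
        return "full reporter"
    if reported == 0:
        # Assumption: never-reporting agencies are grouped with "inconsistent".
        return "inconsistent"
    if months == "0" * (12 - reported) + "1" * reported:
        return "late starter"  # zeros first, then ones through December
    if months == "1" * reported + "0" * (12 - reported):
        return "quitter"       # ones from January, then zeros
    return "inconsistent"


print(classify_pattern("000-111-111-111"))
print(classify_pattern("111-110-000-000"))
```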


Note that in all cases, we are only looking at eligible agencies. Agency eligibility is identified using the Missing Months report data. An eligible agency is one identified as active, not covered by a different agency, and not dormant.
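
As an illustration of the eligibility rule, here is a small sketch on made-up data. The column names (`AGENCY_STATUS`, `COVERED_FLAG`, `DORMANT_FLAG`) are the ones used later in this notebook; the toy rows and the helper name `eligible_agencies` are hypothetical.

```python
import pandas as pd

# Toy data for illustration only; real values come from the Missing Months data.
toy = pd.DataFrame({
    "ori": ["A1", "B2", "C3", "D4"],
    "AGENCY_STATUS": ["Active", "Active", "LEOKA", "Active"],
    "COVERED_FLAG": ["N", "Y", "N", "N"],
    "DORMANT_FLAG": ["N", "N", "N", "Y"],
})


def eligible_agencies(df: pd.DataFrame) -> pd.DataFrame:
    """Keep agencies that are active, not covered, and not dormant."""
    mask = (
        df["AGENCY_STATUS"].eq("Active")
        & df["COVERED_FLAG"].eq("N")
        & df["DORMANT_FLAG"].eq("N")
    )
    return df.loc[mask].copy()


eligible = eligible_agencies(toy)
print(eligible["ori"].tolist())
```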

In [None]:
from datetime import datetime
import os
print("Author: Automated Pipeline")
year = int(os.getenv('DATA_YEAR'))
print("Generating reports for year:",year)
print("Report date:", datetime.now().strftime("%m/%d/%y"))

In [None]:
from utils import *
from dictionaries import *
from pathlib import Path
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from IPython.display import display, Markdown
import plotly.express as px
from datetime import datetime as dt
from plotly.subplots import make_subplots
import math


output_folder = Path(os.getenv("OUTPUT_PIPELINE_DIR"))
artifact_output_folder = output_folder / "artifacts"

years = get_available_years()
output_dir = output_folder / "QC_output_files"
output_dir.mkdir(parents=True, exist_ok=True)

engine_database = connect_to_database()

### Import some variables we need.
state_name_to_abbrev = {v: k for k, v in us_state_abbrev.items()}

states_list = list(us_state_abbrev.values())
states_list.sort()

print("--------------------------------")
print(" Loading datasets, please wait. ")


df_partial_reporters = pd.read_csv(artifact_output_folder / "NIBRS_reporting_pattern_with_reta-mm.csv",parse_dates=["nibrs_agn_nibrs_start_date"])
df_partial_reporters = df_partial_reporters.loc[df_partial_reporters["incident_year"] == year]

# subset by eligible agencies
df_partial_reporters_el = df_partial_reporters.loc[(df_partial_reporters["AGENCY_STATUS"].isin(["Active"])) & \
                                                   (df_partial_reporters["COVERED_FLAG"] == "N") & \
                                                   (df_partial_reporters["DORMANT_FLAG"] == "N")].copy()


# Flag large agencies (>100,000 population or >750 officers)
df_partial_reporters_el["LARGE_FLAG"] = df_partial_reporters_el.apply(lambda row:(int(row["univ_population"]) > 100000) \
                                                                      or ((row["nibrs_agn_male_total"] + row["nibrs_agn_female_total"]) > 750),axis=1)

pattern_code_dict = {
    "011-111-111-111": "late starter",
    "001-111-111-111": "late starter",
    "000-111-111-111": "late starter",
    "000-011-111-111": "late starter",
    "000-001-111-111": "late starter",
    "000-000-111-111": "late starter",
    "000-000-011-111": "late starter",
    "000-000-001-111": "late starter",
    "000-000-000-111": "late starter",
    "000-000-000-011": "late starter",
    "000-000-000-001": "late starter",
    "100-000-000-000": "quitter",
    "110-000-000-000": "quitter",
    "111-000-000-000": "quitter",
    "111-100-000-000": "quitter",
    "111-110-000-000": "quitter",
    "111-111-000-000": "quitter",
    "111-111-100-000": "quitter",
    "111-111-110-000": "quitter",
    "111-111-111-000": "quitter",
    "111-111-111-100": "quitter",
    "111-111-111-110": "quitter",
    "111-111-111-111": "full reporter"
}

agency_categories = [
    "City 0",
    "City < 9,999",
    "City 10,000-249,999",
    "City 250,000 +",
    "Non-MSA County 0",
    "Non-MSA County < 25,000",
    "Non-MSA County 25,000 +",
    "Non-MSA State Police",
    "MSA County 0",
    "MSA County < 25,000",
    "MSA County 25,000 +",
    # group id 23 maps to this label, so it needs its own category
    "MSA State Police"
]

nibrs_agn_population_group_id_dict = {
    1: None,
    3:"City 250,000 +",
    4:"City 250,000 +",
    5:"City 250,000 +",
    6:"City 10,000-249,999",
    7:"City 10,000-249,999",
    8:"City 10,000-249,999",
    9:"City 10,000-249,999",
    10:"City < 9,999",
    11:"City < 9,999",
    12: None,
    13:"Non-MSA County 25,000 +",
    14:"Non-MSA County 25,000 +",
    15:"Non-MSA County < 25,000",
    16:"Non-MSA County < 25,000",
    17:"Non-MSA State Police",
    18:None,
    19:"MSA County 25,000 +",
    20:"MSA County 25,000 +",
    21:"MSA County < 25,000",
    22:"MSA County < 25,000",
    23:"MSA State Police"
}

nibrs_agn_population_group_id_detailed_dict = {
    1: None,
    3:"City 250,000 +",
    4:"City 250,000 +",
    5:"City 250,000 +",
    6:"City 100,000-249,999",
    7:"City 50,000-99,999",
    8:"City 25,000-49,999",
    9:"City 10,000-24,999",
    10:"City 2,500-9,999",
    11:"City < 2,500",
    12: None,
    13:"Non-MSA County 100,000 +",
    14:"Non-MSA County 25,000-99,999",
    15:"Non-MSA County 10,000-24,999",
    16:"Non-MSA County < 10,000",
    17:"Non-MSA State Police",
    18:None,
    19:"MSA County 100,000 +",
    20:"MSA County 25,000-99,999",
    21:"MSA County 10,000-24,999",
    22:"MSA County < 10,000",
    23:"MSA State Police"
}

agency_categories_detailed = [
    "City 250,000 +",
    "City 100,000-249,999",
    "City 50,000-99,999",
    "City 25,000-49,999",
    "City 10,000-24,999",
    "City 2,500-9,999",
    "City < 2,500",
    "City 0",
    "Non-MSA County 100,000 +",
    "Non-MSA County 25,000-99,999",
    "Non-MSA County 10,000-24,999",
    "Non-MSA County < 10,000",
    "Non-MSA State Police",
    "Non-MSA County 0",
    "MSA County 100,000 +",
    "MSA County 25,000-99,999",
    "MSA County 10,000-24,999",
    "MSA County < 10,000",
    "MSA State Police",
    "MSA County 0"
]


zero_pop_dict = {
    11:"City 0",
    16:"Non-MSA County 0",
    22:"MSA County 0"
}

def classify_agency_type(row):
    # int() works for both NumPy and plain Python scalars
    pop_group_id = int(row["nibrs_agn_population_group_id"])
    # zero-population agencies in groups 11, 16 and 22 get their own "0" category
    if int(row["nibrs_agn_population"]) == 0 and pop_group_id in zero_pop_dict:
        return zero_pop_dict[pop_group_id]
    return nibrs_agn_population_group_id_dict[pop_group_id]

def classify_agency_type_detailed(row):
    pop_group_id = int(row["nibrs_agn_population_group_id"])
    if int(row["nibrs_agn_population"]) == 0 and pop_group_id in zero_pop_dict:
        return zero_pop_dict[pop_group_id]
    return nibrs_agn_population_group_id_detailed_dict[pop_group_id]

df_partial_reporters_el["der_group"] = df_partial_reporters_el[["nibrs_agn_population","nibrs_agn_population_group_id"]].apply(classify_agency_type,axis=1)
df_partial_reporters_el["der_group_detailed"] = df_partial_reporters_el[["nibrs_agn_population","nibrs_agn_population_group_id"]].apply(classify_agency_type_detailed,axis=1)


df_partial_reporters_el["der_miss_pattern"] = df_partial_reporters_el["nibrs_missing_pattern_all"].apply(lambda p: pattern_code_dict[p] if p in pattern_code_dict else "inconsistent")

df_partial_reporters_el["new_joiner"] = df_partial_reporters_el["nibrs_agn_nibrs_start_date"].apply(lambda d: "New Joiners" if d >= dt(day=1,month=2,year=year-1) else "Experienced Joiners")

table_partial_reporter = pd.get_dummies(df_partial_reporters_el[["der_group_detailed","new_joiner","der_miss_pattern"]],columns=["der_miss_pattern"],prefix="",prefix_sep="")
agency_type_table = table_partial_reporter.drop(columns=["new_joiner"]).rename(columns={"der_group_detailed":"Agency Characteristics",
                                       "late starter":"Late Starters","quitter":"Quitters",
                                       "inconsistent":"Inconsistent Reporters"}).groupby("Agency Characteristics").sum()

joiner_type_table = table_partial_reporter.drop(columns=["der_group_detailed"]).rename(columns={"new_joiner":"Agency Characteristics",
                                       "late starter":"Late Starters","quitter":"Quitters",
                                       "inconsistent":"Inconsistent Reporters"}).groupby("Agency Characteristics").sum()

table_partial_reporter_merged = pd.concat([agency_type_table,joiner_type_table])
table_partial_reporter_merged["All Agencies in NIBRS Data"] =  table_partial_reporter_merged.sum(axis=1)
table_partial_reporter_merged.drop(columns=["full reporter"],inplace=True)
table_partial_reporter_merged = table_partial_reporter_merged[["All Agencies in NIBRS Data","Late Starters","Quitters","Inconsistent Reporters"]]

table_partial_reporter_merged = table_partial_reporter_merged.reindex(agency_categories_detailed + ["New Joiners","Experienced Joiners"],fill_value=0).astype(int)
table_partial_reporter_merged.to_csv(output_dir / f"Partial_Reporters_Agency_Joiner_Type_{year}.csv",index=True)
print("              done              ")
print("--------------------------------")
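
The zero-population override used when deriving the agency group can be sketched in isolation as follows. This is a simplified illustration: `GROUPS_EXCERPT` is a hypothetical excerpt of the full group mapping defined in the cell above, and unlike the notebook's functions this sketch returns `None` for an unknown group id instead of raising a `KeyError`.

```python
# Group ids that get a dedicated "0" category when population is zero.
ZERO_POP = {11: "City 0", 16: "Non-MSA County 0", 22: "MSA County 0"}

# Hypothetical excerpt of the coarse group mapping, for illustration only.
GROUPS_EXCERPT = {
    10: "City < 9,999",
    11: "City < 9,999",
    16: "Non-MSA County < 25,000",
    22: "MSA County < 25,000",
}


def classify_group(population, group_id):
    """Return the agency group, applying the zero-population override."""
    group_id = int(group_id)
    if int(population) == 0 and group_id in ZERO_POP:
        return ZERO_POP[group_id]
    return GROUPS_EXCERPT.get(group_id)


print(classify_group(0, 11))
print(classify_group(1500, 11))
```

Note that the override only applies to the smallest group of each agency type: a zero-population agency in group 10 keeps its regular label.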

In [None]:
display(Markdown(f"### Number of Partial Reporters by Agency Type and Joiner Type: {year} NIBRS"))
display(table_partial_reporter_merged)
display(Markdown(f"*New joiners are LEAs whose NIBRS start date is on or after February 1, {year-1}. Experienced joiners are LEAs whose NIBRS start date is before February 1, {year-1}.*"))
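
The joiner cutoff described in the note above can be sketched as a small helper. The function name `joiner_type` and the example dates are hypothetical; the rule itself (February 1 of the year before the report year, inclusive) follows the note.

```python
from datetime import datetime


def joiner_type(start_date: datetime, report_year: int) -> str:
    """Label an agency by its NIBRS start date relative to the cutoff."""
    cutoff = datetime(report_year - 1, 2, 1)  # February 1 of the prior year
    return "New Joiners" if start_date >= cutoff else "Experienced Joiners"


# Example with a made-up report year of 2022: the cutoff is 2021-02-01.
print(joiner_type(datetime(2021, 2, 1), 2022))
print(joiner_type(datetime(2021, 1, 31), 2022))
```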

In [None]:
month_list = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]

display(Markdown("### High Level Counts for Late Starters/Quitters/Inconsistent Reporters"))

# for late starters, what month did they start?
late_starters = df_partial_reporters_el.loc[df_partial_reporters_el["der_miss_pattern"] == "late starter"].copy()
late_starters["late_start_month"] = late_starters["nibrs_missing_pattern_all"].apply(lambda x: month_list[x.replace("-","").index("1")])
late_starter_counts = late_starters.groupby("late_start_month")["ori"].count().reset_index().rename(columns={"late_start_month":"First Month they Report","ori":"Agency Count"})
fig = px.bar(late_starter_counts, 
             x="First Month they Report", 
             y="Agency Count",
             title="Late Starters: First Month of Reporting",
             category_orders={"First Month they Report":month_list})
display(fig)

# for quitters, what month did they quit?
quitters = df_partial_reporters_el.loc[df_partial_reporters_el["der_miss_pattern"] == "quitter"].copy()
quitters["quitter_month"] = quitters["nibrs_missing_pattern_all"].apply(lambda x: month_list[x.replace("-","").index("0")])
quitter_counts = quitters.groupby("quitter_month")["ori"].count().reset_index().rename(columns={"quitter_month":"First Month they Didn't Report","ori":"Agency Count"})
fig = px.bar(quitter_counts, 
             x="First Month they Didn't Report", 
             y="Agency Count",
             title="Quitters: First Month of Non-Reporting",
             category_orders={"First Month they Didn't Report":month_list})
display(fig)

# for inconsistent reporters, how many months did they report? What were the most popular patterns?
inconsistent = df_partial_reporters_el.loc[df_partial_reporters_el["der_miss_pattern"] == "inconsistent"].copy()

inconsistent["inconsistent_count"] = inconsistent["nibrs_missing_pattern_all"].apply(lambda x: np.sum([1 if c == "1" else 0 for c in x.replace("-","")]))
# rename_axis/reset_index(name=...) labels the value and count columns explicitly
inconsistent_counts = inconsistent["inconsistent_count"].value_counts().rename_axis("Number of Months Reported").reset_index(name="Agency Count")
fig = px.bar(inconsistent_counts, 
             x="Number of Months Reported", 
             y="Agency Count",
             title="Inconsistent: Number of Months Agencies are Reporting",
            category_orders={"Number of Months Reported":[1,2,3,4,5,6,7,8,9,10,11]})
display(fig)

display(Markdown("### Top 20 Most Common Inconsistent Reporter Patterns (1=reported, 0=did not report)"))
display(inconsistent["nibrs_missing_pattern_all"].value_counts().rename_axis("Reporting Pattern").reset_index(name="Agency Count")[:20])
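
The month lookups behind the three charts in the cell above rely on simple string operations on the pattern. A minimal standalone sketch, with hypothetical helper names:

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def first_reported_month(pattern: str) -> str:
    """Month of the first '1' in the pattern (used for late starters)."""
    return MONTHS[pattern.replace("-", "").index("1")]


def first_missed_month(pattern: str) -> str:
    """Month of the first '0' in the pattern (used for quitters)."""
    return MONTHS[pattern.replace("-", "").index("0")]


def months_reported(pattern: str) -> int:
    """Number of months with a report (used for inconsistent reporters)."""
    return pattern.count("1")


print(first_reported_month("000-011-111-111"))
print(first_missed_month("111-100-000-000"))
print(months_reported("101-010-111-000"))
```

Both lookups raise `ValueError` if the pattern contains no matching character, so they are only safe on rows already classified as late starters or quitters respectively.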

In [None]:
df_partial_reporters_el["der_group"].value_counts()
"""
Can we add a pie chart to show the composition of agencies under each agency type, 
including: NIBRS nonreporters, late starters, quitters, inconsistent reporters and consistent reporters.
"""
irises_colors = ['rgb(27,120,55)','rgb(231,212,232)', 'rgb(175,141,195)', 'rgb(118,42,131)']


cols = 3
rows = math.ceil(len(agency_categories) / cols)


fig = make_subplots(rows=rows, 
                    cols=cols, 
                    specs=[[{'type':'domain'} for x in range(cols)] for y in range(rows)],
                    subplot_titles=agency_categories
                   )


col = 0
row = 1
for cat in agency_categories:
    col += 1
    if col > cols:
        col = 1
        row += 1
    
        
    reporting_distribution = df_partial_reporters_el.loc[df_partial_reporters_el["der_group"] == cat]["der_miss_pattern"].value_counts().sort_index()
    # label the pattern names and their counts explicitly so the pie labels/values are not swapped
    reporting_distribution = reporting_distribution.rename_axis("Reporting Pattern").reset_index(name="Agency Count")
    fig.add_trace(go.Pie(labels=reporting_distribution["Reporting Pattern"], 
                         values=reporting_distribution["Agency Count"], 
                         name=cat,
                         marker_colors=irises_colors
                        ), row, col)
    
fig.update_layout(title_text="Reporting Types for different Agency Groups",height=1200)
fig.show()
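
The per-group composition that feeds the pie charts can also be inspected as a single table. The sketch below uses `pandas.crosstab` on made-up rows; the column names `der_group` and `der_miss_pattern` match the notebook, while the data is illustrative only.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "der_group": ["City 250,000 +", "City 250,000 +",
                  "City 250,000 +", "MSA County 0"],
    "der_miss_pattern": ["full reporter", "quitter",
                         "full reporter", "inconsistent"],
})

# Agency counts per group and reporting pattern.
counts = pd.crosstab(toy["der_group"], toy["der_miss_pattern"])

# Row-wise shares, i.e. what each pie chart displays.
shares = pd.crosstab(toy["der_group"], toy["der_miss_pattern"], normalize="index")

print(counts)
print(shares.round(2))
```

A table like this makes it easy to spot groups where a pattern has zero agencies, which is also when a fixed colour list may no longer line up with the same pattern across pies.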

In [None]:
# list large agencies in each category (other than full)
cols_of_interest = ["nibrs_agn_state_name","ori","ucr_agency_name","nibrs_agn_population_group_desc",'nibrs_agn_nibrs_start_date',"nibrs_missing_pattern_all","nibrs_total_crime_all","der_miss_pattern","univ_population","nibrs_agn_male_total","nibrs_agn_female_total","LARGE_FLAG"]

                                        
not_full_large = df_partial_reporters_el.loc[(df_partial_reporters_el["der_miss_pattern"] != "full reporter") &\
                                             (df_partial_reporters_el["LARGE_FLAG"] == 1)].copy()

not_full_large = not_full_large[cols_of_interest].sort_values(by=["nibrs_agn_state_name","univ_population"],ascending=[True,False])

not_full_large.to_csv(output_dir/f"not_full_reporter_large_agencies_{year}.csv",index=False)

display(Markdown("### Large Late Starters (large agencies have >100,000 population or >750 officers)"))
display(not_full_large.loc[not_full_large["der_miss_pattern"] == "late starter"].reset_index(drop=True))

display(Markdown("### Large Quitters (large agencies have >100,000 population or >750 officers)"))
display(not_full_large.loc[not_full_large["der_miss_pattern"] == "quitter"].reset_index(drop=True))

display(Markdown("### Large Inconsistent Reporters (large agencies have >100,000 population or >750 officers)"))
display(not_full_large.loc[not_full_large["der_miss_pattern"] == "inconsistent"].reset_index(drop=True))
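
The large-agency rule named in the headings above can be sketched as a plain function. This assumes both thresholds are strict (greater than), as the headings state; the function name and default parameter names are hypothetical.

```python
def is_large_agency(population, male_officers, female_officers,
                    pop_threshold=100_000, officer_threshold=750):
    """True if the agency exceeds the population or officer threshold."""
    officers = male_officers + female_officers
    return population > pop_threshold or officers > officer_threshold


print(is_large_agency(120_000, 50, 20))   # large by population
print(is_large_agency(40_000, 600, 200))  # large by officer count
print(is_large_agency(40_000, 100, 50))   # not large
```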

## Datasets Used:

`Missing months datafile`: missing_months_<year>.csv (reta missing months)
* **Source**: NIBRS database
* **Description**: All law enforcement agencies in the US, whether or not they should be reporting crimes, and what months they reported incidents. Lists eligible agencies and whether or not they reported for different months.
* **Typical data**: 23 columns of ORI, state, status flags, population information, and indicators for whether they reported crimes in each month.


`Universe datafile`: ref_agency_year.xlsx
* **Source**: FBI CJIS
* **Description**: Annual Snapshot List of all agencies and meta-data, regardless of NIBRS reporting status.
* **Typical data**: 66 columns of ORI, population region and officer meta-data. This includes both NIBRS and non-NIBRS agencies.
   * Agency Population loaded from column (POPULATION)
   * Agency Officer Count loaded from columns (PE_MALE_OFFICER_COUNT + PE_FEMALE_OFFICER_COUNT)

`SRS datafile (historic)`: srs2016_2020_smoothed.csv
* **Source**: FBI CJIS
* **Description**: Summary Reporting System (SRS) Crime data smoothed across four years.
* **Typical data**: Several hundred columns of crime counts by month/category. For NIBRS agencies, the SRS crime counts should reflect the subset of incidents reported to NIBRS which are relevant.
   * SRS incident count is sum of all monthly total columns (v95,v213,v331,v449,v567,v685,v803,v921,v1039,v1157,v1275,v1393)
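
The twelve monthly total columns listed above are evenly spaced (every 118th variable, starting at `v95`), so the column list can be generated rather than typed out. A minimal sketch, with a hypothetical helper name and a plain dictionary standing in for one row of the SRS file:

```python
# v95, v213, ..., v1393: one monthly total column per month.
SRS_MONTHLY_TOTAL_COLUMNS = [f"v{95 + 118 * i}" for i in range(12)]


def srs_incident_count(row) -> int:
    """Sum the twelve monthly totals for one agency row (dict-like)."""
    return sum(row[col] for col in SRS_MONTHLY_TOTAL_COLUMNS)


# Made-up row: two incidents in every month.
example_row = {col: 2 for col in SRS_MONTHLY_TOTAL_COLUMNS}
print(SRS_MONTHLY_TOTAL_COLUMNS)
print(srs_incident_count(example_row))
```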

`NIBRS datafile`: Amazon Web Services database
* **Source**: FBI CJIS
* **Description**: Incident/Offender/Victim dataset of crimes published by FBI
* **Typical data**: Incident level data can be retrieved in various ways (e.g. incident, victim, offender, or agency centric viewpoints)
   * Eligible ORIs are selected from reta-mm where:
     * AGENCY_STATUS is 'Active' (this notebook rejects all other values, e.g. 'Federal', 'LEOKA', blanks)
     * COVERED_FLAG is 'N' (reject 'Y')
     * DORMANT_FLAG is 'N' (reject 'Y')