# MMP Enforcement Data Exploration

### What can we learn about the enforcement data in `enf_actions_export.csv`?

<i>Maggie Hilderbran</i>


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import warnings
import yaml
from pathlib import Path
from IPython.display import clear_output
import sys

# display all rows & columns
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# turn off warning messages
warnings.filterwarnings('ignore')

In [2]:
# import & use utility functions sitting in 'main.py'
sys.path.append('../../../ca_mmp')
from policy_eval import main

In [23]:
NPDES_PROGRAMS = ['DODNPDESSW', 'DODNPDESWW', 'NPDESWW', 'NPDINDLRG',
                  'NPDINDSML', 'NPDMINING', 'NPDMUNILRG', 'NPDMUNIOTH',
                  'NPDNONMUNIPRCS']

##### 1. Reading in and preprocessing data.

In [3]:
print('Reading in configuration and data files.')
clear_output(wait=True)

# read in configuration file
print('Reading in configuration file.')
with open(Path().resolve().parent / 'config.yml', 'r') as file:
    configs = yaml.safe_load(file)
data_path = Path(configs['data_path'])
clear_output(wait=True)

# read in data
print('Reading enforcements file.')
enforcements = pd.read_csv(data_path / 'enf_actions_export.csv',
                           dtype={'FACILITY ID': object},
                           parse_dates=['DATE OF OLDEST VIOLATION LINKED TO ENFORCEMENT ACTION', 'EFFECTIVE DATE.1'],
                           date_parser=lambda x: pd.to_datetime(x, errors='coerce'))
enforcements.rename(columns={'FACILITY ID': 'FACILITY_ID'}, inplace=True)
clear_output(wait=True)

Reading enforcements file.
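
The `date_parser` argument used above was deprecated in pandas 2.0. A minimal sketch of one alternative, assuming it is acceptable to convert the date columns after loading: read them as text, then coerce with `pd.to_datetime(..., errors='coerce')` so unparseable entries become `NaT`. The toy CSV below is made up for illustration and does not come from `enf_actions_export.csv`.

```python
import io

import pandas as pd

# Toy CSV standing in for the real export (values are invented).
csv_text = """FACILITY ID,EFFECTIVE DATE.1
00123,2015-03-01
00456,not a date
"""

# Keep FACILITY ID as text so leading zeros survive.
toy = pd.read_csv(io.StringIO(csv_text), dtype={'FACILITY ID': object})

# Convert after loading; anything unparseable becomes NaT instead of raising.
toy['EFFECTIVE DATE.1'] = pd.to_datetime(toy['EFFECTIVE DATE.1'], errors='coerce')

print(toy)
```

Converting after the read keeps the parsing step explicit and makes it easy to count how many values were coerced to `NaT`.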


In [4]:
print('Cleaning enforcements data.')
enforcements_clean = main.clean_enforcements(enforcements, mmp_only=False)
clear_output(wait=True)

Cleaning enforcements data.


##### 2. Exploring the `PROGRAM` fields.

Recall that these are the NPDES programs according to Nicole's code: `DODNPDESSW`, `DODNPDESWW`, `NPDESWW`, `NPDINDLRG`, `NPDINDSML`, `NPDMINING`, `NPDMUNILRG`, `NPDMUNIOTH`, and `NPDNONMUNIPRCS`.

In [6]:
# total number of enforcement actions
len(enforcements)

49249

In [7]:
# enforcement actions with a non-null 'PROGRAM'
len(enforcements['PROGRAM'].dropna())

40465

In [8]:
# enforcement actions with a non-null 'PROGRAM.1'
len(enforcements['PROGRAM.1'].dropna())

49051
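
The two counts above show how often each field is filled in, but not how the gaps overlap. As a rough sketch, a cross-tabulation of presence flags shows how many rows have both fields, only one, or neither. The `toy` frame and the `'present'`/`'missing'` labels are illustrative assumptions; on the real data, `toy` would be replaced by `enforcements`.

```python
import pandas as pd

# Toy stand-in for `enforcements` (values are invented).
toy = pd.DataFrame({
    'PROGRAM':   ['NPDMUNILRG', None, 'WDR', None],
    'PROGRAM.1': ['NPDESWW', 'WDR', None, None],
})

# Label each field as present or missing, then count the combinations.
labels = {True: 'present', False: 'missing'}
has_program = toy['PROGRAM'].notna().map(labels)
has_program_1 = toy['PROGRAM.1'].notna().map(labels)
presence = pd.crosstab(has_program, has_program_1)

print(presence)
```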

In [32]:
# enforcement actions where 'PROGRAM' and 'PROGRAM.1' have different, non-NA values
diff = enforcements[(enforcements['PROGRAM.1'] != enforcements['PROGRAM']) &
                    (~pd.isna(enforcements['PROGRAM'])) &
                    (~pd.isna(enforcements['PROGRAM.1']))]

In [93]:
# differing values where both fields hold an NPDES program
len(diff[(diff['PROGRAM'].isin(NPDES_PROGRAMS)) & (diff['PROGRAM.1'].isin(NPDES_PROGRAMS))])

530

How often does one field have an NPDES program while the other does not?

In [35]:
# 'PROGRAM.1' holds an NPDES program but 'PROGRAM' does not
len(diff[(~diff['PROGRAM'].isin(NPDES_PROGRAMS)) & (diff['PROGRAM.1'].isin(NPDES_PROGRAMS))])

129

In [36]:
# 'PROGRAM' holds an NPDES program but 'PROGRAM.1' does not
len(diff[(diff['PROGRAM'].isin(NPDES_PROGRAMS)) & (~diff['PROGRAM.1'].isin(NPDES_PROGRAMS))])

65
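
The three counts above (both fields NPDES, only `PROGRAM.1`, only `PROGRAM`) can also be produced in one pass. The sketch below cross-tabulates NPDES membership for the two fields on a small invented frame; the `'NPDES'`/`'other'` labels and the toy rows are assumptions for illustration, not values from the real export.

```python
import pandas as pd

NPDES_PROGRAMS = ['DODNPDESSW', 'DODNPDESWW', 'NPDESWW', 'NPDINDLRG',
                  'NPDINDSML', 'NPDMINING', 'NPDMUNILRG', 'NPDMUNIOTH',
                  'NPDNONMUNIPRCS']

# Toy stand-in for `enforcements` (values are invented).
toy = pd.DataFrame({
    'PROGRAM':   ['NPDMUNILRG', 'ANIWSTCOWS', 'NPDMUNIOTH',
                  'WDRMUNIOWTS', 'NPDESWW', None],
    'PROGRAM.1': ['NPDESWW', 'NPDNONMUNIPRCS', 'SSOMUNISML',
                  'WDRMUNIOTH', 'NPDESWW', 'WDR'],
})

# Same filter as `diff`: both fields non-null and not equal.
both_present = toy['PROGRAM'].notna() & toy['PROGRAM.1'].notna()
toy_diff = toy[both_present & (toy['PROGRAM'] != toy['PROGRAM.1'])]

# Cross-tabulate NPDES membership of the two fields.
labels = {True: 'NPDES', False: 'other'}
in_program = toy_diff['PROGRAM'].isin(NPDES_PROGRAMS).map(labels)
in_program_1 = toy_diff['PROGRAM.1'].isin(NPDES_PROGRAMS).map(labels)
membership = pd.crosstab(in_program, in_program_1)

print(membership)
```

Each cell of `membership` corresponds to one of the `len(...)` calls above, plus a fourth cell for rows where neither field is an NPDES program.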

Let's look at a random sample of all enforcement actions where the two fields differ (`diff`).

In [91]:
diff[['PROGRAM', 'PROGRAM CATEGORY', 'PROGRAM.1', 'PROGRAM CATEGORY.1']].sample(frac=1).head(20)

Unnamed: 0,PROGRAM,PROGRAM CATEGORY,PROGRAM.1,PROGRAM CATEGORY.1
11518,SSOMUNISML,SSO,WDRMUNIOTH,WDR
25295,WDRMUNIOWTS,WDR,WDRMUNIOTH,WDR
13114,UNREGS,UNREGS,ENF13267,UNREGS
1356,WDRNONMUNIPRCS,WDR,WDR,WDR
26361,NPDMUNILRG,NPDESWW,NPDESWW,NPDESWW
34999,LNDISPOTH,LNDISP,LNDISP,LNDISP
27149,WDRMINING,WDR,WDR,WDR
668,NPDMUNILRG,NPDESWW,NPDESWW,NPDESWW
15555,WDRNONMUNIPRCS,WDR,WDR,WDR
20184,WDRMUNIOWTS,WDR,WDRMUNIOTH,WDR


What if we look just at enforcement actions that have an NPDES program in at least one of the two fields?

In [94]:

diff_npdes = diff[(diff['PROGRAM'].isin(NPDES_PROGRAMS)) | (diff['PROGRAM.1'].isin(NPDES_PROGRAMS))]
diff_npdes[['PROGRAM', 'PROGRAM CATEGORY', 'PROGRAM.1', 'PROGRAM CATEGORY.1']].sample(frac=1).head(20)

Unnamed: 0,PROGRAM,PROGRAM CATEGORY,PROGRAM.1,PROGRAM CATEGORY.1
28453,NPDMUNILRG,NPDESWW,WDRMUNILRG,WDR
972,NPDNONMUNIPRCS,NPDESWW,NPDESWW,NPDESWW
9814,NPDMUNIOTH,NPDESWW,SSOMUNISML,SSO
1857,NPDNONMUNIPRCS,NPDESWW,NPDESWW,NPDESWW
19739,NPDMUNILRG,NPDESWW,NPDESWW,NPDESWW
20150,DODNPDESWW,DOD,NPDESWW,NPDESWW
35598,NPDMUNIOTH,NPDESWW,NPDESWW,NPDESWW
8342,NPDESWW,NPDESWW,NPDNONMUNIPRCS,NPDESWW
14270,NPDMUNIOTH,NPDESWW,NPDESWW,NPDESWW
29482,ANIWSTCOWS,ANIMALWASTE,NPDNONMUNIPRCS,NPDESWW


Among enforcement actions where `PROGRAM` and `PROGRAM.1` differ and at least one field holds an NPDES program, most (530 of 724, about 73%) list an NPDES program in both fields, so the mismatch stays within the larger NPDES umbrella even though the specific programs differ. In the remaining 194 actions, the other field holds a non-NPDES program. Animal waste appears to be the most frequent of these, with some SSOs, land disposal, and WDRs.
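
The impression about which non-NPDES programs show up most often comes from eyeballing random samples. A minimal sketch of how it could be checked directly: collect the program category from whichever field is not an NPDES program and tally with `value_counts`. The `toy` frame is invented and only mimics the shape of `diff_npdes`; running the same steps on `diff_npdes` itself would give the real tallies.

```python
import pandas as pd

NPDES_PROGRAMS = ['DODNPDESSW', 'DODNPDESWW', 'NPDESWW', 'NPDINDLRG',
                  'NPDINDSML', 'NPDMINING', 'NPDMUNILRG', 'NPDMUNIOTH',
                  'NPDNONMUNIPRCS']

# Toy rows shaped like `diff_npdes` (values are illustrative only).
toy = pd.DataFrame({
    'PROGRAM': ['NPDMUNILRG', 'NPDMUNIOTH', 'ANIWSTCOWS', 'NPDMUNILRG'],
    'PROGRAM CATEGORY': ['NPDESWW', 'NPDESWW', 'ANIMALWASTE', 'NPDESWW'],
    'PROGRAM.1': ['WDRMUNILRG', 'SSOMUNISML', 'NPDNONMUNIPRCS', 'NPDESWW'],
    'PROGRAM CATEGORY.1': ['WDR', 'SSO', 'NPDESWW', 'NPDESWW'],
})

# Take the category from whichever field holds a non-NPDES program.
non_npdes = pd.concat([
    toy.loc[~toy['PROGRAM'].isin(NPDES_PROGRAMS), 'PROGRAM CATEGORY'],
    toy.loc[~toy['PROGRAM.1'].isin(NPDES_PROGRAMS), 'PROGRAM CATEGORY.1'],
])

# Tally how often each non-NPDES category appears.
category_counts = non_npdes.value_counts()
print(category_counts)
```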