| ![EEW logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/eew.jpg?raw=true) | ![EDGI logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/edgi.png?raw=true) |
|---|---|

#### This notebook is licensed under GPL 3.0. Please visit our GitHub repo for more information:
#### The notebook was collaboratively authored by the Environmental Data & Governance Initiative (EDGI) following our authorship protocol: https://docs.google.com/document/d/1CtDN5ZZ4Zv70fHiBTmWkDJ9mswEipX6eCYrwicP66Xw/
#### For more information about this project, visit https://www.environmentalenforcementwatch.org/

## How to Run this Notebook
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know; we authored it! It’s okay. Click “Run Anyway” to continue.
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!

---

# Nationwide statistics about environmental compliance trends

## Setup
Here we load some helper code to get us going.

In [1]:
# Import code libraries
!git clone https://github.com/edgi-govdata-archiving/ECHO_modules.git &>/dev/null;
%run ECHO_modules/DataSet.py

import urllib.parse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import requests
import csv
import datetime
import folium
from folium.plugins import FastMarkerCluster
import ipywidgets as widgets
from IPython.display import display, HTML  # IPython.core.display is deprecated for these imports
from pandas.errors import EmptyDataError
def formatter(value):
  return "{:0.2f}".format(value)



Here we set up some code to help us store and eventually export the metrics.

In [2]:
inspections = dict()
violations = dict()
enforcements = dict()
penalties = dict()
emissions = dict()

## Start getting data
First, get summary data from the ECHO_EXPORTER table.

In [3]:
# Get everything we will need from ECHO_EXPORTER in a single DB query.
# We can then use the full dataframe to specialize views of it.
full_echo_data = None
column_mapping = {
    '"REGISTRY_ID"': str,
    '"FAC_NAME"': str,
    '"FAC_LAT"': float,
    '"FAC_LONG"': float,
    '"AIR_IDS"': str,
    '"NPDES_IDS"': str,
    '"RCRA_IDS"': str,
    '"DFR_URL"': str,
    '"AIR_FLAG"': str,
    '"NPDES_FLAG"': str,
    '"GHG_FLAG"': str,
    '"RCRA_FLAG"': str,
    '"FAC_ACTIVE_FLAG"': str
}
column_names = list( column_mapping.keys() )
columns_string = ','.join( column_names )
sql = 'select ' + columns_string + ' from "ECHO_EXPORTER" where "AIR_FLAG" = \'Y\' or "NPDES_FLAG" = \'Y\' or "GHG_FLAG" = \'Y\' or "RCRA_FLAG" = \'Y\''
try:
    # Don't index.
    full_echo_data = get_data( sql )
except EmptyDataError:
    print("\nThere are no EPA facilities for this query.\n")
full_echo_data

Unnamed: 0,REGISTRY_ID,FAC_NAME,FAC_LAT,FAC_LONG,AIR_IDS,NPDES_IDS,RCRA_IDS,DFR_URL,AIR_FLAG,NPDES_FLAG,GHG_FLAG,RCRA_FLAG,FAC_ACTIVE_FLAG
0,1.100137e+11,TOPPER ONE HOUR CLEANER,39.987020,-75.161830,PAPAM0004210101795,,PAD982677197,http://echo.epa.gov/detailed-facility-report?f...,Y,N,N,Y,Y
1,1.100094e+11,PIEZAS EXTRA,18.252013,-66.036570,,,PRR000011601,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,
2,1.100019e+11,QVC SUFFOLK INC,36.768620,-76.542970,VA0000005180000018,,VAD988168910,http://echo.epa.gov/detailed-facility-report?f...,Y,N,N,Y,Y
3,1.100342e+11,PARIS - HENRY COUNTY LANDFILL,36.315170,-88.362142,,TNR053299 TNR121673,,http://echo.epa.gov/detailed-facility-report?f...,N,Y,N,N,Y
4,1.100384e+11,TATE METALWORKS INC,34.864320,-81.959520,SC00020600481,,,http://echo.epa.gov/detailed-facility-report?f...,Y,N,N,N,Y
5,1.100396e+11,STABIL CONCRETE PRODUCTS LLC,27.761690,-82.694070,FL0000001210300358,,FLR000212639,http://echo.epa.gov/detailed-facility-report?f...,Y,N,N,Y,Y
6,1.100015e+11,"SONOCO PROTECTIVE SOLUTIONS, INC.",36.151100,-78.730733,NC0000003703900087,NCG050199,,http://echo.epa.gov/detailed-facility-report?f...,Y,Y,N,N,
7,1.100432e+11,SEBASTICOOK VALLEY HEALTH,44.790820,-69.371240,ME000A001136,,MER000509307,http://echo.epa.gov/detailed-facility-report?f...,Y,N,N,Y,Y
8,1.100080e+11,NEW YORK POWER AUTHORTIY,42.377200,-76.949030,,,NYD981179716,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,
9,1.100058e+11,US DOE BPA LEWISTON MAINT HDQRS,46.437070,-116.963380,,,ID6891435574,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,Y
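
The query string in the cell above is assembled by joining quoted column names and then OR-ing one flag test per program. A minimal sketch of that string-building step, using a shortened, hypothetical column list (the notebook's `column_mapping` has more entries) and a `program_flags` list that is an assumption introduced only for this illustration:

```python
# Hypothetical, shortened column list; the notebook's column_mapping has more entries.
column_names = ['"REGISTRY_ID"', '"FAC_NAME"', '"FAC_ACTIVE_FLAG"']

# Assumed helper list for this sketch: one flag column per regulatory program.
program_flags = ["AIR_FLAG", "NPDES_FLAG", "GHG_FLAG", "RCRA_FLAG"]

# Build one "FLAG" = 'Y' condition per program and join them with OR.
conditions = " or ".join('"{0}" = \'Y\''.format(flag) for flag in program_flags)

# Column names are already double-quoted, so a plain comma join is enough.
sql = "select " + ",".join(column_names) + ' from "ECHO_EXPORTER" where ' + conditions
print(sql)
```

Building the conditions from a list keeps the quoting in one place, which may make it easier to add or drop a program without hand-editing a long string.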


## Number of Currently Regulated Facilities Per Program

In [4]:
air_fac = full_echo_data.loc[(full_echo_data["AIR_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
water_fac = full_echo_data.loc[(full_echo_data["NPDES_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
waste_fac = full_echo_data.loc[(full_echo_data["RCRA_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
ghg_fac = full_echo_data.loc[(full_echo_data["GHG_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]

display(HTML("<h3>There are "+ str(air_fac) + " facilities currently regulated under the Clean Air Act.</h3>"))
display(HTML("<h3>There are "+ str(water_fac) + " facilities currently regulated under the Clean Water Act.</h3>"))
display(HTML("<h3>There are "+ str(waste_fac) + " facilities currently regulated under RCRA (hazardous waste).</h3>"))
display(HTML("<h3>There are "+ str(ghg_fac) + " facilities currently reporting greenhouse gas emissions.</h3>"))
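
Each count above keeps only rows where both the program flag and `FAC_ACTIVE_FLAG` equal `"Y"`. The same logic can be sketched without pandas on a few made-up rows; `facilities` and `count_active` are hypothetical names used only here:

```python
# Hypothetical rows standing in for full_echo_data; the values are made up.
facilities = [
    {"AIR_FLAG": "Y", "NPDES_FLAG": "N", "FAC_ACTIVE_FLAG": "Y"},
    {"AIR_FLAG": "Y", "NPDES_FLAG": "Y", "FAC_ACTIVE_FLAG": ""},  # blank flag: not counted
    {"AIR_FLAG": "N", "NPDES_FLAG": "Y", "FAC_ACTIVE_FLAG": "Y"},
    {"AIR_FLAG": "Y", "NPDES_FLAG": "Y", "FAC_ACTIVE_FLAG": "Y"},
]

def count_active(rows, program_flag):
    """Count rows flagged for the program AND marked as active."""
    return sum(
        1 for row in rows
        if row[program_flag] == "Y" and row["FAC_ACTIVE_FLAG"] == "Y"
    )

air_count = count_active(facilities, "AIR_FLAG")
water_count = count_active(facilities, "NPDES_FLAG")
print(air_count, water_count)
```

Note that a facility regulated under several programs is counted once per program, and a blank `FAC_ACTIVE_FLAG` (as in some rows of the table above) excludes the facility from every count.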

## Clean Air Act inspections in 2019

In [5]:
# Use SQL to search for and select the data about Clean Air Act inspections (FCEs and PCEs) in 2019
air_inspections = None
try:
    sql = 'select * from \"ICIS-AIR_FCES_PCES\" where \"ACTUAL_END_DATE\" like \'__-__-2019\''

    # Run the query and download the results
    air_inspections = get_data( sql, 'pgm_sys_id' )
except EmptyDataError:
    print( "No data found")

air_inspections

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,STATE_EPA_FLAG,ACTIVITY_TYPE_CODE,ACTIVITY_TYPE_DESC,COMP_MONITOR_TYPE_CODE,COMP_MONITOR_TYPE_DESC,ACTUAL_END_DATE,PROGRAM_CODES
0,04000CAPECARFL1,3601794661,E,INS,Inspection/Evaluation,PCE,PCE On-Site,03-27-2019,"CAAGACTM, CAAMACT"
1,020000003606390000,3601943049,E,INS,Inspection/Evaluation,PCE,PCE On-Site,07-23-2019,"CAACFC, CAAFESOP"
2,020000003606501000,3601851095,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,05-08-2019,CAAOP
3,020000003606501000,3601866216,E,INS,Inspection/Evaluation,FOO,FCE On-Site,05-01-2019,"CAANAM, CAAOP"
4,020000003606501000,3601972076,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,04-24-2019,CAASIP
5,020000003606501000,3601972077,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,04-17-2019,CAASIP
6,0500026147R5002,3602044831,E,INS,Inspection/Evaluation,PCE,PCE On-Site,10-30-2019,CAAMS
7,0500026163R5004,3602008140,E,INS,Inspection/Evaluation,PCE,PCE On-Site,10-08-2019,CAAMS
8,0500027003R5001,3601999934,E,INS,Inspection/Evaluation,PCE,PCE On-Site,09-20-2019,CAAMS
9,0500027009R5001,3601999946,E,INS,Inspection/Evaluation,PCE,PCE On-Site,09-19-2019,CAAMS


In [6]:
# Number of inspections in 2019 per 1000 regulated facilities

air_inspections_metric = formatter((air_inspections.shape[0] / air_fac) * 1000) # Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["CAA"] = air_inspections_metric
display(HTML("<h3>"+ air_inspections_metric +" inspections per 1000 facilities</h3>"))
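
The rate used throughout this notebook is (number of events / number of regulated facilities) × 1000. A minimal sketch of that arithmetic with made-up numbers; `rate_per_1000` is a hypothetical helper, and the zero-facility guard is an addition not present in the notebook:

```python
def formatter(value):
    # Same two-decimal formatting the notebook uses.
    return "{:0.2f}".format(value)

def rate_per_1000(event_count, facility_count):
    """Return events per 1000 facilities as a formatted string, or None if there are no facilities."""
    if facility_count == 0:
        return None  # avoid ZeroDivisionError when no facilities are regulated
    return formatter(event_count / facility_count * 1000)

# Made-up example: 250 inspections across 10,000 regulated facilities.
example_rate = rate_per_1000(250, 10000)
print(example_rate)
```

Because `formatter` returns a string, the stored metric is text; convert it back with `float()` before doing further arithmetic on it.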

## High priority violations of the Clean Air Act in 2019



In [7]:
air_violations = None
try:
    sql = 'select * from "ICIS-AIR_VIOLATION_HISTORY" where "HPV_DAYZERO_DATE" like \'__-__-2019\''

    air_violations = get_data( sql, "pgm_sys_id" )

    # Remove "FACIL" violations, which are paperwork violations according to: https://19january2017snapshot.epa.gov/sites/production/files/2013-10/documents/frvmemo.pdf
    # air_violations = air_violations.loc[(air_violations["POLLUTANT_DESCS"]!="FACIL")]
except EmptyDataError:
    print( "No data found")
air_violations

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,AGENCY_TYPE_DESC,STATE_CODE,AIR_LCON_CODE,COMP_DETERMINATION_UID,ENF_RESPONSE_POLICY_CODE,PROGRAM_CODES,PROGRAM_DESCS,POLLUTANT_CODES,POLLUTANT_DESCS,EARLIEST_FRV_DETERM_DATE,HPV_DAYZERO_DATE,HPV_RESOLVED_DATE
0,IN0000001803300043,3601685463,State,IN,,IN000A74118,HPV,CAANSPS,New Source Performance Standards,10373,Particulate matter - PM10,12-03-2018,01-13-2019,
1,IN0000001801900008,3601799804,State,IN,,IN000A75598,HPV,CAAMACT,MACT Standards (40 CFR Part 63),300000036,Mercury,04-05-2019,04-30-2019,05-26-2020
2,IN0000001801900008,3602022157,State,IN,,IN000A78256,HPV,CAAMACT,MACT Standards (40 CFR Part 63),300000036,Mercury,09-25-2019,12-24-2019,
3,MN0000002704700055,3602241949,State,MN,,MN000A00001670PEN20191,HPV,CAATVP,Title V Permits,10193,Carbon monoxide,06-06-2018,04-23-2019,06-04-2020
4,PA000493288,3601889184,State,PA,,PA000A0000H00000000376058,HPV,CAAMACT CAANSPS CAASIP CAATVP,MACT Standards (40 CFR Part 63) New Source Per...,10358,Nitrogen oxides,06-06-2019,06-06-2019,04-23-2020
5,TX0000004820100031,3601982497,State,TX,,TX000A0779337472019262001,HPV,CAATVP,Title V Permits,300000329,FACIL,08-30-2019,08-30-2019,
6,AR0000000513900012,3602081708,State,AR,,AR000A79340,HPV,CAATVP,Title V Permits,300000329,FACIL,07-08-2019,07-08-2019,
7,AL0000000110300026,3602120488,State,AL,,AL000A79835,HPV,CAAMACT,MACT Standards (40 CFR Part 63),300000094,Hexane,04-03-2019,04-03-2019,
8,IL000031045AAJ,3601792118,State,IL,,IL000AA-2019-00010,HPV,CAASIP CAATVP,State Implementation Plan for National Primary...,10461 300000005 300000329,FACIL NITROGEN OXIDES NO2 Sulfur dioxide,03-03-2019,03-03-2019,
9,IL000197090AAI,3601954358,State,IL,,IL000AA-2019-00039,HPV,CAASIP CAATVP,State Implementation Plan for National Primary...,300000329,FACIL,08-06-2019,08-06-2019,11-26-2019


In [8]:
# Number of high priority violations per 1000 regulated facilities

#air_violations_fac = air_violations.shape[0] / len(air_violations["PGM_SYS_ID"].unique()) # Total number of violations divided by number of facilities with violations. Will use this later in looking at enforcement actions.
air_violations_metric = formatter((air_violations.shape[0] / air_fac) * 1000) # Divide by regulated facilities and multiply by desired rate (per 1000)
violations["CAA"] = air_violations_metric
display(HTML("<h3>"+air_violations_metric+" high priority violations per 1000 facilities </h3>"))

## Formal Enforcement Actions and Penalties under the Clean Air Act in 2019

In [9]:
air_enforcements = None
try:
    sql = 'select * from "ICIS-AIR_FORMAL_ACTIONS" where "SETTLEMENT_ENTERED_DATE" like \'__/__/2019\''

    air_enforcements = get_data( sql, "pgm_sys_id" )
except EmptyDataError:
    print( "No data found")
air_enforcements

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,ENF_IDENTIFIER,ACTIVITY_TYPE_CODE,ACTIVITY_TYPE_DESC,STATE_EPA_FLAG,ENF_TYPE_CODE,ENF_TYPE_DESC,SETTLEMENT_ENTERED_DATE,PENALTY_AMOUNT
0,OH0000000627010056,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
1,OH0000000684000000,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
2,IN0000001802900002,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
3,IN0000001814700020,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
4,OH0000000165000006,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
5,OH0000000641050002,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
6,OH0000000616000000,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
7,IL000107035AAX,158290,05-2005-5009,JDC,Judicial,E,CIV,Civil Judicial Action,04/30/2019,0.0
8,IL000031069AAI,158290,05-2005-5009,JDC,Judicial,E,CIV,Civil Judicial Action,04/30/2019,0.0
9,WI0000005510100022,158290,05-2005-5009,JDC,Judicial,E,CIV,Civil Judicial Action,04/30/2019,0.0


In [10]:
# Number of formal actions in 2019 per violation

air_enforcements_metric = formatter(air_enforcements.shape[0]/air_violations.shape[0]) # Formal actions divided by number of violations
enforcements["CAA"] = air_enforcements_metric
display(HTML("<h3>"+air_enforcements_metric +" formal enforcement actions per violation</h3>"))

In [11]:
# Penalties each year per violating facility
air_penalties = air_enforcements.loc[air_enforcements["PENALTY_AMOUNT"]>0]
air_penalties_metric = formatter(sum(air_penalties["PENALTY_AMOUNT"]) / len(air_violations["PGM_SYS_ID"].unique())) #Divide the sum of penalties by number of violating facilities
air_penalties_max = formatter(max(air_penalties["PENALTY_AMOUNT"])) 
air_penalties_min = formatter(min(air_penalties["PENALTY_AMOUNT"])) 
penalties["CAA"] = air_penalties_metric
display(HTML("<h3>$"+air_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+air_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+air_penalties_min +"</h3>"))
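
The penalty metric divides the sum of non-zero penalties by the number of distinct facilities with a violation, which is a different denominator from the number of facilities that were actually penalized. A sketch with made-up amounts and facility IDs; the empty-data guard is an assumption added here, since `max()` and `min()` raise `ValueError` on an empty sequence:

```python
def formatter(value):
    # Same two-decimal formatting the notebook uses.
    return "{:0.2f}".format(value)

# Made-up penalty amounts; NaN stands in for a missing value.
penalty_amounts = [0.0, 50000.0, float("nan"), 10000.0]
# Made-up IDs of facilities with violations; "A" has two violations.
violating_facility_ids = ["A", "A", "B", "C"]

# Keep only positive penalties. NaN > 0 is False, so missing values drop out.
positive = [amount for amount in penalty_amounts if amount > 0]
violating_count = len(set(violating_facility_ids))

if positive and violating_count:
    per_facility = formatter(sum(positive) / violating_count)
    largest = formatter(max(positive))
    smallest = formatter(min(positive))
else:
    per_facility = largest = smallest = None

print(per_facility, largest, smallest)
```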

---

## Clean Water Act inspections in 2019

In [12]:
# Find Clean Water Act inspections that ended in 2019
water_inspections = None
try:
    sql = 'select "NPDES_ID", "REGISTRY_ID", "ACTUAL_END_DATE", "STATE_EPA_FLAG"' + \
        ' from "NPDES_INSPECTIONS" where "ACTUAL_END_DATE" like \'__/__/2019\''

    water_inspections = get_data( sql, "NPDES_ID" ) 
except EmptyDataError:
    print( "No data found")
water_inspections

NPDES_ID,REGISTRY_ID,ACTUAL_END_DATE,STATE_EPA_FLAG
AK0000507,110030488620,12/17/2019,S
AK0021385,110000761453,08/26/2019,E
AK0022497,110039730459,02/25/2019,S
AKG370029,110028064387,08/21/2019,S
AKG524027,110042369807,12/12/2019,E
AKR06AA08,110070146690,04/25/2019,S
AKR06AE89,110000707423,03/28/2019,S
AKG520160,110009691440,07/19/2019,S
AKG520042,110064604094,06/07/2019,S
AKG520244,110007347335,06/06/2019,S


In [13]:
# Number of inspections in 2019 per 1000 regulated facilities
water_inspections_metric = formatter((water_inspections.shape[0] / water_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["CWA"] = water_inspections_metric
display(HTML("<h3>"+water_inspections_metric +" inspections per 1000 facilities</h3>"))

## Effluent violations of the Clean Water Act in 2019
*NOTE*: This covers effluent violations only, not other kinds of violations (schedule, permit, single event).

In [14]:
# Find facilities with pollutant exceedances (effluent violations, code E90) in 2019
water_violations = None
try:
    sql = 'select "NPDES_ID", "EXCEEDENCE_PCT", "MONITORING_PERIOD_END_DATE", "VIOLATION_CODE", "PARAMETER_DESC"' + \
        ' from "NPDES_EFF_VIOLATIONS" where "VIOLATION_CODE" like \'E90\' and "MONITORING_PERIOD_END_DATE" like \'__/__/2019\''
    water_violations = get_data( sql, "NPDES_ID" ) 
except EmptyDataError:
    print( "No data found")
water_violations

NPDES_ID,EXCEEDENCE_PCT,MONITORING_PERIOD_END_DATE,VIOLATION_CODE,PARAMETER_DESC
IN0020818,1.500000e+01,08/31/2019,E90,"Copper, total recoverable"
SC0026379,3.110000e+02,08/31/2019,E90,"Nitrogen, ammonia total [as N]"
IL0072192,6.000000e+00,12/31/2019,E90,"Nitrogen, ammonia total [as N]"
TX0034452,1.090000e+02,02/28/2019,E90,E. coli
AR0046973,3.200000e+01,03/31/2019,E90,"Solids, total suspended"
MO0095494,1.075000e+03,06/30/2019,E90,E. coli
OH0020028,4.000000e+00,05/31/2019,E90,"Phosphorus, total [as P]"
WV1017993,3.390000e+02,01/31/2019,E90,"Manganese, total [as Mn]"
TX0006050,1.000000e+01,10/31/2019,E90,"Chlorine, total residual"
KYG500132,9.999900e+04,06/30/2019,E90,Chloride [as Cl]


In [15]:
# Number of violations each year per 1000 regulated facilities

water_violations_metric = formatter((water_violations.shape[0] / water_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
violations["CWA"] = water_violations_metric
display(HTML("<h3>"+water_violations_metric+" violations per 1000 facilities</h3>"))

## Enforcement Actions and Penalties under the Clean Water Act in 2019

In [16]:
# Find formal enforcement actions under the Clean Water Act settled in 2019
water_enforcements = None
try:
    sql = 'select "NPDES_ID", "AGENCY", "ENF_TYPE_DESC", "SETTLEMENT_ENTERED_DATE", "FED_PENALTY_ASSESSED_AMT", "STATE_LOCAL_PENALTY_AMT"' + \
        ' from "NPDES_FORMAL_ENFORCEMENT_ACTIONS" where "SETTLEMENT_ENTERED_DATE" like \'__/__/2019\''

    water_enforcements = get_data( sql, "NPDES_ID" ) 
except EmptyDataError:
    print( "No data found")
water_enforcements

NPDES_ID,AGENCY,ENF_TYPE_DESC,SETTLEMENT_ENTERED_DATE,FED_PENALTY_ASSESSED_AMT,STATE_LOCAL_PENALTY_AMT
AKR06AA89,EPA,CWA 309A AO For Compliance,03/18/2019,,
AL0032310,EPA,CWA 309G2B AO For Class II Penalties,10/28/2019,50000.0,
AL0080225,State,State Administrative Order of Consent,12/19/2019,,0.00
AR0048879,State,State Administrative Order of Consent,07/10/2019,,
ARU001243,EPA,CWA 309A AO For Compliance,07/24/2019,,
CA0048127,State,State CWA Penalty AO,05/21/2019,,
CAC193396,State,State CWA Penalty AO,09/17/2019,,
FL0001465,State,State CWA Penalty AO,07/11/2019,,10000.00
FL0043079,State,State CWA Penalty AO,01/08/2019,,0.00
GA0001261,State,State Administrative Order of Consent,12/20/2019,,2255.00


In [17]:
# Number of formal actions in 2019 per violation
water_enforcements_metric = formatter(water_enforcements.shape[0]/water_violations.shape[0]) # Formal actions divided by number of violations
enforcements["CWA"] = water_enforcements_metric
display(HTML("<h3>"+water_enforcements_metric +" formal enforcement actions per violation</h3>"))

In [18]:
# Penalties each year per violating facility
water_penalties = water_enforcements.loc[water_enforcements["FED_PENALTY_ASSESSED_AMT"]>0]
water_penalties_metric = formatter(sum(water_penalties["FED_PENALTY_ASSESSED_AMT"]) / len(water_violations.index.unique())) # Divide the sum of federal penalties by the number of violating facilities
water_penalties_max = formatter(max(water_penalties["FED_PENALTY_ASSESSED_AMT"])) 
water_penalties_min = formatter(min(water_penalties["FED_PENALTY_ASSESSED_AMT"]))
penalties["CWA"] = water_penalties_metric
display(HTML("<h3>$"+water_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+water_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+water_penalties_min +"</h3>"))

## RCRA inspections in 2019

In [19]:
# Find RCRA evaluations (inspections) that started in 2019
waste_inspections = None
try:
    sql = 'select * from "RCRA_EVALUATIONS" where "EVALUATION_START_DATE" like \'__/__/2019\''

    waste_inspections = get_data( sql, "ID_NUMBER" ) 
except EmptyDataError:
    print( "No data found")
waste_inspections

ID_NUMBER,ACTIVITY_LOCATION,EVALUATION_IDENTIFIER,EVALUATION_TYPE,EVALUATION_DESC,EVALUATION_AGENCY,EVALUATION_START_DATE,FOUND_VIOLATION
MAD982196164,MA,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,08/07/2019,N
WID006129225,WI,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,10/07/2019,Y
KYR000056879,KY,001,NRR,NON-FINANCIAL RECORD REVIEW,S,10/11/2019,N
KYD006376347,KY,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,09/25/2019,N
KYD981853005,KY,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,10/11/2019,N
NCD986166338,NC,143,FCI,FOCUSED COMPLIANCE INSPECTION,S,10/11/2019,N
ARD981147283,AR,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,11/15/2019,Y
KYD053348108,KY,002,FRR,FINANCIAL RECORD REVIEW,S,11/18/2019,N
KYD981027469,KY,001,FRR,FINANCIAL RECORD REVIEW,S,11/18/2019,N
TNR000027763,TN,001,FUI,FOLLOW-UP INSPECTION,S,02/11/2019,Y


In [20]:
# Number of inspections in 2019 per 1000 regulated facilities
waste_inspections_metric = formatter((waste_inspections.shape[0] / waste_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["RCRA"] = waste_inspections_metric
display(HTML("<h3>"+waste_inspections_metric+" inspections per 1000 facilities</h3>"))

## Violations of RCRA in 2019

In [21]:
# Find RCRA violations determined in 2019
waste_violations = None
try:
    sql = 'select * from "RCRA_VIOLATIONS" where "DATE_VIOLATION_DETERMINED" like \'__/__/2019\''

    waste_violations = get_data( sql, "ID_NUMBER" ) 
except EmptyDataError:
    print( "No data found")
waste_violations

ID_NUMBER,ACTIVITY_LOCATION,VIOLATION_TYPE,VIOLATION_TYPE_DESC,VIOL_DETERMINED_BY_AGENCY,DATE_VIOLATION_DETERMINED,ACTUAL_RTC_DATE,SCHEDULED_COMPLIANCE_DATE
MID006020275,MI,262.D,Standards Applicable to Recordkeeping and Repo...,S,11/25/2019,,
WVR000512293,WV,262.M,Standards Applicable to Generators of HW: Prep...,S,12/23/2019,04/29/2020,
LAD981512460,LA,262.C,Standards Applicable to Generators of HW: Pre-...,S,06/20/2019,11/12/2019,
WIR000171884,WI,279.C,Standards for Used Oil: Generators,S,06/19/2019,08/27/2019,08/21/2019
TNR000005439,TN,XXS,State Statutory or Regulatory requirements tha...,S,11/06/2019,11/06/2019,
FLR000232264,FL,262.A,Standards Applicable to Generators of HW: General,S,08/08/2019,08/27/2019,
OHD987054061,OH,279.C,Standards for Used Oil: Generators,S,02/19/2019,02/21/2019,
TXD982558710,TX,265.D,Interim Status Standards for Owners and Operat...,S,10/21/2019,11/25/2019,
RIR000501171,RI,273.B,Standards for Universal Waste Management: Stan...,S,05/07/2019,07/09/2019,
WAD980977011,WA,261.A,ID and Listing of HW: General,S,01/30/2019,03/27/2019,03/27/2019


In [22]:
# Number of violations in 2019 per 1000 regulated facilities
waste_violations_metric = formatter((waste_violations.shape[0] / waste_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
violations["RCRA"] = waste_violations_metric
display(HTML("<h3>"+waste_violations_metric+" violations per 1000 facilities</h3>"))

## Enforcement Actions and Penalties under RCRA in 2019

In [23]:
# Find facilities with enforcement actions
waste_enforcements = None
try:
    sql = 'select * from "RCRA_ENFORCEMENTS" where "ENFORCEMENT_ACTION_DATE" like \'__/__/2019\''

    waste_enforcements = get_data( sql, "ID_NUMBER" ) 
except EmptyDataError:
    print( "No data found")
waste_enforcements

ID_NUMBER,ACTIVITY_LOCATION,ENFORCEMENT_IDENTIFIER,ENFORCEMENT_TYPE,ENFORCEMENT_DESC,ENFORCEMENT_AGENCY,ENFORCEMENT_ACTION_DATE,PMP_AMOUNT,FMP_AMOUNT,FSC_AMOUNT,SCR_AMOUNT
AKD991281023,AK,001,HQ120,WRITTEN INFORMAL,E,11/19/2019,,,,
ALD046481032,AL,001,AL115,WARNING LETTER,S,06/10/2019,,,,
ALD077647691,AL,001,HQ310,FINAL 3008(A) COMPLIANCE ORDER,S,02/21/2019,,,,
ALD057202558,AL,001,HQ120,WRITTEN INFORMAL,S,05/30/2019,,,,
ALD981472798,AL,001,HQ120,WRITTEN INFORMAL,S,07/26/2019,,,,
ALD981020894,AL,001,HQ140,LETTER OF INTENT TO INITIATE ENFORCEMENT ACTION,S,03/26/2019,,,,
ALR000048173,AL,001,AL115,WARNING LETTER,S,04/23/2019,,,,
ALD983191776,AL,001,HQ310,FINAL 3008(A) COMPLIANCE ORDER,S,08/06/2019,,13000.0,,
ALR000043430,AL,001,HQ120,WRITTEN INFORMAL,E,11/12/2019,,,,
ALR000056564,AL,001,HQ120,WRITTEN INFORMAL,S,08/16/2019,,,,


In [24]:
# Number of enforcement actions each year per violation
waste_enforcements_metric = formatter(waste_enforcements.shape[0] / waste_violations.shape[0])
enforcements["RCRA"] = waste_enforcements_metric
display(HTML("<h3>"+waste_enforcements_metric+" enforcement actions per violation</h3>"))

In [25]:
# Penalties each year per violating facility
waste_penalties = waste_enforcements.loc[waste_enforcements["FMP_AMOUNT"]>0]
waste_penalties_metric = formatter(sum(waste_penalties["FMP_AMOUNT"]) / len(waste_violations.index.unique())) # Divide the sum of penalties by the number of violating facilities
waste_penalties_max = formatter(max(waste_penalties["FMP_AMOUNT"]))
waste_penalties_min = formatter(min(waste_penalties["FMP_AMOUNT"]))
penalties["RCRA"] = waste_penalties_metric
display(HTML("<h3>$"+waste_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+waste_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+waste_penalties_min +"</h3>"))

## Greenhouse Gas Emissions in 2018 (latest data available)

In [None]:
# Find greenhouse gas emissions reported for 2018
ghg_emissions = None
try:
    sql = 'select * from "POLL_RPT_COMBINED_EMISSIONS" where "REPORTING_YEAR" = \'2018\' and "PGM_SYS_ACRNM" = \'E-GGRT\''

    ghg_emissions = get_data( sql) 
except EmptyDataError:
    print( "No data found")
ghg_emissions

In [None]:
# Emissions in 2018 per facility
ghg_emissions_metric = formatter(np.nansum(ghg_emissions["ANNUAL_EMISSION"]) / len(ghg_emissions["REGISTRY_ID"].unique())) #Divide by reporting facility
ghg_emissions_fac = ghg_emissions.groupby("PGM_SYS_ID")[["ANNUAL_EMISSION"]].sum() # Group by facility
ghg_emissions_max = formatter(np.nanmax(ghg_emissions_fac["ANNUAL_EMISSION"]))
ghg_emissions_min = formatter(np.nanmin(ghg_emissions_fac.loc[ghg_emissions_fac["ANNUAL_EMISSION"]>0]["ANNUAL_EMISSION"]))
emissions["GHG"] = ghg_emissions_metric
display(HTML("<h3>"+ghg_emissions_metric+" MTCO2e (metric tons of carbon dioxide equivalent) emissions per reporting facility</h3>"))
display(HTML("<h3>Max: "+ghg_emissions_max+" MTCO2e (metric tons of carbon dioxide equivalent) emissions</h3>"))
display(HTML("<h3>Min: "+ghg_emissions_min+" MTCO2e (metric tons of carbon dioxide equivalent) emissions</h3>"))
ghg_emissions_fac
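
The cell above sums emission records per facility and then reports the largest total and the smallest total above zero. A pandas-free sketch of that group-and-sum step on made-up records; `records` and `totals` are hypothetical names, and a single facility ID is used for both grouping and counting (the notebook groups by `PGM_SYS_ID` but counts unique `REGISTRY_ID` values, which may give a different denominator):

```python
from collections import defaultdict

# Made-up (facility ID, annual emission) records; a facility can report several rows.
records = [("FAC1", 100.0), ("FAC1", 50.0), ("FAC2", 0.0), ("FAC3", 25.0)]

# Sum emissions per facility, mirroring groupby(...).sum().
totals = defaultdict(float)
for facility_id, amount in records:
    totals[facility_id] += amount

largest = max(totals.values())
# Ignore facilities with zero total so the minimum reflects actual emitters.
smallest_positive = min(total for total in totals.values() if total > 0)
mean_per_facility = sum(totals.values()) / len(totals)

print(dict(totals), largest, smallest_positive, mean_per_facility)
```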

# Data Export

In [None]:
data = [inspections,
violations,
enforcements,
penalties,
emissions]

units = ["#inspections per 1000",
"#violations per 1000",
"#actions per violation",
"$ per facility in violation",
"amount of emissions (metric tons)"]

short_units = ["inspectionsper1000",
"violationsper1000",
"enforcementsperviolatingfacility",
"penaltiesperviolatingfacility",
"emissions2018"]

for index, program in enumerate(data):
    # create dataframe
    df = pd.DataFrame(program, index=[0]).T
    df = df.rename(columns={0: units[index]})
    filename= short_units[index]+"_All_USA_pg4_081120.csv" #active-facilities_All_MA-CD4_3b_080620.csv
    df.to_csv(filename)
    print(df)
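
Each pass of the loop above writes one CSV per metric, with programs as rows and the unit label as the single column header. A sketch of the expected file layout using only the standard library and made-up values (the `csv` module is swapped in for pandas here purely for illustration; the real files come from `df.to_csv`):

```python
import csv
import io

# Made-up metric values; in the notebook these are the formatted strings stored earlier.
inspections = {"CAA": "25.00", "CWA": "40.00", "RCRA": "12.50"}
unit = "#inspections per 1000"
short_unit = "inspectionsper1000"

# Same filename pattern as the export loop.
filename = short_unit + "_All_USA_pg4_081120.csv"

# Write to an in-memory buffer instead of disk so the sketch has no side effects.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["", unit])  # blank first header cell: the index column has no name
for program, value in inspections.items():
    writer.writerow([program, value])
csv_text = buffer.getvalue()

print(filename)
print(csv_text)
```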