| ![EEW logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/eew.jpg?raw=true) | ![EDGI logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/edgi.png?raw=true) |
|---|---|

#### This notebook is licensed under GPL 3.0. Please visit our GitHub repo for more information: 
#### The notebook was collaboratively authored by the Environmental Data & Governance Initiative (EDGI) following our authorship protocol: https://docs.google.com/document/d/1CtDN5ZZ4Zv70fHiBTmWkDJ9mswEipX6eCYrwicP66Xw/
#### For more information about this project, visit https://www.environmentalenforcementwatch.org/

## How to Run this Notebook
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know, we authored them! It’s okay. Click “Run Anyway” to continue. 
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!

---

# Nationwide statistics about environmental compliance trends

## Setup
Here we load some helper code to get us going.

In [None]:
# Import code libraries
!git clone https://github.com/edgi-govdata-archiving/ECHO_modules.git &>/dev/null;
%run ECHO_modules/DataSet.py

import urllib.parse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import requests
import csv
import datetime
import folium
from folium.plugins import FastMarkerCluster
import ipywidgets as widgets
from IPython.display import display, HTML
from pandas.errors import EmptyDataError
def formatter(value):
  return "{:0.2f}".format(value)

Here we set up some code to help us store and eventually export the metrics.

In [None]:
inspections = dict()
violations = dict()
enforcements = dict()
penalties = dict()
emissions = dict()

## Start getting data
First, get summary data from the ECHO_EXPORTER table.

In [None]:
# Get everything we will need from ECHO_EXPORTER in a single DB query.
# We can then use the full dataframe to specialize views of it.
full_echo_data = None
column_mapping = {
    '"REGISTRY_ID"': str,
    '"FAC_NAME"': str,
    '"FAC_LAT"': float,
    '"FAC_LONG"': float,
    '"AIR_IDS"': str,
    '"NPDES_IDS"': str,
    '"RCRA_IDS"': str,
    '"DFR_URL"': str,
    '"AIR_FLAG"': str,
    '"NPDES_FLAG"': str,
    '"GHG_FLAG"': str,
    '"RCRA_FLAG"': str,
    '"FAC_ACTIVE_FLAG"': str
}
column_names = list( column_mapping.keys() )
columns_string = ','.join( column_names )
sql = 'select ' + columns_string + ' from "ECHO_EXPORTER" where "AIR_FLAG" = \'Y\' or "NPDES_FLAG" = \'Y\' or "GHG_FLAG" = \'Y\' or "RCRA_FLAG" = \'Y\''
try:
    # Don't index.
    full_echo_data = get_data( sql )
except EmptyDataError:
    print("\nThere are no EPA facilities for this query.\n")
full_echo_data

Unnamed: 0,REGISTRY_ID,FAC_NAME,FAC_LAT,FAC_LONG,AIR_IDS,NPDES_IDS,RCRA_IDS,DFR_URL,AIR_FLAG,NPDES_FLAG,GHG_FLAG,RCRA_FLAG,FAC_ACTIVE_FLAG
0,1.100042e+11,INSTRUMENTAL ENGINEERING,41.023599,-74.202748,,,NJD089751200,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,
1,1.100006e+11,GREAT WESTERN CHEMICAL COMPANY PASCO,46.283833,-119.100216,,,WAH000002253,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,
2,1.100706e+11,PALM PLAZA,28.818830,-81.887510,,FLR10SZ00,,http://echo.epa.gov/detailed-facility-report?f...,N,Y,N,N,Y
3,1.100202e+11,WILDLIFE RECREATION POND 1,32.247591,-87.791091,,ALR165713,,http://echo.epa.gov/detailed-facility-report?f...,N,Y,N,N,
4,1.100557e+11,A-1 AUTO REPAIR,37.951040,-121.999450,,,CAL000253375,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,Y
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1860451,1.100707e+11,MONTAGE HEALTH,36.583162,-121.857242,,,CAC003029875,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,
1860452,1.100700e+11,WYNN RESORT,42.395000,-71.066800,,MAR053953,,http://echo.epa.gov/detailed-facility-report?f...,N,Y,N,N,
1860453,1.100073e+11,"WALLER, R P OIL BP",37.836731,-76.279640,VA0000005113300013,,,http://echo.epa.gov/detailed-facility-report?f...,Y,N,N,N,Y
1860454,1.100330e+11,BATAVIA HAULING COMPANY,41.846386,-88.295523,,ILR10J237,,http://echo.epa.gov/detailed-facility-report?f...,N,Y,N,N,
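To make the query construction above concrete, here is a minimal sketch of how a column list and the program-flag filters combine into one SQL string. It uses a shortened, illustrative subset of the columns and does not call `get_data`. The names `program_flags` and `where_clause` are introduced here for illustration and are not part of the notebook.

```python
# Illustrative subset of the column mapping (not the full list used above).
column_mapping = {
    '"REGISTRY_ID"': str,
    '"FAC_NAME"': str,
    '"AIR_FLAG"': str,
}

# Hypothetical helper list: the program flags the query filters on.
program_flags = ["AIR_FLAG", "NPDES_FLAG", "GHG_FLAG", "RCRA_FLAG"]

# Join the quoted column names with commas, as the cell above does.
columns_string = ",".join(column_mapping.keys())

# Build one '"FLAG" = 'Y'' test per program and join them with "or".
where_clause = " or ".join('"{}" = \'Y\''.format(flag) for flag in program_flags)

sql = 'select ' + columns_string + ' from "ECHO_EXPORTER" where ' + where_clause
print(sql)
```

Building the filter from a list keeps the four flag tests in one place, which may be easier to maintain than a hand-written string.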


## Number of Currently Regulated Facilities Per Program

In [None]:
air_fac = full_echo_data.loc[(full_echo_data["AIR_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
water_fac = full_echo_data.loc[(full_echo_data["NPDES_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
waste_fac = full_echo_data.loc[(full_echo_data["RCRA_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
ghg_fac = full_echo_data.loc[(full_echo_data["GHG_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]

display(HTML("<h3>There are "+ str(air_fac) + " facilities currently regulated under the Clean Air Act.</h3>"))
display(HTML("<h3>There are "+ str(water_fac) + " facilities currently regulated under the Clean Water Act.</h3>"))
display(HTML("<h3>There are "+ str(waste_fac) + " facilities currently regulated under RCRA (hazardous waste).</h3>"))
display(HTML("<h3>There are "+ str(ghg_fac) + " facilities currently reporting greenhouse gas emissions.</h3>"))
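The counts above rely on combining two boolean conditions with `&`. The following sketch shows the same pattern on a small made-up table (the rows are invented for illustration and are not ECHO data). Note that a missing `FAC_ACTIVE_FLAG` never equals `"Y"`, so those facilities are not counted.

```python
import pandas as pd

# Made-up rows for illustration only; not real ECHO data.
toy = pd.DataFrame({
    "AIR_FLAG": ["Y", "Y", "N", "Y"],
    "FAC_ACTIVE_FLAG": ["Y", None, "Y", "Y"],
})

# A facility is counted only when both conditions are true.
mask = (toy["AIR_FLAG"] == "Y") & (toy["FAC_ACTIVE_FLAG"] == "Y")

# Summing a boolean mask gives the same count as toy.loc[mask].shape[0].
toy_air_fac = int(mask.sum())
print(toy_air_fac)
```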

## Clean Air Act inspections in 2019

In [None]:
# Use SQL to select Clean Air Act compliance evaluations (FCEs and PCEs) that ended in 2019
air_inspections = None
try:
    sql = 'select * from \"ICIS-AIR_FCES_PCES\" where \"ACTUAL_END_DATE\" like \'__-__-2019\''

    # Run the query against the database
    air_inspections = get_data( sql, 'pgm_sys_id' )
except EmptyDataError:
    print( "No data found")

air_inspections

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,STATE_EPA_FLAG,ACTIVITY_TYPE_CODE,ACTIVITY_TYPE_DESC,COMP_MONITOR_TYPE_CODE,COMP_MONITOR_TYPE_DESC,ACTUAL_END_DATE,PROGRAM_CODES
0,020000003606390000,3601943049,E,INS,Inspection/Evaluation,PCE,PCE On-Site,07-23-2019,"CAACFC, CAAFESOP"
1,020000003606501000,3601851095,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,05-08-2019,CAAOP
2,020000003606501000,3601866216,E,INS,Inspection/Evaluation,FOO,FCE On-Site,05-01-2019,"CAANAM, CAAOP"
3,020000003606501000,3601972076,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,04-24-2019,CAASIP
4,020000003606501000,3601972077,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,04-17-2019,CAASIP
...,...,...,...,...,...,...,...,...,...
48869,WAPSC0005305312230,3602049948,L,INS,Inspection/Evaluation,POR,PCE On-Site Record/Report Review,10-09-2019,CAANSPS
48870,WAPSC0005305315100,3601898380,L,INS,Inspection/Evaluation,POR,PCE On-Site Record/Report Review,06-04-2019,"CAAMACT, CAANSPS"
48871,WAPSC0005305316052,3601881242,L,INS,Inspection/Evaluation,FOO,FCE On-Site,05-15-2019,CAAFESOP
48872,WAPSC0005305316052,3601881245,L,INS,Inspection/Evaluation,POR,PCE On-Site Record/Report Review,05-15-2019,CAAFESOP
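In a SQL `like` pattern, each `_` matches exactly one character, so `'__-__-2019'` matches dates written as `MM-DD-2019`. The sketch below demonstrates this with Python's built-in `sqlite3` module and a made-up table (`toy_inspections` is a hypothetical name). It assumes the notebook's database treats `like` the same way, which is standard SQL behavior. It also shows why the separator matters: a pattern with dashes does not match dates stored with slashes.

```python
import sqlite3

# In-memory database with a made-up table; not the notebook's real data source.
conn = sqlite3.connect(":memory:")
conn.execute('create table toy_inspections ("ACTUAL_END_DATE" text)')
conn.executemany(
    "insert into toy_inspections values (?)",
    [("07-23-2019",), ("05-08-2018",), ("12/31/2019",)],
)

query = 'select "ACTUAL_END_DATE" from toy_inspections where "ACTUAL_END_DATE" like ?'

# Each underscore stands for exactly one character.
dash_rows = conn.execute(query, ("__-__-2019",)).fetchall()
slash_rows = conn.execute(query, ("__/__/2019",)).fetchall()
conn.close()

print(dash_rows)
print(slash_rows)
```

When adapting a query to another table, it may help to check which separator that table's date column uses before writing the pattern.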


In [None]:
# Number of inspections in 2019 per 1000 regulated facilities

air_inspections_metric = formatter((air_inspections.shape[0] / air_fac) * 1000) # Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["CAA"] = air_inspections_metric
display(HTML("<h3>"+ air_inspections_metric +" inspections per 1000 facilities</h3>"))
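The rate calculation used throughout this notebook can be sketched as a small function. `rate_per_1000` is a hypothetical helper introduced here for illustration, and the counts below are made up.

```python
def rate_per_1000(event_count, facility_count):
    """Return events per 1000 facilities as a string with two decimals."""
    if facility_count == 0:
        # Avoid a ZeroDivisionError when no facilities are regulated.
        raise ValueError("facility_count must be greater than zero")
    return "{:0.2f}".format(event_count / facility_count * 1000)

# Made-up counts: 50 inspections across 200 facilities.
example = rate_per_1000(50, 200)
print(example)
```

Because the result is a formatted string, it would need to be converted back with `float()` before any further arithmetic.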

## High priority violations of the Clean Air Act in 2019



In [None]:
air_violations = None
try:
    sql = 'select * from "ICIS-AIR_VIOLATION_HISTORY" where "HPV_DAYZERO_DATE" like \'__-__-2019\''

    air_violations = get_data( sql, "pgm_sys_id" )

    # Remove "FACIL" violations, which are paperwork violations according to: https://19january2017snapshot.epa.gov/sites/production/files/2013-10/documents/frvmemo.pdf
    # air_violations = air_violations.loc[(air_violations["POLLUTANT_DESCS"]!="FACIL")]
except EmptyDataError:
    print( "No data found")
air_violations

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,AGENCY_TYPE_DESC,STATE_CODE,AIR_LCON_CODE,COMP_DETERMINATION_UID,ENF_RESPONSE_POLICY_CODE,PROGRAM_CODES,PROGRAM_DESCS,POLLUTANT_CODES,POLLUTANT_DESCS,EARLIEST_FRV_DETERM_DATE,HPV_DAYZERO_DATE,HPV_RESOLVED_DATE
0,IN0000001803300043,3601685463,State,IN,,IN000A74118,HPV,CAANSPS,New Source Performance Standards,10373,Particulate matter - PM10,12-03-2018,01-13-2019,
1,IN0000001801900008,3601799804,State,IN,,IN000A75598,HPV,CAAMACT,MACT Standards (40 CFR Part 63),300000036,Mercury,04-05-2019,04-30-2019,05-26-2020
2,IN0000001801900008,3602022157,State,IN,,IN000A78256,HPV,CAAMACT,MACT Standards (40 CFR Part 63),300000036,Mercury,09-25-2019,12-24-2019,
3,MN0000002704700055,3602241949,State,MN,,MN000A00001670PEN20191,HPV,CAATVP,Title V Permits,10193,Carbon monoxide,06-06-2018,04-23-2019,06-04-2020
4,PA000493288,3601889184,State,PA,,PA000A0000H00000000376058,HPV,CAAMACT CAANSPS CAASIP CAATVP,MACT Standards (40 CFR Part 63) New Source Per...,10358,Nitrogen oxides,06-06-2019,06-06-2019,04-23-2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
814,IN0000001801900155,3602134543,State,IN,,IN000A80006,HPV,CAATVP,Title V Permits,300000018,Hydrochloric acid,12-16-2019,12-16-2019,
815,WAORC005302700058,3602032106,Local,WA,ORC,WAORCA78445,HPV,CAATVP,Title V Permits,300000243,VOLATILE ORGANIC COMPOUNDS (VOCS),10-29-2019,10-29-2019,
816,CO0000000812309FC0,3602053305,State,CO,,CO000A00000812309FC000001,HPV,CAASIP,State Implementation Plan for National Primary...,300000243,VOLATILE ORGANIC COMPOUNDS (VOCS),06-25-2019,06-25-2019,10-03-2019
817,AL0000000104900048,3602051772,State,AL,,AL000A78887,HPV,CAATVP,Title V Permits,300000328,ADMIN,11-05-2019,11-05-2019,
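The cells above catch `EmptyDataError`, which pandas raises when it is asked to parse input that has no columns. The sketch below reproduces that situation with an empty string instead of a database query. `load_or_none` is a hypothetical helper, and nothing here is meant to describe how `get_data` works internally.

```python
import io

import pandas as pd
from pandas.errors import EmptyDataError


def load_or_none(csv_text):
    """Hypothetical helper: parse CSV text, returning None when it is empty."""
    try:
        return pd.read_csv(io.StringIO(csv_text))
    except EmptyDataError:
        print("No data found")
        return None


empty_result = load_or_none("")
toy_result = load_or_none("PGM_SYS_ID,STATE_CODE\nIN0000001803300043,IN\n")

# Later cells should check for None before using the result.
empty_count = 0 if empty_result is None else empty_result.shape[0]
print(empty_count, toy_result.shape)
```

If a query returns nothing, the variable stays `None`, so any later cell that calls `.shape` on it would fail unless it checks first.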


In [None]:
# Number of high priority violations per 1000 regulated facilities

#air_violations_fac = air_violations.shape[0] / len(air_violations["PGM_SYS_ID"].unique()) # Total number of violations divided by number of facilities with violations. Will use this later in looking at enforcement actions.
air_violations_metric = formatter((air_violations.shape[0] / air_fac) * 1000) # Divide by regulated facilities and multiply by desired rate (per 1000)
violations["CAA"] = air_violations_metric
display(HTML("<h3>"+air_violations_metric+" high priority violations per 1000 facilities </h3>"))

## Formal Enforcement Actions and Penalties under the Clean Air Act in 2019

In [None]:
air_enforcements = None
try:
    sql = 'select * from "ICIS-AIR_FORMAL_ACTIONS" where "SETTLEMENT_ENTERED_DATE" like \'__/__/2019\''

    air_enforcements = get_data( sql, "pgm_sys_id" )
except EmptyDataError:
    print( "No data found")
air_enforcements

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,ENF_IDENTIFIER,ACTIVITY_TYPE_CODE,ACTIVITY_TYPE_DESC,STATE_EPA_FLAG,ENF_TYPE_CODE,ENF_TYPE_DESC,SETTLEMENT_ENTERED_DATE,PENALTY_AMOUNT
0,OH0000000627010056,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
1,OH0000000684000000,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
2,IN0000001802900002,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
3,IN0000001814700020,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
4,OH0000000165000006,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
...,...,...,...,...,...,...,...,...,...,...
2643,HI0000001500700066,3602236126,HI000AEA93,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,07/11/2019,22800.0
2644,MI00000000000N7688,3602237523,MI000AN7688FRV0000038302,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,08/08/2019,54600.0
2645,WASPC0005306310023,3602245188,WASPCA200188484,AFR,Administrative - Formal,L,SCAAAO,Administrative Order,12/31/2019,32000.0
2646,LA0000002212500007,3602258171,LA000A2573011,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,01/11/2019,7597.9


In [None]:
# Number of formal actions in 2019 per violation

air_enforcements_metric = formatter(air_enforcements.shape[0]/air_violations.shape[0]) # Formal actions divided by number of violations
enforcements["CAA"] = air_enforcements_metric
display(HTML("<h3>"+air_enforcements_metric +" formal enforcement actions per violation</h3>"))

In [None]:
# Penalties each year per violating facility
air_penalties = air_enforcements.loc[air_enforcements["PENALTY_AMOUNT"]>0]
air_penalties_metric = formatter(sum(air_penalties["PENALTY_AMOUNT"]) / len(air_violations["PGM_SYS_ID"].unique())) #Divide the sum of penalties by number of violating facilities
air_penalties_max = formatter(max(air_penalties["PENALTY_AMOUNT"])) 
air_penalties_min = formatter(min(air_penalties["PENALTY_AMOUNT"])) 
penalties["CAA"] = air_penalties_metric
display(HTML("<h3>$"+air_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+air_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+air_penalties_min +"</h3>"))
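The penalty metric divides total penalties by the number of distinct facilities with a violation, not by the number of penalized facilities. Here is a minimal sketch with made-up values showing each step. Filtering on `> 0` drops both zero and missing penalty amounts.

```python
import pandas as pd

# Made-up values for illustration only.
toy_enforcements = pd.DataFrame({"PENALTY_AMOUNT": [0.0, 1000.0, None, 3000.0]})
toy_violations = pd.DataFrame({"PGM_SYS_ID": ["A", "A", "B", "C", "D"]})

# Keep only rows with a positive penalty; zero and NaN rows are dropped.
toy_penalties = toy_enforcements.loc[toy_enforcements["PENALTY_AMOUNT"] > 0]

# Count each violating facility once, even if it has several violations.
violating_facilities = toy_violations["PGM_SYS_ID"].nunique()

per_facility = toy_penalties["PENALTY_AMOUNT"].sum() / violating_facilities
penalty_max = toy_penalties["PENALTY_AMOUNT"].max()
penalty_min = toy_penalties["PENALTY_AMOUNT"].min()
print(per_facility, penalty_max, penalty_min)
```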

---

## Clean Water Act inspections in 2019

In [None]:
# Find Clean Water Act inspections that ended in 2019
water_inspections = None
try:
    sql = 'select "NPDES_ID", "REGISTRY_ID", "ACTUAL_END_DATE", "STATE_EPA_FLAG"' + \
        ' from "NPDES_INSPECTIONS" where "ACTUAL_END_DATE" like \'__/__/2019\''

    water_inspections = get_data( sql, "NPDES_ID" ) 
except EmptyDataError:
    print( "No data found")
water_inspections

NPDES_ID,REGISTRY_ID,ACTUAL_END_DATE,STATE_EPA_FLAG
AK0000507,110030488620,12/17/2019,S
AK0021385,110000761453,08/26/2019,E
AK0022497,110039730459,02/25/2019,S
AKG370029,110028064387,08/21/2019,S
AKG520042,110064604094,06/07/2019,S
...,...,...,...
WYR320686,110055196746,11/07/2019,S
WYR320871,110070235311,03/25/2019,S
WV0038113,110000585894,09/17/2019,S
WV0038288,110070259137,08/20/2019,S


In [None]:
# Number of inspections in 2019 per 1000 regulated facilities

water_inspections_metric = formatter((water_inspections.shape[0] / water_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["CWA"] = water_inspections_metric
display(HTML("<h3>"+water_inspections_metric +" inspections per 1000 facilities</h3>"))

## Effluent violations of the Clean Water Act in 2019
*NOTE*: This does not include other kinds of violations (schedule, permit, single event).

In [None]:
# Find facilities with pollutant exceedances in 2019
water_violations = None
try:
    sql = 'select "NPDES_ID", "EXCEEDENCE_PCT", "MONITORING_PERIOD_END_DATE", "VIOLATION_CODE", "PARAMETER_DESC"' + \
        ' from "NPDES_EFF_VIOLATIONS" where "VIOLATION_CODE" like \'E90\' and "MONITORING_PERIOD_END_DATE" like \'__/__/2019\''
    water_violations = get_data( sql, "NPDES_ID" ) 
except EmptyDataError:
    print( "No data found")
water_violations

NPDES_ID,EXCEEDENCE_PCT,MONITORING_PERIOD_END_DATE,VIOLATION_CODE,PARAMETER_DESC
MA0100765,19.0,08/31/2019,E90,Enterococci
WVG551350,1400.0,03/31/2019,E90,"Coliform, fecal general"
KY0076635,728.0,01/31/2019,E90,E. coli
KY0105082,10.0,01/31/2019,E90,"Solids, total suspended"
VA0085936,8.0,07/31/2019,E90,"Solids, total dissolved"
...,...,...,...,...
LA0079057,80.0,11/30/2019,E90,"Solids, total suspended"
ID0020265,13.0,11/30/2019,E90,"BOD, 5-day, 20 deg. C"
LAG534438,264.0,12/31/2019,E90,"Coliform, fecal general"
ALG250007,5163.0,12/31/2019,E90,"Chlorine, total residual"


In [None]:
# Number of violations each year per 1000 regulated facilities

water_violations_metric = formatter((water_violations.shape[0] / water_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
violations["CWA"] = water_violations_metric
display(HTML("<h3>"+water_violations_metric+" violations per 1000 facilities</h3>"))

## Enforcement Actions and Penalties under the Clean Water Act in 2019

In [None]:
# Find formal enforcement actions with a settlement entered in 2019
water_enforcements = None
try:
    sql = 'select "NPDES_ID", "AGENCY", "ENF_TYPE_DESC", "SETTLEMENT_ENTERED_DATE", "FED_PENALTY_ASSESSED_AMT", "STATE_LOCAL_PENALTY_AMT"' + \
        ' from "NPDES_FORMAL_ENFORCEMENT_ACTIONS" where "SETTLEMENT_ENTERED_DATE" like \'__/__/2019\''

    water_enforcements = get_data( sql, "NPDES_ID" ) 
except EmptyDataError:
    print( "No data found")
water_enforcements

NPDES_ID,AGENCY,ENF_TYPE_DESC,SETTLEMENT_ENTERED_DATE,FED_PENALTY_ASSESSED_AMT,STATE_LOCAL_PENALTY_AMT
ALP000225,State,State Administrative Order of Consent,10/17/2019,,29150.0
ALR10BDL7,State,State Administrative Order of Consent,07/19/2019,,14000.0
AR0035696,State,State Administrative Order of Consent,07/10/2019,,600.0
AR0048879,State,State Administrative Order of Consent,07/10/2019,,
CA0053597,State,State CWA Penalty AO,10/23/2019,,
...,...,...,...,...,...
TNU101369,State,State Administrative Order of Consent,10/16/2019,,3840.0
TXR05CA36,State,State CWA Penalty AO,02/19/2019,,25000.0
TXR05EJ33,State,State CWA Penalty AO,09/10/2019,,875.0
TXR15804V,State,State CWA Penalty AO,08/13/2019,,875.0


In [None]:
# Number of formal actions in 2019 per violation
water_enforcements_metric = formatter(water_enforcements.shape[0]/water_violations.shape[0]) # Formal actions divided by number of violations
enforcements["CWA"] = water_enforcements_metric
display(HTML("<h3>"+water_enforcements_metric +" formal enforcement actions per violation</h3>"))

In [None]:
# Penalties each year per violating facility
water_penalties = water_enforcements.loc[water_enforcements["FED_PENALTY_ASSESSED_AMT"]>0]
water_penalties_metric = formatter(sum(water_penalties["FED_PENALTY_ASSESSED_AMT"]) / len(water_violations.index.unique())) #Divide the sum of federal penalties by the number of facilities in violation
water_penalties_max = formatter(max(water_penalties["FED_PENALTY_ASSESSED_AMT"])) 
water_penalties_min = formatter(min(water_penalties["FED_PENALTY_ASSESSED_AMT"]))
penalties["CWA"] = water_penalties_metric
display(HTML("<h3>$"+water_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+water_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+water_penalties_min +"</h3>"))

## RCRA inspections in 2019

In [None]:
# Find RCRA evaluations that started in 2019
waste_inspections = None
try:
    sql = 'select * from "RCRA_EVALUATIONS" where "EVALUATION_START_DATE" like \'__/__/2019\''

    waste_inspections = get_data( sql, "ID_NUMBER" ) 
except EmptyDataError:
    print( "No data found")
waste_inspections

ID_NUMBER,ACTIVITY_LOCATION,EVALUATION_IDENTIFIER,EVALUATION_TYPE,EVALUATION_DESC,EVALUATION_AGENCY,EVALUATION_START_DATE,FOUND_VIOLATION
ORQ000030450,OR,004,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,01/17/2019,Y
MAD982196164,MA,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,08/07/2019,N
ARD981147283,AR,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,11/15/2019,Y
KYD053348108,KY,002,FRR,FINANCIAL RECORD REVIEW,S,11/18/2019,N
KYD981027469,KY,001,FRR,FINANCIAL RECORD REVIEW,S,11/18/2019,N
...,...,...,...,...,...,...,...
ARD093417525,AR,001,NRR,NON-FINANCIAL RECORD REVIEW,S,08/12/2019,N
IDR000206581,ID,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,09/06/2019,Y
IDD073114654,ID,389,FCI,FOCUSED COMPLIANCE INSPECTION,S,09/05/2019,N
ARD983274911,AR,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,08/20/2019,Y


In [None]:
# Number of inspections in 2019 per 1000 regulated facilities
waste_inspections_metric = formatter((waste_inspections.shape[0] / waste_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["RCRA"] = waste_inspections_metric
display(HTML("<h3>"+waste_inspections_metric+" inspections per 1000 facilities</h3>"))

## Violations of RCRA in 2019

In [None]:
# Find RCRA violations determined in 2019
waste_violations = None
try:
    sql = 'select * from "RCRA_VIOLATIONS" where "DATE_VIOLATION_DETERMINED" like \'__/__/2019\''

    waste_violations = get_data( sql, "ID_NUMBER" ) 
except EmptyDataError:
    print( "No data found")
waste_violations

ID_NUMBER,ACTIVITY_LOCATION,VIOLATION_TYPE,VIOLATION_TYPE_DESC,VIOL_DETERMINED_BY_AGENCY,DATE_VIOLATION_DETERMINED,ACTUAL_RTC_DATE,SCHEDULED_COMPLIANCE_DATE
MID006020275,MI,262.D,Standards Applicable to Recordkeeping and Repo...,S,11/25/2019,,
DEN201500044,DE,261.A,ID and Listing of HW: General,S,05/22/2019,05/22/2019,
SCD069316271,SC,273.B,Standards for Universal Waste Management: Stan...,S,12/17/2019,03/03/2020,
LAD981512460,LA,262.C,Standards Applicable to Generators of HW: Pre-...,S,06/20/2019,11/12/2019,
CAR000191650,CA,262.A,Standards Applicable to Generators of HW: General,E,10/01/2019,12/04/2019,
...,...,...,...,...,...,...,...
FLD000823393,FL,273.B,Standards for Universal Waste Management: Stan...,E,04/02/2019,08/22/2019,
FLD000823393,FL,262.M,Standards Applicable to Generators of HW: Prep...,E,04/02/2019,08/22/2019,
MID985609056,MI,XXS,State Statutory or Regulatory requirements tha...,S,12/16/2019,,
MIK174443509,MI,262.D,Standards Applicable to Recordkeeping and Repo...,S,10/25/2019,11/07/2019,


In [None]:
# Number of violations in 2019 per 1000 regulated facilities
waste_violations_metric = formatter((waste_violations.shape[0] / waste_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
violations["RCRA"] = waste_violations_metric
display(HTML("<h3>"+waste_violations_metric+" violations per 1000 facilities</h3>"))

## Enforcement Actions and Penalties under RCRA in 2019

In [None]:
# Find facilities with enforcement actions
waste_enforcements = None
try:
    sql = 'select * from "RCRA_ENFORCEMENTS" where "ENFORCEMENT_ACTION_DATE" like \'__/__/2019\''

    waste_enforcements = get_data( sql, "ID_NUMBER" ) 
except EmptyDataError:
    print( "No data found")
waste_enforcements

ID_NUMBER,ACTIVITY_LOCATION,ENFORCEMENT_IDENTIFIER,ENFORCEMENT_TYPE,ENFORCEMENT_DESC,ENFORCEMENT_AGENCY,ENFORCEMENT_ACTION_DATE,PMP_AMOUNT,FMP_AMOUNT,FSC_AMOUNT,SCR_AMOUNT
AKD991281023,AK,001,HQ120,WRITTEN INFORMAL,E,11/19/2019,,,,
ALD046481032,AL,001,AL115,WARNING LETTER,S,06/10/2019,,,,
ALD057202558,AL,001,HQ120,WRITTEN INFORMAL,S,05/30/2019,,,,
ALD077647691,AL,001,HQ310,FINAL 3008(A) COMPLIANCE ORDER,S,02/21/2019,,,,
ALD981020894,AL,001,HQ140,LETTER OF INTENT TO INITIATE ENFORCEMENT ACTION,S,03/26/2019,,,,
...,...,...,...,...,...,...,...,...,...,...
WVR000547760,WV,001,HQ120,WRITTEN INFORMAL,S,11/21/2019,,,,
WIR000160010,WI,001,WI124,NOTICE OF NONCOMPLIANCE LETTER,S,05/01/2019,,,,
WIR000160150,WI,001,WI124,NOTICE OF NONCOMPLIANCE LETTER,S,02/11/2019,,,,
WIR000166371,WI,001,WI124,NOTICE OF NONCOMPLIANCE LETTER,S,06/12/2019,,,,


In [None]:
# Number of enforcement actions each year per violation
waste_enforcements_metric = formatter(waste_enforcements.shape[0] / waste_violations.shape[0])
enforcements["RCRA"] = waste_enforcements_metric
display(HTML("<h3>"+waste_enforcements_metric+" enforcement actions per violation</h3>"))

In [None]:
# Penalties each year per violating facility
waste_penalties = waste_enforcements.loc[waste_enforcements["FMP_AMOUNT"]>0]
waste_penalties_metric = formatter(sum(waste_penalties["FMP_AMOUNT"]) / len(waste_violations.index.unique())) #Divide by the number of facilities in violation
waste_penalties_max = formatter(max(waste_penalties["FMP_AMOUNT"]))
waste_penalties_min = formatter(min(waste_penalties["FMP_AMOUNT"]))
penalties["RCRA"] = waste_penalties_metric
display(HTML("<h3>$"+waste_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+waste_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+waste_penalties_min +"</h3>"))

## Greenhouse Gas Emissions in 2018 (latest data available)

In [None]:
# Find greenhouse gas emissions reported for 2018
ghg_emissions = None
try:
    sql = 'select * from "POLL_RPT_COMBINED_EMISSIONS" where "REPORTING_YEAR" = \'2018\' and "PGM_SYS_ACRNM" = \'E-GGRT\''

    ghg_emissions = get_data( sql) 
except EmptyDataError:
    print( "No data found")
ghg_emissions

Unnamed: 0,REPORTING_YEAR,REGISTRY_ID,PGM_SYS_ACRNM,PGM_SYS_ID,POLLUTANT_NAME,ANNUAL_EMISSION,UNIT_OF_MEASURE,NEI_TYPE,NEI_HAP_VOC_FLAG
0,2018,110013317035,E-GGRT,1006363,Nitrous oxide,5.364,MTCO2e,,
1,2018,110001697799,E-GGRT,1003406,Carbon dioxide,20371.500,MTCO2e,,
2,2018,110001697799,E-GGRT,1003406,Methane,101086.750,MTCO2e,,
3,2018,110001697799,E-GGRT,1003406,Nitrous oxide,73.606,MTCO2e,,
4,2018,110001697799,E-GGRT,1003406,Carbon dioxide,647.000,MTCO2e,,
...,...,...,...,...,...,...,...,...,...
21678,2018,110007164736,E-GGRT,1002191,Methane,12632.500,MTCO2e,,
21679,2018,110007164736,E-GGRT,1002191,Carbon dioxide,38497.200,MTCO2e,,
21680,2018,110013970621,E-GGRT,1005241,Methane,13.000,MTCO2e,,
21681,2018,110013970621,E-GGRT,1005241,Nitrous oxide,15.496,MTCO2e,,


In [None]:
# Emissions in 2018 per facility
ghg_emissions_metric = formatter(np.nansum(ghg_emissions["ANNUAL_EMISSION"]) / len(ghg_emissions["REGISTRY_ID"].unique())) #Divide by reporting facility
ghg_emissions_fac = ghg_emissions.groupby("PGM_SYS_ID")[["ANNUAL_EMISSION"]].sum() # Group by facility
ghg_emissions_max = formatter(np.nanmax(ghg_emissions_fac["ANNUAL_EMISSION"]))
ghg_emissions_min = formatter(np.nanmin(ghg_emissions_fac.loc[ghg_emissions_fac["ANNUAL_EMISSION"]>0]["ANNUAL_EMISSION"]))
emissions["GHG"] = ghg_emissions_metric
display(HTML("<h3>"+ghg_emissions_metric+" MTCO2e (metric tons of carbon dioxide equivalent) emissions per reporting facility</h3>"))
display(HTML("<h3>Max: "+ghg_emissions_max+" MTCO2e (metric tons of carbon dioxide equivalent) emissions</h3>"))
display(HTML("<h3>Min: "+ghg_emissions_min+" MTCO2e (metric tons of carbon dioxide equivalent) emissions</h3>"))
ghg_emissions_fac

PGM_SYS_ID,ANNUAL_EMISSION
1000001,302529.480
1000002,110511.712
1000003,79393.210
1000004,55547.748
1000005,83863.020
...,...
1013419,28467.792
1013420,28356.444
1013481,29375.000
1013489,26073.752
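Each facility reports several pollutants, so the per-facility totals come from grouping rows by facility and summing. The sketch below repeats that step on made-up numbers. It also shows that a facility whose only reports are zero or missing sums to 0, which is why the minimum is taken over positive totals only.

```python
import pandas as pd

# Made-up emissions for illustration only; not real E-GGRT data.
toy_emissions = pd.DataFrame({
    "PGM_SYS_ID": [1, 1, 2, 2, 3],
    "ANNUAL_EMISSION": [10.0, 5.0, 0.0, None, 7.5],
})

# One row per facility; missing values are skipped when summing.
toy_by_facility = toy_emissions.groupby("PGM_SYS_ID")[["ANNUAL_EMISSION"]].sum()

# Exclude facilities with a zero total before taking the minimum.
positive = toy_by_facility.loc[toy_by_facility["ANNUAL_EMISSION"] > 0]
toy_max = toy_by_facility["ANNUAL_EMISSION"].max()
toy_min = positive["ANNUAL_EMISSION"].min()
print(toy_by_facility)
print(toy_max, toy_min)
```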


# Data Export

In [None]:
data = [inspections,
violations,
enforcements,
penalties,
emissions]

units = ["#inspections per 1000",
"#violations per 1000",
"#actions per facility in violation",
"$ per facility in violation",
"amount of emissions (metric tons)"]

short_units = ["inspectionsper1000",
"violationsper1000",
"enforcementsperviolatingfacility",
"penaltiesperviolatingfacility",
"emissions2018"]

for index, program in enumerate(data):
    # create dataframe
    df = pd.DataFrame(program, index=[0]).T
    df = df.rename(columns={0: units[index]})
    filename= short_units[index]+"_All_USA_pg4_081120.csv" #active-facilities_All_MA-CD4_3b_080620.csv
    df.to_csv(filename)
    print(df)

     #inspections per 1000
CAA                 261.27
CWA                 148.41
RCRA                 37.30
     #violations per 1000
CAA                  4.38
CWA                413.62
RCRA                24.05
     #actions per facility in violation
CAA                                3.23
CWA                                0.03
RCRA                               0.48
     $ per facility in violation
CAA                    145670.14
CWA                       258.14
RCRA                     3169.69
    amount of emissions (metric tons)
GHG                         422134.34
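To see what each exported file contains, the sketch below builds one metric table the same way as the loop above and writes it to an in-memory buffer instead of a file. The metric values are made up, and `toy_metric` and `buffer` are names introduced here for illustration.

```python
import io

import pandas as pd

# Made-up metric values, stored as formatted strings like the real ones.
toy_metric = {"CAA": "1.00", "CWA": "2.50"}

# A dict of scalars needs an index; transposing puts programs on the rows.
df = pd.DataFrame(toy_metric, index=[0]).T
df = df.rename(columns={0: "#inspections per 1000"})

# Write to a text buffer rather than to disk so nothing is left behind.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The first CSV column has an empty header because it holds the dataframe index (the program names).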
