| ![EEW logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/eew.jpg?raw=true) | ![EDGI logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/edgi.png?raw=true) |
|---|---|

#### This notebook is licensed under GPL 3.0. Please visit our GitHub repo for more information: 
#### The notebook was collaboratively authored by the Environmental Data & Governance Initiative (EDGI) following our authorship protocol: https://docs.google.com/document/d/1CtDN5ZZ4Zv70fHiBTmWkDJ9mswEipX6eCYrwicP66Xw/
#### For more information about this project, visit https://www.environmentalenforcementwatch.org/

## How to Run this Notebook
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know; we authored it! It’s okay. Click “Run Anyway” to continue. 
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!

---

# Nationwide statistics about environmental compliance trends

## Setup
Here we load some helper code to get us going.

In [1]:
# Import code libraries
!git clone https://github.com/edgi-govdata-archiving/ECHO_modules.git &>/dev/null;
%run ECHO_modules/DataSet.py

import urllib.parse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import requests
import csv
import datetime
import folium
from folium.plugins import FastMarkerCluster
import ipywidgets as widgets
from IPython.display import display, HTML  # IPython.core.display is deprecated for these imports
from pandas.errors import EmptyDataError
def formatter(value):
  return "{:0.2f}".format(value)

Here we set up some code to help us store and eventually export the metrics.

In [2]:
inspections = dict()
violations = dict()
enforcements = dict()
penalties = dict()
emissions = dict()

## Start getting data
First, get summary data from the ECHO_EXPORTER table.

In [7]:
# Get everything we will need from ECHO_EXPORTER in a single DB query.
# We can then use the full dataframe to specialize views of it.
full_echo_data = None
column_mapping = {
    '"REGISTRY_ID"': str,
    '"FAC_NAME"': str,
    '"FAC_LAT"': float,
    '"FAC_LONG"': float,
    '"AIR_IDS"': str,
    '"NPDES_IDS"': str,
    '"RCRA_IDS"': str,
    '"DFR_URL"': str,
    '"AIR_FLAG"': str,
    '"NPDES_FLAG"': str,
    '"GHG_FLAG"': str,
    '"RCRA_FLAG"': str,
    '"FAC_ACTIVE_FLAG"': str
}
column_names = list( column_mapping.keys() )
columns_string = ','.join( column_names )
sql = 'select ' + columns_string + ' from "ECHO_EXPORTER" where "AIR_FLAG" = \'Y\' or "NPDES_FLAG" = \'Y\' or "GHG_FLAG" = \'Y\' or "RCRA_FLAG" = \'Y\''
try:
    # Don't index.
    full_echo_data = get_data( sql )
except EmptyDataError:
    print("\nThere are no EPA facilities for this query.\n")
full_echo_data

Unnamed: 0,REGISTRY_ID,FAC_NAME,FAC_LAT,FAC_LONG,AIR_IDS,NPDES_IDS,RCRA_IDS,DFR_URL,AIR_FLAG,NPDES_FLAG,GHG_FLAG,RCRA_FLAG,FAC_ACTIVE_FLAG
0,1.100137e+11,TOPPER ONE HOUR CLEANER,39.987020,-75.161830,PAPAM0004210101795,,PAD982677197,http://echo.epa.gov/detailed-facility-report?f...,Y,N,N,Y,Y
1,1.100094e+11,PIEZAS EXTRA,18.252013,-66.036570,,,PRR000011601,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,
2,1.100019e+11,QVC SUFFOLK INC,36.768620,-76.542970,VA0000005180000018,,VAD988168910,http://echo.epa.gov/detailed-facility-report?f...,Y,N,N,Y,Y
3,1.100342e+11,PARIS - HENRY COUNTY LANDFILL,36.315170,-88.362142,,TNR053299 TNR121673,,http://echo.epa.gov/detailed-facility-report?f...,N,Y,N,N,Y
4,1.100384e+11,TATE METALWORKS INC,34.864320,-81.959520,SC00020600481,,,http://echo.epa.gov/detailed-facility-report?f...,Y,N,N,N,Y
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1862113,1.100051e+11,WILSON CLEANERS AND LAUNDRY,31.963080,-95.270250,,,TXD988026779,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,
1862114,1.100348e+11,OVILLA ROAD CLEANER,32.531760,-96.813420,06000000481396E003,,,http://echo.epa.gov/detailed-facility-report?f...,Y,N,N,N,Y
1862115,1.100252e+11,BLUFFVIEW MANOR,40.102875,-89.152611,,ILR108310,,http://echo.epa.gov/detailed-facility-report?f...,N,Y,N,N,Y
1862116,1.100704e+11,LAVON FARMS,33.019600,-96.430000,,TXR10F6H1 TXR10F6HF,,http://echo.epa.gov/detailed-facility-report?f...,N,Y,N,N,
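
The query above is assembled by joining double-quoted column names and a chain of `or` conditions on the program flags. As a minimal sketch of that string-building step, the hypothetical helper `build_select` below (not part of ECHO_modules) produces the same shape of statement. It interpolates names directly into the SQL text, so it is only suitable for trusted, hard-coded column names like the ones in this notebook.

```python
def build_select(table, columns, flag_columns):
    """Build a SELECT that keeps rows where any of the flag columns is 'Y'.

    Hypothetical helper for illustration only; identifiers are double-quoted
    the same way the notebook's hand-written query quotes them.
    """
    quoted_columns = ",".join('"{}"'.format(name) for name in columns)
    where_clause = " or ".join(
        "\"{}\" = 'Y'".format(name) for name in flag_columns
    )
    return 'select {} from "{}" where {}'.format(
        quoted_columns, table, where_clause
    )


# Two columns and two flags are enough to show the pattern.
example_sql = build_select(
    "ECHO_EXPORTER",
    ["REGISTRY_ID", "FAC_NAME"],
    ["AIR_FLAG", "NPDES_FLAG"],
)
print(example_sql)
```

The resulting string could then be passed to `get_data( sql )` exactly as in the cell above.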


## Number of Currently Regulated Facilities Per Program

In [8]:
air_fac = full_echo_data.loc[(full_echo_data["AIR_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
water_fac = full_echo_data.loc[(full_echo_data["NPDES_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
waste_fac = full_echo_data.loc[(full_echo_data["RCRA_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
ghg_fac = full_echo_data.loc[(full_echo_data["GHG_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]

display(HTML("<h3>There are "+ str(air_fac) + " facilities currently regulated under the Clean Air Act.</h3>"))
display(HTML("<h3>There are "+ str(water_fac) + " facilities currently regulated under the Clean Water Act.</h3>"))
display(HTML("<h3>There are "+ str(waste_fac) + " facilities currently regulated under RCRA (hazardous waste).</h3>"))
display(HTML("<h3>There are "+ str(ghg_fac) + " facilities currently reporting greenhouse gas emissions.</h3>"))

## Clean Air Act inspections in 2019

In [6]:
# Use SQL to select Clean Air Act inspections/evaluations (FCEs and PCEs) that ended in 2019
air_inspections = None
try:
    sql = 'select * from \"ICIS-AIR_FCES_PCES\" where \"ACTUAL_END_DATE\" like \'__-__-2019\''

    # Download the data from that URL
    air_inspections = get_data( sql, 'pgm_sys_id' )
except EmptyDataError:
    print( "No data found")

air_inspections

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,STATE_EPA_FLAG,ACTIVITY_TYPE_CODE,ACTIVITY_TYPE_DESC,COMP_MONITOR_TYPE_CODE,COMP_MONITOR_TYPE_DESC,ACTUAL_END_DATE,PROGRAM_CODES
0,020000003606390000,3601943049,E,INS,Inspection/Evaluation,PCE,PCE On-Site,07-23-2019,"CAACFC, CAAFESOP"
1,020000003606501000,3601851095,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,05-08-2019,CAAOP
2,020000003606501000,3601866216,E,INS,Inspection/Evaluation,FOO,FCE On-Site,05-01-2019,"CAANAM, CAAOP"
3,020000003606501000,3601972076,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,04-24-2019,CAASIP
4,020000003606501000,3601972077,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,04-17-2019,CAASIP
...,...,...,...,...,...,...,...,...,...
48896,VA0000005117300001,3601776181,S,INS,Inspection/Evaluation,PFF,PCE Off-Site,03-07-2019,"CAAMACT, CAATVP"
48897,VA0000005117300001,3601791764,S,INS,Inspection/Evaluation,PFF,PCE Off-Site,03-27-2019,CAATVP
48898,VA0000005117300001,3601955560,S,INS,Inspection/Evaluation,PFF,PCE Off-Site,08-27-2019,CAATVP
48899,VA0000005117300001,3601955566,S,INS,Inspection/Evaluation,PFF,PCE Off-Site,08-27-2019,"CAAMACT, CAATVP"
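
The 2019 filter relies on the SQL `like` operator, where each `_` matches exactly one character. The pattern `'__-__-2019'` therefore matches dates written as MM-DD-2019, while other tables in this notebook store dates with slashes and need `'__/__/2019'` instead. The sketch below imitates that matching in plain Python so the behavior can be checked without a database; `like_to_regex` and `like_match` are hypothetical helper names, and only the two standard wildcards (`_` and `%`) are handled.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regular expression.

    "_" becomes "any single character" and "%" becomes "any run of
    characters"; everything else is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like_match(pattern, value):
    """Return True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like_match("__-__-2019", "07-23-2019"))  # hyphenated date in 2019
print(like_match("__-__-2019", "07/23/2019"))  # slash-separated date
print(like_match("__-__-2019", "12-03-2018"))  # hyphenated date in 2018
```

Only the first call matches, which is why the separator in the pattern has to agree with the date format of the table being queried.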


In [9]:
# Number of inspections in 2019 per 1000 regulated facilities
air_inspections_metric = formatter((air_inspections.shape[0] / air_fac) * 1000) # Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["CAA"] = air_inspections_metric
display(HTML("<h3>"+ air_inspections_metric +" inspections per 1000 facilities</h3>"))
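
The metric above is the number of 2019 inspections divided by the number of currently regulated facilities, scaled to 1,000 facilities. As a minimal sketch of the same arithmetic, the hypothetical helper `rate_per_1000` below (not part of ECHO_modules) keeps the result as a number and guards against an empty facility count. The sample figures are made up for illustration.

```python
def rate_per_1000(event_count, facility_count):
    """Return events per 1,000 facilities as a float.

    Hypothetical helper for illustration; not part of ECHO_modules.
    """
    if facility_count <= 0:
        raise ValueError("facility_count must be positive")
    return event_count / facility_count * 1000


# Made-up numbers: 50 inspections across 200 regulated facilities.
example_rate = rate_per_1000(50, 200)

# Format only at display time, with the same two-decimal layout as formatter().
print("{:0.2f}".format(example_rate))
```

Keeping the rate numeric until it is displayed makes it easier to compare or sort programs later; the notebook itself stores the already formatted string.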

## Violations of the Clean Air Act in 2019



In [10]:
air_violations = None
try:
    sql = 'select * from "ICIS-AIR_VIOLATION_HISTORY" where "EARLIEST_FRV_DETERM_DATE" like \'__-__-2019\' or "HPV_DAYZERO_DATE" like \'__-__-2019\''

    air_violations = get_data( sql, "pgm_sys_id" )

    # Remove "FACIL" violations, which are paperwork violations according to: https://19january2017snapshot.epa.gov/sites/production/files/2013-10/documents/frvmemo.pdf
    # air_violations = air_violations.loc[(air_violations["POLLUTANT_DESCS"]!="FACIL")]
except EmptyDataError:
    print( "No data found")
air_violations

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,AGENCY_TYPE_DESC,STATE_CODE,AIR_LCON_CODE,COMP_DETERMINATION_UID,ENF_RESPONSE_POLICY_CODE,PROGRAM_CODES,PROGRAM_DESCS,POLLUTANT_CODES,POLLUTANT_DESCS,EARLIEST_FRV_DETERM_DATE,HPV_DAYZERO_DATE,HPV_RESOLVED_DATE
0,FL0000001200500009,3601741276,State,FL,,FL000A0000120050000900375,FRV,CAANSPS,New Source Performance Standards,300000329,FACIL,02-06-2019,,05-13-2019
1,IA0000001901900052,3601913691,State,IA,,IA000A77058,FRV,CAATVP,Title V Permits,,,07-15-2019,,07-24-2019
2,IL000163020AAB,3601954355,State,IL,,IL000AA-2019-00133,FRV,CAASIP CAATVP,State Implementation Plan for National Primary...,300000243,VOLATILE ORGANIC COMPOUNDS (VOCS),07-25-2019,,
3,IN0000001803900097,3601869864,State,IN,,IN000A76378,FRV,CAATVP,Title V Permits,300000329,FACIL,05-06-2019,,
4,IN0000001803300043,3601685463,State,IN,,IN000A74118,HPV,CAANSPS,New Source Performance Standards,10373,Particulate matter - PM10,12-03-2018,01-13-2019,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4030,CO0000000803102488,3602151392,State,CO,,CO000A0000080310248800002,FRV,CAASIP,State Implementation Plan for National Primary...,300000243,VOLATILE ORGANIC COMPOUNDS (VOCS),08-04-2019,,02-12-2020
4031,MN0000002700100021,3602178887,State,MN,,MN000A00100202PEN20191,FRV,CAATVP,Title V Permits,300000329,FACIL,04-11-2019,,01-03-2020
4032,CABAAA0085,3602188895,Local,CA,BAA,CABAAA80845,HPV,CAATVP,Title V Permits,300000243,VOLATILE ORGANIC COMPOUNDS (VOCS),07-23-2019,07-23-2019,06-04-2020
4033,PA000633228,3602223361,State,PA,,PA000A0000F00000002901815,FRV,CAASIP,State Implementation Plan for National Primary...,300000329,FACIL,07-02-2019,,


In [12]:
# Number of high priority and federally reportable violations per 1000 regulated facilities
air_violations_metric = formatter((air_violations.shape[0] / air_fac) * 1000) # Divide by regulated facilities and multiply by desired rate (per 1000)
violations["CAA"] = air_violations_metric
display(HTML("<h3>"+air_violations_metric+" violations per 1000 facilities </h3>"))

## Formal Enforcement Actions and Penalties under the Clean Air Act in 2019

In [13]:
air_enforcements = None
try:
    sql = 'select * from "ICIS-AIR_FORMAL_ACTIONS" where "SETTLEMENT_ENTERED_DATE" like \'__/__/2019\''

    air_enforcements = get_data( sql, "pgm_sys_id" )
except EmptyDataError:
    print( "No data found")
air_enforcements

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,ENF_IDENTIFIER,ACTIVITY_TYPE_CODE,ACTIVITY_TYPE_DESC,STATE_EPA_FLAG,ENF_TYPE_CODE,ENF_TYPE_DESC,SETTLEMENT_ENTERED_DATE,PENALTY_AMOUNT
0,OH0000000627010056,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
1,OH0000000684000000,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
2,IN0000001802900002,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
3,IN0000001814700020,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
4,OH0000000165000006,31941,05-1999-0644,JDC,Judicial,E,CIV,Civil Judicial Action,07/17/2019,0.0
...,...,...,...,...,...,...,...,...,...,...
2641,HI0000001500700066,3602236126,HI000AEA93,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,07/11/2019,22800.0
2642,MI00000000000N7688,3602237523,MI000AN7688FRV0000038302,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,08/08/2019,54600.0
2643,WASPC0005306310023,3602245188,WASPCA200188484,AFR,Administrative - Formal,L,SCAAAO,Administrative Order,12/31/2019,32000.0
2644,LA0000002212500007,3602258171,LA000A2573011,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,01/11/2019,7597.9


In [14]:
# Number of formal actions in 2019 per violation
air_enforcements_metric = formatter(air_enforcements.shape[0]/air_violations.shape[0]) # Formal actions divided by number of violations
enforcements["CAA"] = air_enforcements_metric
display(HTML("<h3>"+air_enforcements_metric +" formal enforcement actions per violation</h3>"))

In [15]:
# Penalties each year per violating facility
air_penalties = air_enforcements.loc[air_enforcements["PENALTY_AMOUNT"]>0]
air_penalties_metric = formatter(sum(air_penalties["PENALTY_AMOUNT"]) / len(air_violations["PGM_SYS_ID"].unique())) #Divide the sum of penalties by number of violating facilities
air_penalties_max = formatter(max(air_penalties["PENALTY_AMOUNT"])) 
air_penalties_min = formatter(min(air_penalties["PENALTY_AMOUNT"])) 
penalties["CAA"] = air_penalties_metric
display(HTML("<h3>$"+air_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+air_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+air_penalties_min +"</h3>"))
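
The penalty figures above come from three steps: keep only actions with a penalty above zero, divide their total by the number of distinct facilities with a violation, and report the largest and smallest penalty. The sketch below restates those steps in plain Python on made-up values. `penalty_summary` is a hypothetical name, and returning `None` when there is nothing to summarize is an assumption added here, since `max()` and `min()` raise an error on an empty selection.

```python
def penalty_summary(penalty_amounts, violating_facility_ids):
    """Summarize penalties relative to the number of violating facilities.

    Hypothetical helper for illustration. Amounts that are missing, zero,
    or NaN are ignored (NaN > 0 is False, so NaN drops out of the filter).
    """
    positive = [a for a in penalty_amounts if a is not None and a > 0]
    facilities = set(violating_facility_ids)
    if not positive or not facilities:
        return None
    return {
        "per_violating_facility": sum(positive) / len(facilities),
        "max": max(positive),
        "min": min(positive),
    }


# Made-up example: four actions, three distinct violating facilities.
summary = penalty_summary(
    [0.0, 22800.0, 54600.0, 7597.9],
    ["FAC-A", "FAC-A", "FAC-B", "FAC-C"],
)
print(summary)
```

Note that the denominator counts facilities with violations, not facilities that were penalized, which matches the "per facility in violation" label used in the output.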

---

## Clean Water Act inspections in 2019

In [16]:
# Find Clean Water Act (NPDES) inspections that ended in 2019
water_inspections = None
try:
    sql = 'select "NPDES_ID", "REGISTRY_ID", "ACTUAL_END_DATE", "STATE_EPA_FLAG"' + \
        ' from "NPDES_INSPECTIONS" where "ACTUAL_END_DATE" like \'__/__/2019\''

    water_inspections = get_data( sql, "NPDES_ID" ) 
except EmptyDataError:
    print( "No data found")
water_inspections

NPDES_ID,REGISTRY_ID,ACTUAL_END_DATE,STATE_EPA_FLAG
AK0000507,110030488620,12/17/2019,S
AKR06AA08,110070146690,04/25/2019,S
AK0021385,110000761453,08/26/2019,E
AK0022497,110039730459,02/25/2019,S
AKG520160,110009691440,07/19/2019,S
...,...,...,...
WYR105839,110070517716,09/03/2019,S
WYR105849,110055373590,09/09/2019,S
WYR105859,110055182975,06/26/2019,S
WYR105860,110055172290,09/09/2019,S


In [17]:
# Number of inspections in 2019 per 1000 regulated facilities
water_inspections_metric = formatter((water_inspections.shape[0] / water_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["CWA"] = water_inspections_metric
display(HTML("<h3>"+water_inspections_metric +" inspections per 1000 facilities</h3>"))

## Violations of the Clean Water Act in 2019

In [37]:
# Find facilities with water permit violations
water_violations = None
try:
    sql = 'select * from "NPDES_QNCR_HISTORY" where "YEARQTR" = 20191 or "YEARQTR" = 20192 or "YEARQTR" = 20193 or "YEARQTR" = 20194'
    water_violations = get_data( sql, "NPDES_ID" ) 
except EmptyDataError:
    print( "No data found")
water_violations

NPDES_ID,YEARQTR,HLRNC,NUME90Q,NUMCVDT,NUMSVCD,NUMPSCH
AK0001058,20191,,0,0,0,3
AK0001058,20192,,0,0,0,3
AK0001058,20193,,0,0,0,3
AK0001058,20194,,0,0,0,3
AK0001155,20191,C,0,0,0,0
...,...,...,...,...,...,...
WYR105141,20191,C,0,0,0,0
WYR105141,20192,C,0,0,0,0
WYR105141,20193,C,0,0,0,0
WYR105141,20194,C,0,0,0,0


In [53]:
# Number of violations each year per 1000 regulated facilities
# Sum violations 
water_violations["Sum"] = water_violations["NUME90Q"] + water_violations["NUMCVDT"] + water_violations["NUMSVCD"] + water_violations["NUMPSCH"]
water_violations_metric = formatter((np.sum(water_violations["Sum"]) / water_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
violations["CWA"] = water_violations_metric
display(HTML("<h3>"+water_violations_metric+" violations per 1000 facilities</h3>"))
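
Each row of the quarterly history holds four violation counts for one permit in one quarter, so the total is a row-wise sum followed by a sum over all rows. The sketch below repeats that calculation on a small made-up table and also counts how many distinct permits had at least one violation. One difference from the cell above, flagged as a design choice: `DataFrame.sum(axis=1)` skips missing values by default, whereas adding columns with `+` yields a missing result whenever any one count is missing.

```python
import pandas as pd

# Made-up rows in the same layout as the quarterly history table.
toy = pd.DataFrame(
    {
        "NUME90Q": [0, 2, 0, 1],
        "NUMCVDT": [0, 0, 0, 1],
        "NUMSVCD": [0, 1, 0, 0],
        "NUMPSCH": [3, 0, 0, 0],
    },
    index=pd.Index(
        ["AK0001058", "AK0001058", "AK0001155", "WYR105141"],
        name="NPDES_ID",
    ),
)

count_columns = ["NUME90Q", "NUMCVDT", "NUMSVCD", "NUMPSCH"]

# Row-wise total of the four violation counts.
toy["Sum"] = toy[count_columns].sum(axis=1)

# Grand total of violations, and distinct permits with at least one violation.
total_violations = int(toy["Sum"].sum())
violating_facilities = toy.loc[toy["Sum"] > 0].index.nunique()

print(total_violations, violating_facilities)
```

Because a permit can appear once per quarter, counting unique index values rather than rows avoids counting the same facility up to four times.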

In [72]:
x = water_violations.loc[water_violations["HLRNC"]=="C"]
plus = x.loc[x["Sum"]>0]
len(plus.index.unique())

1210

## Enforcement Actions and Penalties under the Clean Water Act in 2019

In [54]:
# Find formal Clean Water Act enforcement actions with settlements entered in 2019
water_enforcements = None
try:
    sql = 'select "NPDES_ID", "AGENCY", "ENF_TYPE_DESC", "SETTLEMENT_ENTERED_DATE", "FED_PENALTY_ASSESSED_AMT", "STATE_LOCAL_PENALTY_AMT"' + \
        ' from "NPDES_FORMAL_ENFORCEMENT_ACTIONS" where "SETTLEMENT_ENTERED_DATE" like \'__/__/2019\''

    water_enforcements = get_data( sql, "NPDES_ID" ) 
except EmptyDataError:
    print( "No data found")
water_enforcements

NPDES_ID,AGENCY,ENF_TYPE_DESC,SETTLEMENT_ENTERED_DATE,FED_PENALTY_ASSESSED_AMT,STATE_LOCAL_PENALTY_AMT
AKR06AA89,EPA,CWA 309A AO For Compliance,03/18/2019,,
AL0032310,EPA,CWA 309G2B AO For Class II Penalties,10/28/2019,50000.0,
AL0080225,State,State Administrative Order of Consent,12/19/2019,,0.0
AR0048879,State,State Administrative Order of Consent,07/10/2019,,
ARU001243,EPA,CWA 309A AO For Compliance,07/24/2019,,
...,...,...,...,...,...
PA0272736,State,State CWA Penalty AO,08/08/2019,,500.0
PAR212229,State,State CWA Non Penalty AO,08/01/2019,,
TNU058148,State,State Administrative Order of Consent,08/21/2019,,2735.0
TXR05EJ33,State,State CWA Penalty AO,09/10/2019,,875.0


In [55]:
# Number of formal actions in 2019 per violation
water_enforcements_metric = formatter(water_enforcements.shape[0]/water_violations.shape[0]) # Formal actions divided by the number of rows in the quarterly history (one row per permit per quarter), not by the summed violation counts
enforcements["CWA"] = water_enforcements_metric
display(HTML("<h3>"+water_enforcements_metric +" formal enforcement actions per violation</h3>"))

In [63]:
# Penalties each year per violating facility
# Find violating facilities (not all in NPDES QNCR are violating...)
water_violators = water_violations.loc[water_violations["Sum"]>0]
water_violators = len(water_violators.index.unique())
water_penalties = water_enforcements.loc[water_enforcements["FED_PENALTY_ASSESSED_AMT"]>0]
water_penalties_metric = formatter(sum(water_penalties["FED_PENALTY_ASSESSED_AMT"]) / water_violators) # Divide the sum of federal penalties by the number of violating facilities
water_penalties_max = formatter(max(water_penalties["FED_PENALTY_ASSESSED_AMT"])) 
water_penalties_min = formatter(min(water_penalties["FED_PENALTY_ASSESSED_AMT"]))
penalties["CWA"] = water_penalties_metric
display(HTML("<h3>$"+water_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+water_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+water_penalties_min +"</h3>"))

## RCRA inspections in 2019

In [27]:
# Find RCRA evaluations (inspections) that started in 2019
waste_inspections = None
try:
    sql = 'select * from "RCRA_EVALUATIONS" where "EVALUATION_START_DATE" like \'__/__/2019\''

    waste_inspections = get_data( sql, "ID_NUMBER" ) 
except EmptyDataError:
    print( "No data found")
waste_inspections

ID_NUMBER,ACTIVITY_LOCATION,EVALUATION_IDENTIFIER,EVALUATION_TYPE,EVALUATION_DESC,EVALUATION_AGENCY,EVALUATION_START_DATE,FOUND_VIOLATION
NJD003812047,NJ,004,CAV,COMPLIANCE ASSISTANCE VISIT,S,09/24/2019,N
WID006129225,WI,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,10/07/2019,Y
KYR000056879,KY,001,NRR,NON-FINANCIAL RECORD REVIEW,S,10/11/2019,N
KYD006376347,KY,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,09/25/2019,N
KYD981853005,KY,001,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,10/11/2019,N
...,...,...,...,...,...,...,...
FLD980847214,FL,CEI,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,08/20/2019,Y
FLD984171306,FL,CEN,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,08/22/2019,N
FLR000231233,FL,CEI,CEI,COMPLIANCE EVALUATION INSPECTION ON-SITE,S,11/20/2019,Y
MAD030812127,MA,001,SNY,SIGNIFICANT NON-COMPLIER,S,08/16/2019,N


In [28]:
# Number of inspections in 2019 per 1000 regulated facilities
waste_inspections_metric = formatter((waste_inspections.shape[0] / waste_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["RCRA"] = waste_inspections_metric
display(HTML("<h3>"+waste_inspections_metric+" inspections per 1000 facilities</h3>"))

## Violations of RCRA in 2019

In [29]:
# Find RCRA violations determined in 2019
waste_violations = None
try:
    sql = 'select * from "RCRA_VIOLATIONS" where "DATE_VIOLATION_DETERMINED" like \'__/__/2019\''

    waste_violations = get_data( sql, "ID_NUMBER" ) 
except EmptyDataError:
    print( "No data found")
waste_violations

ID_NUMBER,ACTIVITY_LOCATION,VIOLATION_TYPE,VIOLATION_TYPE_DESC,VIOL_DETERMINED_BY_AGENCY,DATE_VIOLATION_DETERMINED,ACTUAL_RTC_DATE,SCHEDULED_COMPLIANCE_DATE
CAR000191650,CA,262.A,Standards Applicable to Generators of HW: General,E,10/01/2019,12/04/2019,
DEN201500044,DE,261.A,ID and Listing of HW: General,S,05/22/2019,05/22/2019,
SCD069316271,SC,273.B,Standards for Universal Waste Management: Stan...,S,12/17/2019,03/03/2020,
WIR000171884,WI,279.C,Standards for Used Oil: Generators,S,06/19/2019,08/27/2019,08/21/2019
TNR000005439,TN,XXS,State Statutory or Regulatory requirements tha...,S,11/06/2019,11/06/2019,
...,...,...,...,...,...,...,...
TXR000022327,TX,265.C,Interim Status Standards for Owners and Operat...,S,11/20/2019,01/16/2020,
FLR000222182,FL,262.A,Standards Applicable to Generators of HW: General,S,03/05/2019,05/10/2019,
NYD065939902,NY,261.A,ID and Listing of HW: General,S,07/09/2019,07/09/2019,
OHR000137034,OH,279.H,Standards for Used Oil: Fuel Marketers,S,01/15/2019,03/05/2019,


In [30]:
# Number of violations in 2019 per 1000 regulated facilities
waste_violations_metric = formatter((waste_violations.shape[0] / waste_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
violations["RCRA"] = waste_violations_metric
display(HTML("<h3>"+waste_violations_metric+" violations per 1000 facilities</h3>"))

## Enforcement Actions and Penalties under RCRA in 2019

In [31]:
# Find facilities with enforcement actions
waste_enforcements = None
try:
    sql = 'select * from "RCRA_ENFORCEMENTS" where "ENFORCEMENT_ACTION_DATE" like \'__/__/2019\''

    waste_enforcements = get_data( sql, "ID_NUMBER" ) 
except EmptyDataError:
    print( "No data found")
waste_enforcements

ID_NUMBER,ACTIVITY_LOCATION,ENFORCEMENT_IDENTIFIER,ENFORCEMENT_TYPE,ENFORCEMENT_DESC,ENFORCEMENT_AGENCY,ENFORCEMENT_ACTION_DATE,PMP_AMOUNT,FMP_AMOUNT,FSC_AMOUNT,SCR_AMOUNT
AKD991281023,AK,001,HQ120,WRITTEN INFORMAL,E,11/19/2019,,,,
ALD046481032,AL,001,AL115,WARNING LETTER,S,06/10/2019,,,,
ALD057202558,AL,001,HQ120,WRITTEN INFORMAL,S,05/30/2019,,,,
ALD077647691,AL,001,HQ310,FINAL 3008(A) COMPLIANCE ORDER,S,02/21/2019,,,,
ALD981020894,AL,001,HQ140,LETTER OF INTENT TO INITIATE ENFORCEMENT ACTION,S,03/26/2019,,,,
...,...,...,...,...,...,...,...,...,...,...
WVR000547752,WV,001,HQ120,WRITTEN INFORMAL,S,10/30/2019,,,,
WVR000547760,WV,001,HQ120,WRITTEN INFORMAL,S,11/21/2019,,,,
WIR000149724,WI,001,WI124,NOTICE OF NONCOMPLIANCE LETTER,S,02/27/2019,,,,
WIR000160010,WI,001,WI124,NOTICE OF NONCOMPLIANCE LETTER,S,05/01/2019,,,,


In [32]:
# Number of enforcement actions each year per violation
waste_enforcements_metric = formatter(waste_enforcements.shape[0] / waste_violations.shape[0])
enforcements["RCRA"] = waste_enforcements_metric
display(HTML("<h3>"+waste_enforcements_metric+" enforcement actions per violation</h3>"))

In [33]:
# Penalties each year per violating facility
waste_penalties = waste_enforcements.loc[waste_enforcements["FMP_AMOUNT"]>0]
waste_penalties_metric = formatter(sum(waste_penalties["FMP_AMOUNT"]) / len(waste_violations.index.unique())) # Divide the sum of penalties by the number of violating facilities
waste_penalties_max = formatter(max(waste_penalties["FMP_AMOUNT"]))
waste_penalties_min = formatter(min(waste_penalties["FMP_AMOUNT"]))
penalties["RCRA"] = waste_penalties_metric
display(HTML("<h3>$"+waste_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+waste_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+waste_penalties_min +"</h3>"))

## Greenhouse Gas Emissions in 2018 (latest data available)

In [64]:
# Find GHG emissions
ghg_emissions = None
try:
    sql = 'select * from "POLL_RPT_COMBINED_EMISSIONS" where "REPORTING_YEAR" = \'2018\' and "PGM_SYS_ACRNM" = \'E-GGRT\''

    ghg_emissions = get_data( sql) 
except EmptyDataError:
    print( "No data found")
ghg_emissions

Unnamed: 0,REPORTING_YEAR,REGISTRY_ID,PGM_SYS_ACRNM,PGM_SYS_ID,POLLUTANT_NAME,ANNUAL_EMISSION,UNIT_OF_MEASURE,NEI_TYPE,NEI_HAP_VOC_FLAG
0,2018,110013317035,E-GGRT,1006363,Nitrous oxide,5.364,MTCO2e,,
1,2018,110000492020,E-GGRT,1003261,Methane,48.750,MTCO2e,,
2,2018,110000492020,E-GGRT,1003261,Carbon dioxide,101570.300,MTCO2e,,
3,2018,110000492020,E-GGRT,1003261,Nitrous oxide,60.196,MTCO2e,,
4,2018,110024586544,E-GGRT,1005340,Nitrous oxide,28.310,MTCO2e,,
...,...,...,...,...,...,...,...,...,...
21678,2018,110000597328,E-GGRT,1001666,Carbon dioxide,84108.700,MTCO2e,,
21679,2018,110000326530,E-GGRT,1001265,Nitrous oxide,121.286,MTCO2e,,
21680,2018,110000326530,E-GGRT,1001265,Methane,101.750,MTCO2e,,
21681,2018,110000326530,E-GGRT,1001265,Carbon dioxide,224350.300,MTCO2e,,


In [65]:
# Emissions in 2018 per facility
ghg_emissions_metric = formatter(np.nansum(ghg_emissions["ANNUAL_EMISSION"]) / len(ghg_emissions["REGISTRY_ID"].unique())) #Divide by reporting facility
ghg_emissions_fac = ghg_emissions.groupby("PGM_SYS_ID")[["ANNUAL_EMISSION"]].sum() # Group by facility
ghg_emissions_max = formatter(np.nanmax(ghg_emissions_fac["ANNUAL_EMISSION"]))
ghg_emissions_min = formatter(np.nanmin(ghg_emissions_fac.loc[ghg_emissions_fac["ANNUAL_EMISSION"]>0]["ANNUAL_EMISSION"]))
emissions["GHG"] = ghg_emissions_metric
display(HTML("<h3>"+ghg_emissions_metric+" MTCO2e (metric tons of carbon dioxide equivalent) emissions per reporting facility</h3>"))
display(HTML("<h3>Max: "+ghg_emissions_max+" MTCO2e (metric tons of carbon dioxide equivalent) emissions</h3>"))
display(HTML("<h3>Min: "+ghg_emissions_min+" MTCO2e (metric tons of carbon dioxide equivalent) emissions</h3>"))
ghg_emissions_fac

PGM_SYS_ID,ANNUAL_EMISSION
1000001,302529.480
1000002,110511.712
1000003,79393.210
1000004,55547.748
1000005,83863.020
...,...
1013419,28467.792
1013420,28356.444
1013481,29375.000
1013489,26073.752
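
The emissions table has one row per facility per pollutant, so facility totals require grouping before taking a maximum, minimum, or average. The sketch below shows that grouping step on made-up numbers; the column names follow the table above, and the values are illustrative only.

```python
import pandas as pd

# Made-up rows: one facility reports three gases, another reports one.
toy = pd.DataFrame(
    {
        "PGM_SYS_ID": [1003261, 1003261, 1003261, 1005340],
        "POLLUTANT_NAME": [
            "Methane",
            "Carbon dioxide",
            "Nitrous oxide",
            "Nitrous oxide",
        ],
        "ANNUAL_EMISSION": [50.0, 1000.0, 60.0, 30.0],
    }
)

# Total emissions per facility (sums across pollutants).
per_facility = toy.groupby("PGM_SYS_ID")["ANNUAL_EMISSION"].sum()

# Average, largest, and smallest facility totals.
mean_per_facility = per_facility.mean()
largest = per_facility.max()
smallest = per_facility.min()

print(per_facility)
print(mean_per_facility, largest, smallest)
```

Taking the mean of the grouped totals uses the same facility identifier for the average as for the maximum and minimum. The cell above instead divides the overall sum by the number of unique `REGISTRY_ID` values, which can differ slightly if the two identifiers do not map one to one.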


# Data Export

In [73]:
data = [inspections,
violations,
enforcements,
penalties,
emissions]

units = ["#inspections per 1000",
"#violations per 1000",
"#actions per facility in violation",
"$ per facility in violation",
"amount of emissions (metric tons)"]

short_units = ["inspectionsper1000",
"violationsper1000",
"enforcementsperviolatingfacility",
"penaltiesperviolatingfacility",
"emissions2018"]

for index, program in enumerate(data):
    # create dataframe
    df = pd.DataFrame(program, index=[0]).T
    df = df.rename(columns={0: units[index]})
    filename = short_units[index]+"_All_USA_pg4_090120.csv" # e.g. inspectionsper1000_All_USA_pg4_090120.csv
    df.to_csv(filename)
    print(df)

     #inspections per 1000
CAA                 261.36
CWA                 148.16
RCRA                 37.30
     #violations per 1000
CAA                 21.57
RCRA                24.08
CWA               1108.10
     #actions per facility in violation
CAA                                0.66
RCRA                               0.48
CWA                                0.01
     $ per facility in violation
CAA                     23062.80
RCRA                     3166.88
CWA                       125.04
    amount of emissions (metric tons)
GHG                         422134.34
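
Each exported file is a two-column table: one row per program and one column holding the metric. As a rough sketch of what ends up in a file, the snippet below writes the inspection figures printed above to an in-memory CSV using only the standard library. `metrics_to_csv` is a hypothetical helper, and the `program` header is an assumption added for readability; the pandas export in the cell above leaves that header blank.

```python
import csv
import io


def metrics_to_csv(metrics, unit_label):
    """Render a {program: value} dictionary as CSV text.

    Hypothetical helper for illustration; the notebook itself uses
    pandas.DataFrame.to_csv to write real files.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["program", unit_label])
    for program, value in metrics.items():
        writer.writerow([program, value])
    return buffer.getvalue()


# Values copied from the inspections output shown above.
inspections_example = {"CAA": "261.36", "CWA": "148.16", "RCRA": "37.30"}
csv_text = metrics_to_csv(inspections_example, "#inspections per 1000")
print(csv_text)
```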
