| ![EEW logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/eew.jpg?raw=true) | ![EDGI logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/edgi.png?raw=true) |
|---|---|

#### This notebook is licensed under GPL 3.0. Please visit our GitHub repo for more information: https://github.com/edgi-govdata-archiving/ECHO-Cross-Program
#### The notebook was collaboratively authored by the Environmental Data & Governance Initiative (EDGI) following our authorship protocol: https://docs.google.com/document/d/1CtDN5ZZ4Zv70fHiBTmWkDJ9mswEipX6eCYrwicP66Xw/
#### For more information about this project, visit https://www.environmentalenforcementwatch.org/

## How to Run this Notebook
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know: we authored it! It’s okay. Click “Run Anyway” to continue.
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!

---

# Nationwide statistics about environmental enforcement and compliance trends

## Setup
Here we load some helper code to get us going.

In [None]:
# Import code libraries
!pip install ECHO_modules &>/dev/null;
!pip install geopandas &>/dev/null;

import urllib.parse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import requests
import csv
import datetime
import folium
from folium.plugins import FastMarkerCluster
import ipywidgets as widgets
from IPython.display import display, HTML
from pandas.errors import EmptyDataError
from ECHO_modules.get_data import get_echo_data

def formatter(value):
  return "{:0.2f}".format(value)

print("Done!")

Done!


Here we set up some code to help us store and eventually export the metrics.

In [None]:
inspections = dict()
violations = dict()
enforcements = dict()
penalties = dict()
emissions = dict()

## Start getting data
First, get summary data from the ECHO_EXPORTER table.

In [None]:
from ECHO_modules.get_data import get_echo_data

# Get everything we will need from ECHO_EXPORTER in a single DB query.
# We can then use the full dataframe to specialize views of it.
full_echo_data = None
column_mapping = {
    '"REGISTRY_ID"': str,
    '"FAC_NAME"': str,
    '"FAC_LAT"': float,
    '"FAC_LONG"': float,
    '"AIR_IDS"': str,
    '"NPDES_IDS"': str,
    '"RCRA_IDS"': str,
    '"DFR_URL"': str,
    '"AIR_FLAG"': str,
    '"NPDES_FLAG"': str,
    '"GHG_FLAG"': str,
    '"RCRA_FLAG"': str,
    '"FAC_ACTIVE_FLAG"': str
}
column_names = list( column_mapping.keys() )
columns_string = ','.join( column_names )
sql = 'select ' + columns_string + ' from "ECHO_EXPORTER" where "AIR_FLAG" = \'Y\' or "NPDES_FLAG" = \'Y\' or "GHG_FLAG" = \'Y\' or "RCRA_FLAG" = \'Y\''
try:
    # Don't index.
    full_echo_data = get_echo_data( sql )
except EmptyDataError:
    print("\nThere are no EPA facilities for this query.\n")
full_echo_data

Unnamed: 0,REGISTRY_ID,FAC_NAME,FAC_LAT,FAC_LONG,AIR_IDS,NPDES_IDS,RCRA_IDS,DFR_URL,AIR_FLAG,NPDES_FLAG,GHG_FLAG,RCRA_FLAG,FAC_ACTIVE_FLAG
0,1.100465e+11,TIMELINE LOGISTICS,,,,,NDC000009902,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,
1,1.100712e+11,PNR - 415,36.097744,-95.962657,,,,http://echo.epa.gov/detailed-facility-report?f...,N,N,Y,N,
2,1.100712e+11,MUNSON PLANT,31.158936,-101.096615,,,,http://echo.epa.gov/detailed-facility-report?f...,N,N,Y,N,
3,1.100178e+11,CHEM SECURITY LTD,,,,,WAD980976484,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,
4,1.100669e+11,UNKNOWN,38.498546,-98.383430,,KSR102460,,http://echo.epa.gov/detailed-facility-report?f...,N,Y,N,N,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2266607,1.100669e+11,UNKNOWN,38.498546,-98.383430,,KSR104511,,http://echo.epa.gov/detailed-facility-report?f...,N,Y,N,N,
2266608,1.100712e+11,WD 143 A/B,28.661714,-89.551313,,,,http://echo.epa.gov/detailed-facility-report?f...,N,N,Y,N,
2266609,1.100708e+11,MCKINNEY TRAILER RENTALS,34.041704,-117.989254,,,CAC003071892 CAL000455247,http://echo.epa.gov/detailed-facility-report?f...,N,N,N,Y,
2266610,1.100712e+11,BTA 430 PERMIAN BASIN,31.990559,-102.080113,,,,http://echo.epa.gov/detailed-facility-report?f...,N,N,Y,N,
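
To see what the query string built in the cell above looks like, here is a minimal sketch that assembles the same kind of `select` statement without contacting the database. It uses only a subset of the columns, and the `flags` list and `where_clause` name are introduced here for illustration; they are not part of the notebook.

```python
# Subset of the columns used above, for illustration only.
column_names = ['"REGISTRY_ID"', '"FAC_NAME"', '"AIR_FLAG"']
# Hypothetical helper list: the four program flags the query filters on.
flags = ['"AIR_FLAG"', '"NPDES_FLAG"', '"GHG_FLAG"', '"RCRA_FLAG"']

columns_string = ','.join(column_names)
# Join the flag tests with "or" instead of writing each one out by hand.
where_clause = ' or '.join(f"{flag} = 'Y'" for flag in flags)
sql = f'select {columns_string} from "ECHO_EXPORTER" where {where_clause}'
print(sql)
```

Printing the string before running it is a quick way to check the quoting: column and table names are wrapped in double quotes, and the value `'Y'` in single quotes.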


## Number of Currently Regulated Facilities Per Program

In [None]:
air_fac = full_echo_data.loc[(full_echo_data["AIR_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
water_fac = full_echo_data.loc[(full_echo_data["NPDES_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
waste_fac = full_echo_data.loc[(full_echo_data["RCRA_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]
ghg_fac = full_echo_data.loc[(full_echo_data["GHG_FLAG"]=="Y") & (full_echo_data["FAC_ACTIVE_FLAG"]=="Y")].shape[0]

display(HTML("<h3>There are "+ str(air_fac) + " facilities currently regulated under the Clean Air Act.</h3>"))
display(HTML("<h3>There are "+ str(water_fac) + " facilities currently regulated under the Clean Water Act.</h3>"))
display(HTML("<h3>There are "+ str(waste_fac) + " facilities currently regulated under RCRA (hazardous waste).</h3>"))
display(HTML("<h3>There are "+ str(ghg_fac) + " facilities currently reporting greenhouse gas emissions.</h3>"))

## Clean Air Act inspections in 2023

In [None]:
# Use SQL to select Clean Air Act inspections (full and partial compliance evaluations, FCEs and PCEs) that ended in 2023
air_inspections = None
try:
    sql = 'select * from \"ICIS-AIR_FCES_PCES\" where \"ACTUAL_END_DATE\" like \'__-__-2023\''

    # Run the query and index the results by program system ID
    air_inspections = get_echo_data( sql, 'pgm_sys_id' )
except EmptyDataError:
    print( "No data found")

air_inspections

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,STATE_EPA_FLAG,ACTIVITY_TYPE_CODE,ACTIVITY_TYPE_DESC,COMP_MONITOR_TYPE_CODE,COMP_MONITOR_TYPE_DESC,ACTUAL_END_DATE,PROGRAM_CODES
0,020000003606501000,3603525833,E,INS,Inspection/Evaluation,FOO,FCE On-Site,02-15-2023,"CAAMACT, CAANSPS, CAAOP, CAATIP"
1,02000110006908383,3603670421,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,06-28-2023,"CAAMACT, CAANSPS, CAASO"
2,03000000WV00005,3603637422,E,INS,Inspection/Evaluation,FOO,FCE On-Site,06-27-2023,"CAAGACTM, CAAMACT, CAANSPS"
3,03000000WV00006,3603640498,E,INS,Inspection/Evaluation,FOO,FCE On-Site,06-28-2023,CAAFENF
4,03000PA00013,3603577661,E,INS,Inspection/Evaluation,PFF,PCE Off-Site,04-20-2023,CAAIRM
...,...,...,...,...,...,...,...,...,...
37872,WV00005100142,3603751040,S,INS,Inspection/Evaluation,FOO,FCE On-Site,09-28-2023,"CAAGACTM, CAANSPS, CAASIP"
37873,WV00005100145,3603773378,S,INS,Inspection/Evaluation,FOO,FCE On-Site,09-21-2023,"CAAMACT, CAANSPS, CAASIP"
37874,WV00005100125,3603773364,S,INS,Inspection/Evaluation,FOO,FCE On-Site,09-20-2023,"CAANSPS, CAASIP"
37875,MO0000002951000016,3603753773,E,INS,Inspection/Evaluation,PCE,PCE On-Site,09-25-2023,
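
The query above selects 2023 records with the pattern `'__-__-2023'`. Assuming standard SQL `like` semantics, each `_` matches exactly one character, so the pattern only matches dates written as `MM-DD-2023` with dashes. Here is a small sketch using an in-memory SQLite table as a stand-in; the table name `FCES_PCES_DEMO` and the rows are made up, and the ECHO database itself may behave differently in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table "FCES_PCES_DEMO" ("PGM_SYS_ID" text, "ACTUAL_END_DATE" text)')
demo_rows = [
    ("A1", "02-15-2023"),  # dashes, 2023: matches
    ("A2", "12-31-2022"),  # wrong year: no match
    ("A3", "06-28-2023"),  # dashes, 2023: matches
    ("A4", "06/28/2023"),  # slashes instead of dashes: no match
]
conn.executemany('insert into "FCES_PCES_DEMO" values (?, ?)', demo_rows)

query = ('select "PGM_SYS_ID" from "FCES_PCES_DEMO" '
         'where "ACTUAL_END_DATE" like \'__-__-2023\' order by "PGM_SYS_ID"')
matched = [row[0] for row in conn.execute(query)]
conn.close()
print(matched)
```

This is why the separator in the pattern has to agree with how each table stores its dates: the air inspection and violation tables shown here use dashes, while the other tables queried in this notebook use slashes.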


In [None]:
# Number of inspections in 2023 per 1000 regulated facilities
air_inspections_metric = formatter((air_inspections.shape[0] / air_fac) * 1000) # Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["CAA"] = air_inspections_metric
display(HTML("<h3>"+ air_inspections_metric +" inspections per 1000 facilities</h3>"))
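
As a worked example of the rate used throughout this notebook, here is a minimal sketch with made-up numbers. The function name `rate_per_1000` is hypothetical; the notebook itself computes the rate inline and passes it to `formatter`.

```python
def rate_per_1000(event_count, facility_count):
    """Return events per 1000 facilities as a string with two decimals."""
    return "{:0.2f}".format((event_count / facility_count) * 1000)

# Made-up numbers: 50 inspections across 2000 regulated facilities.
example_rate = rate_per_1000(50, 2000)
print(example_rate + " inspections per 1000 facilities")
```

Note that the result is a string, not a number, so it can be dropped straight into the HTML heading but would need converting back with `float()` before any further arithmetic.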

## Violations of the Clean Air Act in 2023



In [None]:
air_violations = None
try:
    sql = 'select * from "ICIS-AIR_VIOLATION_HISTORY" where "EARLIEST_FRV_DETERM_DATE" like \'__-__-2023\' or "HPV_DAYZERO_DATE" like \'__-__-2023\''

    air_violations = get_echo_data( sql, "pgm_sys_id" )

    # Optional: remove "FACIL" violations, which are paperwork violations according to: https://19january2017snapshot.epa.gov/sites/production/files/2013-10/documents/frvmemo.pdf
    # air_violations = air_violations.loc[(air_violations["POLLUTANT_DESCS"]!="FACIL")]
except EmptyDataError:
    print( "No data found")
air_violations

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,AGENCY_TYPE_DESC,STATE_CODE,AIR_LCON_CODE,COMP_DETERMINATION_UID,ENF_RESPONSE_POLICY_CODE,PROGRAM_CODES,PROGRAM_DESCS,POLLUTANT_CODES,POLLUTANT_DESCS,EARLIEST_FRV_DETERM_DATE,HPV_DAYZERO_DATE,HPV_RESOLVED_DATE
0,CT0000000900308899,3603697029,State,CT,,CT000A102337,FRV,CAASIP,State Implementation Plan for National Primary...,300000329,FACIL,06-20-2023,,
1,CT0000000900309016,3603578518,State,CT,,CT000A100134,FRV,CAANSR,New Source Review Permit Requirements,300000243,VOLATILE ORGANIC COMPOUNDS (VOCS),04-19-2023,,
2,IL000119813AAI,3603596439,State,IL,,IL000AA-2022-00162,FRV,CAASIP CAATVP,State Implementation Plan for National Primary...,300000319 300000320,PARTICULATE MATTER < 10 UM PARTICULATE MATTER ...,02-08-2023,,
3,IN0000001803900097,3603489144,State,IN,,IN000A98533,FRV,CAATVP,Title V Permits,300000329,FACIL,01-25-2023,,
4,IN0000001803300043,3603672177,State,IN,,IN000A101809,HPV,CAATVP,Title V Permits,300000036,Mercury,05-22-2023,08-20-2023,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3391,OH0000001318008977,3603753861,State,OH,,OH000AC13785,FRV,CAASIP,State Implementation Plan for National Primary...,300000329,FACIL,10-16-2023,,
3392,OH0000000247102007,3603748298,State,OH,,OH000AC13780,HPV,CAAFESOP CAASIP,Federally-Enforceable State Operating Permit -...,300000329,FACIL,10-03-2023,10-03-2023,
3393,MI00000000000P1318,3603752497,State,MI,,MI000AP1318CF0000077251,FRV,CAANSR CAASIP,New Source Review Permit Requirements State Im...,300000329,FACIL,08-22-2023,,
3394,PA000860825,3603753520,State,PA,,PA000A0000F00000003567289,FRV,CAASIP,State Implementation Plan for National Primary...,300000329,FACIL,06-06-2023,,


In [None]:
# Number of high priority and federally reportable violations per 1000 regulated facilities
air_violations_metric = formatter((air_violations.shape[0] / air_fac) * 1000) # Divide by regulated facilities and multiply by desired rate (per 1000)
violations["CAA"] = air_violations_metric
display(HTML("<h3>"+air_violations_metric+" violations per 1000 facilities </h3>"))

## Formal Enforcement Actions and Penalties under the Clean Air Act in 2023

In [None]:
air_enforcements = None
try:
    sql = 'select * from "ICIS-AIR_FORMAL_ACTIONS" where "SETTLEMENT_ENTERED_DATE" like \'__/__/2023\''

    air_enforcements = get_echo_data( sql, "pgm_sys_id" )
except EmptyDataError:
    print( "No data found")
air_enforcements

Unnamed: 0,PGM_SYS_ID,ACTIVITY_ID,ENF_IDENTIFIER,ACTIVITY_TYPE_CODE,ACTIVITY_TYPE_DESC,STATE_EPA_FLAG,ENF_TYPE_CODE,ENF_TYPE_DESC,SETTLEMENT_ENTERED_DATE,PENALTY_AMOUNT
0,ID0000001605500122,3603585412,ID000A200223766,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,04/27/2023,2250.0
1,TX0000004835500003,3602956335,TX000A389500382021341,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,08/07/2023,13563.0
2,CABAA00006013B1911,3602109137,CABAAA200182432,AFR,Administrative - Formal,L,SCAAAO,Administrative Order,01/05/2023,0.0
3,IL000037803AAF,3602227606,IL000ACM20200630105737114,JDC,Judicial,S,CIV,Civil Judicial Action,06/01/2023,75000.0
4,TX0000004836100014,3602226809,TX000A149515062020176,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,03/13/2023,13200.0
...,...,...,...,...,...,...,...,...,...,...
2047,IN0000001803900283,3603721887,IN000A200228607,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,09/05/2023,12400.0
2048,CASJV00006099N1662,3603718642,CASJVA3000000000000011868,AFR,Administrative - Formal,L,SCAAAO,Administrative Order,07/31/2023,0.0
2049,IN0000001809100163,3603747719,IN000A200229176,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,09/18/2023,500.0
2050,OK0000004004500604,3603721543,OK000A4004500604E00130399,AFR,Administrative - Formal,S,SCAAAO,Administrative Order,09/14/2023,3500.0


In [None]:
# Number of formal actions in 2023 per violation
air_enforcements_metric = formatter(air_enforcements.shape[0]/air_violations.shape[0]) # Formal actions divided by number of violations
enforcements["CAA"] = air_enforcements_metric
display(HTML("<h3>"+air_enforcements_metric +" formal enforcement actions per violation</h3>"))

In [None]:
# Penalties each year per violating facility
air_penalties = air_enforcements.loc[air_enforcements["PENALTY_AMOUNT"]>0]
air_penalties_metric = formatter(sum(air_penalties["PENALTY_AMOUNT"]) / len(air_violations["PGM_SYS_ID"].unique())) #Divide the sum of penalties by number of violating facilities
air_penalties_max = formatter(max(air_penalties["PENALTY_AMOUNT"]))
air_penalties_min = formatter(min(air_penalties["PENALTY_AMOUNT"]))
penalties["CAA"] = air_penalties_metric
display(HTML("<h3>$"+air_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+air_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+air_penalties_min +"</h3>"))
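
The penalty metric above divides total penalties by the number of distinct facilities with a violation, not by the number of violation records. A small sketch with made-up data (the facility IDs and amounts are invented) shows the difference:

```python
import pandas as pd

# Facility F1 has two violation records, so there are 4 rows but 3 facilities.
violations_demo = pd.DataFrame({"PGM_SYS_ID": ["F1", "F1", "F2", "F3"]})
enforcements_demo = pd.DataFrame({
    "PGM_SYS_ID": ["F1", "F2", "F3"],
    "PENALTY_AMOUNT": [1000.0, 0.0, 500.0],
})

# Keep only actions that carried a penalty, as the cell above does.
penalized = enforcements_demo.loc[enforcements_demo["PENALTY_AMOUNT"] > 0]
violating_facilities = violations_demo["PGM_SYS_ID"].nunique()
per_facility = penalized["PENALTY_AMOUNT"].sum() / violating_facilities
print(violating_facilities, per_facility)
```

Dividing by `violations_demo.shape[0]` instead would give a lower figure, because facilities with several violations would be counted more than once.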

---

## Clean Water Act inspections in 2023

In [None]:
# Find Clean Water Act (NPDES) inspections that ended in 2023
water_inspections = None
try:
    sql = 'select "NPDES_ID", "REGISTRY_ID", "ACTUAL_END_DATE", "STATE_EPA_FLAG"' + \
        ' from "NPDES_INSPECTIONS" where "ACTUAL_END_DATE" like \'__/__/2023\''

    water_inspections = get_echo_data( sql, "NPDES_ID" )
except EmptyDataError:
    print( "No data found")
water_inspections

NPDES_ID,REGISTRY_ID,ACTUAL_END_DATE,STATE_EPA_FLAG
FLR05G764,110000365603,04/18/2023,S
PAC480072,110070563506,05/19/2023,S
VTS006835,110071221209,05/10/2023,S
UTRC04044,110071181513,04/28/2023,S
FLR20CY73,110058278091,05/05/2023,S
...,...,...,...
FLG110637,110033170226,05/08/2023,S
OH0123595,110000840225,09/27/2023,S
OH0136247,110027372714,08/29/2023,S
VAG750207,110070563313,09/14/2023,S


In [None]:
# Number of inspections in 2023 per 1000 regulated facilities
water_inspections_metric = formatter((water_inspections.shape[0] / water_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["CWA"] = water_inspections_metric
display(HTML("<h3>"+water_inspections_metric +" inspections per 1000 facilities</h3>"))

## Violations of the Clean Water Act in 2023

In [None]:
# Find facilities with water permit violations
water_violations = None
try:
    sql = 'select * from "NPDES_QNCR_HISTORY" where "YEARQTR" = 20231 or "YEARQTR" = 20232 or "YEARQTR" = 20233 or "YEARQTR" = 20234'
    water_violations = get_echo_data( sql, "NPDES_ID" )
except EmptyDataError:
    print( "No data found")
water_violations

NPDES_ID,YEARQTR,HLRNC,NUME90Q,NUMCVDT,NUMSVCD,NUMPSCH
AK0001058,20231,,0,0,0,3
AK0001058,20232,,0,0,0,3
AK0001058,20233,,0,0,0,3
AK0000272,20234,,1,0,0,0
AK0000345,20231,,0,0,0,2
...,...,...,...,...,...,...
WY0096113,20231,R,0,0,2,0
WY0096113,20232,V,0,0,2,0
WY0096113,20233,V,1,0,2,0
WY0096113,20234,V,0,0,0,0


In [None]:
# Number of violations each year per 1000 regulated facilities
# Sum violations
water_violations["Sum"] = water_violations["NUME90Q"] + water_violations["NUMCVDT"] + water_violations["NUMSVCD"] + water_violations["NUMPSCH"]
water_violations_metric = formatter((np.sum(water_violations["Sum"]) / water_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
violations["CWA"] = water_violations_metric
display(HTML("<h3>"+water_violations_metric+" violations per 1000 facilities</h3>"))
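
In the quarterly table above, each row is one permit in one quarter, and the four `NUM...` columns hold counts. So the number of rows, the total number of violations, and the number of violating permits are three different things. Here is a sketch with invented permit IDs and counts that makes the distinction concrete:

```python
import pandas as pd

count_columns = ["NUME90Q", "NUMCVDT", "NUMSVCD", "NUMPSCH"]
# Invented data: DEMO00001 appears in two quarters, DEMO00003 has no violations.
qncr_demo = pd.DataFrame(
    [[0, 0, 0, 3], [1, 0, 0, 0], [0, 0, 2, 0], [0, 0, 0, 0]],
    columns=count_columns,
    index=pd.Index(["DEMO00001", "DEMO00001", "DEMO00002", "DEMO00003"], name="NPDES_ID"),
)

# Row-wise total across the four count columns.
qncr_demo["Sum"] = qncr_demo[count_columns].sum(axis=1)

n_rows = qncr_demo.shape[0]
total_violations = int(qncr_demo["Sum"].sum())
violating_permits = qncr_demo.loc[qncr_demo["Sum"] > 0].index.nunique()
print(n_rows, total_violations, violating_permits)
```

Summing the columns with `sum(axis=1)` is equivalent to adding them one by one as the cell above does; it is only a shorter way to write it.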

## Enforcement Actions and Penalties under the Clean Water Act in 2023

In [None]:
# Find formal enforcement actions under the Clean Water Act with settlements entered in 2023
water_enforcements = None
try:
    sql = 'select "NPDES_ID", "AGENCY", "ENF_TYPE_DESC", "SETTLEMENT_ENTERED_DATE", "FED_PENALTY_ASSESSED_AMT", "STATE_LOCAL_PENALTY_AMT"' + \
        ' from "NPDES_FORMAL_ENFORCEMENT_ACTIONS" where "SETTLEMENT_ENTERED_DATE" like \'__/__/2023\''

    water_enforcements = get_echo_data( sql, "NPDES_ID" )
except EmptyDataError:
    print( "No data found")
water_enforcements

NPDES_ID,AGENCY,ENF_TYPE_DESC,SETTLEMENT_ENTERED_DATE,FED_PENALTY_ASSESSED_AMT,STATE_LOCAL_PENALTY_AMT
TXR05CG16,EPA,Civil Judicial Action,09/14/2023,483064.0,51936.00
PAR803556,EPA,Civil Judicial Action,09/14/2023,483064.0,51936.00
PAR800080,EPA,Civil Judicial Action,09/14/2023,483064.0,51936.00
PAR802229,EPA,Civil Judicial Action,09/14/2023,483064.0,51936.00
ILR001339,EPA,Civil Judicial Action,09/14/2023,483064.0,51936.00
...,...,...,...,...,...
NC0021181,State,State CWA Penalty AO,10/16/2023,,2933.03
NC0024881,State,State CWA Penalty AO,10/26/2023,,1053.14
OKR040021,State,State CWA Non Penalty AO,10/31/2023,,
NYR00G297,State,State Administrative Order of Consent,09/25/2023,,2600.00


In [None]:
# Number of formal actions in 2023 per violation record
water_enforcements_metric = formatter(water_enforcements.shape[0]/water_violations.shape[0]) # Formal actions divided by number of rows in the quarterly table (one row per permit per quarter)
enforcements["CWA"] = water_enforcements_metric
display(HTML("<h3>"+water_enforcements_metric +" formal enforcement actions per violation</h3>"))

In [None]:
# Penalties each year per violating facility
# Find violating facilities (not all in NPDES QNCR are violating...)
water_violators = water_violations.loc[water_violations["Sum"]>0]
water_violators = len(water_violators.index.unique())
water_enforcements["StateLocalFedFines"] = water_enforcements["FED_PENALTY_ASSESSED_AMT"].fillna(0) + water_enforcements["STATE_LOCAL_PENALTY_AMT"].fillna(0)
water_penalties = water_enforcements.loc[water_enforcements["StateLocalFedFines"]>0]
water_penalties_metric = formatter(sum(water_penalties["StateLocalFedFines"]) / water_violators) #Divide the sum of penalties by number of violating facilities
water_penalties_max = formatter(max(water_penalties["StateLocalFedFines"]))
water_penalties_min = formatter(min(water_penalties["StateLocalFedFines"]))
penalties["CWA"] = water_penalties_metric
display(HTML("<h3>$"+water_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+water_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+water_penalties_min +"</h3>"))
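
The `fillna(0)` calls in the cell above matter because adding a missing value to a number gives a missing value in pandas. A short sketch, borrowing three penalty combinations from the table above, shows what would happen without them:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "FED_PENALTY_ASSESSED_AMT": [483064.0, np.nan, np.nan],
    "STATE_LOCAL_PENALTY_AMT": [51936.0, 2933.03, np.nan],
})

# Without fillna: any row with a missing amount becomes NaN.
naive_total = demo["FED_PENALTY_ASSESSED_AMT"] + demo["STATE_LOCAL_PENALTY_AMT"]

# With fillna(0): missing amounts count as zero, so state-only penalties survive.
filled_total = (demo["FED_PENALTY_ASSESSED_AMT"].fillna(0)
                + demo["STATE_LOCAL_PENALTY_AMT"].fillna(0))

print(naive_total.tolist())
print(filled_total.tolist())
```

Without the fill, the state-only penalty in the second row would silently drop out of the `> 0` filter and the totals.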

## RCRA inspections in 2023

In [None]:
# Find RCRA evaluations (inspections) that started in 2023
waste_inspections = None
try:
    sql = 'select * from "RCRA_EVALUATIONS" where "EVALUATION_START_DATE" like \'__/__/2023\''

    waste_inspections = get_echo_data( sql, "ID_NUMBER" )
except EmptyDataError:
    print( "No data found")
waste_inspections

ID_NUMBER,ACTIVITY_LOCATION,EVALUATION_IDENTIFIER,EVALUATION_TYPE,EVALUATION_DESC,EVALUATION_AGENCY,EVALUATION_START_DATE,FOUND_VIOLATION
NVR000033738,NV,001,CEI,COMPLIANCE EVALUATION INSPECTION,S,06/29/2023,N
IDR000205732,ID,001,CAV,COMPLIANCE ASSISTANCE VISIT,S,06/27/2023,N
TND982165136,TN,001,CEI,COMPLIANCE EVALUATION INSPECTION,S,06/28/2023,Y
NYR000076901,NY,001,CEI,COMPLIANCE EVALUATION INSPECTION,S,06/30/2023,Y
WVD988786455,WV,001,CEI,COMPLIANCE EVALUATION INSPECTION,S,04/04/2023,Y
...,...,...,...,...,...,...,...
TXD007325087,TX,2,CEI,COMPLIANCE EVALUATION INSPECTION,S,08/17/2023,N
KYR000032888,KY,001,CEI,COMPLIANCE EVALUATION INSPECTION,S,10/03/2023,Y
NCD980842132,NC,395,FCI,FOCUSED COMPLIANCE INSPECTION,S,10/19/2023,N
NCD980842132,NC,395,FCI,FOCUSED COMPLIANCE INSPECTION,S,10/31/2023,N


In [None]:
# Number of inspections in 2023 per 1000 regulated facilities
waste_inspections_metric = formatter((waste_inspections.shape[0] / waste_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
inspections["RCRA"] = waste_inspections_metric
display(HTML("<h3>"+waste_inspections_metric+" inspections per 1000 facilities</h3>"))

## Violations of RCRA in 2023

In [None]:
# Find RCRA violations determined in 2023
waste_violations = None
try:
    sql = 'select * from "RCRA_VIOLATIONS" where "DATE_VIOLATION_DETERMINED" like \'__/__/2023\''

    waste_violations = get_echo_data( sql, "ID_NUMBER" )
except EmptyDataError:
    print( "No data found")
waste_violations

ID_NUMBER,ACTIVITY_LOCATION,VIOLATION_TYPE,VIOLATION_TYPE_DESC,VIOL_DETERMINED_BY_AGENCY,DATE_VIOLATION_DETERMINED,ACTUAL_RTC_DATE,SCHEDULED_COMPLIANCE_DATE
MID005340161,MI,273.B,Standards for Universal Waste Management: Stan...,S,05/25/2023,,
CAL000403861,CA,262.B,Standards Applicable to Generators of HW: Mani...,S,07/27/2023,07/28/2023,
CAR000245050,CA,263.A,Standards Applicable to Transporters of HW: Ge...,S,05/31/2023,07/11/2023,
MID017053844,MI,XXS,State Statutory or Regulatory requirements tha...,S,04/20/2023,05/17/2023,
MIK568288583,MI,XXS,State Statutory or Regulatory requirements tha...,S,05/10/2023,06/01/2023,
...,...,...,...,...,...,...,...
WI0000361501,WI,262.M,Standards Applicable to Generators of HW: Prep...,S,02/28/2023,04/26/2023,
FLR000136523,FL,279.C,Standards for Used Oil: Generators,S,02/07/2023,05/18/2023,
FLR000143891,FL,PCR,Violation of a permit condition or requirement,S,02/22/2023,03/17/2023,
FLR000073296,FL,262.A,Standards Applicable to Generators of HW: General,S,02/09/2023,05/18/2023,


In [None]:
# Number of violations in 2023 per 1000 regulated facilities
waste_violations_metric = formatter((waste_violations.shape[0] / waste_fac) * 1000) #Divide by regulated facilities and multiply by desired rate (per 1000)
violations["RCRA"] = waste_violations_metric
display(HTML("<h3>"+waste_violations_metric+" violations per 1000 facilities</h3>"))

## Enforcement Actions and Penalties under RCRA in 2023

In [None]:
# Find facilities with enforcement actions
waste_enforcements = None
try:
    sql = 'select * from "RCRA_ENFORCEMENTS" where "ENFORCEMENT_ACTION_DATE" like \'__/__/2023\''

    waste_enforcements = get_echo_data( sql, "ID_NUMBER" )
except EmptyDataError:
    print( "No data found")
waste_enforcements

ID_NUMBER,ACTIVITY_LOCATION,ENFORCEMENT_IDENTIFIER,ENFORCEMENT_TYPE,ENFORCEMENT_DESC,ENFORCEMENT_AGENCY,ENFORCEMENT_ACTION_DATE,PMP_AMOUNT,FMP_AMOUNT,FSC_AMOUNT,SCR_AMOUNT
KYR000006874,KY,001,HQ120,WRITTEN INFORMAL,S,01/09/2023,,,,
INR000130088,IN,003,HQ120,WRITTEN INFORMAL,S,10/17/2023,,,,
ALR000062588,AL,001,HQ120,WRITTEN INFORMAL,S,05/01/2023,,,,
IAD078096732,IA,001,HQ305,3008(a) EXPEDITED SETTLEMENT AGREEMENT,E,06/20/2023,,8750.0,,
SCR000776526,SC,001,HQ310,FINAL 3008(A) COMPLIANCE ORDER,S,05/23/2023,,9000.0,,
...,...,...,...,...,...,...,...,...,...,...
INT190014381,IN,002,HQ120,WRITTEN INFORMAL,S,10/17/2023,,,,
OHR000221846,OH,001,HQ120,WRITTEN INFORMAL,S,04/24/2023,,,,
DCR000502682,DC,001,HQ120,WRITTEN INFORMAL,S,10/05/2023,,,,
NYD986909026,NY,001,HQ120,WRITTEN INFORMAL,S,02/15/2023,,,,


In [None]:
# Number of enforcement actions each year per violation
waste_enforcements_metric = formatter(waste_enforcements.shape[0] / waste_violations.shape[0])
enforcements["RCRA"] = waste_enforcements_metric
display(HTML("<h3>"+waste_enforcements_metric+" enforcement actions per violation</h3>"))

In [None]:
# Penalties each year per violating facility
waste_penalties = waste_enforcements.loc[waste_enforcements["FMP_AMOUNT"]>0]
waste_penalties_metric = formatter(sum(waste_penalties["FMP_AMOUNT"]) / len(waste_violations.index.unique())) #Divide by violating facilities
waste_penalties_max = formatter(max(waste_penalties["FMP_AMOUNT"]))
waste_penalties_min = formatter(min(waste_penalties["FMP_AMOUNT"]))
penalties["RCRA"] = waste_penalties_metric
display(HTML("<h3>$"+waste_penalties_metric +" per facility in violation</h3>"))
display(HTML("<h3>Max: $"+waste_penalties_max +"</h3>"))
display(HTML("<h3>Min: $"+waste_penalties_min +"</h3>"))

## Greenhouse Gas Emissions in 2022 (latest data available)

In [None]:
# Find GHG emissions
ghg_emissions = None
try:
    sql = 'select * from "POLL_RPT_COMBINED_EMISSIONS" where "REPORTING_YEAR" = \'2022\' and "PGM_SYS_ACRNM" = \'E-GGRT\''

    ghg_emissions = get_echo_data( sql)
except EmptyDataError:
    print( "No data found")
ghg_emissions

Unnamed: 0,REPORTING_YEAR,REGISTRY_ID,PGM_SYS_ACRNM,PGM_SYS_ID,POLLUTANT_NAME,ANNUAL_EMISSION,UNIT_OF_MEASURE,NEI_TYPE,NEI_HAP_VOC_FLAG
0,2022,110030735916,E-GGRT,1008735,Methane,17.750,MTCO2e,,
1,2022,110070716075,E-GGRT,1013663,Carbon dioxide,23.100,MTCO2e,,
2,2022,110070716075,E-GGRT,1013663,Methane,20054.500,MTCO2e,,
3,2022,110017864890,E-GGRT,1013537,Carbon dioxide,29490.500,MTCO2e,,
4,2022,110017864890,E-GGRT,1013537,Methane,14.000,MTCO2e,,
...,...,...,...,...,...,...,...,...,...
21457,2022,110063648904,E-GGRT,1007691,Methane,7476.750,MTCO2e,,
21458,2022,110063648904,E-GGRT,1007691,Carbon dioxide,126.400,MTCO2e,,
21459,2022,110028047209,E-GGRT,1004969,Nitrous oxide,5725.176,MTCO2e,,
21460,2022,110028047209,E-GGRT,1004969,Carbon dioxide,158599.900,MTCO2e,,


In [None]:
# Emissions in 2022 per facility
ghg_emissions_metric = formatter(np.nansum(ghg_emissions["ANNUAL_EMISSION"]) / len(ghg_emissions["REGISTRY_ID"].unique())) #Divide by reporting facility
ghg_emissions_fac = ghg_emissions.groupby("PGM_SYS_ID")[["ANNUAL_EMISSION"]].sum() # Group by facility
ghg_emissions_max = formatter(np.nanmax(ghg_emissions_fac["ANNUAL_EMISSION"]))
ghg_emissions_min = formatter(np.nanmin(ghg_emissions_fac.loc[ghg_emissions_fac["ANNUAL_EMISSION"]>0]["ANNUAL_EMISSION"]))
emissions["GHG"] = ghg_emissions_metric
display(HTML("<h3>"+ghg_emissions_metric+" MTCO2e (metric tons of carbon dioxide equivalent) emissions per reporting facility</h3>"))
display(HTML("<h3>Max: "+ghg_emissions_max+" MTCO2e (metric tons of carbon dioxide equivalent) emissions</h3>"))
display(HTML("<h3>Min: "+ghg_emissions_min+" MTCO2e (metric tons of carbon dioxide equivalent) emissions</h3>"))
ghg_emissions_fac

PGM_SYS_ID,ANNUAL_EMISSION
1000001,464785.558
1000002,115616.900
1000003,79156.516
1000005,76626.784
1000007,16693.588
...,...
1014727,53616.084
1014728,25935.722
1014729,130774.538
1014730,123149.272
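
Because each facility reports one row per pollutant, the per-facility figures above come from grouping those rows first. Here is a minimal sketch with invented facility IDs and amounts. For simplicity it uses `PGM_SYS_ID` for both the grouping and the facility count, whereas the cell above counts facilities by `REGISTRY_ID`.

```python
import pandas as pd

ghg_demo = pd.DataFrame({
    "PGM_SYS_ID": [1001, 1001, 1002, 1003],
    "POLLUTANT_NAME": ["Methane", "Carbon dioxide", "Methane", "Carbon dioxide"],
    "ANNUAL_EMISSION": [10.0, 90.0, 50.0, 0.0],
})

# One total per facility, adding up its pollutants.
per_facility = ghg_demo.groupby("PGM_SYS_ID")["ANNUAL_EMISSION"].sum()

mean_per_facility = ghg_demo["ANNUAL_EMISSION"].sum() / ghg_demo["PGM_SYS_ID"].nunique()
largest = per_facility.max()
# Ignore facilities that reported zero when looking for the smallest emitter.
smallest_positive = per_facility[per_facility > 0].min()
print(mean_per_facility, largest, smallest_positive)
```

Taking the maximum over the raw rows instead would report the largest single pollutant entry rather than the largest facility.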


# Data Export

In [None]:
data = [inspections,
violations,
enforcements,
penalties,
emissions]

units = ["#inspections per 1000",
"#violations per 1000",
"#actions per facility in violation",
"$ per facility in violation",
"amount of emissions (metric tons)"]

short_units = ["inspectionsper1000",
"violationsper1000",
"enforcementsperviolatingfacility",
"penaltiesperviolatingfacility",
"emissions2022"]

for index, program in enumerate(data):
    # create dataframe
    df = pd.DataFrame(program, index=[0]).T
    df = df.rename(columns={0: units[index]})
    filename= short_units[index]+"_All_USA_pg4_2023.csv"
    df.to_csv(filename)
    print(df)

     #inspections per 1000
CAA                 194.20
CWA                  95.27
RCRA                 25.40
     #violations per 1000
CAA                 17.41
CWA                856.61
RCRA                15.81
     #actions per facility in violation
CAA                                0.60
CWA                                0.01
RCRA                               0.48
     $ per facility in violation
CAA                    136009.23
CWA                      2002.27
RCRA                     2906.24
    amount of emissions (metric tons)
GHG                         382474.04
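
To check what one of the exported files contains, here is a sketch that builds the inspections table the same way as the loop above, writes it to an in-memory buffer instead of a file, and reads it back. The values are copied from the printed output; the `io.StringIO` buffer is a stand-in for the CSV file on disk.

```python
import io
import pandas as pd

# Metrics are stored as formatted strings, as produced by formatter().
inspections_demo = {"CAA": "194.20", "CWA": "95.27", "RCRA": "25.40"}

# Same steps as the export loop: one row, transposed so programs become the index.
df = pd.DataFrame(inspections_demo, index=[0]).T
df = df.rename(columns={0: "#inspections per 1000"})

buffer = io.StringIO()
df.to_csv(buffer)

# Reading the CSV back turns the strings into floats.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()), index_col=0)
print(round_trip)
```

One thing to be aware of: the values are strings before export and numbers after reading the file back, so trailing zeros such as the one in `194.20` do not survive the round trip.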
