# Facility Data Deep Dive

## Violations, Inspections and Enforcements for Resource Conservation and Recovery Act (RCRA), Clean Water Act (CWA), Clean Air Act (CAA)

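This notebook examines ECHO data on inspections, violations and enforcements by EPA, state and other agencies. It uses the following data sets from the ECHO downloadable files:
<ul>
    <li>ECHO_EXPORTER - Facility information, including the derived FAC_DERIVED_ZIP zip code used to select facilities.</li>
    <li>RCRA_EVALUATIONS, RCRA_VIOLATIONS, RCRA_ENFORCEMENTS</li>
    <li>ICIS_FEC_EPA_INSPECTIONS, ICIS-AIR_VIOLATION_HISTORY, ICIS-AIR_FORMAL_ACTIONS, ICIS-AIR_FCES_PCES</li>
    <li>CASE_FACILITIES, CASE_ENFORCEMENTS</li>
    <li>NPDES_QNCR_HISTORY, NPDES_INSPECTIONS, NPDES_FORMAL_ENFORCEMENT_ACTIONS</li>
</ul>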



You must choose a zip code using the input widget that is provided.

## How to Run
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know, we authored them! It’s okay. Click “Run Anyway” to continue. 
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* **It is important to run cells in order because they depend on each other.**
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!

---
---

# **Let's begin!**

### Hover over the "[ ]" on the top left corner of the cell below and you should see a "play" button appear. Click on it to run the cell then move to the next one.

### Run this next cell to create the widget for inputting your zip code. It will create an input field at the bottom. Enter your zip code and then move on to the next cell.

In [None]:
# Import libraries
import urllib.parse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import folium

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

from IPython.display import display

zip_widget = widgets.IntText(
    value=1,
    description='Zip Code:',
    disabled=False
)
display( zip_widget )
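
`widgets.IntText` holds the zip code as an integer, so a zip code that starts with zero (02134, for example) is stored as 2134. If the five-character form is needed for display or for a text comparison, it could be rebuilt along these lines. `format_zip` is a hypothetical helper, not part of this notebook, and whether the query needs the padded form depends on how FAC_DERIVED_ZIP is stored, which is not stated here.

```python
def format_zip(value):
    """Return a zip code as a five-character, zero-padded string."""
    number = int(value)
    if not 0 < number <= 99999:
        raise ValueError(f"not a five-digit zip code: {value!r}")
    return f"{number:05d}"

print(format_zip(2134))   # 02134
print(format_zip(11794))  # 11794
```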

### Run this cell after choosing a zip code. It will pull facility information from the ECHO_EXPORTER table.

In [None]:
my_zip = zip_widget.value

sql = "select REGISTRY_ID, FAC_NAME, FAC_LAT, FAC_LONG, RCRA_IDS, AIR_IDS, NPDES_IDS," + \
    "RCRA_PERMIT_TYPES, CWA_PERMIT_TYPES, CAA_PERMIT_TYPES, FAC_DERIVED_ZIP, " + \
    "AIR_FLAG, NPDES_FLAG, SDWIS_FLAG, RCRA_FLAG, TRI_FLAG, GHG_FLAG " + \
    "from ECHO_EXPORTER where FAC_DERIVED_ZIP = " + str( my_zip )

url='http://apps.tlt.stonybrook.edu/echoepa/?query='
data_location=url+urllib.parse.quote(sql)
print(sql)
# print(data_location)

echo_data = pd.read_csv(data_location,encoding='iso-8859-1')
echo_data.set_index( 'REGISTRY_ID', inplace=True )
num_facilities = echo_data.shape[0]

print("\nThere are %s EPA facilities in Zip code %s tracked in the ECHO database." \
      %(num_facilities, my_zip))
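
Every query in this notebook is sent the same way: the SQL text is percent-encoded and appended to the query URL. The sketch below isolates that step. `build_query_url` is a hypothetical helper name and the zip code in the example is only an illustration.

```python
import urllib.parse

BASE_URL = "http://apps.tlt.stonybrook.edu/echoepa/?query="

def build_query_url(sql, base_url=BASE_URL):
    """Percent-encode the SQL text and append it to the query endpoint."""
    return base_url + urllib.parse.quote(sql)

# Example value only; any zip code chosen in the widget would be used the same way.
example_sql = "select FAC_NAME from ECHO_EXPORTER where FAC_DERIVED_ZIP = 11794"
example_url = build_query_url(example_sql)
print(example_url)
```

Spaces become `%20` and the equals sign becomes `%3D`, so the whole statement travels as a single query-string value.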


## Create an interactive map of all of the facilities in the area that report to EPA
### Once you run this cell, a map should appear. You can zoom in and out, or click on facilities to get their names and the EPA programs that monitor them.

The EPA program acronyms are:
<ul>
    <li>CAA = Clean Air Act</li>
    <li>CWA = Clean Water Act</li>
    <li>SDWIS = Safe Drinking Water Information System</li>
    <li>RCRA = Resource Conservation and Recovery Act</li>
    <li>TRI = Toxics Release Inventory</li>
    <li>GHG = Greenhouse Gas</li>
</ul>
The map won't display if there are too many markers, so only the first 200 are shown.

In [None]:
# [To do:  Use some ECHO_EXPORTER data to pick the top 200 facilities (by some measure) 
# to map.]

# Let's show a quick map of your area and the facilities in it
# To-do:  Add some more ECHO_EXPORTER information in the markers.

# Put some information with the marker to show the programs that track the facility.
def marker_text( row ):
    text = row["FAC_NAME"] + ' - '
    if ( row['AIR_FLAG'] == 'Y' ):
        text += 'CAA, ' 
    if ( row['NPDES_FLAG'] == 'Y' ):
        text += 'CWA, ' 
    if ( row['SDWIS_FLAG'] == 'Y' ):
        text += 'SDWIS, ' 
    if ( row['RCRA_FLAG'] == 'Y' ):
        text += 'RCRA, ' 
    if ( row['TRI_FLAG'] == 'Y' ):
        text += 'TRI, ' 
    if ( row['GHG_FLAG'] == 'Y' ):
        text += 'GHG, ' 
    return text
    
def mapper(df):
    # Initialize the map
    center = [df["FAC_LAT"].mean(), df["FAC_LONG"].mean()]
    m = folium.Map(
        location = center,
        zoom_start = 11
    )
    print( center )
    print( len( df ))

    # Add a clickable marker for each facility
    i = 0
    for index, row in df.iterrows():
        # Skip rows whose FAC_NAME is NaN, which pandas stores as a float.
        if ( type( row['FAC_NAME'] ) == str ):
            folium.Marker(
                location = [row["FAC_LAT"], row["FAC_LONG"]],
                popup = marker_text( row )).add_to(m)
            i += 1
        if ( i >= 200 ):    # The map won't display with too many markers.
            break

    # Show the map
    return m

map_of_facilities_in_cd = mapper(echo_data)
map_of_facilities_in_cd
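
The marker text built above ends with a trailing ", " after the last program. One way to avoid that is to collect the program acronyms in a list and join them, as in this sketch. `PROGRAM_FLAGS` and `program_label` are hypothetical names, and the example row is made up for illustration.

```python
# Flag column in ECHO_EXPORTER paired with the acronym shown on the map.
PROGRAM_FLAGS = [
    ("AIR_FLAG", "CAA"),
    ("NPDES_FLAG", "CWA"),
    ("SDWIS_FLAG", "SDWIS"),
    ("RCRA_FLAG", "RCRA"),
    ("TRI_FLAG", "TRI"),
    ("GHG_FLAG", "GHG"),
]

def program_label(row):
    """Return 'facility name - program, program' with no trailing comma."""
    programs = [name for flag, name in PROGRAM_FLAGS if row.get(flag) == "Y"]
    return f"{row['FAC_NAME']} - {', '.join(programs)}"

# Made-up row; a pandas Series from echo_data supports the same .get() lookup.
example_row = {"FAC_NAME": "EXAMPLE PLANT", "AIR_FLAG": "Y", "NPDES_FLAG": "N", "RCRA_FLAG": "Y"}
print(program_label(example_row))  # EXAMPLE PLANT - CAA, RCRA
```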

## Choose a facility
### Run the cell below, then choose a facility from the dropdown that appears to delve deeper into data on inspections, violations and enforcements at that facility.

In [None]:
cd_array = echo_data["FAC_NAME"].dropna().unique()
cd_array.sort()

fac_widget=widgets.Dropdown(
    options=cd_array,
    description='Facility Name:',
    disabled=False,
)
display(fac_widget)

### The next cells filter the program-specific IDs to get just records for the selected facility.
Resulting dataframes are:
<ul>
    <li>rcra_insp_data - RCRA Inspections</li>
    <li>rcra_viol_data - RCRA Violations</li>
    <li>rcra_enf_data - RCRA Enforcements</li>
    <li>air_insp_data - CAA Inspections</li>
    <li>air_viol_data - CAA Violations</li>
    <li>air_enf_data - CAA Enforcements</li>
    <li>air_formal_data - CAA Formal Enforcements</li>
    <li>air_comp_data - CAA Full and Partial Compliance Evaluations</li>
    <li>water_history_data - CWA QNCR History</li>
    <li>water_insp_data - CWA Inspections</li>
    <li>water_enf_data - CWA Enforcements</li>
</ul>

In [None]:
# Keep track of which data sets are retrieved.
have_rcra_insp = False
have_rcra_viol = False
have_rcra_enf = False
have_air_insp = False
have_air_viol = False
have_air_enf = False
have_air_formal = False
have_air_comp = False
have_water_history = False
have_water_insp = False
have_water_enf = False


my_fac = echo_data[echo_data["FAC_NAME"] == fac_widget.value]
if ( my_fac['RCRA_FLAG'].iloc[0] == 'Y' ):
    rcra_id_string = my_fac['RCRA_IDS'].tolist()
    
    try:
        sql = "select * from `RCRA_EVALUATIONS` where ID_NUMBER in ( '" + \
            "', '".join( rcra_id_string[0].split() ) + "')"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        rcra_insp_data = pd.read_csv(data_location,encoding='iso-8859-1')
        rcra_insp_data.set_index( "ID_NUMBER", inplace=True)
        print( "Data from RCRA_EVALUATIONS stored in rcra_insp_data.")
        # print( rcra_insp_data )
        have_rcra_insp = True
    except pd.errors.EmptyDataError:
        print( "No data for this facility in RCRA_EVALUATIONS.")
    
    try:
        sql = "select * from `RCRA_VIOLATIONS` where ID_NUMBER in ( '" + \
            "', '".join( rcra_id_string[0].split() ) + "')"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        rcra_viol_data = pd.read_csv(data_location,encoding='iso-8859-1')
        rcra_viol_data.set_index( "ID_NUMBER", inplace=True)
        print( "Data from RCRA_VIOLATIONS stored in rcra_viol_data.")
        have_rcra_viol = True
        # print( rcra_viol_data )
    except pd.errors.EmptyDataError:
        print( "No data for this facility in RCRA_VIOLATIONS.")
    
    try:
        sql = "select * from `RCRA_ENFORCEMENTS` where ID_NUMBER in ( '" + \
            "', '".join( rcra_id_string[0].split() ) + "')"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        rcra_enf_data = pd.read_csv(data_location,encoding='iso-8859-1')
        rcra_enf_data.set_index( "ID_NUMBER", inplace=True)
        print( "Data from RCRA_ENFORCEMENTS stored in rcra_enf_data.")
        have_rcra_enf = True
        # print( rcra_enf_data )
    except pd.errors.EmptyDataError:
        print( "No data for this facility in RCRA_ENFORCEMENTS.")
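
Each of the three RCRA queries above turns the space-separated RCRA_IDS string into a quoted SQL `in (...)` list. That step could be factored into one function, sketched here. `build_id_query` is a hypothetical helper and the IDs in the example are placeholders, not real facility IDs.

```python
def build_id_query(table, id_column, id_string):
    """Build a 'select *' query matching any ID in a space-separated string."""
    ids = id_string.split()
    quoted = ", ".join(f"'{one_id}'" for one_id in ids)
    return f"select * from `{table}` where {id_column} in ({quoted})"

# Placeholder IDs for illustration only.
example_query = build_id_query("RCRA_EVALUATIONS", "ID_NUMBER", "XXD000000001 XXD000000002")
print(example_query)
```

The same function would serve the tables keyed on PGM_SYS_ID and NPDES_ID by changing the `table` and `id_column` arguments.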
    

In [None]:
if ( my_fac['AIR_FLAG'].iloc[0] == 'Y' ):
    air_id_string = my_fac['AIR_IDS'].tolist()
    
    try:
        sql = "select * from `ICIS_FEC_EPA_INSPECTIONS` where REGISTRY_ID = '" + \
            str(int( my_fac.index[0] )) + "'"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        air_insp_data = pd.read_csv(data_location,encoding='iso-8859-1')
        air_insp_data.set_index( "REGISTRY_ID", inplace=True)
        print( "Data from ICIS_FEC_EPA_INSPECTIONS stored in air_insp_data.")
        have_air_insp = True
        # print( air_insp_data )
    except pd.errors.EmptyDataError:
        print( "No data for this facility in ICIS_FEC_EPA_INSPECTIONS.")
    
    try:
        sql = "select * from `ICIS-AIR_VIOLATION_HISTORY` where PGM_SYS_ID in ( '" + \
            "', '".join( air_id_string[0].split() ) + "')"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        air_viol_data = pd.read_csv(data_location,encoding='iso-8859-1')
        air_viol_data.set_index( "pgm_sys_id", inplace=True)
        print( "Data from ICIS-AIR_VIOLATION_HISTORY stored in air_viol_data.")
        have_air_viol = True
        # print( air_viol_data )
    except pd.errors.EmptyDataError:
        print( "No data for this facility in ICIS-AIR_VIOLATION_HISTORY.")
    
    try:
        sql = "select * from `CASE_FACILITIES` CF, `CASE_ENFORCEMENTS` CE " + \
            " where CE.HQ_DIVISION = 'AIR' and CE.CASE_NUMBER = CF.CASE_NUMBER and " + \
            " CF.REGISTRY_ID = '" + str(int( my_fac.index[0] )) + "'"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        air_enf_data = pd.read_csv(data_location,encoding='iso-8859-1')
        air_enf_data.set_index( "REGISTRY_ID", inplace=True)
        print( "Data from CASE_ENFORCEMENTS stored in air_enf_data.")
        have_air_enf = True
        # print( air_enf_data )
    except pd.errors.EmptyDataError:
        print( "No data for this facility in CASE_ENFORCEMENTS.")    

    try:
        sql = "select * from `ICIS-AIR_FORMAL_ACTIONS` where PGM_SYS_ID in ( '" + \
            "', '".join( air_id_string[0].split() ) + "')"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        air_formal_data = pd.read_csv(data_location,encoding='iso-8859-1')
        air_formal_data.set_index( "pgm_sys_id", inplace=True)
        print( "Data from ICIS-AIR_FORMAL_ACTIONS stored in air_formal_data.")
        have_air_formal = True
        # print( air_formal_data )
    except pd.errors.EmptyDataError:
        print( "No data for this facility in ICIS-AIR_FORMAL_ACTIONS.")
    
    try:
        sql = "select * from `ICIS-AIR_FCES_PCES` where PGM_SYS_ID in ( '" + \
            "', '".join( air_id_string[0].split() ) + "')"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        air_comp_data = pd.read_csv(data_location,encoding='iso-8859-1')
        air_comp_data.set_index( "pgm_sys_id", inplace=True)
        print( "Data from ICIS-AIR_FCES_PCES stored in air_comp_data.")
        have_air_comp = True
        # print( air_comp_data )
    except pd.errors.EmptyDataError:
        print( "No data for this facility in ICIS-AIR_FCES_PCES.")
    


In [None]:
if ( my_fac['NPDES_FLAG'].iloc[0] == 'Y' ):
    water_id_string = my_fac['NPDES_IDS'].tolist()
    
    try:
        sql = "select * from `NPDES_QNCR_HISTORY` where NPDES_ID in ( '" + \
            "', '".join( water_id_string[0].split() ) + "')"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        water_history_data = pd.read_csv(data_location,encoding='iso-8859-1')
        water_history_data.set_index( "NPDES_ID", inplace=True)
        print( "Data from NPDES_QNCR_HISTORY stored in water_history_data.")
        have_water_history = True
        # print( water_history_data )
    except pd.errors.EmptyDataError:
        print( "No data for this facility in NPDES_QNCR_HISTORY.")
    
    try:
        sql = "select * from `NPDES_INSPECTIONS` where NPDES_ID in ( '" + \
            "', '".join( water_id_string[0].split() ) + "')"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        water_insp_data = pd.read_csv(data_location,encoding='iso-8859-1')
        water_insp_data.set_index( "NPDES_ID", inplace=True)
        print( "Data from NPDES_INSPECTIONS stored in water_insp_data.")
        have_water_insp = True
        # print( water_insp_data )
    except pd.errors.EmptyDataError:
        print( "No data for this facility in NPDES_INSPECTIONS.")

    try:
        sql = "select * from `NPDES_FORMAL_ENFORCEMENT_ACTIONS` where NPDES_ID in ( '" + \
            "', '".join( water_id_string[0].split() ) + "')"
        url='http://apps.tlt.stonybrook.edu/echoepa/?query='
        data_location=url+urllib.parse.quote(sql)
        water_enf_data = pd.read_csv(data_location,encoding='iso-8859-1')
        water_enf_data.set_index( "NPDES_ID", inplace=True)
        print( "Data from NPDES_FORMAL_ENFORCEMENT_ACTIONS stored in water_enf_data.")
        have_water_enf = True
        # print( water_enf_data )
    except pd.errors.EmptyDataError:
        print( "No data for this facility in NPDES_FORMAL_ENFORCEMENT_ACTIONS.")
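
The cells above repeat one pattern for every table: read the CSV response, treat an empty response as "no data", and otherwise index the result by its ID column. A minimal sketch of that pattern as a single function follows. `read_table` is a hypothetical helper, and the in-memory CSV text stands in for the server response so the example runs without a network connection; the NPDES ID shown is a placeholder.

```python
import io
import pandas as pd

def read_table(source, index_column, **read_csv_kwargs):
    """Return a DataFrame indexed by index_column, or None if the source is empty."""
    try:
        df = pd.read_csv(source, **read_csv_kwargs)
    except pd.errors.EmptyDataError:
        return None
    return df.set_index(index_column)

# An empty response raises EmptyDataError inside read_csv, so None comes back.
empty = read_table(io.StringIO(""), "NPDES_ID")

# A response with one placeholder row is indexed by NPDES_ID.
filled = read_table(io.StringIO("NPDES_ID,VALUE\nXX0000001,3\n"), "NPDES_ID")
print(empty is None, filled.index.name)
```

Returning `None` for an empty table would also make the separate `have_...` flags unnecessary, since the result itself records whether data was found.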


## Optionally show the data in one of the data sets.
Select a data set from the dropdown list and run the next cell to view it.
To see additional data, return to the dropdown to choose a new value and run
the cell below this one again.

In [None]:

data_set_dict = {}
if ( have_rcra_insp ):
    data_set_dict["RCRA Inspections"] = rcra_insp_data
if ( have_rcra_viol ):
    data_set_dict["RCRA Violations"] = rcra_viol_data
if ( have_rcra_enf ):
    data_set_dict["RCRA Enforcements"] = rcra_enf_data
if ( have_air_insp ):
    data_set_dict["Air Inspections"] = air_insp_data
if ( have_air_viol ):
    data_set_dict["Air Violations"] = air_viol_data
if ( have_air_enf ):
    data_set_dict["Air Enforcements"] = air_enf_data
if ( have_air_formal ):
    data_set_dict["Air Formal Actions"] = air_formal_data
if ( have_air_comp ):
    data_set_dict["Air Compliance"] = air_comp_data
if ( have_water_history ):
    data_set_dict["Water QNCR History"] = water_history_data
if ( have_water_insp ):
    data_set_dict["Water Inspections"] = water_insp_data
if ( have_water_enf ):
    data_set_dict["Water Enforcements"] = water_enf_data

data_set_widget=widgets.Dropdown(
    options=list(data_set_dict.keys()),
    description='Data Sets:',
    disabled=False,
)
display(data_set_widget)
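
The chain of `if` statements above could be shortened if each data set variable started as `None` and was replaced only when its query returned data. The sketch below shows that filtering step. `collect_data_sets` is a hypothetical helper, and plain lists stand in for the DataFrames.

```python
def collect_data_sets(candidates):
    """Keep only the entries whose data was actually retrieved (not None)."""
    return {label: data for label, data in candidates.items() if data is not None}

# Placeholder values stand in for DataFrames; None marks a table with no data.
candidates = {
    "RCRA Inspections": ["row 1", "row 2"],
    "RCRA Violations": None,
    "Water Inspections": ["row 1"],
}
data_sets = collect_data_sets(candidates)
print(list(data_sets.keys()))  # ['RCRA Inspections', 'Water Inspections']
```

The test is `is not None` rather than plain truthiness because a pandas DataFrame raises an error when used directly as a boolean.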

In [None]:
print( "Showing %s data for the chosen facility" %( data_set_widget.value ))
data_set_dict[data_set_widget.value] 

## This section optionally saves some of this data to a CSV file in your Google Drive.
The first of the next two cells will open your Google Drive to allow the file to be written there.  Then you must select one of the data sets to write from the dropdown list.  The next cell writes the data to your drive.  You can return to the dropdown to write additional data files.


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Choose which of the selected facility's data sets to write to a CSV file.

In [None]:
data_set_widget2=widgets.Dropdown(
    options=list(data_set_dict.keys()),
    description='Data Sets:',
    disabled=False,
)
display(data_set_widget2)

Choose a data set and run the next cell to write it to your Google Drive.

In [None]:
filename = data_set_widget2.value
fullpath = '/content/drive/My Drive/' + filename
fullpath += '.csv'
data_set_dict[data_set_widget2.value].to_csv( fullpath )
print( "Wrote this data to %s" %(fullpath))
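
The file name above is the data set label alone, so saving the same data set for a second facility overwrites the first file. A possible variation, sketched here, puts the facility name in the file name and replaces spaces and punctuation with underscores. `csv_path` is a hypothetical helper and the facility name is made up.

```python
import re
from pathlib import PurePosixPath

def csv_path(label, facility, folder="/content/drive/My Drive"):
    """Build a CSV path from the facility name and data set label."""
    # Collapse every run of characters other than letters, digits, '_' or '-'.
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", f"{facility} {label}").strip("_")
    return str(PurePosixPath(folder) / f"{safe}.csv")

print(csv_path("Water QNCR History", "EXAMPLE PLANT #2"))
# /content/drive/My Drive/EXAMPLE_PLANT_2_Water_QNCR_History.csv
```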