## Clean Water Act: Enforcement Actions 
### By Congressional District

This notebook examines ECHO data on the National Pollutant Discharge Elimination System (NPDES), which was established under the Clean Water Act to require monitoring and compliance from wastewater treatment plants, factories, and other point sources of water pollution. This notebook uses the ECHO_EXPORTER and NPDES_FORMAL_ENFORCEMENT_ACTIONS tables.

From ECHO_EXPORTER:
<ul>
    <li>NPDES_IDS - to match facilities to enforcement actions in NPDES_FORMAL_ENFORCEMENT_ACTIONS</li>
    <li>FAC_DERIVED_CD113 - 113th congressional district</li>
    <li>FAC_LAT and FAC_LONG - latitude and longitude</li>
    <li>CWA_PERMIT_TYPES</li>
</ul>

CWA Permit Types include:
<ul>
    <li>Major = Publicly Owned Treatment Works (POTW) handling at least 1 million gallons per day, as well as other major projects.</li>
    <li>Minor = Any other project.</li>
</ul>

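From NPDES_FORMAL_ENFORCEMENT_ACTIONS we get:
<ul>
    <li>NPDES_ID - the permit ID used to match each action to a facility</li>
    <li>AGENCY - the agency that took the enforcement action</li>
    <li>ENF_TYPE_DESC - a description of the enforcement action type</li>
    <li>SETTLEMENT_ENTERED_DATE - the date the settlement was entered</li>
</ul>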

A state and congressional district must be chosen using the dropdown
widgets that are provided.

---
---

## How to Run
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know, we authored them! It’s okay. Click “Run Anyway” to continue. 
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* **It is important to run cells in order because they depend on each other.**
* Some cells, like the one shown below, will create a dropdown menu after you run them. Be sure to make a selection (for example, click to change NY to LA) before running the next cell.
![Dropdown menu](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/dropdown.JPG?raw=true)
* Other cells will simply print information when you run them, like this one:
![Simple cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/cell-simple.JPG?raw=true)
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!

---
---

# Let's begin! 
### Hover over the circle on the top left corner of the cell below and you should see a "play" button appear. Click on it to run the cell. 
Doing so will load in some extra code to help us make sense of our ECHO data and when it finishes, you should see your cell grayed out. You can now move on to the next one.

In [None]:
# Import libraries
import urllib.parse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import folium

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from IPython.display import display



### Run this next cell to create a widget for selecting states. It will create a dropdown menu at the bottom. Choose your state from the menu then move on to the next cell.

In [None]:
states = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA", 
          "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", 
          "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", 
          "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", 
          "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]
dropdown_state=widgets.Dropdown(
    options=states,
    value='NY',
    description='State:',
    disabled=False,
)
output_state = widgets.Output()
my_state = ""

def dropdown_state_eventhandler( change ):
    output_state.clear_output()
    value = change.new
    with output_state:
        display( change.new )
            
dropdown_state.observe( dropdown_state_eventhandler, names='value')
display( dropdown_state )


### Run this cell after choosing a state. It builds the query that will pull the data for that state from ECHO.

In [None]:
my_state = dropdown_state.value

sql = "select FAC_NAME,  FAC_STATE, FAC_LAT, FAC_LONG, NPDES_IDS, CWA_PERMIT_TYPES, FAC_DERIVED_CD113 from ECHO_EXPORTER where NPDES_FLAG = 'Y'  and FAC_STATE = '%s'" %(my_state)
print(sql)
url='http://apps.tlt.stonybrook.edu/echoepa/?query='
data_location=url+urllib.parse.quote(sql)
#print(data_location)

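The query step above can be sketched as a small helper. This is a minimal illustration, not the notebook's own code: `build_query_url` and `example_sql` are hypothetical names, and the shortened SQL string is only there to show how `urllib.parse.quote` percent-encodes spaces and quotes before the query is appended to the URL.

```python
import urllib.parse

# Same endpoint the notebook uses for its queries
BASE_URL = "http://apps.tlt.stonybrook.edu/echoepa/?query="

def build_query_url(sql, base_url=BASE_URL):
    """Percent-encode a SQL string and append it to the query endpoint."""
    return base_url + urllib.parse.quote(sql)

# Hypothetical, shortened query used only for illustration
example_sql = "select FAC_NAME from ECHO_EXPORTER where FAC_STATE = 'NY'"
example_url = build_query_url(example_sql)
print(example_url)
```

Decoding the encoded part with `urllib.parse.unquote` gives back the original SQL, which can be a handy check when a query returns nothing.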

### Run this cell to load the CSV of that data.
#### How many facilities in the selected state are tracked for water pollution under CWA?

In [None]:
echo_data = pd.read_csv(data_location,encoding='iso-8859-1',header = 0)
num_facilities = echo_data.shape[0]

print("There are %s NPDES facilities in %s tracked in the ECHO database." %(num_facilities, my_state))


### Run this next cell to generate the Congressional District dropdown list for your state. 

#### Here is a map of congressional districts: https://www.govtrack.us/congress/members/map

In [None]:
if my_state != 'none' and my_state != 'all':
    cd_array = echo_data["FAC_DERIVED_CD113"].fillna(0).astype(int).unique()
    cd_array.sort()
    w2=widgets.Dropdown(
        options=cd_array,
        # Default to district 1 when present; otherwise use the first listed district
        value=1 if 1 in cd_array else cd_array[0],
        description='Congressional Districts:',
        disabled=False,
    )
    display(w2)

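To see how the district list above is derived, here is a small sketch on made-up data. The `toy` DataFrame is an assumption for illustration only; the point is that missing district values become 0, so a district "0" in the dropdown means the district is unknown.

```python
import numpy as np
import pandas as pd

# Made-up district values, including one missing entry
toy = pd.DataFrame({"FAC_DERIVED_CD113": [12.0, 1.0, np.nan, 12.0]})

# Missing values become 0, floats become ints, duplicates are dropped
districts = toy["FAC_DERIVED_CD113"].fillna(0).astype(int).unique()
districts.sort()
print(districts.tolist())
```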

### Select a CD and run the following cell:

In [None]:
my_cd = w2.value
my_cd_facs = echo_data[echo_data["FAC_DERIVED_CD113"].fillna(0).astype(int) == my_cd]
num_facilities = my_cd_facs.shape[0]    
print("There are %s NPDES facilities in %s district %s tracked in the ECHO database." %(num_facilities, my_state, my_cd))


### Next look up the enforcement history for facilities in the selected state. This step may take a while to run. What we'll get back is a list of facility IDs and the dates of enforcement actions taken against them.
How many are there? The number of rows listed below the table is the total number of CWA enforcement actions recorded in this database for the state. A later cell narrows these down to the selected congressional district.

In [None]:
sql = "select NPDES_ID, AGENCY, ENF_TYPE_DESC, SETTLEMENT_ENTERED_DATE" + \
        " from NPDES_FORMAL_ENFORCEMENT_ACTIONS  where NPDES_ID like '" + my_state + "%'"
url='http://apps.tlt.stonybrook.edu/echoepa/?query='
data_location=url+urllib.parse.quote(sql)
#print(data_location)
npdes_data = pd.read_csv(data_location,encoding='iso-8859-1',header = 0)
npdes_data.set_index( "NPDES_ID", inplace=True)
npdes_data


### This cell matches the enforcement actions to the facilities in your congressional district. Run it to set up for the next part.

In [None]:
# The NPDES_IDS field in ECHO_EXPORTER can contain multiple IDs for a facility.
# The string must be parsed to get each individual NPDES_ID to look up
# in NPDES_FORMAL_ENFORCEMENT_ACTIONS.

frames = []
no_data_ids = []
for fac in my_cd_facs.itertuples():
    ids = fac.NPDES_IDS
    if not isinstance(ids, str):
        continue  # Skip facilities with no NPDES IDs recorded
    for npdes_id in ids.split():
        try:
            # Index with a list so .loc always returns a DataFrame,
            # even when only one enforcement action matches.
            npdes_fac = npdes_data.loc[[npdes_id]].copy()
            # Add the facility's index number to npdes_data, to refer to it.
            npdes_fac['facility'] = fac.Index
            frames.append(npdes_fac)
        except KeyError:
            no_data_ids.append( npdes_id )

# Combine all matches once; fall back to an empty table if there were none
empty = pd.DataFrame(columns=list(npdes_data.columns) + ['facility'])
my_cd_npdes = pd.concat(frames) if frames else empty
my_cd_npdes

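The matching step above can also be expressed without an explicit loop. The following is a sketch of an alternative approach using `explode` and `merge`, not the notebook's method; the toy tables and ID values are made up for illustration.

```python
import pandas as pd

# Made-up facilities; NPDES_IDS holds space-separated permit IDs
toy_facs = pd.DataFrame({
    "FAC_NAME": ["Plant A", "Plant B"],
    "NPDES_IDS": ["NY0000001 NY0000002", "NY0000003"],
})

# Made-up enforcement actions keyed by a single NPDES_ID
toy_actions = pd.DataFrame({
    "NPDES_ID": ["NY0000002", "NY0000002", "NY0000009"],
    "ENF_TYPE_DESC": ["Action 1", "Action 2", "Action 3"],
})

# One row per (facility, NPDES_ID) pair; the original row index is
# kept in a 'facility' column so each action can be traced back
ids_long = (
    toy_facs.assign(NPDES_ID=toy_facs["NPDES_IDS"].str.split())
    .explode("NPDES_ID")
    .rename_axis("facility")
    .reset_index()
)

# Keep only the IDs that have at least one enforcement action
matched = ids_long.merge(toy_actions, on="NPDES_ID", how="inner")
print(matched[["facility", "NPDES_ID", "ENF_TYPE_DESC"]])
```

In this toy data only Plant A (facility 0) has matching actions, so both result rows point back to it.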

### Let's look more closely at the facilities that have had these enforcement actions taken against them.
Keep in mind CWA Permit Types:
<ul>
    <li>Major = Publicly Owned Treatment Works (POTW) handling at least 1 million gallons per day, as well as other major projects.</li>
    <li>Minor = Any other project.</li>
</ul>

In [None]:
# List each facility once, even if it has several enforcement actions
fac_idx_list = my_cd_npdes['facility'].dropna().unique()
fac_cd_npdes = my_cd_facs.loc[fac_idx_list]
fac_cd_npdes


### This section saves some of this data to CSV files in your Google Drive.
The first of the next three cells will mount your Google Drive so the notebook can write to it.
The second cell writes the congressional district file.
The third cell writes the file for state data. 
**Running these cells is optional.**

In [None]:
from google.colab import drive
drive.mount('/content/drive')


#### Write the congressional district data to CSV file.

In [None]:
filename = '/content/drive/My Drive/cwa-enforcements-' + my_state + '-' + str( my_cd ) + '.csv'
my_cd_npdes.to_csv( filename ) 
print( "Writing this data to %s" %(filename))

#### Write the state data to CSV file.

In [None]:
filename = '/content/drive/My Drive/cwa-enforcements-' + my_state + '.csv'
npdes_data.to_csv( filename ) 
print( "Writing this data to %s" %(filename))


### Let's show a quick map of your area and the facilities in it. 
#### Once you run this cell, a map should appear. You can zoom in and out, or click on facilities to get their names.

In [None]:
def mapper(df):
    # Drop facilities without coordinates; folium cannot place them
    df = df.dropna(subset=["FAC_LAT", "FAC_LONG"])

    # Initialize the map, centered on the average facility location
    m = folium.Map(
        location = [df["FAC_LAT"].mean(), df["FAC_LONG"].mean()],
        zoom_start = 11
    )

    # Add a clickable marker for each facility
    for index, row in df.iterrows():
        folium.Marker(
            location = [row["FAC_LAT"], row["FAC_LONG"]],
            popup = row["FAC_NAME"] ).add_to(m)

    # Show the map
    return m

map_of_facilities_in_cd = mapper(fac_cd_npdes)
map_of_facilities_in_cd

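The map above is centered on the average facility location. As a minimal sketch of that calculation on its own (without folium), assuming a hypothetical helper `map_center` and made-up coordinates:

```python
import numpy as np
import pandas as pd

def map_center(df):
    """Return [mean latitude, mean longitude], ignoring rows without coordinates."""
    coords = df[["FAC_LAT", "FAC_LONG"]].dropna()
    return [coords["FAC_LAT"].mean(), coords["FAC_LONG"].mean()]

# Made-up facilities; the third has no latitude and is skipped
toy = pd.DataFrame({
    "FAC_NAME": ["A", "B", "C"],
    "FAC_LAT": [40.0, 42.0, np.nan],
    "FAC_LONG": [-74.0, -76.0, -75.0],
})

center = map_center(toy)
print(center)
```

Averaging only the numeric coordinate columns avoids errors from text columns such as FAC_NAME.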

### What if we wanted to focus on just one facility? 
#### Run this cell to create a dropdown menu where we can pick one facility to learn more about.

In [None]:
cd_array = fac_cd_npdes["FAC_NAME"].unique()
cd_array.sort()

w3=widgets.Dropdown(
    options=cd_array,
    description='Facility Name:',
    disabled=False,
)
display(w3)


#### Run this next cell after choosing a facility to see all the enforcement actions taken against it.

In [None]:
my_fac = fac_cd_npdes[fac_cd_npdes["FAC_NAME"] == w3.value]
evaluations = my_cd_npdes[my_cd_npdes['facility'] == my_fac.index[0]]
print( my_fac.iloc[0] )
evaluations


### Let's plot our data!
#### This cell helps us do just that by summarizing the number of enforcement actions by year

In [None]:
# This cell creates a function that will be used by both the CD and the state
# to plot the number of cases by year.
import datetime

def show_plot( df, date_field, year_field, place, date_format, chart_title ):
    nan_count = 0
    year_col = []
    for day in df[date_field]:
        try:
            viol_year = datetime.datetime.strptime(day, date_format).year
            year_col.append( viol_year )
        except (TypeError, ValueError):
            # Missing or malformed dates cannot be parsed; record them as NaN
            nan_count += 1
            year_col.append( np.nan )
    df[year_field] = year_col
    
    year_groups = df.groupby( year_field )[[ year_field ]]
    counted_years = year_groups.count()

    # Print how many values are present 
    print(counted_years)
    chart_title +=  " in " + place + " by year"

    ax = counted_years[[year_field]].plot(kind='bar', title = chart_title, figsize=(15, 10), legend=False, fontsize=12)
    ax.set_xlabel("Year", fontsize=12)
    ax.set_ylabel("Count", fontsize=12)

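The year counting inside the function above can also be sketched with pandas' own date parsing. This is an alternative illustration rather than the notebook's method; `count_by_year` is a hypothetical helper and the dates are made up.

```python
import pandas as pd

def count_by_year(dates, date_format="%m/%d/%Y"):
    """Count entries per year; unparseable or missing dates are dropped."""
    parsed = pd.to_datetime(dates, format=date_format, errors="coerce")
    years = parsed.dt.year.dropna().astype(int)
    return years.value_counts().sort_index()

# Made-up settlement dates, including a missing and a malformed value
toy_dates = pd.Series(["01/15/2010", "06/30/2010", "03/02/2012", None, "not a date"])
counts = count_by_year(toy_dates)
print(counts)
```

With `errors="coerce"`, bad values become missing dates instead of raising an error, which plays the same role as the `try`/`except` in the loop.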

### Plot the number of enforcements by year for the congressional district.

In [None]:
chart_title = "Total CWA enforcements"

show_plot(my_cd_npdes, 'SETTLEMENT_ENTERED_DATE', 'YEAR_ENFORCEMENT', \
          my_state + ' - #' + str( my_cd ), '%m/%d/%Y', chart_title )


### Plot the number of enforcement actions by year, using the entire state.
Since the number of enforcements in a single CD may be small, it can be more
interesting to look at the entire state.

In [None]:
show_plot(npdes_data, 'SETTLEMENT_ENTERED_DATE', 'YEAR_ENFORCEMENT', \
          my_state, '%m/%d/%Y', chart_title )