| ![EEW logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/eew.jpg?raw=true) | ![EDGI logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/edgi.png?raw=true) |
|---|---|

#### This notebook is licensed under GPL 3.0. Please visit our GitHub repo for more information: https://github.com/edgi-govdata-archiving/ECHO-COVID19
#### The notebook was collaboratively authored by EDGI following our authorship protocol: https://docs.google.com/document/d/1CtDN5ZZ4Zv70fHiBTmWkDJ9mswEipX6eCYrwicP66Xw/
#### For more information about this project, visit https://www.environmentalenforcementwatch.org/

# Examining Data from Multiple EPA Programs

This notebook examines data from the EPA's Enforcement and Compliance History Online (ECHO) database (https://echo.epa.gov/). It includes information from EPA's programs covering air quality (the Clean Air Act, or CAA), water quality (the Clean Water Act, or CWA), and hazardous and other waste processing (the Resource Conservation and Recovery Act, or RCRA).

ECHO data covers facility violations as well as inspections and enforcement actions by the EPA, state agencies, and other agencies. The data accessible here runs from the present day (the database is refreshed weekly) back to 2001, the year from which the EPA believes the data to be most reliable. The notebook can be run to produce data for multiple Congressional Districts and states of your choosing.

## How to Run
* A "cell" in a Jupyter notebook is a block of code that performs a set of actions, such as loading data or using data that an earlier cell loaded. The notebook works by running one cell after another, with the user choosing from the options offered along the way.
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know, we authored it! It’s okay. Click “Run Anyway” to continue.
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* **It is important to run cells in order because they depend on each other.**
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!

---

# **Let's begin!**

Hover over the "[ ]" on the top left corner of the cell below and you should see a "play" button appear. Click on it to run the cell then move to the next one.

These first two cells give us access to some external Python code we will need.

### 1.  Bring in some code that is stored in a GitHub project.
These two GitHub repositories hold Python code that the notebook uses.
* ECHO_modules holds code that can be used in this and other notebooks--the DataSet class, the make_data_sets() function, etc.
* The ECHO-Cross-Program repository is the one this notebook is contained in.  We clone it to be able to use the utilities.py file contained in it.

In [None]:
!git clone https://github.com/edgi-govdata-archiving/ECHO_modules.git
!git clone https://github.com/edgi-govdata-archiving/ECHO-Cross-Program.git
# !pip install geopandas  # Uncomment this line if geopandas is not already installed
print("Done!")

### 2.  Run a few Python modules.
These will help us process and visualize the different program data sets later.
* The DataSet class knows how to read the database for an ECHO data set--e.g. CWA Violations.
* utilities.py has Python code that helps with showing charts and maps, making filenames, etc.
* make_data_sets.py has code that creates a DataSet object for each of the ECHO data sets, using the appropriate database tables.

In [None]:
%run ECHO_modules/DataSet.py
%run ECHO-Cross-Program/utilities.py
%run ECHO_modules/make_data_sets.py
print("Done!")
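
For readers curious about what `%run` does: it executes a Python file and then makes the names defined in that file available to later cells. The sketch below imitates that behavior outside a notebook with the standard library's `runpy` module. The file `helper_demo.py` and the function `greet` are invented for illustration and are not part of ECHO_modules.

```python
import os
import runpy
import tempfile

# Hypothetical helper file, standing in for a module such as utilities.py.
source = '''
def greet(name):
    return "Hello, " + name + "!"
'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "helper_demo.py")
    with open(path, "w") as f:
        f.write(source)
    # run_path executes the file and returns a dictionary of the names it defined.
    names = runpy.run_path(path)

# %run would place these names in the notebook's namespace; here we copy one by hand.
greet = names["greet"]
print(greet("EEW"))
```

This is also why the cells must be run in order: a later cell can only call a function after the cell that defines it has run.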

### 3.  Set the parameters of the notebook run.
You can change the (state, CD) pairs to run the notebook for multiple congressional districts in multiple states. After setting the pairs you want, instruct the notebook to Run All and it will step through all of the remaining cells. You can then come back and examine the results.

In [None]:
region_type = 'Congressional District'
should_make_charts = True
state_cds = [('WV',1),('TX',17)]
#state_cds = [('WV',1),('WV',None),('WY',None),('TX',17),('TX',33),('TX',26),
#    ('TX',22),('TX',21),('TX',31),('TX',24),('TX',10),('TX',23),('TX',None),
#    ('VA',4),('VA',9),('VT',None),('WA',5),('WA',3),('OR',2),('OR',5),('OR',None),
#    ('PA',18),('PA',1),('PA',10),('RI',None),('SC',3),('SC',None),('SD',None),    
#    ('NY',16),('NY',20),('NY',9),('NY',27),('NY',24),('NY',1),('NY',2),('OH',6),
#    ('OH',5),('OH',1),('OH',12),('OK',2),('OK',None),('MS',None),('MT',None),
#    ('NC',1),('NC',8),('NC',2),('NC',None), ('ND',None), ('NE',2),('NH',2),
#    ('NH',None),('NJ',6),('NJ',None),('NM',3),('NM',None),('MI',12),('MI',6),
#    ('MI',None),('MN',1),('MN',None),('MO',7),('MO',2),('LA',1),('LA',2),
#    ('MA',4),('MA',None),('MD',3),('MD',None),('ME',None),('IN',8),('IN',5),
#    ('IN',None),('KS',2),('KS',None),('KY',2),('KY',None),('AK',None),
#    ('AL',None),('AR',None),('AZ',1),('AZ',None),('IL',16),('IL',1),('IL',9),
#    ('IL',15),('IL',2),('IL',None),('IL',13),('GA',1),('GA',7),('GA',None),
#    ('IA',2),('IA',None),('IA',4),('CO',1),('CO',None),('DE',None),('FL',9),
#    ('FL',12),('FL',14),('FL',15),('CA', 6),('CA',9),('CA',44),('CA',36),
#    ('CA',52), ('CA',29), ('CA',22), ('CA',50)]
# Change state_cds above! For example, to run Wisconsin's 2nd Congressional District instead, use ('WI', 2).
# See here: https://www.govtrack.us/congress/members/map
data_set_list = ['RCRA Violations', 'RCRA Inspections', 'RCRA Penalties', 'CAA Enforcements',
                 'CAA Violations', 'CAA Inspections', 'CAA Penalties', 'Greenhouse Gas Emissions', 
                 'CWA Violations', 'CWA Inspections', 'CWA Penalties', ] 
                 #CAA Enforcements, CWA Enforcements, RCRA Enforcements
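
To make the (state, CD) convention concrete: judging from how later cells treat the pairs, a pair such as `('TX', 17)` selects one congressional district, while a pair with `None` in the second position, such as `('WV', None)`, selects the whole state. The helper `describe_region` below is hypothetical and not part of the notebook's modules; it only shows how each pair would be read.

```python
def describe_region(state, cd):
    """Return a readable label for a (state, CD) pair; cd=None means the whole state."""
    if cd is None:
        return "{} (entire state)".format(state)
    return "{} Congressional District {}".format(state, cd)

# Example pairs in the same form as state_cds.
example_state_cds = [('WV', 1), ('WV', None), ('TX', 17)]
labels = [describe_region(state, cd) for state, cd in example_state_cds]
for label in labels:
    print(label)
```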


### 6. Get the State data for comparisons
Ask the database for ECHO_EXPORTER records for facilities in the state.
* state_echo_data is a dictionary with the state abbreviation (e.g. 'WV') as key and the data as value, for all records.
* state_echo_active is a dictionary for all records in state_echo_data identified as active.

In [None]:
states = list(set([s_cd[0] for s_cd in state_cds]))  #Use conversion to set to make unique
state_echo_data = {}
state_echo_active = {}
for state in states:
    state_echo_data[state] = read_file( 'ECHO_EXPORTER', 'State', state, None )
    if ( state_echo_data[state] is None ):
        sql = 'select * from "ECHO_EXPORTER" where "FAC_STATE" = \'{}\''.format( state )
        state_echo_data[state] = get_data( sql, 'REGISTRY_ID' )
        write_dataset( state_echo_data[state], 'ECHO_EXPORTER', 'State', state, None )
    state_echo_active[state] = state_echo_data[state].loc[state_echo_data[state]['FAC_ACTIVE_FLAG']=='Y']
    print( 'There are {} active facilities in {}.'.format( 
        str(state_echo_active[state].shape[0]), state))
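
The loop above appears to follow a read-from-cache-or-query pattern: it first looks for a saved copy of the state's data, and only asks the database when no copy is found, saving the result for next time. A minimal stand-alone sketch of that pattern follows. The names `load_or_fetch` and `fake_query` and the JSON cache file are assumptions for illustration; the notebook itself relies on `read_file()`, `get_data()`, and `write_dataset()`, whose internals are not shown here.

```python
import json
import os
import tempfile

def load_or_fetch(cache_path, fetch):
    """Return cached records if the cache file exists; otherwise fetch and save them."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    records = fetch()
    with open(cache_path, "w") as f:
        json.dump(records, f)
    return records

calls = []  # Records each time the pretend database is queried.

def fake_query():
    """Stand-in for a database query; returns one invented record."""
    calls.append("query")
    return [{"REGISTRY_ID": "1", "FAC_STATE": "WV", "FAC_ACTIVE_FLAG": "Y"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ECHO_EXPORTER-WV.json")
    first = load_or_fetch(path, fake_query)   # No cache yet, so this queries.
    second = load_or_fetch(path, fake_query)  # Cache exists, so no second query.

print(len(calls))
```

Caching this way keeps repeated runs for the same state from querying the database again.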

### 7. Number of currently active facilities regulated in CAA, CWA, RCRA, GHGRP
* The program_count() function looks at the ECHO_EXPORTER data that is passed in and counts the facilities whose 'flag' column (AIR_FLAG, NPDES_FLAG, RCRA_FLAG, or GHG_FLAG) is set to 'Y'.
* cd_echo_data is a dictionary with key (state, cd), where the state_echo_data is filtered for records of the current CD.
* cd_echo_active is a dictionary for active facilities in the CD.
* The number of records from these dictionaries is written into a file named like 'active-facilities_All_pg3', in a directory identified by the state and CD, e.g. "LA2".

In [None]:
import csv  # Needed for csv.writer below, in case an earlier cell has not already imported it

def program_count( echo_data, program, flag, state, cd ):
    count = echo_data.loc[echo_data[flag]=='Y'].shape[0]
    print( 'There are {} active facilities in {} CD {} tracked under {}.'.format( 
        str( count ), state, cd, program))
    return count
    
cd_echo_data = {}
cd_echo_active = {}
for state, cd in state_cds:
    rowdata = []    
    if ( cd is None ):
        this_echo_data = state_echo_data[state]
        filename = make_filename( 'active-facilities_All_pg3', 'State', 
                             None, state )
    else:
        this_echo_data = state_echo_data[state].loc[state_echo_data[state]['FAC_DERIVED_CD113'] == cd]
        cd_echo_data[(state,cd)] = this_echo_data
        filename = make_filename( 'active-facilities_All_pg3', 'Congressional District', 
                             state, cd )
    this_echo_active = this_echo_data.loc[this_echo_data['FAC_ACTIVE_FLAG']=='Y']
    if ( cd is not None ):
        cd_echo_active[(state,cd)] = this_echo_active
    rowdata.append( ['CAA', program_count( this_echo_active, 'CAA', 'AIR_FLAG', state, cd)] )
    rowdata.append( ['CWA', program_count( this_echo_active, 'CWA', 'NPDES_FLAG', state, cd)] )
    rowdata.append( ['RCRA', program_count( this_echo_active, 'RCRA', 'RCRA_FLAG', state, cd)] )
    rowdata.append( ['GHG', program_count( this_echo_active, 'GHG', 'GHG_FLAG', state, cd)] )
    with open( filename, 'w', newline='' ) as csvfile:
        header = ['Program', 'Count']
        writer = csv.writer( csvfile )
        writer.writerow( header )
        writer.writerows( rowdata ) 
        print( "Wrote {}".format( filename ))     
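
To see what the active-facility filter and `program_count()` do, here is a sketch on a tiny made-up table. The column names match those used above, but the four rows are invented and are not ECHO data.

```python
import pandas as pd

# Invented rows for illustration only; real data comes from ECHO_EXPORTER.
toy = pd.DataFrame({
    'FAC_ACTIVE_FLAG': ['Y', 'Y', 'N', 'Y'],
    'AIR_FLAG':        ['Y', 'N', 'Y', 'Y'],
    'NPDES_FLAG':      ['N', 'Y', 'Y', 'N'],
})

# Keep active facilities only, as the cell above does.
active = toy.loc[toy['FAC_ACTIVE_FLAG'] == 'Y']

# Count the active facilities flagged under each program.
counts = {
    'CAA': active.loc[active['AIR_FLAG'] == 'Y'].shape[0],
    'CWA': active.loc[active['NPDES_FLAG'] == 'Y'].shape[0],
}
print(counts)
```

The third row is dropped because it is inactive, so it does not count toward either program even though both of its flags are 'Y'.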

### 8. Map all currently active facilities in each district

In [None]:
if ( should_make_charts ):
    import geopandas

    for state, cd in state_cds:
        print( 'Map for {} CD {}'.format( state, cd ))
        if ( cd is None ):
            this_data = state_echo_active[state]
        else:
            this_data = cd_echo_active[(state, cd)]
        # Only map CAA, CWA, RCRA, or GHG facilities active in this district:
        map_data = this_data.loc[(this_data['AIR_FLAG']=="Y") | (this_data['NPDES_FLAG']=="Y") |
                (this_data['RCRA_FLAG']=="Y")| (this_data['GHG_FLAG']=="Y")]
        m = mapper(df=map_data,is_echo=False)
        if ( cd is not None ):
            url = "https://raw.githubusercontent.com/unitedstates/districts/gh-pages/cds/2016/{}-{}/shape.geojson".format( state, str(cd))
            district_shape = geopandas.read_file(url)  # Boundary of this congressional district
            # Add the district outline to m, the map object that already holds the facility points.
            w = folium.GeoJson(
                district_shape,
                name = "Congressional District",
            ).add_to(m)
            folium.GeoJsonTooltip(fields=["District"]).add_to(w)

        display( m )
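
The district outline is fetched from a URL assembled from the state abbreviation and the district number. The sketch below builds that URL without downloading anything. `district_shape_url` is a hypothetical helper name, and the URL pattern is copied from the cell above.

```python
def district_shape_url(state, cd):
    """Build the GeoJSON URL for one congressional district's boundary."""
    base = "https://raw.githubusercontent.com/unitedstates/districts/gh-pages/cds/2016"
    return "{}/{}-{}/shape.geojson".format(base, state, cd)

url = district_shape_url('TX', 17)
print(url)
```

Printing the URL before mapping is a quick way to check that a (state, CD) pair points at the district you intended.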