| ![EEW logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/eew.jpg?raw=true) | ![EDGI logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/edgi.png?raw=true) |
|---|---|

#### This notebook is licensed under GPL 3.0. Please visit our GitHub repo for more information: https://github.com/edgi-govdata-archiving/ECHO-COVID19
#### The notebook was collaboratively authored by EDGI following our authorship protocol: https://docs.google.com/document/d/1CtDN5ZZ4Zv70fHiBTmWkDJ9mswEipX6eCYrwicP66Xw/
#### For more information about this project, visit https://www.environmentalenforcementwatch.org/

# Produce HTML map files for Congressional Districts or States

This notebook is a subset of the cells in the full AllPrograms notebook. It takes the congressional districts and/or states specified in cell 3 and produces an HTML map for each one, plus a PNG screenshot of the map saved under Output/CD_maps. For congressional districts, the map shows a marker for every facility. A state generally has so many facilities that an HTML file containing all of them will not load in a browser, so state maps show only the state boundary, without markers.

## How to Run
* A "cell" in a Jupyter notebook is a block of code that performs a set of actions, either making specific data available or using it. The notebook works by running one cell after another, as the user selects from the options offered.
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know, we authored it! It’s okay. Click “Run Anyway” to continue.
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* **It is important to run cells in order because they depend on each other.**
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!

---

# **Let's begin!**

Hover over the "[ ]" on the top left corner of the cell below and you should see a "play" button appear. Click on it to run the cell then move to the next one.

These first two cells give us access to some external Python code we will need.

### 1.  Bring in some code that is stored in a GitHub project.
These two GitHub repositories hold Python code that the notebook uses.
* ECHO_modules holds code that can be used in this and other notebooks--the DataSet class, the make_data_sets() function, etc.
* The ECHO-Cross-Program repository is the one this notebook is contained in.  We clone it to be able to use the utilities.py file contained in it.

In [None]:
!git clone https://github.com/edgi-govdata-archiving/ECHO_modules.git
!git clone https://github.com/edgi-govdata-archiving/ECHO-Cross-Program.git
!pip install geopandas
print("Done!")

### 2.  Run a few Python modules.
These will help us process and visualize the different program data sets later.
* The DataSet class knows how to read the database for an ECHO data set--e.g. CWA Violations.
* The utilities.py has Python code that helps with showing charts and maps, making filenames, etc.
* The make_data_sets.py has code that creates a DataSet object for each of the ECHO data sets, using the appropriate database tables.

In [None]:
%run ECHO_modules/DataSet.py
%run ECHO-Cross-Program/utilities.py
%run ECHO_modules/make_data_sets.py
print("Done!")

### 3.  Set the parameters of the notebook run.
You can change the (state, CD) pairs to run the notebook for multiple congressional districts in multiple states. The pairs can be read from a CSV file, added by hand, or both. A CD value of 0 in the CSV file means the whole state. After setting the (state, CD) pairs you want, you can instruct the notebook to Run All and it will step through all of the remaining cells. You can then come back and examine the results.
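
As a rough illustration of how rows of a CSV file become (state, CD) pairs, here is a minimal sketch. The helper name `parse_state_cds` and the sample rows are hypothetical (the actual contents of the file are not shown in this notebook), and the blank-row skipping and whitespace stripping are additions that the cell below does not perform.

```python
import csv
import io

def parse_state_cds(csv_file):
    """Turn rows like 'AL,4' into (state, cd) tuples.

    A CD of '0' is taken to mean the whole state and is stored as None,
    mirroring the convention used in the parameter cell.
    """
    pairs = []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines (an addition, not in the notebook cell)
        state, cd = row[0].strip(), row[1].strip()
        pairs.append((state, None if cd == '0' else int(cd)))
    return pairs

# Hypothetical file contents, supplied in memory instead of from disk.
sample = io.StringIO("AL,4\nTX,0\nLA,2\n")
state_cds = parse_state_cds(sample)
print(state_cds)
```

With this sample, Texas is requested as a whole state, while Alabama and Louisiana are requested for a single district each.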

In [None]:
region_type = 'Congressional District'
should_make_charts = False
read_cds_from_csv = True
cds_filename = 'cds_todo/test_todo_3.csv'
state_cds = []
if ( read_cds_from_csv ):
    from csv import reader
    # Each row of the CSV file holds a state abbreviation and a CD number.
    with open( cds_filename, 'r' ) as read_obj:
        csv_reader = reader( read_obj )
        raw_state_cds = list( map( tuple, csv_reader ))
    for state, cd in raw_state_cds:
        # A CD of '0' means the whole state; it is stored as None.
        if ( cd == '0' ):
            cd = None
        else:
            cd = int( cd )
        state_cds.append((state,cd))
# Specify the state/CD pairs to run. They will be added to any that
# were already read from the file.
# Examples:
# state_cds.extend([('AL',4)])

# data_set_list = ['RCRA Violations', 'RCRA Penalties',
#                  'CAA Violations', 'CAA Penalties',
#                  'CWA Violations', 'CWA Penalties', ] 
                 #CAA Enforcements, CWA Enforcements, RCRA Enforcements
data_set_list = ['RCRA Violations', 'RCRA Inspections', 'RCRA Penalties',
                 'CAA Violations', 'CAA Inspections', 'CAA Penalties', 'Greenhouse Gas Emissions', 
                 'CWA Violations', 'CWA Inspections', 'CWA Penalties', ] 


### 4. Get the State data for comparisons
Ask the database for ECHO_EXPORTER records for facilities in the state.
* state_echo_data is a dictionary with the state name as key and the data as value, for all records.
* state_echo_active is a dictionary for all records in state_echo_data identified as active.
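
The pattern in the cell below is "try a previously saved copy first, then fall back to a database query." A minimal sketch of that pattern follows. The names `unique_states`, `build_state_query`, `load_state_data`, `read_cached`, and `query_database` are hypothetical stand-ins (for `read_file` and `get_data`, whose exact behavior is not shown here), and the states are sorted only to make the order predictable.

```python
def unique_states(state_cds):
    """Collect each state once, however many districts were requested for it."""
    return sorted({state for state, _ in state_cds})

def build_state_query(state):
    """Build the same ECHO_EXPORTER query string the notebook uses for a state."""
    return 'select * from "ECHO_EXPORTER" where "FAC_STATE" = \'{}\''.format(state)

def load_state_data(state, read_cached, query_database):
    """Return cached data if there is any, otherwise query the database.

    read_cached and query_database are hypothetical callables supplied by the caller.
    """
    data = read_cached(state)
    if data is None:
        data = query_database(build_state_query(state))
    return data

pairs = [('AL', 4), ('TX', None), ('AL', 7)]
for state in unique_states(pairs):
    # No cache in this toy run, so the "database" simply echoes the query back.
    print(load_state_data(state, lambda s: None, lambda sql: sql))
```

Passing the two loaders in as arguments keeps the sketch runnable without a database connection.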

In [None]:
states = list(set([s_cd[0] for s_cd in state_cds]))  #Use conversion to set to make unique
state_echo_data = {}
state_echo_active = {}
for state in states:
    state_echo_data[state] = read_file( 'ECHO_EXPORTER', 'State', state, None )
    if ( state_echo_data[state] is None ):
        sql = 'select * from "ECHO_EXPORTER" where "FAC_STATE" = \'{}\''.format( state )
        state_echo_data[state] = get_data( sql, 'REGISTRY_ID' )
    state_echo_active[state] = state_echo_data[state].loc[state_echo_data[state]['FAC_ACTIVE_FLAG']=='Y']
    print( 'There are {} active facilities in {}.'.format( 
        str(state_echo_active[state].shape[0]), state))

### 5. Number of currently active facilities regulated in CAA, CWA, RCRA, GHGRP

* cd_echo_data is a dictionary with key (state, cd), where the state_echo_data is filtered for records of the current CD.
* cd_echo_active is a dictionary for active facilities in the CD.
* In the full AllPrograms notebook, the number of records from these dictionaries is written into a file named like 'active-facilities_All_pg3', in a directory identified by the state and CD, e.g. "LA2". The cell below only builds the dictionaries.
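
To make the two filters concrete, here is a small sketch on a made-up table. The facility names and values are invented for illustration, and the helper names `facilities_in_cd` and `active_only` are hypothetical; only the column names `FAC_DERIVED_CD113` and `FAC_ACTIVE_FLAG` come from the notebook.

```python
import pandas as pd

# Invented stand-in for one state's ECHO_EXPORTER records.
facilities = pd.DataFrame({
    "FAC_NAME": ["Plant A", "Plant B", "Plant C", "Plant D"],
    "FAC_DERIVED_CD113": [2, 2, 5, 2],
    "FAC_ACTIVE_FLAG": ["Y", "N", "Y", "Y"],
})

def facilities_in_cd(df, cd):
    """Keep only the rows whose derived congressional district matches cd."""
    return df.loc[df["FAC_DERIVED_CD113"] == cd]

def active_only(df):
    """Keep only the rows flagged as active."""
    return df.loc[df["FAC_ACTIVE_FLAG"] == "Y"]

cd2 = facilities_in_cd(facilities, 2)
cd2_active = active_only(cd2)
print(len(cd2), "facilities in CD 2,", len(cd2_active), "of them active")
```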

In [None]:

cd_echo_data = {}
cd_echo_active = {}
for state, cd in state_cds:
    if ( cd is None ):
        this_echo_data = state_echo_data[state]
    else:
        this_echo_data = state_echo_data[state].loc[state_echo_data[state]['FAC_DERIVED_CD113'] == cd]
        cd_echo_data[(state,cd)] = this_echo_data
    this_echo_active = this_echo_data.loc[this_echo_data['FAC_ACTIVE_FLAG']=='Y']
    if ( cd is not None ):
        cd_echo_active[(state,cd)] = this_echo_active


### 6. Map all currently active facilities in each district
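
Each map is named after its state and, for a district, its CD number. The sketch below spells out that naming scheme. The helper names `map_basename` and `output_paths` are hypothetical, and the two default directories are simply the ones hard-coded in the cell that follows.

```python
def map_basename(state, cd=None):
    """Return 'LA2_map' for a district or 'LA_map' for a whole state."""
    if cd is None:
        return '{}_map'.format(state)
    return '{}{}_map'.format(state, cd)

def output_paths(state, cd=None,
                 html_dir='/var/www/html/EDGI', png_dir='Output/CD_maps'):
    """Return the (HTML map path, PNG screenshot path) pair for one region."""
    base = map_basename(state, cd)
    return ('{}/{}.html'.format(html_dir, base),
            '{}/{}.png'.format(png_dir, base))

print(output_paths('LA', 2))
print(output_paths('AL'))
```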

In [None]:
import folium  # used directly below for folium.Map and folium.GeoJson
import geopandas
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.set_window_size( 550, 510 )

for state, cd in state_cds:
    print( 'Map for {} CD {}'.format( state, cd ))
    if ( cd is None ):
        this_data = state_echo_active[state]
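        # There are too many facilities in most states to successfully plot on a map,
        # so the state map is only centered on the mean facility location.
        # Averaging the two columns directly avoids averaging non-numeric columns.
        f_map = folium.Map( location=[this_data['FAC_LAT'].mean(), this_data['FAC_LONG'].mean()])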
        url = "https://github.com/edgi-govdata-archiving/ECHO-Geo/raw/main/states.geojson"
        state_boundary = geopandas.read_file( url )
        state_data = state_boundary[ state_boundary['STUSPS'] == state ]
        w = folium.GeoJson( state_data, name="State" ).add_to( f_map )
        # filename = make_filename( 'map', 'State', 
        #                    None, state, 'html' )
        filename = '{}_map'.format( state )
    else:
        url = "https://raw.githubusercontent.com/unitedstates/districts/gh-pages/cds/2012/{}-{}/shape.geojson".format( state, str(cd))       
        cd_boundary = geopandas.read_file(url)
        bounds = cd_boundary.bounds
        this_data = cd_echo_active[(state, cd)]
        # Only map CAA, CWA, RCRA, or GHG facilities active in this district:
        map_df = this_data.loc[(this_data['AIR_FLAG']=="Y") | (this_data['NPDES_FLAG']=="Y") | \
            (this_data['RCRA_FLAG']=="Y")| (this_data['GHG_FLAG']=="Y")]
        f_map = mapper(df=map_df, bounds=bounds, no_text=True)
        w = folium.GeoJson( cd_boundary, name = "Congressional Districts", 
                          ).add_to( f_map ) 
        # breakpoint()
        # display( m[(state,cd)] )
        # filename = make_filename( 'map', 'Congressional District', 
        #                     state, cd, 'html' )
        filename = '{}{}_map'.format( state, str(cd) )
    f_map.save( '/var/www/html/EDGI/{}.html'.format( filename ))

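    # Load the saved map through the local web server and capture a screenshot.
    driver.get( 'http://localhost/EDGI/{}.html'.format( filename ))
    time.sleep( 6 )  # give the map time to finish rendering
    driver.save_screenshot( 'Output/CD_maps/{}.png'.format( filename ))

# Close the browser once all maps have been captured.
driver.quit()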
