|  Sunrise logo | ![EEW logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/eew.jpg?raw=true) | ![EDGI logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/edgi.png?raw=true) |
|---|---|---|

#### This notebook is licensed under GPL 3.0. Please visit our GitHub repo for more information: https://github.com/edgi-govdata-archiving/ECHO-COVID19
#### The notebook was collaboratively authored by the Environmental Data & Governance Initiative (EDGI) following our authorship protocol: https://docs.google.com/document/d/1CtDN5ZZ4Zv70fHiBTmWkDJ9mswEipX6eCYrwicP66Xw/
#### For more information about this project, visit https://www.environmentalenforcementwatch.org/

## How to Run
* A "cell" in a Jupyter notebook is a block of code that performs a set of actions, such as loading data or using data that an earlier cell loaded. The notebook works by running one cell after another as you select from the options offered.
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know, we authored them! It’s okay. Click “Run Anyway” to continue. 
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* **It is important to run cells in order because they depend on each other.**
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!
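
To see why order matters, here is a minimal sketch of two "cells" where the second relies on a value created by the first. The variable names are illustrative only and do not come from this notebook.

```python
# "Cell 1": defines a value that later cells rely on.
district = "1"

# "Cell 2": uses the value from Cell 1. Running this first, in a fresh
# session, would raise a NameError because `district` would not exist yet.
message = "Selected Congressional District " + district
print(message)
```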

# **Let's begin!** 
These first few cells give us access to some external Python code we will need. Hover over the "[ ]" on the top left corner of the cell below and you should see a "play" button appear. Click on it to run the cell then move to the next one.
### 1.  Bring in extra code

In [None]:
# Code stored in Github projects
!git clone -b sunrise --single-branch https://github.com/ericnost/ECHO_modules.git &>/dev/null;
!git clone -b add_geos https://github.com/edgi-govdata-archiving/ECHO-Geo.git &>/dev/null;
!git clone -b split https://github.com/edgi-govdata-archiving/ECHO-Sunrise.git &>/dev/null; # This has the utilities file for mapping and make_data_sets.py

# Import main code libraries
%run ECHO_modules/DataSet.py
%run ECHO-Sunrise/utilities.py
import pandas as pd
!pip install geopandas &>/dev/null;
import geopandas
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import requests
import csv
import datetime
import folium  # used by the map cells below (utilities.py may also import it)
import ipywidgets as widgets

### 2. Load the Congressional District (CD) map and choose a district

In [None]:
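# Massachusetts has nine Congressional Districts to choose from.
# `style` is assumed to be defined by ECHO-Sunrise/utilities.py, run in step 1.
select_region_widget = widgets.Dropdown(
    options=['1', '2', '3', '4', '5', '6', '7', '8', '9'],
    style=style,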
    value='1',
    description='Select a Congressional District:',
    disabled=False
)
display( select_region_widget )

# Read in and map geojson for the selected geography
geo_json_data = geopandas.read_file("ECHO-Geo/ma_congressional_districts.geojson")

m = folium.Map(
    #tiles='Mapbox Bright',
)
folium.GeoJson(
    geo_json_data,
    name = "Congressional District",
    popup=folium.GeoJsonPopup(fields=["ids"])
).add_to(m)

bounds = m.get_bounds()
m.fit_bounds(bounds)

m

### 3. What facilities does EPA track in this District?
This may take just a little bit of time to load - there may be thousands! The next two blocks of code will load in the data and give you a preview of it.
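
The query in the next cell is assembled by pasting the selected district into a SQL string. As a rough sketch of that step, the helper below builds the same text after checking its inputs. `build_echo_query` is a hypothetical name introduced here for illustration; it is not part of ECHO_modules, and it does not contact the database.

```python
def build_echo_query(state, district):
    """Return SQL text selecting active facilities in one district.

    Hypothetical helper: it only checks its inputs and formats a string.
    """
    # Accept only simple values so nothing unexpected is pasted into the SQL.
    if not (state.isalpha() and len(state) == 2):
        raise ValueError("state must be a two-letter code")
    if not district.isdigit():
        raise ValueError("district must be a number given as text")
    return (
        "select * from ECHO_EXPORTER "
        f"where FAC_STATE = '{state.upper()}' "
        "and FAC_ACTIVE_FLAG='Y' "
        f"and FAC_DERIVED_CD113='{district}'"
    )


query = build_echo_query("MA", "1")
print(query)
```

Because the dropdown only offers fixed values, the check is a precaution rather than a requirement for this notebook.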

In [None]:
echo_data_sql = "select * from ECHO_EXPORTER where FAC_STATE = 'MA' and FAC_ACTIVE_FLAG='Y' and FAC_DERIVED_CD113='"+select_region_widget.value+"'"
print(echo_data_sql)
try:
    echo_data = get_data( echo_data_sql, 'REGISTRY_ID' )
    num_facilities = echo_data.shape[0]
    print("\nThere are %s facilities in Massachusetts Congressional District %s currently tracked in the ECHO database." %(num_facilities, select_region_widget.value))
    display(echo_data)  # a bare expression inside try would not be shown
except pd.errors.EmptyDataError:
    print("\nThere are no facilities in this region.\n")

### 4. Run this next cell to choose how you want to *zoom in* on the data: which program you want to focus on (Air, Water, Waste, or GHG).
Here's where you can learn more about the different programs...
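
The next cell only offers a data set when the facilities loaded above are flagged for it. The sketch below shows one way such a filter could work on a toy table. The flag column names and the name-to-flag mapping are assumptions made for illustration; the real check lives in `has_echo_flag`, whose implementation is not shown here.

```python
import pandas as pd

# Toy stand-in for the facility table; the flag column names are assumptions.
facilities = pd.DataFrame({
    "FAC_NAME": ["Plant A", "Plant B", "Plant C"],
    "AIR_FLAG": ["Y", "N", "Y"],
    "GHG_FLAG": ["N", "N", "N"],
})

# Hypothetical mapping from a data set name to the flag column it depends on.
flag_for_data_set = {"Air": "AIR_FLAG", "Greenhouse Gases": "GHG_FLAG"}

# Keep a data set only if at least one facility has its flag set to "Y".
choices = [
    name
    for name, flag in flag_for_data_set.items()
    if (facilities[flag] == "Y").any()
]
print(choices)
```

In this toy table no facility reports greenhouse gases, so only "Air" would be offered in the dropdown.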

In [None]:
# Load the data set definitions from the cloned ECHO-Sunrise repo.
%run ECHO-Sunrise/make_data_sets.py
data_sets=make_data_sets()

# Only list a data set if the facilities have the corresponding flag set.
data_set_choices = []
for k, v in data_sets.items():
    if ( v.has_echo_flag( echo_data ) ):
        data_set_choices.append( k )

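# Default to Greenhouse Gases when it is available; otherwise use the first choice.
default_choice = 'Greenhouse Gases' if 'Greenhouse Gases' in data_set_choices else data_set_choices[0]
data_set_widget=widgets.Dropdown(
    options=list(data_set_choices),
    description='Data sets:',
    disabled=False,
    value=default_choice
)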
display(data_set_widget)

### 5. Here are all the facilities
First, let's get all the data for the chosen program from the database. Then we will chart it by reporting year as a bar chart and show the facilities on a map.
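
A bar chart by reporting year needs one total per year. The sketch below shows that aggregation on invented data. The column names and numbers are made up, and this is not the implementation of `get_program_data`, which is not shown in this notebook.

```python
import pandas as pd

# Toy program records; the column names and values are invented.
records = pd.DataFrame({
    "YEAR": [2016, 2016, 2017, 2018, 2018],
    "AMOUNT": [10.0, 5.0, 7.5, 1.0, 2.0],
})

# Sum the measure for each reporting year, the shape a bar chart expects.
totals_by_year = records.groupby("YEAR")["AMOUNT"].sum()
print(totals_by_year)

# In a notebook with matplotlib available, this would draw the chart:
# totals_by_year.plot(kind="bar", title="Toy data by reporting year")
```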

In [None]:
program = data_sets[ data_set_widget.value ]
program_data = None

my_prog_data, bars, stacked = get_program_data(echo_data, program, program_data)

ax = bars.plot(kind='bar', stacked=stacked, title = program.name + ": Congressional District "+select_region_widget.value, figsize=(20, 10), fontsize=16)
ax.set_xlabel( 'Reporting Year' )
ax.set_ylabel( program.name )
plt.show()  # draw the chart now; a bare `ax` mid-cell would not display anything

map_of_facilities = mapper_marker(my_prog_data)
map_of_facilities

### 6. Now we bring the geographic data and the facility data together.
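
The cell below ranks facilities into five groups with `pd.qcut` and charts the top 20. Here is a small sketch of those two steps on invented data, assuming a toy table with 25 facilities and a made-up `AMOUNT` column.

```python
import pandas as pd

# Toy facilities with an invented measure; 25 rows so a "top 20" is meaningful.
toy = pd.DataFrame({
    "FAC_NAME": [f"Facility {i}" for i in range(25)],
    "AMOUNT": [float(i) for i in range(25)],
})

# Assign each facility to one of five equal-sized bins (0 = lowest fifth).
toy["quantile"] = pd.qcut(toy["AMOUNT"], 5, labels=False, duplicates="drop")

# Sort from largest to smallest and keep the first 20 rows.
top_20 = toy.sort_values(by="AMOUNT", ascending=False).head(20)
print(top_20[["FAC_NAME", "AMOUNT", "quantile"]])
```

With real data many facilities can share the same value, which is why `duplicates="drop"` is passed: it lets `qcut` merge bins whose edges coincide instead of raising an error.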

In [None]:
# get geo and attribute data column names
geo_column = {"congressional_districts": "ids"}
geo = "congressional_districts"
g = geo_column[geo]  # defined first because the CSV file name below uses it
a = program.agg_col

# Make a geodataframe out of the facilities data
gdf = geopandas.GeoDataFrame(
    my_prog_data, crs="EPSG:4326", geometry=geopandas.points_from_xy(my_prog_data["FAC_LONG"], my_prog_data["FAC_LAT"]))
gdf.to_csv("full_program_data-"+program.name+"-"+g+".csv")

cmap = geo_json_data.loc[(geo_json_data["ids"] == select_region_widget.value)] # where ids match

ranked = my_prog_data.set_index("Index")
ranked['quantile'] = pd.qcut(ranked[a], 5, labels=False, duplicates="drop")
ranked = ranked.sort_values(by=a, ascending=False)
ranked.to_csv("facilities_ranked-"+program.name+".csv")

sns.set(style='whitegrid')
plt.figure(figsize=(10,6))
unit = ranked[0:20]["FAC_NAME"] # First 20 rows
values = ranked[0:20][a] # First 20 rows
sns.barplot(x=values, y=unit, order=list(unit), orient="h")

plt.title('Top 20 facilities in Massachusetts Congressional District %s from 2010-2018' %(select_region_widget.value))
plt.xlabel(program.name)

plt.show()

mp = mapper_area(ranked, cmap, g, a)
mp