|  ![Sunrise Logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/sunrise%20boston.jpg) | ![EEW logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/eew.jpg?raw=true) | ![EDGI logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/edgi.png?raw=true) |
|---|---|---|

#### This notebook is licensed under GPL 3.0. Please visit our GitHub repo for more information: https://github.com/edgi-govdata-archiving/ECHO-COVID19
#### The notebook was collaboratively authored by EDGI following our authorship protocol: https://docs.google.com/document/d/1CtDN5ZZ4Zv70fHiBTmWkDJ9mswEipX6eCYrwicP66Xw/
#### For more information about this project, visit https://www.environmentalenforcementwatch.org/

## How to Run
* A "cell" in a Jupyter notebook is a block of code that performs a set of actions, such as loading data or using data that an earlier cell made available. The notebook works by running one cell after another as you select the options it offers.
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know, we authored it! It’s okay. Click “Run Anyway” to continue.
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* **It is important to run cells in order because they depend on each other.**
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!

# **Let's begin!** 
These first two cells give us access to some external Python code we will need. Hover over the "[ ]" in the top left corner of the cell below and you should see a "play" button appear. Click on it to run the cell, then move on to the next one.
### 1.  Bring in some code that is stored in a GitHub project.

In [None]:
!git clone -b sunrise --single-branch https://github.com/ericnost/ECHO_modules.git
!git clone https://github.com/edgi-govdata-archiving/ECHO-Geo.git
!git clone https://github.com/edgi-govdata-archiving/ECHO-Sunrise.git # This has the utilities file for mapping

### 2.  Run some external Python modules.

In [None]:
# Import code libraries
%run ECHO_modules/DataSet.py
%run ECHO-Sunrise/utilities.py
import pandas as pd
!pip install geopandas
!apt install libspatialindex-dev &>/dev/null;
!pip install rtree
import rtree
import geopandas
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import requests
import csv
import datetime
import ipywidgets as widgets
import folium # Used directly by the map cells below (harmless if utilities.py already imports it)

### 3. What facilities does EPA track in Massachusetts?
This may take just a little bit of time to load - there are thousands! The next two blocks of code will load in the data and give you a preview of it.
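
The query in the next cell is an ordinary SQL string, so pointing it at another state mostly comes down to changing the two-letter code. A minimal sketch, assuming a hypothetical helper named `build_echo_query` (it is not part of ECHO_modules) and assuming the `FAC_STATE` and `FAC_ACTIVE_FLAG` columns are used the same way for every state:

```python
import re

def build_echo_query(state):
    """Return the ECHO_EXPORTER query for one state (hypothetical helper)."""
    # Accept only a two-letter postal code so nothing unexpected ends up in the SQL.
    if not re.fullmatch(r"[A-Za-z]{2}", state):
        raise ValueError("expected a two-letter state code, got %r" % (state,))
    return ("select * from ECHO_EXPORTER where FAC_STATE = '%s' "
            "and FAC_ACTIVE_FLAG='Y'" % state.upper())

print(build_echo_query("ma"))
```

The later cells read Massachusetts boundary files from ECHO-Geo, so they would need matching changes before another state could be mapped.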

In [None]:
echo_data_sql = "select * from ECHO_EXPORTER where FAC_STATE = 'MA' and FAC_ACTIVE_FLAG='Y'"

try:
    print(echo_data_sql)
    echo_data = get_data( echo_data_sql, 'REGISTRY_ID' )
    num_facilities = echo_data.shape[0]
    print("\nThere are %s facilities in Massachusetts currently tracked in the ECHO database." %(num_facilities))
except pd.errors.EmptyDataError:
    print("\nThere are no facilities in this region.\n")

In [None]:
echo_data

### 4.  Run this next cell to choose how you want to *zoom in*: which specific programs you want to look at and whether you want to view this information by county, congressional district, zip code, or another geography.
Here's where you can learn more about the different programs...

In [None]:
%run ECHO_modules/make_data_sets.py

# Only list the data set if it has the correct flag set.
data_set_choices = []
for k, v in data_sets.items():
    if ( v.has_echo_flag( echo_data ) ):
        data_set_choices.append( k )

data_set_widget=widgets.Dropdown(
    options=list(data_set_choices),
    description='Data sets:',
    disabled=False,
    value='Greenhouse Gases'
) 
display(data_set_widget)

region_field = { 
    'Congressional District': { "field": "cd" },
    'County': { "field": "county" },
    'State Districts': { "field": "state_districts" },
    'Town': {"field": "town"},
    'Watershed': {"field": "watershed"},
    'Zip Code': { "field": "zip" },
}

style = {'description_width': 'initial'}
select_region_widget = widgets.Dropdown(
    options=region_field.keys(),
    style=style,
    value='State Districts',
    description='Region of interest:',
    disabled=False
)
display( select_region_widget )

### 5. Here are all the facilities in this program
This may take some time because we're looking at all incidents under this program for all facilities across the state!

First, let's get all the data from the database.

In [None]:
program = data_sets[ data_set_widget.value ]
program_data = None

my_prog_data=get_program_data(echo_data, program, program_data)
my_prog_data=pd.DataFrame(my_prog_data)
my_prog_data

And map it:

In [None]:
fac = my_prog_data.drop_duplicates(subset=["Index"])
map_of_facilities = mapper_marker(fac)
map_of_facilities

### 6. Here are the geographies by which we're going to summarize this information.

In [None]:
# read in and map geojson for the selected geography
geo = region_field[select_region_widget.value]["field"].lower()
geo_json_data = geopandas.read_file("ECHO-Geo/ma_"+geo+".geojson")

# Get rid of any null geographies
geo_json_data = geo_json_data[geo_json_data["geometry"].notna()]
        
m = folium.Map(
    #location=[42.365135, -72.079501], zoom_start=8
)
folium.GeoJson(
    geo_json_data,
).add_to(m)

bounds = m.get_bounds()
m.fit_bounds(bounds)

m

### 7. Now we bring the geographic data and the facility data together. First, let's rank each geography.
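The ranking boils down to a group-and-aggregate step: every record is assigned to the geography it falls in, then records are summed (for emissions) or counted (for inspections) per geography. A minimal sketch in plain Python with made-up records, so it runs without the notebook's data; the district names, the values, and the `rank_geographies` helper are illustrative only:

```python
from collections import defaultdict

# Made-up records: (geography, annual emission). Not real ECHO data.
records = [
    ("District A", 120.0),
    ("District B", 40.0),
    ("District A", 30.0),
    ("District C", 75.0),
    ("District B", 10.0),
]

def rank_geographies(rows, agg="sum"):
    """Aggregate values per geography and sort from highest to lowest."""
    totals = defaultdict(float)
    for geo_name, value in rows:
        # "sum" adds up the values; "count" tallies one per record.
        totals[geo_name] += value if agg == "sum" else 1
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = rank_geographies(records)
print(ranking)
```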

In [None]:
# first, spatialize my_prog_data
gdf = geopandas.GeoDataFrame(
    my_prog_data, crs="EPSG:4326", geometry=geopandas.points_from_xy(my_prog_data["FAC_LONG"], my_prog_data["FAC_LAT"]))

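join = geopandas.sjoin(gdf, geo_json_data, how="inner", predicate='intersects') # Recent geopandas releases use `predicate`; the older `op` keyword was deprecated and then removed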

join.to_csv("full_program_data_"+program.name+".csv")

# get geo and attribute data column names
geo_column = {"county": "COUNTY", "state_districts": "REP_DIST", "town": "town"} # EXPAND
att_column = {"Greenhouse Gases": {"col":"ANNUAL_EMISSION", "agg":"sum"},
              "Air Inspections": {"col": "ACTIVITY_TYPE_CODE", "agg": "count"},
              "Clean Water Inspections": {"col":"ACTIVITY_TYPE_CODE", "agg":"count"}} # EXPAND
g = geo_column[geo]
a = att_column[program.name]["col"]

data = join.groupby(join[g])[[a]].agg(att_column[program.name]["agg"])
data.to_csv(program.name+"_geos_ranked_"+geo+".csv")
data.sort_values(by=a, ascending = False)

### 8. Now, let's map it!

In [None]:
data.reset_index(inplace=True)
att_data = data.rename(columns={g: "geo", a: "value"}) 
mp = mapper_area(geo_json_data, att_data, g)
mp

### 9. Rank individual facilities.

In [None]:
ranked = my_prog_data.groupby(["Index", "FAC_NAME", "FAC_LAT", "FAC_LONG"])[[a]].agg(att_column[program.name]["agg"])
ranked.reset_index(inplace=True)
ranked = ranked.set_index("Index")
ranked.to_csv(program.name+"_facilities_ranked_"+geo+".csv")
ranked.sort_values(by=a, ascending=False)

### 10. Map individual facilities.
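The cell below assigns each facility to a quartile of the selected measure before mapping. A rough sketch of that binning with the standard library, using made-up values; it illustrates the idea rather than replacing `pd.qcut`, and the helper name `quartile_labels` is hypothetical:

```python
from bisect import bisect_left
from statistics import quantiles

def quartile_labels(values):
    """Label each value 0-3 by the quartile it falls in (0 = lowest)."""
    # Three cut points split the data into four groups.
    cuts = quantiles(values, n=4, method="inclusive")
    # bisect_left counts the cut points strictly below each value, so a value
    # equal to a cut point stays in the lower group.
    return [bisect_left(cuts, v) for v in values]

emissions = [10, 20, 30, 40, 50, 60, 70, 80]  # made-up values
print(quartile_labels(emissions))
```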

In [None]:
ranked['quantile'] = pd.qcut(ranked[a], 4, labels=False, duplicates="drop")
mp = mapper_circle(ranked, a)
mp

### 11. Are these facilities near prisons, detention centers, and jails?
We will try to answer this question using ECHO data. It is important to note that ECHO data has significant gaps. As the [Carceral Ecologies](https://github.com/Carceral-Ecologies/Carceral-ECHO-data) project is finding out, ECHO does not fully cover these kinds of facilities. It could be that not all are regulated by the EPA, and so they wouldn't appear in ECHO. Or, they could be regulated by the EPA, but be missing from ECHO anyway.

In addition, we will use NAICS codes, which classify facilities according to their purpose, and which are included in ECHO records. The [NAICS code](https://www.naics.com/naics-code-description/?code=922140) for prisons, detention centers, and jails is 922140. However, if one of these kinds of facilities is not actually classified as 922140, for whatever reason, then we won't catch it.
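
Because a facility can list several NAICS codes in one field, the check looks for 922140 anywhere in that field. A minimal sketch of the idea in plain Python, assuming the codes are separated by spaces (an assumption about the `FAC_NAICS_CODES` format) and using made-up facility records; `has_naics_code` is a hypothetical helper:

```python
def has_naics_code(codes_field, target="922140"):
    """True if the target code is one of the codes listed in the field."""
    if not codes_field:  # missing or empty field: nothing to match
        return False
    return target in codes_field.split()

# Made-up records, not real ECHO data.
facilities = [
    {"FAC_NAME": "Example County Jail", "FAC_NAICS_CODES": "922140"},
    {"FAC_NAME": "Example Power Plant", "FAC_NAICS_CODES": "221112 221118"},
    {"FAC_NAME": "Example Campus", "FAC_NAICS_CODES": "611310 922140"},
    {"FAC_NAME": "Example Depot", "FAC_NAICS_CODES": None},
]

matches = [f["FAC_NAME"] for f in facilities if has_naics_code(f["FAC_NAICS_CODES"])]
print(matches)
```

Splitting on whitespace matches whole codes only, whereas the notebook's `str.contains` is a substring test; for six-digit codes the two should agree as long as the codes are cleanly separated.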

A different question is, _what_ industrial facilities are near prisons, detention centers, and jails? That will be a separate analysis, forthcoming.

First, here are the prisons, detention centers, and jails identified in ECHO.

In [None]:
naics = echo_data.loc[(echo_data["FAC_NAICS_CODES"].notna())] # We can only look at facilities that have at least one NAICS code
pdcjs = naics.loc[(naics["FAC_NAICS_CODES"].str.contains("922140"))]
mapper_marker(pdcjs)

In the following cell, we show the facilities under the selected program that fall within a 5-mile radius of a prison, detention center, or jail.

The prisons/detention centers/jails are the small orange dots, the 5-mile buffers around them are the blue circles, and the blue pins are the facilities regulated under the program you selected above (e.g. Greenhouse Gases) that fall within this 5-mile buffer.
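
To make the 5-mile rule concrete: 5 miles is about 8,047 m, which the notebook rounds up to an 8,100 m buffer. The sketch below checks the same condition for a single pair of points with the haversine great-circle formula instead of a projected buffer; the coordinates are approximate city-center locations used only for illustration, and the helper names are hypothetical:

```python
from math import asin, cos, radians, sin, sqrt

METERS_PER_MILE = 1609.344
EARTH_RADIUS_M = 6371000  # mean Earth radius, a common approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))

def within_buffer(point_a, point_b, radius_m=8100):
    """True if the two (lat, long) points are within radius_m of each other."""
    return haversine_m(*point_a, *point_b) <= radius_m

boston = (42.3601, -71.0589)     # approximate
cambridge = (42.3736, -71.1097)  # approximate
worcester = (42.2626, -71.8023)  # approximate

print(round(5 * METERS_PER_MILE))
print(within_buffer(boston, cambridge), within_buffer(boston, worcester))
```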

In [None]:
from geopandas import GeoSeries

pdcjs = geopandas.GeoDataFrame(pdcjs, crs = "EPSG:4326", geometry=geopandas.points_from_xy(pdcjs["FAC_LONG"], pdcjs["FAC_LAT"]))
pdcjs=pdcjs.to_crs("EPSG:32619") # Project to UTM zone 19N (meters), which covers eastern and central Massachusetts; EPSG:32615 is a zone for the central US
b = pdcjs.buffer(8100) # Create an 8100 m (~5 mile) radius buffer around each prison, detention center, and jail

fac = geopandas.GeoDataFrame(ranked, crs = "EPSG:4326", geometry=geopandas.points_from_xy(ranked["FAC_LONG"], ranked["FAC_LAT"]))
fac=fac.to_crs("EPSG:32619") # Project to the same CRS as the buffers

i = fac.loc[fac.intersects(b.unary_union)] # Clip program facilities to the buffer

m = folium.Map(
    #tiles='Mapbox Bright',
)
folium.GeoJson(
    b,
).add_to(m)
for index, row in pdcjs.iterrows():
    folium.CircleMarker(
        location = [row["FAC_LAT"], row["FAC_LONG"]],
        popup = row["FAC_NAME"], 
        radius = 4,
        color = "black",
        weight = 1,
        fill_color = "orange",
        fill_opacity= .4
    ).add_to(m)
for index, row in i.iterrows():
    folium.Marker(
        location = [row["FAC_LAT"], row["FAC_LONG"]],
        popup = row["FAC_NAME"] + ": " + str(row[a])
    ).add_to(m)

bounds = m.get_bounds()
m.fit_bounds(bounds)

m