| ![EEW logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/eew.jpg?raw=true) | ![EDGI logo](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/edgi.png?raw=true) |
|---|---|

This notebook is licensed under GPL 3.0. Please visit our [GitHub repo](https://github.com/edgi-govdata-archiving/ECHO-Cross-Program) for more information.

The notebook was collaboratively authored by EDGI following our [authorship protocol](https://docs.google.com/document/d/1CtDN5ZZ4Zv70fHiBTmWkDJ9mswEipX6eCYrwicP66Xw/).

For more information about this project, visit https://www.environmentalenforcementwatch.org/

Note: This notebook pulls data from a copy of EPA's ECHO database hosted by Stony Brook University. The data sets are updated weekly, so some results from your run may not exactly match those in [EEW's Congressional Report Cards](https://www.environmentalenforcementwatch.org/reports). For instance, the Report Cards show ten facilities that have spent at least three of the past 12 quarters in non-compliance with different environmental protection laws. These results will therefore change as we enter new parts of the year. In addition, the Report Cards estimate the number of facilities that were active in 2019, since EPA does not provide such figures. Our estimate is based on the number of facilities EPA records as active at the *current* moment in time. In short, we use "active right now" as a proxy for "active in 2019". This number informs several metrics in the Report Cards, including violations and inspections per 1000 facilities, and these will change as the number of facilities EPA reports as active changes. Please see the [CD-Report repo](https://github.com/edgi-govdata-archiving/CD-report) for the facility counts and non-compliance rates we recorded in mid-September 2020 to produce the Report Cards.

# Examining Data from the EPA's Risk Screening Environmental Indicators (RSEI) 

This notebook examines data from the Risk Screening Environmental Indicators (RSEI) database (https://epa.gov/rsei). 

As data is retrieved from each RSEI data set, a subset of the available fields is selected. Those fields are listed in the ***columns*** variable in the code blocks.

Additional columns can be added by extending the list in the ***columns*** variable.

The available fields and their meanings are described in the [RSEI data dictionary](https://www.epa.gov/rsei/rsei-data-dictionary-site-data).
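
The column lists in the code cells below are plain strings of double-quoted field names separated by commas. As a minimal sketch (the helper `quote_columns` is hypothetical and not part of ECHO_modules), such a string could be built from a Python list, which makes adding a field a one-line change:

```python
def quote_columns(names):
    """Join field names into a string of double-quoted, comma-separated identifiers.

    Hypothetical helper for illustration; it is not part of ECHO_modules.
    """
    return ", ".join('"{}"'.format(name) for name in names)


# Field names taken from the facility query used later in this notebook.
fields = ["FacilityName", "FacilityID", "FacilityNumber"]
fields.append("Latitude")  # add another column from the data dictionary
columns = quote_columns(fields)
print(columns)
```

The resulting string can be passed wherever the notebook passes its own ***columns*** value.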

In [None]:
# Install our codebase 
# !pip install ECHO_modules >&/dev/null;
%pip install git+https://github.com/edgi-govdata-archiving/ECHO_modules@neighborhoods >&/dev/null;
%pip install geopandas >&/dev/null;

### Select the type of region and then the state
A state selection is not necessary for Zip Code and Neighborhood region types.

In [1]:
from ECHO_modules.get_data import get_echo_data
from ECHO_modules.utilities import show_region_type_widget, \
    show_state_widget, show_year_range_widget
from ECHO_modules.rsei_utilities import show_rsei_pick_region_widget

region_type_widget = show_region_type_widget(region_types=('City', 'County', 'State', 'Zip Code', 'Neighborhood'), 
                                             default_value='City' )
state_widget = None
# display( region_type_widget )
print('(The State will be ignored for Zip Code and Neighborhood regions.)')
state_widget = show_state_widget()

Dropdown(description='Region of interest:', options=('City', 'County', 'State', 'Zip Code', 'Neighborhood'), s…

(The State will be ignored for Zip Code and Neighborhood regions.)


Dropdown(description='State:', options=('AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA', 'HI'…

## Select the regions to look for
For Neighborhoods, only rectangles are currently supported.

City, county and state names will be automatically converted to upper case. Don't worry about the case as you type in your selections.

Multiple selections can be made with a comma-separated list.
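
To illustrate how a comma-separated entry might turn into the upper-case filter that appears in the logged query further down, here is a small sketch. The helpers `parse_regions` and `in_clause` are hypothetical; the actual handling inside ECHO_modules may differ.

```python
def parse_regions(text):
    """Split a comma-separated entry into trimmed, upper-case names."""
    return [part.strip().upper() for part in text.split(",") if part.strip()]


def in_clause(field, names):
    """Build an upper-case IN filter; single quotes in names are doubled."""
    quoted = ",".join("'{}'".format(n.replace("'", "''")) for n in names)
    return 'upper("{}") in ({})'.format(field, quoted)


regions = parse_regions("Commerce City, brighton")
print(in_clause("City", regions))
```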


In [2]:
from ECHO_modules.utilities import polygon_map

region_widget = None
region_type = region_type_widget.value
if ( region_type == 'Neighborhood' ):
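    # Avoid naming the result "map", which would shadow Python's built-in map()
    (neighborhood_map, shapes) = polygon_map()
    display(neighborhood_map)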
elif ( region_type != 'State' ):
    region_widget = show_rsei_pick_region_widget( type=region_type,
                                           state_widget=state_widget )

Text(value='', description='City:')

## Get the facilities for the chosen regions
These are the facilities in the chosen region that produce toxic waste, as reported to the EPA's Toxics Release Inventory (TRI).

In [13]:
from ECHO_modules.rsei_utilities import get_rsei_facilities

state = state_widget.value if state_widget is not None else None
regions_selected = None
if ( region_type == 'Zip Code' ):
    regions_selected = str(region_widget.value)
elif ( region_type == 'Neighborhood' ):
    regions_selected = shapes.pop()
elif ( region_type != 'State' ):
    regions_selected = region_widget.value
    
columns = '"FacilityName", "FacilityID", "FacilityNumber", "FRSID", "Latitude", "Longitude", "Street",'
columns += '"City", "County", "State", "ZIPCode", "StandardizedParentCompany"'

fac_df = get_rsei_facilities(state=state, region_type=region_type, regions_selected=regions_selected, 
                             rsei_type='facility', columns=columns)
# If the columns aren't specified, all columns are returned ("select * from ...")
# fac_df = get_rsei_facilities(state=state, region_type=region_type, regions_selected=regions_selected, 
#                              rsei_type='facility')
fac_df

select "FacilityName", "FacilityID", "FacilityNumber", "FRSID", "Latitude", "Longitude", "Street","City", "County", "State", "ZIPCode", "StandardizedParentCompany" from "facility_data_rsei_v2312" where upper("State") = 'CO' and upper("City") in ('COMMERCE CITY','BRIGHTON')


Unnamed: 0,FacilityName,FacilityID,FacilityNumber,FRSID,Latitude,Longitude,Street,City,County,State,ZIPCode,StandardizedParentCompany
0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,1534,110000466585,39.80727,-104.9365,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC
1,"ARCHITECTURAL DESIGN PRECAST, CONCRETE INC",80022RCHTC6401E,8725,110009689373,39.84172,-104.91447,6401 E 80TH AVE,COMMERCE CITY,ADAMS,CO,80022,
2,PURINA ANIMAL NUTRITION LLC - COMMERCE CITY,8002WPRNML6151B,11402,110060260355,39.80898,-104.93943,6151 BRIGHTON BLVD,COMMERCE CITY,ADAMS,CO,80022,LAND O LAKES INC
3,DENVER REFINED PRODUCTS TERMINAL,80022DNVRR3601E,14039,110002398780,39.79829,-104.94424,3601 E 56TH AVE,COMMERCE CITY,ADAMS,CO,80022,
4,REPUBLIC PAPERBOARD CO,80022RPBLC5501B,14642,110000466647,39.79826,-104.95155,5501 BRIGHTON BLVD,COMMERCE CITY,ADAMS,CO,80022,
5,SMYRNA READY MIX CONCRETE LLC - BRIGHTON READY...,8060WRCKYM975US,15927,110002436793,40.014029,-104.818395,975 US HWY 85,BRIGHTON,WELD,CO,80603,SMYRNA READY MIX LLC
6,FIVE STAR AFFILIATES,80022FVSTR6731E,16678,110000466610,39.78745,-104.910879,6731 E 50TH AVE,COMMERCE CITY,ADAMS,CO,80022,
7,MESA FIBERGLASS INC,80022MSFBR6471E,18538,110000466629,39.78522,-104.91335,6471 E 49TH DR,COMMERCE CITY,ADAMS,CO,80022,
8,BESTWAY CONCRETE CO-BRIGHTON,8060WBSTWY11723,20914,110060280556,40.00041,-104.83503,11723 WCR #2,BRIGHTON,WELD,CO,80601,
9,THERMO FLUIDS DENVER,8002WTHRMF4845F,27454,110046325071,39.78493,-104.92701,4845 FOREST ST,COMMERCE CITY,ADAMS,CO,80022,CLEAN HARBORS INC


#### See a map of these producing facilities in the regions selected

In [14]:
from ECHO_modules.utilities import mapper

map_of_facilities = mapper(fac_df, no_text=False, lat_field='Latitude', long_field='Longitude', name_field='FacilityName')
display(map_of_facilities)

#### Choose the years for the submissions you want to see

In [4]:
from ECHO_modules.utilities import show_year_range_widget
year_range = show_year_range_widget()

SelectionRangeSlider(description='Dates', index=(0, 54), layout=Layout(width='500px'), options=(1970, 1971, 19…

We'll follow the chain from facilities to their submissions (each tied to a chemical),
then from submissions to releases (using SubmissionNumber),
then from releases to elements (using ReleaseNumber),
and from releases to offsite facilities (matching releases.OffsiteNumber to offsite.FacilityNumber).
We can then try to connect the offsite facility (offsite.TRIFID) with a facility (FacilityID).
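
The chain can also be written as a single SQL join. The sketch below uses an in-memory SQLite database with made-up rows and simplified table names, so it only illustrates the key relationships described above. It is not the schema of the hosted RSEI tables, and the elements step is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facility (FacilityNumber INTEGER, FacilityName TEXT);
CREATE TABLE submissions (SubmissionNumber INTEGER, FacilityNumber INTEGER,
                          ChemicalNumber INTEGER);
CREATE TABLE releases (ReleaseNumber INTEGER, SubmissionNumber INTEGER,
                       PoundsReleased REAL, OffsiteNumber INTEGER);
CREATE TABLE offsite (FacilityNumber INTEGER, Name TEXT);
-- Made-up rows for illustration only.
INSERT INTO facility VALUES (1, 'PRODUCER A');
INSERT INTO submissions VALUES (10, 1, 100);
INSERT INTO releases VALUES (1000, 10, 5.0, 7);
INSERT INTO releases VALUES (1001, 10, 2.0, NULL);
INSERT INTO offsite VALUES (7, 'RECEIVER X');
""")

# facility -> submissions -> releases -> offsite
rows = conn.execute("""
SELECT f.FacilityName, s.SubmissionNumber, r.ReleaseNumber,
       r.PoundsReleased, o.Name
FROM facility f
JOIN submissions s ON s.FacilityNumber = f.FacilityNumber
JOIN releases r ON r.SubmissionNumber = s.SubmissionNumber
JOIN offsite o ON o.FacilityNumber = r.OffsiteNumber
ORDER BY r.ReleaseNumber
""").fetchall()
conn.close()
print(rows)
```

Because these are inner joins, the release with no OffsiteNumber drops out, which mirrors the dropna step used later when linking releases to offsite facilities.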

### Get the submissions made by these facilities

In [15]:
from ECHO_modules.rsei_utilities import get_this_by_that

columns = '"SubmissionNumber", "FacilityNumber", "ChemicalNumber", "SubmissionYear", "OneTimeReleaseQty", "TradeSecretInd"'

# sub_df = get_submissions_by_facilities(facilities=fac_df['FacilityNumber'], columns=columns, years=year_range.value)
sub_df = get_this_by_that(this_name='submissions', that_series=fac_df['FacilityNumber'], this_key='FacilityNumber',
                          this_columns=columns, years=year_range.value, year_field='SubmissionYear')
sub_df

select "SubmissionNumber", "FacilityNumber", "ChemicalNumber", "SubmissionYear", "OneTimeReleaseQty", "TradeSecretInd" from "submissions_data_rsei_v2312" where "FacilityNumber" in (1534,8725,11402,14039,14642,15927,16678,18538,20914,27454,28556,29023,33116,37113,39263,40149,40626,45054,45100,45148,49251,49372,51984,52740,53736,53737,55498,55708,57129,58818,59900,60893,61758)


Unnamed: 0,SubmissionNumber,FacilityNumber,ChemicalNumber,SubmissionYear,OneTimeReleaseQty,TradeSecretInd
727,3990745,40626,599,2012,,0
728,3984353,51984,610,2012,,0
729,3984354,51984,360,2012,,0
730,3984355,51984,609,2012,,0
731,3985447,45100,608,2012,,0
...,...,...,...,...,...,...
1454,4862957,33116,551,2022,,0
1455,4863401,58818,406,2022,0.0,0
1456,4849700,15927,347,2022,,0
1457,4849701,15927,409,2022,,0


Start a linking dataframe with minimal fields to trace from the facility to the offsite locations.
Join on the FacilityNumber fields of fac_df and sub_df (submissions).
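
A note on what this join does: `DataFrame.join` defaults to a left join, so facilities with no submissions in the chosen years are kept, with missing values in the submission columns. The toy example below uses made-up rows to show the effect, including the upcast of integer columns to floats.

```python
import pandas as pd

# Made-up rows for illustration only.
fac = pd.DataFrame({"FacilityNumber": [1, 2],
                    "FacilityName": ["PRODUCER A", "PRODUCER B"]})
sub = pd.DataFrame({"SubmissionNumber": [10, 11],
                    "FacilityNumber": [1, 1]})

linked = fac.set_index("FacilityNumber").join(sub.set_index("FacilityNumber"))
print(linked)

# Facility 2 has no submissions, so its SubmissionNumber is NaN,
# and the whole column is upcast from int to float.
no_submissions = linked[linked["SubmissionNumber"].isna()]
print(list(no_submissions.index))
```

This is why SubmissionNumber and similar fields appear with a trailing `.0` in the linked tables.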

In [16]:
link_df = fac_df.set_index('FacilityNumber').join(sub_df.set_index('FacilityNumber'), lsuffix='_left', rsuffix='_right')
link_df

FacilityNumber,FacilityName,FacilityID,FRSID,Latitude,Longitude,Street,City,County,State,ZIPCode,StandardizedParentCompany,SubmissionNumber,ChemicalNumber,SubmissionYear,OneTimeReleaseQty,TradeSecretInd
1534,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.807270,-104.936500,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC,4000196.0,346.0,2012.0,,0.0
1534,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.807270,-104.936500,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC,4000197.0,595.0,2012.0,,0.0
1534,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.807270,-104.936500,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC,4076490.0,346.0,2013.0,,0.0
1534,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.807270,-104.936500,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC,4076492.0,595.0,2013.0,,0.0
1534,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.807270,-104.936500,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC,4143872.0,346.0,2014.0,,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58818,CMC REBAR BRIGHTON,8060WCMCCT455ID,110043671238,40.003333,-104.818889,455 IDA ST.,BRIGHTON,ADAMS,CO,80603,COMMERCIAL METALS CO,4642909.0,406.0,2020.0,0.0,0.0
58818,CMC REBAR BRIGHTON,8060WCMCCT455ID,110043671238,40.003333,-104.818889,455 IDA ST.,BRIGHTON,ADAMS,CO,80603,COMMERCIAL METALS CO,4863401.0,406.0,2022.0,0.0,0.0
59900,COLORADO REFINING CO,80022CLRDR5800B,110070828197,39.804880,-104.942980,5800 BRIGHTON BLVD,COMMERCE CITY,ADAMS,CO,80022,,,,,,
60893,DPC INDUSTRIES INC,80022DXPTR6330C,110002370755,39.811540,-104.940440,6330 COLORADO,COMMERCE CITY,ADAMS,CO,80022,,,,,,


### Get the releases for the submissions

In [17]:

columns = '"ReleaseNumber", "SubmissionNumber", "Media", "PoundsReleased", "OffsiteNumber", "TEF"'
rel_df = get_this_by_that(this_name='releases', that_series=sub_df['SubmissionNumber'], this_key='SubmissionNumber',
                          this_columns=columns)
rel_df

select "ReleaseNumber", "SubmissionNumber", "Media", "PoundsReleased", "OffsiteNumber", "TEF" from "releases_data_rsei_v2312" where "SubmissionNumber" in (3990745,3984353,3984354,3984355,3985447,3985448,3986899,3993747,3993748,3993749,3993750,3990746,4001844,4009045,4000196,4000197,4009047,4009049,4009051,4009054,4009056,4009057,4009059,4009061,4028187,4028188,4029020,4029022,4029023,4029025,4029027,4029028,4029030,4029031,4029033,4029034,4029037,4029039,4029041,4029043,4029045,4029047,4029049,4029051,4029052,4029053,4029055,4029056,4029057,4029058)
select "ReleaseNumber", "SubmissionNumber", "Media", "PoundsReleased", "OffsiteNumber", "TEF" from "releases_data_rsei_v2312" where "SubmissionNumber" in (4054636,4045273,4068411,4057189,4057190,4068412,4093631,4076490,4076492,4082438,4072916,4072921,4073246,4073247,4078715,4078716,4078717,4085050,4085052,4085054,4085056,4085058,4085061,4085064,4085067,4085069,4085076,4085082,4093636,4093638,4093640,4094369,4106858,4106866,4158147,4117203,4

Unnamed: 0,ReleaseNumber,SubmissionNumber,Media,PoundsReleased,OffsiteNumber,TEF
0,6590585,3985448,764,9803.000000,9270.0,
1,6451642,4028188,792,182.000000,1004724.0,
2,6451643,4028188,2,2053.000000,,
3,6451644,4028188,792,743.000000,5764.0,
4,6451645,4028188,3,0.018000,,
...,...,...,...,...,...,...
79,9322207,4855693,6,110.700000,17194.0,
80,9322211,4855687,6,0.000138,17194.0,
81,9322212,4855687,6,0.000062,17194.0,
82,9322213,4855685,6,0.009620,17194.0,


Continue the linking process for facilities by joining the previous link with the releases.

In [18]:
# Keep only the releases that were sent to an offsite facility
link_df2 = link_df.set_index('SubmissionNumber').join(rel_df.set_index('SubmissionNumber')).dropna(subset=['OffsiteNumber'])
link_df2

SubmissionNumber,FacilityName,FacilityID,FRSID,Latitude,Longitude,Street,City,County,State,ZIPCode,StandardizedParentCompany,ChemicalNumber,SubmissionYear,OneTimeReleaseQty,TradeSecretInd,ReleaseNumber,Media,PoundsReleased,OffsiteNumber,TEF
4000196.0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.80727,-104.93650,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC,346.0,2012.0,,0.0,6547477.0,764.0,6.090000,21061.0,
4000196.0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.80727,-104.93650,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC,346.0,2012.0,,0.0,6547479.0,741.0,0.460000,27259.0,
4000197.0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.80727,-104.93650,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC,595.0,2012.0,,0.0,6547474.0,741.0,3498.280000,27259.0,
4000197.0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.80727,-104.93650,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC,595.0,2012.0,,0.0,6547476.0,764.0,19891.320000,21061.0,
4076492.0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.80727,-104.93650,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,AZZ INC,595.0,2013.0,,0.0,6836545.0,741.0,4024.230000,21061.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4855687.0,WELLS CONCRETE BRIGHTON,8060WWLLSD2145E,110070567038,40.02949,-104.81151,2145 E CROWN PRINCE BLVD,BRIGHTON,WELD,CO,80603,,360.0,2022.0,,0.0,9028780.0,726.0,0.110000,1205597.0,
4855687.0,WELLS CONCRETE BRIGHTON,8060WWLLSD2145E,110070567038,40.02949,-104.81151,2145 E CROWN PRINCE BLVD,BRIGHTON,WELD,CO,80603,,360.0,2022.0,,0.0,9322211.0,6.0,0.000138,17194.0,
4855687.0,WELLS CONCRETE BRIGHTON,8060WWLLSD2145E,110070567038,40.02949,-104.81151,2145 E CROWN PRINCE BLVD,BRIGHTON,WELD,CO,80603,,360.0,2022.0,,0.0,9322212.0,6.0,0.000062,17194.0,
4855693.0,WELLS CONCRETE BRIGHTON,8060WWLLSD2145E,110070567038,40.02949,-104.81151,2145 E CROWN PRINCE BLVD,BRIGHTON,WELD,CO,80603,,409.0,2022.0,,0.0,9322206.0,6.0,12.300000,17194.0,


### Get the offsite facilities from the releases

In [19]:

columns = '"FacilityNumber", "TRIFID", "FRSID", "Name", "Street", "City", "State", "ZIPCode", "Latitude", "Longitude"'
off_df = get_this_by_that(this_name='offsite', that_series=rel_df['OffsiteNumber'].dropna(), this_key='OffsiteID', this_columns=columns)
off_df

select "FacilityNumber", "TRIFID", "FRSID", "Name", "Street", "City", "State", "ZIPCode", "Latitude", "Longitude" from "offsite_data_rsei_v2312" where "OffsiteID" in (9270.0,1004724.0,5764.0,1206510.0,24812.0,6491.0,31876.0,31876.0,6491.0,6491.0,31876.0,31876.0,6491.0,31876.0,6491.0,6491.0,31876.0,31876.0,6491.0,27259.0,21061.0,21061.0,27259.0,2547.0,2547.0,6491.0,6491.0,1206510.0,28499.0,5764.0,21061.0,9270.0,5764.0,1206510.0,28499.0,21061.0,9270.0,28499.0,9270.0,11866.0,9270.0,21061.0,11866.0,9270.0,5764.0,21395.0,9270.0,28499.0,9270.0,21061.0)
select "FacilityNumber", "TRIFID", "FRSID", "Name", "Street", "City", "State", "ZIPCode", "Latitude", "Longitude" from "offsite_data_rsei_v2312" where "OffsiteID" in (28499.0,5764.0,9270.0,1206510.0,28499.0,21061.0,21061.0,1206510.0,9270.0,28499.0,2547.0,9270.0,9270.0,1206354.0,7796.0,1004434.0,1206354.0,7796.0,17194.0,17194.0,5764.0,1004724.0,24812.0,9270.0,5764.0,9270.0,28499.0,7796.0,9270.0,2547.0,2547.0,31876.0,31876.0,31876.0,6491.0,31876

Unnamed: 0,FacilityNumber,TRIFID,FRSID,Name,Street,City,State,ZIPCode,Latitude,Longitude
0,2547,72015RNC001007V,1.100005e+11,RINECO CHEM,1007 VULCAN RD. - HASKELL,BENTON,AR,72015.0,34.513889,-92.630556
1,5764,66736SYSTCCEMEN,1.100412e+11,SYSTECH ENVIR CORPRT,142 S CEMENT PLANT RD.,FREDONIA,KS,66736.0,37.507728,-95.824133
2,6491,80640NYXNV9131E,1.100108e+11,VEOLIA ES TECH SOLUTIONS,9131 EAST 96TH AVE.,HENDERSON,CO,80640.0,39.875890,-104.882870
3,9270,69145CLNHR5MISO,1.100416e+11,CLEAN HARBORS ENVIR SVC,5 MILES SOUTH OF KIMBALL ON HW 71,KIMBALL,NE,69145.0,41.152720,-103.663390
4,11866,,1.100006e+11,CATALYST RECVY OF LA,100 AMERICAN BLVD.,LAFAYETTE,LA,70508.0,30.167660,-91.986820
...,...,...,...,...,...,...,...,...,...,...
18,1205597,,,REMELT METALS,2350 SO RARITAN,ENGLEWOOD,CO,80110.0,39.645959,-105.008434
19,1208214,,1.100158e+11,"THERMO FLUIDS, INC. (TLN)",4000 & 4020 ARCATA WAY,NORTH LAS VEGAS,NV,89030.0,36.232891,-115.120618
20,1409929,,1.100674e+11,CLEAN HARBORS WICHITA LLC,2808 N OHIO ST,WICHITA,KS,67219.0,37.733840,-97.324130
21,1412845,,,CLEAN HARBORS ENVIRONMENTAL SERVICES INC,9775 E 97TH PL,HENDERSON,CO,80640.0,39.874290,-104.873754


#### Continue the linking process started earlier. 
This time link the OffsiteNumber from releases with the FacilityNumber in offsite.

In [20]:
link_df3 = link_df2.set_index('OffsiteNumber').join(off_df.set_index('FacilityNumber'), lsuffix='_left', rsuffix='_right')
link_df3

OffsiteNumber,FacilityName,FacilityID,FRSID_left,Latitude_left,Longitude_left,Street_left,City_left,County,State_left,ZIPCode_left,...,TEF,TRIFID,FRSID_right,Name,Street_right,City_right,State_right,ZIPCode_right,Latitude_right,Longitude_right
21061.0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.80727,-104.93650,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,...,,80105SFTYK10855,1.100609e+11,CLEAN HARBORS DEER TRAIL LLC,108555 EAST US HIGHWAY 36,DEER TRAIL,CO,80105.0,39.739943,-103.708304
21061.0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.80727,-104.93650,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,...,,80105SFTYK10855,1.100609e+11,CLEAN HARBORS DEER TRAIL LLC,108555 EAST US HIGHWAY 36,DEER TRAIL,CO,80105.0,39.739943,-103.708304
21061.0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.80727,-104.93650,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,...,,80105SFTYK10855,1.100609e+11,CLEAN HARBORS DEER TRAIL LLC,108555 EAST US HIGHWAY 36,DEER TRAIL,CO,80105.0,39.739943,-103.708304
21061.0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.80727,-104.93650,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,...,,80105SFTYK10855,1.100609e+11,CLEAN HARBORS DEER TRAIL LLC,108555 EAST US HIGHWAY 36,DEER TRAIL,CO,80105.0,39.739943,-103.708304
21061.0,AZZ GALVANIZING -DENVER,80022BYLSG4400E,110000466585,39.80727,-104.93650,4400 E 61ST AVE,COMMERCE CITY,ADAMS,CO,80022,...,,80105SFTYK10855,1.100609e+11,CLEAN HARBORS DEER TRAIL LLC,108555 EAST US HIGHWAY 36,DEER TRAIL,CO,80105.0,39.739943,-103.708304
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17194.0,WELLS CONCRETE BRIGHTON,8060WWLLSD2145E,110070567038,40.02949,-104.81151,2145 E CROWN PRINCE BLVD,BRIGHTON,WELD,CO,80603,...,,,1.100008e+11,CITY OF BRIGHTON WWTP,625 NORTH KUNER RD.,BRIGHTON,CO,80601.0,39.991666,-104.825000
17194.0,WELLS CONCRETE BRIGHTON,8060WWLLSD2145E,110070567038,40.02949,-104.81151,2145 E CROWN PRINCE BLVD,BRIGHTON,WELD,CO,80603,...,,,1.100008e+11,CITY OF BRIGHTON WWTP,625 NORTH KUNER RD.,BRIGHTON,CO,80601.0,39.991666,-104.825000
17194.0,WELLS CONCRETE BRIGHTON,8060WWLLSD2145E,110070567038,40.02949,-104.81151,2145 E CROWN PRINCE BLVD,BRIGHTON,WELD,CO,80603,...,,,1.100008e+11,CITY OF BRIGHTON WWTP,625 NORTH KUNER RD.,BRIGHTON,CO,80601.0,39.991666,-104.825000
17194.0,WELLS CONCRETE BRIGHTON,8060WWLLSD2145E,110070567038,40.02949,-104.81151,2145 E CROWN PRINCE BLVD,BRIGHTON,WELD,CO,80603,...,,,1.100008e+11,CITY OF BRIGHTON WWTP,625 NORTH KUNER RD.,BRIGHTON,CO,80601.0,39.991666,-104.825000


Pare the linking information down to just the latitude/longitude for the originating facility (_left)
and the coordinates for the offsite facility (_right).
There may be multiple transfers between the same two facilities, so we drop duplicates.
(The multiple transfers may be of interest. They will exist in link_df3.)
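
If the repeated transfers are of interest, they can be counted before the duplicates are dropped. The sketch below uses made-up coordinate tuples in place of the four coordinate columns of link_df3, and the standard library's `Counter` as a stand-in for a pandas group-by.

```python
from collections import Counter

# Made-up (producer_lat, producer_lon, offsite_lat, offsite_lon) rows,
# standing in for the four coordinate columns of link_df3.
transfers = [
    (39.0, -104.0, 40.0, -103.0),
    (39.0, -104.0, 40.0, -103.0),
    (39.0, -104.0, 41.0, -102.0),
]

counts = Counter(transfers)    # number of transfers per producer/offsite pair
unique_links = list(counts)    # one entry per distinct pair, in first-seen order
for link, n in counts.items():
    print(link, n)
```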

In [21]:
link_df4 = link_df3.drop_duplicates(subset=['Latitude_left', 'Longitude_left', 'Latitude_right', 'Longitude_right'])
link_df4 = link_df4[['Latitude_left', 'Longitude_left', 'Latitude_right', 'Longitude_right']]
link_df4

OffsiteNumber,Latitude_left,Longitude_left,Latitude_right,Longitude_right
21061.0,39.80727,-104.93650,39.739943,-103.708304
27259.0,39.80727,-104.93650,39.798290,-105.057750
1205474.0,39.80727,-104.93650,39.632414,-105.013232
3156.0,39.80727,-104.93650,36.106354,-96.038974
523.0,39.80727,-104.93650,38.464731,-105.113542
...,...,...,...,...
21061.0,39.80427,-104.93787,39.739943,-103.708304
1412845.0,39.80427,-104.93787,39.874290,-104.873754
1206485.0,40.02949,-104.81151,39.819318,-104.932939
1205597.0,40.02949,-104.81151,39.645959,-105.008434


### Link the producing facilities with their offsite facilities.

Map the releasing facilities and the offsite facilities they send to. The mapper2 function takes df_dicts, a tuple of dictionaries describing the facilities to map. Each DataFrame must have a latitude and a longitude field. Each dictionary should have these keys:

- 'DataFrame' - the DataFrame
- 'marker_color' - circle border color
- 'marker_fill_color' - circle interior color
- 'name_field' - name of the facility name field in the DataFrame
- 'lat_field' - name of the latitude field
- 'long_field' - name of the longitude field
- 'url_field' - name of the URL field

The facilities producing waste will be shown with green circles.
The offsite facilities receiving the waste from the green facilities are shown with blue circles.

Lines show transfer from green dot producing facilities to blue dot offsite facilities.
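
Before calling the mapping function, it can help to confirm that each dictionary carries the keys listed above. The check below is a hypothetical convenience, not part of ECHO_modules, and whether mapper2 strictly requires every key is an assumption.

```python
# Keys described above for each dictionary in df_dicts.
REQUIRED_KEYS = (
    "DataFrame", "marker_color", "marker_fill_color",
    "name_field", "lat_field", "long_field", "url_field",
)


def missing_keys(df_dict):
    """Return the expected keys that are absent from one mapping dictionary."""
    return [key for key in REQUIRED_KEYS if key not in df_dict]


example = {
    "DataFrame": None,  # placeholder; a real call would pass a DataFrame
    "marker_color": "black",
    "marker_fill_color": "green",
    "name_field": "FacilityName",
    "lat_field": "Latitude",
    "long_field": "Longitude",
}
print(missing_keys(example))
```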

In [22]:
from ECHO_modules.rsei_utilities import mapper2

fac_dict = {
    'DataFrame' : fac_df,
    'marker_color' : 'black',
    'marker_fill_color' : 'green',
    'name_field' : 'FacilityName',
    'lat_field' : 'Latitude',
    'long_field' : 'Longitude',
    'url_field' : None
}
off_dict = {
    'DataFrame' : off_df,
    'marker_color' : 'yellow',
    'marker_fill_color' : 'blue',
    'name_field' : 'Name',
    'lat_field' : 'Latitude',
    'long_field' : 'Longitude',
    'url_field' : None
}
map_facs_and_offs = mapper2(df_dicts=(fac_dict, off_dict), link_df=link_df4 )
display(map_facs_and_offs)

(39.80727, -104.9365), (39.739943, -103.708304)
(39.80727, -104.9365), (39.79829, -105.05775)
(39.80727, -104.9365), (39.632414, -105.013232)
(39.80727, -104.9365), (36.106354, -96.038974)
(39.80727, -104.9365), (38.464731, -105.113542)
(39.80727, -104.9365), (35.828219, -78.666298)
(39.80727, -104.9365), (42.667176, -71.160877)
(39.80727, -104.9365), (41.707547, -87.597591)
(39.80727, -104.9365), (29.74775, -95.31226)
(39.78493, -104.92701), (37.73384, -97.32413)
(39.78493, -104.92701), (41.15272, -103.66339)
(39.78493, -104.92701), (41.648417, -87.482414)
(39.78493, -104.92701), (29.730608, -95.09346)
(39.78493, -104.92701), (36.232891, -115.120618)
(39.78493, -104.92701), (40.228632, -90.357496)
(39.78493, -104.92701), (44.294808, -105.475589)
(39.802789, -104.9475), (41.15272, -103.66339)
(39.802789, -104.9475), (34.513889, -92.630556)
(39.802789, -104.9475), (39.739943, -103.708304)
(39.802789, -104.9475), (39.852357, -104.499627)
(39.802789, -104.9475), (40.530535, -112.296411)
(

## Add the chemicals to the submissions

In [None]:
from ECHO_modules.rsei_utilities import add_chemical_to_submissions

# columns = '"Chemical", "RfCInhale", "RfDOral"'
columns = '*'
sub1_df = add_chemical_to_submissions(submissions=sub_df, chemical_columns=columns)
columns = ["SubmissionNumber", "ChemicalNumber", "Chemical", "RfCInhale"]
sub1_df[columns]

## Get the elements for the releases

In [None]:

columns = '"ElementNumber", "PoundsPT", "ScoreCategory", "Score", "Population", "ScoreA", "PopA", "ScoreB", "PopB"'
element_df = get_this_by_that(this_name='elements', that_series=rel_df['ReleaseNumber'], this_key='ReleaseNumber', 
                              this_columns=columns)
element_df

## See offsite facilities for the chosen region
These offsite facilities may be receiving from other facilities outside of this region. They aren't necessarily linked to the producing facilities in fac_df.

In [None]:
from ECHO_modules.rsei_utilities import get_rsei_facilities

columns = '"Name", "OffsiteID", "FacilityNumber", "TRIFID", "FRSID", "Latitude", "Longitude", "Street",'
columns += '"City", "State", "ZIPCode"'

off_df2 = get_rsei_facilities(state=state, region_type=region_type, regions_selected=regions_selected, 
                             rsei_type='offsite', columns=columns)
off_df2

In [None]:


to_map = off_df2.dropna(subset=['Latitude', 'Longitude'])
map_of_facilities = mapper(to_map, no_text=False, lat_field='Latitude', 
                           long_field='Longitude', name_field='Name')
display(map_of_facilities)

In [None]:

# All the releases where Media <= 2 (I think those are the direct air releases;
# the media table queried below lists what each code means)
rsql = 'select * from "releases_data_rsei_v2312" where "Media" <= 2;'
air_releases = get_echo_data(rsql)
# All the releases above a certain weight
rsql = 'select * from "releases_data_rsei_v2312" where "PoundsReleased" > 100000;'
releases = get_echo_data(rsql)

In [None]:
len(releases)

In [None]:
# Look up the text description for each Media code
media_sql = 'select "Media", "MediaText" from "media_data_rsei_v2312";' 
media_types = get_echo_data(media_sql)
media_types

In [None]:
# Get Exxon facilities
rsql = 'select * from "facility_data_rsei_v2312" where "StandardizedParentCompany" like \'%EXXON%\';'
facs = get_echo_data(rsql)
# Get their submissions
these_fac_numbers = list(facs["FacilityNumber"].unique())
# Building SQL with string formatting is quick but not good practice. It is tolerable here
# only because the values are numeric IDs that came from the database itself.
rsql = 'select * from "submissions_data_rsei_v2312" where "FacilityNumber" in ({});'.format(','.join([str(fac) for fac in these_fac_numbers]))
subs = get_echo_data(rsql)

# Use these submission numbers to get releases.
# There are too many submissions (>20,000) to easily get all the Exxon releases in one query.
# A SQL join would likely handle this better, but this shows the general idea.
# Just do the first 50 as a test.
these_submission_numbers = list(subs["SubmissionNumber"].unique())[0:50]
rsql = 'select * from "releases_data_rsei_v2312" where "SubmissionNumber" in ({});'.format(','.join([str(sub) for sub in these_submission_numbers]))
res = get_echo_data(rsql)
res
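
The first-50 test above could be extended to every submission number by sending the IDs in batches. The queries logged earlier in this notebook appear to be split into groups in a similar way. The sketch below is a minimal, hypothetical version of that idea: `chunked` and `batched_queries` are illustrative helpers, not ECHO_modules functions.

```python
def chunked(values, size=50):
    """Yield successive lists of at most `size` items."""
    values = list(values)
    for start in range(0, len(values), size):
        yield values[start:start + size]


def batched_queries(table, key, ids, size=50):
    """Build one SELECT per batch of ids (ids are cast to int before formatting)."""
    template = 'select * from "{}" where "{}" in ({});'
    return [
        template.format(table, key, ",".join(str(int(i)) for i in batch))
        for batch in chunked(ids, size)
    ]


# 120 made-up ids produce three queries: 50, 50 and 20 ids.
queries = batched_queries("releases_data_rsei_v2312", "SubmissionNumber", range(1, 121))
print(len(queries))
print(queries[-1])
```

Each query string could then be passed to get_echo_data and the resulting DataFrames combined with `pandas.concat`.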