# Facility Data Deep Dive

## Violations, Inspections and Enforcements for the Resource Conservation and Recovery Act (RCRA), Clean Water Act (CWA), Clean Air Act (CAA)

This notebook examines ECHO data on inspections, violations and enforcements by EPA, state and other agencies. It uses the following data sets from the ECHO downloadable files:

- RCRA_EVALUATIONS = Inspections under RCRA
- RCRA_VIOLATIONS = Violations of RCRA rules
- RCRA_ENFORCEMENTS = Enforcement actions taken by state agencies and the EPA
- ICIS_FEC_EPA_INSPECTIONS = Cross-program, federally-led inspections 
- CASE_FACILITIES = Cross-program enforcements
- ICIS-AIR_VIOLATION_HISTORY = CAA violations
- ICIS-AIR_FCES_PCES = Both state and federal CAA compliance evaluations
- ICIS-AIR_FORMAL_ACTIONS = CAA formal enforcement actions
- NPDES_QNCR_HISTORY = CWA Quarterly Non-Compliance History
- NPDES_INSPECTIONS = CWA Inspections
- NPDES_FORMAL_ENFORCEMENT_ACTIONS = CWA Enforcements


#### A zip code must be chosen using the input widget that is provided.

## How to Run
* If you click on a gray **code** cell, a little “play button” arrow appears on the left. If you click the play button, it will run the code in that cell (“**running** a cell”). The button will animate. When the animation stops, the cell has finished running.
![Where to click to run the cell](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/pressplay.JPG?raw=true)
* You may get a warning that the notebook was not authored by Google. We know, we authored them! It’s okay. Click “Run Anyway” to continue. 
![Error Message](https://github.com/edgi-govdata-archiving/EEW-Image-Assets/blob/master/Jupyter%20instructions/warning-message.JPG?raw=true)
* **It is important to run cells in order because they depend on each other.**
* Run all of the cells in a Notebook to make a complete report. Please feel free to look at and **learn about each result as you create it**!

---
---

# **Let's begin!**

### Hover over the "[ ]" on the top left corner of the cell below and you should see a "play" button appear. Click on it to run the cell then move to the next one.

### Run this next cell to create the widget for inputting your zip code. It will create an input field at the bottom. Enter your zip code and then move on to the next cell.

In [None]:
# Import libraries
import urllib.parse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import folium

def get_data( sql, index_field ):
    url='http://apps.tlt.stonybrook.edu/echoepa/?query='
    data_location=url+urllib.parse.quote(sql)
    ds = pd.read_csv(data_location,encoding='iso-8859-1')
    ds.set_index( index_field, inplace=True)
    return ds
 
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

from IPython.display import display

zip_widget = widgets.IntText(
    value=1,
    description='Zip Code:',
    disabled=False
)
display( zip_widget )
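
The `get_data` helper above sends the SQL text to the server as a URL query parameter, so the query has to be percent-encoded first. The following is a minimal sketch of just that step, with no network call. `build_query_url` is a hypothetical name used only for illustration, and the zip code is an arbitrary example value.

```python
import urllib.parse

# Same endpoint that get_data uses above.
BASE_URL = 'http://apps.tlt.stonybrook.edu/echoepa/?query='

def build_query_url(sql):
    # quote() percent-encodes spaces, '*', '=' and other reserved characters
    return BASE_URL + urllib.parse.quote(sql)

# Arbitrary example zip code, for illustration only
example_url = build_query_url("select * from ECHO_EXPORTER where FAC_DERIVED_ZIP = 14213")
print(example_url)
```

Printing the URL before handing it to `pd.read_csv` can be a quick way to check what is actually being requested.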

### Run this cell after choosing a zip code. It will pull facility information from the data in the ECHO_EXPORTER table.

In [None]:
my_zip = zip_widget.value

sql = "select * from ECHO_EXPORTER where FAC_DERIVED_ZIP = " + str( my_zip ) #FAC_ZIP?

try:
    echo_data = get_data( sql, 'REGISTRY_ID' )
    num_facilities = echo_data.shape[0]
    print("\nThere are %s EPA facilities in Zip code %s tracked in the ECHO database." \
          %(num_facilities, my_zip))
except pd.errors.EmptyDataError:
    print("\nThere are no EPA facilities in this zip code.\n")

## Create an interactive map of all of the facilities in the area that report to EPA
### Once you run this cell, a map should appear. You can zoom in and out, or click on facilities to get their names and the EPA programs that monitor them.

The EPA program acronyms are:
- CAA = Clean Air Act
- CWA = Clean Water Act
- SDWIS = Safe Drinking Water Information System
- RCRA = Resource Conservation and Recovery Act
- TRI = Toxics Release Inventory
- GHG = Greenhouse Gas

The map won't display if there are too many markers, so only the first 200 are shown.

In [None]:
# [To do:  Use some ECHO_EXPORTER data to pick the top 200 facilities (by some measure) 
# to map.]

# Let's show a quick map of your area and the facilities in it
# To-do:  Add some more ECHO_EXPORTER information in the markers.

# Put some information with the marker to show the programs that track the facility.
def marker_text( row ):
    text = ""
    # A missing FAC_NAME is read as NaN (a float), so only build text for real strings.
    if isinstance( row['FAC_NAME'], str ):
        text = row["FAC_NAME"] + ' - '
        if ( row['AIR_FLAG'] == 'Y' ):
            text += 'CAA, ' 
        if ( row['NPDES_FLAG'] == 'Y' ):
            text += 'CWA, ' 
        if ( row['SDWIS_FLAG'] == 'Y' ):
            text += 'SDWIS, ' 
        if ( row['RCRA_FLAG'] == 'Y' ):
            text += 'RCRA, ' 
        if ( row['TRI_FLAG'] == 'Y' ):
            text += 'TRI, ' 
        if ( row['GHG_FLAG'] == 'Y' ):
            text += 'GHG, ' 
    return text
    
def mapper(df):
    # Initialize the map
    # Average only the coordinate columns; df.mean() over text columns fails in recent pandas.
    center = [df["FAC_LAT"].mean(), df["FAC_LONG"].mean()]
    m = folium.Map(
        location = center,
    )

    # Add a clickable marker for each facility
    i = 0
    for index, row in df.iterrows():
        # Make sure the FAC_NAME is not NaN, which is interpreted as a number.
        if isinstance( row['FAC_NAME'], str ):
            folium.Marker(
                location = [row["FAC_LAT"], row["FAC_LONG"]],
                popup = marker_text( row )).add_to(m)
            i += 1
        if ( i >= 200 ):    # The map won't display with too many markers.
            break
            
    bounds = m.get_bounds()
    m.fit_bounds(bounds)
    
    # Show the map
    return m

map_of_facilities_in_zip = mapper(echo_data)
map_of_facilities_in_zip
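
To make the popup text easier to follow, here is a small sketch of how the program flags turn into a label. It is an alternative formulation, not the notebook's own `marker_text`: the hypothetical `popup_text` function uses a lookup list and `join()`, which also avoids the trailing comma. The sample record is made up.

```python
# Flag columns from ECHO_EXPORTER and the acronyms listed above.
PROGRAM_FLAGS = [
    ('AIR_FLAG', 'CAA'),
    ('NPDES_FLAG', 'CWA'),
    ('SDWIS_FLAG', 'SDWIS'),
    ('RCRA_FLAG', 'RCRA'),
    ('TRI_FLAG', 'TRI'),
    ('GHG_FLAG', 'GHG'),
]

def popup_text(row):
    # Collect the acronym of every program whose flag is 'Y'
    programs = [label for flag, label in PROGRAM_FLAGS if row.get(flag) == 'Y']
    # join() puts ", " only between items, so nothing trails at the end
    return str(row.get('FAC_NAME', '')) + ' - ' + ', '.join(programs)

# Made-up facility record, for illustration only
sample_row = {'FAC_NAME': 'EXAMPLE PLANT', 'AIR_FLAG': 'Y', 'NPDES_FLAG': 'N',
              'SDWIS_FLAG': 'N', 'RCRA_FLAG': 'Y', 'TRI_FLAG': 'N', 'GHG_FLAG': 'N'}
print(popup_text(sample_row))
```

Because a pandas row also supports `.get()`, the same function should work on rows from `df.iterrows()`.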

## Graph program-specific data for all the facilities in the zip code.
### First, choose the program (Air, Water, or RCRA) and type of data (Violations, Inspections, Enforcement) you want to explore

Running the code below will show you a dropdown you can use to make a selection. What's available for you to look at here:

- RCRA_EVALUATIONS = Inspections under RCRA
- RCRA_VIOLATIONS = Violations of RCRA rules
- RCRA_ENFORCEMENTS = Enforcement actions taken by state agencies and the EPA
- ~~ICIS_FEC_EPA_INSPECTIONS = Cross-program, federally-led inspections~~ (TBD)  
- ~~CASE_FACILITIES = Cross-program enforcements~~ (TBD) 
- ICIS-AIR_VIOLATION_HISTORY = CAA violations
- ICIS-AIR_FCES_PCES = Both state and federal CAA compliance evaluations
- ICIS-AIR_FORMAL_ACTIONS = CAA formal enforcement actions
- NPDES_QNCR_HISTORY = CWA Quarterly Non-Compliance History
- NPDES_INSPECTIONS = CWA Inspections
- NPDES_FORMAL_ENFORCEMENT_ACTIONS = CWA Enforcements


In [None]:
zip_data_set_widget=widgets.Dropdown(
    options=['RCRA_EVALUATIONS','RCRA_VIOLATIONS', 
             'RCRA_ENFORCEMENTS', 
            'ICIS-AIR_VIOLATION_HISTORY', 'ICIS-AIR_FORMAL_ACTIONS',
            'ICIS-AIR_FCES_PCES','NPDES_QNCR_HISTORY',
            'NPDES_INSPECTIONS','NPDES_FORMAL_ENFORCEMENT_ACTIONS'
            ], # TO DO: Add in Case Facilities, 'ICIS_FEC_EPA_INSPECTIONS'. Create a lookup to make the dropdown options more user-friendly, with the actual spreadsheet names on the backend.
    description='Data sets:',
    disabled=False,
) 
display(zip_data_set_widget)
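
The to-do note in the cell above mentions a lookup that would show friendlier names in the dropdown while keeping the real table names behind the scenes. One possible sketch is shown here. The display labels are hypothetical wording, not something defined by ECHO.

```python
# Hypothetical display labels (right) for the actual table names (left).
FRIENDLY_NAMES = {
    'RCRA_EVALUATIONS': 'RCRA inspections',
    'RCRA_VIOLATIONS': 'RCRA violations',
    'RCRA_ENFORCEMENTS': 'RCRA enforcement actions',
    'ICIS-AIR_VIOLATION_HISTORY': 'Air violations',
    'ICIS-AIR_FORMAL_ACTIONS': 'Air formal enforcement actions',
    'ICIS-AIR_FCES_PCES': 'Air compliance evaluations',
    'NPDES_QNCR_HISTORY': 'Water quarterly non-compliance history',
    'NPDES_INSPECTIONS': 'Water inspections',
    'NPDES_FORMAL_ENFORCEMENT_ACTIONS': 'Water enforcement actions',
}

# A Dropdown accepts a list of (label, value) pairs: the label is shown,
# and the widget's .value is the table name.
dropdown_options = [(label, table) for table, label in FRIENDLY_NAMES.items()]

# In the notebook this could be passed along as:
# widgets.Dropdown(options=dropdown_options, description='Data sets:')
print(dropdown_options[0])
```

With this approach the later cells would not need to change, since `zip_data_set_widget.value` would still be the table name.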

### Once you've chosen the program and data type you want to focus on...
...run the code below, which will access our database and retrieve the information.

In [None]:
program = zip_data_set_widget.value

# Get id tags from ECHO_EXPORTER for the relevant program
code = {"ICIS": "AIR_IDS", "NPDES": "NPDES_IDS", "RCRA": "RCRA_IDS"} # Todo: figure out case enforcements...
echo_code = [v for k,v in code.items() if k in zip_data_set_widget.value][0]


ids = echo_data.loc[echo_data[echo_code].str.len() >0]    # just give all RCRA_IDs and let sql deal with it?
ids = ids.loc[:,echo_code]

id_string = ""
for pos,row in enumerate(ids):
    id_string = id_string + "'"+row+"',"
id_string=id_string[:-1] # removes trailing comma
#print(id_string)

# Translate those id tags into what the specific spreadsheet wants
id_tags = {
    "ICIS-AIR_VIOLATION_HISTORY": "pgm_sys_id", "ICIS-AIR_FCES_PCES": "PGM_SYS_ID", "ICIS-AIR_FORMAL_ACTIONS":"pgm_sys_id",
    "NPDES_INSPECTIONS": "NPDES_ID", "NPDES_QNCR_HISTORY":"NPDES_ID","NPDES_FORMAL_ENFORCEMENT_ACTIONS": "NPDES_ID",
    "RCRA_EVALUATIONS": "ID_NUMBER","RCRA_VIOLATIONS":"ID_NUMBER","RCRA_ENFORCEMENTS":"ID_NUMBER"
          }
index_field=id_tags[program]

sql = "select * from `"+program+"` where "+index_field+" in (" + \
            id_string + ")"

data = None

try:
    data = get_data( sql, index_field )
    print( "Data from " + program + " loaded for this zip code!")    
    
except pd.errors.EmptyDataError:
    print("There's no data on "+program+" available for this zip code")
    
data
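
The cell above builds a quoted, comma-separated list of program IDs for the SQL `in (...)` clause. Here is a minimal sketch of that step as a stand-alone function. It assumes that one ECHO_EXPORTER field can hold several IDs separated by spaces, which is how the single-facility cell further down treats these fields. `build_in_clause` is a hypothetical name and the IDs are made up.

```python
def build_in_clause(id_values):
    ids = []
    for value in id_values:
        # Skip missing values (NaN is a float, not a string)
        if isinstance(value, str):
            # Assumption: several IDs in one field are separated by whitespace
            ids.extend(value.split())
    # Quote each ID and separate them with commas; no trailing comma to trim
    return ", ".join("'" + one_id + "'" for one_id in ids)

# Made-up IDs, for illustration only
example_ids = ['NYD000000001', 'NYD000000002 NYD000000003']
in_clause = build_in_clause(example_ids)
example_sql = "select * from `RCRA_VIOLATIONS` where ID_NUMBER in (" + in_clause + ")"
print(example_sql)
```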

### Let's make a chart out of this!

In [None]:
columns = {
    "RCRA_VIOLATIONS": {"c": "VIOL_DETERMINED_BY_AGENCY", "d": "DATE_VIOLATION_DETERMINED", "f": "%m/%d/%Y"},
    "RCRA_EVALUATIONS": {"c":"EVALUATION_AGENCY", "d":"EVALUATION_START_DATE", "f": "%m/%d/%Y"},
    "RCRA_ENFORCEMENTS": {"c":"ENFORCEMENT_TYPE", "d":"ENFORCEMENT_ACTION_DATE", "f": "%m/%d/%Y"},
    "ICIS-AIR_VIOLATION_HISTORY": {"c":"AGENCY_TYPE_DESC", "d":"HPV_DAYZERO_DATE", "f": "%m-%d-%Y"},
    "ICIS-AIR_FCES_PCES": {"c": "STATE_EPA_FLAG", "d":"ACTUAL_END_DATE", "f": "%m-%d-%Y"},
    "ICIS-AIR_FORMAL_ACTIONS":{"c": "STATE_EPA_FLAG", "d":"SETTLEMENT_ENTERED_DATE", "f": "%m/%d/%Y"},
    "NPDES_INSPECTIONS":{"c": "STATE_EPA_FLAG", "d":"ACTUAL_END_DATE", "f": "%m/%d/%Y"},
    "NPDES_FORMAL_ENFORCEMENT_ACTIONS":{"c": "AGENCY", "d":"SETTLEMENT_ENTERED_DATE", "f": "%m/%d/%Y"}
}

# Handle NPDES_QNCR_HISTORY because there are multiple counts we need to sum
if (program=="NPDES_QNCR_HISTORY"): 
    year=data["YEARQTR"].astype("str").str[0:4:1]
    data["YEARQTR"]=year
    d = data.groupby(pd.to_datetime(data['YEARQTR'], format="%Y").dt.to_period("Y")).sum()
    d.index = d.index.strftime('%Y')
    
    ax = d.plot(kind='bar', title = program, figsize=(20, 10), fontsize=16)
    ax

# All other columns
else: 
    this_columns = columns[program]["c"]
    this_columns_date = columns[program]["d"]
    this_columns_date_format = columns[program]["f"]
    
    try:
        d = data.groupby(pd.to_datetime(data[this_columns_date], format=this_columns_date_format))[[this_columns]].count()
        d = d.resample("Y").sum()    # add up the per-date counts within each year
        d.index = d.index.strftime('%Y')
        
        ax = d.plot(kind='bar', title = program, figsize=(20, 10), legend=False, fontsize=16)
        ax
        
    except AttributeError:
        print("There's no data to chart for "+program+" !")
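
The `"f"` entries in the `columns` lookup exist because the tables do not all write dates the same way: some use slashes and some use dashes. As a rough illustration of the per-year tally behind the chart, here is a plain-Python sketch (not the pandas code used above) with made-up dates. `count_by_year` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

def count_by_year(date_strings, date_format):
    # Parse each date with the table's own format and tally records per year
    years = Counter()
    for text in date_strings:
        years[datetime.strptime(text, date_format).year] += 1
    return dict(sorted(years.items()))

# Made-up dates, for illustration only
slash_dates = ['03/15/2017', '03/15/2017', '11/02/2018']
dash_dates = ['01-20-2018', '06-30-2019']

print(count_by_year(slash_dates, '%m/%d/%Y'))
print(count_by_year(dash_dates, '%m-%d-%Y'))
```

Note that two records sharing the same date both count toward that year's total.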

### If you'd like to look at another dimension of ECHO data for this zip code...
...return to the dropdown menu above and choose a different program/data type, run the database access code, and then chart it!

---

## Choose a specific facility
### In the next few blocks of code, you can focus in on just one of these facilities in the zip code.
Run the below cell, then choose a facility from the dropdown that appears.

In [None]:
cd_array = echo_data["FAC_NAME"].dropna().unique()
cd_array.sort()

fac_widget=widgets.Dropdown(
    options=cd_array,
    description='Facility Name:',
    disabled=False,
)
display(fac_widget)

### The next cell filters the program-specific IDs to get just records for the selected facility.

In [None]:
# Keep track of which data sets are retrieved.
have_rcra_insp = False
have_rcra_viol = False
have_rcra_enf = False
have_air_insp = False
have_air_viol = False
have_air_enf = False
have_air_formal = False
have_air_comp = False
have_water_history = False
have_water_insp = False
have_water_enf = False
    
my_fac = echo_data[echo_data["FAC_NAME"] == fac_widget.value]
if ( my_fac['RCRA_FLAG'].iloc[0] == 'Y' ):
    rcra_id_string = my_fac['RCRA_IDS'].tolist()
    
    try:
        sql = "select * from `RCRA_EVALUATIONS` where ID_NUMBER in ( '" + \
            "', '".join( rcra_id_string[0].split() ) + "')"
        rcra_insp_data = get_data( sql, "ID_NUMBER" )
        print( "Data from RCRA_EVALUATIONS stored in rcra_insp_data.")
        have_rcra_insp = True
    except pd.errors.EmptyDataError:
        print( "This is a RCRA-regulated facility, but there's no data for it in the RCRA_EVALUATIONS table.")
    
    try:
        sql = "select * from `RCRA_VIOLATIONS` where ID_NUMBER in ( '" + \
            "', '".join( rcra_id_string[0].split() ) + "')"
        rcra_viol_data = get_data( sql, "ID_NUMBER" )
        print( "Data from RCRA_VIOLATIONS stored in rcra_viol_data.")
        have_rcra_viol = True
    except pd.errors.EmptyDataError:
        print( "This is a RCRA-regulated facility, but there's no data for it in the RCRA_VIOLATIONS table.")
    
    try:
        sql = "select * from `RCRA_ENFORCEMENTS` where ID_NUMBER in ( '" + \
            "', '".join( rcra_id_string[0].split() ) + "')"
        rcra_enf_data = get_data( sql, "ID_NUMBER" )
        print( "Data from RCRA_ENFORCEMENTS stored in rcra_enf_data.")
        have_rcra_enf = True
        # print( rcra_enf_data )
    except pd.errors.EmptyDataError:
        print( "This is a RCRA-regulated facility, but there's no data for it in the RCRA_ENFORCEMENTS table.")
else:
    print ("This facility does not appear to be regulated under RCRA!")
    
if ( my_fac['AIR_FLAG'].iloc[0] == 'Y' ):
    air_id_string = my_fac['AIR_IDS'].tolist()
    
    try:
        sql = "select * from `ICIS_FEC_EPA_INSPECTIONS` where REGISTRY_ID = '" + \
            str(int( my_fac.index[0] )) + "'"
        # Index on REGISTRY_ID, the field this query filters on.
        air_insp_data = get_data( sql, "REGISTRY_ID" )
        print( "Data from ICIS_FEC_EPA_INSPECTIONS stored in air_insp_data.")
        have_air_insp = True
        # print( air_insp_data )
    except pd.errors.EmptyDataError:
        print( "This is a CAA-regulated facility, but there's no data for it in the ICIS_FEC_EPA_INSPECTIONS table.")
   
    try:
        sql = "select * from `ICIS-AIR_VIOLATION_HISTORY` where PGM_SYS_ID in ( '" + \
            "', '".join( air_id_string[0].split() ) + "')"
        air_viol_data = get_data( sql, "pgm_sys_id" )
        print( "Data from ICIS-AIR_VIOLATION_HISTORY stored in air_viol_data.")
        have_air_viol = True
        # print( air_viol_data )
    except pd.errors.EmptyDataError:
        print( "This is a CAA-regulated facility, but there's no data for it in the ICIS-AIR_VIOLATION_HISTORY table.")
    
    try:
        sql = "select * from `CASE_FACILITIES` CF, `CASE_ENFORCEMENTS` CE " + \
            " where CE.HQ_DIVISION = 'AIR' and CE.CASE_NUMBER = CF.CASE_NUMBER and " + \
            " CF.REGISTRY_ID = '" + str(int( my_fac.index[0] )) + "'"
        air_enf_data = get_data( sql, "REGISTRY_ID" )
        print( "Data from CASE_ENFORCEMENTS stored in air_enf_data.")
        have_air_enf = True
        # print( air_enf_data )
    except pd.errors.EmptyDataError:
        print( "This is a CAA-regulated facility, but there's no data for it in the CASE_ENFORCEMENTS table.")    

    try:
        sql = "select * from `ICIS-AIR_FORMAL_ACTIONS` where PGM_SYS_ID in ( '" + \
            "', '".join( air_id_string[0].split() ) + "')"
        air_formal_data = get_data( sql, "pgm_sys_id" )
        print( "Data from ICIS-AIR_FORMAL_ACTIONS stored in air_formal_data.")
        have_air_formal = True
        # print( air_formal_data )
    except pd.errors.EmptyDataError:
        print( "This is a CAA-regulated facility, but there's no data for it in the ICIS-AIR_FORMAL_ACTIONS table.")
    
    try:
        sql = "select * from `ICIS-AIR_FCES_PCES` where PGM_SYS_ID in ( '" + \
            "', '".join( air_id_string[0].split() ) + "')"
        air_comp_data = get_data( sql, "PGM_SYS_ID" )
        print( "Data from ICIS-AIR_FCES_PCES stored in air_comp_data.")
        have_air_comp = True
        # print( air_comp_data )
    except pd.errors.EmptyDataError:
        print( "This is a CAA-regulated facility, but there's no data for it in the ICIS-AIR_FCES_PCES table.")
else:
    print ("This facility does not appear to be regulated under the CAA!")
    
if ( my_fac['NPDES_FLAG'].iloc[0] == 'Y' ):
    water_id_string = my_fac['NPDES_IDS'].tolist()
    
    try:
        sql = "select * from `NPDES_QNCR_HISTORY` where NPDES_ID in ( '" + \
            "', '".join( water_id_string[0].split() ) + "')"
        water_history_data = get_data( sql, "NPDES_ID" )
        print( "Data from NPDES_QNCR_HISTORY stored in water_history_data.")
        have_water_history = True
        # print( water_history_data )
    except pd.errors.EmptyDataError:
        print( "This is a CWA-regulated facility, but there's no data for it in the NPDES_QNCR_HISTORY table.")
    
    try:
        sql = "select * from `NPDES_INSPECTIONS` where NPDES_ID in ( '" + \
            "', '".join( water_id_string[0].split() ) + "')"
        water_insp_data = get_data( sql, "NPDES_ID" )
        print( "Data from NPDES_INSPECTIONS stored in water_insp_data.")
        have_water_insp = True
        # print( water_insp_data )
    except pd.errors.EmptyDataError:
        print( "This is a CWA-regulated facility, but there's no data for it in the NPDES_INSPECTIONS table.")

    try:
        sql = "select * from `NPDES_FORMAL_ENFORCEMENT_ACTIONS` where NPDES_ID in ( '" + \
            "', '".join( water_id_string[0].split() ) + "')"
        water_enf_data = get_data( sql, "NPDES_ID" )
        print( "Data from NPDES_FORMAL_ENFORCEMENT_ACTIONS stored in water_enf_data.")
        have_water_enf = True
        # print( water_enf_data )
    except pd.errors.EmptyDataError:
        print( "This is a CWA-regulated facility, but there's no data for it in the NPDES_FORMAL_ENFORCEMENT_ACTIONS table.")
else:
    print ("This facility does not appear to be regulated under the CWA!")
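
The cell above repeats the same query-and-catch pattern for every table. A loop over table names is one way it could be condensed, sketched here under some assumptions: `fetch_tables` and `fake_fetch` are hypothetical names, `NoData` stands in for `pd.errors.EmptyDataError` so the sketch runs without pandas or a network connection, and the IDs are made up.

```python
class NoData(Exception):
    """Stand-in for pd.errors.EmptyDataError in this sketch."""

def fetch_tables(tables, id_field, id_string, fetch):
    results = {}
    # Turn "ID1 ID2" into "ID1', 'ID2" as the cell above does
    quoted = "', '".join(id_string.split())
    for table in tables:
        sql = "select * from `" + table + "` where " + id_field + " in ( '" + quoted + "')"
        try:
            results[table] = fetch(sql, id_field)
        except NoData:
            print("No data for this facility in the " + table + " table.")
    return results

def fake_fetch(sql, index_field):
    # Pretend that only the violations table has rows; in the notebook,
    # get_data would be passed in instead.
    if 'RCRA_VIOLATIONS' not in sql:
        raise NoData()
    return sql

found = fetch_tables(['RCRA_EVALUATIONS', 'RCRA_VIOLATIONS', 'RCRA_ENFORCEMENTS'],
                     'ID_NUMBER', 'NYD000000001 NYD000000002', fake_fetch)
print(list(found.keys()))
```

A table's presence in the returned dictionary could then replace the separate `have_...` flags.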
        

### Show the information from one of the data sets for this facility
Running this cell will create a dropdown list of the program data sets available specifically for this facility. Pick one and then run the second cell to view its contents.

In [None]:
data_set_dict = { "ECHO_EXPORTER" : my_fac }
if ( have_rcra_insp ):
    data_set_dict["RCRA Inspections"] = rcra_insp_data
if ( have_rcra_viol ):
    data_set_dict["RCRA Violations"] = rcra_viol_data
if ( have_rcra_enf ):
    data_set_dict["RCRA Enforcements"] = rcra_enf_data
if ( have_air_insp ):
    data_set_dict["Air Inspections"] = air_insp_data
if ( have_air_viol ):
    data_set_dict["Air Violations"] = air_viol_data
if ( have_air_enf ):
    data_set_dict["Air Enforcements"] = air_enf_data
if ( have_air_formal ):
    data_set_dict["Air Formal Actions"] = air_formal_data
if ( have_air_comp ):
    data_set_dict["Air Compliance"] = air_comp_data
if ( have_water_history ):
    data_set_dict["Water QNCR History"] = water_history_data
if ( have_water_insp ):
    data_set_dict["Water Inspections"] = water_insp_data
if ( have_water_enf ):
    data_set_dict["Water Enforcements"] = water_enf_data

data_set_widget=widgets.Dropdown(
    options=list(data_set_dict.keys()),
    description='Data sets:',
    disabled=False,
)
display(data_set_widget)

In [None]:
print( "Showing %s data for the chosen facility" %( data_set_widget.value ))
data_set_dict[data_set_widget.value]

### Let's chart it!

In [None]:
program = data_set_widget.value
this_data = data_set_dict[program]

columns = {
    "ECHO_EXPORTER": {"c": "FAC_INSPECTION_COUNT", "d": "FAC_DATE_LAST_INSPECTION", "f":"%m/%d/%Y"},
    "RCRA Violations": {"c": "VIOL_DETERMINED_BY_AGENCY", "d": "DATE_VIOLATION_DETERMINED", "f": "%m/%d/%Y"},
    "RCRA Inspections": {"c":"EVALUATION_AGENCY", "d":"EVALUATION_START_DATE", "f": "%m/%d/%Y"},
    "RCRA Enforcements": {"c":"ENFORCEMENT_TYPE", "d":"ENFORCEMENT_ACTION_DATE", "f": "%m/%d/%Y"},
    "Air Inspections": {"c":"AGENCY", "d":"ACTUAL_END_DATE", "f": "%m-%d-%Y"},
    "Air Violations": {"c":"AGENCY_TYPE_DESC", "d":"HPV_DAYZERO_DATE", "f": "%m-%d-%Y"},
    "Air Enforcements": {"c":"LEAD", "d":"FISCAL_YEAR", "f": "%Y"},
    "Air Compliance": {"c": "STATE_EPA_FLAG", "d":"ACTUAL_END_DATE", "f": "%m-%d-%Y"},
    "Air Formal Actions":{"c": "STATE_EPA_FLAG", "d":"SETTLEMENT_ENTERED_DATE", "f": "%m/%d/%Y"},
    "Water Inspections":{"c": "STATE_EPA_FLAG", "d":"ACTUAL_END_DATE", "f": "%m/%d/%Y"},
    "Water Enforcements":{"c": "AGENCY", "d":"SETTLEMENT_ENTERED_DATE", "f": "%m/%d/%Y"}
}

# Handle NPDES_QNCR_HISTORY because there are multiple counts we need to sum
if (program=="Water QNCR History"): 
    year=this_data["YEARQTR"].astype("str").str[0:4:1]
    this_data["YEARQTR"]=year
    this_data = this_data.groupby(pd.to_datetime(this_data['YEARQTR'], format="%Y").dt.to_period("Y")).sum()
    this_data.index = this_data.index.strftime('%Y')
    
    ax = this_data.plot(kind='bar', title = program, figsize=(20, 10), fontsize=16)
    ax

# All other columns
else: 
    this_columns = columns[program]["c"]
    this_columns_date = columns[program]["d"]
    this_columns_date_format = columns[program]["f"]
    
    try:
        this_data = this_data.groupby(pd.to_datetime(this_data[this_columns_date], format=this_columns_date_format))[[this_columns]].count()
        this_data = this_data.resample("Y").sum()    # add up the per-date counts within each year
        this_data.index = this_data.index.strftime('%Y')
        
        ax = this_data.plot(kind='bar', title = program, figsize=(20, 10), legend=False, fontsize=16)
        ax
        
    except AttributeError:
        print("There's no data to chart for "+program+" !")

### If you'd like to look at another dimension of ECHO data for this facility...
...return to the dropdown menu above and choose a different program/data type, then chart it!

## This section saves the facility data to your computer.
The first of the next two cells asks you to select one of the data sets to export from the dropdown list. The next cell actually exports that data to your computer. You can return to the dropdown to export additional data files.

_Note: When you click on [] in the second cell, it may continue to show \*. That's to be expected! Check your Downloads folder and confirm that the spreadsheet was successfully exported. Hit the square button (Interrupt Kernel) at the top of the page. You can now choose to export other data sets from the dropdown..._

In [None]:
data_download_widget=widgets.Dropdown(
    options=list(data_set_dict.keys()),
    description='Data Sets:',
    disabled=False,
)
display(data_download_widget)

In [None]:
filename = data_download_widget.value
fullpath = filename+'.csv'
data_set_dict[data_download_widget.value].to_csv( fullpath ) 

print( "Wrote "+filename+" to the Google Colab 'Files' menu as %s" %(fullpath))
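
The export cell uses the dropdown label as the file name, so a choice such as "Water QNCR History" produces a file name with spaces in it. If that is inconvenient, a small helper along these lines could tidy the name first. `csv_filename` is a hypothetical name, not part of the notebook.

```python
import re

def csv_filename(label):
    # Replace each run of characters other than letters, digits, '-' and '_'
    # with a single underscore, then add the .csv extension
    return re.sub(r'[^A-Za-z0-9_-]+', '_', label).strip('_') + '.csv'

print(csv_filename('Water QNCR History'))
```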

### Accessing your files
Click on the 'Files' tab in the menu on the left-hand side of the notebook (it looks like a folder). You may have to hit 'Refresh' if you don't see your file. Then, you can click on the ... next to your file and choose "Download". The CSV spreadsheet will download to wherever your browser usually saves files (e.g. your Downloads folder).