# Interactive Plotly Map: Environmental Burden (CES) and Power Plants 

### Author: Rachel Lu

### Description
This notebook creates an interactive Plotly map that:
1) has hover tooltips showing each plant's CES percentile, fuel type, jobs, and project location;
2) uses a choropleth base map shaded by CES percentile;
3) colors markers by whether the plant is Clean Energy or Fossil Fuel.

Because the Plotly maps take up so much space, they had to be split out into a separate notebook.

# Import Libraries

In [None]:
%matplotlib notebook
# the usuals
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import contextily as ctx
import plotly.express as px


# Import Data

Import California Power Plant dataset. 

In [None]:
cpp = gpd.read_file('California_Power_Plants_MP Cleaned 3.1.21.csv')

# Data Exploration

Let's explore our data. 

In [None]:
type(cpp)

In [None]:
cpp.shape

Excellent, we have 871 power plants in our data set! 

Next, let's take a look at the data in our data set. 

In [None]:
cpp.head(5)

Looks like a few columns don't have much information in them; we'll drop them later. The geometry column is also empty, so we'll build point geometries from the latitude and longitude columns when we clean the data below.

In [None]:
cpp.info()

We notice that all the data types are objects. We'll have to convert some data types to floats or ints in order to work with them for our visualizations. 

# Clean Data

Let's rename some of our columns so they're easier to work with and more intuitive. 

First, we print a list of all the columns in the CPP dataset. 

In [None]:
list(cpp)

Next, we assign new, more intuitive names to all of the columns.

In [None]:
#rename
cpp.columns = ['Plant_ID',
 'Name',
 'MW',
 'Gross_MWh',
 'Net_MWh',
 'Fuel_Type',
 'Status',
 'Online_Year',
 'REAT_ID',
 'County',
 'State',
 'Energy_Type',
 'Jobs',
 'Senate_District',
 'Assembly_District',
 'Congressional_District',
 'CES30_PercentileRange',
 'CES30_Percentile',
 'Lon',
 'Lat',
 'Operation_Job',
 'Capacity_Factor',
 'Income_Percent',
 'Project_Location',
 'geometry']

#print to double check it worked. 
cpp.head(5)

We rename the columns and print out the first 5 rows in the dataframe to double check that it worked. It worked. 
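Assigning a full list to `cpp.columns` works only if the list has exactly as many names as the file has columns, and it silently mislabels columns if the CSV's column order ever changes. A minimal sketch of the alternative, `DataFrame.rename` with a mapping, on a toy frame (the "original" column names below are made up, since the raw CSV headers aren't shown here):

```python
import pandas as pd

# Toy frame standing in for the raw CSV; these original column names are hypothetical.
raw = pd.DataFrame({'Plant ID': ['1'], 'Plant Name': ['Example Plant'], 'MW': ['10']})

# Positional assignment needs exactly one new name per existing column.
positional_failed = False
tmp = raw.copy()
try:
    tmp.columns = ['Plant_ID', 'Name']  # one name short
except ValueError as err:
    positional_failed = True
    print('positional rename failed:', err)

# A mapping renames by old name, so column order and count don't matter,
# and any column not listed keeps its name.
renamed = raw.rename(columns={'Plant ID': 'Plant_ID', 'Plant Name': 'Name'})
print(list(renamed.columns))  # ['Plant_ID', 'Name', 'MW']
```

By default `rename` ignores old names that don't exist, so a typo in a key does nothing; passing `errors='raise'` turns that into an error.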

Let's drop some unnecessary columns from our dataset. 

In [None]:
# define variable with desired columns 
desired_columns = [
 'Name',
 'Fuel_Type',
 'Status',
 'County',  
 'State',
 'Energy_Type',
 'Jobs',
 'CES30_PercentileRange',
 'CES30_Percentile',
 'Lon',
 'Lat',
 'Income_Percent',
 'Project_Location',
 'geometry']

# redefine our dataframe with just our desired columns.

cpp_trim = cpp[desired_columns].copy()

# check out the new dataframe! 
cpp_trim.head(5)

We created a new dataframe called cpp_trim with only the desired columns, and the preview confirms it contains just the columns we want!
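The `.copy()` on the column subset matters because the next cells modify `cpp_trim` in place; without it, pandas may emit a `SettingWithCopyWarning` because it can't tell whether the subset is a view of `cpp`. A small sketch of the pattern on toy data (column values are made up):

```python
import pandas as pd

full = pd.DataFrame({'Name': ['A', 'B'], 'Jobs': ['5', '7'], 'MW': ['1', '2']})

# Selecting columns and calling .copy() gives an independent frame,
# so later edits to the subset never touch (or warn about) the original.
trim = full[['Name', 'Jobs']].copy()
trim['Jobs'] = trim['Jobs'].astype(float)

print(trim['Jobs'].tolist())  # [5.0, 7.0]
print(full['Jobs'].tolist())  # ['5', '7'] (unchanged)
```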

Next, recall that almost all of our columns are object (string) types. Let's convert some fields to floats so we can use them in our visualizations and data analysis.

In [None]:
# convert the coordinates and jobs to floats

cpp_trim.Lon = cpp_trim.Lon.astype('float')
cpp_trim.Lat = cpp_trim.Lat.astype('float')
cpp_trim.Jobs = cpp_trim.Jobs.astype('float')
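`astype('float')` works only if every Lon, Lat, and Jobs value is a parseable number. If some plants had a blank Jobs field (not checked here), the cast would raise `ValueError`. A more forgiving sketch uses `pd.to_numeric` with `errors='coerce'`, which turns unparseable entries into NaN; it's shown on a toy Series rather than the real data:

```python
import pandas as pd

jobs = pd.Series(['12', '', '3.5', None])

# astype(float) stops at the first string it can't parse (here the empty string).
try:
    jobs.astype('float')
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' turns unparseable values into NaN instead.
jobs_numeric = pd.to_numeric(jobs, errors='coerce')
print(jobs_numeric.tolist())  # [12.0, nan, 3.5, nan]
```

The trade-off is that coercion hides bad values, so it's worth counting the NaNs (`jobs_numeric.isna().sum()`) after converting.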

I also noticed that all the values in the Energy_Type column are 0s and 1s. To make them more intuitive, we replace them with descriptive labels.

In [None]:
# Rename values in Energy_Type, such that 0 = Fossil Fuel and 1 = Clean Energy. To do so, we create a dictionary. 

cpp_trim = cpp_trim.replace({'Energy_Type': {'0': 'Fossil Fuel', '1': 'Clean Energy'}})

#check
cpp_trim.head(3)


It worked! 
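Note that the replacement keys are the strings `'0'` and `'1'`, because every column was read in as text; if Energy_Type had been converted to integers first, these keys would silently match nothing. A toy sketch comparing the nested-dict `replace` used above with `Series.map`, which makes unexpected codes visible:

```python
import pandas as pd

plants = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Energy_Type': ['0', '1', '1']})
labels = {'0': 'Fossil Fuel', '1': 'Clean Energy'}

# Nested dict {column: {old_value: new_value}} limits the replacement to Energy_Type.
replaced = plants.replace({'Energy_Type': labels})

# Series.map does the same for one column, but any code missing from the dict
# becomes NaN instead of passing through unchanged.
mapped = plants['Energy_Type'].map(labels)

print(replaced['Energy_Type'].tolist())  # ['Fossil Fuel', 'Clean Energy', 'Clean Energy']
```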

Now, let's turn the latitude and longitude columns into point geometries.

In [None]:
cpp_trim = gpd.GeoDataFrame(cpp_trim, 
                         crs='EPSG:4326',
                         geometry=gpd.points_from_xy(cpp_trim.Lon, cpp_trim.Lat))

cpp_trim.head(5)

It worked, our geometry data column is now filled. 
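`points_from_xy` takes x first, so the call above passes longitude, then latitude; a swapped pair would land far outside California. One way to sanity-check the coordinates is a rough bounding-box test. This is a minimal sketch: the box values are approximate, and the helper `outside_california` is hypothetical. Flagged rows aren't necessarily wrong (the dataset has a State column, so some plants may lie outside California); they're just worth a look.

```python
import pandas as pd

# Approximate California extent in degrees (WGS84); rough values for a sanity check only.
CA_LON = (-124.5, -114.0)
CA_LAT = (32.5, 42.1)

def outside_california(df, lon_col='Lon', lat_col='Lat'):
    """Return rows whose coordinates fall outside a rough California bounding box."""
    in_lon = df[lon_col].between(*CA_LON)
    in_lat = df[lat_col].between(*CA_LAT)
    return df[~(in_lon & in_lat)]

toy = pd.DataFrame({'Name': ['ok', 'swapped'], 'Lon': [-120.0, 37.0], 'Lat': [37.0, -120.0]})
print(outside_california(toy)['Name'].tolist())  # ['swapped']
```

On the real data this would be `outside_california(cpp_trim)`; rows with missing coordinates are flagged too, since `between` returns False for NaN.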

In [None]:
# check crs type

cpp_trim.crs

# CalEnviroScreen

Our research question centers on whether fossil fuel and clean energy jobs are located in disadvantaged communities. To answer it, we overlay the power plant data with CalEnviroScreen (CES) data.

In [None]:
gdf_ces = gpd.read_file('../CES3June2018Update.shp')

# Explore Data

I begin by exploring the data. Let's make sure that it's a geodataframe, check out the CRS type, and see what the data looks like. 

In [None]:
type(gdf_ces)

In [None]:
gdf_ces.crs

In [None]:
gdf_ces.head()

This confirms that gdf_ces is a GeoDataFrame with CRS EPSG:3310 (NAD83 / California Albers), and the first 5 rows show a few things that could be cleaned up.



# Data Cleaning


In [None]:
#define variable with desired columns 
columns_to_keep = ['tract', 'pop2010', 'California', 'ZIP', 'City', 'Longitude', 'Latitude', 'CIscore', 'CIscoreP', 'edu', 'eduP', 'pov', 'povP', 'unemp', 'unempP', 'Pop_11_64_', 'Elderly_ov', 'Hispanic_p', 'White_pct', 'African_Am', 'Native_Ame', 'Asian_Amer', 'Other_pct', 'geometry']

#redefine dataframe with desired columns 
gdf_ces = gdf_ces[columns_to_keep]

# check to make sure 

gdf_ces.head()


## Sorting and mapping CES scores

To continue exploring the data, I'm going to sort it. I want to see which census tracts have the highest CES score (that is, are most burdened by and vulnerable to environmental pollution), so I'll create a new dataframe sorted by CIscore.

In [None]:
# to sort the data by CES score
gdf_sortbyces = gdf_ces.sort_values(by='CIscore', ascending = False)

# check my work 
gdf_sortbyces.head()

In [None]:
# display just the columns I want (this view is not saved to a variable)

gdf_sortbyces[['California','City','CIscore','CIscoreP', 'geometry']]
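If you only need the top few tracts, `nlargest` returns the same rows as `sort_values(ascending=False).head(n)` without sorting the whole frame; the notebook keeps the full sorted frame because it's reused below. A toy sketch (the tract IDs and scores are made up):

```python
import pandas as pd

tracts = pd.DataFrame({
    'tract': [101, 102, 103, 104],
    'CIscore': [45.2, 80.1, 12.7, 66.0],
})

# Sorting the whole frame descending and taking the head...
top_sorted = tracts.sort_values(by='CIscore', ascending=False).head(2)

# ...gives the same rows as nlargest.
top_n = tracts.nlargest(2, 'CIscore')

print(top_n['tract'].tolist())  # [102, 104]
```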

# Overlay Powerplants and CalEnviroScreen

When we explored our CES data above, we discovered that its CRS is EPSG:3310, which differs from the EPSG:4326 CRS of the CPP dataset. Let's make sure they're in the same CRS.

In [None]:
# reproject gdf_sortbyces to WGS84 longitude/latitude (EPSG:4326), which Plotly's map traces expect

gdf_sortbyces = gdf_sortbyces.to_crs(epsg=4326)

# cpp_trim is already EPSG:4326, so this is a no-op kept for safety
cpp_trim = cpp_trim.to_crs(epsg=4326)

# check work

print(gdf_sortbyces.crs)

In [None]:
print(cpp_trim.crs)

Next, let's create an interactive Plotly map with hover tooltips. In class we learned how to make interactive Plotly scatter maps; here we add a Plotly choropleth map as a layer beneath the scatter map.

A Plotly choropleth map needs its polygons as GeoJSON, so we convert our data to GeoJSON below.

In [None]:
# convert the CES shapefile geometries to a GeoJSON-style dict for Plotly
import numpy as np
import pyproj
from shapely.geometry import LineString, MultiLineString

# Plotly's map traces expect longitude/latitude, i.e. EPSG:4326
gdf_sortbyces_plotly = gdf_sortbyces.to_crs(pyproj.CRS.from_epsg(4326))


# adapted from empet's code to convert .shp to GeoJSON
def shapefile_to_geojson(gdf, index_list, tolerance=0.025):
    # gdf - geopandas dataframe containing the geometry column and values to be mapped to a colorscale
    # index_list - a sublist of list(gdf.index), or gdf.index for all data
    # tolerance - float parameter to set the Polygon/MultiPolygon degree of simplification
    # returns a geojson type dict
    # note: every boundary ring becomes its own outer ring, so interior holes are drawn filled
    geojson = {'type': 'FeatureCollection', 'features': []}
    for index in index_list:
        geo = gdf.loc[index, 'geometry'].simplify(tolerance)

        if isinstance(geo.boundary, LineString):
            # a polygon without holes has a single boundary ring
            gtype = 'Polygon'
            bcoords = np.dstack(geo.boundary.coords.xy).tolist()

        elif isinstance(geo.boundary, MultiLineString):
            # several rings: emit each one as a separate polygon
            gtype = 'MultiPolygon'
            bcoords = []
            for b in geo.boundary.geoms:  # .geoms is required in Shapely 2.x
                x, y = b.coords.xy
                coords = np.dstack((x, y)).tolist()
                bcoords.append(coords)
        else:
            # skip empty/unsupported geometries instead of reusing the previous feature's coordinates
            continue

        feature = {'type': 'Feature',
                   'id': index,
                   'properties': {'name': 'test'},
                   'geometry': {'type': gtype,
                                'coordinates': bcoords},
                   }

        geojson['features'].append(feature)
    return geojson


geojson = shapefile_to_geojson(gdf_sortbyces_plotly, list(gdf_sortbyces_plotly.index), 0.0001)


Great, we successfully converted our gdf_sortbyces data to GeoJSON and stored the resulting dict in a new variable, geojson.
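For reference, the dict returned by `shapefile_to_geojson` follows standard GeoJSON nesting: a Polygon's coordinates are a list of rings, and a MultiPolygon's are a list of such polygons. Plotly matches each value in `locations` against `feature['id']` by default, which is why the feature ids are the dataframe's index. A hand-built sketch with made-up squares (the helper names are hypothetical):

```python
import json

def polygon_feature(feature_id, exterior):
    """GeoJSON Polygon feature: coordinates is a list of rings (exterior first)."""
    return {'type': 'Feature', 'id': feature_id, 'properties': {},
            'geometry': {'type': 'Polygon', 'coordinates': [exterior]}}

def multipolygon_feature(feature_id, exteriors):
    """GeoJSON MultiPolygon feature: a list of polygons, each a list of rings."""
    return {'type': 'Feature', 'id': feature_id, 'properties': {},
            'geometry': {'type': 'MultiPolygon',
                         'coordinates': [[ring] for ring in exteriors]}}

# Rings are closed: the first [lon, lat] pair is repeated at the end.
square = [[-120.0, 37.0], [-119.9, 37.0], [-119.9, 37.1], [-120.0, 37.1], [-120.0, 37.0]]
shifted = [[x + 0.5, y] for x, y in square]

collection = {'type': 'FeatureCollection',
              'features': [polygon_feature(0, square),
                           multipolygon_feature(1, [square, shifted])]}

# These ids must line up with the values passed as `locations` in the choropleth call.
ids = [f['id'] for f in collection['features']]
print(ids)  # [0, 1]
```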

Next up, I want to create a map that 1) has hover functionality that identifies each plant location's CES percentile, fuel type, jobs, and project location; 2) uses a choropleth base map based on CES percentile; 3) has different marker colors based on whether the plant is Clean Energy or Fossil Fuel. 

I start by identifying the unique values in Energy_Type, since I will need to create a discrete color map to assign colors to each unique value in Energy_Type. 

In [None]:
cpp_trim['Energy_Type'].unique()

We see that Energy_Type has two string values: Clean Energy and Fossil Fuel.
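In Plotly Express, `color_discrete_map` overrides colors only for the values it lists; any other value falls back to `color_discrete_sequence`. A quick check that every category has an explicit color catches leftovers such as an unreplaced `'1'`. A minimal sketch; the helper `unmapped_categories` is hypothetical:

```python
import pandas as pd

color_map = {'Fossil Fuel': 'red', 'Clean Energy': 'blue'}

def unmapped_categories(values, color_map):
    """Return category values that have no explicit color in color_map."""
    return set(pd.Series(values).dropna().unique()) - set(color_map)

print(unmapped_categories(['Fossil Fuel', 'Clean Energy', 'Clean Energy'], color_map))  # set()
print(unmapped_categories(['Fossil Fuel', '1'], color_map))  # {'1'}
```

On the real data this would be `unmapped_categories(cpp_trim['Energy_Type'], color_map)`.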

Next, let's create the map using plotly! 

In [None]:
#plot choropleth map as the base map, pulling in the geojson data we defined above 
fig = px.choropleth_mapbox(gdf_sortbyces_plotly,
                          geojson=geojson,
                          locations=gdf_sortbyces_plotly.index,
                          color_continuous_scale="plasma",
                          mapbox_style="carto-positron",
                          zoom=4, center = {"lat": 37, "lon": -120},
                          opacity=0.4,
                          color="CIscoreP")


# map the california power plant points
# color code points based on fossil fuel vs clean energy classification
# and add hover functionality
fig2 = px.scatter_mapbox(cpp_trim, 
                        lat="Lat", 
                        lon="Lon",
                        color='Energy_Type',
                        color_discrete_sequence=px.colors.qualitative.Alphabet,
                        color_discrete_map={"Fossil Fuel": 'red', "Clean Energy": 'blue'},
                        hover_name='Name',
                        hover_data=['Fuel_Type','Jobs','CES30_Percentile','Project_Location'],
                        zoom = 4
                       )
fig.update_layout(
    # title the map
    title="Power Plant and CES Percentile by Census Tract",
    # format the Energy_Type legend to be horizontal and in the top right corner. 
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.02,
        xanchor="right",
        x=1),
    font=dict(
        family="Courier New, monospace",
        size=18,
        color="RebeccaPurple"))
        

fig.add_trace(fig2.data[1])
fig.add_trace(fig2.data[0])
fig.show()

In [None]:
fig.write_html("V2PowerPlant_Location_by_CIscoreP.html")

Phew! This map was a beast to create, but after a lot of troubleshooting with Stack Overflow, it works as expected and looks great. Next, let's create the same map with marker sizes scaled by the number of jobs at each plant.

In [None]:
#plot choropleth map as the base map, pulling in the geojson data we defined above 
fig = px.choropleth_mapbox(gdf_sortbyces_plotly,
                          geojson=geojson,
                          locations=gdf_sortbyces_plotly.index,
                          color_continuous_scale="plasma",
                          mapbox_style="carto-positron",
                          zoom=4, center = {"lat": 37, "lon": -120},
                          opacity=0.4,
                          color="CIscoreP")


# map the california power plant points
# color code points based on fossil fuel vs clean energy classification
# and add hover functionality
fig2 = px.scatter_mapbox(cpp_trim, 
                        lat="Lat", 
                        lon="Lon",
                        color='Energy_Type',
                        size = 'Jobs',
                        color_discrete_sequence=px.colors.qualitative.Alphabet,
                        color_discrete_map={"Fossil Fuel": 'red', "Clean Energy": 'blue'},
                        hover_name='Name',
                        hover_data=['Fuel_Type','Jobs','CES30_Percentile','Project_Location'],
                        zoom = 4
                       )
fig.update_layout(
    # title the map
    title="Power Plant and CES Percentile by Census Tract\n # Jobs indicated by Marker Size",
    # format the Renewable Energy legend to be horizontal and in the top right corner. 
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.02,
        xanchor="right",
        x=1),
    font=dict(
        family="Courier New, monospace",
        size=18,
        color="RebeccaPurple"))
        

fig.add_trace(fig2.data[1])
fig.add_trace(fig2.data[0])
fig.show()

In [None]:
fig.write_html("Jobs_PowerPlant_Location_by_CIscoreP.html")

We created interactive Plotly maps of power plant locations, jobs, and CES percentile (CIscoreP). For this notebook, we'll stop here, since the Plotly figures already take up so much space.