# Fossil Fuel Jobs Visualizations

### Author: Rachel Lu

### Description:


# Import Libraries

In [None]:
%matplotlib notebook
# core data, geospatial, basemap, and plotting libraries
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import contextily as ctx
import plotly.express as px


# Import Data

Import the California Power Plants dataset.

In [None]:
cpp = gpd.read_file('California_Power_Plants_MP Cleaned 3.1.21.csv')

# Data Exploration

Let's explore our data. 

In [None]:
type(cpp)

In [None]:
cpp.shape

In [None]:
cpp.head(5)

In [None]:
cpp.info()

# Clean Data

Let's rename some of our columns so they're easier to work with and more intuitive. 

In [None]:
list(cpp)

In [None]:
cpp.columns = ['Plant_ID',
 'Name',
 'MW',
 'Gross_MWh',
 'Net_MWh',
 'Fuel_Type',
 'Status',
 'Online_Year',
 'REAT_ID',
 'County',
 'State',
 'Renewable_Energy',
 'Jobs',
 'Senate_District',
 'Assembly_District',
 'Congressional_District',
 'CES30_PercentileRange',
 'CES30_Percentile',
 'Lon',
 'Lat',
 'Operation_Job',
 'Capacity_Factor',
 'Income_Percent',
 'Project_Location',
 'geometry']
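Assigning a full list to cpp.columns only works if the CSV keeps exactly the same number and order of columns. Here's a minimal alternative sketch that renames by name with DataFrame.rename instead, so a reordered or extra column can't silently shift the labels. The original header names below are hypothetical stand-ins, since the raw CSV headers aren't shown here.

```python
import pandas as pd

# Hypothetical raw headers standing in for the CSV's original column names
raw = pd.DataFrame({
    "Plant ID": ["1"],
    "Plant Name": ["Example Plant"],
    "Capacity (MW)": ["50"],
})

# Map old names to new names explicitly; columns not in the map are left untouched
rename_map = {
    "Plant ID": "Plant_ID",
    "Plant Name": "Name",
    "Capacity (MW)": "MW",
}

# Fail loudly if an expected source column is missing from the file
missing = set(rename_map) - set(raw.columns)
assert not missing, f"missing columns: {missing}"

renamed = raw.rename(columns=rename_map)
print(list(renamed.columns))  # ['Plant_ID', 'Name', 'MW']
```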

In [None]:
cpp.head(5)

Let's drop some unnecessary columns from our dataset. 

In [None]:
# define variable with desired columns 
desired_columns = [
 'Name',
 'Fuel_Type',
 'Status',
 'County',
 'State',
 'Renewable_Energy',
 'Jobs',
 'CES30_PercentileRange',
 'CES30_Percentile',
 'Lon',
 'Lat',
 'Income_Percent',
 'Project_Location',
 'geometry']

# redefine our dataframe with just our desired columns.

cpp_trim = cpp[desired_columns].copy()

# check the new dataframe: show any rows with an empty CES30_Percentile value
cpp_trim[cpp_trim.CES30_Percentile == '']

I created a new dataframe called cpp_trim with only the desired columns. Checking the output confirms that it contains only the columns we want!
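The .copy() matters because later cells assign to cpp_trim's columns. A small sketch on toy data (not the power plant file) shows that changing a copied subset leaves the source frame untouched; without .copy(), pandas may raise a SettingWithCopyWarning for the same pattern.

```python
import pandas as pd

# Toy frame with string values, like a CSV read as text
source = pd.DataFrame({"Name": ["A", "B"], "Jobs": ["10", "20"], "MW": ["5", "7"]})

# Explicit copy of the selected columns: an independent DataFrame
subset = source[["Name", "Jobs"]].copy()
subset["Jobs"] = subset["Jobs"].astype(float)

print(subset["Jobs"].tolist())  # [10.0, 20.0]
print(source["Jobs"].tolist())  # ['10', '20'] (source is unchanged)
```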

Next, recall that almost all of our columns are object (string) types. Let's convert the numeric fields to floats so we can use them in our visualizations and analysis.

In [None]:
# convert the coordinates and job counts to floats

cpp_trim.Lon = cpp_trim.Lon.astype('float')
cpp_trim.Lat = cpp_trim.Lat.astype('float')
cpp_trim.Jobs = cpp_trim.Jobs.astype('float')
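gpd.read_file reads every CSV field as text, and astype('float') raises a ValueError on any blank string (the earlier check looks for exactly that kind of blank in CES30_Percentile). If a numeric column might contain blanks, pd.to_numeric with errors='coerce' is a more forgiving option that turns unparseable entries into NaN. A sketch with made-up values:

```python
import pandas as pd

jobs = pd.Series(["12", "", "3.5", "n/a"])  # made-up values, including a blank

# astype('float') stops at the first string it can't parse
try:
    jobs.astype("float")
    failed = False
except ValueError as err:
    failed = True
    print("astype failed:", err)

# to_numeric converts what it can and marks the rest as NaN
jobs_num = pd.to_numeric(jobs, errors="coerce")
print(jobs_num.tolist())          # [12.0, nan, 3.5, nan]
print(int(jobs_num.isna().sum())) # 2
```

Coercing hides bad values rather than fixing them, so it's worth counting the NaNs afterwards, as the last line does.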

In [None]:
# Recode values in Renewable_Energy so that '0' = Fossil Fuel and '1' = Clean Energy, using a nested dictionary ({column: {old: new}}).

cpp_trim = cpp_trim.replace({'Renewable_Energy': {'0': 'Fossil Fuel', '1': 'Clean Energy'}})

#check
cpp_trim
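DataFrame.replace with a nested dictionary only touches the column named in the outer key. The codes are the strings '0' and '1' because the CSV was read as text, so integer keys would not match. A sketch with toy rows:

```python
import pandas as pd

plants = pd.DataFrame({
    "Renewable_Energy": ["0", "1", "1"],
    "Status": ["0", "1", "0"],  # same codes in another column, which should be left alone
})

recoded = plants.replace({"Renewable_Energy": {"0": "Fossil Fuel", "1": "Clean Energy"}})
print(recoded["Renewable_Energy"].tolist())  # ['Fossil Fuel', 'Clean Energy', 'Clean Energy']
print(recoded["Status"].tolist())            # ['0', '1', '0']

# Integer keys don't match string codes, so nothing changes
unchanged = plants.replace({"Renewable_Energy": {0: "Fossil Fuel", 1: "Clean Energy"}})
print(unchanged["Renewable_Energy"].tolist())  # ['0', '1', '1']
```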


Now, let's turn our latitude/longitude values into point geometries.

In [None]:
cpp_trim = gpd.GeoDataFrame(cpp_trim, 
                         crs='EPSG:4326',
                         geometry=gpd.points_from_xy(cpp_trim.Lon, cpp_trim.Lat))

cpp_trim.head(5)
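points_from_xy takes x first and y second, so longitude goes in x and latitude in y. That is also the order GeoJSON uses. Here's a stdlib-only sketch of the GeoJSON Point a single plant would become; the helper name lonlat_to_point is hypothetical, and the range check is just a cheap guard against swapped columns.

```python
import json

def lonlat_to_point(lon: float, lat: float) -> dict:
    """Build a GeoJSON Point; coordinates are [x, y] = [longitude, latitude]."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("longitude/latitude out of range; were they swapped?")
    return {"type": "Point", "coordinates": [lon, lat]}

# Illustrative coordinates near Sacramento, not a real plant record
point = lonlat_to_point(-121.49, 38.58)
print(json.dumps(point))  # {"type": "Point", "coordinates": [-121.49, 38.58]}

# For California, swapping the values pushes "latitude" out of range
try:
    lonlat_to_point(38.58, -121.49)
    swapped_rejected = False
except ValueError:
    swapped_rejected = True
print(swapped_rejected)  # True
```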

In [None]:
# collect the unique project locations (a NumPy array) to use as drop-down options in the interactive widget later on

unique_project_locations = cpp_trim['Project_Location'].unique()

In [None]:
# check crs type

cpp_trim.crs

# CalEnviroScreen

Our research question centers on whether fossil fuel and clean energy jobs are located in disadvantaged communities. To answer it, we overlay the power plant data with CalEnviroScreen (CES) 3.0 data (June 2018 update), which scores each census tract by pollution burden and population characteristics.

In [None]:
gdf_ces = gpd.read_file('../CES3June2018Update.shp')

# Explore Data

I begin by exploring the data: confirming that it's a GeoDataFrame, checking its CRS, and previewing the first few rows.

In [None]:
type(gdf_ces)

In [None]:
gdf_ces.crs

In [None]:
gdf_ces.head()

This confirms that gdf_ces is a GeoDataFrame in EPSG:3310 (NAD83 / California Albers). The first 5 rows show a few things that could be cleaned up, starting with many more columns than we need.



# Data Cleaning


In [None]:
#define variable with desired columns 
columns_to_keep = ['tract', 'pop2010', 'California', 'ZIP', 'City', 'Longitude', 'Latitude', 'CIscore', 'CIscoreP', 'edu', 'eduP', 'pov', 'povP', 'unemp', 'unempP', 'Pop_11_64_', 'Elderly_ov', 'Hispanic_p', 'White_pct', 'African_Am', 'Native_Ame', 'Asian_Amer', 'Other_pct', 'geometry']

# redefine the dataframe with only the desired columns
gdf_ces = gdf_ces[columns_to_keep]

# check the result

gdf_ces.head()


## Sorting and mapping CES scores

To continue exploring the data, I'm going to sort it. I want to see which census tracts have the highest CES scores (that is, which are most burdened by and vulnerable to environmental pollution), so I'll sort them into a new dataframe.

In [None]:
# sort the census tracts by CES score, highest first
gdf_sortbyces = gdf_ces.sort_values(by='CIscore', ascending = False)

# check my work 
gdf_sortbyces.head()

In [None]:
# preview a few key columns (this only displays a subset; it does not create a new dataframe)

gdf_sortbyces[['California','City','CIscore','CIscoreP', 'geometry']]
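To look at only the highest-scoring tracts, DataFrame.nlargest is a compact alternative to sorting and then calling head. A sketch with made-up tract scores (the real table has many more columns):

```python
import pandas as pd

tracts = pd.DataFrame({
    "tract": [101, 102, 103, 104],
    "City": ["A", "B", "C", "D"],
    "CIscore": [35.2, 71.8, 52.4, 64.0],  # made-up scores
})

# Two highest CES scores, with just the columns of interest
top = tracts.nlargest(2, "CIscore")[["tract", "City", "CIscore"]]
print(top["tract"].tolist())  # [102, 104]

# Same rows as sorting descending and taking the first two
same = tracts.sort_values("CIscore", ascending=False).head(2)
assert top["tract"].tolist() == same["tract"].tolist()
```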

Now, let's map it out!



In [None]:
# map the sorted dataframe as a geopandas choropleth (the NaturalBreaks scheme requires the mapclassify package)
gdf_sortbyces.plot(figsize=(10,10),
                   column='CIscore',
                   legend=True,
                   scheme='NaturalBreaks')

Success! Here's a map of California's census tracts colored by CES score. I used the natural breaks scheme, which groups tracts with similar CES scores into the same class and places class boundaries at larger gaps in the data, so the classes are not equal-width.

The natural breaks map suggests a break in the data around a CES score of 52.
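To illustrate what natural-breaks classification optimizes, here's a stdlib-only sketch of the classic Fisher-Jenks idea: choose class boundaries over the sorted values so that the total within-class sum of squared deviations is as small as possible. This is a teaching sketch on toy numbers, not the routine geopandas calls. scheme='NaturalBreaks' delegates to mapclassify, whose NaturalBreaks classifier uses its own iterative search, so its breaks can differ from this exact optimum.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks(values, k):
    """Optimal 1-D split of values into k classes minimizing total within-class SSE.

    Returns the upper bound of each class. This O(k * n^2) dynamic program is fine
    for small illustrative inputs but too slow for very large n.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    inf = float("inf")
    # cost[j][i]: best total SSE for the first i values split into j classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            # the last class covers data[m:i]
            for m in range(j - 1, i):
                c = cost[j - 1][m] + sse(data[m:i])
                if c < cost[j][i]:
                    cost[j][i] = c
                    split[j][i] = m
    # walk back through the split points to recover each class's upper bound
    bounds = []
    i = n
    for j in range(k, 0, -1):
        bounds.append(data[i - 1])
        i = split[j][i]
    return bounds[::-1]


scores = [1, 2, 3, 10, 11, 12, 30, 31]  # toy scores with obvious gaps
print(natural_breaks(scores, 3))  # [3, 12, 31]
```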

# Overlay Fossil Fuel Jobs and CalEnviroScreen

When we explored our CES data above, we found that its CRS is EPSG:3310, which differs from the EPSG:4326 CRS of the CPP dataset. Let's put both layers in the same CRS.

In [None]:
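# reproject gdf_sortbyces from EPSG:3310 (California Albers) to EPSG:4326 (WGS 84 longitude/latitude, not Web Mercator)

gdf_sortbyces = gdf_sortbyces.to_crs(epsg=4326)

# cpp_trim was created in EPSG:4326 already, so this is a harmless safeguard
cpp_trim = cpp_trim.to_crs(epsg=4326)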

# check work

print(gdf_sortbyces.crs)

In [None]:
print(cpp_trim.crs)
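EPSG:4326 stores longitude and latitude in degrees, while Web Mercator (EPSG:3857) and California Albers (EPSG:3310) use meters. A stdlib sketch of the spherical Web Mercator forward formula (sphere radius 6,378,137 m) makes the difference concrete; for real reprojection, to_crs (backed by pyproj) is the right tool.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def lonlat_to_web_mercator(lon, lat):
    """Project WGS 84 lon/lat degrees to Web Mercator x/y meters (spherical formula)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# 180 degrees of longitude on the equator is about 20 million meters
print([round(v) for v in lonlat_to_web_mercator(180, 0)])  # [20037508, 0]
```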

In [None]:
# set up the plot canvas with plt.subplots
fig, ax = plt.subplots(figsize=(10, 10))

# census tracts
gdf_sortbyces.plot(ax=ax, # this puts it in the ax plot
        color='gray', 
        edgecolor='white',
        alpha=0.5)

# cpp_trim
cpp_trim.plot(ax=ax, # this also puts it in the same ax plot
            color='red',
            markersize=3,
            alpha=0.2)
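The overlay above is visual. Because each plant row already carries Jobs and CES30_PercentileRange, one hedged way to put numbers behind it is a grouped total of jobs by energy type and CES range. A sketch on made-up rows; the range labels are placeholders, not the file's actual category strings.

```python
import pandas as pd

plants = pd.DataFrame({
    "Renewable_Energy": ["Fossil Fuel", "Fossil Fuel", "Clean Energy", "Clean Energy", "Fossil Fuel"],
    "CES30_PercentileRange": ["91-100", "1-10", "91-100", "41-50", "91-100"],  # placeholder labels
    "Jobs": [120.0, 30.0, 15.0, 40.0, 60.0],  # made-up job counts
})

# Total jobs per (energy type, CES range); missing combinations become 0
jobs_by_group = (
    plants.groupby(["Renewable_Energy", "CES30_PercentileRange"])["Jobs"]
    .sum()
    .unstack(fill_value=0)
)
print(jobs_by_group)
```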

In [None]:
cpp_trim['Renewable_Energy'].unique()

In [None]:
# convert the CES census tracts to GeoJSON so plotly can draw them as an interactive choropleth with hover information

from ipywidgets import interact, interact_manual
from shapely.geometry import LineString, MultiLineString
import numpy as np
import pyproj

#gdf_sortbyces_plotly = gdf_sortbyces.to_crs(epsg=3857)
gdf_sortbyces_plotly = gdf_sortbyces.to_crs(pyproj.CRS.from_epsg(4326))


# convert shapefile geometries to a GeoJSON dict, adapted from empet's code
def shapefile_to_geojson(gdf, index_list, tolerance=0.025):
    # gdf - geopandas dataframe containing the geometry column and values to be mapped to a colorscale
    # index_list - a sublist of list(gdf.index), or gdf.index for all data
    # tolerance - float controlling how much each Polygon/MultiPolygon is simplified
    # returns a GeoJSON-style dict (a FeatureCollection)
    geojson = {'type': 'FeatureCollection', 'features': []}
    for index in index_list:
        geo = gdf['geometry'][index].simplify(tolerance)

        if isinstance(geo.boundary, LineString):
            # a single ring: a simple Polygon
            gtype = 'Polygon'
            bcoords = np.dstack(geo.boundary.coords.xy).tolist()

        elif isinstance(geo.boundary, MultiLineString):
            # several rings (a MultiPolygon, or a Polygon with holes, which this
            # approach draws as separate filled rings); .geoms is required in Shapely 2.x
            gtype = 'MultiPolygon'
            bcoords = []
            for b in geo.boundary.geoms:
                x, y = b.coords.xy
                coords = np.dstack((x, y)).tolist()
                bcoords.append(coords)
        else:
            # any other boundary type: skip it rather than reuse the previous feature's coordinates
            continue

        feature = {'type': 'Feature',
                   'id': index,  # plotly matches `locations` values to this id
                   'properties': {'name': 'test'},
                   'geometry': {'type': gtype,
                                'coordinates': bcoords},
                   }

        geojson['features'].append(feature)
    return geojson

geojson = shapefile_to_geojson(gdf_sortbyces_plotly, list(gdf_sortbyces_plotly.index), 0.0001)
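plotly's choropleth_mapbox matches each value in locations to a feature's id in the GeoJSON, which is why shapefile_to_geojson uses the dataframe index as each feature's id. One caveat of the boundary-based conversion: a polygon with holes has a MultiLineString boundary, so each ring ends up drawn as its own filled polygon. In GeoJSON, holes belong inside the same Polygon's coordinates, after the outer ring, and shapely.geometry.mapping(geom) produces that structure directly. Here's a stdlib-only sketch of the target structure, with made-up coordinates and a hypothetical helper name:

```python
import json


def make_feature_collection(polygons):
    """Build a GeoJSON FeatureCollection from {feature_id: [outer_ring, *hole_rings]}.

    Each ring is a list of [lon, lat] pairs whose first and last points are equal.
    """
    features = []
    for feature_id, rings in polygons.items():
        features.append({
            "type": "Feature",
            "id": feature_id,  # plotly matches `locations` values against this id
            "properties": {},
            "geometry": {"type": "Polygon", "coordinates": rings},
        })
    return {"type": "FeatureCollection", "features": features}


# Outer ring counterclockwise, hole clockwise (the RFC 7946 right-hand rule)
outer = [[0.0, 0.0], [4.0, 0.0], [4.0, 4.0], [0.0, 4.0], [0.0, 0.0]]
hole = [[1.0, 1.0], [1.0, 2.0], [2.0, 2.0], [2.0, 1.0], [1.0, 1.0]]

demo_geojson = make_feature_collection({7: [outer, hole]})
text = json.dumps(demo_geojson)  # confirms the dict is JSON-serializable
print(len(demo_geojson["features"][0]["geometry"]["coordinates"]))  # 2 (outer ring + hole)
```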


In [None]:
fig = px.choropleth_mapbox(gdf_sortbyces_plotly,
                          geojson=geojson,
                          locations=gdf_sortbyces_plotly.index,
                          color_continuous_scale="plasma",
                          mapbox_style="carto-positron",
                            zoom=3, center = {"lat": 37.0902, "lon": -95.7129},
                          opacity=0.5,
                          color="CIscoreP")



fig2 = px.scatter_mapbox(cpp_trim, 
                        lat="Lat", 
                        lon="Lon",
                        color='Renewable_Energy',
                        color_discrete_sequence=px.colors.qualitative.Alphabet,
                        color_discrete_map={"Fossil Fuel": 'red', "Clean Energy": 'blue'},
                        hover_name='Name',
                        hover_data=['Fuel_Type','County','Jobs','CES30_Percentile','Project_Location'],
                        size='Jobs',
                        zoom = 6
                       )
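fig.update_layout(
    title="Power Plant Jobs and CES Percentile by Census Tract",
    legend_title="Energy Type",  # the legend lists the Fossil Fuel / Clean Energy plant traces
    coloraxis_colorbar_title="CES Percentile",  # the colorbar shows CIscoreP
    font=dict(
        family="Courier New, monospace",
        size=18,
        color="RebeccaPurple"))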
        
#fig2.update_layout(legend=dict(yanchor="top", xanchor="left"))
fig.add_trace(fig2.data[1])
fig.add_trace(fig2.data[0])
fig.show()

In [None]:
# interactive map: pick an energy type and a project location from drop-down menus (ipywidgets)
# and plot the matching power plants over the CES census tracts with matplotlib

from ipywidgets import interact, interact_manual

@interact
def test(Renewable=['Clean Energy', 'Fossil Fuel'], Location = unique_project_locations):

    # set up the plot canvas with plt.subplots
    fig, ax = plt.subplots(figsize=(10, 10)) 
    
    gdf_sortbyces.plot(ax=ax, # this puts it in the ax plot
            column = 'CIscore', 
            edgecolor='grey',
            linewidth = 0.4,
            legend=True,
            scheme='naturalbreaks',
            alpha=0.7)
     

    display_data = cpp_trim[cpp_trim.Renewable_Energy==Renewable]
    display_data = display_data[display_data.Project_Location==Location]
    display_data.plot(ax=ax, # this also puts it in the same ax plot
            cmap = 'hot',
            column='Renewable_Energy',
            legend=True,
            markersize=0.5,
            alpha=0.5)

    plt.show()
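The widget filters in two steps; the same selection can also be written as one combined boolean mask with &, with each comparison in parentheses. A sketch with toy rows and a hypothetical helper filter_plants (the real Project_Location values come from the data):

```python
import pandas as pd


def filter_plants(df, energy_type, location):
    """Return rows matching both the energy type and the project location."""
    mask = (df["Renewable_Energy"] == energy_type) & (df["Project_Location"] == location)
    return df[mask]


plants = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4"],
    "Renewable_Energy": ["Fossil Fuel", "Clean Energy", "Fossil Fuel", "Fossil Fuel"],
    "Project_Location": ["Loc A", "Loc A", "Loc B", "Loc A"],  # placeholder labels
})

selected = filter_plants(plants, "Fossil Fuel", "Loc A")
print(selected["Name"].tolist())  # ['P1', 'P4']
```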




In [None]:
# hexbin needs separate x and y columns, so extract them from the point geometries
cpp_trim['x'] = cpp_trim.geometry.x
cpp_trim['y'] = cpp_trim.geometry.y

In [None]:
# Set up figure and axis
f, ax = plt.subplots(figsize=(10,7))

# Generate and add a hexbin with 50 hexagons across the x-direction,
# half transparency, omitting cells with no power plants,
# and the reversed viridis colormap
hb = ax.hexbin(
    x = cpp_trim['x'], 
    y = cpp_trim['y'],
    gridsize=50, 
    linewidths=1,
    alpha=0.5, 
    mincnt=1, # don't show zero
    cmap='viridis_r')

# title
#new_title = 'Powerplant Counts per Census Tract'
#ax.legend([new_title])

# Add basemap
ctx.add_basemap(
    ax, 
    crs='epsg:4326',
    source=ctx.providers.CartoDB.Positron
)

# Add colorbar
plt.colorbar(hb)

# Remove axes
ax.axis('off')
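With mincnt=1, hexagons that contain no plants aren't drawn, so the colorbar only spans occupied cells, and the counts of the drawn hexagons add up to the number of plotted points. Here's a sketch with made-up coordinates that checks this, using the non-interactive Agg backend so it runs outside the notebook:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for running outside a notebook
import matplotlib.pyplot as plt

# Made-up lon/lat pairs: a tight cluster plus a few outliers
lons = [-118.25, -118.26, -118.24, -118.25, -121.49, -122.42, -117.16]
lats = [34.05, 34.06, 34.04, 34.05, 38.58, 37.77, 32.72]

fig, ax = plt.subplots()
hb = ax.hexbin(lons, lats, gridsize=10, mincnt=1, cmap="viridis_r")

counts = hb.get_array()  # one count per drawn (non-empty) hexagon
print(len(counts), counts.sum())
plt.close(fig)
```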

In [None]:
gdf_ff = cpp_trim[cpp_trim.Renewable_Energy=='Fossil Fuel']
gdf_ff

In [None]:
gdf_ce = cpp_trim[cpp_trim.Renewable_Energy=='Clean Energy']
gdf_ce

In [None]:
# Set up figure and axis
f, ax = plt.subplots(figsize=(10,7))

# Generate and add a hexbin of clean energy plants with 50 hexagons across
# the x-direction, half transparency, omitting cells with no power plants,
# and the reversed viridis colormap
hb = ax.hexbin(
    x = gdf_ce['x'], 
    y = gdf_ce['y'],
    gridsize=50, 
    linewidths=1,
    alpha=0.5, 
    mincnt=1, # don't show zero
    cmap='viridis_r')

# title
#new_title = 'Powerplant Counts per Census Tract'
#ax.legend([new_title])

# Add basemap
ctx.add_basemap(
    ax, 
    crs='epsg:4326',
    source=ctx.providers.CartoDB.Positron
)

# Add colorbar
plt.colorbar(hb)

# Remove axes
ax.axis('off')


In [None]:
# Set up figure and axis
f, ax = plt.subplots(figsize=(10,7))

# Generate and add a hexbin of fossil fuel plants with 50 hexagons across
# the x-direction, half transparency, omitting cells with no power plants,
# and the reversed viridis colormap
hb = ax.hexbin(
    x = gdf_ff['x'], 
    y = gdf_ff['y'],
    gridsize=50, 
    linewidths=1,
    alpha=0.5, 
    mincnt=1, # don't show zero
    cmap='viridis_r')

# title
#new_title = 'Powerplant Counts per Census Tract'
#ax.legend([new_title])

# Add basemap
ctx.add_basemap(
    ax, 
    crs='epsg:4326',
    source=ctx.providers.CartoDB.Positron
)

# Add colorbar
plt.colorbar(hb)

# Remove axes
ax.axis('off')