## Analyzing EIA Electricity Data

The U.S. Energy Information Administration (EIA) publishes Annual Electric Power Industry data spanning 1990 to 2016. The data can be found here:

https://www.eia.gov/electricity/data/eia861/

The Data Analyst should use at least the 2016 dataset to answer the questions. Creativity in how you explore, present, and analyze the data will be key. Providing additional analysis based on your background will be a huge plus.

Please provide all the Python code used to answer the questions.

<hr/>
## Dynamic Electricity Pricing

Some questions may require no data analysis (only research), while others may require both.

1) Which U.S. states have deregulated retail electricity markets?

2) Which U.S. states have the most power utilities utilizing dynamic pricing?

3) Which states have the most enrolled customers in dynamic pricing?

In [21]:
import pandas as pd

dynamic = pd.read_csv(r'C:\Users\Matt\Downloads\Dynamic_Pricing2016.csv', skiprows=2, thousands=',')

# 1) Which U.S. states have deregulated retail electricity markets?

# The csv files do not seem to contain an answer, so research is needed
# Sources give different answers because it depends on the definition of "deregulated"; no state is completely deregulated
# These two websites look like reliable sources of information, so let's compare the two

#extract embedded table for analysis
deregulated = pd.read_html("https://www.sparkenergy.com/en/energy-regulation-by-state/", header=0)[0]
deregulated2 = pd.read_html("https://infocastinc.com/insights/solar/regulated-deregulated-energy-markets/", header=0)[0]
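# keep only the rows marked as deregulated (the filter result must be assigned to take effect)
deregulated2 = deregulated2.loc[deregulated2['Electric Market'].str.contains('Deregulated')]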

# Find states that appear on both lists (these are the ones that should definitely be considered deregulated)
deregulated_states = pd.merge(deregulated, deregulated2, on=['State'], how='inner')[['State']]
deregulated_states.columns = ['States with a Deregulated Electricity Market']
deregulated_states

# A: Deregulated states are CA, CT, ME, MD, NH, IL, MA, MI, NJ, NY, OH, PA, RI, DE, OR, TX


# 2) Which U.S. states have the most power utilities utilizing dynamic pricing?

dynamic_states = dynamic['State'].value_counts().to_frame().head(10)
dynamic_states.columns = ['States with the most utilities utilizing dynamic pricing']

# A frame of the top 10 states with the most power utilities using dynamic pricing
dynamic_states

# A: Top states are WI, TN, CA, MS, MN, IN, AL, NC, TX, CO


# 3) Which states have the most enrolled customers in dynamic pricing?

dynamic_customers = dynamic.groupby(['State'])[['Total']].sum()
dynamic_customers = dynamic_customers.sort_values(by=['Total'], ascending=False).head(10)
dynamic_customers.columns = ['States with the most total customers utilizing dynamic pricing']

# A frame of the top 10 states with the most enrolled customers using dynamic pricing
dynamic_customers

# A: Top states are CA, MD, AZ, OH, OK, TX, DE, LA, IL, NY

# Creating a list to display all frames
final = [deregulated_states, dynamic_states, dynamic_customers]
final

[   States with a Deregulated Electricity Market
 0                                    California
 1                                   Connecticut
 2                                         Maine
 3                                      Maryland
 4                                 New Hampshire
 5                                      Illinois
 6                                 Massachusetts
 7                                      Michigan
 8                                    New Jersey
 9                                      New York
 10                                         Ohio
 11                                 Pennsylvania
 12                                 Rhode Island
 13                                     Delaware
 14                                       Oregon
 15                                        Texas,
     States with the most utilities utilizing dynamic pricing
 WI                                                 66       
 TN                                       
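
The inner merge used for question 1 keeps only the states that appear in both sources. To see where the two sources disagree, an outer merge with `indicator=True` is one option. This is a minimal sketch on made-up data: the frames `source_a` and `source_b` and their contents are hypothetical stand-ins for the two scraped tables, not the actual website data.

```python
import pandas as pd

# Hypothetical stand-ins for the two scraped tables (not real website data)
source_a = pd.DataFrame({"State": ["Texas", "Ohio", "Maine", "Georgia"]})
source_b = pd.DataFrame({"State": ["Texas", "Ohio", "Oregon"]})

# Outer merge keeps every state; the `_merge` column records which source(s) listed it
compared = pd.merge(source_a, source_b, on="State", how="outer", indicator=True)

# States listed by both sources (the same set an inner merge would return)
agreed = compared.loc[compared["_merge"] == "both", "State"].tolist()

# States listed by only one source, worth checking by hand
disputed = compared.loc[compared["_merge"] != "both", "State"].tolist()

print(sorted(agreed))
print(sorted(disputed))
```

States in the `disputed` list are the ones where the definition of "deregulated" probably differs between sources.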

<hr/>
## Metering

Some questions may require no data analysis (only research), while others may require both.

1) What policies are leading to more Net Metering customers in Southern California Edison in CA compared to Consolidated Edison in NY?

2) Which utility has the most Net Metering Capacity (MW) by technology?

In [19]:
# 1) What policies are leading to more Net Metering customers in Southern California Edison in CA 
#    compared to Consolidated Edison in NY?

# One policy that is likely having an effect on net metering customers in California is the Net Metering 2.0 initiative.
# This initiative was implemented in the summer of 2017 and is the follow-up to the original system.
# One major difference with this new system is that there is no cap on the amount of solar energy eligible for net metering.
# This is a major benefit for customers in CA as compared to NY where net metering will shut off once the cap has been reached.

# Another factor is that retail rate net metering has been preserved with Net Metering 2.0 in California.
# This is not the case in NY, where a compromise with utilities has led to a shift away from retail rate net metering.
# The practical difference here is that within a few years, NY residents will be paid less than the retail rate for their energy.

# Sources: https://www.energysage.com/net-metering/sce/
#          https://www.energysage.com/net-metering/con-edison/
#          https://news.energysage.com/net-metering-2-0-in-california-everything-you-need-to-know/
#          https://www.utilitydive.com/news/strange-bedfellows-how-solar-and-utilities-struck-a-net-metering-compromis/419367/


# 2) Which utility has the most Net Metering Capacity (MW) by technology?

netmeter = pd.read_csv(r'C:\Users\Matt\Downloads\Net_Metering_2016.csv', skiprows=3, thousands=',')
netmeter

# top capacity for solar
pv_capacity = netmeter.groupby(['Utility Name'])[['Total']].sum().reset_index()
pv_capacity = pv_capacity.sort_values(by=['Total'], ascending=False).head(1)
pv_capacity['Technology'] = 'Solar'
pv_capacity.columns = ['Utility Name','Total capacity','Technology']
pv_capacity = pv_capacity[['Utility Name','Technology','Total capacity']]
# A: Pacific Gas & Electric, 2359 MW

# top capacity for wind
wind_capacity = netmeter.groupby(['Utility Name'])[['Total.12']].sum().reset_index()
wind_capacity = wind_capacity.sort_values(by=['Total.12'], ascending=False).head(1)
wind_capacity['Technology'] = 'Wind'
wind_capacity.columns = ['Utility Name','Total capacity','Technology']
wind_capacity = wind_capacity[['Utility Name','Technology','Total capacity']]
# A: NSTAR Electric, 6357 MW

# top capacity for other technologies
other_capacity = netmeter.groupby(['Utility Name'])[['Total.13']].sum().reset_index()
other_capacity = other_capacity.sort_values(by=['Total.13'], ascending=False).head(1)
other_capacity['Technology'] = 'Other'
other_capacity.columns = ['Utility Name','Total capacity','Technology']
other_capacity = other_capacity[['Utility Name','Technology','Total capacity']]
# A: Pacific Gas & Electric, 2471 MW

final = pd.concat([pv_capacity, wind_capacity, other_capacity], ignore_index=True)

final

| | Utility Name | Technology | Total capacity |
|---|---|---|---|
| 0 | Pacific Gas & Electric Co | Solar | 2359.382 |
| 1 | NSTAR Electric Company | Wind | 6357.438 |
| 2 | Pacific Gas & Electric Co | Other | 2470.532 |
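
The three near-identical blocks in the cell above (group by utility, sum, take the largest) could be condensed into a single helper. This is a minimal sketch on toy data: the utility names, the column names `pv_mw` and `wind_mw`, and the helper `top_utility_by_column` are all hypothetical and do not reflect the real EIA column layout.

```python
import pandas as pd

# Toy data; utility names and column names are invented for illustration
toy = pd.DataFrame({
    "Utility Name": ["Utility A", "Utility A", "Utility B", "Utility C"],
    "pv_mw": [10.0, 5.0, 12.0, 1.0],
    "wind_mw": [0.5, 0.0, 0.2, 3.0],
})

def top_utility_by_column(df, column, technology):
    """Return one row (as a dict) for the utility with the largest summed capacity."""
    totals = df.groupby("Utility Name")[column].sum()
    return {
        "Utility Name": totals.idxmax(),
        "Technology": technology,
        "Total capacity": totals.max(),
    }

rows = [
    top_utility_by_column(toy, "pv_mw", "Solar"),
    top_utility_by_column(toy, "wind_mw", "Wind"),
]
summary = pd.DataFrame(rows)
print(summary)
```

Adding another technology then only requires one more call with the matching column name.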


<hr/>
## Electricity Sales

Some questions may require no data analysis (only research), while others may require both.

1) Which Independent System Operator (ISO) region (also known as a Balancing Authority) has the largest retail electricity sales?

In [20]:
# 1) Which Independent System Operator (ISO) region (also known as a Balancing Authority)
# has the largest retail electricity sales?

ba = pd.read_csv(r'C:\Users\Matt\Downloads\Balancing_Authority_2016.csv')
short_form = pd.read_excel(r'C:\Users\Matt\Downloads\Short_Form_2016.xlsx', thousands=',')

#merging to get full BA name instead of just code
retail_sales = pd.merge(short_form, ba, left_on='BA_CODE', right_on='BA Code', how='left')
retail_sales = retail_sales.groupby(['Balancing Authority Name'])[['Total Sales (MWh)']].sum().reset_index()
retail_sales = retail_sales.sort_values(by=['Total Sales (MWh)'], ascending=False).head(1)
retail_sales

# A: Midcontinent Independent Transmission System Operator, 127,593,908 MWh


| | Balancing Authority Name | Total Sales (MWh) |
|---|---|---|
| 16 | Midcontinent Independent Transmission System O... | 127593908.0 |
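
A left merge against a lookup table can silently duplicate sales rows if a code appears more than once in the lookup, which would inflate the summed totals. Whether that applies to the real balancing authority file is not checked here. The sketch below, on invented codes, names, and sales figures, shows how the `validate` argument of `pd.merge` could guard against it.

```python
import pandas as pd

# Toy data; codes, names, and sales figures are invented for illustration
sales = pd.DataFrame({
    "BA_CODE": ["AAA", "AAA", "BBB", "CCC"],
    "Total Sales (MWh)": [100.0, 250.0, 300.0, 50.0],
})
authorities = pd.DataFrame({
    "BA Code": ["AAA", "BBB", "CCC"],
    "Balancing Authority Name": ["Authority A", "Authority B", "Authority C"],
})

# validate="many_to_one" raises a MergeError if a BA code repeats in the lookup,
# instead of quietly duplicating sales rows
merged = pd.merge(
    sales, authorities,
    left_on="BA_CODE", right_on="BA Code",
    how="left", validate="many_to_one",
)

# Sum sales per balancing authority and keep the largest
totals = merged.groupby("Balancing Authority Name")["Total Sales (MWh)"].sum()
top = totals.nlargest(1)
print(top)
```

If the lookup does contain repeated codes, dropping duplicates on the code column before merging is one way to make the check pass.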


## Python Question 
Implement the function filter_by_class: It takes a feature matrix, X, an array of classes, y, and a class label, label. It should return all of the rows from X whose label is the given label.

In [200]:
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
y = np.array(["a", "c", "a", "b"])

def filter_by_class(X, y, label):
    # Boolean mask selects the rows of X whose class in y equals label
    return X[y == label]

print(filter_by_class(X, y, "a"))

print(filter_by_class(X, y, "b"))


[[1 2 3]
 [7 8 9]]
[[10 11 12]]
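
One case the examples above do not cover is a label that never occurs in `y`. The sketch below restates the function with a boolean mask so it runs on its own, and uses a made-up label `"z"` to show that the result is an empty array that still has three columns.

```python
import numpy as np

def filter_by_class(X, y, label):
    # Boolean mask: True where the class equals the requested label
    return X[y == label]

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
y = np.array(["a", "c", "a", "b"])

# "z" is a hypothetical label that does not appear in y
missing = filter_by_class(X, y, "z")
print(missing.shape)
```

Because the column count is preserved, downstream code that stacks or concatenates the filtered rows does not need a special case for absent labels.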
