<a href="https://colab.research.google.com/github/ellenwang995/final_project/blob/main/PythonFinalAssignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Importing Data:**

[Data sourced from EIA:](https://www.eia.gov/electricity/data.php)
The data include monthly electricity prices by state (in cents/kWh) from January 2010 to July 2025.


In [28]:
#importing data
import pandas as pd

price_df = pd.read_excel("/content/MonthlyPrice_State.xlsx")
price_df.head()


|   | Year | Month | Date | State | Price |
|---|---|---|---|---|---|
| 0 | 2025.0 | 7.0 | Jul 2025 | Alaska | 27.3 |
| 1 | 2025.0 | 7.0 | Jul 2025 | Alabama | 15.88 |
| 2 | 2025.0 | 7.0 | Jul 2025 | Arkansas | 13.23 |
| 3 | 2025.0 | 7.0 | Jul 2025 | Arizona | 15.38 |
| 4 | 2025.0 | 7.0 | Jul 2025 | California | 32.58 |


**Exploring and Cleaning the Data:**

In [29]:
price_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9538 entries, 0 to 9537
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Year    9537 non-null   float64
 1   Month   9537 non-null   float64
 2   Date    9537 non-null   object 
 3   State   9537 non-null   object 
 4   Price   9537 non-null   float64
dtypes: float64(3), object(2)
memory usage: 372.7+ KB
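The `info()` output reports 9538 rows but only 9537 non-null values in each column, which suggests one row of the spreadsheet is entirely empty (an assumption; the row itself is not shown here). Those missing values are also the likely reason `Year` and `Month` were read as `float64`. A minimal sketch of one way to drop such a row and restore integer types, using a hypothetical helper `drop_empty_rows` on a small made-up frame:

```python
import numpy as np
import pandas as pd

# Small made-up frame mimicking the EIA layout, with one fully empty row.
# (The real file's empty row is assumed, not confirmed, to look like this.)
raw = pd.DataFrame({
    "Year": [2025.0, 2025.0, np.nan],
    "Month": [7.0, 7.0, np.nan],
    "Date": ["Jul 2025", "Jul 2025", np.nan],
    "State": ["Alaska", "Alabama", np.nan],
    "Price": [27.3, 15.88, np.nan],
})

def drop_empty_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows where every column is missing,
    then cast Year and Month back to integers."""
    cleaned = df.dropna(how="all").copy()
    cleaned["Year"] = cleaned["Year"].astype(int)
    cleaned["Month"] = cleaned["Month"].astype(int)
    return cleaned

clean = drop_empty_rows(raw)
print(clean.dtypes)
```

On the notebook's data this would be `price_df = drop_empty_rows(price_df)`; inspecting the empty row first (for example with `price_df[price_df.isna().all(axis=1)]`) is a reasonable precaution.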


In [30]:
# convert "Date" strings such as "Jul 2025" to a datetime column
# '%b %Y' = abbreviated month name + four-digit year (AI was used to match this format to the raw data)
price_df['Date_dt'] = pd.to_datetime(price_df['Date'], format='%b %Y')
price_df['Date_dt']

|   | Date_dt |
|---|---|
| 0 | 2025-07-01 |
| 1 | 2025-07-01 |
| 2 | 2025-07-01 |
| 3 | 2025-07-01 |
| 4 | 2025-07-01 |
| ... | ... |
| 9533 | 2010-01-01 |
| 9534 | 2010-01-01 |
| 9535 | 2010-01-01 |
| 9536 | 2010-01-01 |
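`'%b %Y'` is the `strftime`-style pattern for an abbreviated month name followed by a four-digit year, so a string like `'Jul 2025'` parses to the first day of that month. A quick sketch to confirm this on a few sample strings; it also shows the optional `errors='coerce'` argument (not used in the cell above), which turns an unparseable value into `NaT` instead of raising. The `'not a date'` string is made up.

```python
import pandas as pd

samples = pd.Series(["Jul 2025", "Dec 2021", "Jan 2010", "not a date"])

# '%b' = abbreviated month name, '%Y' = four-digit year; the day defaults to 1.
# errors="coerce" maps strings that do not match the format to NaT.
parsed = pd.to_datetime(samples, format="%b %Y", errors="coerce")
print(parsed)
```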


In [31]:
# check for exact duplicate rows; keep=False shows every copy, sorted so copies sit together
dupes = price_df[price_df.duplicated(keep=False)].sort_values(by=['Date_dt', 'State'])
dupes

|   | Year | Month | Date | State | Price | Date_dt |
|---|---|---|---|---|---|---|
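`duplicated()` with no arguments only flags rows that match on every column, so two rows for the same state and month that carry different prices would not be caught. A sketch of a stricter check on the (`State`, `Date_dt`) pair, using made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Texas", "Texas", "Ohio"],
    "Date_dt": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-01"]),
    "Price": [14.1, 14.5, 15.0],  # same Texas month, conflicting prices (made up)
})

# keep=False marks every member of a duplicated group, not just the repeats.
full_row_dupes = df[df.duplicated(keep=False)]
key_dupes = df[df.duplicated(subset=["State", "Date_dt"], keep=False)]

print(len(full_row_dupes), len(key_dupes))  # 0 2
```

The full-row check finds nothing here because the prices differ, while the key-based check flags both Texas rows.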


**Mutating the Existing Data Frame:**

In addition to looking at changes in electricity prices and price volatility over the full time span, it is also important to assess volatility within a single year. This is why we also create separate data frames containing only the electricity prices for 2021 and for 2024.

We are specifically interested in looking at price volatility between states during 2021 because of the global energy crisis that emerged that year and was intensified by Russia's invasion of Ukraine in February 2022. Although the crisis had a larger impact on European electricity prices, there were ripple effects in the US market.
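One way to put a number on within-year volatility, so that states can be compared with each other, is the coefficient of variation (standard deviation divided by mean) of each state's monthly prices. This metric is our choice for illustration, not something the EIA data prescribes; `within_year_cv` is a hypothetical helper and the prices below are made up.

```python
import pandas as pd

# Made-up monthly prices (cents/kWh) for two states over four months.
df = pd.DataFrame({
    "State": ["A"] * 4 + ["B"] * 4,
    "Date_dt": pd.to_datetime(
        ["2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"] * 2
    ),
    "Price": [10.0, 10.0, 10.0, 10.0, 8.0, 12.0, 8.0, 12.0],
})

def within_year_cv(df: pd.DataFrame, year: int) -> pd.Series:
    """Coefficient of variation (std / mean) of monthly Price per State
    for one calendar year, most volatile state first."""
    in_year = df[df["Date_dt"].dt.year == year]
    stats = in_year.groupby("State")["Price"].agg(["mean", "std"])
    return (stats["std"] / stats["mean"]).sort_values(ascending=False)

print(within_year_cv(df, 2021))
```

With the notebook's data, `within_year_cv(price_df, 2021)` and `within_year_cv(price_df, 2024)` would give one value per state for each year. Note that pandas' `std` uses the sample (n - 1) denominator by default, and dividing by the mean keeps high-price states such as California from looking volatile simply because their prices are larger.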

In [53]:
#creating a new data frame price_2021 with all the entries from price_df that have the year 2021
price_2021 = price_df[price_df['Date_dt'].dt.year == 2021]
price_2021

|   | Year | Month | Date | State | Price | Date_dt |
|---|---|---|---|---|---|---|
| 2193 | 2021.0 | 12.0 | Dec 2021 | Alaska | 22.14 | 2021-12-01 |
| 2194 | 2021.0 | 12.0 | Dec 2021 | Alabama | 9.66 | 2021-12-01 |
| 2195 | 2021.0 | 12.0 | Dec 2021 | Arkansas | 10.99 | 2021-12-01 |
| 2196 | 2021.0 | 12.0 | Dec 2021 | Arizona | 12.62 | 2021-12-01 |
| 2197 | 2021.0 | 12.0 | Dec 2021 | California | 23.38 | 2021-12-01 |
| ... | ... | ... | ... | ... | ... | ... |
| 2800 | 2021.0 | 1.0 | Jan 2021 | Vermont | 18.39 | 2021-01-01 |
| 2801 | 2021.0 | 1.0 | Jan 2021 | Washington | 9.77 | 2021-01-01 |
| 2802 | 2021.0 | 1.0 | Jan 2021 | Wisconsin | 14.03 | 2021-01-01 |
| 2803 | 2021.0 | 1.0 | Jan 2021 | West Virginia | 11.19 | 2021-01-01 |


In [54]:
#creating another new data frame price_2024 with all the entries from price_df that have the year 2024
price_2024 = price_df[price_df['Date_dt'].dt.year == 2024]
price_2024

|   | Year | Month | Date | State | Price | Date_dt |
|---|---|---|---|---|---|---|
| 357 | 2024.0 | 12.0 | Dec 2024 | Alaska | 22.38 | 2024-12-01 |
| 358 | 2024.0 | 12.0 | Dec 2024 | Alabama | 14.91 | 2024-12-01 |
| 359 | 2024.0 | 12.0 | Dec 2024 | Arkansas | 11.74 | 2024-12-01 |
| 360 | 2024.0 | 12.0 | Dec 2024 | Arizona | 15.20 | 2024-12-01 |
| 361 | 2024.0 | 12.0 | Dec 2024 | California | 30.55 | 2024-12-01 |
| ... | ... | ... | ... | ... | ... | ... |
| 964 | 2024.0 | 1.0 | Jan 2024 | Vermont | 21.14 | 2024-01-01 |
| 965 | 2024.0 | 1.0 | Jan 2024 | Washington | 11.07 | 2024-01-01 |
| 966 | 2024.0 | 1.0 | Jan 2024 | Wisconsin | 16.54 | 2024-01-01 |
| 967 | 2024.0 | 1.0 | Jan 2024 | West Virginia | 13.65 | 2024-01-01 |
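The two subsetting cells above differ only in the year. A small hypothetical helper, `subset_years`, could build any number of such subsets in one step and return them in a dict keyed by year. Calling `.copy()` gives independent frames, so later column assignments on a subset do not trigger pandas' `SettingWithCopyWarning`. The rows below are made up.

```python
import pandas as pd

def subset_years(df: pd.DataFrame, years) -> dict:
    """Hypothetical helper: return {year: copy of the rows of df whose
    Date_dt falls in that calendar year}."""
    return {year: df[df["Date_dt"].dt.year == year].copy() for year in years}

# Made-up rows standing in for price_df.
df = pd.DataFrame({
    "State": ["Alaska", "Alaska", "Alaska"],
    "Date_dt": pd.to_datetime(["2021-12-01", "2024-12-01", "2025-07-01"]),
    "Price": [22.14, 22.38, 27.3],
})

by_year = subset_years(df, [2021, 2024])
print({year: len(frame) for year, frame in by_year.items()})
```

Applied to the notebook, `by_year = subset_years(price_df, [2021, 2024])` followed by `price_2021, price_2024 = by_year[2021], by_year[2024]` would reproduce the two frames.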


**Data Visualizations:**

Here, we create line graphs of electricity prices over the full period from 2010 to 2025 (pricefig1), and within 2024 (pricefig2) and 2021 (pricefig3).

In [21]:
# !pip -q install plotly
import plotly.express as px

In [68]:
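def price_fig(df, states, title):
    """Return a Plotly line chart of monthly Price by State.

    states: a list of state names, or the string 'all' to plot every state.
    """
    if states == 'all':
        filtered_df = df
    else:
        filtered_df = df[df['State'].isin(states)]

    # sort by date so each state's line is drawn from left to right
    filtered_df = filtered_df.sort_values('Date_dt')

    fig = px.line(filtered_df, x='Date_dt', y='Price', color='State',
                  title=title,
                  labels={'Date_dt': 'Date', 'Price': 'Price (cents/kWh)'})
    return fig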

# Select the states you are interested in. For an initial inquiry, we take three of the largest state economies in the US: California, Texas, and New York.
states = ["California", "Texas", "New York"]

# Figure 1 shows the change in electricity prices from 2010 to 2025 for California, Texas, and New York.
pricefig1 = price_fig(price_df, states, 'Change in Electricity Price (2010-2025)')
pricefig1.show()

# Figure 2 shows the change in electricity prices during 2024 for California, Texas, and New York.
pricefig2 = price_fig(price_2024, states, 'Change in Electricity Price (2024)')
pricefig2.show()

# Figure 3 shows the change in electricity prices during 2021 for California, Texas, and New York.
pricefig3 = price_fig(price_2021, states, 'Change in Electricity Price (2021)')
pricefig3.show()


# It is interesting to note that, despite Texas having its own grid, its electricity prices have remained the least volatile among the three states, even in times of market crisis.
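The observation about Texas is read off the charts. One way to check it numerically is to compare, per state, the standard deviation of month-over-month percentage changes across the full period. This metric and the hypothetical helper `mom_volatility` are illustrative choices, and the prices below are made up, so this is a sketch rather than a result. Seasonal swings also feed into this measure, so it mixes regular summer/winter patterns with crisis-driven spikes.

```python
import pandas as pd

# Made-up prices (cents/kWh) for two states over four months.
df = pd.DataFrame({
    "State": ["Steady"] * 4 + ["Swingy"] * 4,
    "Date_dt": pd.to_datetime(
        ["2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01"] * 2
    ),
    "Price": [10.0, 10.0, 10.0, 10.0, 10.0, 12.0, 9.0, 11.0],
})

def mom_volatility(df: pd.DataFrame, states) -> pd.Series:
    """Std of month-over-month % change in Price per State, lowest first."""
    subset = df[df["State"].isin(states)].sort_values(["State", "Date_dt"])
    # Shift within each state so a change never spans two different states.
    prev = subset.groupby("State")["Price"].shift(1)
    pct_change = (subset["Price"] / prev - 1) * 100
    # std skips the NaN at the start of each state's series.
    return pct_change.groupby(subset["State"]).std().sort_values()

print(mom_volatility(df, ["Steady", "Swingy"]))
```

On the notebook's data, `mom_volatility(price_df, ["California", "Texas", "New York"])` would list the three states from least to most volatile under this definition.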