# **Streaming Weather Analysis: Ethan Lebon - Individual DIVE Notebook**
### Final Project: MGMT 467  
**Author:** Ethan Lebon  
**Project:** Real-Time Weather Trends & Prediction  
**Focus:** Temperature Volatility Index (TVI) Insight  


## 🎯 Objective

This notebook explores the real-time weather streaming data from our team's pipeline  
(`finalprojectfor467.weather_proj.live_weather_with_delta`) and develops an insight that  
directly shaped part of our team's Looker Studio dashboard.

My focus is on analyzing **short-term temperature instability** using a metric I engineered  
called the **Temperature Volatility Index (TVI)**, which measures the standard deviation of  
hourly temperature changes (`temp_delta_1h`) for each city over the last 12 hours.

The output includes:

- Queries to BigQuery using Python  
- Plotly interactive visualizations  
- A refined insight influenced by LLM prompting  
- A full DIVE reflection  


## 💬 Prompts Used in This Analysis

### **Prompt 1**
"What insight can I extract from real-time weather data that uses `temp_delta_1h` and supports a meaningful DIVE reflection?"

### **Prompt 2**
"Explain how to measure weather instability or volatility across cities using streaming temperature deltas."

### **Prompt 3**
"Generate a metric that is simple to compute and visualize, but meaningful enough to add as a KPI on a Looker dashboard."

### **How the Prompts Helped**
The prompts pushed me toward measuring **temperature instability** instead of raw temperature.  
The model suggested using **standard deviation of hourly temperature deltas**, which became the  
**Temperature Volatility Index (TVI)** that I analyze in this notebook and added to our Looker dashboard.


In [27]:
!pip install google-cloud-bigquery pandas plotly --quiet

from google.colab import auth
auth.authenticate_user()

from google.cloud import bigquery
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

client = bigquery.Client(project="finalprojectfor467")


## 🟪 D: Discover

At first glance, the streaming weather data did not show any obvious trends.  
When I plotted raw temperature data, all cities looked relatively stable, and  
it wasn't clear what meaningful insight I could extract for the individual  
analysis notebook.

I noticed that our team's pipeline generated a feature called  
`temp_delta_1h`, which captures the hour-to-hour temperature change.  
However, the Looker dashboard only displayed the temperature trend and  
did not surface any insight about **instability** or **volatility**.

My initial takeaway:
> Raw temperature alone wasn't revealing anything interesting; I needed a  
> metric that highlighted how *unstable* or *variable* temperatures were  
> across cities.


In [28]:
query = """
SELECT
  city,
  ts_hour,
  temp_hour_avg,
  temp_delta_1h
FROM `finalprojectfor467.weather_proj.live_weather_with_delta`
WHERE ts_hour >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 12 HOUR)
ORDER BY ts_hour
"""

df = client.query(query).to_dataframe()
df.head()


|   | city | ts_hour | temp_hour_avg | temp_delta_1h |
|---|------|---------|---------------|---------------|
| 0 | Albuquerque | 2025-12-12 10:00:00+00:00 | 4.152747 | -4.152387 |
| 1 | Denver | 2025-12-12 10:00:00+00:00 | 8.984977 | -2.806771 |
| 2 | Los Angeles | 2025-12-12 10:00:00+00:00 | 12.140062 | -2.503201 |
| 3 | Vancouver | 2025-12-12 10:00:00+00:00 | 7.3719 | 0.068177 |
| 4 | Seattle | 2025-12-12 10:00:00+00:00 | 7.8 | -0.064186 |


## 🟩 I: Investigate

To dig deeper, I explored the `temp_delta_1h` feature in Python to understand how  
temperature changed from hour to hour. I suspected that focusing on volatility,  
rather than absolute temperature, might reveal more meaningful insights.

I also tested a few statistical transformations (sum, moving average, difference),  
but none highlighted clear differences across cities.

After iterating with LLM prompts, the strongest idea emerged:

### **Compute the standard deviation of hourly temperature changes per city.**

This became the **Temperature Volatility Index (TVI)**:

\[
\text{TVI} = \text{STDDEV}(\Delta \text{Temperature}_{1h})
\]

A high TVI = more unstable weather conditions.  
A low TVI = consistent, steady temperatures.
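
To make the definition concrete, here is a minimal sketch on synthetic numbers (the two city names and all delta values below are made up for illustration and are not rows from the project table). It shows why a plain sum of deltas can hide instability: both cities end with the same net change, but their TVI values differ sharply. Note that pandas `.std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Synthetic hourly deltas for two hypothetical cities (illustration only).
sample = pd.DataFrame({
    "city": ["Steady"] * 4 + ["Swingy"] * 4,
    "temp_delta_1h": [0.1, -0.1, 0.1, -0.1, 3.0, -3.0, 3.0, -3.0],
})

# Net change over the window: the swings cancel out for both cities.
net_change = sample.groupby("city")["temp_delta_1h"].sum()

# TVI: sample standard deviation (ddof=1) of the hourly deltas per city.
tvi = sample.groupby("city")["temp_delta_1h"].std()

print(net_change.round(3))
print(tvi.round(3))
```

Both cities have a net change of roughly zero, while the TVI is about 0.115 for "Steady" and about 3.464 for "Swingy", which is the kind of contrast the metric is meant to surface.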


In [29]:
tvi_df = (
    df.groupby("city")["temp_delta_1h"]
      .std()
      .reset_index()
      .rename(columns={"temp_delta_1h": "TVI"})
      .sort_values("TVI", ascending=False)
)

tvi_df


|   | city | TVI |
|---|------|-----|
| 1 | Denver | 2.798806 |
| 0 | Albuquerque | 2.660384 |
| 3 | Los Angeles | 2.352847 |
| 2 | Las Vegas | 2.261434 |
| 4 | Phoenix | 2.148771 |
| 6 | San Diego | 1.252204 |
| 5 | Portland | 0.771582 |
| 7 | San Francisco | 0.770079 |
| 8 | Seattle | 0.411973 |
| 9 | Vancouver | 0.12797 |


In [30]:
fig = px.bar(
    tvi_df,
    x="city",
    y="TVI",
    title="Temperature Volatility Index (TVI) by City - Last 12 Hours",
    color="TVI",
    color_continuous_scale="Blues"
)
fig.update_layout(
    template="plotly_dark",
    xaxis_title="City",
    yaxis_title="Std Dev of Hourly Temperature Change"
)
fig.show()


In [31]:
# Hourly temperature levels
fig = px.line(
    df,
    x="ts_hour",
    y="temp_hour_avg",
    color="city",
    title="Hourly Avg Temperature (Last 12 Hours)"
)
fig.update_layout(template="plotly_dark")
fig.show()

# Hourly temperature change
fig = px.line(
    df,
    x="ts_hour",
    y="temp_delta_1h",
    color="city",
    title="Hourly Change in Temperature (Δ Last 12 Hours)"
)
fig.update_layout(template="plotly_dark")
fig.show()


## 🟦 V: Validate

The Plotly bar chart validated a clear pattern:

- **Denver** and **Albuquerque** had the highest TVI values, meaning their temperatures fluctuated the most within the last 12 hours.
- **Seattle** and **Vancouver** had extremely low TVI values, reflecting very stable weather conditions.
- This pattern did **not** appear in the raw temperature line charts;  
  it only became visible once volatility was calculated.

To confirm this wasn't an artifact:
- I rechecked the raw `temp_delta_1h` line chart.
- The hourly spikes and drops aligned with the high-TVI cities.
- Low-volatility cities showed almost flat deltas near zero.

The insight held up when cross-checked against the raw delta chart.
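
A further check that could sit alongside the chart comparison is to look at how many deltas stand behind each TVI value, since a standard deviation over a handful of points is fragile. The sketch below is an assumption-laden illustration, not part of the original analysis: the helper name `tvi_with_support`, the `min_obs` threshold, and the sample rows are all hypothetical. It expects a frame shaped like `df` above, with `city` and `temp_delta_1h` columns.

```python
import pandas as pd

def tvi_with_support(df: pd.DataFrame, min_obs: int = 6) -> pd.DataFrame:
    """Return TVI per city with the observation count behind it.

    `min_obs` is an arbitrary illustrative threshold, not a project setting.
    """
    summary = (
        df.groupby("city")["temp_delta_1h"]
          .agg(
              TVI="std",                                # sample std (ddof=1)
              n_obs="count",                            # non-null deltas only
              max_abs_delta=lambda s: s.abs().max(),    # largest hourly swing
          )
          .reset_index()
    )
    # Flag cities whose TVI rests on too few deltas to be trusted.
    summary["enough_data"] = summary["n_obs"] >= min_obs
    return summary.sort_values("TVI", ascending=False)

# Synthetic rows for illustration only (not values from the project table).
check_df = pd.DataFrame({
    "city": ["A", "A", "A", "A", "B", "B"],
    "temp_delta_1h": [1.0, -1.0, 1.0, -1.0, 0.1, None],
})
summary = tvi_with_support(check_df, min_obs=3)
print(summary)
```

Cities flagged with `enough_data == False` would be worth excluding or annotating before ranking them by TVI.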


## 🟧 E: Extend

Once validated, this insight became a meaningful addition to our team dashboard.

I added:
- A **Temperature Volatility Index (TVI) KPI**  
- A **city-ranked bar chart** showing TVI values  
- A **narrative explanation** connecting volatility to operational use cases  
  (energy demand, transit delays, sudden weather shifts)
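
The notebook does not show how the TVI reaches Looker Studio, so the following is only a sketch of one way the same metric could be computed directly in BigQuery and used as a dashboard data source. The helper `build_tvi_query` is hypothetical. It relies on BigQuery's `STDDEV_SAMP`, which is a sample standard deviation and therefore matches the pandas `.std()` default used earlier.

```python
def build_tvi_query(table: str, window_hours: int = 12) -> str:
    """Build a BigQuery SQL string that computes TVI per city.

    Hypothetical helper for illustration; the team's actual dashboard
    query is not shown in this notebook.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return f"""
SELECT
  city,
  STDDEV_SAMP(temp_delta_1h) AS tvi,   -- sample std, same as pandas .std()
  COUNT(temp_delta_1h) AS n_obs        -- deltas behind each TVI value
FROM `{table}`
WHERE ts_hour >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(window_hours)} HOUR)
GROUP BY city
ORDER BY tvi DESC
"""

tvi_sql = build_tvi_query(
    "finalprojectfor467.weather_proj.live_weather_with_delta"
)
print(tvi_sql)
```

The resulting string could be run with `client.query(tvi_sql).to_dataframe()` or saved as a view, so the dashboard and the notebook would share one definition of the metric.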

This moves our system beyond just tracking temperature level: we now track  
**weather stability**, which is often more useful for short-term forecasting  
and real-time decision-making.

The DIVE process helped turn a vague idea into a concrete, validated insight  
that strengthened our final Looker dashboard.


## ✅ Conclusion

Using both LLM-guided exploration and statistical investigation, I identified  
that **temperature volatility**, measured via the standard deviation of hourly  
deltas, provides a deeper and more actionable insight than raw temperature  
readings alone.

The Temperature Volatility Index (TVI) is now part of our team dashboard and  
helps surface real-time weather instability across cities, aligning with the  
goals of predictive analytics and streaming data analysis covered in class.
