# COGS 108 - Project Proposal

# Names

- Kim Lim
- Ronnie Volman
- Milton Iwama
- Saul Sanchez
- Owen Connor

# Research Question

Can we predict average household energy consumption in San Diego County based on time of year, weather patterns and geographical location?



## Background and Prior Work

Household energy consumption is influenced by many factors, including seasonal changes, weather patterns, and regional behaviors. San Diego in particular is home to a surprising variety of microclimates, each offering a distinct weather experience despite their proximity to one another<sup><a href="#footnote1">[1]</a></sup>. Moreover, San Diego does not follow the traditional seasonal pattern seen in many other places. For instance, winter feels cold but may not bring snow, and summer can be hot and dry in one location but humid in another. Because weather fluctuations in this region are mild, energy consumption here could show trends that differ from those in regions with more pronounced seasons.

Despite the significance of these patterns, most research on energy consumption has focused on larger metropolitan areas or regions with more extreme weather conditions. For instance, Alaska, which experiences extreme cold weather, has two metropolitan areas. According to the US Energy Information Administration, Alaska had the highest per capita total primary energy consumption at the state level in 2022 with about 987 MMBtu (Million British Thermal Units) per person and the highest per capita transportation energy use<sup><a href="#footnote2">[2]</a></sup>. Comparatively little attention has been given to temperate coastal cities like San Diego. This presents an opportunity to explore how subtle shifts in weather, seasonal variations, and geographic location impact energy usage and to assess how effectively time and weather data can predict local consumption trends. 

Why does this matter? Accurate, location-specific energy forecasting, especially in regions like San Diego with mild yet variable microclimates, can help both individuals and communities make informed, cost-effective decisions about their energy use. Energy use typically peaks at certain times, when utilities must rely on the most expensive and resource-intensive electricity sources to meet demand<sup><a href="#footnote3">[3]</a></sup>. If residents know when demand is likely to peak because of subtle weather changes, they can shift their usage to off-peak hours, saving money and easing pressure on the electrical grid. On a broader scale, these adjustments help reduce reliance on the high-emission energy sources typically used during peak times, which in turn supports a more resilient energy infrastructure and lowers carbon emissions. As climate conditions become less predictable and energy needs continue to rise, studying how microclimate-driven behaviors affect consumption becomes increasingly relevant.

<p id="footnote1"><sup>[1]</sup> Amber Coakley. “Experience All of San Diego County’s Unique Microclimates in One Day.” <i>FOX 5 San Diego & KUSI News</i>, 2 Mar. 2025. <a href="https://fox5sandiego.com/weather/experience-all-of-san-diego-countys-unique-microclimates-in-one-day/">https://fox5sandiego.com/weather/experience-all-of-san-diego-countys-unique-microclimates-in-one-day/</a></p>
<p id="footnote2"><sup>[2]</sup> “Frequently Asked Questions (FAQs) - U.S. Energy Information Administration (EIA).” www.eia.gov, <a href="https://www.eia.gov/tools/faqs/faq.php?id=85&t=1">https://www.eia.gov/tools/faqs/faq.php?id=85&t=1</a></p>
<p id="footnote3"><sup>[3]</sup> “Why Does It Matter What Time of Day You Use Power?” Efficiencyvermont.com, 2020, <a href="https://www.efficiencyvermont.com/blog/our-insights/why-does-it-matter-what-time-of-day-you-use-power">https://www.efficiencyvermont.com/blog/our-insights/why-does-it-matter-what-time-of-day-you-use-power</a></p>

# Hypothesis


Average household energy consumption in San Diego County can be accurately predicted using time of year, weather variables, and general location, with higher consumption expected during hotter months due to increased cooling needs. We expect higher consumption in hotter months because, in addition to cooling living spaces, households also use more energy to keep electronics from overheating.
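
As a rough illustration of how the hypothesis could be checked, the sketch below fits a one-variable least-squares line of consumption against temperature and reports R² as one possible measure of how well temperature alone explains consumption. The proposal does not commit to a specific model or metric, so the choice of a linear fit, the function names `fit_line` and `r_squared`, and all of the numbers are assumptions made purely for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y ~ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented monthly values (not real measurements):
# average temperature in degrees F and average daily kWh per household.
temps_f = [57.0, 60.0, 65.0, 70.0, 75.0, 78.0]
kwh = [11.0, 11.5, 13.0, 15.0, 17.5, 19.0]

slope, intercept = fit_line(temps_f, kwh)
r2 = r_squared(temps_f, kwh, slope, intercept)
print(f"slope={slope:.3f} kWh per degree F, R^2={r2:.3f}")
```

A positive slope would be consistent with higher consumption in hotter months. A real analysis would also need location and time-of-year features and an evaluation on held-out data.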

# Data

The study would measure variables such as zip code, daily or monthly average electricity consumption per household, average temperature, minimum and maximum temperatures, humidity, precipitation, date, day of the week, month, and season. Data would be collected either at the household level or aggregated by zip code across San Diego County. The most relevant time period for this analysis is 2021 to the present. We will exclude data from 2020, during the pandemic, because increased home energy usage from people staying at home would most likely skew our results. We will also exclude data from before the pandemic because energy usage habits may have been different then. Starting from 2021 provides a more accurate baseline, reflecting a period when people began adjusting to post-pandemic routines and usage rates started stabilizing. Collecting recent data, including from the current year, would further improve our ability to predict electricity consumption patterns for the upcoming summer.
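
The filtering and aggregation described above could look roughly like the following sketch. It keeps only records from 2021 onward, groups them by zip code and month, and attaches a season label. The record layout, the `season` and `monthly_by_zip` helpers, the meteorological season boundaries, and every value in `RECORDS` are assumptions for illustration, since no dataset has been selected yet.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily records: (zip_code, date, kWh per household, avg temp in F).
# These values are invented for illustration and are not real measurements.
RECORDS = [
    ("92037", date(2020, 7, 15), 21.0, 74.0),  # 2020: excluded
    ("92037", date(2021, 7, 15), 18.0, 73.0),
    ("92037", date(2021, 7, 16), 20.0, 75.0),
    ("92101", date(2021, 1, 10), 12.0, 58.0),
    ("92101", date(2019, 1, 10), 11.0, 57.0),  # pre-pandemic: excluded
]

START = date(2021, 1, 1)  # first day of the study window

def season(month):
    """Map a month number to a meteorological season (an assumed convention)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"

def monthly_by_zip(records, start=START):
    """Average kWh and temperature per (zip, year, month), from `start` onward."""
    groups = defaultdict(list)
    for zip_code, day, kwh, temp in records:
        if day >= start:
            groups[(zip_code, day.year, day.month)].append((kwh, temp))
    return {
        key: {
            "season": season(key[2]),
            "mean_kwh": mean(k for k, _ in rows),
            "mean_temp_f": mean(t for _, t in rows),
            "n_days": len(rows),
        }
        for key, rows in groups.items()
    }

summary = monthly_by_zip(RECORDS)
```

With a real dataset the same steps would more likely be written with a dataframe library; the standard library is used here only to keep the sketch self-contained.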

# Ethics & Privacy

The proposed energy consumption data raises several potential bias and privacy concerns. The data might contain personally identifiable information (PII), which would require careful handling to protect individual privacy. Additionally, there could be biases related to how the data was collected and who is represented within it. For instance, certain neighborhoods, particularly higher-income areas with smart meters, solar panels, or better internet access, may be overrepresented, while lower-income communities without advanced infrastructure could be underrepresented. Demographic biases may also be embedded in the energy usage patterns due to historic housing segregation, affecting the equity of any resulting analysis. To address these concerns, we will take steps to detect biases throughout the project lifecycle (before, during, and after analysis), and especially when communicating findings. Specific strategies include focusing on transparency by clearly stating data sources, preprocessing methods, and known limitations, as well as working with aggregated or anonymized data whenever possible to minimize privacy risks and ensure more equitable outcomes.
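
One way to work with aggregated data, sketched below, is to drop household identifiers, report only per-zip-code averages, and withhold any zip code with too few households to hide an individual. The threshold `MIN_HOUSEHOLDS`, the helper `aggregate_by_zip`, and the rows are hypothetical; the proposal does not specify a suppression rule, so this is only one possible approach.

```python
from collections import defaultdict
from statistics import mean

MIN_HOUSEHOLDS = 3  # hypothetical threshold; the proposal does not specify one

# Invented household-level rows: (household_id, zip_code, monthly kWh).
ROWS = [
    ("h1", "92037", 410.0),
    ("h2", "92037", 380.0),
    ("h3", "92037", 440.0),
    ("h4", "91901", 520.0),  # only one household in this zip code
]

def aggregate_by_zip(rows, min_households=MIN_HOUSEHOLDS):
    """Return per-zip mean kWh without household IDs, plus the suppressed zips."""
    by_zip = defaultdict(list)
    for _household_id, zip_code, kwh in rows:
        by_zip[zip_code].append(kwh)  # the identifier is discarded here
    published, suppressed = {}, []
    for zip_code, values in by_zip.items():
        if len(values) >= min_households:
            published[zip_code] = mean(values)
        else:
            suppressed.append(zip_code)  # too few households to report safely
    return published, suppressed

published, suppressed = aggregate_by_zip(ROWS)
```

Listing the suppressed zip codes alongside the results also makes it visible which areas are missing from the analysis, which relates to the representation concerns above.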

# Team Expectations 

* Communication is through Discord. The expected response time is 24 hours.
* We meet during discussions and hold Discord meetings when needed.
* Assigned tasks need to be done by our internal due date (before the actual due date). If others depend on your task, you are obligated to complete it early enough that everyone has adequate time to complete their own tasks. Failures will be documented, which can result in a lower grade. If the issue persists, you will be reported to the TA, who can decide what to do.
* Tasks will be listed in GitHub issues.
* Decisions will be made by majority vote. If an individual fails to respond within 12 hours, their vote is void. For urgent issues, a 6-hour window will be set. If a quick decision with a hard deadline (such as a submission to Canvas/GitHub) has to be made, the individual must respond at least 3 hours before the deadline so there is time to complete the task.
* Anyone struggling to complete their tasks must contact the group right away so the work does not fall behind. It is better to ask for help sooner rather than later.
* Tasks will be assigned to the person best suited to them, but everyone will do a bit of everything. This is a GROUP project. We should be working together to help each other out and finish tasks accordingly.


# Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 04/28 | 3 PM | Schedule when to meet, think about the research question | Determine expectations for group, decide on what topic to do, discuss hypothesis, do background research | 
| 04/30 | 4 PM | Edit and finalize project proposal | Search for dataset, tidy and clean dataset | 
| 05/02 | 4 PM | Dataset should be tidy and clean |   Data exploration, discuss what features to use, assign tasks to members |
| 05/07 | 4 PM | NA | Progress check in |
| 05/09 | 4 PM | Data exploration should be complete | Data preprocessing, discuss techniques to properly complete this task |
| 05/14 | 4 PM | Data preprocessing should be complete | Explore different machine learning models, discuss which one is best |
| 05/16 | 4 PM | NA | Progress check in |
| 05/21 | 4 PM | Model tuned and finalized | Review the whole project, discuss if changes or improvements need to be made |
| 05/23 | 4 PM | Edits should be done | Review again |
| 05/28 | 4 PM | Finalize project and turn in | NA |