## Case Study: Weather Influence on Energy Consumption of a Building

**Aim of study**: To find out how the weather influences the energy consumption of the "UnivClass_Ciara" building.

**Dataset:**

Building Data Genome Project 1 - Hourly (Electrical Meter Data from Non-residential Buildings)

A companion dataset with the weather conditions around each building, plus the metadata file ("all_buildings_meta_data") used to look up which weather file belongs to which building.

**License**: CC BY-SA 4.0

**Dataset Author**: [Clayton Miller](https://www.kaggle.com/claytonmiller)

**Key Metrics To Check:** 
1. Temperature vs Energy Consumption 
2. Humidity vs Energy Consumption 

**Tools Used**: 
Jupyter Notebook for Python. Libraries: Pandas, Matplotlib, and Seaborn

In [None]:
#importing libraries
import pandas as pd
import seaborn as sns  # used for the regression and scatter plots

The 'building-data-genome-project-v1' dataset contains a full year of hourly energy-consumption time series for each of its 557 buildings. This case study analyzes one of them, "UnivClass_Ciara", whose hourly meter data is stored in its own CSV file. The corresponding weather data and the building metadata are loaded from two separately uploaded datasets.

In [None]:
#reading UnivClass_Ciara meter data
meter_data = pd.read_csv("/kaggle/input/building-data-genome-project-v1/UnivClass_Ciara.csv")
meter_data.head(5)

Re-reading the file with the timestamp column as the index and parsing it as dates:

In [None]:
meter_data = pd.read_csv("/kaggle/input/building-data-genome-project-v1/UnivClass_Ciara.csv",parse_dates = True, index_col ="timestamp")
meter_data.head(5)
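To see what `parse_dates=True` together with `index_col="timestamp"` does, here is a minimal sketch on a made-up two-row CSV held in memory. The values are hypothetical; only the column layout mirrors the meter file. The index becomes a `DatetimeIndex`, which is what the later `resample()` and index-based merge calls rely on.

```python
import io

import pandas as pd

# Hypothetical excerpt shaped like the meter CSV: a "timestamp" column
# plus one column named after the building (values are invented).
csv_text = """timestamp,UnivClass_Ciara
2015-01-01 00:00:00,12.5
2015-01-01 01:00:00,11.8
"""

meter = pd.read_csv(io.StringIO(csv_text), index_col="timestamp", parse_dates=True)

# parse_dates=True parses the index column into a DatetimeIndex.
print(type(meter.index).__name__)  # DatetimeIndex
print(meter.index[1].hour)         # 1
```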

Plotting the meter data as a line graph:

In [None]:
meter_data.plot(figsize=(10,4))

Before proceeding, we need the name of the weather file that corresponds to "UnivClass_Ciara"; it is listed in the metadata.

In [None]:
#Let's load all_buildings_meta_data first
metadata = pd.read_csv("/kaggle/input/buildings-energy-consumption-metadata/all_buildings_meta_data.csv",index_col='uid')
metadata.head(5)

In [None]:
#Let's locate the weather file name for "UnivClass_Ciara" (row and column in a single .loc call)
metadata.loc["UnivClass_Ciara", "newweatherfilename"]

The corresponding weather data is in the 'weather2.csv' file, which was uploaded as a separate dataset.
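A small sketch of the same lookup on a hypothetical two-row metadata table. Only the `uid` index and the `newweatherfilename` column come from the real file; the second building and `weather0.csv` are invented for illustration. Passing the row and column labels to a single `.loc[...]` call is the idiomatic form of the lookup and avoids chained indexing.

```python
import pandas as pd

# Hypothetical metadata: "Office_Example" and "weather0.csv" are made up.
metadata = pd.DataFrame(
    {
        "uid": ["UnivClass_Ciara", "Office_Example"],
        "newweatherfilename": ["weather2.csv", "weather0.csv"],
    }
).set_index("uid")

# One .loc call with (row label, column label) returns the scalar directly.
weather_file = metadata.loc["UnivClass_Ciara", "newweatherfilename"]
print(weather_file)  # weather2.csv
```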

In [None]:
#reading weather data at "UnivClass_Ciara" location
weather_data = pd.read_csv("/kaggle/input/weather2/weather2.csv", index_col = "timestamp", parse_dates = True)
weather_data.head(5)

In [None]:
weather_data.columns

In [None]:
#dropping the columns which are not required in this study

drop_columns = ['Conditions', 'DateUTC<br />', 'Dew PointC', 'Events', 'Gust SpeedKm/h',
                'Precipitationmm', 'Sea Level PressurehPa',
                'TimeEDT', 'TimeEST', 'VisibilityKm', 'Wind Direction',
                'Wind SpeedKm/h', 'WindDirDegrees', 'timestamp.1']
weather_data = weather_data.drop(columns=drop_columns)
weather_data.info()
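Since only temperature and humidity are analyzed below, an alternative to listing every column to drop is to select the columns to keep. This sketch assumes those two columns are `TemperatureC` and `Humidity` and uses a made-up frame; on the real data it would presumably be `weather_data[["TemperatureC", "Humidity"]]`, which also keeps working if the raw file gains extra columns.

```python
import pandas as pd

# Made-up frame with a few of the column names from the drop list above.
weather = pd.DataFrame(
    {
        "TemperatureC": [1.2, 0.8],
        "Humidity": [80, 82],
        "Conditions": ["Clear", "Clear"],
        "VisibilityKm": [16.1, 16.1],
    }
)

# Keep-list selection: name the columns the study needs.
keep = ["TemperatureC", "Humidity"]
slim = weather[keep]
print(list(slim.columns))  # ['TemperatureC', 'Humidity']
```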

In [None]:
weather_data.head(5)

Let's plot each parameter with respect to time

In [None]:
weather_data["TemperatureC"].plot(figsize= (10,4))

There are clear outliers: temperatures of around -10,000 degrees Celsius are physically impossible.
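A quick way to confirm such outliers numerically rather than by eye is to count readings below a plausibility threshold. This sketch assumes the outliers are a placeholder value such as `-9999` (the plot only shows values near -10,000) and reuses the -40 °C cut-off applied in the cleaning step further down; the series itself is made up.

```python
import pandas as pd

# Made-up readings; -9999.0 is an assumed missing-value placeholder.
temps = pd.Series([3.1, 2.8, -9999.0, 2.5], name="TemperatureC")

# Flag readings below -40 °C, the same cut-off used when cleaning.
implausible = temps < -40
print(implausible.sum())             # 1
print(temps[implausible].tolist())   # [-9999.0]
```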

Before cleaning these outliers, let's check the other parameters.

In [None]:
weather_data["Humidity"].plot(figsize=(10,10))  # plotting humidity vs time

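First, resampling the weather readings to hourly means so that they align with the hourly meter data:

In [None]:
weather_hourly = weather_data.resample("h").mean()  # "h" = hourly bins ("H" is deprecated in recent pandas)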
weather_hourly.head(5)
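The sketch below illustrates what `resample("h").mean()` does, using invented readings: every reading is assigned to the hour it falls in, and each hour's value is the average of those readings. The lowercase `"h"` alias is used because recent pandas releases deprecate `"H"`.

```python
import pandas as pd

# Invented readings: three in the first hour, one in the second.
idx = pd.to_datetime([
    "2015-01-01 00:05", "2015-01-01 00:25", "2015-01-01 00:55",
    "2015-01-01 01:10",
])
readings = pd.DataFrame({"TemperatureC": [1.0, 2.0, 3.0, 5.0]}, index=idx)

# Each hourly bin holds the mean of the readings inside it.
hourly = readings.resample("h").mean()
print(hourly["TemperatureC"].tolist())  # [2.0, 5.0]
```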

In [None]:
weather_hourly.info()

Cleaning temperature data to remove outliers

In [None]:
weather_hourly_clean = weather_hourly[weather_hourly>-40]  # values at or below -40 become NaN; rows are kept
weather_hourly_clean.info()

Masking out values below -40 degrees Celsius removes the outliers. The mask does not drop rows; it replaces the outlying values with NaN, so the non-null count reported by `info()` falls from 8567 before cleaning to 8544 after. The gaps this creates are then filled with a forward fill, which carries the last valid reading forward.

In [None]:
weather_hourly_clean = weather_hourly_clean.ffill()  # same as fillna(method='ffill'), which is deprecated
weather_hourly_clean.info()
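The masking and forward-fill steps can be reproduced on a tiny made-up frame. It shows why `info()` reports fewer non-null values rather than fewer rows after masking, and how `ffill()` fills each gap with the last valid reading. The -4998 value is an assumption standing in for an hourly mean dragged down by a placeholder reading.

```python
import numpy as np
import pandas as pd

# Five invented hourly rows; NaN marks an hour with no readings.
idx = pd.date_range("2015-01-01", periods=5, freq="h")
hourly = pd.DataFrame(
    {
        "TemperatureC": [2.0, -4998.0, 1.5, np.nan, 1.0],
        "Humidity": [80.0, 81.0, 79.0, 78.0, 77.0],
    },
    index=idx,
)

# Boolean masking keeps all rows; failing cells become NaN.
masked = hourly[hourly > -40]
print(len(masked), masked["TemperatureC"].count())  # 5 3

# Forward fill copies the last valid value into each gap.
filled = masked.ffill()
print(filled["TemperatureC"].tolist())  # [2.0, 2.0, 1.5, 1.5, 1.0]
```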

In [None]:
#plot clean data
weather_hourly_clean["TemperatureC"].plot(figsize=(10,4))

In [None]:
weather_hourly_clean.to_csv("weather_hourly_clean.csv")

Let's merge the meter data and the temperature data on their timestamps for comparison.

In [None]:
temp_vs_meter_data = pd.merge(weather_hourly_clean["TemperatureC"],meter_data['UnivClass_Ciara'],left_index=True,right_index=True,how='outer')
temp_vs_meter_data.head(5)
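To make the `how='outer'` choice concrete, here is a sketch with two invented hourly series whose time ranges only partly overlap. The outer merge keeps every timestamp from either side and leaves NaN where one side has no value; `how='inner'` keeps only the shared timestamps.

```python
import pandas as pd

# Invented series: temperature starts at 00:00, meter data at 01:00.
temp = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2015-01-01 00:00", periods=3, freq="h"),
    name="TemperatureC",
)
meter = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.date_range("2015-01-01 01:00", periods=3, freq="h"),
    name="UnivClass_Ciara",
)

# Outer: union of both indexes, NaN where a series has no value.
combined = pd.merge(temp, meter, left_index=True, right_index=True, how="outer")
print(len(combined))                    # 4
print(combined.isna().sum().to_dict())  # {'TemperatureC': 1, 'UnivClass_Ciara': 1}

# Inner: only the timestamps present in both series.
inner = pd.merge(temp, meter, left_index=True, right_index=True, how="inner")
print(len(inner))                       # 2
```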

In [None]:
temp_vs_meter_data.info()

In [None]:
temp_vs_meter_data.plot(figsize=(20,10),subplots = True)

In [None]:
temp_vs_meter_data.plot(kind="scatter",x = "TemperatureC",y="UnivClass_Ciara",alpha= 0.5,figsize=(15,10))

In [None]:
import seaborn as sns

def make_color_division(x):  # labels a temperature as heating or cooling regime, used as hue in the plots
    if x < 14:
        return "Heating"
    else:
        return "Cooling"

temp_vs_meter_data = temp_vs_meter_data.resample("D").mean()  # daily means; resample() returns a new frame, so assign it
temp_vs_meter_data['Heating_vs_Cooling'] = temp_vs_meter_data.TemperatureC.apply(make_color_division)  # applying the function to the combined data
temp_vs_meter_data.sample(frac=0.5)  # checking a random sample for the new column
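As a vectorized alternative to `apply`, `pd.cut` can assign the heating and cooling labels in one call. This is a sketch on two synthetic days of hourly data, reusing the notebook's 14 °C threshold; with `right=False`, temperatures below 14 °C are labelled "Heating" and 14 °C or above "Cooling", matching `make_color_division`. One difference: a missing temperature maps to NaN instead of falling through to "Cooling".

```python
import numpy as np
import pandas as pd

# Two synthetic days of hourly data: a cool day and a warm day.
idx = pd.date_range("2015-06-01", periods=48, freq="h")
hourly = pd.DataFrame(
    {
        "TemperatureC": [10.0] * 24 + [20.0] * 24,
        "UnivClass_Ciara": [50.0] * 24 + [80.0] * 24,
    },
    index=idx,
)

# resample() returns a new DataFrame, so the daily means are assigned.
daily = hourly.resample("D").mean()

# Bins [-inf, 14) -> "Heating" and [14, inf) -> "Cooling".
bins = [-np.inf, 14, np.inf]
labels = ["Heating", "Cooling"]
daily["Heating_vs_Cooling"] = pd.cut(daily["TemperatureC"], bins=bins, labels=labels, right=False)
print(daily["Heating_vs_Cooling"].tolist())  # ['Heating', 'Cooling']

# Edge cases: NaN stays NaN, exactly 14 °C counts as cooling.
edge = pd.cut(pd.Series([np.nan, 14.0]), bins=bins, labels=labels, right=False)
```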

In [None]:
g= sns.lmplot(x="TemperatureC",y="UnivClass_Ciara",hue="Heating_vs_Cooling",data=temp_vs_meter_data,truncate=True,palette="husl")

In [None]:
sns.scatterplot(x="TemperatureC",y="UnivClass_Ciara",hue="Heating_vs_Cooling",data=temp_vs_meter_data,palette = "husl")

The energy consumption in the cooling regime (14 °C and above) is clearly higher than in the heating regime.
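The visual comparison can be backed by numbers: the mean consumption per regime, and the slope of a straight-line least-squares fit in each regime (the same kind of line `lmplot` draws by default). The data below is synthetic and only demonstrates the calculation; applied to `temp_vs_meter_data`, rows with missing values would need to be dropped first, since `np.polyfit` does not skip NaN.

```python
import numpy as np
import pandas as pd

# Synthetic daily values, not results from the real building.
df = pd.DataFrame({
    "TemperatureC": [2.0, 6.0, 10.0, 16.0, 20.0, 24.0],
    "UnivClass_Ciara": [60.0, 58.0, 56.0, 70.0, 78.0, 86.0],
})
df["Heating_vs_Cooling"] = np.where(df["TemperatureC"] < 14, "Heating", "Cooling")

# Mean consumption in each regime.
means = df.groupby("Heating_vs_Cooling")["UnivClass_Ciara"].mean()
print(means.to_dict())  # {'Cooling': 78.0, 'Heating': 58.0}

# Slope (consumption change per °C) of a degree-1 fit in each regime.
slopes = {
    label: np.polyfit(g["TemperatureC"], g["UnivClass_Ciara"], 1)[0]
    for label, g in df.groupby("Heating_vs_Cooling")
}
print({k: round(v, 3) for k, v in slopes.items()})  # {'Cooling': 2.0, 'Heating': -0.5}
```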

#### Repeating the analysis for humidity

In [None]:
humidity_vs_meter_data = pd.merge(weather_hourly_clean["Humidity"],meter_data['UnivClass_Ciara'],left_index=True,right_index=True,how='outer')
humidity_vs_meter_data.head(5)

In [None]:
humidity_vs_meter_data.info()

In [None]:
humidity_vs_meter_data.plot(figsize=(20,10),subplots = True)

In [None]:
humidity_vs_meter_data = humidity_vs_meter_data.resample("D").mean()  # daily means; assign the result
humidity_vs_meter_data.plot(kind="scatter",x = "Humidity",y="UnivClass_Ciara",alpha= 0.5,figsize=(15,10))

The scatter plot shows no clear relationship between humidity and energy consumption.
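The visual impression can be checked with Pearson's correlation coefficient, which `Series.corr` computes while skipping pairs with a missing value. The numbers below are invented and built to have zero linear correlation; keep in mind that r only measures linear association, so a value near zero does not by itself rule out a non-linear pattern (the invented data is U-shaped).

```python
import pandas as pd

# Invented values: consumption dips in the middle, so r is about 0
# even though the two columns are clearly related.
df = pd.DataFrame({
    "Humidity": [40.0, 50.0, 60.0, 70.0, 80.0],
    "UnivClass_Ciara": [52.0, 51.0, 50.0, 51.0, 52.0],
})

# Pearson r; in the notebook this would use humidity_vs_meter_data.
r = df["Humidity"].corr(df["UnivClass_Ciara"])
print(f"r = {r:.3f}")
```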