# <span style="color: red;">Site Energy Usage Intensity (EUI) Prediction</span>
------------------------------------------------------------------------------------------------------------------------------------------
#### **Description:**
- According to a report issued by the International Energy Agency (IEA), the lifecycle of buildings from construction to demolition was responsible for 37% of global energy-related and process-related CO2 emissions in 2020. Yet it is possible to drastically reduce the energy consumption of buildings by a combination of easy-to-implement fixes and state-of-the-art strategies.

#### **Dataset Description**
The dataset consists of building characteristics, weather data for the building's location, and the building's energy usage in a given year, measured as Site Energy Usage Intensity (Site EUI). Each row in the data corresponds to a single building observed in a given year.

**Dataset-source**: [Site-Energy-dataset](https://www.kaggle.com/c/widsdatathon2022/data)  

In [10]:
import pandas as pd
# Show up to 100 columns so that all 64 columns are visible in the output
pd.set_option('display.max_columns', 100)

In [11]:
df = pd.read_csv('data/train_site_eui_dataset.csv')

In [12]:
df.head()

Unnamed: 0,Year_Factor,State_Factor,building_class,facility_type,floor_area,year_built,energy_star_rating,ELEVATION,january_min_temp,january_avg_temp,january_max_temp,february_min_temp,february_avg_temp,february_max_temp,march_min_temp,march_avg_temp,march_max_temp,april_min_temp,april_avg_temp,april_max_temp,may_min_temp,may_avg_temp,may_max_temp,june_min_temp,june_avg_temp,june_max_temp,july_min_temp,july_avg_temp,july_max_temp,august_min_temp,august_avg_temp,august_max_temp,september_min_temp,september_avg_temp,september_max_temp,october_min_temp,october_avg_temp,october_max_temp,november_min_temp,november_avg_temp,november_max_temp,december_min_temp,december_avg_temp,december_max_temp,cooling_degree_days,heating_degree_days,precipitation_inches,snowfall_inches,snowdepth_inches,avg_temp,days_below_30F,days_below_20F,days_below_10F,days_below_0F,days_above_80F,days_above_90F,days_above_100F,days_above_110F,direction_max_wind_speed,direction_peak_wind_speed,max_wind_speed,days_with_fog,site_eui,id
0,1,State_1,Commercial,Grocery_store_or_food_market,61242.0,1942.0,11.0,2.4,36,50.5,68,35,50.589286,73,40,53.693548,80,41,55.5,78,46,56.854839,84,50,60.5,90,52,62.725806,84,52,62.16129,85,52,64.65,90,47,63.016129,83,43,53.8,72,36,49.274194,71,115,2960,16.59,0.0,0,56.972603,0,0,0,0,14,0,0,0,1.0,1.0,1.0,,248.682615,0
1,1,State_1,Commercial,Warehouse_Distribution_or_Shipping_center,274000.0,1955.0,45.0,1.8,36,50.5,68,35,50.589286,73,40,53.693548,80,41,55.5,78,46,56.854839,84,50,60.5,90,52,62.725806,84,52,62.16129,85,52,64.65,90,47,63.016129,83,43,53.8,72,36,49.274194,71,115,2960,16.59,0.0,0,56.972603,0,0,0,0,14,0,0,0,1.0,,1.0,12.0,26.50015,1
2,1,State_1,Commercial,Retail_Enclosed_mall,280025.0,1951.0,97.0,1.8,36,50.5,68,35,50.589286,73,40,53.693548,80,41,55.5,78,46,56.854839,84,50,60.5,90,52,62.725806,84,52,62.16129,85,52,64.65,90,47,63.016129,83,43,53.8,72,36,49.274194,71,115,2960,16.59,0.0,0,56.972603,0,0,0,0,14,0,0,0,1.0,,1.0,12.0,24.693619,2
3,1,State_1,Commercial,Education_Other_classroom,55325.0,1980.0,46.0,1.8,36,50.5,68,35,50.589286,73,40,53.693548,80,41,55.5,78,46,56.854839,84,50,60.5,90,52,62.725806,84,52,62.16129,85,52,64.65,90,47,63.016129,83,43,53.8,72,36,49.274194,71,115,2960,16.59,0.0,0,56.972603,0,0,0,0,14,0,0,0,1.0,,1.0,12.0,48.406926,3
4,1,State_1,Commercial,Warehouse_Nonrefrigerated,66000.0,1985.0,100.0,2.4,36,50.5,68,35,50.589286,73,40,53.693548,80,41,55.5,78,46,56.854839,84,50,60.5,90,52,62.725806,84,52,62.16129,85,52,64.65,90,47,63.016129,83,43,53.8,72,36,49.274194,71,115,2960,16.59,0.0,0,56.972603,0,0,0,0,14,0,0,0,1.0,1.0,1.0,,3.899395,4


In [14]:
df.shape

(75757, 64)

In [19]:
72-43

29

In [17]:
df['Year_Factor'].unique()

array([1, 2, 3, 4, 5, 6], dtype=int64)

The dataset has 75,757 rows and 64 columns.

Let's understand the description of each column

### Independent Features

- **id**: building id
- **Year_Factor**: the anonymized year in which the weather and energy usage factors were observed
- **State_Factor**: anonymized state in which the building is located
- **building_class**: building classification
- **facility_type**: building usage type
- **floor_area**: floor area (in square feet) of the building
- **year_built**: year in which the building was constructed
- **energy_star_rating**: the energy star rating of the building
- **ELEVATION**: elevation of the building location
- **{month}_min_temp**: minimum temperature in the given month (in Fahrenheit) at the location of the building, e.g. `january_min_temp`
- **{month}_avg_temp**: average temperature in the given month (in Fahrenheit) at the location of the building, e.g. `january_avg_temp`
- **{month}_max_temp**: maximum temperature in the given month (in Fahrenheit) at the location of the building, e.g. `january_max_temp`
- **cooling_degree_days**: a cooling degree day for a given day is the number of degrees by which the daily average temperature exceeds 65 degrees Fahrenheit. Each month is summed to produce an annual total at the location of the building.
- **heating_degree_days**: a heating degree day for a given day is the number of degrees by which the daily average temperature falls below 65 degrees Fahrenheit. Each month is summed to produce an annual total at the location of the building.
- **precipitation_inches**: annual precipitation in inches at the location of the building
- **snowfall_inches**: annual snowfall in inches at the location of the building
- **snowdepth_inches**: annual snow depth in inches at the location of the building
- **avg_temp**: average temperature over a year at the location of the building
- **days_below_{x}F**: total number of days below x degrees Fahrenheit at the location of the building, e.g. `days_below_30F`
- **days_above_{x}F**: total number of days above x degrees Fahrenheit at the location of the building, e.g. `days_above_80F`
- **direction_max_wind_speed**: wind direction for maximum wind speed at the location of the building. Given in 360-degree compass point direction.
- **direction_peak_wind_speed**: wind direction for peak wind gust speed at the location of the building. Given in 360-degree compass point directions.
- **max_wind_speed**: maximum wind speed at the location of the building
- **days_with_fog**: the number of days with fog at the location of the building
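
The degree-day definitions in the list above can be illustrated with a minimal sketch. The function name `degree_days` and the sample temperatures are hypothetical and are not part of the dataset or of the competition code; the only thing taken from the description is the 65 degrees Fahrenheit base temperature.

```python
BASE_TEMP_F = 65.0  # base temperature from the column descriptions


def degree_days(daily_avg_temps_f, base=BASE_TEMP_F):
    """Return (cooling_degree_days, heating_degree_days) for daily average temps in Fahrenheit."""
    # A day contributes to cooling degree days only when it is warmer than the base
    cooling = sum(max(0.0, t - base) for t in daily_avg_temps_f)
    # A day contributes to heating degree days only when it is colder than the base
    heating = sum(max(0.0, base - t) for t in daily_avg_temps_f)
    return cooling, heating


# Hypothetical daily averages, chosen only for illustration
sample_temps = [60.0, 65.0, 70.0, 72.5, 50.0]
cdd, hdd = degree_days(sample_temps)
print(cdd, hdd)
```

Summing over every day of the year in this way would give annual totals comparable to `cooling_degree_days` and `heating_degree_days`, assuming the dataset follows the definition as stated.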

### Target Column/Dependent Feature

- **site_eui**: Site Energy Usage Intensity is the amount of heat and electricity consumed by a building as reflected in utility bills.

************************************************************************************************************************************************
**Hypotheses Prior to Analysis:**

Based on domain knowledge, drop the columns that are not necessary for further analysis. The dataset contains minimum, maximum, and average temperature columns for each month, and the monthly average is roughly the midpoint of the monthly minimum and maximum:

$ \text{Average\_month\_temp} \approx \frac{\text{months\_min} + \text{months\_max}}{2} $

Hence, we can drop the min and max temperature columns of each month and keep only the average.
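
Before dropping anything, it may be worth checking how closely the midpoint tracks the reported average. The sketch below is one possible way to do that and then drop the columns. It uses a one-row frame whose values are copied from the first row of `df.head()` above; the names `sample`, `months`, `gaps`, `redundant`, and `reduced` are introduced here for illustration only.

```python
import pandas as pd

# One row of values copied from the first row of df.head(); on the real data,
# use the full DataFrame `df` instead of this small sample.
sample = pd.DataFrame({
    "january_min_temp": [36],
    "january_avg_temp": [50.5],
    "january_max_temp": [68],
    "february_min_temp": [35],
    "february_avg_temp": [50.589286],
    "february_max_temp": [73],
    "site_eui": [248.682615],
})

months = ["january", "february"]  # extend to all twelve months on the real data

# Largest absolute gap between the min/max midpoint and the reported monthly average
gaps = {
    m: ((sample[f"{m}_min_temp"] + sample[f"{m}_max_temp"]) / 2
        - sample[f"{m}_avg_temp"]).abs().max()
    for m in months
}
print(gaps)

# Select the monthly min/max columns by suffix and drop them, keeping the averages
redundant = [c for c in sample.columns if c.endswith(("_min_temp", "_max_temp"))]
reduced = sample.drop(columns=redundant)
print(list(reduced.columns))
```

The suffix match leaves `{month}_avg_temp` and the annual `avg_temp` untouched. If the gaps turn out to be large on the full data, the min and max columns may carry extra information, and dropping them becomes a modeling choice rather than a lossless simplification.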

## Exploratory Data Analysis