# <div style="text-align: center"> ASHRAE - Great Energy Predictor III </div>
### <div style="text-align: center"> How much does it cost to cool a skyscraper in the summer? </div>

<img src="https://percentotech.com/wp-content/uploads/2019/07/Smart-Buildings.jpg">

Developing energy savings has two key elements: forecasting future energy usage without improvements, and forecasting energy use after a specific set of improvements has been implemented, like the installation and purchase of investment-grade meters, whose prices continue to fall. One issue preventing more aggressive growth of the energy markets is the lack of cost-effective, accurate, and scalable procedures for forecasting energy use.

  
**I hope you find this kernel helpful.**


<a id="top"></a> <br>
## Notebook Content

1. [Understand the Competition](#1)
1. [Import](#2)
1. [Load Data](#3)
1. [Data Description](#4)
1. [Visualization](#5)
1. [References](#6)


<a id="1"></a> <br>
## 1- Understand the Competition

Assessing the value of energy efficiency improvements can be challenging, as there's no way to truly know how much energy a building would have used without the improvements. The best we can do is to build counterfactual models. Once a building is overhauled, the new (lower) energy consumption is compared against modeled values for the original building to calculate the savings from the retrofit. More accurate models could support better market incentives and enable lower-cost financing.
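To make the savings calculation concrete, here is a minimal sketch with invented numbers. The arrays `baseline_pred` and `actual` are hypothetical: the first stands for what a counterfactual model predicts the un-retrofitted building would have used, the second for the metered use after the retrofit.

```python
import numpy as np

# Hypothetical hourly values in kWh, invented for illustration only.
baseline_pred = np.array([120.0, 135.0, 150.0, 140.0])  # modeled use without the retrofit
actual = np.array([100.0, 110.0, 125.0, 118.0])         # metered use after the retrofit

# Savings are the gap between the counterfactual prediction and the real reading.
hourly_savings = baseline_pred - actual
total_savings = hourly_savings.sum()
savings_fraction = total_savings / baseline_pred.sum()

print(f"Estimated savings: {total_savings:.1f} kWh ({savings_fraction:.1%})")
```

Any error in the baseline prediction flows straight into the savings estimate, which is why model accuracy matters here.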

This competition challenges you to build these counterfactual models across four energy types based on historic usage rates and observed weather. The dataset includes three years of hourly meter readings from over one thousand buildings at several different sites around the world.

Thankfully, significant investments are being made to improve building efficiencies to reduce costs and emissions. So, are the improvements working? That’s where you come in. Current methods of estimation are fragmented and do not scale well. Some assume a specific meter type or don’t work with different building types.

In this competition, you’ll develop accurate predictions of metered building energy usage in the following areas: chilled water, electric, natural gas, hot water, and steam meters. The data comes from over 1,000 buildings over a three-year timeframe.

With better estimates of these energy-saving investments, large-scale investors and financial institutions will be more inclined to invest in this area to enable progress in building efficiencies.

<a id="2"></a> <br>
## 2- Import

In [None]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from statsmodels.tsa.stattools import adfuller

<a id="3"></a> <br>
## 3- Load Data

In [None]:
!ls ../input/ashrae-energy-prediction/

In [None]:
data_dir = '../input/ashrae-energy-prediction'
print('Total file sizes')
print('-' * 10)
for f in os.listdir(data_dir):
    if 'zip' not in f:
        # 1 MB = 1,000,000 bytes
        size_mb = os.path.getsize(os.path.join(data_dir, f)) / 1_000_000
        print(f.ljust(30) + str(round(size_mb, 2)) + ' MB')

In [None]:
%%time
train = pd.read_csv('/kaggle/input/ashrae-energy-prediction/train.csv',index_col= 'timestamp', parse_dates=True)
test = pd.read_csv('/kaggle/input/ashrae-energy-prediction/test.csv',index_col= 'timestamp', parse_dates=True)
sample_sub = pd.read_csv('/kaggle/input/ashrae-energy-prediction/sample_submission.csv')
building_metadata = pd.read_csv('/kaggle/input/ashrae-energy-prediction/building_metadata.csv')
weather_train = pd.read_csv('/kaggle/input/ashrae-energy-prediction/weather_train.csv',index_col= 'timestamp', parse_dates=True)
train.head()

In [None]:
print('Train # rows: ',train.shape[0])
print('Train # Columns: ',train.shape[1])

In [None]:
test.head()

In [None]:
building_metadata.head()

In [None]:
print('building_metadata # rows: ',building_metadata.shape[0])
print('building_metadata # Columns: ',building_metadata.shape[1])

In [None]:
weather_train.head()

In [None]:
print('weather_train # rows: ',weather_train.shape[0])
print('weather_train # Columns: ',weather_train.shape[1])

In [None]:
sample_sub.head()

In [None]:
sample_sub.shape

<a id="4"></a> <br>
## 4- Data Description

In [None]:
print(list(train.columns))

**train.csv**
- **building_id** - Foreign key for the building metadata.
- **meter** - The meter id code. Read as {0: electricity, 1: chilledwater, 2: steam, 3: hotwater}. Not every building has all meter types.
- **timestamp** - When the measurement was taken.
- **meter_reading** - The target variable. Energy consumption in kWh (or equivalent). Note that this is real data with measurement error, which we expect will impose a baseline level of modeling error.
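The meter codes are easier to read once they are mapped to names. The sketch below uses a small made-up frame (`toy_train`, with invented readings) in place of the real `train` data; only the code-to-name mapping comes from the description above.

```python
import pandas as pd

# Mapping taken from the meter description above.
METER_NAMES = {0: "electricity", 1: "chilledwater", 2: "steam", 3: "hotwater"}

# Tiny stand-in for train.csv (values invented for illustration).
toy_train = pd.DataFrame({
    "building_id": [0, 0, 1, 1],
    "meter": [0, 1, 0, 3],
    "meter_reading": [10.5, 3.2, 7.1, 0.0],
})

# Add a readable label next to the numeric code.
toy_train["meter_name"] = toy_train["meter"].map(METER_NAMES)

# Which meter types does each building have?
meters_per_building = toy_train.groupby("building_id")["meter_name"].unique()
print(meters_per_building)
```

The same `map` call should work on the real `train` frame, since `meter` is an ordinary column there.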

In [None]:
print(list(test.columns))

**test.csv**

The submission files use row numbers for ID codes in order to save space on the file uploads. test.csv has no feature data; it exists so you can get your predictions into the correct order.

- row_id - Row id for your submission file
- building_id - Building id code
- meter - The meter id code
- timestamp - Timestamps for the test data period

In [None]:
print(list(building_metadata.columns))

**building_metadata.csv**
- site_id - Foreign key for the weather files.
- building_id - Foreign key for train.csv
- primary_use - Indicator of the primary category of activities for the building based on EnergyStar property type definitions
- square_feet - Gross floor area of the building
- year_built - Year building was opened
- floor_count - Number of floors of the building
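The foreign keys described above suggest how the tables fit together: meter readings join to the building metadata on `building_id`, and the result joins to the weather data on `site_id` plus `timestamp`. Here is a minimal sketch with toy frames (all values invented). With the frames loaded in this notebook, `timestamp` is the index, so a `reset_index()` would probably be needed first.

```python
import pandas as pd

# Toy stand-ins for the three tables (values invented for illustration).
toy_train = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 00:00"]),
    "building_id": [0, 1],
    "meter": [0, 0],
    "meter_reading": [10.5, 7.1],
})
toy_buildings = pd.DataFrame({
    "site_id": [0, 1],
    "building_id": [0, 1],
    "primary_use": ["Education", "Office"],
    "square_feet": [7432, 2720],
})
toy_weather = pd.DataFrame({
    "site_id": [0, 1],
    "timestamp": pd.to_datetime(["2016-01-01 00:00", "2016-01-01 00:00"]),
    "air_temperature": [25.0, 3.8],
})

# Step 1: attach building attributes (brings in site_id).
merged = toy_train.merge(toy_buildings, on="building_id", how="left")
# Step 2: attach the weather observed at that site and hour.
merged = merged.merge(toy_weather, on=["site_id", "timestamp"], how="left")
print(merged)
```

Left joins keep every meter reading even when a building or weather row is missing, so gaps show up as NaN rather than as dropped rows.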

In [None]:
print(list(sample_sub.columns))

**sample_submission.csv**

A valid sample submission.

- All floats in the solution file were truncated to four decimal places; we recommend you do the same to save space on your file upload.
- There are gaps in some of the meter readings for both the train and test sets. Gaps in the test set are not revealed or scored.
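One possible way to truncate predictions to four decimal places before writing a submission is sketched below. `toy_sub` is a hypothetical frame with invented values. Because of floating-point rounding, a value that already sits exactly on a four-decimal boundary could in rare cases be pushed one step lower by the multiplication, so treat this as an approximation.

```python
import numpy as np
import pandas as pd

# Hypothetical predictions (values invented for illustration).
toy_sub = pd.DataFrame({
    "row_id": [0, 1, 2],
    "meter_reading": [1.23456, 0.99999, 2.5],
})

# Truncate (not round) toward zero at the fourth decimal place.
toy_sub["meter_reading"] = np.trunc(toy_sub["meter_reading"] * 1e4) / 1e4

# float_format keeps the written file to four decimals per value.
csv_text = toy_sub.to_csv(index=False, float_format="%.4f")
print(csv_text)
```

Passing a file path as the first argument of `to_csv` would write the file instead of returning a string.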

In [None]:
train.info()

In [None]:
test.info()

In [None]:
building_metadata.info()

In [None]:
nulls = building_metadata.isnull().sum()  # Number of missing values per column
nulls = nulls[nulls > 0].sort_values()
nulls

In [None]:
weather_train.info()

In [None]:
nullsWeather = weather_train.isnull().sum()  # Number of missing values per column
nullsWeather = nullsWeather[nullsWeather > 0].sort_values()
nullsWeather

<a id="5"></a> <br>
## 5- Visualization

In [None]:
building_metadata.head()

In [None]:
plt.figure(figsize=(7,7))
sns.countplot(y=building_metadata.primary_use, palette="Set2")

In [None]:
plt.figure(figsize=(7,7))
sns.countplot(y=building_metadata.site_id, palette="Set2")

In [None]:

# sns.distplot is deprecated; sns.histplot draws the same plain histogram
sns.histplot(building_metadata.year_built, bins=25).set_title("Histogram of Year Built")

In [None]:
sns.histplot(building_metadata.square_feet, bins=25).set_title("Histogram of square_feet")

In [None]:
sns.histplot(building_metadata.floor_count, bins=25).set_title("Histogram of floor_count")

In [None]:
building_metadata.building_id.unique()

In [None]:
sns.histplot(weather_train.air_temperature, bins=25).set_title("Histogram of Air Temperature")

In [None]:
sns.histplot(weather_train.sea_level_pressure, bins=25).set_title("Histogram of Sea Level Pressure")

In [None]:
sns.histplot(weather_train.wind_speed, bins=25).set_title("Histogram of Wind Speed")

In [None]:
Weather = weather_train.copy()

In [None]:
Weather.head()

## Example for the first building (building_id 0) with train

In [None]:
trainsite1 = train[train['building_id'] == 0]

In [None]:
trainsite1.meter_reading.plot(figsize=(16,8))

## Example for the first site (site_id 0) with Weather

In [None]:
site1 = Weather[Weather['site_id'] == 0 ]

In [None]:
site1.air_temperature.plot(figsize=(16,8))

In [None]:

site1.air_temperature.plot(kind = 'kde')

Check stationarity


In [None]:
def test_stationa(data):
    """Plot a series together with its 10-step rolling mean."""
    rolmean = data.rolling(window=10).mean()

    # Plot the original series and its rolling mean
    plt.plot(data, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')

    plt.legend()
    plt.title('Rolling Mean')
    plt.show()

In [None]:
test_stationa(site1.air_temperature)
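The plot above shows only the rolling mean. A common companion check is the rolling standard deviation: for a roughly stationary series, both statistics stay within a narrow band over time. The sketch below runs on a synthetic hourly series (invented, standing in for `site1.air_temperature`) and uses a 24-step window, which is an assumption chosen to match one day of hourly readings.

```python
import numpy as np
import pandas as pd

# Synthetic hourly "temperature": constant level + daily cycle + noise.
rng = np.random.default_rng(0)
hours = np.arange(240)
index = pd.to_datetime("2016-01-01") + pd.to_timedelta(hours, unit="h")
daily_cycle = 5.0 * np.sin(2 * np.pi * hours / 24)
series = pd.Series(20.0 + daily_cycle + rng.normal(0, 0.5, size=240), index=index)

window = 24  # one day of hourly readings (an assumption; the cell above uses 10)
rolling_mean = series.rolling(window=window).mean()
rolling_std = series.rolling(window=window).std()

# Narrow ranges for both statistics hint at a stable mean and variance.
print(f"rolling mean: {rolling_mean.min():.2f} to {rolling_mean.max():.2f}")
print(f"rolling std:  {rolling_std.min():.2f} to {rolling_std.max():.2f}")
```

This is a visual and numeric heuristic, not a formal test. The `adfuller` function imported at the top of the notebook would be the natural next step for a statistical check.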

In [None]:
site1.dew_temperature.plot(figsize=(16,8))

In [None]:
site1.sea_level_pressure.plot(figsize=(16,8))

In [None]:
plt.figure(figsize=(11,11))
correlations = Weather.corr()
# Boolean mask that hides the upper triangle (including the diagonal)
mask = np.zeros_like(correlations, dtype=bool)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
    ax = sns.heatmap(correlations, mask=mask, vmax=.9, square=True)

<a id="6"></a> <br>
## 6- References
[ASHRAE - Great Energy Predictor III](https://www.kaggle.com/c/ashrae-energy-prediction/data)