# Energy Ratings Analysis
## Urban Data Genome Project
The metadata dataframe contains three different types of ratings that measure building energy efficiency. This notebook explores the differences between the three rating systems, examines why each is used at the sites that report it, and considers whether and how they can be compared to one another in future analysis.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
os.chdir('/kaggle/input/buildingdatagenomeproject2')
os.listdir()

In [None]:
meta = pd.read_csv('metadata.csv')
print(meta.shape)
meta.head()

## Exploring energystarscore
### In this dataset...
- There are 163 unique locations that reported an energystarscore out of the total 1636 rows.
- The energystarscore is measured numerically on a scale of 1 to 100. In this dataset, values range from 1 to 98.
- Out of the 163 locations with reported energystarscores, 109 of these values are null, or '-'.
- Taking this into account, there are really only 54 reported energystarscores.
- There are 32 unique values reported in the energystarscore column, excluding null values.
- The only site_id with a reported energystarscore is 'Hog', an anonymized site. Its latitude and longitude place it in Minneapolis, MN. Given the high proportion of office, residential, storage, and parking buildings at this site, it likely corresponds to downtown Minneapolis, Minnesota, United States.
- The most common energystarscore is 1.0.
- Scores are mostly recorded for offices, with a few data points in lodging/residential, warehouse/storage, parking, and healthcare space usages.
- All buildings with a reported energystarscore also have a reported eui (energy use intensity). This is worth noting because both fields are very sparsely populated in the metadata dataframe as a whole.
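
The last point can be checked directly rather than by inspection. The following is a minimal sketch, assuming that `eui` may use the same `'-'` placeholder for missing values as `energystarscore` (an assumption; coercing with `pd.to_numeric` handles either case). The helper name `score_implies_eui` and the toy rows are made up for illustration and are not taken from metadata.csv.

```python
import numpy as np
import pandas as pd

def score_implies_eui(df: pd.DataFrame) -> bool:
    """Return True if every row with a numeric energystarscore also has a numeric eui."""
    # Treat the '-' placeholder and any other non-numeric text as missing.
    score = pd.to_numeric(df['energystarscore'], errors='coerce')
    eui = pd.to_numeric(df['eui'], errors='coerce')
    # Look only at rows that have a score, then require eui to be present.
    return bool(eui[score.notna()].notna().all())

# Toy rows for illustration only (hypothetical values).
toy = pd.DataFrame({
    'energystarscore': ['75', '-', np.nan, '1'],
    'eui': ['60.2', '-', np.nan, '110.5'],
})
print(score_implies_eui(toy))
```

On the notebook's data this would be called as `score_implies_eui(meta)`.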


### What is it?
**Using the 1 – 100 ENERGY STAR score, you can understand how your building’s energy consumption measures up against similar buildings nationwide. The ENERGY STAR score allows everyone to quickly understand how a building is performing.**

Based on general building information including size, location, number of occupants, etc., this score’s algorithm estimates hypothetical energy usage for best performance, worst performance, and every level in between. It then compares the actual energy data reported to these estimates to determine where the building ranks relative to other buildings across the country with the same primary use. In this way, ENERGY STAR scores are based on national survey data originating from the Commercial Buildings Energy Consumption Survey (CBECS).

This rating system is used by the EPA to determine energy efficiency in buildings nationwide (USA). This system is also used in Canada.

The following property types are eligible to receive a 1 - 100 ENERGY STAR score in the US:
- Bank branch
- Barracks
- Courthouse
- Data or Distribution center
- Hospital
- Hotel
- K-12 school, Residence hall, or Dormitory
- Multifamily housing
- Non-refrigerated or refrigerated warehouse
- Office (including Financial and Medical offices)
- Retail store, supermarket/grocery store, or wholesale club/supercenter
- Senior care community
- Wastewater treatment plant
- Worship facility


**Interpreting the Score:** 50 represents median energy performance, while 75 or above means your building is a top energy performer.
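
The two thresholds above can be turned into coarse bands for grouping buildings. This is only a sketch: the function name `performance_band` and the band labels are hypothetical choices, and the only values taken from the text are the cut-offs at 50 (median) and 75 (top performer).

```python
import numpy as np
import pandas as pd

def performance_band(score):
    """Map an ENERGY STAR score to a coarse band (labels are illustrative)."""
    if pd.isna(score):
        return 'unrated'
    if score >= 75:
        return 'top performer'       # 75 or above
    if score >= 50:
        return 'at or above median'  # 50 is median performance
    return 'below median'

# Toy scores for illustration only (hypothetical values).
toy_scores = pd.Series([1.0, 50.0, 74.0, 75.0, np.nan])
bands = toy_scores.map(performance_band)
print(bands.tolist())
```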

*Learn more* [here](https://www.energystar.gov/buildings/facility-owners-and-managers/existing-buildings/use-portfolio-manager/understand-metrics/energy-star)

In [None]:
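# Take an explicit copy so the later column assignment does not trigger SettingWithCopyWarning
energystarscore = meta[meta['energystarscore'].notna()].copy()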
print(energystarscore.shape)
energystarscore.head(20)

In [None]:
energystarscore['energystarscore'].unique()

In [None]:
(energystarscore['energystarscore']=='-').sum()

In [None]:
energystarscore['energystarscore'] = energystarscore['energystarscore'].replace('-', np.nan).astype('float64')
energystarscore['energystarscore'].unique()

In [None]:
starscore = energystarscore[energystarscore['energystarscore'].notna()]
print(starscore.shape)
starscore.head(54)

In [None]:
# Series.value_counts() replaces the deprecated top-level pd.value_counts()
starscore['energystarscore'].value_counts().plot.bar(figsize=(15,5))

In [None]:
starscore['primaryspaceusage'].value_counts().plot.bar(figsize=(10,5))

## Exploring leed_level
### In this dataset...
- There are 136 unique locations that reported a leed_level out of the total 1636 rows.
- The leed_level is measured in categories, ranging from certified to platinum. In this dataset, the only values included are silver and gold.
- Out of the 136 locations with a reported leed_level, 120 of these values are null, or 'None'.
- Taking this into account, there are really only 16 reported leed_levels.
- There are 2 unique values reported in the leed_level column, excluding null values.
- The only site_id with a reported leed_level is 'Panther', which represents the University of Central Florida in Orlando, Florida.
- 81.25% of reported scores are Gold, the remaining 18.75% are Silver ratings.
- Mostly recorded in Offices. Also a few data points in lodging/residential and education space usages.


### What is it?
**LEED certification means healthier, more productive places for us to live, learn, work and play, as well as less stress on the environment, by encouraging energy- and resource-efficient buildings.**

In order to qualify for LEED certification of any sort, buildings must meet certain prerequisites. Once these minimum requirements are met, you are free to pursue any credit you want within your chosen rating system. Credits are how you earn points toward your LEED certification. The following are the different rating systems:
- LEED for Building Design and Construction
- LEED for Interior Design and Construction
- LEED for Building Operations and Maintenance
- LEED for Neighborhood Development


The LEED rating system has seven areas of concentration: Sustainable Sites, Water Efficiency, Energy and Atmosphere, Materials and Resources, Indoor Environmental Quality, Innovation in Design Process, and Regional Priority. Projects obtain credits in these areas to achieve certification. In total, a maximum of 110 points is available across all seven categories. The points available in each category are as follows:
- Sustainable Sites: 26 Available Points
- Water Efficiency: 10 Available Points
- Energy and Atmosphere: 35 Available Points
- Materials and Resources: 14 Available Points
- Indoor Environmental Quality: 15 Available Points
- Innovation in Design Process: 6 Available Points
- Regional Priority: 4 Available Points
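
As a quick arithmetic check, the seven category maxima listed above do add up to the stated 110 points. The dictionary name `LEED_POINTS` is just an illustrative choice.

```python
# Maximum points per category, copied from the list above.
LEED_POINTS = {
    'Sustainable Sites': 26,
    'Water Efficiency': 10,
    'Energy and Atmosphere': 35,
    'Materials and Resources': 14,
    'Indoor Environmental Quality': 15,
    'Innovation in Design Process': 6,
    'Regional Priority': 4,
}

total_points = sum(LEED_POINTS.values())
print(len(LEED_POINTS), total_points)
```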


Projects pursuing LEED certification earn points for various green building strategies across several categories. Based on the number of points achieved through credit completion, a project earns one of four LEED rating levels: 
1. Certified (40-49 points earned)
2. Silver (50-59 points earned)
3. Gold (60-79 points earned)
4. Platinum (80+ points earned)
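
The thresholds above translate into a simple lookup. This is a sketch under stated assumptions: the function name `leed_level_from_points` is hypothetical, the 'Not certified' label for totals below 40 is an illustrative choice, and the 0 to 110 range check relies on the point total given earlier in this section.

```python
def leed_level_from_points(points: int) -> str:
    """Map a LEED point total to its rating level using the thresholds above."""
    if not 0 <= points <= 110:
        raise ValueError('points must be between 0 and 110')
    if points >= 80:
        return 'Platinum'
    if points >= 60:
        return 'Gold'
    if points >= 50:
        return 'Silver'
    if points >= 40:
        return 'Certified'
    return 'Not certified'  # below the 40-point minimum (label is illustrative)

for pts in (39, 40, 59, 60, 80):
    print(pts, leed_level_from_points(pts))
```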

*Learn more* [here](https://www.usgbc.org/)

In [None]:
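# Take an explicit copy so the later column assignment does not trigger SettingWithCopyWarning
leed_level = meta[meta['leed_level'].notna()].copy()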
print(leed_level.shape)
leed_level.head(20)

In [None]:
leed_level['leed_level'].unique()

In [None]:
(leed_level['leed_level']=='None').sum()

In [None]:
leed_level['leed_level'] = leed_level['leed_level'].replace('None', np.nan)
leed_level['leed_level'].unique()

In [None]:
leed = leed_level[leed_level['leed_level'].notna()]
print(leed.shape)
leed.head(20)

In [None]:
leed['leed_level'].value_counts(normalize=True)

In [None]:
leed['leed_level'].value_counts().plot.bar(figsize=(10,5))

In [None]:
leed['primaryspaceusage'].value_counts().plot.bar(figsize=(10,5))

## Exploring rating

### In this dataset...

- There are 184 locations that reported values for the rating column.
- Each of these locations is in the UK or Ireland.
- There are 9 unique values in the rating column (A-G, C1 and D1).
- The C1 and D1 values come from the 'Wolf' site, most likely because it is University College Dublin (UCD), which would use the BER scale rather than the EPC scale used by the other sites.
- There are 5 sites that recorded rating data.

### What is it?

The rating column contains either the building energy rating (BER) or the energy performance certificate (EPC) grade, depending on whether the building is in Ireland (BER) or the UK (EPC). BER and EPC measure and rate energy performance similarly enough to compare the two, but their formats differ slightly. EPC uses a scale of A-G, with A being the most energy efficient and G the least. BER uses a scale that goes A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, E1, E2, F, G. These ratings are considered comparable because they use such similar scales and criteria to measure energy efficiency.
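
One way to put both scales on a common footing is to keep only the leading letter of each grade, so that a BER sub-grade such as C1 falls into the same band as EPC C. This is a rough sketch that assumes the letter bands of the two schemes can be treated as equivalent, as argued above. The helper name `collapse_to_letter` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def collapse_to_letter(rating: pd.Series) -> pd.Series:
    """Keep only the leading letter, so BER 'C1' and EPC 'C' both become 'C'."""
    # Missing values stay missing because the .str accessor propagates NaN.
    return rating.str.strip().str[0]

# Toy grades for illustration only (hypothetical values).
toy_ratings = pd.Series(['A', 'C1', 'D1', 'B3', 'G', np.nan])
print(collapse_to_letter(toy_ratings).tolist())
```

Unlike replacing individual values one at a time, this also covers BER sub-grades that do not occur in the current data.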

In [None]:
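# Take an explicit copy so the later column assignment does not trigger SettingWithCopyWarning
ratings = meta[meta['rating'].notna()].copy()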
print(ratings.shape)
ratings.head(20)

In [None]:
ratings['rating'].unique()

In [None]:
ratings[ratings.rating == 'C1']

In [None]:
ratings[ratings.rating == 'D1']

In [None]:
ratings['rating'] = ratings['rating'].replace('C1', 'C')
ratings['rating'] = ratings['rating'].replace('D1', 'D')
ratings['rating'].unique()

In [None]:
ratings['rating'].value_counts().plot.bar(figsize=(10,5))

In [None]:
ratings['site_id'].value_counts().plot.bar(figsize=(10,5))

In [None]:
ratings['primaryspaceusage'].value_counts().plot.bar(figsize=(10,5))