In [22]:
import numpy as np
import pandas as pd

We want to focus on the changes from 2014 to 2015. If Boston wants to be carbon neutral by 2050, we need to track whether buildings are improving and identify which ones are not, and why.

For reference, this is the description of the columns:

* Property Name	Property name, as reported by owner
* Reported	If the building submitted a report this year. This dataset includes all reported buildings.
* Property Type	Property type, as identified in Portfolio Manager
* Address	Reported address
* ZIP	ZIP
* Gross Area (sq ft)	Gross area
* Site EUI (kBTU/sf)	Site energy use intensity, or EUI (in kBTU/sq. ft.): This sums up all of the energy used in the building (electricity, gas, steam, etc.) each year, and divides by square footage. There are many drivers of energy use intensity, such as energy-intensive work, and, since EUI is not adjusted for these factors, it is not a definitive indicator of building efficiency. (This metric uses site energy, not source energy.)
* Energy Star Score	ENERGY STAR score: Portfolio Manager calculates a 1-100 score for many types of buildings, though not all types are eligible for a score. The score uses details about the building and its location to adjust for how the building is used, and provides some measure of how it performs relative to similar buildings. A higher score means that the building uses less energy than similar buildings.
* Energy Star Certified	ENERGY STAR certification: Buildings with a score of 75 or higher can apply to be ENERGY STAR-certified by EPA. This field lists the years in which the building is certified.
* Property Uses	List of space uses reported for this property
* Year Built	Year built, as reported by the owner
* GHG Emissions (MTCO2e)	Greenhouse gas emissions. Portfolio Manager uses averages for the regional electric grid and other fuels to estimate annual greenhouse gas emissions. Buildings and campuses that have central plants or buy bulk power may have different emissions than estimated.
* GHG Intensity (kgCO2/sf)	GHG intensity (kgCO2/sf) divides total GHG emissions by square footage.
* Total Site Energy (kBTU)	Total energy used in 2014 (kBTU). This is a gross sum of all annual energy use in the building, not adjusted for size or uses.
* % Electricity	Percent of total energy that is electricity
* % Gas	Percent of total energy that is gas
* % Steam	Percent of total energy that is steam
* Water Intensity (gal/sf)	Water intensity (gal/sf); the total water use divided by gross floor area.
* Onsite Solar (kWh)	Onsite solar generated each year (kWh)
* User Submitted Info	User-submitted contextual information: This is a field in which the property owner can describe ongoing efficiency work or other related information.
* User Submitted Link	Any link submitted by the owner
* Tax Parcel	Tax parcel number reported by owner
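
As a quick sanity check on these definitions, Site EUI should be roughly Total Site Energy divided by Gross Area. The sketch below recomputes it for the two 2014 rows displayed later in this notebook. The `sample` frame and the `eui_check` column are illustrative names, not part of the dataset, and the column labels are typed by hand here rather than read from the file.

```python
import pandas as pd

# Values copied from the first two rows of the 2014 file shown in this notebook.
sample = pd.DataFrame({
    "Gross Area (sq ft)": [76300, 150000],
    "Total Site Energy (kBTU)": [13204950, 4301102],
    "Site EUI (kBTU/sf)": [173.1, 28.7],
})

# Recompute EUI as total site energy divided by gross floor area.
sample["eui_check"] = (
    sample["Total Site Energy (kBTU)"] / sample["Gross Area (sq ft)"]
).round(1)

print(sample[["Site EUI (kBTU/sf)", "eui_check"]])
```

If the recomputed value and the reported value disagree for a building, that row is probably worth a closer look before comparing years.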


In [4]:
berdo2015 = pd.read_csv("./data/berdo2015.csv", encoding="ISO-8859-1")
berdo2014 = pd.read_csv("./data/berdo2014.csv", encoding="ISO-8859-1")

In [12]:
berdo2014.head(2)

Unnamed: 0,Property Name,Reported,Property Type,Address,ZIP,Gross Area (sq ft),Site EUI (kBTU/sf),Energy Star Score,Energy Star Certified,Property Uses,...,GHG Intensity (kgCO2/sf),Total Site Energy (kBTU),% Electricity,% Gas,% Steam,Water Intensity (gal/sf),Onsite Solar (kWh),User Submitted Info,User Submitted Link,Tax Parcel
0,MEEI -Longwood,Yes,Ambulatory Surgical Center,800 Huntington Ave,2115,76300,173.1,Not applicable to this property type,,Ambulatory Surgical Center,...,12.7,13204950,47%,53%,0%,,,,,1000894000
1,Prime Motor Group,Yes,Automobile Dealership,1525-1607 VFW Parkway,2132,150000,28.7,Not applicable to this property type,,"Automobile Dealership, Parking",...,2.8,4301102,100%,0%,0%,8.96,,,,2010643010


In [13]:
berdo2015.head(2)

Unnamed: 0,Property Name,Reported,Property Type,Address,ZIP,Gross Area (sq ft),Site EUI (kBTU/sf),Energy Star Score,Energy Star Certified,Property Uses,...,Total Site Energy (kBTU),% Electricity,% Gas,% Steam,Water Intensity (gal/sf),Onsite Solar (kWh),User Submitted Info,User Submitted Link,Tax Parcel,Years Reported
0,#2679 South Bay/Boston,Yes,Retail Store,5 Alllstate Road,2125,132000,70.7,74,,"Parking, Retail Store",...,9331692,56%,44%,,7.6,,,,703501080,"2014, 2015, 2016"
1,0004 Roslindale,Yes,Supermarket/Grocery Store,950 American Legion Hgwy,2131,38694,245.9,49,,Supermarket/Grocery,...,9515343,61%,39%,,62.1,,,,1807323000,"2015, 2016"


Let's check sizes:

In [8]:
print(berdo2015.shape)
print(berdo2014.shape)

(1502, 23)
(1380, 22)


What is the difference in columns?

In [30]:
set(berdo2015.columns) - set(berdo2014.columns)

{'Years Reported'}
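
The set difference above only runs in one direction: it lists columns present in 2015 but missing from 2014. A minimal sketch of checking both directions, using small stand-in column sets (`cols_2015` and `cols_2014` are illustrative, not the real column lists):

```python
# Stand-in column sets; with the real frames these would be
# set(berdo2015.columns) and set(berdo2014.columns).
cols_2015 = {"Property Name", "Address", "Years Reported"}
cols_2014 = {"Property Name", "Address"}

only_2015 = cols_2015 - cols_2014   # columns added in 2015
only_2014 = cols_2014 - cols_2015   # columns dropped since 2014

print("only in 2015:", only_2015)
print("only in 2014:", only_2014)
```

An empty `only_2014` set would confirm that every 2014 column survives into 2015 under the same name.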

There is no property ID, so let's see if we can match records by:

* Property name
* Address

Let's see how many unique values the two years have in common for each of those two columns:

In [37]:
p15 = berdo2015["Property Name"].tolist()
p14 = berdo2014["Property Name"].tolist()
len(set(p15).intersection(p14))

1094

In [44]:
a15 = berdo2015["Address"].tolist()
a14 = berdo2014["Address"].tolist()
len(set(a15).intersection(a14))

1164
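
These counts compare raw strings, so entries that differ only in capitalization or stray whitespace count as different values. The sketch below shows the effect on a toy example. The name variations are made up for illustration, and `normalize` is a hypothetical helper, not something used elsewhere in this notebook.

```python
# Made-up variations of names that a person would consider identical.
names_2015 = ["MEEI -Longwood", "Prime Motor Group ", "0004 Roslindale"]
names_2014 = ["meei -longwood", "Prime Motor Group", "0004 Roslindale"]

def normalize(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(str(name).lower().split())

raw_common = set(names_2015) & set(names_2014)
norm_common = {normalize(n) for n in names_2015} & {normalize(n) for n in names_2014}

print(len(raw_common), len(norm_common))
```

Only the exact duplicate matches before normalization, while all three names match afterwards.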

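Let's do the join. We first cast both key columns to strings and convert them to lower case, so that differences in capitalization do not prevent a match.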

In [67]:
# Normalize the join keys in both frames: cast to str, then lower-case.
# Note that astype(str) turns missing values into the string "nan".
for df in (berdo2015, berdo2014):
    for col in ["Address", "Property Name"]:
        df[col] = df[col].astype(str).str.lower()

In [69]:
berdo = pd.merge(berdo2015, berdo2014,
                 how='inner',
                 on=['Address','Property Name'],
                 suffixes=('_2015', '_2014'))
berdo.shape

(1086, 43)
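
The inner join keeps 1086 rows, compared with 1502 rows in 2015 and 1380 in 2014, so many buildings do not match on both keys. One way to see which rows fall out is an outer merge with `indicator=True`, which adds a `_merge` column marking each row as `both`, `left_only`, or `right_only`. This is a sketch on toy frames; `left`, `right`, and the values in them are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the 2015 (left) and 2014 (right) frames.
left = pd.DataFrame({
    "Address": ["1 main st", "2 oak ave", "3 elm rd"],
    "Property Name": ["a", "b", "c"],
    "Site EUI": [70.7, 245.9, 50.0],
})
right = pd.DataFrame({
    "Address": ["1 main st", "2 oak ave", "9 pine ln"],
    "Property Name": ["a", "b", "z"],
    "Site EUI": [65.0, 250.0, 80.0],
})

# Outer merge keeps unmatched rows from both sides and labels their origin.
audit = pd.merge(left, right,
                 how="outer",
                 on=["Address", "Property Name"],
                 suffixes=("_2015", "_2014"),
                 indicator=True)

match_counts = audit["_merge"].astype(str).value_counts().to_dict()
print(match_counts)

# Rows that appear in only one year are candidates for manual review.
unmatched = audit[audit["_merge"] != "both"]
print(unmatched[["Address", "Property Name", "_merge"]])
```

Applied to the real frames, the `left_only` rows would be 2015 buildings with no 2014 counterpart under this key, which may include genuinely new reporters as well as renamed or re-addressed properties.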

In [72]:
berdo.head(2)

Unnamed: 0,Property Name,Reported_2015,Property Type_2015,Address,ZIP_2015,Gross Area (sq ft)_2015,Site EUI (kBTU/sf)_2015,Energy Star Score_2015,Energy Star Certified_2015,Property Uses_2015,...,GHG Intensity (kgCO2/sf)_2014,Total Site Energy (kBTU) _2014,% Electricity_2014,% Gas_2014,% Steam_2014,Water Intensity (gal/sf)_2014,Onsite Solar (kWh)_2014,User Submitted Info_2014,User Submitted Link_2014,Tax Parcel_2014
0,#2679 south bay/boston,Yes,Retail Store,5 alllstate road,2125,132000,70.7,74,,"Parking, Retail Store",...,5.1,8418749.0,63%,37%,0%,4.88,,,,703501080
1,0004 roslindale,Yes,Supermarket/Grocery Store,950 american legion hgwy,2131,38694,245.9,49,,Supermarket/Grocery,...,20.1,9622656.0,64%,36%,0%,0.0,,,,1807323000


In [74]:
berdo.to_csv("./data/berdo.csv", index=False)
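
With both years side by side, the year-over-year change can be computed from the suffixed columns. The sketch below does this for total site energy, using the two merged rows displayed above. The `merged` frame is a hand-built stand-in, and the 2015 column name (including the space before the suffix) is an assumption based on how the 2014 column appears in the merged output.

```python
import pandas as pd

# Hand-built stand-in for the merged frame, with values from the rows shown above.
merged = pd.DataFrame({
    "Property Name": ["#2679 south bay/boston", "0004 roslindale"],
    "Total Site Energy (kBTU) _2015": [9331692, 9515343],
    "Total Site Energy (kBTU) _2014": [8418749.0, 9622656.0],
})

col_2015 = "Total Site Energy (kBTU) _2015"  # assumed name, see note above
col_2014 = "Total Site Energy (kBTU) _2014"

# Coerce to numbers in case a column was read as text; unparseable values become NaN.
e15 = pd.to_numeric(merged[col_2015], errors="coerce")
e14 = pd.to_numeric(merged[col_2014], errors="coerce")

# Absolute and relative change from 2014 to 2015 (positive means more energy used).
merged["energy_change"] = e15 - e14
merged["energy_pct_change"] = (100 * (e15 - e14) / e14).round(1)

print(merged[["Property Name", "energy_change", "energy_pct_change"]])
```

In this small sample the first building used more energy in 2015 than in 2014 and the second used slightly less. The same pattern would apply to Site EUI or GHG intensity, which are better suited to comparing buildings of different sizes.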