<a href="https://colab.research.google.com/github/paulmcolaka/open-world/blob/master/EPC_Test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# TEAM: POLICY & REGULATIONS RISKS
**Members:**
> 1. DR. PEDRO BAIZ
> 2. STEPHANIE FREEMAN
> 3. PAUL McOLAKA

## EXPLORATORY DATA ANALYSIS
The purpose of this EDA is to understand our data and draw insights from it using summary statistics and visualizations. At a high level, we will use multivariate analysis. These insights may later inform the development of our machine learning models.

**Objectives:**
> 1. To give insight into a data set
> 2. Understand the underlying structure
> 3. Extract important parameters and relationships that hold between them
> 4. Test underlying assumptions

**Steps:**
> 1. Preparation
> 2. Variable identification
> 3. Understanding the dataset
> 4. Data cleaning/manipulation
> 5. Multi-variate analysis
> 6. Outlier treatment
> 7. Correlation analysis

### 1. PREPARATION

In [0]:
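# Load libraries and set plotting parameters
import missingno as mn
import numpy as np
import pandas as pd
import pandas_profiling as pp  # this package is now published as ydata-profiling
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams["figure.figsize"] = (10, 30)  # tall default figures
plt.style.use('ggplot')
sns.set()  # applies seaborn's default theme, which overrides much of the ggplot style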

In [4]:
  # Load dataset
df = pd.read_csv('certificates(1).csv')
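
If certificates(1).csv is not in the notebook's working directory (for example after the Colab runtime is reset and uploaded files are lost), pd.read_csv raises FileNotFoundError. Below is a minimal sketch of a loader that checks the path first. load_certificates is a hypothetical helper name. low_memory=False is an assumed choice: it makes pandas infer each column's type from the whole file rather than chunk by chunk, which avoids mixed-type warnings on wide CSVs like this one.

```python
from pathlib import Path

import pandas as pd


def load_certificates(path: str) -> pd.DataFrame:
    """Read the EPC certificates CSV, failing early with a clear message.

    load_certificates is a hypothetical helper, not part of any library.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"{csv_path.resolve()} not found; upload the file or fix the path"
        )
    # low_memory=False: infer each column's dtype from the whole file at once
    return pd.read_csv(csv_path, low_memory=False)
```

In the notebook this would be called as df = load_certificates('certificates(1).csv').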


### 2. VARIABLE IDENTIFICATION

**Background:**

Almost 40% of the UK’s energy consumption and carbon emissions come from the way our buildings are heated and used. Even comparatively small changes in energy performance and the way a building is used will have a significant effect in reducing energy consumption.

**Definition of columns:**

**LMK_KEY:** Individual lodgement identifier. Guaranteed to be unique and can be used to identify a certificate in the downloads and the API

**ADDRESS1:** First line of the address

**ADDRESS2:** Second line of the address

**ADDRESS3:** Third line of the address

**POSTCODE:** The postcode of the property

**BUILDING_REFERENCE_NUMBER:** Unique identifier for the building

**CURRENT_ENERGY_RATING:** Current energy rating converted into a linear 'A to G' rating (where A is the most energy efficient and G is the least energy efficient)

**POTENTIAL_ENERGY_RATING:** Estimated potential energy rating converted into a linear 'A to G' rating (where A is the most energy efficient and G is the least energy efficient)

**CURRENT_ENERGY_EFFICIENCY:** Based on cost of energy, i.e. energy required for space heating, water heating and lighting (in kWh/year) multiplied by fuel costs (£/m²/year, where cost is derived from kWh)

**POTENTIAL_ENERGY_EFFICIENCY:** Estimated potential energy efficiency of the property after improvements have been made, on the same scale as CURRENT_ENERGY_EFFICIENCY

**PROPERTY_TYPE:** Describes the type of property, such as House, Flat, Mansion, Maisonette etc. This is the type differentiator for Property, but only a limited number of property types, notably Apartment and Apartment Block, have specific characteristics that warrant their own definition

**BUILT_FORM:** The building type of the Property, e.g. Detached, Semi-Detached, Terrace etc. Together with the Property Type, the Built Form produces a structured description of the property

**INSPECTION_DATE:** The date that the inspection was actually carried out by the energy assessor

**LOCAL_AUTHORITY:** Office for National Statistics (ONS) code. Local authority area in which the building is located

**CONSTITUENCY:** Office for National Statistics (ONS) code. Parliamentary constituency in which the building is located

**COUNTY:** County in which the building is located (where applicable)

**LODGEMENT_DATE:** Date lodged on the Energy Performance of Buildings Register

**TRANSACTION_TYPE:** Type of transaction that triggered EPC. For example, one of: marketed sale; nonmarketed sale; new-dwelling; rental; not sale or rental; assessment for Green Deal; following Green Deal; FIT application; none of the above; RHI application; ECO assessment. Where the reason for the assessment is unknown by the energy assessor the transaction type will be recorded as 'none of the above'. Transaction types may be changed over time

**ENVIRONMENT_IMPACT_CURRENT:** The Environmental Impact Rating. A measure of the property's current impact on the environment in terms of carbon dioxide (CO₂) emissions. The higher the rating the lower the CO₂ emissions. (CO₂ emissions in tonnes / year)

**ENVIRONMENT_IMPACT_POTENTIAL:** The potential Environmental Impact Rating. A measure of the property's potential impact on the environment in terms of carbon dioxide (CO₂) emissions after improvements have been carried out. The higher the rating the lower the CO₂ emissions. (CO₂ emissions in tonnes / year)

**ENERGY_CONSUMPTION_CURRENT:** Estimated total energy consumption for the Property in a 12-month period. Value is Kilowatt Hours per Square Metre (kWh/m²)

**ENERGY_CONSUMPTION_POTENTIAL:** Estimated potential total energy consumption for the Property in a 12-month period. Value is Kilowatt Hours per Square Metre (kWh/m²)

**CO2_EMISSIONS_CURRENT:** CO₂ emissions per year in tonnes/year

**CO2_EMISS_CURR_PER_FLOOR_AREA:** CO₂ emissions per square metre floor area per year in kg/m²

**CO2_EMISSIONS_POTENTIAL:** Estimated value in tonnes per year of the total CO₂ emissions produced by the Property in a 12-month period

**LIGHTING_COST_CURRENT:** GBP. Current estimated annual energy costs for lighting the property

**LIGHTING_COST_POTENTIAL:** GBP. Potential estimated annual energy costs for lighting the property after improvements have been made

**HEATING_COST_CURRENT:** GBP. Current estimated annual energy costs for heating the property

**HEATING_COST_POTENTIAL:** GBP. Potential estimated annual energy costs for heating the property after improvements have been made

**HOT_WATER_COST_CURRENT:** GBP. Current estimated annual energy costs for hot water

**HOT_WATER_COST_POTENTIAL:** GBP. Potential estimated annual energy costs for hot water after improvements have been made

**TOTAL_FLOOR_AREA:** The total useful floor area is the total of all enclosed spaces measured to the internal face of the external walls, i.e. the gross floor area as measured in accordance with the guidance issued from time to time by the Royal Institution of Chartered Surveyors or by a body replacing that institution. (m²)

**ENERGY_TARIFF:** Type of electricity tariff for the property, e.g. single

**MAINS_GAS_FLAG:** Whether mains gas is available. Yes means that there is a gas meter or a gas-burning appliance in the dwelling. A closed-off gas pipe does not count

**FLOOR_LEVEL:** Flats and maisonettes only. Floor level relative to the lowest level of the property (0 for ground floor). If there is a basement, the basement is level 0 and the other floors are from 1 upwards

**FLAT_TOP_STOREY:** Whether the flat is on the top storey

**FLAT_STOREY_COUNT:** The number of Storeys in the Apartment Block

**MAIN_HEATING_CONTROLS:** Type of main heating controls. Includes both main heating systems if there are two

**MULTI_GLAZE_PROPORTION:** The estimated banded range (e.g. 0% - 10%) of the total glazed area of the Property that is multiple glazed

**GLAZED_TYPE:** The type of glazing. From British Fenestration Rating Council or manufacturer declaration, given as one of: single; double; triple.

**GLAZED_AREA:** Ranged estimate of the total glazed area of the Habitable Area

**EXTENSION_COUNT:** The number of extensions added to the property. Between 0 and 4

**NUMBER_HABITABLE_ROOMS:** Habitable rooms include any living room, sitting room, dining room, bedroom, study and similar; and also a non-separated conservatory. A kitchen/diner having a discrete seating area (with space for a table and four chairs) also counts as a habitable room. A non-separated conservatory adds to the habitable room count if it has an internal quality door between it and the dwelling. Excluded from the room count are any room used solely as a kitchen, utility room, bathroom, cloakroom, en-suite accommodation and similar; any hallway, stairs or landing; and also any room not having a window

**NUMBER_HEATED_ROOMS:** The number of heated rooms in the property if more than half of the habitable rooms are not heated

**LOW_ENERGY_LIGHTING:** The percentage of low energy lighting present in the property as a percentage of the total fixed lights in the property. 0% indicates that no low-energy lighting is present

**NUMBER_OPEN_FIREPLACES:** The number of Open Fireplaces in the Property. An Open Fireplace is a fireplace that still allows air to pass between the inside of the Property and the outside

**HOTWATER_DESCRIPTION:** Overall description of the property feature

**HOT_WATER_ENERGY_EFF:** Energy efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**HOT_WATER_ENV_EFF:** Environmental efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**FLOOR_DESCRIPTION:** Overall description of the property feature

**FLOOR_ENERGY_EFF:** Energy efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**FLOOR_ENV_EFF:** Environmental efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**WINDOWS_DESCRIPTION:** Overall description of the property feature

**WINDOWS_ENERGY_EFF:** Energy efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**WINDOWS_ENV_EFF:** Environmental efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**WALLS_DESCRIPTION:** Overall description of the property feature

**WALLS_ENERGY_EFF:** Energy efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**WALLS_ENV_EFF:** Environmental efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**SECONDHEAT_DESCRIPTION:** Overall description of the property feature

**SHEATING_ENERGY_EFF:** Energy efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**SHEATING_ENV_EFF:** Environmental efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**ROOF_DESCRIPTION:** Overall description of the property feature

**ROOF_ENERGY_EFF:** Energy efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**ROOF_ENV_EFF:** Environmental efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**MAINHEAT_DESCRIPTION:** Overall description of the property feature

**MAINHEAT_ENERGY_EFF:** Energy efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**MAINHEAT_ENV_EFF:** Environmental efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**MAINHEATCONT_DESCRIPTION:** Overall description of the property feature

**MAINHEATC_ENERGY_EFF:** Energy efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**MAINHEATC_ENV_EFF:** Environmental efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**LIGHTING_DESCRIPTION:** Overall description of the property feature, including the total number of fixed lighting outlets and the total number of low-energy fixed lighting outlets

**LIGHTING_ENERGY_EFF:** Energy efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**LIGHTING_ENV_EFF:** Environmental efficiency rating. One of: very good; good; average; poor; very poor. Shown on the actual energy certificate as a one- to five-star rating.

**MAIN_FUEL:** The type of fuel used to power the central heating e.g. Gas, Electricity

**WIND_TURBINE_COUNT:** Number of wind turbines; 0 if none.

**HEAT_LOSS_CORRIDOOR:** Flats and maisonettes only. Indicates that the flat contains a corridor through which heat is lost. Heat loss corridor, one of: no corridor; heated corridor; unheated corridor

**UNHEATED_CORRIDOR_LENGTH:** The total length of unheated corridor in the flat. Only populated if the flat or maisonette contains an unheated corridor, in which case it is the length of the sheltered wall (m).

**FLOOR_HEIGHT:** Average height of the storey in metres.

**PHOTO_SUPPLY:** Percentage of photovoltaic area as a percentage of total roof area. 0% indicates that a Photovoltaic Supply is not present in the property.

**SOLAR_WATER_HEATING_FLAG:** Indicates whether the Property's water heating is solar powered.

**MECHANICAL_VENTILATION:** Identifies the type of mechanical ventilation the property has. This is required for the RdSAP calculation.

**ADDRESS:** Field containing the concatenation of address1, address2 and address3. Note that post code is recorded separately.

**LOCAL_AUTHORITY_LABEL:** The name of the local authority area in which the building is located. This field is for additional information only and should not be relied upon: please refer to the Local Authority ONS Code.

**CONSTITUENCY_LABEL:** The name of the parliamentary constituency in which the building is located. This field is for additional information only and should not be relied upon: please refer to the constituency ONS Code.
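
Taken at face value, the units above imply that CO2_EMISS_CURR_PER_FLOOR_AREA (kg/m²/year) should be close to 1000 × CO2_EMISSIONS_CURRENT (tonnes/year) ÷ TOTAL_FLOOR_AREA (m²). Below is a minimal sketch of a consistency check based on that reading. The function name and the 10% tolerance, which is meant to absorb rounding in the lodged figures, are assumptions rather than part of the dataset documentation.

```python
import pandas as pd


def co2_per_area_mismatch(df: pd.DataFrame, rel_tol: float = 0.1) -> pd.Series:
    """Flag rows where CO2_EMISS_CURR_PER_FLOOR_AREA disagrees with the value
    implied by CO2_EMISSIONS_CURRENT (tonnes/year) and TOTAL_FLOOR_AREA (m²).

    rel_tol is an assumed relative tolerance; rows with missing inputs are not flagged.
    """
    # tonnes -> kg, then divide by floor area to get kg/m²/year
    implied = df["CO2_EMISSIONS_CURRENT"] * 1000 / df["TOTAL_FLOOR_AREA"]
    reported = df["CO2_EMISS_CURR_PER_FLOOR_AREA"]
    rel_diff = (implied - reported).abs() / reported.abs()
    return rel_diff > rel_tol
```

After the numeric conversion below, df[co2_per_area_mismatch(df)] would list the rows worth inspecting.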

**Identify target variable (output)**
The target variable is the CURRENT_ENERGY_RATING column.

**Identify predictor variables (input)**
The predictor variables are all the other columns.
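
The definitions above describe CURRENT_ENERGY_RATING as the energy score "converted into a linear 'A to G' rating". If that conversion is the standard EPC banding of the SAP score (A = 92+, B = 81–91, C = 69–80, D = 55–68, E = 39–54, F = 21–38, G = 1–20), which is assumed here and not stated in this notebook, then CURRENT_ENERGY_EFFICIENCY would leak the target when used as a predictor. Below is a minimal sketch that measures how often the two agree. Both helper names are hypothetical.

```python
import pandas as pd

# Assumed standard EPC bands by SAP score:
# A = 92+, B = 81-91, C = 69-80, D = 55-68, E = 39-54, F = 21-38, G = 1-20
BAND_LOWER_BOUNDS = [(92, "A"), (81, "B"), (69, "C"), (55, "D"), (39, "E"), (21, "F"), (1, "G")]


def sap_to_band(score):
    """Return the A-G band for a SAP score, or None if missing or below 1."""
    if pd.isna(score):
        return None
    for lower, band in BAND_LOWER_BOUNDS:
        if score >= lower:
            return band
    return None


def band_agreement(df: pd.DataFrame) -> float:
    """Share of rows whose CURRENT_ENERGY_RATING equals the band implied by
    CURRENT_ENERGY_EFFICIENCY. A value close to 1.0 means the score
    effectively encodes the target."""
    implied = df["CURRENT_ENERGY_EFFICIENCY"].map(sap_to_band)
    actual = df["CURRENT_ENERGY_RATING"].astype(str)
    return float((implied == actual).mean())
```

If band_agreement(df) comes out near 1.0, dropping CURRENT_ENERGY_EFFICIENCY (and reviewing closely derived columns) from the predictors would be the cautious choice.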

**Format the columns to their correct data types**

In [0]:
# Format the float columns
for col in ['CURRENT_ENERGY_EFFICIENCY','ENVIRONMENT_IMPACT_CURRENT', 'ENVIRONMENT_IMPACT_POTENTIAL',
            'ENERGY_CONSUMPTION_CURRENT', 'ENERGY_CONSUMPTION_POTENTIAL', 'CO2_EMISSIONS_CURRENT',
            'CO2_EMISSIONS_POTENTIAL', 'CO2_EMISS_CURR_PER_FLOOR_AREA', 'LIGHTING_COST_CURRENT',
            'LIGHTING_COST_POTENTIAL', 'HEATING_COST_CURRENT', 'HEATING_COST_POTENTIAL',
            'HOT_WATER_COST_CURRENT', 'HOT_WATER_COST_POTENTIAL', 'TOTAL_FLOOR_AREA',
            'UNHEATED_CORRIDOR_LENGTH', 'FLOOR_HEIGHT']:
    df[col] = pd.to_numeric(df[col],errors='coerce').astype('float')
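
errors='coerce' silently turns any value pandas cannot parse, for instance a placeholder such as 'NO DATA!', into NaN. Below is a minimal sketch of a variant of the cell above that also reports how many non-missing values each column lost in the conversion. to_numeric_with_report is a hypothetical helper, and it modifies the DataFrame in place just as the loop above does.

```python
import pandas as pd


def to_numeric_with_report(df: pd.DataFrame, cols) -> pd.Series:
    """Convert cols to float in place and return, per column, how many
    non-missing values could not be parsed and became NaN."""
    lost = {}
    for col in cols:
        converted = pd.to_numeric(df[col], errors="coerce").astype("float")
        # Values that were present before but are NaN after conversion
        lost[col] = int((converted.isna() & df[col].notna()).sum())
        df[col] = converted
    return pd.Series(lost, name="values_coerced_to_nan")
```

A non-zero count points to a column worth inspecting before the lost values are treated as ordinary missing data.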

In [0]:
# Format the numeric columns
for col in ['BUILDING_REFERENCE_NUMBER', 'LMK_KEY', 'FLAT_STOREY_COUNT', 'MAIN_HEATING_CONTROLS',
            'MULTI_GLAZE_PROPORTION', 'EXTENSION_COUNT', 'NUMBER_HABITABLE_ROOMS', 'NUMBER_HEATED_ROOMS',
            'LOW_ENERGY_LIGHTING', 'NUMBER_OPEN_FIREPLACES', 'WIND_TURBINE_COUNT', 'PHOTO_SUPPLY']:
    df[col] = pd.to_numeric(df[col],errors='coerce')

In [0]:
# Format the categorical columns (each column listed once; MAINHEAT_* ratings included)
for col in ['CURRENT_ENERGY_RATING', 'POTENTIAL_ENERGY_RATING', 'PROPERTY_TYPE', 'MAINS_GAS_FLAG',
            'BUILT_FORM', 'TRANSACTION_TYPE', 'ENERGY_TARIFF', 'FLOOR_LEVEL',
            'FLAT_TOP_STOREY', 'GLAZED_TYPE', 'GLAZED_AREA', 'HOTWATER_DESCRIPTION', 'HOT_WATER_ENERGY_EFF',
            'HOT_WATER_ENV_EFF', 'FLOOR_DESCRIPTION', 'FLOOR_ENERGY_EFF', 'FLOOR_ENV_EFF',
            'WINDOWS_DESCRIPTION', 'WINDOWS_ENERGY_EFF', 'WINDOWS_ENV_EFF', 'WALLS_DESCRIPTION',
            'WALLS_ENERGY_EFF', 'WALLS_ENV_EFF', 'SECONDHEAT_DESCRIPTION', 'SHEATING_ENERGY_EFF',
            'SHEATING_ENV_EFF', 'ROOF_DESCRIPTION', 'ROOF_ENERGY_EFF', 'ROOF_ENV_EFF',
            'MAINHEAT_DESCRIPTION', 'MAINHEAT_ENERGY_EFF', 'MAINHEAT_ENV_EFF',
            'MAINHEATCONT_DESCRIPTION', 'MAINHEATC_ENERGY_EFF', 'MAINHEATC_ENV_EFF',
            'LIGHTING_DESCRIPTION', 'LIGHTING_ENERGY_EFF', 'LIGHTING_ENV_EFF', 'MAIN_FUEL',
            'HEAT_LOSS_CORRIDOOR', 'SOLAR_WATER_HEATING_FLAG', 'MECHANICAL_VENTILATION']:
    df[col] = df[col].astype('category')

In [0]:
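# Format the string columns. pandas' 'string' dtype keeps missing values as <NA>,
# whereas astype('str') would turn them into the literal text 'nan'
for col in ['ADDRESS1', 'ADDRESS2', 'ADDRESS3', 'POSTCODE', 'ADDRESS', 'CONSTITUENCY_LABEL']:
    df[col] = df[col].astype('string')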
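
The profile report below shows placeholder strings such as 'NO DATA!', 'NODATA!' and 'INVALID!' in several categorical columns. Unless these are mapped to NaN, pandas counts them as ordinary categories, and the reported missing percentages understate how much data is absent. Below is a minimal sketch that assumes these three strings are the only sentinels. Labels such as 'Unknown' or 'not defined' are left alone because they may carry meaning. The step is best run before the astype('category') cell so the sentinels never become categories. sentinels_to_nan is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Assumed sentinel strings, taken from values visible in the profile report
SENTINELS = ["NO DATA!", "NODATA!", "INVALID!"]


def sentinels_to_nan(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the sentinel strings replaced by NaN."""
    return df.replace(SENTINELS, np.nan)
```

Usage: df = sentinels_to_nan(df), placed right after loading.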

### 3. UNDERSTANDING THE DATASET

In [0]:
# Run ProfileReport to get summary statistics for the entire dataset
pp.ProfileReport(df)

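| Dataset info | Value |
|---|---|
| Number of variables | 84 |
| Number of observations | 34780 |
| Total missing (%) | 13.6% |
| Total size in memory | 22.3 MiB |
| Average record size in memory | 672.0 B |

| Variable type | Count |
|---|---|
| Numeric | 22 |
| Categorical | 47 |
| Boolean | 0 |
| Date | 0 |
| Text (unique) | 1 |
| Rejected | 13 |
| Unsupported | 1 |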
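
The overview figures can be cross-checked without the profiler. Below is a minimal sketch that assumes "Total missing (%)" means missing cells divided by all cells. The memory figure uses memory_usage(deep=True) and may not match the profiler's own measurement exactly. overview is a hypothetical helper.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Recompute a few of the profile report's dataset-level figures with pandas."""
    n_rows, n_cols = df.shape
    missing_cells = int(df.isna().sum().sum())
    return {
        "Number of variables": n_cols,
        "Number of observations": n_rows,
        "Total missing (%)": round(100 * missing_cells / (n_rows * n_cols), 1),
        "Total size in memory (MiB)": round(df.memory_usage(deep=True).sum() / 2**20, 1),
    }
```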

**ADDRESS**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 26469 | 76.1% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| 37, Kilwick Street | 10 | 0.0% |
| 15, Parton Street | 7 | 0.0% |
| 11, Findlay Grove | 7 | 0.0% |
| 41, Chichester Close | 7 | 0.0% |
| 9, Houghton Street | 7 | 0.0% |
| 47, Grosvenor Street | 7 | 0.0% |
| 1, Millpool Close | 7 | 0.0% |
| 34, Hutton Avenue | 6 | 0.0% |
| 16, St. Anns Court | 6 | 0.0% |
| 9, Lacey Grove | 6 | 0.0% |

**ADDRESS1**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 26072 | 75.0% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| Flat 2 | 93 | 0.3% |
| Flat 1 | 89 | 0.3% |
| Flat 3 | 60 | 0.2% |
| Flat 4 | 36 | 0.1% |
| Flat 5 | 25 | 0.1% |
| Flat 6 | 17 | 0.0% |
| Flat 7 | 12 | 0.0% |
| Flat | 11 | 0.0% |
| 37, Kilwick Street | 10 | 0.0% |
| Flat 10 | 10 | 0.0% |

**ADDRESS2**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 306 | 0.9% | 90.2% | 31372 |

| Value | Count | Frequency (%) |
|---|---|---|
| Greatham | 251 | 0.7% |
| Harbour Walk | 222 | 0.6% |
| Seaton Carew | 167 | 0.5% |
| Fleet Avenue | 147 | 0.4% |
| Elwick | 132 | 0.4% |
| Hart | 131 | 0.4% |
| Hartfields | 118 | 0.3% |
| Middleton Road | 99 | 0.3% |
| Stockton Road | 72 | 0.2% |
| Elwick Road | 70 | 0.2% |

**ADDRESS3**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 38 | 0.1% | 99.3% | 34529 |

| Value | Count | Frequency (%) |
|---|---|---|
| York Road | 48 | 0.1% |
| Percy Street | 31 | 0.1% |
| Elwick | 27 | 0.1% |
| Stockton Street | 21 | 0.1% |
| Greatham | 20 | 0.1% |
| Dalton Piercy | 13 | 0.0% |
| Hart | 12 | 0.0% |
| 90 Durham Street | 10 | 0.0% |
| Seaton Carew | 8 | 0.0% |
| Elwick Road | 8 | 0.0% |

**BUILDING_REFERENCE_NUMBER**

| Distinct count | Unique (%) | Missing (%) | Missing (n) | Infinite (%) | Infinite (n) | Zeros (%) |
|---|---|---|---|---|---|---|
| 26506 | 76.2% | 0.0% | 0 | 0.0% | 0 | 0.0% |

*Quantile statistics*

| Minimum | 5th percentile | Q1 | Median | Q3 | 95th percentile | Maximum | Range | Interquartile range |
|---|---|---|---|---|---|---|---|---|
| 239968 | 480460000 | 2470400000 | 4994700000 | 7460900000 | 9486600000 | 9999835378 | 9999595410 | 4990500000 |

*Descriptive statistics*

| Standard deviation | Coef. of variation | Kurtosis | Mean | MAD | Skewness | Sum | Variance | Memory size |
|---|---|---|---|---|---|---|---|---|
| 2884700000 | 0.57974 | -1.1957 | 4975900000 | 2496000000 | -0.0016374 | 173060414852890 | 8.3215e+18 | 271.8 KiB |

*Common values*

| Value | Count | Frequency (%) |
|---|---|---|
| 9517727078 | 10 | 0.0% |
| 409111378 | 7 | 0.0% |
| 4421471078 | 7 | 0.0% |
| 3426060078 | 7 | 0.0% |
| 6439836178 | 7 | 0.0% |
| 2497526178 | 7 | 0.0% |
| 3742636178 | 7 | 0.0% |
| 629248868 | 6 | 0.0% |
| 4123071568 | 6 | 0.0% |
| 8194987768 | 6 | 0.0% |

*Minimum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 239968 | 1 | 0.0% |
| 782578 | 1 | 0.0% |
| 827578 | 1 | 0.0% |
| 989078 | 3 | 0.0% |
| 1587478 | 1 | 0.0% |

*Maximum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 9998288968 | 1 | 0.0% |
| 9998668478 | 1 | 0.0% |
| 9998749178 | 1 | 0.0% |
| 9999782478 | 1 | 0.0% |
| 9999835378 | 3 | 0.0% |

**BUILT_FORM**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 7 | 0.0% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| Mid-Terrace | 13249 | 38.1% |
| Semi-Detached | 9030 | 26.0% |
| End-Terrace | 5912 | 17.0% |
| Detached | 5770 | 16.6% |
| NO DATA! | 579 | 1.7% |
| Enclosed End-Terrace | 127 | 0.4% |
| Enclosed Mid-Terrace | 113 | 0.3% |

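**CO2_EMISSIONS_CURRENT**

| Distinct count | Unique (%) | Missing (%) | Missing (n) | Infinite (%) | Infinite (n) | Zeros (%) |
|---|---|---|---|---|---|---|
| 162 | 0.5% | 0.0% | 0 | 0.0% | 0 | 0.0% |

*Quantile statistics*

| Minimum | 5th percentile | Q1 | Median | Q3 | 95th percentile | Maximum | Range | Interquartile range |
|---|---|---|---|---|---|---|---|---|
| -6.5 | 1.3 | 2.5 | 3.6 | 5.0 | 9.7 | 1217.0 | 1223.5 | 2.5 |

*Descriptive statistics*

| Standard deviation | Coef. of variation | Kurtosis | Mean | MAD | Skewness | Sum | Variance | Memory size |
|---|---|---|---|---|---|---|---|---|
| 7.2265 | 1.683 | 22807 | 4.2939 | 2.0081 | 136.33 | 149340 | 52.223 | 271.8 KiB |

*Common values*

| Value | Count | Frequency (%) |
|---|---|---|
| 3.1 | 919 | 2.6% |
| 2.9 | 916 | 2.6% |
| 3.2 | 904 | 2.6% |
| 3.4 | 860 | 2.5% |
| 3.5 | 857 | 2.5% |
| 3.0 | 853 | 2.5% |
| 3.3 | 842 | 2.4% |
| 2.8 | 839 | 2.4% |
| 3.6 | 837 | 2.4% |
| 3.7 | 797 | 2.3% |

*Minimum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| -6.5 | 1 | 0.0% |
| -3.7 | 1 | 0.0% |
| -0.4 | 1 | 0.0% |
| -0.3 | 1 | 0.0% |
| -0.2 | 1 | 0.0% |

*Maximum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 53.0 | 1 | 0.0% |
| 61.0 | 1 | 0.0% |
| 79.0 | 1 | 0.0% |
| 130.0 | 1 | 0.0% |
| 1217.0 | 1 | 0.0% |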
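
Step 6 of the plan is outlier treatment. One common rule is Tukey's fences, which flag values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. With the quartiles reported above (Q1 = 2.5, Q3 = 5.0, so IQR = 2.5), the fences are 2.5 - 3.75 = -1.25 and 5.0 + 3.75 = 8.75. The 95th percentile (9.7) already exceeds the upper fence, so at least 5% of rows would be flagged. That suggests a larger multiplier may suit this heavily right-skewed variable better. Below is a minimal sketch of the rule. The function name and the default k = 1.5 are conventional choices, not taken from the original analysis.

```python
import pandas as pd


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where s lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)
```

For example, iqr_outlier_mask(df['CO2_EMISSIONS_CURRENT'], k=3).mean() gives the share of rows flagged with a wider fence.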

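**CO2_EMISSIONS_POTENTIAL**: rejected as highly correlated (correlation 0.96189).

**CO2_EMISS_CURR_PER_FLOOR_AREA**: rejected as highly correlated (correlation 0.98344).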
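
The profiler rejected the variables above because they are highly correlated with another column. Below is a minimal sketch for listing such pairs directly with pandas, so that the choice of which column to keep can be made deliberately. The 0.9 threshold and the helper name are assumptions. Pearson correlation, the pandas default, is used.

```python
import pandas as pd


def high_correlation_pairs(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """List numeric column pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.select_dtypes("number").corr()
    cols = corr.columns
    rows = []
    # Walk the upper triangle only, so each pair is reported once
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if pd.notna(r) and abs(r) > threshold:
                rows.append((cols[i], cols[j], float(r)))
    pairs = pd.DataFrame(rows, columns=["var_1", "var_2", "corr"])
    return pairs.sort_values("corr", key=lambda s: s.abs(), ascending=False).reset_index(drop=True)
```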

**CONSTITUENCY**: constant value E14000733.

**CONSTITUENCY_LABEL**: constant value Hartlepool.

**COUNTY**: constant value (empty).
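
A column with a constant value carries no information for modelling, because every row shares the same value. Below is a minimal sketch for finding such columns. constant_columns is a hypothetical helper. It counts missing values as a value, so an all-empty column is also reported.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Columns holding at most one distinct value (NaN counted as a value)."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
```

Usage: df = df.drop(columns=constant_columns(df)).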

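**CURRENT_ENERGY_EFFICIENCY**

| Distinct count | Unique (%) | Missing (%) | Missing (n) | Infinite (%) | Infinite (n) | Zeros (%) |
|---|---|---|---|---|---|---|
| 103 | 0.3% | 0.0% | 0 | 0.0% | 0 | 0.0% |

*Quantile statistics*

| Minimum | 5th percentile | Q1 | Median | Q3 | 95th percentile | Maximum | Range | Interquartile range |
|---|---|---|---|---|---|---|---|---|
| 1 | 39 | 56 | 65 | 72 | 83 | 133 | 132 | 16 |

*Descriptive statistics*

| Standard deviation | Coef. of variation | Kurtosis | Mean | MAD | Skewness | Sum | Variance | Memory size |
|---|---|---|---|---|---|---|---|---|
| 13.72 | 0.21654 | 2.1067 | 63.357 | 10.303 | -0.94895 | 2203553 | 188.23 | 271.8 KiB |

*Common values*

| Value | Count | Frequency (%) |
|---|---|---|
| 66 | 1386 | 4.0% |
| 68 | 1344 | 3.9% |
| 67 | 1338 | 3.8% |
| 65 | 1327 | 3.8% |
| 64 | 1262 | 3.6% |
| 69 | 1240 | 3.6% |
| 70 | 1180 | 3.4% |
| 63 | 1153 | 3.3% |
| 71 | 1099 | 3.2% |
| 62 | 1061 | 3.1% |

*Minimum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 75 | 0.2% |
| 2 | 6 | 0.0% |
| 3 | 17 | 0.0% |
| 4 | 16 | 0.0% |
| 5 | 16 | 0.0% |

*Maximum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 99 | 9 | 0.0% |
| 101 | 2 | 0.0% |
| 105 | 1 | 0.0% |
| 106 | 1 | 0.0% |
| 133 | 1 | 0.0% |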

**CURRENT_ENERGY_RATING**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 7 | 0.0% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| D | 14795 | 42.5% |
| C | 9356 | 26.9% |
| E | 5713 | 16.4% |
| B | 3196 | 9.2% |
| F | 1262 | 3.6% |
| G | 387 | 1.1% |
| A | 71 | 0.2% |
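
The target is strongly imbalanced. D covers 42.5% of certificates, while A covers only 0.2% (71 rows). If later models weight classes by inverse frequency, the formula n_samples / (n_classes × count), which scikit-learn uses for class_weight='balanced', gives A a weight of about 70 and D about 0.34. Below is a minimal sketch that computes these weights from the counts above. Class weighting is only one option (resampling is another) and is not a decision taken in the original analysis.

```python
import pandas as pd

# CURRENT_ENERGY_RATING counts copied from the profile table above
RATING_COUNTS = pd.Series({"A": 71, "B": 3196, "C": 9356, "D": 14795, "E": 5713, "F": 1262, "G": 387})


def balanced_class_weights(counts: pd.Series) -> pd.Series:
    """Inverse-frequency weights n_samples / (n_classes * count)."""
    return counts.sum() / (len(counts) * counts)
```

balanced_class_weights(RATING_COUNTS).round(2) shows the full set of weights.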

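**ENERGY_CONSUMPTION_CURRENT**

| Distinct count | Unique (%) | Missing (%) | Missing (n) | Infinite (%) | Infinite (n) | Zeros (%) |
|---|---|---|---|---|---|---|
| 880 | 2.5% | 0.0% | 0 | 0.0% | 0 | 0.0% |

*Quantile statistics*

| Minimum | 5th percentile | Q1 | Median | Q3 | 95th percentile | Maximum | Range | Interquartile range |
|---|---|---|---|---|---|---|---|---|
| -152 | 104 | 197 | 255 | 332 | 476 | 9804 | 9956 | 135 |

*Descriptive statistics*

| Standard deviation | Coef. of variation | Kurtosis | Mean | MAD | Skewness | Sum | Variance | Memory size |
|---|---|---|---|---|---|---|---|---|
| 131.27 | 0.48146 | 802.96 | 272.65 | 89.025 | 12.186 | 9482761 | 17232 | 271.8 KiB |

*Common values*

| Value | Count | Frequency (%) |
|---|---|---|
| 232 | 187 | 0.5% |
| 248 | 177 | 0.5% |
| 251 | 175 | 0.5% |
| 250 | 167 | 0.5% |
| 217 | 167 | 0.5% |
| 233 | 166 | 0.5% |
| 224 | 165 | 0.5% |
| 237 | 164 | 0.5% |
| 240 | 163 | 0.5% |
| 239 | 162 | 0.5% |

*Minimum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| -152 | 1 | 0.0% |
| -41 | 1 | 0.0% |
| -33 | 1 | 0.0% |
| -20 | 1 | 0.0% |
| -17 | 2 | 0.0% |

*Maximum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 1402 | 1 | 0.0% |
| 1481 | 1 | 0.0% |
| 1631 | 1 | 0.0% |
| 1811 | 1 | 0.0% |
| 9804 | 1 | 0.0% |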

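**ENERGY_CONSUMPTION_POTENTIAL**

| Distinct count | Unique (%) | Missing (%) | Missing (n) | Infinite (%) | Infinite (n) | Zeros (%) |
|---|---|---|---|---|---|---|
| 696 | 2.0% | 0.0% | 0 | 0.0% | 0 | 0.0% |

*Quantile statistics*

| Minimum | 5th percentile | Q1 | Median | Q3 | 95th percentile | Maximum | Range | Interquartile range |
|---|---|---|---|---|---|---|---|---|
| -173 | 57 | 104 | 143 | 203 | 322 | 6040 | 6213 | 99 |

*Descriptive statistics*

| Standard deviation | Coef. of variation | Kurtosis | Mean | MAD | Skewness | Sum | Variance | Memory size |
|---|---|---|---|---|---|---|---|---|
| 94.625 | 0.58468 | 435.35 | 161.84 | 65.289 | 8.4986 | 5628800 | 8953.9 | 271.8 KiB |

*Common values*

| Value | Count | Frequency (%) |
|---|---|---|
| 105.0 | 292 | 0.8% |
| 92.0 | 268 | 0.8% |
| 115.0 | 258 | 0.7% |
| 119.0 | 257 | 0.7% |
| 116.0 | 252 | 0.7% |
| 113.0 | 251 | 0.7% |
| 103.0 | 249 | 0.7% |
| 134.0 | 248 | 0.7% |
| 130.0 | 245 | 0.7% |
| 123.0 | 240 | 0.7% |

*Minimum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| -173.0 | 3 | 0.0% |
| -172.0 | 1 | 0.0% |
| -161.0 | 1 | 0.0% |
| -160.0 | 1 | 0.0% |
| -152.0 | 1 | 0.0% |

*Maximum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 1096.0 | 1 | 0.0% |
| 1207.0 | 1 | 0.0% |
| 1305.0 | 1 | 0.0% |
| 1342.0 | 1 | 0.0% |
| 6040.0 | 1 | 0.0% |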

**ENERGY_TARIFF**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 9 | 0.0% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| Single | 29453 | 84.7% |
| standard tariff | 2849 | 8.2% |
| dual | 1589 | 4.6% |
| Unknown | 737 | 2.1% |
| off-peak 7 hour | 55 | 0.2% |
| 24 hour | 53 | 0.2% |
| NO DATA! | 35 | 0.1% |
| dual (24 hour) | 8 | 0.0% |
| off-peak 10 hour | 1 | 0.0% |

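**ENVIRONMENT_IMPACT_CURRENT**: rejected as highly correlated (correlation 0.95209).

**ENVIRONMENT_IMPACT_POTENTIAL**: rejected as highly correlated (correlation 0.92559).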

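**EXTENSION_COUNT**

| Distinct count | Unique (%) | Missing (%) | Missing (n) | Infinite (%) | Infinite (n) | Zeros (%) |
|---|---|---|---|---|---|---|
| 6 | 0.0% | 8.6% | 2993 | 0.0% | 0 | 59.5% |

*Quantile statistics*

| Minimum | 5th percentile | Q1 | Median | Q3 | 95th percentile | Maximum | Range | Interquartile range |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 2 | 4 | 4 | 1 |

*Descriptive statistics*

| Standard deviation | Coef. of variation | Kurtosis | Mean | MAD | Skewness | Sum | Variance | Memory size |
|---|---|---|---|---|---|---|---|---|
| 0.68855 | 1.5401 | 2.2065 | 0.44707 | 0.58213 | 1.5431 | 14211 | 0.4741 | 271.8 KiB |

*Common values*

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 20695 | 59.5% |
| 1.0 | 8411 | 24.2% |
| 2.0 | 2292 | 6.6% |
| 3.0 | 340 | 1.0% |
| 4.0 | 49 | 0.1% |
| (Missing) | 2993 | 8.6% |

*Minimum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 20695 | 59.5% |
| 1.0 | 8411 | 24.2% |
| 2.0 | 2292 | 6.6% |
| 3.0 | 340 | 1.0% |
| 4.0 | 49 | 0.1% |

*Maximum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 20695 | 59.5% |
| 1.0 | 8411 | 24.2% |
| 2.0 | 2292 | 6.6% |
| 3.0 | 340 | 1.0% |
| 4.0 | 49 | 0.1% |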

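**FLAT_STOREY_COUNT**

| Distinct count | Unique (%) | Missing (%) | Missing (n) | Infinite (%) | Infinite (n) | Zeros (%) |
|---|---|---|---|---|---|---|
| 8 | 0.0% | 94.6% | 32902 | 0.0% | 0 | 0.0% |

*Quantile statistics*

| Minimum | 5th percentile | Q1 | Median | Q3 | 95th percentile | Maximum | Range | Interquartile range |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 2 | 3 | 3 | 4 | 13 | 12 | 1 |

*Descriptive statistics*

| Standard deviation | Coef. of variation | Kurtosis | Mean | MAD | Skewness | Sum | Variance | Memory size |
|---|---|---|---|---|---|---|---|---|
| 0.85901 | 0.31083 | 10.289 | 2.7636 | 0.70216 | 1.5542 | 5190 | 0.73789 | 271.8 KiB |

*Common values*

| Value | Count | Frequency (%) |
|---|---|---|
| 2.0 | 845 | 2.4% |
| 3.0 | 666 | 1.9% |
| 4.0 | 313 | 0.9% |
| 5.0 | 41 | 0.1% |
| 1.0 | 8 | 0.0% |
| 6.0 | 4 | 0.0% |
| 13.0 | 1 | 0.0% |
| (Missing) | 32902 | 94.6% |

*Minimum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 8 | 0.0% |
| 2.0 | 845 | 2.4% |
| 3.0 | 666 | 1.9% |
| 4.0 | 313 | 0.9% |
| 5.0 | 41 | 0.1% |

*Maximum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 3.0 | 666 | 1.9% |
| 4.0 | 313 | 0.9% |
| 5.0 | 41 | 0.1% |
| 6.0 | 4 | 0.0% |
| 13.0 | 1 | 0.0% |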

**FLAT_TOP_STOREY**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 3 | 0.0% | 83.5% | 29040 |

| Value | Count | Frequency (%) |
|---|---|---|
| N | 3310 | 9.5% |
| Y | 2430 | 7.0% |
| (Missing) | 29040 | 83.5% |

**FLOOR_DESCRIPTION**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 102 | 0.3% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| Suspended, no insulation (assumed) | 13241 | 38.1% |
| Solid, no insulation (assumed) | 12148 | 34.9% |
| (other premises below) | 2917 | 8.4% |
| Solid, limited insulation (assumed) | 1233 | 3.5% |
| Solid, insulated (assumed) | 983 | 2.8% |
| (another dwelling below) | 869 | 2.5% |
| Average thermal transmittance 0.21 W/m²K | 251 | 0.7% |
| Suspended, limited insulation (assumed) | 196 | 0.6% |
| Average thermal transmittance 0.22 W/m²K | 174 | 0.5% |
| Average thermal transmittance 0.19 W/m?K | 153 | 0.4% |

**FLOOR_ENERGY_EFF**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 7 | 0.0% | 61.8% | 21503 |

| Value | Count | Frequency (%) |
|---|---|---|
| NO DATA! | 10584 | 30.4% |
| Very Good | 1875 | 5.4% |
| Good | 740 | 2.1% |
| Poor | 52 | 0.1% |
| Average | 21 | 0.1% |
| Very Poor | 5 | 0.0% |
| (Missing) | 21503 | 61.8% |

**FLOOR_ENV_EFF**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 6 | 0.0% | 92.3% | 32087 |

| Value | Count | Frequency (%) |
|---|---|---|
| Very Good | 1875 | 5.4% |
| Good | 740 | 2.1% |
| Poor | 52 | 0.1% |
| Average | 21 | 0.1% |
| Very Poor | 5 | 0.0% |
| (Missing) | 32087 | 92.3% |

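**FLOOR_HEIGHT**

| Distinct count | Unique (%) | Missing (%) | Missing (n) | Infinite (%) | Infinite (n) | Zeros (%) |
|---|---|---|---|---|---|---|
| 408 | 1.2% | 61.3% | 21314 | 0.0% | 0 | 0.0% |

*Quantile statistics*

| Minimum | 5th percentile | Q1 | Median | Q3 | 95th percentile | Maximum | Range | Interquartile range |
|---|---|---|---|---|---|---|---|---|
| 1.69 | 2.3 | 2.39 | 2.4 | 2.55 | 2.95 | 24.0 | 22.31 | 0.16 |

*Descriptive statistics*

| Standard deviation | Coef. of variation | Kurtosis | Mean | MAD | Skewness | Sum | Variance | Memory size |
|---|---|---|---|---|---|---|---|---|
| 0.36435 | 0.14597 | 1909.1 | 2.496 | 0.15946 | 35.489 | 33612 | 0.13275 | 271.8 KiB |

*Common values*

| Value | Count | Frequency (%) |
|---|---|---|
| 2.4 | 3666 | 10.5% |
| 2.3 | 1069 | 3.1% |
| 2.5 | 744 | 2.1% |
| 2.6 | 634 | 1.8% |
| 2.35 | 360 | 1.0% |
| 2.9 | 354 | 1.0% |
| 2.46 | 291 | 0.8% |
| 2.32 | 260 | 0.7% |
| 2.39 | 242 | 0.7% |
| 2.7 | 238 | 0.7% |

*Minimum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 1.69 | 1 | 0.0% |
| 1.77 | 1 | 0.0% |
| 1.8 | 1 | 0.0% |
| 1.87 | 1 | 0.0% |
| 1.95 | 1 | 0.0% |

*Maximum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 5.02 | 1 | 0.0% |
| 10.89 | 1 | 0.0% |
| 18.0 | 1 | 0.0% |
| 23.0 | 1 | 0.0% |
| 24.0 | 1 | 0.0% |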

**FLOOR_LEVEL**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 12 | 0.0% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| NODATA! | 20503 | 59.0% |
| NO DATA! | 8126 | 23.4% |
| 1st | 2470 | 7.1% |
| Ground | 2087 | 6.0% |
| 2nd | 847 | 2.4% |
| 3rd | 275 | 0.8% |
| mid floor | 168 | 0.5% |
| top floor | 127 | 0.4% |
| ground floor | 116 | 0.3% |
| 4th | 41 | 0.1% |
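
FLOOR_LEVEL mixes sentinels ('NODATA!', 'NO DATA!'), ordinals ('1st', '2nd') and two spellings of the ground floor ('Ground', 'ground floor'). Below is a minimal sketch that maps these labels to integers, assuming ground floor = 0 as in the column definition. That definition counts a basement as level 0 when one exists, so this mapping may be off by one for such properties. 'mid floor' and 'top floor' have no numeric equivalent and become NaN. The helper name is hypothetical.

```python
import re

import numpy as np
import pandas as pd

ORDINAL = re.compile(r"^(\d+)(?:st|nd|rd|th)$")
GROUND_LABELS = {"ground", "ground floor"}


def floor_level_to_number(value):
    """Convert FLOOR_LEVEL labels such as 'Ground' or '2nd' to an integer.

    Returns NaN for sentinels ('NO DATA!', 'NODATA!'), missing values and
    labels with no numeric meaning ('mid floor', 'top floor').
    """
    if not isinstance(value, str):
        return np.nan
    text = value.strip().lower()
    if text in GROUND_LABELS:
        return 0
    match = ORDINAL.match(text)
    return int(match.group(1)) if match else np.nan
```

Applied to the dataset, df['FLOOR_LEVEL'].astype(str).map(floor_level_to_number) would yield a numeric column alongside the original labels.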

**GLAZED_AREA**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 6 | 0.0% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| Normal | 31323 | 90.1% |
| NO DATA! | 2993 | 8.6% |
| More Than Typical | 354 | 1.0% |
| Less Than Typical | 55 | 0.2% |
| Much More Than Typical | 49 | 0.1% |
| Much Less Than Typical | 6 | 0.0% |

**GLAZED_TYPE**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 10 | 0.0% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| double glazing installed before 2002 | 14737 | 42.4% |
| double glazing installed during or after 2002 | 9022 | 25.9% |
| double glazing, unknown install date | 6724 | 19.3% |
| NO DATA! | 2993 | 8.6% |
| not defined | 847 | 2.4% |
| single glazing | 195 | 0.6% |
| INVALID! | 121 | 0.3% |
| double, known data | 88 | 0.3% |
| secondary glazing | 41 | 0.1% |
| triple glazing | 12 | 0.0% |

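**HEATING_COST_CURRENT**

| Distinct count | Unique (%) | Missing (%) | Missing (n) | Infinite (%) | Infinite (n) | Zeros (%) |
|---|---|---|---|---|---|---|
| 2719 | 7.8% | 0.0% | 0 | 0.0% | 0 | 0.0% |

*Quantile statistics*

| Minimum | 5th percentile | Q1 | Median | Q3 | 95th percentile | Maximum | Range | Interquartile range |
|---|---|---|---|---|---|---|---|---|
| 1.359 | 226.0 | 408.0 | 599.0 | 854.0 | 1723.1 | 24512.0 | 24511.0 | 446.0 |

*Descriptive statistics*

| Standard deviation | Coef. of variation | Kurtosis | Mean | MAD | Skewness | Sum | Variance | Memory size |
|---|---|---|---|---|---|---|---|---|
| 577.16 | 0.78867 | 109.06 | 731.82 | 355.03 | 5.4599 | 25453000 | 333120 | 271.8 KiB |

*Common values*

| Value | Count | Frequency (%) |
|---|---|---|
| 272.0 | 69 | 0.2% |
| 591.0 | 67 | 0.2% |
| 618.0 | 65 | 0.2% |
| 499.0 | 63 | 0.2% |
| 487.0 | 62 | 0.2% |
| 524.0 | 62 | 0.2% |
| 536.0 | 62 | 0.2% |
| 535.0 | 60 | 0.2% |
| 504.0 | 60 | 0.2% |
| 651.0 | 60 | 0.2% |

*Minimum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 1.359 | 1 | 0.0% |
| 77.0 | 1 | 0.0% |
| 81.0 | 5 | 0.0% |
| 86.0 | 2 | 0.0% |
| 88.0 | 2 | 0.0% |

*Maximum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 7502.0 | 1 | 0.0% |
| 8743.0 | 1 | 0.0% |
| 9238.0 | 1 | 0.0% |
| 14596.0 | 1 | 0.0% |
| 24512.0 | 1 | 0.0% |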

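**HEATING_COST_POTENTIAL**

| Distinct count | Unique (%) | Missing (%) | Missing (n) | Infinite (%) | Infinite (n) | Zeros (%) |
|---|---|---|---|---|---|---|
| 1776 | 5.1% | 0.0% | 0 | 0.0% | 0 | 0.0% |

*Quantile statistics*

| Minimum | 5th percentile | Q1 | Median | Q3 | 95th percentile | Maximum | Range | Interquartile range |
|---|---|---|---|---|---|---|---|---|
| 1.385 | 216.0 | 356.0 | 478.0 | 629.0 | 1073.0 | 21869.0 | 21868.0 | 273.0 |

*Descriptive statistics*

| Standard deviation | Coef. of variation | Kurtosis | Mean | MAD | Skewness | Sum | Variance | Memory size |
|---|---|---|---|---|---|---|---|---|
| 334.39 | 0.62052 | 531.28 | 538.9 | 203.4 | 11.152 | 18743000 | 111820 | 271.8 KiB |

*Common values*

| Value | Count | Frequency (%) |
|---|---|---|
| 484.0 | 95 | 0.3% |
| 464.0 | 94 | 0.3% |
| 398.0 | 92 | 0.3% |
| 466.0 | 92 | 0.3% |
| 428.0 | 92 | 0.3% |
| 426.0 | 89 | 0.3% |
| 460.0 | 88 | 0.3% |
| 473.0 | 88 | 0.3% |
| 429.0 | 88 | 0.3% |
| 433.0 | 88 | 0.3% |

*Minimum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 1.385 | 1 | 0.0% |
| 76.0 | 1 | 0.0% |
| 79.0 | 2 | 0.0% |
| 81.0 | 3 | 0.0% |
| 82.0 | 1 | 0.0% |

*Maximum 5 values*

| Value | Count | Frequency (%) |
|---|---|---|
| 4966.0 | 1 | 0.0% |
| 5181.0 | 1 | 0.0% |
| 5411.0 | 1 | 0.0% |
| 11982.0 | 1 | 0.0% |
| 21869.0 | 1 | 0.0% |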

**HEAT_LOSS_CORRIDOOR**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 4 | 0.0% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| NO DATA! | 29040 | 83.5% |
| unheated corridor | 2099 | 6.0% |
| no corridor | 1869 | 5.4% |
| heated corridor | 1772 | 5.1% |

**HOTWATER_DESCRIPTION**

| Distinct count | Unique (%) | Missing (%) | Missing (n) |
|---|---|---|---|
| 29 | 0.1% | 0.0% | 0 |

| Value | Count | Frequency (%) |
|---|---|---|
| From main system | 28947 | 83.2% |
| From main system, no cylinder thermostat | 2145 | 6.2% |
| Electric immersion, off-peak | 1508 | 4.3% |
| Community scheme | 417 | 1.2% |
| From main system, no cylinderstat | 333 | 1.0% |
| No system present: electric immersion assumed | 302 | 0.9% |
| Gas multipoint | 298 | 0.9% |
| Electric immersion, standard tariff | 288 | 0.8% |
| From secondary system | 115 | 0.3% |
| From main system, plus solar | 99 | 0.3% |

0,1
Distinct count,559
Unique (%),1.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,124.93
Minimum,0
Maximum,921
Zeros (%),0.0%

0,1
Minimum,0
5-th percentile,73
Q1,91
Median,107
Q3,131
95-th percentile,251
Maximum,921
Range,921
Interquartile range,40

0,1
Standard deviation,66.375
Coef of variation,0.5313
Kurtosis,17.463
Mean,124.93
MAD,40.007
Skewness,3.5065
Sum,4345000
Variance,4405.7
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
91.0,689,2.0%,
103.0,623,1.8%,
107.0,615,1.8%,
100.0,589,1.7%,
94.0,583,1.7%,
90.0,582,1.7%,
108.0,570,1.6%,
105.0,569,1.6%,
96.0,567,1.6%,
109.0,559,1.6%,

Value,Count,Frequency (%),Unnamed: 3
0.0,5,0.0%,
26.0,1,0.0%,
29.0,4,0.0%,
30.0,4,0.0%,
31.0,3,0.0%,

Value,Count,Frequency (%),Unnamed: 3
807.0,1,0.0%,
838.0,1,0.0%,
855.0,1,0.0%,
890.0,1,0.0%,
921.0,1,0.0%,

0,1
Distinct count,350
Unique (%),1.0%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,86.181
Minimum,0
Maximum,676
Zeros (%),0.0%

0,1
Minimum,0
5-th percentile,53
Q1,68
Median,78
Q3,98
95-th percentile,134
Maximum,676
Range,676
Interquartile range,30

0,1
Standard deviation,33.207
Coef of variation,0.38532
Kurtosis,36.211
Mean,86.181
MAD,21.285
Skewness,4.2236
Sum,2997400
Variance,1102.7
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
73.0,973,2.8%,
76.0,935,2.7%,
72.0,907,2.6%,
74.0,904,2.6%,
75.0,869,2.5%,
69.0,842,2.4%,
77.0,840,2.4%,
70.0,808,2.3%,
78.0,783,2.3%,
71.0,777,2.2%,

Value,Count,Frequency (%),Unnamed: 3
0.0,5,0.0%,
26.0,1,0.0%,
29.0,4,0.0%,
30.0,4,0.0%,
31.0,4,0.0%,

Value,Count,Frequency (%),Unnamed: 3
524.0,1,0.0%,
531.0,1,0.0%,
586.0,1,0.0%,
619.0,1,0.0%,
676.0,1,0.0%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.3%
Missing (n),93

0,1
Good,25558
Average,4256
Very Good,2045
Other values (2),2828

Value,Count,Frequency (%),Unnamed: 3
Good,25558,73.5%,
Average,4256,12.2%,
Very Good,2045,5.9%,
Poor,1879,5.4%,
Very Poor,949,2.7%,
(Missing),93,0.3%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.3%
Missing (n),93

0,1
Good,25930
Average,2738
Poor,2542
Other values (2),3477

Value,Count,Frequency (%),Unnamed: 3
Good,25930,74.6%,
Average,2738,7.9%,
Poor,2542,7.3%,
Very Good,2507,7.2%,
Very Poor,970,2.8%,
(Missing),93,0.3%,

0,1
Distinct count,3196
Unique (%),9.2%
Missing (%),0.0%
Missing (n),0

0,1
2016-11-19,1122
2013-11-08,668
2016-06-10,85
Other values (3193),32905

Value,Count,Frequency (%),Unnamed: 3
2016-11-19,1122,3.2%,
2013-11-08,668,1.9%,
2016-06-10,85,0.2%,
2010-09-30,76,0.2%,
2011-08-23,65,0.2%,
2014-03-05,64,0.2%,
2009-10-22,58,0.2%,
2016-02-02,57,0.2%,
2014-11-24,54,0.2%,
2013-06-18,54,0.2%,

0,1
Distinct count,309
Unique (%),0.9%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,74.535
Minimum,1
Maximum,1773
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,32
Q1,51
Median,68
Q3,91
95-th percentile,135
Maximum,1773
Range,1772
Interquartile range,40

0,1
Standard deviation,36.513
Coef of variation,0.48987
Kurtosis,159.77
Mean,74.535
MAD,25.505
Skewness,5.4497
Sum,2592300
Variance,1333.2
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
55.0,548,1.6%,
48.0,534,1.5%,
56.0,530,1.5%,
49.0,526,1.5%,
58.0,526,1.5%,
54.0,522,1.5%,
52.0,507,1.5%,
65.0,504,1.4%,
68.0,503,1.4%,
53.0,501,1.4%,

Value,Count,Frequency (%),Unnamed: 3
1.0,1,0.0%,
4.0,1,0.0%,
4.0280000000000005,1,0.0%,
6.0,1,0.0%,
8.0,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
622.0,1,0.0%,
660.0,1,0.0%,
847.0,1,0.0%,
877.0,1,0.0%,
1773.0,1,0.0%,

0,1
Distinct count,210
Unique (%),0.6%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,51.415
Minimum,1
Maximum,1364
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,27
Q1,39
Median,49
Q3,60
95-th percentile,85
Maximum,1364
Range,1363
Interquartile range,21

0,1
Standard deviation,21.352
Coef of variation,0.41529
Kurtosis,483.4
Mean,51.415
MAD,13.773
Skewness,10.276
Sum,1788200
Variance,455.91
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
44.0,1032,3.0%,
49.0,1024,2.9%,
50.0,987,2.8%,
43.0,982,2.8%,
48.0,953,2.7%,
47.0,940,2.7%,
45.0,933,2.7%,
51.0,920,2.6%,
40.0,862,2.5%,
52.0,860,2.5%,

Value,Count,Frequency (%),Unnamed: 3
1.0,1,0.0%,
2.6860000000000004,1,0.0%,
4.0,1,0.0%,
5.0,1,0.0%,
6.0,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
353.0,2,0.0%,
366.0,1,0.0%,
516.0,1,0.0%,
847.0,1,0.0%,
1364.0,1,0.0%,

0,1
Distinct count,107
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0

0,1
Low energy lighting in all fixed outlets,7215
No low energy lighting,5371
Low energy lighting in 50% of fixed outlets,1909
Other values (104),20285

Value,Count,Frequency (%),Unnamed: 3
Low energy lighting in all fixed outlets,7215,20.7%,
No low energy lighting,5371,15.4%,
Low energy lighting in 50% of fixed outlets,1909,5.5%,
Low energy lighting in 25% of fixed outlets,1010,2.9%,
Low energy lighting in 78% of fixed outlets,984,2.8%,
Low energy lighting in 33% of fixed outlets,913,2.6%,
Low energy lighting in 67% of fixed outlets,830,2.4%,
Low energy lighting in 75% of fixed outlets,810,2.3%,
Low energy lighting in 40% of fixed outlets,776,2.2%,
Low energy lighting in 80% of fixed outlets,666,1.9%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.1%
Missing (n),41

0,1
Very Good,12917
Good,6312
Very Poor,5962
Other values (2),9548

Value,Count,Frequency (%),Unnamed: 3
Very Good,12917,37.1%,
Good,6312,18.1%,
Very Poor,5962,17.1%,
Average,5956,17.1%,
Poor,3592,10.3%,
(Missing),41,0.1%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.1%
Missing (n),30

0,1
Very Good,12923
Good,6314
Very Poor,5962
Other values (2),9551

Value,Count,Frequency (%),Unnamed: 3
Very Good,12923,37.2%,
Good,6314,18.2%,
Very Poor,5962,17.1%,
Average,5958,17.1%,
Poor,3593,10.3%,
(Missing),30,0.1%,

First 3 values
878566339262013043009234014518687
983275922712013080212121991070416
597607982432014120215065490068497

Last 3 values
398133259722013110609284459498387
986746258732013080812160270078407
1604442709832018013111012312778902

Value,Count,Frequency (%),Unnamed: 3
1000001339062013090314203743978857,1,0.0%,
1000001439442013090210331115379048,1,0.0%,
1000037139222014022513403493648854,1,0.0%,
1000037139962013111121225393298657,1,0.0%,
1000037139962014021915381693678954,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
999879042952014030600504393940316,1,0.0%,
999879082552014022414530793940316,1,0.0%,
999879092952014022218112893940316,1,0.0%,
999985502152013090215454593270511,1,0.0%,
999985539142014102311401914342078,1,0.0%,

0,1
Constant value,E06000001

0,1
Constant value,Hartlepool

0,1
Distinct count,3409
Unique (%),9.8%
Missing (%),0.0%
Missing (n),0

0,1
2016-11-19,1108
2013-11-08,662
2016-06-10,89
Other values (3406),32921

Value,Count,Frequency (%),Unnamed: 3
2016-11-19,1108,3.2%,
2013-11-08,662,1.9%,
2016-06-10,89,0.3%,
2010-09-30,74,0.2%,
2011-08-23,67,0.2%,
2014-03-24,60,0.2%,
2014-11-24,57,0.2%,
2014-09-29,55,0.2%,
2016-02-02,54,0.2%,
2014-03-21,53,0.2%,

0,1
Distinct count,100
Unique (%),0.3%
Missing (%),4.4%
Missing (n),1527
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,49.017
Minimum,0
Maximum,100
Zeros (%),15.5%

0,1
Minimum,0
5-th percentile,0
Q1,15
Median,50
Q3,80
95-th percentile,100
Maximum,100
Range,100
Interquartile range,65

0,1
Standard deviation,35.652
Coef of variation,0.72734
Kurtosis,-1.3726
Mean,49.017
MAD,31.328
Skewness,0.061592
Sum,1630000
Variance,1271.1
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
100.0,6062,17.4%,
0.0,5387,15.5%,
50.0,1806,5.2%,
78.0,982,2.8%,
33.0,864,2.5%,
25.0,823,2.4%,
67.0,758,2.2%,
40.0,732,2.1%,
75.0,701,2.0%,
80.0,656,1.9%,

Value,Count,Frequency (%),Unnamed: 3
0.0,5387,15.5%,
1.0,8,0.0%,
2.0,10,0.0%,
3.0,89,0.3%,
4.0,136,0.4%,

Value,Count,Frequency (%),Unnamed: 3
95.0,21,0.1%,
96.0,8,0.0%,
97.0,4,0.0%,
99.0,2,0.0%,
100.0,6062,17.4%,

0,1
Distinct count,41
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
"Programmer, room thermostat and TRVs",15022
"Programmer, no room thermostat",6049
Programmer and room thermostat,5277
Other values (38),8432

Value,Count,Frequency (%),Unnamed: 3
"Programmer, room thermostat and TRVs",15022,43.2%,
"Programmer, no room thermostat",6049,17.4%,
Programmer and room thermostat,5277,15.2%,
"Programmer, TRVs and bypass",2225,6.4%,
Time and temperature zone control,1334,3.8%,
Manual charge control,1123,3.2%,
No time or thermostatic control of room temperature,876,2.5%,
Automatic charge control,358,1.0%,
Room thermostat only,320,0.9%,
,260,0.7%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.2%
Missing (n),79

0,1
Good,14388
Average,8686
Very Poor,7327
Other values (2),4300

Value,Count,Frequency (%),Unnamed: 3
Good,14388,41.4%,
Average,8686,25.0%,
Very Poor,7327,21.1%,
Poor,2939,8.5%,
Very Good,1361,3.9%,
(Missing),79,0.2%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.2%
Missing (n),79

0,1
Good,14388
Average,8686
Very Poor,7327
Other values (2),4300

Value,Count,Frequency (%),Unnamed: 3
Good,14388,41.4%,
Average,8686,25.0%,
Very Poor,7327,21.1%,
Poor,2939,8.5%,
Very Good,1361,3.9%,
(Missing),79,0.2%,

0,1
Distinct count,42
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
"Boiler and radiators, mains gas",31024
Electric storage heaters,1489
Community scheme,542
Other values (39),1725

Value,Count,Frequency (%),Unnamed: 3
"Boiler and radiators, mains gas",31024,89.2%,
Electric storage heaters,1489,4.3%,
Community scheme,542,1.6%,
"Room heaters, electric",284,0.8%,
No system present: electric heaters assumed,258,0.7%,
Community scheme with CHP,244,0.7%,
"Room heaters, mains gas",171,0.5%,
"Warm air, mains gas",115,0.3%,
"Boiler and underfloor heating, mains gas",101,0.3%,
SAP05:Main-Heating,79,0.2%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.3%
Missing (n),93

0,1
Good,29109
Average,2426
Very Good,1898
Other values (2),1254

Value,Count,Frequency (%),Unnamed: 3
Good,29109,83.7%,
Average,2426,7.0%,
Very Good,1898,5.5%,
Very Poor,633,1.8%,
Poor,621,1.8%,
(Missing),93,0.3%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.3%
Missing (n),93

0,1
Good,29309
Very Good,2485
Very Poor,1649
Other values (2),1244

Value,Count,Frequency (%),Unnamed: 3
Good,29309,84.3%,
Very Good,2485,7.1%,
Very Poor,1649,4.7%,
Average,677,1.9%,
Poor,567,1.6%,
(Missing),93,0.3%,

0,1
Distinct count,3
Unique (%),0.0%
Missing (%),8.8%
Missing (n),3047

0,1
Y,29931
N,1802
(Missing),3047

Value,Count,Frequency (%),Unnamed: 3
Y,29931,86.1%,
N,1802,5.2%,
(Missing),3047,8.8%,

0,1
Distinct count,23
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
mains gas (not community),22463
mains gas - this is for backwards compatibility only and should not be used,9531
electricity (not community),1212
Other values (20),1574

Value,Count,Frequency (%),Unnamed: 3
mains gas (not community),22463,64.6%,
mains gas - this is for backwards compatibility only and should not be used,9531,27.4%,
electricity (not community),1212,3.5%,
electricity - this is for backwards compatibility only and should not be used,603,1.7%,
mains gas (community),333,1.0%,
To be used only when there is no heating/hot-water system,239,0.7%,
"Electricity: electricity, unspecified tariff",115,0.3%,
NO DATA!,111,0.3%,
oil (not community),57,0.2%,
LPG (not community),35,0.1%,

Unsupported value

0,1
Distinct count,4
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
natural,27957
NO DATA!,2993
"mechanical, extract only",2562

Value,Count,Frequency (%),Unnamed: 3
natural,27957,80.4%,
NO DATA!,2993,8.6%,
"mechanical, extract only",2562,7.4%,
"mechanical, supply and extract",1268,3.6%,

0,1
Distinct count,91
Unique (%),0.3%
Missing (%),8.8%
Missing (n),3067
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,94.188
Minimum,0
Maximum,100
Zeros (%),3.3%

0,1
Minimum,0
5-th percentile,44
Q1,100
Median,100
Q3,100
95-th percentile,100
Maximum,100
Range,100
Interquartile range,0

0,1
Standard deviation,21.111
Coef of variation,0.22414
Kurtosis,13.418
Mean,94.188
MAD,10.464
Skewness,-3.8261
Sum,2987000
Variance,445.68
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
100.0,28510,82.0%,
0.0,1146,3.3%,
90.0,260,0.7%,
95.0,220,0.6%,
80.0,167,0.5%,
50.0,161,0.5%,
75.0,105,0.3%,
85.0,88,0.3%,
60.0,87,0.3%,
70.0,71,0.2%,

Value,Count,Frequency (%),Unnamed: 3
0.0,1146,3.3%,
1.0,1,0.0%,
5.0,22,0.1%,
6.0,1,0.0%,
7.0,2,0.0%,

Value,Count,Frequency (%),Unnamed: 3
95.0,220,0.6%,
97.0,3,0.0%,
98.0,5,0.0%,
99.0,3,0.0%,
100.0,28510,82.0%,

0,1
Distinct count,21
Unique (%),0.1%
Missing (%),8.6%
Missing (n),2993
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,4.3629
Minimum,1
Maximum,41
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,2
Q1,3
Median,4
Q3,5
95-th percentile,7
Maximum,41
Range,40
Interquartile range,2

0,1
Standard deviation,1.6752
Coef of variation,0.38396
Kurtosis,11.139
Mean,4.3629
MAD,1.2562
Skewness,1.504
Sum,138680
Variance,2.8063
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
4.0,8315,23.9%,
5.0,8274,23.8%,
3.0,6256,18.0%,
2.0,3181,9.1%,
6.0,2707,7.8%,
7.0,1352,3.9%,
8.0,722,2.1%,
9.0,341,1.0%,
1.0,269,0.8%,
10.0,205,0.6%,

Value,Count,Frequency (%),Unnamed: 3
1.0,269,0.8%,
2.0,3181,9.1%,
3.0,6256,18.0%,
4.0,8315,23.9%,
5.0,8274,23.8%,

Value,Count,Frequency (%),Unnamed: 3
17.0,3,0.0%,
19.0,1,0.0%,
20.0,2,0.0%,
21.0,2,0.0%,
41.0,1,0.0%,

0,1
Correlation,0.9245

0,1
Distinct count,11
Unique (%),0.0%
Missing (%),2.8%
Missing (n),972
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.11231
Minimum,0
Maximum,9
Zeros (%),89.1%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,1
Maximum,9
Range,9
Interquartile range,0

0,1
Standard deviation,0.45205
Coef of variation,4.025
Kurtosis,73.607
Mean,0.11231
MAD,0.20589
Skewness,6.8608
Sum,3797
Variance,0.20435
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,30989,89.1%,
1.0,2225,6.4%,
2.0,412,1.2%,
3.0,87,0.3%,
4.0,42,0.1%,
5.0,27,0.1%,
6.0,10,0.0%,
7.0,8,0.0%,
9.0,4,0.0%,
8.0,4,0.0%,

Value,Count,Frequency (%),Unnamed: 3
0.0,30989,89.1%,
1.0,2225,6.4%,
2.0,412,1.2%,
3.0,87,0.3%,
4.0,42,0.1%,

Value,Count,Frequency (%),Unnamed: 3
5.0,27,0.1%,
6.0,10,0.0%,
7.0,8,0.0%,
8.0,4,0.0%,
9.0,4,0.0%,

0,1
Distinct count,17
Unique (%),0.0%
Missing (%),39.4%
Missing (n),13716
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,0.41326
Minimum,0
Maximum,80
Zeros (%),60.0%

0,1
Minimum,0
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,80
Range,80
Interquartile range,0

0,1
Standard deviation,4.3429
Coef of variation,10.509
Kurtosis,121.73
Mean,0.41326
MAD,0.81876
Skewness,10.855
Sum,8705
Variance,18.861
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,20866,60.0%,
45.0,136,0.4%,
40.0,17,0.0%,
50.0,13,0.0%,
20.0,9,0.0%,
30.0,4,0.0%,
80.0,4,0.0%,
47.0,3,0.0%,
60.0,3,0.0%,
25.0,3,0.0%,

Value,Count,Frequency (%),Unnamed: 3
0.0,20866,60.0%,
3.0,1,0.0%,
15.0,1,0.0%,
20.0,9,0.0%,
25.0,3,0.0%,

Value,Count,Frequency (%),Unnamed: 3
50.0,13,0.0%,
60.0,3,0.0%,
70.0,1,0.0%,
75.0,1,0.0%,
80.0,4,0.0%,

0,1
Distinct count,2244
Unique (%),6.5%
Missing (%),0.0%
Missing (n),0

0,1
TS26 0US,143
TS24 7HY,113
TS26 8LT,112
Other values (2241),34412

Value,Count,Frequency (%),Unnamed: 3
TS26 0US,143,0.4%,
TS24 7HY,113,0.3%,
TS26 8LT,112,0.3%,
TS24 0XH,112,0.3%,
TS26 9HL,104,0.3%,
TS24 9ND,93,0.3%,
TS26 8PY,89,0.3%,
TS24 9JB,88,0.3%,
TS25 4BF,87,0.3%,
TS25 3DN,86,0.2%,

0,1
Distinct count,116
Unique (%),0.3%
Missing (%),0.0%
Missing (n),0
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,78.368
Minimum,1
Maximum,133
Zeros (%),0.0%

0,1
Minimum,1
5-th percentile,62
Q1,74
Median,80
Q3,85
95-th percentile,90
Maximum,133
Range,132
Interquartile range,11

0,1
Standard deviation,9.577
Coef of variation,0.12221
Kurtosis,7.259
Mean,78.368
MAD,6.9895
Skewness,-1.5436
Sum,2725637
Variance,91.719
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
83,1991,5.7%,
84,1902,5.5%,
82,1856,5.3%,
85,1842,5.3%,
81,1694,4.9%,
86,1652,4.7%,
79,1631,4.7%,
80,1625,4.7%,
78,1552,4.5%,
77,1487,4.3%,

Value,Count,Frequency (%),Unnamed: 3
1,16,0.0%,
2,1,0.0%,
3,4,0.0%,
5,1,0.0%,
6,2,0.0%,

Value,Count,Frequency (%),Unnamed: 3
115,2,0.0%,
117,4,0.0%,
120,2,0.0%,
122,4,0.0%,
133,1,0.0%,

0,1
Distinct count,7
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
B,14724
C,14432
D,3498
Other values (4),2126

Value,Count,Frequency (%),Unnamed: 3
B,14724,42.3%,
C,14432,41.5%,
D,3498,10.1%,
A,1458,4.2%,
E,510,1.5%,
F,91,0.3%,
G,67,0.2%,

0,1
Distinct count,5
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
House,25798
Flat,5850
Bungalow,2948
Other values (2),184

Value,Count,Frequency (%),Unnamed: 3
House,25798,74.2%,
Flat,5850,16.8%,
Bungalow,2948,8.5%,
Maisonette,180,0.5%,
Park home,4,0.0%,

0,1
Distinct count,112
Unique (%),0.3%
Missing (%),0.0%
Missing (n),13

0,1
"Pitched, 250 mm loft insulation",4972
"Pitched, 200 mm loft insulation",4088
"Pitched, no insulation (assumed)",3964
Other values (108),21743

Value,Count,Frequency (%),Unnamed: 3
"Pitched, 250 mm loft insulation",4972,14.3%,
"Pitched, 200 mm loft insulation",4088,11.8%,
"Pitched, no insulation (assumed)",3964,11.4%,
(another dwelling above),3191,9.2%,
"Pitched, 150 mm loft insulation",2352,6.8%,
"Pitched, 100 mm loft insulation",2306,6.6%,
"Pitched, 300+ mm loft insulation",1776,5.1%,
"Pitched, 300 mm loft insulation",1141,3.3%,
"Pitched, 50 mm loft insulation",935,2.7%,
"Pitched, no insulation",878,2.5%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),9.9%
Missing (n),3451

0,1
Good,14581
Very Poor,6244
Very Good,5357
Other values (2),5147
(Missing),3451

Value,Count,Frequency (%),Unnamed: 3
Good,14581,41.9%,
Very Poor,6244,18.0%,
Very Good,5357,15.4%,
Average,3436,9.9%,
Poor,1711,4.9%,
(Missing),3451,9.9%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),9.9%
Missing (n),3451

0,1
Good,14581
Very Poor,6244
Very Good,5357
Other values (2),5147
(Missing),3451

Value,Count,Frequency (%),Unnamed: 3
Good,14581,41.9%,
Very Poor,6244,18.0%,
Very Good,5357,15.4%,
Average,3436,9.9%,
Poor,1711,4.9%,
(Missing),3451,9.9%,

0,1
Distinct count,18
Unique (%),0.1%
Missing (%),0.0%
Missing (n),0

0,1
,15271
"Room heaters, mains gas",10132
"Room heaters, electric",7386
Other values (15),1991

Value,Count,Frequency (%),Unnamed: 3
,15271,43.9%,
"Room heaters, mains gas",10132,29.1%,
"Room heaters, electric",7386,21.2%,
Portable electric heaters (assumed),997,2.9%,
"Room heaters, dual fuel (mineral and wood)",214,0.6%,
"Room heaters, wood logs",213,0.6%,
Portable electric heaters,196,0.6%,
"Room heaters, coal",132,0.4%,
"Room heaters, smokeless fuel",108,0.3%,
SAP05:Secondary-Heating,79,0.2%,

0,1
Constant value,

0,1
Constant value,

0,1
Distinct count,3
Unique (%),0.0%
Missing (%),48.2%
Missing (n),16774

0,1
N,17912
Y,94
(Missing),16774

Value,Count,Frequency (%),Unnamed: 3
N,17912,51.5%,
Y,94,0.3%,
(Missing),16774,48.2%,

0,1
Correlation,0.97903

0,1
Distinct count,15
Unique (%),0.0%
Missing (%),0.0%
Missing (n),0

0,1
marketed sale,11259
rental (social),7234
none of the above,4213
Other values (12),12074

Value,Count,Frequency (%),Unnamed: 3
marketed sale,11259,32.4%,
rental (social),7234,20.8%,
none of the above,4213,12.1%,
rental (private),3051,8.8%,
new dwelling,2862,8.2%,
assessment for green deal,2469,7.1%,
ECO assessment,1713,4.9%,
non marketed sale,1005,2.9%,
FiT application,652,1.9%,
following green deal,141,0.4%,

0,1
Distinct count,614
Unique (%),1.8%
Missing (%),93.7%
Missing (n),32587
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,6.0394
Minimum,0
Maximum,29.12
Zeros (%),0.3%

0,1
Minimum,0.0
5-th percentile,1.006
Q1,4.7
Median,6.0
Q3,7.46
95-th percentile,11.07
Maximum,29.12
Range,29.12
Interquartile range,2.76

0,1
Standard deviation,2.9195
Coef of variation,0.48341
Kurtosis,4.0242
Mean,6.0394
MAD,2.1011
Skewness,0.76882
Sum,13244
Variance,8.5235
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,97,0.3%,
5.2,55,0.2%,
7.0,54,0.2%,
6.6,51,0.1%,
5.0,49,0.1%,
4.9,47,0.1%,
6.0,46,0.1%,
4.7,33,0.1%,
8.0,32,0.1%,
6.2,30,0.1%,

Value,Count,Frequency (%),Unnamed: 3
0.0,97,0.3%,
1.0,13,0.0%,
1.01,1,0.0%,
1.05,5,0.0%,
1.09,1,0.0%,

Value,Count,Frequency (%),Unnamed: 3
18.9,2,0.0%,
19.73,1,0.0%,
21.55,1,0.0%,
23.41,1,0.0%,
29.12,1,0.0%,

0,1
Distinct count,114
Unique (%),0.3%
Missing (%),0.0%
Missing (n),1

0,1
"Cavity wall, filled cavity",13829
"Solid brick, as built, no insulation (assumed)",6395
"Cavity wall, as built, insulated (assumed)",4881
Other values (110),9674

Value,Count,Frequency (%),Unnamed: 3
"Cavity wall, filled cavity",13829,39.8%,
"Solid brick, as built, no insulation (assumed)",6395,18.4%,
"Cavity wall, as built, insulated (assumed)",4881,14.0%,
"Cavity wall, as built, no insulation (assumed)",4173,12.0%,
"Cavity wall, as built, partial insulation (assumed)",688,2.0%,
"Solid brick, with external insulation",504,1.4%,
Average thermal transmittance 0.27 W/m²K,251,0.7%,
Average thermal transmittance 0.26 W/m²K,245,0.7%,
"Solid brick, with internal insulation",222,0.6%,
"Timber frame, as built, insulated (assumed)",209,0.6%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.1%
Missing (n),31

0,1
Good,19726
Very Poor,6490
Poor,4429
Other values (2),4104

Value,Count,Frequency (%),Unnamed: 3
Good,19726,56.7%,
Very Poor,6490,18.7%,
Poor,4429,12.7%,
Very Good,2364,6.8%,
Average,1740,5.0%,
(Missing),31,0.1%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.1%
Missing (n),31

0,1
Good,19726
Very Poor,6490
Poor,4429
Other values (2),4104

Value,Count,Frequency (%),Unnamed: 3
Good,19726,56.7%,
Very Poor,6490,18.7%,
Poor,4429,12.7%,
Very Good,2364,6.8%,
Average,1740,5.0%,
(Missing),31,0.1%,

0,1
Distinct count,19
Unique (%),0.1%
Missing (%),0.0%
Missing (n),11

0,1
Fully double glazed,29546
High performance glazing,1914
Single glazed,1132
Other values (15),2177

Value,Count,Frequency (%),Unnamed: 3
Fully double glazed,29546,85.0%,
High performance glazing,1914,5.5%,
Single glazed,1132,3.3%,
Partial double glazing,955,2.7%,
Mostly double glazing,850,2.4%,
Some double glazing,239,0.7%,
Single glazeddouble glazing,39,0.1%,
SAP05:Windows,30,0.1%,
Full secondary glazing,17,0.0%,
Fully triple glazed,12,0.0%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.1%
Missing (n),30

0,1
Average,20542
Good,9865
Very Good,1958
Other values (2),2385

Value,Count,Frequency (%),Unnamed: 3
Average,20542,59.1%,
Good,9865,28.4%,
Very Good,1958,5.6%,
Very Poor,1327,3.8%,
Poor,1058,3.0%,
(Missing),30,0.1%,

0,1
Distinct count,6
Unique (%),0.0%
Missing (%),0.1%
Missing (n),30

0,1
Average,20542
Good,9865
Very Good,1958
Other values (2),2385

Value,Count,Frequency (%),Unnamed: 3
Average,20542,59.1%,
Good,9865,28.4%,
Very Good,1958,5.6%,
Very Poor,1327,3.8%,
Poor,1058,3.0%,
(Missing),30,0.1%,

0,1
Distinct count,4
Unique (%),0.0%
Missing (%),5.6%
Missing (n),1942
Infinite (%),0.0%
Infinite (n),0

0,1
Mean,3.0453e-05
Minimum,-1
Maximum,1
Zeros (%),94.3%

0,1
Minimum,-1
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,0
Maximum,1
Range,2
Interquartile range,0

0,1
Standard deviation,0.028675
Coef of variation,941.62
Kurtosis,1213.4
Mean,3.0453e-05
MAD,0.00085264
Skewness,1.2885
Sum,1
Variance,0.00082224
Memory size,271.8 KiB

Value,Count,Frequency (%),Unnamed: 3
0.0,32811,94.3%,
1.0,14,0.0%,
-1.0,13,0.0%,
(Missing),1942,5.6%,

Value,Count,Frequency (%),Unnamed: 3
-1.0,13,0.0%,
0.0,32811,94.3%,
1.0,14,0.0%,

Value,Count,Frequency (%),Unnamed: 3
-1.0,13,0.0%,
0.0,32811,94.3%,
1.0,14,0.0%,

Unnamed: 0,LMK_KEY,ADDRESS1,ADDRESS2,ADDRESS3,POSTCODE,BUILDING_REFERENCE_NUMBER,CURRENT_ENERGY_RATING,POTENTIAL_ENERGY_RATING,CURRENT_ENERGY_EFFICIENCY,POTENTIAL_ENERGY_EFFICIENCY,PROPERTY_TYPE,BUILT_FORM,INSPECTION_DATE,LOCAL_AUTHORITY,CONSTITUENCY,COUNTY,LODGEMENT_DATE,TRANSACTION_TYPE,ENVIRONMENT_IMPACT_CURRENT,ENVIRONMENT_IMPACT_POTENTIAL,ENERGY_CONSUMPTION_CURRENT,ENERGY_CONSUMPTION_POTENTIAL,CO2_EMISSIONS_CURRENT,CO2_EMISS_CURR_PER_FLOOR_AREA,CO2_EMISSIONS_POTENTIAL,LIGHTING_COST_CURRENT,LIGHTING_COST_POTENTIAL,HEATING_COST_CURRENT,HEATING_COST_POTENTIAL,HOT_WATER_COST_CURRENT,HOT_WATER_COST_POTENTIAL,TOTAL_FLOOR_AREA,ENERGY_TARIFF,MAINS_GAS_FLAG,FLOOR_LEVEL,FLAT_TOP_STOREY,FLAT_STOREY_COUNT,MAIN_HEATING_CONTROLS,MULTI_GLAZE_PROPORTION,GLAZED_TYPE,GLAZED_AREA,EXTENSION_COUNT,NUMBER_HABITABLE_ROOMS,NUMBER_HEATED_ROOMS,LOW_ENERGY_LIGHTING,NUMBER_OPEN_FIREPLACES,HOTWATER_DESCRIPTION,HOT_WATER_ENERGY_EFF,HOT_WATER_ENV_EFF,FLOOR_DESCRIPTION,FLOOR_ENERGY_EFF,FLOOR_ENV_EFF,WINDOWS_DESCRIPTION,WINDOWS_ENERGY_EFF,WINDOWS_ENV_EFF,WALLS_DESCRIPTION,WALLS_ENERGY_EFF,WALLS_ENV_EFF,SECONDHEAT_DESCRIPTION,SHEATING_ENERGY_EFF,SHEATING_ENV_EFF,ROOF_DESCRIPTION,ROOF_ENERGY_EFF,ROOF_ENV_EFF,MAINHEAT_DESCRIPTION,MAINHEAT_ENERGY_EFF,MAINHEAT_ENV_EFF,MAINHEATCONT_DESCRIPTION,MAINHEATC_ENERGY_EFF,MAINHEATC_ENV_EFF,LIGHTING_DESCRIPTION,LIGHTING_ENERGY_EFF,LIGHTING_ENV_EFF,MAIN_FUEL,WIND_TURBINE_COUNT,HEAT_LOSS_CORRIDOOR,UNHEATED_CORRIDOR_LENGTH,FLOOR_HEIGHT,PHOTO_SUPPLY,SOLAR_WATER_HEATING_FLAG,MECHANICAL_VENTILATION,ADDRESS,LOCAL_AUTHORITY_LABEL,CONSTITUENCY_LABEL
0,111140140312012021616250191920249,"9, Swalebrooke Avenue",,,TS25 5JP,2955018468,C,C,69,70,Bungalow,Semi-Detached,2012-02-16,E06000001,E14000733,,2012-02-16,rental (private),72,73,198,190.0,2.2,37.0,2.1,62.0,37.0,403.0,407.0,73.0,73.0,67.48,Single,Y,NODATA!,,,2104,100.0,"double glazing, unknown install date",Normal,0.0,3.0,3.0,33.0,0.0,From main system,Good,Good,"Solid, limited insulation (assumed)",,,Fully double glazed,Average,Average,"Cavity wall, filled cavity",Good,Good,"Room heaters, electric",,,"Pitched, 300+ mm loft insulation",Very Good,Very Good,"Boiler and radiators, mains gas",Good,Good,Programmer and room thermostat,Average,Average,Low energy lighting in 33% of fixed outlets,Average,Average,mains gas (not community),0.0,NO DATA!,,2.3,0.0,,natural,"9, Swalebrooke Avenue",Hartlepool,Hartlepool
1,1232010769902014110610070420940148,"15, Amerston Close",Wynyard,,TS22 5QX,7929869278,C,C,71,79,House,Detached,2014-11-04,E06000001,E14000733,,2014-11-06,marketed sale,70,79,156,114.0,6.9,27.0,4.9,122.0,122.0,1395.0,1192.0,152.0,131.0,255.0,Single,Y,NODATA!,,,2106,100.0,"double glazing, unknown install date",Normal,0.0,10.0,10.0,78.0,0.0,From main system,Good,Good,"Solid, limited insulation (assumed)",,,Fully double glazed,Average,Average,"Cavity wall, as built, insulated (assumed)",Good,Good,"Room heaters, wood logs",,,"Pitched, 250mm loft insulation",Good,Good,"Boiler and radiators, mains gas",Good,Good,"Programmer, room thermostat and TRVs",Good,Good,Low energy lighting in 78% of fixed outlets,Very Good,Very Good,mains gas (not community),0.0,NO DATA!,,,0.0,,natural,"15, Amerston Close, Wynyard",Hartlepool,Hartlepool
2,623946537732013031817334702968205,"37, Moffatt Road",,,TS25 3QP,3592236868,C,B,69,85,House,Mid-Terrace,2013-03-18,E06000001,E14000733,,2013-03-18,marketed sale,70,86,197,95.0,2.7,38.0,1.4,84.0,42.0,461.0,430.0,85.0,61.0,72.0,Single,Y,NODATA!,,,2104,100.0,"double glazing, unknown install date",Normal,0.0,3.0,3.0,0.0,0.0,From main system,Good,Good,"Solid, no insulation (assumed)",,,Fully double glazed,Average,Average,"Cavity wall, filled cavity",Good,Good,"Room heaters, mains gas",,,"Pitched, 200 mm loft insulation",Good,Good,"Boiler and radiators, mains gas",Good,Good,Programmer and room thermostat,Average,Average,No low energy lighting,Very Poor,Very Poor,mains gas (not community),0.0,NO DATA!,,,0.0,,natural,"37, Moffatt Road",Hartlepool,Hartlepool
3,1145926633612014052412471896240323,"21, Fewston Close",,,TS26 0QN,3335363278,D,C,61,80,House,Detached,2014-05-24,E06000001,E14000733,,2014-05-24,assessment for green deal,55,78,227,115.0,7.4,44.0,3.8,127.0,81.0,1096.0,876.0,369.0,89.0,169.0,Single,Y,NODATA!,,,2106,100.0,"double glazing, unknown install date",Normal,2.0,9.0,9.0,43.0,0.0,From main system,Average,Average,"Solid, limited insulation (assumed)",,,Fully double glazed,Average,Average,"Cavity wall, as built, insulated (assumed)",Good,Good,"Room heaters, mains gas",,,"Pitched, 200 mm loft insulation",Good,Good,"Boiler and radiators, mains gas",Good,Good,"Programmer, room thermostat and TRVs",Good,Good,Low energy lighting in 43% of fixed outlets,Average,Average,mains gas (not community),0.0,NO DATA!,,,0.0,,natural,"21, Fewston Close",Hartlepool,Hartlepool
4,671252869922016111904014149718306,"180, Seaton Lane",,,TS25 1HF,475669868,D,C,64,79,House,Semi-Detached,2016-11-19,E06000001,E14000733,,2016-11-19,rental (social),58,76,248,141.0,5.5,44.0,3.1,113.0,73.0,959.0,821.0,194.0,81.0,126.0,Single,Y,NODATA!,,,2106,100.0,double glazing installed during or after 2002,Normal,0.0,5.0,5.0,45.0,0.0,From main system,Average,Average,"Solid, no insulation (assumed)",NO DATA!,,Fully double glazed,Good,Good,"Cavity wall, filled cavity",Good,Good,"Room heaters, mains gas",,,"Pitched, 300 mm loft insulation",Very Good,Very Good,"Boiler and radiators, mains gas",Good,Good,"Programmer, room thermostat and TRVs",Good,Good,Low energy lighting in 45% of fixed outlets,Good,Good,mains gas (not community),0.0,NO DATA!,,2.5,,N,"mechanical, extract only","180, Seaton Lane",Hartlepool,Hartlepool


### 4. DATA CLEANING/MANIPULATION

**Fixing rows and columns**

A column whose distinct-value count is 1 (a constant) or equal to the number of rows (an identifier) carries no information for the analysis, so we drop such columns from the data set. Below we drop the certificate and building identifiers together with the POSTCODE and ADDRESS1 location fields.

In [0]:
# Dropping identifier and address columns that are insignificant to the analysis
df = df.drop(columns=['POSTCODE', 'ADDRESS1', 'BUILDING_REFERENCE_NUMBER', 'LMK_KEY'])
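The rule stated above can also be applied in code, which saves reading it off the profiling report. This is a minimal sketch and is not part of the original notebook. The helper `uninformative_columns` and the toy frame are hypothetical, and the toy values are made up. The helper flags two kinds of column: those with at most one distinct non-missing value (constant or empty) and those with one distinct value per row (identifiers).

```python
import pandas as pd


def uninformative_columns(frame):
    """Hypothetical helper: list constant/empty columns and row-identifier columns."""
    distinct = frame.nunique()  # NaN is not counted as a distinct value
    constant = list(distinct[distinct <= 1].index)  # constant or entirely missing
    identifier = [col for col in distinct[distinct == len(frame)].index if col not in constant]
    return constant + identifier


# Made-up toy data standing in for the certificates table
toy = pd.DataFrame({
    "LMK_KEY": ["k1", "k2", "k3", "k4"],        # unique per row -> identifier
    "LOCAL_AUTHORITY": ["E06000001"] * 4,        # constant
    "COUNTY": [None] * 4,                        # empty
    "PROPERTY_TYPE": ["House", "Flat", "House", "Bungalow"],
})
print(uninformative_columns(toy))  # ['LOCAL_AUTHORITY', 'COUNTY', 'LMK_KEY']
```

On the certificates data, a call such as `df.drop(columns=uninformative_columns(df))` should also catch the constant local-authority fields that the profile reports (`E06000001`, `Hartlepool`). It would not catch POSTCODE, which the profile shows with 2,244 distinct values, so that column still has to be dropped by hand as above.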

**Dealing with missing values**

While we always want to be careful about removing information, a column with a high percentage of missing values is unlikely to be useful to our model. The right threshold for removing columns depends on the problem. Based on the profiling results above, we will remove any column with more than 30% missing values.

In [0]:
# Dropping columns having more than 30% missing values
def missingvaluecol(df, threshold):
    """Return the columns with at most `threshold` percent missing values, reporting the ones dropped."""
    missing_pct = 100 * df.isnull().mean()
    dropped = list(missing_pct[missing_pct > threshold].index)
    print("# Columns having more than %s percent missing values:" % threshold, len(dropped))
    print("Columns:\n", dropped)
    return [col for col in df.columns if col not in dropped]

df = df[missingvaluecol(df, 30)]
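pandas can express the same threshold rule directly with `DataFrame.dropna(axis=1, thresh=...)`. Here `thresh` is the minimum number of non-missing values a column needs in order to be kept. Below is a minimal sketch on toy data; the helper name `drop_sparse_columns` is hypothetical. The helper uses integer arithmetic, so a column missing exactly 30% of its values is kept, which matches the "more than 30%" wording.

```python
import pandas as pd


def drop_sparse_columns(frame, threshold=30):
    """Hypothetical helper: drop columns with more than `threshold` percent missing values."""
    n_rows = len(frame)
    max_missing = int(n_rows * threshold // 100)  # largest allowed NaN count, computed without float rounding
    return frame.dropna(axis=1, thresh=n_rows - max_missing)  # thresh = minimum non-NaN count to keep a column


toy = pd.DataFrame({
    "a": range(10),                    # 0% missing
    "b": [None] * 5 + list(range(5)),  # 50% missing -> dropped
    "c": [None] * 3 + list(range(7)),  # 30% missing -> kept (not *more* than 30%)
})
print(drop_sparse_columns(toy).columns.tolist())  # ['a', 'c']
```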

In [0]:
# Replacing null values with column mean for numerical values (categorical columns are kept unchanged)
num_cols = df.select_dtypes(include=[np.number]).columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
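Mean imputation is sensitive to the long right tails visible in the profiling report. For example, the profile shows skewness values of 11.152 and 10.276 for two of the numerical columns. A common alternative is to fill numerical gaps with the column median and leave the categorical columns untouched. This is swapped in here as an assumption, not the notebook's choice. The helper `impute_numeric` and the toy values below are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_numeric(frame, strategy="median"):
    """Hypothetical helper: fill NaNs in numerical columns with each column's mean or median."""
    out = frame.copy()  # leave the caller's frame untouched
    num_cols = out.select_dtypes(include=[np.number]).columns
    fill_values = out[num_cols].agg(strategy)  # one fill value per numerical column
    out[num_cols] = out[num_cols].fillna(fill_values)
    return out  # non-numerical columns are returned as they were


toy = pd.DataFrame({
    "TOTAL_FLOOR_AREA": [60.0, 70.0, np.nan, 1000.0],  # made-up values with a long right tail
    "PROPERTY_TYPE": ["House", None, "Flat", "House"],
})
print(impute_numeric(toy)["TOTAL_FLOOR_AREA"].tolist())  # [60.0, 70.0, 70.0, 1000.0]
print(impute_numeric(toy, "mean")["TOTAL_FLOOR_AREA"].iloc[2])  # about 376.7, pulled up by the 1000.0 value
```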

**Outlier Treatment**

In [0]:
# Detecting outliers using box plots
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['CURRENT_ENERGY_EFFICIENCY'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['ENVIRONMENT_IMPACT_CURRENT'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['ENVIRONMENT_IMPACT_POTENTIAL'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['ENERGY_CONSUMPTION_CURRENT'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['ENERGY_CONSUMPTION_POTENTIAL'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['CO2_EMISSIONS_CURRENT'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['CO2_EMISS_CURR_PER_FLOOR_AREA'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['CO2_EMISSIONS_POTENTIAL'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['LIGHTING_COST_CURRENT'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['LIGHTING_COST_POTENTIAL'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['HEATING_COST_CURRENT'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['HEATING_COST_POTENTIAL'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['HOT_WATER_COST_CURRENT'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['HOT_WATER_COST_POTENTIAL'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['TOTAL_FLOOR_AREA'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['FLAT_STOREY_COUNT'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['MAIN_HEATING_CONTROLS'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['MULTI_GLAZE_PROPORTION'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['EXTENSION_COUNT'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['NUMBER_HABITABLE_ROOMS'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['NUMBER_HEATED_ROOMS'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['LOW_ENERGY_LIGHTING'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['NUMBER_OPEN_FIREPLACES'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['WIND_TURBINE_COUNT'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['UNHEATED_CORRIDOR_LENGTH'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['FLOOR_HEIGHT'])

In [0]:
plt.rcParams["figure.figsize"] = (15,4)
sns.boxplot(x=df['PHOTO_SUPPLY'])
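Each box-plot cell above repeats the same two lines for a different column. A loop over the numerical columns can produce the same diagnostics in a single figure. The sketch below uses plain matplotlib. The helper `boxplot_grid` and the random toy data are hypothetical, and drawing vertical boxes on a grid is a presentation choice, not the notebook's.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def boxplot_grid(frame, columns, ncols=3):
    """Hypothetical helper: draw one box plot per column on a grid and return the figure."""
    columns = list(columns)
    nrows = -(-len(columns) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 3 * nrows), squeeze=False)
    flat_axes = axes.ravel()
    for ax, col in zip(flat_axes, columns):
        ax.boxplot(frame[col].dropna().to_numpy())  # NaNs would make the quartiles NaN
        ax.set_title(col)
    for ax in flat_axes[len(columns):]:
        ax.set_visible(False)  # hide unused panels in the last row
    fig.tight_layout()
    return fig


# Random toy data standing in for the numerical certificate columns
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(50, 4)),
                   columns=["CO2_EMISSIONS_CURRENT", "TOTAL_FLOOR_AREA",
                            "HEATING_COST_CURRENT", "FLOOR_HEIGHT"])
fig = boxplot_grid(toy, toy.columns)
```

In the notebook this could be called as `boxplot_grid(df, df.select_dtypes(include=[np.number]).columns)`. One figure then replaces the separate cells, at the cost of smaller panels.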

In [0]:
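# Outlier removal using standard deviation: drop rows with any numerical value more than 3 std from its column mean
num_cols = df.select_dtypes(include=[np.number]).columns
z_scores = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()
df = df[~(z_scores.abs() > 3).any(axis=1)]  # NaN z-scores compare False, so they are never flagged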
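The cell above applies a 3-standard-deviation rule. However, the mean and standard deviation are themselves inflated by the heavy tails in the profile, where some columns show kurtosis values in the hundreds (531.28 and 483.4). An alternative that the notebook does not use, offered here as an assumption, is Tukey's interquartile-range fence. This is the same 1.5 × IQR convention that matplotlib and seaborn use by default for box-plot whiskers. The helper `iqr_outlier_mask` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(frame, k=1.5):
    """Hypothetical helper: flag rows where any numerical value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    num = frame.select_dtypes(include=[np.number])
    q1, q3 = num.quantile(0.25), num.quantile(0.75)  # per-column quartiles
    iqr = q3 - q1
    outside = (num < q1 - k * iqr) | (num > q3 + k * iqr)  # NaN compares False, so it is never flagged
    return outside.any(axis=1)


toy = pd.DataFrame({
    "ENERGY_CONSUMPTION_CURRENT": [180, 190, 200, 210, 220, 5000],  # made-up values
    "PROPERTY_TYPE": ["House"] * 6,
})
mask = iqr_outlier_mask(toy)
print(mask.tolist())  # [False, False, False, False, False, True]
cleaned = toy[~mask]  # keep only rows with no flagged value
```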