# Group E Report


## NILMTK - An Introduction to Load Disaggregation


## Introduction
Non-Intrusive Load Monitoring (NILM) is an approach to energy disaggregation that estimates the power consumption and behavior of individual appliances from a single whole-household power meter. Because no per-appliance sensors are needed, NILM is a cheaper alternative to monitoring each appliance directly. The resulting appliance-level information can also help to reduce energy consumption, save money, find and flatten peak power loads and, ultimately, move towards sustainability.

In this project we explore the NILMTK Python framework, using the UKDALE dataset to perform energy disaggregation.

### EXERCISE 1: Installation and loading data to perform basic checks

In this exercise we set up the framework and the UKDALE dataset. UKDALE is an open-access dataset from the UK that records Domestic Appliance-Level Electricity at a sample rate of 16 kHz for the whole house and at 1/6 Hz for individual appliances.

After we load the data, we can print out UKDALE's metadata. The metadata describes the dataset, e.g. which meter devices were used, where the data was recorded, the date, the number of buildings, the timeframe, the creators etc.

The next task is to print out the sub-metered appliances in each building. Here we can see, for each building separately, all submeters attached to appliances. We conclude that each building has exactly one main meter and that there are no nested MeterGroups for appliances. Each submeter contains information about its instance and building, as well as its appliance (type and instance).

The types of power available for the mains and the submeters are shown below:
```python
elec.mains().available_ac_types('power')
Out[]: ['active', 'apparent']
elec.submeters().available_ac_types('power')
Out[]: ['active', 'apparent']
```

We calculate the total energy consumption for building 1 in kWh and obtain the following result:
```python
apparent    5835.953591
active      5008.108254
dtype: float64
```
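For intuition, total energy is power accumulated over time. The following is a minimal sketch of that arithmetic, not NILMTK's implementation: the function name `energy_kwh` and the sample readings are hypothetical, and it assumes evenly spaced samples that each hold for one full sample period.

```python
def energy_kwh(power_watts, sample_period_s):
    """Approximate energy in kWh from evenly spaced power samples in watts."""
    # Assumption: each sample is held constant for one full sample period.
    joules = sum(power_watts) * sample_period_s
    # 1 kWh = 3.6e6 J
    return joules / 3.6e6


# Hypothetical readings: a constant 1000 W load sampled every 6 s for one hour.
readings = [1000.0] * 600
print(energy_kwh(readings, 6))
```

With these made-up readings the result is 1 kWh, since a 1000 W load running for one hour consumes exactly that amount.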

We can also get the wiring diagram for the MeterGroup of building 1, which consists of 53 instances:

<img src='./images/wirediagram.png'>

### EXERCISE 2: Appliances and power consumed

We select and plot the power used by the fridge freezer and the light in building 1 from 2014-04-28 until 2014-04-29:

<img src='./images/plot1.png'>

The overall power consumption for the same day is shown below:

<img src='./images/plot2.png'>

To get the energy fraction per submeter, we call the `energy_per_meter()` method on the submeters. We are not interested in reactive power since the submeters do not provide it, so we drop it from consideration. We then extract the active and apparent values from the `energy_fraction_per_submeter` dataframe and plot each of them. The plots below show the fraction of energy consumed by each submeter for active and apparent power:

<img src='./images/plot3.png'>

<img src='./images/plot4.png'>
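The fraction computation behind these plots can be sketched in plain Python. This is an illustration only: the helper `energy_fractions` and the energy values are hypothetical and are not taken from UKDALE or from NILMTK.

```python
def energy_fractions(energy_by_meter):
    """Return each meter's share of the total energy as a value in [0, 1]."""
    total = sum(energy_by_meter.values())
    if total == 0:
        raise ValueError("total energy is zero, fractions are undefined")
    return {name: energy / total for name, energy in energy_by_meter.items()}


# Made-up energy values in kWh for three submeters.
submeter_energy = {"fridge freezer": 3.0, "kettle": 1.0, "light": 1.0}
fractions = energy_fractions(submeter_energy)
for name, share in fractions.items():
    print(f"{name}: {share:.2f}")
```

The fractions sum to one by construction, which is what makes them suitable for a share-of-total plot.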

#### Data filtering
With NILMTK we can easily filter the dataset, as we practiced in this exercise. Some findings from our filtering task are listed below:
<ol>
  <li>The appliance with the highest power consumption in building 1 is instance number 12, the <b>fridge freezer</b>.</li>
  <li>The appliances of type “single-phase induction motor” are:
      <ol>
          <li>boiler</li>
          <li>solar thermal pumping station</li>
          <li>washer dryer</li>
          <li>dish washer</li>
          <li>kettle, food processor, toasted sandwich maker</li>
          <li>toaster, kitchen aid, food processor</li>
          <li>fridge freezer</li>
          <li>breadmaker</li>
          <li>vacuum cleaner</li>
          <li>fan</li>
          <li>immersion heater, water pump, security alarm, fan, drill, laptop computer</li>
      </ol>
  </li>
</ol>


### EXERCISE 3: Training and disaggregation
NILMTK provides three disaggregation algorithms. Combinatorial Optimisation (CO) and the Factorial Hidden Markov Model (FHMM) are non-event-based, while Hart’s algorithm is event-based. In this exercise, we apply CO and FHMM to disaggregate and plot the daily power consumption for building 3.

First, we set the training range to run until the end of 24-03-2013 and the test range to start on 25-03-2013. The plot below summarises the power data of building 3 for the site meter and various appliances during the training period:

<img src='./images/plot5.png'>
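The date-based split into training and test data can be illustrated with plain Python. This sketch only shows the idea and does not use NILMTK: the helper `split_by_date` and the hourly readings are hypothetical.

```python
from datetime import datetime, timedelta


def split_by_date(samples, test_start):
    """Split (timestamp, power) pairs into train (< test_start) and test (>= test_start)."""
    train = [s for s in samples if s[0] < test_start]
    test = [s for s in samples if s[0] >= test_start]
    return train, test


# Hypothetical hourly readings that straddle the split date used in the report.
first = datetime(2013, 3, 24, 22, 0)
samples = [(first + timedelta(hours=i), 100.0 + i) for i in range(4)]

train, test = split_by_date(samples, datetime(2013, 3, 25))
print(len(train), len(test))
```

Every reading taken before midnight on 25-03-2013 lands in the training set, and everything from midnight onwards lands in the test set.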

CO models each appliance as having a fixed number of states and assigns a power level to each state. The optimisation finds the combination of appliance states that minimises the difference between the predicted and the observed aggregate power. In FHMM, each appliance is modelled as a hidden Markov model (HMM), where the hidden component is its state and the observed component is its power draw. Like CO, FHMM has time complexity exponential in the number of appliances and thus becomes intractable for a large number of appliances ["If You Measure It, Can You Improve It? Exploring The Value of Energy Disaggregation" - Nipun Batra, et al.].
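The combinatorial search at the heart of CO can be sketched for a single aggregate reading as a brute-force loop over all state combinations. This is a simplified illustration under stated assumptions, not NILMTK's implementation: the function `disaggregate_co`, the appliance names and the power levels are hypothetical, and the state power levels are assumed to be already known rather than learned from training data.

```python
from itertools import product


def disaggregate_co(aggregate_power, appliance_states):
    """Pick one power level per appliance so that their sum is closest to the aggregate."""
    names = list(appliance_states)
    best_combo, best_error = None, float("inf")
    # The number of combinations is the product of the state counts,
    # so it grows exponentially with the number of appliances.
    for combo in product(*(appliance_states[name] for name in names)):
        error = abs(aggregate_power - sum(combo))
        if error < best_error:
            best_combo, best_error = combo, error
    return dict(zip(names, best_combo))


# Hypothetical two-state appliances with made-up power levels.
states = {"fridge freezer": [0, 90], "kettle": [0, 2000], "light": [0, 60]}
print(disaggregate_co(2150, states))
print(disaggregate_co(100, states))
```

With three two-state appliances there are only 8 combinations to check, but each additional appliance multiplies that count, which is the exponential cost mentioned above.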

Below are two plots showing the predicted and the ground truth power consumption using CO:

<img src='./images/pred-co.png'>

<img src='./images/gt-co.png'>

And below are two plots showing the predicted and the ground truth power consumption using FHMM:

<img src='./images/pred-fhmm.png'>

<img src='./images/gt-fhmm.png'>

### EXERCISE 4: Calculate F-Score of CO and FHMM

Up to the F-score calculation, we mainly followed sample code from the NILMTK documentation as a reference for model training and evaluation. The relevant steps are summarised below:
*  Train the model on the training set
*  Read the test data in chunks and apply the model to obtain the predicted appliance power in `pred` and the ground truth data in `gt`
*  Concatenate the chunks into a single pandas dataframe held in main memory
*  Convert the data to the local timezone and add human-readable labels

Because we use the F1 score for model evaluation, we recast the energy disaggregation problem as a classification problem: if the predicted power exceeds a threshold, the device is classified as ON.

The F1 score is defined as the harmonic mean of precision and recall in a binary classification problem.
The choice of the threshold value is explained later.
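
The thresholding and the F1 computation can be sketched as follows. This is a minimal illustration, not the evaluation code used for the results in this report: the function `f1_on_off` and the power readings are hypothetical, and applying the same threshold to the ground truth is an assumption.

```python
def f1_on_off(predicted_power, ground_truth_power, threshold):
    """F1 score after converting power readings to ON/OFF labels with a threshold."""
    pred_on = [p > threshold for p in predicted_power]
    # Assumption: the ground truth is thresholded the same way as the prediction.
    true_on = [g > threshold for g in ground_truth_power]

    tp = sum(p and t for p, t in zip(pred_on, true_on))
    fp = sum(p and not t for p, t in zip(pred_on, true_on))
    fn = sum(t and not p for p, t in zip(pred_on, true_on))

    if tp == 0:
        # Precision and recall are both zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up readings for one appliance over five time steps.
predicted = [0, 10, 2000, 3, 1800]
truth = [0, 0, 1900, 0, 0]
print(f1_on_off(predicted, truth, threshold=5))
```

In this toy example the prediction finds the single true ON period but also raises two false alarms, which gives a precision of 1/3, a recall of 1 and an F1 score of about 0.5.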

At the individual appliance level, the F-scores of the CO and FHMM algorithms compare as follows. CO performed noticeably better for the kettle, with an F-score of 0.016418 against 0.009945 for FHMM. It also showed a slightly higher F-score for the electric space heater and the laptop computer, though the margin is small. In contrast, FHMM reached an F-score of 0.091067 for the projector versus 0.087560 for CO, but the difference is very small. Thus, we conclude that Combinatorial Optimisation performed better.

We chose the classification threshold of 5 empirically, since a threshold of 0 gave inadequate F-scores:

| Appliance | CO | FHMM |
| --- | --- | --- |
| Electric space heater | 0.189193 | 1.000000 |
| Kettle | 0.396679 | 0.912690 |
| Laptop computer | 0.397019 | 0.375311 |
| Projector | 0.088132 | 0.091067 |