# Exploratory Data Analysis (EDA) with NILMTK API

Exploratory data analysis (EDA) is used here to analyze and investigate the UK-DALE dataset and to summarize its main characteristics, often with data visualization methods. It helps determine how best to manipulate data sources to get the answers you need, making it easier to discover patterns, spot anomalies, test a hypothesis, or check assumptions. This notebook also explores more API functions from NILMTK.

- Created Date : 16/4/2022
- Updated Date : 18/4/2022
- Author : KK Yong

**References:**
- [NILMTK API documentation](http://nilmtk.github.io/nilmtk/master/index.html)
- N. Batra et al., “[A demonstration of reproducible state-of-the-art energy disaggregation using NILMTK](https://nipunbatra.github.io/papers/2019/batra_buildsys19demo.pdf)” in Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Nov. 2019, pp. 358–359, doi: 10.1145/3360322.3360999.
- J. Kelly et al., “[NILMTK v0.2: a non-intrusive load monitoring toolkit for large scale data sets](https://arxiv.org/pdf/1409.5908.pdf)” in Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings, Nov. 2014, pp. 182–183, doi: 10.1145/2674061.2675024.


# Initialization for Python and NILMTK

Let's start processing and analyzing the data with Python.

In [None]:
import dateutil
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from datetime import datetime

import nilmtk as ntk
import util as ut

## Define constants and global variables

In [None]:
plt.rcParams['figure.figsize'] = [15, 10]
RAW_FILENAME = "../Dataset/ukdale.h5"

START_TS = '2013-08-01 00:00:00'
END_TS = '2013-08-31 23:59:59'

## Initialize NILMTK and load data for House/Building 1

In [None]:
# Create object for ukdale
ukdale = ntk.DataSet(RAW_FILENAME)

# Set the time window of the data to extract
ukdale.set_window(start=START_TS, end=END_TS)

# Create object for house data
house_data = ukdale.buildings[1].elec

In [None]:
house_data.mains()

In [None]:
house_data.mains().power_series_all_data().plot.line()

# Using NILMTK APIs to perform statistical analysis

This shows the proportion of the mains energy that is captured by the submeters in house/building 1.

In [None]:
house_data.proportion_of_energy_submetered()
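
As a rough illustration of what this ratio means, the sketch below divides the summed submeter energy by the mains energy. The values and names (`mains_kwh`, `submeter_kwh`, `proportion_submetered`) are hypothetical, and this is not NILMTK's implementation, which may also need to handle nested meters and gaps in the mains data.

```python
# Hypothetical energy totals in kWh (illustrative values, not taken from UK-DALE)
mains_kwh = 300.0
submeter_kwh = {"fridge": 30.0, "kettle": 12.0, "washing machine": 18.0}


def proportion_submetered(mains, submeters):
    """Return the share of the mains energy that the submeters account for."""
    if mains <= 0:
        raise ValueError("mains energy must be positive")
    return sum(submeters.values()) / mains


proportion = proportion_submetered(mains_kwh, submeter_kwh)
print(f"{proportion:.1%} of the mains energy is submetered")
```

A value well below 1 suggests that a large share of the household load is not individually metered.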

### Get information on the type of power (active, apparent and reactive) for the 'house_data' object

In [None]:
house_data.available_ac_types('power')

In [None]:
house_data.mains().available_ac_types('power')

In [None]:
house_data.submeters().available_ac_types('power')

### Execute NILMTK statistical APIs

In [None]:
# Total energy, returned in kWh

house_data.mains().total_energy() 
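
To make the kWh unit concrete, here is a minimal sketch of how power samples can be turned into energy. It assumes each reading is held constant until the next timestamp; the sample data and the helper `energy_kwh` are hypothetical, and NILMTK's own calculation may differ in detail.

```python
# Hypothetical power readings as (seconds since start, watts)
samples = [(0, 100.0), (6, 100.0), (12, 200.0), (18, 200.0)]


def energy_kwh(samples):
    """Approximate energy by holding each reading until the next timestamp."""
    joules = 0.0
    for (t0, watts), (t1, _) in zip(samples, samples[1:]):
        joules += watts * (t1 - t0)  # watts * seconds = joules
    return joules / 3.6e6  # 1 kWh = 3.6 million joules


print(energy_kwh(samples))
```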

In [None]:
# Energy use per submeter

house_data.submeters().energy_per_meter()

***Notes:***

The **column headings** are the ElecMeter instance numbers. Try "**print(house_data)**" or other APIs to explore further.

The function fraction_per_meter does the same thing as energy_per_meter but returns the fraction of energy per meter.
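
The relationship between the two results can be sketched in plain Python: each fraction is a meter's energy divided by the total over all meters. The meter instance numbers and kWh values below are hypothetical.

```python
# Hypothetical mapping of meter instance number -> energy in kWh
energy = {1: 30.0, 2: 12.0, 3: 18.0}

total = sum(energy.values())

# Each meter's share of the total energy; the shares sum to 1
fraction = {meter: kwh / total for meter, kwh in energy.items()}

for meter, share in fraction.items():
    print(f"meter {meter}: {share:.0%}")
```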

### Select meters on the basis of their energy consumption
Let's find the ElecMeters which used more than 10 kWh; their instance numbers can then be used to build a new MeterGroup:

In [None]:
# energy_per_meter is a DataFrame where each row is a 
# power type ('active', 'reactive' or 'apparent').
# The appliance meters are expected to record 'active' power
# (check with available_ac_types above), so just select the 'active' row:
energy_per_meter = house_data.submeters().energy_per_meter()

energy_per_meter = energy_per_meter.loc['active']
more_than_10 = energy_per_meter[energy_per_meter > 10]
more_than_10

In [None]:
instances = more_than_10.index
instances

## Plot fraction of energy consumption of each appliance

In [None]:
# Remove Null records
fraction = house_data.submeters().fraction_per_meter().dropna()

In [None]:
# Create convenient labels
labels = house_data.get_labels(fraction.index)
plt.figure(figsize=(10,30))
fraction.plot(kind='pie', labels=labels);

### Enhancing Data Visualization

In [None]:
# Sort the values in descending order
fraction_sorted = fraction.sort_values(ascending=False)

# Create a Series for the top 15 appliances
fraction_top = fraction_sorted.head(15)

# Sum up the fraction of all the other appliances
others_val = 1 - fraction_top.sum()
fraction_top

# Create Pie Chart

labels = house_data.get_labels(fraction_top.index)
labels.append('Others')

fraction_top['Others'] = others_val
plt.figure(figsize=(10,30))
fraction_top.plot(kind='pie', labels=labels, 
                  title="Top 15 Appliances by Energy Consumption", 
                  autopct='%1.1f%%', label="");
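
The "top N plus Others" grouping used above can be sketched independently of pandas and NILMTK. The helper `top_n_with_others` and the fractions are hypothetical; the sketch sums the remainder directly, which matches `1 - fraction_top.sum()` only when the fractions add up to 1.

```python
def top_n_with_others(fractions, n):
    """Keep the n largest fractions and fold the remainder into 'Others'."""
    ranked = sorted(fractions.items(), key=lambda item: item[1], reverse=True)
    top = dict(ranked[:n])
    remainder = sum(value for _, value in ranked[n:])
    if remainder > 0:
        top["Others"] = remainder
    return top


# Hypothetical energy fractions that sum to 1
fractions = {"fridge": 0.40, "kettle": 0.25, "tv": 0.20, "lamp": 0.10, "radio": 0.05}
grouped = top_n_with_others(fractions, n=3)
print(grouped)
```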

In [None]:
# Store the remaining ('Others') appliances
fraction_others = fraction_sorted.iloc[15:]

# Replace the index with appliance names so the bar chart x-axis shows them
idx_labels = house_data.get_labels(fraction_others.index)
fraction_others.index = idx_labels

# Plot bar chart; multiply by 100 so the values match the '%' in the title
(fraction_others * 100).plot(kind='bar', title="Others Fraction of Appliance - Value is in %")

### Wiring Diagram

The wiring diagram gives a quick view of nested MeterGroups and their categories. However, if there are a lot of meters, the diagram becomes unreadable.

If the meter hierarchy has two or three levels, you can use an API such as **"house_data.meters_directly_downstream_of_mains()"** instead.
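
The idea of "directly downstream" can be illustrated with a small parent/child mapping. The wiring below and the helper `directly_downstream` are hypothetical and only sketch the concept; they are not how NILMTK stores its wiring graph.

```python
# Hypothetical wiring: meter -> its upstream meter (None for the mains)
upstream_of = {
    "mains": None,
    "kitchen ring": "mains",
    "boiler": "mains",
    "fridge": "kitchen ring",
    "kettle": "kitchen ring",
}


def directly_downstream(parent, upstream_of):
    """Return the meters whose immediate upstream meter is `parent`."""
    return sorted(child for child, up in upstream_of.items() if up == parent)


print(directly_downstream("mains", upstream_of))
print(directly_downstream("kitchen ring", upstream_of))
```

In this sketch the fridge and kettle are two levels below the mains, so they are not listed as directly downstream of it.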

In [None]:
house_data.draw_wiring_graph()

### Plot appliances when they are in use

In [None]:
house_data.plot_when_on(on_power_threshold=40)
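
The role of the threshold can be sketched as follows: a reading counts as "on" when its power reaches the threshold, and consecutive "on" readings form one period. The sample data and the helper `on_periods` are hypothetical, and treating a reading exactly at the threshold as "on" is an assumption of this sketch.

```python
# Hypothetical readings as (seconds since start, watts)
samples = [(0, 5.0), (6, 80.0), (12, 85.0), (18, 3.0), (24, 90.0)]


def on_periods(samples, on_power_threshold):
    """Return (first, last) timestamps of each run of readings at or above the threshold."""
    periods = []
    start = None
    last_on = None
    for timestamp, watts in samples:
        if watts >= on_power_threshold:
            if start is None:
                start = timestamp  # a new 'on' run begins
            last_on = timestamp
        elif start is not None:
            periods.append((start, last_on))  # the run ended at the previous 'on' reading
            start = None
    if start is not None:
        periods.append((start, last_on))  # close a run that lasts until the end
    return periods


print(on_periods(samples, on_power_threshold=40))
```

Raising the threshold filters out standby consumption, while lowering it may count standby as "in use".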

## Stats and info for individual meters - e.g. Fridge

The ElecMeter class represents a single electricity meter. Each ElecMeter has a list of associated Appliance objects. ElecMeter has many of the same stats methods as MeterGroup, such as total_energy, available_ac_types, power_series and power_series_all_data. We will now explore some more stats functions (many of which are also available on MeterGroup).

In [None]:
fridge_meter = house_data['fridge']

In [None]:
# Get upstream meter

fridge_meter.upstream_meter() # happens to be the mains meter group!

In [None]:
# Metadata about the class of meter

fridge_meter.device

In [None]:
# Dominant appliance
#
# If the metadata specifies that a meter has multiple appliances connected to it, then one 
# of those can be specified as the 'dominant' appliance, and this appliance can be retrieved 
# with this method

fridge_meter.dominant_appliance()

In [None]:
# Total energy

fridge_meter.total_energy() # kWh

In [None]:
# Get good sections
# If we plot the raw power data, we may see large gaps where, presumably, 
# the metering system was not working (zooming in may reveal many 
# smaller gaps too):

fridge_meter.plot()

In [None]:
# We can automatically identify the 'good sections' (i.e. the sections where every pair of consecutive 
# samples is separated by less than the max_sample_period specified in the dataset metadata):

good_sections = fridge_meter.good_sections(full_results=True)

# specifying full_results=False would give us a simple list of 
# TimeFrames.  But we want the full GoodSectionsResults object so we can
# plot the good sections...

good_sections.plot()

# The blue chunks show where the data is good. A white gap between blue chunks 
# corresponds to a large gap in the raw power data. There may also be many 
# smaller gaps that we cannot see at this zoom level.
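
The idea behind good sections can be sketched on a plain list of timestamps: a new section starts whenever the gap between consecutive samples exceeds `max_sample_period`. The helper below is a hypothetical illustration, not NILMTK's implementation, and it treats a gap exactly equal to the limit as still good, which is an assumption.

```python
def good_sections(timestamps, max_sample_period):
    """Split sorted timestamps into (start, end) runs with no gap above the limit."""
    if not timestamps:
        return []
    sections = []
    start = previous = timestamps[0]
    for current in timestamps[1:]:
        if current - previous > max_sample_period:
            sections.append((start, previous))  # close the section before the gap
            start = current
        previous = current
    sections.append((start, previous))  # close the final section
    return sections


# Hypothetical sample times in seconds, with one large gap after 18 s
timestamps = [0, 6, 12, 18, 300, 306, 312]
sections = good_sections(timestamps, max_sample_period=20)
print(sections)
```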

# Your Summary and Findings

Your work should cover the analysis of spotted anomalies and the testing of a hypothesis, with a list of questions and assumptions to drive the next data modelling work.

- Created Date: ??
- Updated Date: ??

**Findings:**
- ?
- ?
- ?