# EDA (on per-appliance load) (Rough work)

This is a rough-work script for determining clipping thresholds for per-appliance load data. Several papers apply load clipping to per-appliance data, because sharp load spikes in that data act as noise.

We look at UK-DALE house 1 data to find a suitable load-clipping value for each appliance by inspecting the percentiles of its load distribution.
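
As a minimal sketch of what load clipping does, the toy series below stands in for one appliance channel. The values and the clipping value of 350 W are assumptions chosen for illustration, not results from the dataset.

```python
import pandas as pd

# Toy per-appliance load in watts; the 3900 W sample stands in for a sharp spike.
load = pd.Series([0, 85, 90, 88, 3900, 87, 0, 92], name="fridge")

# Hypothetical clipping value, for illustration only.
clip_value = 350

# Series.clip caps every value above `upper` at `upper` and leaves the rest unchanged.
clipped = load.clip(upper=clip_value)

print(clipped.tolist())  # [0, 85, 90, 88, 350, 87, 0, 92]
```

Only the spike is altered, so the normal operating range of the appliance is preserved.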

---

Consider UK-DALE house 1 data first.

Select a suitable range of the data.

The result should be a DataFrame with a timestamp column and the resulting load.
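
One possible way to select a range is sketched below. It assumes the frame is indexed by Unix timestamps in seconds; `select_range` and the toy values are hypothetical and not part of this notebook.

```python
import pandas as pd

def select_range(df, start, end):
    """Return rows whose Unix-second index lies in [start, end), with timestamp as a column."""
    start_ts = int(pd.Timestamp(start, tz="UTC").timestamp())
    end_ts = int(pd.Timestamp(end, tz="UTC").timestamp())
    mask = (df.index >= start_ts) & (df.index < end_ts)
    # reset_index turns the 'timestamp' index back into an ordinary column.
    return df.loc[mask].reset_index()

# Toy frame: one sample just before 2013-01-01 UTC, two inside that day, one on the next day.
toy = pd.DataFrame(
    {"kettle": [0, 2300, 2310, 0]},
    index=pd.Index([1356998394, 1356998400, 1356998406, 1357084800], name="timestamp"),
)

selected = select_range(toy, "2013-01-01", "2013-01-02")
print(selected)
```

The half-open interval avoids counting a boundary sample twice when consecutive ranges are selected.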

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import os

In [2]:
house_1_folder = Path(Path(os.getcwd()).parent, 'datasets', 'UK-DALE-FULL-disaggregated', 'ukdale', 'house_1')

In [3]:
# labels.dat holds one "<channel number> <label>" pair per line; keep only the label
with open(house_1_folder / 'labels.dat') as f:
    labels = [line.strip().split(' ')[1] for line in f]

In [4]:
labels

['aggregate',
 'boiler',
 'solar_thermal_pump',
 'laptop',
 'washing_machine',
 'dishwasher',
 'tv',
 'kitchen_lights',
 'htpc',
 'kettle',
 'toaster',
 'fridge',
 'microwave',
 'lcd_office',
 'hifi_office',
 'breadmaker',
 'amp_livingroom',
 'adsl_router',
 'livingroom_s_lamp',
 'soldering_iron',
 'gigE_&_USBhub',
 'hoover',
 'kitchen_dt_lamp',
 'bedroom_ds_lamp',
 'lighting_circuit',
 'livingroom_s_lamp2',
 'iPad_charger',
 'subwoofer_livingroom',
 'livingroom_lamp_tv',
 'DAB_radio_livingroom',
 'kitchen_lamp2',
 'kitchen_phone&stereo',
 'utilityrm_lamp',
 'samsung_charger',
 'bedroom_d_lamp',
 'coffee_machine',
 'kitchen_radio',
 'bedroom_chargers',
 'hair_dryer',
 'straighteners',
 'iron',
 'gas_oven',
 'data_logger_pc',
 'childs_table_lamp',
 'childs_ds_lamp',
 'baby_monitor_tx',
 'battery_charger',
 'office_lamp1',
 'office_lamp2',
 'office_lamp3',
 'office_pc',
 'office_fan',
 'LED_printer']

In [5]:
concerned_labels = [
    'aggregate',
    'washing_machine',
    'dishwasher',
    'fridge',
    'kettle',
    'microwave',
    'toaster',

    'tv',
    'htpc',
    'kitchen_radio',
    'gas_oven',

    'kitchen_lights',
]

In [6]:
# For each label, read the corresponding channel_<n>.dat file (space-separated: timestamp, load)
# and keep one DataFrame per appliance in `data`, indexed by timestamp

data = {}
for label in concerned_labels:
    i = labels.index(label)
    file_path = house_1_folder / f'channel_{i+1}.dat'
    if file_path.exists():
        df = pd.read_csv(file_path, sep=' ', header=None)
        df.columns = ['timestamp', label]
        df.set_index('timestamp', inplace=True)     # for concat
        data[label] = df
    else:
        print(f"File {file_path} does not exist.")
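
If a single combined DataFrame is wanted, the per-appliance frames could be joined on their timestamp index. This is a sketch on toy frames that stand in for the entries of `data`; the timestamps and values are made up.

```python
import pandas as pd

# Toy stand-ins for the entries of `data`: one single-column frame per label,
# indexed by timestamp.
toy = {
    "kettle": pd.DataFrame({"kettle": [0, 2300]}, index=pd.Index([100, 106], name="timestamp")),
    "toaster": pd.DataFrame({"toaster": [1500, 0]}, index=pd.Index([106, 112], name="timestamp")),
}

# axis=1 places each appliance in its own column; the default outer join keeps
# every timestamp and leaves NaN where a channel has no reading.
combined = pd.concat(list(toy.values()), axis=1).sort_index()

print(combined)
```

Channels are sampled at slightly different times, so the combined frame would likely contain many NaN values unless the channels are resampled to a common grid first.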


Define each appliance's on-power threshold, and calculate the mean and standard deviation of its power while in use.

The percentiles can then be inspected to choose the noise threshold.
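
A minimal sketch of the on-state statistics, assuming "in use" means the load is strictly above the on-power threshold. `on_state_stats` and the toy values are hypothetical.

```python
import pandas as pd

def on_state_stats(series, on_threshold):
    """Mean and sample standard deviation of the load while above the on-power threshold."""
    on = series[series > on_threshold]
    return {"count": int(on.count()), "mean": float(on.mean()), "std": float(on.std())}

# Toy kettle-like load in watts: three on-state samples between off-state readings.
toy = pd.Series([0, 1, 2300, 2340, 2380, 0])
stats = on_state_stats(toy, on_threshold=200)

print(stats)
```

Note that `Series.std` uses the sample standard deviation (`ddof=1`) by default.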

In [7]:
def describe_data(label, threshold=0):
    """Print summary statistics for one appliance, using only samples with load > threshold."""
    df = data[label]
    df = df[df[label] > threshold]  # keep only rows with load > threshold
    print(f"Data for {label}:")
    print(df.describe(percentiles=[0.001, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999]))

Washing Machine

In [42]:
# filter the data to only keep rows with load > 0

describe_data('washing_machine')

Data for washing_machine:
       washing_machine
count     1.098198e+06
mean      4.944810e+02
std       7.572846e+02
min       1.000000e+00
0.1%      1.000000e+00
1%        2.000000e+00
5%        3.000000e+00
10%       7.000000e+00
25%       2.800000e+01
50%       1.580000e+02
75%       3.230000e+02
90%       1.973000e+03
95%       2.169000e+03
99%       2.256000e+03
99.9%     2.348803e+03
max       3.999000e+03


Dishwasher

In [58]:
describe_data('dishwasher', threshold=2)

Data for dishwasher:
          dishwasher
count  551766.000000
mean      955.004127
std      1081.986313
min         3.000000
0.1%        6.000000
1%          6.000000
5%         25.000000
10%       116.000000
25%       118.000000
50%       121.000000
75%      2317.000000
90%      2352.000000
95%      2367.000000
99%      2395.000000
99.9%    2741.000000
max      3973.000000


Fridge

In [68]:
describe_data('fridge', 18)

Data for fridge:
             fridge
count  8.174812e+06
mean   9.507820e+01
std    3.979155e+01
min    1.900000e+01
0.1%   7.400000e+01
1%     8.000000e+01
5%     8.400000e+01
10%    8.500000e+01
25%    8.700000e+01
50%    8.900000e+01
75%    9.200000e+01
90%    9.600000e+01
95%    1.030000e+02
99%    2.510000e+02
99.9%  3.440000e+02
max    3.323000e+03


Kettle

In [80]:
describe_data('kettle', 60)

Data for kettle:
              kettle
count  130149.000000
mean     2287.559451
std       321.394361
min        61.000000
0.1%       90.148000
1%        461.000000
5%       2245.000000
10%      2284.000000
25%      2318.000000
50%      2347.000000
75%      2372.000000
90%      2395.000000
95%      2411.000000
99%      2459.520000
99.9%    2691.852000
max      3948.000000


Microwave

In [79]:
describe_data('microwave', 60)

Data for microwave:
           microwave
count  101779.000000
mean     1400.437143
std       397.840407
min        61.000000
0.1%       66.000000
1%        154.000000
5%        241.000000
10%      1058.800000
25%      1408.000000
50%      1542.000000
75%      1586.000000
90%      1609.000000
95%      1620.000000
99%      1644.000000
99.9%    2854.000000
max      3267.000000


Toaster

In [147]:
describe_data('toaster', 15)

Data for toaster:
             toaster
count  105501.000000
mean     1548.904323
std       268.400537
min        16.000000
0.1%       30.000000
1%         55.000000
5%       1515.000000
10%      1535.000000
25%      1557.000000
50%      1576.000000
75%      1593.000000
90%      1608.000000
95%      1619.000000
99%      2350.000000
99.9%    2783.000000
max      3627.000000


TV

In [110]:
describe_data('tv', 12)

Data for tv:
                 tv
count  2.273942e+06
mean   1.059988e+02
std    1.910138e+01
min    1.300000e+01
0.1%   7.500000e+01
1%     8.600000e+01
5%     8.700000e+01
10%    8.800000e+01
25%    9.400000e+01
50%    1.030000e+02
75%    1.170000e+02
90%    1.290000e+02
95%    1.300000e+02
99%    1.330000e+02
99.9%  1.480000e+02
max    3.109000e+03


HTPC

In [138]:
describe_data('htpc', 400)

Data for htpc:
        htpc
count    1.0
mean   514.0
std      NaN
min    514.0
0.1%   514.0
1%     514.0
5%     514.0
10%    514.0
25%    514.0
50%    514.0
75%    514.0
90%    514.0
95%    514.0
99%    514.0
99.9%  514.0
max    514.0


Kitchen radio

In [9]:
describe_data('kitchen_radio', 3)

Data for kitchen_radio:
       kitchen_radio
count     442.000000
mean       66.124434
std       213.365156
min         4.000000
0.1%        4.000000
1%          4.000000
5%          4.000000
10%         4.100000
25%         5.000000
50%         5.000000
75%         5.000000
90%       129.000000
95%       369.350000
99%      1025.000000
99.9%    2049.000000
max      2049.000000


Gas oven

In [121]:
describe_data('gas_oven', 700)

Data for gas_oven:
          gas_oven
count     3.000000
mean   1041.333333
std      24.826062
min    1027.000000
0.1%   1027.000000
1%     1027.000000
5%     1027.000000
10%    1027.000000
25%    1027.000000
50%    1027.000000
75%    1048.500000
90%    1061.400000
95%    1065.700000
99%    1069.140000
99.9%  1069.914000
max    1070.000000


Kitchen lights

In [133]:
describe_data('kitchen_lights', 430)

Data for kitchen_lights:
       kitchen_lights
count     2638.000000
mean       437.013268
std         18.177761
min        431.000000
0.1%       431.000000
1%         432.000000
5%         432.000000
10%        432.000000
25%        433.000000
50%        435.000000
75%        438.000000
90%        445.000000
95%        448.000000
99%        456.630000
99.9%      468.726000
max       1325.000000


Table recording the on-power threshold, noise threshold, and clipping threshold for each appliance.

Some of the on-power thresholds are taken from:

```code
Geng, Z., Yang, L., & Yu, W. (2025). A diffusion model-based framework to enhance the robustness of non-intrusive load disaggregation. Energy. https://doi.org/10.1016/j.energy.2025.135423

Sykiotis, S., Kaselimi, M., Doulamis, A., & Doulamis, N. (2022). ELECTRIcity: An Efficient Transformer for Non-Intrusive Load Monitoring. Sensors, 22(8), 2926. https://doi.org/10.3390/s22082926
```

| Appliance       | On-power threshold (W) | Noise threshold (W) | Clipping threshold, max limit (W) |
|-----------------|------------------------|---------------------|-----------------------------------|
| washing_machine | 20                     | 5                   | 2500                              |
| dishwasher      | 10                     | 5                   | 2500                              |
| fridge          | 50                     | 18                  | 350                               |
| kettle          | 200                    | 30                  | 3100                              |
| microwave       | 200                    | 30                  | 3000                              |
| toaster         | 150                    | 15                  | 3000                              |
| tv              | 70                     | 12                  | 250                               |
| htpc            | 50                     | 10                  | 400                               |
| kitchen_radio   | 15                     | 10                  | 200                               |
| gas_oven        | 50                     | 5                   | 1000                              |
| kitchen_lights  | 60                     | 10                  | 430                               |
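
The thresholds in the table could be applied roughly as sketched below. How each threshold is used is an assumption here: readings at or below the noise threshold are set to zero, readings above the clipping threshold are capped, and the appliance counts as on when the cleaned load exceeds the on-power threshold. `THRESHOLDS` and `preprocess` are hypothetical names, and only two rows of the table are copied in.

```python
import pandas as pd

# Two rows of the table above: (on-power, noise, clipping) thresholds.
THRESHOLDS = {
    "fridge": (50, 18, 350),
    "kettle": (200, 30, 3100),
}

def preprocess(series, label):
    """Zero out noise, cap spikes, and derive an on/off status for one appliance."""
    on_power, noise, clip_max = THRESHOLDS[label]
    # Assumption: readings at or below the noise threshold are treated as zero load.
    cleaned = series.where(series > noise, 0)
    # Cap spikes at the clipping threshold.
    cleaned = cleaned.clip(upper=clip_max)
    # Assumption: the appliance is "on" when the cleaned load exceeds the on-power threshold.
    status = cleaned > on_power
    return cleaned, status

# Toy fridge load: standby noise, a normal compressor cycle, and one spike.
fridge = pd.Series([0, 10, 40, 88, 3323, 90])
cleaned, status = preprocess(fridge, "fridge")

print(cleaned.tolist())  # [0, 0, 40, 88, 350, 90]
print(status.tolist())   # [False, False, False, True, True, True]
```

Zeroing noise before clipping keeps the two steps independent, since the noise threshold is always far below the clipping threshold in the table.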


Date: 01/07/2025

---