# Generating Fake Appliance Energy Data
Having tried our best with Paul's data, we were left with the option of using the London Kaggle data, because that dataset is larger. It also includes information on the types of houses, which can be used to make predictions based on groupings of similar houses.
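One simple way to use the house groupings is to average the half-hourly readings of all households in the same Acorn group and treat that mean profile as the expected consumption of any household in the group. This is a minimal sketch under that assumption, not necessarily the prediction method used later. The column names (`Acorn`, `DateTime`, `KWh`) match the merged DataFrame built below; the helper `acorn_mean_profile` and the toy values are hypothetical.

```python
import pandas as pd

def acorn_mean_profile(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean consumption per Acorn group for each time of day."""
    out = frame.copy()
    out["DateTime"] = pd.to_datetime(out["DateTime"])
    # Use the time of day (e.g. 14:30) so readings from different dates line up
    out["TimeOfDay"] = out["DateTime"].dt.time
    # Average across all households and dates within each Acorn group,
    # then pivot so each Acorn group becomes its own column
    return (out.groupby(["Acorn", "TimeOfDay"])["KWh"]
               .mean()
               .unstack("Acorn"))

# Toy data: H1 and H2 share Acorn group A, H3 is in group C
toy = pd.DataFrame({
    "LCLid": ["H1", "H1", "H2", "H2", "H3", "H3"],
    "DateTime": ["2012-03-06 14:00:00", "2012-03-06 14:30:00"] * 3,
    "KWh": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0],
    "Acorn": ["A", "A", "A", "A", "C", "C"],
})
profile = acorn_mean_profile(toy)
print(profile)
```

Grouping by time of day rather than full timestamp lets profiles from different dates be compared; a real model would likely also separate weekdays, weekends, and seasons.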

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
from datetime import datetime, timedelta

In [2]:
# Read and preprocess the london_sample.csv file
df = pd.read_csv("london_sample.csv")
df = df.drop(columns='Unnamed: 0')
df.columns = ['LCLid', 'DateTime', 'KWh']

# Remove rows with 'Null' values in the 'KWh' column
df = df[df.KWh != 'Null']

# Convert the 'KWh' column to float
df['KWh'] = df['KWh'].astype(float)

# Read and preprocess the informations_households.csv file
house_info = pd.read_csv('archive/informations_households.csv')
house_info.Acorn = house_info.Acorn.str.replace('ACORN-', "")


# Read the colon-separated Acorn Descriptions v2.csv file describing each Acorn group
AcornDesc = pd.read_csv('Acorn Descriptions v2.csv', sep=':')

# Merge the energy consumption data with the household information
df_households = pd.merge(df, house_info, on='LCLid')

# Merge the resulting DataFrame with the Acorn descriptions DataFrame on the 'Acorn' column
df_households_acorn_desc = pd.merge(df_households, AcornDesc, on='Acorn')

# Drop the columns not needed for the analysis
df_households_acorn_desc = df_households_acorn_desc.drop(columns=['stdorToU', 'Acorn_grouped', 'file'])

# Display the resulting DataFrame
print(df_households_acorn_desc)
# Convert the 'DateTime' column to a pandas datetime object
df_households_acorn_desc['DateTime'] = pd.to_datetime(df_households_acorn_desc['DateTime'])
df_households_acorn_desc.to_csv('London_House_data.csv', index=False)



             LCLid                   DateTime        KWh Acorn  \
0        MAC000323  2012-03-06 14:00:00+00:00   488.0000     A   
1        MAC000323  2012-03-06 14:30:00+00:00   449.0000     A   
2        MAC000323  2012-03-06 15:00:00+00:00   424.0000     A   
3        MAC000323  2012-03-06 15:30:00+00:00   439.0000     A   
4        MAC000323  2012-03-06 16:00:00+00:00   291.0000     A   
...            ...                        ...        ...   ...   
7270639  MAC005509  2014-02-27 22:00:00+00:00   697.0000     C   
7270640  MAC005509  2014-02-27 22:30:00+00:00  1496.0001     C   
7270641  MAC005509  2014-02-27 23:00:00+00:00  1092.0000     C   
7270642  MAC005509  2014-02-27 23:30:00+00:00  1002.0000     C   
7270643  MAC005509  2014-02-28 00:00:00+00:00   282.0000     C   

                House Type  Amount of Beds  House Value   \
0         Detached Houses              5+            1m+   
1         Detached Houses              5+            1m+   
2         Detached Houses  
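The `df.KWh != 'Null'` filter only removes readings that are exactly the string `'Null'`. A slightly more defensive sketch, assuming the raw column might also contain empty strings or other non-numeric placeholders, coerces every unparseable value to `NaN` with `pd.to_numeric(..., errors="coerce")` and then drops those rows. The helper name `clean_kwh` and the sample rows are hypothetical.

```python
import pandas as pd

def clean_kwh(frame: pd.DataFrame, column: str = "KWh") -> pd.DataFrame:
    """Hypothetical helper: coerce a reading column to float and drop unparseable rows."""
    out = frame.copy()
    # Any value that cannot be parsed as a number (e.g. 'Null', '') becomes NaN
    out[column] = pd.to_numeric(out[column], errors="coerce")
    # Drop the rows whose reading could not be parsed and renumber the index
    return out.dropna(subset=[column]).reset_index(drop=True)

raw = pd.DataFrame({
    "LCLid": ["MAC000323"] * 4,
    "DateTime": ["2012-03-06 14:00:00", "2012-03-06 14:30:00",
                 "2012-03-06 15:00:00", "2012-03-06 15:30:00"],
    "KWh": ["488.0", "Null", "", "424.0"],
})
cleaned = clean_kwh(raw)
print(cleaned["KWh"].tolist())  # [488.0, 424.0]
```

This replaces both the string comparison and the separate `astype(float)` step with a single conversion.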

## Generating the Fake Data
The toolkit requires the energy consumption of each appliance separately, which neither the Kaggle data nor Paul's data provides. We therefore generate some fake data so that we can try out the toolkit.

In [5]:
# Define common appliances
appliances = ['Fridge', 'Washing Machine', 'TV', 'Oven', 'Air Conditioner']

# Define realistic power-draw ranges (in watts) for each appliance
consumption_ranges = {
    'Fridge': (100, 200),
    'Washing Machine': (500, 1500),
    'TV': (50, 300),
    'Oven': (1000, 5000),
    'Air Conditioner': (1000, 3500)
}

# Function to generate random DateTime within a specified range
def random_datetime(start, end):
    total_seconds = int((end - start).total_seconds())
    random_seconds = random.randint(0, total_seconds)
    return start + timedelta(seconds=random_seconds)

# Set DateTime range
min_datetime = datetime.strptime("2011-12-06 15:30:00+00:00", "%Y-%m-%d %H:%M:%S%z")
max_datetime = datetime.strptime("2014-02-28 00:00:00+00:00", "%Y-%m-%d %H:%M:%S%z")

# Generate fake data
num_rows = 1000  # Adjust this value according to your needs
data = []

for _ in range(num_rows):
    dt = random_datetime(min_datetime, max_datetime)
    consumption = {appliance: random.uniform(*consumption_ranges[appliance]) for appliance in appliances}
    data.append([dt, *consumption.values()])

# Create a Pandas DataFrame
df = pd.DataFrame(data, columns=['DateTime', *appliances])
print(df)
# Save the DataFrame to a CSV file
df.to_csv('fake_appliance_data.csv', index=False)


                     DateTime      Fridge  Washing Machine          TV  \
0   2013-01-05 07:40:11+00:00  140.717578      1199.616497  268.105619   
1   2013-06-09 01:03:07+00:00  160.390627      1328.789888  226.854459   
2   2012-07-13 10:57:19+00:00  117.402343       999.973534  170.604169   
3   2012-12-25 15:23:10+00:00  152.272234       999.406855  142.463900   
4   2012-05-05 10:43:06+00:00  149.502054       731.775460  178.945614   
..                        ...         ...              ...         ...   
995 2012-09-28 05:00:39+00:00  171.103585       929.379530  213.657181   
996 2014-01-10 21:38:38+00:00  138.565993      1292.892386  218.151523   
997 2012-07-17 00:28:33+00:00  149.128342      1238.027185  197.249036   
998 2013-01-04 15:48:36+00:00  199.699333      1497.313828  233.242279   
999 2012-11-10 07:41:39+00:00  111.105613       693.424326  190.337380   

            Oven  Air Conditioner  
0    4756.812327      1563.959972  
1    1210.766837      1095.792216  
2  
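The ranges above are power draws in watts, and the random timestamps fall at arbitrary seconds, whereas the London readings are half-hourly. If the fake data needs to line up with `London_House_data.csv`, one option (a sketch, not what the cell above does) is to sample timestamps from a 30-minute grid and convert each power draw to energy per slot with E (kWh) = P (W) × 0.5 h / 1000. It also uses a seeded NumPy `Generator` so the data is reproducible. The function name `fake_half_hourly` and the assumption that each appliance runs for the whole half-hour are hypothetical simplifications.

```python
import numpy as np
import pandas as pd

# Same power-draw ranges (watts) as in the cell above
consumption_ranges = {
    "Fridge": (100, 200),
    "Washing Machine": (500, 1500),
    "TV": (50, 300),
    "Oven": (1000, 5000),
    "Air Conditioner": (1000, 3500),
}

def fake_half_hourly(start: str, end: str, num_rows: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: random per-appliance energy (kWh) on a half-hourly grid."""
    rng = np.random.default_rng(seed)
    # Every half-hour slot in the range, matching the London data's resolution
    grid = pd.date_range(start, end, freq="30min", tz="UTC")
    # Pick distinct slots and keep them in time order
    idx = np.sort(rng.choice(grid.size, size=num_rows, replace=False))
    data = {"DateTime": grid[idx]}
    for appliance, (low, high) in consumption_ranges.items():
        watts = rng.uniform(low, high, size=num_rows)
        # Energy in one 30-minute slot, assuming the appliance runs the whole slot:
        # W * 0.5 h / 1000 = kWh
        data[appliance] = watts * 0.5 / 1000
    return pd.DataFrame(data)

fake = fake_half_hourly("2011-12-06 15:30", "2014-02-28 00:00", num_rows=1000, seed=42)
print(fake.head())
```

Sampling without replacement avoids duplicate timestamps, and reporting kWh per half-hour puts the fake appliance columns in the same kind of unit as a half-hourly meter reading.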