**Loading and preparing the data**
1. Load the data into a DataFrame
2. Convert the time column (Unix timestamp -> datetime)
3. Group by homeid (each household has its own time series)
4. Sort by time within each household

In [9]:
import pandas as pd
import numpy as np
import os
from pathlib import Path  # needed for the file path in load_processed_data()

In [10]:
def load_processed_data():
    """Load preprocessed sensor data from parquet file"""
    
    file_path = Path('..') / 'data' / 'processed' / 'final_processed_data3.parquet'
    
    if not file_path.exists():
        raise FileNotFoundError(f"Data file not found at {file_path}")
        
    # Load data
    df = pd.read_parquet(file_path)
    
    # Print validation info
    print("\nDataset loaded successfully:")
    print(f"Shape: {df.shape}")
    print(f"Homes: {df['homeid'].nunique()}")
    print(f"Date range: {df['timestamp_local'].min()} to {df['timestamp_local'].max()}")

    # Return the frame so later cells can use it
    return df

df = load_processed_data()

In [None]:
df

In [11]:
"""
Cell generated by Data Wrangler.
"""
def clean_data(df):
    # Convert Unix timestamp to datetime
    df['timestamp_local'] = pd.to_datetime(df['timestamp_local'], unit='ms')
    # Set timestamp_local as index
    df.set_index('timestamp_local', inplace=True)
    # Sort by homeid and timestamp_local
    df = df.sort_values(by=['homeid', 'timestamp_local'])
    return df

df_clean = clean_data(df.copy())
df_clean.head()

timestamp_local,homeid,sensorid_electric,electric_min_consumption,electric_max_consumption,std_consumption,electric_median_consumption,electric_total_consumption_Wh,sensorid_gas,gas_mean_consumption,gas_min_consumption,...,gas_total_consumption_Wh,sensorid,median_temperature,_room,sensorid_room,median_value,roomid,income_band_mid,education_map,measured_entity
2016-09-20 09:00:00,47,1216.0,0.069,0.335,0.033905,0.194,0.179807,1221.0,0.112,0.112,...,0.224,1186.0,24.75,979.8,1197.2,20.72,652.0,0.0,8.0,3
2016-09-20 10:00:00,47,1216.0,0.068875,0.458375,0.035875,0.187625,0.17669,1221.0,0.112,0.112,...,0.21,1186.0,24.39375,980.8,1197.2,20.695,652.0,0.0,8.0,3
2016-09-20 11:00:00,47,1216.0,0.06875,0.58175,0.037846,0.18125,0.173574,1221.0,0.112,0.112,...,0.196,1186.0,24.0375,981.8,1197.2,20.67,652.0,0.0,8.0,3
2016-09-20 12:00:00,47,1216.0,0.068625,0.705125,0.039817,0.174875,0.170457,1221.0,0.112,0.112,...,0.182,1186.0,23.68125,982.8,1197.2,20.645,652.0,0.0,8.0,3
2016-09-20 13:00:00,47,1216.0,0.0685,0.8285,0.041788,0.1685,0.16734,1221.0,0.112,0.112,...,0.168,1186.0,23.325,983.8,1197.2,20.62,652.0,0.0,8.0,3
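
Sorting by homeid and timestamp_local keeps each household's rows contiguous and in time order, which is what the "one time series per household" step relies on. A minimal sketch of how this could be checked; the helper name `check_household_order` and the toy values are assumptions for illustration and are not part of the original notebook:

```python
import pandas as pd


def check_household_order(df: pd.DataFrame, home_col: str = "homeid") -> pd.Series:
    """Return, per household, whether its DatetimeIndex is sorted ascending.

    Hypothetical helper: assumes the timestamp is the index, as after clean_data().
    """
    return pd.Series(
        {home: group.index.is_monotonic_increasing for home, group in df.groupby(home_col)}
    )


# Toy frame with two households (illustrative values only)
toy = pd.DataFrame(
    {
        "homeid": [47, 47, 59, 59],
        "electric_total_consumption_Wh": [0.18, 0.17, 0.30, 0.31],
    },
    index=pd.to_datetime(
        ["2016-09-20 09:00", "2016-09-20 10:00", "2016-09-20 09:00", "2016-09-20 10:00"]
    ),
)
toy.index.name = "timestamp_local"

ordered = check_household_order(toy)
print(ordered)
```

If any household reports False here, lag and rolling features computed later would mix up the time order for that household.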


**Feature engineering & data cleaning**
1. Cyclical transformation of time data (hour_sin, hour_cos for the hour of day)
2. Create lag features (for previous electricity and gas values)
3. Rolling-average features (e.g. a moving average over 3 or 7 time steps)
4. Normalize the data (min-max scaling for the LSTM)
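
The four steps above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the notebook's actual implementation: the helper names (`add_features`, `fit_min_max`, `apply_min_max`), the chosen lags and window sizes, and the toy data are all hypothetical. It also assumes the frame is indexed by timestamp_local and sorted per household, and it shifts the rolling window by one step so that each row only sees past values.

```python
import numpy as np
import pandas as pd


def add_features(df, target_cols, lags=(1, 2, 3), windows=(3, 7)):
    """Add cyclical hour, lag, and rolling-mean features per household (sketch)."""
    out = df.copy()
    new_cols = []

    # 1. Cyclical encoding: hour 23 and hour 0 end up close together
    hour = out.index.hour.to_numpy()
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)

    for col in target_cols:
        per_home = out.groupby("homeid")[col]
        # 2. Lag features, computed within each household only
        for lag in lags:
            name = f"{col}_lag{lag}"
            out[name] = per_home.shift(lag).to_numpy()
            new_cols.append(name)
        # 3. Rolling mean over the previous `w` steps (shifted to exclude the current row)
        for w in windows:
            name = f"{col}_roll{w}"
            rolled = per_home.transform(lambda s, w=w: s.shift(1).rolling(w).mean())
            out[name] = rolled.to_numpy()
            new_cols.append(name)

    # Rows without a full history have NaN in the new columns
    return out.dropna(subset=new_cols)


def fit_min_max(df, cols):
    """Compute per-column minimum and range (assumption: fit on training data only)."""
    lo = df[cols].min()
    span = (df[cols].max() - lo).replace(0, 1.0)  # guard against constant columns
    return lo, span


def apply_min_max(df, cols, lo, span):
    """4. Scale the given columns to [0, 1] using previously fitted parameters."""
    out = df.copy()
    out[cols] = (out[cols] - lo) / span
    return out


# Toy data: two households with eight hourly readings each (illustrative values)
idx = pd.date_range("2016-09-20 09:00", periods=8, freq=pd.Timedelta(hours=1))
col = "electric_total_consumption_Wh"
toy = pd.concat(
    [
        pd.DataFrame({"homeid": 47, col: np.arange(8, dtype=float)}, index=idx),
        pd.DataFrame({"homeid": 59, col: np.arange(8, dtype=float) * 10}, index=idx),
    ]
)
toy.index.name = "timestamp_local"

features = add_features(toy, [col], lags=(1,), windows=(3,))
lo, span = fit_min_max(features, [col])
scaled = apply_min_max(features, [col], lo, span)
print(scaled.head())
```

Grouping by homeid before shifting prevents the last readings of one household from leaking into the first lag values of the next. Fitting the scaling parameters on the training split and reusing them for validation and test data is a common way to avoid leakage, but whether the original notebook does so is not stated.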