# Merging the cleaned datasets

**Libraries**<br>
pandas:     data processing<br>
os:         operating-system functions for relative path handling<br>

In [65]:
import pandas as pd
import os

# Absolute path of the current working directory
dirname = os.path.abspath('')

# Cleaned weather data
df_path = os.path.join(dirname, '2019_Cleaned_Data_15min.csv')

# Raw load data, one file per building; the path components are passed separately
# so that no backslash separator is hard-coded into the string
path_building1 = os.path.join(dirname, 'Rohdaten_Last_Braunschweig', 'Building1.csv')
path_building2 = os.path.join(dirname, 'Rohdaten_Last_Braunschweig', 'Building2.csv')
path_building3 = os.path.join(dirname, 'Rohdaten_Last_Braunschweig', 'Building3.csv')
path_building4 = os.path.join(dirname, 'Rohdaten_Last_Braunschweig', 'Building4.csv')
path_building5 = os.path.join(dirname, 'Rohdaten_Last_Braunschweig', 'Building5.csv')

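**The 10-minute weather measurements are aggregated to hourly values**<br>
Whether sum() or mean() is used during resampling depends on the measured quantity: accumulated quantities such as precipitation or sunshine duration are summed, while state variables such as temperature or air pressure are averaged.
The DataFrames are then merged based on the "MESS_DATUM" timestamps.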
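
As a minimal illustration of the sum-versus-mean choice described above, the following sketch resamples a small synthetic frame. The values are made up for the example and are not taken from the project data; only the column names `RWS_10` and `TT_10` are borrowed from the notebook.

```python
import pandas as pd

# Synthetic 10-minute data covering two hours (illustrative values only)
idx = pd.date_range('2019-01-01 00:00', periods=12, freq='10min', tz='UTC')
demo = pd.DataFrame(
    {
        'RWS_10': [0.1] * 6 + [0.0] * 6,  # precipitation per interval: amounts add up
        'TT_10': [2.0] * 6 + [4.0] * 6,   # temperature: a state, so it is averaged
    },
    index=idx,
)

# One aggregation function per column, applied to each 1-hour bin
hourly = demo.resample('1h').agg({'RWS_10': 'sum', 'TT_10': 'mean'})
print(hourly)
```

Summing the temperature or averaging the precipitation would still run without error, which is why the choice has to be made per column rather than left to a single default.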

In [66]:
cleaned_df = pd.read_csv(df_path)

# Aggregation function for each column
aggregation_dict = {
    'RWS_DAU_10':   'sum',      # Precipitation duration (10-min measurements)
    'RWS_10':       'sum',      # Precipitation amount (height in mm, 10-min measurements)
    'DS_10':        'sum',      # Diffuse radiation (10-min measurements)
    'GS_10':        'sum',      # Global radiation (10-min measurements)
    'SD_10':        'sum',      # Sunshine duration (10-min measurements)
    'FF_10':        'mean',     # Mean wind speed
    'DD_10':        'mean',     # Mean wind direction
    'PP_10':        'mean',     # Air pressure at station level
    'TT_10':        'mean',     # Air temperature 2 m above ground
    'TM5_10':       'mean',     # Air temperature 5 cm above ground
    'RF_10':        'mean'      # Relative humidity
}

# Parse 'MESS_DATUM' as timezone-aware (UTC) datetimes
cleaned_df['MESS_DATUM'] = pd.to_datetime(cleaned_df['MESS_DATUM'], utc=True)
# Set the 'MESS_DATUM' column as the index (required for resampling)
cleaned_df.set_index('MESS_DATUM', inplace=True)

# Resample to 1-hour bins and aggregate each column with its function from aggregation_dict
cleaned_df = cleaned_df.resample('1h').agg(aggregation_dict)

# Turn the timestamp index back into a regular column and display the result
cleaned_df = cleaned_df.reset_index()
print(cleaned_df)

                    MESS_DATUM  RWS_DAU_10  RWS_10  DS_10  GS_10  SD_10  \
0    2019-01-01 00:00:00+00:00        13.0    0.00    0.0    0.0    0.0   
1    2019-01-01 01:00:00+00:00        35.0    0.03    0.0    0.0    0.0   
2    2019-01-01 02:00:00+00:00        30.0    0.03    0.0    0.0    0.0   
3    2019-01-01 03:00:00+00:00         0.0    0.00    0.0    0.0    0.0   
4    2019-01-01 04:00:00+00:00         8.0    0.00    0.0    0.0    0.0   
...                        ...         ...     ...    ...    ...    ...   
8755 2019-12-31 19:00:00+00:00         0.0    0.00    0.0    0.0    0.0   
8756 2019-12-31 20:00:00+00:00         0.0    0.00    0.0    0.0    0.0   
8757 2019-12-31 21:00:00+00:00         0.0    0.00    0.0    0.0    0.0   
8758 2019-12-31 22:00:00+00:00         0.0    0.00    0.0    0.0    0.0   
8759 2019-12-31 23:00:00+00:00         0.0    0.00    0.0    0.0    0.0   

         FF_10       DD_10        PP_10     TT_10    TM5_10      RF_10  
0     5.400000  260.000000
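
One caveat on the aggregation dictionary above: `DD_10` is a direction in degrees, and an arithmetic mean of angles can be misleading when the values straddle the 0°/360° wrap. The sketch below contrasts the arithmetic mean with a circular mean on synthetic values. The helper `circular_mean_deg` is hypothetical and not part of the notebook; it is shown only as one possible alternative, under the assumption that directions are given in degrees.

```python
import numpy as np
import pandas as pd

def circular_mean_deg(angles):
    """Circular mean of angles in degrees, mapped to the 0-360 degree range.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    rad = np.deg2rad(np.asarray(angles, dtype=float))
    # Average the unit vectors, then convert the resulting vector back to an angle
    mean_angle = np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())
    return float(np.rad2deg(mean_angle) % 360)

# Synthetic wind directions oscillating around north (illustrative values only)
idx = pd.date_range('2019-01-01 00:00', periods=6, freq='10min', tz='UTC')
wind = pd.Series([350.0, 10.0, 350.0, 10.0, 350.0, 10.0], index=idx, name='DD_10')

arithmetic = wind.resample('1h').mean().iloc[0]
circular = wind.resample('1h').apply(circular_mean_deg).iloc[0]

print(arithmetic)  # 180.0, i.e. south, although every reading points roughly north
print(circular)    # numerically at the 0°/360° boundary, i.e. north
```

Because the values of an aggregation dictionary may also be callables, a function like this could in principle replace `'mean'` for `DD_10`; whether that matters depends on how the wind direction is used downstream.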

**Merging the household loads and concatenating them with the weather data**
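
The cell below pairs weather and load rows with `left_index=True, right_index=True` after `reset_index()`, which matches rows by position. That gives the same result as a timestamp match only if both frames cover the same hours in the same order. As a hedged sketch on synthetic data (the `load` column name and all values are assumptions for the example), the difference between a positional and a timestamp-keyed merge looks like this:

```python
import pandas as pd

# Synthetic hourly frames (illustrative values, not the project data)
weather = pd.DataFrame({
    'MESS_DATUM': pd.date_range('2019-01-01 00:00', periods=4, freq='1h', tz='UTC'),
    'TT_10': [1.0, 1.5, 2.0, 2.5],
})
load = pd.DataFrame({
    # Deliberately starts one hour later than the weather frame
    'rec_time': pd.date_range('2019-01-01 01:00', periods=4, freq='1h', tz='UTC'),
    'load': [10.0, 11.0, 12.0, 13.0],  # hypothetical column name
})

# Key-based merge: rows are paired only where the timestamps are equal
by_time = pd.merge(weather, load, left_on='MESS_DATUM', right_on='rec_time', how='inner')

# Positional merge: row k is paired with row k, regardless of the timestamps
by_position = pd.merge(weather, load, left_index=True, right_index=True)

print(len(by_time))                                                   # 3
print((by_position['MESS_DATUM'] == by_position['rec_time']).all())  # False
```

A check such as the last line can serve as a quick sanity test before relying on a positional merge.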

In [67]:
data_building1 = pd.read_csv(path_building1)
data_building2 = pd.read_csv(path_building2)
data_building3 = pd.read_csv(path_building3)
data_building4 = pd.read_csv(path_building4)
data_building5 = pd.read_csv(path_building5)

data_building_list = [data_building1, data_building2, data_building3, data_building4, data_building5]

# Bring each building's load data to hourly resolution
for i, df in enumerate(data_building_list):
    # Parse 'rec_time' as timezone-aware (UTC) datetimes
    df['rec_time'] = pd.to_datetime(df['rec_time'], utc=True)
    # Set 'rec_time' as the index; this yields the DatetimeIndex needed for resampling
    df.set_index('rec_time', inplace=True)
    # Shift all timestamps forward by one hour
    df.index += pd.Timedelta(hours=1)
    # Resample to 1-hour bins anchored at the first timestamp, summing the values in each bin
    df = df.resample('1h', origin='start').sum()
    # Turn the index back into a regular 'rec_time' column
    df = df.reset_index()
    data_building_list[i] = df

# Merge weather and load data by row position: after reset_index() both frames carry a
# default integer index, so row k of the weather data is paired with row k of the load data
data_building1 = pd.merge(cleaned_df, data_building_list[0], left_index=True, right_index=True)
data_building2 = pd.merge(cleaned_df, data_building_list[1], left_index=True, right_index=True)
data_building3 = pd.merge(cleaned_df, data_building_list[2], left_index=True, right_index=True)
data_building4 = pd.merge(cleaned_df, data_building_list[3], left_index=True, right_index=True)
data_building5 = pd.merge(cleaned_df, data_building_list[4], left_index=True, right_index=True)

# Stack the five per-building frames into one long DataFrame
df = pd.concat([data_building1, data_building2, data_building3, data_building4, data_building5], ignore_index=True)

# Drop 'rec_time'; 'MESS_DATUM' remains as the timestamp column
df = df.drop('rec_time', axis=1)
print(df)

# Write the result to CSV without the row index
df.to_csv('2019_Merged_Clean_Data_1h.csv', index=False)


                     MESS_DATUM  RWS_DAU_10  RWS_10  DS_10  GS_10  SD_10  \
0     2019-01-01 00:00:00+00:00        13.0    0.00    0.0    0.0    0.0   
1     2019-01-01 01:00:00+00:00        35.0    0.03    0.0    0.0    0.0   
2     2019-01-01 02:00:00+00:00        30.0    0.03    0.0    0.0    0.0   
3     2019-01-01 03:00:00+00:00         0.0    0.00    0.0    0.0    0.0   
4     2019-01-01 04:00:00+00:00         8.0    0.00    0.0    0.0    0.0   
...                         ...         ...     ...    ...    ...    ...   
43795 2019-12-31 19:00:00+00:00         0.0    0.00    0.0    0.0    0.0   
43796 2019-12-31 20:00:00+00:00         0.0    0.00    0.0    0.0    0.0   
43797 2019-12-31 21:00:00+00:00         0.0    0.00    0.0    0.0    0.0   
43798 2019-12-31 22:00:00+00:00         0.0    0.00    0.0    0.0    0.0   
43799 2019-12-31 23:00:00+00:00         0.0    0.00    0.0    0.0    0.0   

          FF_10       DD_10        PP_10     TT_10    TM5_10      RF_10  \
0      5.400