<a href="https://colab.research.google.com/github/rjanow/Masterarbeit/blob/main/1_EDA_and_DataCleaning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Predicting UVI with LSTMs

[Notebook 1: EDA and Cleaning](./1_EDA and Cleaning.ipynb)

[Notebook 2: Modeling and Predictions](./2_Modeling and Predictions.ipynb)

[Notebook 3: Technical Report](./3_Technical_Report.ipynb)

In this notebook, the recorded UVI measurements are processed further and prepared for training.


- Read in the UVI values
- Replace missing measurements
- Read in the additional input values
- EDA (exploratory data analysis)

In [4]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
## import modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pvlib

from datetime import datetime
from datetime import timedelta

import matplotlib
import seaborn as sns

In [26]:
latitude = 50.2
longitude = 7.8

seconds_in_day = 24*60*60
seconds_in_year = (365.2425)*seconds_in_day

In [9]:
# Path to the CSV files on Google Drive
drive_path = '/content/drive/My Drive/Colab_Notebooks/CSV_UVI/'

# Importing the measurement data

The measurement data are stored in CSV files that have to be imported first.

In [10]:
## Import the measurement data
file_list = ['22.06', '22.07']  # Months to import
dataframes = []

for filename in file_list:
    file_path = drive_path + filename
    df_import = pd.read_csv(file_path)
    dataframes.append(df_import)

df_UVI_combined = pd.concat(dataframes, ignore_index=True)
df_UVI_combined['Datetime'] = pd.to_datetime(df_UVI_combined['Datetime'])

In [11]:
# Show the column names of the DataFrame
df_UVI_combined.columns
# Show the DataFrame
df_UVI_combined

Unnamed: 0,Datetime,Datum,Uhrzeit,Messzeitpunkt,erythem,UVI
0,2022-06-15 07:21:00,2022-06-15,07:21:00,26460,0.060209,2.408378
1,2022-06-15 07:23:00,2022-06-15,07:23:00,26580,0.061560,2.462381
2,2022-06-15 07:25:00,2022-06-15,07:25:00,26700,0.061976,2.479048
3,2022-06-15 07:27:00,2022-06-15,07:27:00,26820,0.063588,2.543531
4,2022-06-15 07:29:00,2022-06-15,07:29:00,26940,0.064412,2.576485
...,...,...,...,...,...,...
20847,2022-07-31 18:47:00,2022-07-31,18:47:00,67620,,0.000891
20848,2022-07-31 18:49:00,2022-07-31,18:49:00,67740,,0.000807
20849,2022-07-31 18:51:00,2022-07-31,18:51:00,67860,,0.000727
20850,2022-07-31 18:53:00,2022-07-31,18:53:00,67980,,0.000645


# Cleaning the measurement data

This section describes the steps needed to clean the measurement data.

- Missing measurements have to be filled in
- Outliers have to be replaced

**The following code checks whether the measurements are contiguous:**

In [12]:
def insert_missing_rows(df):
    # Sort a copy by 'Datetime' so the caller's DataFrame is left unchanged
    df = df.sort_values(by='Datetime')

    # Collect the rows that fill the gaps
    rows_to_insert = []

    # Group the DataFrame by 'Datum' so gaps are only filled within a day
    grouped = df.groupby('Datum')

    for date, group in grouped:
        # Sort the group by 'Datetime'
        group = group.sort_values(by='Datetime')

        # Compare each row with the previous one
        for i in range(1, len(group)):
            current_time = group.iloc[i]['Datetime']
            prev_time = group.iloc[i - 1]['Datetime']
            time_diff = current_time - prev_time

            if time_diff > timedelta(minutes=2):
                # Add one placeholder row per missing 2-minute step
                while prev_time + timedelta(minutes=2) < current_time:
                    prev_time += timedelta(minutes=2)
                    midnight = prev_time.replace(hour=0, minute=0, second=0, microsecond=0)
                    new_row = {
                        'Datetime': prev_time,
                        'Datum': date,
                        # Same string format as the rows read from the CSV
                        'Uhrzeit': prev_time.strftime('%H:%M:%S'),
                        'Messzeitpunkt': (prev_time - midnight).total_seconds(),
                        'erythem': 0,
                        'UVI': 0,
                        'DiffGreater2': 1,  # Marks rows that were inserted
                    }
                    rows_to_insert.append(new_row)

    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    if rows_to_insert:
        df = pd.concat([df, pd.DataFrame(rows_to_insert)], ignore_index=True)

    # Restore chronological order and renumber the index
    df = df.sort_values(by='Datetime')
    df = df.reset_index(drop=True)

    return df
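
The contiguity check itself can also be sketched without explicit loops: take the difference between consecutive timestamps within each day and flag every gap longer than the two-minute sampling interval. The helper name `find_gaps` and the toy DataFrame below are assumptions made for illustration; only the column names `Datetime` and `Datum` come from the notebook.

```python
import pandas as pd

def find_gaps(df, step=pd.Timedelta(minutes=2)):
    """Return the rows that follow a gap longer than `step` within the same day."""
    df = df.sort_values('Datetime')
    # Difference to the previous measurement of the same day (NaT for the first row)
    diff = df.groupby('Datum')['Datetime'].diff()
    mask = diff > step
    gaps = df.loc[mask, ['Datum', 'Datetime']].copy()
    gaps['gap'] = diff[mask]
    return gaps

# Toy data: one 6-minute gap on the first day, none on the second
toy = pd.DataFrame({'Datetime': pd.to_datetime([
    '2022-06-15 07:21', '2022-06-15 07:23', '2022-06-15 07:29',
    '2022-06-16 07:21', '2022-06-16 07:23'])})
toy['Datum'] = toy['Datetime'].dt.strftime('%Y-%m-%d')

gaps = find_gaps(toy)
print(gaps)
```

The overnight jump between the two days is not reported, because the difference is computed per `Datum` group.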

In [13]:
df_UVI_WRows = insert_missing_rows(df_UVI_combined)



In [14]:
df_UVI_WRows

Unnamed: 0,Datetime,Datum,Uhrzeit,Messzeitpunkt,erythem,UVI,DiffGreater2
0,2022-06-15 07:21:00,2022-06-15,07:21:00,26460.0,0.060209,2.408378,
1,2022-06-15 07:23:00,2022-06-15,07:23:00,26580.0,0.061560,2.462381,
2,2022-06-15 07:25:00,2022-06-15,07:25:00,26700.0,0.061976,2.479048,
3,2022-06-15 07:27:00,2022-06-15,07:27:00,26820.0,0.063588,2.543531,
4,2022-06-15 07:29:00,2022-06-15,07:29:00,26940.0,0.064412,2.576485,
...,...,...,...,...,...,...,...
20932,2022-07-31 18:47:00,2022-07-31,18:47:00,67620.0,,0.000891,
20933,2022-07-31 18:49:00,2022-07-31,18:49:00,67740.0,,0.000807,
20934,2022-07-31 18:51:00,2022-07-31,18:51:00,67860.0,,0.000727,
20935,2022-07-31 18:53:00,2022-07-31,18:53:00,67980.0,,0.000645,


# Adding the solar zenith angle

In [15]:
def calculate_solar_zenith_angle(dataframe, date_column, latitude, longitude, altitude=0):

    # Work on a copy so the original DataFrame is not modified.
    result_df = dataframe.copy()

    # Convert the date column to a datetime dtype if it is not one already.
    if not pd.api.types.is_datetime64_any_dtype(dataframe[date_column]):
        result_df[date_column] = pd.to_datetime(dataframe[date_column])

    # pvlib accepts a whole DatetimeIndex, so all timestamps are computed in one call.
    # Timestamps without a time zone are interpreted as UTC.
    times = pd.DatetimeIndex(result_df[date_column])
    solar_position = pvlib.solarposition.get_solarposition(times, latitude, longitude, altitude)

    # Add the computed solar zenith angles (in degrees) to the DataFrame.
    result_df['SolarZenithAngle'] = solar_position['zenith'].to_numpy()

    return result_df

In [16]:
df_UVI_WRows_SZ = calculate_solar_zenith_angle(df_UVI_WRows, 'Datetime', latitude, longitude)

In [17]:
df_UVI_WRows_SZ

Unnamed: 0,Datetime,Datum,Uhrzeit,Messzeitpunkt,erythem,UVI,DiffGreater2,SolarZenithAngle
0,2022-06-15 07:21:00,2022-06-15,07:21:00,26460.0,0.060209,2.408378,,54.595800
1,2022-06-15 07:23:00,2022-06-15,07:23:00,26580.0,0.061560,2.462381,,54.277291
2,2022-06-15 07:25:00,2022-06-15,07:25:00,26700.0,0.061976,2.479048,,53.959013
3,2022-06-15 07:27:00,2022-06-15,07:27:00,26820.0,0.063588,2.543531,,53.640984
4,2022-06-15 07:29:00,2022-06-15,07:29:00,26940.0,0.064412,2.576485,,53.323222
...,...,...,...,...,...,...,...,...
20932,2022-07-31 18:47:00,2022-07-31,18:47:00,67620.0,,0.000891,,87.047175
20933,2022-07-31 18:49:00,2022-07-31,18:49:00,67740.0,,0.000807,,87.336775
20934,2022-07-31 18:51:00,2022-07-31,18:51:00,67860.0,,0.000727,,87.625467
20935,2022-07-31 18:53:00,2022-07-31,18:53:00,67980.0,,0.000645,,87.913238


# Encoding time and date as sine and cosine

In [39]:
def calculate_date_in_sine_cosine(dataframe, day, year):

    result_df = dataframe.copy()

    # Time of day as a point on the unit circle (period: one day)
    result_df['time_sin'] = np.sin(2*np.pi*result_df['Messzeitpunkt']/day)
    result_df['time_cos'] = np.cos(2*np.pi*result_df['Messzeitpunkt']/day)

    # Position within the year in seconds (dayofyear is 1-based, which only shifts the phase).
    # The factor 2*pi has to multiply the whole sum, not just the day term.
    dt = result_df['Datetime'].dt
    seconds_of_year = dt.dayofyear * day + dt.hour * 60 * 60 + dt.minute * 60
    result_df['date_sin'] = np.sin(2*np.pi*seconds_of_year/year)
    result_df['date_cos'] = np.cos(2*np.pi*seconds_of_year/year)

    return result_df

In [40]:
df_UVI_SZ_SC = calculate_date_in_sine_cosine(df_UVI_WRows_SZ, seconds_in_day, seconds_in_year)
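
The reason for the sine/cosine encoding can be illustrated with a small sketch: as raw seconds, 23:58 and 00:00 are almost a full day apart, but on the unit circle they are direct neighbours. The helper name `encode_time_of_day` is an assumption made for this example and is not part of the notebook.

```python
import numpy as np

seconds_in_day = 24 * 60 * 60

def encode_time_of_day(seconds):
    """Map seconds since midnight to a (sin, cos) point on the unit circle."""
    angle = 2 * np.pi * np.asarray(seconds) / seconds_in_day
    return np.array([np.sin(angle), np.cos(angle)])

late = encode_time_of_day(23 * 3600 + 58 * 60)   # 23:58
midnight = encode_time_of_day(0)                 # 00:00
noon = encode_time_of_day(12 * 3600)             # 12:00

# Euclidean distances between the encoded points
dist_late_midnight = np.linalg.norm(late - midnight)
dist_noon_midnight = np.linalg.norm(noon - midnight)
print(dist_late_midnight, dist_noon_midnight)
```

Both components are needed: the sine alone cannot distinguish 06:00 from 18:00 in the yearly or daily cycle's mirrored halves, while the pair identifies each time uniquely.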

# First plots of the measurement data

In [None]:
# Function for plotting the measurement data, one figure per day
def plot_data_per_day(dataframe, date_column, value_column, x_column, dates, save_path):
    for date in dates:
        subset = dataframe[dataframe[date_column] == date]

        plt.figure(figsize=(10, 6))
        ax = sns.lineplot(data=subset, x=x_column, y=value_column)

        #interval = 2  # Interval in hours
        #ax.xaxis.set_major_locator(mdates.HourLocator(interval=interval))

        plt.xticks(rotation=45)
        plt.title(f'UVI over the course of {date}')
        plt.xlabel('Time (UTC)')
        plt.ylabel('UVI')
        plt.tight_layout()
        # plt.show()

        plot_filename = f'{date}.png'
        plot_filepath = save_path + plot_filename
        plt.savefig(plot_filepath)  # Save the plot
        plt.close()  # Close the figure to free resources

In [None]:
# Function for generating the list of dates to plot
def generate_dates_to_plot(start_date, end_date):
    date_range = []
    current_date = start_date

    while current_date <= end_date:
        date_range.append(current_date.strftime('%Y-%m-%d'))
        current_date += timedelta(days=1)

    return date_range
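
As an alternative sketch, the same list of date strings could be built with `pd.date_range`, which includes both endpoints by default. The three-day range below is only an example.

```python
import pandas as pd

# One entry per calendar day, formatted like the 'Datum' column
dates = pd.date_range('2022-06-15', '2022-06-17', freq='D').strftime('%Y-%m-%d').tolist()
print(dates)
```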

In [None]:
# Generate the list of dates
start_date = datetime(2022, 6, 15)
end_date = datetime(2022, 6, 15)

dates_to_plot = generate_dates_to_plot(start_date, end_date)

In [None]:
# Output folder for the plots of the daily measurement data
daily_plots_path = '/content/drive/My Drive/Colab_Notebooks/plot_daily_UVI/'

In [None]:
# Call the function that plots the measurement data
plot_data_per_day(df_UVI_combined, 'Datum', 'UVI', 'Uhrzeit', dates_to_plot, daily_plots_path)

# Plotting relationships in the measurement data

In [None]:
## Code for plotting relationships in the measurement data
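
One relationship worth looking at is UVI against the solar zenith angle. The following is a minimal sketch on synthetic data, not the notebook's measurements: the cosine-squared shape, the noise level and the DataFrame `toy` are assumptions chosen only to make the example runnable. With the real data, `toy` would be replaced by `df_UVI_SZ_SC`.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Synthetic stand-in: UVI falls as the zenith angle grows
rng = np.random.default_rng(0)
zenith = np.linspace(30, 88, 200)
toy = pd.DataFrame({
    'SolarZenithAngle': zenith,
    'UVI': 8 * np.cos(np.radians(zenith)) ** 2 + rng.normal(0, 0.1, zenith.size),
})

# Pearson correlation between the two columns
corr = toy['UVI'].corr(toy['SolarZenithAngle'])

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(toy['SolarZenithAngle'], toy['UVI'], s=5)
ax.set_xlabel('Solar zenith angle (degrees)')
ax.set_ylabel('UVI')
ax.set_title(f'UVI vs. solar zenith angle (r = {corr:.2f})')
fig.tight_layout()
plt.close(fig)
```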

# Saving the data with pickle

https://docs.python.org/3/library/pickle.html

 - The pickle module implements binary protocols for serializing and de-serializing a *Python object structure*. “Pickling” is the process whereby a *Python object hierarchy is converted into a byte stream*, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

In [None]:
## Code for saving the measurement data as a pickle file
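
A minimal sketch of this step, assuming the pandas helpers `DataFrame.to_pickle` and `pd.read_pickle` are used rather than the `pickle` module directly. The toy DataFrame, the file name `df_UVI.pkl` and the temporary folder are placeholders; in the notebook the target would be a folder on Google Drive.

```python
import os
import tempfile
import pandas as pd

# Toy stand-in for the processed measurement DataFrame
toy = pd.DataFrame({
    'Datetime': pd.to_datetime(['2022-06-15 07:21', '2022-06-15 07:23']),
    'UVI': [2.408378, 2.462381],
})

# Hypothetical target path inside a temporary folder
pickle_path = os.path.join(tempfile.mkdtemp(), 'df_UVI.pkl')

# Write the DataFrame to disk and read it back
toy.to_pickle(pickle_path)
restored = pd.read_pickle(pickle_path)
```

Unlike a CSV round trip, the pickle preserves the dtypes, so `Datetime` does not have to be parsed again. Pickle files should only be loaded from trusted sources.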