<a href="https://colab.research.google.com/github/rjanow/Masterarbeit/blob/main/data_cleaning/DataCleaning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Script for merging the measurement and forecast data and cleaning the result. It also generates additional features that are needed to train the network.

File name: Data_Cleaning

[Notebook 0: Data Cleaning](./0_DataCleaning.ipynb)

[Notebook 1: EDA](./1_EDA_and_Cleaning.ipynb)

[Notebook 2: Modeling and Predictions](./2_Modeling_and_Predictions.ipynb)

[Notebook 3: Technical Report](./3_Technical_Report.ipynb)

## Installing the pvlib library:
This module is used to calculate the solar position angle.

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# Install pvlib to calculate the solar position angle
!pip install pvlib



## Importing the required modules and general setup:

In [3]:
# Import the required modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pvlib
import warnings
import matplotlib
import seaborn as sns

from datetime import datetime
from datetime import timedelta

from math import sqrt

warnings.filterwarnings('ignore', category=RuntimeWarning)

In [4]:
# Location of the measuring station, used to calculate the solar position angle
latitude = 50.8
longitude = 7.2

# Constants for encoding the date and the time of day as sine and cosine
seconds_in_day = 24*60*60
seconds_in_year = (365.2425)*seconds_in_day

tz, altitude, name = 'Europe/Berlin', 70, 'SanktAugustin'
tus = pvlib.location.Location(latitude, longitude, tz, altitude, name)

In [5]:
# Paths to the measured / forecast data and the save location on Google Drive
# Import
folder_UVI = '/content/drive/My Drive/Colab_Notebooks/CSV_UVI/'
folder_Solys = '/content/drive/My Drive/Colab_Notebooks/SOLYS_CSV/'
folder_CAMS = '/content/drive/My Drive/Colab_Notebooks/CAMS_Vorhersage/'
folder_VarIdx = '/content/drive/My Drive/Colab_Notebooks/CAMS_Vorhersage/'
folder_save = '/content/drive/My Drive/Colab_Notebooks/Clean_Data/'

name_UVI = ['22.06', '22.07', '22.08', '22.09', '22.10', '22.11', '22.12', '23.01', '23.02', '23.03', '23.04', '23.05', '23.07', '23.08']  # Months to import
name_Solys = 'Solys_CSV'
name_CAMS = 'CAMS_std_CSV'
name_CAMS_Glo = 'CAMS_Glo_CSV'
name_CAMS_TCC = 'CAMS_TCC_CSV'
# name_VarIdx = 'blabla'

# Export

## Importing the UVI measurement data:

The measurement data are stored in CSV files and have to be imported.
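
The import pattern used in the next cell can be sketched in isolation: read several CSV sources, stack them, and index the result by timestamp. The in-memory CSV strings, their values, and the helper name `load_monthly_csvs` are illustrative assumptions and not the actual measurement files.

```python
import io
import pandas as pd

# Two tiny in-memory "files" standing in for the monthly CSV exports (made-up values).
csv_june = "Datetime,UVI\n2022-06-15 07:21:00,2.4\n2022-06-15 07:23:00,2.5\n"
csv_july = "Datetime,UVI\n2022-07-01 07:21:00,2.1\n"

def load_monthly_csvs(sources):
    """Read each CSV source, stack the rows, and index them by timestamp."""
    frames = [pd.read_csv(src, parse_dates=["Datetime"]) for src in sources]
    combined = pd.concat(frames, ignore_index=True)
    return combined.set_index("Datetime").sort_index()

df_demo = load_monthly_csvs([io.StringIO(csv_june), io.StringIO(csv_july)])
print(df_demo)
```

Parsing the timestamp column while reading saves the separate `pd.to_datetime` step; whether that suits the real files depends on their timestamp format.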

In [6]:
# Import the measurement data
dataframes = []
df_UVI_combined = []

for name in name_UVI:
  file_path = folder_UVI + name
  # print(file_path)
  df_import = pd.read_csv(file_path)
  dataframes.append(df_import)

df_UVI_combined = pd.concat(dataframes, ignore_index=True)
df_UVI_combined['Datetime'] = pd.to_datetime(df_UVI_combined['Datetime'])
df_UVI_combined.set_index('Datetime', inplace = True)
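# Display the frame without the helper columns. The result is not assigned, so
# 'Datum', 'Uhrzeit' and 'Messzeitpunkt' stay in df_UVI_combined for later cells.
df_UVI_combined.drop(['Datum', 'Uhrzeit', 'Messzeitpunkt'], axis = 1)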

Datetime,UVI,UVA,UVB,erythem
2022-06-15 07:21:00,2.408378,4.686417,281.330695,0.060209
2022-06-15 07:23:00,2.462381,4.793073,287.094062,0.061560
2022-06-15 07:25:00,2.479048,4.817792,288.902613,0.061976
2022-06-15 07:27:00,2.543531,4.953151,292.450776,0.063588
2022-06-15 07:29:00,2.576485,5.016957,295.552724,0.064412
...,...,...,...,...
2023-08-12 18:34:00,0.024202,0.016610,7.669374,0.000605
2023-08-12 18:36:00,0.020477,0.014359,6.381634,0.000512
2023-08-12 18:38:00,0.019234,0.013251,6.252191,0.000481
2023-08-12 18:40:00,0.018458,0.012915,5.933103,0.000461


## Importing the Solys measurement data:

Measurement data at a 2-minute resolution. The data have not been reduced yet.

In [7]:
# Read the Solys measurement data:
df_Solys = pd.read_csv(folder_Solys + name_Solys)

# Convert the timestamp column to datetime and use it as the index
df_Solys['Datetime'] = pd.to_datetime(df_Solys['Datetime'])
df_Solys.set_index('Datetime', inplace = True)

In [8]:
df_Solys

Datetime,Glo,Dif,Glo_SPLite,Dir,Temp
2022-04-25 08:06:00,245.833333,246.866667,259.233333,2.400000,10.766667
2022-04-25 08:08:00,237.058333,237.666667,249.200000,2.350000,10.658333
2022-04-25 08:10:00,206.466667,206.675000,217.633333,2.316667,10.058333
2022-04-25 08:12:00,188.858333,189.400000,200.900000,2.316667,10.608333
2022-04-25 08:14:00,186.991667,187.616667,199.800000,2.208333,10.391667
...,...,...,...,...,...
2023-11-09 08:10:00,9.975000,10.500000,11.266667,1.400000,8.683333
2023-11-09 08:12:00,10.233333,10.741667,11.291667,1.400000,8.708333
2023-11-09 08:14:00,10.775000,11.275000,11.691667,1.416667,8.716667
2023-11-09 08:16:00,11.916667,12.441667,12.883333,1.416667,8.700000


## Importing the CAMS forecast data:

The data have not been reduced yet, and there is one forecast for every hour of the day.

In [9]:
# Read the CAMS forecast data:
df_CAMS = pd.read_csv(folder_CAMS + name_CAMS)

# Convert the timestamp column to datetime and use it as the index
df_CAMS['Datetime'] = pd.to_datetime(df_CAMS['Datetime'])
df_CAMS.set_index('Datetime', inplace = True)

In [10]:
df_CAMS

Datetime,aod469,aod550,gtco3,uvbed,uvbedcs,hcc,lcc,tcc
2022-06-01 00:00:00,0.261836,0.211588,0.008161,-5.551115e-16,5.551115e-16,0.000000,0.000000,0.504647
2022-06-01 01:00:00,0.272371,0.220283,0.008044,-5.551115e-16,5.551115e-16,0.000000,0.000000,0.681122
2022-06-01 02:00:00,0.237297,0.190401,0.007992,-5.551115e-16,5.551115e-16,0.124487,0.015687,0.414249
2022-06-01 03:00:00,0.195020,0.155880,0.007971,-5.551115e-16,5.551115e-16,0.071994,0.005310,0.076343
2022-06-01 04:00:00,0.164975,0.131777,0.007959,7.204596e-02,7.198133e-02,0.000000,0.001114,0.001785
...,...,...,...,...,...,...,...,...
2023-07-31 19:00:00,0.131174,0.109557,0.006663,5.551115e-16,5.551115e-16,1.000000,0.259121,1.000000
2023-07-31 20:00:00,0.126791,0.106162,0.006666,5.551115e-16,5.551115e-16,1.000000,0.473365,1.000000
2023-07-31 21:00:00,0.133466,0.111809,0.006700,5.551115e-16,5.551115e-16,0.999878,0.934537,1.000000
2023-07-31 22:00:00,0.162170,0.136924,0.006741,5.551115e-16,5.551115e-16,1.000000,0.984985,1.000000


## Importing the CAMS global irradiance forecast data:

In [14]:
# Read the CAMS global irradiance forecast data:
df_CAMS_GLO = pd.read_csv(folder_CAMS + name_CAMS_Glo)

# Convert the timestamp column to datetime and use it as the index
df_CAMS_GLO['Datetime'] = pd.to_datetime(df_CAMS_GLO['Observation_period'])
df_CAMS_GLO.set_index('Datetime', inplace = True)
# Display only: the result is not assigned, so 'Observation_period' stays in df_CAMS_GLO for later cells.
df_CAMS_GLO.drop(['Observation_period'], axis = 1)

Datetime,Clear_sky_GHI,Clear_sky_BHI,GHI,BHI,Clear_sky_GHI_new,Clear_sky_BHI_new,GHI_new,BHI_new
2022-06-01 00:01:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2022-06-01 00:03:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2022-06-01 00:05:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2022-06-01 00:07:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2022-06-01 00:09:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...
2023-08-31 23:51:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2023-08-31 23:53:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2023-08-31 23:55:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2023-08-31 23:57:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Importing the CAMS TCC forecast data:

In [None]:
# Read the CAMS TCC forecast data:
df_CAMS_TCC = pd.read_csv(folder_CAMS + name_CAMS_TCC, index_col=[0, 1])

# TODO: the index still has to be converted to datetime!

# Convert the timestamp column to datetime and use it as the index
# df_CAMS_TCC['Datetime'] = pd.to_datetime(df_CAMS_TCC['Datetime'])
# df_CAMS_TCC.set_index('Datetime', inplace = True)

In [None]:
# df_CAMS_TCC

## Merging the DataFrames into one combined DataFrame

The CAMS solar irradiance data, the Solys data, and the UVI measurement data are merged.
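
The merge below relies on `pd.merge_asof`, which matches each left-hand row to the nearest right-hand timestamp and leaves the value empty when no partner lies within the tolerance. A minimal sketch with made-up frames (`forecast`, `measured`, and their values are assumptions for illustration):

```python
import pandas as pd

# Made-up sample frames: a forecast every 2 minutes and two measurements.
forecast = pd.DataFrame({
    "Datetime": pd.to_datetime(["2022-06-15 07:21:00", "2022-06-15 07:23:00", "2022-06-15 07:25:00"]),
    "GHI": [100.0, 110.0, 120.0],
})
measured = pd.DataFrame({
    "Datetime": pd.to_datetime(["2022-06-15 07:21:30", "2022-06-15 07:30:00"]),
    "UVI": [2.4, 2.6],
})

# Both inputs must be sorted by the key; rows without a partner within 1 minute get NaN.
merged = pd.merge_asof(
    forecast.sort_values("Datetime"),
    measured.sort_values("Datetime"),
    on="Datetime",
    tolerance=pd.Timedelta("1 minute"),
    direction="nearest",
)
print(merged)
```

Only the 07:21 forecast row finds a measurement within one minute; the other two rows keep their `GHI` value and receive `NaN` for `UVI`, which is why a `dropna` step follows the merge.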

In [None]:
def merge_asof_multiple_dfs(dfs, tolerance=pd.Timedelta('1 minute')):
    """Merge the DataFrames on 'Datetime', matching each row to the nearest timestamp within `tolerance`."""

    # Initialise the final DataFrame with the first DataFrame in the list
    final_df = dfs[0].sort_values('Datetime')

    # Iterate over the remaining DataFrames and merge them in one at a time
    for df in dfs[1:]:
        df = df.sort_values('Datetime')
        final_df = pd.merge_asof(final_df, df, on='Datetime', tolerance=tolerance, direction='nearest')

    return final_df

In [None]:
# Merge the DataFrames
dataframes = [df_CAMS_GLO, df_UVI_combined, df_Solys]  # df_CAMS_TCC

df_complete = []
df_complete = merge_asof_multiple_dfs(dataframes).copy()
df_complete.set_index('Datetime', inplace = True)

In [None]:
# Rename the column Glo to ghi

df_complete.rename(columns={'Glo': 'ghi'}, inplace=True)

In [None]:
df_complete

In [None]:
# Drop all rows with NaN in 'UVI' or 'ghi'
df_complete = df_complete.dropna(subset=['UVI', 'ghi'])

In [None]:
df_complete.isna().sum()

## Cleaning the measurement data

- Missing measurement days have to be filled in:
  - Check whether the UVI measurements are contiguous.
  - Add new rows wherever the measurements are not contiguous.

**The following checks whether the measurements are contiguous:**
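
For a single day whose timestamps lie on a regular 2-minute grid, the same gap filling can be sketched with `reindex`. This is an alternative illustration under those assumptions, not the function used in this notebook, which walks through each day row by row; `df_demo` and its values are made up.

```python
import pandas as pd

# Made-up 2-minute series with one gap between 07:25 and 07:31.
idx = pd.to_datetime([
    "2022-06-15 07:21:00", "2022-06-15 07:23:00", "2022-06-15 07:25:00",
    "2022-06-15 07:31:00", "2022-06-15 07:33:00",
])
df_demo = pd.DataFrame({"UVI": [2.4, 2.5, 2.5, 2.7, 2.8]}, index=idx)

# Build the full 2-minute grid and reindex; rows that did not exist become NaN.
full_grid = pd.date_range(df_demo.index.min(), df_demo.index.max(), freq="2min")
filled = df_demo.reindex(full_grid)

# Flag the inserted rows, then set their UVI to 0 as the notebook's function does.
filled["DiffGreater2"] = filled["UVI"].isna().astype(int)
filled["UVI"] = filled["UVI"].fillna(0)
print(filled)
```

Applied across several days, this would also fill the nights, so it would have to be run per day (for example inside a `groupby` on the date) to match the per-day behaviour of the function below.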

In [None]:
def insert_missing_rows(df):
    # Sort the DataFrame by the 'Datetime' index
    df.sort_index(inplace=True)

    # Initialize a list to hold rows that need to be inserted
    rows_to_insert = []

    # Extract date from 'Datetime' index to facilitate grouping
    df['Datum'] = df.index.date

    # Group the DataFrame by 'Datum'
    grouped = df.groupby('Datum')

    if 'DiffGreater2' not in df.columns:
        df['DiffGreater2'] = 0

    for date, group in grouped:
        # Ensure the group is sorted by 'Datetime'
        group.sort_index(inplace=True)

        # Iterate through the group to find gaps in the data
        for i in range(1, len(group)):
            current_time = group.index[i]
            prev_time = group.index[i - 1]
            time_diff = current_time - prev_time

            # Check if the gap is greater than 2 minutes
            if time_diff > timedelta(minutes=2):
                while prev_time + timedelta(minutes=2) < current_time:
                    prev_time += timedelta(minutes=2)
                    new_row = {
                        'Datetime': prev_time,
                        'Datum': date,
                        'Uhrzeit': prev_time.time(),
                        'Messzeitpunkt': (prev_time - prev_time.replace(hour=0, minute=0, second=0, microsecond=0)).total_seconds(),
                        'erythem': 0,
                        'UVI': 0,
                        'UVA': 0,
                        'UVB':0,
                        'DiffGreater2': 1,
                    }
                    rows_to_insert.append(new_row)

    # Remove 'Datum' after processing
    # df.drop(columns='Datum', inplace=True)

    # Insert missing rows into a DataFrame
    if rows_to_insert:
        new_rows_df = pd.DataFrame(rows_to_insert)
        new_rows_df.set_index('Datetime', inplace=True)
        df = pd.concat([df, new_rows_df])

    # Sort the DataFrame by the 'Datetime' index
    df.sort_index(inplace=True)
    df = df.reset_index()
    # df['DiffGreater2'] = df['DiffGreater2'].fillna(0)

    return df.dropna(subset=['UVI'])

In [None]:
# Check whether the measurements are contiguous
df_UVI_WRows = pd.DataFrame()
df_UVI_WRows = insert_missing_rows(df_complete)

In [None]:
df_UVI_WRows

## Adding the solar position angle

In [None]:
def calculate_solar_zenith_angle(dataframe, date_column, latitude, longitude, altitude=0):

    # Copy the original DataFrame.
    result_df = dataframe.copy()

    # Convert the date column to a datetime dtype if it is not one already.
    if not pd.api.types.is_datetime64_any_dtype(dataframe[date_column]):
        result_df[date_column] = pd.to_datetime(dataframe[date_column])

    # Compute the solar zenith angle for all timestamps in one vectorised call
    # (much faster than calling pvlib once per row).
    times = pd.DatetimeIndex(result_df[date_column])
    solar_position = pvlib.solarposition.get_solarposition(times, latitude, longitude, altitude)

    # Add the computed solar zenith angles to the DataFrame.
    result_df['SZA'] = solar_position['zenith'].to_numpy()

    return result_df

In [None]:
df_complete_WRows_SZ = pd.DataFrame()
df_complete_WRows_SZ = calculate_solar_zenith_angle(df_complete, 'Observation_period', latitude, longitude)

In [None]:
df_complete

## Encoding time and date as sine and cosine
- Cyclical encoding of the measurement time as sine and cosine
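
A short sketch of why the cyclical encoding is useful: times just before and just after midnight end up close together in the encoded space, whereas the raw seconds-since-midnight values are far apart. The helper names `encode_time_of_day` and `distance` are illustrative assumptions.

```python
import numpy as np

SECONDS_IN_DAY = 24 * 60 * 60

def encode_time_of_day(seconds_since_midnight):
    """Map a time of day onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * np.asarray(seconds_since_midnight, dtype=float) / SECONDS_IN_DAY
    return np.sin(angle), np.cos(angle)

# 23:58, 00:00 and 12:00 expressed as seconds since midnight.
times = np.array([23 * 3600 + 58 * 60, 0, 12 * 3600])
t_sin, t_cos = encode_time_of_day(times)

def distance(i, j):
    """Euclidean distance between two encoded times."""
    return np.hypot(t_sin[i] - t_sin[j], t_cos[i] - t_cos[j])

print(distance(0, 1))  # small: 23:58 and 00:00 are neighbours on the circle
print(distance(1, 2))  # largest possible: midnight and noon are opposite points
```

Both components are needed, because the sine or cosine alone maps two different times of day to the same value.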

In [None]:
def calculate_date_in_sine_cosine(dataframe, day, year):

    # Copy the original DataFrame
    result_df = dataframe.copy()

    # Compute the time of day and the date as sine and cosine
    result_df['time_sin'] = np.sin(2*np.pi*result_df['Messzeitpunkt']/day)
    result_df['time_cos'] = np.cos(2*np.pi*result_df['Messzeitpunkt']/day)
    result_df['date_sin'] = np.sin((2*np.pi*result_df['Observation_period'].dt.dayofyear * 24 * 60 * 60) / year)
    result_df['date_cos'] = np.cos((2*np.pi*result_df['Observation_period'].dt.dayofyear * 24 * 60 * 60) / year)

    return result_df

In [None]:
# Extend the DataFrame holding the UVI measurements and the solar zenith angle with the sine/cosine-encoded time
df_complete_SZ_SC = calculate_date_in_sine_cosine(df_complete_WRows_SZ, seconds_in_day, seconds_in_year)

## Creating a list of the hours for which measurements exist:
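
Counting the samples per hour can also be sketched by flooring each timestamp to the full hour. This is an illustrative alternative to grouping by separate date and hour columns; `df_demo` and its values are made up.

```python
import pandas as pd

# Made-up 2-minute samples spanning the change from 09:xx to 10:xx.
idx = pd.to_datetime([
    "2022-06-15 09:56:00", "2022-06-15 09:58:00",
    "2022-06-15 10:00:00", "2022-06-15 10:02:00", "2022-06-15 10:04:00",
])
df_demo = pd.DataFrame({"UVI": [2.4, 2.5, 2.5, 2.6, 2.7]}, index=idx)

# Floor each timestamp to the full hour and count the samples per hour.
counts = df_demo.groupby(df_demo.index.floor("h")).size().rename("Count")
print(counts)

# At a 2-minute resolution a complete hour holds 30 samples.
incomplete_hours = counts[counts < 30]
```

The result is already indexed by the hourly timestamp, so no datetime column has to be rebuilt from strings afterwards.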

In [None]:
# Extract the date and hour from the measurement data so that only the relevant forecast data are selected below
df_complete_SZ_SC['Date'] = df_complete_SZ_SC['Observation_period'].dt.date
df_complete_SZ_SC['Hour'] = df_complete_SZ_SC['Observation_period'].dt.hour

df_date_std = df_complete_SZ_SC.groupby(['Date', 'Hour']).size().reset_index(name='Count')
# Add a datetime column with the measurement time
df_date_std['Observation_period'] = pd.to_datetime(df_date_std['Date'].astype(str) + ' ' + df_date_std['Hour'].astype(str) + ':00:00')
# Set the datetime column as the index
df_date_std.set_index('Observation_period', inplace=True)

In [None]:
# List of the hours for which measurements exist. At a 2-minute resolution, every hour should contain 30 measurements.
df_date_std

## Variability classifiers:
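
For orientation, the three indices can be written out as follows. The formulas are transcribed from the code in this section, not from the cited papers, so they reflect this notebook's GHI-based variant. The notation is an assumption: $N$ is the number of samples in one hourly group, $\bar{k}_c$ is the mean clear-sky index over the interior samples of the group, and $\bar{k}_{c,-1}$ and $\bar{k}_{c,+1}$ are the means over the same window shifted by one sample backwards and forwards.

```latex
\begin{aligned}
k_{c,i} &= \frac{\mathrm{GHI}_i}{\mathrm{GHI}_{\mathrm{cs},i}} \\[4pt]
\sigma &= \sqrt{\frac{\left(\bar{k}_c - \bar{k}_{c,-1}\right)^2 + \left(\bar{k}_c - \bar{k}_{c,+1}\right)^2}{2}} \\[4pt]
V &= \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}\left(k_{c,i} - k_{c,i+1}\right)^2} \\[4pt]
\mathrm{VI} &= \frac{\sum_{i=2}^{N}\sqrt{\left(\mathrm{GHI}_i - \mathrm{GHI}_{i-1}\right)^2 + \Delta t^2}}
                   {\sum_{i=2}^{N}\sqrt{\left(\mathrm{GHI}_{\mathrm{cs},i} - \mathrm{GHI}_{\mathrm{cs},i-1}\right)^2 + \Delta t^2}},
\qquad \Delta t = 1
\end{aligned}
```

The code uses $\Delta t = 1$ regardless of the 2-minute sampling interval, and it returns `NaN` for VI when the denominator is zero.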

### Functions:
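
As a compact cross-check, the three indices can be sketched in vectorised NumPy form. The function name `variability_indices` and the sample values are assumptions for illustration; the arithmetic is meant to mirror the loop-based functions in this section and needs at least three samples per group.

```python
import numpy as np

def variability_indices(ghi, ghi_clear):
    """Return (sigma, V, VI) for one group of samples."""
    ghi = np.asarray(ghi, dtype=float)
    ghi_clear = np.asarray(ghi_clear, dtype=float)
    kc = ghi / ghi_clear  # clear-sky index per sample

    # Skartveit-style sigma from the means of the interior and the shifted samples.
    kc_mid, kc_prev, kc_next = kc[1:-1].mean(), kc[:-2].mean(), kc[2:].mean()
    sigma = np.sqrt(((kc_mid - kc_prev) ** 2 + (kc_mid - kc_next) ** 2) / 2)

    # Coimbra-style V: root mean square of successive clear-sky-index differences.
    v = np.sqrt(np.mean(np.diff(kc) ** 2))

    # Stein-style VI: ratio of the measured to the clear-sky curve length (dt = 1).
    vi = np.sqrt(np.diff(ghi) ** 2 + 1).sum() / np.sqrt(np.diff(ghi_clear) ** 2 + 1).sum()
    return sigma, v, vi

# Made-up values: a perfectly clear period versus a strongly fluctuating one.
clear = np.array([500.0, 510.0, 520.0, 530.0])
cloudy = np.array([500.0, 250.0, 520.0, 265.0])

sigma_clear, v_clear, vi_clear = variability_indices(clear, clear)
sigma_cloudy, v_cloudy, vi_cloudy = variability_indices(cloudy, clear)
print(sigma_clear, v_clear, vi_clear)
print(sigma_cloudy, v_cloudy, vi_cloudy)
```

When the measurement equals the clear-sky curve, the clear-sky index is 1 everywhere, so sigma and V are 0 and VI is 1; fluctuating irradiance pushes V above 0 and VI above 1.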

In [None]:
# Function for the classifier after Skartveit

def sigma_skartveit_GHI(rad_df, cs):
    # Index after Skartveit
    # Computes the variability index sigma_skartveit_GHI from the global irradiance
        """
        Eq. (3.1)
        paper uses DNI values. Here GHI
        can reach values above 1
        Skartveit A., J.A. Olseth, M.E. Tuft, 1998: An hourly
        diffuse fraction model with correction for variability
        and surface albedo. – Sol. Energy 63, 173–183, DOI:
        10.1016/S0038-092X(98)00067-X.

        Returns
        -------
        sigma_skartveit

        """
        # tus = pvlib.location.Location(latitude, longitude, tz, altitude, name)

        # kc = clear-sky index | kc_t_m1 = clear-sky index of the previous hour | kc_t_p1 = clear-sky index of the following hour
        # Note: i-1 and i+1 index neighbouring rows of rad_df, so within one group these are the neighbouring samples.
        kc = []
        kc_t_m1 = []
        kc_t_p1 = []
        i = 0

        # Fill kc | kc_t_m1 | kc_t_p1
        for i in range(1,len(rad_df['ghi'])-1):
            kc.append(rad_df['ghi'].iloc[i]/cs['ghi'].iloc[i])
            kc_t_m1.append(rad_df['ghi'].iloc[i-1]/cs['ghi'].iloc[i-1])
            kc_t_p1.append(rad_df['ghi'].iloc[i+1]/cs['ghi'].iloc[i+1])

        # Compute the arithmetic means
        kc = np.array(kc).mean()
        kc_t_m1 = np.array(kc_t_m1).mean()
        kc_t_p1 = np.array(kc_t_p1).mean()

        # Implementation of Eq. 3.1 for the GHI
        return sqrt((((kc-kc_t_m1)**2)+((kc-kc_t_p1)**2))/2)

In [None]:
def V_combria_GHI(rad_df, cs):
        # Index after Coimbra
        # Standard deviation of the temporal differences of kc
        """
        Eq. (3.2)
        paper uses DNI values. Here GHI
        can reach values above 1
        Coimbra, C.F.M., J. Kleissl, R. Marquez, 2013: Overview of
        Solar-Forecasting Methods and a Metric for Accuracy Evaluation. – In: Kleissl, J. (Ed.): Solar Energy Forecasting and
        Resource Assessment. Oxford, 171–194.

        Returns
        -------
        V

        """
        V=0
        i=0
        kc = []
        kc_t_p1 = []
        delta_kc = []


        for i in range(len(rad_df['ghi'])-1):
            kc = rad_df['ghi'].iloc[i]/cs['ghi'].iloc[i]
            kc_t_p1 = rad_df['ghi'].iloc[i+1]/cs['ghi'].iloc[i+1]
            delta_kc = kc-kc_t_p1
            V = V+(delta_kc)**2

        # Implementation of Eq. 3.2 for the GHI
        return sqrt(V/(i+1)), kc

In [None]:
def VI_stein_GHI(rad_df, cs):
        # Index after Stein
        """
        Eq. (3.2)
        paper uses DNI values. Here GHI
        can reach values above 1
        Stein, J.S., M.J. Reno, C. Hansen, 2012: The variability index: a new and novel metric for quantifying irradiance and
        PV output variability. – In: World Renewable Energy Forum,
        Denver, CO.

        Returns
        -------
        VI
        """

        summe1 = sum(((rad_df['ghi'].diff().dropna()**2)+1)**(1/2))
        summe2 = sum(((cs['ghi'].diff().dropna()**2)+1)**(1/2))

        # Guard against division by zero: return NaN if the clear-sky sum is 0
        ergebnis = summe1 / summe2 if summe2 != 0 else np.nan

        return ergebnis

In [None]:
stundliche_gruppe = df_complete_SZ_SC.groupby([df_complete_SZ_SC.index.date, df_complete_SZ_SC.index.hour])

df_std_ind = pd.DataFrame()
df_temp = pd.DataFrame()
stundliche_indizes = []
Messpunkt = []
temp_dfs = []

for datum, gruppe in stundliche_gruppe:
    if not gruppe.empty:  # Check whether the group contains data

        cs = tus.get_clearsky(gruppe.index)

        index_sigma = sigma_skartveit_GHI(gruppe, cs)
        index_coimbra, kc = V_combria_GHI(gruppe, cs)
        index_stein = VI_stein_GHI(gruppe, cs)

        temp_df = pd.DataFrame({'Datetime': [datum], 'index_sigma': [index_sigma], 'index_coimbra': [index_coimbra]
                                          , 'index_stein': [index_stein], 'kc' : [kc]})
        temp_dfs.append(temp_df)

df_std_ind = pd.concat(temp_dfs, ignore_index=True)

### Post-processing the DataFrames:

In [None]:
# Split the date tuple and store it as a datetime index
def tuple_to_datetime(tup):
    datum, stunde = tup  # Unpack the tuple
    return datetime.strptime(f'{datum} {stunde}', '%Y-%m-%d %H')

# Apply the function to the 'Datetime' column
df_std_ind['Datetime'] = pd.to_datetime(df_std_ind['Datetime'].apply(tuple_to_datetime)).copy()
df_std_ind.set_index('Datetime', inplace=True)

In [None]:
# Drop all NaN entries and all rows with an infinite index_sigma
df_std_ind.dropna(inplace = True)
df_std_ind = df_std_ind[~np.isinf(df_std_ind['index_sigma'])]
df_std_ind.max()

In [None]:
df_std_ind

## Joining the DataFrame of variability indices with the CAMS forecast data:
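
The join in the next cell uses `pd.concat(..., axis=1)`, which aligns rows on the index, followed by `dropna` on one column from each side. A minimal sketch with made-up hourly frames (`forecast`, `indices`, and their values are assumptions):

```python
import pandas as pd

hours = pd.to_datetime(["2022-06-15 09:00:00", "2022-06-15 10:00:00", "2022-06-15 11:00:00"])
forecast = pd.DataFrame({"uvbed": [0.05, 0.08, 0.10]}, index=hours)
indices = pd.DataFrame({"index_sigma": [0.02, 0.15]}, index=hours[1:])

# axis=1 aligns rows on the index (outer join); hours missing on either side get NaN.
joined = pd.concat([forecast, indices], axis=1)

# Keep only the hours for which both a forecast and a variability index exist.
joined = joined.dropna(subset=["index_sigma", "uvbed"])
print(joined)
```

The combination behaves like an inner join on the hourly timestamp, so both frames need timestamps floored to the same full hour for rows to match.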

In [None]:
df_CAMS_VarIdx = []
df_CAMS_VarIdx = pd.concat([df_CAMS, df_std_ind], axis = 1).dropna(subset=['index_sigma', 'uvbed'])

In [None]:
# Add a column with the time of day

df_CAMS_VarIdx['Uhrzeit'] = df_CAMS_VarIdx.index.time

In [None]:
df_CAMS_VarIdx

## Checking that the DataFrame is correct:

In [None]:
df_complete_SZ_SC.info()

In [None]:
df_complete_SZ_SC.isnull().sum()

## Saving the DataFrames as CSV:

In [None]:
# Irradiance metrics over time
df_complete_SZ_SC[['UVI']].plot()
plt.title('UVI over time');

In [None]:
def export_dataframes_to_csv(df1, df2, file1_name, file2_name, folder_name):

    try:
        # Export the first DataFrame to a CSV file
        df1.to_csv(folder_name + file1_name)
        print(f'DataFrame 1 was exported successfully to "{file1_name}".')

        # Export the second DataFrame to a CSV file
        df2.to_csv(folder_name + file2_name)
        print(f'DataFrame 2 was exported successfully to "{file2_name}".')

        # # Export the third DataFrame to a CSV file
        # df3.to_csv(folder_name + file3_name)
        # print(f'DataFrame 3 was exported successfully to "{file3_name}".')

    except Exception as e:
        print(f'Error while exporting the DataFrames: {str(e)}')

In [None]:
# export_dataframes_to_csv(df_complete_SZ_SC, df_CAMS_VarIdx, 'Messdaten_CAMS_GHI.csv', 'Vorhersagedaten_CAMS_VarIdx', folder_save)

## Miscellaneous:

In [None]:
# Find clear-sky days:

start_time = '10:00:00'
end_time = '15:00:00'

df_filtered = pd.DataFrame()
indices_to_keep = []

# Filter for the relevant times of day
indices_to_keep = df_CAMS_VarIdx.between_time(start_time, end_time).index

In [None]:
df_filtered = df_CAMS_VarIdx.between_time(start_time, end_time)

In [None]:
df_filtered.sort_values(by='index_stein', ascending=False).head(20)

In [None]:
# Plot a date of interest
gewünschtes_datum = '2022-07-02'
gefilterte_daten = df_filtered[df_filtered.index.date == pd.to_datetime(gewünschtes_datum).date()]
UVI_gefiltert = df_complete['UVI'][df_complete.index.date == pd.to_datetime(gewünschtes_datum).date()]
plt.plot(UVI_gefiltert)

In [None]:
def plot_dual_axis(df1, df2, x1, y1, x2, y2, label1='Data 1', label2='Data 2', title='Dual Axis Plot'):

    fig, ax1 = plt.subplots()

    # Plotting the first DataFrame
    color = 'tab:blue'
    ax1.set_xlabel('X-axis')
    ax1.set_ylabel(label1, color=color)
    ax1.plot(df1[x1], df1[y1], color=color)
    ax1.tick_params(axis='y', labelcolor=color)

    # Creating a second y-axis for the second DataFrame
    ax2 = ax1.twinx()
    color = 'tab:red'
    ax2.set_ylabel(label2, color=color)
    ax2.plot(df2[x2], df2[y2], color=color)
    ax2.tick_params(axis='y', labelcolor=color)

    # Adding title
    plt.title(title)

    fig.tight_layout()
    plt.show()

In [None]:
# Plot a clear-sky day:
gewünschtes_datum = '2022-07-02'

# NOTE: df_CAMS_VarIdx as built above appears to have no 'GHI' column, so this call may raise a KeyError until a matching column is chosen.
plot_dual_axis(df_complete[df_complete.index.date == pd.to_datetime(gewünschtes_datum).date()], df_CAMS_VarIdx[df_CAMS_VarIdx.index.date == pd.to_datetime(gewünschtes_datum).date()], 'Uhrzeit', 'UVI', 'Uhrzeit', 'GHI', 'UV index', 'Global irradiance', 'Ascent vs. descent')

In [None]:
df_CAMS_VarIdx[df_CAMS_VarIdx.index.date == pd.to_datetime(gewünschtes_datum).date()]