<a href="https://colab.research.google.com/github/rjanow/Masterarbeit/blob/main/UV_Measurement_to_CSV.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Converting the raw data of the BTS2048-UV-WP into a usable CSV file**

Document name: UV_Measurement_to_CSV.ipynb




The OR0 data (NASA Ames format), which are originally intended for submission to the BFS, are converted and written to a CSV file. Because of the large amount of data, this is done separately for each month.
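Since the data are processed month by month, the monthly subfolders (named `22.06` to `23.01` in the Drive listing further down, apparently in the form `YY.MM`) could be collected automatically instead of editing `folder_path` by hand. The following is a minimal sketch under that assumption; `list_month_folders` and the `YY.MM` pattern are hypothetical and not part of the original notebook.

```python
import os
import re
import tempfile

# Assumption: month folders are named YY.MM, e.g. '22.06'
MONTH_FOLDER = re.compile(r"^\d{2}\.\d{2}$")


def list_month_folders(base_path):
    """Return the month subfolders (YY.MM) of base_path in chronological order."""
    names = [
        name
        for name in os.listdir(base_path)
        if MONTH_FOLDER.match(name) and os.path.isdir(os.path.join(base_path, name))
    ]
    return sorted(names)  # 'YY.MM' strings sort chronologically as text


# Usage example with a temporary directory instead of Google Drive
with tempfile.TemporaryDirectory() as base:
    for name in ["23.01", "22.06", "22.12", "readme"]:
        os.mkdir(os.path.join(base, name))
    months = list_month_folders(base)

print(months)  # ['22.06', '22.12', '23.01']
```

Each returned name could then be appended to the `Data` path to set `folder_path` for one run.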

In [38]:
# Import the required modules
import os, sys
import glob
import json
import pandas as pd
import numpy as np
import csv
import re

from datetime import timedelta
from datetime import datetime

from google.colab import drive
from google.colab import files

First, the Google Drive in which the measurement data (OR0 files) are stored has to be mounted. Then all available subfolders are listed; this checks whether the mount worked correctly.

In [2]:
drive.mount('/content/drive')

drive_path = '/content/drive/MyDrive'
# Walk through the Google Drive path
# (loop variables renamed so they do not shadow the imported `files` and the built-in `dir`)
for root, subdirs, _ in os.walk(drive_path):
    for subdir in subdirs:
        # Print the name of the subfolder
        print(os.path.join(root, subdir))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/Colab_Notebooks
/content/drive/MyDrive/Colab Notebooks
/content/drive/MyDrive/Colab_Notebooks/CSV_Messdaten
/content/drive/MyDrive/Colab_Notebooks/CouchDB File
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data/22.06
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data/22.07
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data/22.08
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data/22.09
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data/22.10
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data/22.11
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data/22.12
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data/23.01
/content/drive/

Next, the individual OR0 files (NASA Ames format) are loaded and converted.

The NASA Ames format: https://espoarchive.nasa.gov/content/Ames_Format_Specification_v20

The measurement data are stored under FFI (File Format Index) 2005. This FFI is not documented by NASA. Therefore, a custom parser follows that converts the data into a usable CSV file.
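In standard NASA Ames files, line 1 contains NLHEAD (the total number of header lines) followed by the FFI. Assuming the OR0 files also follow this convention (not verified here for FFI 2005), NLHEAD could serve as a cross-check for the marker-based header detection used in this notebook. A minimal sketch with hypothetical helper names and a toy input:

```python
def parse_first_header_line(lines):
    """Return (NLHEAD, FFI) from line 1 of a NASA Ames file.

    Assumes the standard layout 'NLHEAD FFI' of the first line.
    """
    fields = lines[0].split()
    return int(fields[0]), int(fields[1])


def split_header_and_data(lines):
    """Split the lines into header and data, using NLHEAD as header length."""
    nlhead, _ = parse_first_header_line(lines)
    return lines[:nlhead], lines[nlhead:]


# Toy input (not a real OR0 file): 3 header lines followed by data
example = ["3 2005", "Some Originator", "Some Organisation", "26460", "113 4.000E-02"]
header, data = split_header_and_data(example)
print(parse_first_header_line(example))  # (3, 2005)
print(data)                              # ['26460', '113 4.000E-02']
```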

**Description of the individual variables and their contents:**

- **file_names** = list containing the file names of the individual OR0 files
- **file_content** = dictionary containing the content of the OR0 files (file name -> list of lines)
- **end_line_header** = index of the line at which the header ends

**Reading the files:**

In [3]:
# Path to the folder with the files in Google Drive
folder_path = '/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data/22.06'

# Find the OR0 files in the folder (sorted, since glob returns them in arbitrary order)
file_paths = sorted(glob.glob(folder_path + '/*.OR0'))

# List for the file names
file_names = []

# Loop over the files
for file_path in file_paths:

    # Only process files larger than 100 KiB
    if os.path.getsize(file_path) > 100 * 1024:
        # Extract the file name without extension
        file_name = os.path.splitext(os.path.basename(file_path))[0]

        # Open the file and read its content
        with open(file_path, 'r') as file:
            file_content = file.read()

        # Store the content in a global variable named after the file
        globals()[file_name] = file_content

        file_names.append(file_name)

**Creating a DataFrame with wavelengths:**
- It is intended to be used later to name the columns of the DataFrame.
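A sketch of how the wavelength grid could be turned into column labels, assuming each spectrum consists of one value per wavelength on the 290.0 to 420.0 nm grid in 0.1 nm steps; this mapping is not verified against the OR0 records. `np.linspace` with an explicit point count avoids the floating-point drift of `np.arange` with a float step. `wavelength_labels` and `spectra` are hypothetical names.

```python
import numpy as np
import pandas as pd


def wavelength_labels(start=290.0, end=420.0, step=0.1):
    """Wavelength grid as strings; linspace with a fixed count avoids float drift."""
    count = int(round((end - start) / step)) + 1
    grid = np.round(np.linspace(start, end, count), 1)
    return [f"{wl:.1f}" for wl in grid]


labels = wavelength_labels()
print(len(labels), labels[0], labels[-1])  # 1301 290.0 420.0

# Hypothetical spectrum table with one row per measurement
spectra = pd.DataFrame(np.zeros((2, len(labels))))
spectra.columns = labels
print(list(spectra.columns[:3]))  # ['290.0', '290.1', '290.2']
```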

In [5]:
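def create_df_Wellenlaenge(start, end, step):
    # Number of grid points; computed explicitly because np.arange with a float
    # step can include or drop the end value due to rounding errors
    count = int(round((end - start) / step)) + 1
    # Create the list of wavelengths
    numbers_list = [round(num, 3) for num in np.linspace(start, end, count)]
    # Create the DataFrame
    df = pd.DataFrame({'Wellenlaenge': numbers_list})

    return df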

In [6]:
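# Wavelength grid from 290.0 to 420.0 nm in 0.1 nm steps (1301 values);
# the end value 420.05 ensures that 420.0 is included
np_Wellenlaenge = np.round(np.arange(290.0, 420.05, 0.1), decimals = 1)
df_Wellenlaenge = pd.DataFrame({'Wellenlaenge': np_Wellenlaenge})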

**Splitting the strings into individual lines:**

- For further processing, every line has to be stored as a separate string.
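The notebook stores each file content in a global variable via `globals()` and then collects it into `file_content` again. A sketch of an alternative that reads the files directly into a dictionary (file name -> list of lines), using the same 100 KiB size filter; `read_or0_files` is a hypothetical helper:

```python
import glob
import os
import tempfile


def read_or0_files(folder_path, min_size=100 * 1024):
    """Return {file name without extension: list of lines} for all OR0 files
    larger than min_size bytes, in sorted order."""
    contents = {}
    for file_path in sorted(glob.glob(os.path.join(folder_path, "*.OR0"))):
        if os.path.getsize(file_path) > min_size:
            name = os.path.splitext(os.path.basename(file_path))[0]
            with open(file_path, "r") as file:
                contents[name] = file.read().split("\n")
    return contents


# Usage example with a temporary folder and a small size limit
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "SA220615.OR0"), "w") as f:
        f.write("line 1\nline 2")
    with open(os.path.join(folder, "SA220616.OR0"), "w") as f:
        f.write("x")  # too small, will be skipped
    result = read_or0_files(folder, min_size=5)

print(result)  # {'SA220615': ['line 1', 'line 2']}
```

This keeps all data in one namespace and avoids creating variables whose names depend on the file names.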

In [7]:
file_content = {}  # Dictionary holding the content of each measurement

for file_name in file_names:
    file_variables = globals()[file_name]
    file_content[file_name] = file_variables

# Access the variables and split each string into lines
for file_name in file_content:
    file_content[file_name] = file_content[file_name].split('\n')

In [None]:
# Print the first 10 file names and the first 20 lines of the file SA220615
print(file_names[:10])
file_content['SA220615'][:20]

**Extracting the header:**

In [9]:
# Extract the header of each file: all lines up to and including the line
# "Pyranometer: readout interval [secs]=5"
def perform_action(file_variables, file_names):
    header_dict = {file_name: "" for file_name in file_names}
    end_line_header_fnc = 0

    for file_name, data in file_variables.items():
        for i, line in enumerate(data):
            header_dict[file_name] += line + "\n"
            if line.strip() == "Pyranometer: readout interval [secs]=5":
                # Index of the last header line; it is overwritten for every file,
                # so all files are assumed to have headers of the same length
                end_line_header_fnc = i
                break

    return header_dict, end_line_header_fnc

In [10]:
file_header, end_line_header = perform_action(file_content, file_names)
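`perform_action` returns a single header end index, namely that of the last file processed, so `remove_header` relies on all headers having the same length. A sketch that stores the index of the marker line per file instead (hypothetical helper, toy data):

```python
HEADER_END_MARKER = "Pyranometer: readout interval [secs]=5"  # marker used in the notebook


def find_header_end_per_file(file_lines):
    """Return {file name: index of the marker line}, or None if it is missing."""
    header_end = {}
    for name, lines in file_lines.items():
        header_end[name] = next(
            (i for i, line in enumerate(lines) if line.strip() == HEADER_END_MARKER),
            None,
        )
    return header_end


# Toy example: two files whose headers differ in length
example = {
    "A": ["h1", HEADER_END_MARKER, "26460"],
    "B": ["h1", "h2", HEADER_END_MARKER, "26580"],
}
ends = find_header_end_per_file(example)
print(ends)  # {'A': 1, 'B': 2}

# Data part of each file, cut at its own header end
data_parts = {name: lines[ends[name] + 1:] for name, lines in example.items()}
print(data_parts)  # {'A': ['26460'], 'B': ['26580']}
```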

**Finding the date in the header and converting it to datetime:**

In [11]:
def extract_dates_from_dict(dictionary):
    dates = {}

    for key, value in dictionary.items():
        header_content = value  # Assumption: the dictionary value contains the header content
        header_split = header_content.split('\n')
        # Line 7 of the header (index 6) contains two dates: "YYYY MM DD YYYY MM DD"
        date_line = header_split[6].split()
        start_date = "-".join(date_line[:3])
        end_date = "-".join(date_line[3:])

        if start_date == end_date:
            date_object = datetime.strptime(start_date, "%Y-%m-%d")
            dates[key] = date_object
        else:
            dates[key] = "Error: Start and end dates are not the same"
            print(f"Error in {key}: start and end dates are not the same")

    return dates

In [44]:
# dict containing the date of each individual file:
date = extract_dates_from_dict(file_header)

In [None]:
date
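The date comparison in `extract_dates_from_dict` could also be written with `datetime.date` directly, assuming (as the existing code implies) that line 7 of the header holds six integers `YYYY MM DD YYYY MM DD`. The input string below is illustrative, not taken from a real file, and `parse_header_date_line` is a hypothetical name. Using `import datetime as dt` avoids shadowing the notebook's `date` dictionary.

```python
import datetime as dt


def parse_header_date_line(line):
    """Parse a header line 'YYYY MM DD YYYY MM DD' into two datetime.date objects."""
    fields = [int(value) for value in line.split()]
    if len(fields) != 6:
        raise ValueError(f"Expected 6 numbers, got {len(fields)}: {line!r}")
    return dt.date(*fields[:3]), dt.date(*fields[3:])


first, second = parse_header_date_line("2022 06 15 2022 06 15")
print(first, second, first == second)  # 2022-06-15 2022-06-15 True
```

Unlike joining the fields into a string, this raises a clear error if the line does not contain exactly six numbers.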

**Removing the header from the dataset:**

In [13]:
def remove_header(lines_content, end_line):
    # Slicing creates new lists, so the input dictionary (file_content) is not modified
    lines_WO_header_fnc = {key: value[end_line + 1:] for key, value in lines_content.items()}

    return lines_WO_header_fnc

In [15]:
lines_WO_header = remove_header(file_content, end_line_header)
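`dict.copy()` creates only a shallow copy: the lists inside are shared with the original dictionary. Deleting lines from them in place therefore also changes the original, while slicing creates new lists. A short demonstration with toy data:

```python
original = {"SA220615": ["header", "marker", "26460"]}

# Shallow copy: the inner list object is shared with `original`
shallow = original.copy()
del shallow["SA220615"][:2]
print(original["SA220615"])  # ['26460']  (original was modified as well)

original = {"SA220615": ["header", "marker", "26460"]}

# Slicing creates a new list, so `original` stays unchanged
sliced = {key: lines[2:] for key, lines in original.items()}
print(original["SA220615"])  # ['header', 'marker', '26460']
print(sliced["SA220615"])    # ['26460']
```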

In [16]:
print(lines_WO_header['SA220615'][:20])

['26460', '113 4.000E-02 9.999E+9', '2.000E+00 0.000E+00 999.9 9.999E+9 9.999E+9 ', '9.999E+9 9.999E+9 9.999E+9 9.999E+9', '9.999E+9 9.999E+9 9.999E+9 9.999E+9', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0

**Splitting the elements:**

In [17]:
def flatten_and_split(input_list):
    result = []
    for sublist in input_list:
        elements = sublist.split()
        result.extend(elements)
    return result

In [18]:
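def process_dict(input_dict):
    processed_dict = {}
    for key, value in input_dict.items():
        # Flat list of all whitespace-separated elements of the file
        processed_value = flatten_and_split(value)
        sublists = []
        sublist = []
        for element in processed_value:
            # A 5-digit integer is the time stamp (seconds of the day) that starts
            # a new record; time stamps below 10000 s (before 02:46:40) have fewer
            # digits and would not be detected
            if element.isdigit() and len(element) == 5:
                if sublist:
                    sublists.append(sublist)
                    sublist = []
            sublist.append(element)
        if sublist:
            sublists.append(sublist)
        processed_dict[key] = sublists
    return processed_dict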

In [19]:
processed_dict = process_dict(lines_WO_header)
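The records are still lists of strings. A sketch that groups a flat token list with the same 5-digit rule as `process_dict` and then converts the time stamp to `int` and the remaining values to `float`; the token values are illustrative and `split_records` is a hypothetical name:

```python
import re

TIMESTAMP = re.compile(r"^\d{5}$")  # same rule as process_dict: exactly 5 digits


def split_records(tokens):
    """Group a flat token list into records that each start with a 5-digit time stamp."""
    records, current = [], []
    for token in tokens:
        if TIMESTAMP.match(token) and current:
            records.append(current)
            current = []
        current.append(token)
    if current:
        records.append(current)
    return records


tokens = ["26460", "113", "4.000E-02", "26580", "113", "5.000E-02"]
records = split_records(tokens)
print(records)
# [['26460', '113', '4.000E-02'], ['26580', '113', '5.000E-02']]

# Convert each record to numbers: time stamp as int, remaining values as float
numeric = [[int(r[0])] + [float(v) for v in r[1:]] for r in records]
print(numeric)  # [[26460, 113.0, 0.04], [26580, 113.0, 0.05]]
```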

In [20]:
print(processed_dict['SA220615'][:2])

[['26460', '113', '4.000E-02', '9.999E+9', '2.000E+00', '0.000E+00', '999.9', '9.999E+9', '9.999E+9', '9.999E+9', '9.999E+9', '9.999E+9', '9.999E+9', '9.999E+9', '9.999E+9', '9.999E+9', '9.999E+9', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0000E+00', '0.0

**Finding the time stamps in the dataset:**

In [21]:
def find_5_digit_integers(input_list):
    result = []
    for sublist in input_list:
        for element in sublist:
            if isinstance(element, str) and element.isdigit() and len(element) == 5:
                result.append(element)
    return result

In [22]:
def find_5_digit_integers_in_dict(input_dict):
    result_dict = {}
    for key, value in input_dict.items():
        result_dict[key] = find_5_digit_integers(value)
    return result_dict

In [23]:
result_dict = find_5_digit_integers_in_dict(processed_dict)

In [None]:
result_dict
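The time stamps can also be used to check the data for gaps. The output of `convert_and_store_first_entries` further down shows a 2-minute spacing (07:21, 07:23, ...) and a jump from 07:45 to 08:15. A sketch that reports such jumps, assuming a nominal interval of 120 s; `find_gaps` and the example time stamps are hypothetical:

```python
def find_gaps(timestamps, expected_step=120):
    """Return (start, end) pairs of consecutive time stamps (seconds since
    midnight) whose distance exceeds expected_step."""
    seconds = sorted(int(t) for t in timestamps)
    return [(a, b) for a, b in zip(seconds, seconds[1:]) if b - a > expected_step]


# Time stamps as strings, as returned by find_5_digit_integers
stamps = ["26460", "26580", "26700", "28500", "28620"]
print(find_gaps(stamps))  # [(26700, 28500)]
```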

**Converting the time stamps to clock times:**

In [25]:
def seconds_to_time(seconds):
    hours = seconds // 3600
    seconds %= 3600
    minutes = seconds // 60
    seconds %= 60
    return f"{hours:02}:{minutes:02}:{seconds:02}"
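Instead of a time string, a complete `datetime` per record can be built from the file name and the time stamp. This sketch assumes, as the notebook implies, that the file names have the form `SA` + `YYMMDD` and that the time stamps are seconds since midnight; the time zone is not known here and is not handled. `record_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def record_datetime(file_key, seconds):
    """Combine the date from the file name with a time stamp in seconds.

    Assumes file names like 'SA220615' (characters 3 to 8 = YYMMDD) and
    time stamps in seconds since midnight; no time zone handling.
    """
    day = datetime.strptime(file_key[2:], "%y%m%d")
    return day + timedelta(seconds=int(seconds))


print(record_datetime("SA220615", "26460"))  # 2022-06-15 07:21:00
```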

In [26]:
def convert_and_store_first_entries(input_dict):
    result_datetime_dict = {}
    for key, value in input_dict.items():
        date = key[2:]  # Extract the date from the key (e.g. 'SA220615' -> '220615')
        times = []
        for sublist in value:
            if sublist:  # Check that the list is not empty
                first_entry_seconds = int(sublist[0])  # First entry of the record (time stamp in seconds)
                time = seconds_to_time(first_entry_seconds)
                times.append(time)
            else:
                times.append(None)
        result_datetime_dict[date] = times
    return result_datetime_dict

In [27]:
result_first_entries_dict = convert_and_store_first_entries(processed_dict)

In [28]:
print(result_first_entries_dict)

{'220615': ['07:21:00', '07:23:00', '07:25:00', '07:27:00', '07:29:00', '07:31:00', '07:33:00', '07:35:00', '07:37:00', '07:39:00', '07:41:00', '07:43:00', '07:45:00', '08:15:00', '08:17:00', '08:19:00', '08:21:00', '08:23:00', '08:25:00', '08:27:00', '08:29:00', '08:31:00', '08:33:00', '08:35:00', '08:37:00', '08:39:00', '08:41:00', '08:43:00', '08:45:00', '08:47:00', '08:49:00', '08:51:00', '08:53:00', '08:55:00', '08:57:00', '08:59:00', '09:01:00', '09:03:00', '09:05:00', '09:07:00', '09:09:00', '09:11:00', '09:13:00', '09:15:00', '09:17:00', '09:19:00', '09:21:00', '09:23:00', '09:25:00', '09:27:00', '09:29:00', '09:31:00', '09:33:00', '09:35:00', '09:37:00', '09:39:00', '09:41:00', '09:43:00', '09:45:00', '09:47:00', '09:49:00', '09:51:00', '09:53:00', '09:55:00', '09:57:00', '09:59:00', '10:01:00', '10:03:00', '10:05:00', '10:07:00', '10:09:00', '10:11:00', '10:13:00', '10:15:00', '10:17:00', '10:19:00', '10:21:00', '10:23:00', '10:25:00', '10:27:00', '10:29:00', '10:31:00', '10:

**Storing the dict in a DataFrame:**

In [33]:
# Create an empty list to collect the rows of the DataFrame
data_rows = []

# Iterate over the nested dictionary and create one row per record
for key, value in processed_dict.items():
    for sublist in value:
        data_rows.append([key] + sublist)

# Define the column names; use the longest row so that records of different
# length do not raise an error (shorter rows are padded with missing values)
max_len = max(len(row) for row in data_rows)
columns = ['Datum'] + [f'Wert{i}' for i in range(1, max_len)]

# Create the pandas DataFrame
df = pd.DataFrame(data_rows, columns=columns)

In [None]:
df
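`df` still contains strings. A sketch that converts the value columns to numbers and replaces the fill value `9.999E+9` by `NaN`; that this value marks missing data is an assumption based on its repeated occurrence in the records and should be checked against the header. `to_numeric_frame` and `MISSING_VALUE` are hypothetical names:

```python
import numpy as np
import pandas as pd

# Assumption: 9.999E+9 is the fill value for missing data (check against the header)
MISSING_VALUE = 9.999e9


def to_numeric_frame(df, fill_value=MISSING_VALUE):
    """Convert all value columns (all except 'Datum') from strings to numbers
    and replace the fill value by NaN."""
    result = df.copy()
    for column in result.columns:
        if column == "Datum":
            continue
        # Non-numeric strings and missing entries become NaN
        values = pd.to_numeric(result[column], errors="coerce")
        result[column] = values.replace(fill_value, np.nan)
    return result


example = pd.DataFrame(
    [["SA220615", "26460", "4.000E-02", "9.999E+9"]],
    columns=["Datum", "Wert1", "Wert2", "Wert3"],
)
numeric = to_numeric_frame(example)
print(numeric.loc[0, "Wert3"])  # nan
```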

In [42]:
def save_dataframe_to_drive(dataframe, folder_path, filename):

    # Build the full path to the file
    full_path = os.path.join(folder_path, filename)

    # Save the DataFrame as a CSV file on Google Drive
    dataframe.to_csv(full_path, index=False)

    print(f'The DataFrame was saved as {filename} in {folder_path} on Google Drive.')

In [43]:
save_dataframe_to_drive(df, '/content/drive/My Drive/', 'output.csv')

The DataFrame was saved as output.csv in /content/drive/My Drive/ on Google Drive.
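Because the notebook is run once per month and always writes `output.csv`, each run overwrites the file of the previous month. A sketch that writes one file per month into a dedicated folder (such as `CSV_Messdaten`, which appears in the Drive listing); the naming scheme `UV_YY_MM.csv` and `save_month_csv` are hypothetical:

```python
import os
import tempfile

import pandas as pd


def save_month_csv(dataframe, output_folder, month_folder):
    """Save one month as e.g. 'UV_22_06.csv' (hypothetical naming scheme)
    and return the full path of the written file."""
    os.makedirs(output_folder, exist_ok=True)
    filename = "UV_" + month_folder.replace(".", "_") + ".csv"
    full_path = os.path.join(output_folder, filename)
    dataframe.to_csv(full_path, index=False)
    return full_path


# Usage example in a temporary folder instead of Google Drive
with tempfile.TemporaryDirectory() as tmp:
    example = pd.DataFrame({"Datum": ["SA220615"], "Wert1": ["26460"]})
    path = save_month_csv(example, os.path.join(tmp, "CSV_Messdaten"), "22.06")
    reloaded = pd.read_csv(path, dtype=str)

print(os.path.basename(path))      # UV_22_06.csv
print(reloaded["Wert1"].tolist())  # ['26460']
```

Reading the file back with `dtype=str` keeps values such as time stamps exactly as written.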
