<a href="https://colab.research.google.com/github/rjanow/Masterarbeit/blob/main/UV_Measurement_to_CSV.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**This notebook writes the raw data of the BTS2048-UV-WP to a usable CSV file**

Document name: UV_Measurement_to_CSV.ipynb




The OR0 data (NASA Ames format), which is actually intended for submission to the BFS, is converted and written to a CSV file.

In [2]:
# Import the required modules
import os, sys
import glob
import json
import pandas as pd
import numpy as np
import csv
import re

from scipy.io import netcdf
from datetime import timedelta
from datetime import datetime

from google.colab import drive
from google.colab import files

First, the Google Drive that holds the measurement data (OR0 files) must be mounted. Then all available subfolders are listed, which verifies that the mount worked.

In [3]:
drive.mount('/content/drive')

drive_path = '/content/drive/MyDrive'
# Walk the Google Drive path
# (the file list is discarded so that the imported `files` module is not overwritten)
for root, dirs, _ in os.walk(drive_path):
    for dir_name in dirs:
        # Print the full path of each subfolder
        print(os.path.join(root, dir_name))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/Colab_Notebooks
/content/drive/MyDrive/Colab Notebooks
/content/drive/MyDrive/Colab_Notebooks/CSV_Messdaten
/content/drive/MyDrive/Colab_Notebooks/CouchDB File
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data


Next, the individual OR0 files (NASA Ames format) are loaded and converted.

The NASA Ames format: https://espoarchive.nasa.gov/content/Ames_Format_Specification_v20

The measurement data is stored under FFI (File Format Index) 2005. This format index is not documented by NASA, so a custom parser follows that converts the data into a usable CSV.
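
Before a custom parser searches for a marker string, it can help to read the file prefix. This is a minimal sketch, assuming the OR0 files follow the general NASA Ames convention in which the first line holds NLHEAD (the number of header lines) and the FFI. This has not been verified for the BTS2048-UV-WP files, and the function names `read_nasa_ames_prefix` and `split_header_and_body` are hypothetical.

```python
def read_nasa_ames_prefix(lines):
    """Return (nlhead, ffi) parsed from the first line of a NASA Ames file.

    Assumption: the first line looks like "NLHEAD FFI", e.g. "21 2005".
    """
    first = lines[0].split()
    nlhead, ffi = int(first[0]), int(first[1])
    return nlhead, ffi


def split_header_and_body(lines):
    """Split a list of lines into header and data part using NLHEAD."""
    nlhead, ffi = read_nasa_ames_prefix(lines)
    return ffi, lines[:nlhead], lines[nlhead:]


# Synthetic example, not real measurement data
sample_lines = ["3 2005", "originator", "organisation", "26460", "113 4.000E-02"]
ffi, header_lines, body_lines = split_header_and_body(sample_lines)
print(ffi, len(header_lines), body_lines[0])
```

If NLHEAD turns out to be unreliable for these files, searching for a known last header line remains the fallback.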

**Dict and contents**

- **file_names** = contains the file names of the individual OR0 files
- **file_content** = contains the content of the OR0 files
- **end_line_header** = contains the index of the line at which the header ends

**Reading the files:**

In [4]:
# Path to the folder with the files in Google Drive
folder_path = '/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data'

# Collect the OR0 files in the folder (sorted for a reproducible order)
file_paths = sorted(glob.glob(folder_path + '/*.OR0'))

# List of the names of all files that were loaded
file_names = []

# Loop over the files
for file_path in file_paths:

    # Only load files larger than 100 KiB
    if os.path.getsize(file_path) > 100 * 1024:
        # Extract the file name without its extension
        file_name = os.path.splitext(os.path.basename(file_path))[0]

        # Open the file and read its content
        with open(file_path, 'r') as file:
            file_content = file.read()

        # Store the content in a global variable named after the file
        globals()[file_name] = file_content

        file_names.append(file_name)

In [5]:
# print(file_names)

['SA220615', 'SA220616', 'SA220617', 'SA220618', 'SA220619', 'SA220620', 'SA220621', 'SA220622', 'SA220623', 'SA220624', 'SA220625', 'SA220626', 'SA220627', 'SA220628', 'SA220629', 'SA220630', 'SA220701', 'SA220702', 'SA220703', 'SA220704', 'SA220705', 'SA220707', 'SA220708', 'SA220709', 'SA220710', 'SA220711', 'SA220712', 'SA220713', 'SA220714', 'SA220715', 'SA220716', 'SA220717', 'SA220718', 'SA220719', 'SA220720', 'SA220721', 'SA220722', 'SA220723', 'SA220724', 'SA220725', 'SA220726', 'SA220727', 'SA220728', 'SA220729', 'SA220730', 'SA220731', 'SA220801', 'SA220802', 'SA220803', 'SA220804', 'SA220805', 'SA220806', 'SA220807', 'SA220808', 'SA220809', 'SA220810', 'SA220811', 'SA220812', 'SA220813', 'SA220814', 'SA220815', 'SA220816', 'SA220817', 'SA220818', 'SA220819', 'SA220820', 'SA220821', 'SA220822', 'SA220823', 'SA220824', 'SA220825', 'SA220826', 'SA220827', 'SA220829', 'SA220830', 'SA220831', 'SA220901', 'SA220902', 'SA220906', 'SA220907', 'SA220908', 'SA220909', 'SA220912', 'SA

**Split the string into individual lines:**

In [6]:
file_variables = {}  # Dictionary mapping file name -> list of lines

for file_name in file_names:
    # Fetch the content stored under the file's name and split it into lines
    file_variables[file_name] = globals()[file_name].split('\n')

In [None]:
# Print the first 10 file names and the first 20 lines of the first file
# print(file_names[:10])
# file_variables['SA220615'][:20]

**Extract the header:**

In [8]:
# Function that extracts the file header from a list of lines
def extract_header(dataset):
    header_content = ""
    end_line_header = None  # stays None if the marker line is not found

    for i, line in enumerate(dataset):
        header_content += line + "\n"
        # The header ends with the pyranometer readout line
        if line.strip() == "Pyranometer: readout interval [secs]=5":
            end_line_header = i
            break

    return header_content, end_line_header

In [9]:
new_dict = {file_name: None for file_name in file_names}

In [10]:
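def perform_action(file_variables, file_names):
    header_dict = {file_name: "" for file_name in file_names}
    # Stays None if no file contains the marker line
    end_line_header_fnc = None

    for file_name, data in file_variables.items():
        for i, line in enumerate(data):
            header_dict[file_name] += line + "\n"
            # The header ends with the pyranometer readout line
            if line.strip() == "Pyranometer: readout interval [secs]=5":
                # Note: only the index found in the last file is returned
                end_line_header_fnc = i
                break

    return header_dict, end_line_header_fnc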

In [11]:
file_header, end_line_header = perform_action(file_variables, file_names)
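
`perform_action` returns a single header end index for all files, which is only safe if every header has the same length. The following is a minimal sketch of a per-file variant, not the notebook's method. The names `split_headers` and `HEADER_END_MARKER` are hypothetical, and the marker string is taken from the cells above.

```python
HEADER_END_MARKER = "Pyranometer: readout interval [secs]=5"


def split_headers(lines_by_file, marker=HEADER_END_MARKER):
    """Return (headers, end_lines), both keyed by file name.

    headers[name] is the header text including the marker line,
    end_lines[name] is the index of the marker line.
    Both are None for a file in which the marker does not occur.
    """
    headers, end_lines = {}, {}
    for name, lines in lines_by_file.items():
        # Index of the first line that equals the marker, or None
        end = next((i for i, line in enumerate(lines) if line.strip() == marker), None)
        end_lines[name] = end
        headers[name] = None if end is None else "\n".join(lines[:end + 1]) + "\n"
    return headers, end_lines


# Synthetic example with two headers of different length
example = {
    "A": ["h1", HEADER_END_MARKER, "26460"],
    "B": ["h1", "h2", HEADER_END_MARKER, "26460"],
    "C": ["no marker here"],
}
example_headers, example_end_lines = split_headers(example)
print(example_end_lines)
```

Keeping the index per file lets the header removal use the matching value for each file.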

**Create a DataFrame with the wavelengths:**

In [12]:
def create_df_Wellenlaenge(start, end, step):
    # Build the list of wavelength values, rounded to suppress floating-point noise
    numbers_list = [round(num, 3) for num in np.arange(start, end + step, step)]
    # Create the DataFrame
    df = pd.DataFrame({'Wellenlaenge': numbers_list})

    return df

In [13]:
np_Wellenlaenge = np.round(np.arange(290.0, 420.05, 0.1), decimals = 1)
df_Wellenlaenge = pd.DataFrame({'Wellenlaenge': np_Wellenlaenge})
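
With a floating-point step, `np.arange` can produce one element more or fewer than expected, which is why the cell above uses 420.05 as the upper bound. A minimal sketch of an alternative that fixes the number of points explicitly with `np.linspace`. The helper name `wavelength_grid` is hypothetical, and the rounding to one decimal assumes a step of 0.1 nm.

```python
import numpy as np
import pandas as pd


def wavelength_grid(start=290.0, stop=420.0, step=0.1):
    """Return an evenly spaced wavelength grid that includes both end points."""
    # Number of points derived from the step, rounded to the nearest integer
    n_points = int(round((stop - start) / step)) + 1
    # Assumption: one decimal is enough because the step is 0.1
    return np.round(np.linspace(start, stop, n_points), decimals=1)


grid = wavelength_grid()
df_grid = pd.DataFrame({'Wellenlaenge': grid})
print(len(grid), grid[0], grid[-1])
```

The grid length matches the 1301 spectral columns that are renamed later in the notebook.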

**Search for the date in the header:**

In [14]:
def extract_dates_from_dict(dictionary):
    dates = {}

    for key, value in dictionary.items():
        header_content = value  # Assumption: the dictionary value holds the header content
        header_split = header_content.split('\n')
        date_line = header_split[6].split()
        start_date = "-".join(date_line[:3])
        end_date = "-".join(date_line[3:])

        if start_date == end_date:
            date_object = datetime.strptime(start_date, "%Y-%m-%d")
            dates[key] = date_object
        else:
            dates[key] = "Error: Start and end dates are not the same"
            print("Error: Start and end dates are not the same")

    return dates

In [None]:
date = extract_dates_from_dict(file_header)
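
The date extraction above relies on the seventh header line containing six whitespace-separated fields. A minimal sketch of parsing a single such line follows; the function name `parse_date_line` is hypothetical. In the general NASA Ames specification this line holds DATE and RDATE (start of the data and date of revision). Whether the OR0 files use the second date as an end date, as the notebook assumes, is not verified here.

```python
from datetime import datetime


def parse_date_line(date_line):
    """Parse a line such as '2022 06 15 2022 06 15' into two datetime objects."""
    parts = date_line.split()
    if len(parts) != 6:
        raise ValueError(f"expected six date fields, got {len(parts)}: {date_line!r}")
    first = datetime.strptime("-".join(parts[:3]), "%Y-%m-%d")
    second = datetime.strptime("-".join(parts[3:]), "%Y-%m-%d")
    return first, second


# Synthetic example line
first_date, second_date = parse_date_line("2022 06 15 2022 06 15")
print(first_date.date(), second_date.date())
```

Raising an exception on a malformed line keeps the result dictionary free of mixed types, since it never needs to store an error string next to `datetime` values.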

**Remove the header from the dataset:**

In [17]:
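def remove_header(lines_content, end_line):
    # Build a new dictionary so that the input lists stay untouched;
    # a shallow dict.copy() would share the lists and delete the lines in place
    lines_WO_header_fnc = {}

    for key, value in lines_content.items():
        # Keep everything after the last header line
        lines_WO_header_fnc[key] = value[end_line + 1:]

    return lines_WO_header_fnc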

In [18]:
lines_WO_header = remove_header(file_variables, end_line_header)

In [19]:
print(lines_WO_header['SA220615'][:20])

['26460', '113 4.000E-02 9.999E+9', '2.000E+00 0.000E+00 999.9 9.999E+9 9.999E+9 ', '9.999E+9 9.999E+9 9.999E+9 9.999E+9', '9.999E+9 9.999E+9 9.999E+9 9.999E+9', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0

**Save the dataset as CSV:**

In [20]:
def dict_to_csv(dict_data, file_path):
    with open(file_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(dict_data.keys())
        writer.writerow(dict_data.values())

ValueError: ignored

In [None]:
file_path =

In [None]:
dict_to_csv(lines_WO_header, file_path)
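
`dict_to_csv` writes one row of keys and one row in which each cell is a complete list. A minimal sketch of an alternative that writes each file as one column and each line as one row, assuming that layout is the intended one. The function name `dict_of_lists_to_csv` is hypothetical.

```python
import csv
from itertools import zip_longest


def dict_of_lists_to_csv(dict_data, file_path):
    """Write a dict of lists as CSV columns; shorter columns are padded with ''."""
    with open(file_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        # Header row: one column per dictionary key
        writer.writerow(dict_data.keys())
        # zip_longest transposes the lists into rows
        writer.writerows(zip_longest(*dict_data.values(), fillvalue=''))
```

A call such as `dict_of_lists_to_csv({'A': ['1', '2'], 'B': ['3']}, 'out.csv')` would produce the rows `A,B`, `1,3` and `2,`.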

**Find the timestamps in the dataset:**

In [None]:
# Function that finds the start of each individual measurement
def split_dataset(dataset):
    data_packages = []
    current_package = []

    for i, line in enumerate(dataset):
        if line.strip().isdigit() and len(line.strip()) == 5:
            if i + 1 < len(dataset) and dataset[i + 1].strip().isdigit() and len(dataset[i + 1].strip()) in [2, 3]:
                if current_package:
                    data_packages.append(current_package)
                    current_package = []
            current_package.append(line)

    if current_package:
        data_packages.append(current_package)

    return data_packages

In [None]:
# split_dataset expects the list of lines of one file, not the whole dictionary
data_packages = split_dataset(lines_WO_header['SA220615'])

In [None]:
data_packages
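
In the sample output of `lines_WO_header['SA220615']`, the line after the timestamp `'26460'` is `'113 4.000E-02 9.999E+9'`, which is not a pure digit string, so the look-ahead test in `split_dataset` may not match. A minimal sketch of a simpler split follows, under the loudly stated assumption that every record starts with a line holding a single integer number of seconds since midnight and that no data line consists of a bare integer. The function name `split_records` and the second timestamp in the example are hypothetical.

```python
def split_records(lines):
    """Group lines into records; a record starts at a seconds-of-day line."""
    records = []
    current = []
    for line in lines:
        token = line.strip()
        if not token:
            continue  # skip empty lines
        # Assumption: a bare integer in [0, 86400) marks the start of a record
        is_timestamp = token.isdigit() and int(token) < 86400
        if is_timestamp and current:
            records.append(current)
            current = []
        current.append(token)
    if current:
        records.append(current)
    return records


# Synthetic example modelled on the sample output above
example_lines = [
    '26460', '113 4.000E-02 9.999E+9', '  0.0000E+00  0.0000E+00',
    '26465', '113 4.000E-02 9.999E+9', '',
]
example_records = split_records(example_lines)
print([record[0] for record in example_records])
```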

Assign an index to each timestamp:

In [None]:
def find_indices(elemente, meine_liste):
    indices = []

    for element in elemente:
        if element in meine_liste:
            indices.append(meine_liste.index(element))
        else:
            indices.append(-1)

    return indices

In [None]:
# Example call of the function
indices_Timestamp = find_indices(package, lines_WOH_split)
print(indices_Timestamp)
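
`find_indices` calls `list.index` for every element, which rescans the list each time. A minimal sketch of an equivalent lookup that builds the position table once. The name `find_indices_fast` is hypothetical. Like `list.index`, it reports the first occurrence of a value, so repeated values in the list all map to the same index.

```python
def find_indices_fast(elements, lines):
    """Return the first index of each element in lines, or -1 if it is absent."""
    first_index = {}
    for i, line in enumerate(lines):
        # setdefault keeps the index of the first occurrence only
        first_index.setdefault(line, i)
    return [first_index.get(element, -1) for element in elements]


print(find_indices_fast(['b', 'x', 'a'], ['a', 'b', 'a']))
```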

**Convert the timestamp into a time of day:**

In [None]:
# Function that converts the measurement timestamp into a time of day and a date
def seconds_to_time(time_seconds, date):
    data = []

    for sec in time_seconds:
        hours = int(sec) // 3600
        minutes = (int(sec) % 3600) // 60
        seconds = int(sec) % 60
        time = pd.to_datetime(f"{date} {hours:02d}:{minutes:02d}:{seconds:02d}")
        data.append({'Datum':date,'Stunden': hours, 'Minuten': minutes, 'Sekunden': seconds, 'Uhrzeit': time})

    df = pd.DataFrame(data)
    return df

In [None]:
df_time = seconds_to_time(package, date)

# Print the DataFrame
print(df_time)
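
The conversion from seconds since midnight to a full timestamp can also be done without string formatting. A minimal sketch using `pd.to_timedelta`, assuming the timestamps are seconds of the day and that a single date per file is known. The function name `seconds_to_timestamps` is hypothetical.

```python
import pandas as pd


def seconds_to_timestamps(time_seconds, day):
    """Convert seconds since midnight (numbers or digit strings) to timestamps."""
    seconds = pd.to_numeric(pd.Series(list(time_seconds)))
    # Adding a timedelta to the start of the day gives the measurement time
    return pd.to_timedelta(seconds, unit='s') + pd.Timestamp(day)


example_times = seconds_to_timestamps(['0', '26460'], '2022-06-15')
print(example_times)
```

Here 26460 s corresponds to 07:21:00, because 7 · 3600 + 21 · 60 = 26460.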

**Writing out the individual measurements**

Split the columns of the dataset into individual elements:

In [None]:
result_list = extract_numbers(lines_WO_header)

In [None]:
def split_columns(df):
    new_columns = {}
    for column in df.columns:
        split_values = df[column].str.split(' ', expand=True)
        num_values = split_values.shape[1]
        new_columns.update({f'{column}_{i+1}': split_values[i] for i in range(num_values)})

    df_split = pd.DataFrame(new_columns)
    return df_split

In [None]:


def split_data(lines_WOH_fnc, split_indices):
    data_parts = []
    for i in range(len(split_indices)):
        if i == 0:
            data_parts.append(lines_WOH_fnc[:split_indices[i]])
        else:
            data_parts.append(lines_WOH_fnc[split_indices[i-1]:split_indices[i]])
    data_parts.append(lines_WOH_fnc[split_indices[-1]:])
    df = pd.DataFrame(data_parts)
    num_cols = max([len(x) for x in data_parts])
    col_names = ['Col{}'.format(i+1) for i in range(num_cols)]
    df.columns = col_names
    df = df.drop(df.index[0]).reset_index(drop=True)
    return df


In [None]:
print(lines_WO_header['SA220615'][:5], type(data_packages[0]))

In [None]:
df_Messung = split_data(lines_WOH_split, indices_Timestamp)
print(df_Messung)

Bring the DataFrame into the right shape:

In [None]:
df_Mea_Time = pd.concat([df_time, df_Messung], axis = 1)

In [None]:
start_index = 18
num_elements = 1301

columns_to_rename = ["Col" + str(i) for i in range(start_index, start_index + num_elements)]

In [None]:
# Map each generic column name to its wavelength and rename in a single call
rename_map = {
    old_column_name: str(df_Wellenlaenge.loc[i, 'Wellenlaenge'])
    for i, old_column_name in enumerate(columns_to_rename)
}
df_Mea_Time.rename(columns=rename_map, inplace=True)

In [None]:
df_Mea_Time

**Saving the measurement data as CSV**

In [None]:
df_Mea_Time.to_csv('/content/drive/MyDrive/Colab_Notebooks/CSV_Messdaten/mea_time.csv', index=False)