<a href="https://colab.research.google.com/github/rjanow/Masterarbeit/blob/main/UV_Measurement.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**This notebook writes the raw data of the BTS2048-UV-WP to a usable CSV file**

Document name: Import_UV_Measurement.ipynb

The OR0 data (NASA Ames format), which are actually intended for submission to the BFS, are converted and written to a CSV file.

In [2]:
# Import the required modules
import os, sys
import json
import pandas as pd
import numpy as np
import csv
import re

from scipy.io import netcdf
# Import the classes directly; a plain "import datetime" would be shadowed by this line anyway
from datetime import datetime, timedelta

First, the Google Drive that holds the measurement data (OR0 files) must be mounted. Afterwards, all available subfolders are listed to check that the mount worked.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

drive_path = '/content/drive/MyDrive'
# Walk the Google Drive path
for root, dirs, files in os.walk(drive_path):
    for dir_name in dirs:
        # Print the full path of the subfolder
        print(os.path.join(root, dir_name))

Mounted at /content/drive
/content/drive/MyDrive/Colab_Notebooks
/content/drive/MyDrive/Colab Notebooks
/content/drive/MyDrive/Colab_Notebooks/CSV_Messdaten
/content/drive/MyDrive/Colab_Notebooks/CouchDB File
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten
/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data


Next, the individual OR0 files (NASA Ames format) are loaded and converted.

The NASA Ames format: https://espoarchive.nasa.gov/content/Ames_Format_Specification_v20

The measurement data are stored under FFI (File Format Index) 2005. This index is not documented by NASA, so a custom parser follows that converts the data into a usable CSV file.

**Converting a single file...**

- **lines** = contains the OR0 file split into lines
- **header** = contains the file header (as returned by `extract_header`: the header text together with `end_line_header`)
- **end_line_header** = contains the index of the line at which the header ends

**Reading the file:**

In [5]:
# Open a single file as a test to check that the connection to Google Drive works
file_path = r'/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data/SA220615.OR0'  # File path on Google Drive

with open(file_path, 'r') as file:
    # Read the contents of the file
    content = file.read()

In [6]:
# Split the content into a list of lines
lines = content.split('\n')

In [7]:
print(lines[0:15])

['NLHEAD=207 FFI=2005        ', 'R.Janowitz', 'DGUV', 'ELONG=7.182 NLAT=50.780 HEIGHT=70m', 's-UV-Mo-Net', '1 1', '2022 06 15  2022 06 15', '0 360', '1301', '1301', '  290.000  290.100  290.200  290.300  290.400  290.500  290.600  290.700', '  290.800  290.900  291.000  291.100  291.200  291.300  291.400  291.500', '  291.600  291.700  291.800  291.900  292.000  292.100  292.200  292.300', '  292.400  292.500  292.600  292.700  292.800  292.900  293.000  293.100', '  293.200  293.300  293.400  293.500  293.600  293.700  293.800  293.900']
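The fourth header line in the output above stores the site coordinates as `KEY=value` pairs. A minimal sketch of how such a line could be read into numbers follows. The helper name `parse_key_values` is hypothetical, and reading the trailing `m` of `HEIGHT` as a unit suffix in metres is an assumption.

```python
import re

def parse_key_values(line):
    """Return the KEY=value pairs of a header line as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))

# Header line copied from the output above
site = parse_key_values("ELONG=7.182 NLAT=50.780 HEIGHT=70m")

longitude = float(site["ELONG"])
latitude = float(site["NLAT"])
# Assumption: the trailing "m" is a unit suffix (metres)
height_m = float(site["HEIGHT"].rstrip("m"))

print(longitude, latitude, height_m)
```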


**Extracting the header:**

In [8]:
# Function to extract the file header
def extract_header(dataset):
    header_content = ""
    end_line_header = None

    for i, line in enumerate(dataset):
        header_content += line + "\n"
        # The pyranometer readout-interval line is treated as the last header line
        if line.strip() == "Pyranometer: readout interval [secs]=5":
            end_line_header = i
            break

    return header_content, end_line_header

In [9]:
header = extract_header(lines)
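The first line of the file reports `NLHEAD=207`. In the NASA Ames specification, NLHEAD is the number of lines in the file header, so the header could also be cut off by line count instead of by searching for a fixed marker line. The sketch below assumes that this meaning also holds for the FFI 2005 files; whether line 207 coincides with the pyranometer line has not been checked here. The function name `split_by_nlhead` and the demo lines are hypothetical.

```python
import re

def split_by_nlhead(lines):
    """Split lines into (header_lines, data_lines) using NLHEAD from line 1."""
    match = re.search(r"NLHEAD=(\d+)", lines[0])
    if match is None:
        raise ValueError("First line does not contain NLHEAD=<n>")
    nlhead = int(match.group(1))
    # Assumption: NLHEAD counts all header lines, including the first one
    return lines[:nlhead], lines[nlhead:]

# Tiny synthetic example (not real measurement data)
demo = ["NLHEAD=3 FFI=2005", "R.Janowitz", "DGUV", "26460", "113 4.000E-02 9.999E+9"]
header_lines, data_lines = split_by_nlhead(demo)
print(len(header_lines), data_lines[0])
```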

**Creating a DataFrame with the wavelengths**

In [10]:
def create_df_Wellenlaenge(start, end, step):
    # Build the list of wavelengths; rounding suppresses floating-point noise
    numbers_list = [round(num, 3) for num in np.arange(start, end + step, step)]
    # Create the DataFrame
    df = pd.DataFrame({'Wellenlaenge': numbers_list})

    return df

In [11]:
df_Wellenlaenge = create_df_Wellenlaenge(290, 420, 0.1)

In [12]:
df_Wellenlaenge

Unnamed: 0,Wellenlaenge
0,290.0
1,290.1
2,290.2
3,290.3
4,290.4
...,...
1297,419.7
1298,419.8
1299,419.9
1300,420.0
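With a non-integer step, `np.arange` can return one element more or less than expected, depending on rounding. The header lists the value 1301, which matches the 1301 rows above, so the grid could also be built from the number of points. The following is a sketch under that reading of the header; `create_df_wavelength_linspace` is a hypothetical variant, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def create_df_wavelength_linspace(start, end, n_points):
    """Build the wavelength grid from its end points and the number of points."""
    values = np.round(np.linspace(start, end, n_points), 3)
    return pd.DataFrame({'Wellenlaenge': values})

df_demo = create_df_wavelength_linspace(290, 420, 1301)
print(len(df_demo), df_demo['Wellenlaenge'].iloc[0], df_demo['Wellenlaenge'].iloc[-1])
```

Because `np.linspace` fixes both end points and the length, the number of rows does not depend on floating-point rounding of the step.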


**Searching for the date in the header:**

In [13]:
# Function to extract the date from the header
def extract_date(header_content):
    header_split = header_content.split('\n')
    # The seventh header line holds start and end date as "YYYY MM DD  YYYY MM DD"
    date_fields = header_split[6].split()
    start_date = date_fields[0:3]
    end_date = date_fields[3:6]
    if start_date != end_date:
        # Raise instead of returning an error string, so the problem cannot go unnoticed
        raise ValueError("Start and end dates are not the same")
    year, month, day = (int(element) for element in start_date)
    return datetime(year, month, day)

In [14]:
date = extract_date(header[0])
print("Datum:", date)

Datum: 2022-06-15 00:00:00
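As a possible alternative, the date line could be parsed with `datetime.strptime`, which also rejects impossible dates such as month 13. This is only a sketch; `parse_date_line` is a hypothetical helper that returns both dates and leaves the comparison to the caller.

```python
from datetime import datetime

def parse_date_line(date_line):
    """Return (start, end) datetimes from a line like '2022 06 15  2022 06 15'."""
    fields = date_line.split()
    start = datetime.strptime(" ".join(fields[0:3]), "%Y %m %d")
    end = datetime.strptime(" ".join(fields[3:6]), "%Y %m %d")
    return start, end

# Date line copied from the header shown above
start, end = parse_date_line("2022 06 15  2022 06 15")
print(start.date(), end.date(), start == end)
```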


In [19]:
header[0]

'NLHEAD=207 FFI=2005        \nR.Janowitz\nDGUV\nELONG=7.182 NLAT=50.780 HEIGHT=70m\ns-UV-Mo-Net\n1 1\n2022 06 15  2022 06 15\n0 360\n1301\n1301\n  290.000  290.100  290.200  290.300  290.400  290.500  290.600  290.700\n  290.800  290.900  291.000  291.100  291.200  291.300  291.400  291.500\n  291.600  291.700  291.800  291.900  292.000  292.100  292.200  292.300\n  292.400  292.500  292.600  292.700  292.800  292.900  293.000  293.100\n  293.200  293.300  293.400  293.500  293.600  293.700  293.800  293.900\n  294.000  294.100  294.200  294.300  294.400  294.500  294.600  294.700\n  294.800  294.900  295.000  295.100  295.200  295.300  295.400  295.500\n  295.600  295.700  295.800  295.900  296.000  296.100  296.200  296.300\n  296.400  296.500  296.600  296.700  296.800  296.900  297.000  297.100\n  297.200  297.300  297.400  297.500  297.600  297.700  297.800  297.900\n  298.000  298.100  298.200  298.300  298.400  298.500  298.600  298.700\n  298.800  298.900  299.000  299.100  299

**Removing the header from the dataset:**

In [None]:
def remove_header(lines_content, end_line):
    # Return a new list so that the caller's list is not modified in place
    return lines_content[end_line + 1:]

Splitting the lines of the dataset into individual elements:

In [None]:
def extract_numbers(ursprungs_liste):
    neue_liste = []

    for element in ursprungs_liste:
        # Split each line at whitespace and collect the individual values
        neue_liste.extend(element.split())

    return neue_liste

In [None]:
lines_WO_header = remove_header(lines, header[1])
lines_WOH_split = extract_numbers(lines_WO_header)

print(lines_WO_header[0:10],'\n', lines_WOH_split[0:10])

['26460', '113 4.000E-02 9.999E+9', '2.000E+00 0.000E+00 999.9 9.999E+9 9.999E+9 ', '9.999E+9 9.999E+9 9.999E+9 9.999E+9', '9.999E+9 9.999E+9 9.999E+9 9.999E+9', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00', '  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00'] 
 ['26460', '113', '4.000E-02', '9.999E+9', '2.000E+00', '0.000E+00', '999.9', '9.999E+9', '9.999E+9', '9.999E+9']
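The tokens are still strings at this point, and the value `9.999E+9` appears repeatedly. A minimal sketch of a numeric conversion follows, under the assumption that `9.999E+9` marks a missing value, which is how NASA Ames files usually flag gaps. This has not been confirmed for the FFI 2005 files, and `tokens_to_floats` is a hypothetical helper.

```python
import numpy as np

FILL_VALUE = 9.999e9  # assumption: this value marks a missing measurement

def tokens_to_floats(tokens, fill_value=FILL_VALUE):
    """Convert string tokens to floats and replace the fill value by NaN."""
    values = np.array(tokens, dtype=float)
    values[np.isclose(values, fill_value)] = np.nan
    return values

# Tokens copied from the output above
demo_tokens = ['26460', '113', '4.000E-02', '9.999E+9', '2.000E+00', '0.000E+00', '999.9']
values = tokens_to_floats(demo_tokens)
print(values)
```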


**Finding the timestamps in the dataset:**

In [None]:
# Function to find the start of each individual measurement
def split_dataset(dataset):
    data_packages = []
    current_package = []

    for i, line in enumerate(dataset):
        # A timestamp is a line that consists of exactly five digits (seconds of the day)
        if line.strip().isdigit() and len(line.strip()) == 5:
            # Start a new package if the next line is a bare two- or three-digit number
            if i + 1 < len(dataset) and dataset[i + 1].strip().isdigit() and len(dataset[i + 1].strip()) in [2, 3]:
                if current_package:
                    data_packages.append(current_package)
                    current_package = []
            current_package.append(line)

    if current_package:
        data_packages.append(current_package)

    return data_packages

In [None]:
data_packages = split_dataset(lines_WO_header)

In [None]:
for i, package in enumerate(data_packages):
    print(package)

['26460', '26580', '26700', '26820', '26940', '27060', '27180', '27300', '27420', '27540', '27660', '27780', '27900', '29700', '29820', '29940', '30060', '30180', '30300', '30420', '30540', '30660', '30780', '30900', '31020', '31140', '31260', '31380', '31500', '31620', '31740', '31860', '31980', '32100', '32220', '32340', '32460', '32580', '32700', '32820', '32940', '33060', '33180', '33300', '33420', '33540', '33660', '33780', '33900', '34020', '34140', '34260', '34380', '34500', '34620', '34740', '34860', '34980', '35100', '35220', '35340', '35460', '35580', '35700', '35820', '35940', '36060', '36180', '36300', '36420', '36540', '36660', '36780', '36900', '37020', '37140', '37260', '37380', '37500', '37620', '37740', '37860', '37980', '38100', '38340', '38460', '38580', '38700', '38820', '38940', '39060', '39180', '39300', '39420', '39540', '39660', '39780', '39900', '40020', '40140', '40260', '40380', '40500', '40620', '40740', '40860', '40980', '41100', '41220', '41340', '41460', 
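Most timestamps in the list above are 120 seconds apart, but some steps are larger, for example from 27900 to 29700. A short sketch of how such gaps could be listed follows. The expected spacing of 120 seconds is read off the output above and is an assumption, and `find_gaps` is a hypothetical helper.

```python
import numpy as np

def find_gaps(timestamps, expected_step=120):
    """Return (before, after) pairs where the step exceeds expected_step seconds."""
    secs = np.array([int(t) for t in timestamps])
    steps = np.diff(secs)
    gap_positions = np.nonzero(steps > expected_step)[0]
    return [(int(secs[i]), int(secs[i + 1])) for i in gap_positions]

# Consecutive timestamps copied from the output above
demo = ['27660', '27780', '27900', '29700', '29820', '29940']
print(find_gaps(demo))
```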

Assigning an index to each timestamp:

In [None]:
def find_indices(elemente, meine_liste):
    indices = []

    for element in elemente:
        if element in meine_liste:
            # Position of the first occurrence of the timestamp in the token list
            indices.append(meine_liste.index(element))
        else:
            # Timestamp not found
            indices.append(-1)

    return indices

In [None]:
# Example call of the function
indices_Timestamp = find_indices(package, lines_WOH_split)
print(indices_Timestamp)

[0, 1318, 2636, 3954, 5272, 6590, 7908, 9226, 10544, 11862, 13180, 14498, 15816, 17134, 18452, 19770, 21088, 22406, 23724, 25042, 26360, 27678, 28996, 30314, 31632, 32950, 34268, 35586, 36904, 38222, 39540, 40858, 42176, 43494, 44812, 46130, 47448, 48766, 50084, 51402, 52720, 54038, 55356, 56674, 57992, 59310, 60628, 61946, 63264, 64582, 65900, 67218, 68536, 69854, 71172, 72490, 73808, 75126, 76444, 77762, 79080, 80398, 81716, 83034, 84352, 85670, 86988, 88306, 89624, 90942, 92260, 93578, 94896, 96214, 97532, 98850, 100168, 101486, 102804, 104122, 105440, 106758, 108076, 109394, 110712, 112030, 113348, 114666, 115984, 117302, 118620, 119938, 121256, 122574, 123892, 125210, 126528, 127846, 129164, 130482, 131800, 133118, 134436, 135754, 137072, 138390, 139708, 141026, 142344, 143662, 144980, 146298, 147616, 148934, 150252, 151570, 152888, 154206, 155524, 156842, 158160, 159478, 160796, 162114, 163432, 164750, 166068, 167386, 168704, 170022, 171340, 172658, 173976, 175294, 176612, 177930
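`list.index` always returns the first match and searches from the start of the list for every timestamp. If the timestamps are in file order, which the output above suggests, the search could instead continue from the previous hit. The sketch below assumes exactly that ordering; `find_indices_sequential` is a hypothetical variant of `find_indices`.

```python
def find_indices_sequential(timestamps, tokens):
    """Find each timestamp in tokens, searching onward from the previous hit."""
    indices = []
    start = 0
    for ts in timestamps:
        try:
            pos = tokens.index(ts, start)
        except ValueError:
            # Timestamp not found after the previous hit
            indices.append(-1)
            continue
        indices.append(pos)
        start = pos + 1
    return indices

# Shortened synthetic token list (three tokens per measurement instead of 1318)
demo_tokens = ['26460', '113', '4.0E-02', '26580', '111', '4.0E-02', '26700', '111']
print(find_indices_sequential(['26460', '26580', '26700'], demo_tokens))
```

Searching forward keeps the run time roughly linear in the length of the token list and avoids matching an identical string that occurs earlier in the data.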

**Converting the timestamps to a time of day:**

In [None]:
# Function to convert the timestamp of a measurement into a time of day and a date
def seconds_to_time(time_seconds, date):
    data = []

    for sec in time_seconds:
        total = int(sec)  # seconds since midnight
        hours = total // 3600
        minutes = (total % 3600) // 60
        seconds = total % 60
        # Add the offset to the date directly instead of parsing a formatted string
        time = pd.Timestamp(date) + pd.Timedelta(seconds=total)
        data.append({'Datum':date,'Stunden': hours, 'Minuten': minutes, 'Sekunden': seconds, 'Uhrzeit': time})

    df = pd.DataFrame(data)
    return df

In [None]:
df_time = seconds_to_time(package, date)

# Print the DataFrame
print(df_time)

         Datum  Stunden  Minuten  Sekunden             Uhrzeit
0   2022-06-15        7       21         0 2022-06-15 07:21:00
1   2022-06-15        7       23         0 2022-06-15 07:23:00
2   2022-06-15        7       25         0 2022-06-15 07:25:00
3   2022-06-15        7       27         0 2022-06-15 07:27:00
4   2022-06-15        7       29         0 2022-06-15 07:29:00
..         ...      ...      ...       ...                 ...
319 2022-06-15       18       49         0 2022-06-15 18:49:00
320 2022-06-15       18       51         0 2022-06-15 18:51:00
321 2022-06-15       18       53         0 2022-06-15 18:53:00
322 2022-06-15       18       55         0 2022-06-15 18:55:00
323 2022-06-15       18       57         0 2022-06-15 18:57:00

[324 rows x 5 columns]
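If only the combined date and time is needed, the conversion could also be done for all timestamps at once with `pd.to_timedelta`. This is a sketch, not the notebook's method; `seconds_to_datetimes` is a hypothetical helper and the three demo values are taken from the timestamps shown above.

```python
import numpy as np
import pandas as pd
from datetime import datetime

def seconds_to_datetimes(time_seconds, date):
    """Convert seconds since midnight to full timestamps on the given date."""
    secs = np.array([int(s) for s in time_seconds])
    return pd.Timestamp(date) + pd.to_timedelta(secs, unit='s')

times = seconds_to_datetimes(['26460', '26580', '68220'], datetime(2022, 6, 15))
print(times)
```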


**Extracting the individual measurements**

In [None]:
def split_data(lines_WOH_fnc, split_indices):
    # Each measurement runs from its timestamp index up to the next timestamp index;
    # the last measurement runs to the end of the list
    boundaries = list(split_indices) + [len(lines_WOH_fnc)]
    data_parts = [lines_WOH_fnc[start:end] for start, end in zip(boundaries[:-1], boundaries[1:])]
    df = pd.DataFrame(data_parts)
    # Name the columns Col1, Col2, ... up to the length of the longest measurement
    num_cols = max(len(x) for x in data_parts)
    df.columns = ['Col{}'.format(i+1) for i in range(num_cols)]
    return df


In [None]:
print(lines_WO_header[:5], type(data_packages[0]))

['26460', '113 4.000E-02 9.999E+9', '2.000E+00 0.000E+00 999.9 9.999E+9 9.999E+9 ', '9.999E+9 9.999E+9 9.999E+9 9.999E+9', '9.999E+9 9.999E+9 9.999E+9 9.999E+9'] <class 'list'>


In [None]:
df_Messung = split_data(lines_WOH_split, indices_Timestamp)
print(df_Messung)
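The timestamp indices found earlier are spaced exactly 1318 tokens apart. If every measurement in a file has the same length, the token list could also be reshaped into a table directly. The sketch below rests on that assumption and raises an error when it does not hold; `tokens_to_frame` is a hypothetical helper, and the demo uses three tokens per measurement instead of 1318.

```python
import numpy as np
import pandas as pd

def tokens_to_frame(tokens, tokens_per_measurement):
    """Reshape a flat token list into one row per measurement."""
    if len(tokens) % tokens_per_measurement != 0:
        raise ValueError("Token count is not a multiple of the block length")
    table = np.array(tokens, dtype=object).reshape(-1, tokens_per_measurement)
    columns = ['Col{}'.format(i + 1) for i in range(tokens_per_measurement)]
    return pd.DataFrame(table, columns=columns)

# Shortened synthetic token list
demo_tokens = ['26460', '113', '6.9E-01', '26580', '111', '7.2E-01']
df_demo = tokens_to_frame(demo_tokens, 3)
print(df_demo)
```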

Bringing the DataFrame into the right shape:

In [None]:
df_Mea_Time = pd.concat([df_time, df_Messung], axis = 1)

In [None]:
df_Mea_Time

Unnamed: 0,Datum,Stunden,Minuten,Sekunden,Uhrzeit,Col1,Col2,Col3,Col4,Col5,...,Col1309,Col1310,Col1311,Col1312,Col1313,Col1314,Col1315,Col1316,Col1317,Col1318
0,2022-06-15,7,21,0,2022-06-15 07:21:00,26460,113,4.000E-02,9.999E+9,2.000E+00,...,6.9350E-01,6.9191E-01,6.8556E-01,6.6990E-01,6.5661E-01,6.4664E-01,6.4017E-01,6.3688E-01,6.3653E-01,6.4241E-01
1,2022-06-15,7,23,0,2022-06-15 07:23:00,26580,111,4.000E-02,9.999E+9,2.000E+00,...,7.2334E-01,7.2015E-01,7.1598E-01,6.9777E-01,6.9125E-01,6.7876E-01,6.6992E-01,6.6380E-01,6.6182E-01,6.6927E-01
2,2022-06-15,7,25,0,2022-06-15 07:25:00,26700,111,4.000E-02,9.999E+9,2.000E+00,...,7.2460E-01,7.1990E-01,7.0922E-01,6.9384E-01,6.8664E-01,6.7670E-01,6.6620E-01,6.6193E-01,6.6160E-01,6.7178E-01
3,2022-06-15,7,27,0,2022-06-15 07:27:00,26820,99,4.000E-02,9.999E+9,2.000E+00,...,7.3936E-01,7.3744E-01,7.2915E-01,7.0938E-01,7.0058E-01,6.9258E-01,6.8047E-01,6.7285E-01,6.7156E-01,6.7532E-01
4,2022-06-15,7,29,0,2022-06-15 07:29:00,26940,99,4.000E-02,9.999E+9,2.000E+00,...,7.4128E-01,7.4184E-01,7.3163E-01,7.1063E-01,7.0125E-01,6.9214E-01,6.8244E-01,6.7641E-01,6.7705E-01,6.8511E-01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
319,2022-06-15,18,49,0,2022-06-15 18:49:00,67740,89,4.000E-02,9.999E+9,2.000E+00,...,8.2003E-02,8.1929E-02,8.0928E-02,7.9448E-02,7.8117E-02,7.7030E-02,7.6182E-02,7.5869E-02,7.6060E-02,7.6254E-02
320,2022-06-15,18,51,0,2022-06-15 18:51:00,67860,89,4.000E-02,9.999E+9,2.000E+00,...,7.6489E-02,7.6430E-02,7.5537E-02,7.3979E-02,7.2590E-02,7.1879E-02,7.1247E-02,7.0469E-02,7.0405E-02,7.0938E-02
321,2022-06-15,18,53,0,2022-06-15 18:53:00,67980,89,4.000E-02,9.999E+9,2.000E+00,...,7.2186E-02,7.1815E-02,7.0883E-02,6.9747E-02,6.8370E-02,6.7801E-02,6.7060E-02,6.6754E-02,6.6754E-02,6.7196E-02
322,2022-06-15,18,55,0,2022-06-15 18:55:00,68100,89,4.000E-02,9.999E+9,2.000E+00,...,6.8763E-02,6.8413E-02,6.7896E-02,6.6890E-02,6.5856E-02,6.4921E-02,6.4072E-02,6.3652E-02,6.3222E-02,6.3753E-02


In [None]:
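# Assumption derived from the outputs above: each measurement has 1318 values, of which
# the first 17 (Col1..Col17) are the timestamp and metadata and the remaining 1301
# (Col18..Col1318) are the spectral values, one per wavelength
start_index = 18
num_elements = 1301

# The column names must match the 'Col' prefix assigned in split_data
columns_to_rename = ["Col" + str(i) for i in range(start_index, start_index + num_elements)]

In [None]:
df_Wellenlaenge.iloc[0, 0]

290.0

In [None]:
# Build the mapping used for renaming: column name -> wavelength
mapping = dict(zip(columns_to_rename, df_Wellenlaenge['Wellenlaenge']))

# Rename the columns of the combined DataFrame
df_MeaTime_final = df_Mea_Time.rename(columns=mapping)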

**Writing the measured values into a DataFrame**

**Converting all measurement data**
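One possible way to organise the batch conversion is to first collect all OR0 files and derive a CSV target path for each of them. The sketch below only plans the file pairs. `plan_conversion` is a hypothetical helper, and `convert_file` in the commented usage is a hypothetical function that would wrap the single-file steps of this notebook. The folder names are the ones from the Drive listing at the top.

```python
import os
import glob

def plan_conversion(input_dir, output_dir):
    """Return (source, target) path pairs for all OR0 files in input_dir."""
    pairs = []
    for source in sorted(glob.glob(os.path.join(input_dir, '*.OR0'))):
        stem = os.path.splitext(os.path.basename(source))[0]
        pairs.append((source, os.path.join(output_dir, stem + '.csv')))
    return pairs

# Possible usage (convert_file is hypothetical and not defined here):
# for source, target in plan_conversion(
#         '/content/drive/MyDrive/Colab_Notebooks/NasaAmes_Messdaten/Data',
#         '/content/drive/MyDrive/Colab_Notebooks/CSV_Messdaten'):
#     convert_file(source).to_csv(target, index=False)
```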