# Data update of the measurements from the weather stations of the Wasserschutzpolizei Zürich
Date: 12.01.2022

**Background:**

Mr. Namnick delivers the annual data of the weather stations once a year by e-mail. Until now we have read the newly delivered annual data into SAS, standardised the variable names, put them in the correct order, and finally appended the new year to the previous years in chronological order.

Odi additionally wrote a [**server-side Python script**](https://github.com/opendatazurich/ogd-data-processing/blob/main/sid_wapo_wetterstationen/convert_csv.py) that calculates the `cet` date from the delivered `utc` date field. With this notebook, however, we can do that directly during the update.

**Problems:**

1. The data delivery from Namnick has a different structure every time. The required adjustments are rather tedious to make in SAS.
2. Time conversions are much easier in Python than in SAS.

**Approach:**

Step-by-step update with Jupyter Lab. After data preparation, generate the metadata, possibly with SAS(?).



**Dataset on the PROD data catalog**: https://data.stadt-zuerich.ch/dataset/sid_wapo_wetterstationen

## Settings
### Import the required packages

In [1]:
#%pip install openpyxl geopandas altair fiona requests folium mplleaflet contextily seaborn datetime plotly

In [2]:
import pandas as pd
import datetime
import time
import numpy as np

import pivottablejs
from pivottablejs import pivot_ui
import altair as alt
import matplotlib.pyplot as plt
#from datetime import datetime
import geopandas as gpd
import folium 

Define settings. Here, the number format of float values (e.g. *'{:,.2f}'.format* with a comma as thousands separator).

In [3]:
pd.options.display.float_format = '{:.1f}'.format
pd.set_option('display.width', 100)
pd.set_option('display.max_columns', 15)

### Time variables
Determine the most recently loaded month. Here it is the status as of two months ago.
Determine further time variables that may be useful.

For the difference between `import datetime` and `from datetime import datetime`, see https://stackoverflow.com/questions/15707532/import-datetime-v-s-from-datetime-import-datetime
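
A minimal sketch of that difference, using only the standard library: `import datetime` binds the module, so the class is reached as `datetime.datetime`, whereas `from datetime import datetime` binds the class itself. The variable names and the alias `dt` are illustrative only.

```python
import datetime                       # binds the module
from datetime import datetime as dt   # binds the class (aliased so it does not shadow the module)

# Module-style access, as used in this notebook
today_via_module = datetime.date.today()
parsed_via_module = datetime.datetime.strptime("01.01.2021 00:15", "%d.%m.%Y %H:%M")

# Class-style access
parsed_via_class = dt.strptime("01.01.2021 00:15", "%d.%m.%Y %H:%M")

print(parsed_via_module == parsed_via_class)
```

Without the alias, the second import would rebind the name `datetime` to the class, and calls such as `datetime.date.today()` would no longer refer to the module.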

First, the time variables as strings:

In [4]:
now = datetime.date.today()
date_today = now.strftime("%Y-%m-%d")
year_today = now.strftime("%Y")
month_today = now.strftime("%m")
day_today = now.strftime("%d")
print(now, date_today, year_today, month_today,day_today )

2022-01-14 2022-01-14 2022 01 14


And here the time variables as integers:
- `aktuellesJahr`: the current year
- `aktuellerMonat`: the month that is current right now
- `selectedMonat`: the most recent month in the data, usually two months back

In [5]:
int_times = now.timetuple()

aktuellesJahr = int_times[0]
aktuellerMonat = int_times[1]
selectedMonat = int_times[1]-2

print(aktuellesJahr, 
      aktuellerMonat,
      'datenstand: ', 
      selectedMonat,
     int_times)


2022 1 datenstand:  -1 time.struct_time(tm_year=2022, tm_mon=1, tm_mday=14, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=14, tm_isdst=-1)
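
In January and February, `int_times[1]-2` leaves the range 1 to 12, as the output `-1` above shows. A minimal sketch of a wrap-around-safe alternative follows; `months_back`, `selected_year` and `selected_month` are hypothetical names, not part of the original notebook.

```python
import datetime

def months_back(year, month, n):
    """Return (year, month) lying n months before the given year and month."""
    index = year * 12 + (month - 1) - n   # months counted continuously from year 0
    return index // 12, index % 12 + 1

now = datetime.date.today()
selected_year, selected_month = months_back(now.year, now.month, 2)
print(selected_year, selected_month)
```

Counting months on one continuous axis lets integer division and modulo handle the change of year, so no special case for January or February is needed.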


## Import the newly delivered datasets

The newly delivered datasets are stored in the source data directory, under the following path: `\\szh\ssz\applikationen\OGD\Daten\Quelldaten\SID\WAPO\01_gelieferte_rohdaten_jahresbestaende`

**What needs to be considered?**

1. The datasets do NOT have the same number of variables: Tiefenbrunnen does not contain `water_temperature`, `precipitation`, `global_radiation` and `water_level`. These are nevertheless inserted as columns during data preparation, just without values.
2. The variable names differ in the import files and must be renamed to match the API standard. Sometimes the columns are delivered with label names (e.g. in the 2021 data), sometimes with technical names (not yet covered here).
3. The variables arrive in a different order every year. Care must be taken that they are renamed correctly.
4. At the end, the new data is appended to the existing time series.




### Import the data and convert the date format

(e.g. convert 01.01.2021 00:15 into a date format)

The DataFrames with data from

- Mythenquai always contain `_my` in their name
- Tiefenbrunnen always contain `_tb` in their name


**Define import paths**

In [278]:
    imp_fp_my = r"\\szh\ssz\applikationen\OGD\Daten\Quelldaten\SID\WAPO\01_gelieferte_rohdaten_jahresbestaende\Mythenquai_2021.csv" #Mythenquai
    imp_fp_tb = r"\\szh\ssz\applikationen\OGD\Daten\Quelldaten\SID\WAPO\01_gelieferte_rohdaten_jahresbestaende\Tiefenbrunnen_2021.csv" #Tiefenbrunnen

In [279]:
# Read the data from Mythenquai
data2beimported_my = pd.read_csv(
    imp_fp_my 
    , sep=';'
    , encoding='ansi'
    #, parse_dates=['Datum / Uhrzeit (MEZ)']   
    , parse_dates=[0]  
    , date_parser=lambda x: datetime.datetime.strptime(x, '%d.%m.%Y %H:%M')
    , low_memory=False    
    )

print('done')

done
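
As a hedged alternative to `date_parser`, which is deprecated as of pandas 2.0, the first column can be parsed after reading with `pd.to_datetime` and an explicit format. The sketch below uses a made-up two-row sample in the delivered layout instead of the real file; `sample_csv`, `sample` and `first_col` are illustrative names. Note also that `encoding='ansi'` is only available on Windows.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the delivered layout (semicolon-separated)
sample_csv = io.StringIO(
    "Datum / Uhrzeit (MEZ);Lufttemperatur (°C)\n"
    "01.01.2021 00:00;2.5\n"
    "01.01.2021 00:10;2.7\n"
)

sample = pd.read_csv(sample_csv, sep=";")

# Parse the first column with an explicit day.month.year format after reading
first_col = sample.columns[0]
sample[first_col] = pd.to_datetime(sample[first_col], format="%d.%m.%Y %H:%M")

print(sample.dtypes)
```

Addressing the column by position mirrors `parse_dates=[0]` above and avoids relying on the exact header text.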


In [280]:
data2beimported_my.head(2)

Unnamed: 0,Datum / Uhrzeit (MEZ),Lufttemperatur (°C),Luftfeuchte (%),Windböen (max) 10 min. (m/s),Windgeschw. Ø 10min. (m/s),Windstärke Ø 10 min. (bft),Windrichtung (°),Windchill (°C),Luftdruck QNH (hPa),Taupunkt (°C)
0,2021-01-01 00:00:00,2.5,92,1.6,0.9,1,16,2.5,1004.7,1.4
1,2021-01-01 00:10:00,2.7,92,1.4,0.1,0,105,2.7,1004.7,1.5


In [281]:
# Read the data from Tiefenbrunnen
data2beimported_tb = pd.read_csv(
    imp_fp_tb 
    , sep=';'
    , encoding='ansi'
    , parse_dates=[0] 
    #, parse_dates=['Datum / Uhrzeit (MEZ)']    
    , date_parser=lambda x: datetime.datetime.strptime(x, '%d.%m.%Y %H:%M')
    , low_memory=False    
    )

print('done')
# Hint: if there is a problem with the data here, check whether the CSV contains an empty line consisting only of semicolons. Such a line causes problems.

done


In [282]:
data2beimported_tb.head(2) # Oddly, .head(1) displays a different date format, for whatever reason...

Unnamed: 0,Datum / Uhrzeit (MEZ),Lufttemperatur (°C),Luftfeuchte (%),Windböen (max) 10 min. (m/s),Windgeschw. Ø 10min. (m/s),Windstärke Ø 10 min. (bft),Windrichtung (°),Windchill (°C),Wassertemperatur (°C),Luftdruck QFE (hPa),Taupunkt (°C)
0,2021-01-01 00:00:00,1.5,92,0.9,0.4,1,45,1.5,6.7,957.2,0.3
1,2021-01-01 00:10:00,1.5,92,0.9,0.6,1,54,1.5,6.7,957.1,0.4


### Rename

The variable names in the import files contain hidden characters that prevent the renaming. Where that happens, the column names first have to be set with `set_axis()`.

Procedure:

1. Which variables occur, and under which names?
2. Define the names so that they can be renamed easily
3. Do the renaming
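
A minimal sketch of how hidden characters in headers could be neutralised before renaming. Which characters the delivered files actually contain is not known here; the byte-order mark, zero-width space and no-break space below are assumptions, and `clean_column_name` is a hypothetical helper.

```python
import unicodedata
import pandas as pd

def clean_column_name(name: str) -> str:
    """Normalise a header: NFC form, no BOM or zero-width space, single plain spaces."""
    name = unicodedata.normalize("NFC", name)
    name = name.replace("\ufeff", "").replace("\u200b", "")
    # str.split() without arguments also splits on no-break spaces
    return " ".join(name.split())

raw = pd.DataFrame(columns=["\ufeffDatum / Uhrzeit (MEZ)", "Lufttemperatur\u00a0(°C) "])
raw.columns = [clean_column_name(c) for c in raw.columns]
print(list(raw.columns))
```

NFC is used rather than NFKC because NFKC would rewrite characters such as `²` in `W/m²`, which would then no longer match the keys of the rename mapping.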

#### Which variables occur, and under which names?

In [283]:
# Which variables occur, and under which names?
data2beimported_my.columns
#data2beimported_my.head(1)

Index(['Datum / Uhrzeit (MEZ)', 'Lufttemperatur (°C)', 'Luftfeuchte (%)',
       'Windböen (max) 10 min. (m/s)', 'Windgeschw. Ø 10min. (m/s)', 'Windstärke Ø 10 min. (bft)',
       'Windrichtung (°)', 'Windchill (°C)', 'Luftdruck QNH (hPa)', 'Taupunkt (°C)'],
      dtype='object')

In [284]:
data2beimported_tb.columns
#--> Difference in 2021: water temperature (Wassertemperatur)

Index(['Datum / Uhrzeit (MEZ)', 'Lufttemperatur (°C)', 'Luftfeuchte (%)',
       'Windböen (max) 10 min. (m/s)', 'Windgeschw. Ø 10min. (m/s)', 'Windstärke Ø 10 min. (bft)',
       'Windrichtung (°)', 'Windchill (°C)', 'Wassertemperatur (°C)', 'Luftdruck QFE (hPa)',
       'Taupunkt (°C)'],
      dtype='object')

#### Define the names so that they can be renamed easily
I skip this second step because the data structure and the column names are not always identical.

In [285]:
# Define the names so that they can be renamed easily
##df = df.set_axis(['Datum / Uhrzeit (MEZ)', 'Lufttemperatur (°C)', 'Luftfeuchte (%)','Windböen (max) 10 min. (m/s)', 'Windgeschw. Ø 10min. (m/s)','Windstärke Ø 10 min. (bft)', 'Windrichtung (°)', 'Windchill (°C)','Wassertemperatur (°C)', 'Luftdruck QFE (hPa)', 'Taupunkt (°C)'], axis='columns')

#### Do the renaming

    ATTENTION:
    The variable names in the CSV delivered by Namnick contain invisible special characters. If the renaming does not work, copy the variable names by hand from the CSV into the mapping below(!)

**1. Mythenquai**

Before:

In [286]:
data2beimported_my.dtypes

Datum / Uhrzeit (MEZ)           datetime64[ns]
Lufttemperatur (°C)                    float64
Luftfeuchte (%)                          int64
Windböen (max) 10 min. (m/s)           float64
Windgeschw. Ø 10min. (m/s)             float64
Windstärke Ø 10 min. (bft)               int64
Windrichtung (°)                         int64
Windchill (°C)                         float64
Luftdruck QNH (hPa)                    float64
Taupunkt (°C)                          float64
dtype: object

In [287]:
# Do the renaming
data2beimported_my = data2beimported_my.rename(
    columns={"Datum / Uhrzeit (MEZ)":"timestamp_cet"
             ,"Lufttemperatur (°C)":"air_temperature"
             ,"Wassertemperatur (°C)":"water_temperature"  # in 2021 only present in Tiefenbrunnen, from 2022 again in Mythenquai
             ,"Windböen (max) 10 min. (m/s)":"wind_gust_max_10min"
             ,"Windgeschw. Ø 10min. (m/s)":"wind_speed_avg_10min"
             ,"Windstärke Ø 10 min. (bft)":"wind_force_avg_10min"
             ,"Windrichtung (°)":"wind_direction"
             ,"Windchill (°C)":"windchill"
             ,"Luftdruck QNH (hPa)":"barometric_pressure_qfe"  # note: the delivered QNH column is stored under the qfe name
             ,"Niederschlag (Summe in mm)":"precipitation" # only present in Mythenquai, but not again until 2022
             ,"Taupunkt (°C)":"dew_point"
             ,"Globalstrahlung (W/m²)":"global_radiation" # only present in Mythenquai, but not again until 2022
             ,"Luftfeuchte (%)": "humidity"
             ,"Pegel (m)":"water_level" # only present in Mythenquai, but not again until 2022
            }
)

After:

In [288]:
data2beimported_my.head(2)

Unnamed: 0,timestamp_cet,air_temperature,humidity,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,dew_point
0,2021-01-01 00:00:00,2.5,92,1.6,0.9,1,16,2.5,1004.7,1.4
1,2021-01-01 00:10:00,2.7,92,1.4,0.1,0,105,2.7,1004.7,1.5


**2. Tiefenbrunnen**

In [289]:
data2beimported_tb.columns

Index(['Datum / Uhrzeit (MEZ)', 'Lufttemperatur (°C)', 'Luftfeuchte (%)',
       'Windböen (max) 10 min. (m/s)', 'Windgeschw. Ø 10min. (m/s)', 'Windstärke Ø 10 min. (bft)',
       'Windrichtung (°)', 'Windchill (°C)', 'Wassertemperatur (°C)', 'Luftdruck QFE (hPa)',
       'Taupunkt (°C)'],
      dtype='object')

In [290]:
# Do the renaming
data2beimported_tb = data2beimported_tb.rename(
    columns={"Datum / Uhrzeit (MEZ)":"timestamp_cet"
             ,"Lufttemperatur (°C)":"air_temperature"
             ,"Wassertemperatur (°C)":"water_temperature"  # in 2021 only present in Tiefenbrunnen, from 2022 again in Mythenquai
             ,"Windböen (max) 10 min. (m/s)":"wind_gust_max_10min"
             ,"Windgeschw. Ø 10min. (m/s)":"wind_speed_avg_10min"
             ,"Windstärke Ø 10 min. (bft)":"wind_force_avg_10min"
             ,"Windrichtung (°)":"wind_direction"
             ,"Windchill (°C)":"windchill"
             ,"Luftdruck QNH (hPa)":"barometric_pressure_qfe"  # note: the Tiefenbrunnen file delivers "Luftdruck QFE (hPa)", which this key does not match
             ,"Niederschlag (Summe in mm)":"precipitation" # only present in Mythenquai, but not again until 2022
             ,"Taupunkt (°C)":"dew_point"
             ,"Globalstrahlung (W/m²)":"global_radiation" # only present in Mythenquai, but not again until 2022
             ,"Luftfeuchte (%)": "humidity"
             ,"Pegel (m)":"water_level" # only present in Mythenquai, but not again until 2022
            }
)

In [291]:
data2beimported_tb.head(2)

Unnamed: 0,timestamp_cet,air_temperature,humidity,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,water_temperature,Luftdruck QFE (hPa),dew_point
0,2021-01-01 00:00:00,1.5,92,0.9,0.4,1,45,1.5,6.7,957.2,0.3
1,2021-01-01 00:10:00,1.5,92,0.9,0.6,1,54,1.5,6.7,957.1,0.4
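
A minimal sketch of a check that could be run after each rename to list columns that did not receive a target name. In the preview above, `Luftdruck QFE (hPa)` keeps its original name because the mapping only contains the key `Luftdruck QNH (hPa)`. The set `EXPECTED` and the helper `unmapped_columns` are illustrative names, not part of the original notebook; the target names are taken from the published time series.

```python
import pandas as pd

# Target column names as used in the published time series
EXPECTED = {
    "timestamp_cet", "air_temperature", "water_temperature", "wind_gust_max_10min",
    "wind_speed_avg_10min", "wind_force_avg_10min", "wind_direction", "windchill",
    "barometric_pressure_qfe", "precipitation", "dew_point", "global_radiation",
    "humidity", "water_level",
}

def unmapped_columns(df: pd.DataFrame) -> list:
    """Return the columns that are not part of the expected target schema."""
    return [c for c in df.columns if c not in EXPECTED]

# Toy frame mimicking part of the renamed Tiefenbrunnen header
demo = pd.DataFrame(columns=["timestamp_cet", "air_temperature", "Luftdruck QFE (hPa)"])
print(unmapped_columns(demo))
```

An empty list would mean that every delivered column was mapped; any name that is returned needs an entry in the rename mapping before the columns are reordered.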


### Calculate and insert the time variables CET and UTC


Calculate the CET and UTC times

In [292]:
data2beimported_my['timestamp_cet'] = data2beimported_my.timestamp_cet.dt.tz_localize("Europe/Zurich", ambiguous=np.ones(len(data2beimported_my.index)), nonexistent="shift_forward")
data2beimported_my['timestamp_utc'] = data2beimported_my.timestamp_cet.dt.tz_convert("UTC")

Format the time variables in ISO format (ISO 8601)

In [293]:
data2beimported_my['timestamp_cet'] = data2beimported_my.timestamp_cet.apply(datetime.datetime.isoformat)
data2beimported_my['timestamp_utc'] = data2beimported_my.timestamp_utc.apply(datetime.datetime.isoformat)

In [295]:
data2beimported_my.head(2)
#data2beimported_my.dtypes

Unnamed: 0,timestamp_cet,air_temperature,humidity,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,dew_point,timestamp_utc
0,2021-01-01T00:00:00+01:00,2.5,92,1.6,0.9,1,16,2.5,1004.7,1.4,2020-12-31T23:00:00+00:00
1,2021-01-01T00:10:00+01:00,2.7,92,1.4,0.1,0,105,2.7,1004.7,1.5,2020-12-31T23:10:00+00:00
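
The effect of the `ambiguous` and `nonexistent` arguments can be illustrated on three made-up timestamps around the 2021 daylight saving switches. This is a sketch, not the notebook's own cell: it passes a boolean array for `ambiguous` (the notebook passes `np.ones(...)`, a float array), and `naive`, `local` and `utc` are illustrative names.

```python
import numpy as np
import pandas as pd

# Naive local timestamps: one normal, one in the spring gap, one in the autumn overlap
naive = pd.Series(pd.to_datetime([
    "2021-01-01 00:00",   # unambiguous winter time
    "2021-03-28 02:30",   # does not exist (clocks jump from 02:00 to 03:00)
    "2021-10-31 02:30",   # occurs twice (clocks go back from 03:00 to 02:00)
]))

local = naive.dt.tz_localize(
    "Europe/Zurich",
    ambiguous=np.ones(len(naive), dtype=bool),  # True = treat ambiguous times as DST
    nonexistent="shift_forward",                # move non-existent times to the next valid one
)
utc = local.dt.tz_convert("UTC")

print(pd.DataFrame({"local": local, "utc": utc}))
```

With these settings the non-existent 02:30 is moved to 03:00 summer time, and the ambiguous 02:30 is read as summer time, i.e. as the first of its two occurrences.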


Calculate the CET and UTC times

In [296]:
data2beimported_tb['timestamp_cet'] = data2beimported_tb.timestamp_cet.dt.tz_localize("Europe/Zurich", ambiguous=np.ones(len(data2beimported_tb.index)), nonexistent="shift_forward")
data2beimported_tb['timestamp_utc'] = data2beimported_tb.timestamp_cet.dt.tz_convert("UTC")

Format the time variables in ISO format (ISO 8601)

In [297]:
data2beimported_tb['timestamp_cet'] = data2beimported_tb.timestamp_cet.apply(datetime.datetime.isoformat)
data2beimported_tb['timestamp_utc'] = data2beimported_tb.timestamp_utc.apply(datetime.datetime.isoformat)

In [298]:
data2beimported_tb.head(2)
#data2beimported_tb.dtypes

Unnamed: 0,timestamp_cet,air_temperature,humidity,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,water_temperature,Luftdruck QFE (hPa),dew_point,timestamp_utc
0,2021-01-01T00:00:00+01:00,1.5,92,0.9,0.4,1,45,1.5,6.7,957.2,0.3,2020-12-31T23:00:00+00:00
1,2021-01-01T00:10:00+01:00,1.5,92,0.9,0.6,1,54,1.5,6.7,957.1,0.4,2020-12-31T23:10:00+00:00


### Adjust the order of the variables so that they can be appended to the existing time series

In [299]:
    column_order = [
        'timestamp_utc',
        'timestamp_cet',
        'air_temperature',
        'water_temperature',
        'wind_gust_max_10min',
        'wind_speed_avg_10min',
        'wind_force_avg_10min',
        'wind_direction',
        'windchill',
        'barometric_pressure_qfe',
        'precipitation',
        'dew_point',
        'global_radiation',
        'humidity',
        'water_level',
    ]
    data2beimported_my = data2beimported_my.reindex(columns=column_order)
    data2beimported_tb = data2beimported_tb.reindex(columns=column_order)    
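
`reindex(columns=...)` both adds and removes columns: names in the target order that are missing from the frame are created and filled with `NaN`, while columns of the frame that are not listed are dropped without a warning. The following sketch, on a made-up one-row frame, shows both effects; `column_order_demo`, `demo` and `dropped` are illustrative names.

```python
import pandas as pd

column_order_demo = ["timestamp_cet", "air_temperature", "barometric_pressure_qfe", "water_level"]

demo = pd.DataFrame({
    "air_temperature": [1.5],
    "Luftdruck QFE (hPa)": [957.2],       # name that is not in the target order
    "timestamp_cet": ["2021-01-01T00:00:00+01:00"],
})

reordered = demo.reindex(columns=column_order_demo)
dropped = [c for c in demo.columns if c not in column_order_demo]

print(reordered)
print("dropped:", dropped)
```

Listing the dropped names before reindexing makes it visible when measured values would otherwise be replaced by an empty column.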

In [300]:
#data2beimported_my

## Import the already published time series of the measuring stations

- Note the notation of the path.
- For now, do not define any further parameters for the import.

**Dataset on the INTEG data catalog**: https://data.integ.stadt-zuerich.ch/dataset/sid_wapo_wetterstationen

**Dataset on the PROD data catalog**: https://data.stadt-zuerich.ch/dataset/sid_wapo_wetterstationen

In [301]:
# The datasets are on the INT-DWH dropzone for testing only. Once the test is over, they are on PROD.
# The status variable makes it easy to switch between the two.

status = "www";
print(status)

www


In [348]:
# Filepath
if status == "www":
    #fp = r"//szh.loc/ssz/applikationen/OGD_Dropzone/DWH/bev_monat_bestand_quartier_geschl_ag_herkunft_od3250/BEV325OD3250.csv"
    fp_my = r"https://data.stadt-zuerich.ch/dataset/sid_wapo_wetterstationen/download/messwerte_mythenquai_2007-2020.csv"
    fp_tb = r"https://data.stadt-zuerich.ch/dataset/sid_wapo_wetterstationen/download/messwerte_tiefenbrunnen_2007-2020.csv"
else:
    fp_my = r"\\szh\ssz\applikationen\OGD\Daten\Quelldaten\SID\WAPO\02_veroeffentlichte_zeitreihe\messwerte_mythenquai_2007-2020.csv"
    fp_tb = r"\\szh\ssz\applikationen\OGD\Daten\Quelldaten\SID\WAPO\02_veroeffentlichte_zeitreihe\messwerte_tiefenbrunnen_2007-2020.csv"


print(fp_my, fp_tb)

https://data.stadt-zuerich.ch/dataset/sid_wapo_wetterstationen/download/messwerte_mythenquai_2007-2020.csv https://data.stadt-zuerich.ch/dataset/sid_wapo_wetterstationen/download/messwerte_tiefenbrunnen_2007-2020.csv


I deliberately import the date fields as STRINGS, not as dates.
This makes it easier to combine the new data with the existing data later.

- [**Cheat Sheet**](https://strftime.org/)
- Python documentation for **strptime**: [string **parse** time, Python 3](https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime)
- Python documentation for strptime/**strftime**: [string **format** time,Python 3](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)

In [349]:
# Read the data
data2bextended_my = pd.read_csv(
    fp_my
    , sep=','
    , low_memory=False    
    )

print('done')

done


In [350]:
# Read the data
data2bextended_tb = pd.read_csv(
    fp_tb 
    ,sep=','
    ,low_memory=False    
    )

print('done')

done


What follows are a few first tests:
 - 1) Show a short preview of the imported data
 - 2) Report the data types
 - 3) Show the shape (size) of the dataset

In [351]:
#data2bextended_my.head(2)
data2bextended_my.dtypes

timestamp_utc               object
timestamp_cet               object
air_temperature            float64
water_temperature          float64
wind_gust_max_10min        float64
wind_speed_avg_10min       float64
wind_force_avg_10min       float64
wind_direction               int64
windchill                  float64
barometric_pressure_qfe    float64
precipitation              float64
dew_point                  float64
global_radiation           float64
humidity                     int64
water_level                float64
dtype: object

In [352]:
#data2bextended_tb.head(2)
data2bextended_tb.dtypes

timestamp_utc               object
timestamp_cet               object
air_temperature            float64
water_temperature          float64
wind_gust_max_10min        float64
wind_speed_avg_10min       float64
wind_force_avg_10min       float64
wind_direction               int64
windchill                  float64
barometric_pressure_qfe    float64
precipitation              float64
dew_point                  float64
global_radiation           float64
humidity                     int64
water_level                float64
dtype: object

In [353]:
data2bextended_my.shape

(708364, 15)

In [354]:
data2bextended_tb.shape

(704988, 15)

Describe individual attributes

In [355]:
data2bextended_my.describe()
#data2bextended_tb.describe()

Unnamed: 0,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
count,708364.0,658722.0,708364.0,708364.0,708364.0,708364.0,708364.0,703623.0,658722.0,708364.0,658722.0,708364.0,658722.0
mean,11.6,13.4,3.5,1.8,1.8,185.0,10.5,972.5,0.0,6.8,137.7,75.1,405.9
std,8.0,6.8,2.6,1.4,1.3,106.0,8.6,14.0,0.2,6.5,299.0,16.5,0.1
min,-13.4,2.4,-0.1,0.0,0.0,0.0,-25.6,930.7,0.0,-17.2,0.0,16.0,405.2
25%,5.2,6.4,1.7,0.9,0.9,105.0,3.8,965.7,0.0,2.0,0.0,64.0,405.9
50%,11.4,13.1,2.9,1.6,1.6,177.0,10.4,970.3,0.0,6.9,7.0,79.0,405.9
75%,17.5,19.6,4.7,2.5,2.4,286.0,16.9,975.4,0.0,12.1,161.0,87.0,406.0
max,37.7,28.0,32.0,17.1,16.8,360.0,37.8,1036.0,17.0,24.6,4293.0,100.0,406.5


How many null values are there in the dataset?

In [356]:
data2bextended_my.isnull().sum()

timestamp_utc                  0
timestamp_cet                  0
air_temperature                0
water_temperature          49642
wind_gust_max_10min            0
wind_speed_avg_10min           0
wind_force_avg_10min           0
wind_direction                 0
windchill                      0
barometric_pressure_qfe     4741
precipitation              49642
dew_point                      0
global_radiation           49642
humidity                       0
water_level                49642
dtype: int64

In [357]:
data2bextended_tb.isnull().sum()

timestamp_utc                   0
timestamp_cet                   0
air_temperature                 0
water_temperature               0
wind_gust_max_10min             0
wind_speed_avg_10min            0
wind_force_avg_10min            0
wind_direction                  0
windchill                       0
barometric_pressure_qfe         0
precipitation              704988
dew_point                       0
global_radiation           704988
humidity                        0
water_level                704988
dtype: int64

## Append the new data to the existing time series

1. Mythenquai --> the new time series is called `df_zeitreihe_my`
2. Tiefenbrunnen --> the new time series is called `df_zeitreihe_tb`


### New time series Tiefenbrunnen

In [358]:
df_zeitreihe_tb = pd.concat([data2bextended_tb, data2beimported_tb])

# The timestamps are plain strings at this point, so parse timestamp_utc as timezone-aware datetimes:
df_zeitreihe_tb['timestamp_utc'] = pd.to_datetime(df_zeitreihe_tb['timestamp_utc'], utc=True)
#df_zeitreihe_tb['timestamp_cet'] = pd.to_datetime(df_zeitreihe_tb['timestamp_cet'], utc=False)

### New time series Mythenquai

In [359]:
df_zeitreihe_my = pd.concat([data2bextended_my, data2beimported_my])

# The timestamps are plain strings at this point, so parse timestamp_utc as timezone-aware datetimes:
df_zeitreihe_my['timestamp_utc'] = pd.to_datetime(df_zeitreihe_my['timestamp_utc'], utc=True)
#df_zeitreihe_my['timestamp_cet'] = pd.to_datetime(df_zeitreihe_my['timestamp_cet'], utc=False)
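
A minimal sketch of the append step on made-up rows, with two checks that could be added afterwards. Both inputs hold `timestamp_utc` as ISO 8601 strings, which is why the column has to be parsed after `pd.concat`. Using `ignore_index=True` is a variation on the notebook's call (without it, the row labels of both frames are kept and repeat); `published`, `new_year`, `combined`, `n_duplicates` and `is_sorted` are illustrative names.

```python
import pandas as pd

published = pd.DataFrame({
    "timestamp_utc": ["2020-12-31T22:40:00+00:00", "2020-12-31T22:50:00+00:00"],
    "air_temperature": [2.4, 2.5],
})
new_year = pd.DataFrame({
    "timestamp_utc": ["2020-12-31T23:00:00+00:00", "2020-12-31T23:10:00+00:00"],
    "air_temperature": [2.5, 2.7],
})

# Append the new rows and renumber the row labels from 0
combined = pd.concat([published, new_year], ignore_index=True)
combined["timestamp_utc"] = pd.to_datetime(combined["timestamp_utc"], utc=True)

# Sanity checks: no timestamp twice, and rows in chronological order
n_duplicates = int(combined["timestamp_utc"].duplicated().sum())
is_sorted = combined["timestamp_utc"].is_monotonic_increasing
print(n_duplicates, is_sorted)
```

A non-zero duplicate count would point to an overlap between the published series and the new delivery.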

### Checks - did everything work as intended?
Check the newly combined DataFrames

In [360]:
df_zeitreihe_my.dtypes
df_zeitreihe_my.head(3)

Unnamed: 0,timestamp_utc,timestamp_cet,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
0,2007-04-22 19:20:00+00:00,2007-04-22T21:20:00+02:00,18.9,15.2,1.6,0.7,0.7,321,18.9,973.5,0.0,4.4,3.0,38,405.9
1,2007-04-22 19:30:00+00:00,2007-04-22T21:30:00+02:00,18.1,15.2,1.3,0.8,0.8,346,18.1,973.7,0.0,4.8,3.0,41,405.9
2,2007-04-22 19:40:00+00:00,2007-04-22T21:40:00+02:00,17.7,15.1,0.9,0.2,0.2,4,17.7,973.7,0.0,5.1,3.0,43,405.9


In [361]:
df_zeitreihe_tb.dtypes

timestamp_utc              datetime64[ns, UTC]
timestamp_cet                           object
air_temperature                        float64
water_temperature                      float64
wind_gust_max_10min                    float64
wind_speed_avg_10min                   float64
wind_force_avg_10min                   float64
wind_direction                           int64
windchill                              float64
barometric_pressure_qfe                float64
precipitation                          float64
dew_point                              float64
global_radiation                       float64
humidity                                 int64
water_level                            float64
dtype: object

In [362]:
df_zeitreihe_my.isnull().sum()

timestamp_utc                   0
timestamp_cet                   0
air_temperature                 0
water_temperature          100397
wind_gust_max_10min             0
wind_speed_avg_10min            0
wind_force_avg_10min            0
wind_direction                  0
windchill                       0
barometric_pressure_qfe      4741
precipitation              100397
dew_point                       0
global_radiation           100397
humidity                        0
water_level                100397
dtype: int64

In [363]:
df_zeitreihe_tb.isnull().sum()

timestamp_utc                   0
timestamp_cet                   0
air_temperature                 0
water_temperature               0
wind_gust_max_10min             0
wind_speed_avg_10min            0
wind_force_avg_10min            0
wind_direction                  0
windchill                       0
barometric_pressure_qfe     52549
precipitation              757537
dew_point                       0
global_radiation           757537
humidity                        0
water_level                757537
dtype: int64

In [364]:
df_zeitreihe_my.describe()

Unnamed: 0,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
count,759119.0,658722.0,759119.0,759119.0,759119.0,759119.0,759119.0,754378.0,658722.0,759119.0,658722.0,759119.0,658722.0
mean,11.5,13.4,3.5,1.9,1.8,184.5,10.4,975.6,0.0,6.8,137.7,75.2,405.9
std,8.0,6.8,2.6,1.4,1.3,106.0,8.6,17.8,0.2,6.5,299.0,16.4,0.1
min,-13.4,2.4,-0.1,0.0,0.0,0.0,-25.6,930.7,0.0,-17.2,0.0,16.0,405.2
25%,5.2,6.4,1.7,0.9,1.0,103.0,3.7,966.1,0.0,1.9,0.0,65.0,405.9
50%,11.3,13.1,2.9,1.6,1.7,176.0,10.3,970.9,0.0,6.8,7.0,79.0,405.9
75%,17.5,19.6,4.7,2.6,2.4,286.0,16.8,977.2,0.0,12.1,161.0,87.0,406.0
max,37.7,28.0,32.0,17.1,16.8,360.0,37.8,1037.5,17.0,24.6,4293.0,100.0,406.5


In [365]:
df_zeitreihe_tb.describe()

Unnamed: 0,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
count,757537.0,757537.0,757537.0,757537.0,757537.0,757537.0,757537.0,704988.0,0.0,757537.0,0.0,757537.0,0.0
mean,11.4,13.3,2.6,1.2,1.1,156.2,10.8,970.1,,6.9,,75.6,
std,7.8,6.6,2.7,1.5,1.4,107.7,8.2,7.9,,6.4,,15.2,
min,-14.0,-2.4,0.0,0.0,0.0,0.0,-20.5,930.7,,-17.5,,16.0,
25%,5.1,6.6,0.9,0.1,0.1,54.0,4.1,965.9,,2.0,,66.0,
50%,11.3,12.9,1.8,0.7,0.8,151.0,10.7,970.3,,6.9,,79.0,
75%,17.4,19.4,3.5,1.6,1.6,254.0,17.1,974.8,,12.1,,87.0,
max,35.8,70.0,35.0,21.9,21.9,360.0,92.6,1024.0,,70.0,,104.0,


In [366]:
outliers_watertemp = df_zeitreihe_tb.query('water_temperature > 30')
#outliers_watertemp.shape
#outliers_watertemp.describe()
outliers_watertemp.head(2)

Unnamed: 0,timestamp_utc,timestamp_cet,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
437877,2015-12-01 11:50:00+00:00,2015-12-01T12:50:00+01:00,10.6,62.4,7.2,3.1,4.7,66,5.2,982.8,,5.0,,68,
648633,2019-12-06 10:30:00+00:00,2019-12-06T11:30:00+01:00,2.1,70.0,0.9,0.1,0.0,188,2.1,974.0,,1.1,,93,


In [367]:
outliers_airtemp = df_zeitreihe_tb.query('air_temperature > 39')
outliers_airtemp.describe()
outliers_airtemp

Unnamed: 0,timestamp_utc,timestamp_cet,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
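
The two `query` calls above test one variable at a time. A hedged sketch of how such range checks could be collected in one place follows; the limits in `LIMITS` are hypothetical and would need to be set from domain knowledge, and `flag_implausible` is not part of the original notebook. The demo values reuse outliers visible in the outputs above (water temperatures of 62.4 and 70.0, humidity of 104).

```python
import pandas as pd

# Hypothetical plausibility limits (min, max) per variable
LIMITS = {
    "water_temperature": (-1.0, 30.0),
    "humidity": (0.0, 100.0),
}

def flag_implausible(df: pd.DataFrame, limits: dict) -> pd.DataFrame:
    """Return one boolean column per checked variable, True where the value is out of range."""
    flags = pd.DataFrame(index=df.index)
    for column, (low, high) in limits.items():
        flags[column] = (df[column] < low) | (df[column] > high)
    return flags

demo = pd.DataFrame({"water_temperature": [6.7, 62.4, 70.0], "humidity": [92, 68, 104]})
flags = flag_implausible(demo, LIMITS)
print(flags.sum())
```

Missing values compare as `False` in both conditions, so empty columns are not reported as outliers by this check.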


## Graphical plausibility checks
### Use the date as index

While we did already parse the `datetime` column into the respective datetime type, it currently is just a regular column. 
**To enable quick and convenient queries and aggregations, we need to turn it into the index of the DataFrame**

In [368]:
df_zeitreihe_my = df_zeitreihe_my.set_index("timestamp_utc")

In [369]:
df_zeitreihe_my.info()
df_zeitreihe_my.index.year.unique()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 759119 entries, 2007-04-22 19:20:00+00:00 to 2021-12-31 23:00:00+00:00
Data columns (total 14 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   timestamp_cet            759119 non-null  object 
 1   air_temperature          759119 non-null  float64
 2   water_temperature        658722 non-null  float64
 3   wind_gust_max_10min      759119 non-null  float64
 4   wind_speed_avg_10min     759119 non-null  float64
 5   wind_force_avg_10min     759119 non-null  float64
 6   wind_direction           759119 non-null  int64  
 7   windchill                759119 non-null  float64
 8   barometric_pressure_qfe  754378 non-null  float64
 9   precipitation            658722 non-null  float64
 10  dew_point                759119 non-null  float64
 11  global_radiation         658722 non-null  float64
 12  humidity                 759119 non-null  int64  
 13  water_level  

Int64Index([2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,
            2021],
           dtype='int64', name='timestamp_utc')

In [370]:
df_zeitreihe_tb = df_zeitreihe_tb.set_index("timestamp_utc")


In [371]:
df_zeitreihe_tb.info()
df_zeitreihe_tb.index.year.unique()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 757537 entries, 2007-04-15 09:30:00+00:00 to 2021-12-31 23:00:00+00:00
Data columns (total 14 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   timestamp_cet            757537 non-null  object 
 1   air_temperature          757537 non-null  float64
 2   water_temperature        757537 non-null  float64
 3   wind_gust_max_10min      757537 non-null  float64
 4   wind_speed_avg_10min     757537 non-null  float64
 5   wind_force_avg_10min     757537 non-null  float64
 6   wind_direction           757537 non-null  int64  
 7   windchill                757537 non-null  float64
 8   barometric_pressure_qfe  704988 non-null  float64
 9   precipitation            0 non-null       float64
 10  dew_point                757537 non-null  float64
 11  global_radiation         0 non-null       float64
 12  humidity                 757537 non-null  int64  
 13  water_level  

Int64Index([2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020,
            2021],
           dtype='int64', name='timestamp_utc')

In [372]:
sommerzeitwechsel_tb = df_zeitreihe_tb.loc["2021-03-28 01":"2021-03-28 05"].resample("H").mean()
sommerzeitwechsel_tb

Unnamed: 0_level_0,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
timestamp_utc,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
2021-03-28 01:00:00+00:00,4.3,6.5,0.5,0.1,0.2,57.0,4.3,,,-0.0,,73.5,
2021-03-28 02:00:00+00:00,3.6,6.6,0.2,0.1,0.2,50.5,3.6,,,-0.3,,76.0,
2021-03-28 03:00:00+00:00,3.0,6.7,1.1,0.4,0.3,64.2,3.0,,,-0.4,,78.2,
2021-03-28 04:00:00+00:00,2.6,6.7,0.7,0.1,0.2,35.5,2.6,,,-0.2,,81.3,
2021-03-28 05:00:00+00:00,2.4,6.6,0.1,0.0,0.0,73.7,2.4,,,0.1,,84.3,


In [373]:
sommerzeitwechsel_my = df_zeitreihe_my.loc["2021-03-28 01":"2021-03-28 05"].resample("H").mean()
sommerzeitwechsel_my

Unnamed: 0_level_0,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
timestamp_utc,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
2021-03-28 01:00:00+00:00,5.0,,2.0,1.7,1.9,120.1,4.8,1030.9,,1.4,,77.7,
2021-03-28 02:00:00+00:00,4.3,,2.0,1.7,1.8,150.0,4.2,1030.5,,1.2,,80.7,
2021-03-28 03:00:00+00:00,3.9,,1.3,0.9,1.0,144.7,3.9,1030.3,,1.1,,81.5,
2021-03-28 04:00:00+00:00,3.2,,1.3,0.8,0.7,211.7,3.2,1030.5,,0.9,,84.3,
2021-03-28 05:00:00+00:00,3.4,,2.2,1.5,1.3,288.3,3.1,1031.2,,0.6,,82.3,


In [374]:
# weekly means over the full time series
weekly_means = df_zeitreihe_tb.resample("W").mean()
# quarterly means and medians, computed directly from the raw measurements
quarterly_means = df_zeitreihe_tb.resample("Q").mean()
quarterly_median = df_zeitreihe_tb.resample("Q").median()
# optional: drop NaN rows and cast to integers for readability
#weekly_means.dropna().astype(int).head(5)
#weekly_means
#quarterly_means
#quarterly_median

### Simple visualizations for plausibility checks

Explore the data with PivotTable.js.

The dataset is too large for this: the notebook crashes.

In [375]:
#from pivottablejs import pivot_ui

#pivot_ui(df_zeitreihe_tb)
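
One possible workaround for the crash, sketched here under assumptions: aggregate the 10-minute values to daily values before handing them to `pivot_ui`, which cuts the row count by a factor of 144. The frame `demo` is synthetic stand-in data, not the real `df_zeitreihe_tb`, and whether the reduced frame is small enough for PivotTable.js has not been verified.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one year of 10-minute measurements (assumption, not the real data)
idx = pd.date_range("2020-01-01", "2020-12-31 23:50", freq="10min", tz="UTC", name="timestamp_utc")
demo = pd.DataFrame({"air_temperature": np.zeros(len(idx))}, index=idx)

# Aggregate to one row per day to shrink the frame before exploring it
daily = demo.resample("D").mean()
print(len(demo), "->", len(daily))

# The reduced frame could then be passed on (left commented out, untested here):
# from pivottablejs import pivot_ui
# pivot_ui(daily.reset_index())
```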

### Querying points in time and time ranges

A particularly powerful feature of the pandas DataFrame is its indexing capability, which also works with time-based entities such as dates and times. We have already created the index above, so let's put it to use.

In [376]:
#data2betested_my.loc["2017-06-30"]
df_zeitreihe_tb.loc["2018-10-04":"2020-12-30"].head(2)

timestamp_utc,timestamp_cet,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
2018-10-04 00:00:00+00:00,2018-10-04T02:00:00+02:00,11.4,18.3,0.0,0.0,0.0,343,11.4,978.6,,10.0,,91,
2018-10-04 00:10:00+00:00,2018-10-04T02:10:00+02:00,11.3,18.3,0.2,0.0,0.0,342,11.3,978.7,,10.0,,92,
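
The date-string lookups used above can be tried on a small frame. The following is a minimal sketch with synthetic data (`demo` is a hypothetical stand-in for `df_zeitreihe_tb`). It shows that a single date string selects the whole day and that a slice of date strings includes both endpoints.

```python
import pandas as pd

# Synthetic 10-minute series with a UTC DatetimeIndex (assumption, not the real data)
idx = pd.date_range("2018-10-03 22:00", "2018-10-05 02:00", freq="10min", tz="UTC", name="timestamp_utc")
demo = pd.DataFrame({"air_temperature": range(len(idx))}, index=idx)

# A single date string selects every row of that day (partial-string indexing)
one_day = demo.loc["2018-10-04"]

# A slice of date strings is inclusive on both ends; the end date covers the whole day
two_days = demo.loc["2018-10-04":"2018-10-05"]

print(len(one_day), len(two_days))
```

The strings are interpreted in the time zone of the index, here UTC, so a "day" in this lookup is a UTC day rather than a local day.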


### Aggregate values by time period

With the time-based functions, it is easy to switch between hours, days, months, etc. and to aggregate accordingly.


In [377]:
bsp_median = df_zeitreihe_tb.resample("M").median().head(10)
bsp_median.head(2)

timestamp_utc,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
2007-04-30 00:00:00+00:00,17.0,15.2,1.6,0.8,0.8,117.0,16.8,972.8,,8.9,,60.0,
2007-05-31 00:00:00+00:00,15.7,16.1,2.6,1.1,1.1,192.0,14.1,967.7,,9.1,,68.0,
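
Switching between aggregation levels with `resample` can be illustrated on synthetic data. This is a sketch only: `demo` and its values are made up, and only the frequency aliases `"h"`, `"D"` and `"W"` are used because they behave the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic 10-minute series for three months (assumption, not the real data)
idx = pd.date_range("2021-01-01", "2021-03-31 23:50", freq="10min", tz="UTC", name="timestamp_utc")
demo = pd.DataFrame({"air_temperature": np.linspace(0.0, 10.0, len(idx))}, index=idx)

# The same series aggregated at three different granularities
hourly = demo.resample("h").mean()    # one row per hour
daily = demo.resample("D").median()   # one row per day
weekly = demo.resample("W").median()  # one row per week (weeks end on Sunday)

print(len(hourly), len(daily), len(weekly))
```

Note that pandas 2.2 deprecated the aliases `"M"`, `"Q"` and `"Y"` used in this notebook in favour of `"ME"`, `"QE"` and `"YE"`, so newer installations may emit warnings for the cells here.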


In [378]:
# compute the weekly medians
weekly_medians = df_zeitreihe_tb.resample("W").median()
# compute the quarterly medians (directly from the raw series, not from the weekly values)
quarterly_medians = df_zeitreihe_tb.resample("Q").median()
# for readability, cast the values to integers; note that dropna() discards every row
# containing a NaN, which presumably explains the empty result below
quarterly_medians.dropna().astype(int).head(2)

timestamp_utc,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level


### Visualizations by time period

Line chart 
[Link to the documentation](https://altair-viz.github.io/gallery/multiline_highlight.html)

In [379]:
#weekly_medians.dtypes
days = df_zeitreihe_tb.resample("D").median()
months = df_zeitreihe_tb.resample("M").median()
years = df_zeitreihe_tb.resample("Y").median()
years.dtypes

air_temperature            float64
water_temperature          float64
wind_gust_max_10min        float64
wind_speed_avg_10min       float64
wind_force_avg_10min       float64
wind_direction             float64
windchill                  float64
barometric_pressure_qfe    float64
precipitation              float64
dew_point                  float64
global_radiation           float64
humidity                   float64
water_level                float64
dtype: object

In [380]:
months[['air_temperature','water_temperature']].head(2)

timestamp_utc,air_temperature,water_temperature
2007-04-30 00:00:00+00:00,17.0,15.2
2007-05-31 00:00:00+00:00,15.7,16.1


In [381]:
chart1 = alt.Chart(months[['air_temperature','water_temperature']].reset_index().melt("timestamp_utc")).mark_line(strokeWidth=1, opacity=0.25).encode(
    x='timestamp_utc',
    y='value',
    color='variable',
).properties(width=800, height=400)

chart2 = alt.Chart(years[['air_temperature','water_temperature']].reset_index().melt("timestamp_utc")).mark_line(interpolate="basis", opacity=1).encode(
    x='timestamp_utc',
    y='value',
    color='variable',
)

chart1 + chart2

In [382]:
rolling = months[['air_temperature','water_temperature']].rolling(60, center=True, win_type="triang").mean()

chart1 = alt.Chart(rolling.reset_index().melt("timestamp_utc")).mark_line(strokeWidth=1.5, opacity=1).encode(
    x='timestamp_utc', y='value', color='variable',
).properties(width=800, height=400)

# same as the two charts in previous code cell, except more transparent
chart2 = alt.Chart(months[['air_temperature','water_temperature']].reset_index().melt("timestamp_utc")).mark_line(strokeWidth=1.6, opacity=0.25).encode(
    x='timestamp_utc', y='value', color='variable',
)

chart3 = alt.Chart(years[['air_temperature','water_temperature']].reset_index().melt("timestamp_utc")).mark_line(interpolate="basis", opacity=0.25).encode(
    x='timestamp_utc', y='value', color='variable',
)

chart1 + chart2 + chart3

In [383]:
# add a column for month names
months['month_number'] = months.index.month
months['month_name'] = months.index.month_name()
months.head(2)

timestamp_utc,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level,month_number,month_name
2007-04-30 00:00:00+00:00,17.0,15.2,1.6,0.8,0.8,117.0,16.8,972.8,,8.9,,60.0,,4,April
2007-05-31 00:00:00+00:00,15.7,16.1,2.6,1.1,1.1,192.0,14.1,967.7,,9.1,,68.0,,5,May


In [384]:
grp_months = months[['air_temperature','water_temperature', 'month_number', 'month_name']].groupby("month_name").mean()

In [385]:
alt.Chart(grp_months.reset_index()).mark_bar(width=20).encode(
    x='month_number:O',
    y='air_temperature:Q',
    color='month_name:O'
).properties(width=300, height=300).interactive()

In [386]:
months = df_zeitreihe_tb.resample("M").median()
years = df_zeitreihe_tb.resample("Y").median()
#months = df_zeitreihe_tb.resample("M").min()
#years = df_zeitreihe_tb.resample("Y").min()

brush = alt.selection(type='interval', encodings=['x'])

upper = alt.Chart(years[['air_temperature','water_temperature']].reset_index().melt("timestamp_utc")).mark_area(interpolate="basis").encode(
    x = alt.X('timestamp_utc:T', axis=None),
    y = alt.Y('value:Q', axis=None),
    color='variable'
).properties(width=800, height=50).add_selection(brush)

lower = alt.Chart(months[['air_temperature','water_temperature']].reset_index().melt("timestamp_utc")).mark_line(strokeWidth=1).encode(
    x = alt.X('timestamp_utc:T', scale=alt.Scale(domain=brush)),
    y='value',
    color='variable',
).properties(width=800, height=300)

upper & lower

## Data export

Once all tests have passed and the results are plausible, the updated datasets can be exported as CSV and published later.

First, briefly check whether anything incorrect has slipped in during the plausibility checks.

In [387]:
df_zeitreihe_tb.shape
#df_zeitreihe_tb.describe()

(757537, 14)

In [388]:
df_zeitreihe_my.shape
#df_zeitreihe_my.describe()

(759119, 14)
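
Beyond the shape, a few additional integrity indicators can be checked before the export. The helper below is a hypothetical sketch and not part of the original workflow; `check_timeseries` and the frame `demo` are illustrative names, and the data is synthetic with one deliberately duplicated timestamp.

```python
import pandas as pd

# Synthetic frame with one duplicated timestamp (assumption, not the real data)
idx = pd.to_datetime(
    ["2021-01-01 00:00", "2021-01-01 00:10", "2021-01-01 00:10", "2021-01-01 00:30"],
    utc=True,
)
demo = pd.DataFrame({"air_temperature": [1.0, 1.1, 1.1, 1.3]}, index=idx)
demo.index.name = "timestamp_utc"


def check_timeseries(df):
    """Return a few simple integrity indicators for a time-indexed frame."""
    return {
        "rows": len(df),
        "duplicate_timestamps": int(df.index.duplicated().sum()),
        "sorted": bool(df.index.is_monotonic_increasing),
        "first": df.index.min(),
        "last": df.index.max(),
    }


report = check_timeseries(demo)
print(report)
```

Applied to `df_zeitreihe_my` and `df_zeitreihe_tb`, a non-zero duplicate count or an unsorted index would be a reason to revisit the earlier steps before exporting.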

### Define the export paths

#### Dynamically computed minimum and maximum years

In [389]:
#years.index.year
years = df_zeitreihe_tb.resample("Y").median()
min_year = years.index.year.min()
max_year= years.index.year.max()

print(min_year, max_year, r"\\szh\ssz\applikationen\OGD\Daten\Quelldaten\SID\WAPO\02_veroeffentlichte_zeitreihe\messwerte_mythenquai_"+str(min_year)+"-"+str(max_year)+".csv")

2007 2021 \\szh\ssz\applikationen\OGD\Daten\Quelldaten\SID\WAPO\02_veroeffentlichte_zeitreihe\messwerte_mythenquai_2007-2021.csv


#### Assemble the paths

In [398]:
export_fp_my = r"\\szh\ssz\applikationen\OGD\Daten\Quelldaten\SID\WAPO\02_veroeffentlichte_zeitreihe\messwerte_mythenquai_"+str(min_year)+"-"+str(max_year)+".csv" #Mythenquai
export_fp_tb= r"\\szh\ssz\applikationen\OGD\Daten\Quelldaten\SID\WAPO\02_veroeffentlichte_zeitreihe\messwerte_tiefenbrunnen_"+str(min_year)+"-"+str(max_year)+".csv" #Tiefenbrunnen
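
As an alternative to repeating the string concatenation per station, the path could be composed in a small helper. This is only a sketch: `build_export_path` is a hypothetical function name, and `PureWindowsPath` is used so that the UNC path is handled the same way on any operating system.

```python
from pathlib import PureWindowsPath


def build_export_path(base_dir, station, min_year, max_year):
    """Compose the export file path for one station and a range of years."""
    filename = f"messwerte_{station}_{min_year}-{max_year}.csv"
    return PureWindowsPath(base_dir) / filename


# Base directory as used in this notebook; the years are the values printed above
base = r"\\szh\ssz\applikationen\OGD\Daten\Quelldaten\SID\WAPO\02_veroeffentlichte_zeitreihe"
path_my = build_export_path(base, "mythenquai", 2007, 2021)
path_tb = build_export_path(base, "tiefenbrunnen", 2007, 2021)
print(path_my)
print(path_tb)
```

`str(path_my)` can be passed to `to_csv` in place of `export_fp_my`.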

#### Set the export options

Options:
`DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', line_terminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', errors='strict', storage_options=None)`

**Run the export:**

1. Reset the index. In doing so, the previously indexed value loses its ISO date format.
2. Re-apply the ISO-compliant time format to the previously indexed date. This only matters for `utc`: `cet` was never used as an index and is still a string in ISO format.
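
The two steps above can be sketched on a small synthetic frame. `demo` is made-up data spanning the switch to daylight saving time on 2021-03-28, the local-time column is derived here with `tz_convert("Europe/Zurich")` as an assumption about how `timestamp_cet` is produced, and the CSV is written to an in-memory buffer instead of the network path.

```python
import io

import pandas as pd

# Synthetic frame with a UTC index around the 2021 daylight saving switch (assumption)
idx = pd.date_range("2021-03-28 00:50", periods=3, freq="10min", tz="UTC", name="timestamp_utc")
demo = pd.DataFrame({"air_temperature": [4.3, 4.2, 4.1]}, index=idx)

# Local time as an ISO string, derived from the UTC index
demo["timestamp_cet"] = [ts.isoformat() for ts in demo.index.tz_convert("Europe/Zurich")]

# Step 1: reset the index so that timestamp_utc becomes a regular column
out = demo.reset_index()

# Step 2: convert the datetime column to ISO 8601 strings (with the "T" separator)
out["timestamp_utc"] = out["timestamp_utc"].apply(lambda ts: ts.isoformat())

# Write to an in-memory buffer for inspection
buffer = io.StringIO()
out.to_csv(buffer, sep=",", index=False)
print(buffer.getvalue())
```

Without step 2, `to_csv` would write the datetime column with a space between date and time instead of the ISO `T` separator.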

##### Mythenquai


In [399]:
df_my = df_zeitreihe_my.reset_index()
df_my['timestamp_utc'] = df_my.timestamp_utc.apply(datetime.datetime.isoformat)
df_my.head(2)

,timestamp_utc,timestamp_cet,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
0,2007-04-22T19:20:00+00:00,2007-04-22T21:20:00+02:00,18.9,15.2,1.6,0.7,0.7,321,18.9,973.5,0.0,4.4,3.0,38,405.9
1,2007-04-22T19:30:00+00:00,2007-04-22T21:30:00+02:00,18.1,15.2,1.3,0.8,0.8,346,18.1,973.7,0.0,4.8,3.0,41,405.9


In [394]:
df_my.to_csv(export_fp_my
                       ,sep=','
                       ,index=False
                      )

##### Tiefenbrunnen

In [400]:
df_tb = df_zeitreihe_tb.reset_index()

# Note: the order of the operations matters here.
df_tb['timestamp_utc'] = df_tb.timestamp_utc.apply(datetime.datetime.isoformat)
df_tb.head(2)

,timestamp_utc,timestamp_cet,air_temperature,water_temperature,wind_gust_max_10min,wind_speed_avg_10min,wind_force_avg_10min,wind_direction,windchill,barometric_pressure_qfe,precipitation,dew_point,global_radiation,humidity,water_level
0,2007-04-15T09:30:00+00:00,2007-04-15T11:30:00+02:00,17.4,11.9,0.6,0.0,0.0,154,17.4,973.2,,9.3,,59,
1,2007-04-15T09:40:00+00:00,2007-04-15T11:40:00+02:00,16.7,12.1,0.5,0.1,0.1,150,16.7,973.3,,10.9,,69,


In [396]:
df_tb.to_csv(export_fp_tb
             ,sep=','
             ,index=False
             ,encoding='utf-8'
            )