<a href="https://colab.research.google.com/github/hepuliu/Masters_Thesis/blob/sandbox_pink/sandbox.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Master Thesis Simulation Sandbox
Flood Prevention Dam Sizing with Machine Learning Approaches - Hepu Liu

### Overall Project Simulation Steps
1. Process discharge data from Waldangelbach Station

2. Process precipitation data from Baiertal Station

3. Build Prediction Model (Model A)

4. Process precipitation data from Stifterhof Station

5. Process precipitation data from Waibstadt Station (optional)

6. Process precipitation data from Stetten Station (optional)

7. Fit data to Model A to predict discharge
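
Steps 1 to 3 imply that discharge and precipitation share one hourly time axis before Model A is trained. The following is a minimal sketch of that alignment, not the thesis pipeline itself. It assumes both frames carry a timestamp column `t`, and it uses a hypothetical precipitation column name `precipitation [mm]`, because the columns of `pa_df` are not shown in this notebook. The toy values are illustrative only.

```python
import pandas as pd

HOUR = pd.Timedelta(hours=1)

# Toy stand-ins for da_df (discharge) and pa_df (precipitation).
# The precipitation column name and its values are assumptions for illustration.
da_toy = pd.DataFrame({
    "t": pd.date_range("2007-01-01 00:00", periods=4, freq=HOUR),
    "discharge [m3/s]": [0.226, 0.248, 0.248, 0.320],
})
pa_toy = pd.DataFrame({
    "t": pd.date_range("2007-01-01 01:00", periods=4, freq=HOUR),
    "precipitation [mm]": [0.0, 0.2, 0.0, 1.1],
})

# An inner join keeps only the hours that are present at both stations
model_a_df = (
    da_toy.merge(pa_toy, on="t", how="inner")
    .sort_values("t")
    .reset_index(drop=True)
)
print(model_a_df)
```

An inner join is used here so that no row has a missing predictor or target; a left join followed by explicit gap handling would be an equally valid choice.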

### Variable Naming Conventions

- Weather Stations Naming: ('p' for precipitation, 'd' for discharge, 'a' to 'd' for different stations, 'r' for result)

  - da: Waldangelbach Station
  - pa: Baiertal Station
  - pb: Stifterhof Station
  - pc: Waibstadt Station
  - pd: Stetten Station
  - pr: combined/resulting precipitation
  - dr: predicted/resulting discharge

- Variable Naming Conventions:
  - df: data frame
  - trs: training set
  - tes: testing set
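
The `trs` and `tes` suffixes suggest a split into training and testing sets. For time series, one common approach is a chronological split, so that the test set lies entirely after the training set. The sketch below assumes that approach; the helper name `chronological_split`, the 80/20 ratio, and the toy values after the fifth row are all hypothetical.

```python
import pandas as pd

def chronological_split(df, test_fraction=0.2, time_col="ds"):
    """Split a frame into (training, testing) by time order, without shuffling."""
    ordered = df.sort_values(time_col).reset_index(drop=True)
    n_test = int(round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered.iloc[:cut], ordered.iloc[cut:]

# Toy discharge frame; only the first five values come from the notebook output
da_toy = pd.DataFrame({
    "ds": pd.date_range("2007-01-01", periods=10, freq=pd.Timedelta(hours=1)),
    "y": [0.226, 0.248, 0.248, 0.320, 0.346, 0.33, 0.31, 0.30, 0.29, 0.28],
})

# Names follow the conventions above: station prefix plus 'trs' / 'tes'
da_trs, da_tes = chronological_split(da_toy, test_fraction=0.2)
print(len(da_trs), len(da_tes))
```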



## Importing Libraries

In [1]:
# importing libraries
import csv
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
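from fbprophet import Prophet  # package renamed to `prophet` from v1.0: `from prophet import Prophet`
from google.colab import drive

# Mount Google Drive so that the dataset paths under /content/drive resolve
drive.mount('/content/drive')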

## Importing Datasets

In [2]:
# import datasets
da_df = pd.read_csv('/content/drive/MyDrive/thesis/dataset/cleaned_df/da_df.csv')
pa_df = pd.read_csv('/content/drive/MyDrive/thesis/dataset/cleaned_df/pa_df.csv')
pb_df = pd.read_csv('/content/drive/MyDrive/thesis/dataset/cleaned_df/pb_df.csv')
da_df.head()

|   | t | discharge [m3/s] |
|---|---|---|
| 0 | 2007-01-01 00:00:00 | 0.226 |
| 1 | 2007-01-01 01:00:00 | 0.248 |
| 2 | 2007-01-01 02:00:00 | 0.248 |
| 3 | 2007-01-01 03:00:00 | 0.32 |
| 4 | 2007-01-01 04:00:00 | 0.346 |


## Data Processing

In [3]:
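# Rename the columns to the names Prophet expects: 'ds' (timestamp) and 'y' (target)
da_df.columns = ['ds','y']
# Parse the timestamps, which pd.read_csv loads as plain strings
da_df['ds'] = pd.to_datetime(da_df['ds'])
da_df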

|   | ds | y |
|---|---|---|
| 0 | 2007-01-01 00:00:00 | 0.226 |
| 1 | 2007-01-01 01:00:00 | 0.248 |
| 2 | 2007-01-01 02:00:00 | 0.248 |
| 3 | 2007-01-01 03:00:00 | 0.320 |
| 4 | 2007-01-01 04:00:00 | 0.346 |
| ... | ... | ... |
| 110851 | 2019-08-24 19:00:00 | 0.164 |
| 110852 | 2019-08-24 20:00:00 | 0.135 |
| 110853 | 2019-08-24 21:00:00 | 0.106 |
| 110854 | 2019-08-24 22:00:00 | 0.093 |
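
The output above covers 2007-01-01 to 2019-08-24 at an hourly step. Before fitting a model it can be useful to confirm that no hours are missing. The helper below is a sketch of such a check; the name `find_gaps` is hypothetical, and the toy frame reuses three rows from the output with two rows deliberately left out to create a gap.

```python
import pandas as pd

def find_gaps(df, time_col="ds", step=pd.Timedelta(hours=1)):
    """Return the rows whose timestamp is more than `step` after the previous row."""
    ts = pd.to_datetime(df[time_col])
    delta = ts.diff()              # time elapsed since the previous row
    is_gap = delta > step          # the first row (NaT) compares as False
    return df.loc[is_gap].assign(gap=delta[is_gap])

# Rows 0, 1 and 4 of the discharge output; rows 2 and 3 are omitted on purpose
toy = pd.DataFrame({
    "ds": ["2007-01-01 00:00:00", "2007-01-01 01:00:00", "2007-01-01 04:00:00"],
    "y": [0.226, 0.248, 0.346],
})
gaps = find_gaps(toy)
print(gaps)
```

An empty result means every consecutive pair of rows is at most one step apart.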


## Data Visualization

In [4]:
# Plot a line graph (plotting 20000 rows took about 3 min with a GPU runtime)
def line_plot(df, title, y_col='yhat'):
  """Plot column y_col of df against its 'ds' timestamps."""
  label_font = {'family':'serif', 'color':'black', 'size':12}
  title_font = {'family':'serif', 'color':'black', 'size':14}
  plt.figure(figsize=(8,8))
  plt.plot(df['ds'], df[y_col])
  plt.xlabel('t', fontdict=label_font)
  plt.ylabel('d', fontdict=label_font)
  plt.title(title, fontdict=title_font)
  plt.show()

# da_df stores its values in 'y', not 'yhat', so the column is passed explicitly
# line_plot(da_df, 'Discharge A', y_col='y')
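
The timing note in the cell above suggests that plotting is slow. One likely cause, stated here as an assumption rather than a measured result, is that `ds` is still a string column straight after `pd.read_csv`, and matplotlib treats string x-values as categories. Parsing the timestamps and aggregating to daily means before plotting is one way to reduce the number of points. The helper name `daily_mean` and the toy values are hypothetical.

```python
import pandas as pd

def daily_mean(df, time_col="ds"):
    """Parse the timestamp column and aggregate hourly values to daily means."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    return out.set_index(time_col).resample("D").mean().reset_index()

# Toy frame: 48 hourly rows with string timestamps, as read_csv would return them
hours = pd.date_range("2007-01-01", periods=48, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({
    "ds": hours.strftime("%Y-%m-%d %H:%M:%S"),
    "y": [0.2] * 24 + [0.4] * 24,
})
daily = daily_mean(toy)
print(daily)
```

The aggregated frame keeps the `ds` column name, so it can be passed to a plotting helper that expects `ds` on the x-axis. Daily averaging smooths short flood peaks, so it suits overview plots better than peak analysis.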


## Prediction

In [5]:
## FBProphet

# Single-variable prediction model: fit Prophet on a frame with 'ds' and 'y' columns
def single_var_predictor(df):
  predictor = Prophet(interval_width=0.95)  # 95% uncertainty interval
  predictor.fit(df)
  return predictor

# Build a prediction dataframe that keeps only the timestamp and the point forecast
def prediction_df(predictor, df):
  forecast = predictor.predict(df)
  return forecast.loc[:, ['ds','yhat']]

# Prediction for discharge (runtime: 15 s for 2000 rows, 45 s for 20000 rows with GPU, 4 min for all rows)
# Note: this predicts on the same rows used for fitting, so the result is an in-sample fit
discharge_predictor = single_var_predictor(da_df)
da_dr = prediction_df(discharge_predictor, da_df)
da_dr

# da_dr = da_dr[:200]
# line_plot(da_dr, 'Discharge A')

|   | ds | yhat |
|---|---|---|
| 0 | 2007-01-01 00:00:00 | 0.379820 |
| 1 | 2007-01-01 01:00:00 | 0.379688 |
| 2 | 2007-01-01 02:00:00 | 0.379846 |
| 3 | 2007-01-01 03:00:00 | 0.379658 |
| 4 | 2007-01-01 04:00:00 | 0.378752 |
| ... | ... | ... |
| 110851 | 2019-08-24 19:00:00 | 0.152156 |
| 110852 | 2019-08-24 20:00:00 | 0.153327 |
| 110853 | 2019-08-24 21:00:00 | 0.153869 |
| 110854 | 2019-08-24 22:00:00 | 0.153493 |
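
To judge how closely `yhat` tracks the observed discharge, the two columns can be compared with simple error metrics. The sketch below computes the mean absolute error and the root mean squared error on the first five rows shown in the outputs above. Five rows are far too few to draw conclusions from; the same lines would apply to the full `y` and `yhat` columns.

```python
import numpy as np
import pandas as pd

# First five rows of the observed discharge (y) and the Prophet output (yhat)
observed = pd.Series([0.226, 0.248, 0.248, 0.320, 0.346])
predicted = pd.Series([0.379820, 0.379688, 0.379846, 0.379658, 0.378752])

residual = predicted - observed
mae = float(residual.abs().mean())             # mean absolute error
rmse = float(np.sqrt((residual ** 2).mean()))  # root mean squared error
print(f"MAE = {mae:.4f} m3/s, RMSE = {rmse:.4f} m3/s")
```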


## Archive

In [6]:
# # Cleanup Discharge A DataFrame da_df
# da_df = pd.read_csv('/content/drive/MyDrive/thesis/dataset/Wiesloch_waldangelbach_hourly_20070101-20210501.csv')
# da_df = da_df.iloc[13:].reset_index(drop=True)
# da_df.columns = da_df.iloc[0]
# da_df = da_df.iloc[3:].reset_index(drop=True)
# da_df = da_df.iloc[:, 4:7] # discharge unit [m3/s]
# da_df['Uhrzeit'] = da_df['Uhrzeit'].str.replace(' v', '')
# da_df['t'] = pd.to_datetime(da_df['Datum']+' '+da_df['Uhrzeit'], format=('%y-%m-%d %H:%M:%S'))
# da_df = da_df.iloc[:,2:]
# da_df.columns = ['discharge [m3/s]', 't']
# da_df = da_df[['t','discharge [m3/s]']]
# da_df.to_csv('/content/drive/MyDrive/thesis/dataset/cleaned_df/da_df.csv', index=False)

In [7]:
# # Cleanup Precipitation A DataFrame pa_df
# pa_df = pd.read_csv('/content/drive/MyDrive/thesis/dataset/Weather_station_Baiertal.csv')
# pa_df.columns = pa_df.iloc[0]
# pa_df = pa_df.iloc[1:].reset_index(drop=True)
# pa_df['t'] = pd.to_datetime(pa_df['date']+' '+pa_df['time'], format=('%y-%m-%d %H:%M'))
# pa_df = pa_df.iloc[:,2:]
# cols = list(pa_df.columns)
# cols = [cols[-1]] + cols[:-1]
# pa_df = pa_df[cols]
# pa_df.to_csv('/content/drive/MyDrive/thesis/dataset/cleaned_df/pa_df.csv', index=False)


In [8]:
# # Cleanup Precipitation B DataFrame pb_df
# pb_df = pd.read_csv('/content/drive/MyDrive/thesis/dataset/Weather_station_Stifterhof.csv')
# pb_df.columns = pb_df.iloc[0]
# pb_df = pb_df.iloc[1:].reset_index(drop=True)
# pb_df['t'] = pd.to_datetime(pb_df['date']+' '+pb_df['time'], format=('%y-%m-%d %H:%M'))
# pb_df = pb_df.iloc[:,2:]
# cols = list(pb_df.columns)
# cols = [cols[-1]] + cols[:-1]
# pb_df = pb_df[cols]
# pb_df.to_csv('/content/drive/MyDrive/thesis/dataset/cleaned_df/pb_df.csv', index=False)
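
The two archived precipitation cleanups differ only in the file they read and write, so the shared steps could be factored into one helper. The sketch below is one possible form. The name `clean_precipitation`, the `precipitation [mm]` column, and the `%Y-%m-%d %H:%M` timestamp format are assumptions, because the raw station files are not shown here; the `fmt` argument has to match the actual files.

```python
import pandas as pd

def clean_precipitation(raw, fmt="%Y-%m-%d %H:%M"):
    """Merge the 'date' and 'time' columns into a leading timestamp column 't'."""
    out = raw.copy()
    out["t"] = pd.to_datetime(out["date"] + " " + out["time"], format=fmt)
    out = out.drop(columns=["date", "time"])
    # Move 't' to the front and keep the remaining columns in their original order
    return out[["t"] + [c for c in out.columns if c != "t"]]

# Toy raw frame; the column names and values are illustrative assumptions
raw_toy = pd.DataFrame({
    "date": ["2007-01-01", "2007-01-01"],
    "time": ["00:00", "01:00"],
    "precipitation [mm]": [0.0, 0.2],
})
cleaned = clean_precipitation(raw_toy)
print(cleaned)
```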