# **Time Series Weather Synthetic Data Generation Using TimeGAN Model**

* TimeGAN - The ML model used here to generate time series data. [GitHub](https://github.com/jsyoon0823/TimeGAN/tree/master) and [paper](https://papers.nips.cc/paper_files/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf).
* ydata_synthetic - An updated library that implements TimeGAN and other models. [GitHub](https://github.com/ydataai/ydata-synthetic/tree/dev) and [article](https://towardsdatascience.com/synthetic-time-series-data-a-gan-approach-869a984f2239).
* This notebook follows [TimeGAN_Synthetic_stock_data.ipynb](https://github.com/ydataai/ydata-synthetic/blob/dev/examples/timeseries/TimeGAN_Synthetic_stock_data.ipynb), which generates synthetic stock data.


**Dataset Information**
* The data used in this notebook was downloaded from [ESGF](https://aims2.llnl.gov/search) and has been cleaned for simplicity.
* Data cleaning steps:
  - Originally downloaded from ESGF as a NetCDF file (.nc) with only one variable included: daily average temperature.
  - Originally included several years; now contains only data for 2015 (01/01/2015 to 12/31/2015).
  - Output from a global climate model (GCM) at daily temporal resolution and 100 km spatial resolution. GCM output like this is what meteorologists want to downscale.
  - Columns: 'Date' and 'Temperature'
  - 365 rows, one per day
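The year-filtering step in the list above can be sketched with pandas. This is a hypothetical reconstruction, not the code that produced the dataset: the multi-year frame below holds random placeholder temperatures rather than ESGF data, the year range is an assumption, and the NetCDF-to-CSV conversion is not shown.

```python
import numpy as np
import pandas as pd

# Placeholder multi-year daily series (random values, NOT real ESGF data)
dates = pd.date_range("2015-01-01", "2017-12-31", freq="D")
rng = np.random.default_rng(0)
full_df = pd.DataFrame(
    {"Date": dates, "Temperature": rng.normal(285.0, 5.0, len(dates))}
)

# Keep only the rows that fall in 2015
df_2015 = full_df[full_df["Date"].dt.year == 2015].reset_index(drop=True)
print(df_2015.shape)
```

Because 2015 is not a leap year, the filtered frame has one row for each of its 365 days.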

In [71]:
import ydata_synthetic
from os import path
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")

from ydata_synthetic.synthesizers import ModelParameters, TrainParameters
from ydata_synthetic.preprocessing.timeseries.stock import transformations

# The import below currently fails in Jupyter with the TypeError shown underneath.
# It works in a Google Colab notebook.
#from ydata_synthetic.synthesizers.timeseries import TimeSeriesSynthesizer

#TypeError: unsupported operand type(s) for |: 'types.GenericAlias' and 'NoneType'
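A likely cause, stated here as an assumption rather than a confirmed diagnosis: Python raises this exact `TypeError` when an expression such as `list[int] | None` is evaluated on an interpreter older than 3.10, because the `|` operator for types was introduced in Python 3.10 (PEP 604). The sketch below checks whether the current interpreter supports that syntax. The helper name `supports_union_operator_on_generics` is made up for this example.

```python
import sys

def supports_union_operator_on_generics():
    """Return True if `list[int] | None` can be evaluated on this interpreter."""
    try:
        eval("list[int] | None")
        return True
    except TypeError:
        # Raised on Python < 3.10, where `|` is not defined between types
        return False

print(sys.version_info[:3], supports_union_operator_on_generics())
```

If this prints `False` in the Jupyter kernel and `True` in Colab, running the notebook in a Python 3.10+ environment would be the first thing to try.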

# **Loading Dataset and Standardizing**


In [70]:
# Path to the cleaned dataset
data_path = '/Users/gabbyvaillant/Downloads/BNL/temperature_BNL_tas_2015.csv'

# Load the dataset using pandas
weather_df = pd.read_csv(data_path)

# Set the Date column as the index and sort chronologically
weather_df = weather_df.set_index('Date').sort_index()

# Use the 'transformations' function from the ydata_synthetic library
# to normalize the data and break it into sequences of length 30
processed_weather = transformations(data_path, seq_len=30)

cols = list(weather_df.columns)
print(weather_df.shape)

(365, 1)
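To make "normalize and break into sequences" concrete, here is a minimal sketch of min-max scaling followed by overlapping sliding windows. It is an illustration only: `scale_and_window` is a hypothetical helper, the input is a synthetic sine wave rather than the temperature data, and the library's own `transformations` function may differ in details such as the number of windows or shuffling.

```python
import numpy as np

def scale_and_window(values, seq_len):
    """Min-max scale each column to [0, 1], then cut overlapping windows."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(axis=0), values.max(axis=0)
    scaled = (values - lo) / (hi - lo + 1e-7)  # small epsilon avoids division by zero
    # One window starting at every position that leaves room for seq_len steps
    windows = [scaled[i:i + seq_len] for i in range(len(scaled) - seq_len + 1)]
    return np.stack(windows)

# Stand-in for a (365, 1) single-feature daily series
demo = np.sin(np.linspace(0, 2 * np.pi, 365)).reshape(-1, 1)
sequences = scale_and_window(demo, seq_len=30)
print(sequences.shape)
```

Each resulting array has shape `(number of windows, seq_len, number of features)`, which is the three-dimensional layout that sequence models such as TimeGAN train on.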


# **Defining Model Hyperparameters**

**Networks:**
- Generator
- Discriminator
- Embedder
- Recovery Network

**Parameters:**
* seq_len: Sequence length, i.e. the number of time steps in each training sequence.
* n_seq: Number of features in the dataset.
* hidden_dim: Dimensionality of the hidden layers in the model. Tune this value to optimize the model.
* gamma: Represents the weight of the divergence term in the loss function.
* noise_dim: Defines the dimensionality of the noise input to the GAN.
* dim: Sets the size of the layers in the GAN.
* batch_size: Specifies the number of samples per gradient update. Adjust based on the capacity of GPU/CPU.
* log_step: Frequency of logging the training progress.
* learning_rate: Step size used by the optimizer, which controls how quickly the model converges.
* epochs: Number of training iterations. Start small and increase for final training.



**NOTE:** Parameters should be optimized and tailored to the specific dataset you are working with.

In [55]:
#Specifying parameters for TimeGAN model

seq_len = 30 #To capture monthly patterns
n_seq = 1 #'Temperature' is the only feature
hidden_dim = 24
gamma = 1

noise_dim = 32

#NOTE: When the downscaling aspect is added to the model, the noise input will no longer exist, because
#the generator will receive the low-resolution dataset we are aiming to downscale as input instead of noise

dim = 128
batch_size = 128

log_step = 100
learning_rate = 5e-4

# Small value for quick prototyping; use a much larger value (e.g. 50000) for final training
epochs = 10

# NOTE: hidden_dim and gamma are defined above but are not passed to ModelParameters here
gan_args = ModelParameters(
    batch_size=batch_size, lr=learning_rate, noise_dim=noise_dim, layers_dim=dim
)

train_args = TrainParameters(
    epochs=epochs, sequence_length=seq_len, number_sequences=n_seq
)

# **Training the GAN Synthesizer**


In [72]:
"""
This cell fails because TimeSeriesSynthesizer could not be imported
in cell 1.
"""

# Load a previously saved synthesizer if one exists; otherwise train and save a new one
if path.exists("synthesizer_weather.pkl"):
    synth = TimeSeriesSynthesizer.load("synthesizer_weather.pkl")
else:
    synth = TimeSeriesSynthesizer(modelname="timegan", model_parameters=gan_args)
    synth.fit(weather_df, train_args, num_cols=cols)
    synth.save("synthesizer_weather.pkl")

NameError: name 'TimeSeriesSynthesizer' is not defined

# Notes

07/17

When I run this file in Google Colab, it runs without the error that appears here. I need to figure out why it does not work in other environments such as Jupyter and Spyder.

Also, once I run the code and train the model, I cannot retrain it after editing the code to increase the number of epochs (increasing this should make the synthetic data more similar to the original). I need to figure out what is going wrong (review cell 8).
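One possible explanation for the retraining problem, offered as an assumption and not verified against an actual run: a training cell that loads a saved `.pkl` whenever one exists will keep returning the old model, so edits to `epochs` never reach `fit`. The sketch below shows the pattern with an explicit override. `load_or_train` and its `retrain` flag are hypothetical names, and a plain dict stands in for the TimeGAN synthesizer.

```python
import os
import pickle
import tempfile

def load_or_train(cache_path, train_fn, retrain=False):
    """Return the cached model if present, unless retrain=True forces new training."""
    if os.path.exists(cache_path) and not retrain:
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    model = train_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(model, f)
    return model

# Demonstration with a stand-in "model" (a dict), not a real synthesizer
cache_file = os.path.join(tempfile.mkdtemp(), "synthesizer_demo.pkl")
first = load_or_train(cache_file, lambda: {"epochs": 10})
cached = load_or_train(cache_file, lambda: {"epochs": 50})  # cache hit: old model returned
retrained = load_or_train(cache_file, lambda: {"epochs": 50}, retrain=True)
print(first, cached, retrained)
```

Deleting the saved file before rerunning the training cell would have the same effect as `retrain=True`.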