[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jorgpg5/synthetic_data/blob/main/gretelai_timeseries_blueprint_example.ipynb#scrollTo=ikPM_hn42y-Q)


### Run the following command to download the example dataset

In [None]:
!git clone https://github.com/jorgpg5/synthetic_data.git

Cloning into 'synthetic_data'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 7 (delta 0), reused 4 (delta 0), pack-reused 0
Unpacking objects: 100% (7/7), done.


# Synthesize Time Series data from your own DataFrame

This Blueprint demonstrates how to create synthetic time series data with Gretel ([Source](https://github.com/gretelai/gretel-blueprints/blob/main/gretel/create_synthetic_data_from_time_series/blueprint.ipynb)).

We assume that the dataset contains at least:

1) A specific column holding time data points

2) One or more columns that contain measurements or numerical observations for each point in time.
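
The two assumptions above can be checked before any training is attempted. The following is a minimal sketch, not part of the Gretel tooling: `check_timeseries_frame` is a hypothetical helper, and the small `demo` frame is stand-in data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_timeseries_frame(df, time_column, trend_columns):
    """Return a list of problems; an empty list means both assumptions hold."""
    problems = []
    # Assumption 1: a dedicated column holds the points in time
    if time_column not in df.columns:
        problems.append(f"missing time column: {time_column}")
    # Assumption 2: every observation column exists and is numeric
    for col in trend_columns:
        if col not in df.columns:
            problems.append(f"missing trend column: {col}")
        elif not is_numeric_dtype(df[col]):
            problems.append(f"trend column is not numeric: {col}")
    return problems


# Stand-in data: "t" is the time column, "value" is numeric, "note" is text
demo = pd.DataFrame({"t": [0, 1, 2], "value": [0.1, 0.2, 0.3], "note": ["a", "b", "c"]})
ok = check_timeseries_frame(demo, "t", ["value"])
bad = check_timeseries_frame(demo, "t", ["value", "note", "absent"])
```

Here `ok` is an empty list, while `bad` reports the text column and the missing column.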

For this Blueprint, we use an example dataset of physiological signals (EDA, ECG, pupil diameter, eye opening, PERCLOS, and blinking) loaded from the repository cloned above.

In [None]:
%%capture

!pip install -U "gretel-client<0.8.0" gretel-synthetics pandas

In [None]:
# Load your Gretel API key. You can get one from the Gretel Console
# at https://console.gretel.cloud

from gretel_client import get_cloud_client

client = get_cloud_client(prefix="api", api_key="prompt")
client.install_packages()

Enter Gretel API key: ··········


INFO pkg_installers.py: Authenticating with package manager
INFO pkg_installers.py: Installing packages (this might take a while)
ERROR pkg_installers.py: /usr/bin/python3 -m pip --disable-pip-version-check install https://gretel-opt-prod-usw2.s3.amazonaws.com/priv/pip/gretel-helpers/0.8.4/gretel_helpers-0.8.4-py3-none-any.whl?AWSAccessKeyId=ASIARC2BUADH2HE5BN3F&Signature=RCGKe5Ehj6scBcUnZKgtu%2Fl2REQ%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEAoaCXVzLXdlc3QtMiJGMEQCIDxIllHBugZ1yCVGBBKx088vPjkxONKvroxZhdV8GzUtAiAVL9c8VCc8VigFom5661lKWBmRm%2F0QXqKQ8WXVhK1ioiqkAggjEAMaDDA3NDc2MjY4MjU3NSIMfkttCDsqEJwdIh4yKoECxnRbtPZHulbqGxRtXe9O91qaubg6cPqpsxzcoprJroweQZTR9PChzXc4mD7ZY9VY%2BGZsQl9QC5X3X0sJPVwUUZN06wI48nQjyl3HAA%2BgMeyu5D0F3P1VBX5IVebhGaN8dmnXZn891GQolxLyWlzVHkz20NXkdXr1a3u3EjzDpqH2lQXT5W2Jv3LUrEKK%2F9ghra1OmoLQC%2FXQBVgEAZVvqlcL%2BaENJOgqPpn15G40VguFYox7BRy2ouuA0tSB%2F8JhX3IUGfGSfGUF4kxOcYgHrGqJUih3hafU1tVMy3ILaaMK9z9DXuUW2lO498nW2HONhhlykkev6fe9iV7N6SLe5%2FUwttmniAY6mwEAOD9%2BlyI4IRFs%2FIJ
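
For unattended runs, the interactive prompt can be replaced by a key read from the environment. This is a sketch under assumptions: `resolve_api_key` is a hypothetical helper, the variable name `GRETEL_API_KEY` is chosen here for illustration, and passing the resulting string as `api_key` to `get_cloud_client` is assumed to work in place of `"prompt"`.

```python
import os
from getpass import getpass


def resolve_api_key(env_var="GRETEL_API_KEY", prompt=getpass):
    """Return the key from the environment, falling back to a hidden prompt."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Nothing usable in the environment, so ask without echoing the input
        key = prompt("Enter Gretel API key: ").strip()
    if not key:
        raise ValueError("no API key provided")
    return key


# Assumed usage, mirroring the call in the cell above:
# client = get_cloud_client(prefix="api", api_key=resolve_api_key())
```

Keeping the key out of the notebook source also keeps it out of saved cell outputs.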

In [None]:
# Load the example time series dataset from the cloned repository

import datetime
import pandas as pd
import numpy as np

pd.options.plotting.backend = "plotly"


train_df = pd.read_csv('/content/synthetic_data/example_dataset.csv')

train_df

Unnamed: 0,EDA,ECG,Label,Left Pupil Diameter (m),Right Pupil Diameter (m),Eye Opening Left,Eye Opening Right,PERCLOS Value,Blinking
0,4.3114,-0.1822,2,0.002089,0.002242,0.4522,0.4111,0.5904,0
1,4.3129,-0.1535,2,0.002101,0.002183,0.4679,0.4372,0.5904,0
2,4.3120,-0.1209,2,0.002115,0.002224,0.4489,0.4276,0.5903,0
3,4.3129,-0.0918,2,0.002274,0.002392,0.5027,0.4294,0.5902,0
4,4.3116,-0.0696,2,0.002195,0.002318,0.4237,0.4388,0.5902,0
...,...,...,...,...,...,...,...,...,...
17995,5.0606,-0.0341,2,0.002624,0.002474,0.1496,0.0798,0.4570,0
17996,5.0598,0.0034,2,0.002624,0.002474,0.1496,0.0949,0.4570,0
17997,5.0613,0.0277,2,0.002624,0.002474,0.1496,0.0974,0.4571,0
17998,5.0589,0.0610,2,0.002624,0.002474,0.1496,0.0834,0.4571,0
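
The `Unnamed: 0` column in the output above is the name pandas gives to a column whose header cell is empty, which typically happens when a CSV was saved with its row index. A minimal sketch of the behaviour, using a small in-memory CSV as a stand-in for the real file (whether the example file was written this way is an assumption):

```python
import io
import pandas as pd

# Stand-in CSV whose first header cell is empty
raw = ",EDA,ECG\n0,4.3114,-0.1822\n1,4.3129,-0.1535\n"

# Default parsing keeps the blank-header column as a regular data column
default_df = pd.read_csv(io.StringIO(raw))

# index_col=0 uses that first column as the row index instead
indexed_df = pd.read_csv(io.StringIO(raw), index_col=0)
```

With `index_col=0`, only the measurement columns remain as data columns.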


In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# (column name, legend label) for each channel; one subplot row per channel
channels = [
    ('EDA', 'EDA'),
    ('ECG', 'ECG'),
    ('Label', 'Label'),
    ('Left Pupil Diameter (m)', 'Left pupil diameter'),
    ('Right Pupil Diameter (m)', 'Right pupil diameter'),
    ('Eye Opening Left', 'Eye Opening Left'),
    ('Eye Opening Right', 'Eye Opening Right'),
    ('PERCLOS Value', 'PERCLOS'),
    ('Blinking', 'Blink'),
]

fig = make_subplots(rows=len(channels), cols=1)

for row, (column, label) in enumerate(channels, start=1):
    fig.append_trace(go.Scatter(
        y=train_df[column],
        name=label,
    ), row=row, col=1)

fig.update_layout(height=1200, width=1200, title_text="Individual channels")
fig.show()

In [None]:
train_df

Unnamed: 0,EDA,ECG,Label,Left Pupil Diameter (m),Right Pupil Diameter (m),Eye Opening Left,Eye Opening Right,PERCLOS Value,Blinking
0,4.3114,-0.1822,2,0.002089,0.002242,0.4522,0.4111,0.5904,0
1,4.3129,-0.1535,2,0.002101,0.002183,0.4679,0.4372,0.5904,0
2,4.3120,-0.1209,2,0.002115,0.002224,0.4489,0.4276,0.5903,0
3,4.3129,-0.0918,2,0.002274,0.002392,0.5027,0.4294,0.5902,0
4,4.3116,-0.0696,2,0.002195,0.002318,0.4237,0.4388,0.5902,0
...,...,...,...,...,...,...,...,...,...
17995,5.0606,-0.0341,2,0.002624,0.002474,0.1496,0.0798,0.4570,0
17996,5.0598,0.0034,2,0.002624,0.002474,0.1496,0.0949,0.4570,0
17997,5.0613,0.0277,2,0.002624,0.002474,0.1496,0.0974,0.4571,0
17998,5.0589,0.0610,2,0.002624,0.002474,0.1496,0.0834,0.4571,0


In [None]:
train_df['idx_col'] = train_df.index
train_df

Unnamed: 0,EDA,ECG,Label,Left Pupil Diameter (m),Right Pupil Diameter (m),Eye Opening Left,Eye Opening Right,PERCLOS Value,Blinking,idx_col
0,4.3114,-0.1822,2,0.002089,0.002242,0.4522,0.4111,0.5904,0,0
1,4.3129,-0.1535,2,0.002101,0.002183,0.4679,0.4372,0.5904,0,1
2,4.3120,-0.1209,2,0.002115,0.002224,0.4489,0.4276,0.5903,0,2
3,4.3129,-0.0918,2,0.002274,0.002392,0.5027,0.4294,0.5902,0,3
4,4.3116,-0.0696,2,0.002195,0.002318,0.4237,0.4388,0.5902,0,4
...,...,...,...,...,...,...,...,...,...,...
17995,5.0606,-0.0341,2,0.002624,0.002474,0.1496,0.0798,0.4570,0,17995
17996,5.0598,0.0034,2,0.002624,0.002474,0.1496,0.0949,0.4570,0,17996
17997,5.0613,0.0277,2,0.002624,0.002474,0.1496,0.0974,0.4571,0,17997
17998,5.0589,0.0610,2,0.002624,0.002474,0.1496,0.0834,0.4571,0,17998
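
Using the row index as the time column only gives evenly spaced time points when the index is a gap-free 0..n-1 range. If rows were filtered or reordered beforehand, the index can be reset first. A minimal sketch, where `add_index_time_column` is a hypothetical helper and `demo` is stand-in data:

```python
import pandas as pd


def add_index_time_column(df: pd.DataFrame, name: str = "idx_col") -> pd.DataFrame:
    """Return a copy of df with a 0..n-1 integer time column appended."""
    out = df.reset_index(drop=True)  # new frame with a gap-free 0..n-1 index
    out[name] = out.index
    return out


# Stand-in frame whose index has gaps, as it would after dropping rows
demo = pd.DataFrame({"EDA": [4.31, 4.32, 4.30]}, index=[0, 5, 9])
result = add_index_time_column(demo)
```

The original frame is left unchanged; only the returned copy carries the new column.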


In [None]:
# Create the Gretel Synthetics Training / Model Configuration

from pathlib import Path

checkpoint_dir = str(Path.cwd() / "checkpoints-sin")

config_template = {
    "epochs": 500,
    "early_stopping": False,
    "vocab_size": 20,
    "reset_states": True, 
    "checkpoint_dir": checkpoint_dir,
    "overwrite": True,
    "rnn_units" : 512,
}

In [None]:
# Retry the import once: the first attempt can fail transiently in Google Colab

try:
    from gretel_helpers.series_models import TimeseriesModel
except FileNotFoundError:
    from gretel_helpers.series_models import TimeseriesModel

# Params:
# - time_column: The single column name that represents your points in time
# - trend_columns: One or more columns that are the observations / measurements that are associated with
#                  the points in time. These should be numerical.
# - other_seed_columns: An optional list of other columns that should be used along with the time_column
#                       as seeds to the synthetic generator.

model = TimeseriesModel(
    training_df=train_df,
    time_column="idx_col",
    trend_columns=[
        "EDA",
        "ECG",
        "Label",
        "Left Pupil Diameter (m)",
        "Right Pupil Diameter (m)",
        "Eye Opening Left",
        "Eye Opening Right",
        "PERCLOS Value",
        "Blinking",
    ],
    synthetic_config=config_template
).train()

INFO model.py: Detecting record field delimiter...
INFO model.py: Analyzing DataFrame for optimal column batches and ordering...
INFO model.py: Creating model and data storage directories...
INFO batch.py: Creating directory structure for batch jobs...
INFO model.py: Generating training data from source dataset...
INFO batch.py: Generating training DF and CSV for batch 0
INFO model.py: Creating data validators...
INFO model.py: Creating validator for synthetic batch 0
100%|██████████| 18000/18000 [00:00<00:00, 35657.63it/s]


Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (64, None, 256)           5120      
_________________________________________________________________
dropout_12 (Dropout)         (64, None, 256)           0         
_________________________________________________________________
lstm_8 (LSTM)                (64, None, 512)           1574912   
_________________________________________________________________
dropout_13 (Dropout)         (64, None, 512)           0         
_________________________________________________________________
lstm_9 (LSTM)                (64, None, 512)           2099200   
_________________________________________________________________
dropout_14 (Dropout)         (64, None, 512)           0         
_________________________________________________________________
dense_4 (Dense)              (64, None, 20)           
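
The parameter counts in the summary above can be reproduced by hand from the configuration values (`vocab_size` of 20, `rnn_units` of 512) and the embedding width of 256 visible in the output shapes. A minimal sketch using the standard parameter-count formulas for an embedding layer and an LSTM layer; the helper names are illustrative:

```python
def embedding_params(vocab_size, embedding_dim):
    """One embedding vector per vocabulary entry."""
    return vocab_size * embedding_dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units) * units + units)


vocab_size, embedding_dim, rnn_units = 20, 256, 512

embedding = embedding_params(vocab_size, embedding_dim)  # 5120
first_lstm = lstm_params(embedding_dim, rnn_units)       # 1574912
second_lstm = lstm_params(rnn_units, rnn_units)          # 2099200
```

The second LSTM has more parameters than the first because its input is the 512-wide output of the previous layer rather than the 256-wide embedding.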

In [None]:
model.generate??

In [None]:
synthetic_df = model.generate(max_invalid=36000).df

HBox(children=(FloatProgress(value=0.0, description='Valid record count ', max=18000.0, style=ProgressStyle(de…

HBox(children=(FloatProgress(value=0.0, description='Invalid record count ', max=36000.0, style=ProgressStyle(…

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_7 (Embedding)      (1, None, 256)            5120      
_________________________________________________________________
dropout_21 (Dropout)         (1, None, 256)            0         
_________________________________________________________________
lstm_14 (LSTM)               (1, None, 512)            1574912   
_________________________________________________________________
dropout_22 (Dropout)         (1, None, 512)            0         
_________________________________________________________________
lstm_15 (LSTM)               (1, None, 512)            2099200   
_________________________________________________________________
dropout_23 (Dropout)         (1, None, 512)            0         
_________________________________________________________________
dense_7 (Dense)              (1, None, 20)            

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(rows=2, cols=1)

fig.append_trace(go.Scatter(
    y=train_df.EDA,
    name='Original EDA',
), row=1, col=1)

fig.append_trace(go.Scatter(
    y=synthetic_df.EDA,
    name='Synthetic EDA',
), row=2, col=1)

fig.update_layout(height=600, width=1200, title_text="Original VS Synthetic")
fig.show()
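
A visual comparison can be complemented by a few summary statistics per column. The following is a rough sketch, not a Gretel feature: `compare_summary` is a hypothetical helper, and the two frames below are stand-in data in place of `train_df` and `synthetic_df`.

```python
import numpy as np
import pandas as pd


def compare_summary(original, synthetic, columns):
    """Side-by-side mean and standard deviation for each listed column."""
    rows = []
    for col in columns:
        rows.append({
            "column": col,
            "orig_mean": original[col].mean(),
            "synth_mean": synthetic[col].mean(),
            "orig_std": original[col].std(),
            "synth_std": synthetic[col].std(),
        })
    return pd.DataFrame(rows).set_index("column")


# Stand-in data: a smooth signal and a slightly noisy copy of it
rng = np.random.default_rng(0)
t = np.arange(200)
original = pd.DataFrame({"EDA": 4.3 + 0.1 * np.sin(t / 20)})
synthetic = pd.DataFrame({"EDA": original["EDA"] + rng.normal(0, 0.01, size=len(t))})

report = compare_summary(original, synthetic, ["EDA"])
```

Matching means and standard deviations do not show that temporal structure was preserved, so a table like this is best read alongside the plots.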