**Fine-tuning a synthetic data model on a 311 call center dataset using gretel_synthetics ACTGAN**

This notebook shows how to fine-tune one of Gretel's open source synthetic data models to produce synthetic data that retains the statistical properties of the original data.

In this example, we use [ACTGAN](https://docs.gretel.ai/create-synthetic-data/models/synthetics/gretel-actgan), a generative adversarial network that is best suited for tabular data, particularly structured numerical data with a high column count.

Steps
1. Load the training data
2. Create and configure model
3. Train model on training data
4. Sample synthetic data from the model
5. Evaluate the quality of the synthetic output

In [None]:
# Import Pandas for importing and working with DataFrames
import pandas as pd
# Import ACTGAN from gretel_synthetics
from gretel_synthetics.actgan import ACTGAN

In [None]:
# Load the training dataset as a Pandas DataFrame
train_df = pd.read_csv("./data/311_call_center_10k.csv")
train_df.head()

In [None]:
# Create the model and specify configuration
NUM_EPOCHS = 500
model = ACTGAN(
    verbose=True,
    binary_encoder_cutoff=10,  # columns with more unique values than this use binary encoding instead of one-hot encoding
    auto_transform_datetimes=True,  # detect datetime columns and convert them to Unix epochs for training
    epochs=NUM_EPOCHS,
)
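
As a rough way to see which columns the `binary_encoder_cutoff` setting would affect, the sketch below counts unique values per non-numeric column with pandas. The toy DataFrame, its column names, and the `split_by_cardinality` helper are hypothetical and not part of gretel_synthetics. It also assumes the cutoff marks the switch from one-hot to binary encoding for columns with more unique values, so treat the exact boundary as approximate.

```python
import pandas as pd


def split_by_cardinality(df, cutoff):
    """Split non-numeric columns into (low, high) lists by unique-value count.

    Hypothetical helper for illustration only; not a gretel_synthetics API.
    """
    # Only non-numeric columns are candidates for categorical encoding
    categorical = df.select_dtypes(exclude="number")
    counts = categorical.nunique()
    low = counts[counts <= cutoff].index.tolist()
    high = counts[counts > cutoff].index.tolist()
    return low, high


# Toy stand-in for the training data (made-up columns and values)
toy_df = pd.DataFrame(
    {
        "category": ["Noise", "Graffiti", "Noise", "Pothole"] * 5,
        "ticket_id": [f"T{i:03d}" for i in range(20)],
        "response_hours": [float(i) for i in range(20)],
    }
)

low_cols, high_cols = split_by_cardinality(toy_df, cutoff=10)
print("At or below cutoff:", low_cols)
print("Above cutoff:", high_cols)
```

Running the same helper on `train_df` with the cutoff used in the model configuration gives a quick picture of how many columns fall on each side.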

In [None]:
# Train the model on the training dataset
model.fit(train_df)

In [None]:
# Sample synthetic data from the model
NUM_SYN_SAMPLES = 10000
syn_df = model.sample(NUM_SYN_SAMPLES)
syn_df.head()

In [None]:
# Save the generated data to a CSV file, creating the output directory if needed
import os
os.makedirs("./out", exist_ok=True)
syn_df.to_csv("./out/syn.csv")

In [None]:
# Optional: reload previously generated data instead of retraining and resampling
# syn_df = pd.read_csv("./out/syn.csv", index_col=0)

In [None]:
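# Import Gretel's statistics helpers and Plotly for the evaluation plots
import gretel_synthetics.utils.stats as stats
import plotly as pl
import plotly.express  # load the express submodule explicitly so pl.express is available
import plotly.figure_factory as plff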

In [None]:
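# Compute the pairwise column correlation matrix of the training data
train_correl = stats.calculate_correlation(train_df)
train_correl.head()

In [None]:
# Compute the same correlation matrix for the synthetic data
syn_correl = stats.calculate_correlation(syn_df)
syn_correl.head()

In [None]:
# Absolute difference between the two matrices; values near 0 mean a correlation was preserved
correl_diff = (train_correl - syn_correl).abs()
correl_diff.head()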

In [None]:
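# Heatmap of the training data correlations
pl.express.imshow(train_correl)

In [None]:
# Heatmap of the synthetic data correlations
pl.express.imshow(syn_correl)

In [None]:
# Heatmap of the absolute correlation differences
pl.express.imshow(correl_diff)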
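
The heatmaps can be complemented by a single summary number. The sketch below computes the mean absolute difference over the off-diagonal entries of two correlation matrices. The toy matrices and the `mean_abs_correlation_diff` helper are hypothetical and not part of gretel_synthetics; the sketch assumes both inputs are square DataFrames indexed by column name, and the resulting score is only an informal indicator, not Gretel's quality metric.

```python
import numpy as np
import pandas as pd


def mean_abs_correlation_diff(corr_a, corr_b):
    """Mean absolute off-diagonal difference between two correlation matrices.

    Hypothetical helper for illustration only; not a gretel_synthetics API.
    """
    # Align both matrices on the columns they have in common
    cols = corr_a.columns.intersection(corr_b.columns)
    a = corr_a.loc[cols, cols].to_numpy(dtype=float)
    b = corr_b.loc[cols, cols].to_numpy(dtype=float)
    diff = np.abs(a - b)
    # Ignore the diagonal, which compares each column with itself
    off_diagonal = ~np.eye(len(cols), dtype=bool)
    return float(np.nanmean(diff[off_diagonal]))


# Toy correlation matrices (made-up values)
labels = ["a", "b", "c"]
toy_train_correl = pd.DataFrame(
    [[1.0, 0.8, 0.2], [0.8, 1.0, 0.4], [0.2, 0.4, 1.0]], index=labels, columns=labels
)
toy_syn_correl = pd.DataFrame(
    [[1.0, 0.6, 0.2], [0.6, 1.0, 0.1], [0.2, 0.1, 1.0]], index=labels, columns=labels
)

score = mean_abs_correlation_diff(toy_train_correl, toy_syn_correl)
print(f"Mean absolute correlation difference: {score:.3f}")
```

Applied to `train_correl` and `syn_correl`, a lower value suggests the synthetic data preserved the pairwise relationships more closely.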

In [None]:
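# Reduce each dataset to its principal components for a visual comparison
pca_train = stats.compute_pca(train_df)
pca_syn = stats.compute_pca(syn_df)

In [None]:
# Plot the first two principal components of the training data
# (assumes compute_pca returned a DataFrame with at least two component columns)
pl.express.scatter(x=pca_train.iloc[:, 0], y=pca_train.iloc[:, 1])

In [None]:
# Plot the first two principal components of the synthetic data
pl.express.scatter(x=pca_syn.iloc[:, 0], y=pca_syn.iloc[:, 1])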
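
Because the PCA is computed separately for each dataset, the two scatter plots do not necessarily share the same axes. One alternative, sketched here with plain NumPy rather than `stats.compute_pca`, fits the components on the training data only and projects both datasets into that space. The `fit_pca` and `project` helpers and the random toy data are hypothetical, and the sketch assumes all columns are already numeric.

```python
import numpy as np


def fit_pca(data, n_components=2):
    """Return (mean, components) fitted on `data`, with rows as samples."""
    mean = data.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions,
    # ordered by decreasing explained variance
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(data, mean, components):
    """Project `data` onto previously fitted principal directions."""
    return (data - mean) @ components.T


# Random toy data standing in for numeric training and synthetic tables
rng = np.random.default_rng(0)
toy_train = rng.normal(size=(200, 5))
toy_syn = rng.normal(size=(150, 5))

# Fit on the training data only, then project both datasets into the same space
mean, components = fit_pca(toy_train)
train_2d = project(toy_train, mean, components)
syn_2d = project(toy_syn, mean, components)
print(train_2d.shape, syn_2d.shape)
```

The two projected arrays can then be drawn in a single scatter plot with different colors, which makes gaps or extra clusters in the synthetic data easier to spot.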