-
Notifications
You must be signed in to change notification settings - Fork 86
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Prep for ACTGAN SAL release GitOrigin-RevId: 18a0b574982197ec87aa45f155e60c1bceb188c6
- Loading branch information
1 parent
c678d70
commit 553c1cf
Showing
10 changed files
with
384 additions
and
162 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
ACTGAN | ||
====== | ||
|
||
The ACTGAN sub-package contains an alternate implementation of the SDV CTGAN model. It | ||
provides some improvement and automation around automatic detection of datetime fields | ||
and optional usage of a binary encoder for discrete columns for better memory usage. | ||
|
||
Please see the "ACTGAN_Demo" Notebook in the "examples" directory in the repository root. | ||
|
||
|
||
.. automodule:: gretel_synthetics.actgan.actgan_wrapper | ||
:members: | ||
|
||
.. automodule:: gretel_synthetics.actgan.structures | ||
:members: |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "7e237ea2", | ||
"metadata": {}, | ||
"source": [ | ||
"# ACTGAN\n", | ||
"\n", | ||
"This Notebook provides an overview of how to use ACTGAN. It is compatable with SDV CTGAN from version 0.17.X of SDV. The notable changes are exposed through additional keyword parameters when creating the `ACTGAN` instance. Specifically:\n", | ||
"\n", | ||
"- Binary encoding usage. CTGAN uses One Hot Encoding for discrete/categorical columns which can lead to memory issues depending on the cardinality of these columns. You may now specify a cardinality cutoff that will trigger the switch to using a binary encoder, which saves significant memory usage.\n", | ||
"\n", | ||
"\n", | ||
"- Auto datetime detection. When enabled, each column will be scanned for potential DateTime values. The strfmt of each column will be determined and the underlying SDV Table Metadata will be automatically configured to use a `UnixTimestampEncoder` for these columns. This will give better variability during data sampling and prevent DateTime\n", | ||
"columns from being treated as categorical.\n", | ||
"\n", | ||
"- Empty field detection. Any columns that are empty (or all NaN) will be transformed for fitting and reverse transformed to being empty during sampling. Empty columns can cause training execptions otherwise.\n", | ||
"\n", | ||
"\n", | ||
"- Epoch callback. Optionally allow the passing of an `EpochInfo` object to any callable when a training epoch completes." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "2347c31b", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import pandas as pd\n", | ||
"\n", | ||
"from gretel_synthetics.actgan import ACTGAN" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "8d0676a7", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"train_df = pd.read_csv(\"http://gretel-public-website.s3-website-us-west-2.amazonaws.com/datasets/311_call_center_10k.csv\")\n", | ||
"train_df.head()\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "05d2208a", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"class EpochTracker:\n", | ||
" \"\"\"\n", | ||
" Simple example that just accumulates ``EpochInfo`` events,\n", | ||
" but demonstrates how you can route epoch information to\n", | ||
" arbitrary callables during model fitting.\n", | ||
" \"\"\"\n", | ||
" \n", | ||
" def __init__(self):\n", | ||
" self.epochs = []\n", | ||
" \n", | ||
" def add(self, epoch_data):\n", | ||
" self.epochs.append(epoch_data)\n", | ||
" \n", | ||
"epoch_tracker = EpochTracker()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "7f127e44", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"model = ACTGAN(\n", | ||
" verbose=True,\n", | ||
" binary_encoder_cutoff=10, # use a binary encoder for data transforms if the cardinality of a column is below this value\n", | ||
" auto_transform_datetimes=True,\n", | ||
" epochs=100,\n", | ||
" epoch_callback=epoch_tracker.add\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "8b8ef7f0", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"model.fit(train_df)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "42618395", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Tracked and stored epoch information\n", | ||
"\n", | ||
"epoch_tracker.epochs[42]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "9e5acde6", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"syn_df = model.sample(100)\n", | ||
"syn_df.head()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.10" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.