Updating cbn_sample_size to 250_000
* Updating cbn_sample_size

* Set cbn_sample_size default for the ACTGAN class in actgan_wrapper

* Update Docstrings

* changing None to 0

GitOrigin-RevId: 81b83d2f91fc794147a78c37474f8d53c62057eb
mvansegbroeck committed Aug 2, 2023
1 parent 0546ca6 commit fb1bd22
Showing 2 changed files with 6 additions and 6 deletions.
6 changes: 3 additions & 3 deletions src/gretel_synthetics/actgan/actgan.py
@@ -206,8 +206,8 @@ class ACTGANSynthesizer(BaseSynthesizer):
be replaced by a random value that is a known mode for a given column.
cbn_sample_size:
Number of rows to sample from each column for identifying clusters for the cluster-based normalizer.
- This only applies to float columns. By default, no sampling is done and all values are considered,
- which may be very slow.
+ This only applies to float columns. If set to ``0``, no sampling is done and all values are considered,
+ which may be very slow. Defaults to 250_000.
log_frequency:
Whether to use log frequency of categorical levels in conditional
sampling. Defaults to ``True``.
@@ -239,7 +239,7 @@ def __init__(
discriminator_steps: int = 1,
binary_encoder_cutoff: int = 500,
binary_encoder_nan_handler: Optional[str] = None,
- cbn_sample_size: Optional[int] = None,
+ cbn_sample_size: Optional[int] = 250_000,
log_frequency: bool = True,
verbose: bool = False,
epochs: int = 300,
6 changes: 3 additions & 3 deletions src/gretel_synthetics/actgan/actgan_wrapper.py
@@ -279,8 +279,8 @@ class ACTGAN(_ACTGANModel):
be replaced by a random value that is a known mode for a given column.
cbn_sample_size:
Number of rows to sample from each column for identifying clusters for the cluster-based normalizer.
- This only applies to float columns. By default, no sampling is done and all values are considered,
- which may be very slow.
+ This only applies to float columns. If set to ``0``, no sampling is done and all values are considered,
+ which may be very slow. Defaults to 250_000.
log_frequency:
Whether to use log frequency of categorical levels in conditional
sampling. Defaults to ``True``.
@@ -331,7 +331,7 @@ def __init__(
discriminator_steps: int = 1,
binary_encoder_cutoff: int = 500,
binary_encoder_nan_handler: Optional[str] = None,
- cbn_sample_size: Optional[int] = None,
+ cbn_sample_size: Optional[int] = 250_000,
log_frequency: bool = True,
verbose: bool = False,
epochs: int = 300,
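
Note on the documented behavior: the updated docstring says that up to `cbn_sample_size` rows are sampled from each float column before clusters are identified, and that ``0`` disables sampling. The sketch below illustrates that contract only. The helper name `sample_column_for_clustering`, the `seed` parameter, and sampling without replacement are assumptions made for illustration; this is not the code in `actgan.py`.

```python
import random
from typing import List, Sequence


def sample_column_for_clustering(
    values: Sequence[float],
    cbn_sample_size: int = 250_000,
    seed: int = 0,
) -> List[float]:
    """Hypothetical helper: pick the rows used to fit cluster-based normalization.

    Mirrors the docstring in this commit: 0 means "use every value",
    any positive number caps how many values are considered.
    """
    if cbn_sample_size < 0:
        raise ValueError("cbn_sample_size must be 0 or a positive integer")

    # 0 disables sampling; a column no larger than the cap is used as-is.
    if cbn_sample_size == 0 or len(values) <= cbn_sample_size:
        return list(values)

    # Assumption: uniform sampling without replacement, seeded for repeatability.
    rng = random.Random(seed)
    return rng.sample(list(values), cbn_sample_size)
```

Under these assumptions, a column with 1,000,000 float values would contribute 250,000 of them to cluster identification with the new default, while passing `cbn_sample_size=0` would restore the previous behavior of considering every value.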
