
What is the number of training epochs? #66

Closed
apavlo89 opened this issue Aug 3, 2020 · 8 comments

Labels
question General question about the software

Comments

@apavlo89

apavlo89 commented Aug 3, 2020

  • CTGAN version: latest
  • Python version: 3.7.7
  • Operating System: Windows 10

Description

Not so much an issue but more of a question. What is the default number of training epochs if I don't specify the number?

What I Did


```python
import pandas as pd

from ctgan import CTGANSynthesizer

# STEP 1: Load data
data = pd.read_csv('D:/test/Machine Learning/FULLDATA.csv')

# Pass every column name as discrete (note: this treats all columns
# as categorical; continuous columns would normally be left out)
discrete_columns = list(data.columns)

# Step 2: Fit CTGAN to your data
# Once the data is ready, create an instance of the CTGANSynthesizer
# class and fit it, passing your data and the list of discrete columns.
ctgan = CTGANSynthesizer()
ctgan.fit(data, discrete_columns)

# Create synthetic data for x number of rows
samples = ctgan.sample(1000)

# Save the synthetic database to CSV
samples.to_csv('D:/test/Machine Learning/syntheticdatabase.csv')
```


@csala csala added the question General question about the software label Aug 3, 2020
@csala
Contributor

csala commented Aug 3, 2020

Hello @apavlo89

The default number of epochs is 300.

If you want to know the default values for this and other arguments, you can have a look at the API Reference section in our documentation: https://sdv-dev.github.io/CTGAN/api/ctgan.synthesizer.html#ctgan.synthesizer.CTGANSynthesizer
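For reference, a minimal sketch of overriding that default. In the CTGAN version discussed in this thread, epochs is passed to fit() (in later releases it moved to the CTGANSynthesizer constructor); data and discrete_columns are assumed to be prepared as in the snippet above:

```python
from ctgan import CTGANSynthesizer

ctgan = CTGANSynthesizer()
# Override the default of 300 epochs; in newer CTGAN releases this is
# configured as CTGANSynthesizer(epochs=100) instead.
ctgan.fit(data, discrete_columns, epochs=100)
```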

@apavlo89
Author

apavlo89 commented Aug 3, 2020

Thank you very much! Is there a specific reason for choosing 300 epochs as the default? Is there some kind of optimum metric for the number of epochs based on the database?

@gautambak

> Thank you very much! Is there a specific reason for choosing 300 epochs as the default? Is there some kind of optimum metric for the number of epochs based on the database?

I assumed it was just a default number.

Not sure if this helps, but in the demo in the README you can set the epochs with ctgan.fit(data, discrete_columns, epochs=5), or you can adjust it within the fit function of the CTGANSynthesizer class. Maybe play with the epochs to see what works best for your data?

@apavlo89
Author

apavlo89 commented Aug 4, 2020

I'm quite new to machine learning, especially neural network techniques, so would you say there's a pattern to look for in each epoch or after a few epochs? What am I aiming for? I'd say after epoch 150 my Loss D and Loss G values were hovering around a specific range of values.

My computer then ran out of RAM at 215 epochs. At epoch 215 the generator and discriminator losses were: Loss G: 1.6974, Loss D: -77.0800.

It gave me the error DefaultCPUAllocator: not enough memory: you tried to allocate 2709625764 bytes. Buy new RAM! :(
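One knob that may help with memory here is the batch_size argument of CTGANSynthesizer (default 500; it should stay a multiple of 10 because of the model's "pac" grouping). A sketch, reusing data and discrete_columns from the snippet above:

```python
from ctgan import CTGANSynthesizer

# Smaller batches lower peak memory use, at the cost of slower and
# noisier training.
ctgan = CTGANSynthesizer(batch_size=100)
ctgan.fit(data, discrete_columns, epochs=150)
```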

@gautambak

I'm also still learning, so I can't be too much help, but I think you can experiment with different settings and datasets and see if the results make sense to you.

As for memory, try Google Colab: add the line !pip install ctgan and you can get more memory plus a GPU for free.
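A quick sketch of that route (the install line goes in its own Colab cell; the CUDA check is a standard PyTorch call, and CTGAN should pick up the GPU automatically when one is available):

```python
# In a Colab cell, install the library first:
# !pip install ctgan

import torch

# If this prints True, training will run on the free Colab GPU.
print(torch.cuda.is_available())
```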

@apavlo89
Author

apavlo89 commented Aug 4, 2020

> I'm also still learning, so I can't be too much help, but I think you can experiment with different settings and datasets and see if the results make sense to you.
>
> As for memory, try Google Colab: add the line !pip install ctgan and you can get more memory plus a GPU for free.

Wow, that is totally amazing! THANKS!

@csala
Contributor

csala commented Aug 5, 2020

> Thank you very much! Is there a specific reason for choosing 300 epochs as the default? Is there some kind of optimum metric for the number of epochs based on the database?

The default values for the model hyperparameters are, in most cases, the ones that were used to generate the results in the paper.

Regarding the value 300 in particular, the number was decided based on the performance obtained on the different datasets that were used for benchmarking, but different datasets might require different settings. In most cases, a lower number of epochs, just a few dozen, can be more than enough to explore a particular problem a bit faster and get an idea of what the model can do on your data. However, if you want to get the most out of the model, you will probably need to tweak it a little and find the optimal value for each dataset you work on.
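A minimal sketch of that kind of per-dataset exploration (the epoch values are arbitrary examples, and the evaluation step is a placeholder for whatever metric suits your problem):

```python
from ctgan import CTGANSynthesizer

# Try a few training lengths and compare the resulting samples.
for epochs in (50, 100, 300):
    ctgan = CTGANSynthesizer()
    ctgan.fit(data, discrete_columns, epochs=epochs)
    samples = ctgan.sample(1000)
    # Placeholder: compare `samples` against the real data with
    # whatever quality metric fits your task.
```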

> It gave me the error DefaultCPUAllocator: not enough memory: you tried to allocate 2709625764 bytes. Buy new RAM! :(

Yeah, that's indeed quite an annoying error message to get, but it comes directly from PyTorch. There isn't much that we can do about it!

@csala
Contributor

csala commented Oct 20, 2020

Closing this, as the question has already been answered.
