Single thread data transform is slow for huge table #151

Closed
liuzrcc opened this issue Apr 28, 2021 · 2 comments
Labels: internal (The issue doesn't change the API or functionality)
Milestone: 0.5.2

Comments

liuzrcc commented Apr 28, 2021

Problem Description

This is about DataTransformer.transform() in ctgan/data_transformer.py. When the data has both many rows and many columns, the transformation is very slow. For a table of ~1.5M rows and ~1500 discrete columns, it takes ~3 hours.

Expected behavior

Have you considered processing the columns in parallel? Is that possible at all?
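
To make the question concrete, here is a rough sketch of what per-column parallelism could look like. It is not CTGAN code: one_hot_column and transform_columns are hypothetical helpers, and the sketch assumes that each discrete column can be encoded independently of the others, using only the Python standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def one_hot_column(values):
    """One-hot encode a single discrete column (hypothetical helper, not CTGAN code)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [
        [1 if index[value] == i else 0 for i in range(len(categories))]
        for value in values
    ]


def transform_columns(columns, executor):
    """Encode every column independently, then concatenate the results row by row."""
    names = list(columns)
    # executor.map preserves input order, so the encoded blocks stay aligned with `names`.
    encoded = list(executor.map(one_hot_column, (columns[name] for name in names)))
    n_rows = len(columns[names[0]]) if names else 0
    return [
        [bit for column in encoded for bit in column[row]]
        for row in range(n_rows)
    ]


if __name__ == "__main__":
    table = {"color": ["red", "blue", "red"], "size": ["S", "M", "L"]}
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(transform_columns(table, pool))
```

With pure-Python work like this, threads are limited by the GIL, so a real speedup would probably need process-based workers (for example, passing a ProcessPoolExecutor, which offers the same map interface) or vectorized per-column code.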

Additional context

I think this question is also related to #141, which proposes saving intermediate results so that repeating the whole process is faster. Here the question is whether the transformation itself can be accelerated.

liuzrcc changed the title from "Single thread data transform is slow for huge tabular" to "Single thread data transform is slow for huge table" Apr 28, 2021
npatki (Contributor) commented May 21, 2021

Hi Zhuoran, thanks for filing this feedback! We're aware that there are many performance-related suggestions and I think it'll make a good focus for a future release.

Let's keep this open and I'll label this as a new feature. We can use this space to discuss parallelization in CTGAN, and will update it once we have improvements.

npatki added the internal label (The issue doesn't change the API or functionality) May 21, 2021
katxiao (Contributor) commented Aug 9, 2022

Resolved by #239 by @mfhbree

katxiao closed this as completed Aug 9, 2022
katxiao added this to the 0.5.2 milestone Aug 18, 2022