TF2 #3

Open
firmai opened this issue Apr 9, 2020 · 24 comments

Comments

@firmai

firmai commented Apr 9, 2020

Hi, I just want to know whether you are planning to release a version for TensorFlow 2. It will probably be around for the next few years, and I think this is an interesting repository that could be used more in the near future. Thanks for your work!

@Baukebrenninkmeijer

Either this or a PyTorch version would be super great to have. TF 1.4 is kind of a bummer :(

@shaanchandra

Is there a PyTorch implementation available? TensorFlow is really hard to work with now. If anyone has worked on, or wants to collaborate on, open-sourcing a PyTorch version of this, let me know! I would be interested :)

@fjxmlzn
Owner

fjxmlzn commented Aug 12, 2021

Thank you all for the suggestions. I agree that a TF2 or PyTorch version of DoppelGANger would be very useful. Unfortunately, we do not have one so far. If/when you have a TF2 or PyTorch implementation, please let me know and I'll add a link to it. Thank you!

@yzion

yzion commented Dec 6, 2021

Has anyone managed to update it to TF2?

@chameleonzz

Hi, when I installed TensorFlow 1.4.0, PyCharm warned that Python 3.5 has reached its end-of-life date and is no longer supported in PyCharm. DoppelGANger does not seem to work properly. Is there a solution?

@fjxmlzn
Owner

fjxmlzn commented Jul 12, 2022

@chameleonzz Could you please post error messages or screenshots of the errors?

@chameleonzz

chameleonzz commented Jul 12, 2022 via email

@yzion

yzion commented Jul 12, 2022

Hi @chameleonzz
Try running it with the TF2 branch. That branch supports TensorFlow 2, so you can use later versions of Python and CUDA.
Please post an update here if it solves your problem.

@chameleonzz

DG works with TF 2.1.0 and Python 3.7. However, when I tried to run gan_task.py in 'example_training', there was a warning: "unresolved reference 'gan'". I tried to pip install gan, but there seems to be no package named gan. How can I solve this problem? Thank you for your help.

@yzion

yzion commented Jul 13, 2022

@chameleonzz can you share more information?
What was the Python command that you ran? Can you share the full warning? Does it still run despite this warning?
The gan folder is part of the project, so there may be an import issue that you need to solve.
[screenshot: gan_folder]
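
For readers hitting the same "unresolved reference 'gan'" warning: one way to check whether this is an import path issue is to test whether Python can locate the module and, if it cannot, add the project root to sys.path. The following is a minimal sketch and not part of the DoppelGANger code base; the helper name ensure_importable and the example path in the comment are hypothetical.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def ensure_importable(module_name, project_root):
    """Return True if module_name can be located, adding project_root
    to sys.path first when the module is not found."""
    if importlib.util.find_spec(module_name) is None:
        # Put the project root at the front of the import search path.
        sys.path.insert(0, str(Path(project_root).resolve()))
        # Make sure the import system notices the new path entry.
        importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# Hypothetical usage; replace the path with your local checkout:
# ensure_importable("gan", "/path/to/DoppelGANger")
```

If this returns False, the folder is probably not where the script expects it. In PyCharm, marking the project root as a sources root may have a similar effect to the sys.path change above.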

@chameleonzz

Thank you for your answer. I finally figured out how to solve the problem. To use DoppelGANger, there are three main steps. First, set up a virtual environment, such as TF 2.1.0 + Python 3.7. Second, install the required packages with pip, such as gan, GPUTaskScheduler, and Tensorflow-privacy. The gan package is available in the DoppelGANger GitHub repository. These packages can be downloaded from GitHub and installed with the command 'pip install -e path/package_file_name'. Third, open the entire DoppelGANger (DG) project in PyCharm.
But I am confused about another problem. DG includes some examples whose data files are of type '.pkl' and '.npz'. If I want to create a similar data file from my own data, how do I decide on the attributes and features of my data? It would help if the raw data for google, web, and FCC_MBA were provided, along with an explanation of how the attributes and features of those datasets were chosen; then more people could understand the work more easily. Besides, what output files are generated after running each example project?
Finally, thank you very much for your enthusiastic answers each time. I hope the DG project can be used in more data-driven research. It is really significant work.

@yzion

yzion commented Jul 17, 2022

Any time :)
You can look for examples in the README file of the project.
There are examples for the pkl files and also for the npz files.
If you need them, there are also links to download the datasets that were used in this project, so you can download them and read them with Python to look at the structure.
Moreover, there are links to a number of blog posts, so you can try using them.
If you are still struggling, let me know and I will try to help.
Good luck
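
To look at the structure of a downloaded .npz file as suggested above, numpy.load is the usual tool. As a dependency-free alternative, the sketch below relies on the fact that an .npz file is a zip archive containing one .npy file per array, and it only lists the array names. The function name list_npz_arrays is made up for illustration.

```python
import zipfile


def list_npz_arrays(path):
    """Return the names of the arrays stored in a .npz archive."""
    with zipfile.ZipFile(path) as archive:
        return [
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        ]


# Example usage with one of the dataset files mentioned in this thread:
# print(list_npz_arrays("data_train.npz"))
```

Once the names are known, loading the file with NumPy and checking the shape of each array shows how samples, time steps, and dimensions are laid out.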

@fjxmlzn
Owner

fjxmlzn commented Jul 17, 2022

Thanks @yzion for the help and the answers!

@chameleonzz Re: how to decide the attributes and features for your own data.

The definition of features and attributes can be very flexible, depending on the aspects of the data you want DoppelGANger to capture. More specifically, let's take a simple example. Let's say your original data is a table in the following format.

| ColumnA | ColumnB | ColumnC |
|---|---|---|
| 1 | 2 | 3 |
| 1 | 2 | 4 |
| 2 | 2 | 3 |
| 2 | 2 | 5 |
| 2 | 2 | 6 |

You can treat any column (or several columns) as attributes (or metadata), group the rows according to those attributes, and treat the rest of the columns as features (or time series).

For example, you can choose to treat ColumnA and ColumnB as attributes, and ColumnC as the feature. You will get 2 samples: {attributes=(1,2), features=(3,4)}, {attributes=(2,2), features=(3,5,6)}. DoppelGANger (ideally) will be able to learn the temporal correlations of features that are associated with the same attribute (i.e., (3,4) in the first sample, and (3,5,6) in the second sample). But you can also choose to treat only ColumnA as the attributes, or any other combinations of the columns you want. In short, how to choose features/attributes depends on the context of your application, and which part you want DoppelGANger to model as temporal correlations.
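
The grouping described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not code from the DoppelGANger repository; the function name group_samples and the dictionary layout of each sample are assumptions made for the example.

```python
from collections import defaultdict

# The example table from above, one dict per row.
rows = [
    {"ColumnA": 1, "ColumnB": 2, "ColumnC": 3},
    {"ColumnA": 1, "ColumnB": 2, "ColumnC": 4},
    {"ColumnA": 2, "ColumnB": 2, "ColumnC": 3},
    {"ColumnA": 2, "ColumnB": 2, "ColumnC": 5},
    {"ColumnA": 2, "ColumnB": 2, "ColumnC": 6},
]


def group_samples(rows, attribute_cols, feature_cols):
    """Group rows by their attribute values; the remaining columns of each
    group form that sample's time series, in the original row order."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in attribute_cols)
        groups[key].append(tuple(row[c] for c in feature_cols))
    return [{"attributes": k, "features": v} for k, v in groups.items()]


samples = group_samples(rows, ["ColumnA", "ColumnB"], ["ColumnC"])
for sample in samples:
    print(sample)
```

With ColumnA and ColumnB as attributes this yields the two samples described above. Calling group_samples(rows, ["ColumnA"], ["ColumnB", "ColumnC"]) instead treats only ColumnA as the attribute and gives each time step two feature values.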

Hope this clarification helps!

@fjxmlzn
Owner

fjxmlzn commented Jul 17, 2022

By the way, for future readers of this thread:

If you are looking for a TF2 implementation of DoppelGANger, you can look at https://github.com/fjxmlzn/DoppelGANger/tree/TF2 by @yzion

If you are looking for a PyTorch implementation of DoppelGANger, you can look at https://synthetics.docs.gretel.ai/en/stable/models/timeseries_dgan.html#timeseries-dgan by Gretel AI.

@chameleonzz

Recently, I ran into another problem.
I tried to run main.py in the example_training folder and main_generate_data.py in the example_generating_data folder. However, the only result was that a folder named results was created, and its sub-folders contained only a worker_*.log.txt file.
Q1: Why were no synthetic datasets for [web/google/FCC_MBA] generated?
[screenshot: Snipaste_2022-07-24_23-13-48]
I looked for a place in the code to specify the dataset path, but I found nothing.

Q2: Once I know the attributes and features of my datasets, how do I generate the four files data_attribute_output.pkl, data_feature_output.pkl, data_test.npz, and data_train.npz? Does additional code need to be written to do this?

Finally, thank you for your continued patient answers.

@fjxmlzn
Owner

fjxmlzn commented Jul 24, 2022

Re: Q1. Can you share the content of worker_generate_data.log? Also, after running example_training/main.py, you should see another worker.log in these sub-folders. Did you see them?

Re: Q2. Yes, additional code needs to be written. You can refer to the README for an example of what those files should look like (after 'Let's look at a concrete example'). I will soon create an example of how these files were created for the datasets in our paper and share it here.
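
Until such an example is available, here is a rough sketch of one part of that extra code: padding variable-length time series to a common length and recording which time steps are real. It assumes that data_train.npz holds three float32 arrays named data_feature, data_attribute, and data_gen_flag, which is my reading of the README and should be checked against it. The helper names pad_samples and save_npz are hypothetical. Normalization, one-hot encoding of categorical columns, and the two .pkl files are not covered here.

```python
def pad_samples(samples, max_len=None):
    """Zero-pad each sample's feature sequence to max_len and build a flag
    per time step: 1.0 for a real step, 0.0 for padding."""
    if max_len is None:
        max_len = max(len(s["features"]) for s in samples)
    feature_dim = len(samples[0]["features"][0])

    data_attribute, data_feature, data_gen_flag = [], [], []
    for s in samples:
        steps = [[float(v) for v in step] for step in s["features"]]
        n_pad = max_len - len(steps)
        if n_pad < 0:
            raise ValueError("sample is longer than max_len")
        data_attribute.append([float(a) for a in s["attributes"]])
        data_feature.append(steps + [[0.0] * feature_dim for _ in range(n_pad)])
        data_gen_flag.append([1.0] * len(steps) + [0.0] * n_pad)
    return data_attribute, data_feature, data_gen_flag


def save_npz(path, data_attribute, data_feature, data_gen_flag):
    """Write the arrays to an .npz file (array names are assumed, see above)."""
    import numpy as np  # imported here so the padding logic runs without NumPy

    np.savez(
        path,
        data_attribute=np.asarray(data_attribute, dtype=np.float32),
        data_feature=np.asarray(data_feature, dtype=np.float32),
        data_gen_flag=np.asarray(data_gen_flag, dtype=np.float32),
    )


# The two samples from the attributes/features example earlier in this thread.
samples = [
    {"attributes": (1, 2), "features": [(3,), (4,)]},
    {"attributes": (2, 2), "features": [(3,), (5,), (6,)]},
]
data_attribute, data_feature, data_gen_flag = pad_samples(samples)
# save_npz("data_train.npz", data_attribute, data_feature, data_gen_flag)
```

Here the first sample has two real time steps and one padded step, so its flags are 1, 1, 0.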

@chameleonzz

[screenshot: results]

After running example_training/main.py, the content of worker.log was as follows. (The 'aux_disc-False,dataset-FCC_MBA,epoch-17000,epoch_checkpoint_freq-70,extra_checkpoint_freq-850,run-0,sample_len-1,self_norm-False,' folder is taken as an example.)
[screenshot: workerlog]

After running example_generating_data/main_generate_data.py, the content of worker_generate_data.log was as follows.
[screenshot: worker_generate_data_log]

Are these the only output files of example_training/main.py and example_generating_data/main_generate_data.py? If I want to generate synthetic data corresponding to the real datasets (web/google/FCC_MBA), what should I do?

@fjxmlzn
Owner

fjxmlzn commented Jul 25, 2022

@chameleonzz No, there should be other files, and worker.log or worker_generate_data.log should contain more than this one line.

Could you please delete the results folder completely, run example_training/main.py again, and paste here the console output plus the content of worker.log?

@chameleonzz

Thanks for your answer.
I have tried several times to delete the results folder completely and run example_training/main.py again, but the output did not change. It was the same as in the previous pictures.
Should I change something in example_training/main.py and run it again?

@fjxmlzn
Owner

fjxmlzn commented Jul 25, 2022

Could you please paste here the console (i.e., terminal) output?

@fjxmlzn
Owner

fjxmlzn commented Jul 25, 2022

@chameleonzz
Also, let's move further discussion of this question to #30, since the problem you see is likely not due to TF2.

@JimmyZhan1213

JimmyZhan1213 commented Sep 2, 2022

Hello, have you solved the problem of incomplete training and generated output? I had a similar problem recently: there was only a worker.log in the folder that was generated.
[image]
[image]

@fjxmlzn
Owner

fjxmlzn commented Sep 2, 2022

For the previous problem, please refer to #30. For this issue, would you mind creating a new issue? We can discuss it there. This is a different problem.

@chameleonzz

Recently, I re-ran example_training\main.py on a computer with an Intel i7-11800H CPU @ 2.30 GHz and 64 GB of memory. It took 5740 minutes to generate the results folder named ‘dataset-google,epoch-400,run-0,sample-len-1’, and the results folder named 'dataset-google,epoch-400,run-0,sample-len-5' is being generated now.
I have three questions now.

  1. Once I know the attributes and features of my datasets, how do I generate the four files data_attribute_output.pkl, data_feature_output.pkl, data_test.npz, and data_train.npz?
  2. How should I choose the parameters and hyperparameters, such as epoch, extra_checkpoint_freq, and so on?
  3. I have not yet run example_generating_data\main_generate_data.py. Will it generate a result file in CSV or XLS format?
