Stage1 Training #27

Open
lychenyoko opened this issue Jun 4, 2021 · 2 comments

@lychenyoko

Hi, I tried to run stage-1 training from scratch with python3 train.py and got the following output:

/opt/conda/envs/discofacegan_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type,(1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/conda/envs/discofacegan_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type,(1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/conda/envs/discofacegan_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type,(1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/conda/envs/discofacegan_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type,(1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/conda/envs/discofacegan_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type,(1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/conda/envs/discofacegan_env/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type,(1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Creating the run dir: results/00004-sgan-ffhq256-4gpu
Copying files to the run dir
dnnlib: Running training.training_loop.training_loop() on localhost...

Calling Function: training.training_loop.training_loop

Streaming data using training.dataset.TFRecordDataset...
Dataset shape = [3, 256, 256]
Dynamic range = [0, 255]
Label size    = 0
---Debugging---
The run directory is: results/87
Traceback (most recent call last):
  File "train.py", line 121, in <module>
    main()
  File "train.py", line 116, in main
    dnnlib.submit_run(**kwargs)
  File "/trainman-mount/trainman-storage-06c5b4b4-59b3-49f3-a31f-f513d0a7f027/2021_work/DiscoFaceGAN/dnnlib/submission/submit.py", line 290, in submit_run
    run_wrapper(submit_config)
  File "/trainman-mount/trainman-storage-06c5b4b4-59b3-49f3-a31f-f513d0a7f027/2021_work/DiscoFaceGAN/dnnlib/submission/submit.py", line 242, in run_wrapper
    util.call_func_by_name(func_name=submit_config.run_func_name, submit_config=submit_config, **submit_config.run_func_kwargs)
  File "/trainman-mount/trainman-storage-06c5b4b4-59b3-49f3-a31f-f513d0a7f027/2021_work/DiscoFaceGAN/dnnlib/util.py", line 258, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "/trainman-mount/trainman-storage-06c5b4b4-59b3-49f3-a31f-f513d0a7f027/2021_work/DiscoFaceGAN/training/training_loop.py", line 139, in training_loop
    network_pkl = misc.locate_network_pkl(resume_run_id, resume_snapshot)
  File "/trainman-mount/trainman-storage-06c5b4b4-59b3-49f3-a31f-f513d0a7f027/2021_work/DiscoFaceGAN/training/misc.py", line 135, in locate_network_pkl
    pkls = list_network_pkls(run_id_or_run_dir_or_network_pkl)
  File "/trainman-mount/trainman-storage-06c5b4b4-59b3-49f3-a31f-f513d0a7f027/2021_work/DiscoFaceGAN/training/misc.py", line 118, in list_network_pkls
    run_dir = locate_run_dir(run_id_or_run_dir)
  File "/trainman-mount/trainman-storage-06c5b4b4-59b3-49f3-a31f-f513d0a7f027/2021_work/DiscoFaceGAN/training/misc.py", line 115, in locate_run_dir
    raise IOError('Cannot locate result subdir for run', run_id_or_run_dir)
OSError: [Errno Cannot locate result subdir for run] 87

Could you help me with this? Thanks!

@YuDeng
Contributor

YuDeng commented Jun 7, 2021

Hi, this error is caused by the default arguments of training_loop, where resume_run_id, resume_snapshot, and resume_kimg were set to specific numbers instead of None.

In stage-1 training, these parameters should be None because the network is trained from scratch. I have fixed this bug by replacing their default values with None. Stage-1 training should now work by running train.py.
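
For anyone hitting the same traceback, here is a minimal sketch of the pattern described above. It is not the repository's code: both function bodies are simplified, hypothetical stand-ins, and only the parameter names and the error message are taken from this thread. It assumes the loop skips the snapshot lookup whenever resume_run_id is None.

```python
# Hypothetical, simplified stand-ins for illustration only.
# They are NOT the actual training/misc.py or training/training_loop.py code.

def locate_network_pkl(run_id, snapshot):
    """Stand-in lookup that always fails, mimicking a missing results subdir."""
    # Passing two arguments to IOError (an alias of OSError in Python 3) makes
    # Python format the message as "[Errno <first>] <second>".
    raise IOError('Cannot locate result subdir for run', run_id)


def training_loop(resume_run_id=None, resume_snapshot=None, resume_kimg=None):
    """Describe how training would start for the given resume settings."""
    if resume_run_id is not None:
        # Resume path: only taken when a previous run is explicitly requested.
        network_pkl = locate_network_pkl(resume_run_id, resume_snapshot)
        return 'resuming from %s' % network_pkl
    # From-scratch path (stage 1): nothing is looked up on disk.
    start_kimg = 0.0 if resume_kimg is None else resume_kimg
    return 'training from scratch, starting at %s kimg' % start_kimg


if __name__ == '__main__':
    # With None defaults, stage-1 training never touches the lookup.
    print(training_loop())

    # With a hard-coded run id, the lookup runs and fails as in the traceback.
    try:
        training_loop(resume_run_id=87)
    except OSError as err:
        print(err)
```

The second call also shows why the message in the log reads "[Errno Cannot locate result subdir for run] 87": the exception is raised with two arguments, so Python treats the text as the errno field and the run id as the error string.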

@lychenyoko
Author

Thanks and issue solved!
