Unable to use create_from_images_raw in training #2

Open
aiXander opened this issue Feb 8, 2020 · 8 comments

aiXander commented Feb 8, 2020

Whenever I create a dataset with create_from_images_raw, and then run a training job, I run into the following error:

Traceback (most recent call last):
  File "run_training.py", line 213, in <module>
    main()
  File "run_training.py", line 208, in main
    run(**vars(args))
  File "run_training.py", line 133, in run
    dnnlib.submit_run(**kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/training_loop.py", line 158, in training_loop
    training_set = dataset.load_dataset(data_dir=dnnlib.convert_path(data_dir), verbose=True, **dataset_args)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 246, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**adjusted_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 119, in __init__
    tfr_shapes.append(parse_tfrecord_np(record).shape)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 45, in parse_tfrecord_np
    data = ex.features.feature["data"].bytes_list.value[0]  # temporary pylint workaround # pylint: disable=no-member
IndexError: list index (0) out of range

The first issue is that parse_tfrecord_np in training/dataset.py (line 45 in the traceback above) reads the feature under the key "data":

data = ex.features.feature["data"].bytes_list.value[0]

instead of

data = ex.features.feature["img"].bytes_list.value[0]
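
A side note on why the first traceback is an IndexError rather than a KeyError: protobuf map fields hand back an empty default entry when a missing key is read, so the lookup of "data" succeeds and only the `[0]` on its empty value list fails. The sketch below imitates that behaviour with a plain `defaultdict` (a stand-in, not the real `tf.train.Example`), and `read_first` is a hypothetical helper that reports the missing key instead.

```python
from collections import defaultdict

# Stand-in for ex.features.feature (assumption: a defaultdict mimics how a
# protobuf map returns an empty default entry for a key that was never written).
features = defaultdict(list)
features["img"].append(b"encoded image bytes")


def read_first(features, key):
    """Hypothetical helper: return the first value under `key`, or fail readably."""
    present = sorted(k for k, v in features.items() if v)
    if key not in present:
        raise KeyError(f"feature {key!r} not found; present keys: {present}")
    return features[key][0]


try:
    features["data"][0]  # same access pattern as parse_tfrecord_np
except IndexError as err:
    # The key lookup itself succeeded; indexing the empty list is what failed.
    print("IndexError:", err)

print(read_first(features, "img"))
```

Listing the keys that are actually present makes a "data" versus "img" mismatch visible immediately.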

But even with that adjustment, the contents of the TFRecords file seem to have the wrong dimensions:

Traceback (most recent call last):
  File "run_training.py", line 213, in <module>
    main()
  File "run_training.py", line 208, in main
    run(**vars(args))
  File "run_training.py", line 133, in run
    dnnlib.submit_run(**kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/training_loop.py", line 158, in training_loop
    training_set = dataset.load_dataset(data_dir=dnnlib.convert_path(data_dir), verbose=True, **dataset_args)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 246, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**adjusted_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 119, in __init__
    tfr_shapes.append(parse_tfrecord_np(record).shape)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 48, in parse_tfrecord_np
    return np.fromstring(data, np.uint8).reshape(shape)
ValueError: cannot reshape array of size 228962 into shape (3,1024,1024)
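
The numbers in this ValueError already hint at the cause. A uint8 array of shape (3, 1024, 1024) needs exactly 3 * 1024 * 1024 bytes, while the record holds only 228962, which would be consistent with a compressed image file rather than raw pixels (an inference from the sizes, not something the traceback states). A quick check of the arithmetic:

```python
from math import prod

shape = (3, 1024, 1024)   # shape expected by parse_tfrecord_np
record_bytes = 228962     # size reported in the ValueError

expected_bytes = prod(shape)  # one byte per uint8 element
print(expected_bytes)         # 3145728
print(record_bytes / expected_bytes)  # fraction of the expected size actually stored
```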

aiXander commented Feb 8, 2020

Minimal code to reproduce this error:

import tensorflow as tf
import cv2
import numpy as np

shape = (3, 1024, 1024)
img = np.random.random((1024, 1024,3)) * 255
img = img.astype(np.uint8)
cv2.imwrite('test.jpg', img)

with tf.gfile.FastGFile('test.jpg', 'rb') as fid:
    encoded_jpg = fid.read()

ex = tf.train.Example(
    features=tf.train.Features(
        feature={
            "img":tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpg]))
        }
    )
)

tfr_opt = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.NONE)
tfr_writer = tf.python_io.TFRecordWriter("test.tfrecords", tfr_opt)
tfr_writer.write(ex.SerializeToString())
tfr_writer.close()

for record in tf.python_io.tf_record_iterator("test.tfrecords", tfr_opt):
    ex = tf.train.Example()
    ex.ParseFromString(record)
    data = ex.features.feature["img"].bytes_list.value[0]

    # Raises ValueError: `data` is the encoded JPEG file, not 3*1024*1024 raw pixel bytes.
    img = np.fromstring(data, np.uint8).reshape(shape)
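
The record written in this repro contains the JPEG file exactly as it sits on disk, so its length has no fixed relation to the pixel count. One way to tell encoded bytes from raw pixels before reshaping is to look at the leading magic bytes. The sketch below uses only the standard library; `looks_encoded` and `check_raw_size` are hypothetical helper names, not part of StyleGAN2, and the magic-byte test is a heuristic.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def looks_encoded(data: bytes) -> bool:
    """Return True if `data` starts like a JPEG or PNG file rather than raw pixels."""
    return data.startswith(JPEG_MAGIC) or data.startswith(PNG_MAGIC)


def check_raw_size(data: bytes, shape) -> None:
    """Raise a descriptive error up front instead of letting reshape fail later."""
    expected = 1
    for dim in shape:
        expected *= dim
    if looks_encoded(data):
        raise ValueError("record holds an encoded image; decode it before reshaping")
    if len(data) != expected:
        raise ValueError(f"expected {expected} raw bytes, got {len(data)}")


print(looks_encoded(b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # JPEG-like header
print(looks_encoded(bytes(12)))                            # twelve zero bytes
```

If the check reports an encoded payload, the bytes need an image decoder (for example `cv2.imdecode` or `tf.image.decode_image`) before any reshape.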

1kaiser commented Feb 9, 2020

Same here:

Local submit - run_dir: /content/drive/My Drive/stylegan2/results/00000-stylegan2-face-1gpu-config-f
dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset...
Traceback (most recent call last):
  File "run_training.py", line 209, in <module>
    main()
  File "run_training.py", line 204, in main
    run(**vars(args))
  File "run_training.py", line 129, in run
    dnnlib.submit_run(**kwargs)
  File "/content/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/content/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/content/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/content/stylegan2/training/training_loop.py", line 156, in training_loop
    training_set = dataset.load_dataset(data_dir=dnnlib.convert_path(data_dir), verbose=True, **dataset_args)
  File "/content/stylegan2/training/dataset.py", line 239, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**adjusted_kwargs)
  File "/content/stylegan2/training/dataset.py", line 112, in __init__
    tfr_shapes.append(parse_tfrecord_np(record).shape)
  File "/content/stylegan2/training/dataset.py", line 45, in parse_tfrecord_np
    0
IndexError: list index (0) out of range

aiXander commented Feb 9, 2020

Ok so the fix is to change two lines in training/dataset.py:

Lines 112/113: swap which line is commented out, so that it reads:

                #tfr_shapes.append(parse_tfrecord_np(record).shape)
                tfr_shapes.append(parse_tfrecord_np_raw(record))

Lines 166/167: swap which line is commented out, so that it reads:

            #dset = dset.map(parse_tfrecord_tf, num_parallel_calls=num_threads)
            dset = dset.map(parse_tfrecord_tf_raw, num_parallel_calls=num_threads)

pender commented Feb 15, 2020

@tr1pzz

Ok so the fix is to change two lines in training/dataset.py...

I had the same issue as you. I tried your fix and got a different error:

pender@penderforge:/mnt/brian/stylegan2-nonsquare$ python3 run_training.py --num-gpus=1 --data-dir=datasets --config=config-e --dataset=nm-data-png --mirror-augment=true --metric=none --total-kimg=12000 --min-h=5 --min-w=3 --res-log2=7 --result-dir=results
Local submit - run_dir: results/00007-stylegan2-nm-data-png-1gpu-config-e
dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset...
Traceback (most recent call last):
  File "run_training.py", line 209, in <module>
    main()
  File "run_training.py", line 204, in main
    run(**vars(args))
  File "run_training.py", line 129, in run
    dnnlib.submit_run(**kwargs)
  File "/mnt/brian/stylegan2-nonsquare/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/mnt/brian/stylegan2-nonsquare/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/mnt/brian/stylegan2-nonsquare/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/mnt/brian/stylegan2-nonsquare/training/training_loop.py", line 156, in training_loop
    training_set = dataset.load_dataset(data_dir=dnnlib.convert_path(data_dir), verbose=True, **dataset_args)
  File "/mnt/brian/stylegan2-nonsquare/training/dataset.py", line 239, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**adjusted_kwargs)
  File "/mnt/brian/stylegan2-nonsquare/training/dataset.py", line 167, in __init__
    dset = dset.map(parse_tfrecord_tf_raw, num_parallel_calls=num_threads)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1913, in map
    self, map_func, num_parallel_calls, preserve_cardinality=False))
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 3472, in __init__
    use_legacy_function=use_legacy_function)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2713, in __init__
    self._function = wrapper_fn._get_concrete_function_internal()
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1853, in _get_concrete_function_internal
    *args, **kwargs)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1847, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2147, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2038, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2707, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2652, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in converted code:

    /mnt/brian/stylegan2-nonsquare/training/dataset.py:27 parse_tfrecord_tf_raw  *
        features = tf.parse_single_example(
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/parsing_ops.py:1019 parse_single_example
        serialized, features, example_names, name
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/parsing_ops.py:1063 parse_single_example_v2_unoptimized
        return parse_single_example_v2(serialized, features, name)
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/parsing_ops.py:2093 parse_single_example_v2
        dense_defaults, dense_shapes, name)
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/parsing_ops.py:2210 _parse_single_example_v2_raw
        name=name)
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_parsing_ops.py:1201 parse_single_example
        dense_shapes=dense_shapes, name=name)
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py:551 _apply_op_helper
        (prefix, dtypes.as_dtype(input_arg.type).name))

    TypeError: Input 'serialized' of 'ParseSingleExample' Op has type uint8 that does not match expected type of string.

I created the dataset using dataset_tool.py after implementing your changes. It is a custom dataset of 6000+ PNG images, each 384x640.

@cyrilzakka

I'm also experiencing the issue outlined by @pender, using 150K JPEG images of shape (1024, 1024, 3). I honestly have no clue what's causing it, but I'm currently investigating.

@cyrilzakka

I've discovered that the error only happens when using create_from_images_raw and not create_from_images. Will investigate further after my exams.

sdtblck commented Feb 28, 2020

I'm getting the same error here, with images of the same resolution and file type.

@aiXander

I've updated my bugfix. I accidentally posted the wrong line change here, sorry about that!
The current version of the fix should work.
