Unable to use create_from_images_raw in training #2

aiXander · 2020-02-08T16:40:18Z

Whenever I create a dataset with create_from_images_raw, and then run a training job, I run into the following error:

Traceback (most recent call last):
  File "run_training.py", line 213, in <module>
    main()
  File "run_training.py", line 208, in main
    run(**vars(args))
  File "run_training.py", line 133, in run
    dnnlib.submit_run(**kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/training_loop.py", line 158, in training_loop
    training_set = dataset.load_dataset(data_dir=dnnlib.convert_path(data_dir), verbose=True, **dataset_args)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 246, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**adjusted_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 119, in __init__
    tfr_shapes.append(parse_tfrecord_np(record).shape)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 45, in parse_tfrecord_np
    data = ex.features.feature["data"].bytes_list.value[0]  # temporary pylint workaround # pylint: disable=no-member
IndexError: list index (0) out of range

The first issue was that line 44 of training/dataset.py reads

data = ex.features.feature["data"].bytes_list.value[0]

instead of

data = ex.features.feature["img"].bytes_list.value[0]
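This mismatch explains why the symptom is an IndexError rather than a KeyError. In the Python protobuf API, indexing a message-valued map such as ex.features.feature with a key that is absent returns an empty default Feature. Its bytes_list.value is therefore an empty list, and [0] fails. The sketch below is a minimal defensive lookup. It assumes "data" and "img" are the only keys in play, since those are the two seen in this thread. get_image_bytes is a hypothetical helper, not part of dataset.py:

```python
from types import SimpleNamespace


def get_image_bytes(feature_map, keys=("data", "img")):
    """Return (key, payload) for the first candidate key that actually holds bytes.

    Hypothetical helper (not in dataset.py). `feature_map` is expected to behave
    like tf.train.Example().features.feature: it supports `in`, iteration over
    its keys, and `feature_map[key].bytes_list.value`.
    """
    for key in keys:
        # Check membership first: indexing a protobuf message map with a
        # missing key would return an empty Feature instead of raising.
        if key in feature_map:
            values = feature_map[key].bytes_list.value
            if len(values) > 0:
                return key, values[0]
    raise KeyError(
        f"none of {list(keys)} holds bytes; available keys: {sorted(feature_map)}"
    )


# Usage with a plain-dict stand-in for a parsed tf.train.Example (no TensorFlow needed).
fake = {"img": SimpleNamespace(bytes_list=SimpleNamespace(value=[b"\xff\xd8\xff"]))}
print(get_image_bytes(fake))
```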

But even with that adjustment, the contents of the .tfrecords file seem to have the wrong dimensions:

Traceback (most recent call last):
  File "run_training.py", line 213, in <module>
    main()
  File "run_training.py", line 208, in main
    run(**vars(args))
  File "run_training.py", line 133, in run
    dnnlib.submit_run(**kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/training_loop.py", line 158, in training_loop
    training_set = dataset.load_dataset(data_dir=dnnlib.convert_path(data_dir), verbose=True, **dataset_args)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 246, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**adjusted_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 119, in __init__
    tfr_shapes.append(parse_tfrecord_np(record).shape)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/dataset.py", line 48, in parse_tfrecord_np
    return np.fromstring(data, np.uint8).reshape(shape)
ValueError: cannot reshape array of size 228962 into shape (3,1024,1024)
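The numbers in this second traceback are informative. The target shape (3, 1024, 1024) needs 3 × 1024 × 1024 = 3,145,728 bytes of uint8 pixels, but the stored payload is only 228,962 bytes. That is consistent with a compressed image file being stored instead of decoded pixels. Below is a stdlib-only diagnostic sketch that checks the payload length and its leading magic bytes. JPEG files start with FF D8 FF, and PNG files start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A. describe_payload is a hypothetical name:

```python
import math

JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def describe_payload(data: bytes, shape) -> str:
    """Guess what a TFRecord image payload contains (hypothetical helper)."""
    expected = math.prod(shape)  # uint8 pixels: one byte per array element
    if len(data) == expected:
        kind = "decoded uint8 pixels (reshape is valid)"
    elif data.startswith(JPEG_MAGIC):
        kind = "JPEG file bytes (must be decoded before reshaping)"
    elif data.startswith(PNG_MAGIC):
        kind = "PNG file bytes (must be decoded before reshaping)"
    else:
        kind = "unknown format"
    return f"{kind}: {len(data)} bytes, shape {tuple(shape)} needs {expected}"


# A stand-in payload with the same size as the one in the traceback above.
print(describe_payload(JPEG_MAGIC + bytes(228959), (3, 1024, 1024)))
```

The length check runs first so that a genuine pixel buffer is never misreported just because its first bytes happen to match a magic number.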


aiXander · 2020-02-08T17:24:22Z

Minimal code to reproduce this error:

import tensorflow as tf  # TensorFlow 1.x API (tf.gfile, tf.python_io)
import cv2
import numpy as np

shape = (3, 1024, 1024)  # CHW shape that parse_tfrecord_np reshapes into
img = np.random.random((1024, 1024, 3)) * 255
img = img.astype(np.uint8)
cv2.imwrite('test.jpg', img)

# Read the encoded JPEG file bytes, mimicking create_from_images_raw.
with tf.gfile.FastGFile('test.jpg', 'rb') as fid:
    encoded_jpg = fid.read()

ex = tf.train.Example(
    features=tf.train.Features(
        feature={
            "img": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpg]))
        }
    )
)

tfr_opt = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.NONE)
tfr_writer = tf.python_io.TFRecordWriter("test.tfrecords", tfr_opt)
tfr_writer.write(ex.SerializeToString())
tfr_writer.close()

for record in tf.python_io.tf_record_iterator("test.tfrecords", tfr_opt):
    ex = tf.train.Example()
    ex.ParseFromString(record)
    data = ex.features.feature["img"].bytes_list.value[0]

    # Raises ValueError: `data` is the encoded JPEG file, not 3*1024*1024 pixel bytes.
    img = np.fromstring(data, np.uint8).reshape(shape)
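The repro fails for the same reason as the training run. The record holds encoded JPEG bytes, and fromstring(...).reshape(shape) only works on decoded pixels. For contrast, the numpy-only sketch below shows the payload layout that a reshape-based parser like parse_tfrecord_np can accept: uint8 pixels in channels-first (CHW) order, serialized with tobytes(). It illustrates the expected layout and is not the repo's writer code. It uses np.frombuffer, the non-deprecated replacement for np.fromstring on binary data:

```python
import numpy as np

shape = (3, 4, 4)  # (channels, height, width), tiny for illustration

# An HWC image, as cv2/PIL produce, transposed to CHW before serializing.
img_hwc = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
payload = img_hwc.transpose(2, 0, 1).tobytes()  # C-order bytes of the CHW array

# Reading side: same frombuffer/fromstring + reshape approach as parse_tfrecord_np.
restored = np.frombuffer(payload, dtype=np.uint8).reshape(shape)

assert restored.shape == shape
assert np.array_equal(restored.transpose(1, 2, 0), img_hwc)
print(len(payload))  # 3 * 4 * 4 bytes, one per pixel value
```

For the encoded JPEG in the repro, the bytes would first have to be decoded, for example with cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR). The resulting HWC array would then need transposing to CHW before it matches `shape`.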

1kaiser · 2020-02-09T03:31:17Z

Same here:

Local submit - run_dir: /content/drive/My Drive/stylegan2/results/00000-stylegan2-face-1gpu-config-f
dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset...
Traceback (most recent call last):
  File "run_training.py", line 209, in <module>
    main()
  File "run_training.py", line 204, in main
    run(**vars(args))
  File "run_training.py", line 129, in run
    dnnlib.submit_run(**kwargs)
  File "/content/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/content/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/content/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/content/stylegan2/training/training_loop.py", line 156, in training_loop
    training_set = dataset.load_dataset(data_dir=dnnlib.convert_path(data_dir), verbose=True, **dataset_args)
  File "/content/stylegan2/training/dataset.py", line 239, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**adjusted_kwargs)
  File "/content/stylegan2/training/dataset.py", line 112, in __init__
    tfr_shapes.append(parse_tfrecord_np(record).shape)
  File "/content/stylegan2/training/dataset.py", line 45, in parse_tfrecord_np
    0
IndexError: list index (0) out of range

aiXander · 2020-02-09T09:55:01Z

OK, so the fix is to change two lines in training/dataset.py:

Lines 112/113: swap which line is commented out, so they read:

                #tfr_shapes.append(parse_tfrecord_np(record).shape)
                tfr_shapes.append(parse_tfrecord_np_raw(record))

Lines 166/167: swap which line is commented out, so they read:

            #dset = dset.map(parse_tfrecord_tf, num_parallel_calls=num_threads)
            dset = dset.map(parse_tfrecord_tf_raw, num_parallel_calls=num_threads)
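Both edits are needed because dataset.py reads records in two places. A NumPy pass in __init__ collects the shapes, and a tf.data map decodes every record. Both must agree on whether the payload is encoded file bytes or decoded pixels. The sketch below is a hypothetical way to keep the two paths in sync behind one flag instead of swapping commented lines. parse_pixels_np, parse_encoded_np and make_parser are illustrative names, and the real parse_tfrecord_np_raw / parse_tfrecord_tf_raw in this fork may work differently. The image decoder is passed in (for example, a wrapper around cv2.imdecode) so that the sketch only depends on numpy:

```python
import numpy as np


def parse_pixels_np(data: bytes, shape):
    """Hypothetical: record holds decoded uint8 pixels in CHW order."""
    return np.frombuffer(data, dtype=np.uint8).reshape(shape)


def parse_encoded_np(data: bytes, shape, decode):
    """Hypothetical: record holds an encoded image file; `decode` returns HWC uint8."""
    img = decode(data)
    chw = np.ascontiguousarray(img.transpose(2, 0, 1))  # HWC -> CHW
    if chw.shape != tuple(shape):
        raise ValueError(f"decoded shape {chw.shape} != expected {tuple(shape)}")
    return chw


def make_parser(encoded: bool, decode=None):
    """Return ONE parser to use for both the shape scan and per-record decoding."""
    if encoded:
        if decode is None:
            raise ValueError("encoded records need a decode function")
        return lambda data, shape: parse_encoded_np(data, shape, decode)
    return parse_pixels_np
```

On the tf.data side, the matching decode step would be a TensorFlow op such as tf.image.decode_jpeg inside the mapped function, applied to the bytes feature after tf.parse_single_example.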

pender · 2020-02-15T20:52:20Z

@tr1pzz

Ok so the fix is to change two lines in training/dataset.py...

I had the same issue as you. I tried your fix and got a different error:

pender@penderforge:/mnt/brian/stylegan2-nonsquare$ python3 run_training.py --num-gpus=1 --data-dir=datasets --config=config-e --dataset=nm-data-png --mirror-augment=true --metric=none --total-kimg=12000 --min-h=5 --min-w=3 --res-log2=7 --result-dir=results
Local submit - run_dir: results/00007-stylegan2-nm-data-png-1gpu-config-e
dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset...
Traceback (most recent call last):
  File "run_training.py", line 209, in <module>
    main()
  File "run_training.py", line 204, in main
    run(**vars(args))
  File "run_training.py", line 129, in run
    dnnlib.submit_run(**kwargs)
  File "/mnt/brian/stylegan2-nonsquare/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/mnt/brian/stylegan2-nonsquare/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/mnt/brian/stylegan2-nonsquare/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/mnt/brian/stylegan2-nonsquare/training/training_loop.py", line 156, in training_loop
    training_set = dataset.load_dataset(data_dir=dnnlib.convert_path(data_dir), verbose=True, **dataset_args)
  File "/mnt/brian/stylegan2-nonsquare/training/dataset.py", line 239, in load_dataset
    dataset = dnnlib.util.get_obj_by_name(class_name)(**adjusted_kwargs)
  File "/mnt/brian/stylegan2-nonsquare/training/dataset.py", line 167, in __init__
    dset = dset.map(parse_tfrecord_tf_raw, num_parallel_calls=num_threads)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1913, in map
    self, map_func, num_parallel_calls, preserve_cardinality=False))
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 3472, in __init__
    use_legacy_function=use_legacy_function)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2713, in __init__
    self._function = wrapper_fn._get_concrete_function_internal()
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1853, in _get_concrete_function_internal
    *args, **kwargs)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1847, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2147, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2038, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2707, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2652, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in converted code:

    /mnt/brian/stylegan2-nonsquare/training/dataset.py:27 parse_tfrecord_tf_raw  *
        features = tf.parse_single_example(
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/parsing_ops.py:1019 parse_single_example
        serialized, features, example_names, name
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/parsing_ops.py:1063 parse_single_example_v2_unoptimized
        return parse_single_example_v2(serialized, features, name)
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/parsing_ops.py:2093 parse_single_example_v2
        dense_defaults, dense_shapes, name)
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/parsing_ops.py:2210 _parse_single_example_v2_raw
        name=name)
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_parsing_ops.py:1201 parse_single_example
        dense_shapes=dense_shapes, name=name)
    /home/pender/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py:551 _apply_op_helper
        (prefix, dtypes.as_dtype(input_arg.type).name))

    TypeError: Input 'serialized' of 'ParseSingleExample' Op has type uint8 that does not match expected type of string.

I created the dataset using dataset_tool.py after implementing your changes. It is a custom dataset of 6000+ PNG images, each 384x640.

cyrilzakka · 2020-02-21T16:40:37Z

I'm also experiencing the same issue outlined by @pender, using 150K JPEG images of shape (1024, 1024, 3). Honestly, I have no clue what's causing it, but I'm investigating.

cyrilzakka · 2020-02-22T10:21:31Z

I've discovered that the error only happens when using create_from_images_raw and not create_from_images. Will investigate further after my exams.

sdtblck · 2020-02-28T21:35:01Z

Having the same error here, with the same image resolution and file type.

aiXander · 2020-03-15T02:18:49Z

I've updated my bugfix; I accidentally posted the wrong changed line here, sorry about that!
The current version of the fix should work.

cyrilzakka mentioned this issue Feb 21, 2020

Unable to use run_training.py with custom dataset skyflynil/stylegan2#7

Open

EvgenyKashin mentioned this issue Mar 25, 2020

Add using raw dataset by default, fix dataset.close() bug #8

Open
