Hello, occasionally while training I get this repeated warning message for many inputs (thousands of times in a row). It occurs with default parameters using either gin/best/baseline_imagenet.gin or gin/best/baselinefinetune_imagenet.gin, on single or multiple GPUs. Training does not seem to fail. Is this possibly something on my end? Thanks in advance! I've attached an example excerpt below.
2019-04-19 21:40:41.379233: W tensorflow/core/kernels/data/experimental/directed_interleave_dataset_op.cc:199] DirectedInterleave selected an exhausted input: 72
2019-04-19 21:40:41.379277: W tensorflow/core/kernels/data/experimental/directed_interleave_dataset_op.cc:199] DirectedInterleave selected an exhausted input: 459
2019-04-19 21:40:41.379319: W tensorflow/core/kernels/data/experimental/directed_interleave_dataset_op.cc:199] DirectedInterleave selected an exhausted input: 306
This also happens on our end when doing batch (non-episodic) training.
The reason is that the "batch" pipeline is set up so that:
it goes through a full epoch of the dataset before starting iterating again over examples that have been seen previously, and
it samples each class proportionally to the number of total examples in that class (which is implemented by the Op in directed_interleave_dataset_op.cc).
So on average the proportion of examples drawn from each class matches the class imbalance, but towards the end of an epoch, some classes have been sampled more often than others and run out of examples, so the selector has to re-sample until it picks a class that still has examples remaining.
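To make the depletion effect concrete, here is a small pure-Python simulation of that behavior (the class sizes are made up for illustration, and this is only an analogy to the Op, not its actual code): classes are picked proportionally to their size, but examples are not repeated within the epoch, so picks landing on an exhausted class must be retried.

```python
import random

random.seed(0)

# Toy model of the batch pipeline: three classes with imbalanced sizes,
# sampled proportionally to class size, without repeating any example
# until the whole epoch has been consumed.
sizes = {0: 5, 1: 50, 2: 500}
remaining = dict(sizes)
wasted_picks = 0  # selections that landed on an already-exhausted class

while any(remaining.values()):
    c = random.choices(list(sizes), weights=list(sizes.values()))[0]
    if remaining[c] == 0:
        # This is the situation that triggers the
        # "DirectedInterleave selected an exhausted input" warning.
        wasted_picks += 1
        continue
    remaining[c] -= 1

print(f"picks on exhausted classes: {wasted_picks}")
```

The small classes are depleted well before the epoch ends, so a noticeable fraction of the remaining picks hit exhausted inputs and must be redrawn.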
An alternative would be to have .repeat() on each class's stream of examples before the class selector (sample_from_datasets()) rather than after. This would be done in BatchReader.create_dataset_input_pipeline, by calling self.construct_class_datasets(..., repeat=True) at the beginning and removing dataset = dataset.repeat() at the end.
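In tf.data terms, the change amounts to moving .repeat() from after the class selector to before it, on each per-class dataset. A minimal sketch of the two variants (the per-class datasets and sizes here are illustrative stand-ins, not the actual Meta-Dataset reader code):

```python
import tensorflow as tf

# Illustrative per-class datasets; in the real pipeline these come from
# construct_class_datasets. Sizes are made up.
class_datasets = [tf.data.Dataset.range(n) for n in (5, 50, 500)]
weights = [5 / 555, 50 / 555, 500 / 555]

# Current behaviour: repeat *after* sampling. Individual class streams
# can run dry near the end of an epoch, producing the warning.
current = tf.data.experimental.sample_from_datasets(
    class_datasets, weights).repeat()

# Proposed alternative: repeat each class stream *before* the selector,
# so no input can ever be exhausted.
proposed = tf.data.experimental.sample_from_datasets(
    [ds.repeat() for ds in class_datasets], weights)
```

The trade-off is that with per-class repeats there is no longer a hard epoch boundary: examples from small classes can recur before every example of a large class has been seen once.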
We did not test whether that change would have any influence on training time or performance. If you end up trying it out, please let us know :).
If you experience that warning when doing episodic training, on the other hand, it could indicate that the limit on the number of open files is too low (see ulimit -n), in which case some of the classes might not be read at all.
A limit of 1024 is not enough for training on "all" datasets. 100000 is enough, but there is probably a more reasonable value in between.
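As a hedged example, the limit can also be inspected and raised from Python itself using only the standard library (Unix only; 100000 is just the value mentioned above, not a tuned recommendation):

```python
import resource

# Inspect the soft/hard limits on open file descriptors for this process.
# The soft limit is what `ulimit -n` reports.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")

# Raise the soft limit towards 100000, capped at the hard limit,
# which an unprivileged process cannot exceed.
target = 100000 if hard == resource.RLIM_INFINITY else min(100000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```

Note this only affects the current process (and its children), so it should run before the input pipeline opens its files.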