Improving ImageNet-1k support #749
Comments
I agree; honestly, I didn't understand the reply/rationale to my question at #735 (comment).
Agreed with the TFDS approach for simplicity. I think it's also possible to use a local path instead of a GCS bucket.
Yes, it's possible. However, keeping things inside a GCS bucket is necessary to leverage TPU-based training runs, so the two options serve different purposes.
tfds still requires you to download the dataset manually. Are you referring to the process of converting from .tar.gz to TFRecords?
This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.
W.r.t the current support for ImageNet-1k, we can improve things:

First, the user needs to keep the `ILSVRC2012_img_train.tar` and `ILSVRC2012_img_val.tar` archives at this path: `gs://[BUCKET-NAME]/tensorflow_datasets/downloads/manual`. `builder.download_and_prepare()` takes some time, but less than what the current process of obtaining the initial TFRecords takes. After that, `tfds.load("imagenet2012", data_dir=data_dir)` and that is it.

The above two points assume the user already has access to the GCS bucket and all the necessary privileges to write data into it.
General recommendations
W.r.t `keras-cv/keras_cv/datasets/imagenet/load.py`, line 92 in e607e05: enable interleaved reading by setting `num_parallel_reads=tf.data.AUTOTUNE`.

W.r.t `keras-cv/keras_cv/datasets/imagenet/load.py`, line 113 in e607e05: enable prefetching of a few batches, so that the accelerator doesn't have to wait, by using `dataset.prefetch(tf.data.AUTOTUNE)`.
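Both recommendations are generic `tf.data` options, so they can be illustrated with a self-contained pipeline. This is a sketch of the pattern, not KerasCV's actual loader; the file pattern and batch size below are placeholders.

```python
# Illustrative tf.data pipeline showing both recommendations;
# not KerasCV's actual loader.
import tensorflow as tf

def make_dataset(file_pattern, batch_size=32):
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    # num_parallel_reads=AUTOTUNE interleaves reads across the TFRecord
    # shards instead of consuming them one file at a time.
    ds = tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    ds = ds.batch(batch_size)
    # prefetch(AUTOTUNE) lets the host prepare upcoming batches while the
    # accelerator is still busy with the current one.
    return ds.prefetch(tf.data.AUTOTUNE)
```

Both options only change how data is read and staged; the records themselves, and therefore the training results, are unaffected.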