Improving ImageNet-1k support #749

sayakpaul · 2022-08-30T16:11:22Z

W.r.t the current support for ImageNet-1k, we can improve things:

First, let's start leveraging TFDS. It significantly reduces the work expected to be done by a user. Let's walk through an example.

First, the user needs to place the ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar archives at this path: gs://[BUCKET-NAME]/tensorflow_datasets/downloads/manual.
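For the staging step, a minimal sketch might look like the following. It assumes the two archives have already been downloaded by hand into some source folder; `stage_archives` and its arguments are hypothetical names used only for illustration. `tf.io.gfile` handles both `gs://` and local paths, so the same code should work for either destination.

```python
import posixpath

import tensorflow as tf

ARCHIVES = ("ILSVRC2012_img_train.tar", "ILSVRC2012_img_val.tar")


def stage_archives(src_dir, data_dir):
    """Copy the manually downloaded archives into the TFDS manual directory.

    `src_dir` and `data_dir` are illustrative arguments; `data_dir` may be a
    local folder or a `gs://` path.
    """
    manual_dir = posixpath.join(data_dir, "downloads", "manual")
    tf.io.gfile.makedirs(manual_dir)
    for name in ARCHIVES:
        dst = posixpath.join(manual_dir, name)
        # Skip archives that are already in place to avoid copying them again.
        if not tf.io.gfile.exists(dst):
            tf.io.gfile.copy(posixpath.join(src_dir, name), dst)
    return manual_dir
```

With a bucket, the call would be something like `stage_archives("/path/to/downloads", "gs://[BUCKET-NAME]/tensorflow_datasets")`.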

Once this is done, the user does the following:

import tensorflow_datasets as tfds

data_dir = "gs://[BUCKET-NAME]/tensorflow_datasets"
builder = tfds.builder("imagenet2012", data_dir=data_dir)
builder.download_and_prepare()

builder.download_and_prepare() takes some time, but less than the current process of obtaining the initial TFRecords.

Then the user can load the ImageNet-1k dataset with tfds.load("imagenet2012", data_dir=data_dir), and that is it.

The above two points assume the user already has access to the GCS bucket and all the necessary privileges to write data into it.

General recommendations

W.r.t

keras-cv/keras_cv/datasets/imagenet/load.py

Line 92 in e607e05

dataset = tf.data.TFRecordDataset(filenames=filenames)

enable interleaved reading by setting num_parallel_reads=tf.data.AUTOTUNE.
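To illustrate the suggestion, the sketch below writes three tiny TFRecord shards to a temporary folder and reads them back with interleaving enabled. The shard names and contents are made up for the demonstration; only the `num_parallel_reads` argument corresponds to the proposed change.

```python
import os
import tempfile

import tensorflow as tf


def make_dataset(filenames):
    """Read TFRecord shards in parallel instead of one file at a time."""
    return tf.data.TFRecordDataset(
        filenames=filenames, num_parallel_reads=tf.data.AUTOTUNE
    )


# Write a few small dummy shards so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
filenames = []
for shard in range(3):
    path = os.path.join(tmp_dir, f"shard-{shard}.tfrecord")
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(4):
            writer.write(f"{shard}-{i}".encode())
    filenames.append(path)

# Records from different shards are interleaved, so sort before inspecting.
records = sorted(r.numpy().decode() for r in make_dataset(filenames))
```

Because records from different files are mixed, code that relies on a fixed record order across shards should not assume it is preserved.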

W.r.t

keras-cv/keras_cv/datasets/imagenet/load.py

Line 113 in e607e05

return dataset

enable prefetching of a few batches so that the accelerator doesn't have to wait by using dataset.prefetch(tf.data.AUTOTUNE).
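A minimal sketch of the prefetching change, using a toy dataset in place of the ImageNet one. The `finalize` helper is hypothetical, and whether batching happens at this point in `load.py` is an assumption; the relevant part is the trailing `prefetch` call.

```python
import tensorflow as tf


def finalize(dataset, batch_size):
    """Batch the dataset and overlap input preparation with training steps."""
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)


# Stand-in for the parsed ImageNet dataset.
dataset = finalize(tf.data.Dataset.range(10), batch_size=4)
batches = [batch.numpy().tolist() for batch in dataset]
```

Prefetching does not change the elements that are produced, only when they are prepared, so it is usually placed as the last step of the pipeline.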


bhack · 2022-08-30T18:47:12Z

I agree. Honestly, I didn't understand the reply/rationale to my question at #735 (comment)

sebastian-sz · 2022-08-31T06:16:14Z

Agreed with TFDS approach for simplicity.

I think it's also possible to use local path instead of GCS bucket.

sayakpaul · 2022-08-31T06:19:26Z

> Agreed with TFDS approach for simplicity.
>
> I think it's also possible to use local path instead of GCS bucket.

Yes, it's possible. However, keeping things inside a GCS bucket is necessary to leverage TPU-based training runs, so the two options serve different purposes.

bhack · 2022-09-09T18:16:03Z

#774

tanzhenyu · 2022-10-27T16:50:25Z

tfds still requires you to download the dataset manually. Are you referring to the process of converting from .tar.gz to TFRecords?

github-actions · 2024-02-13T01:47:35Z

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions · 2024-02-28T01:47:28Z

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.

tanzhenyu added the stat: awaiting external input waiting for others to respond label Oct 27, 2022

sachinprasadhs added stat:awaiting response from contributor and removed stat: awaiting external input waiting for others to respond labels Jan 29, 2024

github-actions bot added the stale label Feb 13, 2024

github-actions bot closed this as completed Feb 28, 2024
