Require OCR-CC information (image IDs) #5

prajwalgatti · 2021-10-04T12:53:11Z

Hello @zyang-ur, and all

Thanks for this work; it is quite interesting.

I'm trying to obtain the OCR-CC dataset, but because of my constraints I can't download the full 1.7TB.
However, I already have the CC dataset, so I could extract the subset of images that are in OCR-CC myself.

Could you please share the image IDs of CC that were used to construct OCR-CC?

Thanks in advance!

zyang-ur · 2021-10-05T03:16:20Z

Hi @prajwalgatti,

Thank you for your interest.

In this case, you could download only the index files with:
path/to/azcopy copy https://tapvqacaption.blob.core.windows.net/data/data/imdb/cc <local_path>/data --recursive

The "image_name" entries in the index files are the CC image IDs. Thank you.
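
For readers who want to turn the downloaded index files into a plain set of IDs, here is a minimal sketch. The thread does not state the on-disk format of the index files, so the loader below assumes each one is a pickled NumPy `.npy` array of dicts; that format, the `*.npy` glob, and the function names `collect_image_ids`, `load_records`, and `ids_from_directory` are all assumptions for illustration, not something confirmed here. Only the `"image_name"` key comes from the reply above.

```python
from pathlib import Path
from typing import Iterable, Mapping, Set


def collect_image_ids(records: Iterable[object]) -> Set[str]:
    """Gather the "image_name" value from every record that has one."""
    ids: Set[str] = set()
    for record in records:
        # Skip anything that is not a dict-like record (for example a
        # metadata header) and records without an "image_name" key.
        if isinstance(record, Mapping) and "image_name" in record:
            ids.add(str(record["image_name"]))
    return ids


def load_records(index_file: Path) -> list:
    """ASSUMPTION: the index file is a pickled NumPy array of dicts."""
    import numpy as np  # imported lazily; only this assumed format needs it

    # allow_pickle=True is required for object arrays; only use it on
    # files from a source you trust.
    return list(np.load(index_file, allow_pickle=True))


def ids_from_directory(index_dir: Path) -> Set[str]:
    """Union of the image IDs found in every *.npy file under index_dir."""
    ids: Set[str] = set()
    for index_file in sorted(index_dir.rglob("*.npy")):
        ids |= collect_image_ids(load_records(index_file))
    return ids
```

Calling `ids_from_directory(Path("<local_path>/data"))` would then give the set of names to match against a local copy of CC. Whether `"image_name"` includes a file extension or a path prefix is not stated in this thread, so check a few entries before filtering.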

zyang-ur closed this as completed Feb 22, 2022
