how to download GRIT-20M dataset? #11326

Open
stihuangyuan opened this issue Dec 31, 2023 · 2 comments

@stihuangyuan

How can I download the GRIT-20M dataset?

@hhaAndroid
Collaborator

We will prepare the documentation in the next few days. Thank you for your attention.

@lluo-Desktop

> We will prepare the documentation in the next few days. Thank you for your attention.

Hi @hhaAndroid
In the GRIT-20M dataset download script, the image size is 256 (--image_size 256). That is small compared to the model's input size. Do you have any suggestions for the download parameters?

img2dataset --url_list /path/to/GRIT_dataset/grit-20m --input_format "parquet" \
    --url_col "url" --caption_col "caption" --output_format webdataset \
    --output_folder /tmp/grit --processes_count 4 --thread_count 64 --image_size 256 \
    --resize_only_if_bigger=True --resize_mode="keep_ratio" --skip_reencode=True \
    --save_additional_columns '["id","noun_chunks","ref_exps","clip_similarity_vitb32","clip_similarity_vitl14"]' \
    --enable_wandb False
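
To experiment with a larger resolution, one option is to turn the size into a variable and keep every other flag from the command above unchanged. This is only a minimal sketch: `IMAGE_SIZE`, `URL_LIST`, `OUTPUT_DIR`, and `RUN` are hypothetical wrapper variables (not img2dataset options), and the default of 1024 is an assumed placeholder, not a value recommended by the maintainers.

```shell
#!/bin/sh
# Hypothetical wrapper variables; none of these are img2dataset options.
IMAGE_SIZE="${IMAGE_SIZE:-1024}"   # assumed placeholder, pick a value that suits your model input
URL_LIST="${URL_LIST:-/path/to/GRIT_dataset/grit-20m}"
OUTPUT_DIR="${OUTPUT_DIR:-/tmp/grit}"

# Same flags as the command quoted above; only --image_size is parameterized.
set -- img2dataset \
    --url_list "$URL_LIST" --input_format parquet \
    --url_col url --caption_col caption --output_format webdataset \
    --output_folder "$OUTPUT_DIR" --processes_count 4 --thread_count 64 \
    --image_size "$IMAGE_SIZE" \
    --resize_only_if_bigger=True --resize_mode=keep_ratio --skip_reencode=True \
    --save_additional_columns '["id","noun_chunks","ref_exps","clip_similarity_vitb32","clip_similarity_vitl14"]' \
    --enable_wandb False

if [ "${RUN:-0}" = "1" ]; then
    "$@"                    # start the download with the arguments built above
else
    printf '%s\n' "$*"      # dry run: show the arguments for inspection (shell quoting is not preserved)
fi
```

Running the script as-is only prints the command; setting `RUN=1 IMAGE_SIZE=...` starts the download. Because `--resize_only_if_bigger=True` is kept, images already smaller than the chosen size should be stored without upscaling, so a larger value mainly costs disk space and bandwidth.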
