
Can you upload resources to another cloud like Gdrive, Onedrive or Dropbox? #17

Closed
khiemledev opened this issue Oct 17, 2021 · 5 comments


@khiemledev

In my country, the download speed from Baidu is very slow and I can't download the needed resources. Could you please upload them to GDrive, OneDrive, or Dropbox? Thank you!

@luo3300612
Owner

Sorry, the whole file is ~70 GB, so I can't afford to upload it to GDrive/OneDrive. But you can follow the Data preparation step. There are 5 keys in my hdf5 feature file: the first three can be obtained when extracting region features with extract_region_feature.py, the fourth when extracting grid features with the code in grid-feats-vqa, and the last with align.ipynb.

@khiemledev
Author

khiemledev commented Oct 18, 2021

I used some tricks and successfully downloaded the files. Thank you for your reply!

I have another question. Can you please tell me how to produce the coco_train_ids.npy, coco_test_ids.npy, and coco_restval_ids.npy files for my own dataset, which is already in COCO format?

@luo3300612
Owner

coco_train_ids.npy is a (N,)-shaped numpy array, where N is the number of images for training. Each entry is an annotation id identifying an image-text pair in captions_train2014.json:

>>> import json
>>> info = json.load(open('captions_train2014.json'))
>>> annotations = info['annotations']
>>> print(annotations[0])
{'image_id': 318556, 'id': 48, 'caption': 'A very clean and well decorated empty bathroom'}

so the image features are stored under the keys 318556_features/boxes/size/grids/mask, and the corresponding caption is 'A very clean and well decorated empty bathroom'.

However, since the code is tightly coupled to the COCO dataset, it is recommended to rewrite dataset.py for your own dataset.

Alternatively, you can create the hdf5 file, the train/val/test _ids.npy files, and the captions_train2014.json/captions_val2014.json files for your own dataset.
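A minimal sketch of that alternative path: assuming your annotations follow the COCO captions schema shown above, each *_ids.npy file is just the annotation ids of that split collected into a (N,) array. The dict below is a hypothetical stand-in for `json.load(open('captions_train2014.json'))`:

```python
import json
import numpy as np

# Stand-in for the parsed captions_train2014.json (same schema;
# the second entry is a made-up placeholder).
info = {
    'annotations': [
        {'image_id': 318556, 'id': 48,
         'caption': 'A very clean and well decorated empty bathroom'},
        {'image_id': 123456, 'id': 49,
         'caption': 'a placeholder caption for a second image'},
    ]
}

# Collect the annotation ids of this split into a (N,) array and save it.
train_ids = np.array([ann['id'] for ann in info['annotations']], dtype=np.int64)
np.save('coco_train_ids.npy', train_ids)

# Round-trip check: the saved file holds exactly the ids of the split.
loaded = np.load('coco_train_ids.npy')
print(loaded.shape, loaded.tolist())  # (2,) [48, 49]
```

For the val/test/restval files, repeat this with disjoint subsets of your annotations; every id you save must also exist in the captions JSON the dataloader reads.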

Hope it helps.

@khiemledev
Author

It's very helpful. Thank you very much!

@YuigaWada

For those who don't have a Baidu account, I created a mirror of the data distributed on Baidu Pan. You can download the data from this link without logging in.
Use at your own risk :)
(also related to #36 issue)
