How to get the Personality-Captions dataset? #1704
When I tried to execute the above command, I got an error:
Stephen Roller (Thu, Jul 4, 2019): You need to run it from within the root of the ParlAI directory. Alternatively, if you've installed ParlAI properly, you can run:

python -m parlai.scripts.display_data -t personality_captions
Does that mean each image is paired with only one caption during training?
Each image in the training set has one corresponding caption; there are roughly 186k image/caption/personality triples in the training set. The test set has 10k images, each with 5 captions based on one personality.
Thank you, Sir. I am having some difficulty downloading the dataset; I am getting this error: raise ReadTimeout(e, request=request)
I'm not sure how much we can help you there; we're not associated with the multimedia-commons servers, and they seem to load fine for us. Maybe your system has something different about its SSL root certificates, or a firewall? Can you try running this from your machine:
I tried the above command, but I am still facing the same issue.
There’s nothing I can do to help you with that; something more fundamental is wrong with your machine, its internet setup, or similar.
Are there any other ways I can access the dataset, maybe a Google Drive link? I would be obliged if you could kindly help me out.
I’m sorry, but that might constitute copyright infringement, so I cannot help there. You’re looking for the YFCC100M dataset; it can probably be found elsewhere.
Hello Sir, the research paper states that the training, test, and validation images were randomly selected from the YFCC100M dataset, so just downloading that dataset will not help me much on its own. I am interested in the captions because they cover 215 personality traits, which could really help with my project. I would be really grateful for any help. Thank you.
Each example in the dataset includes the corresponding image hash, which you can use to select the matching images from the YFCC100m dataset; e.g., the first example in the dataset is
This image hash corresponds to the image from the YFCC100m dataset.
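A minimal sketch of selecting images by hash, assuming a split file (e.g. train.json) is a JSON list of example dicts with an `image_hash` key. The mirror URL layout below (first three characters of the hash, next three, then `<hash>.jpg` under multimedia-commons `data/images/`) is an assumption about how the multimedia-commons mirror stores YFCC100M images, not something confirmed in this thread; `YFCC_IMAGE_ROOT` and all helper names are hypothetical.

```python
import json
from typing import Iterable, List

# Hypothetical/assumed base URL of the multimedia-commons YFCC100M mirror; verify before use.
YFCC_IMAGE_ROOT = "https://multimedia-commons.s3-us-west-2.amazonaws.com/data/images"


def load_examples(path: str) -> List[dict]:
    """Load a split file, assumed to be a JSON list of example dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def unique_image_hashes(examples: Iterable[dict]) -> List[str]:
    """Return the distinct image hashes referenced by the examples, in first-seen order."""
    seen = set()
    hashes = []
    for ex in examples:
        h = ex["image_hash"]
        if h not in seen:
            seen.add(h)
            hashes.append(h)
    return hashes


def image_url(image_hash: str) -> str:
    """Build the assumed mirror URL: <root>/<h[:3]>/<h[3:6]>/<h>.jpg."""
    return f"{YFCC_IMAGE_ROOT}/{image_hash[:3]}/{image_hash[3:6]}/{image_hash}.jpg"
```

For example, `for h in unique_image_hashes(load_examples("train.json")): print(image_url(h))` would list one URL per distinct image, which you can then open manually to check that the server responds.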
Thank you! I tried downloading the images via https://github.com/stefanbirkner/yfcc100m-downloader, but I am not allowed to download images directly from:
That code uses the same servers that Kurt’s code uses. If that works for you, then ParlAI’s download script should work too, no?
I understand, but the same error persists. I have tried it on two different systems, Mac and Ubuntu.
I have just tested ParlAI's download script, and it works fine on our servers; there have also been multiple proposed solutions for you. Here's a sample of the output:
The YFCC website says something about needing an AWS account, but I don't know how that's actually enforced. If you've downloaded the YFCC dataset using that other script, then you have the full dataset; Personality-Captions uses a strict subset of it, so you have more than you need. Just putting those files into your parlai/data/yfcc_images/ folder should do the trick. I really don't know how else to help you. I suggest manually opening one of those URLs and making sure it loads.
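If you already have the full YFCC100M download, a sketch like the following could copy just the needed subset into parlai/data/yfcc_images/. It assumes every image file is named `<image_hash>.jpg` and that ParlAI looks for a flat folder of such files; both are assumptions, and `copy_needed_images` is a hypothetical helper, not part of ParlAI.

```python
import shutil
from pathlib import Path
from typing import Iterable, Set


def copy_needed_images(image_hashes: Iterable[str], source_dir: str, target_dir: str) -> Set[str]:
    """Copy <hash>.jpg files found anywhere under source_dir into a flat target_dir.

    Hypothetical helper. Returns the hashes for which no matching file was found.
    """
    missing = set(image_hashes)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # rglob also searches nested folders, so the source layout does not matter.
    for path in Path(source_dir).rglob("*.jpg"):
        if path.stem in missing:
            shutil.copy2(path, target / path.name)
            missing.discard(path.stem)  # copy each hash only once
    return missing
```

Pass the `image_hash` values from the dataset's examples; a non-empty return value tells you which images your download is still missing.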
I get this error: raise URLError(err). I do not know how to fix this issue.
Facebook does not own the images, so I cannot redistribute them in any manner. You are responsible for obtaining the images and the requisite permission to use them. Thank you for posting the error. Based on a quick Google search, Stack Overflow posts attribute it to either (1) outdated libraries, though those posts date back to 2016, so that seems unlikely; or (2) proxy issues. It looks like ParlAI/parlai/core/build_data.py Lines 86 to 93 in f57f5ac
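To check the proxy hypothesis on your own machine, here is a small diagnostic sketch using only Python's standard library (it is not ParlAI code, and the function names are hypothetical). It prints the proxy settings urllib picks up from the environment and tries to open a URL with a timeout, reporting the underlying error instead of a long traceback.

```python
import urllib.request


def describe_proxies() -> dict:
    """Return the proxy settings urllib will use (e.g. from http_proxy/https_proxy env vars)."""
    return urllib.request.getproxies()


def can_fetch(url: str, timeout: float = 10.0) -> bool:
    """Try to open url; print the failure reason and return False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError as err:
        # URLError, HTTPError, and socket timeouts are all subclasses of OSError.
        print(f"Could not fetch {url}: {err}")
        return False
```

Run `print(describe_proxies())` first: an unexpected proxy entry is a likely culprit. Then call `can_fetch` on one of the image URLs that fails during the download.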
On Mac, it is Anaconda3.
Hi @Yukti-09, that isn't quite enough information for us to debug. We've been unable to reproduce these errors in several different environments. At this point, I'd suggest searching online for help debugging.
I have downloaded the personality_captions dataset successfully. But where can I find the 5 reference captions in val.json and test.json? These two files have a lot of sentences for each "image_hash".
Hi there! If you're looking at the raw … Hope that helps!
Thanks for the quick reply! I had missed 'additional_comments' before; now the data is no problem.
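For anyone with the same question, here is a sketch of collecting all reference captions per image from val.json or test.json. It assumes each example is a dict whose main caption sits under a `comment` key and whose extra references sit under `additional_comments` (the field mentioned above). The `comment` key is an assumption about the raw file format, so check a few entries before relying on it.

```python
from collections import defaultdict
from typing import Dict, List


def references_by_image(examples: List[dict]) -> Dict[str, List[str]]:
    """Map each image_hash to its reference captions, deduplicated, in order of appearance."""
    refs: Dict[str, List[str]] = defaultdict(list)
    for ex in examples:
        # 'comment' is an assumed key for the primary caption; it may be absent.
        candidates = [ex.get("comment")] + list(ex.get("additional_comments", []))
        for caption in candidates:
            if caption and caption not in refs[ex["image_hash"]]:
                refs[ex["image_hash"]].append(caption)
    return dict(refs)
```

Deduplication matters in case the primary caption is repeated inside `additional_comments`; with the test set described earlier you would expect about 5 captions per image.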
Hi Nick, can you please tell me how you downloaded the dataset?
I want to get the Personality-Captions dataset, but I'm confused by "The Personality-Captions dataset can be accessed via ParlAI, with -t personality_captions" at https://parl.ai/projects/personality_captions/.
What does -t personality_captions mean?
Can you tell me the whole command or a link?
Thanks a lot!