This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Personality-Captions Dataset #3738

Closed
Neha10252018 opened this issue Jun 22, 2021 · 56 comments

@Neha10252018

I have downloaded the dataset, but when I go through it, the training set has one caption per image, whereas the test set has 5 different caption columns. In the final output of the paper you show that one image has 5 different personality-trait outputs, but from the dataset I can see there is only one comment and personality trait per image. How is it possible to get 5 different outputs for a single image? Can someone please explain?

@Neha10252018
Author

Could the author please respond to this (maybe @klshuster)?

@klshuster
Contributor

Hi there, you are correct that the training and validation splits only have 1 caption per image, whereas the test set has 5 captions per image; the test set was collected this way so that reference BLEU scores could be computed.

If you're referring to Table 6 in the paper, those are generated outputs, where the model outputs a response conditioned on the listed personality. This table demonstrates the flexibility of the model (and the efficacy of personality conditioning).

@Neha10252018
Author

Hi, thanks for your response. So it's not compulsory that we get 5 outputs for a single image?

That is, do you mean that, in general, the output should be 1 image with 1 personality trait and its caption?

@klshuster
Contributor

I am not sure I understand the question, could you please elaborate?

@Neha10252018
Author

My doubt was: is the dataset designed in such a way that we must get 5 outputs for each image?

Or is one trait and caption per image also okay?

@Neha10252018
Author

If I implement my model and get the output below, is that correct?

[image]

@Neha10252018
Author

Also, can I ignore the candidates and 500 candidates columns in the training set and carry out my work? Will that affect anything?

@klshuster
Contributor

  1. You only need to output one caption per image. The 5 captions on the test set are used for computing automated metrics. The additional captions on images in the paper are examples from the model with different personalities at inference time. They are merely showing that you can control the model.
  2. The candidates and 500_candidates fields are used for response selection (retrieval-based models). They are used to determine retrieval-based metrics as specified in the paper (R@1). If you are training a generative model, you do not need to worry about them.

Hope that answers your questions
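To make the BLEU point concrete: with 5 reference captions per test image, a hypothesis gets credit for matching any of them. Below is a minimal sketch of the clipped unigram precision at the core of BLEU-1 — a simplification only (real BLEU also uses higher-order n-grams and a brevity penalty), and the captions are invented for illustration:

```python
from collections import Counter

def unigram_precision(hypothesis, references):
    """Clipped unigram precision against multiple references (BLEU-1 core).

    Each hypothesis token counts as correct at most as many times as it
    appears in the single most generous reference.
    """
    hyp_counts = Counter(hypothesis.lower().split())
    max_ref_counts = Counter()
    for ref in references:
        for tok, n in Counter(ref.lower().split()).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in hyp_counts.items())
    total = sum(hyp_counts.values())
    return clipped / total if total else 0.0

# One model caption scored against 5 invented test-set references.
refs = [
    "what a gorgeous sunset over the lake",
    "i would love to swim here",
    "sunsets like this make me so happy",
    "the water looks freezing cold",
    "this view is absolutely breathtaking",
]
score = unigram_precision("what a gorgeous view of the lake", refs)
```

With a single reference, "view" would score zero against the first caption; the extra references let stylistically different but valid wordings earn credit.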

@Neha10252018
Author

If I implement my model and get the output below, is that correct?

[image]

So is this output correct as per the dataset?

@klshuster
Contributor

yes!

@Neha10252018
Author

Is there a way I can get your complete code to check how it works? I mean, any link to the complete code?

@klshuster
Contributor

This project page details how to use the dataset within ParlAI: https://parl.ai/projects/personality_captions/

@PineappleWill


Hi, could you please tell me how to download the personality-caption dataset? I can't find any clue in ParlAI

@klshuster
Contributor


Hi @PineappleWill, as mentioned in the linked project page, the dataset can be accessed via -t personality_captions using ParlAI. As is also mentioned in the project page, you can take a look at the ParlAI Quick-Start page to understand how to set up and use ParlAI. Specifically, once you've installed ParlAI, simply run parlai display_data -t personality_captions; this will download the dataset for you.
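For reference, those steps amount to roughly the following (assuming a working Python environment; exact prompts and data paths vary by ParlAI version):

```shell
# Install ParlAI, then display (and thereby download) the dataset.
pip install parlai
parlai display_data -t personality_captions
```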

@Neha10252018
Author

Should we also divide the image dataset into train, test, and validation splits? Because we only have the caption files as test, train, and validation in .json format.

@klshuster
Contributor

The data entries in the JSON files include fields for the image ID corresponding to the relevant image. The images are indeed unique by split.

@Neha10252018
Author

So you mean there is no need to split the images again? When I downloaded the dataset I got 2 folders: one is personality_captions and the other is yfcc_images.

@Neha10252018
Author

Or do you mean I need to create separate test, train, and val folders for images, taking their IDs from the JSON files?

Or should I just train on the train folder? Please can you clear up this doubt?

@klshuster
Contributor

The yfcc_images folder has all of the images. The splits of the images are within the JSON files. You will need to look at the JSON files to determine which images correspond to which split.
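For anyone working outside ParlAI, a minimal sketch of that lookup — note that the train.json / val.json / test.json filenames and the image_hash field name are assumptions here, so check them against your actual download:

```python
import json
import shutil
from pathlib import Path

def build_split_index(personality_captions_dir):
    """Map each image hash to its split by reading the three caption files.

    Assumes files named train.json / val.json / test.json whose entries
    carry an 'image_hash' field (verify against your download).
    """
    index = {}
    for split in ("train", "val", "test"):
        with open(Path(personality_captions_dir) / f"{split}.json") as f:
            for entry in json.load(f):
                index[entry["image_hash"]] = split
    return index

def copy_into_split_folders(index, images_dir, out_dir):
    """Optionally materialise train/val/test image folders.

    Not needed when using ParlAI, which resolves splits internally.
    """
    for image_hash, split in index.items():
        src = Path(images_dir) / f"{image_hash}.jpg"
        if src.exists():
            dest = Path(out_dir) / split
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dest / src.name)
```

Usage would be `index = build_split_index("personality_captions")`, then either filter images in memory or call `copy_into_split_folders(index, "yfcc_images", "split_images")`.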

@Neha10252018
Author

So are you saying I need to create different folders again?

@klshuster
Contributor

How are you interacting with the dataset? If you are using ParlAI you don't need to create folders; it's all handled within the code.

If you are using the dataset outside of ParlAI, I can't really help you, as I do not know what system you are using.

@Neha10252018
Author

I am working on Google Colab.

@Neha10252018
Author

Outside ParlAI.

@Neha10252018
Author

I am getting the error below when I run the code:

[image]

@Neha10252018
Author

@klshuster could you please let me know?

@klshuster
Contributor

We don't support Windows, so I am unable to help you with this.

@Neha10252018
Author

If you have a look at that image, I am inside the ParlAI environment, so is there anything you can help with?

@mojtaba-komeili
Contributor

The error seems to be coming from a urllib3 version mismatch. Have you tried using pip to install the particular version that it needs directly?

@Neha10252018
Author

@mojtaba-komeili thanks for responding, but which version do I need to check for, and how?

@mojtaba-komeili
Contributor

Based on this, I'd say anything 1.* that is higher than 1.25.9 should work (maybe 1.26.6).
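Concretely, that would be something like the following (1.26.6 is the suggestion from the comment above, not a tested requirement):

```shell
# Pin urllib3 to a 1.x release newer than 1.25.9.
pip install "urllib3==1.26.6"
```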

@Neha10252018
Author


@klshuster, you mentioned in the first point that the additional captions in the paper are examples showing how to control the model. May I know where those 5 different captions for the same image were taken from? As per the dataset, one image has one caption, so I'm a bit confused. Could you please explain in detail?

@Neha10252018
Author

@klshuster, please can you respond to this last question? It would help me carry on with my project.

@Neha10252018
Author

Can someone please respond? It would be of great help. @stephenroller

@klshuster
Contributor

These are captions from the retrieval model during inference. The candidate set is all utterances from the training set. The model is given a test image and the shown personality, and is then asked to retrieve a relevant response (from the training utterances).

@Neha10252018
Author

@klshuster, I am really sorry, but could you please elaborate a bit, as it is a bit confusing and tricky?

@klshuster
Contributor

Are you familiar with how dialogue retrieval models work? A retrieval model is given a personality and an image and is asked to produce an answer. The model is a retrieval-based model, not a generative model. That means the model scores a set of candidate sentences and returns the highest-scoring sentence as its response.

The model needs a set of utterances from which to choose its response. We take all human utterances from the training set of Personality-Captions and allow the model to select a response from this set. Because the model was trained on several images with several personalities, it can select several different top responses for the same image if the personality input is varied (Happy, Sad, Angry, etc.).
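The mechanics can be illustrated with a toy sketch. Here a word-overlap score stands in for the trained TransResNet scorer, and the captions and contexts are invented; the point is only the "score every training caption, return the argmax" step, and how changing the conditioning changes the winner:

```python
def retrieve(context, candidates, score_fn):
    """Return the highest-scoring candidate -- the core retrieval step."""
    return max(candidates, key=lambda c: score_fn(context, c))

def overlap_score(context, candidate):
    """Toy stand-in for a trained scoring model: shared-word count."""
    return len(set(context.lower().split()) & set(candidate.lower().split()))

# All training captions act as the candidate pool; varying the
# personality/context changes which caption wins for the same image.
train_captions = [
    "this beach makes me so happy",
    "ugh, sand gets everywhere, how annoying",
    "the waves look calm today",
]
happy_pick = retrieve("happy beach photo", train_captions, overlap_score)
grumpy_pick = retrieve("annoying sand beach", train_captions, overlap_score)
```

Same candidate pool, different conditioning text, different top-ranked response — which is exactly why one test image can yield several captions in the paper's tables.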

@Neha10252018
Author

@klshuster yeah, I understand that, thank you. But I'm really confused about the candidates column in the validation set and the candidates, additional_comments, and 500_candidates columns in the test set. Could you please let me know where I need to use these? Because the training set doesn't have any additional columns, and the model is trained on that set only.

@klshuster
Contributor

I described those additional columns here: #3738 (comment)

If you're training a generative model you don't need to worry about them.

@Neha10252018
Author

One last question: for a generative model, should I use that additional comments column?

Because for training I am giving only 3 columns.

Also, can I concatenate the comment and additional comments to make a single column and use it for testing?

Likewise, for the retrieval model, can I concatenate all the columns (candidates, 500_candidates, comments, additional comments) into a single comment column?

@klshuster
Contributor

What do you mean by concatenating the columns?

@Neha10252018
Author

Merging the columns and making them one comment column.

@klshuster
Contributor

They are separate comments, so I am not sure how that would work.

  • additional_comments: For the test set, we collected 5 captions (with 5 styles) per image. This is for measuring reference BLEU scores.
  • candidates: For the valid and test sets, we evaluate retrieval models by having them rank the 100 captions in this field (1 of the 100 is the gold caption).
  • 500_candidates: For the test set, we have an additional ranking measure where we place the 5 gold captions in a set of 500 (to mimic the 1-in-100 ratio of candidates).

I'm not sure how else I can describe this
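A sketch of how such a ranking metric can be computed once a model has scored each candidate (function name and scores here are made up; the dataset supplies the 100- or 500-candidate lists described above):

```python
def recall_at_k(scored_candidates, gold, k=1):
    """R@k: True if the gold caption is among the top-k ranked candidates.

    scored_candidates maps candidate caption -> model score; with the
    100-candidate list (1 gold + 99 distractors) and k=1, this gives the
    1-in-100 R@1 for a single example. Averaging over the test set
    yields the reported metric.
    """
    ranked = sorted(scored_candidates, key=scored_candidates.get, reverse=True)
    return gold in ranked[:k]

# Toy scores for one test example.
scores = {"gold caption": 0.91, "distractor a": 0.40, "distractor b": 0.77}
hit = recall_at_k(scores, "gold caption", k=1)
```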

@Neha10252018
Author

@klshuster, may I know which OS you ran the code on, please?

@klshuster
Contributor

Ubuntu

@Neha10252018
Author

When I run the code below, given by you, to evaluate the model inside the ParlAI env, I get the following:
Creating or loading model
13:59:31 | Opt:
13:59:31 | activation: relu
13:59:31 | adam_alpha: 0.0005
13:59:31 | additional_layer_dropout: 0.2
13:59:31 | additional_layer_text: 1
13:59:31 | aggregate_micro: False
13:59:31 | allow_missing_init_opts: False
13:59:31 | area_under_curve_class: None
13:59:31 | area_under_curve_digits: -1
13:59:31 | attention_dropout: 0.2
13:59:31 | batch_length_range: 5
13:59:31 | batch_sort: False
13:59:31 | batch_sort_cache: none
13:59:31 | batchsize: 500
13:59:31 | bpe_add_prefix_space: None
13:59:31 | bpe_debug: False
13:59:31 | bpe_dropout: None
13:59:31 | bpe_merge: None
13:59:31 | bpe_num_symbols: 30000
13:59:31 | bpe_vocab: None
13:59:31 | context_length: -1
13:59:31 | datapath: C:\Users\nehaj\anaconda3\Lib\site-packages\data
13:59:31 | datasplit: 200k
13:59:31 | datatype: valid
13:59:31 | dict_build_first: True
13:59:31 | dict_class: None
13:59:31 | dict_endtoken: END
13:59:31 | dict_file: C:\Users\nehaj\anaconda3\Lib\site-packages\data\models\personality_captions/transresnet/model.dict
13:59:31 | dict_include_test: False
13:59:31 | dict_include_valid: False
13:59:31 | dict_initpath: None
13:59:31 | dict_language: english
13:59:31 | dict_loaded: True
13:59:31 | dict_lower: False
13:59:31 | dict_max_ngram_size: -1
13:59:31 | dict_maxexs: -1
13:59:31 | dict_maxtokens: -1
13:59:31 | dict_minfreq: 0
13:59:31 | dict_nulltoken: NULL
13:59:31 | dict_starttoken: START
13:59:31 | dict_textfields: text,labels
13:59:31 | dict_tokenizer: re
13:59:31 | dict_unktoken: UNK
13:59:31 | display_examples: False
13:59:31 | download_path: None
13:59:31 | dropout: 0.4
13:59:31 | dynamic_batching: None
13:59:31 | embedding_size: 300
13:59:31 | embedding_type: None
13:59:31 | embeddings_scale: True
13:59:31 | encoder_type: transformer
13:59:31 | eval_batchsize: 18
13:59:31 | evaltask: None
13:59:31 | ffn_size: 1200
13:59:31 | fixed_cands_path: None
13:59:31 | freeze_patience: 2
13:59:31 | hidden_dim: 300
13:59:31 | hide_labels: False
13:59:31 | image_cropsize: 224
13:59:31 | image_features: resnet
13:59:31 | image_features_dim: 2048
13:59:31 | image_mode: resnet152
13:59:31 | image_size: 256
13:59:31 | include_image: True
13:59:31 | include_labels: True
13:59:31 | include_persona: True
13:59:31 | include_personality: True
13:59:31 | include_resnet_features: False
13:59:31 | include_uru_features: False
13:59:31 | init_model: none
13:59:31 | init_opt: None
13:59:31 | is_debug: False
13:59:31 | learn_positional_embeddings: False
13:59:31 | learningrate: 0.0005
13:59:31 | load_embeddings_from: /private/home/samuelhumeau/data/crawl-300d-2M.vec
13:59:31 | load_encoder_from: None
13:59:31 | load_transformer_from: /private/home/samuelhumeau/pretrained/encoder_reddit/redditbest.mdl
13:59:31 | log_every_n_secs: 5.0
13:59:31 | log_keep_fields: all
13:59:31 | loglevel: info
13:59:31 | max_length_sentence: 32
13:59:31 | max_train_time: 17280.0
13:59:31 | metrics: default
13:59:31 | model: projects.personality_captions.transresnet.transresnet:TransresnetAgent
13:59:31 | model_file: C:\Users\nehaj\anaconda3\Lib\site-packages\data\models\personality_captions/transresnet/model
13:59:31 | model_parallel: False
13:59:31 | multitask_weights: [1]
13:59:31 | mutators: None
13:59:31 | n_decoder_layers: -1
13:59:31 | n_encoder_layers: -1
13:59:31 | n_heads: 6
13:59:31 | n_layers: 4
13:59:31 | n_positions: 1000
13:59:31 | n_segments: 0
13:59:31 | no_cuda: False
13:59:31 | num_cands: 100
13:59:31 | num_epochs: -1
13:59:31 | num_examples: -1
13:59:31 | num_layers_all: 2
13:59:31 | num_layers_image_encoder: 1
13:59:31 | num_layers_text_encoder: 1
13:59:31 | num_test_labels: 5
13:59:31 | numthreads: 1
13:59:31 | numworkers: 4
13:59:31 | one_cand_set: False
13:59:31 | output_scaling: 1.0
13:59:31 | override: "{'datatype': 'valid', 'ffn_size': 1200, 'attention_dropout': 0.2, 'relu_dropout': 0.2, 'n_positions': 1000}"
13:59:31 | parlai_home: /private/home/kshuster/ParlAI
13:59:31 | pretrained: True
13:59:31 | pytorch_context_length: -1
13:59:31 | pytorch_datafile:
13:59:31 | pytorch_datapath: None
13:59:31 | pytorch_include_labels: True
13:59:31 | pytorch_preprocess: False
13:59:31 | pytorch_teacher_batch_sort: False
13:59:31 | pytorch_teacher_dataset: None
13:59:31 | pytorch_teacher_task: None
13:59:31 | relu_dropout: 0.2
13:59:31 | report_filename:
13:59:31 | save_after_valid: True
13:59:31 | save_every_n_secs: -1
13:59:31 | save_format: conversations
13:59:31 | share_word_embeddings: True
13:59:31 | short_final_eval: False
13:59:31 | show_advanced_args: False
13:59:31 | shuffle: False
13:59:31 | starttime: May02_13-51
13:59:31 | task: personality_captions
13:59:31 | tensorboard_comment:
13:59:31 | tensorboard_log: False
13:59:31 | tensorboard_logdir: None
13:59:31 | tensorboard_metrics: None
13:59:31 | tensorboard_tag: None
13:59:31 | truncate: 32
13:59:31 | use_provided_candidates: True
13:59:31 | validation_cutoff: 1.0
13:59:31 | validation_every_n_epochs: 1
13:59:31 | validation_every_n_secs: -1
13:59:31 | validation_max_exs: -1
13:59:31 | validation_metric: accuracy
13:59:31 | validation_metric_mode: max
13:59:31 | validation_patience: 5
13:59:31 | validation_share_agent: False
13:59:31 | variant: aiayn
13:59:31 | verbose: False
13:59:31 | world_logs:
13:59:31 | Evaluating task personality_captions using datatype valid.
13:59:31 | creating task(s): personality_captions
Please confirm that you have obtained permission to work with the YFCC100m dataset, as outlined by the steps listed at https://multimediacommons.wordpress.com/yfcc100m-core-dataset/ [Y/y]: y
NOTE: This script will download each image individually from the s3 server on which the images are hosted. This will take a very long time. Are you sure you would like to continue? [Y/y]: y
[downloading images to C:\Users\nehaj\anaconda3\Lib\site-packages\data\yfcc_images]

And once this is done, it just keeps asking me the following:

Please confirm that you have obtained permission to work with the YFCC100m dataset, as outlined by the steps listed at https://multimediacommons.wordpress.com/yfcc100m-core-dataset/ [Y/y]: y
NOTE: This script will download each image individually from the s3 server on which the images are hosted. This will take a very long time. Are you sure you would like to continue? [Y/y]: y
[downloading images to C:\Users\nehaj\anaconda3\Lib\site-packages\data\yfcc_images]
14:14:32 | Of 201858 items, 201803 already existed; only going to download 55 items.
14:14:32 | Going to download 1 chunks with 100 images per chunk using 32 processes.
14:15:19 | Bad download - chunk: 0, dest_file: ac80c5633d76c27b352ee6352ddbb3.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac88d66ad654f2739bbfdfbe55c2bdb.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8b1beeb050fa26b970dc2fee5ef539.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac81b9680ba5cab4436ad2528555da.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8788acffc6226816827a943e69bbf.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8d98a8fab765d92a678295fe9b2d.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac866ff82c6319994c8568a91f8aa2a.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac845e9d3081d9415d8a4b49c7dca7.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8aaf341c293ed43ce33358c4801c.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8b31a49a2ed5375724ad7e8fd80ff.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac827b33adaaefb430ce867c4fec7732.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac84b4dc481dd318ae3977a755bd3742.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac83a308c357da220a394e7164da956.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8cfa458c978d903e87459ffd66c2a.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac84b63a2b27ddc08dd7e769593b25c2.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac83a907bc318b96b466179876ad093.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac854f4a9d99e34ec5223b991aa2c887.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac80835da2c2e5f021dd63ed56d0be93.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8a4530c32027b32d52bc899697d8.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8ea6f73a10a31c3d4920c174fe96.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac882dd35d8bf3cee5efeddbe1f399.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8d22b10c9dee47015761898ae75.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8ba82a7ae6d761e1d3582dddbdecdf.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac87681e52e3a709e48cb40ce18bedf.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac827d5424278a62aa6aad18d439a.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac83f9f225f695c2a633d44dcbbce55.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac833f723b92e13b6e314d644f76837.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8a241d2041cb26eba548ec1e7d128.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac849ffe6d25ee9bb21787a39c1926.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8287a425804c7c8cc56ca590de1435.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8567fbe08f7d825bf47ea6846693dc.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac85d4417242b9a36c3f45a6a32a138.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8bcdc098f3698665b91ab9146cc3.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8e6eca5713ee25f631e79d9a3355ac.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac817cd3ccfec3358265dee15ec616af.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac84ecd0fc1f3e17772d8f561a11add.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac822b755268b2a6ce231cc0e1ad588.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac83cdb55a9e6a79ecc879451dacb3.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8d7d1ce0f32d1e7ee4d838c9b1b94.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac80a9f51a66169dda8eec89cda2a289.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8ddea4829c7eba37ac0c81a7ef634.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac83184fd40c475501ef12160eefa1c.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac85a84da55bfb3497f038822344596d.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac86f15b386ea87d3d240fac81f166c.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8dd7f5795743061f480a56aec7c97.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac841340786841ee7e665051fe58d9.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8176a2fb143c79c22488d104ece72.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac89925318fa67f3e018da2547bbe2.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac87838e7846735b27d54a7d5dbc4ee.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8628ffeed36884336ceab1586cb1.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac81db90ac691dfdd275b2e6ec299ca4.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8f9ff369308ac4d3643d3114c6718b.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac8ee3225ea20433642b347f8fa8d81.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac83d37bf87d21f31bcbc3c3f7714f99.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
14:15:19 | Bad download - chunk: 0, dest_file: ac85ef996efd9aed1f91293c1552e9e5.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>
Downloading: 100%|███████████████████████████████████████████████████████████| 201858/201858 [01:08<00:00, 2933.63it/s]
14:15:19 | Of 55 items attempted downloading, 55 had errors.
Please confirm that you have obtained permission to work with the YFCC100m dataset, as outlined by the steps listed at https://multimediacommons.wordpress.com/yfcc100m-core-dataset/ [Y/y]: y
NOTE: This script will download each image individually from the s3 server on which the images are hosted. This will take a very long time. Are you sure you would like to continue? [Y/y]: y
[downloading images to C:\Users\nehaj\anaconda3\Lib\site-packages\data\yfcc_images]
14:15:56 | Of 201858 items, 201803 already existed; only going to download 55 items.
14:15:56 | Going to download 1 chunks with 100 images per chunk using 32 processes.
14:16:44 | (the same 55 "Bad download ... 404" errors then repeat for the same files)
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac8bcdc098f3698665b91ab9146cc3.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac8e6eca5713ee25f631e79d9a3355ac.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac817cd3ccfec3358265dee15ec616af.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac84ecd0fc1f3e17772d8f561a11add.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac822b755268b2a6ce231cc0e1ad588.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac83cdb55a9e6a79ecc879451dacb3.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac8d7d1ce0f32d1e7ee4d838c9b1b94.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac80a9f51a66169dda8eec89cda2a289.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac8ddea4829c7eba37ac0c81a7ef634.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac83184fd40c475501ef12160eefa1c.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac85a84da55bfb3497f038822344596d.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac86f15b386ea87d3d240fac81f166c.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac8dd7f5795743061f480a56aec7c97.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac841340786841ee7e665051fe58d9.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac8176a2fb143c79c22488d104ece72.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac89925318fa67f3e018da2547bbe2.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac87838e7846735b27d54a7d5dbc4ee.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac8628ffeed36884336ceab1586cb1.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac81db90ac691dfdd275b2e6ec299ca4.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac8f9ff369308ac4d3643d3114c6718b.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac8ee3225ea20433642b347f8fa8d81.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac83d37bf87d21f31bcbc3c3f7714f99.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
14:16:44 | ←[31mBad download - chunk: 0, dest_file: ac85ef996efd9aed1f91293c1552e9e5.jpg, http status code: 404, error_msg: [Response not OK] Response: <Response [404]>←[0m
Downloading: 100%|███████████████████████████████████████████████████████████| 201858/201858 [01:10<00:00, 2851.83it/s]
14:16:44 | Of 55 items attempted downloading, 55 had errors.
Please confirm that you have obtained permission to work with the YFCC100m dataset, as outlined by the steps listed at https://multimediacommons.wordpress.com/yfcc100m-core-dataset/ [Y/y]:

@Neha10252018
Copy link
Author

@klshuster could you please help me with this

@klshuster
Copy link
Contributor

What happens if you enter y at the prompt? There are a few images with broken download links (we don't host the YFCC images), so this is not unexpected.

@Neha10252018
Copy link
Author

The same thing happens, i.e. it breaks every time, I have to enter y again, and this repeats.
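If the confirmation prompt reappears after every failed chunk, one workaround (assuming the script reads its [Y/y] confirmation from stdin; the exact download command below is only an illustrative stand-in for whatever command triggers the download) is to pipe an endless stream of y answers into it:

```shell
# `yes` prints "y" forever, so every confirmation read from stdin is
# auto-answered. Replace the command after the pipe with the one you
# actually run to trigger the dataset download (this one is illustrative).
yes | python examples/display_data.py -t personality_captions
```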

@Neha10252018
Copy link
Author

When I run the interactive session I get the below error:

(base) C:\Users\nehaj\anaconda3\Parlai_Project\ParlAI-ce02a0eb9e4d8bf38377d0908ed7bd3b47d7ab2a\projects\personality_captions>python interactive.py -mf models:personality_captions/transresnet/model
C:\Users\nehaj\anaconda3\lib\site-packages\torchvision\transforms\transforms.py:310: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
11:21:28 | Overriding opt["n_positions"] to 1000 (previously: None)
11:21:28 | loading dictionary from C:\Users\nehaj\anaconda3\Lib\site-packages\data\models\personality_captions/transresnet/model.dict
11:21:28 | num words = 250006
Creating or loading model
Traceback (most recent call last):
  File "interactive.py", line 287, in <module>
    setup_interactive()
  File "interactive.py", line 282, in setup_interactive
    SHARED['agent'] = create_agent(opt, requireModelExists=True)
  File "C:\Users\nehaj\anaconda3\lib\site-packages\parlai\core\agents.py", line 402, in create_agent
    model = create_agent_from_opt_file(opt)
  File "C:\Users\nehaj\anaconda3\lib\site-packages\parlai\core\agents.py", line 355, in create_agent_from_opt_file
    return model_class(opt_from_file)
  File "C:\Users\nehaj\anaconda3\lib\site-packages\projects\personality_captions\transresnet\transresnet.py", line 105, in __init__
    self._setup_cands()
  File "C:\Users\nehaj\anaconda3\lib\site-packages\projects\personality_captions\transresnet\transresnet.py", line 154, in _setup_cands
    self.fixed_cands = [c.replace('\n', '') for c in f.readlines()]
  File "C:\Users\nehaj\anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 6742: character maps to <undefined>
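This traceback is likely a Windows default-encoding issue rather than true file corruption: `open()` without an `encoding` argument uses cp1252 on Windows, which has no character for byte 0x9d, while UTF-8 text such as a right curly quote (U+201D, bytes e2 80 9d) contains that byte. A minimal, self-contained reproduction and workaround (the file name here is illustrative, not the real candidates file):

```python
from pathlib import Path

# Illustrative stand-in for the real fixed-candidates file.
path = Path("cands.txt")
# A UTF-8 caption with curly quotes; U+201D encodes to bytes e2 80 9d.
path.write_bytes("a “fancy” caption\n".encode("utf-8"))

# Windows' default cp1252 codec has no mapping for byte 0x9d:
try:
    path.read_text(encoding="cp1252")
except UnicodeDecodeError as e:
    print("cp1252 failed:", e.reason)  # character maps to <undefined>

# Passing encoding="utf-8" explicitly reads the same file cleanly,
# mirroring the list comprehension in transresnet.py:
with open(path, encoding="utf-8") as f:
    cands = [c.replace("\n", "") for c in f.readlines()]
print(cands)
```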

@klshuster
Copy link
Contributor

it seems the candidates file is corrupted

@Neha10252018
Copy link
Author

Sorry, then how can I proceed?

@klshuster
Copy link
Contributor

what is the value of the --fixed-cands-path parameter? that is the file that is corrupted
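ParlAI stores an agent's launch options as JSON in a `.opt` file next to the model, so the value can be looked up there. A self-contained sketch (it creates a tiny stand-in `.opt` locally so it runs anywhere; with the real model you would point `opt_path` at the downloaded `.../personality_captions/transresnet/model.opt`):

```python
import json

# Stand-in .opt file so the sketch is runnable anywhere; the key name
# corresponds to the --fixed-cands-path flag (dashes become underscores
# in the stored options dict). The path value below is illustrative.
opt_path = "model.opt"
with open(opt_path, "w") as f:
    json.dump({"fixed_cands_path": "data/personality_captions/cands.txt"}, f)

with open(opt_path) as f:
    opt = json.load(f)
print(opt.get("fixed_cands_path"))  # the file interactive.py tries to read
```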

@Neha10252018
Copy link
Author

Sorry, but I am not able to find that.

@Neha10252018
Copy link
Author

@klshuster one last question: even in generative mode, while testing we need to input an image and a personality trait, and the model should generate the caption, right?

@klshuster
Copy link
Contributor

that is correct

@github-actions
Copy link

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.
