Unicode error while running evaluation on tiny-dev dataset #9

graviraja · 2019-05-03T06:08:15Z

I have downloaded the tiny-dev dataset, the preprocessed data, and the pretrained model. I am running the evaluation code with the following command:

python -m language.question_answering.bert_joint.run_nq \
  --logtostderr \
  --bert_config_file=bert-joint-baseline/bert_config.json \
  --vocab_file=bert-joint-baseline/vocab-nq.txt \
  --predict_file=tiny-dev/nq-dev-sample.no-annot.jsonl.gz \
  --init_checkpoint=bert-joint-baseline/bert_joint.ckpt \
  --do_predict \
  --output_dir=bert_model_output

It throws the following error.

g-vallejo · 2019-05-15T12:49:41Z

@graviraja did you manage to solve this issue? It is probably because you're using Python 3. I got past it by changing a couple of lines, though there are other Python 3 issues. For this one, just replace "r" with "rb" in lines 865 and 867 of run_nq.py. The reason why is here.
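
To illustrate why "rb" matters, here is a minimal sketch using only the Python 3 standard library. It does not reproduce run_nq.py; the helper names and the sample file are hypothetical. It shows that a gzip stream must be read from a handle opened in binary mode, because a text-mode handle tries to decode the compressed bytes first.

```python
import gzip
import json
import os
import tempfile


def read_jsonl_gz(path):
    """Read a gzipped JSON-lines file, opening the underlying file as bytes."""
    with open(path, "rb") as raw:  # "rb": gzip needs undecoded bytes
        with gzip.GzipFile(fileobj=raw) as gz:
            return [json.loads(line.decode("utf-8")) for line in gz]


def text_mode_fails(path):
    """Return True if reading gzip data through a text-mode handle raises."""
    try:
        with open(path, "r", encoding="utf-8") as raw:
            gzip.GzipFile(fileobj=raw).read()
    except (UnicodeDecodeError, TypeError):
        return True
    return False


# Hypothetical sample file, written only for this demonstration.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.jsonl.gz")
with gzip.open(sample_path, "wb") as out:
    out.write(b'{"example_id": 1}\n{"example_id": 2}\n')

records = read_jsonl_gz(sample_path)
print(records)
print("text mode fails:", text_mode_fails(sample_path))
```

Under Python 2, "r" already returned raw bytes, which is presumably why the original code worked there.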

graviraja · 2019-05-24T09:23:26Z

@gisvl Yes, I solved it. Thank you for your input.

renatoviolin · 2019-06-14T20:29:52Z

@graviraja How did you solve the "TypeError: unsupported operand type(s) for +: 'zip' and 'list'"?
I tried casting to list, but then it fails inside sorted.
If I remove the sorted call, it fails at the end, in the predictions.

graviraja · 2019-06-21T05:58:41Z

@renatoviolin Zip them and convert each result into a list before concatenating, and provide a key to the sorted function. I modified the compute_pred_dict function as follows.

    raw_results_by_id = [(int(res["unique_id"] + 1), dict(res)) for res in raw_results]

    # Cast example id to int32 for each example, similarly to the raw results.
    sess = tf.Session()
    all_candidates = candidates_dict.items()
    example_ids = tf.to_int32(np.array([int(k) for k, _ in all_candidates
                                        ])).eval(session=sess)
    examples_by_id = list(zip(example_ids, all_candidates))

    # Cast unique_id also to int32 for features.
    feature_ids = []
    features = []
    for f in dev_features:
        feature_ids.append(f.features.feature["unique_ids"].int64_list.value[0] + 1)
        features.append(dict(f.features.feature))
    feature_ids = tf.to_int32(np.array(feature_ids)).eval(session=sess)
    features_by_id = list(zip(feature_ids, features))

    # Join examples with features and raw results.
    examples = []
    merged = sorted(examples_by_id + raw_results_by_id + features_by_id, key=lambda x: x[0])
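
For anyone hitting the same two errors, here is a small standalone sketch of what is likely going on. The data below are hypothetical stand-ins, not the real contents of compute_pred_dict. In Python 3, zip() returns an iterator, so it cannot be added to a list, and sorting tuples without a key falls back to comparing the payloads (dicts here) whenever two ids are equal.

```python
# Hypothetical stand-ins for the (id, payload) sequences being merged.
example_ids = [30, 10]
candidates = [("30", ["c"]), ("10", ["a"])]
raw_results_by_id = [(11, {"unique_id": 10})]
features_by_id = [(11, {"token_map": [0, 1]})]

# Python 3: zip() is an iterator, so `zip(...) + list` raises TypeError.
try:
    zip(example_ids, candidates) + raw_results_by_id
    concat_failed = False
except TypeError:
    concat_failed = True

# Materialize the zip first, then concatenation works.
examples_by_id = list(zip(example_ids, candidates))
combined = examples_by_id + raw_results_by_id + features_by_id

# Without a key, the two entries with id 11 tie on the first element,
# so Python compares the dict payloads, which are not orderable.
try:
    sorted(combined)
    unkeyed_sort_failed = False
except TypeError:
    unkeyed_sort_failed = True

# Sorting on the id alone never touches the payloads.
merged = sorted(combined, key=lambda x: x[0])
merged_ids = [item[0] for item in merged]
print(concat_failed, unkeyed_sort_failed, merged_ids)
```

Because sorted is stable, entries that share an id keep the order in which they were concatenated.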

graviraja closed this as completed May 24, 2019
