Unicode error while running evaluation on tiny-dev dataset #9

Closed
graviraja opened this issue May 3, 2019 · 4 comments

@graviraja

I have downloaded the tiny-dev dataset, the preprocessed data, and the pretrained model. While running the evaluation code with the following command:

    python -m language.question_answering.bert_joint.run_nq \
      --logtostderr \
      --bert_config_file=bert-joint-baseline/bert_config.json \
      --vocab_file=bert-joint-baseline/vocab-nq.txt \
      --predict_file=tiny-dev/nq-dev-sample.no-annot.jsonl.gz \
      --init_checkpoint=bert-joint-baseline/bert_joint.ckpt \
      --do_predict \
      --output_dir=bert_model_output

it throws the following error:

[screenshot: naturalqa_error (image not preserved)]

@g-vallejo

g-vallejo commented May 15, 2019

@graviraja did you manage to solve this issue? It happens because you're probably using Python 3. I got past it by changing a couple of lines; there are other Python 3 issues too. For this one, just replace "r" with "rb" on lines 865 and 867 of run_nq.py. The reason why is here.
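
For anyone hitting the same error: a .gz file is raw bytes, so under Python 3 it has to be opened in binary mode ("rb"). Opening it in text mode ("r") makes Python try to decode the compressed bytes as UTF-8, which is the likely source of the Unicode error. Below is a minimal plain-Python sketch of reading a gzipped JSON Lines file such as nq-dev-sample.no-annot.jsonl.gz. It uses the standard library's open and gzip rather than the file API in run_nq.py, and read_jsonl_gz is a hypothetical helper name.

```python
import gzip
import json


def read_jsonl_gz(path):
    """Yield one parsed JSON object per line of a gzipped JSON Lines file.

    Hypothetical helper for illustration. The underlying file is opened in
    binary mode ("rb") because gzip data is raw bytes; text mode would try
    to decode the compressed bytes as UTF-8 under Python 3.
    """
    with open(path, "rb") as raw:
        with gzip.GzipFile(fileobj=raw) as gz:
            for line in gz:
                # Lines come back as bytes; skip blank ones and decode explicitly.
                if line.strip():
                    yield json.loads(line.decode("utf-8"))
```

Usage: `for example in read_jsonl_gz("tiny-dev/nq-dev-sample.no-annot.jsonl.gz"): ...`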

@graviraja (author)

@gisvl Yes, I solved it. Thank you for your input.

@renatoviolin

@graviraja How did you solve the "TypeError: unsupported operand type(s) for +: 'zip' and 'list'"?
I tried casting to a list, but then it fails inside sorted.
When I remove the sorted call, it fails at the end, when building the predictions.
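
For context on this first error: in Python 3, zip() returns a lazy iterator instead of a list, so it cannot be concatenated with a list using +. A toy illustration (made-up data, not taken from run_nq.py):

```python
ids = [3, 1]
payloads = ["a", "b"]

# Python 3: zip() is an iterator, so zip + list raises TypeError.
message = None
try:
    combined = zip(ids, payloads) + [(2, "c")]
except TypeError as err:
    message = str(err)

# Materialising the iterator with list() makes concatenation work.
combined = list(zip(ids, payloads)) + [(2, "c")]
```

Wrapping each zip(...) in list() is enough to fix the concatenation; the later failure inside sorted is a separate Python 3 change.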

@graviraja (author)

graviraja commented Jun 21, 2019

@renatoviolin Wrap each zip(...) in list() so the pieces can be concatenated with +, and pass a key to sorted. I modified the compute_pred_dict function as follows:

    raw_results_by_id = [(int(res["unique_id"] + 1), dict(res)) for res in raw_results]

    # Cast example id to int32 for each example, similarly to the raw results.
    sess = tf.Session()
    all_candidates = candidates_dict.items()
    example_ids = tf.to_int32(np.array([int(k) for k, _ in all_candidates
                                        ])).eval(session=sess)
    examples_by_id = list(zip(example_ids, all_candidates))

    # Cast unique_id also to int32 for features.
    feature_ids = []
    features = []
    for f in dev_features:
        feature_ids.append(f.features.feature["unique_ids"].int64_list.value[0] + 1)
        features.append(dict(f.features.feature))
    feature_ids = tf.to_int32(np.array(feature_ids)).eval(session=sess)
    features_by_id = list(zip(feature_ids, features))

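    # Join examples with features and raw results.
    examples = []
    # Sort on the id only: in Python 3, equal ids would otherwise make sorted()
    # compare the dict payloads, which raises TypeError.
    merged = sorted(examples_by_id + raw_results_by_id + features_by_id, key=lambda x: x[0])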
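
Why the key matters: Python 3 compares tuples element by element, so two entries with the same id fall through to comparing their dict payloads, and dicts do not support <. The lists below are made-up stand-ins for examples_by_id, raw_results_by_id, and features_by_id; only the (id, payload) shape is taken from the code above.

```python
# Toy stand-ins with the same (id, payload) shape (made-up data).
examples_by_id = [(10, ("10", ["candidate"]))]
raw_results_by_id = [(11, {"start_logits": [0.1]})]
features_by_id = [(11, {"token_map": [0]})]
combined = examples_by_id + raw_results_by_id + features_by_id

# Without a key, the two entries with id 11 force a dict < dict comparison,
# which raises TypeError in Python 3.
error = None
try:
    sorted(combined)
except TypeError as err:
    error = err

# Sorting on the id alone never compares payloads.
merged = sorted(combined, key=lambda x: x[0])
```

Python's sort is stable, so entries that share an id keep their concatenation order (examples, then raw results, then features).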
