
SQuAD training issue #15

Closed

dkurt opened this issue Jun 28, 2021 · 1 comment


dkurt commented Jun 28, 2021

source: https://github.com/microsoft/fastformers/tree/main/examples/question-answering
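
(The command assumes $SQUAD_DIR already contains the SQuAD v1.1 files. For completeness, a minimal fetch sketch in Python; the download URLs are the public SQuAD-explorer mirrors, an assumption on my part rather than anything fastformers ships:)

import os
import urllib.request

# Hypothetical setup helper, not part of the fastformers repo: fetches the
# SQuAD v1.1 train/dev files that run_squad.py expects under $SQUAD_DIR.
squad_dir = os.environ.get("SQUAD_DIR", "/tmp/squad")
os.makedirs(squad_dir, exist_ok=True)
base = "https://rajpurkar.github.io/SQuAD-explorer/dataset"
for name in ("train-v1.1.json", "dev-v1.1.json"):
    urllib.request.urlretrieve(f"{base}/{name}", os.path.join(squad_dir, name))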

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --do_eval \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/

gives

06/28/2021 15:44:54 - WARNING - transformers.tokenization_utils_base -   Truncation was not explicitely activated but `max_length` is provided a specific value, please use `truncation=True` to explicitely truncate examples to max length. Defaulting to 'only_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you may want to check this is the right behavior.
06/28/2021 15:44:54 - WARNING - transformers.tokenization_utils_base -   Truncation was not explicitely activated but `max_length` is provided a specific value, please use `truncation=True` to explicitely truncate examples to max length. Defaulting to 'only_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you may want to check this is the right behavior.
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 142, in squad_convert_example_to_features
    return_token_type_ids=True,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1521, in encode_plus
    **kwargs,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 372, in _encode_plus
    verbose=verbose,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 578, in _prepare_for_model
    stride=stride,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 675, in truncate_sequences
    assert len(ids) > num_tokens_to_remove
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_squad.py", line 827, in <module>
    main()
  File "run_squad.py", line 765, in main
    train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False)
  File "run_squad.py", line 459, in load_and_cache_examples
    threads=args.threads,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 331, in squad_convert_examples_to_features
    disable=not tqdm_enabled,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 325, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
AssertionError
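
The assert that fires is `assert len(ids) > num_tokens_to_remove` in truncate_sequences. My reading of the traceback (an assumption, not confirmed against the environment fastformers pins): squad_convert_example_to_features encodes the (question, context) pair with max_length set but no explicit truncation strategy, so this transformers version falls back to 'only_first', exactly as the warnings above say. 'only_first' trims all overflow tokens from the first sequence, i.e. the question, and a SQuAD question is far shorter than the overflow of a long paragraph, so the assertion fails. A minimal sketch of the encoding the SQuAD pipeline intends, using 'only_second' so the context is truncated instead:

from transformers import BertTokenizer

# Sketch under the assumption of a transformers version with the 3.x-style
# `truncation` argument; 'only_second' drops overflow tokens from the context
# (second sequence) rather than from the short question.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
question = "What is the capital of France?"
context = " ".join(["word"] * 600)  # stand-in for a long SQuAD paragraph

enc = tokenizer.encode_plus(
    question,
    context,
    max_length=384,
    truncation="only_second",
    return_token_type_ids=True,
)
print(len(enc["input_ids"]))  # 384

Passing an explicit strategy also silences the "Truncation was not explicitely activated" warnings above.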

/cc @ykim362


ykim362 commented Jul 30, 2021

Hi @dkurt. Thanks for your interest in fastformers.

This repository is meant as guidance, showing how to use the models presented in the fastformers paper, so we don't fully support the broader set of tasks from the transformers library.
Please feel free to modify and adapt the methods in the examples directory for your task.

To avoid confusion, the codebase has now been cleaned up and the unsupported examples have been removed from the repository.

ykim362 closed this as completed Jul 30, 2021