
SQuAD training issue #15

Closed

dkurt opened this issue Jun 28, 2021 · 1 comment


dkurt commented Jun 28, 2021

source: https://github.com/microsoft/fastformers/tree/main/examples/question-answering
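
(The command assumes $SQUAD_DIR already contains the SQuAD v1.1 files. For completeness, a minimal fetch sketch in Python; the download URLs are the public SQuAD-explorer mirrors, an assumption on my part rather than anything fastformers ships:)

import os
import urllib.request

# Hypothetical setup helper, not part of the fastformers repo: fetches the
# SQuAD v1.1 train/dev files that run_squad.py expects under $SQUAD_DIR.
squad_dir = os.environ.get("SQUAD_DIR", "/tmp/squad")
os.makedirs(squad_dir, exist_ok=True)
base = "https://rajpurkar.github.io/SQuAD-explorer/dataset"
for name in ("train-v1.1.json", "dev-v1.1.json"):
    urllib.request.urlretrieve(f"{base}/{name}", os.path.join(squad_dir, name))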

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --do_eval \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/

gives

06/28/2021 15:44:54 - WARNING - transformers.tokenization_utils_base -   Truncation was not explicitely activated but `max_length` is provided a specific value, please use `truncation=True` to explicitely truncate examples to max length. Defaulting to 'only_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you may want to check this is the right behavior.
06/28/2021 15:44:54 - WARNING - transformers.tokenization_utils_base -   Truncation was not explicitely activated but `max_length` is provided a specific value, please use `truncation=True` to explicitely truncate examples to max length. Defaulting to 'only_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you may want to check this is the right behavior.
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 142, in squad_convert_example_to_features
    return_token_type_ids=True,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1521, in encode_plus
    **kwargs,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 372, in _encode_plus
    verbose=verbose,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 578, in _prepare_for_model
    stride=stride,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 675, in truncate_sequences
    assert len(ids) > num_tokens_to_remove
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run_squad.py", line 827, in <module>
    main()
  File "run_squad.py", line 765, in main
    train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False)
  File "run_squad.py", line 459, in load_and_cache_examples
    threads=args.threads,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/transformers/data/processors/squad.py", line 331, in squad_convert_examples_to_features
    disable=not tqdm_enabled,
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 325, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/home/sdp/.pyenv/versions/3.7.10/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
AssertionError
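
The assert that fires is `assert len(ids) > num_tokens_to_remove` in truncate_sequences. My reading of the traceback (an assumption, not confirmed against the environment fastformers pins): squad_convert_example_to_features encodes the (question, context) pair with max_length set but no explicit truncation strategy, so this transformers version falls back to 'only_first', exactly as the warnings above say. 'only_first' trims all overflow tokens from the first sequence, i.e. the question, and a SQuAD question is far shorter than the overflow of a long paragraph, so the assertion fails. A minimal sketch of the encoding the SQuAD pipeline intends, using 'only_second' so the context is truncated instead:

from transformers import BertTokenizer

# Sketch under the assumption of a transformers version with the 3.x-style
# `truncation` argument; 'only_second' drops overflow tokens from the context
# (second sequence) rather than from the short question.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
question = "What is the capital of France?"
context = " ".join(["word"] * 600)  # stand-in for a long SQuAD paragraph

enc = tokenizer.encode_plus(
    question,
    context,
    max_length=384,
    truncation="only_second",
    return_token_type_ids=True,
)
print(len(enc["input_ids"]))  # 384

Passing an explicit strategy also silences the "Truncation was not explicitely activated" warnings above.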

/cc @ykim362


ykim362 commented Jul 30, 2021

Hi @dkurt. Thanks for your interest in fastformers.

This repository is meant as guidance, showing how to use the models presented in the fastformers paper, so we don't fully support the broader set of tasks from the transformers library.
Please feel free to modify and adapt the methods in the examples directory for your task.

To avoid confusion, the codebase has now been cleaned up and the unsupported examples have been removed from the repository.

ykim362 closed this as completed Jul 30, 2021