ModuleNotFoundError: No module named 'fairseq.data.data_utils_fast' #2133

Closed
cyk1337 opened this issue May 14, 2020 · 3 comments

cyk1337 commented May 14, 2020

Hi, I was trying to run my code directly against a source checkout of the repo. Everything works until execution reaches the import inside the batch_by_size function, at line 220 in fairseq/data/data_utils.py:

from fairseq.data.data_utils_fast import batch_by_size_fast

The error occurs in this block:

def batch_by_size(
    indices, num_tokens_fn, max_tokens=None, max_sentences=None,
    required_batch_size_multiple=1,
):
    """
    Yield mini-batches of indices bucketed by size. Batches may contain
    sequences of different lengths.

    Args:
        indices (List[int]): ordered list of dataset indices
        num_tokens_fn (callable): function that returns the number of tokens at
            a given index
        max_tokens (int, optional): max number of tokens in each batch
            (default: None).
        max_sentences (int, optional): max number of sentences in each
            batch (default: None).
        required_batch_size_multiple (int, optional): require batch size to
            be a multiple of N (default: 1).
    """
    try:
        from fairseq.data.data_utils_fast import batch_by_size_fast
    except ImportError:
        raise ImportError(
            'Please build Cython components with: `pip install --editable .` '
            'or `python setup.py build_ext --inplace`'
        )

    max_tokens = max_tokens if max_tokens is not None else -1
    max_sentences = max_sentences if max_sentences is not None else -1
    bsz_mult = required_batch_size_multiple

    if isinstance(indices, types.GeneratorType):
        indices = np.fromiter(indices, dtype=np.int64, count=-1)

    return batch_by_size_fast(indices, num_tokens_fn, max_tokens, max_sentences, bsz_mult)


The reported error is as follows:

2020-05-15 01:02:23 | INFO | fairseq_cli.train | model default-captioning-arch, criterion LabelSmoothedCrossEntropyCriterion
2020-05-15 01:02:23 | INFO | fairseq_cli.train | num. model params: 45776896 (num. trained: 45776896)
2020-05-15 01:02:24 | INFO | fairseq_cli.train | training on 4 GPUs
2020-05-15 01:02:24 | INFO | fairseq_cli.train | max tokens per GPU = 4096 and max sentences per GPU = None
2020-05-15 01:02:24 | INFO | fairseq.trainer | no existing checkpoint found .checkpoints/checkpoint_last.pt
2020-05-15 01:02:24 | INFO | fairseq.trainer | loading train data for epoch 1
2020-05-15 01:02:24 | INFO | fairseq.data.data_utils | loaded 566747 examples from: output/train-captions.en
<!-- everything is fine up to this point -->
Traceback (most recent call last):
  File "/home/c/Cpt/fair/main.py", line 37, in <module>
    train()
  File "/home/c/Cpt/fair/main.py", line 33, in train
    cli_main()
  File "/home/c/Cpt/fair/fairseq_cli/train.py", line 355, in cli_main
    nprocs=args.distributed_world_size,
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/c/Cpt/fair/fairseq/data/data_utils.py", line 220, in batch_by_size
    from fairseq.data.data_utils_fast import batch_by_size_fast
ModuleNotFoundError: No module named 'fairseq.data.data_utils_fast'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/c/miniconda3/envs/fa/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/c/Cpt/fair/fairseq_cli/train.py", line 324, in distributed_main
    main(args, init_distributed=True)
  File "/home/c/Cpt/fair/fairseq_cli/train.py", line 104, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "/home/c/Cpt/fair/fairseq/checkpoint_utils.py", line 157, in load_checkpoint
    epoch=1, load_dataset=True, **passthrough_args
  File "/home/c/Cpt/fair/fairseq/trainer.py", line 296, in get_train_iterator
    epoch=epoch
  File "/home/c/Cpt/fair/fairseq/tasks/fairseq_task.py", line 181, in get_batch_iterator
    required_batch_size_multiple=required_batch_size_multiple,
  File "/home/c/Cpt/fair/fairseq/data/data_utils.py", line 223, in batch_by_size
    'Please build Cython components with: `pip install --editable .` '
ImportError: Please build Cython components with: `pip install --editable .` or `python setup.py build_ext --inplace`

What have I tried?

  1. Followed the suggestion to install Cython, trying several sources (building from source, pip, conda). None of them worked.
  2. Found that data/data_utils_fast.pyx is a Cython file, so I tried:
import pyximport
pyximport.install()

This also failed :(

This looks similar to Issue 1376.

My environment

  • fairseq Version: master
  • PyTorch Version: 1.4
  • OS: Ubuntu 16.04 LTS
  • How you installed fairseq: from source (cloned, not installed).
  • Build command you used (if compiling from source): none, nothing was compiled.
  • Python version: 3.7
  • CUDA/cuDNN version: 10.1
  • GPU models and configuration: nothing unusual; the same code runs correctly against the installed fairseq library.
  • Any other relevant information: I just want to run directly from the source code without manually building it.
myleott (Contributor) commented May 14, 2020

You cloned fairseq master right? Then you should also run python setup.py build_ext --inplace from the root fairseq directory to build the Cython components.
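
To confirm whether the build step above actually produced the extension, a quick check along these lines may help. This is only a sketch: the helper name `extension_is_built` is made up for illustration and is not part of fairseq. It asks the import system whether it can locate the compiled module. Note that `find_spec` imports the parent packages (`fairseq`, `fairseq.data`) as a side effect.

```python
import importlib.util


def extension_is_built(module_name: str) -> bool:
    """Return True if the import system can locate `module_name`.

    Hypothetical helper for illustration; not a fairseq API.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `fairseq` itself) cannot be imported.
        return False


if __name__ == "__main__":
    name = "fairseq.data.data_utils_fast"
    if extension_is_built(name):
        print(f"{name} is available")
    else:
        print(
            f"{name} is missing; run `python setup.py build_ext --inplace` "
            "from the root fairseq directory"
        )
```

Run it with the same interpreter and from the same working directory you use for training, since both affect which `fairseq` package gets picked up.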

cyk1337 (Author) commented May 15, 2020

> You cloned fairseq master right? Then you should also run python setup.py build_ext --inplace from the root fairseq directory to build the Cython components.

Hi Myle, yes, it works perfectly now! I appreciate your response!

VJJJJJJ1 commented

> You cloned fairseq master right? Then you should also run python setup.py build_ext --inplace from the root fairseq directory to build the Cython components.

I tried, but it didn't work
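
The comment above gives no details, so the cause here is unknown. One thing worth ruling out (purely an assumption, not a confirmed diagnosis) is that the build ran under a different Python than the one used for training, in which case no loadable binary for the current interpreter sits next to data_utils_fast.pyx. A minimal sketch of that check follows; `compiled_candidates` is a hypothetical helper, not a fairseq function.

```python
import importlib.machinery
from pathlib import Path


def compiled_candidates(package_dir, stem):
    """List files in `package_dir` that the current interpreter could load
    as the extension module `stem` (hypothetical helper for illustration)."""
    package_dir = Path(package_dir)
    return [
        package_dir / (stem + suffix)
        # EXTENSION_SUFFIXES holds the binary suffixes this interpreter accepts.
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
        if (package_dir / (stem + suffix)).is_file()
    ]


if __name__ == "__main__":
    # Assumes the script is run from the root of the fairseq checkout.
    found = compiled_candidates("fairseq/data", "data_utils_fast")
    if found:
        for path in found:
            print("found:", path)
    else:
        print("no compiled data_utils_fast for this interpreter in fairseq/data")
```

An empty result after running the build command suggests the build failed or targeted another interpreter; in that case the full output of the build command is the most useful thing to post.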
