This repository was archived by the owner on Nov 1, 2024. It is now read-only.

ValueError: invalid literal for int() with base 10: '' #54

@zjj1999

When I run the preprocessing pipeline test, it fails with:

========================================================================= FAILURES =========================================================================
_____________________________________________________ test_run_pipeline_locally_3_langs_with_comments ______________________________________________________
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/zhangjiajie/miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/dataset.py", line 87, in process
    nlines, size_gb = job.result()
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/utils.py", line 263, in result
    self._result = self.func(*self.args, **self.kwargs)
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/dataset.py", line 52, in split_train_test_valid
    n_lines = get_nlines(all_tok)
  File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/utils.py", line 119, in get_nlines
    return int(process.stdout.decode().split(' ')[0])
ValueError: invalid literal for int() with base 10: ''
"""

The above exception was the direct cause of the following exception:

    def test_run_pipeline_locally_3_langs_with_comments():
        copy_and_clean_folder()
>       preprocess(root, lang1, lang2, keep_comments, local=True,
                   lang3=lang3, test_size=10, size_gb=0)

preprocessing/test_preprocess.py:65: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
preprocessing/preprocess.py:64: in preprocess
    dataset.process_languages(
preprocessing/src/dataset.py:166: in process_languages
    print(type(jobs[i].result()))
../../../miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/_base.py:446: in result
    return self.__get_result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = None

    def __get_result(self):
        if self._exception:
            try:
>               raise self._exception
E               ValueError: invalid literal for int() with base 10: ''

../../../miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/_base.py:391: ValueError
------------------------------------------------------------------- Captured stdout call -------------------------------------------------------------------
/Users/zhangjiajie/Documents/code/TransCoder-main/data/test_dataset/cpp-java-python.with_comments.XLM-syml
java: process ...
java: tokenizing 2 json files ...
cpp: process ...
cpp: tokenizing 2 json files ...
python: process ...
python: tokenizing 2 json files ...
------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------
100%|██████████| 50/50 [00:02<00:00, 24.21it/s]
100%|██████████| 50/50 [00:03<00:00, 16.08it/s]]
100%|██████████| 100/100 [00:03<00:00, 26.65it/s]
100%|██████████| 50/50 [00:02<00:00, 22.24it/s]
100%|██████████| 50/50 [00:02<00:00, 24.27it/s]]
100%|██████████| 150/150 [00:02<00:00, 72.05it/s] 

Can someone help me?
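The last frame of the remote traceback fails on `int(process.stdout.decode().split(' ')[0])` in `get_nlines`. A likely cause, assuming `get_nlines` shells out to `wc -l` (only its final line appears in the traceback, so this is an inference) and given that the `/Users/...` paths point to macOS: BSD `wc` right-aligns the count with leading spaces, so `split(' ')[0]` is the empty string and `int('')` raises exactly this `ValueError`. The sketch below reproduces that with hard-coded sample output and shows a whitespace-tolerant parse. `parse_wc_count` and `count_lines` are hypothetical helpers for illustration, not part of the TransCoder code.

```python
from pathlib import Path


def parse_wc_count(stdout: bytes) -> int:
    """Parse the line count from `wc -l <file>` output (hypothetical helper).

    GNU wc prints "42 /path/file", while BSD/macOS wc pads the count:
    "      42 /path/file". Calling split(' ') on the padded form yields
    leading empty strings, so [0] is '' and int('') raises ValueError.
    str.split() with no argument splits on runs of whitespace and drops
    leading whitespace, so it handles both formats.
    """
    fields = stdout.decode().split()
    if not fields:
        raise ValueError(f"unexpected empty `wc -l` output: {stdout!r}")
    return int(fields[0])


def count_lines(path: Path) -> int:
    """Hypothetical pure-Python alternative that avoids calling wc at all.

    Counts newline bytes in 1 MiB chunks, which is what `wc -l` counts
    (a final line without a trailing newline is not counted).
    """
    n = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            n += chunk.count(b"\n")
    return n


if __name__ == "__main__":
    gnu_style = b"42 /tmp/train.tok\n"
    bsd_style = b"      42 /tmp/train.tok\n"
    # The original expression fails on BSD-style (macOS) output:
    try:
        int(bsd_style.decode().split(' ')[0])
    except ValueError as e:
        print("split(' '):", e)  # prints: split(' '): invalid literal for int() with base 10: ''
    # The whitespace-tolerant parse accepts both formats:
    print(parse_wc_count(gnu_style), parse_wc_count(bsd_style))  # prints: 42 42
```

Under that assumption, replacing `.split(' ')` with `.split()` in `get_nlines` (line 119 of `preprocessing/src/utils.py` per the traceback) would be the minimal change; `count_lines` instead removes the dependency on the platform's `wc` output format, at the cost of reading the file in Python.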
