This repository was archived by the owner on Nov 1, 2024. It is now read-only.
ValueError: invalid literal for int() with base 10: '' #54
Open
Description
When I run the preprocessing pipeline, I get the following error:
========================================================================= FAILURES =========================================================================
_____________________________________________________ test_run_pipeline_locally_3_langs_with_comments ______________________________________________________
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/zhangjiajie/miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/dataset.py", line 87, in process
nlines, size_gb = job.result()
File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/utils.py", line 263, in result
self._result = self.func(*self.args, **self.kwargs)
File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/dataset.py", line 52, in split_train_test_valid
n_lines = get_nlines(all_tok)
File "/Users/zhangjiajie/Documents/code/TransCoder-main/preprocessing/src/utils.py", line 119, in get_nlines
return int(process.stdout.decode().split(' ')[0])
ValueError: invalid literal for int() with base 10: ''
"""
The above exception was the direct cause of the following exception:
def test_run_pipeline_locally_3_langs_with_comments():
copy_and_clean_folder()
> preprocess(root, lang1, lang2, keep_comments, local=True,
lang3=lang3, test_size=10, size_gb=0)
preprocessing/test_preprocess.py:65:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
preprocessing/preprocess.py:64: in preprocess
dataset.process_languages(
preprocessing/src/dataset.py:166: in process_languages
print(type(jobs[i].result()))
../../../miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/_base.py:446: in result
return self.__get_result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = None
def __get_result(self):
if self._exception:
try:
> raise self._exception
E ValueError: invalid literal for int() with base 10: ''
../../../miniforge3/envs/mytorch/lib/python3.9/concurrent/futures/_base.py:391: ValueError
------------------------------------------------------------------- Captured stdout call -------------------------------------------------------------------
/Users/zhangjiajie/Documents/code/TransCoder-main/data/test_dataset/cpp-java-python.with_comments.XLM-syml
java: process ...
java: tokenizing 2 json files ...
cpp: process ...
cpp: tokenizing 2 json files ...
python: process ...
python: tokenizing 2 json files ...
------------------------------------------------------------------- Captured stderr call -------------------------------------------------------------------
100%|██████████| 50/50 [00:02<00:00, 24.21it/s]
100%|██████████| 50/50 [00:03<00:00, 16.08it/s]]
100%|██████████| 100/100 [00:03<00:00, 26.65it/s]
100%|██████████| 50/50 [00:02<00:00, 22.24it/s]
100%|██████████| 50/50 [00:02<00:00, 24.27it/s]]
100%|██████████| 150/150 [00:02<00:00, 72.05it/s]
Can someone help me?
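The failing line in `get_nlines` (utils.py, line 119) converts the first token of `process.stdout.decode().split(' ')` to an integer, so the error means that first token was an empty string. The sketch below assumes the subprocess is a `wc -l <file>` call, which the traceback does not show. If that assumption holds, one plausible cause on macOS is that BSD `wc` pads the count with leading spaces. Splitting on a single space then yields `''` as the first token, whereas `str.split()` with no argument skips leading whitespace. Empty output from a failed command would also produce `''`. The helpers `parse_wc_count_fragile` and `parse_wc_count` are hypothetical names used only for illustration:

```python
def parse_wc_count_fragile(stdout: str) -> int:
    # Same expression as utils.py line 119: split on a single space, take token 0.
    # Leading spaces produce empty tokens, so token 0 can be ''.
    return int(stdout.split(' ')[0])


def parse_wc_count(stdout: str) -> int:
    # Hypothetical robust variant: split() with no argument splits on runs of
    # whitespace and ignores leading/trailing whitespace.
    tokens = stdout.split()
    if not tokens:
        # Empty stdout usually means the line-count command itself failed.
        raise ValueError("line-count command produced no output")
    return int(tokens[0])


gnu_style = "42 train.tok\n"        # assumed GNU coreutils style: no padding
bsd_style = "      42 train.tok\n"  # assumed BSD/macOS style: right-aligned count

print(parse_wc_count(gnu_style))
print(parse_wc_count(bsd_style))
try:
    parse_wc_count_fragile(bsd_style)
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: ''
```

If this is the cause, changing `.split(' ')` to `.split()` in `get_nlines` may be enough. Checking `process.returncode` would separately catch the case where the command printed nothing because it failed.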