Before Reporting
I have pulled the latest code from the main branch and run it again, and the bug still exists.
I have read the README carefully and no error occurred during the installation process. (Otherwise, we recommend asking a question using the Question template.)
Search before reporting
I have searched the Data-Juicer issues and found no similar bugs.
OS
Ubuntu
Installation Method
pip
Data-Juicer Version
v0.1.2
Python Version
3.10
Describe the bug
After running for a while, the video_split_by_duration_mapper operator fails with RuntimeError: can't start new thread.
To Reproduce
Run the video_split_by_duration_mapper operator. After running for a while it fails with RuntimeError: can't start new thread, even though the machine has ample CPU resources.
Configs
process:
  - video_split_by_duration_mapper:
      split_duration: 10
      min_last_split_duration: 3
      keep_original_sample: false
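For readers unfamiliar with these parameters, here is a minimal sketch of how they are assumed to interact: the video is cut into consecutive segments of `split_duration` seconds, and a trailing segment shorter than `min_last_split_duration` is dropped. The function name `split_points` is hypothetical and this is not Data-Juicer's implementation. In particular, the handling of videos shorter than one segment may differ in the real operator, and `keep_original_sample` is not modeled.

```python
def split_points(total_duration, split_duration=10.0, min_last_split_duration=3.0):
    """Return (start, end) pairs in seconds for a video of total_duration seconds.

    Assumption: a final segment shorter than min_last_split_duration is discarded.
    """
    segments = []
    start = 0.0
    while start < total_duration:
        end = min(start + split_duration, total_duration)
        is_last = end >= total_duration
        # Drop a too-short trailing segment instead of emitting it.
        if is_last and (end - start) < min_last_split_duration:
            break
        segments.append((start, end))
        start = end
    return segments
```

Under these assumptions, with the settings in this report a 25-second video would yield three segments (0-10, 10-20, 20-25), while a 22-second video would yield two, because the 2-second remainder is below the 3-second minimum.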
Logs
2024-07-23 20:55:27 | INFO | data_juicer.core.executor:54 - Setting up data formatter...
2024-07-23 20:55:27 | INFO | data_juicer.core.executor:76 - Preparing exporter...
2024-07-23 20:55:27 | INFO | data_juicer.core.executor:153 - Loading dataset from data formatter...
2024-07-23 20:55:27 | INFO | data_juicer.format.formatter:185 - Unifying the input dataset formats...
2024-07-23 20:55:27 | INFO | data_juicer.format.formatter:200 - There are 50000 sample(s) in the original dataset.
Filter (num_proc=4): 100%|##########| 50000/50000 [00:20<00:00, 2433.82 examples/s]
2024-07-23 20:55:48 | INFO | data_juicer.format.formatter:214 - 50000 samples left after filtering empty text.
2024-07-23 20:55:48 | INFO | data_juicer.format.formatter:237 - Converting relative paths in the dataset to their absolute version. (Based on the directory of input dataset file)
Map (num_proc=4): 100%|##########| 50000/50000 [00:18<00:00, 2691.31 examples/s]
2024-07-23 20:56:07 | INFO | data_juicer.format.mixture_formatter:137 - sampled 50000 from 50000
2024-07-23 20:56:07 | INFO | data_juicer.format.mixture_formatter:143 - There are 50000 in final dataset
2024-07-23 20:56:07 | INFO | data_juicer.core.executor:159 - Preparing process operators...
2024-07-23 20:56:07 | INFO | data_juicer.core.executor:166 - Processing data...
video_split_by_duration_mapper_process (num_proc=4): 3%|3 | 1711/50000 [42:31<28:22:59, 2.12s/ examples]
moov atom not found
video_split_by_duration_mapper_process (num_proc=4): 11%|# | 5489/50000 [5:39:39<56:01:58, 4.53s/ examples]
Exception in thread Thread-1 (accepter):
Traceback (most recent call last):
  File "/root/anaconda3/envs/sora/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/envs/sora/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/root/anaconda3/envs/sora/lib/python3.10/site-packages/multiprocess/managers.py", line 193, in accepter
    t.start()
  File "/root/anaconda3/envs/sora/lib/python3.10/threading.py", line 935, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
Screenshots
Additional
No response