ScanNet preprocessing error #1

Closed
Gilgamesh666666 opened this issue Oct 22, 2021 · 3 comments

@Gilgamesh666666

Hi,

Thank you for the excellent work. I ran the ScanNet preprocessing code according to the instructions, but it raised an error and I couldn't find the bug (T-T). I would very much appreciate any hints. Thanks.
Here is the terminal output:
`Running stage 'scannet':

poetry run python mix3d/datasets/preprocessing/scannet_preprocessing.py preprocess --git_repo=./data/raw/scannet/ScanNet --data_dir=/media/zebai/T7/Datasets/ScannetV2/ScanNet --save_dir=./data/processed/scannet
2021-10-22 11:15:24.771 | INFO | mix3d.datasets.preprocessing.base_preprocessing:preprocess:45 - Tasks for train: 1201
[Parallel(n_jobs=12)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=12)]: Done 1 tasks | elapsed: 4.7s
[Parallel(n_jobs=12)]: Done 8 tasks | elapsed: 6.5s
[Parallel(n_jobs=12)]: Done 17 tasks | elapsed: 14.1s
[Parallel(n_jobs=12)]: Done 26 tasks | elapsed: 18.1s
[Parallel(n_jobs=12)]: Done 37 tasks | elapsed: 24.1s
[Parallel(n_jobs=12)]: Done 48 tasks | elapsed: 30.2s
[Parallel(n_jobs=12)]: Done 61 tasks | elapsed: 37.2s
[Parallel(n_jobs=12)]: Done 74 tasks | elapsed: 44.2s
[Parallel(n_jobs=12)]: Done 89 tasks | elapsed: 51.4s
[Parallel(n_jobs=12)]: Done 104 tasks | elapsed: 1.1min
[Parallel(n_jobs=12)]: Done 121 tasks | elapsed: 1.2min
[Parallel(n_jobs=12)]: Done 138 tasks | elapsed: 1.3min
[Parallel(n_jobs=12)]: Done 157 tasks | elapsed: 1.4min
[Parallel(n_jobs=12)]: Done 176 tasks | elapsed: 1.6min
[Parallel(n_jobs=12)]: Done 197 tasks | elapsed: 1.8min
[Parallel(n_jobs=12)]: Done 218 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 219 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 220 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 221 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 222 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 223 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 224 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 225 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 226 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 227 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 228 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 229 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 230 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 231 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 232 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 233 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 234 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 235 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 236 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 237 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 238 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 239 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 240 tasks | elapsed: 2.0min
[Parallel(n_jobs=12)]: Done 241 tasks | elapsed: 2.0min
2021-10-22 11:17:25.204 | ERROR | fire.core:_CallAndUpdateTrace:681 - An error has been caught in function '_CallAndUpdateTrace', process 'MainProcess' (136997), thread 'MainThread' (139858255038272):
Traceback (most recent call last):

File "mix3d/datasets/preprocessing/scannet_preprocessing.py", line 212, in <module>
Fire(ScannetPreprocessing)
│ └ <class '__main__.ScannetPreprocessing'>
└ <function Fire at 0x7f3325018d30>

File "/home/zebai/.conda/envs/predator/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
│ │ │ │ │ └ 'scannet_preprocessing.py'
│ │ │ │ └ {}
│ │ │ └ Namespace(completion=None, help=False, interactive=False, separator='-', trace=False, verbose=False)
│ │ └ ['preprocess', '--git_repo=./data/raw/scannet/ScanNet', '--data_dir=/media/zebai/T7/Datasets/ScannetV2/ScanNet', '--save_dir=...
│ └ <class '__main__.ScannetPreprocessing'>
└ <function _Fire at 0x7f3324b12c10>
File "/home/zebai/.conda/envs/predator/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
│ └ <function _CallAndUpdateTrace at 0x7f3324b12d30>
└ <bound method BasePreprocessing.preprocess of <__main__.ScannetPreprocessing object at 0x7f3323679fa0>>

File "/home/zebai/.conda/envs/predator/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
│ │ └ {}
│ └ []
└ <bound method BasePreprocessing.preprocess of <__main__.ScannetPreprocessing object at 0x7f3323679fa0>>

File "/home/zebai/mix3d/mix3d/datasets/preprocessing/base_preprocessing.py", line 46, in preprocess
parallel_results = Parallel(n_jobs=self.n_jobs, verbose=10)(
│ │ └ 12
│ └ <__main__.ScannetPreprocessing object at 0x7f3323679fa0>
└ <class 'joblib.parallel.Parallel'>

File "/home/zebai/.conda/envs/predator/lib/python3.8/site-packages/joblib/parallel.py", line 1054, in __call__
self.retrieve()
│ └ <function Parallel.retrieve at 0x7f332493fa60>
└ Parallel(n_jobs=12)
File "/home/zebai/.conda/envs/predator/lib/python3.8/site-packages/joblib/parallel.py", line 933, in retrieve
self._output.extend(job.get(timeout=self.timeout))
│ │ │ │ │ │ └ None
│ │ │ │ │ └ Parallel(n_jobs=12)
│ │ │ │ └ functools.partial(<function LokyBackend.wrap_future_result at 0x7f332493c670>, <Future at 0x7f33234898b0 state=finished raise...
│ │ │ └ <Future at 0x7f33234898b0 state=finished raised TerminatedWorkerError>
│ │ └ <method 'extend' of 'list' objects>
│ └ [{'filepath': 'data/processed/scannet/train/0000_00.npy', 'scene': 0, 'sub_scene': 0, 'raw_filepath': '/media/zebai/T7/Datase...
└ Parallel(n_jobs=12)
File "/home/zebai/.conda/envs/predator/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
return future.result(timeout=timeout)
│ │ └ None
│ └ <function Future.result at 0x7f3324d9d1f0>
└ <Future at 0x7f33234898b0 state=finished raised TerminatedWorkerError>
File "/home/zebai/.conda/envs/predator/lib/python3.8/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
└ <Future at 0x7f33234898b0 state=finished raised TerminatedWorkerError>
File "/home/zebai/.conda/envs/predator/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
raise self._exception
│ └ TerminatedWorkerError('A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmen...
└ <Future at 0x7f33234898b0 state=finished raised TerminatedWorkerError>

joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGSEGV(-11)}
ERROR: failed to reproduce 'dvc.yaml': output 'data/processed/scannet/color_mean_std.yaml' does not exist`
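
The `SIGSEGV(-11)` in the log above follows the POSIX convention that a process killed by a signal reports the negated signal number as its exit code. A minimal sketch of decoding such codes with the standard `signal` module is below; `describe_exit_code` is a hypothetical helper written for illustration, not part of mix3d or joblib.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Turn a process exit code into a short human-readable description.

    Hypothetical helper: on POSIX, a negative code -N means the process
    was killed by signal number N.
    """
    if code < 0:
        try:
            return f"killed by {signal.Signals(-code).name}"
        except ValueError:
            # The number does not map to a signal known on this platform.
            return f"killed by unknown signal {-code}"
    if code == 0:
        return "exited normally"
    return f"exited with status {code}"
```

Calling `describe_exit_code(-11)` on Linux names the signal behind the worker exit code shown in the log.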

@kumuji
Owner

kumuji commented Oct 22, 2021

Hi!
I can't really see where the error is. It seems like some of the workers just exited unexpectedly. Could you try running it in sequential mode?

Run it like this: scripts/preprocess_scannet.bash -r. The -r argument processes the data sequentially.
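
If sequential mode still crashes, one way to narrow the problem down is to process each input in its own interpreter, so a crash on one scene cannot take the whole run with it. The following is only a sketch under assumptions: `find_crashing_inputs` and `DEMO_WORKER` are hypothetical names, and the demo worker simulates a segmentation fault rather than calling any mix3d code.

```python
import subprocess
import sys
from typing import Dict, Iterable

# Hypothetical stand-in for a per-scene preprocessing step: it kills itself
# with SIGSEGV when given the input "bad" and exits cleanly otherwise.
DEMO_WORKER = (
    "import os, signal, sys\n"
    "if sys.argv[1] == 'bad':\n"
    "    os.kill(os.getpid(), signal.SIGSEGV)\n"
)


def find_crashing_inputs(worker_code: str, inputs: Iterable[str]) -> Dict[str, int]:
    """Run worker_code once per input in a fresh interpreter.

    Returns a mapping from each input whose process ended with a nonzero
    exit code to that exit code.
    """
    crashed = {}
    for item in inputs:
        # The input is passed to the child process as sys.argv[1].
        completed = subprocess.run([sys.executable, "-c", worker_code, item])
        if completed.returncode != 0:
            crashed[item] = completed.returncode
    return crashed
```

For example, `find_crashing_inputs(DEMO_WORKER, ["scene_a", "bad", "scene_c"])` reports only `"bad"`. Starting one process per input is slow, but it turns an anonymous worker crash into a named input that can be inspected on its own.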

@Gilgamesh666666
Author

Hi,

Thank you for the reply! I ran the command, but it raises a new error, as follows:
scripts/preprocess_scannet.bash: line 35: 34642 Segmentation fault (core dumped) poetry run python mix3d/datasets/preprocessing/scannet_preprocessing.py preprocess_sequential --git_repo="$GIT_REPO" --data_dir="$DATA_DIR" --save_dir="$SAVE_DIR"
I can't figure out how to fix it :(

@kumuji
Owner

kumuji commented Oct 27, 2021

Sadly, I need more information to help you. Could you find the exact line that causes the problem?
For example, with pudb: https://github.com/inducer/pudb
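
As an alternative to an interactive debugger (this is not something suggested in the thread), Python's built-in `faulthandler` module can print the Python-level traceback at the moment a segmentation fault occurs. The sketch below demonstrates it on a hypothetical script that simulates a crash; `CRASHING_SCRIPT`, `load_scene`, and `run_with_faulthandler` are made-up names, not mix3d code.

```python
import subprocess
import sys

# Hypothetical stand-in for the crashing preprocessing step: the function
# sends SIGSEGV to its own process to simulate a segmentation fault.
CRASHING_SCRIPT = """
import os, signal

def load_scene():
    os.kill(os.getpid(), signal.SIGSEGV)

load_scene()
"""


def run_with_faulthandler(script: str) -> subprocess.CompletedProcess:
    """Run script in a child interpreter with faulthandler enabled.

    The -X faulthandler option makes Python dump the traceback of the
    running thread to stderr when a fatal signal such as SIGSEGV arrives.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script],
        capture_output=True,
        text=True,
    )
```

Printing `run_with_faulthandler(CRASHING_SCRIPT).stderr` shows which Python function was executing when the process died. The same option can presumably be added to the preprocessing command itself, for instance as `poetry run python -X faulthandler ...`, although that was not tried in this thread.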

@kumuji kumuji closed this as completed Oct 27, 2021
@kumuji kumuji reopened this Oct 27, 2021
@kumuji kumuji closed this as completed Nov 18, 2021