train with only coco problem #11

Closed
zhaishengfu opened this issue Nov 8, 2021 · 4 comments

Comments

@zhaishengfu

zhaishengfu commented Nov 8, 2021

Hello, I want to train using only COCO. I changed dataset.yaml and prohmr.yaml as follows:

COCO-TRAIN-2014:
    TYPE: ImageDataset
    DATASET_FILE: data/datasets/coco_2014_train_spin.npz
    IMG_DIR: C:/Users/DM/Documents/Code/common_data/train_datasets/COCO/train2014
COCO-VAL:
    TYPE: ImageDataset
    DATASET_FILE: data/datasets/coco_val.npz
    IMG_DIR: C:/Users/DM/Documents/Code/common_data/train_datasets/COCO/val2017
CMU-MOCAP:
    DATASET_FILE: data/datasets/cmu_mocap.npz
DATASETS:
  TRAIN: 
    COCO-TRAIN-2014:
      WEIGHT: 1.0
  VAL:
    COCO-VAL:
      WEIGHT: 1.0
  MOCAP: CMU-MOCAP

This produces the following error:
TypeError: cannot serialize '_io.BufferedReader' object
which is raised from
File "train/train_prohmr.py", line 63, in <module>
trainer.fit(model, datamodule=data_module)

My environment is Windows 10.
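
For readers hitting the same message: this TypeError typically means that an object holding an open binary file handle was pickled, which Python's multiprocessing does on Windows when it starts a worker process. The sketch below reproduces the failure with the standard library only. `HandleHoldingDataset` is a hypothetical stand-in, and whether the datasets in this repository actually keep a handle open is an assumption, not something confirmed in this thread.

```python
import os
import pickle
import tempfile


class HandleHoldingDataset:
    """Hypothetical dataset that stores an open file handle (illustration only)."""

    def __init__(self, path):
        # The open BufferedReader becomes part of the object's state.
        self.handle = open(path, "rb")


# Create a small placeholder file to open.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"placeholder bytes")

dataset = HandleHoldingDataset(tmp.name)
try:
    # Starting a worker under the spawn method pickles objects like this one.
    pickle.dumps(dataset)
    picklable = True
except TypeError as err:
    picklable = False
    print(f"pickling failed: {err}")
finally:
    dataset.handle.close()
    os.remove(tmp.name)
```

The exact wording of the message differs between Python versions, but the exception type is TypeError in each case. With num_workers=0 no worker process is started, so nothing has to be pickled.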

@nkolot
Owner

nkolot commented Nov 8, 2021

Can you try to set num_workers to 0? This should give a more informative error message.

@zhaishengfu
Author

zhaishengfu commented Nov 9, 2021

Hello, below is my detailed test process:

  1. When I set num_workers to zero for both train_dataloader and mocap_dataloader:
        train_dataloader = torch.utils.data.DataLoader(self.train_dataset, self.cfg.TRAIN.BATCH_SIZE, shuffle=True, drop_last=True, num_workers=self.cfg.GENERAL.NUM_WORKERS)  # zsf test, what is the problem??
        mocap_dataloader = torch.utils.data.DataLoader(self.mocap_dataset, self.cfg.TRAIN.NUM_TRAIN_SAMPLES * self.cfg.TRAIN.BATCH_SIZE, shuffle=True, drop_last=True, num_workers=0)

After that I found that the COCO image directory for training was incorrect. I fixed it, and the training process now seems to run fine:
image

  2. If I set num_workers to zero only for train_dataloader and leave mocap_dataloader at num_workers=1, I get the following error (I checked that the mocap data directory is correct):
Epoch 0:   0%|          | 0/2499 [00:00<?, ?it/s
Traceback (most recent call last):
  File "train/train_prohmr.py", line 63, in <module>
    trainer.fit(model, datamodule=data_module)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 553, in fit
    self._run(model)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 918, in _run
    self._dispatch()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 986, in _dispatch
    self.accelerator.start_training(self)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 996, in run_stage
    return self._run_train()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1045, in _run_train
    self.fit_loop.run()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 200, in advance
    epoch_output = self.epoch_loop.run(train_dataloader)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 118, in advance
    _, (batch, is_last) = next(dataloader_iter)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\profiler\base.py", line 104, in profile_iterable
    value = next(iterator)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 668, in prefetch_iterator
    last = next(it)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 589, in __next__
    return self.request_next_batch(self.loader_iters)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 575, in loader_iters
    self._loader_iters = self.create_loader_iters(self.loaders)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 633, in create_loader_iters
    return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\utilities\apply_func.py", line 105, in apply_to_collection
    v, dtype, function, *args, wrong_dtype=wrong_dtype, include_none=include_none, **kwargs
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\utilities\apply_func.py", line 96, in apply_to_collection
    return function(data, *args, **kwargs)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 234, in __iter__
    self._loader_iter = iter(self.loader)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\DM\Documents\Code\avatar-pose\ProHMR\train\train_prohmr.py", line 63, in <module>
    trainer.fit(model, datamodule=data_module)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 553, in fit
    self._run(model)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 918, in _run
    self._dispatch()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 986, in _dispatch
    self.accelerator.start_training(self)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 996, in run_stage
    return self._run_train()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1045, in _run_train
    self.fit_loop.run()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 200, in advance
    epoch_output = self.epoch_loop.run(train_dataloader)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 118, in advance
    _, (batch, is_last) = next(dataloader_iter)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\profiler\base.py", line 104, in profile_iterable
    value = next(iterator)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 668, in prefetch_iterator
    last = next(it)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 589, in __next__
    return self.request_next_batch(self.loader_iters)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 575, in loader_iters
    self._loader_iters = self.create_loader_iters(self.loaders)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 633, in create_loader_iters
    return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\utilities\apply_func.py", line 105, in apply_to_collection
    v, dtype, function, *args, wrong_dtype=wrong_dtype, include_none=include_none, **kwargs
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\utilities\apply_func.py", line 96, in apply_to_collection
    return function(data, *args, **kwargs)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 234, in __iter__
    self._loader_iter = iter(self.loader)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\DM\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

This looks like a multiprocessing error (the RuntimeError mentions not using fork to start child processes).

Could you also tell me what setting num_workers to zero means? Does it mean data is loaded in a single process?
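
For context on the RuntimeError: the traceback shows multiprocessing's spawn.py re-running the training script through runpy with run_name="__mp_main__". Any module-level code therefore runs again inside the worker, including the trainer.fit call on line 63, which then tries to start workers of its own. The sketch below imitates that re-import with the standard library only and starts no processes. The script contents and the file name train_sketch.py are hypothetical.

```python
import os
import runpy
import tempfile

# Hypothetical training script: one module-level statement, one guarded block.
SCRIPT = '''
executed = ["module level"]           # runs on every import, including in workers

if __name__ == "__main__":
    executed.append("guarded block")  # runs only in the original process
'''


def run_as(run_name):
    """Execute SCRIPT under the given module name and report what ran."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train_sketch.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        return runpy.run_path(path, run_name=run_name)["executed"]


as_main = run_as("__main__")       # how `python train_sketch.py` executes it
as_worker = run_as("__mp_main__")  # how a spawned worker re-imports it
print(as_main)
print(as_worker)
```

Under "__mp_main__" the guarded block is skipped, so code placed there is not repeated in worker processes. Applied to train/train_prohmr.py, this would likely mean moving the training entry point, including trainer.fit(...), under an `if __name__ == '__main__':` guard, as the error message itself suggests.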

@nkolot
Owner

nkolot commented Nov 10, 2021

Setting num_workers=0 means that the main process is used to load the data. For num_workers=n>0, n new processes are spawned to load the data. I think we can close this issue for now.

@nkolot nkolot closed this as completed Nov 10, 2021
@Yinlemei

Hi, for text-to-image generation I want to switch to my own dataset, which is in COCO format. How can I generate the coco_val.npz file for my own dataset?
