TalkNet_Training_Offline error #25

Open
rikabi89 opened this issue Dec 2, 2022 · 1 comment

rikabi89 commented Dec 2, 2022

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
[NeMo W 2022-12-02 09:58:37 modelPT:138] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
dataset:
  target: nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset
  manifest_filepath: H:/ControllableTalkNet/tTrump\trainfiles.json
  max_duration: null
  min_duration: 0.1
  int_values: false
  load_audio: false
  normalize: false
  sample_rate: 22050
  trim: false
  durs_file: H:/ControllableTalkNet/tTrump\durations.pt
  f0_file: H:/ControllableTalkNet/tTrump\f0s.pt
  blanking: true
  vocab:
    notation: phonemes
    punct: true
    spaces: true
    stresses: false
    add_blank_at: last
dataloader_params:
  drop_last: false
  shuffle: true
  batch_size: 16
  num_workers: 4

[NeMo W 2022-12-02 09:58:37 modelPT:145] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
dataset:
  target: nemo.collections.asr.data.audio_to_text.AudioToCharWithDursF0Dataset
  manifest_filepath: H:/ControllableTalkNet/tTrump\valfiles.json
  max_duration: null
  min_duration: 0.1
  int_values: false
  load_audio: false
  normalize: false
  sample_rate: 22050
  trim: false
  durs_file: H:/ControllableTalkNet/tTrump\durations.pt
  f0_file: H:/ControllableTalkNet/tTrump\f0s.pt
  blanking: true
  vocab:
    notation: phonemes
    punct: true
    spaces: true
    stresses: false
    add_blank_at: last
dataloader_params:
  drop_last: false
  shuffle: false
  batch_size: 16
  num_workers: 1

[NeMo I 2022-12-02 09:58:37 modelPT:439] Model TalkNetDursModel was successfully restored from H:\ControllableTalkNet\talknet_durs.nemo.
[NeMo I 2022-12-02 09:58:37 collections:173] Dataset loaded with 134 files totalling 0.21 hours
[NeMo I 2022-12-02 09:58:37 collections:174] 0 files were filtered totalling 0.00 hours
[NeMo I 2022-12-02 09:58:37 collections:173] Dataset loaded with 134 files totalling 0.21 hours
[NeMo I 2022-12-02 09:58:37 collections:174] 0 files were filtered totalling 0.00 hours
[NeMo W 2022-12-02 09:58:37 modelPT:660] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.
[NeMo I 2022-12-02 09:58:37 modelPT:751] Optimizer config = Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 1e-06
)
[NeMo I 2022-12-02 09:58:37 lr_scheduler:621] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x0000021A2DF86EB0>"
will be used during training (effective maximum steps = 180) -
Parameters :
(min_lr: 3.0e-06
warmup_ratio: 0.02
max_steps: 180
)
Warm-starting from H:\ControllableTalkNet\talknet_durs.nemo
[NeMo I 2022-12-02 09:58:37 exp_manager:216] Experiments will be logged at H:\ControllableTalkNet\tTrump\TalkNetDurs\2022-12-02_09-57-24
[NeMo I 2022-12-02 09:58:37 exp_manager:563] TensorboardLogger has been set up
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo W 2022-12-02 09:58:38 modelPT:660] The lightning trainer received accelerator: dp. We recommend to use 'ddp' instead.
[NeMo I 2022-12-02 09:58:38 modelPT:751] Optimizer config = Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 1e-06
)
[NeMo I 2022-12-02 09:58:38 lr_scheduler:621] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x0000021A2E22DCD0>"
will be used during training (effective maximum steps = 180) -
Parameters :
(min_lr: 3.0e-06
warmup_ratio: 0.02
max_steps: 180
)

  | Name  | Type           | Params
------------------------------------
0 | embed | Embedding      | 7.6 K
1 | model | ConvASREncoder | 2.5 M
2 | proj  | Conv1d         | 513
------------------------------------
2.5 M Trainable params
0 Non-trainable params
2.5 M Total params
9.841 Total estimated model params size (MB)
Validation sanity check: 0%|          | 0/2 [00:00<?, ?it/s]

PicklingError Traceback (most recent call last)
Cell In[6], line 68
66 initialize(config_path="conf")
67 cfg = compose(config_name="talknet-durs")
---> 68 train(cfg)

Cell In[6], line 62, in train(cfg)
60 exp_manager(trainer, cfg.get('exp_manager', None))
61 trainer.callbacks.extend([pl.callbacks.LearningRateMonitor(), LogEpochTimeCallback()]) # noqa
---> 62 trainer.fit(model)

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:460, in Trainer.fit(self, model, train_dataloader, val_dataloaders, datamodule)
455 # links data to the trainer
456 self.data_connector.attach_data(
457 model, train_dataloader=train_dataloader, val_dataloaders=val_dataloaders, datamodule=datamodule
458 )
--> 460 self._run(model)
462 assert self.state.stopped
463 self.training = False

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:758, in Trainer._run(self, model)
755 self.pre_dispatch()
757 # dispatch start_training or start_evaluating or start_predicting
--> 758 self.dispatch()
760 # plugin will finalized fitting (e.g. ddp_spawn will load trained model)
761 self.post_dispatch()

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:799, in Trainer.dispatch(self)
797 self.accelerator.start_predicting(self)
798 else:
--> 799 self.accelerator.start_training(self)

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\accelerators\accelerator.py:96, in Accelerator.start_training(self, trainer)
95 def start_training(self, trainer: 'pl.Trainer') -> None:
---> 96 self.training_type_plugin.start_training(trainer)

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py:144, in TrainingTypePlugin.start_training(self, trainer)
142 def start_training(self, trainer: 'pl.Trainer') -> None:
143 # double dispatch to initiate the training loop
--> 144 self._results = trainer.run_stage()

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:809, in Trainer.run_stage(self)
807 if self.predicting:
808 return self.run_predict()
--> 809 return self.run_train()

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:844, in Trainer.run_train(self)
841 if not self.is_global_zero and self.progress_bar_callback is not None:
842 self.progress_bar_callback.disable()
--> 844 self.run_sanity_check(self.lightning_module)
846 self.checkpoint_connector.has_trained = False
848 # enable train mode

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:1112, in Trainer.run_sanity_check(self, ref_model)
1109 self.on_sanity_check_start()
1111 # run eval step
-> 1112 self.run_evaluation()
1114 self.on_sanity_check_end()
1116 self.state.stage = stage

File ~\anaconda3\envs\talknet\lib\site-packages\pytorch_lightning\trainer\trainer.py:954, in Trainer.run_evaluation(self, on_epoch)
951 dataloader = self.accelerator.process_dataloader(dataloader)
952 dl_max_batches = self.evaluation_loop.max_batches[dataloader_idx]
--> 954 for batch_idx, batch in enumerate(dataloader):
955 if batch is None:
956 continue

File ~\anaconda3\envs\talknet\lib\site-packages\torch\utils\data\dataloader.py:355, in DataLoader.__iter__(self)
353 return self._iterator
354 else:
--> 355 return self._get_iterator()

File ~\anaconda3\envs\talknet\lib\site-packages\torch\utils\data\dataloader.py:301, in DataLoader._get_iterator(self)
299 else:
300 self.check_worker_number_rationality()
--> 301 return _MultiProcessingDataLoaderIter(self)

File ~\anaconda3\envs\talknet\lib\site-packages\torch\utils\data\dataloader.py:914, in _MultiProcessingDataLoaderIter.__init__(self, loader)
907 w.daemon = True
908 # NB: Process.start() actually take some time as it needs to
909 # start a process and pass the arguments over via a pipe.
910 # Therefore, we only add a worker to self._workers list after
911 # it started, so that we do not call .join() if program dies
912 # before it starts, and __del__ tries to join but will get:
913 # AssertionError: can only join a started process.
--> 914 w.start()
915 self._index_queues.append(index_queue)
916 self._workers.append(w)

File ~\anaconda3\envs\talknet\lib\multiprocessing\process.py:121, in BaseProcess.start(self)
118 assert not _current_process._config.get('daemon'),
119 'daemonic processes are not allowed to have children'
120 _cleanup()
--> 121 self._popen = self._Popen(self)
122 self._sentinel = self._popen.sentinel
123 # Avoid a refcycle if the target function holds an indirect
124 # reference to the process object (see bpo-30775)

File ~\anaconda3\envs\talknet\lib\multiprocessing\context.py:224, in Process._Popen(process_obj)
222 @staticmethod
223 def _Popen(process_obj):
--> 224 return _default_context.get_context().Process._Popen(process_obj)

File ~\anaconda3\envs\talknet\lib\multiprocessing\context.py:327, in SpawnProcess._Popen(process_obj)
324 @staticmethod
325 def _Popen(process_obj):
326 from .popen_spawn_win32 import Popen
--> 327 return Popen(process_obj)

File ~\anaconda3\envs\talknet\lib\multiprocessing\popen_spawn_win32.py:93, in Popen.__init__(self, process_obj)
91 try:
92 reduction.dump(prep_data, to_child)
---> 93 reduction.dump(process_obj, to_child)
94 finally:
95 set_spawning_popen(None)

File ~\anaconda3\envs\talknet\lib\multiprocessing\reduction.py:60, in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)

PicklingError: Can't pickle <class 'nemo.collections.common.parts.preprocessing.collections.AudioTextEntity'>: attribute lookup AudioTextEntity on nemo.collections.common.parts.preprocessing.collections failed
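Note: the failure is in pickling, not in the data itself. On Windows, PyTorch DataLoader workers are started with the "spawn" method, which pickles the dataset object for each worker process. NeMo builds AudioTextEntity dynamically with collections.namedtuple, and a class created that way is not an attribute of its module, so pickle's lookup fails. A minimal sketch of the same failure mode (the function and field names here are illustrative, not NeMo's actual definitions):

    import pickle
    from collections import namedtuple

    def load_manifest():
        # A namedtuple created inside a function is not a module-level
        # attribute, so pickle's "module.name" lookup fails at dump time.
        AudioTextEntity = namedtuple("AudioTextEntity", ["audio_file", "duration", "text"])
        return AudioTextEntity("a.wav", 1.0, "hello")

    entry = load_manifest()
    pickle.dumps(entry)
    # PicklingError: Can't pickle <class '__main__.AudioTextEntity'>:
    # attribute lookup AudioTextEntity on __main__ failed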


rikabi89 commented Dec 2, 2022

I get this error at Step 4 of the offline training notebook. Any help would be appreciated.
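
A likely workaround, given the traceback: the crash happens the moment a DataLoader worker process is spawned, so setting num_workers to 0 in both dataloader_params blocks avoids the pickle step entirely (at the cost of single-process data loading). A sketch against the notebook's training cell; the cfg paths are an assumption based on the config dump above and may need adjusting to the actual talknet-durs layout:

    # initialize, compose, and train are the same names used in Cell In[6].
    initialize(config_path="conf")
    cfg = compose(config_name="talknet-durs")

    # num_workers=0 keeps data loading in the main process, so the dataset
    # (and its dynamically created AudioTextEntity instances) is never pickled.
    # ASSUMPTION: these config paths mirror the "dataloader_params" blocks
    # logged above; check the actual talknet-durs config for the exact keys.
    cfg.model.train_ds.dataloader_params.num_workers = 0
    cfg.model.validation_ds.dataloader_params.num_workers = 0

    train(cfg)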
