
Training on a large set is much slower than on a smaller set - proportionally #1624

Open
Jamiroquai88 opened this issue Jan 19, 2024 · 19 comments


@Jamiroquai88

Tested versions

Running pyannote.audio 3.1

System information

Ubuntu 20.04, V100 GPU (AWS p3.2xlarge instance)

Issue description

Bonjour Hervé,

I noticed that when training PyanNet on a large set, training speed deteriorates significantly. I have a training and development set (statistics below).

Train:

  • 26,000 hours of audio
  • 7,501,003 lines in RTTM

Dev:

  • 545 hours of audio
  • 157,514 lines in RTTM

When I train on the training set, one epoch takes 1 day 17 hours, at around 1.05 it/s.
When I swap the training set for the dev set, one epoch takes 17 minutes, at around 6.50 it/s.
I have ~48x more audio in training; however, if I iterated 48 times over the development set, it would take ~13.5 hours, which is around 3 times faster than a single epoch on the training set.

Do you have any idea where this comes from? Both sets are on the same disk. I am going to investigate further; I just wanted to know if you have an idea where to start.
Thanks.

-Jan

Minimal reproduction example (MRE)

Can't share my data, sorry.

@hbredin (Member) commented Jan 19, 2024

Thanks Jan. Not quite sure. I was just (like 2 minutes ago) discussing with a colleague the fact that pyannote is missing some kind of profiling to debug this kind of behavior.

Did you train for multiple epochs or just one to report this number? The initial data loading takes a very long time, which is why I ask (and that is also the point of the caching mechanism that has just been merged).

It could also be related to the system's built-in caching mechanism, where recently opened files are faster to access.

It could also be related to file formats that do not support fast seeking to a specific time. Are all your files in the same format?

Whatever you find, I'd love to know about it so that we can fix it.

cc'ing @flyingleafe just in case.

@Jamiroquai88 (Author)

Thank you for the response.
I am running the simple profiler in pytorch_lightning, and this is the output for dev (rows with small percentages discarded):

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 257953 | 826.98 | 100 % |
| run_training_epoch | 824.72 | 1 | 824.72 | 99.727 |
| run_training_batch | 0.10136 | 6136 | 621.92 | 75.203 |
| [LightningModule]PyanNet.optimizer_step | 0.10114 | 6136 | 620.62 | 75.047 |
| [Strategy]SingleDeviceStrategy.training_step | 0.055061 | 6136 | 337.86 | 40.854 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033064 | 6136 | 202.88 | 24.533 |
| [Strategy]SingleDeviceStrategy.backward | 0.010633 | 6136 | 65.245 | 7.8895 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0097329 | 6146 | 59.819 | 7.2334 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0096235 | 6147 | 59.156 | 7.1533 |
| [_TrainingEpochLoop].train_dataloader_next | 0.0076828 | 6136 | 47.141 | 5.7004 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0043802 | 6136 | 26.877 | 3.25 |
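
For reference, this per-action breakdown comes from pytorch-lightning's built-in simple profiler, which can be enabled directly on the Trainer. A minimal sketch (not the exact training script used here):

```python
from pytorch_lightning import Trainer

# "simple" selects pytorch_lightning's SimpleProfiler, which prints the
# per-action breakdown shown above at the end of training
trainer = Trainer(profiler="simple")
```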

Obviously, I need to run this on the training set, but first I need to create a smaller subset (2, 5, or 10 thousand hours); 26k hours is too much. I'll update you as soon as I have some numbers.

@Jamiroquai88 (Author)

So I ran the same code on 2,000 hours of training audio. There is a huge shift in [_TrainingEpochLoop].train_dataloader_next, from 5.7% with 545 hours to 61.26% with 2,000 hours, and it seems to get even more significant with more training data - I was running at ~3 it/s. So there is something going on with the dataloader; I am going to investigate it further.

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 1076155 | 8397.9 | 100 % |
| run_training_epoch | 8395.7 | 1 | 8395.7 | 99.973 |
| [_TrainingEpochLoop].train_dataloader_next | 0.20081 | 25617 | 5144.1 | 61.255 |
| run_training_batch | 0.10187 | 25617 | 2609.6 | 31.074 |
| [LightningModule]PyanNet.optimizer_step | 0.10165 | 25617 | 2604.0 | 31.007 |
| [Strategy]SingleDeviceStrategy.training_step | 0.055749 | 25617 | 1428.1 | 17.006 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033222 | 25617 | 851.06 | 10.134 |
| [Strategy]SingleDeviceStrategy.backward | 0.010263 | 25617 | 262.91 | 3.1307 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0098499 | 25627 | 252.42 | 3.0058 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0097316 | 25628 | 249.4 | 2.9698 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0043083 | 25617 | 110.37 | 1.3142 |

@hbredin (Member) commented Jan 19, 2024

Let me know if/how I can help.

@Jamiroquai88 (Author)

I am not sure how to profile this in a more advanced way, but with some manual profiling I believe one of the issues is this line:

annotations = self.annotations[self.annotations["file_id"] == file_id]

(please note that this is 3.1.0). I used a different data structure (a dictionary keyed by file_id) and got an improvement on the 2k-hour set, from the previous 61.26% down to 41.13%. But I can see that the caching work also touched the code I just modified - would you recommend updating?
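
For the record, a minimal standalone sketch of the dict-based lookup idea (toy data; the names and the exact shape of self.annotations are simplified here, this is not the actual patch):

```python
import numpy as np

# toy stand-in for the task's annotations: a structured array, one row per RTTM line
annotations = np.array(
    [(0, 0.0, 1.2), (0, 1.5, 2.0), (1, 0.3, 0.9)],
    dtype=[("file_id", "i4"), ("start", "f4"), ("end", "f4")],
)

# one-time precomputation: group rows by file_id instead of scanning
# the whole array for every sampled chunk
annotations_by_file = {
    fid: annotations[annotations["file_id"] == fid]
    for fid in np.unique(annotations["file_id"])
}

# the per-chunk lookup is now a dict access instead of a full boolean scan
file_id = 1
chunk_annotations = annotations_by_file[file_id]
```

Profiler output with this change: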

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 1076155 | 6170.8 | 100 % |
| run_training_epoch | 6168.5 | 1 | 6168.5 | 99.963 |
| [_TrainingEpochLoop].train_dataloader_next | 0.11352 | 25617 | 2908.1 | 47.127 |
| run_training_batch | 0.10246 | 25617 | 2624.6 | 42.533 |
| [LightningModule]PyanNet.optimizer_step | 0.10224 | 25617 | 2619.1 | 42.443 |
| [Strategy]SingleDeviceStrategy.training_step | 0.056324 | 25617 | 1442.9 | 23.382 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033204 | 25617 | 850.59 | 13.784 |
| [Strategy]SingleDeviceStrategy.backward | 0.010335 | 25617 | 264.75 | 4.2903 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0098066 | 25627 | 251.31 | 4.0726 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0096959 | 25628 | 248.49 | 4.0268 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0044196 | 25617 | 113.22 | 1.8347 |

@hbredin (Member) commented Jan 20, 2024

It is a good idea to do your tests with the latest develop commit, indeed.
However, know that it will not fix this issue.

About this line of code:

annotations = self.annotations[self.annotations["file_id"] == file_id] 

It is indeed far from efficient: self.annotations contains as many entries as there are lines in your RTTM files, so this lookup really does not scale, and we should update it to something more efficient.

Instead of using a dict, I would rely on np.searchsorted, which should be very efficient because self.annotations["file_id"] is indeed sorted (please double-check this assertion, though).

Would you give it a go and open a PR?

Something that would look like this (!!! untested code !!!):

# file_id values are assumed sorted, so a binary search yields the
# [start_idx, end_idx) slice of rows belonging to this file
start_idx, end_idx = np.searchsorted(self.annotations["file_id"], [file_id, file_id + 1])
annotations = self.annotations[start_idx:end_idx]

@Jamiroquai88 (Author)

This is a version with the current develop and the np.searchsorted implementation - I was able to make it work after some minor changes. I was expecting more; I am going to run the original develop to make sure there is actually some gain.

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 8172.3 | 100 % |
| run_training_epoch | 7987.8 | 1 | 7987.8 | 97.741 |
| [_TrainingEpochLoop].train_dataloader_next | 0.18439 | 25617 | 4723.6 | 57.8 |
| run_training_batch | 0.10278 | 25617 | 2632.8 | 32.216 |
| [LightningModule]PyanNet.optimizer_step | 0.10255 | 25617 | 2627.0 | 32.146 |
| [Strategy]SingleDeviceStrategy.training_step | 0.056394 | 25617 | 1444.6 | 17.677 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033148 | 25617 | 849.16 | 10.391 |
| [Strategy]SingleDeviceStrategy.backward | 0.010539 | 25617 | 269.97 | 3.3035 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0098991 | 25627 | 253.68 | 3.1042 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0097804 | 25628 | 250.65 | 3.0671 |
| [LightningModule]PyanNet.prepare_data | 182.96 | 1 | 182.96 | 2.2388 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0043837 | 25617 | 112.3 | 1.3741 |

@Jamiroquai88 (Author) commented Jan 24, 2024

I ran the current develop, and this is the result:

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 7478.0 | 100 % |
| run_training_epoch | 7287.7 | 1 | 7287.7 | 97.456 |
| [_TrainingEpochLoop].train_dataloader_next | 0.16016 | 25617 | 4102.9 | 54.867 |
| run_training_batch | 0.10243 | 25617 | 2623.9 | 35.089 |
| [LightningModule]PyanNet.optimizer_step | 0.10221 | 25617 | 2618.4 | 35.015 |
| [Strategy]SingleDeviceStrategy.training_step | 0.056538 | 25617 | 1448.3 | 19.368 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.032996 | 25617 | 845.25 | 11.303 |
| [Strategy]SingleDeviceStrategy.backward | 0.010314 | 25617 | 264.21 | 3.5332 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.010153 | 25627 | 260.18 | 3.4793 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.010037 | 25628 | 257.23 | 3.4399 |
| [LightningModule]PyanNet.prepare_data | 186.49 | 1 | 186.49 | 2.4939 |

which is very close to the np.searchsorted implementation.

Again, I tried an implementation with the dictionary (this time it was more difficult, and it took me a while to realize what was going on with caching and why my dictionary had suddenly become an empty np.ndarray) ...

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 865450 | 6243.3 | 100 % |
| run_training_epoch | 6043.4 | 1 | 6043.4 | 96.798 |
| [_TrainingEpochLoop].train_dataloader_next | 0.12278 | 25617 | 3145.3 | 50.38 |
| run_training_batch | 0.091999 | 25617 | 2356.7 | 37.749 |
| [LightningModule]PyanNet.optimizer_step | 0.091822 | 25617 | 2352.2 | 37.676 |
| [Strategy]SingleDeviceStrategy.training_step | 0.057007 | 25617 | 1460.3 | 23.391 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.025018 | 25617 | 640.88 | 10.265 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.009844 | 25627 | 252.27 | 4.0407 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0097397 | 25628 | 249.61 | 3.9981 |
| [Strategy]SingleDeviceStrategy.backward | 0.010489 | 19287 | 202.31 | 3.2404 |
| [LightningModule]PyanNet.prepare_data | 180.86 | 1 | 180.86 | 2.8969 |

Anyway, it seems like the new develop is faster than 3.1 (I am not exactly sure why that would be).
Do you still want me to make a PR with the np.searchsorted implementation?
It also seems that my setup is not exactly consistent (running on an AWS p3.2xlarge with a V100), since it is using a network disk. I will copy my audio to the local SSD and run it again.

@hbredin (Member) commented Jan 24, 2024

Thanks Jan!

It would indeed be great if you could run all three versions from the SSD:

  • develop
  • develop with np.searchsorted
  • develop with your dict-based version.

@Jamiroquai88 (Author)

So I ran all three on a local volume with the fastest disk setup available on AWS.
develop

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 6138.5 | 100 % |
| run_training_epoch | 5944.6 | 1 | 5944.6 | 96.841 |
| [_TrainingEpochLoop].train_dataloader_next | 0.11072 | 25617 | 2836.2 | 46.203 |
| run_training_batch | 0.099618 | 25617 | 2551.9 | 41.572 |
| [LightningModule]PyanNet.optimizer_step | 0.099421 | 25617 | 2546.9 | 41.49 |
| [Strategy]SingleDeviceStrategy.training_step | 0.053946 | 25617 | 1381.9 | 22.512 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033313 | 25617 | 853.39 | 13.902 |
| [Strategy]SingleDeviceStrategy.backward | 0.010018 | 25617 | 256.63 | 4.1807 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0086623 | 25627 | 221.99 | 3.6163 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0085564 | 25628 | 219.28 | 3.5722 |
| [LightningModule]PyanNet.prepare_data | 190.2 | 1 | 190.2 | 3.0985 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0039778 | 25617 | 101.9 | 1.66 |

develop with np.searchsorted

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 6481.7 | 100 % |
| run_training_epoch | 6310.8 | 1 | 6310.8 | 97.364 |
| [_TrainingEpochLoop].train_dataloader_next | 0.1264 | 25617 | 3237.9 | 49.955 |
| run_training_batch | 0.097115 | 25617 | 2487.8 | 38.382 |
| [LightningModule]PyanNet.optimizer_step | 0.096915 | 25617 | 2482.7 | 38.303 |
| [Strategy]SingleDeviceStrategy.training_step | 0.051565 | 25617 | 1320.9 | 20.379 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033248 | 25617 | 851.71 | 13.14 |
| [Strategy]SingleDeviceStrategy.backward | 0.0099497 | 25617 | 254.88 | 3.9323 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.009078 | 25627 | 232.64 | 3.5892 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0089707 | 25628 | 229.9 | 3.5469 |
| [LightningModule]PyanNet.prepare_data | 169.56 | 1 | 169.56 | 2.616 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0038971 | 25617 | 99.833 | 1.5402 |

develop with dict-based version

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 4990.5 | 100 % |
| run_training_epoch | 4792.8 | 1 | 4792.8 | 96.039 |
| run_training_batch | 0.097173 | 25617 | 2489.3 | 49.88 |
| [LightningModule]PyanNet.optimizer_step | 0.096991 | 25617 | 2484.6 | 49.787 |
| [_TrainingEpochLoop].train_dataloader_next | 0.068138 | 25617 | 1745.5 | 34.976 |
| [Strategy]SingleDeviceStrategy.training_step | 0.051949 | 25617 | 1330.8 | 26.666 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033514 | 25617 | 858.54 | 17.204 |
| [Strategy]SingleDeviceStrategy.backward | 0.0095288 | 25617 | 244.1 | 4.8913 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0086254 | 25627 | 221.04 | 4.4293 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0085361 | 25628 | 218.76 | 4.3836 |
| [LightningModule]PyanNet.prepare_data | 179.21 | 1 | 179.21 | 3.591 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0038855 | 25617 | 99.535 | 1.9945 |

@Jamiroquai88 (Author)

But what is strange: when I change the number of workers from 2 to 4 (I have 4 physical cores and 8 threads per GPU), the numbers change dramatically. This is develop with the dict-based version:

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 3529.9 | 100 % |
| run_training_epoch | 3336.4 | 1 | 3336.4 | 94.519 |
| run_training_batch | 0.10077 | 25617 | 2581.5 | 73.131 |
| [LightningModule]PyanNet.optimizer_step | 0.10056 | 25617 | 2576.1 | 72.978 |
| [Strategy]SingleDeviceStrategy.training_step | 0.054865 | 25617 | 1405.5 | 39.816 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033089 | 25617 | 847.63 | 24.013 |
| [Strategy]SingleDeviceStrategy.backward | 0.010313 | 25617 | 264.18 | 7.4841 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0102 | 25627 | 261.4 | 7.4054 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.010089 | 25628 | 258.55 | 7.3247 |
| [LightningModule]PyanNet.prepare_data | 174.65 | 1 | 174.65 | 4.9476 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0042494 | 25617 | 108.86 | 3.0839 |
| [_TrainingEpochLoop].train_dataloader_next | 0.0038756 | 25617 | 99.282 | 2.8126 |

@hbredin (Member) commented Jan 25, 2024

Oh. I did not realize you were using such a small number of workers.
I usually use 10 :)
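
For context, the worker count here is the num_workers parameter of the pyannote task, which is forwarded to the underlying torch DataLoader. A minimal sketch, assuming the pyannote.audio 3.x task API (the protocol name is illustrative):

```python
from pyannote.audio.tasks import SpeakerDiarization
from pyannote.database import registry

# illustrative protocol name; replace with your own pyannote.database protocol
protocol = registry.get_protocol("MyDatabase.SpeakerDiarization.MyProtocol")

# num_workers ends up in the torch DataLoader; ~2x the number of
# physical cores is a common starting point
task = SpeakerDiarization(protocol, num_workers=8)
```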

@Jamiroquai88 (Author)

I started on a g4dn.xlarge, which has half the resources of a p3.2xlarge, but that drop when increasing the number of workers is huge; I was not expecting it. So far it seems like the sweet spot is the number of threads (usually twice the number of cores).

But since my goal is to train on 26k hours: with the current develop I only get around 2.00 it/s (21 hrs per epoch), with np.searchsorted 1.85 it/s (23 hrs per epoch), while with the dict-based one I am at exactly the same number as before, 4.55 it/s (9 hrs 14 min per epoch).

Since one epoch is taking quite a long time, I can't run the whole profiler, so these are just approximate numbers.

@hbredin (Member) commented Jan 28, 2024

I'd be curious to have a look at your dict-based approach :)

Regarding your last point (long epoch), you could actually use the limit_train_batches option of pytorch-lightning's Trainer to reduce the size of an epoch.
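
A minimal sketch of that option (the value 5000 is arbitrary, for illustration only):

```python
from pytorch_lightning import Trainer

# an int caps the number of training batches per epoch; a float in (0, 1]
# is instead interpreted as a fraction of the training set
trainer = Trainer(limit_train_batches=5000)
```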

@Jamiroquai88 (Author)

I am trying to push my changes to a branch named segments_dict.
But when doing

git push --set-upstream origin segments_dict

I got

ERROR: Permission to pyannote/pyannote-audio.git denied to Jamiroquai88.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

git remote -v

says

origin	git@github.com:pyannote/pyannote-audio.git (fetch)
origin	git@github.com:pyannote/pyannote-audio.git (push)

Sorry to bother you with this. Any ideas about what might be wrong here? My SSH key has been on GitHub for quite some time.

@hbredin (Member) commented Jan 30, 2024

I guess you need to fork the repo, push to your own fork, and open a PR from it?

@Jamiroquai88 (Author)

Right, makes sense. Thank you.
#1633

@Jamiroquai88 (Author)

Closing this, since it is only critical for large amounts of data.

@hbredin (Member) commented Feb 16, 2024

Re-opening, as I think it is worth looking into (and there is also your related PR that I still need to have a look at).

@hbredin hbredin reopened this Feb 16, 2024