
Training on a large set is much slower than on a smaller set - proportionally #1624

Open
Jamiroquai88 opened this issue Jan 19, 2024 · 19 comments


@Jamiroquai88

Tested versions

Running pyannote.audio 3.1

System information

Ubuntu 20.04, V100 GPU (AWS p3.2xlarge instance)

Issue description

Bonjour Hervé,

I noticed that when training PyanNet on a large set, training speed deteriorates significantly. I have a training and development set (statistics below).

Train:

  • 26,000 hours of audio
  • 7,501,003 lines in RTTM

Dev:

  • 545 hours of audio
  • 157,514 lines in RTTM

When I train on the training set, one epoch takes 1 day 17 hours, at around 1.05 it/s.
When I swap the training set for the dev set, one epoch takes 17 minutes, at around 6.50 it/s.
I have ~48x more audio in training; however, if I iterated 48 times over the development set, it would take ~13.5 hours, which is around 3 times faster than a single epoch on the training set.

Do you have any idea where this comes from? Both sets are on the same disk. I am going to investigate further; I just wanted to know if you have an idea where to start.
Thanks.

-Jan

Minimal reproduction example (MRE)

Can't share my data, sorry.

@hbredin (Member) commented Jan 19, 2024

Thanks Jan. Not quite sure. I was just (like 2 minutes ago) discussing with a colleague the fact that pyannote is missing some kind of profiling to debug this kind of behavior.

Did you train for multiple epochs or just one to report this number? The initial data loading takes a very long time, which is why I ask (and that is also the point of the caching mechanism that has just been merged).

It could also be related to the system's built-in caching mechanism, where recently opened files are faster to access.

It could also be related to file formats that do not support fast seeking to a specific time. Are all your files in the same format?

Whatever you find, I'd love to know about it so that we can fix it.

cc'ing @flyingleafe just in case.

@Jamiroquai88 (Author)

Thank you for the response.
I am running the simple profiler in pytorch_lightning, and this is the output for dev (rows with small percentages discarded):

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 257953 | 826.98 | 100 % |
| run_training_epoch | 824.72 | 1 | 824.72 | 99.727 |
| run_training_batch | 0.10136 | 6136 | 621.92 | 75.203 |
| [LightningModule]PyanNet.optimizer_step | 0.10114 | 6136 | 620.62 | 75.047 |
| [Strategy]SingleDeviceStrategy.training_step | 0.055061 | 6136 | 337.86 | 40.854 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033064 | 6136 | 202.88 | 24.533 |
| [Strategy]SingleDeviceStrategy.backward | 0.010633 | 6136 | 65.245 | 7.8895 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0097329 | 6146 | 59.819 | 7.2334 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0096235 | 6147 | 59.156 | 7.1533 |
| [_TrainingEpochLoop].train_dataloader_next | 0.0076828 | 6136 | 47.141 | 5.7004 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0043802 | 6136 | 26.877 | 3.25 |
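
For reference, this per-action breakdown comes from pytorch-lightning's built-in simple profiler, which can be enabled directly on the Trainer. A minimal sketch (not the exact training script used here):

```python
from pytorch_lightning import Trainer

# "simple" selects pytorch_lightning's SimpleProfiler, which prints the
# per-action breakdown shown above at the end of training
trainer = Trainer(profiler="simple")
```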

Obviously, I need to run this on the training set, but first I need to create a smaller subset (2, 5, or 10 thousand hours); 26k hours is too much. I'll update you as soon as I have some numbers.

@Jamiroquai88 (Author)

So I ran the same code on 2,000 hours of training audio. There is a huge shift in [_TrainingEpochLoop].train_dataloader_next, from 5.7% with 545 hours to 61.26% with 2,000 hours, and it seems to get even more significant with more training data - I was running at ~3 it/s. So there is something going on with the dataloader; I am going to investigate it further.

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 1076155 | 8397.9 | 100 % |
| run_training_epoch | 8395.7 | 1 | 8395.7 | 99.973 |
| [_TrainingEpochLoop].train_dataloader_next | 0.20081 | 25617 | 5144.1 | 61.255 |
| run_training_batch | 0.10187 | 25617 | 2609.6 | 31.074 |
| [LightningModule]PyanNet.optimizer_step | 0.10165 | 25617 | 2604.0 | 31.007 |
| [Strategy]SingleDeviceStrategy.training_step | 0.055749 | 25617 | 1428.1 | 17.006 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033222 | 25617 | 851.06 | 10.134 |
| [Strategy]SingleDeviceStrategy.backward | 0.010263 | 25617 | 262.91 | 3.1307 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0098499 | 25627 | 252.42 | 3.0058 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0097316 | 25628 | 249.4 | 2.9698 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0043083 | 25617 | 110.37 | 1.3142 |

@hbredin (Member) commented Jan 19, 2024

Let me know if/how I can help.

@Jamiroquai88 (Author)

I am not sure how to profile this in a more advanced way, but with some manual profiling I believe one of the issues is this line:

annotations = self.annotations[self.annotations["file_id"] == file_id]

(please note that this is 3.1.0). I used a different data structure (a dictionary keyed by file_id) and got an improvement on the 2k-hour set, from the previous 61.26% down to 41.13%. But I can see that the caching work also touched the code I just modified - would you recommend updating?
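
For the record, a minimal standalone sketch of the dict-based lookup idea (toy data; the names and the exact shape of self.annotations are simplified here, this is not the actual patch):

```python
import numpy as np

# toy stand-in for the task's annotations: a structured array, one row per RTTM line
annotations = np.array(
    [(0, 0.0, 1.2), (0, 1.5, 2.0), (1, 0.3, 0.9)],
    dtype=[("file_id", "i4"), ("start", "f4"), ("end", "f4")],
)

# one-time precomputation: group rows by file_id instead of scanning
# the whole array for every sampled chunk
annotations_by_file = {
    fid: annotations[annotations["file_id"] == fid]
    for fid in np.unique(annotations["file_id"])
}

# the per-chunk lookup is now a dict access instead of a full boolean scan
file_id = 1
chunk_annotations = annotations_by_file[file_id]
```

Profiler output with this change: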

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 1076155 | 6170.8 | 100 % |
| run_training_epoch | 6168.5 | 1 | 6168.5 | 99.963 |
| [_TrainingEpochLoop].train_dataloader_next | 0.11352 | 25617 | 2908.1 | 47.127 |
| run_training_batch | 0.10246 | 25617 | 2624.6 | 42.533 |
| [LightningModule]PyanNet.optimizer_step | 0.10224 | 25617 | 2619.1 | 42.443 |
| [Strategy]SingleDeviceStrategy.training_step | 0.056324 | 25617 | 1442.9 | 23.382 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033204 | 25617 | 850.59 | 13.784 |
| [Strategy]SingleDeviceStrategy.backward | 0.010335 | 25617 | 264.75 | 4.2903 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0098066 | 25627 | 251.31 | 4.0726 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0096959 | 25628 | 248.49 | 4.0268 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0044196 | 25617 | 113.22 | 1.8347 |

@hbredin (Member) commented Jan 20, 2024

It is a good idea to do your tests with the latest develop commit, indeed.
However, know that it will not fix this issue.

About this line of code:

annotations = self.annotations[self.annotations["file_id"] == file_id] 

It is indeed far from efficient: self.annotations contains as many entries as there are lines in your RTTM files, so this lookup really does not scale, and we should update it to something more efficient.

Instead of using a dict, I would rely on np.searchsorted, which should be very efficient because self.annotations["file_id"] is indeed sorted (please double-check this assertion, though).

Would you give it a go and open a PR?

Something that would look like this (!!! untested code !!!):

# file_id values are assumed sorted, so a binary search yields the
# [start_idx, end_idx) slice of rows belonging to this file
start_idx, end_idx = np.searchsorted(self.annotations["file_id"], [file_id, file_id + 1])
annotations = self.annotations[start_idx:end_idx]

@Jamiroquai88 (Author)

This is a version with the current develop and the np.searchsorted implementation - I was able to make it work after some minor changes. I was expecting more; I am going to run the original develop to make sure there is actually some gain.

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 8172.3 | 100 % |
| run_training_epoch | 7987.8 | 1 | 7987.8 | 97.741 |
| [_TrainingEpochLoop].train_dataloader_next | 0.18439 | 25617 | 4723.6 | 57.8 |
| run_training_batch | 0.10278 | 25617 | 2632.8 | 32.216 |
| [LightningModule]PyanNet.optimizer_step | 0.10255 | 25617 | 2627.0 | 32.146 |
| [Strategy]SingleDeviceStrategy.training_step | 0.056394 | 25617 | 1444.6 | 17.677 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033148 | 25617 | 849.16 | 10.391 |
| [Strategy]SingleDeviceStrategy.backward | 0.010539 | 25617 | 269.97 | 3.3035 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0098991 | 25627 | 253.68 | 3.1042 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0097804 | 25628 | 250.65 | 3.0671 |
| [LightningModule]PyanNet.prepare_data | 182.96 | 1 | 182.96 | 2.2388 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0043837 | 25617 | 112.3 | 1.3741 |

@Jamiroquai88 (Author) commented Jan 24, 2024

I ran the current develop, and this is the result:

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 7478.0 | 100 % |
| run_training_epoch | 7287.7 | 1 | 7287.7 | 97.456 |
| [_TrainingEpochLoop].train_dataloader_next | 0.16016 | 25617 | 4102.9 | 54.867 |
| run_training_batch | 0.10243 | 25617 | 2623.9 | 35.089 |
| [LightningModule]PyanNet.optimizer_step | 0.10221 | 25617 | 2618.4 | 35.015 |
| [Strategy]SingleDeviceStrategy.training_step | 0.056538 | 25617 | 1448.3 | 19.368 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.032996 | 25617 | 845.25 | 11.303 |
| [Strategy]SingleDeviceStrategy.backward | 0.010314 | 25617 | 264.21 | 3.5332 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.010153 | 25627 | 260.18 | 3.4793 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.010037 | 25628 | 257.23 | 3.4399 |
| [LightningModule]PyanNet.prepare_data | 186.49 | 1 | 186.49 | 2.4939 |

which is very close to the np.searchsorted implementation.

Again, I tried an implementation with the dictionary (this time it was more difficult, and it took me a while to realize what was going on with caching and why my dictionary had suddenly become an empty np.ndarray) ...

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 865450 | 6243.3 | 100 % |
| run_training_epoch | 6043.4 | 1 | 6043.4 | 96.798 |
| [_TrainingEpochLoop].train_dataloader_next | 0.12278 | 25617 | 3145.3 | 50.38 |
| run_training_batch | 0.091999 | 25617 | 2356.7 | 37.749 |
| [LightningModule]PyanNet.optimizer_step | 0.091822 | 25617 | 2352.2 | 37.676 |
| [Strategy]SingleDeviceStrategy.training_step | 0.057007 | 25617 | 1460.3 | 23.391 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.025018 | 25617 | 640.88 | 10.265 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.009844 | 25627 | 252.27 | 4.0407 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0097397 | 25628 | 249.61 | 3.9981 |
| [Strategy]SingleDeviceStrategy.backward | 0.010489 | 19287 | 202.31 | 3.2404 |
| [LightningModule]PyanNet.prepare_data | 180.86 | 1 | 180.86 | 2.8969 |

Anyway, it seems like the new develop is faster than 3.1 (I am not exactly sure why that would be).
Do you still want me to make a PR with the np.searchsorted implementation?
It also seems that my setup is not exactly consistent (running on an AWS p3.2xlarge with a V100), since it is using a network disk. I will copy my audio to the local SSD and run it again.

@hbredin (Member) commented Jan 24, 2024

Thanks Jan!

It would indeed be great if you could run all three versions from the SSD:

  • develop
  • develop with np.searchsorted
  • develop with your dict-based version.

@Jamiroquai88 (Author)

So I ran all three on a local volume with the fastest disk setup available on AWS.
develop

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 6138.5 | 100 % |
| run_training_epoch | 5944.6 | 1 | 5944.6 | 96.841 |
| [_TrainingEpochLoop].train_dataloader_next | 0.11072 | 25617 | 2836.2 | 46.203 |
| run_training_batch | 0.099618 | 25617 | 2551.9 | 41.572 |
| [LightningModule]PyanNet.optimizer_step | 0.099421 | 25617 | 2546.9 | 41.49 |
| [Strategy]SingleDeviceStrategy.training_step | 0.053946 | 25617 | 1381.9 | 22.512 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033313 | 25617 | 853.39 | 13.902 |
| [Strategy]SingleDeviceStrategy.backward | 0.010018 | 25617 | 256.63 | 4.1807 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0086623 | 25627 | 221.99 | 3.6163 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0085564 | 25628 | 219.28 | 3.5722 |
| [LightningModule]PyanNet.prepare_data | 190.2 | 1 | 190.2 | 3.0985 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0039778 | 25617 | 101.9 | 1.66 |

develop with np.searchsorted

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 6481.7 | 100 % |
| run_training_epoch | 6310.8 | 1 | 6310.8 | 97.364 |
| [_TrainingEpochLoop].train_dataloader_next | 0.1264 | 25617 | 3237.9 | 49.955 |
| run_training_batch | 0.097115 | 25617 | 2487.8 | 38.382 |
| [LightningModule]PyanNet.optimizer_step | 0.096915 | 25617 | 2482.7 | 38.303 |
| [Strategy]SingleDeviceStrategy.training_step | 0.051565 | 25617 | 1320.9 | 20.379 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033248 | 25617 | 851.71 | 13.14 |
| [Strategy]SingleDeviceStrategy.backward | 0.0099497 | 25617 | 254.88 | 3.9323 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.009078 | 25627 | 232.64 | 3.5892 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0089707 | 25628 | 229.9 | 3.5469 |
| [LightningModule]PyanNet.prepare_data | 169.56 | 1 | 169.56 | 2.616 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0038971 | 25617 | 99.833 | 1.5402 |

develop with dict-based version

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 4990.5 | 100 % |
| run_training_epoch | 4792.8 | 1 | 4792.8 | 96.039 |
| run_training_batch | 0.097173 | 25617 | 2489.3 | 49.88 |
| [LightningModule]PyanNet.optimizer_step | 0.096991 | 25617 | 2484.6 | 49.787 |
| [_TrainingEpochLoop].train_dataloader_next | 0.068138 | 25617 | 1745.5 | 34.976 |
| [Strategy]SingleDeviceStrategy.training_step | 0.051949 | 25617 | 1330.8 | 26.666 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033514 | 25617 | 858.54 | 17.204 |
| [Strategy]SingleDeviceStrategy.backward | 0.0095288 | 25617 | 244.1 | 4.8913 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0086254 | 25627 | 221.04 | 4.4293 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.0085361 | 25628 | 218.76 | 4.3836 |
| [LightningModule]PyanNet.prepare_data | 179.21 | 1 | 179.21 | 3.591 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0038855 | 25617 | 99.535 | 1.9945 |

@Jamiroquai88 (Author)

But what is strange: when I change the number of workers from 2 to 4 (I have 4 physical cores and 8 threads per GPU), the numbers change dramatically. This is develop with the dict-based version:

| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
| --- | --- | --- | --- | --- |
| Total | - | 922420 | 3529.9 | 100 % |
| run_training_epoch | 3336.4 | 1 | 3336.4 | 94.519 |
| run_training_batch | 0.10077 | 25617 | 2581.5 | 73.131 |
| [LightningModule]PyanNet.optimizer_step | 0.10056 | 25617 | 2576.1 | 72.978 |
| [Strategy]SingleDeviceStrategy.training_step | 0.054865 | 25617 | 1405.5 | 39.816 |
| [LightningModule]PyanNet.configure_gradient_clipping | 0.033089 | 25617 | 847.63 | 24.013 |
| [Strategy]SingleDeviceStrategy.backward | 0.010313 | 25617 | 264.18 | 7.4841 |
| [Strategy]SingleDeviceStrategy.batch_to_device | 0.0102 | 25627 | 261.4 | 7.4054 |
| [LightningModule]PyanNet.transfer_batch_to_device | 0.010089 | 25628 | 258.55 | 7.3247 |
| [LightningModule]PyanNet.prepare_data | 174.65 | 1 | 174.65 | 4.9476 |
| [Callback]RichProgressBar.on_train_batch_end | 0.0042494 | 25617 | 108.86 | 3.0839 |
| [_TrainingEpochLoop].train_dataloader_next | 0.0038756 | 25617 | 99.282 | 2.8126 |

@hbredin (Member) commented Jan 25, 2024

Oh. I did not realize you were using such a small number of workers.
I usually use 10 :)
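
For context, the worker count here is the num_workers parameter of the pyannote task, which is forwarded to the underlying torch DataLoader. A minimal sketch, assuming the pyannote.audio 3.x task API (the protocol name is illustrative):

```python
from pyannote.audio.tasks import SpeakerDiarization
from pyannote.database import registry

# illustrative protocol name; replace with your own pyannote.database protocol
protocol = registry.get_protocol("MyDatabase.SpeakerDiarization.MyProtocol")

# num_workers ends up in the torch DataLoader; ~2x the number of
# physical cores is a common starting point
task = SpeakerDiarization(protocol, num_workers=8)
```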

@Jamiroquai88 (Author)

I started on a g4dn.xlarge, which has half the resources of a p3.2xlarge, but that drop when increasing the number of workers is huge; I was not expecting it. So far it seems like the sweet spot is the number of threads (usually twice the number of cores).

But since my goal is to train on 26k hours: with the current develop I only get around 2.00 it/s (21 hrs per epoch), with np.searchsorted 1.85 it/s (23 hrs per epoch), while with the dict-based one I am at exactly the same number as before, 4.55 it/s (9 hrs 14 min per epoch).

Since one epoch is taking quite a long time, I can't run the whole profiler, so these are just approximate numbers.

@hbredin (Member) commented Jan 28, 2024

I'd be curious to have a look at your dict-based approach :)

Regarding your last point (long epoch), you could actually use the limit_train_batches option of pytorch-lightning's Trainer to reduce the size of an epoch.
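
A minimal sketch of that option (the value 5000 is arbitrary, for illustration only):

```python
from pytorch_lightning import Trainer

# an int caps the number of training batches per epoch; a float in (0, 1]
# is instead interpreted as a fraction of the training set
trainer = Trainer(limit_train_batches=5000)
```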

@Jamiroquai88 (Author)

I am trying to push my changes to a branch named segments_dict.
But when doing

git push --set-upstream origin segments_dict

I got

ERROR: Permission to pyannote/pyannote-audio.git denied to Jamiroquai88.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

git remote -v

says

origin	git@github.com:pyannote/pyannote-audio.git (fetch)
origin	git@github.com:pyannote/pyannote-audio.git (push)

Sorry to bother you with this. Any ideas about what might be wrong here? My SSH key has been on GitHub for quite some time.

@hbredin (Member) commented Jan 30, 2024

I guess you need to fork the repo, push to your own fork, and open a PR from it?

@Jamiroquai88 (Author)

Right, makes sense. Thank you.
#1633

@Jamiroquai88 (Author)

Closing this, since it is only critical for large amounts of data.

@hbredin (Member) commented Feb 16, 2024

Re-opening, as I think it is worth looking into (and there is also your related PR that I still need to have a look at).

@hbredin hbredin reopened this Feb 16, 2024