Releases: marovira/helios-ml

0.2.0

09 May 21:14
Pre-release

Updates

  • Fixes the way epochs are numbered, ensuring that epoch counts are consistent with each other regardless of training type.
  • Fixes an issue where training with iterations and gradient accumulation resulted in partial iterations being run after training should have stopped.
  • Removes F1, recall, and precision metrics. The implementations were not generic enough to be shipped with Helios.
  • Refactors the MAE implementation to make it more generic in terms of the types of tensors it accepts.
  • Adds a numpy version of MAE (a minimal sketch follows this list).
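
As a rough illustration of the metric (the actual Helios function name and signature may differ), a numpy MAE amounts to the following sketch:

    import numpy as np


    def mae(pred: np.ndarray, target: np.ndarray) -> float:
        # Mean absolute error over all elements. Illustrative sketch only; not
        # necessarily the Helios implementation or signature.
        return float(np.mean(np.abs(pred - target)))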

Full Changelog: 0.1.9...0.2.0

0.1.9

07 May 21:56
Pre-release

Updates

  • Adds a flag to disable the printing of the banner.

Full Changelog: 0.1.8...0.1.9

0.1.8

03 May 20:19
Pre-release

Updates

  • Allows easy access to the datasets held by the DataModule. Previously there was no direct way of accessing them other than going through the DataModule's private members, which complicated cases where the length of the dataset was required.
  • Adds a way to halt training based on arbitrary conditions. The main use case is to allow Model sub-classes to stop training once the network has converged to a value, or when the network is diverging and there is no reason to continue (a sketch of such a condition follows this list).
  • Addresses a potential crash that could occur when training with a None checkpoint path.
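
As a rough idea of the kind of halting condition this enables, the standalone sketch below checks whether a loss history has stopped improving. The names and thresholds are illustrative; the actual Helios hook used to register the condition is not shown.

    def should_halt(loss_history: list[float], tol: float = 1e-5, window: int = 10) -> bool:
        # Illustrative convergence check: halt once the loss has barely moved over
        # the last `window` validation cycles. Names and thresholds are examples only.
        if len(loss_history) < window:
            return False
        recent = loss_history[-window:]
        return max(recent) - min(recent) < tol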

Full Changelog: 0.1.7...0.1.8

0.1.7

01 May 19:53
Pre-release

Updates

  • For iteration training, the global iteration count is now updated correctly. Previously it was updated in the middle of the training loop, which caused the progress bar and the log flag passed to the model after the batch to be out of sync with the global iteration count. This is now addressed by updating the count at the top of the iteration loop.
  • Removes the callback system from the trainer. Given the current implementation, there's nothing that the callbacks could do that couldn't be performed by overriding the corresponding function in the model or the datamodule.
  • Adds wrappers for printing which allow the user to choose which rank (global or local) the print should happen on.
  • Adds a wrapper for torch.distributed.barrier that works in both distributed and non-distributed contexts (both wrappers are sketched after this list).
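
The wrappers behave roughly like the sketch below, written directly against torch.distributed; the actual Helios helper names and signatures may differ.

    import torch.distributed as dist


    def safe_barrier() -> None:
        # Barrier that degrades to a no-op when not running in a distributed context.
        if dist.is_available() and dist.is_initialized():
            dist.barrier()


    def print_on_rank(msg: str, rank: int = 0) -> None:
        # Print only on the chosen global rank (rank 0 by default); a
        # non-distributed run behaves as rank 0.
        current = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0
        if current == rank:
            print(msg)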

Full Changelog: 0.1.6...0.1.7

0.1.6

26 Apr 22:57
Pre-release

Updates

  • Adds a getter for swa_utils.EMA so the underlying network can be easily retrieved.
  • Better import support for the core package. Aliasing the core package and importing sub-modules from it is now supported.
  • The trainer no longer prints duplicate messages when using distributed training.
  • Allows the trainer to populate the registries itself. This provides better support for distributed training that uses spawn.
  • The internal distributed flag for the trainer is now correctly set when invoked through torchrun.

Full Changelog: 0.1.5...0.1.6

0.1.5

26 Apr 01:19
Pre-release

Updates

  • Fixes an issue where printing, validation, and saving were incorrectly triggered when training by iteration with gradient accumulation steps. This was caused by incorrect guarding of those operations.

Full Changelog: 0.1.4...0.1.5

0.1.4

25 Apr 04:37
Pre-release

Updates

  • Adds a context manager to disable cuDNN benchmark within a scope (sketched after this list).
  • Fixes an issue where cuDNN benchmark was disabled upon entering the validation code but never re-enabled, which could lead to poor performance after the first validation cycle.
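
The scoped toggle works along these lines; this is a sketch built on torch.backends.cudnn directly, not necessarily the Helios implementation or its name.

    import contextlib

    import torch


    @contextlib.contextmanager
    def disable_cudnn_benchmark():
        # Illustrative scoped toggle: save the current flag, turn benchmarking off,
        # and always restore the previous value on exit (even on exceptions), so a
        # validation pass cannot leave benchmarking permanently disabled.
        previous = torch.backends.cudnn.benchmark
        torch.backends.cudnn.benchmark = False
        try:
            yield
        finally:
            torch.backends.cudnn.benchmark = previous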

Full Changelog: 0.1.3...0.1.4

0.1.3

25 Apr 01:17
Pre-release

Updates

  • args and kwargs are now consistently typed throughout the code.
  • Re-works strip_training_data in the model to make it more flexible. The function is now called trained_state_dict and returns the state of the final trained model; it accepts arbitrary arguments for further flexibility (see the sketch after this list).
  • Progress bars now restart correctly when training on iteration. Previously the progress bar would restart at 0 instead of using the last saved iteration.
  • Saved checkpoints now have epoch numbers starting at 1 instead of 0.
  • Improved running loss system. The model now contains a table of running losses that is automatically updated from the main loss table and is reset at the end of every iteration cycle.
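
A minimal sketch of the new hook is shown below. The base-class import path and the self._net attribute are assumptions for illustration; check the Helios documentation for the exact signature.

    import typing

    from helios import model  # assumed module path


    class MyModel(model.Model):
        # Sketch only: the other methods a Model sub-class must implement are
        # omitted, and self._net is a hypothetical attribute holding the trained
        # network.
        def trained_state_dict(self, *args: typing.Any, **kwargs: typing.Any) -> dict:
            # Return only the state the final trained model should ship with.
            return {"net": self._net.state_dict()}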

Full Changelog: 0.1.2...0.1.3

0.1.2

16 Apr 22:34
Pre-release

Updates

  • When saving a checkpoint, metadata provided by the model was not being stored correctly. This has now been addressed.

Full Changelog: 0.1.1...0.1.2

0.1.1

15 Apr 22:16
Pre-release

Updates

  • Removes all instances of the name "Pyro" from the code base.
  • Converts the README from Markdown to RST, which should make PyPI render it better.

Full Changelog: 0.1.0...0.1.1