Add trainer subsystem with SGD and Adam optimizers #177

hweom · 2022-10-23T18:26:26Z

What does this PR accomplish?

Adds a trainer subsystem with SGD and Adam optimizers.

🦚 Feature
🧭 Architecture

📜 Checklist

Test coverage is excellent
All unit tests pass
Documentation is thorough, extensive and explicit
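
For readers unfamiliar with the two update rules this PR names, here is a minimal sketch on plain `f32` slices. It is not the code from this PR. The `Optimizer` trait, the struct names, the constructor arguments, the `minimize` helper, and the default Adam hyperparameters (0.9, 0.999, 1e-8) are all assumptions for illustration. Juice's actual optimizers work on its own tensor types.

```rust
/// Hypothetical optimizer interface (an assumption, not the trait from this PR).
trait Optimizer {
    /// Updates `params` in place from `grads`; both slices must have equal length.
    fn step(&mut self, params: &mut [f32], grads: &[f32]);
}

/// SGD with classical momentum: v <- momentum * v - lr * g; w <- w + v.
struct SgdMomentum {
    lr: f32,
    momentum: f32,
    velocity: Vec<f32>,
}

impl SgdMomentum {
    fn new(lr: f32, momentum: f32, len: usize) -> Self {
        Self { lr, momentum, velocity: vec![0.0; len] }
    }
}

impl Optimizer for SgdMomentum {
    fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        assert_eq!(params.len(), grads.len());
        assert_eq!(params.len(), self.velocity.len());
        for ((w, g), v) in params.iter_mut().zip(grads).zip(self.velocity.iter_mut()) {
            *v = self.momentum * *v - self.lr * *g;
            *w += *v;
        }
    }
}

/// Adam with bias-corrected first and second moment estimates.
struct Adam {
    lr: f32,
    beta1: f32,
    beta2: f32,
    eps: f32,
    t: i32,      // number of steps taken so far
    m: Vec<f32>, // running mean of gradients
    v: Vec<f32>, // running mean of squared gradients
}

impl Adam {
    /// Uses commonly quoted defaults for beta1, beta2 and eps (assumed here).
    fn new(lr: f32, len: usize) -> Self {
        Self { lr, beta1: 0.9, beta2: 0.999, eps: 1e-8, t: 0, m: vec![0.0; len], v: vec![0.0; len] }
    }
}

impl Optimizer for Adam {
    fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        assert_eq!(params.len(), grads.len());
        assert_eq!(params.len(), self.m.len());
        self.t += 1;
        // Bias-correction factors compensate for the zero-initialized moments.
        let c1 = 1.0 - self.beta1.powi(self.t);
        let c2 = 1.0 - self.beta2.powi(self.t);
        for i in 0..params.len() {
            let g = grads[i];
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
            let m_hat = self.m[i] / c1;
            let v_hat = self.v[i] / c2;
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}

/// Toy driver: minimizes f(w) = w^2 (gradient 2w) for `steps` iterations.
fn minimize(opt: &mut dyn Optimizer, start: f32, steps: usize) -> f32 {
    let mut w = [start];
    for _ in 0..steps {
        let grad = [2.0 * w[0]];
        opt.step(&mut w, &grad);
    }
    w[0]
}
```

Calling `minimize(&mut SgdMomentum::new(0.1, 0.9, 1), 1.0, 500)` or the same with `Adam::new(0.1, 1)` should drive `w` toward zero on this toy problem. Keeping optimizer state (`velocity`, `m`, `v`) inside the optimizer rather than in the parameters is one possible design, chosen here only to keep the sketch short.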

juice/src/train/optimizer/sgd_momentum.rs

juice/src/train/optimizer/adam.rs

juice/src/train/mod.rs

drahnr

A few nits, looks good! We should eventually talk about how to abstract over multiple data types.
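
One possible shape for such an abstraction, offered only as a starting point for that discussion: a small marker trait that names the arithmetic an update rule needs, implemented for each supported element type. The `Scalar` trait and the `sgd_step` function are hypothetical names and do not appear in this PR.

```rust
use std::ops::{Mul, Sub};

/// Hypothetical marker trait listing the operations a plain SGD update needs.
trait Scalar: Copy + Mul<Output = Self> + Sub<Output = Self> {}

impl Scalar for f32 {}
impl Scalar for f64 {}

/// Plain SGD step, generic over the element type: w <- w - lr * g.
fn sgd_step<T: Scalar>(params: &mut [T], grads: &[T], lr: T) {
    assert_eq!(params.len(), grads.len());
    for (w, g) in params.iter_mut().zip(grads) {
        *w = *w - lr * *g;
    }
}
```

The same function then works on `f32` and `f64` buffers without duplication. Rules such as Adam would need more bounds (division, square root), which is where a richer numeric trait would come in.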

Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>

* Fix coaster UI tests (rustc error messages changed in 1.62 (#172)
* Fix Linear layer bias gradient computation; add size checks to CUDA functions (#170)
* Assert the correct tensor sizes in copy() and gemm(); fix related Linear logic
* Check output matrix dims in GEMM; fix corresponding Linear layer logic
* Update coaster-blas/src/frameworks/cuda/helper.rs
* Fix merge mistake in commit 6952a49 (#173)
* doc: clarify remote test (#175)
* bump rust-bindgen to 0.60.1, bump cargo lock file (#174)
* build(deps): bump capnp from 0.14.9 to 0.14.11 (#179)
  Bumps [capnp](https://github.com/capnproto/capnproto-rust) from 0.14.9 to 0.14.11.
  - [Release notes](https://github.com/capnproto/capnproto-rust/releases)
  - [Commits](capnproto/capnproto-rust@capnp-v0.14.9...capnp-v0.14.11)
  ---
  updated-dependencies:
  - dependency-name: capnp
    dependency-type: direct:production
  ...
* build(deps): bump tokio from 1.21.0 to 1.23.1 (#183)
  Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.0 to 1.23.1.
  - [Release notes](https://github.com/tokio-rs/tokio/releases)
  - [Commits](tokio-rs/tokio@tokio-1.21.0...tokio-1.23.1)
  ---
  updated-dependencies:
  - dependency-name: tokio
    dependency-type: direct:production
  ...
* build(deps): bump bumpalo from 3.11.0 to 3.12.0 (#187)
  Bumps [bumpalo](https://github.com/fitzgen/bumpalo) from 3.11.0 to 3.12.0.
  - [Release notes](https://github.com/fitzgen/bumpalo/releases)
  - [Changelog](https://github.com/fitzgen/bumpalo/blob/main/CHANGELOG.md)
  - [Commits](fitzgen/bumpalo@3.11.0...3.12.0)
  ---
  updated-dependencies:
  - dependency-name: bumpalo
    dependency-type: indirect
  ...
* build(deps): bump tokio from 1.23.1 to 1.24.2 (#191)
  Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.23.1 to 1.24.2.
  - [Release notes](https://github.com/tokio-rs/tokio/releases)
  - [Commits](https://github.com/tokio-rs/tokio/commits)
  ---
  updated-dependencies:
  - dependency-name: tokio
    dependency-type: direct:production
  ...
* Now also saves bias layers (#193)
* build(deps): bump openssl from 0.10.41 to 0.10.48
  Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.41 to 0.10.48.
  - [Release notes](https://github.com/sfackler/rust-openssl/releases)
  - [Commits](sfackler/rust-openssl@openssl-v0.10.41...openssl-v0.10.48)
  updated-dependencies:
  - dependency-name: openssl
    dependency-type: indirect
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
* Do not pass batch_size to cudnnGetRNNParamsSize().
* Add a feature for deterministic (pseudo)randomizing.
* New network architecture pieces: Layer, Descriptor, Context, Network (#165)
* New network architecture pieces: Layer, Descriptor, Context, Network
* Update juice/src/net/descriptor.rs
* Implement Sequential layer for the new architecture (#168)
* Implement Sequential layer
* Fix coaster UI tests (rustc error messages changed in 1.62 (#172)
* Fix Linear layer bias gradient computation; add size checks to CUDA functions (#170)
* Assert the correct tensor sizes in copy() and gemm(); fix related Linear logic
* Check output matrix dims in GEMM; fix corresponding Linear layer logic
* Update coaster-blas/src/frameworks/cuda/helper.rs
* More ergonomic net creation and fallible Sequential constructor
* Fix merge mistake in commit 6952a49
* Add a few more layers to the new architecture (#176)
* Add trainer subsystem with SGD and Adam optimizers (#177)
* Coaster convolution API cleanup (#178)
* Move Convolution workspace into context
* Implement Convolution, Dropout and Pooling layers (#180)
* Move Convolution workspace into context
* Formatting fixes
* Fixed unit tests
* Partial implementation of the Convolution layer
* Implement the remaining parts for Convolution layer
* Implement dropout and pooling layers
* Fix CUDA tensor descriptor size error and adjust layer testing infra
* Extended debug output for layers with custom Debug impl
* Add softmax layers and convert MNIST example (#184)
* Move Convolution workspace into context
* Formatting fixes
* Fixed unit tests
* Partial implementation of the Convolution layer
* Implement the remaining parts for Convolution layer
* Implement dropout and pooling layers
* Fix CUDA tensor descriptor size error and adjust layer testing infra
* Extended debug output for layers with custom Debug impl
* Changed mnist example to the new architecture
* Plumbed the momentum arg in the mnist example
* Implemented softmax and logsoftmax layers
* Remove unnecessary NLL parameter and fix mnist example
* Fix native backend softmax and logsoftmax grad computation
* Changed slicing syntax in native backend softmax functions
* Convert juice benchtests to Criterion (#192)
* Convert Juice benchmarks to Criterion
* Add newline at the end of Cargo.toml
* Made Layer operations return a Result (#186)
* Made Layer operations return a Result
* Change LayerError to contain Boxes
* Update benchmarks for new layer API
* Simplify new_rnn_config()

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: opfromthestart <opfromthestart@gmail.com>

Mikhail Balakhno added 2 commits October 23, 2022 11:24

Add trainer subsystem with SGD and Adam optimizers

c4c9851

Trivial cleanups

784cb40

drahnr reviewed Oct 24, 2022

juice/src/train/optimizer/sgd_momentum.rs

drahnr reviewed Oct 24, 2022

juice/src/train/optimizer/adam.rs

drahnr reviewed Oct 24, 2022

juice/src/train/mod.rs

drahnr approved these changes Oct 24, 2022

drahnr added 2 commits October 24, 2022 17:34

Update juice/src/train/optimizer/adam.rs

7f9a90f

Update juice/src/train/optimizer/sgd_momentum.rs

bbfdd61

drahnr merged commit 0654e6f into fff-rs:arch-refactor Oct 25, 2022

hweom deleted the arch-refactor branch November 5, 2022 18:29

hweom added a commit to hweom/juice that referenced this pull request Feb 25, 2024

Add trainer subsystem with SGD and Adam optimizers (fff-rs#177)

6e8b558

Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>

hweom added a commit to hweom/juice that referenced this pull request Mar 10, 2024

Add trainer subsystem with SGD and Adam optimizers (fff-rs#177)

caf87f2

Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
