Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add softmax layers and convert MNIST example #184

Merged
merged 17 commits into from
Jan 9, 2023

Conversation

hweom
Copy link
Contributor

@hweom hweom commented Jan 7, 2023

What does this PR accomplish?

  • 馃┕ Bug Fix
  • 馃 Feature
  • 馃Л Architecture

Changes proposed by this PR:

  • Implement Softmax and LogSoftmax layers in the new architecture.
  • Convert MNIST example to use the new architecture.
  • Fix a bug in native backend where Softmax and LogSoftmax were computed without taking the batch structure into account.

Notes to reviewer:

MNIST example results

Old arch

$ time target/debug/example-mnist-classification mnist linear
Last sample: Prediction: 8, Target: 8 | Accuracy 915/1000 = 91.50%
target/debug/example-mnist-classification mnist linear  3.23s user 0.59s system 97% cpu 3.921 total

$ time target/debug/example-mnist-classification mnist conv
Last sample: Prediction: 8, Target: 8 | Accuracy 946/1000 = 94.60%
target/debug/example-mnist-classification mnist conv  6.48s user 0.93s system 65% cpu 11.307 total

New arch

$ time target/debug/example-mnist-classification mnist linear
Last sample: Prediction: 8, Target: 8 | Accuracy 893/1000 = 89.30%
target/debug/example-mnist-classification mnist linear  2.99s user 0.54s system 97% cpu 3.597 total

$ time target/debug/example-mnist-classification mnist conv
Last sample: Prediction: 8, Target: 8 | Accuracy 943/1000 = 94.30%
target/debug/example-mnist-classification mnist conv  5.80s user 1.30s system 68% cpu 10.398 total

(slight accuracy drop for linear net is likely due to initial weight randomization)

馃摐 Checklist

  • Test coverage is excellent
  • All unit tests pass
  • The juice-examples run just fine
  • Documentation is thorough, extensive and explicit

Copy link
Member

@drahnr drahnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, generally good to go. Variable names are a bit short, especially in iterator closures.

coaster-nn/src/frameworks/native/helper.rs Outdated Show resolved Hide resolved
coaster-nn/src/frameworks/native/helper.rs Outdated Show resolved Hide resolved
coaster-nn/src/frameworks/native/helper.rs Outdated Show resolved Hide resolved
coaster-nn/src/frameworks/native/helper.rs Outdated Show resolved Hide resolved
juice/src/net/loss/negative_log_likelihood.rs Show resolved Hide resolved
@hweom
Copy link
Contributor Author

hweom commented Jan 8, 2023

Variable names are a bit short, especially in iterator closures.

Are there specific instances you think are worth improving?

I guess the native backend softmax* and log_softmax* functions? I tried to be minimally invasive there, but maybe it's worth a more thorough cleanup?

@drahnr
Copy link
Member

drahnr commented Jan 9, 2023

Variable names are a bit short, especially in iterator closures.

Are there specific instances you think are worth improving?

I guess the native backend softmax* and log_softmax* functions? I tried to be minimally invasive there, but maybe it's worth a more thorough cleanup?

It doesn't make sense to do renames here, after a second look I think they're sufficiently contained (usually only spanning up to a handful of lines) that we can leave them as they are to keep the changeset small.

@drahnr drahnr merged commit 1a6a820 into fff-rs:arch-refactor Jan 9, 2023
hweom added a commit to hweom/juice that referenced this pull request Feb 25, 2024
* Move Convolution workspace into context

* Formatting fixes

* Fixed unit tests

* Partial implementation of the Convolution layer

* Implement the remaining parts for Convolution layer

* Implement dropout and pooling layers

* Fix CUDA tensor descriptor size error and adjust layer testing infra

* Extended debug output for layers with custom Debug impl

* Changed mnist example to the new architecture

* Plumbed the momentum arg in the mnist example

* Implemented softmax and logsoftmax layers

* Remove unnecessary NLL parameter and fix mnist example

* Fix native backend softmax and logsoftmax grad computation

* Changed slicing syntax in native backend softmax functions

Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com>
hweom added a commit to hweom/juice that referenced this pull request Mar 10, 2024
* Move Convolution workspace into context

* Formatting fixes

* Fixed unit tests

* Partial implementation of the Convolution layer

* Implement the remaining parts for Convolution layer

* Implement dropout and pooling layers

* Fix CUDA tensor descriptor size error and adjust layer testing infra

* Extended debug output for layers with custom Debug impl

* Changed mnist example to the new architecture

* Plumbed the momentum arg in the mnist example

* Implemented softmax and logsoftmax layers

* Remove unnecessary NLL parameter and fix mnist example

* Fix native backend softmax and logsoftmax grad computation

* Changed slicing syntax in native backend softmax functions

Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com>
drahnr added a commit that referenced this pull request Mar 15, 2024
* Fix coaster UI tests (rustc error messages changed in 1.62 (#172)

* Fix Linear layer bias gradient computation; add size checks to CUDA functions (#170)

* Assert the correct tensor sizes in copy() and gemm(); fix related Linear logic

* Check output matrix dims in GEMM; fix corresponding Linear layer logic

* Update coaster-blas/src/frameworks/cuda/helper.rs

* Fix merge mistake in commit 6952a49 (#173)

* doc: clarify remote test (#175)

* bump rust-bindgen to 0.60.1, bump cargo lock file (#174)

* build(deps): bump capnp from 0.14.9 to 0.14.11 (#179)

Bumps [capnp](https://github.com/capnproto/capnproto-rust) from 0.14.9 to 0.14.11.
- [Release notes](https://github.com/capnproto/capnproto-rust/releases)
- [Commits](capnproto/capnproto-rust@capnp-v0.14.9...capnp-v0.14.11)

---
updated-dependencies:
- dependency-name: capnp
  dependency-type: direct:production
...

* build(deps): bump tokio from 1.21.0 to 1.23.1 (#183)

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.0 to 1.23.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-1.21.0...tokio-1.23.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
...

* build(deps): bump bumpalo from 3.11.0 to 3.12.0 (#187)

Bumps [bumpalo](https://github.com/fitzgen/bumpalo) from 3.11.0 to 3.12.0.
- [Release notes](https://github.com/fitzgen/bumpalo/releases)
- [Changelog](https://github.com/fitzgen/bumpalo/blob/main/CHANGELOG.md)
- [Commits](fitzgen/bumpalo@3.11.0...3.12.0)

---
updated-dependencies:
- dependency-name: bumpalo
  dependency-type: indirect
...

* build(deps): bump tokio from 1.23.1 to 1.24.2 (#191)

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.23.1 to 1.24.2.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/commits)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
...

* Now also saves bias layers (#193)

* build(deps): bump openssl from 0.10.41 to 0.10.48

Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.41 to 0.10.48.
- [Release notes](https://github.com/sfackler/rust-openssl/releases)
- [Commits](sfackler/rust-openssl@openssl-v0.10.41...openssl-v0.10.48)

updated-dependencies:
- dependency-name: openssl
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Do not pass batch_size to cudnnGetRNNParamsSize().

* Add a feature for deterministic (pseudo)randomizing.

* New network architecture pieces: Layer, Descriptor, Context, Network (#165)

* New network architecture pieces: Layer, Descriptor, Context, Network

* Update juice/src/net/descriptor.rs

* Implement Sequential layer for the new architecture (#168)

* Implement Sequential layer

* Fix coaster UI tests (rustc error messages changed in 1.62 (#172)

* Fix Linear layer bias gradient computation; add size checks to CUDA functions (#170)

* Assert the correct tensor sizes in copy() and gemm(); fix related Linear logic

* Check output matrix dims in GEMM; fix corresponding Linear layer logic

* Update coaster-blas/src/frameworks/cuda/helper.rs

* More ergonomic net creation and fallible Sequential constructor

* Fix merge mistake in commit 6952a49

* Add a few more layers to the new architecture (#176)

* Add trainer subsystem with SGD and Adam optimizers (#177)

* Coaster convolution API cleanup (#178)

* Move Convolution workspace into context

* Implement Convolution, Dropout and Pooling layers (#180)

* Move Convolution workspace into context

* Formatting fixes

* Fixed unit tests

* Partial implementation of the Convolution layer

* Implement the remaining parts for Convolution layer

* Implement dropout and pooling layers

* Fix CUDA tensor descriptor size error and adjust layer testing infra

* Extended debug output for layers with custom Debug impl

* Add softmax layers and convert MNIST example (#184)

* Move Convolution workspace into context

* Formatting fixes

* Fixed unit tests

* Partial implementation of the Convolution layer

* Implement the remaining parts for Convolution layer

* Implement dropout and pooling layers

* Fix CUDA tensor descriptor size error and adjust layer testing infra

* Extended debug output for layers with custom Debug impl

* Changed mnist example to the new architecture

* Plumbed the momentum arg in the mnist example

* Implemented softmax and logsoftmax layers

* Remove unnecessary NLL parameter and fix mnist example

* Fix native backend softmax and logsoftmax grad computation

* Changed slicing syntax in native backend softmax functions

* Convert juice benchtests to Criterion (#192)

* Convert Juice benchmarks to Criterion

* Add newline at the end of Cargo.toml

* Made Layer operations return a Result (#186)

* Made Layer operations return a Result

* Change LayerError to contain Boxes

* Update benchmarks for new layer API

* Simplify new_rnn_config()

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Mikhail Balakhno <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: opfromthestart <opfromthestart@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants