
Merge v9 into thinc.ai #931

Merged
merged 53 commits into explosion:thinc.ai from maintenance/merge-v9-thincai on Apr 18, 2024

Conversation

danieldk
Contributor

Description

Warning: do not squash

Types of change

Docs

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

shadeMe and others added 30 commits August 17, 2022 17:37
* `NumpyOps`: Remove unused/vestigial free functions, reuse functions in `Ops`

* Remove superfluous `typedef`
* `CBlas`: Add `sscalv`

* `NumpyOps`: Replace usage of `.linalg` with `numpy` and `BLAS` calls

* Remove vestigial/mostly unused `backends.linalg` module

* Use BLAS notation for `sscal`, add `dscal`
* `NumpyOps`: Move `blis` detection to `compat` module, replace `blis.cy.gemm` calls with `CBlas` calls

* `NumpyOps`: Call `self.cblas()` instead of directly instantiating `CBlas`

* `CBlas`: Add `dgemm`

* `NumpyOps`: Use `CBlas.?gemm` in `gemm`
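The `CBlas` additions above follow BLAS naming, where the `s`/`d` prefix marks single/double precision. A rough NumPy sketch of the `sscal`/`dscal` semantics (illustrative only, not the actual Cython implementation):

```python
import numpy as np

def sscal(alpha, x):
    # BLAS naming convention: 's' prefix = single precision (float32);
    # scal computes x <- alpha * x, in place.
    assert x.dtype == np.float32
    x *= np.float32(alpha)
    return x

def dscal(alpha, x):
    # 'd' prefix = double precision (float64).
    assert x.dtype == np.float64
    x *= alpha
    return x
```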
Sync v9 with the latest from master
* return logloss instead of squared difference

* check whether to compute binary or categorical loss value

* function to apply label smoothing to 2d array

* force exclusive classes

* formatting

* mypy debug

* bugfix

* compare cross entropy to torch

* fix type and error message

* updating cross-entropy tests

* all categorical crossentropy tests updated

* sequence crossentropy test

* rearrange if statements

* sequence ce negprefix test start

* all tests for (sequence) cross entropy

* use CategoricalCrossentropy as loss

* don't run conversion and validation twice in __call__

* add type for truths in convert_truths (thnx @ richardpaulhudson)

* fix one-hot check and no unexpected error branch

* cupy support for torch comparison

* import floats2d

* hopefully right type to pass old torch cross-entropy

* nonstrict sum to 1

* typo

* remove redundant work for sequential cross entropy

* type typo

* fix imports

* remove misleading comments

* assertion for clarity

* add back mistakenly removed imports

* throw error rather than assert

* legacy versions and tests for crossentropy + sequential

* type fix

* Update thinc/legacy/loss.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* legacy cross-entropy import through registry

* no legacy test module

* type fix

* hacking types for mypy

* return type

* Update thinc/legacy/loss.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update thinc/legacy/__init__.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* initial functional sparse CE loss

* separate functionality for SparseCE and CategoricalCrossentropy

* fix missing value type

* correcting label smoothing param constraint

* test new label smooth validation error

* less than 0 input validation

* string concat

* small update to error msg

* fix max smoothing coefficient

* double check error message

* Categorical and Sparse factories and tests

* Update thinc/util.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* update test with less strict match

* Fix types, pair-hacked with @kadarakos

* (Sparse)CategoricalEntropy: support Ragged guesses

Since we can encode sequences as Ragged, this could
replace (Sparse)SequenceCategoricalEntropy.

* follow updated api

* Update thinc/util.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/legacy/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/legacy/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* indent fix

* Update thinc/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* remove unnecessary list copy

* add type to truths

* fix missing assignment

* Update thinc/legacy/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/legacy/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* revert suggestion

* Update thinc/legacy/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/legacy/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/legacy/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/tests/test_loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/util.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update thinc/loss.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* move check1d out of loss and give it a more general signature

* mypy fix

* SparseCE rename

Co-authored-by: Kádár Ákos <akos@onyx.uvt.nl>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
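Several commits above concern applying label smoothing to a 2-d one-hot array and validating its coefficient. A hypothetical sketch of the idea (`smooth_one_hot` and the exact bound are illustrative, not the actual Thinc helpers):

```python
import numpy as np

def smooth_one_hot(labels, alpha):
    # Spread `alpha` of the probability mass from the true class over
    # the other classes; each row still sums to 1.
    n_classes = labels.shape[1]
    if n_classes < 2:
        raise ValueError("label smoothing requires more than one class")
    # Keep the true class the most probable: alpha < (K - 1) / K.
    if not 0 <= alpha < (n_classes - 1) / n_classes:
        raise ValueError("invalid label smoothing coefficient")
    return labels * (1 - alpha) + (1 - labels) * alpha / (n_classes - 1)
```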
…ion#809)

* Bring back support for missing labels to legacy cross entropy

* Use `missing_value` to detect missing values

* Typing fixes
…n#804)

* Give schedulers access to the key, step, and last eval score

Before this change, schedules were generators that generate a value for
each training step. However, this has the limitation that a schedule
cannot use other information that is available in the optimizer, such
as the parameter key. This information is useful for e.g. discriminative
learning rates, where certain parameters are on a different schedule
than others.

To accommodate passing additional information, this change converts
schedules to callables. These callables are passed the training step,
the parameter key, and the last evaluation score (when available).

Traditional scalar and generated schedules are converted to callables
by the optimizer for compatibility.
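The callable-based design described above could look roughly like this; the function name and exact keyword signature are assumptions for illustration, not Thinc's published API:

```python
def warmup_linear(initial_rate, warmup_steps, total_steps):
    # A schedule is a callable receiving the training step plus keyword
    # arguments for the parameter key and the last evaluation score.
    def schedule(step, *, key=None, last_score=None):
        if step < warmup_steps:
            # Linear warmup from 0 to initial_rate.
            return initial_rate * step / max(1, warmup_steps)
        # Linear decay back to 0 over the remaining steps.
        frac = (total_steps - step) / max(1, total_steps - warmup_steps)
        return initial_rate * max(0.0, frac)
    return schedule
```

A per-parameter schedule could inspect `key` to put certain parameters on a different learning rate, which is the discriminative-learning-rates use case mentioned above.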

* Fix use of the `t` parameter where used in the schedules

Also add tests, so that this doesn't break again.

* Fixes from @shadeMe

* Call _schedule_args once

* Make Optimizer.step private

* Fix two missed step uses in tests

* Float fix

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Fix schedule call

* Move `ScheduleCallable` to `thinc.types`

* Move from callables to a `Schedule` class

The new learning rate functionality used `Callable`s. However, the issue
with callables is that they cannot be pickled. This is problematic
because schedules can end up in spaCy pipelines (e.g. through the
optimizer associated with the `Language` object).

This change solves the issue by refactoring the schedules into regular
objects. This now works similarly to Thinc `Model`s -- there is a new
`Schedule` class which can be constructed with composition.

I tested the changes with spaCy; pickling as well as using existing
configurations works.
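A minimal sketch of why a class makes the schedules picklable (illustrative only, not Thinc's actual implementation): the scheduling function is a module-level object, so `pickle` can serialize instances by reference, which a closure-based callable cannot do.

```python
import pickle

def _constant_schedule(schedule, step, *, key=None, last_score=None):
    # Module-level function: picklable by qualified name.
    return schedule.attrs["value"]

class Schedule:
    # Sketch of a Model-like schedule object: a name, a scheduling
    # function, and an attrs dict holding the schedule's state.
    def __init__(self, name, schedule_fn, *, attrs=None):
        self.name = name
        self._schedule_fn = schedule_fn
        self.attrs = dict(attrs or {})

    def __call__(self, step, *, key=None, last_score=None):
        return self._schedule_fn(self, step, key=key, last_score=last_score)

def constant(value):
    return Schedule("constant", _constant_schedule, attrs={"value": value})
```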

* Remove stray `runtime_checkable` import

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add `Schedule.to_generator`

This method turns a `Schedule` into a generator by feeding the
`Schedule` steps with a given starting step and increment.
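A rough sketch of the idea (the argument names `start` and `step_size` are assumptions):

```python
def to_generator(schedule, start=0, step_size=1):
    # Turn a step-based schedule into an infinite generator by
    # feeding it successive steps.
    step = start
    while True:
        yield schedule(step)
        step += step_size
```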

* Doc fix

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* docs: add default values for Schedule.to_generator

* fix anchor

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add plateau.v1 schedule

This schedule yields values from the wrapped schedule, exponentially
scaled by the number of times optimization has plateaued.
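A hedged sketch of the plateau behavior described above; the signature and bookkeeping are assumptions for illustration, not `plateau.v1`'s actual implementation:

```python
def plateau(scale, patience, wrapped):
    # Count a plateau after `patience` evaluations without improvement,
    # then scale the wrapped schedule's value by scale ** n_plateaus.
    state = {"best": None, "bad": 0, "plateaus": 0}

    def schedule(step, *, key=None, last_score=None):
        if last_score is not None:
            if state["best"] is None or last_score > state["best"]:
                state["best"] = last_score
                state["bad"] = 0
            else:
                state["bad"] += 1
                if state["bad"] >= patience:
                    state["plateaus"] += 1
                    state["bad"] = 0
        value = wrapped(step, key=key, last_score=last_score)
        return value * scale ** state["plateaus"]

    return schedule
```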

* Fix anchor

* Remove stagnant wording in favor of plateaus

* Type annotation: last_score is Optional

Also set a default value, so that the schedule does not fail when
the last_score argument is not provided.

* Update docs to clarify that passing last_score is not mandatory

* Document plateau arguments
* fix valid label smoothing parameter

* remove print

* fix typo

* ensure number of classes larger than one
danieldk and others added 23 commits March 22, 2023 14:30
* Revert "Cross entropy fix (explosion#647)"

This reverts commit c8ac07f.

* Cherry pick MPS Torch bug to get CI unstuck
PR explosion#897 fixed the dtypes in strings2arrays, but it also broke
strings2arrays for batches with sequences of unequal lengths.
The way we used thread-local storage before did not typecheck, since we
assigned to `Thread`. Thread-local storage can be a global variable; the
state of this object will be different per thread.
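The thread-local pattern described above, as a generic sketch:

```python
import threading

# Thread-local storage as a module-level global: each thread sees its
# own independent attributes on this one object, which typechecks
# (unlike assigning arbitrary attributes to a `Thread` instance).
_state = threading.local()

def get_buffer():
    # Lazily create this thread's private buffer on first access.
    if not hasattr(_state, "buffer"):
        _state.buffer = []
    return _state.buffer
```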
* remove slow marker from basic tagger test

* fix strings2array

* isort
* Fix `cupy.cublas` import

Reported in explosion#920.

* Update mypy to work with recent Torch versions

* CI: Do not run MyPy on Python 3.6/3.7.
This change adds `AppleOps` to Thinc, to ensure that the AMX unit is
always used on Apple Silicon Macs. Before this change, a user would get
much worse performance if they forgot to install `thinc-apple-ops`.

The `apple_ops` and `_accelerate` modules are built conditionally. When
detecting the best CPU implementation, we rely on a `try...except`
import to determine whether Apple ops are available.

Even though x86_64 Macs do not have an AMX unit, Accelerate is
competitive with BLIS, so it does not hurt to enable Apple ops on all
Macs.
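The `try...except` detection described above, in generic form; the probe module name `thinc_apple_ops` and the return values are illustrative, not the actual selection code:

```python
import importlib

def module_available(name):
    # The try...except import probe: available iff the import succeeds.
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False

def choose_cpu_ops():
    # Prefer the Apple backend (Accelerate/AMX) when importable.
    return "AppleOps" if module_available("thinc_apple_ops") else "NumpyOps"
```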
* Document AppleOps and MPSOps

* Reformat Ops table

- Sort alphabetically.
- Note that `AppleOps` is new in 9.0.

* Missing comma
@danieldk danieldk added docs Documentation 🔜 v9.0 Related to upcoming v9.0 labels Apr 18, 2024

netlify bot commented Apr 18, 2024

👷 Deploy request for thinc-ai pending review.

Visit the deploys page to approve it.

🔨 Latest commit f348090

@danieldk danieldk merged commit 98a3118 into explosion:thinc.ai Apr 18, 2024
10 checks passed
@danieldk danieldk deleted the maintenance/merge-v9-thincai branch April 18, 2024 10:35