Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and clarify enable/disable behavior of spacy.load() #11459

Merged
merged 13 commits into from
Sep 27, 2022

Conversation

rmitsch
Copy link
Contributor

@rmitsch rmitsch commented Sep 8, 2022

Description

As brought up in #11443, the current behavior of spacy.load() w.r.t. enable, disable and the enabled/disabled options in the config is opaque and inconvenient to work around in case of a conflict. This PR

  • extends the error message to be informative,
  • lets enable/disable values passed to spacy.load() take precedence over config values, i.e. config values are overwritten if function arguments are passed, and
  • adds a warning message in case this (discarding config values because function arguments were passed) happens, since this can have unforeseen consequences for other pipeline components.

Types of change

Enhancement (maybe fix?).

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

… config options. Extend error message on conflict. Add warning message in case of overwriting config option with arguments.
@rmitsch rmitsch added enhancement Feature requests and improvements feat / config Feature: Training config labels Sep 8, 2022
@rmitsch rmitsch self-assigned this Sep 8, 2022
@svlandeg svlandeg added the feat / ux Feature: User experience, error messages etc. label Sep 8, 2022
spacy/language.py Show resolved Hide resolved
spacy/language.py Outdated Show resolved Hide resolved
spacy/language.py Outdated Show resolved Hide resolved
spacy/errors.py Outdated Show resolved Hide resolved
Copy link
Member

@svlandeg svlandeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the best solution to me, and at least provides some helpful warnings to guide users to the correct usage of this functionality.

spacy/language.py Outdated Show resolved Hide resolved
spacy/tests/pipeline/test_pipe_methods.py Outdated Show resolved Hide resolved
@rmitsch
Copy link
Contributor Author

rmitsch commented Sep 14, 2022

Hm, it's not great, especially...

I agree, but is it really more correct than checking if it's an empty SimpleFrozenList, as all empty instances should be identical? I don't mind doing it this way, I'm just wondering. The only alternative I can think of is changing the default param to None, which is messy.

@svlandeg
Copy link
Member

svlandeg commented Sep 14, 2022

Hm, it's not great, especially...

I agree, but is it really more correct than checking if it's an empty SimpleFrozenList, as all empty instances should be identical? I don't mind doing it this way, I'm just wondering. The only alternative I can think of is changing the default param to None, which is messy.

Are all empty instances identical? I mean, what if a user specifically calls the function with an empty SimpleFrozenList - couldn't we tell the difference?

Apart from that argument, there's also a worry of maintainability - if we ever do change the default to None (which I don't advocate for right now), then we might not notice that 200 lines further down, an implicit dependency has broken. But this will be more obvious & easier to keep in-sync when we store this default in a var.

@rmitsch
Copy link
Contributor Author

rmitsch commented Sep 15, 2022

Apart from that argument, there's also a worry of maintainability...

Yeah, I agree.

Are all empty instances identical? I mean, what if a user specifically calls the function with an empty SimpleFrozenList - couldn't we tell the difference?

Not via equality, but via identity. The latest commit includes a suggestion for how we could do this, based on your idea of a private global variable and id(). Since this default value DEFAULT_PIPES_STATUS is used in several modules, it's currently not private though, which I'm not fond of. Let me know what you think (name is also up for debate).

spacy/util.py Outdated Show resolved Hide resolved
Copy link
Member

@svlandeg svlandeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Still trying to think of a decent name though. Maybe _DEFAULT_EMPTY_PIPES? We'd never have one default which contained actual pipe names, because then we couldn't reuse this var across disable, enable and exclude.

@svlandeg svlandeg merged commit aea1671 into explosion:master Sep 27, 2022
polm added a commit to polm/spaCy that referenced this pull request Oct 28, 2022
Some load functions used SimpleFrozenList() directly instead of the
_DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but
the changes in explosion#11459 check for equality using identity, not value, so a
warning is incorrectly raised sometimes, as in explosion#11706.

This change just has all the load functions use the singleton value
instead.
adrianeboyd added a commit that referenced this pull request Nov 3, 2022
* Fix default parameters for load functions

Some load functions used SimpleFrozenList() directly instead of the
_DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but
the changes in #11459 check for equality using identity, not value, so a
warning is incorrectly raised sometimes, as in #11706.

This change just has all the load functions use the singleton value
instead.

* Add test that there are no warnings on module-based load

This will succeed due to changes in this branch, but local tests with
the latest release failed as intended.

* Try reverting commit and see if CI changes

There is an error in CI that is probably unrelated.

Revert "Fix default parameters for load functions"

This reverts commit dc46b35.

* Revert "Try reverting commit and see if CI changes"

This reverts commit 2514ed0.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
adrianeboyd added a commit to adrianeboyd/spaCy that referenced this pull request Nov 7, 2022
…osion#11713)

* Fix default parameters for load functions

Some load functions used SimpleFrozenList() directly instead of the
_DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but
the changes in explosion#11459 check for equality using identity, not value, so a
warning is incorrectly raised sometimes, as in explosion#11706.

This change just has all the load functions use the singleton value
instead.

* Add test that there are no warnings on module-based load

This will succeed due to changes in this branch, but local tests with
the latest release failed as intended.

* Try reverting commit and see if CI changes

There is an error in CI that is probably unrelated.

Revert "Fix default parameters for load functions"

This reverts commit dc46b35.

* Revert "Try reverting commit and see if CI changes"

This reverts commit 2514ed0.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
adrianeboyd added a commit to adrianeboyd/spaCy that referenced this pull request Nov 15, 2022
…osion#11713)

* Fix default parameters for load functions

Some load functions used SimpleFrozenList() directly instead of the
_DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but
the changes in explosion#11459 check for equality using identity, not value, so a
warning is incorrectly raised sometimes, as in explosion#11706.

This change just has all the load functions use the singleton value
instead.

* Add test that there are no warnings on module-based load

This will succeed due to changes in this branch, but local tests with
the latest release failed as intended.

* Try reverting commit and see if CI changes

There is an error in CI that is probably unrelated.

Revert "Fix default parameters for load functions"

This reverts commit dc46b35.

* Revert "Try reverting commit and see if CI changes"

This reverts commit 2514ed0.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests and improvements feat / config Feature: Training config feat / ux Feature: User experience, error messages etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants