account for NER labels with a hyphen in the name #10960

Merged
merged 8 commits into explosion:master from fix/ner-labels
Jun 17, 2022

Conversation

svlandeg
Member

@svlandeg svlandeg commented Jun 13, 2022

Description

This came up when running spacy debug data on data from AnEM, which contains an entity label "Multi-tissue_structure". The ner pipe would parse this as "Multi" and the debug data command would do the same internally, because both were using

label.split("-")[1]

instead of

label.split("-", 1)[1]

when stripping off the BILU prefix (e.g. "B-").
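
To make the difference concrete, here is a minimal sketch of the two calls applied to a BILU tag built from the AnEM label. The helper name `strip_bilu_prefix` is hypothetical and used only for illustration; it is not necessarily the name of the helper added in this PR.

```python
def strip_bilu_prefix(tag: str) -> str:
    """Return the entity label of a prefixed BILU tag such as "B-PERSON".

    Hypothetical helper for illustration only. It assumes the tag carries
    a prefix; a bare "O" tag has no hyphen and would raise IndexError.
    """
    # maxsplit=1 splits on the first hyphen only, so any hyphens that are
    # part of the label itself are preserved.
    return tag.split("-", 1)[1]


tag = "B-Multi-tissue_structure"

# Old behaviour: splitting on every hyphen truncates the label.
print(tag.split("-")[1])       # Multi

# Fixed behaviour: only the BILU prefix is removed.
print(strip_bilu_prefix(tag))  # Multi-tissue_structure
```

Labels without a hyphen, such as "U-ORG", give the same result with either call, which is why the problem only surfaced with a hyphenated label.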

Types of change

bug fix

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@svlandeg svlandeg added bug Bugs and behaviour differing from documentation feat / ner Feature: Named Entity Recognizer labels Jun 13, 2022
@svlandeg svlandeg merged commit eaeca5e into explosion:master Jun 17, 2022
@svlandeg svlandeg deleted the fix/ner-labels branch June 17, 2022 19:02
danieldk added a commit that referenced this pull request Jun 27, 2022
* Add "Aim-spaCy" to spaCy Universe (#10943)

* Add Aim-spaCy to spaCy universe

* Update Aim thumbnail

* Fix author links

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Auto-format code with black (#10945)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* precomputable_biaffine: avoid concatenation (#10911)

The `forward` of `precomputable_biaffine` performs matrix multiplication
and then `vstack`s the result with padding. This creates a temporary
array used for the output of matrix concatenation.

This change avoids the temporary by pre-allocating an array that is
large enough for the output of matrix multiplication plus padding and
fills the array in-place.

This gave me a small speedup (a bit over 100 WPS) on de_core_news_lg on
M1 Max (after changing thinc-apple-ops to support in-place gemm as BLIS
does).

* Add failing test: `test_matcher_extension_in_set_predicate` (#10948)

* vectors: remove use of float as row number (#10955)

The float -1 was returned rather than the integer -1 as the row for
unknown keys. This doesn't introduce a real bug, since such floats
cast (without issues) to int in the conversion to NumPy arrays. Still,
it's nice to do the correct thing :).

* Update for CBlas changes in Thinc 8.1.0.dev2 (#10970)

* Workaround for Typer optional default values with Python calls (#10788)

* Workaround for Typer optional default values with Python calls: added test and workaround.

* @rmitsch Workaround for Typer optional default values with Python calls: reverting some black formatting changes.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* @rmitsch Workaround for Typer optional default values with Python calls: removing return type hint.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Workaround for Typer optional default values with Python calls: fixed imports, added GitHub issue marker.

* Workaround for Typer optional default values with Python calls: removed forcing of default values for optional arguments in init_config_cli(). Added default values for init_config(). Synchronized default values for init_config_cli() and init_config().

* Workaround for Typer optional default values with Python calls: removed unused import.

* Workaround for Typer optional default values with Python calls: fixed usage of optimize in init_config_cli().

* Workaround for Typer optional default values with Python calls: remove output_file from InitDefaultValues.

* Workaround for Typer optional default values with Python calls: rename class for default init values.

* Workaround for Typer optional default values with Python calls: remove newline.

* remove introduced newlines

* Remove test_init_config_from_python_without_optional_args().

* remove leftover import

* reformat import

* remove duplicate

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Made _initialize_X() methods private. (#10978)

* Auto-format code with black (#10977)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* account for NER labels with a hyphen in the name (#10960)

* account for NER labels with a hyphen in the name

* cleanup

* fix docstring

* add return type to helper method

* shorter method and few more occurrences

* use helper method across repo

* fix circular import

* partial revert to avoid circular import

* `enable` argument for spacy.load() (#10784)

* Enable flag on spacy.load: foundation for include, enable arguments.

* Enable flag on spacy.load: fixed tests.

* Enable flag on spacy.load: switched from pretrained model to empty model with added pipes for tests.

* Enable flag on spacy.load: switched to more consistent error on misspecification of component activity. Test refactoring. Added  to default config.

* Enable flag on spacy.load: added support for fields not in pipeline.

* Enable flag on spacy.load: removed serialization fields from supported fields.

* Enable flag on spacy.load: removed 'enable' from config again.

* Enable flag on spacy.load: relaxed checks in _resolve_component_activation_status() to allow non-standard pipes.

* Enable flag on spacy.load: fixed relaxed checks for _resolve_component_activation_status() to allow non-standard pipes. Extended tests.

* Enable flag on spacy.load: comments w.r.t. resolution workarounds.

* Enable flag on spacy.load: remove include fields. Update website docs.

* Enable flag on spacy.load: updates w.r.t. changes in master.

* Implement Doc.from_json(): update docstrings.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): remove newline.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Implement Doc.from_json(): change error message for E1038.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Enable flag on spacy.load: wrapped docstring for _resolve_component_status() at 80 chars.

* Enable flag on spacy.load: changed examples for enable flag.

* Remove newline.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix docstring for Language._resolve_component_status().

* Rename E1038 to E1042.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* add counts to verbose list of NER labels (#10957)

* Update linguistic-features.md (#10993)

Change link for downloading fasttext word vectors

* Use thinc-apple-ops>=0.1.0.dev0 with `apple` extras (#10904)

* Use thinc-apple-ops>=0.1.0.dev0 with `apple` extras

Also test with thinc-apple-ops that is at least 0.1.0.dev0.

* Check thinc-apple-ops on macOS with Python 3.10

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Use `pip install --pre` for installing thinc-apple-ops in CI

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Gor Arakelyan <gor19973010@gmail.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Victoria <80417010+victorialslocum@users.noreply.github.com>
Labels
bug Bugs and behaviour differing from documentation feat / ner Feature: Named Entity Recognizer

2 participants