-
-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Dutch noun chunks to skip overlapping spans #11275
Merged
svlandeg
merged 3 commits into
explosion:master
from
adrianeboyd:bugfix/nl-noun-chunks-10846
Aug 10, 2022
Merged
Fix Dutch noun chunks to skip overlapping spans #11275
svlandeg
merged 3 commits into
explosion:master
from
adrianeboyd:bugfix/nl-noun-chunks-10846
Aug 10, 2022
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
adrianeboyd
added
bug
Bugs and behaviour differing from documentation
lang / nl
Dutch language data and models
labels
Aug 8, 2022
3 tasks
svlandeg
reviewed
Aug 9, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix looks good!
svlandeg
reviewed
Aug 9, 2022
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
diyclassics
pushed a commit
to diyclassics/spaCy
that referenced
this pull request
Aug 29, 2022
* Add test for overlapping noun chunks * Skip overlapping noun chunks * Update spacy/tests/lang/nl/test_noun_chunks.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
svlandeg
added a commit
that referenced
this pull request
Aug 30, 2022
* Add lang folder for la (Latin) * Add Latin lang classes * Add minimal tokenizer exceptions * Add minimal stopwords * Add minimal lex_attrs * Update stopwords, tokenizer exceptions * Add la tests; register la_tokenizer in conftest.py * Update spacy/lang/la/lex_attrs.py Remove duplicate form in Latin lex_attrs Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update natto-py version spec (#11222) * Update natto-py version spec * Update setup.cfg Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Add scorer to textcat API docs config settings (#11263) * Update docs for pipeline initialize() methods (#11221) * Update documentation for dependency parser * Update documentation for trainable_lemmatizer * Update documentation for entity_linker * Update documentation for ner * Update documentation for morphologizer * Update documentation for senter * Update documentation for spancat * Update documentation for tagger * Update documentation for textcat * Update documentation for tok2vec * Run prettier on edited files * Apply similar changes in transformer docs * Remove need to say annotated example explicitly I removed the need to say "Must contain at least one annotated Example" because it's often a given that Examples will contain some gold-standard annotation. * Run prettier on transformer docs * chore: add 'concepCy' to spacy universe (#11255) * chore: add 'concepCy' to spacy universe * docs: add 'slogan' to concepCy * Support full prerelease versions in the compat table (#11228) * Support full prerelease versions in the compat table * Fix types * adding spans to doc_annotation in Example.to_dict (#11261) * adding spans to doc_annotation in Example.to_dict * to_dict compatible with from_dict: tuples instead of spans * use strings for label and kb_id * Simplify test * Update data formats docs Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Fix regex invalid escape sequences (#11276) * Add W605 to the errors raised by flake8 in the CI (#11283) * Clean up automated label-based issue handling (#11284) * Clean up automated label-based issue handline 1. upgrade tiangolo/issue-manager to latest 2. move needs-more-info to tiangolo 3. change needs-more-info close time to 7 days 4. delete old needs-more-info config * Use old, longer message * Fix label name * Fix Dutch noun chunks to skip overlapping spans (#11275) * Add test for overlapping noun chunks * Skip overlapping noun chunks * Update spacy/tests/lang/nl/test_noun_chunks.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Docs: displaCy documentation - data types, `parse_{deps,ents,spans}`, spans example (#10950) * add in spans example and parse references * rm autoformatter * rm extra ents copy * TypedDict draft * type fixes * restore non-documentation files * docs update * fix spans example * fix hyperlinks * add parse example * example fix + argument fix * fix api arg in docs * fix bad variable replacement * fix spacing in style Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * fix spacing on table * fix spacing on table * rm temp files Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * include span_ruler for default warning filter (#11333) * Add uk pipelines to website (#11332) * Check for . in factory names (#11336) * Make fixes for PR #11349 * Fix roman numeral coverage in #11349 Co-authored-by: Patrick J. Burns <patricks@diyclassics.org> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com> Co-authored-by: Jules Belveze <32683010+JulesBelveze@users.noreply.github.com> Co-authored-by: stefawolf <wlf.ste@gmail.com> Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com> Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Fix Dutch noun chunks to skip overlapping spans.
Types of change
Bug fix.
Checklist