Skip to content

v3.2.1: doc_cleaner component, new Matcher attributes, bug fixes and more

Compare
Choose a tag to compare
@adrianeboyd adrianeboyd released this 07 Dec 16:30
· 932 commits to master since this release
800737b

✨ New features and improvements

  • NEW: doc_cleaner component for removing doc.tensor,doc._._trf_data or other Doc attributes at the end of the pipeline to reduce size of output docs.
  • NEW: ENT_ID and ENT_KB_ID to Matcher pattern attributes.
  • Support kb_id for entities in displaCy from Doc input.
  • Add Span.sents property for spans spanning over more than one sentence.
  • Add EntityRuler.remove to remove patterns by id.
  • Make the Tagger neg_prefix configurable.
  • Use Language.pipe in Language.evaluate for more efficient processing.
  • Test suite updates: move regression tests into core test modules with pytest markers for issue numbers, extend tests for languages with alpha support.

🔴 Bug fixes

  • Fix issue #9638: Make JsonlCorpus path optional again.
  • Fix issue #9654: Fix spancat for empty docs and zero suggestions.
  • Fix issue #9658: Improve error message for incorrect .jsonl paths in EntityRuler.
  • Fix issue #9674: Fix language-specific factory handling in package CLI.
  • Fix issue #9694: Convert labels to strings for README in package CLI.
  • Fix issue #9697: Exclude strings from source vector checks.
  • Fix issue #9701: Allow Scorer.score_spans to handle predicted docs with missing annotation.
  • Fix issue #9722: Initialize parser from reference parse rather than aligned example.
  • Fix issue #9764: Set annotations more efficiently in tagger and morphologizer.

📖 Documentation and examples

👥 Contributors

@adrianeboyd, @danieldk, @DuyguA, @honnibal, @ines, @ljvmiranda921, @narayanacharya6, @nrodnova, @Pantalaymon, @polm, @richardpaulhudson, @svlandeg, @thiippal, @Vishnunkumar