Add checks (and converters?) for documents with multiple sentences in debug-data #4409

@adrianeboyd

Feature description

The parser section of spacy debug-data should show a warning when the training data contains few or no documents with multiple sentences.
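A rough sketch of what such a check might look like, written in plain Python rather than against spaCy's internals. The function name `multi_sentence_warning`, the `min_ratio` threshold, and the representation of a document as a list of sentences are all assumptions made for illustration; none of them come from the existing debug-data code.

```python
def multi_sentence_warning(docs, min_ratio=0.1):
    """Return a warning string if few or no docs contain multiple sentences.

    docs: list of documents, each given as a list of sentences
          (hypothetical representation, not spaCy's own data format).
    min_ratio: hypothetical threshold below which "few" is reported.
    """
    total = len(docs)
    if total == 0:
        return None  # nothing to check
    # Count documents that contain more than one sentence.
    multi = sum(1 for sents in docs if len(sents) > 1)
    if multi == 0:
        return "No documents with multiple sentences in the training data."
    if multi / total < min_ratio:
        return (
            f"Only {multi} of {total} documents contain multiple sentences."
        )
    return None


if __name__ == "__main__":
    single_only = [["A sentence."], ["Another sentence."]]
    print(multi_sentence_warning(single_only))
```

Returning the message instead of printing it keeps the check easy to test and leaves the choice of how to display the warning to the caller.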

Potentially add a simple converter to spacy convert that groups sentences into documents, similar to -n with the IOB converters. Some variety in document lengths is probably a good idea here, too, rather than a fixed -n N, but I don't know whether it makes much difference to model performance.
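The grouping idea could be sketched roughly as follows, again in plain Python and not tied to the actual converter code. The function `group_sentences` and its parameters `n_sents`, `vary`, and `seed` are hypothetical names; with `vary=True` each document gets a random length between 1 and `n_sents` instead of a fixed size.

```python
import random


def group_sentences(sentences, n_sents=10, vary=False, seed=0):
    """Group a flat list of sentences into documents.

    n_sents: fixed document size, or the maximum size when vary=True.
    vary: if True, draw each document length uniformly from 1..n_sents.
    seed: seed for the random generator so the grouping is reproducible.
    All names here are hypothetical, chosen for illustration only.
    """
    if n_sents < 1:
        raise ValueError("n_sents must be at least 1")
    rng = random.Random(seed)
    docs = []
    i = 0
    while i < len(sentences):
        size = rng.randint(1, n_sents) if vary else n_sents
        docs.append(sentences[i:i + size])  # last doc may be shorter
        i += size
    return docs


if __name__ == "__main__":
    sents = [f"Sentence {k}." for k in range(7)]
    for doc in group_sentences(sents, n_sents=3):
        print(doc)
```

Whether varied lengths actually help the parser is left open here, as in the description above; the sketch only shows that supporting both modes is cheap.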

Labels: enhancement (Feature requests and improvements), feat / cli (Feature: Command-line interface)
