
Add scorer option to return per-component scores #12540

Merged: 2 commits into explosion:master on May 12, 2023

Conversation

adrianeboyd (Contributor)

Description

Add a `per_component` option to `Language.evaluate`, `Scorer.score`, and the `evaluate` CLI to return scores keyed by `tokenizer` (hard-coded) or by component name. For the `evaluate` CLI, per-component scores can only be saved to JSON.
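For context, a minimal sketch of how the new option would be used from Python, assuming spaCy v3.6+ and that the `en_core_web_sm` pipeline is installed (the exact score keys depend on each component's registered scorer):

```python
import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_sm")

# One gold-annotated example; entities are (start_char, end_char, label).
doc = nlp.make_doc("Apple is looking at buying U.K. startup for $1 billion")
examples = [
    Example.from_dict(
        doc,
        {"entities": [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]},
    )
]

# Default behavior: one flat dict of scores for the whole pipeline.
flat_scores = nlp.evaluate(examples)

# With the new option: scores keyed by "tokenizer" (hard-coded) or by
# component name, e.g. {"tokenizer": {...}, "tagger": {...}, "ner": {...}}.
per_component_scores = nlp.evaluate(examples, per_component=True)
```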

Types of change

Enhancement.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

Add `per_component` option to `Language.evaluate` and `Scorer.score` to
return scores keyed by `tokenizer` (hard-coded) or by component name.

Add option to `evaluate` CLI to score by component. Per-component scores
can only be saved to JSON.
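For reference, the corresponding CLI invocation would look something like `python -m spacy evaluate ./model ./dev.spacy --per-component --output scores.json`, where the paths are placeholders and `--per-component` is the flag name assumed from this PR's description; since per-component scores can only be saved to JSON, `--output` should point to a `.json` file.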
@adrianeboyd added the `enhancement` (Feature requests and improvements), `feat / cli` (Feature: Command-line interface), and `feat / scorer` (Feature: Scorer) labels on Apr 18, 2023
@adrianeboyd (Contributor, Author)

There's no technical reason this has to wait for v3.6, but it seemed nicer for the docs.

@svlandeg (Member) left a comment


If I understand the current PR correctly, this is different from what I had envisioned.

Imagine you have two NER components in the pipeline that are supposed to work on top of each other: NER1 predicts some entities, and NER2 predicts additional entities for the tokens that are still unassigned.

What I thought we'd do is either:
a) evaluate the performance of NER1 and NER2 on entities individually, i.e. when each is run in isolation, or
b) evaluate the performance of the pipeline in sequence, i.e. after NER1 runs you have a certain performance for entities, and after NER2 runs as well, that performance has changed in some sense.

I thought approach b) would make the most sense, as it gives an idea of how much additional performance NER2 contributes compared to running only NER1.

Instead, I think the current implementation would basically just show the same NER performance for both components, assuming they both use the same scorer, right?

(Two review comments on `spacy/cli/evaluate.py`; outdated, resolved.)
@adrianeboyd (Contributor, Author)

I see this as two features that build on each other:

  1. per-component scoring, e.g. for two separate textcat components, where a final, non-incremental evaluation is fine
  2. incremental scoring, e.g. scoring each component at the point right after it runs in the pipeline, which would need per-component scoring in order to report the results sensibly

The second one is a bit more complicated (since you can no longer just call `Scorer.score` once for the full pipeline in `Language.evaluate`) and it depends on the per-component score option, so I implemented the per-component part first as a separate PR. A sketch of the distinction follows below.
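For illustration only, a hedged sketch of the difference between the two features, assuming a pipeline `nlp` with two hypothetical components named `ner1` and `ner2` and a list of `Example` objects prepared as usual:

```python
# Feature 1 (this PR): a single, final evaluation pass whose scores are
# reported separately per component.
per_component_scores = nlp.evaluate(examples, per_component=True)

# Feature 2 (future work), roughly emulated with the existing API:
# evaluate once with the second NER disabled, then once fully enabled,
# to see how much "ner2" contributes on top of "ner1".
with nlp.select_pipes(disable=["ner2"]):
    scores_after_ner1 = nlp.evaluate(examples)
scores_after_ner2 = nlp.evaluate(examples)
```

Note the emulation reruns the whole pipeline per configuration, whereas true incremental scoring would score each component at its position in a single pass.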

@svlandeg (Member)

Gotcha!

@svlandeg merged commit 3637148 into explosion:master on May 12, 2023
14 checks passed