
Add scorer option to return per-component scores #12540

Merged: 2 commits into explosion:master on May 12, 2023

Conversation

adrianeboyd (Contributor)

Description

Add a `per_component` option to `Language.evaluate`, `Scorer.score`, and the `evaluate` CLI to return scores keyed by `tokenizer` (hard-coded) or by component name. For the `evaluate` CLI, per-component scores can only be saved to JSON.
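For context, a minimal sketch of how the new option would be used from Python, assuming spaCy v3.6+ and that the `en_core_web_sm` pipeline is installed (the exact score keys depend on each component's registered scorer):

```python
import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_sm")

# One gold-annotated example; entities are (start_char, end_char, label).
doc = nlp.make_doc("Apple is looking at buying U.K. startup for $1 billion")
examples = [
    Example.from_dict(
        doc,
        {"entities": [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]},
    )
]

# Default behavior: one flat dict of scores for the whole pipeline.
flat_scores = nlp.evaluate(examples)

# With the new option: scores keyed by "tokenizer" (hard-coded) or by
# component name, e.g. {"tokenizer": {...}, "tagger": {...}, "ner": {...}}.
per_component_scores = nlp.evaluate(examples, per_component=True)
```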

Types of change

Enhancement.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

Add `per_component` option to `Language.evaluate` and `Scorer.score` to
return scores keyed by `tokenizer` (hard-coded) or by component name.

Add option to `evaluate` CLI to score by component. Per-component scores
can only be saved to JSON.
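For reference, the corresponding CLI invocation would look something like `python -m spacy evaluate ./model ./dev.spacy --per-component --output scores.json`, where the paths are placeholders and `--per-component` is the flag name assumed from this PR's description; since per-component scores can only be saved to JSON, `--output` should point to a `.json` file.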
@adrianeboyd added the `enhancement` (Feature requests and improvements), `feat / cli` (Feature: Command-line interface), and `feat / scorer` (Feature: Scorer) labels on Apr 18, 2023
@adrianeboyd (Contributor, Author)

There's no technical reason this has to wait for v3.6, but it seemed nicer for the docs.

@svlandeg (Member) left a comment


If I understand the current PR correctly, this is different from what I had envisioned.

Imagine you have two NER components in the pipeline that are supposed to work on top of each other: NER1 predicts some entities, and NER2 predicts additional entities for the tokens that are still unassigned.

What I thought we'd do is either:
a) evaluate the performance of NER1 and NER2 on entities individually, i.e. when each is run in isolation, or
b) evaluate the performance of the pipeline in sequence, i.e. after NER1 runs you have a certain performance for entities, and after NER2 runs as well, that performance has changed in some sense.

I thought approach b) would make the most sense, as it gives an idea of how much additional performance NER2 contributes compared to running only NER1.

Instead, I think the current implementation would basically just show the same NER performance for both components, assuming they both use the same scorer, right?

(Two review comments on `spacy/cli/evaluate.py`; outdated, resolved.)
@adrianeboyd (Contributor, Author)

I see this as two features that build on each other:

  1. per-component scoring, e.g. for two separate textcat components, where a final, non-incremental evaluation is fine
  2. incremental scoring, e.g. scoring each component at the point right after it runs in the pipeline, which would need per-component scoring in order to report the results sensibly

The second one is a bit more complicated (since you can no longer just call `Scorer.score` once for the full pipeline in `Language.evaluate`) and it depends on the per-component score option, so I implemented the per-component part first as a separate PR. A sketch of the distinction follows below.
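For illustration only, a hedged sketch of the difference between the two features, assuming a pipeline `nlp` with two hypothetical components named `ner1` and `ner2` and a list of `Example` objects prepared as usual:

```python
# Feature 1 (this PR): a single, final evaluation pass whose scores are
# reported separately per component.
per_component_scores = nlp.evaluate(examples, per_component=True)

# Feature 2 (future work), roughly emulated with the existing API:
# evaluate once with the second NER disabled, then once fully enabled,
# to see how much "ner2" contributes on top of "ner1".
with nlp.select_pipes(disable=["ner2"]):
    scores_after_ner1 = nlp.evaluate(examples)
scores_after_ner2 = nlp.evaluate(examples)
```

Note the emulation reruns the whole pipeline per configuration, whereas true incremental scoring would score each component at its position in a single pass.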

@svlandeg (Member)

Gotcha!

@svlandeg merged commit 3637148 into explosion:master on May 12, 2023
14 checks passed