f1-score evaluation #9

p16i · 2018-10-25T11:36:48Z

Hi,

While working on #8, I noticed that the f-score seems to be computed on flattened true and predicted labels. For example, given 2 samples of lengths 7 and 20, the current code flattens the labels to shape (27,) and computes a single score. However, I think this could overestimate the value.

To illustrate, I've made a notebook using random data. There, the average of the per-sample f-scores is slightly lower than the f-score computed on the flattened data.
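The gap can also be seen on a small hand-made case. The sketch below is not the notebook and not the repository's evaluation code. It assumes binary labels (1 for a positive label, 0 otherwise), and `f1_binary` is a hypothetical helper written only for this illustration. The two samples have lengths 7 and 20, as in the example above, with the short one predicted poorly and the long one predicted perfectly.

```python
def f1_binary(y_true, y_pred):
    """F1 for the positive class (label 1); hypothetical helper for illustration."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return 2 * tp / denom if denom else 0.0


# Made-up labels, not taken from any dataset.
true_a = [1, 0, 0, 1, 0, 0, 1]                      # length 7
pred_a = [1, 0, 0, 0, 0, 0, 0]                      # misses two positives
true_b = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0,
          1, 0, 0, 1, 0, 0, 1, 0, 1, 0]             # length 20
pred_b = list(true_b)                               # perfect prediction

# Per-sample scores, then their unweighted mean.
per_sample = [f1_binary(true_a, pred_a), f1_binary(true_b, pred_b)]
avg_f1 = sum(per_sample) / len(per_sample)

# Single score on the concatenated labels, shape (27,).
flat_f1 = f1_binary(true_a + true_b, pred_a + pred_b)

print("per-sample:", per_sample)
print("average   :", avg_f1)
print("flattened :", flat_f1)
```

Here the flattened score is 0.9 while the per-sample average is 0.75, because the longer sample dominates the pooled counts. Which of the two is the right number to report depends on whether each sample or each label should carry equal weight.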

Looking forward to your thoughts on this.


p16i changed the title ~~Issues in evaluation~~ Issue in f1-score evaluation Oct 25, 2018

p16i changed the title ~~Issue in f1-score evaluation~~ f1-score evaluation Oct 25, 2018

p16i mentioned this issue May 7, 2019

Precision of each word segmentation engine. PyThaiNLP/pythainlp#62

Closed
