evaluate with superpops: how is the average calculated? #31

Closed

richelbilderbeek opened this issue Apr 5, 2022 · 2 comments

richelbilderbeek commented Apr 5, 2022

Dear GenoCAE maintainers, hi @cnettel and @kausmees,

Now that you are back, here is something I found unexpected (described from my point of view). If you did not expect it either, I'd happily create a minimal reproducible example.

When using evaluate with a superpops file, in one of my cases I got the following:

| Population  | num samples | f1_score_3 | f1_score_5 |
|-------------|-------------|------------|------------|
| C           | 333         | 0.0000     | 0.0000     |
| B           | 334         | 0.2431     | 0.0000     |
| A           | 333         | 0.4400     | 0.4996     |
| avg (micro) | 1000        | 0.3100     | 0.3330     |

The unexpected part is the last line: it is labelled as an average, but appears to do something different per column (summing the first column, num samples, makes sense to me :-) ).

I would expect the averages to be:

| Population  | num samples | f1_score_3 | f1_score_5 |
|-------------|-------------|------------|------------|
| C           | 333         | 0.0000     | 0.0000     |
| B           | 334         | 0.2431     | 0.0000     |
| A           | 333         | 0.4400     | 0.4996     |
| avg (micro) | 333         | 0.2277     | 0.1665     |

I checked: these 'averages' are also neither the harmonic nor the geometric mean.
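For completeness, here is that check (a minimal sketch; the per-class values are copied from the table above):

```python
import math
from statistics import mean, harmonic_mean

f1_3 = [0.0000, 0.2431, 0.4400]  # per-class f1_score_3 for C, B, A
f1_5 = [0.0000, 0.0000, 0.4996]  # per-class f1_score_5 for C, B, A

for name, scores in [("f1_score_3", f1_3), ("f1_score_5", f1_5)]:
    arith = mean(scores)                           # what I expected as "avg"
    harm = harmonic_mean(scores)                   # 0.0: a zero score dominates
    geom = math.prod(scores) ** (1 / len(scores))  # 0.0: a zero factor dominates
    print(f"{name}: arithmetic={arith:.4f} harmonic={harm:.4f} geometric={geom:.4f}")

# f1_score_3: arithmetic=0.2277 harmonic=0.0000 geometric=0.0000
# f1_score_5: arithmetic=0.1665 harmonic=0.0000 geometric=0.0000
# The reported 0.3100 and 0.3330 match none of these.
```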

What are these values?

If you find them weird as well, I will happily create a reproducible example. Otherwise, I am happy to learn what they are :-)

kausmees commented Apr 8, 2022

Hello

The average reported there is the micro-averaged F1 score, calculated by this function: here.

It is calculated globally over the classes based on the total true positives, false negatives and false positives, and can't be derived from the numbers in this table alone. This is why we chose to print it there, whereas the macro-average and weighted average can be calculated from the per-class F1 scores in this table.
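In formula form (this is the standard definition of micro-averaging, with per-class counts $TP_c$, $FP_c$, $FN_c$):

$$
P_{\text{micro}} = \frac{\sum_c TP_c}{\sum_c (TP_c + FP_c)}, \qquad
R_{\text{micro}} = \frac{\sum_c TP_c}{\sum_c (TP_c + FN_c)}, \qquad
F1_{\text{micro}} = \frac{2\, P_{\text{micro}}\, R_{\text{micro}}}{P_{\text{micro}} + R_{\text{micro}}}.
$$

In single-label multiclass classification, every misclassified sample counts once as a false positive and once as a false negative, so $P_{\text{micro}} = R_{\text{micro}} = F1_{\text{micro}}$, which equals the overall accuracy. That is also why it can't be recovered from the per-class F1 scores alone.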

I'm not sure of the utility of this measure for this particular application; in our paper we reported the weighted-average F1 score over the classes.
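For illustration (this uses scikit-learn rather than GenoCAE's own function, and made-up labels rather than your data), the different averaging modes look like this:

```python
from sklearn.metrics import f1_score

# Hypothetical 3-class labels, only to illustrate the averaging modes.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "C", "A", "B"]

print(f1_score(y_true, y_pred, average=None))        # per-class F1, as in the table
print(f1_score(y_true, y_pred, average="macro"))     # plain mean of per-class F1
print(f1_score(y_true, y_pred, average="weighted"))  # mean weighted by class support
print(f1_score(y_true, y_pred, average="micro"))     # pools TP/FP/FN over all classes first
```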

I see how this is confusing, though; I will add an explanation to the README to document the behaviour.

Thanks for bringing it up,
K

kausmees closed this as completed Apr 8, 2022
richelbilderbeek commented:

Thanks for clearing that up 👍
