
Reason for random shuffling? #232

Closed
aneesh-joshi opened this issue Jul 8, 2018 · 3 comments

@aneesh-joshi
Contributor

https://github.com/faneshion/MatchZoo/blob/50422a5bc973d27b8c507122aa3f41e633366893/matchzoo/metrics/evaluations.py#L19

@faneshion @bwanglzu
I am trying to understand the implementation of these metrics, and I cannot understand why there is a random shuffle.

Also, could you point me to a good resource for an implementation of such metrics? I have gone through most of what I could find on Google.

@thiziri

thiziri commented Jul 8, 2018

You can look at trec_eval.

@faneshion
Member

Shuffling is important for ranking-based metrics. For example, if you score all the candidate documents with the same value (e.g., 0.0) and your data happens to be ordered by label, a stable sort by score will leave the relevant documents at the top, so you may get a deceptively high MAP/NDCG. Shuffling before sorting removes this illusion.
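
A minimal sketch of the effect (not MatchZoo's actual code; `average_precision` is a hypothetical helper written here just for illustration):

```python
import random

def average_precision(ranked_labels):
    """AP for one query: mean of precision@k taken at each relevant rank."""
    hits, score = 0, 0.0
    for k, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            score += hits / k
    return score / hits if hits else 0.0

# 5 relevant docs followed by 95 irrelevant ones, all scored 0.0.
candidates = [(0.0, 1)] * 5 + [(0.0, 0)] * 95

# Python's sort is stable, so sorting by score alone preserves the
# label-ordered input: every relevant doc stays on top.
ranked = sorted(candidates, key=lambda x: x[0], reverse=True)
print(average_precision([label for _, label in ranked]))  # 1.0 -- illusory

# Shuffling first breaks the ties arbitrarily, exposing the useless scorer.
random.shuffle(candidates)
ranked = sorted(candidates, key=lambda x: x[0], reverse=True)
print(average_precision([label for _, label in ranked]))  # low (roughly R/N)
```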

@aneesh-joshi
Contributor Author

Thank you @thiziri
The trec_eval binary was very useful. I am using it for my purposes.
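
For anyone else landing here: trec_eval takes a qrels file (`qid 0 docid relevance`) and a run file (`qid Q0 docid rank score run_tag`). A minimal sketch of producing both from Python; the `results` structure and file names are made up for illustration:

```python
# Hypothetical per-query results: {query_id: [(doc_id, score, relevance), ...]}
results = {"q1": [("d1", 0.9, 1), ("d2", 0.4, 0)]}

with open("run.txt", "w") as run, open("qrels.txt", "w") as qrels:
    for qid, docs in results.items():
        ranked = sorted(docs, key=lambda d: d[1], reverse=True)
        for rank, (docid, score, rel) in enumerate(ranked, start=1):
            # TREC run format: qid Q0 docid rank score run_tag
            run.write(f"{qid} Q0 {docid} {rank} {score} my_run\n")
            # TREC qrels format: qid 0 (unused) docid relevance
            qrels.write(f"{qid} 0 {docid} {rel}\n")

# Then evaluate with: trec_eval qrels.txt run.txt
```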

@faneshion
That sounds like a good reason. Is it incorporated into TREC / is it standard practice to shuffle before evaluating? I need to establish solid benchmarks and hence need a commonly accepted evaluation metric.

Thanks a lot for the prompt replies!
