Shuffling is important for ranking-based metrics. For example, if you score all candidate documents with the same value (e.g., 0.0) and your data is organized by label, you may still get a high MAP/NDCG score, because a stable sort on tied scores preserves the label ordering. Shuffling removes this illusion.
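To make the point concrete, here is a minimal sketch (not MatchZoo's actual code; `average_precision` and the toy data are illustrative): with label-sorted candidates and identical scores, a stable sort yields a perfect AP, while shuffling before sorting exposes the tie.

```python
import random

def average_precision(labels):
    """AP for a ranked list of binary relevance labels (1 = relevant)."""
    hits, score = 0, 0.0
    for i, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            score += hits / i
    return score / hits if hits else 0.0

# Toy candidates sorted by label (relevant docs first), all scored 0.0.
docs = [("d%d" % i, rel) for i, rel in enumerate([1, 1, 1, 0, 0, 0, 0, 0])]
scores = [0.0] * len(docs)

# Python's sort is stable, so tied scores keep the label order: AP looks perfect.
ranked = sorted(zip(docs, scores), key=lambda pair: -pair[1])
print(average_precision([rel for (_, rel), _ in ranked]))  # 1.0

# Shuffling before sorting breaks the illusion for the same degenerate scores.
random.seed(0)
pairs = list(zip(docs, scores))
random.shuffle(pairs)
ranked = sorted(pairs, key=lambda pair: -pair[1])
print(average_precision([rel for (_, rel), _ in ranked]))
```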
Thank you @thiziri
The trec_eval binary was very useful. I am using it for my purposes.
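For anyone else picking up the trec_eval binary, a typical invocation looks like the sketch below (file names and the run tag are placeholders; the qrels and run file layouts follow the standard TREC formats):

```shell
# qrels (ground truth), one line per judged document:
#   query-id  iteration  doc-id  relevance
#   q1 0 doc3 1
#
# run file, one line per retrieved document:
#   query-id  Q0  doc-id  rank  score  run-tag
#   q1 Q0 doc3 1 0.92 my_run
trec_eval -m map -m ndcg qrels.txt run.txt
```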
@faneshion
That sounds like a good reason. Is it incorporated into TREC? Is it standard practice to shuffle before evaluating? I need to establish solid benchmarks and hence need a commonly accepted evaluation metric.
https://github.com/faneshion/MatchZoo/blob/50422a5bc973d27b8c507122aa3f41e633366893/matchzoo/metrics/evaluations.py#L19
@faneshion @bwanglzu
I am trying to understand the implementation of these metrics and cannot understand why there is a random shuffle.
Also, could you point me to a good resource for an implementation of such metrics? I have gone through most of those found on Google.