This repository has been archived by the owner on Apr 11, 2023. It is now read-only.
Hi,
I am against publicly releasing the annotations at this point. By keeping them "hidden" behind the leaderboard evaluation, we are in less danger of overfitting on the dataset (or of someone "cheating" by looking at the test set). The test set is quite small, and sooner or later solutions will start overfitting it.
Having said that, (a) I think that we should eventually release them (e.g. after a year or so), and/or (b) share them with individuals when they have a good reason (e.g. an alternate use case) and they verbally agree not to share the test set further and not to use it for the CodeSearchNet challenge.
Hi @mallamanis! I understand your reasons for keeping the annotations away from curious eyes, especially when the competition has just started. Still, I encourage you folks to release them in the near future to foster evaluation of NLP techniques applied to search engines.
AFAIK, it's quite difficult to find freely available datasets and annotations for fully evaluating information retrieval systems. The TREC collections are among the few, and your dataset would definitely add a lot of value in a different domain.
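For context on why graded relevance annotations matter for evaluation: a common way to use them is to score a system's ranking with NDCG. Below is a minimal sketch of NDCG@k over hypothetical relevance grades (the function names and the example grades are illustrative, not the challenge's official scorer):

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """NDCG: DCG of the system ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Hypothetical grades (e.g. 0-3) for the top results of one query, in ranked order:
print(ndcg_at_k([3, 2, 0, 1], k=4))
```

A perfect ranking scores 1.0; swapping relevant results lower in the list reduces the score, which is exactly what the hidden annotations let the leaderboard measure.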
Hi,
Fantastic initiative, thanks a lot :-)
Are you planning to publish the relevance judgements, i.e., the 4k expert relevance annotations?