### On the effectiveness of search engines in finding medical self-diagnosis information
Data related to the paper "Diagnose This If You Can: On the effectiveness of search engines in finding medical self-diagnosis information", by G. Zuccon, B. Koopman, J. Palotti, ECIR 2015.
In this work, we examined poorly formulated self-diagnosis queries and found that major search engines returned irrelevant information that could lead to incorrect self-diagnosis and self-treatment and, ultimately, to possible harm.

- ecir2015_circumlocation_health_search.pdf: pre-print of the paper published at ECIR 2015.
- ecir2015_poster.pdf: the poster presented at the ECIR 2015 conference.
- queries.txt: the queries used in the evaluation described in the paper. The file is tab-separated; each line contains the query id, the true condition and the query text. This format can be used directly with Relevation.
- bingresults.txt: a TREC-style result file containing the results returned by Bing (via the official Bing API) in answer to the queries used in the paper.
- googleresults.txt: a TREC-style result file containing the results returned by Google (via the deprecated Ajax API) in answer to the queries used in the paper.
- qrels.ecir2015.txt: relevance assessments in TREC qrels format for the queries and result files used in this work.
- id2url-mapping: mappings from document identifiers (in the format 1.txt, 2.txt, etc.) to the URLs of the web pages retrieved by the two search engines. For more details, see the notes below.
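
As a rough cross-check of what TRECeval reports, the files listed above can also be read with a few lines of Python. This is a minimal sketch under stated assumptions: queries.txt has three tab-separated fields per line, the result files use the standard six-column TREC run layout (query id, Q0, document id, rank, score, run tag) and the qrels the standard four-column layout (query id, iteration, document id, relevance). All function names are hypothetical. The sketch orders each ranking by the rank column, whereas TRECeval orders by score, so values may differ slightly where the two disagree.

```python
import os
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def read_queries(path: str) -> Dict[str, Tuple[str, str]]:
    """Parse queries.txt: query id, true condition, query text (tab-separated)."""
    queries: Dict[str, Tuple[str, str]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\r\n").split("\t", 2)
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            qid, condition, query = fields
            queries[qid] = (condition, query)
    return queries


def read_run(path: str) -> Dict[str, List[str]]:
    """Parse a TREC run (qid Q0 docno rank score tag); doc ids per query in rank order."""
    ranked: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 6:
                continue
            qid, _q0, docno, rank = parts[:4]
            ranked[qid].append((int(rank), docno))
    # Sort by the rank column (assumption: ranks are consistent with scores).
    return {qid: [doc for _, doc in sorted(pairs)] for qid, pairs in ranked.items()}


def read_qrels(path: str) -> Dict[str, Set[str]]:
    """Parse TREC qrels (qid iter docno rel); keep documents judged relevant (rel > 0)."""
    relevant: Dict[str, Set[str]] = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 4:
                continue
            qid, _iter, docno, rel = parts[:4]
            if int(rel) > 0:
                relevant[qid].add(docno)
    return relevant


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k documents that are relevant (always divides by k)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def main(data_dir: str = ".") -> None:
    qrels_path = os.path.join(data_dir, "qrels.ecir2015.txt")
    if not os.path.exists(qrels_path):
        print(f"{qrels_path} not found; run this from the data directory")
        return
    qrels = read_qrels(qrels_path)
    for name in ("bingresults.txt", "googleresults.txt"):
        run = read_run(os.path.join(data_dir, name))
        scores = [precision_at_k(docs, qrels.get(qid, set())) for qid, docs in run.items()]
        mean = sum(scores) / len(scores) if scores else 0.0
        print(f"{name}: mean P@10 = {mean:.3f} over {len(scores)} queries")


if __name__ == "__main__":
    main()
```

read_queries is not needed for the scores themselves, but it can be used to attach the true condition to each per-query score, for example to see which conditions each engine handled worst.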

Notes:

- Result files are in TREC format, so TRECeval can be used with them directly. For more information about the TREC result file format, see these notes about TRECeval.
- Web pages were not crawled. Instead, we recorded the links to the web pages returned by the search engines and then fetched (but did not store) the pages to show them to the relevance assessors in Relevation.
- The TRECeval result and qrels formats do not allow URLs to be used as document identifiers. We therefore created a mapping that assigns a document id (e.g. 1.txt) to each URL; these mappings are contained in the folder id2url-mapping. Note that each mapping corresponds to one file: this way Relevation can load the mappings directly and automatically fetch the corresponding web page (with a minor modification to Relevation's source code).
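
The id-to-URL mapping described above can be loaded with a short Python sketch. The on-disk layout is an assumption here, not something this README specifies: one file per document in id2url-mapping, named after its document id (e.g. 1.txt), whose first non-empty line is the URL. The function names load_id2url and resolve_urls are hypothetical.

```python
import os
from typing import Dict, List, Optional


def load_id2url(folder: str = "id2url-mapping") -> Dict[str, str]:
    """Build {document id -> URL} from a folder with one file per document.

    ASSUMPTION (hypothetical layout): each file is named after a document id
    (e.g. 1.txt) and its first non-empty line is the URL. Inspect the real
    files before relying on this.
    """
    mapping: Dict[str, str] = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # ignore sub-folders or other non-file entries
        with open(path, encoding="utf-8") as f:
            url = next((line.strip() for line in f if line.strip()), None)
        if url:
            mapping[name] = url
    return mapping


def resolve_urls(docids: List[str], mapping: Dict[str, str]) -> List[Optional[str]]:
    """Translate ranked document ids into URLs; None marks ids with no mapping."""
    return [mapping.get(docid) for docid in docids]


if __name__ == "__main__":
    if os.path.isdir("id2url-mapping"):
        id2url = load_id2url()
        print(f"loaded {len(id2url)} id -> URL mappings")
```

Combined with a parser for the TREC result files, resolve_urls makes it possible to list the URLs each search engine returned for a given query.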
See the dedicated webpage for this work, which provides further details, FAQs, media coverage and follow-up work.