Data related to the paper "Diagnose This If You Can: On the effectiveness of search engines in finding medical self-diagnosis information", by G. Zuccon, B. Koopman, J. Palotti, ECIR 2015.

Diagnose This If You Can

On the effectiveness of search engines in finding medical self-diagnosis information

Data related to the paper "Diagnose This If You Can: On the effectiveness of search engines in finding medical self-diagnosis information", by G. Zuccon, B. Koopman, J. Palotti, ECIR 2015.

What does this research show?

In this work, we examined poorly formulated self-diagnosis queries and found that major search engines returned irrelevant information that could lead to incorrect self-diagnosis, self-treatment, and ultimately possible harm.

What is in this repository?

  • ecir2015_circumlocation_health_search.pdf: pre-print of the paper published at ECIR 2015.

  • ecir2015_poster.pdf: the poster of the paper that was presented at the ECIR 2015 conference.

  • queries.txt: the queries used in the evaluation described in the paper. The file is tab-separated and contains the query id, the true condition, and the actual query. This format can be used directly with Relevation.

  • bingresults.txt: a TREC-style result file containing the results returned by Bing (using the official Bing API) in answer to the queries used in the paper.

  • googleresults.txt: a TREC-style result file containing the results returned by Google (using the deprecated Ajax API) in answer to the queries used in the paper.

  • qrels.ecir2015.txt: relevance assessments in TREC format for the queries and result files used in this work.

  • id2url-mapping: mappings from document identifiers (in the format 1.txt, 2.txt etc.) to URLs of the web pages retrieved by the two search engines. For more details, see the notes below.
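
As an illustration of the queries.txt layout described above, here is a minimal Python sketch of a reader for it. It assumes three tab-separated columns in the order given (query id, true condition, query text) and no header row. The names `Query` and `parse_queries` are hypothetical and are not part of this repository or of Relevation.

```python
from dataclasses import dataclass


@dataclass
class Query:
    query_id: str   # first column: query id
    condition: str  # second column: the true condition
    text: str       # third column: the query as issued


def parse_queries(lines):
    """Parse tab-separated query lines into Query records.

    Assumes exactly three columns per line (id, condition, query text)
    and no header row; blank lines are skipped.
    """
    queries = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}: {line!r}")
        query_id, condition, text = (field.strip() for field in fields)
        queries.append(Query(query_id, condition, text))
    return queries
```

To read the file itself, something like `with open("queries.txt", encoding="utf-8") as f: queries = parse_queries(f)` should work, assuming the file is UTF-8 encoded.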

Notes

  • Result files are in TREC format. This means that TRECeval can be directly used with these result files. For more information about the TREC result file format, see these notes about TRECeval.

  • Webpages were not crawled; rather, we recorded the links to the webpages that the search engines provided, and then fetched (but did not store) the pages to show them to the relevance assessors using Relevation.

  • The TRECeval result and qrels formats do not allow URLs as document identifiers. We therefore created a mapping that assigns a document id (e.g. 1.txt) to each URL. These mappings are contained in the folder id2url-mapping. Note that each mapping is stored in its own file, so that Relevation can load the mappings directly and automatically fetch the corresponding webpages (with a minor modification to Relevation's source code).
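
For readers who want to inspect the result files and qrels without TRECeval, the following Python sketch reads the standard TREC layouts (run lines of the form `qid Q0 docid rank score tag`, qrels lines of the form `qid 0 docid relevance`) and computes precision at k. The function names and the relevance threshold `min_rel` are assumptions made for illustration; this is not the evaluation code used in the paper.

```python
from collections import defaultdict


def read_run(lines):
    """Parse TREC run lines into {qid: [docid, ...]} ordered by rank."""
    ranked = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, rank = parts[0], parts[1], parts[2], parts[3]
        ranked[qid].append((int(rank), docid))
    # Sort by the rank column, so input order does not matter.
    return {qid: [docid for _, docid in sorted(docs)] for qid, docs in ranked.items()}


def read_qrels(lines):
    """Parse TREC qrels lines into {qid: {docid: relevance}}."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docid, relevance = parts[:4]
        qrels[qid][docid] = int(relevance)
    return dict(qrels)


def precision_at_k(ranking, judgments, k, min_rel=1):
    """Fraction of the top-k documents judged at or above min_rel.

    Unjudged documents count as non-relevant; min_rel is an assumed
    threshold, not one taken from the paper.
    """
    hits = sum(1 for docid in ranking[:k] if judgments.get(docid, 0) >= min_rel)
    return hits / k
```

With `run = read_run(open("bingresults.txt"))` and `qrels = read_qrels(open("qrels.ecir2015.txt"))`, a per-query score can be obtained as `precision_at_k(run[qid], qrels.get(qid, {}), 10)`. The document ids in both files (e.g. 1.txt) are the ones that the id2url-mapping folder maps back to URLs.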

Other information about this work

See the dedicated webpage for this work, which provides further details, FAQs, media coverage, and follow-up work.