This repository has been archived by the owner on Jan 29, 2022. It is now read-only.

Create crowdsourcing project in PyBossa for deduplicating trials #398

Closed
vitorbaptista opened this issue Sep 21, 2016 · 2 comments

Comments

@vitorbaptista
Contributor

vitorbaptista commented Sep 21, 2016

This issue will fix #75 and #76.

The task should show the user links to two trials and ask her whether they are the same trial.
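A minimal sketch of the data one such task could carry, assuming PyBossa's free-form per-task `info` JSON is used to hold the two links. The URL template, the key names and the `make_task_info` helper are all hypothetical; this issue does not specify the real link format or task schema.

```python
from typing import Dict

# Hypothetical URL template for a trial page (not taken from this issue).
TRIAL_URL_TEMPLATE = "https://example.org/trials/{trial_id}"


def make_task_info(trial_a_id: str, trial_b_id: str) -> Dict[str, str]:
    """Build the info payload for one "are these the same trial?" task.

    The key names are illustrative only, not a schema PyBossa requires.
    """
    return {
        "question": "Do these two links describe the same trial?",
        "trial_a_url": TRIAL_URL_TEMPLATE.format(trial_id=trial_a_id),
        "trial_b_url": TRIAL_URL_TEMPLATE.format(trial_id=trial_b_id),
    }
```

The task presenter would then render `trial_a_url` and `trial_b_url` as links next to yes/no buttons.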

@vitorbaptista vitorbaptista added this to the Launch milestone Sep 21, 2016
@vitorbaptista vitorbaptista self-assigned this Sep 21, 2016
vitorbaptista added a commit to opentrials/processors that referenced this issue Sep 29, 2016
@vitorbaptista vitorbaptista changed the title Create crowdsourcing task for deduplicating trials Create crowdsourcing project in PyBossa for deduplicating trials Sep 30, 2016
vitorbaptista added a commit to opentrials/processors that referenced this issue Sep 30, 2016
We're comparing trials in pairs. To understand how we create the tasks,
consider a database with trials A, B, C and D. The tasks created will be (A, B),
(B, C) and (C, D). This won't test all possible pairs, because with our current
database there are far too many of them: N(N - 1) / 2 pairs for N trials, which is
tens of billions for ~330k trials. This is just an initial pass. With this
logic, we'll create NUMBER_OF_TRIALS - 1 tasks.
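A short sketch of the pairing scheme described above, assuming the trials are already in a fixed order (for example, sorted by ID). It pairs each trial with the next one, which yields N - 1 tasks, and contrasts that with the N(N - 1) / 2 pairs an exhaustive comparison would need. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def consecutive_pairs(trial_ids: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each trial with the next one: N trials give N - 1 pairs."""
    return list(zip(trial_ids, trial_ids[1:]))


def all_pairs_count(n: int) -> int:
    """Number of pairs an exhaustive comparison of n trials would need."""
    return n * (n - 1) // 2
```

For example, `consecutive_pairs(["A", "B", "C", "D"])` gives the three tasks (A, B), (B, C) and (C, D), while `all_pairs_count(4)` is 6. On Python 3.10+, `itertools.pairwise` produces the same consecutive pairs.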

There's a challenge in uploading this to CrowdCrafting, as it only
allows 300 requests per 15 minutes. At that rate, it would take more than 10 days
to add the ~330k tasks (330,000 / 300 = 1,100 windows of 15 minutes, about 11.5 days).
Not to mention that 330k tasks is already a lot.
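A rough sketch of the upload arithmetic and a naive throttled loop, assuming one API request per task and the 300-requests-per-15-minutes limit quoted above. `create_task` is a hypothetical callable standing in for whatever actually posts a task; the `sleep` function is injectable so the loop can be exercised without waiting.

```python
import math
import time
from typing import Callable, Iterable

# Rate limit quoted above for CrowdCrafting: 300 requests per 15 minutes.
REQUESTS_PER_WINDOW = 300
WINDOW_SECONDS = 15 * 60


def estimate_upload_days(num_tasks: int,
                         requests_per_window: int = REQUESTS_PER_WINDOW,
                         window_seconds: int = WINDOW_SECONDS) -> float:
    """Rough upload time in days, counting every window (even the last) in full."""
    windows = math.ceil(num_tasks / requests_per_window)
    return windows * window_seconds / 86400


def upload_throttled(tasks: Iterable[dict],
                     create_task: Callable[[dict], None],
                     requests_per_window: int = REQUESTS_PER_WINDOW,
                     window_seconds: int = WINDOW_SECONDS,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Send tasks in bursts of requests_per_window, pausing a full window between bursts.

    Conservative: it ignores the time spent sending, so it never exceeds the limit
    but may wait slightly longer than necessary. Returns the number of tasks sent.
    """
    sent = 0
    for task in tasks:
        # Pause before starting each new burst (but not before the first one).
        if sent and sent % requests_per_window == 0:
            sleep(window_seconds)
        create_task(task)
        sent += 1
    return sent
```

With these defaults, `estimate_upload_days(330_000)` comes out at roughly 11.5 days, which is why reducing the number of tasks matters more than tuning the uploader.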

We'll need to filter out more trials to make this feasible, especially considering
that deciding whether two trials are the same isn't a trivial task.

opentrials/opentrials#398
@pwalsh pwalsh removed this from the Launch milestone Dec 6, 2016
@pwalsh
Member

pwalsh commented Feb 22, 2017

@vitorbaptista WONTFIX or doing?

@vitorbaptista
Contributor Author

@pwalsh This is already done in opentrials/processors#64, but it wasn't merged because there are too many tasks for crowdsourcing. We would need a way to filter the tasks, which we don't have yet. I'll close this and the related PR, but we should revisit it once we have a way to filter for potentially wrongly deduplicated trials.
