This repository has been archived by the owner on Jan 29, 2022. It is now read-only.

Investigate how to add attribution notices of a trial data #316

Closed
vitorbaptista opened this issue Aug 17, 2016 · 4 comments

@vitorbaptista

vitorbaptista commented Aug 17, 2016

At a minimum, we should simply be able to say something like:

This trial contains data contributed by John Doe and Mary Smith.

But ideally we should be able to point out what each person contributed, as in the bottom-right of the mockup:

original mockup

Investigate what's necessary to implement both versions.

@georgiana-b

@vitorbaptista I noticed that in the mock-up we credit both sources and people. Are we also interested in crediting sources (registries) for the info taken from them?

Also, looking at the mock-up it seems that the credits given to people are for data contributions. Is that right? Is that all?

@vitorbaptista

@georgiana-b Usually people would contribute data, but it could also be a link (for example, someone could contribute a two-way table linking trials with publications)

We could potentially credit the registries as well, depending on how complicated it would be.

@georgiana-b

georgiana-b commented Jan 16, 2017

For the user-provided data contributions I have some ideas depending on the mapping provided in #628.

  1. If we only turn data contributions into documents, publications, records, and anything else with a source_url, then in the Explorer we can just query for data_contributions whose url or data_url matches the source_url of the documents sent in the API response. If we find any, we get the details of the users who submitted them and display them:
    {document_name/publication_name} provided by {user_name} in [this contribution]({source_url})
    
    This sounds good as a first step because it doesn't require many changes. On the database side, we just need a constraint on data_contributions to make sure at least one of the two URLs is always set.
  2. If we also turn data contributions into other entities, e.g. `interventions` or `trials_publications` (as mentioned in the comment above), then the smoothest way I see is to add a `data_contributions` table to the API `database` too, containing most of the same fields and values as its `explorer` counterpart (marked below with X).
| Explorer data contrib | API data contrib |
| --- | --- |
| id UUID PRIMARY KEY NOT NULL | X |
| user_id UUID | |
| trial_id UUID | |
| data_url VARCHAR(255) | X |
| comments TEXT | X |
| created_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL | X |
| updated_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL | X |
| approved BOOLEAN | |
| curation_comments TEXT | X |
| url VARCHAR(255) | X |
| data_category_id INTEGER | |

A data_contribution can be linked to multiple entities. I noticed that data_contributions currently have a unique constraint on url, so I assume we want to avoid having multiple contributions for the same entity. This means our API data_contributions table will have a has_many relation with every entity type we allow contributions for, so we just need to add a nullable data_contribution_id foreign key to all "contribuable" tables.

This way we can send in the trial API response the data_contribution_id for all entities that have it. To make it clear what piece of information was contributed, we should also add entities to our API response that are not currently exposed. For example, for a trial which was linked to a publication via a data contribution, we should add the trials_publications info to our API response:

"trials_publications": [
    {
        "publication_id": "{publication_id}",
        "data_contribution_id": "{data_contribution_id}"
    }
]

Then in the Explorer we can collect all entities from the API response that have a data_contribution_id, fetch their data_contribution and user from the explorer db, and display them according to their type.
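A minimal Python sketch of that collection step, under stated assumptions: the trial API response is a plain dict, contributed entities appear as lists of dicts (only `trials_publications` is named above; any other key is hypothetical), and the two lookup dicts stand in for queries against the explorer db. The function names and the user `name` field are illustrative, not part of the existing code.

```python
def collect_contributed_entities(trial):
    """Return (entity_type, entity) pairs for every entity in the trial
    response that carries a data_contribution_id."""
    found = []
    for entity_type, value in trial.items():
        if not isinstance(value, list):
            continue  # only list-valued keys can hold related entities
        for entity in value:
            if isinstance(entity, dict) and entity.get("data_contribution_id"):
                found.append((entity_type, entity))
    return found


def build_credits(trial, contributions_by_id, users_by_id):
    """Join contributed entities with their data contribution and user.

    contributions_by_id and users_by_id are hypothetical stand-ins for
    lookups in the explorer db.
    """
    credits = []
    for entity_type, entity in collect_contributed_entities(trial):
        contribution = contributions_by_id.get(entity["data_contribution_id"])
        if contribution is None:
            continue  # unknown contribution: nothing to credit
        user = users_by_id.get(contribution.get("user_id"))
        credits.append({
            "entity_type": entity_type,
            "data_contribution_id": entity["data_contribution_id"],
            "user_name": user["name"] if user else None,
        })
    return credits
```

The Explorer could then pick a display template per `entity_type`, which is one way to read "display them according to their type".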

If we create entities from data contributions, we have to consider how to integrate that with our source-based setup. There are 3 possible cases:

  1. An entity is created from a data contribution link to an outside source (e.g. somebody submits a link to a PubMed publication). The created publication will have a data_contribution_id and a source_id=pubmed.
  2. An entity is created from a data contribution file uploaded by the user (e.g. somebody uploads a document). The created document will not have a source_id but it will have a data_contribution_id.
  3. An entity is created from a registry and it will not have a data_contribution_id but it will have a source_id.

So, we should have a constraint on "contribuable" tables to always have at least one of source_id or data_contribution_id not null.
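As a rough illustration, the three cases and the resulting constraint can be tried out with a CHECK constraint. This sketch uses Python's built-in `sqlite3` with an in-memory database purely so it is self-contained; the real database, the `publications` table layout, and the `try_insert` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical "contribuable" table: at least one of source_id or
# data_contribution_id must be set.
conn.execute("""
    CREATE TABLE publications (
        id TEXT PRIMARY KEY,
        source_id TEXT,
        data_contribution_id TEXT,
        CHECK (source_id IS NOT NULL OR data_contribution_id IS NOT NULL)
    )
""")


def try_insert(row):
    """Insert (id, source_id, data_contribution_id); return False if the
    row violates a constraint."""
    try:
        conn.execute("INSERT INTO publications VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


case_1 = try_insert(("p1", "pubmed", "c1"))  # contributed link to an outside source
case_2 = try_insert(("p2", None, "c2"))      # file uploaded by the user
case_3 = try_insert(("p3", "nct", None))     # created from a registry
orphan = try_insert(("p4", None, None))      # neither set: should be rejected
```

Here the first three inserts are accepted and the last one is rejected, which matches the rule stated above.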

@georgiana-b

georgiana-b commented Jan 17, 2017

Crediting registries for trial data is not very complicated, because in our current setup we take all the values for the trial from its primary record. Since opentrials/processors#95 and opentrials/api#131, records have an is_primary flag, and this information is sent in the API response. So, in the Explorer, we just have to select the source_id and/or data_contribution_id of the record with is_primary=true and credit that source for the trial data.
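A small Python sketch of that selection, assuming the trial API response is a dict whose records sit under a `records` key (that key name and the function name are assumptions; `is_primary`, `source_id` and `data_contribution_id` are the fields discussed above):

```python
def credit_for_trial(trial):
    """Return the source_id and data_contribution_id of the trial's primary
    record, or None if no record is marked as primary."""
    for record in trial.get("records", []):
        if record.get("is_primary"):
            return {
                "source_id": record.get("source_id"),
                "data_contribution_id": record.get("data_contribution_id"),
            }
    return None
```

Either value may be missing on a given record, so the caller would credit whichever of the two is present.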
