Allow for attribution of all data (per-field sourcing) #26

Open
evdb opened this Issue Apr 2, 2012 · 19 comments

9 participants

@evdb
Contributor
evdb commented Apr 2, 2012

Ensure that this suggestion is implemented:

Make sure you include sources i.e. make it fundamental in the user interface that people put the URL and/or a text description of where they got data from.

This is vital for provenance and for later data updating. And most importantly, for the user interface to add credibility to sites made using the data.

@evdb evdb added the Difficulty 13 label May 22, 2014
@mhl
Member
mhl commented May 22, 2014

Notes from the backlog triage session at the all team meeting: "see #279 - probably sources of information should be attached to changes, so related to the versioning work"

@mhl mhl added the versioning label May 22, 2014
@clkao
clkao commented Sep 22, 2014

+1

@kaerumy
kaerumy commented Oct 30, 2014

This is currently a blocker for Sinar's work on a representatives database. We have several data sources that help build a complete picture of our representatives, and these sources must be attributed to maintain the (non-technical) integrity of data that is being used by everybody else.

@paullenz
Member

As this is a blocker for you, I have tagged it as a contender for review, to be added to the next sprint (which starts in a week).

@kaerumy
kaerumy commented Nov 5, 2014

Thank you for making this a contender.

@paullenz
Member
paullenz commented Nov 8, 2014

Speaking to Sinar - for their needs just an additional field that would enable them to list attribution source would be sufficient for now - more complex versioning is not essential

This is needed ASAP (faster than the multiple language option)

@mhl mhl added 3 - Now and removed 1 - Contender labels Nov 10, 2014
@chrismytton chrismytton added 1 - Contender and removed 3 - Now labels Nov 12, 2014
@chrismytton
Member

Pull request for simplified version of this using popolo source fields is here - #680.

@chrismytton
Member

@kaerumy I've just pushed a basic implementation of source fields live. This adds an extra Sources tab to the person/organization view/edit pages. This follows the Popolo metadata format of having each source be a url with a short description.

[Screenshot, 2014-11-13 12:55:41]

I'm going to leave this ticket open as we plan to do more comprehensive attribution based on versioning in the future, but hopefully this is a useful initial version! 👍
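
For illustration only, here is a minimal sketch of what a record with sources in that shape might look like. It assumes the Popolo convention that each source is a link object with a `url` and an optional `note`. The record, the example.org URLs, and the `format_sources` helper are hypothetical and not taken from the PopIt codebase.

```python
# Hypothetical person record; only the shape of "sources" matters here.
person = {
    "id": "example-person",
    "name": "Example Person",
    "sources": [
        {"url": "http://example.org/gazette/1992", "note": "Official gazette, 1992"},
        {"url": "http://example.org/profile"},  # a source with no description
    ],
}


def format_sources(record):
    """Return one display line per source: 'note (url)', or just the url."""
    lines = []
    for source in record.get("sources", []):
        url = source["url"]
        note = source.get("note")
        lines.append(f"{note} ({url})" if note else url)
    return lines


for line in format_sources(person):
    print(line)
```

Note that sources in this shape belong to the whole record, not to any particular field.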

@kaerumy
kaerumy commented Nov 25, 2014

This is good for some of our initial import of representatives, but we will need a source per field in the near future. https://sinar-malaysia.popit.mysociety.org/organizations/5474018888eca8ff1ed43367#sources is not very useful when we want to track, for example, the source of the posts held (or changes to them) for each person or post.

Consumer applications like our Accountability tracker will need to easily pull the specific source for changes in posts held, or for the start/end of a position. Currently they have to pull in all Sources and leave it to the user to figure out which one was used to verify the information.

@chrismytton chrismytton changed the title from Allow for attribution of all data to Allow for attribution of all data (per-field sourcing) Dec 8, 2014
@mhl mhl added 2 - Current Sprint and removed 1 - Contender labels Jan 8, 2015
@zarino
Member
zarino commented Jan 19, 2015

Hi @kaerumy – If you had a way to record sources for each individual change, would you expect it to be optional, or required?

Also, if users were asked for a source only once, when they finally press the "Save changes" button, rather than individually per input field – would that be ok for your use case? If it wouldn't, could you explain why?

@zarino
Member
zarino commented Jan 19, 2015

Just to add some context, YourNextMP (which is a user interface built on top of the PopIt API) has already made its own decisions about both of these points.

YourNextMP requires you to provide a source whenever you make any changes.

But it doesn't ask per individual field. Instead, on the candidate editing page (which is closest to the default PopIt person/organisation page) there's a single "Source" field next to the "Save" button at the bottom of the page:

[Screenshot, 2015-01-19 16:44:53]

This means YourNextMP can't track individual sources for each individual change, but it can record a source for each value changed at the same time (what I'd call an "editing session", or in version control might be called a "check-in" or a "commit").

Looking at the diffs on YourNextMP it appears most people only change one thing when they edit a candidate—maybe adding an email address or a twitter profile, or adding a membership—so recording one source per editing session, rather than per field, doesn't make much of a difference. But it does make it a lot simpler for users doing the editing.
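
To make the "one source per editing session" idea concrete, here is a rough sketch. It is not how YourNextMP actually stores its data: the `EditSession` class, the `source_for_field` helper, and the example URLs are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class EditSession:
    """One save of the edit form: a single source plus every field it changed."""
    source: str
    changes: dict  # field name -> new value


def source_for_field(history, field):
    """Return the source of the most recent session that changed `field`, or None."""
    for session in reversed(history):
        if field in session.changes:
            return session.source
    return None


# Hypothetical history, oldest session first.
history = [
    EditSession("http://example.org/candidates", {"name": "A. Candidate", "party": "Example Party"}),
    EditSession("http://example.org/contact", {"email": "a@example.org"}),
]

print(source_for_field(history, "email"))
print(source_for_field(history, "party"))
```

In the first session both `name` and `party` share one source. That is the loss of granularity compared with per-field sourcing: it only matters when a single edit changes several fields that came from different places.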

The question is, @kaerumy, does Sinar Project need more granularity than that?

@kaerumy
kaerumy commented Jan 20, 2015

Hi @zarino, in short we're building a single centralized database of thousands of people & organizations from various verified sources, that will be used by different applications, and as a research tool for investigative journalists, for years to come.

  • Each profile is built up from dozens of sources, often specific to one field. It's really important to know exactly where each piece of data came from, especially if it could implicate someone in a corruption issue or even disqualify them from an election nomination. Technically you could have a standard format for references to fields, and sources in a change log for the Sources field, but you would need to parse it for the following use cases.
  • Other users & applications could be using just a few fields, such as posts held. They are not interested in pulling in and digging through all sources, just in displaying the source for these specific fields so others can quickly verify them, e.g. this person held the post of CEO of ABC in 1992 (source: X url). One donor wanted to make sure that, on a website that draws on multiple sources, the sources for a field (e.g. picture) are clearly displayed, to show that it's not randomly pulled off the Internet.
  • Editors/reviewers need to easily find and review the source for a specific field. A person may hold a post at a public company (source A) while holding a political post in the same period (source B); before this he held a lower political post (source C), and at a specific point in that period he received a government award and title, which adds another prefix/suffix to his name (source D). A person could be holding up to 200 posts (from some research on beneficial ownership). That's a lot of sources, mixed in with changes for Twitter, FB etc., if they are all in a single unified change log.
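
A rough sketch of the kind of lookup these use cases imply, assuming sources are stored on each membership (post) and, for other fields, in a per-field map. The record layout, the `field_sources` key, both helper functions, and the example.org URLs are hypothetical and only meant to illustrate the idea.

```python
# Hypothetical record: every membership (post) carries its own sources,
# and "field_sources" (an invented name) maps other fields to theirs.
person = {
    "name": "Example Person",
    "image": "http://example.org/photo.jpg",
    "memberships": [
        {
            "role": "CEO",
            "organization": "ABC",
            "start_date": "1992",
            "sources": [{"url": "http://example.org/source-x", "note": "Source X"}],
        },
        {
            "role": "Member of Parliament",
            "organization": "Parliament",
            "start_date": "1995",
            "sources": [{"url": "http://example.org/source-b", "note": "Source B"}],
        },
    ],
    "field_sources": {
        "image": [{"url": "http://example.org/portrait", "note": "Official portrait"}],
    },
}


def post_sources(record, role):
    """Return the source URLs attached to memberships with the given role."""
    return [
        source["url"]
        for membership in record.get("memberships", [])
        if membership.get("role") == role
        for source in membership.get("sources", [])
    ]


def field_sources(record, field):
    """Return the source URLs recorded for a single top-level field."""
    return [s["url"] for s in record.get("field_sources", {}).get(field, [])]


print(post_sources(person, "CEO"))
print(field_sources(person, "image"))
```

With a layout like this, a consumer application can show "CEO of ABC, 1992 (source: X)" without reading through every source on the profile.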
@zarino
Member
zarino commented Jan 20, 2015

Thanks for the feedback @kaerumy. You didn't quite answer my questions, but can I assume that your answers would be..?

  • Yes, Sinar Project would expect source attribution to be a required step, performed by everyone who wants to submit new data.
  • And no, citing a single source at the moment of saving is not good enough – Sinar would require individual source inputs for each changed field.

Is that accurate?

@kaerumy
kaerumy commented Jan 20, 2015

Yes correct on both.

@zarino
Member
zarino commented Jan 20, 2015

For mySociety peeps – here's a run-down of my work on this so far:

https://docs.google.com/a/mysociety.org/presentation/d/1msmugkIGh3i1v25gYoq4DNULwyhcd7mGHtPy05u0FwQ/edit

@chrismytton chrismytton self-assigned this Jan 26, 2015
@chrismytton chrismytton removed their assignment Feb 11, 2015
@chrismytton chrismytton added the Design label Feb 11, 2015
@akuckartz

This is about provenance and could use PROV-O (http://www.w3.org/TR/prov-o/).

@chrismytton
Member

Just a quick update on where we're at with this: We haven't forgotten about this ticket! We've done some work towards implementing it but it became clear along the way that the implementation is too coupled to the UI.

For the time being we've put this ticket on hold while we focus on decoupling the API and the UI, which is being tracked in #837.

Once we've made some progress with the API and UI decoupling then I want to come back to this ticket and hopefully the implementation will be much cleaner/clearer.

@kaerumy
kaerumy commented Jul 25, 2016

We've implemented this feature in our replacement version of Popit here https://github.com/Sinar/popit_ng
