This repository has been archived by the owner. It is now read-only.

Allow for attribution of all data (per-field sourcing) #26

Open
evdb opened this Issue Apr 2, 2012 · 19 comments

9 participants
@evdb
Contributor

evdb commented Apr 2, 2012

Ensure that this suggestion is implemented:

Make sure you include sources i.e. make it fundamental in the user interface that people put the URL and/or a text description of where they got data from.

This is vital for provenance and for later data updating. Most importantly, it lets the user interface add credibility to sites made using the data.

@evdb evdb added the Difficulty 13 label May 22, 2014

Member

mhl commented May 22, 2014

Notes from the backlog triage session at the all team meeting: "see #279 - probably sources of information should be attached to changes, so related to the versioning work"

@mhl mhl added the versioning label May 22, 2014

clkao commented Sep 22, 2014

+1

kaerumy commented Oct 30, 2014

This is currently a blocker for Sinar's work on a representative database. We have several data sources that help build a complete picture of our representatives, and these sources must be attributed to maintain the (non-technical) integrity of data that is being used by everybody else.

paullenz commented Oct 31, 2014

As this is a blocker for you, I have tagged it as a contender for review to be added to the next sprint (starts in a week).

kaerumy commented Nov 5, 2014

Thank you for making this a contender.

paullenz commented Nov 8, 2014

Having spoken to Sinar: for their needs, just an additional field that lets them list an attribution source would be sufficient for now. More complex versioning is not essential.

This is needed ASAP (faster than the multiple language option)

@mhl mhl added 3 - Now and removed 1 - Contender labels Nov 10, 2014

@chrismytton chrismytton added 1 - Contender and removed 3 - Now labels Nov 12, 2014

Member

chrismytton commented Nov 12, 2014

A pull request for a simplified version of this, using Popolo source fields, is here: #680.

Member

chrismytton commented Nov 13, 2014

@kaerumy I've just pushed a basic implementation of source fields live. This adds an extra Sources tab to the person/organization view/edit pages. It follows the Popolo metadata format, in which each source is a URL with a short description.

[Screenshot: 2014-11-13 at 12:55:41]

I'm going to leave this ticket open as we plan to do more comprehensive attribution based on versioning in the future, but hopefully this is a useful initial version! 👍
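
For readers unfamiliar with the format, here is a minimal sketch of what record-level sources might look like. The `url` and `note` field names are assumed from Popolo's link objects; the person, the URLs and the `format_sources` helper are invented for illustration and are not PopIt's actual code.

```python
# Hypothetical person record with record-level sources (all values invented).
person = {
    "id": "person-1",
    "name": "Jane Example",
    "sources": [
        {"url": "https://example.org/gazette/2013", "note": "Official gazette, 2013"},
        {"url": "https://example.org/profile", "note": "Party profile page"},
    ],
}


def format_sources(record):
    """Return one 'note <url>' line per source attached to the whole record."""
    return [f"{s['note']} <{s['url']}>" for s in record.get("sources", [])]


for line in format_sources(person):
    print(line)
```

Note that in this shape the sources describe the record as a whole; nothing ties a given source to a particular field.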

kaerumy commented Nov 25, 2014

This is good for some of our initial import of representatives, but we will need per-field sources in the near future. https://sinar-malaysia.popit.mysociety.org/organizations/5474018888eca8ff1ed43367#sources is not very useful when we want to track, for example, the source of posts held or changed for each person/post.

Consumer applications like our Accountability tracker will need to easily pull the specific source for changes in posts held, or for the start/end of a position. Currently they have to pull in all Sources and leave it to the user to figure out which one was used to verify the information.

@chrismytton chrismytton changed the title from Allow for attribution of all data to Allow for attribution of all data (per-field sourcing) Dec 8, 2014

@mhl mhl added 2 - Current Sprint and removed 1 - Contender labels Jan 8, 2015

Member

zarino commented Jan 19, 2015

Hi @kaerumy – If you had a way to record sources for each individual change, would you expect it to be optional, or required?

Also, if users were asked for a source only once, when they finally press the "Save changes" button, rather than individually per input field – would that be ok for your use case? If it wouldn't, could you explain why?

Member

zarino commented Jan 19, 2015

Just to add some context, YourNextMP (which is a user interface built on top of the PopIt API) has already made its own decisions about both of these points.

YourNextMP requires you to provide a source whenever you make any changes.

But it doesn't ask per individual field. Instead, on the candidate editing page (which is closest to the default PopIt person/organisation page) there's a single "Source" field next to the "Save" button at the bottom of the page:

[Screenshot: 2015-01-19 at 16:44:53]

This means YourNextMP can't track individual sources for each individual change, but it can record a source for each value changed at the same time (what I'd call an "editing session", or in version control might be called a "check-in" or a "commit").

Looking at the diffs on YourNextMP it appears most people only change one thing when they edit a candidate—maybe adding an email address or a twitter profile, or adding a membership—so recording one source per editing session, rather than per field, doesn't make much of a difference. But it does make it a lot simpler for users doing the editing.

The question is, @kaerumy, does Sinar Project need more granularity than that?
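
To make the "one source per editing session" model concrete, here is a rough sketch. It is not YourNextMP's actual code; the `Record` and `EditSession` names, the `save` and `source_for` methods and all sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """One save: every field changed in it shares a single source."""
    source: str
    changes: dict  # field name -> new value


@dataclass
class Record:
    data: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def save(self, changes, source):
        # A source is required for every save, never per field.
        if not source or not source.strip():
            raise ValueError("a source is required to save changes")
        self.data.update(changes)
        self.history.append(EditSession(source=source.strip(), changes=dict(changes)))

    def source_for(self, field_name):
        """Source of the most recent session that touched field_name, or None."""
        for session in reversed(self.history):
            if field_name in session.changes:
                return session.source
        return None


candidate = Record()
candidate.save({"email": "jane@example.org"}, "https://example.org/contact")
candidate.save(
    {"twitter": "janeexample", "party": "Example Party"},
    "https://example.org/leaflet",
)
print(candidate.source_for("twitter"))
```

In this sketch `twitter` and `party` end up with the same source because they were saved together, which is exactly the loss of granularity the question above is about.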

kaerumy commented Jan 20, 2015

Hi @zarino, in short we're building a single centralized database of thousands of people & organizations from various verified sources, which will be used by different applications and as a research tool for investigative journalists for years to come.

  • Each profile is built up from dozens of sources, often specific to one field. It's really important to know exactly where each piece of information came from, especially if it could implicate someone in a corruption issue or even disqualify them from an election nomination. Technically you could have a standard format for references to fields, and sources in a change log for the Sources field, but you would need to parse it for a few of the following use cases.
  • Other users & applications could be using just a few fields, such as posts held. They are not interested in pulling in and digging through all sources, just in displaying the source for these specific fields so others can quickly verify, e.g. this person held the post of CEO of ABC in 1992 (source: X url). One donor wanted to make sure that, on a website that contains multiple sources, the sources for a field (e.g. picture) are clearly displayed, to show that it's not randomly pulled off the Internet.
  • Editors/reviewers need to easily find and review the source for a specific field. A person may hold a post at a public company (source A) while holding a political post in the same period (source B); before this he held a lower political post (source C); and at a specific point in this period he received a government award and title, which adds another prefix/suffix to his name (source D). For these fields, a person could be holding up to 200 posts (from some research on beneficial ownership). That's a lot of sources, mixed in with changes for Twitter, FB etc., if kept in a single unified change log.
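
The use cases above could be served by attaching sources to the item they support. The following is only a sketch under that assumption: the record layout, the `sources_for_post` helper and every value in it are hypothetical, not an existing PopIt or Popolo API.

```python
# Hypothetical per-field layout: each post carries its own sources,
# instead of one shared list for the whole person (all values invented).
person = {
    "name": "Jane Example",
    "memberships": [
        {
            "role": "CEO",
            "organization": "ABC",
            "start_date": "1992",
            "sources": [{"url": "https://example.org/x", "note": "Source X"}],
        },
        {
            "role": "Council member",
            "organization": "Example Council",
            "start_date": "1990",
            "sources": [{"url": "https://example.org/c", "note": "Source C"}],
        },
    ],
}


def sources_for_post(record, role, organization):
    """Return only the sources attached to the matching post, or [] if none."""
    for post in record.get("memberships", []):
        if post["role"] == role and post["organization"] == organization:
            return post.get("sources", [])
    return []


for source in sources_for_post(person, "CEO", "ABC"):
    print(f"CEO of ABC (source: {source['note']} {source['url']})")
```

A consumer application can then show the source next to the one post it displays, without reading through every source on the profile.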

Member

zarino commented Jan 20, 2015

Thanks for the feedback @kaerumy. You didn't quite answer my questions, but can I assume that your answers would be..?

  • Yes, Sinar Project would expect source attribution to be a required step, performed by everyone who wants to submit new data.
  • And no, citing a single source at the moment of saving is not good enough – Sinar would require individual source inputs for each changed field.

Is that accurate?

kaerumy commented Jan 20, 2015

Yes correct on both.
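
Taken together, the two confirmed requirements (a source is mandatory, and it is given per changed field) amount to a validation step at save time. A minimal sketch, assuming flat records and a hypothetical `missing_sources` helper; none of this is PopIt's actual implementation:

```python
def missing_sources(old, new, sources):
    """List the changed fields that have no non-empty source.

    old, new: flat dicts of field name -> value (before and after the edit)
    sources:  dict of field name -> source URL or description for this edit
    """
    changed = [k for k in sorted(set(old) | set(new)) if old.get(k) != new.get(k)]
    return [k for k in changed if not sources.get(k, "").strip()]


old = {"name": "Jane Example", "email": "jane@example.org"}
new = {"name": "Jane Example", "email": "jane@example.com", "twitter": "janeexample"}
sources = {"email": "https://example.org/contact"}

# The edit would be rejected until every listed field has a source.
print(missing_sources(old, new, sources))
```

Unchanged fields such as `name` need no new source here; only fields whose value differs between `old` and `new` are checked.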

Member

zarino commented Jan 20, 2015

For mySociety peeps – here's a run-down of my work on this so far:

https://docs.google.com/a/mysociety.org/presentation/d/1msmugkIGh3i1v25gYoq4DNULwyhcd7mGHtPy05u0FwQ/edit

@chrismytton chrismytton self-assigned this Jan 26, 2015

@chrismytton chrismytton referenced a pull request that will close this issue Jan 26, 2015

Open

[WIP] Per field sourcing #727

@chrismytton chrismytton removed their assignment Feb 11, 2015

@chrismytton chrismytton added the Design label Feb 11, 2015

akuckartz commented Apr 15, 2015

This is about provenance and could use PROV-O (http://www.w3.org/TR/prov-o/).

Member

chrismytton commented May 27, 2015

Just a quick update on where we're at with this: We haven't forgotten about this ticket! We've done some work towards implementing it but it became clear along the way that the implementation is too coupled to the UI.

For the time being we've put this ticket on hold while we focus on decoupling the API and the UI, which is being tracked in #837.

Once we've made some progress with the API and UI decoupling then I want to come back to this ticket and hopefully the implementation will be much cleaner/clearer.

kaerumy commented Jul 25, 2016

We've implemented this feature in our replacement version of PopIt, here: https://github.com/Sinar/popit_ng
