Claim and ClaimReview 2018 tracking issue #1969

danbri opened this Issue Jun 23, 2018 · 7 comments


danbri commented Jun 23, 2018

This is a meta-issue collecting materials for improvements around ClaimReview and Claim. It is the successor to our original tracking issue (#1061), which came out of discussions between Bill Adair (@BillAdairDuke) and some of us from Google. As this work matures we are moving into a new phase, and this issue is the hub for ensuring that issues raised by the community, platforms and experts are tracked and addressed. If an idea is not recorded here, it is unlikely to result in changes to Schema.org definitions or examples.

I will dump a few things here in the description (e.g. as outcomes from GlobalFact5 conference conversations), and then factor them out into distinct issues later. (June 23 2018 note from @danbri)

Tracked sub-Issues

  • Media-related (#1686): claims and content/appearances that are non-textual; make the markup work for these cases, and show how it works. This includes offsets within video/audio; Chequeado has examples of this (@danbri to follow up with Pablo M. Fernández).
  • URL of claims (#1828): this led to our addition of Claim and the notion of an "appearance", and to the need to better document sameAs (below); let's keep the issue open until the documentation is clearer.

sameAs clarification

It has emerged that there has been some confusion in our communications around the use of the sameAs property.

The documentation we published at Google has contributed to this, in that it did not make sufficiently clear that sameAs is an entirely general-purpose disambiguation mechanism that can be used on any entity in any Schema.org description. Furthermore, fact checkers and journalists should not be expected to be familiar with all areas of Schema.org; as this work gets more widely adopted, we will need to work hard to communicate our underlying data model and approach to machine-readable description.

Richer examples may help. I have collected some Wikidata URLs that could be used to show how { "@type":"Person", "name":"John Smith" } could be made less ambiguous.

Some of the people concerned are living, some died recently, and some are historical figures. (At this point there does not seem to be critical mass for modeling historical claims; perhaps that will emerge later.)
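
For instance, a minimal sketch of the idiom (the Wikidata URL here is a placeholder, not one of the collected examples):

{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "John Smith",
  "sameAs": "https://www.wikidata.org/wiki/Q000000"
}

The sameAs value anchors an otherwise ambiguous name to a specific external identity; the same idiom works on any Schema.org type, not just Person.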

Examples

We should add more examples showing the diversity and creativity of the fact-checking community, and make clear that our goal around ClaimReview (and Claim) is not to impose a structure, but to capture in a machine-readable format the common data patterns implicit in existing and evolving practice.

  • We should have a clear example of a fact check without a quantitative rating (a sketch follows this list).
  • We should consider an idiom (with a new supporting property if needed) to avoid the clumsy current formulation of using "alternateName" on a Rating without numbers.
  • We should show examples (and confirm markup patterns) in which the content in which the claim appears is non-textual (video, audio).
    • BBC's More or Less podcast could be a great source of examples here.
    • The ABC/RMIT fact-checking work also has video fact checks.
    • We need examples where the claim appears in non-textual content, and these should include an (accessibility-minded) treatment of transcripts/subtitles/closed captions, as well as temporal offsets. #1686 has some discussion.
  • Examples of Claim are needed, showing its strengths and weaknesses versus the classic ClaimReview pattern.
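
As a strawman for the first and third bullets above, here is a rough sketch combining a non-quantitative rating (via the current alternateName idiom) with a claim appearing in a video. All names and URLs are illustrative, and expressing the temporal offset as a W3C Media Fragment ("#t=95,120") is an assumption for discussion, not an agreed pattern:

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "claimReviewed": "An illustrative claim",
  "reviewRating": {
    "@type": "Rating",
    "alternateName": "Misleading"
  },
  "itemReviewed": {
    "@type": "Claim",
    "appearance": {
      "@type": "VideoObject",
      "name": "An illustrative interview",
      "url": "https://example.org/interview#t=95,120"
    }
  }
}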

Claims

We recently (in the v3.4 release) added a Claim type to the Pending area. It has great potential, but we need to be clear on several things before removing the "pending" warning flag:

  • Fact-checking orgs can continue to use ClaimReview as-is, and consuming platforms (e.g. Bing, Google, Facebook, etc.) should take care to support "classic" ClaimReview even if they come to also understand Claim.
  • EXAMPLES (see above).
  • What potential relationships exist between claims? Purely for example (a speculative sketch follows this list):
    • Paraphrase or explication: the first Claim would be the verbatim appearance; the second, an attempt to rephrase it in terms more appropriate for checking, e.g. filling in unarticulated context (time, place, and other indexicals: he/she/it/we/they/here, etc.).
    • Potentially there are multiple paraphrases or explications of a claim, if the original was unclear.
    • Translation (of the original or of a paraphrase; we should be clear which). This is particularly important in states with multiple languages in widespread use, but also in states with immigrant populations, i.e. important almost everywhere. It also matters when political speech is reported, translated and contested internationally, e.g. English-language discussion of Iranian political statements about Israel.
      See "most of the success we got harnessing the crowd was actually in the translations. […] People were very happy to translate factchecks into other languages." (Alexios Mantzarlis, quoted in @MevanBabakar's crowdsourcing document.)
      • Can we get some real examples here? Maybe Belgium, the US (Spanish/English), Israel, Iran, ...
  • Engage with repositories of claims, e.g. Wikidata, DebateGraph, to learn from their deployments of Claim-descriptions, or interoperate at the data level.
  • Scientific dialogue and argumentation modeling. There is a huge literature here. We should document a few examples and avoid premature standardization or complexification, but also encourage data-consuming platforms to support diverse extensions to the core claim and claim-review patterns.
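
To make the relationship question concrete, here is a speculative sketch. translationOfWork is an existing Schema.org property (and a Claim is a CreativeWork, so it applies), but "paraphraseOf" is purely hypothetical, used only to illustrate the kind of link we might want; all text values are illustrative:

{
  "@context": "http://schema.org",
  "@type": "Claim",
  "inLanguage": "es",
  "text": "Translated wording of the claim",
  "translationOfWork": {
    "@type": "Claim",
    "inLanguage": "en",
    "text": "Paraphrase worded so as to be checkable",
    "paraphraseOf": {
      "@type": "Claim",
      "inLanguage": "en",
      "text": "Verbatim wording as it originally appeared"
    }
  }
}

Nesting in this order also makes explicit whether a translation targets the paraphrase or the verbatim original.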

(edited to add...)

Syndication, Sharing, Usage and feed-related concerns

Note that we have already addressed the desire to indicate expiration dates ("shelf life") in #1687 which was a response to a request from Full Fact (@MevanBabakar / @FullFact).

We may also want to document a pattern for fact check publishers to point to "community norms": conventions that sit above their formal copyright/trademark/license terms. Some open data publishers use this approach to indicate how they would like other parties to use their data.
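
A very rough sketch of what such a pattern might look like, assuming the Schema.org usageInfo property (intended for licensing/usage information beyond the formal license) as the carrier for the norms link; both URLs are illustrative:

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "claimReviewed": "An illustrative claim",
  "license": "https://example.org/factcheck-license",
  "usageInfo": "https://example.org/community-norms"
}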

Community Norms backgrounder

(This needs moving out to a related issue or sub-issue.)

  • ODC: https://opendatacommons.org/norms/ and https://opendatacommons.org/norms/odc-by-sa/, plus the disclaimer at https://opendatacommons.org/disclaimers/; see also https://opendatacommons.org/category/community-norms/
    (Note: this is a different initiative from DataCommons.org.) Nearby, there is discussion of PDDL and Norms in https://timreview.ca/article/122
  • Dataverse, https://dataverse.org/best-practices/dataverse-community-norms
    CC0 followed by a norms statement, "Dataverse asks that all users who download datasets from Dataverse’s services follow the following Community Norms.",
    covering things like "Crediting any research used with data citations", "Maintaining anonymity of human subjects", and "Third Party API Applications".
  • Paper, "Open science and community norms Data retention and publication moratoria policies in genomics projects"
    http://journals.sagepub.com/doi/abs/10.1177/0968533212458431
  • While modern genomics research often adheres to community norms emphasizing open data sharing, many genomics institutes and projects have recently nuanced such norms with a corpus of data release policies. In particular, publication moratoria and data retention policies have been enacted to ‘reward’ data producers and ensure data quality control. Given the novelty of these policies, this article seeks to identify and analyse the main features of data retention and publication moratoria policies of major genomics institutes and projects around the world. We find that as more collaborative genomics projects are created, and further genomic research discoveries are announced, the need for more sophisticated yet practical and effective policies will increase. Reward systems should be implemented that recognize contributions from data producers and acknowledge the need to remain dedicated to the goals of open data sharing. To this end, in addition to the current choices of employing data retention or publication moratoria policies, alternative models that would be easier to implement or less demanding on open science should also be considered.
  • Bibliographic, e.g. OCLC, whose catalog aggregates from multiple libraries: https://www.oclc.org/research/themes/systemwide-library/ohiolink/communitynorms.html
    "Because we've chosen the Open Data Commons Attribution license (an open license) for the database, we ask that you also comply with the Community Norms when using the bibliographic data in the database. The Community Norms are the result of a long, collaborative process with OCLC's member libraries, including many of the world's leading lending institutions, that considered how to keep OCLC's services sustainable and of high quality. OCLC provides important infrastructure and needed services to the global library community, and as a result we ask that if you are using the bibliographic data in the database other than for study or research that you review the Community Norms below and follow them when making such use of the bibliographic data."
  • Virginia LibraData, https://www.library.virginia.edu/libra/datasets/libra-data-community-norms/ (same norms as Dataverse, above).

Next steps...

This is an incomplete list, but it captures some of the hallway conversations from GlobalFact5; to be continued.

There is also the Tech and Check "standards" subcommittee (hello again @MevanBabakar), which is collecting issues related to fact checking that are not strictly limited to Claim/ClaimReview, e.g. profiles of NewsArticle for use in automation tool chains; see https://github.com/TechCheckStandards/core/issues

danbri commented Jun 23, 2018

Educational/Occupational Credential schemas

What are the requirements from fact checking? There are nearby communities looking at this, but with more of a view to JobPosting and related schemas than fact checking.

We ought to evaluate the current proposals from the Educational/Occupational Credentials community to see if they meet the needs of fact check markup (and news articles generally, /cc @TheTrustProject). A good starting point is https://www.w3.org/community/eocred-schema/2018/05/23/progress-so-far-and-the-beginning-of-the-end/ (/cc @philbarker). We should collect some use cases, e.g. "articles/ClaimReviews made by people who work for medical organizations" versus a specific medical credential, e.g. expertise relevant to vaccines.

Here is a draft to give a feel for the current level of detail. The modeling is a little indirect, to avoid Schema.org having to standardize everything across all territories and fields, and to make use of existing code lists and initiatives.

{
  "@context": "http://schema.org",
  "@type": "EducationalOccupationalCredential",
  "name": "HNC Facilities Management",
  "description": "Higher National qualifications provide practical skills and theoretical knowledge that meet the needs of employers. The HNC in Facilities Management (SCQF level 7) develops knowledge and skills of the modern Facilities Management industry including both ‘hard’ and ‘soft’ services, and is aimed at those in supervisory and management roles or aspiring managers within the wider realm of Facilities Services.",
  "educationalLevel": {
    "@type": "DefinedTerm",
    "name": "SCQF Level 7",
    "inDefinedTermSet": "https://www.sqa.org.uk/sqa/71377.html"
  },
  "credentialCategory": {
    "@type": "DefinedTerm",
    "name": "Higher National Certificate",
    "termCode": "HNC"
  },
  "competencyRequired": {
    "@type": "DefinedTerm",
    "termCode": "ASTFM401",
    "name": "Understand facilities management and its place in the organisation",
    "url": "https://www.ukstandards.org.uk/PublishedNos/ASTFM401.pdf",
    "inDefinedTermSet": "https://www.ukstandards.org.uk/"
  }
}

("Notable") Claims in Wikidata

  • Follow up discussions with @Wikidata about reviewing their representation of certain classes of claim / hoax / misinformation. See also @MevanBabakar's "Crowdsourced Factchecking" essay. To be clear, I am not proposing that professional fact-checking be outsourced to the Wiki universe, just that the worlds already overlap and we would do well to consider how the data patterns relate (cf. sameAs above). /cc @lydiapintscher
    • Hallway conversations at GlobalFact5 suggest that long-lived, widely discussed claims (e.g. those spanning an election/referendum cycle, or six months or more) may well end up getting Wikidata IDs, and some thought on how that relates to our model would be worthwhile. This would not cover every evaluable claim or piece of political speech, but rather the hub concepts of political speech. Whether Wikidata could help organize claim occurrences into clusters, or assist in other ways, is very much an open question.
thadguidry commented Jun 23, 2018

@danbri Regarding your comments about "scientific dialog and argumentation modeling" and "articles/claimreviews made by people who work for Medical organizations" ...

That sometimes happens within our evidenceOrigin. And evidenceOrigin needs to accept a Dataset as well, so I created #1971.

MedicalGuidelineRecommendation is itself a "Claim" of sorts, and definitely a result of fact checking across Datasets, ScholarlyArticles and consensus opinions by medical authorities (sometimes in government service). So #1971 needs to be addressed. But we also have no way to represent that a MedicalGuidelineRecommendation is a "Claim", or something previously reviewed (an itemReviewed) by authorities. We need to tie them together somehow.

cong-yu commented Jun 23, 2018

@danbri having this overall tracking issue is nice, thanks! I also want us to finish the conversation on #1828 as soon as possible since the discussion there has the right scope for fixing the most important issue for the association data.

Also, I think alternateName does its job quite well and changing it will cause major implementation issues.

danbri commented Jun 24, 2018

Also TODO: talk to Annotations people (ping @judell @azaroth42) to see if there are idioms expressed against https://www.w3.org/annotation/ that map into Claim(Review), e.g. Hypothes.is - see https://web.hypothes.is/blog/annotation-is-now-a-web-standard/

Are there any public annotation datasets out there with suitable fact checks in them?

judell commented Jun 25, 2018

@danbri I've looked at a lot of ClaimReview data in the wild and summarized my conclusions here: https://misinfocon.com/how-web-annotation-can-help-readers-spot-fact-checked-claims-ccbf9246dd68. The upshot: target URLs of claim reviews are rarely captured in a way that would enable search engines and other actors to associate target URLs with the claim reviews that target them.

Why isn't the example given here, https://search.google.com/structured-data/testing-tool, being followed consistently? I suspect there are two reasons. First, there might need to be more and better guidance about why and how to cite the target URL in the way https://example.flatworlders.com/we-know-that-the-world-is-flat is cited in the example. Second, as I argue in the MisinfoCon post, adding ClaimReview data has become one more step in an already burdensome workflow, so streamlining that workflow, if possible, seems desirable.
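
For readers without the testing tool to hand, here is a minimal sketch of the pattern in question (not a verbatim copy of the tool's example; only the flatworlders URL is taken from it, the other values are illustrative):

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "claimReviewed": "The world is flat",
  "itemReviewed": {
    "@type": "CreativeWork",
    "url": "https://example.flatworlders.com/we-know-that-the-world-is-flat"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "1",
    "bestRating": "5",
    "worstRating": "1",
    "alternateName": "False"
  }
}

The question is whether that itemReviewed url, the target of the fact check, is actually being populated in the wild.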

It would be ideal to validate these observations directly with the producers of fact-checking-oriented ClaimReview metadata, of whom there are not many.

azaroth42 commented Jun 25, 2018

I would say that Claim is outside the scope of the (concluded) Web Annotation work. It's a domain-specific notion that classifies some web resource or segment thereof. The description of the segmentation is in the Annotation work, via Specific Resources and Selectors. This allows the identification of the segment, and schema.org could then provide the classification.

ClaimReviews seem very similar to Web Annotations with a particular motivation; something like: schema:claimReviewing skos:broader oa:assessing (see the definition of oa:assessing).

Unless ... the Review is the structured data that is provided to assess the Claim, at which point it's the body of the annotation that has the claim as its target.
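
Concretely, that reading might look something like this, a sketch only, with illustrative URLs and quoted text:

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "type": "Annotation",
  "motivation": "assessing",
  "target": {
    "source": "https://example.org/political-speech",
    "selector": {
      "type": "TextQuoteSelector",
      "exact": "the world is flat"
    }
  },
  "body": "https://example.org/fact-checks/flat-earth-review"
}

Here the Specific Resource (source plus selector) identifies the claim's segment, and the body points at the review assessing it.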

Hope that helps?

BillAdairDuke commented Jun 28, 2018

(This comment has been minimized.)