ClaimReview needs field for URL of claims / integration of new Claim type #1828

Closed
BillAdairDuke opened this issue Jan 22, 2018 · 67 comments

@BillAdairDuke BillAdairDuke commented Jan 22, 2018

After discussing the use of ClaimReview with fact-checkers, we found confusion about how to populate the sameAs field. Some fact-checkers have been using it for the URL of speeches or articles where the claim appeared, to provide evidence and details of the claim; others have been using it for the home page or knowledge base article of the person or entity that made the statement.

Both have value for search engines and other uses. It’s valuable to be able to see a knowledge base article on the person or entity being checked; it’s also valuable to be able to see where a false claim was published, particularly the article or speech where the claim originated.

The solution: We propose that:
sameAs will continue to be used for the entity or speaker that made the claim (we called it the “who to blame” field). This field would typically be the home page of the person making the claim, or their Wikipedia page.

We will add a field under ClaimReview called claimUrlOriginal, of type URL, to designate the URL of the origin of a claim, such as a speech or article. Fact-checkers could include additional URLs in a second field, claimUrl, also of type URL, if the claim has been repeated in other publications.

For example, here is a fact-check by PolitiFact of a claim by New Century Times. The sameAs field is the website’s URL. The claimUrlOriginal field would be the article that contained the claim that was fact-checked. That claim also appeared on Vote.us.org, which is listed as a claimUrl.

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/punditfact/statements/2018/jan/05/new-century-times/white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "claimUrlOriginal": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
  "claimUrl": ["https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"],
  "author":
  {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com"
  },
  "itemReviewed":
  {
    "@type": "CreativeWork",
    "author":
    {
      "@type": "Person",
      "name": "New Century Times",
      "jobTitle": "website",
      "sameAs": "http://newcenturytimes.com/"
    },
    "datePublished": "2017-12-30"
  },
  "reviewRating":
  {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}
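
A minimal sketch of how a consumer might read the fields proposed above, assuming claimUrlOriginal and claimUrl are adopted as described (they are proposals in this issue, not existing schema.org terms). The function name claim_sources is hypothetical, and the markup is a trimmed copy of the example above.

```python
import json

# Trimmed copy of the example above. claimUrlOriginal and claimUrl are the
# properties proposed in this issue, not existing schema.org terms.
MARKUP = """
{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "claimUrlOriginal": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
  "claimUrl": ["https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"],
  "itemReviewed": {
    "@type": "CreativeWork",
    "author": {
      "@type": "Person",
      "name": "New Century Times",
      "sameAs": "http://newcenturytimes.com/"
    }
  }
}
"""


def claim_sources(review):
    """Split a ClaimReview into 'who to blame' and 'where the claim appeared'."""
    author = review.get("itemReviewed", {}).get("author", {})
    repeats = review.get("claimUrl", [])
    if isinstance(repeats, str):  # JSON-LD allows a bare value or a list
        repeats = [repeats]
    return {
        "who_to_blame": author.get("sameAs"),
        "origin": review.get("claimUrlOriginal"),
        "repeats": list(repeats),
    }


sources = claim_sources(json.loads(MARKUP))
print(sources["who_to_blame"])
print(sources["origin"])
print(sources["repeats"])
```

The point of the sketch is only that the two roles stay separate: sameAs identifies the entity making the claim, while the two proposed URL fields identify places where the claim appeared.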

Edited 2018-01-23 by @danbri to highlight key terms, omit HTML script tag, improve formatting.

Contributor

@danbri danbri commented Jan 23, 2018

Thanks @BillAdairDuke, good to get this discussion moving. I agree there's an issue here and it is important to find a pragmatic solution.

One quick clarification about sameAs, ... it has potentially several roles in that it can be used on each typed entity, whether we're talking about the (Claim)Review, the Person, or a CreativeWork like NewsArticle. Conceivably there could be several sameAs properties in a single description.

Regarding "claimUrlOriginal", does this always indicate the URL of the itemReviewed? If so we could potentially make it an "url" property of that entity.

Some of the same issues might also be addressed by introducing a new type that more explicitly represents the actual claim under discussion. So if the claim is that "The moon is made of green cheese", right now Schema.org only represents that fairly indirectly; we talk about a review of that claim and then kind of casually mention the claim in passing via the claimReviewed property. In other words our current model is very much oriented around a bottom up way of dealing with claims as they occur, rather than attempting to normalize them into abstract claims that are unrelated to their occurrence in real life. My impression is that by introducing claimUrl we would be getting into very similar territory. Are there factchecking-oriented sites that try to identify claims quite separate from their specific occurrences?

Author

@BillAdairDuke BillAdairDuke commented Jan 23, 2018

Thanks for the feedback, Dan. Answers below, with a request for clarification.

> Regarding "claimUrlOriginal", does this always indicate the URL of the itemReviewed? If so we could potentially make it an "url" property of that entity.

Yes, “claimUrlOriginal” would always indicate the URL of the statement that is reviewed. In some cases that would be the URL of a text or speech. It could also be a news article that contains the claim in a quote of the person.

> Some of the same issues might also be addressed by introducing a new type that more explicitly represents the actual claim under discussion. So if the claim is that "The moon is made of green cheese", right now Schema.org only represents that fairly indirectly; we talk about a review of that claim and then kind of casually mention the claim in passing via the claimReviewed property. In other words our current model is very much oriented around a bottom up way of dealing with claims as they occur, rather than attempting to normalize them into abstract claims that are unrelated to their occurrence in real life. My impression is that by introducing claimUrl we would be getting into very similar territory. Are there factchecking-oriented sites that try to identify claims quite separate from their specific occurrences?

I’m not clear on what you mean when you say the schema represents the claim “fairly indirectly.” In practice, it seems clear to me that “itemReviewed” is what is being fact-checked. I don't think there's been any confusion about that among users.

I’m also not sure what you mean by “rather than attempting to normalize them into abstract claims that are unrelated to their occurrence in real life.”

> Are there factchecking-oriented sites that try to identify claims quite separate from their specific occurrences?

Also not sure what you mean here. If you mean, "Do fact-checkers distinguish repeated claims?" - Practices vary. Some publishers, such as the Washington Post, will put multiple similar statements by different people at the top of an article, although not in any structured way. Others, such as PolitiFact and Snopes, will just mention in the text that the claim or a similar one has been repeated by others. This is particularly clear with talking points that get repeated by many politicians and publications.


@jkosslyn jkosslyn commented Jan 23, 2018

Dan -- basically, we're trying to make a distinction between the entity responsible for a claim and places where the claim has appeared. Currently we have the former but not the latter.


@cong-yu cong-yu commented Jan 23, 2018

Hi All,

Here is my take on this.

  • A claim can be a tweet, a person's quote in a rally, an article, etc.
  • Each claim can be repeated in many places and may also have an origin, where the claim originally appeared.
  • The fact checker examines an instance (or spreader or cheerleader) of the claim, which is modeled inside itemReviewed. Note that the instance being examined is often not the original instance of the claim, but rather the most notable person who's making the claim.

Assuming this is our common understanding, we have two options to model as many instances of the claim as possible:

  1. Use itemReviewed to model all the instances, i.e., one itemReviewed for each spreader of the claim. This is a much bigger burden on the reporter since itemReviewed is very heavy: we need author, publication date, etc. It seems to be a non-starter.

  2. Use a simpler structure, i.e., the proposed claimUrl and claimUrlOriginal, to capture the spreaders of the claim. Assuming there is just one origin, claimUrlOriginal is singleton and claimUrl is vector. The semantics here are very different from the url attribute directly under ClaimReview since these are not URLs of the fact check.

(We do note that claimUrl is not used to capture merely reporting pieces. For example, if XYZ said 'earth is flat' and publisher A writes an article saying 'XYZ says earth is flat,' A's article shouldn't be modeled as claimUrl.)

At a high level, claimReviewed, claimUrl, and claimUrlOriginal are all attributes of ClaimReview, so together they seem pretty well organized as a unit.

Dan, what do you think? We want to get the definition into the pending schema as soon as possible, so the journalists can start using it.

Contributor

@danbri danbri commented Jan 24, 2018

Thanks, that all helps me understand where you're coming from.

By indirectly, what I mean is that we have made the review of the claim the focus, and there is no type in schema.org that actually represents the claim itself. We have types for the factcheck reviews, the people, documents, etc but we haven't yet introduced an item (that can have relationships to and from other items) representing the Claim. Maybe we don't need to.

> This is a much bigger burden on the reporter since itemReviewed is very heavy: we need author, publication date, etc. It seems to be a non-starter.

I'd put it slightly differently. None of these fields are mandatory in schema.org; sites can say as much or as little as they like. But by handling the itemReviewed as the occurrence of the claim, not the idea itself, yes, it does potentially come with more baggage. That is why I'm (tentatively) floating the possibility that we might want to have a type like "Claim" that could just represent the idea being checked, not the article or TV show or whatever where it showed up. Let's not talk about this for ages, but I did want a little discussion on this point before we move to get the properties into Pending. How about we aim at having new terms in Pending by the end of next week, unless we cook up a better idea here in the meantime?


@cong-yu cong-yu commented Jan 24, 2018

Sounds good, Dan, let's wait till end of next week in case someone comes up with a good idea.

Agreed, ClaimReview is review focused. I am a bit hesitant, though, to create a Claim type, since it might be very confusing to people when itemReviewed also represents (an instance of) the claim. I think the rationale for keeping claimReviewed, claimUrl, and claimUrlOriginal directly under ClaimReview is that they are all identified/extracted by the fact checker, not the claim maker, so they belong more to the ClaimReview.

Contributor

@danbri danbri commented Jan 24, 2018

That makes sense, @cong-yu

@rvguha @vholland - any thoughts?

Contributor

@vholland vholland commented Jan 24, 2018

We are adding more and more properties about the claim, including the URLs. It would be better to create a proper class for Claims and put the properties on the class rather than shoehorning into ClaimReview.

Contributor

@rvguha rvguha commented Jan 24, 2018


@cong-yu cong-yu commented Jan 24, 2018

The itemReviewed within ClaimReview is in fact the claim, or more accurately, an instance of the claim. If we introduce a new Claim type, are we going to cause more confusion among the journalists in terms of the relationship between the two? Or are we talking about replacing itemReviewed with claimReviewed? (I am worried about backwards compatibility with the existing markups here ...)

Contributor

@danbri danbri commented Jan 25, 2018

@cong-yu my recommendation would be for reviews to stick with itemReviewed, regardless of what exactly the review is of. We already are getting reasonable questions about how to handle things like TV clips / videos, where there are two "candidates" for the item reviewed: the specific clip versus the larger material (e.g. 20 seconds vs 1 hour). Naturally it is good to have both documented, and for there to be a well known way of associating them with each other (and directly or indirectly, with the review). The TV discussion (instigated by the Archive.org folk, who I believe are also in touch with @BillAdairDuke) is in #1686

For the TV clip case, here is the draft approach I've sketched: https://gist.github.com/danbri/96d6265756577e5f21ad4141f053b76d ... Here is a cut down subset to show the structure:

{
    "@context": "http://schema.org",
    "@type": "ClaimReview",
    "datePublished": "2017-07-06",
    "url": "http://danbri.org/2017/TODO",
    "claimReviewed": "In the middle of the Cold War, the United States played a role in the overthrow of a democratically-elected Iranian government.",
    "reviewBody": "This claim is true. The UK also played a role.",
    "itemReviewed": {
        "@type": "Clip",
        "startTime": "350",
        "endTime": "370",
        "duration": "20",
        "isPartOf": {
            "@type": "VideoObject",
            "url": "https://www.youtube.com/watch?v=B_889oBKkNU",
            "name": "Clip from President Obama Speech to Muslim World in Cairo",
            "transcript": "http://www.nytimes.com/2009/06/04/us/politics/04obama.text.html"
        }
    }
  }
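
The nesting in the cut-down example above can be walked with a few lines of code. This is only a sketch: the function name clip_context is hypothetical, and it assumes startTime, endTime and duration are plain second offsets written as strings, as they are in the draft.

```python
def clip_context(review):
    """Return the reviewed clip's time range and the video it is part of."""
    clip = review["itemReviewed"]
    video = clip.get("isPartOf", {})
    start = float(clip["startTime"])  # assumed to be seconds from the start
    end = float(clip["endTime"])
    stated = float(clip.get("duration", end - start))
    return {
        "video_url": video.get("url"),
        "start": start,
        "end": end,
        # True when the stated duration agrees with endTime - startTime.
        "duration_consistent": stated == end - start,
    }


review = {
    "@type": "ClaimReview",
    "itemReviewed": {
        "@type": "Clip",
        "startTime": "350",
        "endTime": "370",
        "duration": "20",
        "isPartOf": {
            "@type": "VideoObject",
            "url": "https://www.youtube.com/watch?v=B_889oBKkNU",
        },
    },
}

context = clip_context(review)
print(context)
```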

In this case, we're saying "there's a ClaimReview, and its itemReviewed is a Clip, which isPartOf a VideoObject". If we explore the push from Vicki and Guha towards adding a Claim type (and which I agree will be very useful eventually), then we have some similar choices to make. Two analogous structures would be as follows. One has a review-claim-work structure, the other is review-work-claim.

Variant 1: "the review's itemReviewed is a Claim, which has a claimedIn of the materials carrying the claim occurrence." (review-claim-work)

{
    "@context": "http://schema.org",
    "@type": "ClaimReview",
    "datePublished": "2017-07-06",
    "url": "http://danbri.org/2018/TODO",
    "claimReviewed": "The moon is composed largely of green cheese (maybe with some patches of Camembert).",
    "reviewBody": "This claim is false, people have been there to check.",
    "itemReviewed": {
        "@type": "Claim",
        "name": "The moon is made of green cheese.",
        "url": "https://en.wikipedia.org/wiki/The_Moon_is_made_of_green_cheese",
        "claimedIn": {
            "@type": "VideoObject",
            "url": "https://www.youtube.com/watch?v=B_889oBKkNU",
            "name": "LUNA LACTOSE NEWS: 10 Green cheese facts the MSM don't want you to know"
        }
    }
}

This is however a bit awkward, since the more canonicalized/abstracted claim is sandwiched between the two items (the review and the VideoObject or Article etc.) which describe its occurrence. This is fine for a single fact check but gets messy if you merge descriptions of multiple occurrences, since there is then no association from the review to the specific article being reviewed.

Variant 2: "the review's itemReviewed is a CreativeWork that carries an occurrence of a Claim, indicated via embodiesClaim." (review-work-claim)

{
    "@context": "http://schema.org",
    "@type": "ClaimReview",
    "datePublished": "2017-07-06",
    "url": "http://danbri.org/2018/TODO",
    "claimReviewed": "The moon is composed largely of green cheese, with some patches of Camembert.",
    "reviewBody": "This claim is false, people have been there to check.",
    "itemReviewed": {
        "@type": "VideoObject",
        "url": "https://www.youtube.com/watch?v=B_889oBKkNU",
        "name": "LUNA LACTOSE NEWS: 10 Green cheese facts the MSM don't want you to know",
        "embodiesClaim": {
            "@type": "Claim",
            "url": "https://en.wikipedia.org/wiki/The_Moon_is_made_of_green_cheese",
            "name": "The moon is made of green cheese."
        }
    }
}

If we continue down this route we will need to consider whether the Claim here is the exact claim represented in some NewsArticle/VideoObject etc. or represents the more general idea that is likely surfacing in many places. For now I've assumed it is somewhat abstracted, but that URLs might be available for very well known claims - either at Wikipedia, or aggregations from the fact checking world, initiatives like DebateGraph etc.

However, if each occurrence of the Claim type really just represents a specific occurrence of a claim in a particular work, then variant one above could also be viable.
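
To make the structural difference between the two variants concrete, here is a rough sketch of a reader that accepts either shape and returns the claim and the work separately. It assumes the draft property names claimedIn and embodiesClaim floated above, neither of which is a settled schema.org term, and the function name claim_and_work is hypothetical.

```python
VIDEO_URL = "https://www.youtube.com/watch?v=B_889oBKkNU"
CLAIM_URL = "https://en.wikipedia.org/wiki/The_Moon_is_made_of_green_cheese"


def claim_and_work(review):
    """Return (claim, work) for either draft structure; a missing part is None."""
    item = review.get("itemReviewed", {})
    types = item.get("@type", [])
    if isinstance(types, str):
        types = [types]
    if "Claim" in types:
        # Variant 1: review -> Claim -> work, via the draft claimedIn property.
        return item, item.get("claimedIn")
    # Variant 2: review -> work -> Claim, via the draft embodiesClaim property.
    return item.get("embodiesClaim"), item


variant_1 = {
    "@type": "ClaimReview",
    "itemReviewed": {
        "@type": "Claim",
        "url": CLAIM_URL,
        "claimedIn": {"@type": "VideoObject", "url": VIDEO_URL},
    },
}

variant_2 = {
    "@type": "ClaimReview",
    "itemReviewed": {
        "@type": "VideoObject",
        "url": VIDEO_URL,
        "embodiesClaim": {"@type": "Claim", "url": CLAIM_URL},
    },
}

claim_1, work_1 = claim_and_work(variant_1)
claim_2, work_2 = claim_and_work(variant_2)
print(claim_1["url"] == claim_2["url"], work_1["url"] == work_2["url"])
```

Both shapes carry the same two nodes; what differs is which one sits directly under itemReviewed, which is the choice being weighed here.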

I haven't tried to mix together the TV Clip situation with the "model Claim explicitly" approach yet, but wanted to mention it here as there are similar structural issues, namely that we have to figure out which things get modeled with itemReviewed and which with other properties.

We ought also to consider whether one ClaimReview can have multiple claims/items being reviewed, or it is better just to repeat the entire structure for every (occurrence of a) claim.


@jkosslyn jkosslyn commented Jan 26, 2018

Thanks Dan. I just chatted with Guha and now better understand the desire for a "Claim" object that would be viable on its own, separate from the enclosing ClaimReview. However, to Cong's point, it will impose substantial hardship on overstretched fact check techies if we break the existing schema. Here's a thought, building on your proposed variant 1: what if Claim extended CreativeWork, kept the existing "author" semantics that we're using live, and added a field called something like "appearance" that could be an organization, person, VideoObject, etc. Thus, fact checkers who don't want to use the new "appearance" field could keep using their existing markups with no breakage, but going forward we have a richer concept of a Claim for those who want to express additional characteristics or use them outside the context of a ClaimReview.


@cong-yu cong-yu commented Jan 26, 2018

@danbri If introducing claimUrl and claimUrlOriginal is absolutely a no-go, then I think reusing itemReviewed is a good compromise. Below is an example markup that corresponds to @BillAdairDuke's example. It fits well with the additional types you are creating. It is, however, not a Claim type, and it is not clear to me how to introduce an attribute that denotes the meaning of "original", since that does not make sense for a general review. It is also not necessarily backward compatible, since the current implementation assumes there is just one itemReviewed and that is where the claimant is extracted.

@jkosslyn Is it possible for you to create an example markup for me to understand exactly what you mean? I don't really like the above approach and would like to see what alternative proposal is there.

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/punditfact/statements/2018/jan/05/new-century-times/white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "author":
  {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com"
  },
  "itemReviewed": [
    {
      "@type": "CreativeWork",
      "author":
      {
        "@type": "Person",
        "name": "New Century Times",
        "jobTitle": "website",
        "sameAs": "http://newcenturytimes.com/"
      },
      "datePublished": "2017-12-30"
    },
    {
      "@type": "CreativeWork",
      "url": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/"
    },
    {
      "@type": "CreativeWork",
      "url": "https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"
    }
  ],
  "reviewRating":
  {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}

@jkosslyn jkosslyn commented Jan 27, 2018

@cong-yu sure thing, here's what I mean. Very similar to your proposal, but with only a single itemReviewed of type "Claim," and containing several instances of "appearance" in addition to the "author" field.


{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/punditfact/statements/2018/jan/05/new-century-times/white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "author": {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com" 
    },
  "itemReviewed": {
    "@type": "Claim",
    "text": "An intelligence agency decided to fight back and sent Trump a message by escorting six staffers out of the White House for failing background investigations.",
    "author": {
      "@type": "Person",
      "name": "New Century Times",
      "jobTitle": "website",
      "sameAs": "http://newcenturytimes.com/"
    },
    "datePublished": "2017-12-30",
    "appearance": [
      {
        "@type": "CreativeWork",
        "url": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
        "position": "first"
      },
      {
        "@type": "CreativeWork",
        "url": "https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"
      }
    ]
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}

The underlying logic is that a review can only really be of a single claim (otherwise how can the review come to a single rating?). If that single claim appears in multiple places, we could use the "position" field or somesuch within each appearance to indicate the first appearance vs follow-on appearances.

Follow-on thought: Perhaps we don't even need a new "appearance" field at all, and the existing "workExample" field might suffice, depending on how that field is currently used in practice?
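
As a rough illustration of how a consumer could treat the two spellings interchangeably, the sketch below reads either the suggested "appearance" property or the existing workExample property and puts the entry marked as first at the front. The function name appearances is hypothetical, and using "position": "first" as the marker is only the tentative convention suggested above.

```python
def appearances(claim):
    """List the works a claim appeared in, with the marked first appearance first."""
    works = claim.get("appearance") or claim.get("workExample") or []
    if isinstance(works, dict):  # a single value instead of a list
        works = [works]
    # Stable sort: entries with position == "first" sort before the rest.
    return sorted(works, key=lambda work: work.get("position") != "first")


claim = {
    "@type": "Claim",
    "workExample": [
        {
            "@type": "CreativeWork",
            "url": "https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/",
        },
        {
            "@type": "CreativeWork",
            "url": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
            "position": "first",
        },
    ],
}

ordered = appearances(claim)
print([work["url"] for work in ordered])
```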

Contributor

@chaals chaals commented Jan 27, 2018

Disconnected thoughts:

  • Having a way to describe a claim (actually, that is just a statement, and in that sense a claim review is also a claim) seems useful.
  • As far as I know factchecking typically works at the level of statements. If I make a speech in which I say three things, one true, one false, and one untestable, saying "the speech is not true" is less useful than identifying which things in it are untrue.
  • Many claims have been around for a long time, and get useful identifiers. This seems handy in making use of fact-checking information.

@cong-yu cong-yu commented Jan 27, 2018

@jkosslyn thanks, I revised your code so it parsed correctly and uses workExample. I think we are getting somewhere with the great discussion. See my comment after the code.

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/punditfact/statements/2018/jan/05/new-century-times/white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "author": {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com" 
  },
  "itemReviewed": {
    "@type": ["Claim", "CreativeWork"],
    "text": "An intelligence agency decided to fight back and sent Trump a message by escorting six staffers out of the White House for failing background investigations.",
    "author": {
      "@type": "Person",
      "name": "New Century Times",
      "jobTitle": "website",
      "sameAs": "http://newcenturytimes.com/"
    },
    "datePublished": "2017-12-30",
    "workExample": [
      {
        "@type": "CreativeWork",
        "url": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
        "position": "first"
      },
      {
        "@type": "CreativeWork",
        "url": "https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"
      }
    ]
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}

Given all the proposals, let's take a look at the following scenario.

Say for example, Publisher A originally publishes an article AA spreading a rumor. Politician X notices that article and spreads the same rumor in one of his speeches. Publisher B reports the speech in an article BB, which also spreads the rumor. Fact checker C comes in and fact checks X and writes the fact checking article.

In the less semantically rich first proposal (claimUrl, claimUrlOriginal), the itemReviewed will be about X making the speech, which links to the official speech site, while claimUrl and claimUrlOriginal link to BB and AA, respectively. This works because the itemReviewed, claimUrl, and claimUrlOriginal are equivalent: all are just instances of the claim.

In the more semantically rich current proposal with workExample, the one stored inside itemReviewed gets the elevated status because it is the root element containing all the workExamples. Thus, in this case, should the speech be in the itemReviewed, or the article AA? If the former, the semantics are a bit weird since politician X is not the original claim maker. If the latter, what about cases where the original claim maker cannot be identified?
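
The relationship between the two proposals for this scenario can be sketched in code. The example below puts X's speech in itemReviewed with AA and BB as workExamples, then rewrites it into the flat claimUrlOriginal/claimUrl form. The example.org URLs are placeholders, the function name to_flat_fields is hypothetical, and "position": "first" is the tentative marker from earlier in this thread.

```python
def to_flat_fields(review):
    """Rewrite the workExample form into the flat claimUrlOriginal/claimUrl form."""
    item = review.get("itemReviewed", {})
    works = item.get("workExample", [])
    origin = next((w["url"] for w in works if w.get("position") == "first"), None)
    repeats = [w["url"] for w in works if w.get("position") != "first"]
    flat = {key: value for key, value in review.items() if key != "itemReviewed"}
    flat["itemReviewed"] = {
        key: value for key, value in item.items() if key != "workExample"
    }
    if origin is not None:
        flat["claimUrlOriginal"] = origin
    if repeats:
        flat["claimUrl"] = repeats
    return flat


# Placeholder URLs standing in for article AA, article BB and X's speech.
scenario = {
    "@type": "ClaimReview",
    "claimReviewed": "The rumor",
    "itemReviewed": {
        "@type": ["Claim", "CreativeWork"],
        "author": {"@type": "Person", "name": "Politician X"},
        "url": "http://example.org/speech-by-X",
        "workExample": [
            {"@type": "CreativeWork", "url": "http://example.org/AA", "position": "first"},
            {"@type": "CreativeWork", "url": "http://example.org/BB"},
        ],
    },
}

flat = to_flat_fields(scenario)
print(flat["claimUrlOriginal"])
print(flat["claimUrl"])
```

The URLs survive the conversion unchanged. What the flat form drops is the nesting that makes one instance the root of the others, which is the elevated status in question.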

So this may be at the root of the discussion: if we want to capture Claim as @chaals and many on this thread are aiming for, what fields really identify a Claim? So far, we have pretty good fields to identify instances of a Claim, but not really a Claim in the sense that can unify all the instances. Is it possible that we don't actually have that information for most cases?


@jkosslyn jkosslyn commented Jan 28, 2018

@cong-yu that proposal LGTM, @danbri what do you think?

@cong-yu in terms of the scenario you pose, the claim would be the rumor as asserted by Publisher A in article AA, and as amplified by Politician X and Publisher B. If the amplifications cite the original claim by Publisher A, then they would be workExamples; if the amplifications don't cite AA but rather assert the claim as fact, then they could be considered new examples of the claim with new authorship by Politician X and Publisher B.

In terms of the general principle, I think you're right that some Claims will have more fields and some will have less; some will be unified and some may be disjoint; etc. This seems to be a fundamental, unavoidable fact of the messiness of the real world. If there are enough instances of Claims with overlapping fields, perhaps they could be programmatically merged, at least in some cases.
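
One naive way such a programmatic merge could work is to group Claim descriptions that share at least one URL. This is a sketch only: the function names claim_urls and merge_claims are hypothetical, the example.org URLs are placeholders, and real matching would need to cope with near-duplicate URLs and differently worded claims.

```python
def claim_urls(claim):
    """Collect every URL that identifies a claim or one of its appearances."""
    urls = set()
    for key in ("url", "sameAs"):
        value = claim.get(key)
        if isinstance(value, str):
            urls.add(value)
    for work in claim.get("workExample", []):
        if "url" in work:
            urls.add(work["url"])
    return urls


def merge_claims(claims):
    """Group claims that share at least one URL, directly or transitively."""
    groups = []  # each entry is (set of urls, list of claims); url sets stay disjoint
    for claim in claims:
        urls = claim_urls(claim)
        members = [claim]
        remaining = []
        for group_urls, group_members in groups:
            if urls & group_urls:
                # Overlap: fold the existing group into the new one.
                urls = urls | group_urls
                members = group_members + members
            else:
                remaining.append((group_urls, group_members))
        remaining.append((urls, members))
        groups = remaining
    return [members for _, members in groups]


claims = [
    {"text": "rumor as published by A", "workExample": [{"url": "http://example.org/AA"}]},
    {"text": "rumor as repeated by B", "workExample": [{"url": "http://example.org/AA"},
                                                       {"url": "http://example.org/BB"}]},
    {"text": "an unrelated claim", "workExample": [{"url": "http://example.org/CC"}]},
]

merged = merge_claims(claims)
print([[claim["text"] for claim in group] for group in merged])
```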

Contributor

@danbri danbri commented Jan 28, 2018

This direction is looking good, but let me check something. Good idea to reuse workExample, but is it intended that it is attached to the ClaimReview rather than the Claim (which was where "appearance" was shown in Justin's original)?
This is the most recent example from Cong:
https://search.google.com/structured-data/testing-tool/u/0/?url=https%3A%2F%2Fgist.githubusercontent.com%2Fdanbri%2F07c8581de424942c446c4fa70a76f1ee%2Fraw%2F3aa786d592783d3020ac2a53b917933cc819d50c%2Fgistfile1.txt#url=https%3A%2F%2Fgist.githubusercontent.com%2Fdanbri%2F07c8581de424942c446c4fa70a76f1ee%2Fraw%2F3aa786d592783d3020ac2a53b917933cc819d50c%2Fgistfile1.txt


@cong-yu cong-yu commented Jan 28, 2018

@danbri Good call, that was a typo, fixed.

@jkosslyn a bit confused here, are you suggesting the following: If the instances of the claim are independent (i.e., no citation connection), we should model them as separate itemReviewed; if they form a citation chain, we model them as workExamples where the one everyone cites becomes the itemReviewed? That seems too complicated semantically to maintain and also not backward compatible. Perhaps we can assume within a single ClaimReview, the instances of the claim are always connected and the reporter can simply use their judgement to pick one that gets the itemReviewed's author/datePublished treatment?

Contributor

@danbri danbri commented Jan 28, 2018

@cong-yu @jkosslyn having a single itemReviewed seems a little simpler conceptually to me, and consistent with the way other kinds of reviews are handled. But it can be awkward (as it was with the TV video vs smaller clip example) since you have to somewhat arbitrarily pick one entity as the focal point, even if the others can also be associated via relationships.

I would like to leave open the possibility (beyond the fact check review usecase) of having Claims relate to Claims, eg. well known claims having well known URLs. I'm not sure if there is a need for Claim-to-Claim metadata within typical fact check claim reviews though. We can probably get away with using sameAs URLs for the well known links, to keep the review markup simple.

Here's an example I've had in mind for a while, https://debategraph.org/Stream.aspx?nid=33108&vt=rgraph&dc=focus - debategraph's representation of the (claim that) "the 1953 coup deposed the democratically elected Mosaddeq government". Actually that is a pretty mild claim, the stronger claim is the more specific one that CIA and MI6 were the initiators of the coup. By having an explicit representation of these (increasingly detailed) claims, decoupled from any specific fact check review, we get an interesting foundation for linking this kind of data together.


@jkosslyn jkosslyn commented Jan 29, 2018

@danbri agreed, having a single itemReviewed is conceptually clearer to me as well. I noticed that Clip inherits from CreativeWork; could we allow the itemReviewed to be any subclass of CreativeWork? As long as it has a url field, adding more optional fields wouldn't hurt the core usecase.

@cong-yu If the instances of the claim are independent (i.e., no citation connection), I think we should model them as separate ClaimReviews. I.e. two independent claims with two independent sources and contexts means two independent reviews. After all, in fact checking we've seen so many cases where the context tips the scales of the review.

Of course there may always be corner cases where citations are not explicit but are implied; in those cases editorial judgement would make the call.

Contributor

@danbri danbri commented Jan 29, 2018

@jkosslyn - for a ClaimReview it may make sense to encourage only CreativeWork. There are other kinds of reviews too (cars, hotels, employers, ...), but they at least can share the common structure of having a single focal itemReviewed.

In passing (and wary of distractions), we should also note for the record that there may be some commonalities between Claim(Review) markup and Question/Answer sites. That does not mean we need to converge the two forms of markup, but it is worth bearing in mind that e.g. matters of historical or scientific fact are often debated on sites that are structured in terms of questions and candidate answers. For example: https://history.stackexchange.com/questions/26495/how-secret-was-the-us-and-british-involvement-in-the-1953-iranian-coup-d%C3%A9tat?rq=1 (see data extracted by Google).


@cong-yu cong-yu commented Jan 29, 2018

OK, I think we are getting to some consensus. The remaining question is: given that we are not introducing a new field, do we really need a new Claim type?

I do think there is value in introducing a Claim type on its own, independent of ClaimReview, with all the benefits of linking everything together, but it does sound like it is beyond the scope of this current extension to ClaimReview?

Contributor

@danbri danbri commented Jan 29, 2018

@cong-yu isn't the ability to say what we need without adding more custom properties a feature not a bug?


@cong-yu cong-yu commented Jan 29, 2018

@danbri do you mean we can introduce the Claim type without needing to add a new field?

Contributor

@danbri danbri commented Jan 29, 2018

We could.

Contributor

@rvguha rvguha commented Jan 29, 2018

Contributor

@thadguidry thadguidry commented Jan 29, 2018

Claim: Guha is intelligent. All humans have intelligence. Sometimes :)

+1 for new Claim type.


@cong-yu cong-yu commented Jan 29, 2018

@rvguha @danbri Sounds good. In terms of implementing backward compatibility, can we have the itemReviewed be of both Claim and CreativeWork type?

Contributor

@rvguha rvguha commented Jan 29, 2018


@subbuvincent subbuvincent commented Jan 31, 2018

@cong-yu Thanks. And this is sheepish but it looks like when I posted my comment, my browser was showing a cached version so I had missed several comments going back 4 days! Never mind :). It's great to see more progress since then.


@vinnygreen vinnygreen commented Apr 17, 2018

Hello all! I'm head of operations and development for Snopes.com. I integrated the claim review mark-up into Snopes' WordPress Theme. I spent a lot of time thinking about this markup because it relates to the other projects I am working on like a misinformation triage engine and a commercial API.

The ClaimReview process is built into our fact-check template with a simple UI for our staff. My staff never sees the schema; they fill in fields I labeled in a way they understand.

I've reviewed the thread and wanted to chime in with Snopes' position on the markup:

  • The sameAs field and itemReviewed are confusing, but the simple solution is to allow multiple itemReviewed values for each distinct claim, which is not currently permitted.
  • If a page has many distinct claims, it requires many ClaimReviews. If a ClaimReview has multiple sources, it should require multiple itemReviewed values.
  • With multiple itemReviewed values available, we can have multiple sources of information and use the sameAs field as intended.
  • We could instruct end users to assume the earliest date of the itemReviewed is the earliest source of that claim that the publisher has identified. If they wanted to know what the individual fact checker considered to be the earliest claim URL, they would find the oldest datePublished among the itemReviewed values and then look up the sameAs URL that links to an article.
  • Alternatively, we could add a new URL field called claimUrl in the itemReviewed section for the exact URL where the claim lives which could have its own datePublished as well.
  • My concern is that new or ambiguous fields will be ignored or misinterpreted by end-users creating more misinformation. Not to mention the burden on the journalist curating the data that can be inferred via a query by the end-user. The more data entry work for the journalist the greater the chance for error.
  • Backward compatibility is incredibly important.

I know I jumped in very late, but I hope this is helpful. I can be reached directly at vinny@snopes.com

Snopes Ideal Schema

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2016-06-22",
  "url": "http://snopes.com/fact-check/a-snopes-fact-check/",

  "itemReviewed": [
    {
      "@type": "CreativeWork",
      "author": {
        "@type": "Organization",
        "name": "Bad Actor 1",
        "sameAs": [
          "https://twitter.com/badactor1",
          "https://wikipedia.com/badactor1"
        ]
      },
      "claimUrl": "https://badactor1blog.com/article-with-claim/",
      "datePublished": "2016-06-20"
    },
    {
      "@type": "CreativeWork",
      "author": {
        "@type": "Organization",
        "name": "Bad Actor 2",
        "sameAs": [
          "https://twitter.com/badactor2",
          "https://wikipedia.com/badactor2"
        ]
      },
      "claimUrl": "https://badactor2blog.com/article-with-claim/",
      "datePublished": "2016-06-20"
    },
    {
      "@type": "CreativeWork",
      "author": {
        "@type": "Organization",
        "name": "Bad Actor 2",
        "sameAs": [
          "https://twitter.com/badactor2",
          "https://wikipedia.com/badactor2"
        ]
      },
      "claimUrl": "https://facebook.com/badactor2blog/212131342334536342",
      "datePublished": "2016-06-21"
    }
  ],

  "claimReviewed": "Bad Actors say what?",
  "author": {
    "@type": "Organization",
    "name": "Snopes.com"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "1",
    "bestRating": "5",
    "worstRating": "1",
    "alternateName": "False"
  }
}
Contributor

@danbri danbri commented Jul 11, 2018

I've been looking into the consequences of this for neighbouring pieces of Schema.org, i.e. non fact-checking reviews. I have opened an independent issue on that, since Schema.org needs to have a clear picture of when/whether itemReviewed can be repeated. I believe we can make this work and just need to do the housekeeping to keep related definitions in sync.

I'd like to say

  • "itemReviewed" in general case: most typically points to a single thing
  • but can point to several things, if a review is reviewing several things
  • conventions and common patterns may vary between the different subtypes of Review (UserReview, CriticReview, ClaimReview etc.).
  • Fact checking reviews (reviews of Claims, i.e. ClaimReview) can use repeated itemReviewed to describe various appearances of several claims that it makes sense to evaluate within a common review (i.e. fact-check). We need not be strict here about whether they are multiple appearances of exactly the same claim, or variations on a theme.
  • Other kinds of reviews sometimes have multiple items reviewed in an integrated writeup, this is probably more typical of critic reviews than user reviews, and what we do here for fact checking shouldn't impact those other types, beyond the general clarification that "itemReviewed" is sometimes repeated.
  • We can note that it is possible to use the various properties of the items reviewed (their publication dates, or perhaps mainEntityOfPage markup) to point to the earliest or primary item, if there are several. Note that we are not imposing a single approach to that matter here. Conventions may emerge, and they may be subtype-specific.
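
As a rough illustration of the last point, here is a minimal Python sketch of how a consumer might pick the earliest item from a repeated itemReviewed by comparing publication dates. This is only one possible convention, not something agreed in this thread; the helper names (`as_list`, `earliest_item_reviewed`) and the sample data are hypothetical, and it assumes each datePublished is a plain `YYYY-MM-DD` string.

```python
from datetime import date


def as_list(value):
    """JSON-LD properties may hold one value or a list; always return a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def earliest_item_reviewed(claim_review):
    """Return the itemReviewed entry with the oldest datePublished, or None."""
    dated = [
        item
        for item in as_list(claim_review.get("itemReviewed"))
        if isinstance(item, dict) and item.get("datePublished")
    ]
    # Assumption: dates are plain ISO "YYYY-MM-DD" strings.
    return min(
        dated,
        key=lambda item: date.fromisoformat(item["datePublished"]),
        default=None,
    )


# Hypothetical data, not taken from a real fact-check.
review = {
    "@type": "ClaimReview",
    "itemReviewed": [
        {"@type": "Claim", "url": "https://a.example/post", "datePublished": "2016-06-21"},
        {"@type": "Claim", "url": "https://b.example/post", "datePublished": "2016-06-20"},
    ],
}
earliest = earliest_item_reviewed(review)
print(earliest["url"])
```

Items without a datePublished are simply skipped here; a publisher-supplied marker for the primary item would need a different approach.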

@vinnygreen vinnygreen commented Jul 11, 2018

@danbri Some notes for you:

From my perspective, the item(s) being reviewed should assert the claim being reviewed, with little left to the imagination. Allowing for variation could make the data harder to understand and to use in applications.

If this data is being used in an application, it's easier to convey that the items being reviewed strictly match the claim. Otherwise, we would need the publisher to express the reason or theme of the reviewed items. This might stretch the definition of "claim" and cause more issues.

When considering how multiple itemReviewed values can be deployed across other subtypes, we need to consider whether multiple distinct Reviews would be more appropriate. Put more clearly: is the need for many itemReviewed values solved by having many Reviews on the page?

We build collections of ClaimReviews in our archives. We convey the theme on the page through the title and description and include many ClaimReviews. Google currently uses them in the knowledge panel.

See: https://www.snopes.com/fact-check/category/photos/


@cong-yu cong-yu commented Jul 16, 2018

FYI The following is the example code taking Vinny and Dan's comments into consideration. It leverages the repeatability of itemReviewed, which Dan is hoping to get approved soon. No new type or attribute is introduced (Claim type is already being proposed in a separate issue).

Regarding the granularity of the repeatability:

  1. Same claim, same claimant, different instances => use different workExample within same itemReviewed for repetition

  2. Same claim, different claimant => use itemReviewed for repetition

  3. Different claim => use different ClaimReview

Vinny has a good point about being careful with variants of the claim (whether two claims are really different or simply slight variants of the same one). It is an editorial call, and the idea is that each publisher should think carefully about that and use either 3 or 1/2 as appropriate.

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/.../white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "author": {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com" 
  },
  "itemReviewed": [
  {
    "@type": ["Claim", "CreativeWork"],
    "author": {
      "@type": "Organization",
      "name": "New Century Times",
      "sameAs": "http://newcenturytimes.com/"
    },
    "appearance": [
      {
        "@type": "CreativeWork",
        "url": "http://info.nct.news/...-white-house-high-ranking-officials-thrown-out/",
        "datePublished": "2017-12-30"
      },
      {
        "@type": "CreativeWork",
        "url": "http://info.nct.news/...-white-house-high-ranking-officials-thrown-out-2/",
        "datePublished": "2017-12-31"
      }
    ]
  },
  {
    "@type": ["Claim", "CreativeWork"],
    "author": {
      "@type": "Organization",
      "name": "Vote US",
      "sameAs": "http://www.vote.us.org/"
    },
    "appearance": [
      {
        "@type": "CreativeWork",
        "url": "https://www.vote.us.org/...-raided-the-white-house-6-people-thrown-out/",
        "datePublished": "2017-02-16"
      }
    ]
  }],
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}
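
To make the three granularity rules concrete, here is a small Python sketch that groups raw claim sightings into ClaimReview skeletons shaped like the example above. It is an illustration under stated assumptions, not a recommended implementation: the `group_sightings` helper and the input record fields (`claim`, `claimant`, `url`, `date`) are hypothetical, and exact string equality on the claim text stands in for the editorial judgement about whether two claims are the same.

```python
def group_sightings(sightings):
    """Group claim sightings into ClaimReview skeletons.

    Rule 3: each distinct claim gets its own ClaimReview.
    Rule 2: each distinct claimant gets its own itemReviewed entry.
    Rule 1: repeat instances by one claimant share that entry's appearance list.
    """
    by_claim = {}  # claim text -> {claimant name -> [appearance, ...]}
    for s in sightings:
        claimants = by_claim.setdefault(s["claim"], {})
        claimants.setdefault(s["claimant"], []).append(
            {"@type": "CreativeWork", "url": s["url"], "datePublished": s["date"]}
        )
    return [
        {
            "@type": "ClaimReview",
            "claimReviewed": claim,
            "itemReviewed": [
                {
                    "@type": "Claim",
                    "author": {"name": claimant},
                    "appearance": appearances,
                }
                for claimant, appearances in claimants.items()
            ],
        }
        for claim, claimants in by_claim.items()
    ]


# Hypothetical sightings, not real fact-check data.
sightings = [
    {"claim": "X happened", "claimant": "Site A", "url": "https://a.example/1", "date": "2017-12-30"},
    {"claim": "X happened", "claimant": "Site A", "url": "https://a.example/2", "date": "2017-12-31"},
    {"claim": "X happened", "claimant": "Site B", "url": "https://b.example/1", "date": "2018-01-02"},
    {"claim": "Y happened", "claimant": "Site A", "url": "https://a.example/3", "date": "2018-01-03"},
]
reviews = group_sightings(sightings)
print(len(reviews), len(reviews[0]["itemReviewed"]))
```

The sketch leaves out the rating, reviewer and review URL, which a fact-checker would still fill in per ClaimReview.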
Contributor

@danbri danbri commented Jul 16, 2018

Thanks @cong-yu - we already added Claim (with the usual 'pending' status for new things) in this last release.

Wouldn't we rather use the "appearance" (and even "firstAppearance") properties to link from the Claim to the CreativeWork? (rather than workExample)

There's also no strong need to multi-type ("@type": ["Claim", "CreativeWork"]); Claim is defined in the schemas as a CreativeWork, so the extra info doesn't add much except page weight.

Author

@BillAdairDuke BillAdairDuke commented Jul 16, 2018

Contributor

@danbri danbri commented Jul 16, 2018


@cong-yu cong-yu commented Jul 16, 2018

@danbri yep, appearance should work as well, I will update my comment

@BillAdairDuke Yeah, I would leave it to the fact checkers to determine whether two claims are similar enough to each other that a single ClaimReview is warranted ... @danbri this is hard to define concretely since it is somewhat subjective, and I would not try to clarify this too much ...


@cong-yu cong-yu commented Jul 16, 2018

@danbri FYI using just Claim still gives an error in SDTT so I will keep both for now.

Contributor

@danbri danbri commented Jul 16, 2018

@cong-yu - then let's update the Google tool!

And yes, I agree re hard to define. Maybe something soft like:

"several variations on the same claim (e.g. different phrasings but in the same social/political context) can be handled within a single ClaimReview via repetition of itemReviewed, but the idea of a ClaimReview is that more or less a single idea is being examined."

Contributor

@danbri danbri commented Jul 26, 2018

@BillAdairDuke @vinnygreen can you help us with some guiding examples from Snopes, Politifact or elsewhere that exercise some of the variations we've discussed above?

For example, @cong-yu 's list,

Same claim, same claimant, different instances => use different workExample within same itemReviewed for repetition
Same claim, different claimant => use itemReviewed for repetition
Different claim => use different ClaimReview

If you have examples showing these kinds of variation on real fact check sites that would be really helpful in fixing up our canonical guidelines in a way that is technically sound and aligned with fact-checking practices. In particular also any examples with multiple languages (of the claim being translated or the whole factcheck) would help.

Dan
Author

@BillAdairDuke BillAdairDuke commented Jul 26, 2018

@danbri danbri changed the title from "ClaimReview needs field for URL of claims" to "ClaimReview needs field for URL of claims / integration of new Claim type" on Jul 27, 2018
Contributor

@danbri danbri commented Mar 15, 2019

Here's some text towards documenting how all these constructs can fit together...

Key Concepts

ClaimReview-based factcheck markup defines a structure that corresponds to the kind of information included in many fact-checking pages.

Contributor

@rvguha rvguha commented Mar 15, 2019

Contributor

@rvguha rvguha commented Mar 15, 2019

Contributor

@danbri danbri commented Mar 15, 2019

@rvguha - Simon and I wrote this together. /cc @cong-yu

We were originally thinking of it for Google's own documentation but it felt like a better fit for Schema.org. @tmarshbing - can you take a look from Bing's perspective too?


@sens3 sens3 commented Mar 15, 2019

The key concepts you outline are semantically correct and make sense to me.
My only concern is that all this flexibility makes things confusing.

Can we run this by one or two fact checkers to see whether this makes sense to them?

Contributor

@danbri danbri commented Mar 15, 2019

Thanks @sens3, that's why we're here. I've pinged a few folks by email and Twitter who might not be monitoring Github closely, too...

Some of the flexibility we can nudge people away from for now (e.g. having claimReviewed be anything other than a string is not needed until there's a strong demand for multi-lingualism and clarity about translations). But a lot of it comes with the territory, as soon as the question of multiple appearances of the "same" claim - made by different parties, reported in different places - is considered.

I would love to tighten things down to a few core patterns and have examples of each, so long as we're consistent with the direction that the fact-checking community is headed.


@vinnygreen vinnygreen commented Mar 15, 2019

@danbri Can you build out an empty example schema with all the bells and whistles? I can hunt down some real examples that mirror the possible complexity and fill in the schema.

Contributor

@danbri danbri commented Mar 15, 2019

Hi @vinnygreen - thanks. Here's a fairly simple example @sens3 and I have been talking through. Essentially the evolution is from having a CreativeWork (e.g. some news article) be the value of "itemReviewed", to having a Claim there. Then we can point from that Claim to its appearances in the press etc., using "appearance". This helps avoid ambiguity when we want to distinguish authors of articles and claims.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2016-06-22",
  "url": "http://example.com/news/science/worldisflat.html",
  "claimReviewed": "The world is flat",
  "itemReviewed": {
    "@type": "Claim",
    "author": {
      "@type": "Organization",
      "name": "Square World Society",
      "sameAs": "https://example.flatworlders.com/we-know-that-the-world-is-flat"
    },
    "datePublished": "2016-06-20",
    "appearance": {
      "@type": "NewsArticle",
      "url": "http://skeptical.example.net/news/a122121",
      "name": "Square Earth - Flat earthers for the Internet age",
      "datePublished": "2016-06-22",
      "author": {
        "@type": "Person",
        "name": "T. Tellar"
      }
    }
  },
  "author": {
    "@type": "Organization",
    "name": "Example.com science watch"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "1",
    "bestRating": "5",
    "worstRating": "1",
    "alternateName": "False"
  }
}
</script>


@cguess cguess commented Mar 22, 2019

I’ll jump in and agree that ambiguity will lead to paralysis for anyone not used to relational databases and systems. Having any data nested more than one level, two at the extreme, is, from my perspective, untenable. From the journalism perspective every field should, ideally, have one and only one right way to be filled out. It’s easier for me to explain how to shoehorn something in than it is to remember options.


@Fleker Fleker commented Jan 16, 2020

It seems, from the specification, that a Claim is designed to serve as a child of a ClaimReview. While this makes sense for fact-checking, I wonder if this could be applied in reverse. A news site may be interested in verifying their trustworthiness through the IEEE P7011 standard. As such, they may want to mark up claims they make with links to a fact-checking service, either 3P (Snopes) or 1P (if the news service has their own fact checking).

So there would be an interest in modifying the order and adding a new review or some other link to a fact-checker. What do you think?

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Claim",
  "datePublished": "2016-06-22",
  "url": "https://trustworthy.news/science/world-is-flat",
  "abstract": "The world is flat",
  "review": {
    "@type": "ClaimReview",
    "datePublished": "2016-06-28",
    "url": "https://facts.news/science/world-is-flat",
    "author": {
      "@type": "Organization",
      "name": "News Facts Fact Checker"
    },
    "reviewRating": {
      "@type": "Rating",
      "ratingValue": "1",
      "bestRating": "5",
      "worstRating": "1",
      "alternateName": "False"
    }
  }
}
</script>

This may alternatively be scattered throughout a news piece with a handful of reviews, as an extension of a link that can point to a fact checker. Then the fact checker can provide a ClaimReview element that returns to the current news provider to provide verification or denial of that fact.

As we all know,
<a href="/previous-story-on-news-service" itemscope itemtype="http://schema.org/Claim">
    <span itemprop="review" itemscope itemtype="http://schema.org/ClaimReview">
       <meta itemprop="url" content="https://facts.news/science/world-is-flat" />
    </span>
    the world is flat
</a>.

@github-actions github-actions bot commented Jul 18, 2020

This issue is being tagged as Stale due to inactivity.
