ClaimReview needs field for URL of claims / integration of new Claim type #1828

Open
BillAdairDuke opened this Issue Jan 22, 2018 · 56 comments

10 participants
@BillAdairDuke

BillAdairDuke commented Jan 22, 2018

After discussing the use of ClaimReview with fact-checkers, we found confusion about how to populate the sameAs field. Some fact-checkers have been using it for the URL of speeches or articles where the claim appeared, to provide evidence and details of the claim; others have been using it for the home page or knowledge base article of the person or entity that made the statement.

Both have value for search engines and other uses. It’s valuable to be able to see a knowledge base article on the person or entity being checked; it’s also valuable to be able to see where a false claim was published, particularly the article or speech where the claim originated.

The solution: We propose that:
sameAs will continue to be used for the entity or speaker that made the claim (we called it the “who to blame” field). This field would typically be the home page of the person making the claim, or their Wikipedia page.

We will add a field under ClaimReview called claimUrlOriginal, of type URL, to designate the URL of the origin of a claim, such as a speech or article; fact-checkers could include additional URLs (claimUrl, also of type URL) if the claim has been repeated in other publications.

For example, here is a fact-check by PolitiFact of a claim by New Century Times. The sameAs field is the website’s URL. The claimUrlOriginal field would be the article that contained the claim that was fact-checked. That claim also appeared on Vote.us.org, which is listed as a claimUrl.

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/punditfact/statements/2018/jan/05/new-century-times/white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "claimUrlOriginal": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
  "claimUrl": ["https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"],
  "author":
  {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com"
  },
  "itemReviewed":
  {
    "@type": "CreativeWork",
    "author":
    {
      "@type": "Person",
      "name": "New Century Times",
      "jobTitle": "website",
      "sameAs": "http://newcenturytimes.com/"
    },
    "datePublished": "2017-12-30"
  },
  "reviewRating":
  {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}
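
A minimal Python sketch of how a consumer might read the two proposed fields from markup like the example above. It assumes claimUrlOriginal holds a single URL and claimUrl holds one URL or a list of URLs; both are names from this proposal, not existing schema.org terms, and the helper name claim_urls is made up for illustration.

```python
import json


def claim_urls(review):
    """Return (origin, repeats) from a ClaimReview dict.

    Assumes the proposed (not standardized) properties claimUrlOriginal
    (a single URL) and claimUrl (one URL or a list of URLs).
    """
    origin = review.get("claimUrlOriginal")
    repeats = review.get("claimUrl", [])
    # JSON-LD allows a single value where a list is expected, so normalize.
    if isinstance(repeats, str):
        repeats = [repeats]
    return origin, list(repeats)


# Cut-down copy of the example markup, keeping only the relevant fields.
markup = json.loads("""
{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "claimUrlOriginal": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
  "claimUrl": ["https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"]
}
""")

origin, repeats = claim_urls(markup)
print("origin:", origin)
print("repeated at:", repeats)
```

Normalizing claimUrl to a list keeps the consumer simple whether a fact-checker lists one repeat or several.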

Edited 2018-01-23 by @danbri to highlight key terms, omit HTML script tag, improve formatting.

@danbri

Contributor

danbri commented Jan 23, 2018

Thanks @BillAdairDuke, good to get this discussion moving. I agree there's an issue here and it is important to find a pragmatic solution.

One quick clarification about sameAs: it potentially has several roles, in that it can be used on each typed entity, whether we're talking about the (Claim)Review, the Person, or a CreativeWork like NewsArticle. Conceivably there could be several sameAs properties in a single description.

Regarding "claimUrlOriginal", does this always indicate the URL of the itemReviewed? If so we could potentially make it an "url" property of that entity.

Some of the same issues might also be addressed by introducing a new type that more explicitly represents the actual claim under discussion. So if the claim is that "The moon is made of green cheese", right now Schema.org only represents that fairly indirectly; we talk about a review of that claim and then kind of casually mention the claim in passing via the claimReviewed property. In other words our current model is very much oriented around a bottom up way of dealing with claims as they occur, rather than attempting to normalize them into abstract claims that are unrelated to their occurrence in real life. My impression is that by introducing claimUrl we would be getting into very similar territory. Are there factchecking-oriented sites that try to identify claims quite separate from their specific occurrences?

@BillAdairDuke

BillAdairDuke commented Jan 23, 2018

Thanks for the feedback, Dan. Answers below, with a request for clarification.

Regarding "claimUrlOriginal", does this always indicate the URL of the itemReviewed? If so we could potentially make it an "url" property of that entity.

Yes, “claimUrlOriginal” would always indicate the URL of the statement that is reviewed. In some cases that would be the URL of a text or speech. It could also be a news article that contains the claim in a quote of the person.

Some of the same issues might also be addressed by introducing a new type that more explicitly represents the actual claim under discussion. So if the claim is that "The moon is made of green cheese", right now Schema.org only represents that fairly indirectly; we talk about a review of that claim and then kind of casually mention the claim in passing via the claimReviewed property. In other words our current model is very much oriented around a bottom up way of dealing with claims as they occur, rather than attempting to normalize them into abstract claims that are unrelated to their occurrence in real life. My impression is that by introducing claimUrl we would be getting into very similar territory. Are there factchecking-oriented sites that try to identify claims quite separate from their specific occurrences?

I’m not clear what you mean in that the schema represents the claim “fairly indirectly.” In practice, it seems to me that it’s clear that “itemReviewed” is what is being fact-checked. I don't think there's been any confusion about that by users.

I’m also not sure what you mean by “rather than attempting to normalize them into abstract claims that are unrelated to their occurrence in real life.”

Are there factchecking-oriented sites that try to identify claims quite separate from their specific occurrences?

Also not sure what you mean here. If you mean, "Do fact-checkers distinguish repeated claims?" - Practices vary. Some publishers, such as the Washington Post, will put multiple similar statements by different people at the top of an article, although not in any structured way. Others, such as PolitiFact and Snopes, will just mention in the text that the claim or a similar one has been repeated by others. This is particularly clear with talking points that get repeated by many politicians and publications.

@jkosslyn

jkosslyn commented Jan 23, 2018

Dan -- basically, we're trying to make a distinction between the entity responsible for a claim and places where the claim has appeared. Currently we have the former but not the latter.

@cong-yu

cong-yu commented Jan 23, 2018

Hi All,

Here is my take on this.

  • A claim can be a tweet, a person's quote in a rally, an article, etc.
  • Each claim can be repeated in many places and may also have an origin, where the claim originally appeared.
  • The fact checker examines an instance (or spreader or cheerleader) of the claim, which is modeled inside itemReviewed. Note that the instance being examined is often not the original instance of the claim, but rather the most notable person who's making the claim.

Assuming this is our common understanding, we have two options to model as many instances of the claim as possible:

  1. Use itemReviewed to model all the instances, i.e., one itemReviewed for each spreader of the claim. This is a much bigger burden on the reporter since itemReviewed is very heavy: we need author, publication date, etc., it seems to be a non-starter.

  2. Use a simpler structure, i.e., the proposed claimUrl and claimUrlOriginal, to capture the spreaders of the claim. Assuming there is just one origin, claimUrlOriginal is singleton and claimUrl is vector. The semantics here are very different from the url attribute directly under ClaimReview since these are not URLs of the fact check.

(We do note that claimUrl is not used to capture merely reporting pieces. For example, if XYZ said 'earth is flat' and publisher A writes an article saying 'XYZ says earth is flat,' A's article shouldn't be modeled as claimUrl.)

At a high level, claimReviewed, claimUrl, and claimUrlOriginal are all attributes of ClaimReview, so they seem pretty well organized as a unit.

Dan, what do you think? We want to get the definition into the pending schema as soon as possible, so the journalists can start using it.

@danbri

Contributor

danbri commented Jan 24, 2018

Thanks, that all helps me understand where you're coming from.

By indirectly, what I mean is that we have made the review of the claim the focus, and there is no type in schema.org that actually represents the claim itself. We have types for the factcheck reviews, the people, documents, etc but we haven't yet introduced an item (that can have relationships to and from other items) representing the Claim. Maybe we don't need to.

This is a much bigger burden on the reporter since itemReviewed is very heavy: we need author, publication date, etc., it seems to be a non-starter.

I'd put it slightly differently. None of these fields are mandatory in schema.org, sites can say as much or as little as they like. But by handling the itemReviewed as the occurrence of the claim, not the idea itself, yes it does potentially come with more baggage. Which was why I'm (tentatively) floating the possibility that we might want to have a type like "Claim" that could just represent the idea being checked, not the article or tv show or whatever where it showed up. Let's not talk about this for ages but I did want a little discussion on this point before we move to get the properties into Pending. How about we aim at having new terms in Pending by the end of next week, unless we cook up a better idea here in the meantime?

@cong-yu

cong-yu commented Jan 24, 2018

Sounds good, Dan, let's wait till end of next week in case someone comes up with a good idea.

Agreed, ClaimReview is review focused. I am a bit hesitant though to create a Claim type since it might be very confusing to people, with itemReviewed also representing (an instance of) the claim. I think the rationale for keeping claimReviewed, claimUrl, and claimUrlOriginal directly under ClaimReview is that they are all identified/extracted by the fact checker, not the claim maker, so they belong more to the ClaimReview.

@danbri

Contributor

danbri commented Jan 24, 2018

That makes sense, @cong-yu

@rvguha @vholland - any thoughts?

@vholland

Contributor

vholland commented Jan 24, 2018

We are adding more and more properties about the claim, including the URLs. It would be better to create a proper class for Claims and put the properties on the class rather than shoehorning into ClaimReview.

@rvguha

Contributor

rvguha commented Jan 24, 2018

@cong-yu

cong-yu commented Jan 24, 2018

The itemReviewed within ClaimReview is in fact the claim, or more accurately, an instance of the claim. If we introduce a new Claim type, are we going to cause more confusion among the journalists in terms of the relationship between the two? Or are we talking about replacing itemReviewed with claimReviewed? (I am worried about backwards compatibility with the existing markups here ...)

@danbri

Contributor

danbri commented Jan 25, 2018

@cong-yu my recommendation would be for reviews to stick with itemReviewed, regardless of what exactly the review is of. We already are getting reasonable questions about how to handle things like TV clips / videos, where there are two "candidates" for the item reviewed: the specific clip versus the larger material (e.g. 20 seconds vs 1 hour). Naturally it is good to have both documented, and for there to be a well known way of associating them with each other (and directly or indirectly, with the review). The TV discussion (instigated by the Archive.org folk, who I believe are also in touch with @BillAdairDuke) is in #1686

For the TV clip case, here is the draft approach I've sketched: https://gist.github.com/danbri/96d6265756577e5f21ad4141f053b76d ... Here is a cut down subset to show the structure:

{
    "@context": "http://schema.org",
    "@type": "ClaimReview",
    "datePublished": "2017-07-06",
    "url": "http://danbri.org/2017/TODO",
    "claimReviewed": "In the middle of the Cold War, the United States played a role in the overthrow of a democratically-elected Iranian government.",
    "reviewBody": "This claim is true. The UK also played a role.",
    "itemReviewed": {
        "@type": "Clip",
        "startTime": "350",
        "endTime": "370",
        "duration": "20",
        "isPartOf": {
            "@type": "VideoObject",
            "url": "https://www.youtube.com/watch?v=B_889oBKkNU",
            "name": "Clip from President Obama Speech to Muslim World in Cairo",
            "transcript": "http://www.nytimes.com/2009/06/04/us/politics/04obama.text.html"
        }
    }
  }

In this case, we're saying "there's a ClaimReview, and its itemReviewed is a Clip, which isPartOf a VideoObject". If we explore the push from Vicki and Guha towards adding a Claim type (and which I agree will be very useful eventually), then we have some similar choices to make. Two analogous structures would be as follows. One has a review-claim-work structure, the other is review-work-claim.

Variant 1: "the review's itemReviewed is a Claim, which has a claimedIn of the materials carrying the claim occurrence." (review-claim-work)

{
    "@context": "http://schema.org",
    "@type": "ClaimReview",
    "datePublished": "2017-07-06",
    "url": "http://danbri.org/2018/TODO",
    "claimReviewed": "The moon is composed largely of green cheese (maybe with some patches of Camembert).",
    "reviewBody": "This claim is false, people have been there to check.",
    "itemReviewed": {
        "@type": "Claim",
        "name": "The moon is made of green cheese.",
        "url": "https://en.wikipedia.org/wiki/The_Moon_is_made_of_green_cheese",
        "claimedIn": {
            "@type": "VideoObject",
            "url": "https://www.youtube.com/watch?v=B_889oBKkNU",
            "name": "LUNA LACTOSE NEWS: 10 Green cheese facts the MSM don't want you to know"            
        }
    }
  }

This is however a bit awkward, since the more canonicalized/abstracted claim is sandwiched between the two items (the review and the VideoObject or Article etc) which describe its occurrence. This is fine for a single fact check but gets messy if you merge descriptions of multiple occurrences since there is then no association from review to the specific article being reviewed.

Variant 2: "the review's itemReviewed is a CreativeWork that carries an occurrence of a Claim, indicated via embodiesClaim." (review-work-claim)

{
    "@context": "http://schema.org",
    "@type": "ClaimReview",
    "datePublished": "2017-07-06",
    "url": "http://danbri.org/2018/TODO",
    "claimReviewed": "The moon is composed largely of green cheese, with some patches of Camembert.",
    "reviewBody": "This claim is false, people have been there to check.",
    "itemReviewed": {
            "@type": "VideoObject",
            "url": "https://www.youtube.com/watch?v=B_889oBKkNU",
            "name": "LUNA LACTOSE NEWS: 10 Green cheese facts the MSM don't want you to know",
            "embodiesClaim":  {
              "@type": "Claim",
              "url": "https://en.wikipedia.org/wiki/The_Moon_is_made_of_green_cheese",
              "name": "The moon is made of green cheese."
            }
    }
  }

If we continue down this route we will need to consider whether the Claim here is the exact claim represented in some NewsArticle/VideoObject etc or represents the more general idea that is likely surfacing in many places. For now I've assumed it is somewhat abstracted, but that urls might be available for very well known claims - either at Wikipedia, or aggregations from the fact checking world, initiatives like DebateGraph etc.

However, if each occurrence of the Claim type really just represents a specific occurrence of a claim in a particular work, then variant one above could also be viable.

I haven't tried to mix together the TV Clip situation with the "model Claim explicitly" approach yet, but wanted to mention it here as there are similar structural issues, namely that we have to figure out which things get modeled with itemReviewed and which with other properties.

We ought also to consider whether one ClaimReview can have multiple claims/items being reviewed, or it is better just to repeat the entire structure for every (occurrence of a) claim.
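
To make the difference between the two variants concrete, here is a rough Python sketch of a consumer that recovers the same (claim, work) pair from either shape. The property names claimedIn and embodiesClaim are only the draft names floated in this comment, the "name" key on the Claim is an assumption, and find_claim_and_work is a hypothetical helper.

```python
def find_claim_and_work(review):
    """Return (claim, work) dicts from a ClaimReview dict.

    Variant 1 (review-claim-work): itemReviewed is a Claim with claimedIn.
    Variant 2 (review-work-claim): itemReviewed is a work with embodiesClaim.
    Either element is None when the markup does not provide it.
    claimedIn and embodiesClaim are draft names, not schema.org terms.
    """
    item = review.get("itemReviewed")
    if not isinstance(item, dict):
        return None, None
    if item.get("@type") == "Claim":
        return item, item.get("claimedIn")
    return item.get("embodiesClaim"), item


video = {
    "@type": "VideoObject",
    "url": "https://www.youtube.com/watch?v=B_889oBKkNU",
}
claim = {"@type": "Claim", "name": "The moon is made of green cheese."}

# Variant 1: the Claim is the item reviewed and points at the work.
variant1 = {"@type": "ClaimReview", "itemReviewed": dict(claim, claimedIn=video)}
# Variant 2: the work is the item reviewed and points at the Claim.
variant2 = {"@type": "ClaimReview", "itemReviewed": dict(video, embodiesClaim=claim)}

claim1, work1 = find_claim_and_work(variant1)
claim2, work2 = find_claim_and_work(variant2)
print(claim1["name"] == claim2["name"], work1["url"] == work2["url"])
```

Both shapes carry the same information for a single fact check; they differ in which entity sits directly under itemReviewed, which matters once descriptions of several occurrences are merged.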

@jkosslyn

jkosslyn commented Jan 26, 2018

Thanks Dan. I just chatted with Guha and now better understand the desire for a "Claim" object that would be viable on its own, separate from the enclosing ClaimReview. However, to Cong's point, it will impose substantial hardship on overstretched fact check techies if we break the existing schema. Here's a thought, building on your proposed variant 1: what if Claim extended CreativeWork, kept the existing "author" semantics that we're using live, and added a field called something like "appearance" that could be an organization, person, VideoObject, etc. Thus, fact checkers who don't want to use the new "appearance" field could keep using their existing markups with no breakage, but going forward we have a richer concept of a Claim for those who want to express additional characteristics or use them outside the context of a ClaimReview.

@cong-yu

cong-yu commented Jan 26, 2018

@danbri If introducing claimUrl and claimUrlOriginal is absolutely a no-go, then I think reusing itemReviewed is a good compromise. Below is an example markup that corresponds to @BillAdairDuke's example. It fits well with the additional types you are creating. It is, however, not a Claim type, and it is not clear to me how to introduce an attribute that denotes the meaning of "original" since it does not make sense for a general review. It is also not necessarily backward compatible since the current implementation assumes there is just one itemReviewed and that's where the claimant is extracted.

@jkosslyn Is it possible for you to create an example markup for me to understand exactly what you mean? I don't really like the above approach and would like to see what alternative proposal is there.

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/punditfact/statements/2018/jan/05/new-century-times/white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "author":
  {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com"
  },
  "itemReviewed": [
    {
      "@type": "CreativeWork",
      "author":
      {
        "@type": "Person",
        "name": "New Century Times",
        "jobTitle": "website",
        "sameAs": "http://newcenturytimes.com/"
      },
      "datePublished": "2017-12-30"
    },
    {
      "@type": "CreativeWork",
      "url": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/"
    },
    {
      "@type": "CreativeWork",
      "url": "https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"
    }
  ],
  "reviewRating":
  {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}
@jkosslyn

jkosslyn commented Jan 27, 2018

@cong-yu sure thing, here's what I mean. Very similar to your proposal, but with only a single itemReviewed of type "Claim," and containing several instances of "appearance" in addition to the "author" field.


{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/punditfact/statements/2018/jan/05/new-century-times/white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "author": {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com" 
    },
  "itemReviewed": {
      "@type": "Claim",
      "text": "An intelligence agency decided to fight back and sent Trump a message by escorting six staffers out of the White House for failing background investigations.",
      "author": {
        "@type": "Person",
        "name": "New Century Times",
        "jobTitle": "website",
        "sameAs": "http://newcenturytimes.com/" },
      "datePublished": "2017-12-30",
      "appearance": [
        {
          "@type": "CreativeWork",
          "url": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
          "position": "first"
        },
        {
          "@type": "CreativeWork",
          "url": "https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"
        }
      ]
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}

The underlying logic is that a review can only really be of a single claim (otherwise how can the review come to a single rating?). If that single claim appears in multiple places, we could use the "position" field or somesuch within each appearance to indicate the first appearance vs follow-on appearances.

Follow-on thought: Perhaps we don't even need a new "appearance" field at all, and the existing "workExample" field might suffice, depending on how that field is currently used in practice?

@chaals

Contributor

chaals commented Jan 27, 2018

Disconnected thoughts:

  • Having a way to describe a claim (actually, that is just a statement, and in that sense a claim review is also a claim) seems useful.
  • As far as I know factchecking typically works at the level of statements. If I make a speech in which I say three things, one true, one false, and one untestable, saying "the speech is not true" is less useful than identifying which things in it are untrue.
  • Many claims have been around for a long time, and get useful identifiers. This seems handy in making use of fact-checking information.
@cong-yu

cong-yu commented Jan 27, 2018

@jkosslyn thanks, I revised your code so it parsed correctly and uses workExample. I think we are getting somewhere with the great discussion. See my comment after the code.

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/punditfact/statements/2018/jan/05/new-century-times/white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "author": {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com" 
  },
  "itemReviewed": {
    "@type": ["Claim", "CreativeWork"],
    "text": "An intelligence agency decided to fight back and sent Trump a message by escorting six staffers out of the White House for failing background investigations.",
    "author": {
      "@type": "Person",
      "name": "New Century Times",
      "jobTitle": "website",
      "sameAs": "http://newcenturytimes.com/"
    },
    "datePublished": "2017-12-30",
    "workExample": [
      {
        "@type": "CreativeWork",
        "url": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
        "position": "first"
      },
      {
        "@type": "CreativeWork",
        "url": "https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"
      }
    ]
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}

Given all the proposals, let's take a look at the following scenario.

Say for example, Publisher A originally publishes an article AA spreading a rumor. Politician X notices that article and spreads the same rumor in one of his speeches. Publisher B reports the speech in an article BB, which also spreads the rumor. Fact checker C comes in and fact checks X and writes the fact checking article.

In the less semantically rich first proposal (claimUrl, claimUrlOriginal), the itemReviewed will be about X making the speech, which links to the official speech site, while claimUrl and claimUrlOriginal link to BB and AA, respectively. This works because the itemReviewed, claimUrl, and claimUrlOriginal are equivalent: all are just instances of the claim.

In the more semantically rich current proposal with workExample, the one stored inside itemReviewed gets the elevated status because it is the root element containing all the workExamples. Thus, in this case, should the speech be in the itemReviewed or the article AA? If the former, the semantics is a bit weird since Politician X is not the original claim maker. If the latter, what about cases where the original claim maker cannot be identified?

So this may be at the root of the discussion: if we want to capture Claim as @chaals and many on this thread are aiming for, what fields really identify a Claim? So far, we have pretty good fields to identify instances of a Claim, but not really a Claim in the sense that can unify all the instances. Is it possible that we don't actually have that information for most cases?
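
As a rough illustration of the workExample structure discussed above, the Python sketch below pulls the first appearance and the follow-on appearances out of a single itemReviewed. It assumes itemReviewed is a single object and that the original appearance is tagged "position": "first"; that tagging is only a suggestion from this thread, and the helper name appearances is hypothetical.

```python
def appearances(review):
    """Return (first_url, other_urls) from itemReviewed.workExample.

    Assumes itemReviewed is a single dict and that the original appearance
    carries "position": "first"; this convention is a thread suggestion,
    not an established schema.org practice.
    """
    item = review.get("itemReviewed") or {}
    examples = item.get("workExample", [])
    if isinstance(examples, dict):  # a single object instead of a list
        examples = [examples]
    first, others = None, []
    for example in examples:
        url = example.get("url")
        if url is None:
            continue  # nothing to link to
        if example.get("position") == "first" and first is None:
            first = url
        else:
            others.append(url)
    return first, others


review = {
    "@type": "ClaimReview",
    "itemReviewed": {
        "@type": ["Claim", "CreativeWork"],
        "workExample": [
            {
                "@type": "CreativeWork",
                "url": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
                "position": "first",
            },
            {
                "@type": "CreativeWork",
                "url": "https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/",
            },
        ],
    },
}

first_url, other_urls = appearances(review)
print(first_url)
print(other_urls)
```

When no entry is tagged as first, the sketch returns None for the origin, which matches the case raised above where the original claim maker cannot be identified.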

@jkosslyn

jkosslyn commented Jan 28, 2018

@cong-yu that proposal LGTM, @danbri what do you think?

@cong-yu in terms of the scenario you pose, the claim would be the rumor as asserted by Publisher A in article AA, and as amplified by Politician X and Publisher B. If the amplifications cite the original claim by Publisher A, then they would be workExamples; if the amplifications don't cite AA but rather assert the claim as fact, then they could be considered new examples of the claim with new authorship by Politician X and Publisher B.

In terms of the general principle, I think you're right that some Claims will have more fields and some will have less; some will be unified and some may be disjoint; etc. This seems to be a fundamental, unavoidable fact of the messiness of the real world. If there are enough instances of Claims with overlapping fields, perhaps they could be programmatically merged, at least in some cases.
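
One way to picture that kind of programmatic merging is the Python sketch below. It groups Claim objects whose text matches after trivial normalization and pools their workExample URLs. Exact text matching is a deliberate simplification chosen for illustration (real claims rarely repeat word for word), and merge_claims is a hypothetical helper, not part of any proposal here.

```python
from collections import defaultdict


def merge_claims(claims):
    """Merge Claim dicts that share the same normalized text.

    Illustrative only: matching on lowercased, whitespace-collapsed text is
    a stand-in for whatever matching a real consumer would use.
    """
    groups = defaultdict(list)
    for claim in claims:
        # Claims without text get a unique key so they are never merged.
        key = " ".join(claim.get("text", "").lower().split()) or id(claim)
        groups[key].append(claim)

    merged = []
    for members in groups.values():
        urls = []
        for member in members:
            for example in member.get("workExample", []):
                url = example.get("url")
                if url and url not in urls:
                    urls.append(url)
        merged.append({
            "@type": "Claim",
            "text": members[0].get("text", ""),
            "workExample": [{"@type": "CreativeWork", "url": u} for u in urls],
        })
    return merged


claims = [
    {"@type": "Claim", "text": "The moon is made of green cheese.",
     "workExample": [{"url": "http://example.org/a"}]},
    {"@type": "Claim", "text": "the moon  is made of GREEN cheese.",
     "workExample": [{"url": "http://example.org/b"}, {"url": "http://example.org/a"}]},
    {"@type": "Claim", "text": "Earth is flat."},
]

merged = merge_claims(claims)
print(len(merged), [w["url"] for w in merged[0]["workExample"]])
```

The example.org URLs are placeholders. A real consumer would probably also compare sameAs links or authorship before treating two Claims as the same.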

@danbri

Contributor

danbri commented Jan 28, 2018

This direction is looking good, but let me check something. Good idea to reuse workExample, but is it intended that it is attached to the ClaimReview rather than the Claim (which was where "appearance" was shown in Justin's original)?
This is the most recent example from Cong:
https://search.google.com/structured-data/testing-tool/u/0/?url=https%3A%2F%2Fgist.githubusercontent.com%2Fdanbri%2F07c8581de424942c446c4fa70a76f1ee%2Fraw%2F3aa786d592783d3020ac2a53b917933cc819d50c%2Fgistfile1.txt#url=https%3A%2F%2Fgist.githubusercontent.com%2Fdanbri%2F07c8581de424942c446c4fa70a76f1ee%2Fraw%2F3aa786d592783d3020ac2a53b917933cc819d50c%2Fgistfile1.txt

@cong-yu

cong-yu commented Jan 28, 2018

@danbri Good call, that was a typo, fixed.

@jkosslyn a bit confused here, are you suggesting the following: If the instances of the claim are independent (i.e., no citation connection), we should model them as separate itemReviewed; if they form a citation chain, we model them as workExamples where the one everyone cites becomes the itemReviewed? That seems too complicated semantically to maintain and also not backward compatible. Perhaps we can assume within a single ClaimReview, the instances of the claim are always connected and the reporter can simply use their judgement to pick one that gets the itemReviewed's author/datePublished treatment?

@danbri

Contributor

danbri commented Jan 28, 2018

@cong-yu @jkosslyn having a single itemReviewed seems a little simpler conceptually to me, and consistent with the way other kinds of reviews are handled. But it can be awkward (as it was with the TV video vs smaller clip example) since you have to somewhat arbitrarily pick one entity as the focal point, even if the others can also be associated via relationships.

I would like to leave open the possibility (beyond the fact check review usecase) of having Claims relate to Claims, eg. well known claims having well known URLs. I'm not sure if there is a need for Claim-to-Claim metadata within typical fact check claim reviews though. We can probably get away with using sameAs URLs for the well known links, to keep the review markup simple.

Here's an example I've had in mind for a while, https://debategraph.org/Stream.aspx?nid=33108&vt=rgraph&dc=focus - debategraph's representation of the (claim that) "the 1953 coup deposed the democratically elected Mosaddeq government". Actually that is a pretty mild claim, the stronger claim is the more specific one that CIA and MI6 were the initiators of the coup. By having an explicit representation of these (increasingly detailed) claims, decoupled from any specific fact check review, we get an interesting foundation for linking this kind of data together.

@jkosslyn

jkosslyn commented Jan 29, 2018

@danbri agreed, having a single itemReviewed is conceptually clearer to me as well. I noticed that Clip inherits from CreativeWork; could we allow the itemReviewed to be any subclass of CreativeWork? As long as it has a url field, adding more optional fields wouldn't hurt the core usecase.

@cong-yu If the instances of the claim are independent (i.e., no citation connection), I think we should model them as separate ClaimReviews. I.e. two independent claims with two independent sources and contexts means two independent reviews. After all, in fact checking we've seen so many cases where the context tips the scales of the review.

Of course there may always be corner cases where citations are not explicit but are implied; in those cases editorial judgement would make the call.

@danbri

Contributor

danbri commented Jan 29, 2018

@jkosslyn - for a ClaimReview it may make sense to encourage only CreativeWork. There are other kinds of reviews too (cars, hotels, employers, ...), but they at least can share the common structure of having a single focal itemReviewed.

In passing (and wary of distractions), we should also note for the record that there may be some commonalities between Claim(Review) markup and Question/Answer sites. That does not mean we need to converge the two forms of markup, but it is worth bearing in mind that e.g. matters of historical or scientific fact are often debated on sites that are structured in terms of questions and candidate answers. For example: https://history.stackexchange.com/questions/26495/how-secret-was-the-us-and-british-involvement-in-the-1953-iranian-coup-d%C3%A9tat?rq=1 (see data extracted by Google).

cong-yu commented Jan 29, 2018

OK, I think we are getting to some consensus. The remaining question is: given that we are not introducing a new field, do we really need a new Claim type?

I do think there is value in introducing a Claim type on its own, independent of ClaimReview, with all the benefits of linking everything together, but it does sound like it is beyond the scope of this current extension to ClaimReview?

Contributor

danbri commented Jan 29, 2018

@cong-yu isn't the ability to say what we need without adding more custom properties a feature not a bug?

cong-yu commented Jan 29, 2018

@danbri do you mean we can introduce the Claim type without needing to add a new field?

Contributor

danbri commented Jan 29, 2018

We could.

Contributor

rvguha commented Jan 29, 2018

thadguidry commented Jan 29, 2018

Claim: Guha is intelligent. All humans have intelligence. Sometimes :)

+1 for new Claim type.

cong-yu commented Jan 29, 2018

@rvguha @danbri Sounds good, in terms of backward compatibility implementation, can we have the itemReviewed be of both Claim and CreativeWork type?

Contributor

rvguha commented Jan 29, 2018

jkosslyn commented Jan 29, 2018

Of course, @BillAdairDuke , and thanks for weighing in. @danbri and @cong-yu will fact check me if I'm mistaken, but here's my understanding of the practical implications for journalists:

  • schema that is already deployed will continue to work
  • new schema can label its itemReviewed as "@type":"Claim" instead of "@type":"CreativeWork"
  • if a journalist wants to indicate the blameworthy party responsible for a claim, the journalist will continue to use the existing author field, including the optional sameAs field to specify the URL of the responsible party
  • if a journalist wants to indicate one or more places where a claim appeared, the journalist will use one or more "workExample" fields inside that Claim. The workExamples are CreativeWorks (@danbri or subclasses, right, like Clips?)
  • if one of the places where the claim appeared was the very first public instance of the claim, the journalist can add a field to that workExample that says "position":"first"

What'd I miss?

Contributor

danbri commented Jan 29, 2018

Absolutely agree with Bill that usability (including of our documentation/examples, not just the abstract structure) is critical. Justin's summary overall makes sense; the only place I feel wobbly is the "position": "first" construction. How important is this? Could we just encourage use of dates (on Claims, ClaimReviews, and CreativeWorks)?

*yes, subtypes of CreativeWork are also expected; sometimes a CreativeWork may be further contextualized by also describing a larger work it is part of too, eg video clip, book chapter, blog post.

cong-yu commented Jan 29, 2018

@BillAdairDuke Bill, I think this is the consensus code from the thread.

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/punditfact/statements/2018/jan/05/new-century-times/white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "author": {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com" 
  },
  "itemReviewed": {
    "@type": ["Claim", "CreativeWork"],
    "text": "An intelligence agency decided to fight back and sent Trump a message by escorting six staffers out of the White House for failing background investigations.",
    "author": {
      "@type": "Person",
      "name": "New Century Times",
      "jobTitle": "website",
      "sameAs": "http://newcenturytimes.com/"
    },
    "datePublished": "2017-12-30",
    "workExample": [
      {
        "@type": "CreativeWork",
        "url": "http://info.nct.news/2017/12/30/breaking-fbi-just-raided-the-white-house-high-ranking-officials-thrown-out/",
        "position": "first"
      },
      {
        "@type": "CreativeWork",
        "url": "https://www.vote.us.org/memo/thread/3419/fbi-just-raided-the-white-house-6-people-thrown-out/"
      }
    ]
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}

A few more notes:

  1. @jkosslyn For backward implementation compatibility, let's keep both Claim and CreativeWork as types of the itemReviewed. In the future, this may not be necessary, but I want to be on the safe side.

  2. @danbri datePublished does not provide semantics of "original," since even if this example is earliest among the examples, it does not necessarily mean it is the original claim. ["position": "first"] solves that.

  3. @BillAdairDuke workExample is an existing field and thus preferred. If journalists strongly prefer some other term, then we can introduce a new field for the Claim type. At the same time, the widget can use any term and seamlessly map it into this field.

  4. While the structure has a hierarchy, itemReviewed to workExample, the semantics is not really hierarchical; journalists should feel free to put whatever they consider the main claim-maker into the root structure.

  5. The Claim type will allow for future extension, including clips, books, etc. and eventually provide the basis for a Claim database.

@BillAdairDuke Do you want Chris to take a look at the code? I am also happy to chat for more clarification if needed.

Contributor

danbri commented Jan 29, 2018

"the semantics is not really hierarchical" - unfortunately that is the exact problem with position: first, ... since the properties of the work being described aren't contextualized to the Claim or ClaimReview, but have to be objective properties of that entity directly.

One way to do that would be to use a more specific property than workExample. We might for example define "appearance" and also "firstAppearance" as relations linking the Claim to the CreativeWork. That would address my concern about position:first and provide a way to name the link in a journalist-friendly way while still having a documented pattern based on workExample. I think! Typed from an airport on a phone, I may have missed some nuance.

thadguidry commented Jan 29, 2018

@cong-yu Why does "this example is earliest among the examples" not imply the notion of "original" to you? Why does a Claim not signify that it is the "original" Claim in a list of workExamples if that Claim has the earliest datePublished? I guess you're saying that you want a quick and easy way for machines and humans to not have to calculate which item has the earliest datePublished and just get right to the original item with a position:first without much fuss?

cong-yu commented Jan 29, 2018

@danbri hmm ... isn't it the intention of the position field to put the item in context? (From the description: the position of an item in a series or sequence of items.) But I am open to introducing a new field or two (originalAppearance and appearance) under Claim. Shall we just go ahead and do it?

@thadguidry The journalist may not know who made the original claim, and thus it is not among the examples.
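
To make the distinction in this exchange concrete, here is a minimal Python sketch of the "just compute it from dates" approach. It assumes, hypothetically, that each workExample carries its own datePublished (the consensus example above only puts a date on the Claim itself); the function name and the example.org URLs are made up for illustration.

```python
from datetime import date


def earliest_listed_example(claim):
    """Return the workExample with the earliest datePublished, or None.

    Only examples that actually carry a datePublished are considered.
    """
    dated = [w for w in claim.get("workExample", []) if "datePublished" in w]
    if not dated:
        return None
    return min(dated, key=lambda w: date.fromisoformat(w["datePublished"]))


# Hypothetical Claim with two listed appearances.
claim = {
    "@type": "Claim",
    "workExample": [
        {"@type": "CreativeWork", "url": "https://example.org/repost",
         "datePublished": "2017-12-31"},
        {"@type": "CreativeWork", "url": "https://example.org/first-seen",
         "datePublished": "2017-12-30"},
    ],
}

first_seen = earliest_listed_example(claim)
```

This can only ever return the earliest example that the journalist listed. It cannot tell a consumer whether that example is the origin of the claim, which is the gap an explicit marker or property is meant to fill.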

Contributor

danbri commented Jan 29, 2018

@cong-yu @jkosslyn @BillAdairDuke et al - I have made a pass at implementing this. It needs more attention including an example or two and improvements to the text for the existing ClaimReview and itemReviewed definitions. But there are rough cut definitions for Claim, appearance and firstAppearance. Wording can be refined. You can see the raw definitions in the links to the configuration files here, or staged at http://pending.webschemas.org/Claim

thadguidry commented Jan 30, 2018

@danbri Let's try to avoid using the type name again in the definition (why do you and @RichardWallis always seem to do that? :-) ) Suggest changing this "a specific, factually-oriented claim ..." to this "a specific, factually-oriented statement ..." or another synonym of choice.

cong-yu commented Jan 30, 2018

Thanks @danbri !

A couple comments on the pending Claim type:

  1. Shall we clarify a bit the relationship between the text property of Claim and the claimReviewed property of ClaimReview? If a Claim is attached to a ClaimReview, then the text field, if missing, is assumed to be the same as the claimReviewed property of the ClaimReview it is attached to.

  2. For firstAppearance: shall we consider changing "first known" to "original", or better, changing firstAppearance to originalAppearance? This may be a question for @BillAdairDuke: do we care if this field contains the first known instance of the claim even though the reporter knows for sure this is not the origin?
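
The fallback suggested in point 1 could look roughly like this on the consumer side. This is only a sketch of one reading of the proposal, not a documented Schema.org rule; the helper name is hypothetical.

```python
def effective_claim_text(claim_review, claim):
    """Return the Claim's own text, falling back to the review's claimReviewed.

    `claim` is assumed to be one itemReviewed entry of `claim_review`.
    """
    text = claim.get("text")
    if text:
        return text
    return claim_review.get("claimReviewed")


review = {
    "@type": "ClaimReview",
    "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
    "itemReviewed": {"@type": "Claim"},  # no text of its own
}

resolved = effective_claim_text(review, review["itemReviewed"])
```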

subbuvincent commented Jan 30, 2018

Hi folks, I just caught up with this thread. All three central points are valid.

  • the need from the use case @BillAdairDuke brought up for claimOriginal and claimUrl
  • and the opportunity for a class of claims, a-la canonicals for claims @vholland @rvguha
  • and the reality of maintaining backwards compatibility that @cong-yu rightly brings up

I tried something here using @danbri's exploration and still constraining it with the original use case @BillAdairDuke wants to address. Perhaps we could create @type Claim to allow forward adoption by letting claimReviewed take both text values and type-Claim values. If we allow a property called claimStage in type Claim, then we can distinguish between original and repeated in a series.

The rest can follow.

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2017-07-06",
  "url": "http://danbri.org/2018/TODO",
  "claimReviewed": [
    "The moon is composed largely of green cheese, with some patches of Camembert.",
    {
      "@type": "Claim",
      "claimStage": "Original",
      "url": "https://en.wikipedia.org/wiki/The_Moon_is_made_of_green_cheese",
      "text": "The moon is made of green cheese."
    },
    {
      "@type": "Claim",
      "claimStage": "Repeated",
      "url": "https://the-moon-is-made-of-green-cheese.com",
      "text": "The moon is made of green cheese."
    }
  ],
  "reviewBody": "This claim is false, people have been there to check.",
  "itemReviewed": {
    "@type": "CreativeWork",
    "author": {
      "@type": "Organization",   // just changed Person to Org - my pref.
      "name": "New Century Times",
      "sameAs": "http://newcenturytimes.com/"
    },
    "datePublished": "2017-12-30"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}

Does this help?

cong-yu commented Jan 31, 2018

@subbuvincent Thanks Subbu, I do believe the mixture of textual value and nested value will break current implementations. If it is either/or, it might be tricky for the existing parsers.
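
To illustrate the parsing concern: a consumer that today expects claimReviewed to be a plain string would need something like the following Python sketch to cope with a mix of strings and nested Claim objects. The function name is hypothetical and this does not describe how any existing parser behaves.

```python
def claim_texts(claim_reviewed):
    """Collect claim strings from a claimReviewed value of mixed shape.

    Accepts a plain string, a Claim-like dict with a "text" key, or a list
    mixing both. Entries without usable text are skipped.
    """
    values = claim_reviewed if isinstance(claim_reviewed, list) else [claim_reviewed]
    texts = []
    for value in values:
        if isinstance(value, str):
            texts.append(value)
        elif isinstance(value, dict) and isinstance(value.get("text"), str):
            texts.append(value["text"])
    return texts


mixed = [
    "The moon is composed largely of green cheese, with some patches of Camembert.",
    {"@type": "Claim", "claimStage": "Original",
     "text": "The moon is made of green cheese."},
]
texts = claim_texts(mixed)
```

Code written against the string-only form (for example, calling a string method directly on claimReviewed) would fail on the list or dict shapes, which is the backward-compatibility risk raised here.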

subbuvincent commented Jan 31, 2018

@cong-yu Thanks. And this is sheepish but it looks like when I posted my comment, my browser was showing a cached version so I had missed several comments going back 4 days! Never mind :). It's great to see more progress since then.

vinnygreen commented Apr 17, 2018

Hello all! I'm head of operations and development for Snopes.com. I integrated the claim review mark-up into Snopes' WordPress Theme. I spent a lot of time thinking about this markup because it relates to the other projects I am working on like a misinformation triage engine and a commercial API.

The ClaimReview process is built into our fact-check template with a simple UI for our staff. My staff never sees the schema; they fill in fields I labeled in a way they understand.

I've reviewed the thread and wanted to chime in with Snopes' position on the markup:

  • SameAs field and itemReviewed are confusing, but the simple solution is just allowing multiple itemReviewed for each distinct claim which is not currently permitted.
  • If a page has many distinct claims, it requires many claimReviews. If a claimReview has multiple sources, it should require multiple itemRevieweds.
  • With multiple itemRevieweds available, we can have multiple sources of information and use the sameAs field as intended.
  • We could instruct end users to assume the earliest date of the itemReviewed is the earliest source of that claim that the publisher has identified. If they wanted to know what the individual fact checker considered to be the earliest claim URL, they would find the oldest datePublished in one of the many itemRevieweds and then search for the sameAs URL that links to an article.
  • Alternatively, we could add a new URL field called claimUrl in the itemReviewed section for the exact URL where the claim lives which could have its own datePublished as well.
  • My concern is that new or ambiguous fields will be ignored or misinterpreted by end-users creating more misinformation. Not to mention the burden on the journalist curating the data that can be inferred via a query by the end-user. The more data entry work for the journalist the greater the chance for error.
  • Backward compatibility is incredibly important.

I know I jumped in very late, but I hope this is helpful. I can be reached directly at vinny@snopes.com

Snopes Ideal Schema

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2016-06-22",
  "url": "http://snopes.com/fact-check/a-snopes-fact-check/",

  "itemReviewed":
  {
    "@type": "CreativeWork",
    "author":
    {
      "@type": "Organization",
      "name": "Bad Actor 1",
      "sameAs": ["https://twitter.com/badactor1", "https://wikipedia.com/badactor1"]
    },
    {
      "claimUrl": "https://badactor1blog.com/article-with-claim/",
      "datePublished": "2016-06-20"
   }
  },

  "itemReviewed":
  {
    "@type": "CreativeWork",
    "author":
    {
      "@type": "Organization",
      "name": "Bad Actor 2",
      "sameAs": ["https://twitter.com/badactor2", "https://wikipedia.com/badactor2"]
    },
    {
      "claimUrl": "https://badactor2blog.com/article-with-claim/",
      "datePublished": "2016-06-20"
   },
    {
      "claimUrl": "https://facebook.com/badactor2blog/212131342334536342",
      "datePublished": "2016-06-21"
   }
  },

  "claimReviewed": "Bad Actors say what?",
  "author":
  {
    "@type": "Organization",
    "name": "Snopes.com"
  },
  "reviewRating":
  {
    "@type": "Rating",
    "ratingValue": "1",
    "bestRating": "5",
    "worstRating": "1",
    "alternateName" : "False"
  }
}

Contributor

danbri commented Jul 11, 2018

I've been looking into the consequences of this for neighbouring pieces of Schema.org, i.e. non fact-checking reviews. I have opened an independent issue on that, since Schema.org needs to have a clear picture of when/whether itemReviewed can be repeated. I believe we can make this work and just need to do the housekeeping to keep related definitions in sync.

I'd like to say

  • "itemReviewed" in general case: most typically points to a single thing
  • but can point to several things, if a review is reviewing several things
  • conventions and common patterns may vary between the different subtypes of Review (UserReview, CriticReview, ClaimReview etc.).
  • Fact checking reviews (reviews of Claims, i.e. ClaimReview) can use repeated itemReviewed to describe various appearances of several claims that it makes sense to evaluate within a common review (i.e. fact-check). We need not be strict here about whether they are multiple appearances of exactly the same claim, or variations on a theme.
  • Other kinds of reviews sometimes have multiple items reviewed in an integrated writeup, this is probably more typical of critic reviews than user reviews, and what we do here for fact checking shouldn't impact those other types, beyond the general clarification that "itemReviewed" is sometimes repeated.
  • We can note that it is possible to use the various properties of the items reviewed (their publication dates, or perhaps mainEntityOfPage markup) to point to the earliest or primary item, if there are several. Note that we are not imposing a single approach to that matter here. Conventions may emerge, and they may be subtype-specific.
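
One practical consequence of the points above for anyone consuming this markup, sketched in Python: if itemReviewed is "sometimes repeated", a reader has to accept both a single object and a JSON array. The helper name is hypothetical.

```python
def items_reviewed(review):
    """Return the review's itemReviewed values as a list.

    A missing property yields an empty list, a single object yields a
    one-element list, and an array is returned unchanged.
    """
    value = review.get("itemReviewed")
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


single = {"@type": "ClaimReview", "itemReviewed": {"@type": "Claim"}}
repeated = {"@type": "ClaimReview",
            "itemReviewed": [{"@type": "Claim"}, {"@type": "Claim"}]}

counts = (len(items_reviewed(single)), len(items_reviewed(repeated)))
```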

vinnygreen commented Jul 11, 2018

@danbri Some notes for you:

From my perspective, the item(s) being reviewed should purport the claim being reviewed with little left to the imagination. Allowing for variation could make understanding the data and using it in applications more difficult.

If this data is being utilized in an application, it's easier to convey that the items being reviewed strictly match the claim. Otherwise, we would need the publisher to express the reason or theme of the reviewed items. This might stretch the definition of "claim" and cause more issues.

When considering how multiple itemRevieweds can be deployed across other subtypes, we need to consider whether multiple distinct Reviews are more appropriate. More clearly, is the need for many itemRevieweds solved by having many Reviews on the page?

We build collections of ClaimReviews in our archives. We convey the theme on the page through the title and description and include many ClaimReviews. Google uses them in the knowledge panel currently.

See: https://www.snopes.com/fact-check/category/photos/

cong-yu commented Jul 16, 2018

FYI The following is the example code taking Vinny and Dan's comments into consideration. It leverages the repeatability of itemReviewed, which Dan is hoping to get approved soon. No new type or attribute is introduced (Claim type is already being proposed in a separate issue).

Regarding the granularity of the repeatability:

  1. Same claim, same claimant, different instances => use different workExample within same itemReviewed for repetition

  2. Same claim, different claimant => use itemReviewed for repetition

  3. Different claim => use different ClaimReview

Vinny has a good point on being careful about the variants of the claim (whether two claims are really different or simply slight variants of the same), it is an editorial call and the idea is that each publisher should think carefully about that and use either 3 or 1/2 as appropriate.

{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2018-01-05",
  "url": "http://www.politifact.com/.../white-house-staffers-removed-job-subject-false/",
  "claimReviewed": "Breaking: FBI just raided the White House, 6 people thrown out.",
  "author": {
    "@type": "Organization",
    "name": "PolitiFact",
    "url": "http://www.politifact.com" 
  },
  "itemReviewed": [
  {
    "@type": ["Claim", "CreativeWork"],
    "author": {
      "@type": "Organization",
      "name": "New Century Times",
      "sameAs": "http://newcenturytimes.com/"
    },
    "appearance": [
      {
        "@type": "CreativeWork",
        "url": "http://info.nct.news/...-white-house-high-ranking-officials-thrown-out/",
        "datePublished": "2017-12-30"
      },
      {
        "@type": "CreativeWork",
        "url": "http://info.nct.news/...-white-house-high-ranking-officials-thrown-out-2/",
        "datePublished": "2017-12-31"
      }
    ]
  },
  {
    "@type": ["Claim", "CreativeWork"],
    "author": {
      "@type": "Organization",
      "name": "Vote US",
      "sameAs": "http://www.vote.us.org/"
    },
    "appearance": [
      {
        "@type": "CreativeWork",
        "url": "https://www.vote.us.org/...-raided-the-white-house-6-people-thrown-out/",
        "datePublished": "2017-02-16"
      }
    ]
  }],
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "2",
    "bestRating": "7",
    "worstRating": "1",
    "alternateName": "False"
  }
}
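
The three granularity rules above can be sketched in Python as a grouping step over individual sightings of a claim. Everything here is illustrative: the `sightings` input shape, the function name, and the example.org URLs are assumptions, and treating "same claim" as exact string equality is a simplification of what is really an editorial call.

```python
def group_sightings(sightings):
    """Group sightings into ClaimReview skeletons.

    Different claim   -> separate ClaimReview
    Different claimant -> separate itemReviewed within that review
    Same claim and claimant -> another entry under "appearance"
    """
    reviews = {}  # keyed by claim text; dicts keep insertion order
    for s in sightings:
        review = reviews.setdefault(s["claim"], {
            "@type": "ClaimReview",
            "claimReviewed": s["claim"],
            "itemReviewed": [],
        })
        item = next((i for i in review["itemReviewed"]
                     if i["author"]["name"] == s["claimant"]), None)
        if item is None:
            item = {
                "@type": "Claim",
                "author": {"@type": "Organization", "name": s["claimant"]},
                "appearance": [],
            }
            review["itemReviewed"].append(item)
        item["appearance"].append({"@type": "CreativeWork", "url": s["url"]})
    return list(reviews.values())


sightings = [
    {"claim": "A", "claimant": "Site X", "url": "https://example.org/x/1"},
    {"claim": "A", "claimant": "Site X", "url": "https://example.org/x/2"},
    {"claim": "A", "claimant": "Site Y", "url": "https://example.org/y/1"},
    {"claim": "B", "claimant": "Site X", "url": "https://example.org/x/3"},
]
grouped = group_sightings(sightings)
```

With this input, claim "A" ends up as one ClaimReview with two itemReviewed entries (Site X with two appearances, Site Y with one), and claim "B" becomes a second ClaimReview.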

Contributor

danbri commented Jul 16, 2018

Thanks @cong-yu - we already added Claim (with the usual 'pending' status for new things) in this last release.

Wouldn't we rather use the "appearance" (and even "firstAppearance") properties to link from the Claim to the CreativeWork? (rather than workExample)

There's also no strong need to multiple-type ("@type": ["Claim", "CreativeWork"]); Claim is defined in the schemas as a CreativeWork, so the extra info doesn't add much except page weight.
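
A small Python sketch of why the second type is redundant: given a subtype table, any listed type that is an ancestor of another listed type carries no extra information. The `PARENT` table below is a hand-written, hypothetical excerpt (Claim under CreativeWork, as stated above, and CreativeWork under Thing), not something loaded from the Schema.org definitions.

```python
# Hypothetical excerpt of the type hierarchy: child -> parent.
PARENT = {"Claim": "CreativeWork", "CreativeWork": "Thing"}


def is_subtype(type_name, ancestor):
    """True if type_name equals ancestor or descends from it in PARENT."""
    current = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def minimal_types(types):
    """Drop every type that is implied by a more specific type in the list."""
    return [t for t in types
            if not any(other != t and is_subtype(other, t) for other in types)]


reduced = minimal_types(["Claim", "CreativeWork"])
```

Whether a given validator accepts the reduced form is a separate question; a tool that does not yet know the Claim type may still want the explicit CreativeWork.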

BillAdairDuke commented Jul 16, 2018

Contributor

danbri commented Jul 16, 2018

cong-yu commented Jul 16, 2018

@danbri yep, appearance should work as well, I will update my comment

@BillAdairDuke Yeah, I would leave it to the fact checkers to determine how similar two claims are to each other that a single ClaimReview is warranted ... @danbri this is something hard to concretely define since it is somewhat subjective, and I would not try to clarify this too much ...

cong-yu commented Jul 16, 2018

@danbri FYI using just Claim still gives an error in SDTT so I will keep both for now.

Contributor

danbri commented Jul 16, 2018

@cong-yu - then let's update the Google tool!

And yes, I agree re hard to define. Maybe something soft like:

"several variations on the same claim (e.g. different phrasings but in the same social/political context) can be handled within a single ClaimReview via repetition of itemReviewed, but the idea of a ClaimReview is that more or less a single idea is being examined."

Contributor

danbri commented Jul 26, 2018

@BillAdairDuke @vinnygreen can you help us with some guiding examples from Snopes, Politifact or elsewhere that exercise some of the variations we've discussed above?

For example, @cong-yu 's list,

  1. Same claim, same claimant, different instances => use different workExample within same itemReviewed for repetition

  2. Same claim, different claimant => use itemReviewed for repetition

  3. Different claim => use different ClaimReview

If you have examples showing these kinds of variation on real fact check sites that would be really helpful in fixing up our canonical guidelines in a way that is technically sound and aligned with fact-checking practices. In particular also any examples with multiple languages (of the claim being translated or the whole factcheck) would help.

Dan

BillAdairDuke commented Jul 26, 2018

@danbri danbri changed the title from ClaimReview needs field for URL of claims to ClaimReview needs field for URL of claims / integration of new Claim type Jul 27, 2018
