
Update /MediaReview codelist with new content from the fact-checker community #2844

Open
danbri opened this issue Feb 18, 2021 · 8 comments
Labels: no-issue-activity (Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue, or close it!) · Queued for Editorial Work (Editor needs to turn issues/PRs into final code and release notes.)

Comments


danbri commented Feb 18, 2021

This is part of #2450 but just for the codelist piece.

From Duke (Nov 2020):

MediaReview Fields and Labels (as of November 2020)
Media Type: VIDEO

RATINGS
Original: No evidence the footage has been misleadingly altered or manipulated, though it may contain false or misleading claims.
Missing Context: Presenting unaltered video in an inaccurate manner that misrepresents the footage. For example, using incorrect dates or locations, altering the transcript or sharing brief clips from a longer video to mislead viewers. (A video rated “original” can also be missing context.)
Edited: The video has been edited or rearranged. This category applies to time edits, including editing multiple videos together to alter the story being told or editing out large portions from a video.
Transformed: Part or all of the video has been manipulated to transform the footage itself. This category includes using tools like the Adobe Suite to change the speed of the video, add or remove visual elements or dub audio. Deepfakes are also a subset of transformation.
Staged: A video that was created using actors or that was similarly contrived.
Satire/Parody: A video that was created as political or humorous commentary and is presented in that context. (Reshares of satire/parody content that do not include relevant context are more likely to fall under the “missing context” rating.)

OTHER FIELDS
Video URL: Link to the page containing the video, such as an article or social media post
Original Media URL: Link to the original, non-manipulated version of the video (if available)
Original Media Context: A short sentence explaining the original context if media is used out of context
Timestamp of video edit (in HH:MM:SS format)
Ending timestamp of video edit, if applicable (in HH:MM:SS format)
[Fact-checker] Article URL

Media Type: IMAGE

RATINGS
Original: No evidence the image has been misleadingly altered or manipulated, though it may still contain false or misleading claims.
Missing Context: Presenting unaltered images in an inaccurate manner to misrepresent the image and mislead the viewer. For example, a common tactic is using an unaltered image but saying it came from a different time or place. (An image rated “original” can also be missing context.)
Cropped: Presenting a part of an image from a larger whole to mislead the viewer.
Transformed: Adding or deleting visual elements to give the image a different meaning with the intention to mislead.
Staged: An image that was created using actors or that was similarly contrived, such as a screenshot of a fake tweet.
Satire/Parody: An image that was created as political or humorous commentary and is presented in that context. (Reshares of satire/parody content that do not include relevant context are more likely to fall under the “missing context” rating.)

OTHER FIELDS
Image URL: Link to the page containing the image, such as an article or social media post
Original Media URL: Link to the original, non-manipulated version of the image (if available)
Original Media Context: A short sentence explaining the original context if media is used out of context
[Fact-checker] Article URL

Media Type: IMAGE WITH OVERLAID/EMBEDDED TEXT

RATINGS
Original: No evidence the image has been misleadingly altered or manipulated, though it may still contain false or misleading claims.
Missing Context: An unaltered image presented in an inaccurate manner to misrepresent the image and mislead the viewer. For example, a common tactic is using an unaltered image but saying it came from a different time or place. (An “original” image with inaccurate text would generally fall in this category.)
Cropped: Presenting a part of an image from a larger whole to mislead the viewer.
Transformed: Adding or deleting visual elements to give the image a different meaning with the intention to mislead.
Staged: An image that was created using actors or that was similarly contrived, such as a screenshot of a fake tweet.
Satire/Parody: An image that was created as political or humorous commentary and is presented in that context. (Reshares of satire/parody content that do not include relevant context are more likely to fall under the “missing context” rating.)

OTHER FIELDS

Image With Overlaid/Embedded Text URL: Link to the page containing the image with overlaid/embedded text, such as an article or social media post
Original Media URL: Link to the original, non-manipulated version of the image with overlaid/embedded text (if available)
Original Media Context: A short sentence explaining the original context if media is used out of context
[Fact-checker] Article URL

Media Type: AUDIO

RATINGS
Original: No evidence the audio has been misleadingly altered or manipulated, though it may contain false or misleading claims.
Missing Context: Unaltered audio presented in an inaccurate manner that misrepresents it. For example, using incorrect dates or locations, or sharing brief clips from a longer recording to mislead viewers. (Audio rated “original” can also be missing context.)
Edited: The audio has been edited or rearranged. This category applies to time edits, including editing multiple audio clips together to alter the story being told or editing out large portions from the recording.
Transformed: Part or all of the audio has been manipulated to alter the words or sounds, or the audio has been synthetically generated, such as to create a sound-alike voice.
Staged: Audio that was created using actors or that was similarly contrived.
Satire/Parody: Audio that was created as political or humorous commentary and is presented in that context. (Reshares of satire/parody content that do not include relevant context are more likely to fall under the “missing context” rating.)

OTHER FIELDS
Audio URL: Link to the page containing the audio, such as an article or social media post
Original Media URL: Link to the original, non-manipulated version of the audio (if available)
Original Media Context: A short sentence explaining the original context if media is used out of context
Timestamp of audio edit (in HH:MM:SS format)
Ending timestamp of audio edit, if applicable (in HH:MM:SS format)
[Fact-checker] Article URL
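The edit-timestamp fields above use HH:MM:SS. As a small illustrative sketch (Python is used here purely for illustration; the HH:MM:SS format is the only assumption taken from the field list), such a timestamp could be validated and converted to seconds like this:

```python
import re

def parse_timestamp(ts: str) -> int:
    """Parse an HH:MM:SS timestamp, as used by the draft
    'timestamp of edit' fields, into a number of seconds."""
    m = re.fullmatch(r"(\d{2}):([0-5]\d):([0-5]\d)", ts)
    if m is None:
        raise ValueError(f"not an HH:MM:SS timestamp: {ts!r}")
    hours, minutes, seconds = (int(g) for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds
```

For example, parse_timestamp("00:01:30") gives 90, while malformed values such as "1:2:3" are rejected.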

@danbri danbri self-assigned this Feb 18, 2021
danbri added a commit that referenced this issue Feb 18, 2021
@github-actions

This issue is being tagged as Stale due to inactivity.

@github-actions github-actions bot added the no-issue-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). label Apr 20, 2021
@danbri danbri removed the no-issue-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). label Apr 28, 2021

danbri commented Apr 28, 2021

Let me summarize where things stand.

For how we got here see https://www.niemanlab.org/2020/01/is-this-video-missing-context-transformed-or-edited-this-effort-wants-to-standardize-how-we-categorize-visual-misinformation/ and https://firstdraftnews.org/latest/wapo-guide-to-manipulated-video/

  • the 12.0 release (cf. Version 12 Planning (goal: 2021-03-08) #2813) last month included the codelist MediaManipulationRatingEnumeration, which reflects the efforts of folk in the fact-checking and related communities, with definitions based on material pulled together by the Duke Reporters' Lab team (e.g. see https://reporterslab.org/what-is-mediareview/).
    • the list is DecontextualizedContent, EditedOrCroppedContent, OriginalMediaContent, SatireOrParodyContent, StagedContent, TransformedContent.
    • each of these has definition sub-sections that explain what it means in an image vs. video vs. audio vs. image-with-text setting.
    • this provides a foundation to build out from. In particular, there are plenty of nuances around media object versioning, representing surrounding context, etc. We'll talk through those here.
  • the following example is as far as we've got in developing things further; I'll post the example and then talk through the design issues.
{
  "@context": "https://schema.org",
  "@type": "MediaReview",
  "datePublished": "2021-04-01",
  "url": "http://example.org/example/TODO",
  "author": {
    "@type": "Organization",
    "name": "Politifact"
  },
  "mediaAuthenticityCategory": "DecontextualizedContent",
  "originalMediaContextDescription": "The image is a screenshot of planetarium software meant to depict a real scene.",
  "itemReviewed": {
    "@type": "MediaReviewItem",
    "creator": {
      "@type": "Organization",
      "name": "Astrophile Daily",
      "url": "http://astrophiledaily.com/",
      "sameAs": "https://twitter.com/phy_astrodotcom"
    },
    "interpretedAsClaim": {
      "@type": "Claim",
      "description": "Photo shows 'Earth, Venus and Jupiter as seen from Mars.' "
    },
    "appearance": [
      {
        "@type": "ImageObjectSnapshot",
        "description": "the image shows a red, rocky surface and three glowing orbs in the sky that resemble Orion’s Belt",
        "sha256sum": [
          "395ce30d3bddbf179acf94c86eb77d6190c5a4c8dd462127bd72f116161e73a7",
          "b29ef40884181309a22bc95a821ec9361edd0d42c1dfb92ec0e777a704523caa"
        ]
      },
      {
        "@type": "ImageObjectSnapshot",
        "accessedOnUrl": "https://www.faceb[...etc]7000459996/?type=3&theater",
        "contentUrl": "https://www.faceb[...etc]7000459996/?type=3&theater",
        "archivedAt": "https://archive.is/9Yb6E"
      }
    ]
  }
}

Proposed Next Steps

These are some specific suggested additions/changes for schema.org, often with notes on the rough underlying requirements (comments welcome, especially from implementors). The driver is basically: "imagine using Schema.org JSON-LD to describe a media file with the codelist - what else is most likely to be useful and feasible to say, so that the data can be used to enhance ClaimReview-like use cases and other anti-misinformation scenarios?"

  • Something like an /originalMediaContextDescription property, a subproperty of /description, to capture (in the case of decontextualized content) the way in which additional understanding might improve the interpretation of the image.
  • New subtypes of ImageObject, VideoObject, AudioObject, called ImageObjectSnapshot (etc.), indicating a single, fixed set of bytes that never changes.
  • Expectation that two byte-identical files are considered to be the exact same ImageObjectSnapshot (same for video, audio). Note the consequences: filesystem metadata is not considered part of the content, and certain simple images (e.g. a zero-byte file, a totally white PNG file, etc.) could potentially arise independently, with no causal connection to each other.
  • Expectation that we will add hashing-related properties, e.g. /sha256hash.
  • Define some relationship from the MediaReview or the associated claim, to one or more appearances of the media item. Drafted using "appearance" here for now.
  • Introduce a MediaReviewItem type, which serves to bundle together all the targets of a MediaReview so that there is a single obvious target for the /itemReviewed property which all Reviews expect.
  • Explore an /interpretedAsClaim property that relates that bundle to a specific claim. This could be exact words from the audio/video/image, or a journalistic interpretation, cleaned up e.g. to remove hate speech or implicit context.
  • For each snapshot, we'll want to be as clear as possible about where it was found and where it can still be found (e.g. /archivedAt), and look into best practices for assigning an identifier. These could be arbitrarily assigned by the creator of the markup (or tooling), or use whatever the current best practices are for URIs based on hashes (e.g. see https://tools.ietf.org/html/rfc6920 https://github.com/hash-uri/hash-uri ....). This would need careful documentation.
  • We anticipate that these media object snapshot entities could also be described with perceptual hashes (see Wikipedia), but note that there are potential usability and ethical considerations, e.g. in the case of complex algorithms produced via machine learning from training data.
  • there's an interesting case Joel Luther raised: there is a slight mismatch between the types prioritized by the mediareview collaboration (image, image with text, video, audio) vs schema.org, which only has ImageObject, VideoObject, AudioObject.
    • we considered a new dedicated type but it felt weird, and also verbose because we'd then also need StaticImageWithTextObject if we added ImageWithTextObject.
    • we considered re-using the /caption property to imply this was an "image with text", but since /caption is the closest thing we have now to HTML's "alt" structure, that seems inappropriate: every ImageObject ideally should be described this way for accessibility by blind and partially sighted users.
    • the conclusion was to add a dedicated /embeddedTextCaption property, defined as a subproperty of /caption. This would be used for cases where the caption text is embedded in the content somewhere (primarily but not necessarily static images - not sure about GIFs or video; I can't think of a way to make it make sense for audio, but it may still make most sense as a property of MediaObject, to cover future subtypes too).
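The byte-identity and hashing points above can be sketched in Python. This is a sketch only: /sha256hash is a proposed (unpublished) property name, and the ni: URI form follows RFC 6920 as referenced in the list:

```python
import base64
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest suitable for a draft /sha256hash-style property.
    Two byte-identical files yield the same value, so they would be
    treated as the exact same snapshot entity."""
    return hashlib.sha256(data).hexdigest()

def ni_uri(data: bytes) -> str:
    """A hash-based identifier in the style of RFC 6920 'ni' URIs:
    base64url (without padding) of the raw SHA-256 digest."""
    digest = hashlib.sha256(data).digest()
    b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{b64}"
```

Note that only the file's bytes enter the hash; filesystem metadata never does, matching the byte-identity expectation above.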

There are clearly rough edges here and decisions to be made, but I feel the above sets a path to flesh out the missing details around MediaReview beyond the core codelist we published in the last release. I will work up some schema definitions in the direction outlined, for discussion and implementation feedback.

@joelwluther

> For how we got here see https://www.niemanlab.org/2020/01/is-this-video-missing-context-transformed-or-edited-this-effort-wants-to-standardize-how-we-categorize-visual-misinformation/ and https://firstdraftnews.org/latest/wapo-guide-to-manipulated-video/

Thanks @danbri. A few more reference documents detailing how we got to this place:

  • In the fall of 2019, the Reporters’ Lab and partners began working on adapting the Washington Post’s taxonomy into a proposed Schema for fact-checks of manipulated images and videos. The first draft of this effort is available with comments here.
  • Following an open feedback period, the Reporters’ Lab incorporated suggestions into a second draft of the taxonomy. This draft was emailed to all signatories of the International Fact-Checking Network's Code of Principles on October 17, 2019, and was made available for public comment.
  • We incorporated suggestions from that document into a draft Schema.org proposal and began to test MediaReview for a selection of fact-checks of images and videos. Our internal testing helped refine the draft of the Schema proposal, and we shared an updated version with IFCN signatories on November 26. We also re-shared this draft, seeking comment, in the IFCN Slack on December 4.
  • On January 30, 2020, the Duke Reporters’ Lab, the International Fact-Checking Network, and Google hosted a Fact-Checkers Community Meeting at the offices of the Washington Post. 46 people, representing 21 fact-checking outlets and 15 countries, were in attendance. We presented slides about MediaReview, asked fact-checkers to test the creation process on their own, and again asked for feedback from those in attendance.
  • The Reporters' Lab began a testing process with prominent fact-checkers in the United States (FactCheck.org, PolitiFact, and the Washington Post) in April 2020. We have publicly shared their test MediaReview entries, now totaling 300, throughout the testing process.
  • On June 1, 2020, we wrote and circulated a document summarizing the remaining development issues with MediaReview, including new issues we had discovered through our first phase of testing. We also proposed new Media Types for “image macro” and “audio,” and new associated ratings, and circulated those in a document as well. We published links to both of these documents on the Reporters’ Lab site (We want your feedback on the MediaReview tagging system) and published a short explainer detailing the basics of MediaReview (What is MediaReview?)
  • We again presented on MediaReview at Global Fact 7 in June 2020, detailing our efforts thus far and again asking for feedback on our new proposed media types and ratings and our Feedback and Discussion document. The YouTube video of that session has been viewed over 500 times, by fact-checkers around the globe, and dozens participated in the live chat.
  • We hosted another session on MediaReview for IFCN signatories on April 1, 2021, again seeking feedback and updating fact-checkers on our plans to further test the Schema proposal.

Joel Luther, Duke Reporters' Lab


danbri commented Apr 29, 2021 via email


danbri commented May 19, 2021

A couple of updates for collaboration and transparency

I spoke last week with Leigh Dodds (@ldodds) who is working with Full Fact, and had been giving the ClaimReview and related schemas some careful attention. He mentioned a few points that touch on MediaReview design issues, so I'm recording them here. Leigh may share something more carefully written elsewhere. This is partial.

  • itemReviewed: he felt it would be natural for this to be a repeating property. For example in the case of an examination of multiple distinct but related claims, e.g. in a debate amongst politicians.
    • we noted that itemReviewed on ClaimReview is inherited from its longstanding use on Review. Leigh pointed out that many reviews are about several things, which would be a more substantive thing to change (even if reasonable). See also itemReviewed inverseOf review #161 on "review" being almost an inverse and not setting corresponding constraints.
    • Leigh mentioned that sometimes 'name', 'description' and 'value' seem to compete for being the main way to summarize the text of a Claim. Also that "firstAppearance" is hard to determine. We talked around the perceived vagueness of what "appearance" points to - e.g. whether or not the thing pointed to is from the same author (if any) as the original claim.
  • I spoke today with Erica Ryan, Joel Luther and Mark Stencel from Duke about the MediaReview work. They had been working through examples of video content using the designs outlined above.
  • we are lacking a field to capture a link to original media. This came up in the video usecase but is general. Link is most commonly to a landing page rather than to a specific media object (which in the video case is usually hard for users to find, which also makes the sha-hashing properties less relevant for video). So, this would be:
    • add a new property "originalMediaLink" on MediaReview, with a definition something like "Link to the page containing an original version of the content", particularly for edited/manipulated media. We discussed whether it could sometimes be a fact-checking site, and I wouldn't want to rule this out. An example was https://www.snopes.com/fact-check/mookie-wilson-dinosaurs/ (although in this case the URL https://www.villagevoice.com/2018/09/15/they-crawled-out-of-the-swamps-to-save-the-mets/ is available from the same publisher as the 1986 original). This also makes a case for the property being repeatable. The definition should avoid endorsing the original - just because a derived version was edited/manipulated, we can't assume much else: the original could be hateful, mischievous, satirical or just plain wrong.
    • We should work through an out of context example too.
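For illustration, here is a hypothetical sketch of markup using the proposed (not yet published) originalMediaLink property, reusing the URLs from the Snopes example above. The repeated-value form reflects the repeatability point, and the rating shown is illustrative only:

```json
{
  "@context": "https://schema.org",
  "@type": "MediaReview",
  "mediaAuthenticityCategory": "TransformedContent",
  "originalMediaLink": [
    "https://www.villagevoice.com/2018/09/15/they-crawled-out-of-the-swamps-to-save-the-mets/",
    "https://www.snopes.com/fact-check/mookie-wilson-dinosaurs/"
  ]
}
```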

This business of whether itemReviewed is repeatable relates to MediaReview. With MediaReview we are trying to be clearer that the review is of a MediaReviewItem (which is a containing structure for various things e.g. versions of an image). If itemReviewed is not repeatable on ClaimReview, then this pushes content elsewhere: e.g. into having lots of ClaimReviews in the same page (in which case is there value in having FactCheckArticle as an Article type to capture that practice?). Mark mentioned that WaPo sometimes examines several claims in one go, but focusses on the lead/earliest. One way to deal with this within the current structure would be via multiple claims / appearances, but we don't seem to have a settled pattern for it yet.

Also discussed ephemeral content (Clubhouse-style audio, Fleets, etc.), and agreed that for now the focus is naturally on items that get shared, rather than on places where troubling things are said. Consequently, if MediaReview is relevant, it's likely because someone has e.g. screen-captured content that would otherwise have been ephemeral. In that case originalMediaContextDescription would be a reasonable place to document this, since linking to the original doesn't make sense.

(excuse scrappy notes, but I wanted to get something out before memory blurs further!)

@github-actions

This issue is being nudged due to inactivity.

@github-actions github-actions bot added the no-issue-activity Discuss has gone quiet. Auto-tagging to encourage people to re-engage with the issue (or close it!). label Jul 19, 2021

danbri commented Aug 16, 2022

Nearby: #3162


cguess commented Jan 10, 2023

@danbri has there been any movement on this? As we at Duke (tagging @joelwluther) begin to implement tooling based on MediaReview, some of these concerns are becoming more pressing.

Specifically, accessedOnUrl appears in your example here; however, it has not been added to the official Schema.org page.

Please let us know how best to proceed, and what steps the community can take to help make this happen.

@danbri danbri added the Queued for Editorial Work Editor needs to turn issues/PRs into final code and release notes. label Jan 10, 2023