Wiki Media Cred

Metaphactory dataviz of news-media subclasses in Wikidata

[drafting…]
Project Goal: Turn Wikimedia into a news-site credibility tool.

These are resources and a data diary for the WikiCred/Iffy.news project, which adds news-site credibility indicators found in external databases into Wikidata/Wikipedia. The external data I had was mostly U.S. and English-language. Those with news-media data for other countries and languages may find this repo helpful.

The following workflow came from trial and many errors in my attempts to:

  • Find news-media items in Wikidata.
  • Create new items for news-media not in Wikidata.
  • Match news-media items in Wikidata with their domain names (to relate Wikidata items with their entries in external databases).
  • Add data from external media databases into Wikidata (especially credibility indicators like press-association membership and street address).

Data dumps

Useful datasets created by this project include (more coming):

Tools used

I gathered Wikidata items with Wikidata Query Service searches (example: news media in the United States), added data with QuickStatements (example: add place of publication) and wikibase-cli, and merged Wikidata with external datasets, mostly in Google Sheets, helped by the Wikipedia and Wikidata Tools add-on for Sheets.
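For anyone scripting the Query Service step instead of using the web form, here is a minimal sketch in Python (standard library only). It assumes Q1193236 is the QID for news media (verify before use) and uses country (P17) to pick U.S. outlets; some outlets may use country of origin (P495) instead, so that choice is itself an assumption. The `wdt:P31/wdt:P279*` path matches items that are an instance of news media or of any subclass of it.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: Q1193236 is "news media". Check it on Wikidata before relying on it.
NEWS_MEDIA = "Q1193236"
UNITED_STATES = "Q30"


def news_media_query(country_qid: str = UNITED_STATES, limit: int = 100) -> str:
    """Build a SPARQL query for instances of news media (or any subclass of it)
    in one country, with their official website (P856) when present."""
    return f"""
SELECT ?item ?itemLabel ?website WHERE {{
  ?item wdt:P31/wdt:P279* wd:{NEWS_MEDIA} ;
        wdt:P17 wd:{country_qid} .
  OPTIONAL {{ ?item wdt:P856 ?website . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
"""


def run_query(query: str, user_agent: str) -> list[dict]:
    """Send a query to the Wikidata Query Service and return the result rows.
    WDQS asks clients to identify themselves with a descriptive User-Agent."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params, headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Each returned row is a dict such as `row["item"]["value"]` (the item's URI); the User-Agent string you pass is up to you.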

After starting over several times, I remembered my betters taught me to make each step replicable and reversible, so I could back out of any import mess I made. To do this, I usually added a column with a sortable flag indicating the source of the imported data, to track where things like circulation estimates and domain names came from. As they (should) say in the tech world: Move slow and fix things.
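The flag-column habit translates directly to code. This is a minimal sketch, not the spreadsheet workflow itself: rows are plain dicts, and the `<field>_source` column name and flag strings are hypothetical. It fills a field only where it is empty (a design choice, so one import never silently overwrites another), then backs out a single import by its flag.

```python
def merge_with_flag(sheet: list[dict], imported: list[dict],
                    key: str, field: str, flag: str) -> list[dict]:
    """Copy `field` from imported rows into sheet rows matched on `key`,
    recording `flag` in a companion '<field>_source' column."""
    by_key = {row[key]: row for row in imported}
    for row in sheet:
        match = by_key.get(row[key])
        if match and not row.get(field):  # only fill blanks, never overwrite
            row[field] = match[field]
            row[f"{field}_source"] = flag
    return sheet


def back_out(sheet: list[dict], field: str, flag: str) -> list[dict]:
    """Undo one import: clear `field` wherever its source flag equals `flag`."""
    for row in sheet:
        if row.get(f"{field}_source") == flag:
            row[field] = ""
            row[f"{field}_source"] = ""
    return sheet
```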

How Wikidata thinks

Wikidata stores structured data used in Wikipedia and other Wikimedia projects. It's a collection of entries for Items, "all the things in human knowledge, including topics, concepts, and objects." Each Item has its own page, URL, and unique QID (Q + a number).

Property values

The Denver Post (Q2668654) is an item. It has a label (its name), a QID, a short description ("daily newspaper in Denver, Colorado"), and aliases (alternative names: "Denver Post | denverpost.com"). Those are followed by a list of Statements about the item. Statements have a Property (P + a number) and a Value (in that property's specified data type):

Property | Value | Data type
--- | --- | ---
instance of (P31) | daily newspaper (Q1110794) | Item
inception (P571) | 1892 | Point in time
official website (P856) | https://www.denverpost.com/ | URL

News media often have a separate list of statements under the heading Identifiers. Those properties have a data type called External identifier, for example, Facebook ID (P2013) and ISSN (P236), the International Standard Serial Number.
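The same label/statement structure is visible in each item's JSON at `https://www.wikidata.org/wiki/Special:EntityData/<QID>.json`. A minimal sketch of reading it: statements live under `claims`, each with a `mainsnak` whose `datatype` tells ordinary statements apart from the ones shown under Identifiers (`external-id`). Only item, time, and string values are reduced here; other value types are returned as-is.

```python
import json
import urllib.request

ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def fetch_entity(qid: str, user_agent: str) -> dict:
    """Download one item's JSON and return its entity record."""
    request = urllib.request.Request(
        ENTITY_URL.format(qid=qid), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["entities"][qid]


def simple_value(snak: dict):
    """Reduce a snak's datavalue to a plain value (QID, time string, or string)."""
    if snak.get("snaktype") != "value":
        return None  # "novalue" and "somevalue" snaks carry no datavalue
    value = snak["datavalue"]["value"]
    if isinstance(value, dict):
        if "id" in value:    # item values, e.g. {"id": "Q1110794", ...}
            return value["id"]
        if "time" in value:  # time values, e.g. {"time": "+1892-00-00T00:00:00Z", ...}
            return value["time"]
    return value             # URLs, external IDs and other plain strings


def statements(entity: dict, external_ids: bool = False) -> dict[str, list]:
    """Map each property to its values, choosing either ordinary statements
    or the Identifiers section (mainsnak datatype "external-id")."""
    result = {}
    for prop, claims in entity.get("claims", {}).items():
        values = [
            simple_value(c["mainsnak"])
            for c in claims
            if (c["mainsnak"].get("datatype") == "external-id") == external_ids
        ]
        if values:
            result[prop] = values
    return result
```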

Class consciousness

An item isn't always one thing. It can be a concept: a Class of things in a hierarchy, with one item being a subclass of (P279) another. Each of these newsy items is a subclass of the one to its left:

media ➡️ mass media ➡️ news media ➡️ written news media ➡️ newspaper ➡️ daily newspaper
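Following subclass of (P279) links upward is what decides whether an item counts as news media. A minimal sketch of that walk, with labels standing in for QIDs (a simplification). Real classes can have several parents, so the sketch treats the hierarchy as a graph rather than a single chain.

```python
from collections import deque

# The chain above as child -> parents; labels stand in for QIDs.
SUBCLASS_OF = {
    "daily newspaper": ["newspaper"],
    "newspaper": ["written news media"],
    "written news media": ["news media"],
    "news media": ["mass media"],
    "mass media": ["media"],
}


def superclasses(cls: str, subclass_of: dict[str, list[str]]) -> set[str]:
    """All classes reachable from `cls` by following subclass of (P279) links."""
    seen = set()
    queue = deque(subclass_of.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # also guards against cycles
            seen.add(parent)
            queue.extend(subclass_of.get(parent, []))
    return seen


def is_instance_of(item_classes: list[str], target: str,
                   subclass_of: dict[str, list[str]]) -> bool:
    """True if the item is an instance of `target` or of any subclass of it,
    the same test SPARQL expresses as wdt:P31/wdt:P279*."""
    return any(c == target or target in superclasses(c, subclass_of)
               for c in item_classes)
```

An outlet whose only class is outside the tree (say, news program before it was linked) fails `is_instance_of(..., "news media", ...)` until that class gets its own P279 link into the hierarchy.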

Coordinate categories

For this project, it was convenient to have every news-media outlet in Wikidata be an instance of news media, or an instance of a subclass of news media. Most already were. But some news outlets weren't showing up because they were instances of classes that weren't in the news media hierarchy (e.g., investigative journalism (Q1127717), news program, news magazine). I fixed those by adding a statement making each class a news media subclass. (Check this network chart in Wikidata Graph Builder.)

mindmap
	root((news media))
		id(news agency)
			news photo agency
			newswire
		id(newsletter)
			municipal newsletter
			night letter 1>
			school newsletter
			stock exchange newsletter
			Wikimedia newsletter
		id(investigative journalism)	
		id(medical press)	
		id(news program)	
			current affairs shows
			flagship newscast
			television news magazine
			United States cable news
		id(news broadcasting)	
			children's broadcasted news
			current affairs
			election broadcast
			reporting television program
		id(news magazine)
		id(news media in the United States)
		id(news website)
			fake news website
			news aggregation website 2>
			online newspaper
			sports news website
			television news website
			video game news website 1>
		id(talk radio)
			conservative talk radio
			Internet talk radio
			Progressive talk radio
		id(press center)	
		id(written news media)
			Cooperative press
			Famille de presse
			newspaper 202>
		id(women's press)

Subclasses of news media, 2 levels down
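A chart like this can be regenerated when the classes change. This is a minimal sketch that renders a two-level tree (a plain dict, however you gathered it) as Mermaid mindmap source, reusing the `id(...)` node convention of the chart above and keeping the dict's order. It is illustrative, not the tool that produced the original chart.

```python
def mindmap(root: str, tree: dict[str, list[str]]) -> str:
    """Render root -> subclasses -> sub-subclasses as Mermaid mindmap text."""
    lines = ["mindmap", f"\troot(({root}))"]
    for child, grandchildren in tree.items():
        lines.append(f"\t\tid({child})")
        for grandchild in grandchildren:
            lines.append(f"\t\t\t{grandchild}")
    return "\n".join(lines)
```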

[@Todo: Briefly explain diff btwn instance and subclass] A few news outlets were instances of items that should have been news-media subclasses but weren't (e.g., news program and news magazine). I brought them into the fold (i.e., made each a news media subclass, or a subclass of a news media subclass).
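Bringing a class into the fold is one statement per class, which suits a QuickStatements batch. A minimal sketch, assuming the tab-separated QuickStatements V1 format and assuming Q1193236 is news media (verify it). As I understand the tool, a leading `-` removes a statement, so each batch gets a matching undo batch, in keeping with the reversible-steps rule.

```python
# Assumption: Q1193236 is "news media"; verify the QID before running a batch.
NEWS_MEDIA = "Q1193236"


def subclass_batch(class_qids: list[str], parent: str = NEWS_MEDIA) -> str:
    """QuickStatements V1 lines adding subclass of (P279) `parent` to each class."""
    return "\n".join(f"{qid}\tP279\t{parent}" for qid in class_qids)


def undo_batch(batch: str) -> str:
    """Mirror a batch with removal lines (leading '-') to back it out later."""
    return "\n".join("-" + line for line in batch.splitlines() if line)
```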

The classification wrangling went something like this:

  1. Get all news outlets under one general category: news media.
  2. Get subclasses into logical categories (one or two levels down).
  3. Change specific news outlets improperly assigned subclass of (P279) to instance of (P31).
  4. Label unlabeled subclasses (one or two levels down), consulting the item's Wikipedia article or official website for the best name.
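Steps 3 and 4 start with to-do lists. A minimal sketch, assuming each subclass arrives as a query row with hypothetical keys `qid`, `label`, and `website`. The outlet test is a heuristic (my assumption, not a Wikidata rule): a "class" with an official website (P856) is probably a specific outlet. The unlabeled test relies on the Query Service label service falling back to the bare QID when no label exists.

```python
def triage_subclasses(subclasses: list[dict]) -> tuple[list[str], list[str]]:
    """Return (likely outlets misfiled as classes, unlabeled classes)."""
    likely_outlets = [s["qid"] for s in subclasses if s.get("website")]
    unlabeled = [s["qid"] for s in subclasses
                 if not s.get("label") or s["label"] == s["qid"]]
    return likely_outlets, unlabeled


def move_to_instance(qid: str, parent: str) -> str:
    """QuickStatements V1 lines: drop subclass of (P279), add instance of (P31)."""
    return f"-{qid}\tP279\t{parent}\n{qid}\tP31\t{parent}"
```

Every flagged item still deserves a human look before the batch runs.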

Match domains

[@Todo: Describe ways to find and confirm each outlet's website URL and domain name, and add both to news-media items.]
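Matching on domains needs both sides normalized the same way. A minimal sketch using only `urllib.parse`: it lowercases the host and strips a leading `www.`, a simplification that ignores other subdomains. The QID-to-URL mapping is assumed to come from official website (P856) values.

```python
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Normalize a website URL (or bare domain) to a lowercase domain."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def match_by_domain(wikidata_sites: dict[str, str],
                    external_domains: list[str]) -> dict[str, str]:
    """Pair external-database domains with QIDs, given QID -> official website."""
    index = {domain_of(url): qid for qid, url in wikidata_sites.items()}
    return {d: index[domain_of(d)] for d in external_domains
            if domain_of(d) in index}
```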

Crowd-wisdom classes

[@Todo: Briefly explain: Find out which properties WD folk use most often for news-media items. Then go with the wiki-crowd wisdom in deciding which property/class to use.]
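Finding the crowd's favorite property is a counting job. A minimal sketch over entity records shaped like the Special:EntityData JSON (only the `claims` keys matter here): it counts each property once per item, then picks the most-used candidate, with ties going to the first one listed.

```python
from collections import Counter


def property_usage(entities: list[dict]) -> Counter:
    """Count how many items use each property at least once."""
    counts = Counter()
    for entity in entities:
        counts.update(entity.get("claims", {}).keys())
    return counts


def crowd_choice(counts: Counter, candidates: list[str]) -> str:
    """Pick the candidate property the most items already use."""
    return max(candidates, key=lambda prop: counts[prop])
```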

Put publications in their place

[@Todo: Briefly explain: The city was most often given as place of publication (P291), but sometimes as headquarters location (P159), location (P276), and/or located in the administrative territorial entity (P131). Done: Add place of publication to all news media. Todo: Add street address (P6375) (use the format in the property's example: street, city, state, zip).]
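A minimal sketch of both tasks, with claims simplified to a mapping of property to QID list. It looks for a city in the priority order the note describes, joins address parts in the street, city, state, zip order given above (check the property's own example for exact punctuation), and writes a QuickStatements V1 line assuming P6375 takes monolingual text in the `lang:"text"` form.

```python
from typing import Optional

# Place of publication first, then the other properties editors used.
PLACE_PROPERTIES = ("P291", "P159", "P276", "P131")


def publication_place(claims: dict[str, list[str]]) -> Optional[str]:
    """Return the first place QID found, trying properties in priority order."""
    for prop in PLACE_PROPERTIES:
        values = claims.get(prop)
        if values:
            return values[0]
    return None


def street_address(street: str, city: str, state: str, zip_code: str) -> str:
    """Join address parts in the street, city, state, zip order."""
    return ", ".join([street, city, state, zip_code])


def address_statement(qid: str, address: str, lang: str = "en") -> str:
    """QuickStatements V1 line; assumes P6375 is monolingual text."""
    return f'{qid}\tP6375\t{lang}:"{address}"'
```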

Preparing for the End Times

[@Todo: Briefly explain: The date a publication ceased was in dissolved, abolished or demolished date (P576) statements 90% of the time, with the rest as end time (P582). Done: Copy all dates in end time into dissolved… (with precision: day, month, or year).]
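Copying an end time while keeping its precision can be scripted. A minimal sketch, assuming the input is the time `value` object from an item's JSON (it carries `time` and `precision`) and that QuickStatements takes precision after a slash, where 9 = year, 10 = month, 11 = day. Anything finer or coarser is rejected for a human to check.

```python
def end_time_to_dissolved(qid: str, time_value: dict) -> str:
    """Turn an end time (P582) value into a QuickStatements V1 line for P576."""
    precision = time_value["precision"]
    if precision not in (9, 10, 11):  # year, month, day
        raise ValueError(f"unexpected precision {precision} for {qid}")
    return f"{qid}\tP576\t{time_value['time']}/{precision}"
```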

Members only

[@Todo: Briefly explain: Membership in a press association was almost always member of (P463) but a few times affiliation (P1416).]
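Because membership shows up under two properties, a check has to read both. A minimal sketch with claims simplified to a mapping of property to organization QIDs. The second function is a hypothetical helper listing organizations recorded only as affiliation, candidates for a member of statement if you follow the crowd's preference.

```python
MEMBERSHIP_PROPERTIES = ("P463", "P1416")  # member of, affiliation


def memberships(claims: dict[str, list[str]]) -> list[tuple[str, str]]:
    """List (property, organization QID) pairs from both properties."""
    return [(prop, org) for prop in MEMBERSHIP_PROPERTIES
            for org in claims.get(prop, [])]


def affiliation_only(claims: dict[str, list[str]]) -> list[str]:
    """Organizations given as affiliation (P1416) but not member of (P463)."""
    members = set(claims.get("P463", []))
    return [org for org in claims.get("P1416", []) if org not in members]
```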