
Schema.org should have mappings to Wikidata terms where possible #280

Open

danbri opened this issue Jan 23, 2015 · 189 comments
Assignee: danbri
Labels: no-issue-activity, schema.org vocab, standards + organizations, status:work expected

Comments

@danbri
Contributor

danbri commented Jan 23, 2015

From Lydia Pintscher in https://twitter.com/nightrose/status/558549091844886528

@danbri any issue to track progress on http://schema.org  mapping to Wikidata? 
Maybe even get people to help out?

Update 2016-01-26 - since the original post there have been some improvements at both Wikidata and Schema.org:

  • Wikidata: mappings (exact, super/sub) to schema.org can be expressed within Wikidata, for properties and, to a lesser extent (the notion isn't so built-in), for types.
  • Wikidata now has a SPARQL endpoint at https://query.wikidata.org, which is the most natural way of retrieving data; other explorations such as the JSON dumps below are less important now. A minimal example query follows this list.
  • Schema.org has updated its extension mechanism and is encouraging both hosted and external extensions.
  • D3-compatible RDFS JSON-LD is published from schema.org and can be used for visualization; this would also be a good model for getting an overview of Wikidata. See http://bl.ocks.org/danbri/1c121ea8bd2189cf411c for an example visualization.
  • Various notes towards using Wikidata as an extension language for Schema.org are explored towards the end of this issue, as are SPARQL queries for extracting Wikidata's structure and property metadata for use in mappings.
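As that minimal example, a sketch of pulling the property mappings from query.wikidata.org; it assumes the mappings are recorded with P1628 ("equivalent property"):

```sparql
# Sketch: Wikidata properties that declare a schema.org equivalent.
# Assumes mappings are stored with P1628 ("equivalent property").
SELECT ?prop ?propLabel ?schemaTerm WHERE {
  ?prop a wikibase:Property ;
        wdt:P1628 ?schemaTerm .
  FILTER(STRSTARTS(STR(?schemaTerm), "http://schema.org/"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```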


@danbri danbri added the schema.org vocab, status:work expected, and standards + organizations labels Jan 23, 2015
@danbri danbri self-assigned this Jan 23, 2015
@danbri danbri added this to the 2015 Q1 milestone Jan 23, 2015
@danbri
Contributor Author

danbri commented Jan 23, 2015

Notes from IRC,

@lydiapintscher

Here is how mapping can be done on the Wikidata side for example: https://www.wikidata.org/wiki/Property:P31

The JSON dumps are the best dumps.
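To illustrate, a rough Turtle sketch of how such a Wikidata-side statement surfaces in the RDF; the predicate P1628 ("equivalent property") and the schema.org target here are illustrative assumptions, not an agreed mapping:

```ttl
# Illustrative only: a Wikidata property carrying a schema.org mapping.
# P1628 ("equivalent property") and the target URL are assumptions.
@prefix wd:  <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .

wd:P31 wdt:P1628 <http://schema.org/additionalType> .
```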

@innovimax

+1

@elf-pavlik
Contributor

happy to help here a little! I had a chance to meet a few people from the Wikidata crew during 31C3, and I remember that serving Turtle also needs some fixing... but it already uses schema.org quite a lot!

$ curl http://www.wikidata.org/entity/Q80 -iL -H "Accept: text/turtle"

@danbri
Contributor Author

danbri commented Jan 25, 2015

I went looking for the code that generates this. For those without Turtle, here is an excerpt from running

curl http://www.wikidata.org/entity/Q42 -iL -H "Accept: text/turtle"

(full response is at https://gist.github.com/danbri/66616096d42e595376f6 )

[update] Hmm, actually you can get it all in the browser without using content negotiation, just via suffixes:
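For example (the Special:EntityData suffixes below follow the documented pattern; other formats may also be available):

```
http://www.wikidata.org/wiki/Special:EntityData/Q42.ttl
http://www.wikidata.org/wiki/Special:EntityData/Q42.json
http://www.wikidata.org/wiki/Special:EntityData/Q42.rdf
```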

( edit! I have moved a big chunk of text to https://gist.github.com/danbri/181ff7763f479c397e10 - apologies to those who got accidental notifications due to the '@' symbol.)

This is great but also unfortunately "the easy part" in that these are fixed built-in properties that each Wikidata entry will always carry.

Looking around for relevant source code,

It would be interesting to see how addEntityMetaData might be amended to exploit equivalentProperty information in Wikidata, as @lydiapintscher mentioned re https://www.wikidata.org/wiki/Property:P31
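The lookup such amended code would need is small; a sketch in SPARQL against the query service, again assuming P1628 ("equivalent property") carries the mapping:

```sparql
# Fetch external equivalents declared on one Wikidata property (wd:P31);
# P1628 ("equivalent property") is an assumption about where they live.
SELECT ?equiv WHERE {
  wd:P31 wdt:P1628 ?equiv .
}
```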

@ppKrauss

I agree, "Schema.org should have mappings to Wikidata terms where possible". How can we vote? How can we collaborate and/or check work in progress? Is there a link about work on this issue?

@elf-pavlik
Contributor

@danbri please remember to fence code snippets with three backticks, which can also include a hint for syntax highlighting

```ttl
# code goes here - mentions like @bg @dr @mr inside a fence don't notify
@prefix data: <http://www.wikidata.org/wiki/Special:EntityData/> .
@prefix schema: <http://schema.org/> .
# no notifications are sent for mentions using @foo
```

also see the code tab in the examples of GitHub-flavored Markdown: https://guides.github.com/features/mastering-markdown/#examples

@elf-pavlik
Contributor

@ppKrauss I think people would appreciate more machine-readable mappings using owl:equivalentProperty etc.
e.g.

<link property="owl:equivalentProperty" href="http://purl.org/dc/terms/description"/>

IMO we could consider everything from the subset of OWL used by RDFa Vocabulary Entailment
http://www.w3.org/TR/rdfa-syntax/#s_vocab_expansion

@ppKrauss

@elf-pavlik thanks (!), so the issue now is only to add something like
<link property="owl:equivalentProperty" href="http://WikiDataURL"/>
in each rdf:Property and each rdfs:Class ... is that it?

New suggestion: we could collaborate via an online interface or (initially) via a spreadsheet (e.g. Excel) on GitHub, with the columns wikidataID and Property, or wikidataID and Class.

@lydiapintscher

Why not add it directly in Wikidata?

@ppKrauss

@lydiapintscher , perhaps I am not understanding your point, sorry... The objective of this issue is to map Schema.org's definitions to Wikidata.org's concept definitions, not the inverse.

@lydiapintscher

Both should happen, no? ;-)

@ppKrauss

@lydiapintscher , I think it is a matter of scope. You can imagine Wikidata as an (external and closed) dictionary, like Webster, not as an open project like Wikipedia.

@lydiapintscher

Wikidata is just as open as Wikipedia.

@nemobis

nemobis commented Feb 22, 2015

Peter, 22/02/2015 18:39:

wikipedia.org concept definitions

Does such a thing exist?

@elf-pavlik
Contributor

@lydiapintscher once schema.org URIs have mappings to Wikidata URIs added, do you see a way to add them to Wikidata in a programmatic way? IMO it doesn't make sense to do it manually via the web UI... maybe the Wikidata team could just import them from schema.rdfa?

BTW I'll stay around Berlin for most of March and could meet IRL with you and anyone else from Wikidata interested in this issue... Whenever in Berlin, I go to #OKLab / CodeForBerlin every Monday evening at Wikimedia HQ 😄 (we can discuss details over PM - just see my GH profile)

@ppKrauss

I am trying (with bad English) to consolidate this issue into a draft of the proposal, can you help?

A next step will be to create a Readme.md so that everybody can edit this text, perhaps with the #352 mechanism, and (phase 1) implement some examples "by hand" in schema.rdfa.


Foundations collected from comments posted in this discussion:

  1. @danbri and Lydia Pintscher's summary, "schema.org mapping to Wikidata".
  2. Technical suggestion of a "schema.org property marked as equiv to another: schema:description", @danbri.
  3. @danbri and @elf-pavlik looking for some automation ... or "how addEntityMetaData might be amended to exploit equivalentProperty information in Wikidata".
  4. ...
  5. @elf-pavlik's suggestion to add the tag <link property="owl:equivalentProperty" href="http://WikiDataURL"/> into each rdfs:Class and each rdf:Property resource definition.
    The equivalentProperty is the same as shown in the Property:P31 example of @lydiapintscher.
  6. Proposal of @ppKrauss to start at Schema.org with human work and no automation (for testing and getting started).
  7. Suggestion of @lydiapintscher to also think about Wikidata mapping to Schema.org...

PROPOSAL OF THE ISSUE #280

Proposal to enhance schema.rdfa definition descriptors (rdfs:comment) and semantics, mapping each vocabulary item to a Wikidata item.

A sibling project at Wikidata will be the Wikidata.org-to-Schema.org mapping.

PART 1 - SchemaOrg mapping to Wikidata

Actions: add <link property="{$OWL}" href="{$WikiDataURL}"/> with the correct $WikiDataURL.

  • At each rdfs:Class add the <link> tag with $OWL="owl:equivalentClass" or, when not possible, use $OWL="rdfs:subClassOf".
  • At each rdf:Property add the <link> tag with $OWL="owl:equivalentProperty" or, when not possible, use $OWL="rdfs:subPropertyOf".

Actions in the testing phase: do some with no automation. Example: start with the classes Person and Organization, and their properties.

Examples
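For instance, a hypothetical application of the actions above to one class; the markup shape approximates schema.rdfa's conventions, and Q215627 ("person") is an illustrative candidate mapping, not an agreed one:

```html
<!-- Hypothetical sketch only: the QID would need review before adoption. -->
<div typeof="rdfs:Class" resource="http://schema.org/Person">
  <span class="h" property="rdfs:label">Person</span>
  <span property="rdfs:comment">A person (alive, dead, undead, or fictional).</span>
  <link property="rdfs:subClassOf" href="http://schema.org/Thing"/>
  <link property="owl:equivalentClass" href="http://www.wikidata.org/entity/Q215627"/>
</div>
```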


PART 2 - Wikidata mapping to SchemaOrg

... under construction... see similar mappings at schema.rdfs.org/mappings.html... Wikidata also has a lot of initiatives mapping Wikidata to external vocabularies (e.g. there is a mapping from Wikidata to the BNCF Thesaurus)...

@ppKrauss

@lydiapintscher , sorry again... I had not seen that there is also a proposal of a "sibling project at Wikidata" (!)... Can you please check whether my "draft of this proposal" text is now on the rails? I am trying to "translate" and consolidate all comments into one document... so that everyone starts with the same scope, objective, etc.

@ppKrauss

@danbri , @elf-pavlik , and others, I do not understand whether there is a "formal procedure for creating proposals" here...

Can you please check whether my "draft of this proposal" text is now on the rails? I need your help to "translate" and consolidate it.


About automation, I still do not understand it well; do you want to automate?
My opinion: I think we can start with non-automated procedures, which will be useful for checking the automated ones that may be introduced later... Or for checking the "size" of the non-automated task (~1000 items!). I think that a reliable mapping needs human control.

@elf-pavlik
Contributor

@ppKrauss thanks for trying to summarize this thread into a proposal!

http://schema.org/Organization is owl:equivalentProperty to Q43229

please don't confuse owl:equivalentClass with owl:equivalentProperty

if you look at schema.rdfa, we accordingly need:

  • typeof="rdfs:Class" needs owl:equivalentClass or rdfs:subClassOf
  • typeof="rdf:Property" needs owl:equivalentProperty or rdfs:subPropertyOf

for the automation: once we map one way, schema.org -> wikidata (however we manage to do it), we can automate importing most of that mapping into Wikidata so no one needs to click and copy&paste...
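Once such a mapping table exists, here is a sketch of the kind of tab-separated batch input a bulk-editing tool such as QuickStatements could take; the direction, the QIDs, and the use of P1709 ("equivalent class") are illustrative assumptions, not agreed mappings:

```
Q43229	P1709	"http://schema.org/Organization"
Q215627	P1709	"http://schema.org/Person"
```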

Last but not least, schema.org just started using GitHub recently and also seems to be going through various other processes; I would encourage you to stay patient and give people time to reply 😄

@danbri
Contributor Author

danbri commented Feb 23, 2015

Thanks all. Indeed I'm on a trip and can't currently give this the attention it deserves, but I would try to nudge the focus towards actual mappings and away from the specific implementation details at schema.org. We will be making some changes in the site tooling to support mechanisms for extension that may be relevant here.

How about we just jump into the details and start a spreadsheet with a table of schema.org types and properties? E.g. on Google Docs...?


@ppKrauss

@elf-pavlik thanks (!), I edited in your correction (and am now copying it also to my issue-280 "ahead of work" :-)


@danbri OK, I sent it to this Google Doc and updated my #352 with the tool that generates the spreadsheet.


@elf-pavlik and @danbri , no urgency (!). As a novice here, I am experimenting with and testing the collaboration possibilities, and studying schema.org as a project ... Now that I have a better "schema.org big picture", I see good work (!) by the moderators and a vibrant community. My only help/clue about "better GitHub use" is at #352, and it is perhaps still a little messy.

Returning to the spreadsheet: there are ~1500 items (!)... A good starting point is the classes Person and Organization; the "vCard semantics" are the most used on the Web,

http://webdatacommons.org/structureddata/index.html#toc2

so I am starting to work with them (Person and Organization)... Is that OK as a starting point?

@danbri
Contributor Author

danbri commented Feb 24, 2015

Thanks. Yes, starting with the most general / common types makes sense.

Where I got stuck: I could not figure out a good programmatic way to access Wikidata's schema information in all its richness.

Maybe there is a way to take the JSON dumps and load them into some fast-access NoSQL-ish database, so that things can be searched/matched/retrieved easily?
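A rough sketch of that pipeline with jq, assuming the standard all-entities JSON dump (the filename and the P1628 claim path are assumptions):

```sh
# Stream the dump, keeping id, English label, and any P1628
# ("equivalent property") claim values; output is line-per-entity JSON
# ready for bulk-loading into a document store.
zcat wikidata-all.json.gz |
  jq -c '.[] | {id: .id,
                label: .labels.en.value,
                schemaOrg: [.claims.P1628[]?.mainsnak.datavalue.value]}'
```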

nearby: https://gist.github.com/chrpr/23926c4650ce4363c51b dumps DBpedia's vocab (not Wikidata, but worth a look for comparison)

@jimkont

jimkont commented Feb 24, 2015

Wikidata provides RDF dumps here: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20150126/

It is easy to get the classes from the wikidata-taxonomy dump, but it needs to be joined with the wikidata-terms dump to get the labels. For properties you can use the wikidata-properties dump.

If you want something more fine-grained you can try the Wikidata Toolkit:
https://github.com/Wikidata/Wikidata-Toolkit

Or create a DBpedia extractor; we have experimental support for Wikidata in this branch:
https://github.com/alismayilov/extraction-framework/tree/wikidataAllCommits

RDF dumps can be directly loaded into a SPARQL endpoint, or easily manipulated in CLI/code and loaded into any store.
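As a sketch of that last option with Apache Jena's stock CLI tools (the dump filenames and query file are illustrative):

```sh
# Bulk-load the RDF exports into a local Jena TDB store, then query it.
tdbloader --loc wikidata-tdb wikidata-taxonomy.nt.gz wikidata-terms.nt.gz
tdbquery --loc wikidata-tdb --query class-labels.rq
```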

@jaygray0919

Thank you Thad @thadguidry. One could make a case for @ppKrauss Peter's advice - the sprint concept. An alternative might be a more formal divide-and-conquer approach. Thad has developed a process and is using tools. Perhaps Thad could document that process and how to use the tools (OpenRefine is well documented, so a light overview may be all that is needed). If we divided the work into buckets (e.g. A-C, D-G, H- ...) and applied Thad's process/tools, we might get a consistent, repeatable result from a group with the same goal: map the two vocabularies. I'm not sure how to organize the buckets, but I bet Thad has an idea. I will sign up for a bucket with a reasonable 'volume'. I could dedicate 10-20 hours to this project if others made a comparable commitment.

@thadguidry
Contributor

@jaygray0919 That's the idea. I'm waiting on my team at OpenRefine to get a good tool out for the community to handle this long-term and much more easily. And yes, we would be providing documentation and tutorials for this process. Here are the full docs for reconciling: https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation but we're going to make a simpler, separate tutorial just for mapping Schema.org -> Wikidata.

@jaygray0919

@thadguidry Good. We'll invest 10-20 hours in an organized 'stone soup' project (https://en.wikipedia.org/wiki/Stone_Soup). Send up a flare when we should begin work on our 'bucket.'

@pigsonthewing

My proposal for a Wikidata property for IPTC subject codes has stalled over the question of whether to have a property for each type of IPTC code, or just one over-arching property for all of them:

https://www.wikidata.org/wiki/Wikidata:Property_proposal/IPTC_subject_code

More views/ arguments there would help.

@thadguidry
Contributor

@pigsonthewing Added my Support comment to the Wikidata Property proposal.

@thadguidry
Contributor

UPDATE: The IPTC Newscode property is now live on Wikidata. Gotta get mapping now! ;-) Thanks so much to Andy @pigsonthewing for making that happen!

@rtroncy

rtroncy commented Jul 17, 2018

@thadguidry I'm interested in getting an exhaustive list of mappings between the IPTC media topics codes and Wikidata. Running a simple query using the new P5429 property yields 74 results that actually mix up different code taxonomies from IPTC (which is fine).

Are you aware of any Phabricator task that aims to collect mappings between Wikidata and IPTC (media topics) codes? Are you working on this yourself?

@thadguidry
Contributor

@rtroncy what do you mean by mixing? Can you give me one example here so I can see?

@rtroncy

rtroncy commented Jul 17, 2018

@thadguidry Click here: some Wikidata entities are mapped with "subject codes" (now deprecated), some with "audio codec", some with "media topics", some with "product genre", etc. ... all of those are different code lists maintained (and sometimes deprecated) by IPTC, and the mapping is done with the sole P5429 property. Fair enough, we can then filter if we want just the mappings to a specific code list, but this is what I meant by "mixing".
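Such a filter might look like this; a sketch that assumes P5429 values are identifier strings prefixed with the code-list name, as in "mediatopic/20000248":

```sparql
# Items carrying an IPTC NewsCode (P5429), restricted to the
# media-topics code list by its identifier prefix.
SELECT ?item ?itemLabel ?code WHERE {
  ?item wdt:P5429 ?code .
  FILTER(STRSTARTS(?code, "mediatopic/"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```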

@thadguidry
Contributor

thadguidry commented Jul 17, 2018

@rtroncy Ah, well, that's not mixing, that is called N:1 mapping. That is on purpose. "architecture", as a separate IPTC subject and media code, is still the concept of "architecture" in Wikidata. Having IPTC codes mapped to the correct Wikidata entity is a "good thing"; even when those IPTC codes have been deprecated, some old dataset can still be made useful against Wikidata because we took the time to map them for posterity's sake.

@rtroncy

rtroncy commented Jul 17, 2018

Note that I'm not criticizing! I'm also all for doing mappings even for deprecated terms, for the same exact reasons you give. My comment was just that a single property (P5429) is used for doing N:1 mapping between Wikidata and different code lists which have in common only that they are published by IPTC. One could have imagined minting different properties for mappings about subject codes, media topics, etc. But perhaps this would have been overkill.

This brings me to my original question: who is working on this at the moment? If no one, I may propose a mapping between Wikidata and the full media topics thesaurus.

@thadguidry
Contributor

thadguidry commented Jul 17, 2018

@pigsonthewing and I decided that using many Wikidata properties was not needed, so we instead chose to go with just one and use it wisely. This reasoning is discussed in the initial Wikidata property proposal for the "IPTC Newscode" here: https://www.wikidata.org/wiki/Wikidata:Property_proposal/IPTC_subject_code

I'm currently working on it through OpenRefine reconciling. There are 1186 concepts that need to be mapped in total, and it's not a completely automatic thing... human judgement has to be applied. For instance, there are 20+ concepts of a "series" in Wikidata... and things like that, so it's a bit of a slow process, and I put about 4 hours a week into it, between my day job and volunteerism on open source projects.

[screenshot attached]

And IPTC sometimes uses crazy definitions, like "food" (http://cv.iptc.org/newscodes/mediatopic/20000248): "Selling goods for human consumption to the human end user", which makes you immediately think they mean "food industry"; but looking at the broader concept they have mapped, "Consumer Goods - Items produced for and sold to individuals", it's clear they really did mean "food", but it has a food-industry definition! Don't you just love vocabulary "opinions" and trying to map someone's state of mind at the time? lololol

@rtroncy

rtroncy commented Jul 18, 2018

Indeed! We need to rely on human judgment, and the short/vague definitions of some terms do not make life easy. Do you need help with the task? Is your OpenRefine project local, or can we work on it collaboratively? By when do you expect to complete the mappings if you're working on them solely? (And thanks a lot for doing this on the side of a real job :-)

@nitmws

nitmws commented Jul 18, 2018

May I ask where these Wikidata / IPTC NewsCodes mappings come from? The IPTC Media Topics are the still-maintained taxonomy of IPTC, and it has a top-level sports concept with a mapping to Wikidata: http://cv.iptc.org/newscodes/mediatopic/15000000 - but the Wikidata query referred to by @rtroncy ("Click here") shows a spct/sport for Q349 (Sport). From IPTC's point of view it would be great to use these roughly 100 Media Topic mappings, as they have been done with a lot of human judgement by our team of taxonomists. (Note: the IPTC Subject Codes still exist but are not maintained anymore; the Media Topics are the successor.)

@thadguidry
Contributor

thadguidry commented Jul 18, 2018

@rtroncy I am able to do about 10% a day and just started this week (I was waiting on an OpenRefine bug to get fixed by my team), so I should be done in about another week.

@nitmws The approx. 75 mappings that you see currently in Wikidata come from me. Once I am done, hopefully next week, I'll bulk-upload into Wikidata from OpenRefine. And yes, I am aware of the mappings done by the IPTC team themselves in IPTC NewsCodes and am looking at them when they show up. The task, however, is to get more Linked Data into Wikidata from lots of vocabularies. After I'm done, your IPTC team is welcome to query them and then flag any quality issues on the Wikidata talk pages for discussion... but let's wait until I'm done next week.

@thadguidry
Contributor

thadguidry commented Jul 21, 2018

UPDATE: 70% done with IPTC mapping.

@bquinn

bquinn commented Jul 24, 2018

Hi @thadguidry and all, I'm the new MD of IPTC, taking over from Michael who retired a few weeks ago (but will no doubt still stay involved in some projects).

It's great to see you working on this, and we'll be happy to take a look at your mappings when they're done (or before that if you want to share what you've done so far).

Also if you could flag up with us when you see a label or description that looks a bit strange, that would be great - we're in the process of reviewing the Media Topics vocabulary and would welcome any pointers to entries that look wrong.

Thanks again for your work!

@thadguidry
Contributor

@bquinn Sure thing, Brendan. I've been keeping a few notes, suggestions on simple tweaks, and deeper discussion issues. For starters, to get you going on the simple stuff, there are a few topic names that ideally should have "and" replaced with "or", like these: mediatopic/20000044 mediatopic/20000043 mediatopic/20000147

@thadguidry
Contributor

thadguidry commented Aug 2, 2018

UPDATE: 90% done with IPTC mapping.

@thadguidry
Contributor

thadguidry commented Aug 13, 2018

UPDATE: 100% done with IPTC mapping. Edits are being uploaded now from my OpenRefine instance; should be done in another 15 minutes. ;-)

753 IPTC NewsCodes matched to Wikidata topics.

@bquinn

bquinn commented Aug 13, 2018

Great, congratulations! I look forward to going through your mappings with the NewsCodes WG; we'll let you know if we spot any issues.

@ettorerizza

ettorerizza commented Sep 22, 2018

Dear all, I would be interested in taking a look at the possibilities of mapping between the properties of schema.org and those of Wikidata. Can I use this file to do some testing, or is there a more complete and up-to-date list?

@Dataliberate
Contributor

Dataliberate commented Sep 22, 2018 via email

@Tpt
Contributor

Tpt commented Jan 30, 2019

I made a small tool to validate mappings to external vocabularies stored on Wikidata. Here is the report for schema.org: https://tools.wmflabs.org/tptools/wd_mapping_validator.php?prefix=http%3A%2F%2Fschema.org%2F (beware, the tool does the validation on page load, so it's a bit slow).
A short explanation of what the tool does is at the end of the page.

@thadguidry
Contributor

@Tpt Thanks so much for continuing to work on the mapping validator tool. It was a pleasure working with you off-list to get it into a better state!

@github-actions

This issue is being tagged as Stale due to inactivity.
