Schema.org should have mappings to Wikidata terms where possible #280
Comments
Notes from IRC,
|
Here is how mapping can be done on the Wikidata side for example: https://www.wikidata.org/wiki/Property:P31 The JSON dumps are the best dumps. |
+1 |
happy to help here a little! I had the chance to meet a few people from the Wikidata crew during 31C3, and I remember that serving Turtle also needs some fixing... but it already uses schema.org quite a lot! $ curl http://www.wikidata.org/entity/Q80 -iL -H "Accept: text/turtle" |
I went looking for the code that generates this. For those without Turtle tooling, here is an excerpt from running curl http://www.wikidata.org/entity/Q42 -iL -H "Accept: text/turtle" (the full response is at https://gist.github.com/danbri/66616096d42e595376f6 ). [update] Hmm, actually you can get it all in the browser without using content negotiation, just via suffixes:
(edit! I have moved a big chunk of text to https://gist.github.com/danbri/181ff7763f479c397e10 - apologies to those who got accidental notifications due to the '@' symbol.) This is great, but it is unfortunately also "the easy part", in that these are fixed built-in properties that every Wikidata entry will always carry. Looking around for relevant source code,
It would be interesting to see how addEntityMetaData might be amended to exploit equivalentProperty information in Wikidata, as @lydiapintscher mentioned re https://www.wikidata.org/wiki/Property:P31 |
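The suffix-based access mentioned above can be sketched as a small helper. The URL pattern below (Wikidata's Special:EntityData page plus a format suffix) is my reading of how those suffix URLs work; verify it against the live service before relying on it.

```python
# Sketch: building Wikidata entity-data URLs via format suffixes,
# avoiding content negotiation entirely. The Special:EntityData
# pattern is an assumption worth checking against current Wikidata.

def entity_data_url(entity_id: str, fmt: str = "ttl") -> str:
    """Return a Special:EntityData URL for the given entity and format."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.{fmt}"

print(entity_data_url("Q42"))          # Turtle
print(entity_data_url("Q42", "json"))  # JSON
```

These URLs can then be fetched with plain curl, no Accept header needed.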
I agree: "Schema.org should have mappings to Wikidata terms where possible". How do we vote? Or how can we collaborate and/or check work in progress? Is there a link to related work on this issue? |
@danbri please remember to fence code snippets with three backticks, which can also include a clue for syntax highlighting;
also see the code tab in the Examples section of the GitHub markdown guide https://guides.github.com/features/mastering-markdown/#examples |
@ppKrauss I think people would appreciate more machine readable mappings using owl:equivalentProperty etc. Line 5706 in d370e33
IMO we could consider everything from subset of OWL used by RDFa Vocabulary Entailment |
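To illustrate what a machine-readable mapping in schema.rdfa's RDFa markup might look like, here is a hypothetical sketch; the exact markup pattern used by the real file should be checked, and P569 ("date of birth") is simply a plausible Wikidata counterpart chosen for illustration.

```html
<!-- Hypothetical sketch of a mapping entry in schema.rdfa; verify the
     markup pattern against the actual file. P569 is Wikidata's
     "date of birth" property. -->
<div typeof="rdf:Property" resource="http://schema.org/birthDate">
  <span property="rdfs:label">birthDate</span>
  <link property="owl:equivalentProperty"
        href="http://www.wikidata.org/entity/P569"/>
</div>
```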
@elf-pavlik thanks (!), so the issue now is only to add something along those lines. New suggestion: we could collaborate via an online interface or (initially) via a spreadsheet (e.g. Excel) on GitHub, with the columns wikidataID and Property, or wikidataID and Class. |
Why not add it directly in Wikidata? |
@lydiapintscher , perhaps I am not understanding your point, sorry... The objective of this issue is to map Schema.org's definitions onto Wikidata.org's concept definitions, not the inverse. |
Both should happen, no? ;-) |
@lydiapintscher , I think it is a matter of scope. You can imagine Wikidata as an (external and closed) dictionary, like Webster, rather than an open project like Wikipedia. |
Wikidata is just as open as Wikipedia. |
Peter, 22/02/2015 18:39:
Does such a thing exist? |
@lydiapintscher once schema.org URIs have mappings to Wikidata URIs added, do you see a way to add them to Wikidata in a programmable way? IMO it doesn't make sense to do it manually via the web UI... maybe the Wikidata team could just import them from schema.rdfa? BTW I'll stay most of March around Berlin and could meet IRL with you and anyone else from Wikidata interested in this issue... Whenever in Berlin I go anyway to #OKLab / CodeForBerlin every Monday evening at Wikimedia HQ 😄 (we can discuss details over PM - just see my GH profile) |
I am trying (with my bad English) to consolidate this issue into a draft proposal; can you help? A next step will be to create a "Foundations" section collected from the comments posted in this discussion:
PROPOSAL OF ISSUE #280 - Proposal to enhance the schema.rdfa definition descriptors. (A sibling project at Wikidata will be the Wikidata.org-to-Schema.org mapping.) PART 1 - Schema.org mapping to Wikidata. Actions: add
Actions in the testing phase: do some mappings with no automation. Example: start with the classes Person and Organization, and their properties. Examples
PART 2 - Wikidata mapping to Schema.org ... under construction... see similar mappings at schema.rdfs.org/mappings.html... Wikidata also has a lot of initiatives mapping Wikidata to external vocabularies (e.g. there is a mapping from Wikidata to the BNCF Thesaurus)... |
@lydiapintscher , sorry again... I had not seen that there is also a proposal for a "sibling project at Wikidata" (!)... Can you please check whether my "draft of this proposal" text is now on the rails? I am trying to "translate" and consolidate all comments into one document... so that we all start with the same scope, objectives, etc. |
@danbri , @elf-pavlik , and others, I don't understand whether there is a "formal procedure for creating proposals" here... Can you please check whether my "draft of this proposal" text is now on the rails? I need your help to "translate" and consolidate it. About automation, I still don't understand well - what do you want to automate? |
@ppKrauss thanks for trying to summarize this thread into a proposal!
please don't confuse owl:equivalentClass with owl:equivalentProperty; if you look at schema.rdfa we need to use them accordingly.
for the automation: once we map one way, schema.org -> Wikidata (however we manage to do it), we can then automate importing most of that mapping into Wikidata, so no one needs to click and copy&paste... Last but not least, schema.org just started using GitHub recently and also seems to be going through various other process changes, so I would encourage you to stay patient and give people time to reply 😄 |
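The equivalentClass/equivalentProperty distinction above can be illustrated with a short Turtle sketch. The specific Wikidata targets (Q5, P569) are illustrative choices on my part, not mappings taken from schema.rdfa:

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix schema: <http://schema.org/> .
@prefix wd:     <http://www.wikidata.org/entity/> .

# Classes map with owl:equivalentClass ...
schema:Person    owl:equivalentClass    wd:Q5 .     # Q5 = "human" (illustrative)

# ... while properties map with owl:equivalentProperty.
schema:birthDate owl:equivalentProperty wd:P569 .   # P569 = "date of birth"
```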
Thanks all. Indeed I'm on a trip and can't currently give this the [...] How about we just jump into the details and start a spreadsheet with a [...] On Mon, 23 Feb 2015 09:06 ☮ elf Pavlik ☮ notifications@github.com wrote:
|
@elf-pavlik thanks (!), I edited with your correction (and am now copying it also to my issue 280 "ahead of work" :-) @danbri OK, I sent it to this Google Doc and updated my #352 with the tool that generates the spreadsheet. @elf-pavlik and @danbri , no urgency (!). As a novice here, I am experimenting/testing the collaboration possibilities and studying Schema.org as a project... Now that I have a better "schema.org big picture", I see good work (!) by the moderators and a vibrant community. My only help/clue about "better GitHub use" is at #352, and perhaps it is still a little messy. Returning to the spreadsheet: there are ~1500 items (!)... A good starting point is the classes Person and Organization; the "vCard semantics" are the most used on the Web, http://webdatacommons.org/structureddata/index.html#toc2 so I am starting to work with them (Person and Organization)... Is that OK as a starting point? |
Thanks. Yes, starting with the most general / common types makes sense. Where I got stuck: I could not figure out a good programmatic way to access [...] Maybe there is a way to take the JSON dumps, load them into some [...] Nearby: https://gist.github.com/chrpr/23926c4650ce4363c51b dumps DBpedia's vocab (not Wikidata, but worth a look for comparison) |
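Working with the JSON dumps programmatically is straightforward once you treat each line as one entity document. The sketch below assumes the documented dump layout (a JSON array with one entity per line, labels keyed by language); re-check the field names against the current dumps before using it.

```python
import json

# Sketch: parse one line of the Wikidata JSON dump (each line inside
# the enclosing "[" ... "]" is one entity, with a trailing comma) and
# pull out the entity id and English label. Field names are based on
# the documented dump format and should be verified.
def summarize_entity(line: str) -> dict:
    entity = json.loads(line.rstrip(",\n"))
    labels = entity.get("labels", {})
    return {
        "id": entity["id"],
        "label": labels.get("en", {}).get("value"),
    }

sample = '{"id": "Q42", "type": "item", "labels": {"en": {"language": "en", "value": "Douglas Adams"}}},'
print(summarize_entity(sample))
```

Streaming the dump line by line like this avoids loading the multi-gigabyte file into memory at once.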
Wikidata provides RDF dumps here: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20150126/ It is easy to get the classes from the wikidata-taxonomy dump, but it needs to be joined with the wikidata-terms dump to get the labels. For properties you can use the wikidata-properties dump. If you want something more fine-grained you can try the WDTK (Wikidata Toolkit). Or create a DBpedia extractor; we have experimental support for Wikidata in this branch: RDF dumps can be directly loaded into a SPARQL endpoint, or easily manipulated in CLI/code and loaded into any store. |
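Once the dumps are loaded into a SPARQL endpoint, the taxonomy-plus-labels join described above might look like the sketch below. It is written against the current Wikidata RDF model (wdt:P279, rdfs:label); the 2015 dump exports used their own vocabulary, so the predicate names would need adjusting for those files.

```sparql
# Sketch of the taxonomy/labels join; predicates assume the current
# Wikidata RDF model, not the 2015 export vocabulary.
SELECT ?class ?classLabel WHERE {
  ?class wdt:P279 ?parent .            # P279 = "subclass of" (taxonomy dump)
  ?class rdfs:label ?classLabel .      # labels (terms dump)
  FILTER(LANG(?classLabel) = "en")
}
LIMIT 100
```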
Thank you, Thad @thadguidry. One could make a case for @ppKrauss Peter's advice - the sprint concept. An alternative might be a more formal divide-and-conquer approach. Thad has developed a process and is using tools. Perhaps Thad could document that process and how to use the tools (OpenRefine is well documented, so a light overview may be all that is needed). If we divided the work into buckets (e.g. A-C, D-G, H-...) and applied Thad's process/tools, we might get a consistent, repeatable result from a group with the same goal - map the two vocabularies. I'm not sure how to organize the buckets, but I bet Thad has an idea. I will sign up for a bucket with a reasonable 'volume'. I could dedicate 10-20 hours to this project if others made a comparable commitment. |
@jaygray0919 That's the idea. I'm waiting on my team in OpenRefine to get a good tool out for the community to handle this long term and much more easily. And yes we would be providing documentation and tutorials for this process. Here's the full docs for Reconciling https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation but we're going to make a simpler, separate tutorial just for mapping Schema.org -> Wikidata. |
@thadguidry good. we'll invest 10-20 hours in an organized 'stone soup' project (https://en.wikipedia.org/wiki/Stone_Soup). send up a flare when we should begin work on our 'bucket.' |
My proposal for a Wikidata property for IPTC subject codes has stalled, over the question of whether to have a property for each type of IPTC code, or just one over-arching property for all of them: https://www.wikidata.org/wiki/Wikidata:Property_proposal/IPTC_subject_code More views/ arguments there would help. |
@pigsonthewing Added my Support comment to the Wikidata Property proposal. |
UPDATE: IPTC Newscode property is now live on Wikidata. Gotta get mapping now ! ;-) Thanks so much to Andy @pigsonthewing for making that happen ! |
@thadguidry I'm interested in getting an exhaustive list of mappings between the IPTC Media Topics codes and Wikidata. Running a simple query using the new P5429 property yields 74 results that actually mix up different code taxonomies from IPTC (which is fine). Are you aware of any Phabricator task that aims to collect mappings between Wikidata and IPTC (Media Topics) codes? Are you working on this yourself? |
@rtroncy what do you mean by mixing ? Can you give me 1 example here so I can see ? |
@thadguidry Click here: some Wikidata entities are mapped with "subject codes" (now deprecated), some with "audio codec", some with "media topics", some with "product genre", etc. ... all of those are different code lists maintained (and sometimes deprecated) by IPTC, and the mapping is done with the sole P5429 property. Fair enough, we can then filter if we want just the mappings to a specific code list, but this is what I meant by "mixing". |
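The per-code-list filtering mentioned above could be done with a string filter on the P5429 value. The "medtop" prefix below is an assumption about how Media Topics codes are stored (the thread shows values like "spct/sport" for subject codes), so check the actual value format first:

```sparql
# Sketch: restrict P5429 (IPTC NewsCode) mappings to one code list.
# The "medtop" prefix for Media Topics values is an assumption.
SELECT ?item ?itemLabel ?code WHERE {
  ?item wdt:P5429 ?code .
  FILTER(STRSTARTS(STR(?code), "medtop"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```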
@rtroncy Ah, well, that's not mixing; that is called N:1 mapping. That is on purpose. "architecture", as a separate IPTC subject and media code, is still the concept of "architecture" in Wikidata. Having IPTC codes mapped to the correct Wikidata entity is a "good thing"; even when those IPTC codes have been deprecated, some old dataset can still be made useful against Wikidata, because we took the time to map them for posterity's sake. |
Note that I'm not criticizing! I'm also all for doing mappings even for deprecated terms, for the same exact reasons you give. My comment was just that a single property (P5429) is used for doing N:1 mapping between Wikidata and different code lists whose only common trait is being published by IPTC. One could have imagined minting different properties for mappings to subject codes, media topics, etc., but perhaps this would have been overkill. This brings me to my original question: who is working on this at the moment? If no one is, I may propose a mapping between Wikidata and the full Media Topics thesaurus. |
@pigsonthewing and I decided that having multiple Wikidata properties was not needed, so we instead chose to go with just one and use it wisely. This reasoning is discussed in the initial Wikidata property proposal for the "IPTC Newscode" here: https://www.wikidata.org/wiki/Wikidata:Property_proposal/IPTC_subject_code I'm currently working on it through OpenRefine reconciling. There are 1186 concepts that need to be mapped in total, and it's not a completely automatic thing... human judgement has to be applied. And IPTC sometimes uses crazy definitions, like "food" at http://cv.iptc.org/newscodes/mediatopic/20000248 - "Selling goods for human consumption to the human end user" - which makes you immediately think they mean "food industry"; but looking at the broader concept they have mapped it under, "Consumer Goods - Items produced for and sold to individuals", it's clear they really did mean "food", even though it has a food-industry definition! Don't you just love vocabulary "opinions" and trying to map someone's state of mind at the time? lololol |
Indeed! We need to rely on human judgement, and the short/vague definitions of some terms do not make life easy. Do you need help with the task? Is your OpenRefine project local, or can we work on it collaboratively? By when do you expect to complete the mappings if you work on them solely? (And thanks a lot for doing this on the side of a real job :-) |
May I ask where these Wikidata / IPTC NewsCodes mappings come from? The IPTC Media Topics are the still-maintained taxonomy of IPTC, and it has a top-level sports concept with a mapping to Wikidata: http://cv.iptc.org/newscodes/mediatopic/15000000 - but the Wikidata query referred to by @rtroncy ("Click here") shows a spct/sport for Q349 (Sport). From IPTC's point of view it would be great to use these roughly 100 Media Topic mappings, as they have been done with a lot of human judgement by our team of taxonomists. (Note: the IPTC Subject Codes still exist but are not maintained anymore; the Media Topics are the successor.) |
@rtroncy I am able to do about 10% a day and just started this week (I was waiting on an OpenRefine bug to get fixed by my team), so I should be done in about another week. @nitmws The approx. 75 mappings that you see currently in Wikidata come from me. Once I am done, hopefully next week, I'll bulk-upload into Wikidata from OpenRefine. And yes, I am aware of the mappings done by the IPTC team themselves in the IPTC NewsCodes and am looking at them also when they show up. The task, however, is to get more Linked Data into Wikidata from lots of vocabularies. After I'm done, your IPTC team is welcome to query them and then raise any quality issues for discussion on the Wikidata talk pages... but let's wait until I'm done next week. |
UPDATE: 70% done with IPTC mapping. |
Hi @thadguidry and all, I'm the new MD of IPTC, taking over from Michael who retired a few weeks ago (but will no doubt still stay involved in some projects). It's great to see you working on this, and we'll be happy to take a look at your mappings when they're done (or before that if you want to share what you've done so far). Also if you could flag up with us when you see a label or description that looks a bit strange, that would be great - we're in the process of reviewing the Media Topics vocabulary and would welcome any pointers to entries that look wrong. Thanks again for your work! |
@bquinn Sure thing, Brendan. I've been keeping a few notes, suggestions for simple tweaks, and deeper discussion issues. For starters, to get you going on the simple stuff, there are a few topic names that ideally should have "and" replaced with "or", like these: mediatopic/20000044 mediatopic/20000043 mediatopic/20000147 |
UPDATE: 90% done with IPTC mapping. |
UPDATE: 100% done with IPTC mapping. Edits being uploaded now from my OpenRefine instance, should be done in another 15 minutes. ;-) |
Great, congratulations! I look forward to going through your mappings with the NewsCodes WG, we'll let you know if we spot any issues. |
Dear all. I would be interested to take a look at the possibilities of mapping between the properties of schema.org and those of Wikidata. Can I use this file to do some testing, or is there a more complete and up-to-date list? |
That is the current release version of the file that contains the definitions of
properties within the vocabulary. For the always-up-to-date version use:
https://schema.org/version/latest/all-layers-properties.csv
For types within the vocabulary you will also need:
https://schema.org/version/latest/all-layers-types.csv
Check out the other format files, which contain all values in one file, here:
<https://schema.org/docs/developers.html#defs>.
~Richard
Richard Wallis
Founder, Data Liberate
http://dataliberate.com
Linkedin: http://www.linkedin.com/in/richardwallis
Twitter: @rjw
|
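The release CSVs mentioned above are easy to scan for mapping gaps. The sketch below works on a small inline sample shaped like the file; the column names ("id", "label", "equivalentProperty") are my assumptions about the layout, so check the header row of the actual download.

```python
import csv
import io

# Sketch: find schema.org properties that still lack an
# equivalentProperty mapping in an all-layers-properties.csv-style
# file. The column names here are assumptions about the file layout.
sample_csv = """id,label,comment,equivalentProperty
http://schema.org/birthDate,birthDate,Date of birth.,
http://schema.org/name,name,The name of the item.,
"""

def unmapped_properties(csv_text: str) -> list:
    """Return property ids whose equivalentProperty column is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["id"] for row in reader if not row["equivalentProperty"]]

print(unmapped_properties(sample_csv))
```

Against the real download, replace sample_csv with the file's contents to get a worklist of properties still needing Wikidata mappings.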
I made a small tool to validate mapping to external vocabularies stored on Wikidata. Here is the report for schema.org: https://tools.wmflabs.org/tptools/wd_mapping_validator.php?prefix=http%3A%2F%2Fschema.org%2F (beware, the tool does the validation on page load, so it's a bit slow). |
@Tpt Thanks so much for continuing to work on the mapping validator tool. It was a pleasure working with you offlist to get it into a better state! |
This issue is being tagged as Stale due to inactivity. |
From Lydia Pintscher in https://twitter.com/nightrose/status/558549091844886528
Update 2016-01-26 - since the original post there have been some improvements at both Wikidata and Schema.org: