Display Wikidata label in Wikipedia field #4382

Open
1ec5 opened this Issue Oct 2, 2017 · 29 comments

Collaborator

1ec5 commented Oct 2, 2017

The Wikipedia field should also display the label (and perhaps the description) of the item referred to by the wikidata tag, using the Wikidata API. This would help mappers verify the correctness of a Wikidata tag at a glance.
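For illustration only, here is a minimal Python sketch of the lookup this would involve, assuming the `wbgetentities` action of the Wikidata API. The helper names and the sample response are hypothetical (the description text in particular is made up for the example), and iD itself would do the equivalent in JavaScript.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_entity_url(qid: str, lang: str = "en") -> str:
    """Build a wbgetentities URL asking only for labels and descriptions."""
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "props": "labels|descriptions",
        "languages": lang,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def summarize_entity(response: dict, qid: str, lang: str = "en") -> str:
    """Turn a wbgetentities response into a short human-readable string.

    Falls back to the bare QID when no label is available in `lang`.
    """
    entity = response.get("entities", {}).get(qid, {})
    label = entity.get("labels", {}).get(lang, {}).get("value")
    description = entity.get("descriptions", {}).get(lang, {}).get("value")
    if label is None:
        return qid
    if description is None:
        return f"{label} ({qid})"
    return f"{label} ({qid}): {description}"


# Hand-written sample in the shape the API returns; not a real API capture.
SAMPLE_RESPONSE = {
    "entities": {
        "Q37158": {
            "labels": {"en": {"language": "en", "value": "Starbucks"}},
            "descriptions": {"en": {"language": "en", "value": "coffeehouse chain"}},
        }
    }
}

print(build_entity_url("Q37158"))
print(summarize_entity(SAMPLE_RESPONSE, "Q37158"))
```

Keeping the URL building and the response parsing separate from the network call means the display logic can be checked without hitting the API.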

/cc @nyurik @pigsonthewing

pigsonthewing commented Oct 2, 2017

Good suggestion, but the display should be alongside the Wikidata field (like in JOSM), not alongside the Wikipedia field.

Collaborator

1ec5 commented Oct 2, 2017

There isn’t currently a dedicated Wikidata field, just the possibility of showing a wikidata row in the “All tags” section. Perhaps we could generalize the Wikipedia field to display something sensible for any combination of wikipedia and wikidata.

Member

bhousel commented Oct 2, 2017

I've thought about this a bit and I think it would be great to improve the design of our wikipedia field to make it more useful for inspecting the values that are there and possibly fetching data useful to OSM.

Current field:

  • The wikidata value is not visible; it sits down in the "All Tags" section only.
  • User interacts with wikipedia

[Image: screenshot 2017-10-02 11 45 59]

Possible improvements:

  • wikidata is visible, so user knows whether it is set or not
  • Button opens a pane that fetches linked data, names, images, etc
  • We should have a way in the preset definition to say which linked data is relevant
  • Support possible import to OSM of useful values (population, brand, logo, name, etc)

[Image: screenshot 2017-10-02 11 45 59 copy]

nyurik commented Oct 3, 2017

@bhousel interesting thoughts, thanks!

> wikidata is visible, so user knows whether it is set or not

👍

> Button opens a pane that fetches linked data, names, images, etc

👎 - users should always see some easy-to-check information from Wikidata. Showing the raw number is actually useless - almost no one knows them by heart. Raw number would more likely intimidate novice users and consume valuable screen space, rather than add value. I would recommend showing the label and P31's label in the user's language, perhaps in a smaller gray font, plus the description on mouseover.

> We should have a way in the preset definition to say which linked data is relevant

Please elaborate on this, sounds interesting.

> Support possible import to OSM of useful values (population, brand, logo, name, etc)

👎 do we really want to have a cloned data just by copying it to every object? It will be an unmaintainable mess, especially due to the lack of a proper bot culture in OSM.

nyurik commented Oct 3, 2017

P.S. I think we should also consider other wikipedia/wikidata tags here, such as subject, brand, artist, operator, ... - it would be great to have a consistent interface for them as well. BTW, some communities prefer not to add xxx:wikipedia at all, e.g. use brand:wikidata only.

Member

bhousel commented Oct 3, 2017

> users should always see some easy-to-check information from Wikidata. Showing the raw number is actually useless - almost no one knows them by heart. Raw number would more likely intimidate novice users and consume valuable screen space, rather than add value.

I kind of think just showing the code on a single line requires less screen space than displaying linked data, but more importantly avoids doing an api call. If the user wants more info, they can click the button to expand the pulldown and see what wikidata is available.

> do we really want to have a cloned data just by copying it to every object? It will be an unmaintainable mess,

This is currently what OSM does, so we shouldn't change it. We add tags for things like population and artist, which could be looked up elsewhere. This is fine because people want to consume the OSM data without doing lookups to another database.

> We should have a way in the preset definition to say which linked data is relevant
>
> Please elaborate on this, sounds interesting.

It's just another way of saying "If the feature is place=city fetch and display P1082 (Population) which maps to osm tag population=*, and if the feature is a tourism=artwork fetch and display P170 (Creator) which maps to osm tag artist_name=*."
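As a rough sketch of what such a preset-level mapping could look like, here it is written in Python for brevity. The table layout and the function name are hypothetical, not iD's actual preset schema. Only the two property-to-tag pairs come from the comment above.

```python
# Hypothetical mapping: (OSM key, OSM value) -> list of (Wikidata property, OSM tag).
PRESET_LINKED_DATA = {
    ("place", "city"): [("P1082", "population")],       # P1082 = population
    ("tourism", "artwork"): [("P170", "artist_name")],  # P170 = creator
}


def relevant_properties(tags: dict) -> list:
    """Return the (property, OSM tag) pairs worth fetching for a feature's tags."""
    matches = []
    for (key, value), mappings in PRESET_LINKED_DATA.items():
        if tags.get(key) == value:
            matches.extend(mappings)
    return matches


print(relevant_properties({"place": "city", "name": "Example"}))
print(relevant_properties({"tourism": "artwork"}))
print(relevant_properties({"highway": "residential"}))
```

Because the table is plain data, it could in principle live outside the editor's code and be maintained separately.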

Collaborator

1ec5 commented Oct 3, 2017

> I think we should also consider other wikipedia/wikidata tags here, such as subject, brand, artist, operator, ...

This is covered by #4262. brand:wikidata is probably blocked by supporting brand in the first place: #3371 #2300.

> This is currently what OSM does, so we shouldn't change it. We add tags for things like population and artist, which could be looked up elsewhere. This is fine because people want to consume the OSM data without doing lookups to another database.

Wikidata statements may come from sources that OpenStreetMap may not be particularly comfortable with, such as copyrighted websites and even Google Maps (think restaurant details and the like). Also, we would have to find a way to transform each statement’s references into *:source tags.

wikidata has been positioned as an alternative to adding non-physically verifiable data to OSM. Doing lookups in another database is sort of the point. For example, recall the debate over tagging features with translations and transliterations: if we offer a way to copy over details from Wikidata, would we encourage mappers to maintain non-local translations and transliterations in OSM?

My goal in starting this discussion was to improve wikidata’s human readability, which some on the talk mailing list have brought up as an advantage of wikipedia over wikidata. It would be great to address the papercuts in #3929 at the same time, but I’d be wary of expanding the scope to facilitate new workflows.

> I kind of think just showing the code on a single line requires less screen space than displaying linked data, but more importantly avoids doing an api call.

Is the concern about driving too much traffic to Wikidata’s API? The API is designed to be used for “pretty-printing” QIDs on the fly, and we can limit the number of calls by caching responses.
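The caching idea can be sketched as a small in-memory wrapper, shown here in Python. `LabelCache` and the injected `fetch` callable are hypothetical names for this example, and a real editor might rely on the browser's HTTP cache instead.

```python
from typing import Callable, Dict


class LabelCache:
    """Remember labels per QID so each item is requested at most once."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch              # does the actual lookup (e.g. an API call)
        self._labels: Dict[str, str] = {}

    def label_for(self, qid: str) -> str:
        # Only call the fetcher the first time a QID is seen.
        if qid not in self._labels:
            self._labels[qid] = self._fetch(qid)
        return self._labels[qid]


# Stand-in fetcher that records how often it is called; no network involved.
calls = []


def fake_fetch(qid: str) -> str:
    calls.append(qid)
    return f"label for {qid}"


cache = LabelCache(fake_fetch)
cache.label_for("Q37158")
cache.label_for("Q37158")
print(len(calls))
```

Selecting the same feature repeatedly then costs one lookup per distinct QID for the lifetime of the cache.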

nyurik commented Oct 3, 2017

> I kind of think just showing the code on a single line requires less screen space than displaying linked data, but more importantly avoids doing an api call.

> Is the concern about driving too much traffic to Wikidata’s API? The API is designed to be used for “pretty-printing” QIDs on the fly, and we can limit the number of calls by caching responses.

I also think this is very bad reasoning. A user has already indicated that they want to examine a single object, and should see all the relevant information. And we are talking about just a single object, not millions of them. Requiring an additional click in a different part of the screen is very bad usability, and will result in users mostly ignoring it. The Wikidata API can totally take it - not an issue there for sure. On the presentation side - take a look at how Q numbers are shown at Wikidata itself - it's not shown in large bold letters - instead, it shows the label as the most prominent text, plus description and aliases underneath, and the Q number is grayed and smaller. This is much better usability. We just need to figure out how to fit it into our UI. Also, don't worry about the caching - just make a GET call - the browser will cache it just fine.

> We should have a way in the preset definition to say which linked data is relevant
> It's just another way of saying "If the feature is place=city fetch and display P1082 (Population) which maps to osm tag population=*, and if the feature is a tourism=artwork fetch and display P170 (Creator) which maps to osm tag artist_name=*.

Love the idea! It would be awesome to have a community-controlled place (something that doesn't require a full iD re-deployment) to define what matches to what. While not related to this ticket, here are some thoughts:

  • Sometimes you would need much more than just a simple property lookup. For example, checking if wikidata points to a disambig page involves a recursive check, which is better done via the SPARQL endpoint (single query). This way errors can be shown right away.
  • For place=city, if there is a population tag, show the value of P1082 next to it, e.g. gray, in parentheses. If the number differs substantially, show a yellow triangle with an exclamation point next to it.
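The population comparison in the second bullet could look roughly like the Python sketch below. The function name, the 20% tolerance and the choice to flag unparsable values are all assumptions made for this example. The thread does not define what "differs substantially" means.

```python
def population_warning(osm_value: str, wikidata_value: int, tolerance: float = 0.2) -> bool:
    """Return True when the OSM population tag should be flagged for review.

    `tolerance` is the allowed relative difference (0.2 = 20%), an arbitrary
    threshold chosen for this sketch.
    """
    try:
        osm_population = int(osm_value.strip())
    except ValueError:
        # Assumption: a value that is not a plain integer is worth flagging.
        return True
    if wikidata_value <= 0:
        # Avoid dividing by zero; flag any mismatch with a non-positive figure.
        return osm_population != wikidata_value
    relative_difference = abs(osm_population - wikidata_value) / wikidata_value
    return relative_difference > tolerance


print(population_warning("1000", 1050))
print(population_warning("1000", 2000))
print(population_warning("about a thousand", 2000))
```

A UI could show the Wikidata figure in gray next to the tag and add the warning icon only when this check returns True.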

Member

bhousel commented Oct 4, 2017

> Is the concern about driving too much traffic to Wikidata’s API? The API is designed to be used for “pretty-printing” QIDs on the fly, and we can limit the number of calls by caching responses.

> I also think this is very bad reasoning. A user has already indicated that they want to examine a single object, and should see all the relevant information. And we are talking about just a single object, not millions of them.

The concern is about introducing an additional async dependency to the code that builds out the sidebar. We already have issues with the address and phone fields which call out to nominatim (see #4198). So I'd prefer to not even call an external service unless the user really cares to see what the wikidata identifier points to (let's be honest, most iD users don't care about this).

nyurik commented Oct 4, 2017

> The concern is about introducing an additional async dependency to the code that builds out the sidebar. We already have issues with the address and phone fields which call out to nominatim (see #4198).

@bhousel This is a valid engineering concern, but we shouldn't shift our work as programmers to manual tasks for users. For example, it is far easier to ask the user to confirm their action, and shift responsibility to them, than to implement a proper "undo". Yet the proper interface is the one with "undo", not the one that asks "please confirm" for every action. It is hard to get things perfect, but it only affects several engineers, as opposed to making every user click something just to check it. In short, we shouldn't be lazy (something I often find myself being :) )

> So I'd prefer to not even call an external service unless the user really cares to see what the wikidata identifier points to (let's be honest, most iD users don't care about this).

I don't think this is correct logic. If a user opened an object, a good UI would show all the relevant information, with all the possible warnings, allowing the user to inspect it all, and possibly correct it if they see something out of place, even with a casual glance. Requiring user action is a sure way to make most people ignore it.

P.S. BTW, this is the kind of reasoning behind the desired separation between UI (driven by users' goals) and engineering (driven by programmers). We shouldn't wear both hats at the same time. A good book on UI design, "The Inmates Are Running the Asylum", speaks exactly of such cases, when the UI is the result of the internals of the system rather than what fits the users best.

woodpeck commented Oct 5, 2017

Wikidata is a remote/foreign system for OSM. There never has been a discussion or even resolution in the community to make Wikidata "the" external lookup database for all kinds of stuff. Personally I have come to view many actions by Wikidata proponents in OSM as outright hostile, riding roughshod over existing consensus in the project, and I am very critical of giving Wikidata some kind of special blessing that would allow this particular external database to be closely integrated with our editing workflow while other external databases - that might, depending on the subject, even offer more, better, or better-licensed data - are not supported in the same way. This is a matter that must be discussed without pressure and where the community needs to be given a chance to decide what they want, rather than being steamrolled into making OSM the auxiliary geodata store of Wikidata.

Member

bhousel commented Oct 5, 2017

> be closely integrated with our editing workflow while other external databases - that might, depending on the subject, even offer more, better, or better-licensed data - are not supported in the same way.

Cool, if you know of any, please open issues so we can integrate them. Nobody wants to make wikidata the only integration if there are better sources. Development in iD is mostly driven by what people open issues for, and even more so what people open pull requests for.

> This is a matter that must be discussed without pressure and where the community needs to be given a chance to decide what they want, rather than being steamrolled into making OSM the auxiliary geodata store of Wikidata.

I don't think it's fair to say that there is any "pressure" to add this feature, or that we're being "steamrolled" in any way. Do you really think this?

nyurik commented Oct 5, 2017

@woodpeck, while it has been mentioned, this discussion is not focusing on "the external lookup database for all kinds of stuff". Its main focus is to show relevant information for the Wikipedia/Wikidata tags only - such as label, description, and instance-of. In other words - addressing the community concern that Q12345 is unreadable and unverifiable. For other, non-wiki-related tags, we may want to have additional support to correlate them with Wikidata, but this should be out of scope for this task. That support would probably be in the form of community-contributed validators.

Collaborator

1ec5 commented Oct 5, 2017

Speaking only for myself as the originator of this issue, my idea was merely to constructively address a usability issue that has been brought up repeatedly on the talk list regarding the management of Wikidata QIDs in OSM, not to wade into the broader issue of Wikidata’s role in OSM.

pigsonthewing commented Oct 5, 2017

> I have come to view many actions by Wikidata proponents in OSM as outright hostile

Well, that certainly explains a lot.

I think, though, you may need to look closer to home for the source of any hostility.

Perhaps it's time that OSM adopted an "assume good faith" policy, like Wikipedia's:

https://en.wikipedia.org/wiki/Wikipedia:Assume_good_faith

woodpeck commented Oct 5, 2017

@bhousel, re. "do you really think it is", frankly in the last half year or so I feel that there's intense pressure on OSM to accept Wikidata links. I used to think it's a nice addition to existing Wikipedia links that we typically had for place nodes or important tourist attractions ("important" being the operative word - as you might know, Nominatim uses the existence of Wikipedia links to rank search results). But all of a sudden we have people demanding that we add Wikidata links to near everything, and especially also start encoding object properties in terms of Wikidata links (adding brand:wikidata or wikidata:brand, operator:wikidata or wikidata:operator, and so on). I'm still cross with @nyurik for the sheer amount of auto-added Wikidata links he's responsible for and I believe many of them deserve to be kicked out again on the grounds of being of questionable quality. I fear that unless we take a step back and think about what all this means for OSM, we'll indeed be steamrolled into doing something that might better have been done differently, or not at all. First just the place nodes, then everything else, then encoding properties in wikidata references (we've already seen the first operator:wikidata WITHOUT a matching human-readable operator tag), now you're being asked to conveniently hide the wikidata tags and make API queries to display "nice" text, something that further threatens to reduce the self-sufficiency of OSM. Before too long you won't be able to edit OSM meaningfully if the Wikidata API goes down. On the flip side, people can be tempted to modify OSM data based on what they find linked in Wikidata, an activity that is questionable at best and in the worst case violates copyright. I hope that our editor maintainers will at least ponder the wider implications of this issue before happily accepting any "more wikidata!!!" pull requests that come their way.

@bhousel

This comment has been minimized.

Show comment
Hide comment
@bhousel

bhousel Oct 5, 2017

Member

Thanks for your input, @woodpeck. I agree with most of your concerns; replies below:

I used to think it's a nice addition to existing Wikipedia links that we typically had for place nodes or important tourist attractions ("important" being the operative word - as you might know, Nominatim uses the existence of Wikipedia links to rank search results).

Great, I also think the links are a nice addition, and as Wikipedia does have a notability threshold, it makes sense for Nominatim to do this.

But all of a sudden we have people demanding that we add Wikidata links to near everything, and especially also start encoding object properties in terms of Wikidata links (adding brand:wikidata or wikidata:brand, operator:wikidata or wikidata:operator, and so on).

Yes, but I can see why people want this. It would be inappropriate to create a Relation containing all the Starbuckses, and so people want to tag them with something that makes them easy to find. This is a reasonable thing to want to do. There has been confusion about brand vs operator for years, and free text fields would mean it might be spelled "Starbucks" / "Starbuck's" / "Starbuckʼs" or more, so assigning them all the value of brand:wikidata=Q37158 is a neat way of solving the problem without bothering anybody else that consumes the data.

I also think anybody who adds data this way should still fill in brand=Starbucks and even brand:source=wikidata. (If it's an editor feature, we can enforce this).

I'm still cross with @nyurik for the sheer amount of auto-added Wikidata links he's responsible for and I believe many of them deserve to be kicked out again on the grounds of being of questionable quality.

OK, I'm not very familiar with the work that was done, but it sounded like he just wrote a script to add wikidata tags that were missing. For people who don't use wikidata, it's not clear to me why they care about what was done. The mailing list thread got really toxic, and I'm kind of embarrassed by the community response calling to shut down someone's work, just over one tag that's obviously useful.

I fear that unless we take a step back and think about what all this means for OSM, we'll indeed be steamrolled into doing something that might better have been done differently, or not at all.

Ok that's totally fair, no code has been written yet and this is the place to discuss concerns.

First just the place nodes, then everything else, then encoding properties in wikidata references (we've already seen the first operator:wikidata WITHOUT a matching human-readable operator tag), now you're being asked to conveniently hide the wikidata tags and make API queries to display "nice" text, something that further threatens to reduce the self-sufficiency of OSM. Before too long you won't be able to edit OSM meaningfully if the Wikidata API goes down.

Great - I don't want this either! (As I said above, I really don't want to replace the data tag with the human-readable display value). I think anyone using wikidata as a source for something in OSM should have to set the data tag, the human-readable tag, and the source tag. I think the editor should enforce that these things go together.

On the flip side, people can be tempted to modify OSM data based on what they find linked in Wikidata, an activity that is questionable at best and in the worst case violates copyright.

Wikidata is CC0, which is allowable. This is one of the more clear sources of information to pull from.

I hope that our editor maintainers will at least ponder the wider implications of this issue before happily accepting any "more wikidata!!!" pull requests that come their way.

Again, I think all of your concerns are reasonable, and I don't want to brush them aside. Thanks for voicing them.. 👍 I don't want wikidata to replace osm, but rather to fit in places where it makes sense..
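To make the "these things go together" rule from this comment concrete, here is a minimal sketch of the kind of check an editor could run. It is not iD code. The function name and the exact rule (every `<key>:wikidata` tag needs a readable `<key>` tag and a `<key>:source` tag) are assumptions drawn from the brand=Starbucks / brand:source=wikidata example above.

```python
# Hypothetical rule, not an existing iD validator: every "<key>:wikidata" tag
# should be accompanied by a human-readable "<key>" tag and a "<key>:source" tag.
WIKIDATA_SUFFIX = ":wikidata"


def missing_companion_tags(tags: dict) -> list:
    """Return the companion keys that are absent for each '<key>:wikidata' tag."""
    missing = []
    for key in sorted(tags):
        # The plain "wikidata" key has no prefix and is deliberately skipped.
        if not key.endswith(WIKIDATA_SUFFIX):
            continue
        base = key[: -len(WIKIDATA_SUFFIX)]  # e.g. "brand"
        for companion in (base, f"{base}:source"):
            if not tags.get(companion):  # absent or empty
                missing.append(companion)
    return missing


tags = {"amenity": "cafe", "brand:wikidata": "Q37158"}
print(missing_companion_tags(tags))  # ['brand', 'brand:source']
```

An editor could surface the returned keys as a warning before upload rather than blocking the edit. That choice is a design question this sketch leaves open.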

nyurik Oct 5, 2017

@woodpeck you do raise important points about self-reliance and other aspects, but I do not think they belong in this ticket. My auto-edits are also a separate discussion with diverging points of view, and in no way relate to this. The Wikipedia and Wikidata links have been added by nearly 20,000 individuals 1 2, and are very actively used by a large subset of the community and data consumers. You are welcome to start a world-wide discussion to prohibit or restrict their use, and try to achieve a consensus, but again, that is not relevant here. If they are banned and removed from the OSM db, removing them from iD will be a matter of minutes.

On the other hand, just like in any other Open Source project, the feature set of each tool is driven by its users. You are welcome not to use it, but others find value in it. So just like the Wikipedia field in the iD editor uses an external API call to autocomplete, the Wikidata field can be augmented with additional information. If the Wikidata API fails, the information will still be there - in the form of the ID. If Wikipedia goes down, the Wikipedia title will remain there, just without the autocomplete. The data consumers can (and do) download full dumps of the databases, so even if all WMF data-centers melt down at once, and the fifth largest web site collapses, I don't think it will be that big of a problem. Especially because there are other services like my OSM+Wikidata, which has a full Wikidata copy, and can supply the same information. So there are clearly multiple options. Let's not spread FUD, just because you may not like what others are doing - no one is removing the tags that you care about. Having a better UI for Wikidata labels is good for anyone who wants to maintain good quality Wikidata links.

BTW, both Wikidata & Wikipedia use the same servers, and you can imagine the amount of resources that go into maintaining them, compared to the OSM servers.
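The fallback described in this comment (the ID stays visible if the API is unreachable) can be sketched as follows. The request uses the Wikidata `wbgetentities` API action. The helper names, the 3-second timeout, and the User-Agent string are assumptions made for illustration, not anything iD ships.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

API_URL = "https://www.wikidata.org/w/api.php"


def build_label_url(qid: str, lang: str = "en") -> str:
    """Build a wbgetentities request for one item's label and description."""
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "props": "labels|descriptions",
        "languages": lang,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_label(payload: dict, qid: str, lang: str = "en") -> Optional[str]:
    """Pull the label out of a wbgetentities response, or None if absent."""
    entity = payload.get("entities", {}).get(qid, {})
    return entity.get("labels", {}).get(lang, {}).get("value")


def http_get_json(url: str, timeout: float = 3.0) -> dict:
    # Illustrative User-Agent; a real client should identify itself properly.
    request = urllib.request.Request(url, headers={"User-Agent": "label-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def display_text(qid: str, fetch: Callable[[str], dict] = http_get_json,
                 lang: str = "en") -> str:
    """Return 'Label (QID)' when the lookup works, otherwise just the QID."""
    try:
        label = parse_label(fetch(build_label_url(qid, lang)), qid, lang)
    except (OSError, ValueError):  # network failure, timeout, or malformed JSON
        label = None
    return f"{label} ({qid})" if label else qid
```

Because `fetch` is injectable, the same function could be pointed at a mirror or a local copy of the data, which is the "multiple options" point made above. The tag value itself is never rewritten, so an outage only costs the extra label.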

nyurik Oct 5, 2017

On the flip side, people can be tempted to modify OSM data based on what they find linked in Wikidata, an activity that is questionable at best and in the worst case violates copyright.

Wikidata is CC0, which is allowable. This is one of the more clear sources of information to pull from.

@bhousel, it seems with Wikidata there is a somewhat interesting conundrum. WMF considers individual data points to be non-copyrightable under US law, and has rules stating it is OK to copy a single GPS location from Google to Wikipedia. Wikidata often takes data from Wikipedia. As a result, Wikidata has an aggregate that partially came from GMaps. OSMF, on the other hand, is highly paranoid about data sources and wants to be whiter than white (I seriously doubt that is possible with a large community, but at least that's the stated goal). As a result, OSMF considers Wikidata geo points not "safe". All of them. On the other hand, I have not heard anyone objecting to other Wikidata values on legal grounds. Just FYI, even though this is totally off topic :)

pigsonthewing Oct 5, 2017

Before too long you won't be able to edit OSM meaningfully if the Wikidata API goes down.

There is, of course, no factual basis whatsoever for such a hyperbolic statement.

bhousel Oct 5, 2017

Member

Yes, @pigsonthewing, @woodpeck, & @nyurik, please remember to be nice and keep discussion factual and constructive.. This is an issue tracker for professionals working to improve OSM, not a mailing list.

Yes, @woodpeck I promise you will still be able to edit OSM if the Wikidata API is down. I'm not that bad a programmer, right?

Thank you! 🙇

matkoniecz Oct 6, 2017

Support possible import to OSM of useful values (population, brand, logo, name, etc)

I tried doing this, and in my experience there are serious problems:

  • data quality on Wikidata is really low, much lower than OSM, enwiki, plwiki, Wikimedia Commons or even Google Maps
  • the license of Wikidata is unclear - I am pretty sure it is CC0, but under US law. As a result, people copy into Wikidata databases that are copyrighted under EU database law and claimed not to be copyrighted under US law
  • people editing Wikidata are unconcerned about copyright (unlike Wikimedia Commons or OSM) - see for example https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&oldid=573156574#Wikidata:Copyright_rules_.28AKA_-_is_Wikidata_CC0_in_Europe.3F.29 where it turns out that basic information about the license is not documented and nobody is able to answer the simplest questions about how Wikidata is licensed

Wikidata may be useful - for example, see my own https://www.openstreetmap.org/user/Mateusz%20Konieczny/diary/42385 - but mass imports from Wikidata are a serious copyright and quality problem.

matkoniecz Oct 6, 2017

Wikidata is CC0, which is allowable.

It is CC0 under US law (I guess). It allows importing data from copyrighted sources, including Wikipedias, and databases copyrighted under EU law, including, for example, a currently active proposal to import data from OSM - see https://www.wikidata.org/wiki/Wikidata:Bot_requests#OpenStreetMap_objects

matkoniecz Oct 6, 2017

Before too long you won't be able to edit OSM meaningfully if the Wikidata API goes down.

Making bizarre claims weakens parts of your argument that make sense.

bhousel Oct 6, 2017

Member

Before too long you won't be able to edit OSM meaningfully if the Wikidata API goes down.

Making bizarre claims weakens parts of your argument that make sense.

Honestly this isn't that bizarre a claim.. There are lots of things in iD that rely on other services and do not work correctly when those services are down - see #4198

For example, when nominatim goes down, the address field in iD doesn't work. When taginfo goes down, many of our combo fields won't have suggested values. Sometimes imagery providers go down or the OSM API goes down, making editing impossible. I'm sure the wikidata API is more stable than all of those other things I just mentioned, but still we should make sure not to over-rely on it.

pigsonthewing Oct 6, 2017

people editing Wikidata are unconcerned about copyright

Making bizarre claims weakens parts of your argument that make sense.

matkoniecz Oct 6, 2017

@pigsonthewing maybe I missed something - is there a page somewhere on Wikidata describing the copyright status of the collected data? In particular, is it CC0 under EU database copyright rules?

spindr Oct 6, 2017

Maybe this can be more generally expressed as providing the ability to "check linked data" -- that checks all URLs and URNs, etc. including webpages, wikipedia, wikidata. But I'd make it an extra step if it'll take a while to perform.
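As a rough illustration of what such a "check linked data" step might start from, the sketch below only collects the URLs a checker would need to request. The actual requests could then run as the separate, slower step. The function name, the set of keys it inspects, and the example.org address are assumptions. Only the `wikipedia=lang:Title` and `wikidata=Q…` value conventions are taken as given.

```python
import re
from urllib.parse import quote

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def linked_urls(tags: dict) -> dict:
    """Map each link-like tag key to the URL a link checker would request."""
    urls = {}
    for key, value in tags.items():
        if key == "wikidata" or key.endswith(":wikidata"):
            # Malformed IDs are skipped here; a checker could report them instead.
            if QID_PATTERN.fullmatch(value):
                urls[key] = f"https://www.wikidata.org/wiki/{value}"
        elif key == "wikipedia" and ":" in value:
            lang, title = value.split(":", 1)  # e.g. "en:Starbucks"
            urls[key] = f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"
        elif key in ("website", "url") and value.startswith(("http://", "https://")):
            urls[key] = value
    return urls


example = {
    "name": "Starbucks",
    "brand:wikidata": "Q37158",
    "wikipedia": "en:Starbucks",
    "website": "https://example.org/",
    "operator:wikidata": "not-a-qid",
}
for key, url in linked_urls(example).items():
    print(key, "->", url)
```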

nyurik Oct 6, 2017

@spindr the API call is very fast (it's one of the most common operations the API is used for). The UI doesn't need to wait to show the data - e.g. we can show the Q123 in the Wikipedia field with a spinner and, once the API returns, morph/update it to include the returned results. Please think of this first from the usability perspective - what would be most convenient to the user.
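The "show the ID first, update when the API returns" flow can be sketched with two render calls around an awaited lookup. Everything here is hypothetical scaffolding: `render` stands in for whatever updates the field, the "(loading...)" text stands in for the spinner, and the two demo fetchers simulate a working and a failing API.

```python
import asyncio
from typing import Awaitable, Callable, Optional

LabelFetcher = Callable[[str], Awaitable[Optional[str]]]


async def show_wikidata_row(qid: str, fetch_label: LabelFetcher,
                            render: Callable[[str], None],
                            timeout: float = 3.0) -> None:
    """Render the raw ID at once, then re-render when the label arrives."""
    render(f"{qid} (loading...)")  # stands in for "Q123 with a spinner"
    try:
        label = await asyncio.wait_for(fetch_label(qid), timeout)
    except (asyncio.TimeoutError, OSError):
        label = None  # lookup failed or timed out: fall back to the bare ID
    render(f"{label} ({qid})" if label else qid)


async def demo_fetch_label(qid: str) -> Optional[str]:
    await asyncio.sleep(0.01)  # simulated network latency
    return "Starbucks"


async def demo_failing_fetch(qid: str) -> Optional[str]:
    raise OSError("simulated outage")


def run_demo(fetch_label: LabelFetcher) -> list:
    """Collect every string passed to render, in order."""
    frames: list = []
    asyncio.run(show_wikidata_row("Q37158", fetch_label, frames.append))
    return frames


print(run_demo(demo_fetch_label))    # ['Q37158 (loading...)', 'Starbucks (Q37158)']
print(run_demo(demo_failing_fetch))  # ['Q37158 (loading...)', 'Q37158']
```

The first render never waits on the network, so the field is usable immediately either way.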
