Specify the type of sources and targets #16
No objection to identifying the type of resources addressed by adding #...TYPE URIs. Typically the canonical identifiers used are not URIs. There undoubtedly exist "things" like ISBNs outside the semantic web, and identifiers by this standard do not contain the string "urn:isbn:", although this is an officially registered namespace for ISBNs. And there are things like IMDb IDs where you have persistent URLs, but any kind of URI you make up will be unofficial at the moment. Thus #SOURCETYPE may be something identifying "books" (or some restriction like "ebooks", or "books about animals"), and #PREFIX might be "urn:isbn:" (for what it's worth; as you clarified in issue #15, it is a string, not a URI), but a statement about the kind of identifiers we use for the mapping, namely ISBNs, is still lacking. This might be stated as (I'm not sure whether giving a reduced namespace prefix or a specification document is better suited in the absence of a "standards vocabulary")
(and this time "urn:isbn" is a URI).
There is no need to state the "kind of an identifier" because all identifiers MUST be URIs. This is not a bug but intentional. It's not a problem because Beacon is about links, not about identifiers. In the absence of official URI namespaces, just use an unofficial namespace. Nobody is interested in plain identifiers anyway, but in what these identifiers identify and what is linked by them.
I strongly disagree. If everything were about URIs, then VoID Linksets would be all you need, and Beacon files in particular would just be an attempt to backport meaningful RDF into yet another silly text-file serialization. Above I had hoped to outline clearly enough that at least some "classical identifiers" are not born as URIs and still do not have official unique URIs. And even for those that have them, almost no existing software uses these URIs; it almost always reduces them to plain old "numbers". And these (non-URI) identifiers are more than an enumeration: they form a system governed by assignment rules, syntax specifications and so on. Therefore there is a huge gap between specifying a #PREFIX /string/ which turns every individual identifier into a (private or official) URI and additionally stating (by a #SCHEME /URI/) that the unprefixed numbers used in the data section of a Beacon file are not arbitrary but taken from an established identifier system according to its semantics. Of course, the spec must be careful when talking about "identifiers" in the URI sense and "classical identifiers from systems" not yet completely transformed into the semantic web framework.
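To illustrate what "governed by syntax specifications" can mean for a plain, non-URI identifier, here is a minimal Python sketch that checks the ISBN-13 check digit on the bare number, independent of any prefix. The function names `is_valid_isbn13` and `strip_urn_prefix` are made up for this illustration and are not part of Beacon or any library.

```python
def strip_urn_prefix(value: str, prefix: str = "urn:isbn:") -> str:
    """Reduce a URI form to the plain identifier, as most software does."""
    if value.lower().startswith(prefix):
        return value[len(prefix):]
    return value


def is_valid_isbn13(identifier: str) -> bool:
    """Check the ISBN-13 check digit of a plain (unprefixed) identifier."""
    digits = identifier.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0
```

The check operates on the identifier itself; the "urn:isbn:" string only has to be stripped first, which is the point about the identifier system existing apart from its URI form.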
Please send me a concrete pull request to modify the actual specification. I don't get the use case of "established identifier systems" without a URI prefix. At least for mapping to HTML you get a URL as target, so it conforms to URI syntax. If you don't have a URI prefix for the source, you need to somehow communicate which kind of identifier you are using anyway. Let's say there is an established identifier system called "gnarz". How shall a user of your link dump know that you use this system? The burden of somehow configuring that "this Beacon uses gnarz-ids" is the same as configuring that all URIs starting with
I'll give some examples instead, culminating in directions about handling the famous gnarz-IDs.
I. There is an official HTTP-URI for the identifiers used:
Unfortunately there is no official URI or persistent URL for "GND" (either seen as a dataset or as an "effort" consisting of objectives and rules) and I have many choices to provide a web-operational URI "about" GND. I'd probably opt for
This case is typical for "modern" URIs which are aligned in a web-friendly way with registered domain names, at the price that there is no IANA delegation chain for their identifiers. VIAF identifiers fall in this category.
II. Official non-HTTP-URI
Further examples are BNF identifiers (a subspace of info:ark) and LCNAF (info:lccn), info:oclcnum and so on, with the additional obstacle that these info-URIs are quite outdated and one certainly would prefer
III. The World according to the Gnarz community
In this situation we still have to utilize #PREFIX in the extended meaning of providing a URL pattern (one probably would not call the queries above UR_I_s since there are so many possible searches in the database to yield the same result), although it might be cleaner to use the made-up prefix as #PREFIX and give the hint how to actually reach the (source) data on the web only in a description field or so:
Then (in the case where we drop working URLs on the record level) we are in the comfortable situation of being able to say everything with the #PREFIX and indeed do not need an additional #SCHEME statement.
There are at least two of them:
The "#SCHEME" meta field is meaningless to me unless you provide an exact definition, where exactly to put it in the specification, and what to change there instead.
It's not the purpose of a URI to resolve to anything, but to identify something. If one wants to know about the nature of a URI, one just has to traverse the URI hierarchy. In this case you clearly end up at RFC 3187. This is how the URI is defined. Again, it does not help to suggest alternative forms of ISBN as URI that nobody uses anyway. Just stick to the most used form for interoperability. I think ISBN is not a good example to illustrate your point.
But at least a URI form exists, so why not use it instead of providing meaningless sequences of characters and hoping that applications will guess its meaning? There is
If there is no URI form of gnarz identifiers, one has to create it. For instance:
This is not less usable than for instance
or
In short: If there is a URI schema for the identifiers, then use it. If there are multiple URI schemas, use the most popular form. If there is no URI schema, define one and propagate it so others can benefit from your data. Without a known URI schema, identifiers are not usable without human intervention anyway.
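The rule above can be sketched in a few lines of Python: a prefix string is prepended to each plain identifier to obtain a source URI. This is a deliberately simplified reading of a Beacon-like text (only a `#PREFIX:` meta line and the first `|`-separated token of each link line are considered), and the namespace `http://example.org/gnarz/` is a made-up placeholder, not an existing gnarz URI.

```python
def source_uris(beacon_text: str) -> list:
    """Expand plain source identifiers to URIs using the #PREFIX meta field.

    Simplified sketch: other meta fields are ignored, and only the first
    '|'-separated token of each link line is treated as the source identifier.
    """
    prefix = ""
    uris = []
    for line in beacon_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Split at the first colon only, so prefixes like "urn:isbn:" survive.
            name, _, value = line[1:].partition(":")
            if name.strip().upper() == "PREFIX":
                prefix = value.strip()
            continue
        source_id = line.split("|", 1)[0].strip()
        uris.append(prefix + source_id)
    return uris


# Hypothetical minted namespace for gnarz identifiers (assumption for this example).
example = "#PREFIX: http://example.org/gnarz/\n42\n4711|some annotation\n"
print(source_uris(example))
```

The same function works unchanged with an official namespace such as `urn:isbn:`; only the `#PREFIX` line differs.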
ad I. ad II. ad III. Maybe we cannot do better than VoID: Section 4.2 http://www.w3.org/TR/void/#pattern generalizes turtle prefixes (void:uriSpace) to regex patterns (void:uriRegexPattern) /in case/ the URIs have something in common. The datasets as a whole are identified by URIs preferably "provided by the original data provider". If there is none, one should "mint" one in one's own namespace /and/ include a link to /the/ homepage of the dataset (the VoID primer argues that this at least could help with "discovery"). In our setting #PREFIX takes care of the former, but at the moment there is no meta element taking care of the latter (i.e. identifying the "source" dataset; and incidentally we also somehow lost the identifying URI for the target dataset, leaving us with the #NAME string). But I still think the Beacon situation is not identical to the VoID situation: although you can construct a "source dataset", the primary concern is to assign (or create) target URIs from source /identifiers/. #PREFIX strings are a means to transform this into something which can be expressed in RDF, but this often introduces an element of volatility not present when sticking to the plain old identifiers. Furthermore, the implicit identification of the #PREFIX string with a namespace URI, which is to be taken as the "canonical" URI representing the source dataset, which in turn claims to explain the identifiers used, is neither clean nor does it work (remember the days when http://d-nb.info/gnd/ could not tell us whether we were talking about GKD or SWD?). Admittedly Beacon has other serializations than RDF/XML, but we define #PREFIX to be a string (or RDF literal) and therefore we must not ever use its content as a URI (and maybe we should rethink whether PREFIX should be allowed to be a URI template). Now
Question 1: Can we always identify the two datasets involved by "vendor supplied" URIs or at least their respective homepages?
Question 2: If yes, is this valuable enough to justify meta fields for this or these? (I understand #HOMEPAGE links to an external description of the /linkset/ which should include, targeted at human readers?, some basic information on the datasets involved.)
Question 3: In cases of a very artificially constructed "source dataset" (think of gnarz as the numbers in a phone directory from 1920 not yet digitized), can/should we somehow relax the "source dataset" URI to something more strongly relating to the identifiers used?
Question 4: Should we stress the more formal aspects of the identifiers as textual data, e.g. by optionally providing an .xsd description of their form? The URL of this description could serve as the URI of question 3?
Actually, I was quite fond of the old notion
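As a rough illustration of the two VoID mechanisms referred to in this comment, the following Python sketch tests whether a URI falls into a dataset described either by a void:uriSpace-style prefix or by a void:uriRegexPattern-style regular expression. The function name `belongs_to_dataset`, the sample identifier, and the concrete regex are assumptions made up for this example; only the namespace http://d-nb.info/gnd/ is taken from the discussion.

```python
import re


def belongs_to_dataset(uri, uri_space=None, uri_regex=None):
    """Return True if the URI matches the dataset's prefix or regex pattern."""
    # void:uriSpace style: all entity URIs share a common leading string.
    if uri_space is not None and uri.startswith(uri_space):
        return True
    # void:uriRegexPattern style: entity URIs match a regular expression.
    if uri_regex is not None and re.search(uri_regex, uri):
        return True
    return False


# Illustrative values only; the regex is a guess, not an official GND syntax rule.
uri = "http://d-nb.info/gnd/118540238"
print(belongs_to_dataset(uri, uri_space="http://d-nb.info/gnd/"))
print(belongs_to_dataset(uri, uri_regex=r"^http://d-nb\.info/gnd/[0-9X-]+$"))
```

Neither check says anything about which identifier system the trailing part belongs to, which is the gap Questions 3 and 4 are about.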
Please provide specific changes and separate different issues instead of a general discussion. Issue #16 is solved by the introduction of #SOURCETYPE and #TARGETTYPE. I opened issue #20 for how to deal with non-URI identifiers and issue #21 for the question of the source dataset (question 2); you may create additional issues for the other questions.
The proposed |
This does not fully solve this request, but the introduction of |
There may be a need to identify the "kind of" resources identified by source URIs and/or target URIs (there is no "kind of" identifier, as all identifiers are URIs). This information may simply be put in the #DESCRIPTION meta field. The concept of a "kind of" thing is rather fuzzy anyway. A formal solution would be to introduce something like #SOURCETYPE and/or #TARGETTYPE, for example to state that all entities linked to/from are people (foaf:Person). For instance
is mapped to the RDF graph: