All elements need id(?), strategy for generating ids #14

cboettig · 2013-08-12T19:25:38Z

Need to generate ids for nodes such as <otus> and <trees>, etc. Should we use uuid for this?

rvosa · 2013-08-13T07:36:16Z

Do we expect people to alter the DOM tree a lot (i.e. are there risks of clashes if we use a simpler scheme)? Otherwise maybe tag name + a counter is more concise?

cboettig · 2013-08-15T02:59:47Z

@rvosa Yeah, I'm not sure -- still trying to wrap my head around this one. Most users will probably only use the top-level API for writing an ape::phylo tree or list of trees (ape::multiPhylo) to NeXML, in which case we can number them as we go. But the S4-class-based interface we have so far also allows users to coerce ape::phylo trees into the S4 RNeXML::tree class, which can then be inserted into NeXML later. Perhaps we don't want users doing that, but it means they could build up the DOM modularly, and we would then have to watch out for id collisions.

Or in more concrete terms, I have this setAs("phylo", "tree" ...) subroutine for mapping phylo objects to the S4 object that mimics the schema. Since the phylo object doesn't have an ID, I either have to generate one at this time, or otherwise add the id when adding the tree to an existing or new nexml/trees object. Does that make sense?

In other news, the validator complains that UUIDs aren't valid id attributes:

... is not a valid value of the atomic type 'xs:ID'

hlapp · 2013-08-15T03:50:02Z

On Aug 14, 2013, at 10:59 PM, Carl Boettiger wrote:

In other news, the validator complains that UUIDs aren't valid id attributes:

That sounds like a validator bug.

rvosa · 2013-08-15T09:44:29Z

What do the UUIDs look like? The schema specifies that the type of @id is
xs:ID, which is a non-colonized name (NCName), so instance documents must
conform to the production rules of NCNames (probably most importantly:
start with a letter or an underscore). If they don't, I don't see how the
validator is at fault here.
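
For illustration, the NCName constraint described above can be approximated in R. This is a minimal sketch, not the validator's logic: `is_ncname_ascii()` is a hypothetical helper, it only covers ASCII (the real NCName production also admits many non-ASCII characters), and the candidate ids are made up.

```r
# ASCII-only approximation of the NCName rule: the first character must be a
# letter or an underscore; later characters may also be digits, '.' or '-'.
# Colons are never allowed.
is_ncname_ascii <- function(x) {
  grepl("^[A-Za-z_][A-Za-z0-9._-]*$", x)
}

# Made-up candidate ids, for illustration only.
candidates <- c("t1", "_tree", "1tree", "urn:uuid:abc", "f7af80aa-dfb2")
data.frame(id = candidates, valid = is_ncname_ascii(candidates))
```

Under this approximation an id fails either because it starts with a digit or because it contains a colon; hyphens inside the id are fine.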

Dr. Rutger A. Vos
Bioinformaticist
Naturalis Biodiversity Center
Visiting address: Office A109, Einsteinweg 2, 2333 CC, Leiden, the
Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com

hlapp · 2013-08-15T14:16:11Z

On Aug 15, 2013, at 5:44 AM, Rutger Vos wrote:

What do the UUIDs look like? [...] instance documents must conform to the production rules of NCNames (probably most importantly: start with a letter or an underscore). If they don't, I don't see how the
validator is at fault here.

UUIDs can start with a digit.

@cboettig: I suggest that if you choose UUIDs, you put them in the form of a urn:uuid: scheme. See http://www.ietf.org/rfc/rfc4122.txt

rvosa · 2013-08-16T10:27:51Z

IDs need to be non-colonized names, i.e. strings without colons. If I
understand your suggestion correctly, the UUIDs would contain colons, which
would be a no-no.

hlapp · 2013-08-16T13:17:13Z

On Aug 16, 2013, at 6:27 AM, Rutger Vos wrote:

IDs need to be non-colonized names, i.e. strings without colons. If I
understand your suggestion correctly, the UUIDs would contain colons, which
would be a no-no.

So HTTP URIs can't be IDs?

rvosa · 2013-08-19T11:26:47Z

Not normally. However, IDs can become part of HTTP URIs when transforming documents to RDF as they are then made globally unique by prefixing them with either the location of the document or the value of xml:base of the nearest ancestor node that contains this attribute. (Note that I didn't just make this up or anything.)

I see where you're going with this line of questioning. If we want HTTP URIs as IDs (good id(ea)), use xml:base.
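
A rough sketch of the prefixing idea, assuming the global URI is formed by joining a base URI and the local id with a `#` fragment separator. `global_uri()` and the example base URI are hypothetical, and the separator handling is an assumption of this sketch rather than something stated in the thread.

```r
# Hypothetical helper: combine a base URI (for example an xml:base value or the
# document location) with a locally unique id to obtain a global URI.
# Assumption: a '#' is inserted unless the base already ends in '#' or '/'.
global_uri <- function(base, id) {
  sep <- if (grepl("[#/]$", base)) "" else "#"
  paste0(base, sep, id)
}

# Made-up base URI, for illustration only.
global_uri("http://example.org/trees.xml", "t1")
```

The local id stays a valid NCName inside the document; only the combined form is a URI.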

cboettig · 2013-08-28T23:55:44Z

I was just using the uuid package, which generates uuids that look like:

> UUIDgenerate()
[1] "f7af80aa-dfb2-4134-aa82-db1c0e9e7980"

No colons, so I'm not sure why the validator (accessed with the R wrapper to libxml2) is unhappy.

Regardless, not sure uuids were a good idea for this purpose anyhow. The current workflow doesn't give the user the same flexibility over the DOM directly, so we probably don't have to worry about a user creating two S4 "tree" objects and then sticking them in the same nexml with duplicated IDs.

Instead, there is a method for phylo->nexml that creates the ids for otus as t1, t2..., nodes as n1, n2..., edges as e1, e2... etc (done). A separate method for multiPhylo -> nexml will allow the user to add multiple trees while avoiding id conflicts (not written yet). With a sensible top-level API I think we should be fine using these simple ids(?)

cboettig · 2013-09-06T20:53:21Z

Okay, I think we're happy with ids that are only locally unique for the moment. (Though I'm still unsure what the validator found wrong with the uuid above...) Anyway, closing this issue.

cboettig · 2013-11-27T23:17:46Z

It appears that strings starting with a number were not valid ids (and uuids often start with numbers).

To address this, all functions that assign ids use the internal method nexml_id(), which creates local ids from a given character prefix; e.g. edges use nexml_id("e") to get ids like e1, e2, etc., using an internal counter. The counters start at 1 and increase each time an id with a given prefix is requested in that R session, unless reset with reset_id_counter(). This local counter scheme is used by default.
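
The counter scheme described above could look roughly like the following. This is a sketch under assumptions, not RNeXML's actual code: `sketch_id()` and `sketch_reset()` are hypothetical stand-ins for nexml_id() and reset_id_counter(), and keeping the counters in a private environment is a design choice of this sketch.

```r
# Per-prefix counters kept in a private environment (session-local state).
.counters <- new.env()

# Hypothetical stand-in for nexml_id(): return the next id for a prefix.
sketch_id <- function(prefix) {
  n <- if (exists(prefix, envir = .counters, inherits = FALSE)) {
    get(prefix, envir = .counters) + 1L
  } else {
    1L  # counters start at 1
  }
  assign(prefix, n, envir = .counters)
  paste0(prefix, n)
}

# Hypothetical stand-in for reset_id_counter(): forget all counters.
sketch_reset <- function() {
  rm(list = ls(.counters, all.names = TRUE), envir = .counters)
  invisible(NULL)
}

sketch_id("e")  # "e1"
sketch_id("e")  # "e2"
sketch_id("n")  # "n1"
sketch_reset()
sketch_id("e")  # "e1" again after the reset
```

Because every id starts with its letter prefix, ids produced this way always satisfy the leading-character rule for xs:ID.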

The command options(uuid=TRUE) will make RNeXML use uuids for all id attributes instead. To avoid the validation error, these are prepended with uuid-. This option can be issued per session or put in the user's .Rprofile as persistent configuration. options(uuid=FALSE) sets the behavior back to the local identifiers.
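
As a sketch of how such a switch might behave: `choose_id()` is a hypothetical helper, not part of RNeXML, and the raw UUID is passed in as an argument (a made-up value here) so that the example needs no extra packages.

```r
# Hypothetical helper: pick a local id or a prefixed UUID depending on the
# `uuid` option. An unset option is treated as FALSE.
choose_id <- function(local_id, raw_uuid) {
  if (isTRUE(getOption("uuid", FALSE))) {
    paste0("uuid-", raw_uuid)  # the letter prefix avoids a leading digit
  } else {
    local_id
  }
}

old <- options(uuid = TRUE)   # switch to global ids for this session
choose_id("e1", "1b4e28ba-2fa1-11d2-883f-0016d3cca427")
options(uuid = FALSE)         # back to local ids
choose_id("e1", "1b4e28ba-2fa1-11d2-883f-0016d3cca427")
options(old)                  # restore the previous setting
```

The prefix matters only for UUIDs that begin with a digit, but applying it unconditionally keeps the id format uniform.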

test_global_ids.R provides a unit test confirming that we generate valid NeXML when using the global (uuid) id scheme.

rvosa closed this as completed Aug 13, 2013

rvosa reopened this Aug 13, 2013

cboettig closed this as completed Sep 6, 2013

cboettig mentioned this issue Dec 18, 2017

Ontotrace Example is not valid NeXML? #164

Closed
