Add the semantics of slashes, which is shared by client and server #35

Closed
RubenVerborgh opened this issue Aug 12, 2019 · 35 comments

Comments

@RubenVerborgh
Contributor

No description provided.

@csarven
Member

csarven commented Sep 16, 2019

"semantics of slashes"? Do you mean that 1) a certain count of slashes 2) in particular locations of a URI should 3) have a special meaning and 4) be handled in a particular way?

If so, I see that as an implementation detail and not something that should be prescribed in a spec.

@pmcb55

pmcb55 commented Sep 16, 2019

@csarven No, I think this is about clients being able to rely on the semantics that a slash in an IRI explicitly means 'LDP containment', i.e. that an IRI of 'foo/bar/baz' means both 'foo' and 'foo/bar' are LDP containers, that the container 'foo/bar' ldp:contains 'foo/bar/baz', and that the container 'foo' ldp:contains 'foo/bar'.
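
To make that reading concrete, here is a minimal Python sketch of the containment statements a client would infer from the IRI alone. It assumes containers are written with a trailing slash (as later comments in this thread do); the function name `implied_containment` is hypothetical and not taken from any Solid or LDP library.

```python
def implied_containment(path):
    """Return (container, member) pairs implied by reading '/' as containment.

    Assumption (not from any spec): containers are written with a trailing slash.
    """
    segments = [s for s in path.split("/") if s]
    ends_with_slash = path.endswith("/")
    # An absolute path starts at the root container "/".
    prefix = "/" if path.startswith("/") else ""
    pairs = []
    for i, segment in enumerate(segments):
        last = i == len(segments) - 1
        # Every intermediate segment is treated as a container.
        member = prefix + segment + ("/" if (not last or ends_with_slash) else "")
        if prefix:
            pairs.append((prefix, member))
        prefix = prefix + segment + "/"
    return pairs


for container, member in implied_containment("foo/bar/baz"):
    print(f"<{container}> ldp:contains <{member}>")
```

Note that this is pure string processing: nothing here confirms that the server actually manages those containers, which is the concern raised in the replies below.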

@csarven
Member

csarven commented Sep 16, 2019

@pmcb55 I believe what I've raised covers that case.

What's the use case for a client application observing foo/bar/baz and needing to infer that foo or foo/bar are something in particular, e.g. LDP Containers? Note the subtle difference when encountering foo/bar/ ldp:contains foo/bar/baz, where naturally foo/bar/ is a container, i.e. most likely described with rdf:type ldp:Container, or at the very least can be inferred given the LDP vocab (as per ldp:contains rdfs:domain ldp:Container).

Clients should solely rely on the values provided in the HTTP headers or the message body for a given URI.

@RubenVerborgh
Contributor Author

I agree with you @csarven.

However, @timbl wants to give slashes in URIs a shared meaning among servers and clients, and hence specify that.

@csarven
Member

csarven commented Sep 16, 2019

There are also potential issues with baking this sort of knowledge about URIs into the spec if, for example, LDP is no longer the primary or strictly the only platform. Not baking it in effectively means that clients can work with non-LDP servers without major changes to their internal operations. So, while the Solid Ecosystem is for the time being specified around LDP, it doesn't need to go "all in", especially when we are not really forced to. It gives the ecosystem a chance to evolve in a simple manner.

@pmcb55

pmcb55 commented Sep 16, 2019

Yep - I agree with you @csarven (and @RubenVerborgh) too! For the past two All-Hands I've wanted to discuss this very specific issue, but both times we didn't find the time to actively discuss it, unfortunately.
But @csarven, to your point about being locked into LDP: I probably overstated it above when I said slashes in IRIs explicitly mean ldp:contains. I think @timbl may just be saying the spec should state that a slash in an IRI means 'contains' (although being more vague like that might hurt implementation interoperability!).
I believe the motivation behind it may have been to allow a Pod's IRI space to map automatically to a file system's folder and filename space.

@csarven
Member

csarven commented Sep 16, 2019

Which is why such a requirement needs to be rooted in use cases. What's the bottom-line problem that's being addressed?

Even if something like this is given as a MAY or a Note, then there is still the risk of implementations dissecting URIs instead of looking at the headers or body.

It also raises the bar on making sure that the URI pattern is synchronised with the underlying semantics - another requirement.

Consequently, the spec will be handing out a URI Template reflecting LDP's semantics.

Hence, the reasons I've provided above (and I'm sure there are more) don't make me comfortable with this requirement.

Do the perceived benefits outweigh the issues?

@acoburn
Member

acoburn commented Sep 16, 2019

I also see this primarily as an implementation detail.

There is, however, one use case where a definition of the semantics of slashes could be especially useful, namely in the intersection of LDP and WebACL. LDP containment is strictly parent-to-child; for a given child resource, there is no specification-defined way (using HTTP headers or RDF statements) to find the parent resource. In WebACL, on the other hand, the inheritance algorithm requires knowledge of the child-to-parent relationship.

For an implementation, this means that it is necessary to have some mechanism to traverse from parent to child as well as from child to parent. Effectively, this means that an implementation needs some internal mechanism for exposing this information. This is easy if the LDP and WebACL components interact below the HTTP layer (this is the approach I have implemented); otherwise, if the two components are entirely separate and interact above the HTTP layer, there would need to be either (a) an assumption about the semantics of slashes or (b) some implementation-specific header (or other signifier) that indicates where the parent resource can be found.
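
As a rough illustration of option (a), the Python sketch below walks from a child resource up to the root by trimming path segments and returns the first ACL it finds. Everything in it is an assumption for illustration: the function names, the `acls` dictionary (an in-memory stand-in for whatever lookup a server would actually perform), and the example URLs. It shows only the child-to-parent traversal and ignores the details of WebACL itself.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_container(url):
    """Return the parent container URL implied by the path, or None at the root."""
    parts = urlsplit(url)
    path = parts.path
    if path in ("", "/"):
        return None
    trimmed = path.rstrip("/")
    # Keep everything up to and including the last remaining slash.
    parent_path = trimmed[: trimmed.rfind("/") + 1]
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))


def effective_acl(url, acls):
    """Walk from the resource towards the root; return the first ACL found, else None."""
    current = url
    while current is not None:
        if current in acls:
            return acls[current]
        current = parent_container(current)
    return None


# Hypothetical data: only the /foo/ container has its own ACL document.
acls = {"https://example.org/foo/": "https://example.org/foo/.acl"}
print(effective_acl("https://example.org/foo/bar/baz", acls))
```

Under option (b), `parent_container` would instead read the parent from a header or other signifier supplied by the server, and the rest of the walk would stay the same.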

@justinwb
Member

> I agree with you @csarven.
>
> However, @timbl wants to give slashes in URIs a shared meaning among servers and clients, and hence specify that.

For reference - see comment from @timbl here

@RubenVerborgh
Contributor Author

RubenVerborgh commented Sep 16, 2019

I think we will need to ask @timbl to write a proposal, or for someone to write up @timbl's proposal if they have sufficient details.

He's likely also in the best position to answer @csarven's point regarding use cases, given that we have not identified other supporters so far. However, @acoburn's point already comes close; we should wonder whether we can realize WebACL without requiring these semantics.

@csarven
Member

csarven commented Sep 24, 2019

Since the ACL system already requires knowledge of the child-parent relationship, the pointer isn't particularly implementation-specific above and beyond what can work with a "file-based-solid-server". So the notion of .. (parent directory) as in Unix should suffice: ex:containedIn or acl:parent or whatever.

@kjetilk
Member

kjetilk commented Sep 25, 2019

I'm not adding myself to the list of supporters just yet, but I do note that this could be used to resolve one of my itches with LDP-RSes: that POST doesn't mean "add some triples to the resource", which I think is a very important intuition for developers... :-) The key problem within LDP is that the client can't tell the difference between an LDP-C and an LDP-RS. With this, it can: just look for a slash.

@csarven
Member

csarven commented Sep 27, 2019

I'm not interpreting the bit on "client can't tell the difference" the same way as you. Can you elaborate? Would a Link rel=type with a specific target URI in the response to Request-URI help to differentiate LDP-C and LDP-RS? See https://www.w3.org/TR/ldp/#ldpr-gen-linktypehdr
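
A sketch of the check suggested here, in Python: read the `Link` header of the response and look for a `rel="type"` link whose target is one of the LDP container classes. The function names are hypothetical, and the regular expression is a deliberate simplification that assumes no commas or semicolons inside quoted parameter values; a production client would use a full Link header parser.

```python
import re

LDP = "http://www.w3.org/ns/ldp#"
CONTAINER_TYPES = {
    LDP + "Container",
    LDP + "BasicContainer",
    LDP + "DirectContainer",
    LDP + "IndirectContainer",
}


def link_types(link_header):
    """Extract the target URIs of rel="type" links from a Link header value."""
    types = set()
    # Each link value looks like: <target>; param=value; param=value
    for target, params in re.findall(r'<([^>]*)>((?:\s*;\s*[^;,]*)*)', link_header):
        rel = re.search(r'rel\s*=\s*"?([^";,]*)"?', params)
        # rel may hold several space-separated relation types.
        if rel and "type" in rel.group(1).split():
            types.add(target)
    return types


def is_container(link_header):
    """True if the response advertises any LDP container interaction model."""
    return bool(link_types(link_header) & CONTAINER_TYPES)


header = (
    '<http://www.w3.org/ns/ldp#Container>; rel="type", '
    '<http://www.w3.org/ns/ldp#Resource>; rel="type"'
)
print(is_container(header))
```

With this approach the client never inspects the request URI, so the answer is the same whether or not the URI ends in a slash.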

@csarven csarven added this to the December 19th milestone Oct 1, 2019
@Mitzi-Laszlo Mitzi-Laszlo added this to To Do in Specification Oct 2, 2019
@dmitrizagidulin
Member

(Bringing this up on this issue because it relates to the semantics of slashes.)

We need to specify the expected behavior of whether requests to Solid LDP containers always require a slash, or not.

For example, say I have a top-level container test-container. What should happen when I do an HTTP GET to /test-container (no slash at the end)? The options are:

  1. Return a 404 Not Found (since "no slash" explicitly means "not a container", and there doesn't exist a non-container resource named test-container there).
  2. Do a 301 redirect to /test-container/ (this is what NSS currently does).
  3. Treat the request as if it were a GET to /test-container/ (with slash), without a 301 redirect (and so, return the appropriate container listing triples etc.).
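
The three options can be compared side by side with a small Python sketch. It is illustrative only: the function name, the policy labels, and the `containers` set (a stand-in for the server's store) are all assumptions, and it models nothing but the status code and `Location` target for a GET.

```python
def respond(path, containers, policy):
    """Return (status, location) for a GET under one of the three options above.

    containers: set of container paths, each ending in '/'.
    policy: 'not_found' (option 1), 'redirect' (option 2) or 'transparent' (option 3).
    """
    if path in containers:
        return (200, None)
    if not path.endswith("/") and path + "/" in containers:
        if policy == "not_found":
            return (404, None)
        if policy == "redirect":
            return (301, path + "/")
        if policy == "transparent":
            return (200, None)
        raise ValueError(f"unknown policy: {policy}")
    # Neither the path nor its slash-terminated form is a known container.
    return (404, None)


containers = {"/test-container/"}
for policy in ("not_found", "redirect", "transparent"):
    print(policy, respond("/test-container", containers, policy))
```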

@csarven
Member

csarven commented Oct 6, 2019

> We need to specify the expected behavior of whether requests to Solid LDP containers always require a slash, or not.

Why?

It is up to the social entities having rights on the URI as per URI Ownership: https://www.w3.org/TR/webarch/#def-uri-ownership , and a resource they've decided to associate with a URI.

> I have a top-level container test-container.

What's the URI of the "top-level container"?

> What should happen when I do an HTTP GET to /test-container (no slash at the end)?

The server responding to URI http://example.org/test-container:

GET /test-container

HTTP/1.1 200 OK
Link: <http://www.w3.org/ns/ldp#Container>; rel="type",
      <http://www.w3.org/ns/ldp#Resource>; rel="type"

Alternatively, 3xx.

If the server is not managing the URI http://example.org/test-container:

404.

Aside: Possibly 403 re #14

If a client must know what the URI is before engaging, it should make a HEAD or maybe even an OPTIONS request to pick up the information as mentioned in #35 (comment) .

@csarven
Member

csarven commented Oct 6, 2019

Okie dokie. Let's dive deeper:

From https://www.w3.org/TR/webarch/#uri-opacity :

It is tempting to guess the nature of a resource by inspection of a URI that identifies it. However, the Web is designed so that agents communicate resource information state through representations, not identifiers. In general, one cannot determine the type of a resource representation by inspecting a URI for that resource.

Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.

Hence, this issue, the request to interpret the slashes in a particular way in context of LDP, is also requesting to ignore AWWW's good practice and at least come up with acceptable reasons. It boils down to whether the reasons are in fact acceptable when contrasted with the implications.

It can be argued from a Web user's perspective such that there is a value in Guessing information from a URI as per The use of Metadata in URIs: https://www.w3.org/2001/tag/doc/metaDataInURI-31#guessing . However, this potential or perceived value is not at the centre of this issue.

It is of course anything goes in the end (sort of.. maybe) as per Reliability of URI metadata https://www.w3.org/2001/tag/doc/metaDataInURI-31#erroneous if we were to stretch the possibility to its extreme in that slashes are spec'd:

Constraint: Web software MUST NOT depend on the correctness of metadata inferred from a URI, except when the encoding of such metadata is documented by applicable standards and specifications.

Not sure if slashes actually fall into the "metadata" category there.

In any case, it is also mentioned that even if the information is reliably encoded in the URI, representation metadata takes precedence: https://www.w3.org/2001/tag/doc/mime-respect#dav-scenario. The example is about Content-Type but we can see how that would hold true for LDP's interaction model via Link header. So, given information in the URI vs representation metadata, the latter wins.

re:

"Add the semantics of slashes, which is shared by client and server"

essentially warrants URI string processing tied to LDP that entails:

  • The need to publish URI Template(s).
  • The semantics encoded in the URI string becomes most authoritative and overrides the information in the representation metadata and data.
  • The URI string must remain consistent with information in the representation metadata and data.
  • The URI string obsolesces or at least makes the requirement to include LDP interaction models redundant.
  • Client knowing whether they are communicating with a Solid server (and in which cases) in order to have a specific interpretation of the URI pattern.
  • Favours imperative over declarative approach to "making sense" of the underlying semantics of resources.
  • ?

Putting it together, I think there may be a loophole or at least one can be engineered ;) For that, I want to contrast the above concerns with the key points mentioned in Authoritative Metadata: https://www.w3.org/2001/tag/doc/mime-respect because it posits that the HTTP header metadata should be considered authoritative (over the representation data):

This might seem like going off on a tangent, but if the URI pattern imposes behaviour, it is essentially intended to be interpreted as more authoritative than the representation metadata. So, instead of deriving the semantics directly off the URI, the "file-based-solid-server" spec (as opposed to the "The Solid Ecosystem" spec!) can prescribe an HTTP header for the server's intended semantics on the URI, e.g. URI-Template: <..> (one or more patterns) or maybe even something weird and over-engineered like Link: <..>; rel="http://www.w3.org/ns/footprints#uriTemplate", .. , to help clients with follow-ups. Lastly, in order not to conflict with the existing LDP requirements, this way of using the header MUST be derived from the information that would normally be in the representation metadata as per LDP's interaction model, and not the other way around, in order to avoid inconsistencies (as mentioned earlier).

As for the child-to-parent relationship, that's still a separate case. It needs a relation along the lines of #35 (comment) .

Don't break stuff. Final answer.

@acoburn
Member

acoburn commented Oct 6, 2019

I would note that, while NSS has a behavior such that a request for /container redirects a client to /container/, is it not also legitimate to do the opposite, i.e. have a request for /container/ redirect to /container? Requiring that containers end in the / character seems overly specific, and it conflicts with the advice that @csarven mentions above.

Link headers in a response will always indicate whether a resource is a container or not, and those are much stronger semantics than relying on the particular form of a URL.

Also, what if a client creates an LDP-RS at /foo and then modifies the resource such that it becomes a container? (NSS can't do this, but other systems can.) Would that mean that the resource's URL must change? That seems problematic.

@RubenVerborgh
Contributor Author

Many arguments above are, I think, preaching to the choir, in the sense that there currently seems to be no one on the thread who wants to give slashes semantics. Perhaps it would be good to count explicit support for it in addition to @timbl's, who I'll invite to lay out his arguments.

@dmitrizagidulin
Member

Ok sweet :) What I'm hearing on the thread is - "it's up to the individual Solid server implementation" to decide the trailing slash behavior (with respect to containers).
I can work with that.

@csarven
Member

csarven commented Oct 6, 2019

@RubenVerborgh Which arguments do you think are not preaching to the choir? Were those then worthwhile in the end?

When we come across an issue like "Add the semantics of slashes, which is shared by client and server", how ought one to engage? Does everyone agree on that? I've responded as best as I could as comments emerged, without taking anything for granted. I have even provided a possibility in which the feature request could be implemented or further explored, or maybe even completely thrown out because there was a flaw somewhere. That's all worthwhile in my opinion, even though I don't personally think the feature is a good idea for TSE. I was trying to be neutral.

IMO, the point of these issues is not merely about getting a basic vote on issues or slapping a solution right off the bat, but to go through the "obvious" stuff so that we are in fact working with the same material, where support in favour or against can be based off that. At the very least, the reasoning process is documented, can be referred back, as well as to potentially identify where we have gone wrong. There is a tonne of material out there that we are trying to knit together so that the outcome is coherent as a whole. We have to buckle up and work through - the choir is not the only audience here. Not to forget that there is a lot of material that we can all learn from each other. I'm certainly not an expert in all areas and so I do in fact look forward to bits of information and references that emerge out of these discussions (in the whole repo and elsewhere).

It wasn't until #35 (comment) and in fact in #35 (comment) which made the background obvious and digestible, .. workable! But the discussion continued. That's a good thing. There is absolutely no reason to shut that off or pause, and evidently necessary. So, perhaps starting with that right up front "This issue is a stub for feature x. It was mentioned at y. z will present their case" would save up a lot of time and avoid preaching to the choir?

@RubenVerborgh
Contributor Author

RubenVerborgh commented Oct 6, 2019

Apologies; didn’t mean to cut off (and did not ask for that). It just seemed that we were all trying to convince each other that slashes should not have semantics, which—if we indeed already were to agree—might not be meaningful. Hence my proposal to see if we actually have any arguments in favor, to add new input to the discussion. (For the record, I also do not want slashes to have semantics. I created this issue because I was tasked to. Note that I have tagged and invited the original proposer on multiple occasions.)

So let me phrase this as a question: is there anyone here who thinks that slashes should carry semantics, and if so, what arguments are there for it?

@pmcb55

pmcb55 commented Oct 6, 2019

I don't understand @csarven's comment above (sorry!), but to @RubenVerborgh's simple question, I don't think slashes should carry semantics (simply because it's extra implicit knowledge that I think should be made explicit (i.e. in Link headers)), but I'm very interested in hearing clear arguments for it.

@timbl
Contributor

timbl commented Oct 7, 2019

Briefly, my code assumes that slashes carry semantics.

  • If you save a file as /foo/bar/baz then /foo/ will exist and contain /foo/bar/, and /foo/bar/ will exist and contain /foo/bar/baz

A solid store is hierarchical. The slashes in the URIs represent that hierarchy.

Users are already used to / as used in file: space and find it

The file.dir() function in rdflib can be used to go to the containing dir by making a function of the URI with no network access.

You can build a flat non hierarchical system in solid - just don't use the slash/.

Tim

@csarven
Member

csarven commented Oct 7, 2019

Do you want URI intention as a MUST criterion, or would a Note suffice? Can it be confined to the "file-based-solid-server" spec, which defines a subcategory of interactions, or should it be in the general "The Solid Ecosystem" spec?

@csarven
Member

csarven commented Oct 7, 2019

> If you save a file as /foo/bar/baz then /foo/ will exist and contain /foo/bar/, and /foo/bar/ will exist and contain /foo/bar/baz

I presume you mean server code (that's actually saving). Otherwise I don't see how or why a client needs to know /foo/ and /foo/bar/ exists prior to requesting /foo/bar/baz to be saved. Mentioned in #35 (comment)

> A solid store is hierarchical. The slashes in the URIs represent that hierarchy.

At the risk of repeating #35 (comment) , if slashes in URIs drive meaning for servers and clients, there are implications which complicate the overall system rather than simplify it. I would at least like some responses to each of the points mentioned in that comment in order to move towards resolving this issue.

If slashes in URI become a thing, there are a number of other features in Solid that are affected by it.

If slashes in URIs are authoritative and have stronger semantics than representation metadata and data, we may even be able to throw out a lot of the core requirements from LDP, e.g. starting with ldp:contains for some cases. After all, if URIs are driving behaviour (but actually the server's hierarchical store imposing that on clients!), why would there be a need to look at the description of a resource (especially if only container information is needed)? There wouldn't even be a need to perform GET, HEAD, ..; we can just write a whole spec around str.split('/').

The arguments for -- use cases for - having a directory system must be many and varied, they the same arguments as for having directory structure in un*x systems.

> You can build a flat non hierarchical system in solid - just don't use the slash/.

The objection is about URI designs and that it should not be imposed in the spec.

The objection is not about preventing the design of hierarchical storage systems where their assets are directly mapped to URIs.

[Don't tell me what I should do and not in my bedroom. Except maybe to turn off the lights before sleeping because that's a good thing.]

@RubenVerborgh
Contributor Author

> Can it be confined to the "file-based-solid-server" spec - which defines a subcategory of interactions

Minor note: that spec is confined to server--server interoperability only, i.e., if I move my filesystem to another server, then it can read from that same filesystem. This issue is specifically about client--server interoperability.

@csarven
Member

csarven commented Oct 7, 2019

Noting now the planned (defined/documented?) intention of "file-based-solid-server" spec. What I said was in context of server-client.

@kjetilk
Member

kjetilk commented Oct 7, 2019

What I think everyone is grappling with here, @timbl , is how the shared client-server semantics of the slash is viewed in light of the URI Opacity practice advocated by AWWW, which I suppose we are all taking as foundational. I can certainly see the practical case for the shared semantics, but I don't understand how it is compatible with the URI Opacity practice of AWWW, and from the thread here, it seems I am not alone.

@csarven
Member

csarven commented Oct 8, 2019

Interestingly https://www.w3.org/TR/ldp/#ldpr-informative :

What guidelines exist when interacting with LDPRs that are common but are not universal enough to specify normatively?

I'd completely forgotten about the following two sections from the Linked Data Platform Best Practices and Guidelines :

Also noting:

Software agents (code acting on behalf of users [WEBARCH]) must be careful before exploiting the structure of URIs, considering historical problems when doing so ([WEBARCH], [metaDataInURI]).

Which arrives at the exact guidance as I've presented earlier.

We could reiterate the two guides in The Solid Ecosystem along the lines of "Hierarchical storage systems may want to adopt the following best practices." Anything normative requires.. work.

@csarven
Member

csarven commented Oct 30, 2019

Proposal following F2F meeting of 2019-10-30 with @csarven @timbl @kjetilk @RubenVerborgh :

The proposed shared semantics for slash (/) is intended to build on URI Generic Syntax's notion of path segments ( https://tools.ietf.org/html/rfc3986#section-3.3 ) as opposed to RDF's use of IRIs (which carries no semantics).

@csarven
Member

csarven commented Jun 22, 2020

07a1cd4
2abbeca

@pietercolpaert

I believe adding this in the spec was a mistake: a resource is not defined by the container it is contained in. The same resource can be part of multiple containers and this containment can change over time. In that sense, I believe the comparison between a file path on disk and an IRI for an ldp:Resource is flawed, as the file path identifies a location of a file, it doesn’t identify this particular resource as is the case on the Web. A client however can decide to implement a strategy like this, but it would be on my list of bad and unsustainable information architecture practices.

Proposal: can we remove this chapter from the spec, and even note that a client cannot by default assume that, because an IRI has a container IRI as its prefix, the resource is contained in that container? The containment may have changed since the creation of the resource.

@kjetilk
Member

kjetilk commented Dec 3, 2021

I understand that there are some such arguments. However, I think they arise because people tend to overlay requirements stemming from knowledge organization on top of the URI space. That is what results in the requirement to have multiple containers and containment that changes over time.

I think we need to have a best practices document that makes it clear that the containment hierarchy is just a containment hierarchy. You may attach some access control to it, but not much more than that.

Instead, people should be using other knowledge organization systems on the top of the containment hierarchy. One may even design a different filesystem-like thing on the top of it, that does not follow the containment hierarchy if the requirement is to move resources around.

In particular, I think people should be using SKOS for most of their needs, i.e. let SKOS point to resources which may reside in a hierarchy, or not using relations between them. The key, either way, is not to put much interpretation into the containment hierarchy, it is not intended as an important part of the information architecture.
