Specify/advise semantics of default Container resources (index.html/.ttl) #69

Open
dmitrizagidulin opened this issue Sep 24, 2019 · 94 comments

Comments

@dmitrizagidulin
Member

dmitrizagidulin commented Sep 24, 2019

A topic that causes a lot of implementor and app-developer confusion is the handling of default resources for Containers - index.html and/or index.ttl.

Questions:

  1. Is the index.html behavior a MUST or MAY?
  2. Is the index.ttl behavior a MUST or MAY?
  3. How to prevent the fairly common occurrence of "I accidentally created an index.html but forgot to add an .acl, plus that kicks me out of the Data browser, so anyways what do I do now."

In either case, we should either add it to spec or provide non-normative text advising devs what to do.

Related issues:

@csarven csarven added this to the March 19th milestone Oct 3, 2019
@csarven csarven self-assigned this Oct 4, 2019
@csarven
Member

csarven commented Nov 5, 2019

There is rough consensus in solid/solid-spec#134 that handling of LDPC resources is an implementation detail.

I've created #109 to provide some answers to the general issue around representation handling.

The regular Accept rule should apply (with optional q parameters) for reading. For writing, it'll fall back on the interaction rules around LDPC specified by the applicable methods.
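As a rough illustration of reading via Accept with optional q parameters, here is a minimal sketch of how a server might pick a container representation. The function names `parse_accept` and `choose_representation` are hypothetical, and the matching is deliberately simplified (no specificity ordering between `text/*` and `text/html`, no media type parameters other than `q`), so it is not a complete implementation of HTTP proactive negotiation.

```python
def parse_accept(header):
    """Return [(media_type, q)] sorted by descending q; ties keep header order."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a media range without an explicit q counts as q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # simplification: treat an unparsable q as "not acceptable"
        prefs.append((media_type, q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept_header, available):
    """Pick the first available media type the client accepts, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0] if available else None
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in available:
            return media_type
    return None


# A browser-like client that prefers Turtle but tolerates HTML:
print(choose_representation("text/html;q=0.8, text/turtle",
                            ["text/turtle", "text/html"]))
```

A `None` result would typically translate to a 406 response, although what a Solid server should do in that case is not settled in this thread.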

What kind of non-normative text do you think may help? If we can, I prefer to not include such text.

@kjetilk kjetilk modified the milestones: March 19th, December 19th Nov 15, 2019
@kjetilk kjetilk added this to Under discussion in Specification Nov 15, 2019
@mikeadams1

3. "I accidentally created an index.html but forgot to add an .acl, plus that kicks me out of the Data browser, so anyways what do I do now."

I had made a comment on Gitter about creating a /public/index.html file in a pod, and that it can cause problems for some users, so I wanted to give an update on how to restore your public folder if someone has done this.

Log into your pod, select "your stuff" on the drop down menu, select the "your storage" tab, locate and open the index.html file, click the gear icon and select delete, and your public page will be restored to the default view.
Hope this is helpful.

@csarven
Member

csarven commented Nov 17, 2019

Documenting a clarification on this issue (based on meeting with @dmitrizagidulin ):

If /index.* exists, what happens when / is requested? Common practice: /index.* is placed in the system and / eventually resolves with a message body including the contents of index.*. There is some overlap with #109. For example:

  • must updating /index.* happen through /?

I think we should first have agreement on the interaction, ie. should request to /:

  • redirect to /index.*
  • include contents of index.* in the message body
  • include contents of index.* in the message body and Content-Location: index.* in the header
  • or something else?
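To make the first three options concrete, here is a small sketch of the response each one would produce for a GET on /. The `Response` class, the strategy names, and the choice of 303 for the redirect are assumptions made for illustration only; nothing in this thread fixes a status code.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def respond_to_container_get(strategy, index_name, index_body, media_type):
    """Build the response to GET / under one of the options listed above."""
    if strategy == "redirect":
        # Option 1: send the client to /index.* (303 is an assumed status code).
        return Response(303, {"Location": index_name})

    headers = {"Content-Type": media_type}
    if strategy == "inline-with-location":
        # Option 3: same body, plus a pointer to the representation's own URL.
        headers["Content-Location"] = index_name
    elif strategy != "inline":
        raise ValueError(f"unknown strategy: {strategy}")
    # Options 2 and 3: the contents of index.* become the message body of /.
    return Response(200, headers, index_body)


for name in ("redirect", "inline", "inline-with-location"):
    r = respond_to_container_get(name, "index.html", b"<p>hello</p>", "text/html")
    print(name, r.status, r.headers)
```

Only the first and third options reveal the existence of index.* to the client, which is the point that matters for URI normalization and for where the ACL applies.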

/index.* will inherit the authorization policies from /'s ACL (or ancestor's).
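A minimal sketch of that inheritance, assuming ACL resources are named by appending `.acl` to the container path (an NSS-style convention, not something this issue specifies) and that `existing_acls` is a hypothetical set of ACL paths known to the server:

```python
def parent_container(path):
    """Return the container that holds `path`, or None for the root."""
    if path == "/":
        return None
    trimmed = path.rstrip("/")
    return trimmed.rsplit("/", 1)[0] + "/"


def inherited_acl(path, existing_acls):
    """Walk up from the containing container until an ACL is found."""
    container = parent_container(path)
    while container is not None:
        candidate = container + ".acl"  # assumed naming convention
        if candidate in existing_acls:
            return candidate
        container = parent_container(container)
    return None


# /public/ has no ACL of its own, so /public/index.html falls back to the root ACL.
print(inherited_acl("/public/index.html", {"/.acl"}))
```

The sketch never looks for an ACL on index.* itself, which matches the idea that the policy comes from the container or an ancestor.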

@kjetilk
Contributor

kjetilk commented Nov 18, 2019

Seen in isolation, I feel that this issue should have been resolved without exposing the existence of index.* to the client at all. It feels much like a filesystemy anachronism: the use of /index.* is just a way to store a representation of / internally.

The risk that people will use both / and /index.html to refer to the same resource, and therefore complicate URI normalization, is great. That is, it creates a mess for us when we try to determine whether two resources are the same, and it complicates querying and caching.

If we do come up with a consistent way to deal with representations more generally, (#109), then it does make sense to apply that to this problem too. I think it is crucial that the minimal container triples and containment triples are in all representations, and that we minimize the risk that people will use different URIs for what is actually the same resource.

@JordanShurmer
Contributor

Forgive me if I'm off topic, but I think this is the same issue:

It seems to me like there are 2 types of resources in a Solid pod.

  1. Server/Pod-Provider managed things (e.g. the pod management interface)
  2. User managed things (i.e. LDP resources)

To me, it seems the problems described in this issue are caused by the lack of this distinction in NSS and in Solid in general right now. The Server managed resources should not be able to be managed by the user.

I think this paradigm of separating the User managed from the Server managed is helpful. It could even be advertised by the server. For example, servers could list their server-managed resource(s) (i.e. their "entry point" for users) when receiving an OPTIONS * request.

@kjetilk
Contributor

kjetilk commented Dec 5, 2019

I discussed this briefly with @timbl , and he said that in the databrowser's use of index.ttl, there was no expectation that it is a default container resource, to the contrary, it is a normal resource, contained as usual.

@kjetilk
Contributor

kjetilk commented Dec 9, 2019

I also discussed index.html briefly with @timbl . First, there is no expectation that all index.* will behave the same. index.ttl is merely a databrowser convention, it is just another resource.

index.html OTOH is a HTML representation of the container, as per old Apache convention. @timbl intended this to be governed by the Accept header.

There are some problems that I see with this approach (that we didn't have time to discuss further), since there are some things that are important in a container's representation, notably the containment triples, the minimal server-managed LDP triples, and the "POSIX metadata", i.e. metadata to indicate ctime, mtime, atime, etc, which would come in handy if Solid was used as an actual file system, e.g. through FUSE.

So, I am uncomfortable about not having them in the representation, even though people could get at them by adjusting the Accept header.

From an LDP purist's point of view (which I am not), the HTML representation is also a non-RDF source, so it breaks LDP's model in which an LDPC is an LDP-RS.

One possible resolution to this problem is to require the inclusion of the RDF as RDFa. Even though people might PUT index.html without the RDFa, the RDFa could be added to the DOM by the server.

I see three ways to resolve this:

  1. Live with the fact that the HTML representation is not a representation of the container with its triples, but return HTML only if it is requested through the Accept header.
  2. Try to work out something that behaves in the browser like things used to do in the past, but does so in a way that makes it clear that ./index.html is not a representation of ./
  3. Require (or at least say SHOULD) returning the RDF representation of a container as RDFa in the HTML.

My preference would be the last one.

@csarven
Member

csarven commented Dec 9, 2019

index.html OTOH is a HTML representation of the container

Possibly. To actually resolve that, I was hoping to get an answer to #69 (comment) , #109 (comment) eg. Content-Location?

That aside..

Any representation of a container is required to be RDF bearing; RDF Source; LDP-RS... having triples that can be parsed by an RDF parser... Full stop. Whether it has a certain number of triples or even the "right" triples is orthogonal. That is a different requirement about representations and equivalences we can work out.

From an LDP purist's point of view (which I am not), the HTML representation is also a non-RDF source

Not necessarily. If an HTML document has no RDFa in it, an RDFa parser will obviously find 0 triples. That's exactly the same as a Turtle document with no triples in it. Attempting to create or update / with text/turtle or text/html with no RDF can be expected to have the same treatment - basically 0 triples.

#45 (comment) proposes how to handle equivalent representations:

Servers supporting Turtle, JSON-LD and optionally other RDF serializations for LDP-RS SHOULD provide different RDF serializations based on client's proactive negotiation. For example, if a server allows the creation of an LDP-RS in text/html (including RDFa), it SHOULD respond to GET with Accept: text/turtle requests with 200.

@kjetilk
Contributor

kjetilk commented Dec 9, 2019

index.html OTOH is a HTML representation of the container

Possibly. To actually resolve that, I was hoping to get an answer to #69 (comment) , #109 (comment) eg. Content-Location?

Right, so, @timbl saw this as behaving like Apache does, i.e., there's nothing on the surface, the server just silently takes the index.html file on the server and returns its content as a representation of ./. No Content-Location or anything.

That aside..

Any representation of a container is required to be RDF bearing; RDF Source; LDP-RS... having triples that can be parsed by an RDF parser... Full stop.

That'd be the LDP purist's view, ;-) this stems from the LDP's hierarchy of interaction models, it is not a concern of Solid as currently designed.

Whether it has a certain number of triples or even the "right" triples is orthogonal. That is a different requirement about representations and equivalences we can work out.

Right, but I take the pragmatic view: What are the useful pieces of LDP that we actually need? The mentioned RDF is something we need, and from that, it follows that they must be included with the container representation.

Now, I'm sufficiently purist myself to be wary of a design that doesn't include the container representation with all possible representations, regardless of Accept header. From that, it follows that indeed, any representation is required to be RDF bearing, so we arrive at the same conclusion, but through purism on different parts of the architecture. :-)

However, neither of these views are consistent with the view that @timbl gave me of his motivation behind the index.html, we need to work out something that is.

From an LDP purist's point of view (which I am not), the HTML representation is also a non-RDF source

Not necessarily. If an HTML document has no RDFa in it, an RDFa parser will obviously find 0 triples. That's exactly the same as a Turtle document with no triples in it. Attempting to create or update / with text/turtle or text/html with no RDF can be expected to have the same treatment - basically 0 triples.

Right, but my (unstated) assumption is that the HTML would also contain content that is not represented as RDF, that's the whole point of putting HTML there, which makes it non-RDF in LDPs (flawed) model where LDP-RS and LDP-NR are mutually exclusive, regardless of whether there is RDF content hosted in it.

#45 (comment) proposes how to handle equivalent representations:

Servers supporting Turtle, JSON-LD and optionally other RDF serializations for LDP-RS SHOULD provide different RDF serializations based on client's proactive negotiation. For example, if a server allows the creation of an LDP-RS in text/html (including RDFa), it SHOULD respond to GET with Accept: text/turtle requests with 200.

No, that does not appear to address the problem, the point is that the index.html would contain content that is not represented in RDF, that's the whole point, if it was fully represented with RDF, it would not be put as HTML in the first place.

Moreover, the UNIX filesystem analogy extends to containers too, in that the container just contains resources, it is not intended to have an extensive representation on its own.

@csarven
Member

csarven commented Dec 9, 2019

Edit/Note: I wrote this prior to seeing Kjetil's comment above / in #69 (comment)

This issue was intended to work out whether index.* can be observed or interacted with.

I still think the handling of index.* is ultimately an implementation detail. If it is a container's representation with its own URL, so be it.

#109 (comment) actually tries to resolve this:

where index.* exists by other means, we need to resolve #119

The point of that was to have a clear split between resources that come to existence via Solid's prescribed interaction vs. by other means, and so re "exists by other means" ie. some implementation happens to have index.* with its own accessible URL where it didn't go through an interaction with the container, then it is of course a separate resource (and certainly not a representation of /). Out of Solid spec by default. But the point with 119 is that if there are some unique cases that should have special handling, and so that would help to address:

if /index.* exists, what happens when / is requested, we need to resolve #69

What the databrowser and Apache are doing with /index.* are implementation details. So, requesting / should have nothing to do with /index.* because they are different resources. That Apache can be configured to handle / by processing/serving /index.* does not mean Solid needs to adopt it. Again, if for example a Solid server implementation wants to accept text/html or text/turtle.. at / and store the representations at /index.*, then that's its own decision. It naturally needs to address "if /index.* exists, what happens when / is requested".

@csarven
Member

csarven commented Dec 9, 2019

RDF 1.1 says that RDFa in markup languages qualifies it as RDF. If there is any information in there that's not part of an RDF graph, then it doesn't suddenly become non-RDF. If two RDF graphs are isomorphic, that's all that counts towards representation equivalence.
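For graphs without blank nodes, isomorphism reduces to plain set equality of triples, which makes this notion of equivalence easy to sketch. The function name and the representation of triples as tuples of strings are assumptions made for illustration; graphs with blank nodes need a proper isomorphism check, which this sketch refuses to attempt.

```python
def is_blank(term):
    """Blank nodes are written with a '_:' prefix in this sketch."""
    return term.startswith("_:")


def ground_graphs_equivalent(graph_a, graph_b):
    """Compare two blank-node-free graphs given as iterables of (s, p, o) strings."""
    for triple in list(graph_a) + list(graph_b):
        if any(is_blank(term) for term in triple):
            raise ValueError("blank nodes need a real isomorphism check")
    # Statement order and duplicates are syntax details, so compare as sets.
    return set(graph_a) == set(graph_b)


CONTAINS = "http://www.w3.org/ns/ldp#contains"
from_turtle = [
    ("https://example.org/", CONTAINS, "https://example.org/a.ttl"),
    ("https://example.org/", CONTAINS, "https://example.org/b.ttl"),
]
from_rdfa = list(reversed(from_turtle))  # same triples, different document order
print(ground_graphs_equivalent(from_turtle, from_rdfa))
```

Under this view, any HTML content that is not marked up as RDFa simply does not take part in the comparison.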

If / is intended to be in RDF, then by minimum server can take Turtle and JSON-LD (as previously roughly agreed for minimum serialisations). If a server wants to accept any other media type, it needs to be an RDF bearing document. Similarly, if it wants to serve a container in text/html, it needs to encapsulate the information in RDFa.

@kjetilk
Contributor

kjetilk commented Dec 9, 2019

Firstly, this resource is now only about index.html as indeed all other index.* are just normal resources, and thus implementation details.

@kjetilk
Contributor

kjetilk commented Dec 9, 2019

RDF 1.1 says that RDFa in markup languages qualifies it as RDF. If there is any information in there that's not part of an RDF graph, then it doesn't suddenly become non-RDF.

Eh, well, it makes it a not RDF source, at least, from LDP:

Linked Data Platform RDF Source (LDP-RS)
An LDPR whose state is fully represented in RDF, corresponding to an RDF graph. See also the term RDF Source from [rdf11-concepts].

Linked Data Platform Non-RDF Source (LDP-NR)
An LDPR whose state is not represented in RDF. For example, these can be binary or text documents that do not have useful RDF representations.

So, HTML+RDFa is an RDF Source iff it is fully represented by the hosted RDF. If it contains information that is not represented by the hosted RDF, it is neither an LDP-RS nor an LDP-NR. Then, my proposition is that the whole point with having an index.html file there is to have human-readable information that is not fully contained in RDF. It is still an LDPR though, but it is not consistent with the LDPC interaction model, which doesn't worry me too much, as long as we can represent the stuff that should be in there as RDFa.

If two RDF graphs are isomorphic, that's all that counts towards representation equivalence.

I don't think so, because what happens then if you round-trip between the HTML+RDFa and Turtle? index.html can't be an RDF source. It is a special case that doesn't fit with LDP's flawed model. Which, BTW, is flawed for all HTML+RDFa that contains information not fully represented with RDF.

If / is intended to be in RDF, then by minimum server can take Turtle and JSON-LD (as previously roughly agreed for minimum serialisations). If a server wants to accept any other media type, it needs to be an RDF bearing document. Similarly, if it wants to serve a container in text/html, it needs to encapsulate the information in RDFa.

Right. So we agree that it is a good thing if the index.html returns RDFa, but we still have to define this as a special case, and also decide whether it is a MUST in Solid to return RDFa, but we cannot do that with reference to LDP as LDP doesn't deal with this case.

@kjetilk kjetilk changed the title Specify/advise semantics of default Container resources (index.html/.ttl) Specify/advise semantics of index.html Dec 9, 2019
@csarven
Member

csarven commented Dec 9, 2019

The intended state of an HTML+RDFa representation is what corresponds to an RDF graph. Round-tripping is a non-issue because information that's not marked in RDFa is neither intended nor expected to be preserved. HTML+RDFa is an LDP-RS.

There is no need to specify index.html as a special case anymore than specifying index.ttl. I suggest to close this issue in favour of resolving #109

I think the original issue title reflects what's discussed and linked. I find the one you've changed to omits key information. Can we revert?

@TallTed
Contributor

TallTed commented Dec 9, 2019

[@csarven] information that's not marked in RDFa is neither intended nor expected to be preserved

Really? I don't think many if any HTML+RDFa creators would agree with you. Certainly, I would find it a very large problem if my HTML+RDFa documents were to suddenly lose all HTML content and be reduced to RDF in any serialization.

@kjetilk
Contributor

kjetilk commented Dec 9, 2019

The intended state of an HTML+RDFa representation is what corresponds to an RDF graph. Round-tripping is a non-issue because information that's not marked in RDFa is neither intended nor expected to be preserved. HTML+RDFa is an LDP-RS.

No, it is not! Please comment on each of my points if you disagree!

There is no need to specify index.html as a special case anymore than specifying index.ttl. I suggest to close this issue in favour of resolving #109

I strongly disagree! This is very much a special issue, as it describes a feature that is in Solid and has been in Solid since the dawn of ages. You will then have to argue for the removal of a feature that people rely on and that the Director has voiced a clear opinion on.

I think the original issue title reflects what's discussed and linked. I find the one you've changed to omits key information. Can we revert?

Sure, but then, please be a little sensitive to the fact that some of this happened in an F2F discussion that you didn't attend. I have tried to explain it in clear terms, but you seem to simply dismiss the discussion without considering the merits of the arguments.

@csarven
Member

csarven commented Dec 9, 2019

@TallTed ,

Really? I don't think many if any HTML+RDFa creators would agree with you. Certainly, I would find it a very large problem if my HTML+RDFa documents were to suddenly lose all HTML content and be reduced to RDF in any serialization.

If the intention is to preserve content in the RDF graph universe of things, it needs to emit itself to get picked up. What wrote HTML+RDFa and what decisions did it make?

@kjetilk ,

Dmitri's initial comment is the original issue covering a bunch of related stuff, hence the title!

Right, so, @timbl saw this as behaving like Apache does, i.e., there's nothing on the surface, the server just silently takes the index.html file on the server and returns its content as a representation of ./. No Content-Location or anything.

When you say "No Content-Location or anything", that can be taken as a response to #69 (comment) :

I think we should first have agreement on the interaction, ie. should request to /:
include contents of index.* in the message body

being the preferred interaction from that list. [Perhaps that's another way of looking at "Please comment on each of my points if you disagree!" ;)]

Note how not requiring Content-Location in this case is contrary to our preference of requiring it for the general case #109 . I'd like to resolve that.

The risk that people will use both / and /index.html to refer to the same resource

The point of resolving issues like #119 is so that we understand the scope and methods in which resources make their way into a system. How did index.html materialise? Was it created as a representation of / or as a resource different than /? Contained or not? Anything referring to it?

Require (or at least say SHOULD) returning the RDF representation of a container as RDFa in the HTML.

If an implementation accepts text/html on a resource eg. /, with Accept and Content-Type, I prefer this (even as a MUST) - needs to have containment information and following the common criteria on affecting server-managed triples.

@csarven
Member

csarven commented Dec 10, 2019

So, HTML+RDFa is an RDF Source iff it is fully represented by the hosted RDF. If it contains information that is not represented by the hosted RDF, it is neither an LDP-RS nor an LDP-NR.

That's a misinterpretation of LDP-RS, LDP-NR and the RDF Source that it links to (RDF 1.1). RDF 1.1 is clear about RDFa:

An RDF document is a document that encodes an RDF graph or RDF dataset in a concrete RDF syntax, such as Turtle [TURTLE], RDFa [RDFA-PRIMER], JSON-LD [JSON-LD], or TriG [TRIG]. RDF documents enable the exchange of RDF graphs and RDF datasets between systems.

A concrete RDF syntax may offer many different ways to encode the same RDF graph or RDF dataset, for example through the use of namespace prefixes, relative IRIs, blank node identifiers, and different ordering of statements. While these aspects can have great effect on the convenience of working with the RDF document, they are not significant for its meaning.

We informally use the term RDF source to refer to a persistent yet mutable source or container of RDF graphs. An RDF source is a resource that may be said to have a state that can change over time. A snapshot of the state can be expressed as an RDF graph. For example, any web document that has an RDF-bearing representation may be considered an RDF source.

I think you're mistakenly overloading the term "fully". LDP-RS uses that term to differentiate from LDP-NR. Just as LDP-NR uses "do not have useful RDF". The use of the term "fully" alone is inadequate to cover all the intricacies or even to create a new constraint with a whole set of ramifications without definitions. It is not LDP's place to do that because the simplest explanation is that it respects spec orthogonality. LDP is merely classifying the kind of documents for its interaction model - the intended semantics being "RDF-bearing" or not.

Proposing that (HTML+)RDFa is somehow incompatible with, falls between, or depends on conditions (?) for RS and NR is nonsensical and renders things useless for no practical benefit.

If it helps to be sure, use the LDP-RS interaction model when communicating representations with RDFa - something I've already suggested elsewhere. If we don't need LDP's interaction models in the end, nothing fundamentally changes, so there is nothing else to do here. Great. If / is intended to have RDF-bearing representations, then servers permitting media types that could potentially contain RDFa need to decide what is acceptable given the underlying semantics eg. involving server-managed triples.

@kjetilk
Contributor

kjetilk commented Dec 10, 2019

This is going to go down in history as an example of why some stuff should be agreed on F2F, as we now ended up generating more heat than illumination ;-) So, we're in "violent agreement" for the most part, I guess. There's just one thing I still feel like responding to here:

Proposing that (HTML+)RDFa is somehow incompatible with, falls between, or depends on conditions (?) for RS and NR is nonsensical and renders things useless for no practical benefit.

since it does have a practical consequence, that "fully" means it is round-tripable, i.e. you can choose any RDF serialization, and the semantics will be the same.

That aside, I think we have an agreement on the following:

  1. That all operations on the container will use the container request-URI.
  2. That the actual use of index.html is an implementation detail local to the server, which will not be exposed to the client.
  3. That the use of HTML+RDFa will need to follow the same rules as any other RDF serialization, i.e. it cannot modify server-managed triples such as containment triples.
  4. That said, HTML+RDFa may contain more information than is captured in RDF, and therefore isn't round-tripable, the HTML therefore needs separate storage.
  5. The server will need to be able to modify the triples included as RDFa, since at the very least, it needs to manage server-managed triples.
  6. It is thus not an actual "Representation URL" in the sense of Representation URLs and interactions #109, since it doesn't have a different URL than the container itself, it is merely an augmented representation of the same resource.
  7. Therefore, it also does not have its own ACL, etc.
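Points 3 and 5 can be sketched as a merge step that a server might run on any client update to a container, whatever the serialization. The function names and the policy of silently discarding client-supplied containment triples are assumptions; rejecting such a request outright would be another reasonable reading of point 3. Triples are modelled as plain tuples of strings.

```python
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


def server_managed(graph):
    """Select the triples only the server may change (here: containment only)."""
    return {t for t in graph if t[1] == LDP_CONTAINS}


def merge_client_update(current, proposed):
    """Accept the client's triples but keep the server's containment triples."""
    client_part = set(proposed) - server_managed(proposed)
    return client_part | server_managed(current)


container = "https://example.org/public/"
title = "http://purl.org/dc/terms/title"
current = {
    (container, LDP_CONTAINS, container + "a.ttl"),
    (container, title, "Old title"),
}
proposed = {
    (container, title, "New title"),
    (container, LDP_CONTAINS, container + "not-really-there.ttl"),
}
for triple in sorted(merge_client_update(current, proposed)):
    print(triple)
```

With HTML+RDFa the same merge would apply to the graph extracted from the markup, and the server would then write the resulting containment triples back into the served HTML.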

Any misrepresentation in the above? If not, I think the main question is whether the HTML representation MUST have the container's triples as RDFa or if it should be a SHOULD.

@csarven
Member

csarven commented Dec 11, 2019

  1. Agree.
  2. Agree, server is not prohibited (whether as a representation of a container or as an independent resource with that name).
  3. Agree.
  4. Disagree, only the RDF graph is intended to be round-tripable. That is the agreement that's being committed to with RDFa use.
  5. Agree, server will have the same behaviour for all supported RDF serializations.
  6. Agree.
  7. Agree.

Any misrepresentation in the above? If not, I think the main question is whether the HTML representation MUST have the container's triples as RDFa or if it should be a SHOULD.

If an implementation accepts text/html on a resource eg. /, with Accept and Content-Type, I prefer this (even as a MUST) - needs to have containment information and following the common criteria on affecting server-managed triples.

@TallTed
Contributor

TallTed commented Dec 11, 2019

[@csarven] only the RDF graph is intended to be round-tripable. That is the agreement that's being committed to with RDFa use.

I could not disagree more strongly with the above, which I do not believe can be found in any RDFa specification nor guidance.

You are approaching RDFa from the wrong side. HTML+RDFa is embellished HTML (hence, HTML plus RDFa), it is not embellished RDF (which would be RDFa plus HTML).

@TallTed
Contributor

TallTed commented Dec 11, 2019

Regarding LDP-NR vs LDP-RS classification —

As a member of the LDP WG, I understood us to be saying that LDP-NR might include RDF content, but always include non-RDF content which is meant to be preserved, so the document must be preserved as PUT or POSTed.

LDP-RS are 100% RDF, and might be stored by the back-end in their original form, or transformed into another RDF serialization, or loaded into a graph store and not preserved as a document per se — though always retrievable in either Turtle or JSON-LD serialization.

(I was a minority in thinking that Turtle — which may include out-of-band, non-RDF comments and statement order — should be considered LDP-NR, and thus should be preserved entirely.)

@kjetilk
Contributor

kjetilk commented Jan 24, 2020

I'll try to comment just on the behalf of myself, and since I woke up early and couldn't sleep. :-)

First, @elf-pavlik ,

I think this would prevent people to publish just plain HTML representation.

Yes, but that is, I think, an important feature, as a container has, by definition, an RDF representation, which will always be significant. Moreover, it is important for Databrowser, because it can easily embed itself in the HTML document at an appropriate place if the RDF is there, which I also think is desirable. However, I think we can allow more embeddable RDF formats, it doesn't need to be constrained to RDFa.

I'm all for being use case driven, but I am concerned that we are spending too much time on this pretty edge-case feature (this will be the 79th comment), so I would much prefer to find an urgent resolution to it. I think the use case is pretty clear, people have maintained HTML representations of containers since the dawn of ages, and we don't want to break that even when there is a client-side generated view.

@csarven ,

I'm afraid you didn't capture the discussion very well, because I fundamentally disagree with the design ;-)

What I do agree upon is:

Read, Append, Write operations on a container should go through the canonical / (slash semantics).

However,

The handling of container representations is optional. A server may use the /index.* convention as representations of a /.

index.* was put outside of the design space already in December, we should not go in more circles on this, this is specifically about the behaviour of an HTML representation.

They may expose the representation URL through Content-Location. A container may have RDF bearing and non-RDF bearing representations. The representations of a container may be listed in its containment triples.

I also think this is inconsistent with the design that manipulations go through /. With this design, index.html is a resource in its own right, and then we must allow it to be manipulated as any other resource, with all the problematic side effects it has with the applicability of ACLs and possibly other metadata resources, as well as problems it might cause for queries and stuff down the road.

index.html is either a resource in its own right, or it isn't. It can't be something in between. If it is not a resource in its own right, your concerns as to additional constraints to the RFC are not applicable.

ACL is set on the container and applicable to all of container's representations.

Yes, but if you insist that index.html is something that can be referenced as a member of the container and Content-Location, it places that in a special situation. It can be solved, but it is a more complicated solution, and therefore more error prone.

RDF bearing representations of a container should not update the server-managed triples.

Yeah, but it needs to be stronger, since it just cannot update the server-managed triples, so it doesn't capture the nuances.

So, this proposal is not consistent with Trellis (which is in my 3.ii space), NSS (which doesn't manipulate through /), the design constraints we had earlier (it is only about HTML), nor internally.

@csarven
Member

csarven commented Jan 24, 2020

index.* was put outside of the design space already in December, we should not go in more circles on this, this is specifically about the behaviour of an HTML representation.

No, once again, the original issue including the title and the comment that Dmitri created is about and I quote: "index.html and/or index.ttl". If you're only interested in focusing on the index.html bit, that's fine, but the issue still needs to address index.ttl or at least see index.* as a specialisation of #109 for starters - there is a reason why I've created that first so we can revisit this (also mentioned that before). "index.*" was used as an alias to both of those indexes (and others obviously). There are also numerous references to both (if not more) index formats elsewhere.

I also think this is inconsistent with the design that manipulations go through /. With this design, index.html is a resource in its own right, and then we must allow it to be manipulated as any other resource, with all the problematic side effects it has with the applicability of ACLs and possibly other metadata resources, as well as problems it might cause for queries and stuff down the road.

index.html is either a resource in its own right, or it isn't. It can't be something in between. If it is not a resource in its own right, your concerns as to additional constraints to the RFC are not applicable.

Yes, but if you insist that index.html is something that can be referenced as a member of the container and Content-Location, it places that in a special situation. It can be solved, but it is a more complicated solution, and therefore more error prone.

You've misunderstood. The base requirement is that interactions go through /. If however a server exposes the representation URLs (see issue 109), interactions can still go through /. That doesn't exclude index.* being their own resources... and whether a read or write can happen. Authz policy on the representations is still.. literally what's set for /. Issue 109 and issues involving ACL and representations are clear about being set on the primary resource (as opposed to the representation).

Yeah, but it needs to be stronger, since it just cannot update the server-managed triples, so it doesn't capture the nuances.

Yeah, I've proposed stuff.. please see how these relate #40 (comment) , #45 (comment) , #40 (comment) .. The repo is littered with possible ways forward.

The whole point of "RDF bearing" was if client/server deems a resource to be so, the rules on containment applies. Heck, we can even go all the way back to this: https://github.com/solid/solid-spec/issues/202#issuecomment-512902223 . However server handles / with text/html, the rest follows. It would literally allow /'s text/html representation to be treated as RDF bearing or non-RDF bearing.

the design constraints we had earlier (it is only about HTML)

Clearly you are mistaken:

Obviously not all implementations are doing the same. Some of the informal criteria that I've mentioned is what NSS does and what Trellis either does or can do. Your 3. is a non-starter based on... guess what "we've already highlighted use cases where some implementations may want to expose representation URLs." So, you can't just ignore that and still try to force your preference.


I've suggested that I can clarify and expand. I've also suggested that you should take the comment as a whole and see how it connects with the agreements made elsewhere. That wasn't an arbitrary decision and I didn't mention all that for fun.


I'm reverting the issue title until there is consensus without obvious objections or at least minimal approval from the creator of this comment.


Clearly we are talking past each other. I am as frustrated (if not more) as anyone else. We can pick this issue up in a call or a F2F :)

@csarven csarven changed the title Specify/advise semantics of index.html Specify/advise semantics of default Container resources (index.html/.ttl) Jan 24, 2020
@solid solid locked as too heated and limited conversation to collaborators Jan 24, 2020
@csarven csarven modified the milestones: February 19th, ~First Public Working Draft Jan 24, 2020
@kjetilk
Contributor

kjetilk commented Jan 24, 2020

@timbl and I resolved to go for option 1. i.e. something close to the current NSS behaviour. I'll do a writeup.

@csarven
Member

csarven commented Jan 24, 2020

Please specify diff with #69 (comment) .

@kjetilk
Contributor

kjetilk commented Jan 25, 2020

Let me first apologize for the fast turn of events here. It was truly not intentional: This issue is interesting as it has brought out pretty much all the tensions between the different components and philosophies of Solid, the LDP, the UNIX file system, the roles of representations on the Web, etc. However, it is also an issue quite far on the edges, and we cannot keep it open just for the undeniable intellectual exercise it provides. With more than 80 comments, I think it has had an extensive hearing, and suddenly today, I had the rare opportunity to take it up with Tim in real life, and so I hope people aren't too annoyed if we can take that conversation as the guide. Moreover, you will quickly notice that my own favorite, the "3.ii." direction, was quickly dismissed. So, here we go:

index.html is a resource in its own right, and will be manipulated as any other resource. That is, it can be read, and all update operations to the HTML representation will be done on index.html itself. index.html will be contained in the container as any other resource. It may have its own ACL that will apply when the index.html resource itself is dereferenced.

Only when a read operation is executed on / with an Accept header that indicates that an HTML representation is wanted will the contents of index.html be returned (there's some wiggle room around q factors here). It MUST then include the RDF representation of the container embedded in the HTML (the details of this were not discussed, but the idea is that the databrowser can then be embedded in the HTML). Content-Location MUST be set.

The ACLs will be applied as follows: First, the container's ACL will be applied. If the client has read access to the container, an internal redirect is made. If the index.html has its own ACL, then that too will need to indicate that read is authorized for the content to be returned.

Time ran out as we started to discuss what should be done if the client is authorized to read the container, but not the index.html. I think the natural thing to do would be to return to the Accept header to check if there are other representations (i.e. RDF) that are acceptable. If not, I would suggest a 406, but a case could also be made for always having an RDF fallback in that case, since an RDF representation of a container always exists.
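
A minimal sketch of the read flow described above, for illustration only. Everything named here is an assumption rather than part of the proposal: the `Response` type, the `read_container` function, the choice of 403 for a denied container read, and the set of RDF media types. The sketch takes the "fall back to the next acceptable type, else 406" reading of the open question; an always-RDF fallback would be an equally valid variant.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed set of RDF media types the server can always produce for a container.
RDF_TYPES = ("text/turtle", "application/ld+json")


@dataclass
class Response:
    status: int
    content_type: Optional[str] = None
    content_location: Optional[str] = None


def parse_accept(header: str) -> List[str]:
    """Return the media types of an Accept header, highest q first (q=0 dropped)."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q > 0:
            prefs.append((q, fields[0].lower()))
    # list.sort is stable, so types with equal q keep their header order.
    prefs.sort(key=lambda pref: -pref[0])
    return [media for _, media in prefs]


def read_container(
    accept: str,
    can_read_container: bool,
    index_exists: bool,
    index_acl_allows_read: Optional[bool],
) -> Response:
    """Hypothetical GET / handler.

    index_acl_allows_read is None when index.html has no ACL of its own.
    """
    # Step 1: the container's ACL is applied first.
    if not can_read_container:
        return Response(403)  # assumed status; the thread does not name one
    for media in parse_accept(accept):
        if media == "text/html" and index_exists:
            # Step 2: internal redirect to index.html; its own ACL, if any,
            # must also authorize the read.
            if index_acl_allows_read is None or index_acl_allows_read:
                return Response(200, "text/html", "index.html")
            continue  # not authorized for index.html: try the next type
        if media in RDF_TYPES:
            return Response(200, media)
    # Step 3: nothing acceptable is left.
    return Response(406)
```

For example, `read_container("text/html, text/turtle;q=0.5", True, True, False)` skips the HTML representation and serves Turtle, while the same call with `Accept: text/html` alone yields a 406.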

The diff to @csarven's comment is that index.* is not considered. index.ttl has had a different mission in Solid and has had so for some time (it has certainly nothing to do with my preference, I have not even been aware of this before Tim told me about it, we've just basically had a collective misunderstanding around it). Most operations on / will not be affected by the presence of index.html, only Read will. The Content-Location and containment is required, the ACL algo is different.

So, this wasn't my favorite resolution, but with the clarification on how the ACLs are applied, it resolves the initial issues that prompted many of the reports on this. It also eases some tensions on the method definitions, as index.html is relevant only for read operations. I think it should be helpful to settle it.

@csarven
Member

csarven commented Jan 31, 2020

Housekeeping: The reference to "index.ttl" in this issue should be left as a representation. The data augmentation case raised by data browser (which happens to use index.ttl) will be addressed in #144 .


it is also an issue quite far on the edges

The use cases that are brought up are among the most common practices on the Web. Solid must be able to address them.


With the proposal that's brought up:

It MUST then include the RDF representation of the container embedded in the HTML (the details of this were not discussed, but the idea is that the databrowser can then be embedded in the HTML). Content-Location MUST be set.

I'll respond to the questions I've raised in #69 (comment) :

Do we expect multiple representations of a resource to be equivalent (in some reasonably defined or predictable way eg. isomorphic RDF graphs imply information equivalence)? If not, can representations be kept tracked and updated independently - is this too complex?

The proposal suggests that the representations are at the very least expected to be equivalent based on RDF graph. The proposal also suggests that they will be tracked and updated independently.

The details indeed need to be worked out:

The discussion in #108 shows that server interference, i.e. injection of containment information into HTML, is not particularly practical or mature. There is also no implementation experience. Moreover, when updating non-RDFa RDF bearing resources, server interference is not expected; that is, the server will either allow the request or reject it. This is explained further below based on existing consensus.

The intended state of / in text/html should be controllable by a client without server interference provided that the representation conforms to server-imposed constraints. Rough consensus in #40 (comment) :

The criteria from 5.2.4.1 can [in addition to PUT] be applied to POST (#108) , PATCH (#85), index.html (#69).

helps to clarify client and server expectations. That is, client needs to ensure the integrity of the containment information in an HTML with RDF bearing representation when making changes and the server verifies the request. This is the same criteria for all RDF bearing representations.

As no details are provided on equivalence or particular information persistence beyond encoded RDF graph, it can be deemed to be compatible with #69 (comment) :

If two RDF graphs are isomorphic, that's all that counts towards representation equivalence.

In #69 (comment) , I proposed a relaxed version of representation equivalence which can be determined by the agreement between a server and a client:

A container may have RDF bearing and non-RDF bearing representations. The representations of a container may be listed in its containment triples.

It meant that a representation in HTML may or may not be RDF bearing. If deemed to be RDF bearing, there are specific expectations. This is a far simpler design for both servers and clients. For instance, updating an RDF bearing representation does not entail that the non-RDF bearing representations need to be updated for some "equivalence", and vice-versa.

To summarise suggestions (preferably selecting one from this order):

  • A representation being RDF bearing or not is an agreement between a server (verification) and a client (intention).
  • Client ensures that the representation is RDF bearing (in addition to adhering to server constraints) and server verifies.

Aside: Either of those options can allow any application (eg. data browser, dokieli) to be embedded in HTML. One of the differences between those applications may be that data browser's primary reason to have a container's representation in HTML (and the whole design on having index.html) is so that its JavaScript can be embedded and it doesn't care about the underlying information (at this time). In dokieli, the resource's content matters and its JavaScript is only intended to act as a way to introduce interactions on the resource - if JavaScript is unavailable, the resource can still be expected to be human and machine-readable (useful).

Content-Location MUST be set.

No objection, however I think that should be inherited from the general case in #109 .

The ACLs will be applied as follows: First, the container's ACL will be applied. If the client has read access to the container, an internal redirect is made. If the index.html has its own ACL, then that too will need to indicate that read is authorized for the content to be returned.

Do I understand you correctly in that the order of this check is different to resources in general, i.e. if a resource in a container has its own ACL, it will be applied, otherwise the inheritance algorithm is applied? What I'm not clear about in the proposal is whether index.html's ACL, if it exists, will be applied instead of the container's ACL. Can you clarify that bit?

what should be done if the client is authorized to read the container, but not the index.html. I think the natural thing to do would be to return to the Accept header to check if there are other representations (i.e. RDF) that are acceptable, if not, I would suggest a 406, but a case could also be made for always having an RDF fallback in that case, since an RDF representation of a container always exist.

I don't particularly see why index.html's ACL needs to exist (as opposed to just inheriting container's). It opens up more complications than it actually helps. While a container's representations are resources in their own right, it doesn't mean that they must have their own ACL. Simply use container's (fixed reference). What's the actual use case to be different?

@kjetilk
Contributor

kjetilk commented Jan 31, 2020

I suggest we remove this from the FPWD as we will not be able to resolve this in foreseeable future.

@kjetilk kjetilk removed this from the ~First Public Working Draft milestone Jan 31, 2020
@solid solid unlocked this conversation Feb 1, 2020
@kjetilk
Contributor

kjetilk commented Feb 3, 2020

I just want to comment very briefly on this:

injection of containment information into HTML is not particularly practical or mature. There is also no implementation experience.

That's wrong. Trellis already supports it as indicated above. We also support it trivially with Perl:

use RDF::RDFa::Generator;
my $gen = RDF::RDFa::Generator->new;
# Inject the triples held in $model into the XHTML DOM $dom as RDFa
$gen->inject_document($dom, $model);

where $model contains the RDF you wish to inject, and $dom contains the DOM object of the XHTML document you inject into. This is code that has been in production for a decade. I quickhacked a little script that takes an XHTML document on the STDIN and outputs the injected document to STDOUT last night: inject_rdfa.pl.txt

There is plenty of implementation experience, and it is very mature. I would be very surprised if it is not equally simple in JS, it is just about generating the RDF and adding it to the DOM tree, that's all there is to it.

@csarven
Member

csarven commented Feb 3, 2020

Edit: Didn't notice your comment before sending mine, so here is a quick reply:

That's wrong. Trellis already supports it as indicated above. We also support it trivially with Perl:

It is not an arbitrary injection. Obviously triples can always be thrown in somehow. Even possible with sed but that's probably not a good idea, right? The whole document needs to be properly serialised and be coherent, and anything added or removed from the containment triples should not interfere with everything else, including ideally structure and rendering. Having code to inject is one thing but I'd like to see actual HTML documents in practice that are subject to updates.

In any case, I've noted below that this part of the update is an implementation detail.


Revisiting this.. there is more commonality in the approaches than they seem.

Appending a resource to a container:

PUT /index.html
POST /
Slug: index.html

It should include a triple like:

[about=""]
rel="ldp:contains" href="index.html"

Following are equivalent:

GET /index.html
GET /

Content-Location: index.html

[about=""]
rel="ldp:contains" href="index.html"

[
RDF in HTML best practice:
Don't set base URI in the HTML representation (index.html), but if set, it should end with /. This is so that the RDF graph in index.html is same as /.
]
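
To illustrate why the base URI matters here, a minimal sketch using Python's standard `urllib.parse.urljoin`. The host `alice.example` and the helper name `resolve` are hypothetical. With a base ending in `/`, the empty reference in `about=""` resolves to the container; with a base of `/index.html`, it resolves to index.html instead, so the containment triples would get a different subject.

```python
from urllib.parse import urljoin

# Hypothetical URLs used only for illustration.
CONTAINER = "https://alice.example/"
INDEX = "https://alice.example/index.html"


def resolve(base: str, reference: str) -> str:
    """Resolve a relative reference (e.g. from about="" or href) against a base URI."""
    return urljoin(base, reference)


# Subject of the containment triple, taken from about="".
subject_with_container_base = resolve(CONTAINER, "")
subject_with_index_base = resolve(INDEX, "")

# Object of the containment triple, taken from href="index.html".
object_with_container_base = resolve(CONTAINER, "index.html")
object_with_index_base = resolve(INDEX, "index.html")

print(subject_with_container_base)  # subject is the container
print(subject_with_index_base)      # subject is index.html, a different graph
print(object_with_container_base == object_with_index_base)
```

The object of the triple is the same under both bases; only the subject changes, which is what makes the graph in index.html differ from the graph of `/`.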

Is this correct:

[/ or /index.html] MUST then include the RDF representation of the container embedded in the HTML.

Effectively establishes container's HTML to be RDF bearing.

Influences required RDF serializations: #45 ie. adding RDFa (or script with RDF) to Turtle and JSON-LD.

When a new resource is appended to or removed from a container, all of its representations (including HTML) need to incorporate the changes to the containment.

Appending another resource:

POST /

Location: foo

GET /index.html
GET /

Content-Location: index.html

[about=""]
rel="ldp:contains" href="index.html"
rel="ldp:contains" href="foo"

How exactly a server includes the containment triples is an implementation detail. Same level of requirement for Turtle and JSON-LD.
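
As one of many possible ways a server might do this, here is a minimal sketch that renders containment triples as an RDFa fragment. The function name `containment_rdfa`, the choice of a `<ul>` list, and the explicit `prefix` declaration are assumptions for illustration; nothing in this thread prescribes the markup.

```python
from html import escape
from typing import Iterable


def containment_rdfa(members: Iterable[str]) -> str:
    """Render ldp:contains triples for the given member URLs as an RDFa list.

    about="" makes the document's base URI the subject, so the fragment only
    describes the container when the base resolves to the container itself.
    """
    items = "\n".join(
        '  <li><a rel="ldp:contains" href="{0}">{1}</a></li>'.format(
            escape(member, quote=True), escape(member)
        )
        for member in members
    )
    return (
        '<ul about="" prefix="ldp: http://www.w3.org/ns/ldp#">\n'
        + items
        + "\n</ul>"
    )


fragment = containment_rdfa(["index.html", "foo"])
print(fragment)
```

Where the fragment is placed inside the existing HTML, and how the rest of the document is preserved across updates, is exactly the part left open above.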

This may be less of an issue when /index.html is updated directly (eg. PUT /index.html) because the container's integrity falls on the server-imposed constraint ie. listing containment triples.

Server should reject if update to /index.html changes containment triples (aligned with global rule on updating containers). That entails that an RDFa parser is required. Having an RDFa parser makes it possible to serialize to other RDF.
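
A rough sketch of that check, under loud assumptions: it is not a conformant RDFa parser. It only scans for `rel="ldp:contains"` with an `href`, ignores `about`, `prefix`, and every other RDFa processing rule, and the names `ContainsScanner`, `contained_resources`, and `update_allowed` are hypothetical. A real server would need a full RDFa parser, as noted above.

```python
from html.parser import HTMLParser
from typing import Set
from urllib.parse import urljoin


class ContainsScanner(HTMLParser):
    """Collect href targets of elements carrying rel="ldp:contains"."""

    def __init__(self, base: str) -> None:
        super().__init__()
        self.base = base
        self.contained: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Attribute values may be None for valueless attributes.
        rels = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "ldp:contains" in rels and href:
            self.contained.add(urljoin(self.base, href))


def contained_resources(html: str, base: str) -> Set[str]:
    scanner = ContainsScanner(base)
    scanner.feed(html)
    scanner.close()
    return scanner.contained


def update_allowed(proposed_html: str, server_contained: Set[str], base: str) -> bool:
    """Accept an update only if it leaves the containment triples unchanged."""
    return contained_resources(proposed_html, base) == server_contained


BASE = "https://alice.example/"  # hypothetical container URL
current = {BASE + "index.html", BASE + "foo"}
ok = '<ul about=""><li><a rel="ldp:contains" href="index.html">i</a></li><li><a rel="ldp:contains" href="foo">f</a></li></ul>'
bad = '<ul about=""><li><a rel="ldp:contains" href="index.html">i</a></li></ul>'
print(update_allowed(ok, current, BASE), update_allowed(bad, current, BASE))
```

Comparing against the server's own containment set, rather than against the previous HTML, keeps the server-managed triples as the single source of truth.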


If we treat all representations of a container as equivalent (based on underlying RDF graph), then adding new resources to a container ultimately requires an update to container's description. So the representations need to include the containment triples.

The following criteria can work for updating resources but not sufficient for appending or deleting a resource from a container:

Client ensures that the representation is RDF bearing (in addition to adhering to server constraints) and server verifies.

So, we do need the following any way:

[/ or /index.html] MUST then include the RDF representation of the container embedded in the HTML.

Having said that, there is the question of whether the HTML representation of / may be exempt from this and so a SHOULD instead of MUST. Relaxing the requirement definitely simplifies server and client implementations and allow more flexibility on / (eg. as arbitrary homepage or directory listing).. Keeping it strict means more consistency but also there is a chance that the resulting HTML is not necessarily what the client would like to see/interpret.

@elf-pavlik
Member

elf-pavlik commented Feb 4, 2020

TL;DR: Please explain benefit motivating the requirement of embedding RDF in HTML representation of a container.

I honestly don't understand the benefit of requiring the HTML representation of a container to include RDF (embedded via <script> tag or RDFa). In the 'home page' use case, I think a user would often want to use some site generator and let it PUT / POST that HTML representing the container. If any client needs RDF it can always request text/turtle or application/ld+json. No one stops users from embedding RDF in HTML if they have a reason for that; still, requiring it would force them to regenerate the home page every time they add or remove something from that container. I think that requirement adds burden without any clear benefit; again, if someone wants to embed RDF in HTML they can always choose to do it.

@csarven
Member

csarven commented Feb 5, 2020

There is much repetition at this point. I'm responding below but it'd be great if we can continue the recurring themes and ideas in public chat, calls, F2F etc.

TL;DR: Please explain benefit motivating the requirement of embedding RDF in HTML representation of a container.

The exact same question can be asked for any RDF format. There is no difference between them if the exchange language is RDF and the sources are RDF bearing. Any one can do the job. We can also discuss why HTML+RDFa alone may suffice, and can in fact handle more use cases than the alternatives. It can be very accessible and neither would it require JavaScript to read (or use) a homepage or a directory index. So, why even bother introducing the others? As appealing or appalling as that may be to some, framing as such may not be fruitful.

Once again, if the representations are expected to encode equivalent RDF graphs, then being consistent is a reasonable design decision. The contrary is easy to raise: why should format x and y be expected equivalent but not z, especially when the resource is expected to be RDF bearing with containment information to begin with. This case is not to be conflated with a resource that's deemed to be non-RDF bearing (like an image) and then also providing an RDF-bearing representation. That's not what we want or should be practiced. The homepage case not only predates but is more widely deployed than root storage or container. Hence, if the distinction between a homepage and root container is so important, then it only makes sense to leave the homepage or a directory index alone at / and simply use another URI for the root container. Certainly that's not a nice option to some people and it only complicates the situation. So, for the time being, we have to look into how to accommodate both cases - which are actually quite similar - using the same URI.

In use case of 'home page', I think user often would want to use some site generator and let it PUT / POST that HTML representing container.

I am a user. That's not what I want. I do not want to switch between applications just to update different aspects of a resource, or worse, have it switch perspectives based on a particular representation - the classic "pure RDF" and "not so pure RDF". Nothing like that is in practice or can be considered a good design. I want to be able to use an application like dokieli to update / where it works as storage root (including containment information) as well as a human and machine-readable homepage including my WebID Profile. I consider my WebID Profile in HTML+RDFa to be canonical because that gives the most utility. I want to be able to authenticate using that WebID. If a server can't provide an RDF bearing representation of my WebID Profile or a client is unable to parse an RDF bearing representation (as in RDFa), I can't authenticate. Currently, only Turtle (and JSON-LD) are acknowledged by some servers and authentication clients. We need to close this gap so that people are not prevented from publishing as they wish while adhering to minimal global requirements.

@RubenVerborgh
Member

Proposal for a new solution in #198, which considers “index” representations part of a compound state of a container (and considers the index.html and index.ttl behavior implementation-specific details).

@kjetilk
Contributor

kjetilk commented Feb 7, 2022

clears throat Soooo, since we're not aligning here...

May I just throw out a totally breaking greenfield idea...?

/me types quickly in case anybody was about to scream "NOOOOOO!"

Let's make containers server managed, but have them link to any metadata and any other data they might point to, so that a client can make a reasonable representation of them.

Having containers that have protected data, but also data that can be changed has caused all kinds of problems. Compound state LGTM, but probably not something we can find consensus around.

Then, a more elaborate aux resource system is under consideration, and so, it seems straightforward that a client will pull in data from various sources after GETting a container anyway. Pages built for humans by typical browser UAs tend to consist of a large number of resources anyway.

Interacting directly with it, which was @timbl 's preference, can be done trivially if index.html is separate from the container. The difference between @timbl 's proposal and mine is only that previously, the server served index.html, whereas my idea is that index.html is YA auxiliary resource type (and thus isn't necessarily named index.html), is linked from the container, and clients will need to GET it separately. I acknowledge that this breaks existing clients.

But so much would be simpler if we just made the container server managed and told clients the resources they might want to get for a given application.

@RubenVerborgh
Member

Interacting directly with it, which was @timbl 's preference, can be done trivially if index.html is separate from the container. The difference between @timbl 's proposal and mine is only that previously, the server served index.html, whereas my idea is that index.html is YA auxiliary resource type (and thus isn't necessarily named index.html), is linked from the container, and clients will need to GET it separately. I acknowledge that this breaks existing clients.

Could we have an exception for GET, where the representation served is the auxiliary resource? But all other interactions need to go through that separate resource?

@kjetilk
Contributor

kjetilk commented Feb 7, 2022

Could we have an exception for GET, where the representation served is the auxiliary resource? But all other interactions need to go through that separate resource?

We could. We could also say that the container RDF needs to be brought along by injecting RDFa into the resulting representation. But is that exception really worth the bother given that UAs tend to slurp in a large number of resources anyway?

@justinwb
Member

justinwb commented Feb 7, 2022

Let's make containers server managed, but have them link to any metadata and any other data they might point to, so that a client can make a reasonable representation of them.

Nooooooooooooo (sorry I couldn't help it @kjetilk)

I do have to say that I'm strongly -1 on this approach at the moment, because it would have an immediate breaking impact on a lot of code (including mine), and specifications like shape trees and application interoperability. All of that said, if you provided some concrete examples of what you're proposing, specifically in cases where there is a dependency on and usage of data in the graph of the container resource, I'd be happy to look at ways to reconcile.
