Aligning representations of document and container resources with REST via single and compound state #198

Closed
RubenVerborgh opened this issue Sep 16, 2020 · 37 comments

@RubenVerborgh
Contributor

RubenVerborgh commented Sep 16, 2020

No description provided.

@acoburn
Member

acoburn commented Sep 16, 2020

I am very much in favor of this proposal and I agree with all of the normative language provided here.

@csarven
Member

csarven commented Sep 16, 2020

I agree that this proposal ties up roughly agreed conclusions from issues and current spec/PRs.

I find the normative text and examples above on HTML+RDFa clear. In the PR to follow, however, I'd suggest that we place emphasis on RDFa without requiring a particular host language, as per RDF 1.1 Concepts. Then, for example, a container resource could be served with an SVG+RDFa representation.

@elf-pavlik
Member

elf-pavlik commented Sep 21, 2020

Would this approach still work if, instead of HTML+RDFa, someone uses plain HTML with script tags embedding Turtle or JSON-LD? From a content-negotiation perspective, the two cases seem identical, since both use text/html as the content type.

EDIT: Thinking about it more, would this proposal prevent accepting HTML+RDFa for text/html and later responding to text/html with HTML embedding the same RDF graph in a script tag as Turtle or JSON-LD?

@elf-pavlik
Member

elf-pavlik commented Sep 21, 2020

So then for example, a container resource could be served with an SVG+RDFa representation.

How would the client know that image/svg+xml is available? Would it rely on some prior knowledge, or would the server, when such a representation is available, advertise it in an HTTP header such as

Link: <https://solid.example>; rel="alternate"; type="image/svg+xml"
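If the server did advertise alternates this way, a client could collect them by parsing the Link header. A minimal sketch, assuming a simple, well-formed header (no commas or semicolons inside URIs or quoted values); the function name `alternate_types` is hypothetical and this is not a full RFC 8288 parser:

```python
import re

# One link-value: <URI> followed by zero or more ";param" segments.
LINK_VALUE = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]+)*)")


def alternate_types(link_header: str) -> dict:
    """Map media type -> URI for every rel="alternate" link with a type.

    Simplified: assumes no commas or semicolons inside URIs or quoted values.
    """
    found = {}
    for match in LINK_VALUE.finditer(link_header):
        uri, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip().lower()] = value.strip().strip('"')
        if "alternate" in params.get("rel", "").split() and "type" in params:
            found[params["type"]] = uri
    return found


print(alternate_types(
    '<https://solid.example>; rel="alternate"; type="image/svg+xml"'))
# {'image/svg+xml': 'https://solid.example'}
```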

@csarven
Member

csarven commented Oct 11, 2020

@RubenVerborgh

but it SHOULD represent the same RDF graph.

I suggest MUST.

SHOULD include any part of the client-managed state that can be represented as RDF

would not ensure that documents at / (e.g. profiles, slideshows) are usable by all clients, e.g. for authentication. If the server accepts resource state for the client-managed part of a container resource, then using MUST (instead of SHOULD) would ensure that RDF-encoded information (in RDF documents) persists across RDF representations, for example a human-readable label added to a container. This is also aligned with #196.

@csarven
Member

csarven commented Oct 18, 2020

I'd like to clarify and extend the above a bit, because some key aspects of container representations don't quite jump out; the good news is that the fundamental requirements are already in place.

Re spec:

When a server creates a resource on HTTP PUT, POST or PATCH requests such that the request’s representation data encodes an RDF document (as determined by the Content-Type header), the server MUST accept GET requests on this resource when the value of the Accept header requests a representation in text/turtle or application/ld+json

Servers MUST NOT allow HTTP POST, PUT and PATCH to update a container’s containment triples; if the server receives such a request, it MUST respond with a 409 status code.

The representation and behaviour of containers in Solid corresponds to LDP Basic Container and MUST be supported

The key expectation of the server's container representation is that it MUST include the server-managed part in RDF, i.e. Turtle or JSON-LD per the above requirement, and that the client can't modify the server-managed part directly.

If the server allows the client-managed part of the container to be updated (potentially via any content type), then any RDF encoded in that representation MUST persist in the resource's Turtle and JSON-LD representations. This is important!

That makes it lossless.

The same rules from the spec would allow containers to be created or updated using HTML+RDFa (i.e. essentially the client-managed part, e.g. a WebID Profile, a human-readable label, a persistence policy...), so that when the container is requested in Turtle or JSON-LD, the representation includes both the server-managed and the client-managed part.
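To make the compound-state idea concrete, here is a minimal sketch, assuming the server keeps the client-managed triples (however they arrived: Turtle, JSON-LD or RDFa) separately from the containment list it manages itself. Client triples are modelled as tuples of terms already written in Turtle syntax; the function name `container_turtle` is hypothetical, and the output is a simplified Turtle subset rather than a general serializer:

```python
LDP = "http://www.w3.org/ns/ldp#"


def container_turtle(container, members, client_triples):
    """Serialize a container's compound state as simple Turtle.

    container: IRI of the container.
    members: IRIs of contained resources (server-managed).
    client_triples: (subject, predicate, object) tuples whose terms are
        already in Turtle/N-Triples syntax (client-managed).
    """
    lines = [f"@prefix ldp: <{LDP}> .", ""]
    # Server-managed part: container type and containment triples.
    lines.append(f"<{container}> a ldp:BasicContainer .")
    for member in sorted(members):
        lines.append(f"<{container}> ldp:contains <{member}> .")
    # Client-managed part, e.g. a human-readable label supplied via RDFa.
    for s, p, o in client_triples:
        lines.append(f"{s} {p} {o} .")
    return "\n".join(lines) + "\n"
```

The point of the sketch is that both parts end up in the same Turtle response, whichever format the client-managed part was originally written in.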

A container representation in a format other than Turtle or JSON-LD is not required to include the server-managed part. This is important!

Clients are guaranteed to get the server-managed part of a container by asking for a Turtle or JSON-LD representation, with no interference from or to other representations if the server allows changes to client-managed parts.
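The split between RDF and non-RDF representations can be summarized as a small dispatch on the negotiated media type. A minimal sketch, assuming the server has precomputed compound-state serializations (`compound_rdf`, keyed by media type) and stores the last accepted client-supplied HTML verbatim (`stored_html`); both names are hypothetical:

```python
RDF_TYPES = {"text/turtle", "application/ld+json"}


def container_response(media_type, compound_rdf, stored_html):
    """Pick the body for a GET on a container.

    Turtle and JSON-LD carry the compound state (server-managed plus
    client-managed RDF). Other formats, such as client-supplied HTML, are
    returned as stored and need not include the server-managed part.
    """
    if media_type in RDF_TYPES:
        return compound_rdf[media_type]
    if media_type == "text/html" and stored_html is not None:
        return stored_html
    raise LookupError(f"406 Not Acceptable: {media_type}")
```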

@csarven
Member

csarven commented Oct 18, 2020

would this proposal prevent accepting HTML+RDFa for text/html and later responding to text/html with HTML embedding the same RDF graph in a script tag as Turtle or JSON-LD?

No. But what an odd and overly complex thing to do. Such a server would be well capable of parsing RDFa and re-serialising the data in an HTML script block. I can only think of extremely specific clients, generally unlikely to be Solid clients, that would ask for HTML in order to obtain the script block as opposed to directly asking for Turtle or JSON-LD; crawlers for search engines seeking schema.org come to mind. And what a way to screw with the client making that update request. In any case, a server can knock itself out and do that if it is allergic to RDFa.

How would the client know that image/svg+xml is available? Would it rely on some prior knowledge or when available server would advertise it in HTTP header with [..]

Accept-Put would be for updates (#201). A client that accepts image/svg+xml may actually get that, since the server may deem it to be of higher quality than RDF-only representations.
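One way a server can "deem" a representation to be of higher quality is to multiply the client's q-value by a server-side source quality for each representation it can produce (Apache's `qs` works along these lines). A minimal sketch of that arithmetic; the `qs` values below are made-up assumptions, and `type/*` ranges and media type parameters other than q are ignored:

```python
def parse_accept(header):
    """Return {media_type: q} from an Accept header (no type/* ranges)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[parts[0]] = q
    return prefs


def negotiate(accept_header, server_qs):
    """Choose the representation maximizing client q times server qs."""
    prefs = parse_accept(accept_header)
    scored = []
    for media_type, qs in server_qs.items():
        q = prefs.get(media_type, prefs.get("*/*", 0.0))
        if q > 0:
            scored.append((q * qs, media_type))
    return max(scored)[1] if scored else None


# Hypothetical server-side qualities: SVG+RDFa ranked above RDF-only formats.
qs = {"text/turtle": 0.8, "application/ld+json": 0.8, "image/svg+xml": 1.0}
print(negotiate("text/turtle, image/svg+xml", qs))  # image/svg+xml
```

A client that lowers its preference (for example `image/svg+xml;q=0.5`) would get Turtle instead, since 0.5 × 1.0 < 1.0 × 0.8.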

@elf-pavlik
Member

In any case, server can knock itself out and do that if it is allergic to RDFa.

I asked mostly to clarify what clients can rely on if they write with Content-Type: text/html, in any of the possible cases:

  • plain HTML with no RDF embedded
  • HTML with RDF embedded in <script> tag
  • HTML with RDF embedded using RDFa

Based on the content type alone, the server doesn't seem to have a way to distinguish between those cases; it could only do so after parsing and analyzing the payload.
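As a rough illustration of what that parsing step has to detect, here is a sketch that inspects a text/html payload with Python's standard `html.parser` and reports which of the three cases applies. It only checks for common RDFa attributes and for `<script>` blocks typed as Turtle or JSON-LD; it does not extract any triples, and the name `classify_html` is hypothetical:

```python
from html.parser import HTMLParser

RDFA_ATTRS = {"property", "typeof", "about", "vocab", "prefix"}
RDF_SCRIPT_TYPES = {"text/turtle", "application/ld+json"}


class _RDFSniffer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.has_rdfa = False
        self.has_rdf_script = False

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; values may be None.
        if {name for name, _ in attrs} & RDFA_ATTRS:
            self.has_rdfa = True
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").lower()
            if script_type in RDF_SCRIPT_TYPES:
                self.has_rdf_script = True


def classify_html(payload: str) -> set:
    """Return a subset of {"rdfa", "script"}; empty means plain HTML."""
    sniffer = _RDFSniffer()
    sniffer.feed(payload)
    sniffer.close()
    found = set()
    if sniffer.has_rdfa:
        found.add("rdfa")
    if sniffer.has_rdf_script:
        found.add("script")
    return found
```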

The same rules from the spec would allow containers to be created or updated using HTML+RDFa (ie. essentially the client-managed part eg. WebID Profile, a human-readable label, persistence policy...), and so when the container is requested in Turtle or JSON-LD, it will include both the server-managed and client-managed part.

Based on that, when receiving a write with Content-Type: text/html, the server would need to parse the HTML and check whether it contains any client-managed RDF in <script> tags or RDFa. In that case, for PUT, the statements found in <script> tags or RDFa would fully replace all previously existing client-managed triples.

What happens on PUT when the HTML contains no triples in <script> tags or RDFa: would the server update the graph to an empty one, or preserve the existing triples (created earlier with Turtle or JSON-LD)?
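One possible answer, sketched under the assumption of full-replacement semantics (which the question leaves open): a PUT replaces only the client-managed part with whatever triples were extracted from the payload, and leaves the server-managed part untouched. Under that assumption, HTML with no extractable triples empties the client-managed graph. The names `ContainerState` and `apply_put` are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class ContainerState:
    server_managed: set = field(default_factory=set)  # e.g. ldp:contains
    client_managed: set = field(default_factory=set)  # e.g. rdfs:label


def apply_put(state: ContainerState, extracted: set) -> ContainerState:
    """Replace-semantics PUT: triples extracted from the payload (RDFa or
    <script> blocks) become the entire client-managed part.

    If nothing was extracted, the client-managed part becomes empty; a
    server that preserved earlier triples instead would follow a
    different policy.
    """
    return ContainerState(set(state.server_managed), set(extracted))
```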

@csarven
Member

csarven commented Oct 18, 2020

There needs to be a guarantee that RDF-encoded client-managed statements are available in Turtle and JSON-LD representations. Otherwise, it misses the point of accepting resource states in Turtle or JSON-LD for client-managed parts and then making those RDF statements available in Turtle or JSON-LD. That same MUST is precisely what makes it possible for RDFa and others to work.

The requirement is as per RDF 1.1 Concepts: RDF documents using concrete RDF syntaxes. A fixed list, not open-ended.

One fully spec-compatible RDFa parser is great. And there are several others which, while they may not be as great, are still deemed pretty useful. If we are willing to accept far less for many other requirements and high expectations of the Solid ecosystem, then counting the number of perfect parsers wouldn't do it justice.

We do not have to require parsing of all possible ways to encode RDF in all markup languages, whether that's RDFa in HTML or XML, or JSON-LD or Turtle in HTML script blocks, or something else. Start with: if a server accepts text/html on a container, it should be equipped to parse RDFa. If it accepts image/svg+xml on a container...
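A server following this "fixed list" approach might keep an explicit registry from each media type it accepts on writes to the extractor it uses, and refuse anything else, for example with 415 Unsupported Media Type (the status code is an assumption; the thread does not name one). A minimal sketch with placeholder extractors; the registry contents and all names are hypothetical:

```python
def _stub(syntax):
    """Placeholder extractor; a real server would call an actual parser."""
    def extract(body: bytes):
        raise NotImplementedError(f"plug in a {syntax} parser here")
    return extract


# Hypothetical fixed registry: media type accepted on writes -> extractor.
# image/svg+xml would only be listed if the server can parse RDFa in SVG.
EXTRACTORS = {
    "text/turtle": _stub("Turtle"),
    "application/ld+json": _stub("JSON-LD"),
    "text/html": _stub("RDFa (HTML)"),
}


class UnsupportedMediaType(Exception):
    """Would map to an HTTP 415 response."""


def extractor_for(content_type: str):
    """Look up the extractor, ignoring parameters such as charset."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in EXTRACTORS:
        raise UnsupportedMediaType(media_type)
    return EXTRACTORS[media_type]
```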

@elf-pavlik
Member

There needs to be a guarantee that RDF encoded client-managed statements are available in Turtle and JSON-LD representations.

Best effort is enough: it's perfectly okay for one representation to have a lower quality than another.
Hence my SHOULD.

Not requiring server-managed statements to appear in any way in the text/html representation already seems to acknowledge it as lower quality with respect to RDF content.
Please correct me if I'm reading this wrong, but we also don't require client-managed statements added using application/ld+json or text/turtle to appear in any way in text/html?

@matthieubosquet
Member

From an arguably fairly naive perspective on this long debate, @TallTed's comment caught my eye:

I was a minority in thinking that Turtle — which may include out-of-band, non-RDF comments and statement order — should be considered LDP-NR, and thus should be preserved entirely.

As a user, I would expect:

  • My Turtle file in my Solid data pod to be completely preserved (including comments and statement order); I would find it really frustrating to unknowingly lose some of its state
  • To be able to edit my Turtle files by default in a way that doesn't truncate their state (a simple text-editing GUI comes to mind)
  • To be able to retrieve and query the graph data my Turtle files contain consistently with all the rest of the graph data in my pod, and hopefully query all of the graph data in my pod at once, for example through a SPARQL endpoint
  • I wouldn't mind being asked, when I add a Turtle file to my pod, whether it should be imported as RDF and lose all its non-RDF data such as comments and statement order
  • I would find it handy to have the option to convert any RDF Source (in the RDF 1.1 spec sense, not the RDF-only LDP spec sense) containing non-RDF data to an RDF-only source

I wholeheartedly agree with a resource having at most one state. Instead of using a "higher-quality" concept for resources containing more state than just the RDF data, I would consider an "information authority" approach. If my resource is a Turtle file and hence has more state than just RDF data, this file is the authoritative source of state for my resource and thus should be preserved "as is" unless I explicitly consent to losing some state. No edit to the state of that resource should "silently" result in a loss of state (for example because the resource editor doesn't deal with comments or statement order).

Another, perhaps controversial, point that remains a bit unclear to me is container versus index resource. From a graph perspective, <https://mypod.example.org/index> and <https://mypod.example.org/> are different IRIs and therefore different resources. I can imagine adding client-managed state to a container (for example to customize a container's display or create symlinks), but I would consider it counter-intuitive, potentially dangerous and inaccurate to conflate resources.

As a user, I would expect to be able to configure a read-only mode for my pod where:

  • Navigating to https://mypod.example.org/ with a browser might present me with a representation of https://mypod.example.org/index, probably in this order of precedence: a file, if it exists; or RDF data serialised through an auxiliary HTML template resource, if it exists; or RDF data serialised through a default HTML template resource (say, a table listing ?s ?p ?o), if it exists; or RDF serialised in whatever preferred serialisation format I configured (Turtle, JSON-LD...)
  • The default resource name for read-only pod navigation to a container is configurable globally and overridable at the container level (I can decide that https://mypod.example.org/friends/ serves https://mypod.example.org/friends/public-contacts rather than https://mypod.example.org/friends/index)
  • Navigating to https://mypod.example.org/ might give a 401
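The order of precedence in the first bullet is essentially a fallback chain, with the per-container override from the second bullet as a parameter. A minimal sketch, assuming a set-like store of existing resource IRIs and a hypothetical `<index>.template` naming convention for the auxiliary template; the function only reports what would be served and renders nothing:

```python
def browser_view(container, store, index_name="index",
                 default_template=None, rdf_format="text/turtle"):
    """Return (kind, target) describing what a browser request would get.

    store: set or mapping of existing resource IRIs (hypothetical).
    index_name: configurable globally and overridable per container.
    """
    index_iri = container + index_name
    if index_iri in store:                      # 1. a stored file
        return ("file", index_iri)
    aux_template = index_iri + ".template"      # 2. auxiliary HTML template
    if aux_template in store:
        return ("template", aux_template)
    if default_template is not None:            # 3. default HTML template
        return ("template", default_template)
    return ("rdf", rdf_format)                  # 4. configured RDF format
```

With `index_name="public-contacts"`, a request for https://mypod.example.org/friends/ would be served https://mypod.example.org/friends/public-contacts if it exists, matching the second bullet.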

Maybe if there is a default databrowser embedded in a Solid Server, one could access it through some recommended standard-ish parameter https://mypod.example.org/?browsedata.

@csarven
Member

csarven commented Oct 19, 2020

Context: optionally (re "MAY") setting client-managed part of container state.

Then the system can only guarantee server-managed information.

Having a MUST to make RDF from any accepted representation available back out in Turtle and JSON-LD is still within the optional context. The system needs to provide a basic guarantee: if it is willing to accept the client-managed part of a container, where the resource is deemed to be representable as an RDF document, it needs to put it to use.

If a label is added in, say, Turtle, and the server accepts the request, then SHOULD downplays what the ecosystem can or ought to do. That label needs to persist and appear again when the Turtle or JSON-LD representation of the container is requested. That's why MUST is necessary.

We can't claim to have satisfied our use cases with a maybe (re MAY and then SHOULD). Can't expect interesting clients to emerge out of thin air if servers only guarantee a maybe.

There are significant use cases and existing publishing practices in the wild that need to be met with a MUST (even within the optional context). This is categorically different from losing a couple of triples.

But wait, there is more! We haven't even discussed RDFa yet.

We can't dismiss imperfect, non-fully compliant, or even implementations with wilful violations. That is the reality of the system.

Remember, the server can always take the optional path. The story ends, it wakes up in its bed and believes whatever it wants to believe.

@elf-pavlik
Member

elf-pavlik commented Oct 19, 2020

I would find it reasonable to distinguish client-managed triples from broader client-managed parts, and only place a MUST on the server to support updating them with HTTP PATCH and application/sparql-update, while also having a MUST on including those client-managed triples in text/turtle and application/ld+json representations. This could provide a straightforward and reliable approach across the ecosystem.
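For illustration, a client adding a container label through that single mandated path might send something like the following. A minimal sketch using Python's standard `urllib.request`; the container URL and label are made up, the helper name `label_patch` is hypothetical, and the request is only constructed, not sent:

```python
from urllib.request import Request

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def label_patch(container_url: str, label: str) -> Request:
    """Build a PATCH that inserts a client-managed rdfs:label triple."""
    # Escape backslashes first, then quotes and newlines, for a SPARQL
    # short string literal.
    escaped = (label.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n"))
    body = f'INSERT DATA {{ <{container_url}> <{RDFS_LABEL}> "{escaped}" . }}'
    return Request(
        container_url,
        data=body.encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/sparql-update"},
    )


req = label_patch("https://pod.example/notes/", "My notes")
# urllib.request.urlopen(req) would send it to a real server.
```

Under the requirements discussed above, a server would apply such a patch to the client-managed triples only; a patch that touched ldp:contains would be refused with 409.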

When it comes to preserving a verbatim copy of text/turtle with comments and formatting, or application/ld+json with a specific @context and frame, it does seem quite problematic if a client can write both content types plus application/sparql-update. It might be helpful to clarify the use cases for that and look at other approaches that could satisfy the requirements coming from those use cases, for example having a way to treat them as a NonRDFSource.

@matthieubosquet
Member

matthieubosquet commented Oct 19, 2020

That's in general incompatible with SPARQL UPDATE patches though.

Yes!

SPARQL 1.1 Update, an update language for RDF graphs.
SPARQL 1.1 Update - Abstract

Isn't it really important to draw a strong line between RDF graphs/RDF-only resources and resources that contain RDF graphs? Should it be the case that a resource with non-graph state MUST NOT be updated via sparql-update, while an RDF-only resource MAY be?

Turtle seems like a nice, cheap to parse/serialise, widely available default interchange format. I'm not sure what the intrinsic value of requiring (MUST) more than one default graph serialisation format is. Could someone expand on that or point me to the rationale?

@csarven
Member

csarven commented Oct 25, 2020

@elf-pavlik Restricting the protocol in this way would unfortunately run counter to existing publishing practices, such as updating resources in alternative ways, using any URI for a WebID, and describing it in a document using RDFa, among other things.

@matthieubosquet It is not only useful but important to acknowledge that semantic structure can be embedded in narrative or prose-based documents, as well as graphics, typically via host languages that can embed RDFa, and sometimes in Turtle or JSON-LD blocks. The underlying use cases are by far the most widely used on the Web, and the challenge is to incorporate them into the Solid ecosystem and, especially, to pose no barriers to transition. This is orthogonal to the possibility of expressing narrative or prose in Turtle or JSON-LD, which is not practised because it results in documents that are not human-friendly.

It is important for a user to be able to author, publish, and update such documents in, for example, HTML+RDFa ( https://csarven.ca/linked-research-decentralised-web ). It is useful for clients to be able to extract the underlying structured information from such resources. Applications can only construct an RDF graph based on what the server provides. If a server can't alternatively provide a Turtle or JSON-LD serialization of certain resources, that limits the kind of clients that can obtain and use the information. The same holds true for reading WebIDs published in documents ending in /, and even for being able to use them towards authentication. We need to enable more clients to do stuff by having the server do the basics.

@matthieubosquet
Member

@csarven I wholeheartedly agree that HTML+RDFa is very important and a great source of RDF graphs. I'm just thinking that leaving the possibility to have a "Solid Core" compatibility is an asset. I imagine an extremely bare Solid server that serves nothing but graphs serialised as Turtle, and where data can only be updated as such. That doesn't detract from the fact that if a Solid server serves HTML, it probably SHOULD be able to extract the graph from it, especially graphs embedded with a widely recognised syntax such as RDFa, and SHOULD make that RDF available to query (SPARQL) and retrieve in the default interchange format (Turtle).

Your WebID point is good! Actually, if your WebID is held in an HTML+RDFa document, your Solid Pod definitely MUST be able to serve it as Turtle, I guess. For example, so that client applications can retrieve the solid:oidcIssuer from it and maybe other required properties that I'm missing.

My second point was simply that sparql-update is for graphs, so if you're gonna update HTML+RDFa, do it with an HTML editor or a text editor (but query the graph in there as you want!).

@csarven
Member

csarven commented Oct 26, 2020

[Keeping this brief because as I see it, there is no new or significant information to respond to. Everything here, related issues, chats, meetings, drinks... over the past ~10 years in/around the tech/solutions.. has been considered and reconsidered..]

I'm just thinking of common use cases on the Web, existing URI allocation and publishing practices, and respecting specification orthogonality, all of which arguably far outweigh chasing a hypothetical pure design, which is in fact a subpar solution.

We need to agree on the actual observations in order to come up with a prescription. If the use cases and existing practice are acknowledged, they are "core" for all intents and purposes. That's what the design needs to meet. As said, we can't claim to meet key use cases and at the same time sideline them when it comes to actual interop. This is in fact one of the things that I've been trying to get the group/project to acknowledge.

Priority of Constituencies: https://www.w3.org/TR/html-design-principles/#priority-of-constituencies

In case of conflict, consider users over authors over implementors over specifiers over theoretical purity. In other words costs or difficulties to the user should be given more weight than costs to authors; which in turn should be given more weight than costs to implementors; which should be given more weight than costs to authors of the spec itself, which should be given more weight than those proposing changes for theoretical reasons alone. Of course, it is preferred to make things better for multiple constituencies at once.

"Experience" that a user can get because of HTML+RDFa use is potentially far greater than the alternative RDF syntaxes. So, if you want to talk about core, minimalism, design purity,.. we can also start from there.

@matthieubosquet
Member

I don't disagree with you, Sarven. And I see there are no conditional statements in RFC 2119, probably for a good reason. I think you are right: "A Solid Server MUST be able to serve all RDF graphs expressed as RDFa in the documents it hosts as Turtle."

If we do actually formulate this requirement as such, it doesn't forbid someone from implementing a Solid Server that will not accept or serve HTML documents. And I think it does respect specification orthogonality, since your server will still be perfectly fluent in the Solid Ecosystem, perfectly able to talk with other servers, be perfectly interoperable. It's only from an editorial point of view that it would be a lesser implementation and it's not a bad thing.

@elf-pavlik
Member

elf-pavlik commented Oct 26, 2020

"Experience" that a user can get because of HTML+RDFa use is potentially far greater than the alternative RDF syntaxes.

Are you stating your personal opinion here, or basing this statement on some available data?

@csarven
Member

csarven commented Oct 26, 2020

Matthieu, that's not what I'm saying. Either all clients need to handle RDFa, or servers need to make their resources available in Turtle/JSON-LD. We are requiring the server to provide the lingua franca, i.e. Turtle/JSON-LD. We don't care much about the server's ability to serialize RDFa; while possible, it is not particularly useful. The point is that if a server acknowledges/accepts a representation that can contain RDFa, e.g. in HTML or XML-family languages, it should be able to parse it in order to serialize it in Turtle/JSON-LD, so that the underlying RDF graph can be exposed to more clients.


Hey Pavlik, I'm completely making things up, didn't you know? All relevant use cases, publishing practices, (self-)dogfooding, applications in the wild, spec orthogonality, ... and how all that comes together. No data. Just feelings.

When you ask me for "available data", the burden of proof is certainly not on you - what a great nitpick if I may say considering all that's been put forward - but if you so feel like it, and to really shut me down, all you have to do is a simple demonstration of how Turtle/JSON-LD serialization of https://csarven.ca/linked-research-decentralised-web can be both human- and machine-readable/usable with fewer or less complex specs and tooling than using HTML+RDFa. Perhaps a better question is.. why haven't you all this time? Can you share from your implementation/publishing experience?

By the way, when you propose solutions, please take a moment to state whether you're actually acknowledging the use cases as well as the existing URI allocation and publishing practices that have been put forward, not forgetting what the often-repeated "RDF documents", graphs.. entail. That way we can be on common ground when discussing this stuff.

@TallTed
Contributor

TallTed commented Oct 28, 2020

Requiring Solid-compatible servers to expose the RDF graph content of HTML+RDFa or Turtle (or other RDF-bearing formats which include out-of-RDF-band data) sourcefiles as Turtle is entirely reasonable, because extracting the RDF is generally a fairly trivial operation.

Requiring those same Solid-compatible servers to update the RDF content of HTML+RDFa or Turtle (or other formats which include out-of-RDF-band data) sourcefiles via SPARQL-Update is not so reasonable, because this requires those servers to also understand and preserve (and possibly update) the out-of-RDF-band data in multiple (possibly many!) formats.

@csarven
Member

csarven commented Oct 28, 2020

Ted, I'm glad to see you back! :)

I think we are saying/thinking the same thing.. just to double check:

For PATCH + application/sparql-update, only the RDF graph of resource's state is intended to be updated. So, that excludes the "out-of-RDF-band data" - correct me if I'm misinterpreting but I think you are referring to any information besides the RDF graph eg. HTML (minus the RDF), and perhaps most commonly, comments, whitespace, prefixes etc.. in any RDF document will not be preserved. If so, I agree.

Aside: After that PATCH, an HTML+RDFa representation of the resource could include the changes to the RDF graph, but there is no expectation that the non-RDFa bits will be preserved. That usually entails that the server will provide its own HTML+RDFa serialization. Put differently, the client's HTML+RDFa is only used towards transmitting the RDF graph and nothing else. So, that's not particularly useful in the end.

As it stands, I believe PUT text/html should be used to update the resource state for HTML(+RDFa). At least until XML Patch, line-based, or alternative approaches are taken up / implemented to modify as opposed to replace.

Re your first point, say a WebID URI is allocated at /#i or a homepage/index at /, and say updates using text/html are accepted. Should GET / with text/turtle then always have a response including both server-managed (containment...) and client-managed triples (if any RDFa was provided in the last update)? Similarly, after creating/updating /foo using text/html (including RDFa), the server must be able to serve /foo in text/turtle, and a client can expect the whole RDF graph, right?

@TallTed
Contributor

TallTed commented Oct 29, 2020

@csarven

Regrettably, I think we're not on the same page. (I may not express myself as clearly as I'd like here, as I'm still recovering from chemo-brain fuzziness. Please bear with me.)

Solid is a challenging beast, because it is trying to meld an RDF server and RDF store with a "traditional" HTTP server and (typically filesystem-based) document store.

My baseline is that the appropriate action in any case is the action which results in least (preferably NO) loss of data/information. Among other things, that means that updates, whether by PUT or PATCH or otherwise, should match the thing they're updating -- which may not be the thing retrieved by a GET!

I have particular concerns with the idea that "only the RDF graph of resource's state is intended to be updated". Intended by whom? The entity submitting an application/sparql-update PATCH may have no idea what's actually going to be changed, and what's going to be lost or left alone. How is that submitting entity to know whether the HTML+RDFa document upon which they based their PATCH was (a) a static file, or (b) dynamically generated from an RDF graph in an RDF store, which graph was loaded from a JSON source (so lost nothing in that loading), or (c) dynamically generated from an RDF graph in an RDF store, which graph was loaded from a Turtle source (and lost comments, whitespace, etc., in that loading), or (d) something else?

I believe that updates/changes MUST be rejected if not appropriately constructed for the target being updated/changed. Likewise, PATCH + application/sparql-update MUST ONLY be accepted for modification of RDF graphs which are stored as such, i.e., which are not stored in any form which may include out-of-RDF-band information (whitespace, comments, HTML meant for humans, etc.), including but not limited to Turtle and HTML+RDFa.

Flowing from the above, PUT of any format (a/k/a MIME type) that does not match the format of any existing document by the same name should be rejected or at least confirmed with appropriate messaging ("Target is HTML+RDFa. Upload is Turtle. Applying this upload may result in loss of non-RDF data. Proceed?"). (PUT that creates a new document should be accepted regardless of format.)
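Ted's rules in the last two paragraphs can be read as a small admission check. A minimal sketch under stated assumptions: the server records, per resource, whether it lives only as a graph in an RDF store (`"rdf-store"`) or as a stored document that may carry out-of-RDF-band data (`"file"`), plus the stored media type. The outcome labels `accept`/`reject`/`confirm` and the name `admit_update` are hypothetical, since HTTP itself has no "confirm" step:

```python
from typing import Optional


def admit_update(method: str, content_type: str,
                 stored_as: Optional[str], stored_type: Optional[str]) -> str:
    """Return 'accept', 'reject' or 'confirm' for an incoming update.

    stored_as: "rdf-store", "file", or None if the target does not exist.
    stored_type: media type of the stored document, if any.
    """
    if stored_as is None:
        return "accept"  # creating a new resource: any format is fine
    if method == "PATCH" and content_type == "application/sparql-update":
        # Only graphs stored as such, never documents with comments,
        # whitespace or human-facing HTML that a patch could not preserve.
        return "accept" if stored_as == "rdf-store" else "reject"
    if method == "PUT" and stored_as == "file" and content_type != stored_type:
        return "confirm"  # e.g. Turtle uploaded over an HTML+RDFa document
    return "accept"
```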

Submission of HTML+RDFa to update RDF which is stored in an RDF store should result in a similar warning, because the submitter may well have included human-targeted HTML which they do not intend to lose. It would be acceptable to me to store the HTML+RDFa document and use the RDF found therein to update the RDF in the RDF store IFF that action is communicated to the uploader, but I think this should include confirmatory interactions with the user, potentially presenting a DIFF, making clear what the changes are -- in order to avoid DELETE of triples which were not meant to be deleted, though they were not included in the new submission... It's a tangled activity.
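The DIFF Ted mentions is simple to compute when both sides are treated as sets of triples. A minimal sketch, assuming triples have already been extracted as hashable tuples without blank nodes (with blank nodes, a plain set difference is not meaningful and graph isomorphism would be needed); the function names are hypothetical:

```python
def graph_diff(stored: set, submitted: set) -> dict:
    """Triples an upload would add and remove if it replaced the graph."""
    return {
        "added": sorted(submitted - stored),
        "removed": sorted(stored - submitted),
    }


def confirmation_message(diff: dict):
    """Warn only when triples would be silently deleted."""
    if not diff["removed"]:
        return None
    lines = [f"This upload would delete {len(diff['removed'])} triple(s):"]
    lines += [f"  - {s} {p} {o}" for s, p, o in diff["removed"]]
    return "\n".join(lines)
```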

A GET request to a Solid server results in receipt of a representation of the requested URI. That representation might be a dynamically generated Turtle, JSON-LD, or RDF/XML document, which resulted from server-side parsing of an HTML+RDFa document, which should at least be discoverable through HTTP Alternate header, optimally flagged somehow as the "original source". Updates SHOULD be submitted to match/replace that "original source"-- so, application/sparql-update should not target any static filesystem document, only an RDF graph within an RDF store.

I know the above is not as clear as it might be, and I've not been able to present it in a clear nor logical order ... but perhaps it helps show where I'm coming from.

@csarven
Member

csarven commented Oct 30, 2020

Ted, you're doing great. Thanks for the feedback.

I have particular concerns with the idea that "only the RDF graph of resource's state is intended to be updated". Intended by whom?

The quoted phrasing seems inaccurate; following my proposal, that would be "only the client-managed part of the compound resource state can be updated by the client".

To be absolutely accurate, from the spec (as of this writing):

When a server supports the HTTP PUT, POST and PATCH methods [RFC7231] this specification imposes the following requirements

The representation and behaviour of containers in Solid corresponds to LDP Basic Container and MUST be supported.

Servers MUST NOT allow HTTP POST, PUT and PATCH to update a container’s containment triples; if the server receives such a request, it MUST respond with a 409 status code.

See also my proposal #188 (comment) as to why above requirement is a MUST NOT and not a SHOULD NOT (as per LDP).

Provided that the server accepts the HTTP method on the target (a container in this case), and the client's payload is processed and approved, there is no constraint from our end that could result in rejecting the request. The whole point is that the client should not try to directly alter/modify/change/update/interfere with the server-managed triples by targeting the container. Put differently, payloads including server-managed triples are acceptable provided that they do not result in an attempt to change the container's server-managed state. The client is completely free to update anything else, e.g. adding/removing a label, as on all applicable resources. That's not a new requirement; it comes from the primitives.
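The distinction between "payload includes server-managed triples" and "payload attempts to change them" can be expressed as a comparison of containment triples before and after the update. A minimal sketch, assuming triples are (subject, predicate, object) tuples with full IRIs and that the server can compute the graph that would result from the request (for a PATCH, the current graph with the patch applied) before committing it; the function names are hypothetical:

```python
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


def containment(graph, container):
    """The container's ldp:contains triples within a graph."""
    return {t for t in graph if t[0] == container and t[1] == LDP_CONTAINS}


def check_update(container, current, proposed):
    """Return 409 if the update would change containment triples, else 200.

    A payload that merely repeats the existing ldp:contains triples while
    changing other (client-managed) triples passes.
    """
    if containment(proposed, container) != containment(current, container):
        return 409
    return 200
```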


When we refer to PATCH application/sparql-update in the context of Solid, we generally mean ( https://www.w3.org/2001/sw/wiki/SparqlPatch ):

intended to modify the RDF graph or apparent RDF graph at the effective request URI

Although informative, PATCH + SPARQL Update combination is more accurately described in https://www.w3.org/TR/sparql11-http-rdf-update/#http-patch

This is simply about updating the graph of resource state. Nothing about producible representations of a resource.


General remark: we should take care to not introduce new terms/concepts into the spec unless it is absolutely necessary. Completely okay for the purpose of discussion of course. Even rehashing existing concepts in non-normative text can potentially confuse the reader with respect to their prior knowledge.

@TallTed
Contributor

TallTed commented Oct 30, 2020

@csarven -- I agree that the user should not be able to change server-managed Container triples. Those triples have not been the focus of any of my comments.

My concern stems from the fact that this conversation has rarely mentioned Containers, as opposed to "contained" resources -- and in fact has frequently focused explicitly on those "contained" resources (e.g., HTML+RDFa index.htm)! Indeed, the title of this proposal is "handling representations of resources" not "handling representations of Containers"!

The redirection of web browsers that request / as text/html to /index.htm is NOT part of the HTTP spec, so far as I have found, though it is the typical (default?) setup of Apache, nginx, etc. -- which all (I believe) allow the server admin to specify a different target for such Container "directory" redirections, including cgi/php/script template-ish files. This, I think, is what Trellis is doing with their template file -- the HTML+RDFa wrapper which is applied to all Container directories, i.e., not just / but /*/*/*/ etc.

@RubenVerborgh -- I had started a response to your comments above, but I wonder whether the rest of this comment suffices to explain why I don't agree with you (yet)? If you don't see my points in regard to contained resources (not Containers), per se, please say so, and I'll return to that response.

@csarven
Member

csarven commented Oct 30, 2020

The redirection of web browsers that request / as text/html to /index.htm is NOT part of the HTTP spec

Right. #69 kicked off a number of confusions: one is what you've highlighted above, another is representation URLs (#109), and another pertains to the application-specific use of index.ttl (#144). We already have consensus (and spec text) on index-related stuff.

@elf-pavlik
Member

After a quick Gitter chat with @csarven, I'd like to briefly clarify something:

In no way do I dismiss the practice of publishing HTML+RDFa documents on the Web. I am trying to find data on how common it is (pointers welcome) and how much of it is limited to Open Graph in <head><meta/><meta/><meta/></head>.
While acknowledging the practice of publishing HTML+RDFa, I also acknowledge the practice of publishing Turtle and JSON-LD in <script> tags. Once again, I have no data comparing how broadly those two are practiced.

I think of those two practices as possibly equally important and adopted. I can also see two more practices:

  • publishing 'plain' HTML, with no RDF graphs whatsoever; possibly the most common of all the practices discussed
  • publishing HTML with Microdata; here we have the IG Note https://www.w3.org/TR/microdata-rdf/ and the HTML spec, which includes Microdata

Focusing only on the content type text/html, all 4 of those cases (maybe more?) are possible. As far as I know, a client making a request with Accept: text/html has no clear way of hinting which of those 4 cases it prefers.

Even if we consider Microdata an abomination, I think we should give equal consideration to:

  • plain HTML - with no RDF
  • HTML with RDF in <script> tags
  • HTML+RDFa
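Because all four cases share the media type text/html, only the payload itself can tell them apart. A rough heuristic sketch with Python's html.parser; the attribute lists are an assumption, and this is not a conforming RDFa or Microdata processor.

```python
from html.parser import HTMLParser

RDF_SCRIPT_TYPES = {"text/turtle", "application/ld+json"}
# Heuristic attribute sets; real processors apply far more rules.
RDFA_ATTRS = {"property", "typeof", "about", "resource", "vocab", "prefix"}
MICRODATA_ATTRS = {"itemscope", "itemprop", "itemtype"}


class EmbeddingDetector(HTMLParser):
    """Records which ways of carrying structured data appear in a page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type in RDF_SCRIPT_TYPES:
                self.found.add("script")
        if names & RDFA_ATTRS:
            self.found.add("rdfa")
        if names & MICRODATA_ATTRS:
            self.found.add("microdata")


def detect_embeddings(html_text):
    """Return a subset of {"script", "rdfa", "microdata"}, or {"plain"}."""
    detector = EmbeddingDetector()
    detector.feed(html_text)
    detector.close()
    return detector.found or {"plain"}
```

Note that Open Graph <meta property="..."> markup counts as RDFa under this heuristic, which is relevant to the question of how much HTML+RDFa in the wild is Open Graph only.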

I plan to process Ted's arguments and possibly respond to them as well; this comment is a direct follow-up to my chat with @csarven.

@csarven
Member

csarven commented Nov 5, 2020

Use case: narrative or prose-based communication. More specifically: semantically structured. Semantic representations may also incorporate non-narrative structured information.

Requirement: The system shall provide the ability to create, read, update and delete semantically structured narrative or prose-based communication.


This addresses what's generally known as "semantic publishing", https://en.wikipedia.org/wiki/Semantic_publishing . FWIW:

Semantic publishing has the potential to revolutionize scientific publishing. Tim Berners-Lee predicted in 2001 that the semantic web "will likely profoundly change the very nature of how scientific knowledge is produced and shared, in ways that we can now barely imagine".[13] Revisiting the semantic web in 2006, he and his colleagues believed the semantic web "could bring about a revolution in how, for example, scientific content is managed throughout its life cycle".[14] Researchers could directly self-publish their experiment data in "semantic" format on the web. Semantic search engines could then make these data widely available. The W3C interest group in healthcare and life sciences is exploring this idea.[15]

This is almost literally what my research and development was/is about, i.e. applying Solid to address some common problems in research communication (see https://csarven.ca/linked-research-decentralised-web ). The resource is self-published on a Solid server with a representation available in HTML+RDFa, with all significant units of communication having their own HTTP URI and being self-described. It is also accompanied by its own LDN inbox and annotation server, refers to Memento timemaps/snapshots, among other things, and includes the read-write application dokieli for authoring, publishing and social interactions.

There is no shortage of use cases leaning on "semantic publishing". Even technical specifications are published in HTML+RDFa, e.g. https://www.w3.org/TR/ldn/ , where each significant piece of information, e.g. concepts and requirements, has an identifier and is self-described. See the diagram in https://csarven.ca/linked-specifications-reports#figure-linked-specifications-reports for an overview of how several interdependent bits fit together. This has in fact served as inspiration to have human- and machine-readable Solid specs in conjunction with test suites and implementation reports. See also: https://www.w3.org/2020/10/TPAC/breakout-schedule.html#specmining (minutes and video).

All very real.

I'm citing examples close to home not to brag or push them in people's faces but as representative examples in the wild. This is not hypothetical. Can it be done using different stacks, and Harder, Better, Faster, Stronger? Sure, why not. Let's see it. It still would not dismiss what actually exists and addresses use cases in a legitimate way using existing standards - and hopefully with many things to improve along the way.


If there are any doubts about or objections to the above, such that it should not be core to the Solid "ecosystem" and fully supported by the system on equal grounds with everything else, then please say so now. We could continue to discuss that perspective; however, I would hope to move beyond it. I find myself repeating things because the key elements are either not fully understood or simply downplayed. I may of course be misreading the room (over the years). In the event that this is anything short of "core", I propose that we revisit all user stories and use cases that have been brought up in the Solid project, and whether the read-write operations against resources are actually attainable, practised, or just hypothetical with respect to semantic publishing.


I hope we are at least 83.47% on the same page at this point.

It goes without saying that while there may be a finite number of solutions that meet the requirements for the use cases, the reason for focusing on RDF documents (RDF 1.1 Concepts), as opposed to any non-RDF concrete syntax, is quite simply that there needs to be a clear enough path to working with RDF graphs. Within the realm of RDF-based representations, we have agreed to use Turtle and JSON-LD as the baseline representation formats that a server must make available, so that all clients can at the very least read and use the information to fit their purpose.
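As a rough sketch of what "must make available" implies for content negotiation, the following picks a response media type from an Accept header, always offering Turtle and JSON-LD and optionally extra formats such as text/html. The q-value handling is simplified (media-type parameters other than q are ignored) and the function names are hypothetical.

```python
BASELINE = ["text/turtle", "application/ld+json"]


def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((fields[0].lower(), q))
    return prefs


def quality(media_type, prefs):
    """q-value for media_type; the most specific matching range wins."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in prefs:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept, extra=()):
    """Return the chosen media type, or None (i.e. 406 Not Acceptable)."""
    offered = BASELINE + list(extra)
    prefs = parse_accept(accept) if accept else [("*/*", 1.0)]
    # Highest q wins; on ties, earlier entries in `offered` are preferred.
    q, _, chosen = max((quality(m, prefs), -i, m) for i, m in enumerate(offered))
    return chosen if q > 0 else None
```

Breaking ties in favour of Turtle is just one possible server preference; any baseline format would satisfy the requirement.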

With respect to RDF 1.1 Concepts, RDFa in markup languages outweighs Turtle or JSON-LD in HTML simply because the former is well-defined and normative, whereas the latter is non-normative in the respective specifications. It is perfectly reasonable to use a host language like HTML and include RDF blocks in script tags. That is not a replacement for, or better than, RDFa. It never was. If one wants to understand this better, they can have a look at the history of RDF in HTML in the first place. Or just skip the history and jump to RDFa Use Cases: Scenarios for Embedding RDF in HTML: https://www.w3.org/TR/xhtml-rdfa-scenarios/

I've shared why RDFa https://csarven.ca/linked-research-decentralised-web#why-rdfa in sufficient detail in my thesis if anyone would like to have a closer look at the advantages of human- and machine-readable information.

Moreover, the focus ought not to be on mapping any syntax to a concrete RDF syntax. Hence, any further discussion, e.g. on Microdata or CSV, would be a distraction because neither is a concrete RDF syntax. It not only overcomplicates the system, it is not immediately required or used in Solid. Having said that, the Solid ecosystem should not prohibit the possibility of a system supporting mappings from any syntax to a concrete RDF syntax. It doesn't need to prescribe it. Completely orthogonal.


Yes, of course the system must support read-write plain HTML. Why would this be a question or even a problem? That's already possible.

If ordering syntaxes by their usage in the wild in order to decide what to support is ever up for question (and I don't think it needs to be), then I propose starting with data on the number of HTML documents (with or without embedded RDF) vs. straight-up Turtle or JSON-LD. As you can see, that line of argument quickly falls apart.

We could dwell on "yea but PATCH with SPARQL Update.. can't do that for HTML". As generally true as that is (although it is quite feasible for cases that don't care about non-RDF bits and serialization), it is completely arbitrary. A bit more detail: https://csarven.ca/linked-research-decentralised-web#http-patch-articles . So, for the time being, there is nothing wrong with PUT, and it is possible to PATCH (X)HTML/XML (see e.g. XML Patch). TimBL even suggested that it would be possible for a client to serialize Turtle from an RDFa source and to use that as the payload for PATCH. The server then does its magic to reassemble the document. The question is not whether it is possible or not. See below for final thoughts on what we are willing to do and what we are somehow concerned about.
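TimBL's suggestion could look roughly like this on the client: extract the graph from the RDFa before and after an edit (extraction is assumed and not shown), then PATCH only the difference as a SPARQL Update. The helper is hypothetical; terms are assumed to be pre-serialized N-Triples terms, and blank nodes are not handled.

```python
def sparql_patch(before, after):
    """Build a SPARQL Update that turns graph `before` into graph `after`.

    Triples are (s, p, o) tuples whose terms are already serialized as
    N-Triples terms, e.g. "<https://example.org/a>" or '"label"'.
    Blank nodes are not handled in this sketch.
    """
    def block(triples):
        return "\n".join(f"  {s} {p} {o} ." for s, p, o in sorted(triples))

    removed, added = before - after, after - before
    parts = []
    if removed:
        parts.append("DELETE DATA {\n" + block(removed) + "\n}")
    if added:
        parts.append("INSERT DATA {\n" + block(added) + "\n}")
    # Multiple SPARQL 1.1 Update operations are separated by ";".
    return " ;\n".join(parts)
```

Reassembling the HTML on the server side (the "magic") is the hard part and is not attempted here.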


If AWWW's orthogonality means anything, URI allocation should not conflict with data formats. So, to cut this short, we don't mess with people wanting to assign whatever URI they like to their WebID. That would go against the grain and would not allow for a flexible or evolvable ecosystem. Arbitrary limitations will have consequences, especially if they contradict existing practice, e.g. https://csarven.ca/#i . Or anyone wanting to have their WebID on their homepage with their profile described in HTML+RDFa, for that matter. Or is that crazy talk?
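To make the WebID point concrete: an HTTP client never sends the fragment, so a fragment WebID like the one above dereferences to the homepage document, and that document's format (Turtle, HTML+RDFa, ...) is negotiated separately. A small illustration with Python's standard urllib.parse.urldefrag:

```python
from urllib.parse import urldefrag

# The WebID names a person; the fragment is resolved client-side after the
# document without the fragment has been fetched in whatever format.
webid = "https://csarven.ca/#i"
document_url, fragment = urldefrag(webid)
```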


And isn't it ironic... don't you think? On one hand we want to promise the world with Solid, but somehow we find it rocket science to deal with markup languages / RDFa. Really?! Especially given examples in the wild and applications, FWIW. We are willing to update all sorts of components and mechanisms needed in the Solid ecosystem, or possibly even replace existing specifications having implementations that are actually used - with all the pain and costs attached - and in the same breath there is some absurd perceived shortcoming of some concrete RDF syntaxes? All under the argument that something is not pure or imperfect? Is this genuinely out of reach considering everything that's going on and we're willing to do? Is this really where we are at?

We must maintain an inclusive perspective.

@elf-pavlik
Member

Yes, of course the system must support read-write plain HTML. Why would this be a question or even a problem? That's already possible.

I think this may be a good angle from which to look at the issue. Given the original proposal in this issue:

Principles for all resources:

  1. A resource has at most one (simple or compound) associated state at any given point in time. Consequently, clients cannot update a resource's individual representations independently. Rather, setting the state of a resource through a representation results in an update to the state of the resource itself, and thus all associated representations.

Let's consider a scenario:

  1. Alice creates a document with a PUT of text/turtle to https://alice.example/beep; the payload serializes a non-empty RDF graph.
  2. Alice follows with a PUT of text/html to the same IRI https://alice.example/beep; the payload does not include any RDF in any of the discussed ways of embedding it in HTML.
  3. Alice makes a GET request accepting text/turtle to https://alice.example/beep. What does she receive in the representation?
    a) a non-empty graph identical to what she created in step 1 (not getting into any possible nuance due to blank node labeling)
    b) an empty graph, following the lack of any RDF data when creating the HTML in step 2

We could then look at variants where we have different options in step 2:

  • a) text/html payload includes a non-empty graph in a <script> tag
  • b) text/html payload includes a non-empty graph in RDFa
  • c) text/html payload includes a non-empty graph as Microdata
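To make principle 1 concrete, here is a toy model (not any server's implementation) in which a resource holds one state and every representation is computed from it. It assumes payloads have already been parsed into sets of triples by some media-type-specific parser, which is not shown, and the serializers are placeholders.

```python
class Resource:
    """Toy model: one state (an RDF graph), many derived representations.

    Assumption: triples are (s, p, o) tuples of N-Triples terms, so each
    line written below is also valid Turtle.
    """

    def __init__(self):
        self.state = frozenset()

    def put(self, parsed_graph):
        # Setting the state through *any* representation replaces the one state.
        self.state = frozenset(parsed_graph)

    def get(self, media_type):
        # Every representation is computed from the same single state.
        lines = sorted(f"{s} {p} {o} ." for s, p, o in self.state)
        if media_type == "text/turtle":
            return "\n".join(lines)
        if media_type == "text/html":
            # Placeholder HTML serializer: embeds the graph as a Turtle block.
            return '<script type="text/turtle">' + "\n".join(lines) + "</script>"
        raise ValueError("unsupported media type: " + media_type)
```

Under this model, which applies principle 1 literally and assumes step 2 is accepted at all, step 3 yields option b. For variants a) to c), step 3 would return whatever graph the server extracted from the HTML payload.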

@acoburn
Member

acoburn commented Nov 5, 2020

I realize that Solid doesn't require LDP, but for LDP servers that implement Solid, the initial PUT to /beep creates an ldp:RDFSource. The second PUT to /beep appears to attempt to change that interaction model to an ldp:NonRDFSource. While some LDP servers may support that change, that is definitely not standard practice, and so I would expect step 2 to simply fail.

@TallTed
Contributor

TallTed commented Dec 14, 2020

[@acoburn]

While some LDP servers may support that change, that is definitely not standard practice, and so I would expect step 2 to simply fail.

That failure would be different from virtually every web or file server I've worked with. PUTting a resource to a location with a pre-existing resource by the same name results in replacement of that pre-existing resource, regardless of the two datatypes in play -- I can replace a test.txt of text/plain with image/png or application/ld+json or text/turtle or any other Content-Type.

(This does assume I have the relevant WRITE and other permissions.)

Do some of these change the interaction model within/beneath the file, i.e., regarding its contents? Sure. So what? At worst, in your world, I would need to DELETE first and then PUT, but why should the DELETE be required? (I won't go into the number of problems I see with DELETE+PUT being disallowed unless you tell me it is.)

Just like my local filesystem. I'm allowed to shoot myself in the foot, by naming a file with image/png content with a .txt extension, or vice versa. Or by replacing a 500 MB Turtle document with a 3 KB plain text file.

(Warnings are fine -- warnings are great! -- but I must be allowed to choose my doom.)
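The two positions can be contrasted in a toy PUT handler for an existing resource. The media-type-to-interaction-model mapping, the status codes (409 for the strict case, 204 otherwise) and the warning string are assumptions for illustration, not taken from LDP or any particular server.

```python
RDF_TYPES = {"text/turtle", "application/ld+json"}
LDP = "http://www.w3.org/ns/ldp#"


def interaction_model(media_type):
    """Hypothetical mapping an LDP-style server might use."""
    return LDP + ("RDFSource" if media_type in RDF_TYPES else "NonRDFSource")


def handle_put(existing_model, media_type, strict):
    """Return (status, model, warning) for a PUT to an existing resource.

    strict=True: LDP-style, changing the interaction model fails.
    strict=False: file-server style, replace anyway but warn the client.
    """
    new_model = interaction_model(media_type)
    if new_model == existing_model:
        return 204, new_model, None
    if strict:
        return 409, existing_model, None
    return 204, new_model, f"interaction model changed to {new_model}"
```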

@TallTed
Contributor

TallTed commented Dec 16, 2020

@RubenVerborgh --

I trip over the title of this issue every time I see it. I think that --

Proposal: handling representations of resources conform the REST architectural style

-- should change to --

Proposal: handling representations of resources should conform to the REST architectural style

-- or perhaps --

Proposal: make handling representations of resources conform to the REST architectural style

-- but I may have misunderstood your intended meaning.

@TallTed
Contributor

TallTed commented Dec 16, 2020

Possibly this?

Proposal: handle representations of resources in accordance with the REST architectural style

It would help me (and, I imagine, other future readers) if the issue title could be updated.

@RubenVerborgh RubenVerborgh changed the title Proposal: handling representations of resources conform the REST architectural style Aligning representations of document and container resources with REST via single and compound state Dec 16, 2020
@kjetilk
Member

kjetilk commented Apr 14, 2021

I've been trying to catch up on this, and I'll first note that @RubenVerborgh's proposal is generally aligned with my opinions, and on the index.html issue aligns pretty well with the conclusion I came to in #69, which was rejected by @timbl when we met F2F. @timbl's proposal is more aligned with @csarven's requirements motivated by Semantic Publishing.

Myself, I'm more inclined to go, "yeah, @csarven, I really love your Semantic Publishing stuff, but do you really need to use the container when you need full control over the RDFa?"

For the container, I'm perfectly fine with having a compound state, where the server-managed triples are a MUST, any other triples are managed on a best-effort basis, and any RDF is injected into an HTML document so that it can be processed, but not kept as pretty. I still feel that's the best we can do, but I'll also note that this remains a key piece of contention that we are unlikely to agree on.
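A sketch of the "injected into an HTML document" part of this compound-state idea: server-managed and best-effort triples are written into a Turtle block inside a page. The helper and the template are hypothetical, triples are assumed to be tuples of N-Triples terms, and, as the comment says, the original formatting of any RDF is not preserved.

```python
def render_container_html(template, server_managed, best_effort):
    """Inject a container's compound state into an HTML page as a Turtle block.

    Assumptions (hypothetical helper): `template` contains "</body>", and
    triples are (s, p, o) tuples of N-Triples terms. Server-managed triples
    are always emitted; best-effort triples are emitted as stored.
    """
    lines = []
    for s, p, o in sorted(server_managed | best_effort):
        # "</" would end the <script> element early; \u002F is the Turtle
        # escape for "/", valid inside both IRIs and string literals.
        lines.append(f"{s} {p} {o} .".replace("</", "<\\u002F"))
    block = '<script type="text/turtle">\n' + "\n".join(lines) + "\n</script>\n"
    return template.replace("</body>", block + "</body>", 1)
```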

Thus, I think the most important thing here is to not cut off possible future improvements, but otherwise leave the index.html issue unresolved for now.

@kjetilk
Member

kjetilk commented Feb 7, 2022

Returning to this topic without the index.html glasses on, because I think we need to get normative language around representations into the spec to avoid more patchy language such as the stuff I tried in #309 (my comments there still stand from a pragmatic viewpoint, though).

Indeed, I think the first statement:

A resource has at most one (simple or compound) associated state at any given point in time. Consequently, clients cannot update a resource's individual representations independently. Rather, setting the state of a resource through a representation results in an update to the state of the resource itself, and thus all associated representations.

is well-formulated text, but I think we need to make it a server requirement, as servers need to enforce this. Something like "Servers MUST NOT allow clients to update a resource's individual representations independently".

Therefore, when a client sets the state of a resource by sending a representation via a PUT request, the server MUST erase any previously stored and for the resource take over the new state

isn't ready for the spec, because whether the server actually needs to erase anything is an implementation detail. We need to wordsmith that into place, but it is probably more efficient to do so in a shared document in a meeting.

Point 3 is good. Point 4 is good, except that we shouldn't need to say "stored"; "resource state" is sufficient.

Then, I think that we should require Content-Location for representation URLs (#109) for the reasons I discussed in #309, and discuss the usage of origin-form (#368).
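A sketch of what requiring Content-Location could look like on a negotiated GET of the resource URL. The suffix-based representation URLs are purely hypothetical (nothing here implies the spec mandates a URL scheme); Vary: Accept is included because the response depends on the Accept header.

```python
# Hypothetical mapping from media type to representation URL suffix.
SUFFIXES = {
    "text/turtle": ".ttl",
    "application/ld+json": ".jsonld",
    "text/html": ".html",
}


def get_response_headers(resource_url, chosen_type):
    """Headers for a negotiated GET on the resource URL (sketch)."""
    return {
        "Content-Type": chosen_type,
        # Identifies the specific representation that was selected.
        "Content-Location": resource_url + SUFFIXES[chosen_type],
        # The selected representation depends on the request's Accept header.
        "Vary": "Accept",
    }
```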

I think that we can deal with the specifics of a compound state later; we should first resolve the ambiguities that follow from our not having clearly articulated the idea that there is a single state.

@csarven
Member

csarven commented Feb 7, 2022

A resource has at most one (simple or compound) associated state at any given point in time. Consequently, clients cannot update a resource's individual representations independently. Rather, setting the state of a resource through a representation results in an update to the state of the resource itself, and thus all associated representations.

is well-formulated text, but I think we need to make it a server requirement, as servers need to enforce this. Something like "Servers MUST NOT allow clients to update a resource's individual representations independently".

https://solidproject.org/ED/protocol#server-representation-write-redirect :

When a PUT, POST, PATCH or DELETE method request targets a representation URL that is different than the resource URL, the server MUST respond with a 307 or 308 status code and Location header specifying the preferred URI reference.
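The quoted requirement could be sketched as below. The lookup table from representation URLs to resource URLs is hypothetical; how a server knows which URLs are representation URLs is left open. Whether to use 307 or 308 (both preserve the request method) is the server's choice, modelled here as a flag.

```python
WRITE_METHODS = {"PUT", "POST", "PATCH", "DELETE"}

# Hypothetical lookup table from representation URL to resource URL.
REPRESENTATION_OF = {
    "https://alice.example/beep.ttl": "https://alice.example/beep",
    "https://alice.example/beep.html": "https://alice.example/beep",
}


def write_redirect(method, url, permanent=False):
    """Return (status, location) if the write must be redirected, else None."""
    resource_url = REPRESENTATION_OF.get(url)
    if method in WRITE_METHODS and resource_url is not None and resource_url != url:
        return (308 if permanent else 307), resource_url
    return None  # not a redirect case; handled normally elsewhere
```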

@kjetilk
Copy link
Member

kjetilk commented Feb 7, 2022

Yup, but that doesn't cover it. It could be that they are using conneg rather than representation URLs. Also, we could be clearer in this area.
