Creation of Memento resources #61

Open
acoburn opened this issue Sep 20, 2019 · 20 comments

Comments

@acoburn
Member

acoburn commented Sep 20, 2019

The Solid Ecosystem document mentions Memento as an optional dimension for content negotiation.

Memento, itself, defines a mechanism whereby a client can discover and retrieve previous states of a resource. Memento does not define how these previous states (Mementos) are created or otherwise managed. Is this an implementation decision or will the Solid specification take a stance on issues such as:

  1. Will it be possible to delete a Memento? If so, what are the ACL implications?
  2. How and under what conditions are Mementos created? Is a new Memento created every time a resource is modified?
  3. When a resource is deleted, are all associated Mementos also removed?
@RubenVerborgh
Contributor

I would consider these implementation decisions.
Tagging @hvdsomp.

@csarven
Member

csarven commented Sep 21, 2019

I think it would help to bridge LDP and Memento by defining:

  • how a server may advertise its support for the creation (otherwise management) of versioned (immutable) resources.
  • how clients can request a versioned (immutable) resource to be created (otherwise managed), and what the request or shape of the payload would look like.

I used broader language above so that while Memento is the main candidate, it can potentially be extended - but that's just speculation for now.

I think the Solid spec should draw from or align with https://fcrepo.github.io/fcrepo-specification/#resource-versioning . We need to process/digest this properly and should have some implementation experience before throwing something into the spec.

We can answer the questions you've raised once we have some consensus that:

  • the relationship between LDP and Memento (at the very least, but not strictly, given relationship or dependencies on other mechanisms eg. ACL) is needed and defined;
  • the level of requirement is determined.

I'm not certain myself, but I would like to discuss to what degree Memento is required of or baked into Solid servers. From an application's perspective, I think it would be great to have (so that's more like a MUST), but from a server's perspective MUST is a high bar, in that we'd basically be forcing all servers to make a certain "promise" about resources. Server or resource "owners" may not want to commit to that, and it may not even be particularly useful or sensible for the kind of data they generally store and serve. SHOULD/MAY? At the very least, however, we can use language like "if a server supports Memento, then it MUST...", which will allow interop with clients that have Memento know-how.

@RubenVerborgh
Contributor

As interesting and useful as Memento is, I strongly suggest a MAY only. Memento is an orthogonal specification; no need to mandate it.

@csarven
Member

csarven commented Sep 21, 2019

Pretty much everything in Solid is an orthogonal spec. The spec already mandates (or plans to mandate) a number of things, so Memento is no different, if of course desired. What the Solid spec would be doing is clarifying the relationships between Memento (or, more broadly, resource versioning) and everything else in Solid. I think that is needed and now is a good time to map that out.

I'm hoping to extend the discussion and the implications of a system (and eventually the ecosystem) that is generally aware of versioning. Understanding that better would help to determine if MUST/SHOULD/MAY.

@RubenVerborgh
Contributor

Pretty much everything in Solid is an orthogonal spec.

Agreed; I meant orthogonal to the whole of what we now consider as "Solid", i.e., the combination of LDP+WAC+Auth.

I would need to see very strong arguments for anything other than a MAY. Of course, we can also have a trivial implementation of Memento which can easily be a MUST, i.e., Memento says that versioned resources must support datetime conneg. A server can easily be fully compliant with Memento without supporting it, namely by just not offering versioned resources at all.

@pmcb55

pmcb55 commented Sep 21, 2019

I would very strongly suggest only a MAY too (simply because I think it's a value-add feature that many implementors may not want to be burdened with). I also think it offers nice differentiation (and justification) for Enterprise implementations to provide lots of nice features around versioning.

But I do agree with Sarven's suggestions for the spec to provide guidance for implementations that choose to implement Memento to help interoperability.

@csarven
Member

csarven commented Sep 22, 2019

I was hoping that we would continue to develop our understanding before jumping to a decision on the conformance level. Without looking at use cases, having some implementation experience, or at the very least having confidence in the kind of space we envision, the decision is arbitrary. It depends entirely on our assumptions, and we ought to document some scenarios that ground our reasoning.

While what may happen out there is at best speculation, I thought that MUST may be difficult to achieve for implementers and resource "owners". I think we generally agree on that, but that doesn't imply MAY. Not MUST is not a free pass to MAY. We can't go from "difficult to enforce" to MAY. All we can say for now is that it is probably not a MUST and that we need to bring forth more arguments and discussion. We need "very strong arguments" for MAY as well as for anything else. Calling MUST, SHOULD, MAY or nothing at all is the relatively easy part once we know exactly where we are heading. How important is it? What do we gain/lose? Tradeoffs? Risks?

There is nothing particular about MAY that makes it preferable to not saying anything. Sure, it is a good signal: we think it is relevant and makes sense. Ultimately, however, it is not required for interop, with the exception of prescribing some glue for LDP/Memento for those that care: "if you do happen to implement x, make sure to do y". After all, we can throw in MAYs for whatever we think is interesting or useful out there. So what? The point is that even if/when we eventually arrive at a MAY, we more or less need to hint at the why in the spec, and not "oh four people showed up on github and thought really hard about it" :)

@hvdsomp

hvdsomp commented Sep 23, 2019

Apologies for a late response. I've been feeling a bit under the weather. A few things:

  • Memento defines HTTP-based interop for access (read only) to resource versions. It does that by defining (1) the TimeGate mechanism for datetime conneg and (2) the TimeMap, which is, in essence, a list of existing resource versions (RFC7089)
  • The LDP-based Fedora API introduces version creation (write) in a way that aligns nicely with Memento - Fedora API
  • The Fedora API touches upon some of the aspects mentioned by @acoburn . I agree with @RubenVerborgh that those are implementation choices. Still, I would respond: (1) no, because flagging a resource as a Memento entails a promise that its state will no longer change (2) definitely a matter of policy/implementation; in the Fedora API clients and servers can create new versions (3) no, same reason as (1)
  • Regarding the MUST/SHOULD/MAY discussion: (1) I would distinguish between read (Memento) and write (Fedora API) (2) I would hope that - if a system decides to support resource versions - adherence to the respective spec (i.e. Memento for read) would be a MUST, meaning no other approach to deal with versions would be invented/implemented.
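
To make the read side concrete, here is a minimal sketch of the datetime conneg step a TimeGate performs. The function name, the in-memory mapping of URI-Ms to datetimes, and the example.org URIs are hypothetical. The selection rule (latest version at or before the requested time, falling back to the earliest version) is an assumption for illustration; as far as I understand RFC7089, the server decides which Memento is the best match.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def select_memento(mementos, accept_datetime):
    """Pick a Memento for an Accept-Datetime header value.

    mementos: dict mapping URI-M -> timezone-aware datetime (hypothetical store).
    accept_datetime: header value such as "Mon, 22 Jul 2019 16:03:11 GMT".
    Assumed rule: latest version at or before the requested time,
    otherwise the earliest version known.
    """
    requested = parsedate_to_datetime(accept_datetime)
    ordered = sorted(mementos.items(), key=lambda item: item[1])
    chosen = ordered[0]  # fallback when the request predates every version
    for uri, when in ordered:
        if when <= requested:
            chosen = (uri, when)
    return chosen


# Hypothetical versions of one resource.
versions = {
    "https://example.org/archives/doc/v1": datetime(2019, 1, 1, tzinfo=timezone.utc),
    "https://example.org/archives/doc/v2": datetime(2019, 7, 22, 16, 3, 11, tzinfo=timezone.utc),
}
uri, when = select_memento(versions, "Mon, 01 Apr 2019 00:00:00 GMT")
```

A real TimeGate would answer with a redirect to the chosen URI-M (or serve it directly); that part is left out here.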

@acoburn
Member Author

acoburn commented Sep 23, 2019

While a Memento resource may carry with it a promise of immutability, there is a very practical need to be able to delete these resources.

In order to comply with legal and/or privacy-related rules (e.g. GDPR), it will be absolutely necessary for an implementation to have some mechanism to purge data from its history. Whether that mechanism is part of the public HTTP interface can be a separate question, though it is worth noting that even the Fedora API supports DELETE for Memento resources.

@hvdsomp

hvdsomp commented Sep 23, 2019

I very much agree with that, but, with my answers, wanted to point out what the expectation per the Memento spec is. Reality can obviously differ. We (creators of the Memento spec/tools) had conversations like this with the editors of the Fedora API spec, who were dealing with challenges that were very similar to the ones you describe. The Fedora API spec reflects the results of those conversations.

@csarven
Member

csarven commented Sep 23, 2019

I have to ask a dumb question because I want to be clear. Perhaps I'm overlooking the obvious thing. As I understand it, the resource state (as per AWWW) that Memento refers to is about the representation, and so the promise of immutability is that it didn't change. If so: does that actually exclude the case where a resource ceases to exist? After all, we can't get to the resource's state to determine if it changed or not. Hence, wouldn't that permit deletion of resources without conflicting with expected Memento behaviour?

@hvdsomp

hvdsomp commented Sep 24, 2019

That's pretty meta, but yes ;-)

Generally speaking, the intent is that a resource representation doesn't change once it's been flagged as a Memento. But, reality doesn't make it easy to live up to that promise. For example, in web archiving, where the Memento protocol is used abundantly:

  • Mementos are made inaccessible as a result of "right to be forgotten" requests
  • Mementos are frequently not the "same" when accessed repeatedly as a result of idiosyncrasies of web archive playback mechanisms; see a reference to this, for example, in this talk by Michael Nelson

Regarding the latter, RFC7089 has some language that has its grounding in digital preservation practice:

Although a Memento encapsulates a prior state of an Original
   Resource, the entity-body returned in response to an HTTP GET request
   issued against a Memento may very well not be byte-to-byte the same
   as an entity-body that was previously returned by that Original
   Resource.  Various reasons exist why there are significant chances
   these would be different yet do convey substantially the same
   information.  These include format migrations as part of a digital
   preservation strategy, URI-rewriting as applied by some Web archives,
   and the addition of banners as a means to brand Web archives.

@pmcb55

pmcb55 commented Sep 24, 2019

Great discussion (and very interesting pointers from Herbert!).

So my probably overly simplistic summary: Memento (or more generally 'resource versioning') is not a MUST in the spec. We can offer guidance though, saying if implementations want to offer versioning, they SHOULD strongly consider Memento (for read). We can also offer the guidance that if they want to support version write they SHOULD strongly consider the Fedora API. We can expand that guidance further by explaining that in reality resources can always be deleted (giving the great example of GDPR's 'right to be forgotten'), and can also 'change' at the byte level (giving Herbert's great examples above). And finally, Solid servers that delete a resource MUST (I'm not 100% on that yet!) also delete any associated versions (e.g. Mementos).

I take Sarven's point about needing further thought - but is this the general direction?
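
To make the policy choices in the summary above easier to compare, here is a toy in-memory sketch. Everything in it is hypothetical (the class, its method names, the `cascade_delete` flag), and it assumes one possible answer to each open question: a new version is recorded on every write, and whether versions disappear with the resource is a configurable policy. None of this is settled in the thread.

```python
from datetime import datetime, timezone


class VersionedStore:
    """Toy store: every write records a version; delete policy is configurable."""

    def __init__(self, cascade_delete=True):
        self.current = {}   # URI-R -> latest body
        self.versions = {}  # URI-R -> list of (datetime, body)
        self.cascade_delete = cascade_delete

    def put(self, uri, body, now=None):
        # Assumption: a new version is created on every modification.
        now = now or datetime.now(timezone.utc)
        self.current[uri] = body
        self.versions.setdefault(uri, []).append((now, body))

    def delete(self, uri):
        self.current.pop(uri, None)
        if self.cascade_delete:
            # One possible policy: versions are removed with the resource.
            self.versions.pop(uri, None)

    def timemap(self, uri):
        # Datetimes of all known versions, oldest first.
        return [when for when, _ in self.versions.get(uri, [])]


store = VersionedStore(cascade_delete=True)
store.put("https://example.org/doc", "first")
store.put("https://example.org/doc", "second")
store.delete("https://example.org/doc")
```

With `cascade_delete=False` the versions outlive the resource, which is closer to the "SHOULD NOT delete Memento resources" reading; a separate purge mechanism (e.g. for GDPR requests) would still be needed and is not shown.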

@RubenVerborgh
Contributor

Alternatively, we can make Memento a MUST if the server supports versions. But version support could still be a MAY.

@csarven
Member

csarven commented Sep 24, 2019

Perhaps meta-ish. I wanted to explore and not step on any toes. If we take it as is and we want to say something about deleting Memento resources, then "SHOULD NOT delete Memento resources" will fit. That also works for legal and/or privacy-related cases. We don't want to encourage deleting, so we don't say MAY delete.

Related issue: #46 where the current consensus (warning: not final or official in any way at the time of this writing) seems to be that Solid's position should be the same as LDP's, i.e. "LDP servers should not re-use URIs" (re AWWW). So, that may also mean that while it is technically and socially allowed to delete a Memento resource, and even reuse the same resource for something completely different, one really should not. That possibility seems completely silly of course, but I think it is fair to acknowledge it here (for whoever is reading this in the future). I can't think of a particularly good reason right now why that might happen other than to create misinformation; the best case would be accidental (still unintentionally harmful). Perhaps out of scope for Solid, but worth acknowledging nevertheless. If Solid can do something about that, we should (re: ethical web principles).

@csarven
Member

csarven commented May 4, 2020

A client request to create a URI-R, where the server creates the URI-M and includes Memento headers. Without the header, the server only needs to create a regular resource without a URI-M. The server could of course always create URI-Ms and create/update the URI-T depending on the activity. AFAICT, this aligns with the client request to create a Memento resource: https://fedora.info/2018/11/22/spec/#resource-versioning . Straightforward IMO.

PUT https://csarven.ca/linked-research-decentralised-web
Link: <http://mementoweb.org/ns#OriginalResource>; rel="type"

201 Created
Location: https://csarven.ca/linked-research-decentralised-web

GET https://csarven.ca/linked-research-decentralised-web
Link: <http://mementoweb.org/ns#OriginalResource>; rel="type"
Link: <https://csarven.ca/linked-research-decentralised-web.timemap>; rel="timemap"
Link: <https://csarven.ca/archives/linked-research-decentralised-web/ce36de40-64a7-4d57-a189-f47c364daa74>; rel="memento"

GET https://csarven.ca/archives/linked-research-decentralised-web/ce36de40-64a7-4d57-a189-f47c364daa74
Link: <http://mementoweb.org/ns#Memento>; rel="type"
Link: <https://csarven.ca/linked-research-decentralised-web>; rel="original"
Link: <https://csarven.ca/linked-research-decentralised-web.timemap>; rel="timemap"
Memento-Datetime: Mon, 22 Jul 2019 16:03:11 GMT

GET https://csarven.ca/linked-research-decentralised-web.timemap
Link: <http://mementoweb.org/ns#TimeMap>; rel="type"
Link: <https://csarven.ca/linked-research-decentralised-web.timemap>; anchor="https://csarven.ca/linked-research-decentralised-web"; rel="timemap"

Edit: Corrected to use anchor in TimeMap resource example.
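
For clients, discovery in the exchange above boils down to reading the Link headers. Here is a minimal sketch of that step, assuming the headers look like the ones in the example (quoted parameters only). The function name is hypothetical and the regexes are a simplification, not a complete Link header parser.

```python
import re

# Simplified patterns: one <target> followed by ;name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_links(header_values):
    """Return {rel: [target, ...]} from a list of Link header values."""
    links = {}
    for value in header_values:
        for target, params in LINK_RE.findall(value):
            attrs = dict(PARAM_RE.findall(params))
            # A rel attribute may carry several space-separated relation types.
            for rel in attrs.get("rel", "").split():
                links.setdefault(rel, []).append(target)
    return links


# Link headers from the GET on the original resource in the example above.
links = parse_links([
    '<http://mementoweb.org/ns#OriginalResource>; rel="type"',
    '<https://csarven.ca/linked-research-decentralised-web.timemap>; rel="timemap"',
    '<https://csarven.ca/archives/linked-research-decentralised-web/'
    'ce36de40-64a7-4d57-a189-f47c364daa74>; rel="memento"',
])
timemap_uri = links["timemap"][0]
```

From `links` a client can follow `timemap` to list versions or `memento` to fetch a snapshot directly.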

@hvdsomp

hvdsomp commented May 4, 2020 via email

@csarven
Member

csarven commented May 4, 2020

The suggestion was primarily about having the client request to create URI-R and have its URI-M available. I realise there are different reasons to support TimeMap and TimeGate. Would it make sense to require one (TimeMap in my opinion) in order for servers and clients to interop, and have the other (TimeGate) as optional?

If a server implements Memento, is there a reason why snapshot discovery and negotiation would not be available for any resource? Is it something that the server should just handle itself, without any interface with the client, i.e. without the client having to request creation of the URI-R in the first place?

@phonedude

(jumping in)

there's certainly a preference for servers (esp. CMSs) to implement TG/TM, but there's no explicit requirement. As @hvdsomp said, external links are fine (even though their knowledge might not be complete).

The minimum threshold is providing Memento-Datetime and link rel=original, and other servers can build TGs/TMs from that. You can also think of the (public) web having implied TG/TM links to archive.org, and the server overrides those when it knows of "better" TGs/TMs.
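
A rough sketch of checking that minimum threshold on a response: a parseable Memento-Datetime header plus a link with rel=original. The function name is hypothetical, and it assumes all Link values have been joined into one comma-separated string with quoted rel values.

```python
import re
from email.utils import parsedate_to_datetime

# Matches a link whose quoted rel attribute contains the token "original".
ORIGINAL_RE = re.compile(
    r'<[^>]+>[^,<]*;\s*rel="(?:[^"]*\s)?original(?:\s[^"]*)?"'
)


def meets_memento_minimum(headers):
    """True if the headers carry Memento-Datetime and a rel="original" link."""
    lowered = {name.lower(): value for name, value in headers.items()}
    stamp = lowered.get("memento-datetime")
    if stamp is None:
        return False
    try:
        parsedate_to_datetime(stamp)
    except (TypeError, ValueError):
        # Unparseable dates raise TypeError or ValueError depending on Python version.
        return False
    return ORIGINAL_RE.search(lowered.get("link", "")) is not None


# Headers modelled on the Memento response earlier in this thread.
memento_headers = {
    "Memento-Datetime": "Mon, 22 Jul 2019 16:03:11 GMT",
    "Link": '<http://mementoweb.org/ns#Memento>; rel="type", '
            '<https://csarven.ca/linked-research-decentralised-web>; rel="original"',
}
ok = meets_memento_minimum(memento_headers)
```

Anything beyond this (TimeGate, TimeMap links) is extra information that other servers or aggregators can build on.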

@sjoerdvangroning

Maybe this shares common ground with something we called solid-link-metadata. See https://pdsinterop.org/solid-link-metadata/
It can be used to create archives; in combination with an archivedate and content hash, it opens opportunities to serve and/or store older data.
More in #136

Projects
Status: Consensus Phase

8 participants