Discovery of storage #310

Open
kjetilk opened this issue Sep 16, 2021 · 24 comments

Comments

@kjetilk
Member

kjetilk commented Sep 16, 2021

While writing tests for Section 4.1, I found no reliable way to discover a storage, and therefore no reliable way to test this section's requirements.

Now, since we generally have a storage on the server's /, for the most part, you could just start with that, and the tests would pass, but that's not the requirement. A server may support several storages, and parts of the URI space don't need to belong to any storage.

In the current protocol, you can discover the storage by traversing towards its root container. So, in principle, you could GET /foo/bar to discover that the root container is /foo/. However, this can't be used for discovery, because it would require the entire URI space to be covered by storages, and that may not be the case. If I have understood it correctly, you could for example have that /pods/users/foo/ is the root container for the user foo, and similarly for other users. Then, just GET /bar/baz will not yield a pointer to a storage.
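A minimal Python sketch of the upward traversal, assuming a purely hierarchical path (RFC 3986) and container URLs that end in "/"; the function name `ancestor_containers` and the example.org URLs are made up for illustration. It lists the containers a client would check, nearest first, and shows why starting from /bar/baz only ever reaches /bar/ and /, never /pods/users/foo/.

```python
from urllib.parse import urlsplit, urlunsplit


def ancestor_containers(url: str) -> list[str]:
    """Return the containers above `url`, nearest first, ending at the origin root '/'.

    Hypothetical helper: assumes every path segment boundary is a container
    boundary, and drops any query string or fragment.
    """
    parts = urlsplit(url)
    # '/a/b/c' and '/a/b/c/' both become ['', 'a', 'b', 'c'].
    segments = parts.path.rstrip("/").split("/")
    result = []
    while len(segments) > 1:
        segments = segments[:-1]  # step up one level
        path = "/".join(segments) + "/"
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return result
```

For example, `ancestor_containers("https://example.org/bar/baz")` yields only `https://example.org/bar/` and `https://example.org/`, so if neither is a root container, the walk never reaches a storage.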

To remedy this problem, I suggest that we have a way that clients can discover the pods as a SHOULD level requirement (so that a server may support anonymous pods, but otherwise should make them discoverable).

@kjetilk kjetilk added doc: Protocol topic: resource access status: Nominated An issue that has been nominated for the next monthly milestone labels Sep 16, 2021
@kjetilk
Member Author

kjetilk commented Sep 16, 2021

There are basically two different mechanisms that I have in mind.

One is that we have a resource .well-known/solid/storages that provides an RDF resource listing the storages.

The other is that we define that the response payload to OPTIONS * has it.

Personally, .well-known has always offended my aesthetic sense (that much out-of-band knowledge of resource identifiers just feels wrong), so I would prefer the latter if possible.

Orthogonal to that is the content of that response. I see two options, one is for example:

<> pim:storage </pods/users/foo/> , 
               </pods/users/bar/> .

However, there's no value to this being self-describing. Also, it appears to stretch the definition of pim:storage. Even though its domain is undefined, the vocab says rdfs:comment "The storage in which this workspace is", and this resource is certainly no workspace.

Therefore, I rather suggest we do for example:

</pods/users/foo/> a pim:Storage . 
</pods/users/bar/> a pim:Storage .

That tells you exactly what you need to know in concise terms, and without stretching the definition of anything, it describes the storages exactly the way they describe themselves.
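To show how a client might consume such a listing, here is a small Python sketch that assumes the response has already been parsed by some RDF library into (subject, predicate, object) IRI tuples, with relative IRIs resolved against a hypothetical base https://example.org/. The function `listed_storages` is illustrative only. With the `a pim:Storage` form, the client filters on a single predicate/object pair and never needs to know what `<>` denotes.

```python
PIM = "http://www.w3.org/ns/pim/space#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def listed_storages(triples):
    """Return subjects typed pim:Storage, in document order, without duplicates.

    Assumes `triples` is an iterable of (subject, predicate, object) IRI strings.
    """
    storages = []
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj == PIM + "Storage" and subject not in storages:
            storages.append(subject)
    return storages
```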

@csarven
Member

csarven commented Sep 16, 2021

However, this can't be used for discovery, because it would require the entire URI space to be covered by storages and that may not be the case.

I don't quite understand this. Any given resource is part of only one storage and there are no overlapping storages.

If I have understood it correctly, you could for example have that /pods/users/foo/ is the root container for the user foo, and similarly for other users. Then, just GET /bar/baz will not yield a pointer to a storage.

No. /pods/users/foo/ and /bar/baz are under different storages.

"The storage in which this workspace is"

That's old. There was an update and the current definition is https://github.com/solid/vocab/blob/bd9ce1c7806254bf1307c73f2b5e0dd3eeaab101/space.n3#L91-L92 . I'll follow up with Tim to push latest to w3.org.

That tells you exactly what you need to know in concise terms

That works only for a 200 response. Clients may need to navigate up the hierarchy, e.g., to find the storage owner, irrespective of their permission on each resource on the way. This was the consensus.

To remedy this problem,

I don't see a problem :) but I'll respond to the following:

I suggest that we have a way that clients can discover the pods as a SHOULD level requirement

The Protocol describes one way that is guaranteed for clients to discover the storage. Clients are not required to discover the storage.

(so that a server may support anonymous pods, but otherwise should make them discoverable).

What's an "anonymous pod" - defined anywhere?


Close issue?

@kjetilk
Member Author

kjetilk commented Sep 16, 2021

No. /pods/users/foo/ and /bar/baz are under different storages.

What is it that requires /bar/baz to be in a storage?

Does that mean that I am not allowed to have a something like a /static/ which is just a Web server? That would seem like an infringement on the rights of a server.

@csarven
Member

csarven commented Sep 16, 2021

What is it that requires /bar/baz to be in a storage?

How was /bar/baz created? How was /bar/ created? How was / created - that's given; root container / Storage.

Does that mean that I am not allowed to have a something like a /static/ which is just a Web server?

Of course /static/ can exist. That can indeed be the root container / Storage. It is just that resource /bar/baz wouldn't be under that storage. /bar/baz may be in /bar/, and if that's not the storage, it'll be in / as storage.

@kjetilk
Member Author

kjetilk commented Sep 16, 2021

What is it that requires /bar/baz to be in a storage?

How was /bar/baz created? How was /bar/ created? How was / created - that's given; root container / Storage.

A legacy CMS system or something. Whatever, but not the Solid protocol.

Does that mean that I am not allowed to have a something like a /static/ which is just a Web server?

Of course /static/ can exist. That can indeed be the root container / Storage. It is just that resource /bar/baz wouldn't be under that storage. /bar/baz may be in /bar/, and if that's not the storage, it'll be in / as storage.

OK, I didn't quite parse that, but / can't be a storage, because then it would be overlapping with /pods/users/foo/, right?

And BTW, /pods/ can't be in a storage either, right, for the same reason. It follows from the requirements that there can be more than one storage and that storages are non-overlapping that there has to exist some space that is not necessarily contained in a storage.

@elf-pavlik
Member

One is that we have a resource .well-known/solid/storages that provides an RDF resources listing the storages.

FYI proposal for notification also defines .well-known/solid solid/notifications-panel#3

@csarven
Member

csarven commented Sep 16, 2021

If / is a Storage, then there are no other Storages in that path.
If /pods/ is a Storage, there can be other Storages under /, e.g., /bar/, /static/.
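The non-overlap rule can be checked mechanically. The Python sketch below assumes storage roots are container URLs ending in "/" and treats "contains" as a same-origin path-prefix test; the function names are hypothetical. It reports / overlapping with /pods/users/foo/, which is exactly why / cannot be a storage when the per-user containers are.

```python
from urllib.parse import urlsplit


def contains(storage_root: str, url: str) -> bool:
    """True if `url` is `storage_root` itself or lies below it (same scheme and host).

    Assumes `storage_root` ends in '/', so '/pods/' does not match '/podsX/'.
    """
    a, b = urlsplit(storage_root), urlsplit(url)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc) and b.path.startswith(a.path)


def overlapping_pairs(roots: list[str]) -> list[tuple[str, str]]:
    """List (outer, inner) pairs of storage roots where one contains the other."""
    return [(r1, r2) for r1 in roots for r2 in roots if r1 != r2 and contains(r1, r2)]
```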

@kjetilk
Member Author

kjetilk commented Sep 16, 2021

Yes, and therefore, if /pods/users/foo/ and /pods/users/bar/ are storages, then /pods/users/ and /pods/ can't be storages, and so, the discovery method breaks down.

@csarven
Member

csarven commented Sep 16, 2021

the discovery method breaks down

What breaks exactly? There is nothing prior to Storage. The path may be in the URI but there is nothing to see in /pods/users/ or /pods/. Discovery of Storage starts by giving URI as input.

@kjetilk
Member Author

kjetilk commented Sep 16, 2021

Yes, but which? If you do not know anything about the structure of the pod, how do you get started?

@csarven
Member

csarven commented Sep 16, 2021

I'll bite. How did a client come across the pod?

How do you find out my inbox? What's the input?

The client selects or arrives at a target somehow. Do you think it would help to introduce language along the lines of https://www.w3.org/TR/ldn/#discovery :

The starting point for discovery is the resource which the notification is to or about: the target. Choosing the most appropriate target resource from which to begin discovery is at the discretion of the sender or consumer, since any resource (RDF or non-RDF) may have its own Inbox.

@kjetilk
Member Author

kjetilk commented Sep 16, 2021

No, I don't think that would help.

Say that Solid lives in a Web ecosystem where servers manage pods alongside legacy systems on the same origin. You may then have arrived at the server by very conventional means, like a link to a non-Solid resource such as a news article, or whatever stuff you find on the Web today. And then you, as the client, want to discover if there are any pods there. The content admin may not have made that data available on the resource that you already have, not very accommodating that is, but you want to know.

We can't control all the content out there, but in order to lead a healthy existence in such an ecosystem, including a ramp-up phase, I think it is vitally important to have some hooks that you can be sure exist.

@csarven
Member

csarven commented Sep 16, 2021

The content admin may not have made that data available on the resource that you already have, not very accommodating that is, but you want to know.

Hmm, I think that's a broader question or orthogonal to discovering Storage, and there may be a simple answer to it by checking the type of resource. For example, LDP mentions rel=type ldp:Resource ( https://www.w3.org/TR/ldp/#ldpr-gen-linktypehdr ) to indicate LDP support, and for Solid, there could be something like solid:Resource ( see also #194 (comment) )

@kjetilk
Member Author

kjetilk commented Sep 16, 2021

So, testing out the algorithm:

GET /pods/users/

200 OK
Content-Type: text/html

<h1>These are our users</h1>
<ul><li>foo</li><li>bar</li></ul>

GET /pods/

200 OK
Content-Type: text/html

<h1>Yeah, we really love Solid!</h1>

GET /

200 OK
Content-Type: text/html

<h1>Welcome to our site!</h1>

Result: Storage not discovered.

GET /pods/users/foo/assets/images/bar.jpg

200 OK
Content-Type: image/jpeg

sdoglghsdgfjh

GET /pods/users/foo/assets/images/

200 OK
Content-Type: text/turtle

<> ldp:contains <bar.jpg> .


GET /pods/users/foo/assets/
Accept: text/html

200 OK
Content-Type: text/html

<h1>Seriously</h1>
<p>I've done six requests now, and I still haven't discovered a storage. 
How long do you actually expect me to keep doing this? 
This is no way to make a discovery protocol. 
Can't I just have a list of storages?</p>

Result: The client received no guidance, it wasn't required to, and so it gave up. Very reasonably, I might add.
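The walk-through above can be replayed against a stubbed server. In this Python sketch, `head(url)` is a hypothetical stand-in for an HTTP HEAD request that returns the set of `rel="type"` Link targets the server sent; everything else (`discover_storage`, the example.org URLs) is also illustrative. The client starts at the target, then checks each ancestor container up to /, stopping at the first one typed pim:Storage. The request count grows with path depth, and the walk ends without an answer when no container carries the type link.

```python
from urllib.parse import urlsplit, urlunsplit

PIM_STORAGE = "http://www.w3.org/ns/pim/space#Storage"


def discover_storage(url, head):
    """Return (storage_url or None, number_of_requests) for an upward walk from `url`.

    `head` is an assumed callable: url -> set of rel="type" Link targets.
    """
    parts = urlsplit(url)
    segments = parts.path.rstrip("/").split("/")
    candidates = [url]  # the target itself may be a root container
    while len(segments) > 1:
        segments = segments[:-1]
        path = "/".join(segments) + "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))

    requests = 0
    for candidate in candidates:
        requests += 1
        if PIM_STORAGE in head(candidate):
            return candidate, requests
    return None, requests
```

Against a server that never sends the type link, starting from bar.jpg costs seven requests and finds nothing; if /pods/users/foo/ advertises it, the same walk stops after four.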

@kjetilk
Member Author

kjetilk commented Sep 16, 2021

The content admin may not have made that data available on the resource that you already have, not very accommodating that is, but you want to know.

Hmm, I think that's a broader question or orthogonal to discovering Storage, and there may be a simple answer to it by checking the type of resource. For example, LDP mentions rel=type ldp:Resource ( https://www.w3.org/TR/ldp/#ldpr-gen-linktypehdr ) to indicate LDP support, and for Solid, there could be something like solid:Resource ( see also #194 (comment) )

There are too many heuristics in there. We need a simple mechanism just to get started.

@csarven
Member

csarven commented Sep 16, 2021

First of all, what kind of an evil application are you that uses GET instead of HEAD just to discover Storage in HTTP header. The spec allows both for obvious reasons...

And again, the response could've been 403.

The client needs to discover the root container / Storage somehow. There has to be an input. The Protocol already works with RFC 3986 on hierarchical paths and containment. So what's in place is a quick way to break the path into segments and check each one. The Protocol also states:

Clients may check the root path of a URI for the storage claim at any time.

When you say:

Result: Storage not discovered.

If it is a Solid server, it should've had Link rel=type Storage in header. If it is not discovered by that point, the application would know it is not working with a Solid server.

There are too many heuristics in there. We need a simple mechanism just to get started.

We already have something but you want something different.

What I'm saying is that the use cases to discover the Solid server's Storage is different than the use case to determine whether a server supports the Solid Protocol.

If you don't like the discovery algorithm looking for rel=type Storage, or for a resource with rel=type solid:Resource (for a Solid server), we could introduce pim:storage in the HTTP Link header (similar to what we currently have for the message body):

Clients can discover the storage which contains the resource of an HTTP HEAD or GET request target by checking for the Link header with rel="http://www.w3.org/ns/pim/space#storage". The target of the relation is the storage (pim:Storage).

And the requirement for the server obviously.
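On the client side, the proposed header check could look roughly like the Python sketch below. The parser is deliberately simplified (it does not handle commas or semicolons inside quoted parameter values, which RFC 8288 allows), so a real client would want a conformant Link header parser; the function name is hypothetical.

```python
import re

PIM_STORAGE_REL = "http://www.w3.org/ns/pim/space#storage"

# One link-value: <target> followed by zero or more ';'-separated parameters.
LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def storage_from_link_header(header: str):
    """Return the target of the first link whose rel includes pim:storage, else None."""
    for target, params in LINK_VALUE.findall(header):
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() != "rel":
                continue
            # rel may hold several space-separated relation types.
            if PIM_STORAGE_REL in value.strip().strip('"').split():
                return target
    return None
```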

@csarven
Member

csarven commented Sep 16, 2021

Can't I just have a list of storages?

Completely different use case. The 1) storage of a resource, is different than 2) list of storages in a pod, is different than 3) whether a server supports the Solid protocol.

@acoburn
Member

acoburn commented Sep 16, 2021

Can't I just have a list of storages?

I do not understand the use case motivating the feature that lists all storage locations on a Solid server. Surely there are cases where storage location would not be discoverable by public agents.

The use case that I do understand is the following: As a user, what are the locations of my storage roots on a Solid server.

For example, I may have a WebID at https://id.example/{username} with several Solid Pods at: https://solid.example/{uuid}/. How do my apps know where my data is stored? Assume that I don't want to advertise these locations to the world in my WebID profile.

It would be useful to have an endpoint on the storage server that, given an access token asserting a user's identity, lists the data pods that this user owns. Clearly, this endpoint would require authentication. I also see this only being relevant in cases where there are multiple storage roots on a Solid server.
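A rough server-side sketch of that endpoint's core logic, in Python. Everything here is hypothetical: `StorageRecord`, `storages_for`, and the idea that token validation has already produced the caller's WebID (or None) are assumptions, not anything the Protocol defines. The point is that the listing is filtered by the authenticated owner and refused outright to anonymous callers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageRecord:
    root: str   # storage root container URL
    owner: str  # WebID of the storage owner


def storages_for(records: list[StorageRecord], authenticated_webid: Optional[str]) -> list[str]:
    """Return the storage roots owned by the caller; anonymous callers get an error."""
    if authenticated_webid is None:
        raise PermissionError("authentication required")
    return [r.root for r in records if r.owner == authenticated_webid]
```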

@kjetilk
Member Author

kjetilk commented Sep 16, 2021

Can't I just have a list of storages?

I do not understand the use case motivating the feature that lists all storage locations on a Solid server. Surely there are cases where storage location would not be discoverable by public agents.

Primarily, it is not so much a use case, it is a quality attribute, that conformance should be easy to verify. The current specification makes it difficult.

@csarven
Member

csarven commented Sep 16, 2021

Surely there are cases where storage location would not be discoverable by public agents.

Nod.

I do not understand the use case motivating the feature that lists all storage locations on a Solid server.

Nod.

The use case that I do understand is the following: As a user, what are the locations of my storage roots on a Solid server.

I do not understand the use case motivating the feature that lists all storage locations of an agent on a Solid server.

The use case that I do understand is the following: As a user, what are the locations of my storages on the Web.

Discovery of storages starts from the agent because the agent can have storages on different origins. One way that apps can start discovery of an agent's storages is from the WebID Profile document (accessible by any agent with Read). Another would be from a different (access controlled) resource listing the storages such as the pim:preferencesFile or if necessary, a dedicated property, e.g., storageIndex. The statements will be in this form: <agent> pim:storage <storage>. I think we have this use case covered.

Edit: In the Protocol, the subject would be the agent as per the use case above:

Clients can discover a storage by making an HTTP GET request on the target URL to retrieve an RDF representation [RDF11-CONCEPTS], whose encoded RDF graph contains a relation of type http://www.w3.org/ns/pim/space#storage. The object of the relation is the storage (pim:Storage).
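Agent-centred discovery then amounts to collecting `<agent> pim:storage <storage>` statements from whichever documents the client can read. This Python sketch assumes each document has already been parsed into (subject, predicate, object) tuples; the WebIDs, storage URLs and the `storages_of` name are invented for illustration. It merges the public profile with an access-controlled document such as the preferences file.

```python
PIM_STORAGE = "http://www.w3.org/ns/pim/space#storage"


def storages_of(agent, *graphs):
    """Collect objects of <agent> pim:storage <x> across graphs, order-preserving, no duplicates."""
    found = []
    for graph in graphs:
        for subject, predicate, obj in graph:
            if subject == agent and predicate == PIM_STORAGE and obj not in found:
                found.append(obj)
    return found
```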

@kjetilk
Copy link
Member Author

kjetilk commented Sep 24, 2021

I have started to think about this differently: It is also about the authority of the URI space of the storage.

There may/will be other APIs on a Solid server, and they will occupy parts of the URI space. It is further very likely that parts of the storage's space will be occupied by URIs naming things that aren't controlled by the storage. In fact, if the storage is at /, then there will be URIs of authentication and other mechanisms that aren't controlled by the storage.

The question is then, who has the authority to name things in the URI space of a storage? If we leave that to some "server admin" entity, which can have a server-wide discovery mechanism, then that "server admin" entity can override the wishes and priorities of the storage's owner or user.

I think we should be very careful about using server-wide discovery mechanisms in other protocols that define APIs. Instead, the server should make storage discovery really easy, and thereby leave the control of their own URI space to the storage owner or user. The root container is a better place to manage further discovery.

@csarven csarven added this to the October 2021 milestone Oct 6, 2021
@kjetilk
Member Author

kjetilk commented Oct 25, 2021

I think that with the resolution of solid-contrib/conformance-test-harness#119 it now makes sense to remove this from the milestone and return to it for 1.0. While I still think the current behavior is not sufficiently testable (as also confirmed by the above PR), we don't need to prioritize it right now.

Any opposition to remove it from the milestone?

@kjetilk kjetilk removed the status: Nominated An issue that has been nominated for the next monthly milestone label Oct 26, 2021
@kjetilk kjetilk removed this from the Release 0.9 milestone Oct 26, 2021
@kjetilk
Member Author

kjetilk commented Jan 20, 2022

Having had time to think about this further, I retract my idea of listing a server's storages; that is not sufficiently aligned with privacy expectations in the case where a server hosts many (my assumption has been that this is a rare case, since you'd want a domain).

I still think the algorithm needs work: it doesn't scale to have to move up the tree, possibly a large one. Whether that should be addressed in this issue or more generally in #355 is an open question.

@ianconsolata

ianconsolata commented May 27, 2022

So, I have a slightly different use case than one that has been discussed already. I do not care about listing all the storages of a server (and agree with others that this is probably a bad idea), but as an application developer I need to know where to store data given a WebId. Therefore, I would like to propose extending section 4.1 on storage to include something along the lines of the following:

Servers exposing the storage resource MUST advertise it by including it in a pim:preferencesFile linked to from the WebID document.

It would also require an addition to the WebId section along the lines of:

WebID Documents MUST contain at least one (exactly one?) pim:preferencesFile triple.

As stated earlier, exposing the storage directly in the WebID document itself won't work, because it is a public document and not all storages will be public. However, there MUST be a way for application developers, given a WebID, to get a list of storages owned / controlled by that WebID. The natural place to put it, then, would be in a private pim:preferencesFile.

This aligns with the expectation set by the WebId profile group that a well-formed document MUST include a pim:preferencesFile (https://github.com/solid/webid-profile/blob/main/notes/pre-final-draft.md#3-private-preferences---pimpreferencesfile)

That document, however, suggests that if a preferencesFile does not exist, it should be created by application developers. As an application developer, this puts me in an impossible catch-22. If I have a WebId that has no pim:storage and no pim:preferencesFile, I cannot create a pim:preferencesFile because I don't know where to store it. Using the currently advertised way of accessing a storage (the Link header) only works if the WebID document exists as a subpath of that storage, and even then it does not allow me as an application developer to determine all storages a user controls, only the one the WebID document is stored in.

This is something that MUST be exposed by the server in some way, and this seems like the most reasonable way to do it.
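The proposed chain (WebID, then pim:preferencesFile, then pim:storage) can be sketched in Python as below. `fetch(url)` is a hypothetical stand-in for an authenticated GET plus RDF parsing that returns (subject, predicate, object) tuples; `resolve_storages` and all URLs are illustrative. A profile with neither link yields an empty list, which is the catch-22 described above: the client then has nowhere to create the preferences file.

```python
from urllib.parse import urldefrag

PIM = "http://www.w3.org/ns/pim/space#"


def resolve_storages(webid, fetch):
    """Follow WebID -> pim:preferencesFile -> pim:storage; return [] if the chain is broken."""
    profile = fetch(urldefrag(webid).url)  # the profile document is the WebID minus #fragment
    storages = [o for s, p, o in profile if s == webid and p == PIM + "storage"]
    prefs_files = [o for s, p, o in profile if s == webid and p == PIM + "preferencesFile"]
    for prefs in prefs_files:
        for s, p, o in fetch(prefs):
            if s == webid and p == PIM + "storage" and o not in storages:
                storages.append(o)
    return storages
```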

Projects
No open projects
Specification
  
Awaiting triage
Development

No branches or pull requests

5 participants