This repository has been archived by the owner on Apr 13, 2022. It is now read-only.

Remove globbing from the spec #145

Open
RubenVerborgh opened this issue Mar 21, 2019 · 35 comments

Comments

@RubenVerborgh
Contributor

RubenVerborgh commented Mar 21, 2019

I strongly think that globbing should be removed from the spec.

Reasons for removing

  1. No one really wants globbing. People want cross-file data access, and there are better ways of achieving that. Globbing has always been a hack for accessing data in multiple files. It was not thought through well (see below).

  2. Globbing is expensive on the server-side.

  3. Globbing can lead to denial of service (on server and client).

  4. Whatever can be achieved through globbing, can be achieved as efficiently without.
    With HTTP/2, there is zero overhead in just going through the files on the client side.

  5. Let's remove it soon before it is actually widely used and implemented.

Reasons for keeping it plus mitigations

  1. A (very low) number of apps are using it.
    Let's upgrade them.

  2. It has been in the spec for many years.
    That doesn't make it a good idea, and only a low number of apps are using it anyway (see 1).

Conditions

  1. Globbing should NOT be removed until there is a replacement for the functionality it provides.

People who need to have a say in this

@timbl @melvincarvalho

@michielbdejong
Contributor

+1

@elf-pavlik
Member

👍 I questioned the need for it a long time ago in solid/solid#116

@dmitrizagidulin
Member

+1

@gobengo

gobengo commented Mar 21, 2019

Eventually it is probably ideal for developer trust and adoption to adopt a Linux-like "never break userspace" policy, and to never make backward-incompatible changes like this. What is in the way of adopting such a policy? (i.e. what other spec features should be considered 'at risk for removal' before a 'v1'?)

Edit: I don't think it's necessary to adopt such a policy today. But maybe by a year from now? I am +1 on this proposal in the interest of applying Occam's Razor to the core of the spec. I just think it's good to trim and apply learnings all at once here instead of continuing a piecemeal feature-removal strategy ad infinitum. AS2 made backward-incompatible changes far too long into the spec, IMO, and it actively undercut my ability to get it adopted inside my organization at the time.

@RubenVerborgh
Contributor Author

Eventually it is probably ideal for developer trust and adoption to adopt a Linux-like "never break userspace" policy

Agreed: eventually.

What is in the way of adopting such a policy?

W3C standardization would be a good way of going through the spec with scrutiny, and identifying and fixing issues (such as globbing).

@melvincarvalho
Member

-1

This is in use

Consider this a formal objection.

Please, by all means, work on an HTTP/2 library that could possibly act as a replacement. That would go some way to me withdrawing this objection.

@melvincarvalho
Member

More specifically, I think we did discuss this in the past, and globbing is not quite the same as grabbing all the content in a directory. Is that useful? I think there can be an argument for yes and one for no. For example, in an LDPC you get all the file content in a directory, and that's useful. I'm not suggesting those are equivalents; it's more of a meta point.

Also, some stats on how widely deployed HTTP/2 is would be handy, as would a proposal on how HTTP/2 could replace globbing. I've had lots of interest in my social network app in private discussions, including from former Facebook people. And darcy etc. That uses globbing extensively. Could it be replaced? Possibly, but it might take some design and time, and that is not time I have in the next quarter or two.

I'm increasingly deploying Solid servers on all my devices now, including, as of yesterday, my Android phone. So eventually I could see Solid deployed widely, including on IoT devices. HTTP/2 usage would be an interesting data point here. We can't just be a Chrome-specific project; we should think about Solid as web servers running everywhere: in your home, your fridge, your watch, your phone, etc.

There are a few things I'd like to see fixed and working before tackling this. So my more in-depth answer is: not for now. We could mark it as "revisit", like many of our issues are.

However, speccing out a possible HTTP/2 solution seems to be a good idea, and I'd support that proposal.

@RubenVerborgh
Contributor Author

@melvincarvalho I'm in full agreement. Nothing will or should be removed until there is a replacement. I planned to explicitly state this in the issue, but forgot; will update now.

@RubenVerborgh
Contributor Author

Tracking implementation of such a client-side feature in solid/solid#253. Just a small note that we will likely not need anything HTTP/2-specific: HTTP/2 will automatically optimize the sequence of requests.

@gobengo

gobengo commented Mar 22, 2019

On pros cons (from solid/solid#253 (comment))

So let's just be honest about what are the pros/cons. From #145 the OP starts with

  1. No one really wants globbing. People want cross-file data access, and there are better ways of achieving that. Globbing has always been a hack for accessing data in multiple files. It was not thought through well (see below).

I don't pretend to know about this.

  2. Globbing is expensive on the server-side.

Only if you implement it naively. Alternate datastores (or if the datastore is a filesystem, shell out to bash or a C lib instead of globbing in node) can make this an O(1) lookup. This assertion needs more justification.

  3. Globbing can lead to denial of service (on server and client).

See 2. DDOS are a risk no matter what, e.g. by repeatedly getting full directory listings that are huge and taking up all available OS connections. It's not that unique to globbing. Practically, pros would deploy behind a DDOS-protecting middleware that makes this a non-issue.

  4. Whatever can be achieved through globbing, can be achieved as efficiently without. With HTTP/2, there is zero overhead in just going through the files on the client side.

My argument here is meant to analyze if this is true.

At this point I think everyone agrees the 'HTTP/2' mention isn't what's important. Even over HTTP there is no 'zero overhead', but the overhead is probably negligible in the vast majority of near-term scenarios.

  5. Let's remove it soon before it is actually widely used and implemented.

No argument here.


So 1 and 5 are likely good reasons. 3 is a bit of an overstatement ('zero overhead'), but can be rephrased to be just as convincing.


And the scalable solution here will be querying. Globbing is just a poor man's query; let's replace that with a client-side solution, and build proper query interfaces instead.

Totally agree! +1

@RubenVerborgh
Contributor Author

Important update: it seems that globbing is much more loosely defined in the spec than how @melvincarvalho intends it. My objection here has been to the loose version; some of your objections might also have been.

So please have a look at #148 for a proposal to already narrow down the current definition of globbing.

@elf-pavlik
Member

While I prefer removing globbing altogether, in case it stays, maybe the response could at least use a dataset (quad) representation (TriG, JSON-LD), so that the client at least knows which graphs / documents each statement came from. Otherwise I don't see how a client could perform updates when it needs to.
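
To make the quad idea concrete, here is a minimal sketch, not taken from any Solid implementation. It assumes the triples of each document have already been parsed and serialized as absolute N-Triples terms; the function name `to_nquads` and the example URLs are hypothetical. Each statement is tagged with the URL of the document it came from, so a client would know where to send an update.

```python
def to_nquads(documents):
    """Serialize triples as N-Quads, using each source document URL as the graph name.

    `documents` maps a document URL to a list of (subject, predicate, object)
    strings that are assumed to be valid, absolute N-Triples terms already.
    """
    lines = []
    for graph_url, triples in documents.items():
        for subject, predicate, obj in triples:
            # The fourth term records which document the statement came from.
            lines.append(f"{subject} {predicate} {obj} <{graph_url}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical pod contents, for illustration only.
    example = {
        "https://pod.example/tasks/1.ttl": [
            ("<https://pod.example/tasks/1.ttl#it>",
             "<http://purl.org/dc/terms/title>",
             '"Buy milk"'),
        ],
    }
    print(to_nquads(example))
```

A plain triple serialization of a glob drops exactly this fourth term, which is why the originating document cannot be recovered from it.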

@melvincarvalho
Member

@timbl I noticed that you thumbs-upped this one.

I think this is the first time in living memory that I possibly disagreed with you.

Globbing is in use. I spent months of time and work building apps based on this pattern. If this had been at risk, I would not have started that work, and left it until we had other patterns in place.

My intention was to revive work after the server work had stabilized, for which I have waited patiently.

The main question is on what timeline you would want this. On a longer timeline I could see myself getting behind this, particularly if there are like-for-like replacements. My concern is that there will be unilateral changes to the spec at short notice.

@melvincarvalho
Member

melvincarvalho commented Mar 28, 2019

Whatever can be achieved through globbing, can be achieved as efficiently without.
With HTTP/2, there is zero overhead in just going through the files on the client side.

@RubenVerborgh There is a burden of proof for you to prove a number of things. But this one is foundational. So examine the apps that use globbing, and that also includes cimba, and make the case that globbing can do all the things that are done. In fact it needs to be said what the functional requirements are, because globbing is not just used to fetch files. It will be a good conversation and a learning experience, for those that follow, I think. And also, importantly for me, I will get some breathing space to digest the detail of the proposal and assess the timeline, which is the main thing that matters to me. I'd say 3 of our best 5 apps ever have used globbing and solid would not exist without them. Let's get to the bottom of the above, because I suspect there's some fine detail you've missed.

EDIT: or even better, if you feel like you are in superhero mode (which you sometimes are imbued with), why not take a crack at taking one of the apps and porting it to node solid server 5 / HTTP/2? I think such an effort would likely be the ultimate win-win.

@RubenVerborgh
Contributor Author

Whatever can be achieved through globbing, can be achieved as efficiently without.
With HTTP/2, there is zero overhead in just going through the files on the client side.

@RubenVerborgh There is a burden of proof for you to prove a number of things.

Happy to oblige:

  • a glob is nothing but a concatenation of RDF files in a container
  • can be replicated on the client side by a) fetching the container b) GETting those files individually
  • under HTTP/2, multiple requests have virtually no overhead compared to a single request (that was one of the explicit design goals)
  • only overhead we thus have downstream is receiving the names of files that are not RDF
  • only overhead we thus have upstream is sending the URLs of files that are RDF (but the upstream channel will never be the bottleneck)
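
The steps in the bullets above can be sketched roughly as follows. This is an illustration only, not the code from any Solid library: the function names are hypothetical, the container listing is assumed to be Turtle that spells out `ldp:contains` with that exact prefix, and RDF files are assumed to be recognizable by a `.ttl` suffix. A real client would parse the listing with an RDF library instead of a regular expression.

```python
import re
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def http_get(url):
    """Fetch a URL as text, asking the server for Turtle."""
    request = Request(url, headers={"Accept": "text/turtle"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def list_members(container_url, listing):
    """Naively extract the ldp:contains targets from a Turtle container listing."""
    members = []
    # Capture the run of <iri> objects that follows each ldp:contains predicate.
    for objects in re.findall(r"ldp:contains\s+((?:<[^>]*>[\s,]*)+)", listing):
        for iri in re.findall(r"<([^>]*)>", objects):
            members.append(urljoin(container_url, iri))
    return members


def client_side_glob(container_url, fetch=http_get, rdf_suffixes=(".ttl",)):
    """Step a: fetch the container. Step b: GET each RDF member individually.

    Returns {member_url: body}. Non-RDF members appear in the listing
    but are never requested.
    """
    listing = fetch(container_url)
    documents = {}
    for url in list_members(container_url, listing):
        if url.endswith(rdf_suffixes):
            documents[url] = fetch(url)
    return documents
```

Calling `client_side_glob("https://pod.example/data/")` (a made-up URL) would issue one request for the container and one per `.ttl` member. Passing a different `fetch` function makes the sketch usable with another HTTP client or with canned data.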

But this one is foundational. So examine the apps that use globbing, and that also includes cimba, and make the case that globbing can do all the things that are done. In fact it needs to be said what the functional requirements are, because globbing is not just used to fetch files.

Hmm, this is new information.

For all we know (= your earlier statement at solid/solid#253 (comment) and TestGlob at https://github.com/linkeddata/gold/blob/b000d003f9e2aa40e4977839ca063f09435f80c8/server_test.go#L1193), the only implemented functionality is GET /data/* (confirmed by manual inspection of the GOLD code).

I'd say 3 of our best 5 apps ever have used globbing and solid would not exist without them.

They would just client-side loop over all files in the container.

RubenVerborgh added a commit that referenced this issue Mar 28, 2019
@RubenVerborgh
Contributor Author

Added a PR for removal as well, given that this seems to be the demand of most: #151. No need to rush.

@RubenVerborgh
Contributor Author

Nothing will or should be removed until there is a replacement.

Replacement at https://github.com/solid/ldp-glob; live demo at https://solid.github.io/ldp-glob/demo.html?https://drive.verborgh.org/public/

@melvincarvalho
Member

Replacement at solid/ldp-glob; live demo at solid.github.io/ldp-glob/demo.html?https://drive.verborgh.org/public

@RubenVerborgh thanks for taking the time to create this. In the first place, it's rather difficult to evaluate whether this is a like-for-like replacement, as it doesn't even have a README. I have had a very quick look at it, but will take some more time to do so.

I've re-added the on-hold tag, as I would like to discuss this over a longer period of time. Would appreciate it if you didn't unilaterally remove it. Cheers!

@RubenVerborgh
Contributor Author

In the first place, it's rather difficult to evaluate whether this is a like-for-like replacement, as it doesn't even have a README.

It's just 9 lines, so I figured it would be overkill to turn it into a lib.
Went through it with @timbl and works for its purpose.

I've re-added the on-hold tag, as I would like to discuss this over a longer period of time. Would appreciate it if you didn't unilaterally remove it.

on-hold is for things that are technically blocked. There are no technical blockers on this issue. I understand you don't have time, but that is not a technical blocker. So please remove that label and only use it when one technical issue needs to be resolved before another.

@melvincarvalho
Member

Went through it with @timbl and works for its purpose

Citation required. Would appreciate to see the context, or better still, hear from Tim himself. Pain I know, but the bar for changing specs is necessarily high.

@RubenVerborgh
Contributor Author

Citation required.

That's it right there. No need to doubt my word.

Would appreciate to see the context

It's a private conversation that I hence cannot share.

Assigned the issues to @timbl, and will ping him to take a look.

@RubenVerborgh
Contributor Author

Discussed out of band with @melvincarvalho: I agree that #148 and #151 should be on-hold; I propose for this issue to not be on-hold (since it is not being blocked) so people can discuss.

@angelo-v

angelo-v commented Apr 7, 2019

@NoelDeMartin Since you are using globbing in Solid Focus you should be aware of this

@NoelDeMartin
Contributor

@angelo-v thanks for the heads up.

When it comes to my use case, the spec is already compatible with the things I want to do. I'm only using globbing because there is no support for SPARQL in the node-solid-server implementation, as tracked in this issue: nodeSolidServer/node-solid-server#962

@elf-pavlik
Member

@NoelDeMartin do you think it is possible to replace your current use of globbing with the client-side replacement @RubenVerborgh shared in #145 (comment)?

@NoelDeMartin
Contributor

@elf-pavlik Yes it is possible, assuming the server uses HTTP/2 as @RubenVerborgh mentions. If it doesn't it's still possible but the performance won't be great.
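
As a rough illustration of what fetching many files from the client involves, here is a sketch that issues the member requests concurrently from a small pool of worker threads. The function name `fetch_all`, the `fetch` callback, and the default of 6 workers are assumptions made for this example. The sketch is protocol-agnostic and makes no claim about how much it would help on a given server.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=6):
    """Fetch every URL concurrently and return {url: body} in input order.

    `fetch` is any callable that takes a URL and returns its body.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `urls`, whichever request finishes first.
        bodies = list(pool.map(fetch, urls))
    return dict(zip(urls, bodies))
```

With hundreds of task files, total time then depends mostly on how many requests the server and connection can handle at once, which is where the HTTP/2 discussion in this thread comes in.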

@RubenVerborgh
Contributor Author

If it doesn't it's still possible but the performance won't be great.

It's quite alright as long as there are not hundreds of RDF files (and there usually aren't). All the rest is premature optimization 😉

@NoelDeMartin
Contributor

@RubenVerborgh Well, considering I'm building a task manager there will probably be hundreds of files :). But yeah, I can live with that for the time being (and there is always HTTP/2).

@RubenVerborgh
Contributor Author

Well, considering I'm building a task manager there will probably be hundreds of files :).

And every task is a file? In that case, yes.

@elf-pavlik
Member

and there is always HTTP/2

I can't see any reason why any Solid server would not use HTTP/2. I run NSS behind nginx and it just takes listen 443 ssl http2; to have HTTP/2 enabled. I think if NSS doesn't have it already it should have a config option to use node native HTTP/2. Enabling HTTP/2 should really add no extra effort to deployment of NSS.

@RubenVerborgh
Contributor Author

Enabling HTTP/2 should really add no extra effort to deployment of NSS.

Client certs…

@elf-pavlik
Member

https://letsencrypt.org/ BTW, doesn't secure OAuth2, and therefore also OpenID Connect, rely on SSL?

@RubenVerborgh
Contributor Author

I meant client certs (not server certificates). NSS currently still allows authentication with client certificates. For that, NSS has to terminate the HTTPS connection, and NSS only does HTTP 1.1. While you can put a reverse proxy with HTTP/2 in front of NSS (which is what I do), this does break client-side certificates (or you have to find a way to forward the client certificate negotiation).

@Ryuno-Ki

Ryuno-Ki commented Apr 9, 2019

Client certs like those used in some enterprise environments (e.g. inrupt)? :o

@RubenVerborgh
Contributor Author

For clarity, there is no problem with client certs, HTTP/2, or client-side globbing: all of these are perfectly combinable. It’s just that NSS (which currently supports server-side globbing, so no issues) terminates with HTTP 1.1. You can proxy with HTTP/2, but then need to figure out client cert passing; an all-encompassing solution has HTTP/2 on the Solid server.


10 participants