Support dag-json selector parsing #4

rvagg · 2021-09-13T06:49:56Z

Riffing on filecoin-project/lotus#6393, I'd like to be able to parse the full range of selectors and not just those we can represent with paths and not having to wait until we have finalised our text based form.

My thought here is that there could (should?) eventually be 3 ways to supply a selector on a command-line or via an API:

Simple path-based selector as per this library, that says (by default) "this is the path to the root of the full DAG I want"
Full selector optionality using dag-json to represent the Selector so you can supply whatever you like
Text-based simplified selector when we figure that out

So what I have here is really just a reimplementation of what you can do with traversal/selector/parse now, except that I've asserted the need to have a {"selector":{...}} envelope to be crystal clear that you're communicating a selector.

One reason I thought the envelope might be a good idea here is that it gets us toward a point where the overlap between these selector-as-text forms is zero. So an additional method we could add here is SelectorSpecFromString() that first tries to run SelectorSpecFromJson() on the string and then if that fails, run SelectorSpecFromPath(). So you could potentially make an API that says --selector=... where you can supply the path-based root selector or the full dag-json selector, or some future text-based simplified selector that we make sure doesn't conflict with either of these two (although the path based one may end up just being a subset of this).

The ambiguity may be a bit on the nose, however, so the alternative would be to --selector and a paired --selector-format to be explicit about what the user is supplying.

Re filecoin-project/lotus#6393 I'm not (yet) suggesting this should be used there because it's only wanting to select a root and this gives flexibility to do more than that, but I am suggesting that maybe in the Lotus case we consider the possibility of extending input formats so (a) not tie the option name to "path" and (b) allow for a possible future second argument that lets you switch to alternative input forms.

Discuss.

warpfork · 2021-09-13T06:52:19Z

I am completely 100% on board with that identification of three different kinds of formats. I'll let somebody else paint the bikesheds of what all three of those are called, in CLI args or in REST APIs and so on, but those three things: there should be space for all of them, unambiguously. Yes.

rvagg · 2021-09-13T06:54:59Z

Additional background here is that I'm toying with ipfs dag export <root> --selector=... and am considering using both the path-based sub-root selection approach but also arbitrary selectors for blocks. Both that and filecoin-project/lotus#6393 are ultimately related and there are questions that go to both of them - like the question of "what if the selector hits a node inside a block, does that just mean the containing block is selected for inclusion or do we error and say that only whole blocks can be included?".

warpfork · 2021-09-13T06:55:58Z

I'm also 99.8% on board with having a top-level "envelope" section to the message saying {"selector":{...}}. It's a useful disambiguator, it's a useful sanity check, it's a useful document header so when people pass snippets around there's a clue what they are...

And on top of all that it gives us a space where we could pivot to {"selectorv2":{...}} in the far future (if necessary -- it probably won't be, anytime imaginably soon, because we also have a dozen other points for smooth extension and schema evolvability in the Selector spec, and have already exercised it gracefully several times without needing the equivalent of a "major" version bump).

Where helper functions I've recently added to the go-ipld-prime selector package don't currently demand this {"selector":{...}} envelope, I'd be 100% okay with making them do so. They probably should've in the first place.

ribasushi

This is great, love it! Left one nit for consideration but fine to ship as-is

Btw: the reason this thing was called lite originally is because of this. Fine to grow this package as well, just pointing out some background.

ribasushi · 2021-09-13T07:02:28Z

parser.go

+
+See https://ipld.io/specs/selectors/ for more information.
+*/
+func SelectorSpecFromJson(jsonStr string) (builder.SelectorSpec, error) {


Great stuff!
My only concern here is the asymmetry of both calls:

func SelectorSpecFromPath(path Expression, optionalSubselectorAtTarget builder.SelectorSpec)

vs

func SelectorSpecFromJson(jsonStr string) (builder.SelectorSpec, error)

Not sure if the machinery is even compatible here though... so perhaps worthwhile to add everything as-is

yeah, I had it the same but then realised I had no idea how to munge the two things together! It'd force you to figure out if the selector had a logical place to insert a subselector first and then do the stitching, which seems 😬 so best to leave it off?

@rvagg I was thinking more about this - there is something missing conceptually ( not the code, selector-space in general ).

In general selectors are never standalone: they require a "starting point" for the traversal. Also in general this starting-node is not described within the selector itself. Instead when you consume a selector you bring two pieces of knowledge to the table: where to start, and what to do

The way the "path" part of this library is used in filecoin today, is:

Tack on a recursor to the path, grab the entire thing given the start describe in the deal

Re-evaluate the selector over what we grabbed, find the new root

Give the new start + implicit recursor to the unixfs engine

Profit

The above workflow becomes impossible with a selector as expressed in this PR: there is no way to communicate what the "new start" would be, so you can not reify the result ( unixfs or whatever it is ). We touched on this higher up, but I didn't click on the implications, so jolting them down.

What this code can be used for is doing a retrieval only, without reification. This implies the user assembling a complete selector using your new code, and then assembling a new "start-point" path selector and applying to that to what ended up in their blockstore. Which... unergonomic, and a nightmare to put behind an API surface...

Ideally a selector, to be useful, ought to also be able to specify the entire "playbook" within itself...
@rvagg @willscott @warpfork what are your thoughts?

don't we want a starting point CID, and a selector from that CID?

@willscott ideally yes, today we don't have that. What we have is:

Starting point of complete dag ( root cid in label )

Path to the starting point of the sub-dag within that dag ( the selector-lite-path thing )

Implicit need to reify the subdag as a unixfs "thing"

@willscott another way to phrase this is "we lack ability to specify conditional recursion"

we can do this potentially in 2 iterations, right? first using a selector (or the selector-lite-path expression into a selector) to get the CID of the actual starting point.
Then a second request based on the actual starting point, and we'd hopefully have hamt-signalling-in-selectors to better select over unixfs. There's a MVP version of that signalling that should be able to be introduced without too much pain.

@willscott you certainly could, though if this is a paid retrieval doing the deed twice is... rough UX
In any case - until non-piece-root-cid retrieval works in lotus, this is all moot: what we merged was the only thing possible at the time.

for lotus nodes with the dagstore, the non-piece-root-cid retrieval is already working, right?

BigLep · 2021-10-29T06:42:30Z

I assume this is still needed. What are the next steps?

rvagg · 2022-05-23T10:10:55Z

Rebased to latest master. I've not included the new optionalSubselectorAtTarget in the JSON form because that seems unnecessary if you can specify the whole selector as JSON up front, including the additional subselector.

any objections to merging this as is?

ribasushi · 2022-05-23T11:50:49Z

No objections from my side.

rvagg requested review from ribasushi and warpfork September 13, 2021 06:49

rvagg force-pushed the rvagg/dag-json-inp branch from 97a4b7a to 2e858a0 Compare September 13, 2021 06:51

ribasushi approved these changes Sep 13, 2021

View reviewed changes

BigLep assigned rvagg Apr 19, 2022

Support dag-json selector parsing

a813a55

rvagg force-pushed the rvagg/dag-json-inp branch from 2e858a0 to a813a55 Compare May 23, 2022 10:09

rvagg merged commit 6084f43 into master May 23, 2022

rvagg deleted the rvagg/dag-json-inp branch May 23, 2022 21:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support dag-json selector parsing #4

Support dag-json selector parsing #4

rvagg commented Sep 13, 2021

warpfork commented Sep 13, 2021

rvagg commented Sep 13, 2021

warpfork commented Sep 13, 2021

ribasushi left a comment

ribasushi Sep 13, 2021

rvagg Sep 13, 2021

ribasushi Sep 13, 2021

ribasushi Oct 7, 2021

willscott Nov 9, 2021

ribasushi Nov 9, 2021

ribasushi Nov 9, 2021

willscott Nov 9, 2021 •

edited

ribasushi Nov 9, 2021

willscott Nov 10, 2021 •

edited

BigLep commented Oct 29, 2021

rvagg commented May 23, 2022

ribasushi commented May 23, 2022

Support dag-json selector parsing #4

Support dag-json selector parsing #4

Conversation

rvagg commented Sep 13, 2021

warpfork commented Sep 13, 2021

rvagg commented Sep 13, 2021

warpfork commented Sep 13, 2021

ribasushi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

willscott Nov 9, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

willscott Nov 10, 2021 • edited

Choose a reason for hiding this comment

BigLep commented Oct 29, 2021

rvagg commented May 23, 2022

ribasushi commented May 23, 2022

willscott Nov 9, 2021 •

edited

willscott Nov 10, 2021 •

edited