Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support dag-json selector parsing #4

Merged
merged 1 commit into from May 23, 2022
Merged

Support dag-json selector parsing #4

merged 1 commit into from May 23, 2022

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Sep 13, 2021

Riffing on filecoin-project/lotus#6393, I'd like to be able to parse the full range of selectors and not just those we can represent with paths and not having to wait until we have finalised our text based form.

My thought here is that there could (should?) eventually be 3 ways to supply a selector on a command-line or via an API:

  1. Simple path-based selector as per this library, that says (by default) "this is the path to the root of the full DAG I want"
  2. Full selector optionality using dag-json to represent the Selector so you can supply whatever you like
  3. Text-based simplified selector when we figure that out

So what I have here is really just a reimplementation of what you can do with traversal/selector/parse now, except that I've asserted the need to have a {"selector":{...}} envelope to be crystal clear that you're communicating a selector.

One reason I thought the envelope might be a good idea here is that it gets us toward a point where the overlap between these selector-as-text forms is zero. So an additional method we could add here is SelectorSpecFromString() that first tries to run SelectorSpecFromJson() on the string and then if that fails, run SelectorSpecFromPath(). So you could potentially make an API that says --selector=... where you can supply the path-based root selector or the full dag-json selector, or some future text-based simplified selector that we make sure doesn't conflict with either of these two (although the path based one may end up just being a subset of this).

The ambiguity may be a bit on the nose, however, so the alternative would be to --selector and a paired --selector-format to be explicit about what the user is supplying.

Re filecoin-project/lotus#6393 I'm not (yet) suggesting this should be used there because it's only wanting to select a root and this gives flexibility to do more than that, but I am suggesting that maybe in the Lotus case we consider the possibility of extending input formats so (a) not tie the option name to "path" and (b) allow for a possible future second argument that lets you switch to alternative input forms.

Discuss.

@warpfork
Copy link

I am completely 100% on board with that identification of three different kinds of formats. I'll let somebody else paint the bikesheds of what all three of those are called, in CLI args or in REST APIs and so on, but those three things: there should be space for all of them, unambiguously. Yes.

@rvagg
Copy link
Member Author

rvagg commented Sep 13, 2021

Additional background here is that I'm toying with ipfs dag export <root> --selector=... and am considering using both the path-based sub-root selection approach but also arbitrary selectors for blocks. Both that and filecoin-project/lotus#6393 are ultimately related and there are questions that go to both of them - like the question of "what if the selector hits a node inside a block, does that just mean the containing block is selected for inclusion or do we error and say that only whole blocks can be included?".

@warpfork
Copy link

I'm also 99.8% on board with having a top-level "envelope" section to the message saying {"selector":{...}}. It's a useful disambiguator, it's a useful sanity check, it's a useful document header so when people pass snippets around there's a clue what they are...

And on top of all that it gives us a space where we could pivot to {"selectorv2":{...}} in the far future (if necessary -- it probably won't be, anytime imaginably soon, because we also have a dozen other points for smooth extension and schema evolvability in the Selector spec, and have already exercised it gracefully several times without needing the equivalent of a "major" version bump).

Where helper functions I've recently added to the go-ipld-prime selector package don't currently demand this {"selector":{...}} envelope, I'd be 100% okay with making them do so. They probably should've in the first place.

Copy link
Collaborator

@ribasushi ribasushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, love it! Left one nit for consideration but fine to ship as-is

Btw: the reason this thing was called lite originally is because of this. Fine to grow this package as well, just pointing out some background.


See https://ipld.io/specs/selectors/ for more information.
*/
func SelectorSpecFromJson(jsonStr string) (builder.SelectorSpec, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff!
My only concern here is the asymmetry of both calls:

func SelectorSpecFromPath(path Expression, optionalSubselectorAtTarget builder.SelectorSpec)

vs

func SelectorSpecFromJson(jsonStr string) (builder.SelectorSpec, error)

Not sure if the machinery is even compatible here though... so perhaps worthwhile to add everything as-is

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I had it the same but then realised I had no idea how to munge the two things together! It'd force you to figure out if the selector had a logical place to insert a subselector first and then do the stitching, which seems 😬 so best to leave it off?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg I was thinking more about this - there is something missing conceptually ( not the code, selector-space in general ).

In general selectors are never standalone: they require a "starting point" for the traversal. Also in general this starting-node is not described within the selector itself. Instead when you consume a selector you bring two pieces of knowledge to the table: where to start, and what to do

The way the "path" part of this library is used in filecoin today, is:

The above workflow becomes impossible with a selector as expressed in this PR: there is no way to communicate what the "new start" would be, so you can not reify the result ( unixfs or whatever it is ). We touched on this higher up, but I didn't click on the implications, so jolting them down.

What this code can be used for is doing a retrieval only, without reification. This implies the user assembling a complete selector using your new code, and then assembling a new "start-point" path selector and applying to that to what ended up in their blockstore. Which... unergonomic, and a nightmare to put behind an API surface...

Ideally a selector, to be useful, ought to also be able to specify the entire "playbook" within itself...
@rvagg @willscott @warpfork what are your thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we want a starting point CID, and a selector from that CID?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willscott ideally yes, today we don't have that. What we have is:

  • Starting point of complete dag ( root cid in label )
  • Path to the starting point of the sub-dag within that dag ( the selector-lite-path thing )
  • Implicit need to reify the subdag as a unixfs "thing"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willscott another way to phrase this is "we lack ability to specify conditional recursion"

Copy link
Member

@willscott willscott Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can do this potentially in 2 iterations, right? first using a selector (or the selector-lite-path expression into a selector) to get the CID of the actual starting point.
Then a second request based on the actual starting point, and we'd hopefully have hamt-signalling-in-selectors to better select over unixfs. There's a MVP version of that signalling that should be able to be introduced without too much pain.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willscott you certainly could, though if this is a paid retrieval doing the deed twice is... rough UX
In any case - until non-piece-root-cid retrieval works in lotus, this is all moot: what we merged was the only thing possible at the time.

Copy link
Member

@willscott willscott Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for lotus nodes with the dagstore, the non-piece-root-cid retrieval is already working, right?

@BigLep
Copy link

BigLep commented Oct 29, 2021

I assume this is still needed. What are the next steps?

@rvagg
Copy link
Member Author

rvagg commented May 23, 2022

Rebased to latest master. I've not included the new optionalSubselectorAtTarget in the JSON form because that seems unnecessary if you can specify the whole selector as JSON up front, including the additional subselector.

any objections to merging this as is?

@ribasushi
Copy link
Collaborator

No objections from my side.

@rvagg rvagg merged commit 6084f43 into master May 23, 2022
@rvagg rvagg deleted the rvagg/dag-json-inp branch May 23, 2022 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🎉 Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants