New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support dag-json selector parsing #4
Conversation
97a4b7a
to
2e858a0
Compare
I am completely 100% on board with that identification of three different kinds of formats. I'll let somebody else paint the bikesheds of what all three of those are called, in CLI args or in REST APIs and so on, but those three things: there should be space for all of them, unambiguously. Yes. |
Additional background here is that I'm toying with |
I'm also 99.8% on board with having a top-level "envelope" section to the message saying And on top of all that it gives us a space where we could pivot to Where helper functions I've recently added to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, love it! Left one nit for consideration but fine to ship as-is
Btw: the reason this thing was called lite
originally is because of this. Fine to grow this package as well, just pointing out some background.
|
||
See https://ipld.io/specs/selectors/ for more information. | ||
*/ | ||
func SelectorSpecFromJson(jsonStr string) (builder.SelectorSpec, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff!
My only concern here is the asymmetry of both calls:
func SelectorSpecFromPath(path Expression, optionalSubselectorAtTarget builder.SelectorSpec)
vs
func SelectorSpecFromJson(jsonStr string) (builder.SelectorSpec, error)
Not sure if the machinery is even compatible here though... so perhaps worthwhile to add everything as-is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I had it the same but then realised I had no idea how to munge the two things together! It'd force you to figure out if the selector had a logical place to insert a subselector first and then do the stitching, which seems 😬 so best to leave it off?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rvagg I was thinking more about this - there is something missing conceptually ( not the code, selector-space in general ).
In general selectors are never standalone: they require a "starting point" for the traversal. Also in general this starting-node is not described within the selector itself. Instead when you consume a selector you bring two pieces of knowledge to the table: where to start, and what to do
The way the "path" part of this library is used in filecoin today, is:
- Tack on a recursor to the path, grab the entire thing given the start describe in the deal
- Re-evaluate the selector over what we grabbed, find the new root
- Give the new start + implicit recursor to the unixfs engine
- Profit
The above workflow becomes impossible with a selector as expressed in this PR: there is no way to communicate what the "new start" would be, so you can not reify the result ( unixfs or whatever it is ). We touched on this higher up, but I didn't click on the implications, so jolting them down.
What this code can be used for is doing a retrieval only, without reification. This implies the user assembling a complete selector using your new code, and then assembling a new "start-point" path selector and applying to that to what ended up in their blockstore. Which... unergonomic, and a nightmare to put behind an API surface...
Ideally a selector, to be useful, ought to also be able to specify the entire "playbook" within itself...
@rvagg @willscott @warpfork what are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we want a starting point CID, and a selector from that CID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott ideally yes, today we don't have that. What we have is:
- Starting point of complete dag ( root cid in
label
) - Path to the starting point of the sub-dag within that dag ( the selector-lite-path thing )
- Implicit need to reify the subdag as a unixfs "thing"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott another way to phrase this is "we lack ability to specify conditional recursion"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can do this potentially in 2 iterations, right? first using a selector (or the selector-lite-path expression into a selector) to get the CID of the actual starting point.
Then a second request based on the actual starting point, and we'd hopefully have hamt-signalling-in-selectors to better select over unixfs. There's a MVP version of that signalling that should be able to be introduced without too much pain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott you certainly could, though if this is a paid retrieval doing the deed twice is... rough UX
In any case - until non-piece-root-cid retrieval works in lotus, this is all moot: what we merged was the only thing possible at the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for lotus nodes with the dagstore, the non-piece-root-cid retrieval is already working, right?
I assume this is still needed. What are the next steps? |
Rebased to latest any objections to merging this as is? |
No objections from my side. |
Riffing on filecoin-project/lotus#6393, I'd like to be able to parse the full range of selectors and not just those we can represent with paths and not having to wait until we have finalised our text based form.
My thought here is that there could (should?) eventually be 3 ways to supply a selector on a command-line or via an API:
Selector
so you can supply whatever you likeSo what I have here is really just a reimplementation of what you can do with traversal/selector/parse now, except that I've asserted the need to have a
{"selector":{...}}
envelope to be crystal clear that you're communicating a selector.One reason I thought the envelope might be a good idea here is that it gets us toward a point where the overlap between these selector-as-text forms is zero. So an additional method we could add here is
SelectorSpecFromString()
that first tries to runSelectorSpecFromJson()
on the string and then if that fails, runSelectorSpecFromPath()
. So you could potentially make an API that says--selector=...
where you can supply the path-based root selector or the full dag-json selector, or some future text-based simplified selector that we make sure doesn't conflict with either of these two (although the path based one may end up just being a subset of this).The ambiguity may be a bit on the nose, however, so the alternative would be to
--selector
and a paired--selector-format
to be explicit about what the user is supplying.Re filecoin-project/lotus#6393 I'm not (yet) suggesting this should be used there because it's only wanting to select a root and this gives flexibility to do more than that, but I am suggesting that maybe in the Lotus case we consider the possibility of extending input formats so (a) not tie the option name to "path" and (b) allow for a possible future second argument that lets you switch to alternative input forms.
Discuss.