Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

traverse path locally when possible #75

Closed
wants to merge 7 commits into from

Conversation

aschmahmann
Copy link
Contributor

Resolves most of part 3 in #62.

Note: I haven't had time to exercise this yet so might have issues

Comment on lines +236 to +287
// TODO: This only deals with repeated pathing
// Do we want to deal parameters as well since they might mean walking a lot of data twice? Alternatively, this could wait until we can rework the GraphGateway logic to not rely on the BlocksGateway traversal.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling out that this will not handle the case where the data is fully local, the optimization only checks path locality in order to avoid doing the IPLD traversal twice.

We could certainly do it twice though in order to save bandwidth costs. This likely depends on how much data we think should go into the block cache and what's likely to be there.

Certainly we should consider doing this once we're further down the pipeline of changes (i.e. proper incremental verification), but could do it now too if that's of interest.

Copy link
Member

@lidel lidel Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschmahmann probably too late for me to reason about traversal, do you have an example how "fully local" case would that work for loading sub-article from /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki ?
(fysa I've increased block cache to 16384, and we have still plenty of room – #47 (comment), so if we can save bandwidth by growing this cache, i think its worth exploring)

I think we need to pass params to fetcher.Fetch when fetching remainingPathToGet.

Reason 1: without ?depth= we will be harcore over-fetching the entire wikipedia

2023-04-05T01:13:01.065+0200	DEBUG	caboose	caboose@v0.0.0-20230331212405-5b42e9465c12/fetcher.go:75	doing fetch	{"from": "64.44.166.5", "of": "/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki", "mime": "application/vnd.ipld.car", "requestId": "efb63fd6-12cd-4417-99bb-fd9106d0aa96"}

Reason 2: difficult to debug, as we are missing info if the failed request was ?format=car or raw (we can fix this by adding mime to error logs in caboose, but i think if we have params, we should also have explicit format one):

2023-04-05T01:13:03.040+0200	DEBUG	caboose	caboose@v0.0.0-20230331212405-5b42e9465c12/fetcher.go:122	fetch result	{"from": "64.44.166.5", "of": "/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki", "status": 200, "size": 203846, "ttfb": 0, "duration": 1.975232799, "attempt": 0, "error": "context canceled"}
2023-04-05T01:13:03.040+0200	DEBUG	caboose	caboose@v0.0.0-20230331212405-5b42e9465c12/pool.go:505	fetch attempt failed	{"from": "64.44.166.5", "attempt": 0, "of": "/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki", "error": "context canceled"}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch that was a refactor bug., I meant it when I said I hadn't really tested yet 😅. It's a bug that the fetcher isn't using params. This comment was about whether the tryLoad should also do params. Right now it doesn't, but it's pretty doable to add it back in as well if we need it. WDYT?

The former is a pretty trivial fix, the later is a little more time. Not sure how much I'll be able to do before IPFS Thing, but will update if I can.

Copy link
Member

@lidel lidel Apr 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if we aim at different things here, but in my mind the main value of tryLoad is to avoid fetching parents over and over again, and move root CID at which we start Fetch to the right of the path.

With that framing, we don't need to pass the same params as the final Fetch, we could always use bytes=0:0 or depth=1 here (no matter what actual params would be used for nial Fetch), no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryLoad is trying to reduce the amount of data we request that we already have. It's most notable/obvious for paths, but is true for things like when an entire file is stored in memory already.

Doing parents only (i.e. depth=0) happens to be easier than also dealing with the terminal element/params (e.g. byte range, depth=1, depth=all). Seems like we should deal with the others too though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the most important optimization is to skip re-fetching parents we already have (biggest pain for websites and ranges over videos).

Could we land that first, and deal with the terminal element later, or is that the same amount of work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope parents is easier (it's already done). Will push the fixes shortly for the bugs you've found

@lidel lidel mentioned this pull request Apr 4, 2023
13 tasks
@aarshkshah1992 aarshkshah1992 mentioned this pull request Apr 5, 2023
7 tasks
Comment on lines +342 to +432
if !errors.Is(err, format.ErrNotFound{}) {
graphLog.Error(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opening http://en.wikipedia-on-ipfs.org.ipns.localhost:8081/wiki/Belize/ produces:

backend/graph	lib/graph_gateway.go:343	adl requested but not supported by link system: "unixfs"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ... lol thanks. This is a mixing of all the tooling and coming out badly. The UnixFSPathSelector inserts requests for ADLs, but the fetcher thing uses a default reifier. Should just drop the UnixFSPathSelector and build one largely copied from boxo/path/resolver so we're only sticking to one frankenstein'd version at a time. We could also switch to using the named ADLs, but doesn't seem worth it given this is all too hacked together at the moment anyway.

@aschmahmann aschmahmann force-pushed the feat/graph-gateway-local-path-loads branch from a4adacb to 9edb28f Compare May 2, 2023 20:04
@aschmahmann
Copy link
Contributor Author

Closing as this is no longer needed now that we have backpressure #160 and no block storage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants