HTTP CAR validation #222

Merged
hannahhoward merged 7 commits into feat/http from rvagg/http-validation
May 11, 2023

Conversation

rvagg
Member

@rvagg rvagg commented May 10, 2023

This is pretty close to final. It needs better test coverage for all the failure cases, but you can already see the failure adjustments in httpretriever_test.go to adapt to the new strictness.

Potentially pending further discussion @ ipfs/specs#402 (CARv1/CARv2, roots, etc. aren't resolved).

We discussed what to do with this after we get it right. I think it should either go into go-car (v2/verified?) or in a separate module (ipld/go-verifiedcar?). I think it feels pretty CAR-focused so am leaning toward the former for now.

There may be additional details. We could also consider putting the CarScope types into this package and the selector derivation stuff that's currently in RetrievalRequest#GetSelector, because that's going to be common across users of this code.

@codecov-commenter

codecov-commenter commented May 10, 2023

Codecov Report

Merging #222 (4a91087) into feat/http (6810ff2) will increase coverage by 0.51%.
The diff coverage is 88.06%.


@@              Coverage Diff              @@
##           feat/http     #222      +/-   ##
=============================================
+ Coverage      70.66%   71.17%   +0.51%     
=============================================
  Files             64       65       +1     
  Lines           5543     5690     +147     
=============================================
+ Hits            3917     4050     +133     
- Misses          1413     1422       +9     
- Partials         213      218       +5     
| Impacted Files | Coverage Δ |
|---|---|
| pkg/verifiedcar/verifiedcar.go | 80.18% <80.18%> (ø) |
| pkg/internal/itest/unixfs/directory.go | 91.50% <100.00%> (+4.55%) ⬆️ |
| pkg/internal/itest/unixfs/generator.go | 91.04% <100.00%> (+0.34%) ⬆️ |
| ...kg/internal/testutil/collectingeventlsubscriber.go | 52.88% <100.00%> (ø) |
| pkg/retriever/httpretriever.go | 86.08% <100.00%> (+5.91%) ⬆️ |

... and 2 files with indirect coverage changes

@rvagg
Member Author

rvagg commented May 10, 2023

Promising (cc @olizilla):

$ ./lassie fetch --provider /dns4/freeway-staging.dag.haus/tcp/80/http/p2p/QmcCtpf7ERQWyvDT8RMYWCMjzE74b7HscB3F8gDp5d5yS6 bafybeihj3ji7jaoaojf4bijpr63glep4ou7wgafhg6gvug4ney6pr7u5su
Fetching bafybeihj3ji7jaoaojf4bijpr63glep4ou7wgafhg6gvug4ney6pr7u5su from [{QmcCtpf7ERQWyvDT8RMYWCMjzE74b7HscB3F8gDp5d5yS6: [/dns4/freeway-staging.dag.haus/tcp/80/http]}]............
Fetched [bafybeihj3ji7jaoaojf4bijpr63glep4ou7wgafhg6gvug4ney6pr7u5su] from [QmcCtpf7ERQWyvDT8RMYWCMjzE74b7HscB3F8gDp5d5yS6]:
        Duration: 3.780431007s
          Blocks: 12
           Bytes: 3.2 MiB

and boost iirc:

$ ./lassie fetch --provider /ip4/209.94.92.6/tcp/7777/http/p2p/QmcCtpf7ERQWyvDT8RMYWCMjzE74b7HscB3F8gDp5d5yS6 bafybeihptqlehg2slg2c74asl5fsjwplys5p2bnnyperoeg55ty7ne3rbm
Fetching bafybeihptqlehg2slg2c74asl5fsjwplys5p2bnnyperoeg55ty7ne3rbm from [{QmcCtpf7ERQWyvDT8RMYWCMjzE74b7HscB3F8gDp5d5yS6: [/ip4/209.94.92.6/tcp/7777/http]}]............................

...

Fetched [bafybeihptqlehg2slg2c74asl5fsjwplys5p2bnnyperoeg55ty7ne3rbm] from [QmcCtpf7ERQWyvDT8RMYWCMjzE74b7HscB3F8gDp5d5yS6]:
        Duration: 3m39.569461496s
          Blocks: 8463
           Bytes: 11 MiB

@rvagg rvagg marked this pull request as ready for review May 10, 2023 11:30
@rvagg
Member Author

rvagg commented May 10, 2023

Marking this as ready for review because I fleshed out the tests with what I want to cover, for now.

There is one potential blocker, though: a flaky test called "unixfs: all of large directory with file scope, errors". It sends all of a non-sharded directory, asks for a unixfs-preload match ., and should error after getting the first block and finding that there's more. But the error it sometimes ends up with suggests that the block it wants isn't the first. I've put a TODO in there with my research so far on why it's flaky.

It could be a flaw in my fixture setup logic, but I'm a little concerned there's deeper stack problems here, maybe with unixfsnode. We also have occasional flakes in the integration tests that could be related, as might #185.

I'll spend some more time this week seeing if I can work that out. We could make a call that it's not a blocker, because it mostly works and this is a special edge case that we won't encounter the majority of the time. Maybe not a P0, but pretty close if this isn't just bad test data.

@rvagg
Member Author

rvagg commented May 10, 2023

TestVerifiedCar/unixfs:_pathed_subset_inside_large_directory_with_file_scope,_errors failing in CI with the same issue: https://github.com/filecoin-project/lassie/actions/runs/4936467401/jobs/8824025114?pr=222

Comment on lines 203 to 206
	if !t.first {
		t.first = true
		t.cb()
	}
Contributor

this will fire too early, right? you want the cb() to fire after t.r.Read() has returned, but before you return from this method, i think

Collaborator

I agree with @willscott here. Read will get called almost immediately.

Member Author

Suggested change
-	if !t.first {
-		t.first = true
-		t.cb()
-	}
+	if !t.first {
+		t.first = true
+		defer t.cb()
+	}

this should be a reasonable fix, no?
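
A minimal sketch of the ordering being discussed, assuming a reader wrapper shaped like the snippet above. The names firstByteReader, loggingReader, and eventOrder are hypothetical and exist only for illustration; they are not taken from the PR. It shows that a deferred callback runs after the underlying Read has returned but before the wrapper's own Read returns:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// firstByteReader is a hypothetical stand-in for the wrapper under review.
// It invokes cb exactly once, on the first call to Read.
type firstByteReader struct {
	r     io.Reader
	first bool
	cb    func()
}

// Read defers the callback, so cb runs after t.r.Read has returned
// but before this method returns to its caller.
func (t *firstByteReader) Read(p []byte) (int, error) {
	if !t.first {
		t.first = true
		defer t.cb()
	}
	return t.r.Read(p)
}

// loggingReader (hypothetical) records each completed underlying Read.
type loggingReader struct {
	r      io.Reader
	events *[]string
}

func (l loggingReader) Read(p []byte) (int, error) {
	n, err := l.r.Read(p)
	*l.events = append(*l.events, "read")
	return n, err
}

// eventOrder performs two reads through the wrapper and reports the
// order in which the underlying reads and the callback occurred.
func eventOrder() string {
	var events []string
	src := loggingReader{r: strings.NewReader("car bytes"), events: &events}
	tr := &firstByteReader{r: src, cb: func() { events = append(events, "cb") }}
	buf := make([]byte, 4)
	_, _ = tr.Read(buf) // first call: underlying read, then the deferred callback
	_, _ = tr.Read(buf) // second call: underlying read only
	return strings.Join(events, ",")
}

func main() {
	fmt.Println(eventOrder()) // prints "read,cb,read"
}
```

Note that in this sketch the callback fires after the first Read call whether or not that call returned any bytes or an error; a stricter "first byte" signal would need to check the returned count.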

pkg/verifiedcar/verifiedcar.go Outdated Show resolved Hide resolved
Collaborator

@hannahhoward hannahhoward left a comment

Suggested refactor in #226

Also I think @willscott's first byte concern is pertinent.

data []byte
}

func visitNoop(p traversal.Progress, n datamodel.Node, r traversal.VisitReason) error { return nil }
Collaborator

love this part of ipld prime :)

Co-authored-by: Rod Vagg <rod@vagg.org>
@rvagg rvagg requested a review from hannahhoward May 11, 2023 04:59
@rvagg
Member Author

rvagg commented May 11, 2023

Fixed the flaky test(s). It was indeed test data, in two ways. There's enough random chance that you end up with a directory structure of a particular form that it fails. One of the biggest no-nos that I built into the fixture generation was having the in-memory representation of a directory structure (the testdata.DirEntry stuff) not match a round-trip encoded one because of sorting. So making assumptions about ordering (children[0]) isn't safe, because what's first now may not be first after a round-trip through the dag-pb encoder, which sorts based on the Name field.
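
A rough sketch of the ordering hazard, under stated assumptions: dirEntry and encodedOrder are hypothetical, simplified stand-ins and not the actual testdata.DirEntry type or the fix applied in this PR. The idea is that a fixture should derive "the first child" from the name-sorted order that a dag-pb round-trip produces, not from the in-memory slice order:

```go
package main

import (
	"fmt"
	"sort"
)

// dirEntry is a hypothetical, simplified fixture directory entry.
type dirEntry struct {
	Name     string
	Children []dirEntry
}

// encodedOrder returns a copy of children in the order expected after a
// dag-pb round-trip, assumed here to be a stable bytewise sort by Name.
// The input slice is left untouched.
func encodedOrder(children []dirEntry) []dirEntry {
	out := make([]dirEntry, len(children))
	copy(out, children)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	dir := dirEntry{Name: "root", Children: []dirEntry{
		{Name: "zebra.txt"},
		{Name: "apple.txt"},
	}}
	// The in-memory first child differs from the first child after sorting.
	fmt.Println("in-memory first:", dir.Children[0].Name)
	fmt.Println("encoded first:", encodedOrder(dir.Children)[0].Name)
}
```

A test that needs the first block of a directory would then index into encodedOrder(dir.Children) instead of dir.Children.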

Good to go now I think!

@hannahhoward hannahhoward merged commit c7c296e into feat/http May 11, 2023
rvagg added a commit that referenced this pull request May 12, 2023
* feat(verifiedcar): initial verifiedcar package

* feat(verifiedcar): verify http retrievals

* chore(verifiedcar): tests for basic error cases

* fix(verifiedcar): coverage of more cases, handle known edges properly

* fix(verifiedcar): remove extraneous go-routine (#226)

Co-authored-by: Rod Vagg <rod@vagg.org>

* fix(verifiedcar): address feedback

* fix(verifiedcar): fix flaky tests

---------

Co-authored-by: Hannah Howard <hannah@hannahhoward.net>
hannahhoward added a commit that referenced this pull request May 12, 2023
* First pass at adapting graphsync to http

* fix(http): gracefully handle selector vs path requests

* feat(http): extend graphsyncretriever so it can also do http

* feat(http): refactor http & graphsync specific pieces to "TransportProtocol" iface

* feat(prioritywaitqueue): add InitialPauseDone inspector

* feat(http): single peer http retrieval unit test

* fix(http): enable http everywhere gs & bs are

* chore(http): framework for suite of http unit tests

based on bitswap unit test framework

* feat(http): remove parallel-request flow, make all serial for now

* too-detailed http testing, will remove this in favour of graphsync
  style testing.

* fix(http): better testing framework

* fix(http): more test coverage, minor fixes

* fix(http): clean up time handling in tests

* HTTP CAR validation (#222)

* feat(verifiedcar): initial verifiedcar package

* feat(verifiedcar): verify http retrievals

* chore(verifiedcar): tests for basic error cases

* fix(verifiedcar): coverage of more cases, handle known edges properly

* fix(verifiedcar): remove extraneous go-routine (#226)

Co-authored-by: Rod Vagg <rod@vagg.org>

* fix(verifiedcar): address feedback

* fix(verifiedcar): fix flaky tests

---------

Co-authored-by: Hannah Howard <hannah@hannahhoward.net>

* fix(http): refactor MockRoundTripper (#229)

* Add HTTP integegration tests (#227)

* test: add itests for http

* test: add peer http server, minor refactors and fixes

* fix(itest): fix compile errors on rebase

---------

Co-authored-by: Rod Vagg <rod@vagg.org>
Co-authored-by: Hannah Howard <hannah@hannahhoward.net>
Co-authored-by: Kyle Huntsman <3432646+kylehuntsman@users.noreply.github.com>
@kylehuntsman kylehuntsman deleted the rvagg/http-validation branch May 25, 2023 01:25
@kylehuntsman kylehuntsman restored the rvagg/http-validation branch May 25, 2023 01:25
@kylehuntsman kylehuntsman deleted the rvagg/http-validation branch May 25, 2023 01:25