Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly Implement Retrieval Lookups Based on CIDs #57

Merged
merged 8 commits into from Jan 25, 2020

Conversation

hannahhoward
Copy link
Collaborator

Goals

Allow Retrieval deals to work with actual Payload CIDs

Implementation

-- add a second mapping to the PieceStore that translates from a CID to pieces its a part of
-- support unsealling pieces from CIDs as needed (theoretically a dag could span multiple pieces in the future where pieces are not whole dags -- not super important though)
-- add very preliminary writes of CID -> Piece mapping on storage provider side
--- IMPORTANT: Not included is mapping any CIDs in the IPLD graph other than the root to Piece, or recording where in the piece blocks actually live. This can definitely be done by recording this information as we write the CAR file. This is definitely temporary implementation meant to get us to working retrieval by payload CID fast, with allowing retrievals based off of CIDs deeper in the dag later.

@hannahhoward hannahhoward requested review from shannonwells and ingar and removed request for shannonwells January 24, 2020 01:48
seperate block infos to their own table -- mapping links with pieceCIDs
modify markets to use actual payload CIDs instead of treating payload and piece CIDs as equavalent
very simple implementation of storing a map between payload root and pieceCID. to be augmented later
with actual block refs
Copy link
Contributor

@shannonwells shannonwells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

primarily editorial comments and peanut-gallery stuff

if err := ps.store.Get(newKey(pieceCID)).Get(&out); err != nil {
return PieceInfo{}, err
func (ps *pieceStore) ensurePieceInfo(pieceCID []byte) error {
has, err := ps.pieces.Has(newKey(pieceCID))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call it canHasPieceInfo /jk

err := ps.AddDealForPiece(pieceCid, dealInfo)
assert.NoError(t, err)

pi, err := ps.GetPieceInfo(pieceCid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I should think you could skip the error/value checking on the first try since it's tested above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just follow past of least resistance to get no lint-stage errors

@@ -92,3 +87,22 @@ func (lu *loaderWithUnsealing) attemptUnseal() error {

return nil
}

func (lu *loaderWithUnsealing) firstSuccessfulUnseal(cidInfo piecestore.CIDInfo) (io.ReadCloser, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little fuzzy on this: If it's already unsealed do you expect the unsealer func not to return an error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the logic changed a bit here. theoretically since we're basing our queries on payload CIDs, I guess it's possible that a query could span multiple pieces now (if a piece was a partial but not complete DAG -- something we intend to support I believe).

The point of this function is: for a given Cid, find the first piece that it's contained in that we can successfully unseal.

since we unseal and read the whole piece, in theory this function should not get called unless the query we make somehow crosses a piece boundary, which again, I think is possible.

If unsealing fails completely the first time, it's probably alright cause the loader calling code will immediately error and stop the retrieval deal.

@@ -82,12 +82,22 @@ func requireSetupTestClientAndProvider(bgCtx context.Context, t *testing.T, payC
providerNode := testnodes.NewTestRetrievalProviderNode()
pieceStore := tut.NewTestPieceStore()
expectedCIDs := tut.GenerateCids(3)
missingPiece := tut.GenerateCids(1)[0]
expectedPieces := [][]byte{[]byte("applesuace"), []byte("jam"), []byte("apricot")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how you resisted using "pork chops" after "applesauce"

sectorID, offset, length, err := p.spn.LocatePieceForDealWithinSector(ctx, deal.DealID)
if err != nil {
return nil, err
}
// TODO: Record actual block locations for all CIDs in piece by improving car writing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHould this be captured in an issue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it will be soon, that's the intent.

@codecov-io
Copy link

codecov-io commented Jan 24, 2020

Codecov Report

Merging #57 into master will increase coverage by 0.51%.
The diff coverage is 57.58%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #57      +/-   ##
==========================================
+ Coverage   34.12%   34.62%   +0.51%     
==========================================
  Files          36       36              
  Lines        2615     2713      +98     
==========================================
+ Hits          892      939      +47     
- Misses       1588     1617      +29     
- Partials      135      157      +22
Impacted Files Coverage Δ
storagemarket/impl/provider_states.go 0% <0%> (ø) ⬆️
...ievalmarket/impl/providerstates/provider_states.go 94.32% <100%> (ø) ⬆️
...rievalmarket/impl/blockunsealing/blockunsealing.go 85.37% <100%> (-2.13%) ⬇️
piecestore/types_cbor_gen.go 28.58% <27.96%> (-0.87%) ⬇️
piecestore/piecestore.go 83.88% <83.68%> (-2.49%) ⬇️
retrievalmarket/impl/provider.go 75.19% <96.16%> (+2.39%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1aa65e2...582fa86. Read the comment docs.

Copy link
Contributor

@ingar ingar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far.

}
}

type pieceStore struct {
store *statestore.StateStore
pieces *statestore.StateStore
cidInfos *statestore.StateStore
}

func (ps *pieceStore) AddDealForPiece(pieceCID []byte, dealInfo DealInfo) error {
// Do we need to de-dupe or anything here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess we can remove this comment now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok!

sectorID, offset, length, err := p.spn.LocatePieceForDealWithinSector(ctx, deal.DealID)
if err != nil {
return nil, err
}
// TODO: Record actual block locations for all CIDs in piece by improving car writing
err = p.pieceStore.AddPieceBlockLocations(deal.Proposal.PieceRef, map[cid.Cid]piecestore.BlockLocation{
deal.Ref: {},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is needed to store the root CID location here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even for the root, we need to figure out where it sits within the CAR file, so it will be addressed at the same time we address the todo above

@ingar
Copy link
Contributor

ingar commented Jan 24, 2020

The idea is we can store an index to arbitrary blocks in the IPLD graph of a deal payload, because a CAR file is just blocks that are concatenated together?

@hannahhoward
Copy link
Collaborator Author

@ingar -- basically yes -- when we write the car, we can basically record for any CID:

  • the piece its in
    • the relative offset within the piece
    • the size of the block
      And then when we combine that with pieces mapping, we can effectively get, for any CID:
  • a sector it's in
  • an offset within the sector
  • a length of bytes within the sector.

And theoretically we might start partial unsealling even pieces based on that, except that requires unsealing to be super fast which... I'm not sure where things are at with that.

@hannahhoward hannahhoward merged commit 2249baf into master Jan 25, 2020
@hannahhoward hannahhoward deleted the feat/implement-payload-CIDs branch April 30, 2020 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants