Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: TimeTraveling (History Traversing) query engine and doc fetcher #59

Merged
merged 29 commits into from Feb 4, 2022

Conversation

jsimnz
Copy link
Member

@jsimnz jsimnz commented Nov 24, 2021

Note: Needs rebasing onto the most recent work as this branch is pretty out of date, just wanted to get a PR on the books.

resolves #58

Architecture

This PR implements a new kind of DocFetcher called VersionedFetcher which loads the given document update-graph into an ephemeral or transient datastore in-memory that exists only for the lifetime of the query execution.

This datastore loads the blocks from the target version back to the genesis version and then re-serializes the target state from v0 to vX.

The primary goal here was to re-use as much of the current Fetcher and Query system without modification to the execution flow of a regular query vs a time-traveling query.

This is why the implementation is scoped only to the VersionedFetcher and that output of the VersionedFetcher matches the structure of the datastores as if it were a regular query.

@todo
Copy link

todo bot commented Nov 24, 2021

New system reserved indexes are NEGATIVE. Maybe we need a map here

Indexes []IndexDescription // @todo: New system reserved indexes are NEGATIVE. Maybe we need a map here
}
// IDString returns the collection ID as a string


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 24, 2021

Support multiple spans

// Prefix: df.spans[0].Start().String(), // @todo: Support multiple spans
// }
// if df.reverse {
// q.Orders = []dsq.Order{dsq.OrderByKeyDescending{}}
// } else {
// q.Orders = []dsq.Order{dsq.OrderByKey{}}


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@jsimnz jsimnz marked this pull request as draft November 24, 2021 21:23
@todo
Copy link

todo bot commented Nov 24, 2021

Support multiple spans

Prefix: df.spans[0].Start().String(), // @todo: Support multiple spans
}
if df.reverse {
q.Orders = []dsq.Order{dsq.OrderByKeyDescending{}}


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 24, 2021

Find an effecient way to determine if a CID is a member of a

// @todo: Find an effecient way to determine if a CID is a member of a
// DocKey State graph
// @body: We could possibly append the DocKey to the CID either as a
// child key, or an instance on the CID key.
hasLocalBlock, err := vf.store.DAGstore().Has(c)


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 24, 2021

set slice size

// subDAGLinks := make([]cid.Cid, 0) // @todo: set slice size
l, err := nd.GetNodeLink(core.HEAD)
// ErrLinkNotFound is fine, it just means we have no more head links
if err != nil && err != dag.ErrLinkNotFound {
return err
}


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 24, 2021

Right now we ONLY handle LWW_REGISTER, need to swith on this and get CType from descriptions

// @todo: Right now we ONLY handle LWW_REGISTER, need to swith on this and get CType from descriptions
if err := vf.processNode(fieldID, subNd, core.LWW_REGISTER, l.Name); err != nil {
return err
}
}


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 24, 2021

Update with document schemas

// @todo: Update with document schemas
func (doc *Document) setAndParseType(field string, value interface{}) error {
switch value.(type) {
// int (any number)
case int:


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 24, 2021

Remove NodeGed Wtter as a paramter, and move it to a MerkleClock field

// @todo Remove NodeGed Wtter as a paramter, and move it to a MerkleClock field
_, err = mc.ProcessNode(
&crdtNodeGetter{deltaExtractor: mc.crdt.DeltaDecode},
nd.Cid(),


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 24, 2021

FIX THIS

// error, lets panic for now. TODO: FIX THIS
panic("invalid index identifier for Merkle CRDT")
}
clk := clock.NewMerkleClock(headstore, dagstore, headsetKey.String(), reg)
// newBaseMerkleCRDT(clock, register)


This comment was generated by todo based on a TODO comment in 53b61ea in #59. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 24, 2021

Abstract schema definitions to CORE

Currently schema definitions are stored in db/base/descriptions


// @todo: Abstract schema definitions to CORE
// @body: Currently schema definitions are stored in db/base/descriptions
// which is suppose to be reserved for implementation specific data.
// However we need to have some reference of schema here in the MerkleCRDT
// system, which is a protocol design, and shouldn't rely on implementation
// specific utilities.


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 24, 2021

handle error if no CID is given

} // @todo: handle error if no CID is given
return &commitSelectNode{
p: p,


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@todo
Copy link

todo bot commented Nov 24, 2021

When running the optimizer, check if the filter object

// @todo: When running the optimizer, check if the filter object
// contains a _key equality condition, and upgrade it to a point lookup
// instead of a prefix scan + filter via the Primary Index (0), like here:
dockeyIndexKey := base.MakeIndexKey(&sourcePlan.info.collectionDescription,
&sourcePlan.info.collectionDescription.Indexes[0], core.NewKey(parsed.DocKey))
spans := core.Spans{core.NewSpan(dockeyIndexKey, core.Key{})}


This comment was generated by todo based on a todo comment in 53b61ea in #59. cc @sourcenetwork.

@jsimnz
Copy link
Member Author

jsimnz commented Nov 24, 2021

Also includes some extra explainer comments that arose during the code-tour onboarding video sessions.

@jsimnz jsimnz linked an issue Nov 24, 2021 that may be closed by this pull request
@jsimnz jsimnz added this to the DefraDB v0.2 milestone Nov 24, 2021
@jsimnz jsimnz added area/query Related to the query component feature New feature or request labels Nov 24, 2021
@AndrewSisley
Copy link
Contributor

Added some minor comments as I was reading, but haven't gotten fully into it yet - will continue review tomorrow when my brain is sharper

@jsimnz
Copy link
Member Author

jsimnz commented Nov 25, 2021

@AndrewSisley im not seeing any comments :/

@jsimnz jsimnz marked this pull request as ready for review November 25, 2021 02:09
@jsimnz jsimnz force-pushed the jsimnz/feat/history-traverser branch from 53b61ea to 1025a9e Compare November 25, 2021 11:10
@todo
Copy link

todo bot commented Nov 25, 2021

parse groupby

// @todo: parse groupby
}
// if theres no field selections, just return


This comment was generated by todo based on a todo comment in 1025a9e in #59. cc @sourcenetwork.

@jsimnz jsimnz force-pushed the jsimnz/feat/history-traverser branch from 1025a9e to d98da05 Compare November 25, 2021 11:17
@jsimnz
Copy link
Member Author

jsimnz commented Nov 25, 2021

Added some minor comments as I was reading, but haven't gotten fully into it yet - will continue review tomorrow when my brain is sharper

If you made your comments as a review, but they are marked as "pending" its because you haven't submitted the review yet.
Lmk

@AndrewSisley
Copy link
Contributor

Added some minor comments as I was reading, but haven't gotten fully into it yet - will continue review tomorrow when my brain is sharper

If you made your comments as a review, but they are marked as "pending" its because you haven't submitted the review yet. Lmk

Ahh okay - didnt realize it behaved like that - let me know if you have a reference when reviewing across two days - e.g. submit on the second day, or submit twice so you get stuff early, or never actually open the review

Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and makes sense - quite a lot of fun functionality for fairly straight forward code :) I've added quite a few mostly minor comments that would good to be addressed though - most fall under:

  • Dead code removal
  • Comment removal
  • Tests
  • Soft-conflicts with other recent changes/findings that wont be flagged by git

It is also making me really want to merge the key changes as we seem to be building up more and more string magic and it'll keep getting harder and harder to maintain that branch until it goes in...

Also sorry for any comments that were addressed by your latest changes, I think you might have resolved a few already

core/txn.go Outdated Show resolved Hide resolved
core/store.go Outdated Show resolved Hide resolved
// CollectionDescription describes a Collection and
// all its associated metadata
type CollectionDescription struct {
Name string
ID uint32
Schema SchemaDescription
Indexes []IndexDescription
Indexes []IndexDescription // @todo: New system reserved indexes are NEGATIVE. Maybe we need a map here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest that this quite a nasty bug waiting to happen - might be worth double checking that an issue is created on merge by the bot

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on this one? the negative stuff is only in relation to the Index identifiers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment suggests this could be accessed by index? Indexes[x] where x might now be negative?

@@ -102,6 +142,15 @@ func (sd SchemaDescription) IsEmpty() bool {
return false
}

func (sd SchemaDescription) GetFieldKey(fieldName string) uint32 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this can be private?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently its only used within the base package, but I can see its useful for future work outside of it. So might as well keep it in for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, duplicate as per below comment, will remove in favor of collection implementation

db/base/descriptions.go Show resolved Hide resolved
// Returns true, if there is a result,
// and false otherwise.
func (n *versionedScanNode) Next() (bool, error) {
if !n.scanInitialized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't spot why this is needed, and it is a bit ugly checking a near-constant value for every value (as well suggesting perhaps falsely that it is needed and that there is some non-obvious route into this function). Would suggest removal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it is technically required, but only because I realized there is a bug in the executor that this resolves, however it should be resolved elsewhere. When we create the plan, for some reason we don't call plan.Init() before starting the executor and calling into Next().

So we can remove this check here (albeit, its a minuscule amount of over head, would only produce a single Assembly Op from the compiler), and add the necessary top level Init call to the executor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also extra overhead for any dev wanting to read the function :) And performance-wise (although I see that as trivial and unimportant here), it is an extra code branch that'll throw off any speculative execution somewhat (although if the bool is always the same value I think a lot of CPUs are good at picking up on that and assume it is that value ahead of time)

db/fetcher/versioned.go Show resolved Hide resolved

// get node
// decode the block
return dag.DecodeProtobuf(blk.RawData())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to set the node builder here or it might be spewing out v0 cids:

nd.SetCidBuilder(
		cid.V1Builder{
			Codec:    cid.Raw,
			MhType:   mh.SHA2_256,
			MhLength: -1,
		})

query/graphql/planner/versionedscan.go Show resolved Hide resolved
db/fetcher/versioned.go Show resolved Hide resolved
@jsimnz
Copy link
Member Author

jsimnz commented Nov 25, 2021

Added some minor comments as I was reading, but haven't gotten fully into it yet - will continue review tomorrow when my brain is sharper

If you made your comments as a review, but they are marked as "pending" its because you haven't submitted the review yet. Lmk

Ahh okay - didnt realize it behaved like that - let me know if you have a reference when reviewing across two days - e.g. submit on the second day, or submit twice so you get stuff early, or never actually open the review

either works fine, I guess it depends on the nature of the issue. If most of the comments/changes are independent, then you can submit multiple reviews.

We forgot to call `plan.Init()` after `makePlan(...)` which caused the scanners to not be properly initialized as there was no propagation. This was incorrectly "fixed" without realizing it by checking if the scanners had beed initialized on each `scan.Next()` call.
This worked, but added constant overhead to each call, and could be resolved by properly initializing the entire plan graph once at the beginning.
@jsimnz jsimnz force-pushed the jsimnz/feat/history-traverser branch from 5ac6247 to c3af989 Compare February 4, 2022 01:00
@@ -85,6 +87,8 @@ func (iterator *BadgerIterator) IteratePrefix(ctx context.Context, startPrefix d
formattedStartPrefix := asFormattedString(startPrefix)
formattedEndPrefix := asFormattedString(endPrefix)

iterator.prefix = formattedStartPrefix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this isn't used, and this is incorrect (end is unlikely to fall within the full start prefix)

  • Remove setting of iterator.prefix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol I put that in there while debugging and totally forgot, yea needs to be removed

if err := df.kvResultsIter.Close(); err != nil {
return err
}
}
df.kvResultsIter = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but as mentioned on discord should probs move df.kvResultsIter = nil inside the if

// decode cidRaw from core.Key to cid.Cid
// need to remove '/' prefix from the core.Key

c, err := cid.Decode(cidRaw.String()[1:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as I was losing interest in the Key refactor 😁

// run the DF init, VersionedFetchers only supports the Primary (0) index
vf.DocumentFetcher = new(DocumentFetcher)
return vf.DocumentFetcher.Init(col, &col.Indexes[0], fields, reverse)
// df.index = index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest removal of commented out code

slct.QueryType = ScanQuery
}

// @todo: parse groupby
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this todo for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old

@@ -156,7 +156,9 @@ func (p *Planner) makePlan(stmt parser.Statement) (planNode, error) {
if err != nil {
return nil, err
}
return plan, nil

err = plan.Init()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be why you started getting iterator errors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, possible ill look into it

@@ -412,3 +467,5 @@ func (p *Planner) Select(parsed *parser.Select) (planNode, error) {
}
return top, nil
}

// func (p *Planner) select(parsed *parser.Select, source planNode, render bool, fromCollection string) (....)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest removal

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved assuming you revert badger/v3/iterator and fix the linter errors (some are very important) - dont remember anything else serious

@codecov
Copy link

codecov bot commented Feb 4, 2022

Codecov Report

Merging #59 (39862a1) into develop (9d822c3) will decrease coverage by 0.32%.
The diff coverage is 46.02%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #59      +/-   ##
===========================================
- Coverage    57.98%   57.66%   -0.33%     
===========================================
  Files           91       95       +4     
  Lines         8935     9363     +428     
===========================================
+ Hits          5181     5399     +218     
- Misses        3194     3354     +160     
- Partials       560      610      +50     
Impacted Files Coverage Δ
db/base/maker.go 66.66% <0.00%> (-33.34%) ⬇️
db/collection_update.go 39.81% <0.00%> (-0.41%) ⬇️
merkle/clock/ipld.go 31.34% <ø> (ø)
merkle/crdt/factory.go 65.15% <0.00%> (-2.04%) ⬇️
query/graphql/planner/commit.go 83.63% <ø> (-0.15%) ⬇️
query/graphql/planner/multi.go 60.54% <ø> (+4.29%) ⬆️
query/graphql/planner/update.go 73.40% <ø> (+0.48%) ⬆️
query/graphql/planner/versionedscan.go 0.00% <0.00%> (ø)
store/multi.go 0.00% <0.00%> (ø)
db/fetcher/dag.go 56.81% <33.33%> (-2.68%) ⬇️
... and 32 more

@jsimnz jsimnz merged commit 2a70c28 into develop Feb 4, 2022
@jsimnz jsimnz deleted the jsimnz/feat/history-traverser branch February 4, 2022 05:42
jsimnz added a commit that referenced this pull request Feb 7, 2022
…#59)

* Implemented versionFetcher

* Update Fetcher interface

* Created Txn utility

* Added versionedScan to query planner

* Added tests for versionFetcher
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
…sourcenetwork#59)

* Implemented versionFetcher

* Update Fetcher interface

* Created Txn utility

* Added versionedScan to query planner

* Added tests for versionFetcher
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/query Related to the query component feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TimeTraveling Queries
2 participants