
refactor: Fetcher filter and field optimization #1500

Merged: 30 commits into develop on Jun 27, 2023

Conversation

@jsimnz (Member) commented May 12, 2023

Relevant issue(s)

Resolves #490
Resolves #1582 (indirectly)

Description

This is a reduced version of #491. It takes a very different approach and tries to keep as much of the existing Fetcher structure as possible.

Basically, it eagerly ignores documents that don't pass the given filter at the fetcher level, which opens up various optimizations that weren't available when the filter was applied at the scanNode level, as before. A rough sketch of the resulting fetch loop is shown below.

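For readers skimming the PR, here is the rough shape of the new fetch loop. It is a simplified fragment, not the actual implementation: the receiver and the fetchNextKVs, decodeFilterFields, and runFilter helpers are hypothetical stand-ins for the real fetcher internals.

```go
// nextFilteredDoc returns the next document that passes the filter, skipping
// non-matching documents inside the fetcher instead of in scanNode.
func (f *fetcher) nextFilteredDoc() (*encodedDocument, error) {
	for {
		doc, err := f.fetchNextKVs() // scan the next document's key/values
		if err != nil || doc == nil {
			return nil, err // error, or no more documents
		}
		// Decode only the fields the filter needs, then evaluate eagerly.
		filterDoc, err := f.decodeFilterFields(doc)
		if err != nil {
			return nil, err
		}
		passed, err := f.runFilter(filterDoc)
		if err != nil {
			return nil, err
		}
		if passed {
			return doc, nil // only matching documents reach the planner
		}
	}
}
```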
There will be two follow-up tasks in future PRs:
A) Proper lexicographical encoding.
B) A proper datastore iterator instead of a query.

There are still some outstanding bugs and failing tests, but I wanted to get the reviews coming in on the core implementation while I continue to squash bugs, to parallelize the time spent.

Also, ignore the lingering Print statements left in for debugging; they will be removed before merging.

Tasks

  • I made sure the code is well commented, particularly hard-to-understand areas.
  • I made sure the repository-held documentation is changed accordingly.
  • I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
  • I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...
  • Remove debug print statements.

How has this been tested?

Manually, unit + integration tests, CI

Specify the platform(s) on which this was tested:

  • Ubuntu (WSL2)

@jsimnz jsimnz added area/datastore Related to the datastore / storage engine system area/db-system Related to the core system related components of the DB refactor This issue specific to or requires *notable* refactoring of existing codebases and components action/no-benchmark Skips the action that runs the benchmark. labels May 12, 2023
@jsimnz jsimnz added this to the DefraDB v0.5.1 milestone May 12, 2023
@jsimnz jsimnz self-assigned this May 12, 2023
@jsimnz jsimnz requested a review from AndrewSisley May 12, 2023 12:55
planner/scan.go Outdated
Comment on lines 135 to 190
// Keeping for reference as there is an outstanding
// question.
//
// Don't need to run filter here anymore since the
// fetcher will run the filter for us
//
// Question: How do we want to handle `info.filterMatches`
// now? @shahzadlone
/*
Member:

answer: Technically if len(n.currentValue.Fields) != 0 shouldn't the state still be the same (i.e. doc match was found)? It's not explicitly a filter match, but would be fine to keep it anywhere after the if and before the return true, nil.
Also, open to changing the name if you have any better ideas to represent a "successful document found while scanning / fetching, which matched / passed the filters"
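For illustration, one possible placement of the counter in scanNode.Next(), following the answer above (a sketch only: the execInfo field name is assumed from the surrounding discussion, and the return values mirror the existing scan node):

```go
// Sketch: the fetcher has already applied the filter, so a non-empty
// currentValue means a matching document was found.
if len(n.currentValue.Fields) == 0 {
	return false, nil
}
n.execInfo.filterMatches++ // counted after the length check, before returning
return true, nil
```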

@AndrewSisley (Contributor) left a comment:

Leaving some early comments as I know you are actively working on this and looking for feedback asap. As mentioned over discord, I really like the core approach.

@@ -115,6 +115,18 @@ func (sd SchemaDescription) GetFieldKey(fieldName string) uint32 {
return uint32(0)
}

// GetField returns the field of the given name.
func (sd SchemaDescription) GetField(name string) (FieldDescription, bool) {
Contributor:

todo: There is already such a function on CollectionDescription - I suggest refactoring that one to just call this one.

schemaFields map[uint32]client.FieldDescription
fields []*client.FieldDescription
filter *mapper.Filter
mapping *core.DocumentMapping
Contributor:

todo: This does not appear to be used and should be removed

@@ -178,14 +183,32 @@ func convertToInt(propertyName string, untypedValue any) (int64, error) {

// @todo: Implement Encoded Document type
type encodedDocument struct {
mapping *core.DocumentMapping
Contributor:

suggestion: This is one change that I am really not a fan of. I was under the impression that fetcher was something that 'transcended' the planner system, yet here we are making a planner/mapper type a property on a core fetcher type (instead of a param to a couple of planner-centric funcs).

IIRC the only reason DocumentMapping is in the core package is because it had to be referenced by fetcher.FetchNextDoc and encodedDocument.DecodeToDoc. It is very coupled to the planner system, and if you really want it as a prop on a fetcher package type, I would suggest that we instead move fetcher out of db and into planner - I don't like that very much either, but here we are increasing the level of coupling between the two.

Might be worth having a call about this one.

Member Author:

It's a hard call; I'd say you're both right and wrong here.

It's needed here, as you note, because of fetcher.FetchNextDoc, which returns a core.Doc type that is used by the planner. In reality, the DocumentMapping and core.Doc could be used anywhere, as they're conceptually only tied to the CollectionDescription.

I was almost going to hoist it into the description just to see what that would look like, as that would mean a core.Doc could be used anywhere. There's nothing about core.Doc that is specific to the planner. Its goal is to provide an efficient array-based representation of a document, instead of relying on a map[string]any everywhere. It's primarily used in the planner, but as there are other places where document handling needs to be refactored (#935), it could be the case that core.Doc ends up being used everywhere.

But that's a bigger conversation that we don't need to get into.

Bottom line: the fetcher now needs to be able to run a filter evaluation. It could use the old style of filter evaluation from before the mapper existed, but then we would be doing double work to decode the document - once into a map[string]any to run with the old version of parser.RunFilter, and once to return for FetchNextDoc.

Additionally, when the mapper was implemented, the connor package was migrated to use map[FilterKey]any instead of map[string]any, which is generated in mapper/targetable.go.

So it seems the fetcher is somewhat bound to the mapper system, due to the filter requirements from within it.

Open to suggestions.

One reason to avoid putting fetcher in planner is that it can be run entirely on its own, as you can see in db/fetcher_test.go, and from within the Collection API, which uses the fetcher.

Happy to have a call 📞

Contributor:

We spoke about this a while back, and I don't think we'll see eye-to-eye on this any time soon :) Author view trumps reviewer (ignoring CTO status ;)), so I'll mark this convo as resolved.

I have a plan for the fetcher that I want to try out in the near future anyway, which will pretty much bypass this conflict entirely if it works out.

}

func (encdoc *encodedDocument) decodeToDoc(filter bool) (core.Doc, error) {
if encdoc.mapping == nil {
Contributor:

related to my comment on the mapping prop, I see the fact that we have to check this here (in the hot/usage path) and potentially error as a symptom of a bad constructor/type.

Member Author:

The cost of checking this is minuscule compared to the cost of decoding and running the filter.

Same answer as my thoughts above: the fetcher now needs to run filters, filters need a core.Doc, and you can only get one by using a core.DocumentMapping.

If we use map[string]any to run the filter in the fetcher (like we did in pre-mapper days) we can remove the dependency on the mapping, but then we double the decoding work.

Contributor:

I'm much less worried about performance here - I flagged this as it is an error path that we do not/should not need in the 'doing' part of the code

Collaborator:

If we need to use a pointer's value within the function that doesn't define it, then we need to check that it's not nil. I feel like it makes sense to do that here.

Member Author:

I'm much less worried about performance here - I flagged this as it is an error path that we do not/should not need in the 'doing' part of the code

This doesn't address my point that we must do this here; I see no way around it. Moreover, this is based on how you wrote the mapper/filter/doc.

Tangentially, your objection to these x != nil checks comes up all the time in reviews, and we keep reminding you that it's as common and idiomatic as it gets for Go code.

Contributor:

This chain was only marked as a suggestion, so I guess we can assume that it is staying as is.

Tangentially, I think the insular dogma of 'idiomatic Go' is far worse than any deficiencies in the language, and it very much seems to prevent the language from evolving along with the rest of the programming profession. This problem has been solved for a very long time in pretty much every other major language, yet the Go community is locked by this dogma into relying on the old-fashioned 'everything that is a pointer is optional and you have no way of knowing if it really exists' mentality.

If you make this a function parameter it will never be nil, and that will be pretty obvious even if we refuse to unsubscribe from such dogma, as there are only a couple of references to it (the func and the parent func). Even if you keep the (dead) check in, it will be easier to read the code paths and see that it is dead code.

@jsimnz It felt like most of your comment was directed at me, not the suggestion itself - I hope I have explained why I raised this comment, and why I will continue to do so.

Member Author:

Even if it's a function parameter it can still be nil; that doesn't change anything.

In fact, it is a function parameter, just at a different function / call site to initialize the fetcher.

It should still be checked.

--

Regarding the idiomatic point: ultimately it doesn't matter what other languages do. This is Go, and Go convention wins. The language authors and core maintainers have been very explicit that the goals of the language are not to be fancy, and they will often ignore more modern PL techniques and research in an attempt at "simplicity" and clarity.

The pointer has to be checked regardless. Putting it in something like a homegrown result or option container doesn't change that, nor does making it a function argument.

My comment was directed at the historical nature of this thread, as it comes up often, and the response is often the same on both ends.

PS: word for word, my previous response was 50% directed at you lol. The first half of the message was at your suggestion, as I pointed out there's almost no way around having the doc mapper in the fetcher unless we either refactor the filter stuff or double the decoding work. Neither seems ideal compared to leaving it as is.

Member Author:

There's a reply here in the other comment thread you raised the issue on too, with my same points, from Friday.

#1500 (comment)

Contributor:

Resolving this convo, as it is going nowhere, and is of very low value lol. Leave as-is if preferred.

// we update the bitsets as we collect values
// by clearing the bit for the FieldID
filterSet *bitset.BitSet // filter fields
selectSet *bitset.BitSet // select fields
Contributor:

todo: This appears to be unused and should be removed.

Member Author:

It was used, but was removed, thanks for catching!

Collaborator:

Is this still relevant? I can't see if there were any changes as a result.

// allfields map[uint32]*client.FieldDescription

// static bitsets
filterSet *bitset.BitSet
Contributor:

todo: The usage of this property is not self-explanatory and needs code comments. I am very tired atm, but it should still be easier to understand than it currently is for me :)

Contributor:

praise: Documentation here is really good - thanks John :)

// 0 means we ignore the field
// we update the bitsets as we collect values
// by clearing the bit for the FieldID
filterSet *bitset.BitSet // filter fields
Contributor:

suggestion: I'm not a fan of the fact that this property is only ever used from outside of this type - to me it strongly suggests that it should be moved off the encodedDocument type and held somewhere else, but perhaps I will warm to it as my brain-fog fades and the documentation increases :)

nitpick: It also feels a bit odd that this (and doc, and the encProperty values in Properties) is a pointer.

Member Author:

Most of the properties of encodedDocument are directly accessed outside the type. It's really just a container type for holding state, and it's responsible for combining that state.

The doc filterSet could be on the fetcher directly, which would then have two sets, a "static" and a "dynamic" one, but as this is specific to the other fields within the encodedDocument type, it felt appropriate to keep it here. It also means that if I need to pass the doc around for any reason, the bitset goes with it.

Re nitpick: there's no good reason for the encProperty to be a pointer, but BitSet must be a pointer type as that's how it's defined in its package - the New method returns a pointer, all methods are defined on a pointer receiver, etc.

Contributor:

Most of the properties of encodedDocument are directly accessed outside the type. It's really just a container type for holding state, and it's responsible for combining that state.

Didn't realize you saw it that way - in that case I do think that this should have no funcs defined on it, as that means the type is used both as an object and as a data-struct, which I really dislike. Such a change is well out of scope regardless of what anyone thinks of it here, though. Convo resolved :)

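For readers following the thread, here is a minimal sketch of the bitset gating being discussed, using the bits-and-blooms bitset package. The PR's own comments describe clearing bits as values are collected; this sketch uses the equivalent set-then-compare form, and all names and field IDs here are illustrative rather than the actual DefraDB code:

```go
package main

import (
	"fmt"

	"github.com/bits-and-blooms/bitset"
)

func main() {
	const numFields = 8

	// "Static" set: the fields the filter needs (here, field IDs 1 and 3).
	filterSet := bitset.New(numFields)
	filterSet.Set(1)
	filterSet.Set(3)

	// Per-document set: updated as field values are collected during the scan.
	docSet := bitset.New(numFields)
	for _, fieldID := range []uint{1, 3} { // values arriving from the KV scan
		if filterSet.Test(fieldID) {
			docSet.Set(fieldID)
		}
	}

	// Once the sets match, every field the filter needs has been collected,
	// so the filter can be evaluated eagerly, before decoding the rest.
	if filterSet.Equal(docSet) {
		fmt.Println("all filter fields collected - run the filter now")
	}
}
```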
// seekKV will seek through results/iterator until it reaches
// the target key, or if the target key doesn't exist, the
// next smallest key that is greater than the target.
func (df *DocumentFetcher) seekKV(key string) (bool, *KeyValue, error) {
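As an illustration of the behaviour described by the comment above, here is a sketch against a hypothetical sorted iterator; the interface and function below are stand-ins, not the actual DefraDB datastore API:

```go
// iterator is a hypothetical stand-in for a sorted key/value iterator.
type iterator interface {
	Next() (key string, value []byte, ok bool)
}

// seek advances the iterator until it reaches the target key or, if the
// target doesn't exist, the next smallest key greater than the target.
func seek(it iterator, target string) (found bool, key string, value []byte) {
	for {
		k, v, ok := it.Next()
		if !ok {
			return false, "", nil // exhausted before reaching the target
		}
		if k >= target { // lexicographic comparison on string keys
			return true, k, v
		}
	}
}
```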
Contributor:

todo: It doesn't look like key needs to be a string, and it feels very strange that it is. Is this just because there is no easy-to-use std Go []byte comparator and you want to save immediate dev time? If so I think it is worth documenting that (and very-maybe an issue).

Member Author:

This was mostly just a conceptual-similarity choice, as atm the KV-level keys are treated as strings, but digging further into it there are benefits to using a string.

Firstly, the byte comparator is equivalent to the string comparator.

Also, at the moment core.DataStoreKey.Bytes() just calls []byte(datastorekey.ToString()), which results in an extra allocation anyway ("safe" conversion between strings and bytes allocates and copies).

At the moment there's just a lot of conversion back and forth between strings and bytes (and probably a few more). So until we get organized around proper binary serialization, it doesn't really matter which we use here. The one fact we do know is that datastorekey.ToString() is cheaper than datastorekey.Bytes(). So it's a toss-up.

Collaborator:

I think we can revisit this when we change the key encoding.

Contributor:

At the moment there's just a lot of conversion back and forth between strings and bytes (and probably a few more). So until we get organized around proper binary serialization, it doesn't really matter which we use here. The one fact we do know is that datastorekey.ToString() is cheaper than datastorekey.Bytes(). So it's a toss-up.

Ah fair enough, happy as-is then.

I think we can revisit this when we change the key encoding.

I do see that as quite tangential/separate, and wouldn't recommend doing those two tasks together.

}

// run filter
passed, err := mapper.RunFilter(filterDoc, df.filter)
Contributor:

suggestion: This at first glance looked like a horribly inefficient way of doing this, before I remembered that multiple fields might be involved. I think it is worth documenting this.

(Also, if for whatever reason you strongly disagree with that suggestion, please then remove the // run filter comment as it doesn't add any value - I assume you added it as you were planning out the implementation.)

Member Author:

There was a comment above the if docDone line that said // if docDone xD. Just my process - thanks for flagging the comment.

What seemed inefficient at first glance? It gets run anyway - previously in the scanner, now in the fetcher.

Contributor:

What seemed inefficient at first glance?

I honestly can't remember; I'll reply again if it comes back to me and this code is still relevant (just going through the old comments before reviewing the new code atm).

if err != nil {
return nil, err
}
passed, err := mapper.RunFilter(decodedDoc, df.filter)
Contributor:

todo: You appear to be running the filter twice when docDone is true, please remove that before merge :)

@fredcarle (Collaborator) commented May 13, 2023:

I struggle to see this. Can you point me to where the second run of the filter happens?

Edit: I looked at it some more and I think I see it.

@shahzadlone (Member) left a comment:

Skimmed high-level, I mostly really like the changes, much smaller and simpler than the previous fetcher stuff.

Shared my thoughts on the questions you asked.

Will give it a deeper second look soon.

@@ -25,6 +25,7 @@ const (
errVFetcherFailedToDecodeNode string = "(version fetcher) failed to decode protobuf"
errVFetcherFailedToGetDagLink string = "(version fetcher) failed to get node link from DAG"
errFailedToGetDagNode string = "failed to get DAG Node"
errMissingMap string = "missing document mapper"
Member:

nitpick:

Suggested change
errMissingMap string = "missing document mapper"
errMissingMapper string = "missing document mapper"

Collaborator:

+1

return false, err
}
// @question: Should this be after the length check?
n.execInfo.docFetches++
Member:

answer: This represents the attempts at fetching docs. Since the Fetch call comes before, and we do this after ensuring it was not a failure, it's fine where it is at the moment.

Member Author:

Commenting here for both this answer and your other comment.

Technically it means that docFetches and filterMatches will basically be the same value (or differ by at most 1).

Member:

Technically it means that docFetches and filterMatches will basically be the same value (or differ by at most 1).

I am going to talk from the perspective of what used to happen... The majority of the time, docFetches and iterations would be similar (because for every iteration we would attempt a fetch), but that's not strictly true. Also, docFetches and filterMatches will mostly NOT be the same, and their difference is not bounded by 1. It all depends on the number of docs and how many match the query. Consider these examples from our current execution explain tests.

Consider scan node execute results...

TestExecuteExplainRequestWithOrderFieldOnParent

"iterations":    uint64(3),
"docFetches":    uint64(3),
"filterMatches": uint64(2),

TestExecuteExplainMutationRequestWithCreate

"iterations":    uint64(1),
"docFetches":    uint64(1),
"filterMatches": uint64(1),

TestExecuteExplainMutationRequestWithDeleteUsingFilter

"iterations":    uint64(2),
"docFetches":    uint64(3),
"filterMatches": uint64(1),

TestExecuteExplainMutationRequestWithUpdateUsingFilter

"iterations":    uint64(4),
"docFetches":    uint64(6),
"filterMatches": uint64(2),

TestExecuteExplainRequestWithDocumentsButNoMatches

"iterations":    uint64(1),
"docFetches":    uint64(3),
"filterMatches": uint64(0),

For example, in the last case there were two documents: it fetched twice and neither matched the filter. The third fetch found no more docs, so the total attempts are 3, with 0 matches, and we entered the scan node once.

Fetcher update in this PR

I have to read the implementation in more detail to see if it is still possible to differentiate between the number of fetch attempts made until a match is found vs. the total matches found. We might even just need to return a variable with the result from the fetcher call if we want to keep tracking all the fetch attempts until a match, similar to before.

Member Author:

My whole point is that because of the changes in this PR, the behavior is now different. Previously, the scanner "fetched" a document from the fetcher, then the scanner was responsible for filtering.

One of the core changes in this PR is that we are moving the filtering from the scanner to the fetcher.

So keeping track of both filterFetches and filterMatches in the scanner is meaningless now.

Member:

Yup, makes sense. I did review the implementation in more depth, and it seems that in order to get that info now the fetcher logic would have to store that metric and then communicate it back to the scanner. IMO let's keep the metrics we collect scoped to planner nodes only; the ones we'd have to bend over backwards for in a more complicated manner outside the plan nodes, I would skip for now.
tldr: we can just remove/skip the metric that isn't as relevant now.


for _, field := range col.Schema.Fields {
df.schemaFields[uint32(field.ID)] = field
df.selectFields = make(map[uint32]client.FieldDescription, len(fields))
// @todo: We can probably make this size more accurate, this is an upper bound
Collaborator:

question: How would you make this more accurate? You seem to be using all the allocated KVs in the for loop.

Member Author:

The @todo here is old; the line it refers to has been removed since opening the PR xD

Start(ctx context.Context, txn datastore.Txn, spans core.Spans) error
FetchNext(ctx context.Context) (*encodedDocument, error)
FetchNextDecoded(ctx context.Context) (*client.Document, error)
FetchNextDoc(ctx context.Context, mapping *core.DocumentMapping) ([]byte, core.Doc, error)
Close() error
}

// KeyValue is a KV store response containing the resulting core.Key and byte array value.
type KeyValue struct {
Collaborator:

question: Does this need to be public? Asking because it doesn't seem to be used outside of this file.

Collaborator:

Nevermind. DocumentFetcher.KV() returns this type.

Member Author:

It doesn't need to be public. The NextKV, NextKey, and ProcessKV public methods don't need to exist either; they're old methods mostly used for debugging and can be removed, so KeyValue can be made private.

Collaborator:

Should we open a ticket to action this in the future?

Member Author:

Implemented now

db/fetcher/fetcher.go (outdated; conversation resolved)
@islamaliev (Contributor) left a comment:

I assume our fetcher is already somewhat covered with tests, but I see quite a lot of new or updated logic here (filtering, mapping, seeking...). Isn't it supposed to be covered as well?

@@ -27,6 +28,10 @@ type encProperty struct {
Desc client.FieldDescription
Raw []byte

// Filter flag to determine if this flag
// is needed for eager filter evaluation
Filter bool
Contributor:

question: don't we want to stick to the "normal" convention of naming bool vars with prefixes like "Is", "Has"...? This one could be named ShouldFilter.
The current name is surely shorter, but reading if prop.Filter { feels a bit ambiguous (at least to my eye).
Maybe it's still my C++ vision, where you can if on basically any type :)

Collaborator:

+1

Collaborator:

Just bringing your attention to this one John.

if encdoc.doc.Fields[prop.Desc.ID] != nil { // used cached decoded fields
continue
}
if filter && prop.Filter != filter { // only get filter fields if filter=true
Contributor:

nitpick: could write if filter && !prop.Filter {

Collaborator:

+1

codecov bot commented Jun 10, 2023

Codecov Report

Patch coverage: 79.53% and project coverage change: +0.18% 🎉

Comparison is base (4fe32de) 73.47% compared to head (b45633a) 73.65%.


@@             Coverage Diff             @@
##           develop    #1500      +/-   ##
===========================================
+ Coverage    73.47%   73.65%   +0.18%     
===========================================
  Files          188      188              
  Lines        19320    19608     +288     
===========================================
+ Hits         14195    14442     +247     
- Misses        4073     4102      +29     
- Partials      1052     1064      +12     
Flag Coverage Δ
all-tests 73.65% <79.53%> (+0.18%) ⬆️


Impacted Files Coverage Δ
core/data.go 91.61% <ø> (ø)
db/fetcher/errors.go 20.00% <ø> (ø)
planner/mapper/errors.go 100.00% <ø> (ø)
planner/top.go 68.53% <ø> (ø)
planner/planner.go 76.70% <40.00%> (+0.31%) ⬆️
request/graphql/parser/filter.go 51.61% <61.54%> (+3.34%) ⬆️
planner/type_join.go 75.19% <71.43%> (+0.32%) ⬆️
db/fetcher/encoded_doc.go 62.73% <77.42%> (+0.32%) ⬆️
planner/scan.go 87.70% <78.38%> (-2.37%) ⬇️
db/fetcher/fetcher.go 69.58% <78.57%> (+5.78%) ⬆️
... and 12 more

... and 2 files with indirect coverage changes



Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

encdoc.Key = nil
if encdoc.mapping != nil {
Contributor:

suggestion: This is a bit odd without a New function. Here mapping should never be nil, no? Yet at the same time, if it ever is, the doc property will never be cleared by reset, leaving this instance in an invalid state.

I'd suggest adding a New(...) EncodedDocument func that ensures mapping is never nil, or, if for whatever reason you really don't want that, adding an else clause that just sets doc to nil.

Member Author:

Wanted a Reset instead of a New so it keeps using the same allocated/defined struct on the fetcher.

As for the else, the doc will already be nil if the mapping is also nil, and the mapping never gets updated once it's set at the beginning, so it's either nil or not for the whole lifetime. I can add it if you prefer, but it's not a code path that will add any meaning/safety imo.

Contributor:

Wanted a Reset instead of a New so it keeps using the same allocated/defined struct on the fetcher.

I think you misunderstand why I mentioned New: by declaring a constructor you are declaring a restricted initial state. There you can enforce that mapping is not nil, and reset loses a dead code branch (the implicit else), as well as the possibility of the invalid doc state that I mentioned. New can even call Reset internally for a stronger initial-state guarantee/description if you want.

Collaborator:

The way the fields of encodedDocument are handled in the fetcher brings me to the conclusion that a Reset function makes more sense than a New function.

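For illustration, here is a minimal sketch of the constructor-plus-Reset approach being debated above, with simplified, hypothetical types standing in for the real ones:

```go
package fetcher

import "errors"

// documentMapping is a simplified stand-in for core.DocumentMapping.
type documentMapping struct{}

// encodedDocument holds per-document state; mapping is required for decoding.
type encodedDocument struct {
	mapping *documentMapping
	key     []byte
	// ... other collected state
}

// newEncodedDocument enforces a non-nil mapping at construction time, so the
// decode path never has to re-check it.
func newEncodedDocument(mapping *documentMapping) (*encodedDocument, error) {
	if mapping == nil {
		return nil, errors.New("missing document mapper")
	}
	doc := &encodedDocument{mapping: mapping}
	doc.Reset()
	return doc, nil
}

// Reset clears the per-document state while reusing the same allocation,
// which is the behaviour the author wants to keep on the fetcher.
func (d *encodedDocument) Reset() {
	d.key = nil
	// ... clear other collected state; the mapping is left untouched
}
```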
)

// Fetcher is the interface for collecting documents from the underlying data store.
// It handles all the key/value scanning, aggregation, and document encoding.
type Fetcher interface {
Init(col *client.CollectionDescription, fields []*client.FieldDescription, reverse bool, showDeleted bool) error
Init(col *client.CollectionDescription, fields []client.FieldDescription,
Contributor:

todo: Breaking the params onto multiple lines, but still with multiple params per line, is really quite hard to read. I'd much prefer each param to have its own line, and not share one with the func name or brackets:

foo(
    a int,
    b int,
    ...
)
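Applied to the existing Init signature quoted above (the new signature in this PR adds parameters that are cut off in the excerpt, so only the old ones are shown), that layout would look roughly like:

```go
Init(
	col *client.CollectionDescription,
	fields []*client.FieldDescription,
	reverse bool,
	showDeleted bool,
) error
```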

Member Author:

Yeah, I didn't like it when I was doing it, but I also hate one-param-per-line for interface definitions. I guess I'll have to, though, since both alternatives suck more 🙃

// only run filter if we've collected all the fields
// required for filtering. This is tracked by the bitsets.
if df.filterSet.Equal(df.doc.filterSet) {
df.ranFilter = true
Contributor:

todo: Minor, this looks like it is set in the wrong location - if errors are produced ranFilter will be true, yet the filter may not have been run. I think this should be set after the error check for mapper.RunFilter (currently between ln 539 and 540)

Member Author:

Ahh, yeah, it should be. I don't think it makes a difference since it will error out anyway, but for sanity 👍

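Concretely, the agreed reordering would look roughly like the sketch below; the return values and the passedFilter field are assumptions, while the RunFilter call mirrors the one quoted earlier on this page:

```go
// Only run the filter once every field it needs has been collected,
// and only mark it as run after RunFilter succeeds.
if df.filterSet.Equal(df.doc.filterSet) {
	passed, err := mapper.RunFilter(decodedDoc, df.filter)
	if err != nil {
		return nil, err
	}
	df.ranFilter = true
	df.passedFilter = passed // hypothetical field holding the result
}
```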
@@ -43,7 +43,7 @@ func (p *Planner) getCollectionScanPlan(parsed *mapper.Select) (planSource, erro
}

scan := p.Scan(parsed)
err = scan.initCollection(colDesc)
err = scan.initCollection(colDesc) // @todo: Should `initCollection` be merged into `p.Scan(...)`
Contributor:

todo: I have a very strong preference for this todo to be done. I guess it is technically out of scope, but initCollection is amusingly horrible (presumably from a time when it was unclear whether it would be shared/re-inited), and it would be a quick todo.

Failing that, please make sure that the todo is either removed or bound to a ticket before you merge this PR.

Member Author:

👍

@jsimnz jsimnz force-pushed the jsimnz/refactor/fetcher-filter-selection-v2 branch from 9ca4ee6 to 8efc47c Compare June 27, 2023 19:51
@jsimnz jsimnz force-pushed the jsimnz/refactor/fetcher-filter-selection-v2 branch from 481162c to b45633a Compare June 27, 2023 20:52
@jsimnz jsimnz dismissed AndrewSisley’s stale review June 27, 2023 21:40

Resolved comments and todos

@jsimnz jsimnz merged commit 6c0c5bd into develop Jun 27, 2023
12 checks passed
@jsimnz jsimnz deleted the jsimnz/refactor/fetcher-filter-selection-v2 branch June 27, 2023 21:41
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024

Successfully merging this pull request may close these issues.

  • groupBy item in result set even when not requested
  • Fetcher field selection and filter optimization
5 participants