feat: Add ability to explain groupNode and it's attribute(s). #641

Merged
shahzadlone merged 4 commits into develop from lone/feat/explain-group-node-attributes on Jul 23, 2022

Member

@shahzadlone shahzadlone commented Jul 17, 2022

Relevant issue(s)

Resolves #525

For Reviewer(s):

  • Should be easier to review commit by commit.
  • This PR completes and puts the lid on the simple @explain feature (with the exception of topLevelNode).
  • Planned to merge into v0.3.0 release.

Description

  • Makes groupNode explainable.
  • Explains the child selects list of attributes of groupNode.
  • Explains the attribute that represents the field the groupBy is on.
  • Includes integration tests for various types of groupNode explanations.

Demo

  • Request:
query @explain {
	author (
		groupBy: [age, verified],
	) {
		age
		_group(filter: {age: {_gt: 63}}) {
			name
		}
	}
}
  • Response:
{
  "data": [
    {
      "explain": {
        "selectTopNode": {
          "groupNode": {
            "groupByFields": [ "age", "verified" ],
            "childSelects": [
              {
                "collectionName": "author",
                "filter": {
                  "age": {
                    "_gt": 63
                  }
                },
                "docKeys": null,
                "groupBy": null,
                "limit": null,
                "orderBy": null
              }
            ],
            "selectNode": {
              "filter": null,
              "scanNode": {
                "collectionID": "3",
                "collectionName": "author",
                "filter": null,
                "spans": [
                  {
                    "end": "/4",
                    "start": "/3"
                  }
                ]
              }
            }
          }
        }
      }
    }
  ]
}
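
A minimal Go sketch of how a client might read the `groupNode` attributes out of the response shown above. The type names (`explainResponse`, `groupNodeExplain`) and the helper `parseGroupNode` are assumptions made for this illustration and are not DefraDB types; only the JSON keys come from the demo response, which is trimmed here to the part being read.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// groupNodeExplain mirrors only the groupNode attributes read in this sketch.
// The Go names are hypothetical; the JSON keys come from the demo response.
type groupNodeExplain struct {
	GroupByFields []string `json:"groupByFields"`
	ChildSelects  []struct {
		CollectionName string         `json:"collectionName"`
		Filter         map[string]any `json:"filter"`
	} `json:"childSelects"`
}

// explainResponse walks the nesting data -> explain -> selectTopNode -> groupNode.
type explainResponse struct {
	Data []struct {
		Explain struct {
			SelectTopNode struct {
				GroupNode groupNodeExplain `json:"groupNode"`
			} `json:"selectTopNode"`
		} `json:"explain"`
	} `json:"data"`
}

// parseGroupNode decodes a raw explain response and returns the groupNode
// of the first data entry. Keys not declared above are ignored.
func parseGroupNode(raw []byte) (groupNodeExplain, error) {
	var resp explainResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return groupNodeExplain{}, err
	}
	if len(resp.Data) == 0 {
		return groupNodeExplain{}, fmt.Errorf("explain response has no data entries")
	}
	return resp.Data[0].Explain.SelectTopNode.GroupNode, nil
}

// demoResponse is the demo response, trimmed to the groupNode attributes.
const demoResponse = `{
  "data": [
    {
      "explain": {
        "selectTopNode": {
          "groupNode": {
            "groupByFields": ["age", "verified"],
            "childSelects": [
              {
                "collectionName": "author",
                "filter": {"age": {"_gt": 63}},
                "docKeys": null,
                "groupBy": null,
                "limit": null,
                "orderBy": null
              }
            ]
          }
        }
      }
    }
  ]
}`

func main() {
	g, err := parseGroupNode([]byte(demoResponse))
	if err != nil {
		panic(err)
	}
	fmt.Println("grouped by:", g.GroupByFields)
	for _, child := range g.ChildSelects {
		fmt.Println("child select on:", child.CollectionName, "filter:", child.Filter)
	}
}
```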

Limitations

Tasks

  • I made sure the code is well commented, particularly hard-to-understand areas.
  • I made sure the repository-held documentation is changed accordingly.
  • I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
  • I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

How has this been tested?

Locally with unit tests + Altair + CI

Specify the platform(s) on which this was tested:

  • Arch Linux (specifically Manjaro flavor on WSL2)

@shahzadlone shahzadlone added the feature (New feature or request), area/query (Related to the query component), and action/no-benchmark (Skips the action that runs the benchmark) labels Jul 17, 2022
@shahzadlone shahzadlone added this to the DefraDB v0.3 milestone Jul 17, 2022
@shahzadlone shahzadlone self-assigned this Jul 17, 2022
codecov bot commented Jul 17, 2022

Codecov Report

Merging #641 (08a13d1) into develop (a0332b7) will increase coverage by 0.18%.
The diff coverage is 87.27%.

@@             Coverage Diff             @@
##           develop     #641      +/-   ##
===========================================
+ Coverage    57.14%   57.32%   +0.18%     
===========================================
  Files          122      122              
  Lines        14662    14753      +91     
===========================================
+ Hits          8378     8457      +79     
- Misses        5567     5573       +6     
- Partials       717      723       +6     
| Impacted Files | Coverage Δ |
|---|---|
| query/graphql/mapper/targetable.go | 53.84% <ø> (ø) |
| query/graphql/planner/explain.go | 68.96% <ø> (ø) |
| query/graphql/planner/group.go | 83.51% <85.10%> (+2.46%) ⬆️ |
| query/graphql/mapper/mapper.go | 87.86% <100.00%> (+0.07%) ⬆️ |
| query/graphql/planner/arbitrary_join.go | 79.55% <100.00%> (ø) |

@shahzadlone shahzadlone changed the title from "Not ready to review (cleaning up local commits + testing)." to "feat: Add ability to explain groupNode attribute(s)." Jul 18, 2022
@shahzadlone shahzadlone force-pushed the lone/feat/explain-group-node-attributes branch 4 times, most recently from 6f3aabe to d021080 Compare July 22, 2022 08:49
"childSelects": []dataMap{
{
"collectionName": "author",
"docKeys": nil,
Member Author

question: not quite sure if this is implemented, because I haven't been able to hit the dockey filter case inside the child group.

Member

Can leave for now as we investigate

@shahzadlone shahzadlone marked this pull request as ready for review July 22, 2022 09:21
@shahzadlone shahzadlone requested a review from a team July 22, 2022 09:21
Member

@jsimnz jsimnz left a comment

Blocking issue/suggestion regarding the change to the GroupBy struct (int vs Field).

Will give huge praise for very expansive testing though!

tests/integration/query/explain/group_test.go
Comment on lines 163 to 172
Query: `query @explain {
author (groupBy: [name]) {
name
_avg(_group: {field: _avg})
_group(groupBy: [verified]) {
verified
_avg(_group: {field: age})
}
}
}`,
Member

Now that's a complicated query 😅

Member

Couldn't remotely tell you what it's trying to do in plain English

Member

@jsimnz jsimnz Jul 22, 2022

slowly piecing it together lol, fun test!

Member Author

Groups everything by name, and shows the average of the averages of age, grouped by verified.
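
To make the "average of averages" reading concrete, here is a small Go sketch that performs the same two-level grouping in plain code. It is not how the planner evaluates the query; the `author` type, the function names, and the sample data are all made up for illustration.

```go
package main

import "fmt"

// author is a hypothetical in-memory record with the three fields the query touches.
type author struct {
	Name     string
	Verified bool
	Age      float64
}

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// avgOfAvgByName groups authors by name, splits each group by verified,
// averages age inside every inner group, and then averages those inner
// averages to get one value per name.
func avgOfAvgByName(authors []author) map[string]float64 {
	agesByNameAndVerified := map[string]map[bool][]float64{}
	for _, a := range authors {
		if agesByNameAndVerified[a.Name] == nil {
			agesByNameAndVerified[a.Name] = map[bool][]float64{}
		}
		agesByNameAndVerified[a.Name][a.Verified] = append(agesByNameAndVerified[a.Name][a.Verified], a.Age)
	}

	result := map[string]float64{}
	for name, byVerified := range agesByNameAndVerified {
		var innerAverages []float64
		for _, ages := range byVerified {
			innerAverages = append(innerAverages, mean(ages))
		}
		result[name] = mean(innerAverages)
	}
	return result
}

// sampleAuthors is invented data, chosen so the inner groups have different sizes.
var sampleAuthors = []author{
	{Name: "John", Verified: true, Age: 60},
	{Name: "John", Verified: true, Age: 70},
	{Name: "John", Verified: false, Age: 30},
	{Name: "Cara", Verified: true, Age: 40},
}

func main() {
	for name, avg := range avgOfAvgByName(sampleAuthors) {
		fmt.Printf("%s: %.2f\n", name, avg)
	}
}
```

With this sample data the two inner averages for "John" are 65 (verified) and 30 (unverified), so the outer value is 47.5, which differs from the plain mean of the three ages.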

FieldIndexes []int
Fields []Field
Member

suggestion(blocking): I'm very hesitant to make a change like this without input from Andy, just for the sake of the explain system.

As I understand it, the goal here is to be able to efficiently get the corresponding FieldName when doing the GroupBy explain, but this change affects a lot of other places (as you know, since you had to update them all).

But, as I understand it, the mapper already has a utility to convert index into FieldName without needing to make a change like this.

eg: n.documentMapping.TryToFindNameFromIndex(index). Which you are already using for the order fields. Is it not possible to use this utility as well for the groupby fields? Which would mean you don't have to make this change?
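
A rough Go sketch of the lookup-based alternative suggested here, so the trade-off is easier to see. The `docMapping` type, its `indexesByName` field, and both functions are hypothetical stand-ins written for this example; they are not the mapper's actual types or the real signature of `TryToFindNameFromIndex`.

```go
package main

import "fmt"

// docMapping is a hypothetical stand-in for a document mapping that stores
// field name -> indexes, so going from index to name needs a reverse search.
type docMapping struct {
	indexesByName map[string][]int
}

// tryToFindNameFromIndex searches for the field name that owns the given
// index. The second return value is false when no name maps to that index.
func (m docMapping) tryToFindNameFromIndex(index int) (string, bool) {
	for name, indexes := range m.indexesByName {
		for _, i := range indexes {
			if i == index {
				return name, true
			}
		}
	}
	return "", false
}

// groupByNames resolves each grouped field index to its name for the explain
// output, keeping the group-by representation as a plain []int.
func groupByNames(m docMapping, fieldIndexes []int) ([]string, error) {
	names := make([]string, 0, len(fieldIndexes))
	for _, index := range fieldIndexes {
		name, ok := m.tryToFindNameFromIndex(index)
		if !ok {
			return nil, fmt.Errorf("no field name found for index %d", index)
		}
		names = append(names, name)
	}
	return names, nil
}

func main() {
	m := docMapping{indexesByName: map[string][]int{
		"name":     {0},
		"age":      {1},
		"verified": {2},
	}}
	names, err := groupByNames(m, []int{1, 2})
	fmt.Println(names, err)
}
```

The cost of this shape is that every caller has to handle the "not found" case, even if it should never occur in practice.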

Collaborator

Fair point John. I do think however that the code reads a bit nicer with this change. For example, this:

for _, keyField := range keyFields {

reads better than this:

for _, keyField := range keyIndexes {

Member

@jsimnz jsimnz Jul 22, 2022

> Fair point John. I do think however that the code reads a bit nicer with this change. For example, this:

There's certainly more than a few rough edges w.r.t the mapper system, which are being tracked/tackled in #606.

The explain PRs should make an effort not to change core planner functionality; if something does need a refactor, it should be done in a separate PR.

For this specific PR, as far as I can tell, TryToFindNameFromIndex seems like it should be sufficient to circumvent this larger change.

Collaborator

It doesn't really change the functionality though. It merely adds information to a variable (from []int to []struct) and changes its name. I think of it as if it were already a struct, it would just be adding a struct field.

Member Author

@shahzadlone shahzadlone Jul 22, 2022

You make a fair point, however this is basically just adding an additional field (is nicer IMO). It's nice to have a guarantee of field index always having a corresponding name.

I wrote the TryToFindNameFromIndex function for finding indexes of ordering elements in orderNode. Using that function wouldn't guarantee that the field name exists (even though it should), and a lookup is obviously not as nice if we don't have to do one.

One other thing I could do to reduce the changes (though I would still prefer this approach) is to keep passing only the list of indexes into the functions whose signatures were changed from []int to []mapper.Field.

LMK what you think, at the end of the day this is a very safe change as it's just tagging an additional field, and the previous field stays there as it was before. I would be concerned if I had removed a field haha
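
As a sketch of the approach defended here, the following Go snippet carries the name next to the index, and also shows the compromise of handing only indexes to functions that keep an `[]int` signature. The `field` and `groupBy` types and their methods are simplified assumptions for illustration, not the definitions in the mapper package.

```go
package main

import "fmt"

// field pairs a field's index with its name. It is a simplified,
// hypothetical stand-in for the mapper's Field type.
type field struct {
	Index int
	Name  string
}

// groupBy stores the grouped fields with both pieces of information,
// instead of only a []int of indexes.
type groupBy struct {
	Fields []field
}

// names returns the grouped field names for the explain output.
// No lookup is needed, so there is no "name not found" case to handle.
func (g groupBy) names() []string {
	out := make([]string, 0, len(g.Fields))
	for _, f := range g.Fields {
		out = append(out, f.Name)
	}
	return out
}

// indexes returns only the indexes, for callers that keep an []int parameter.
func (g groupBy) indexes() []int {
	out := make([]int, 0, len(g.Fields))
	for _, f := range g.Fields {
		out = append(out, f.Index)
	}
	return out
}

func main() {
	g := groupBy{Fields: []field{
		{Index: 1, Name: "age"},
		{Index: 2, Name: "verified"},
	}}
	fmt.Println("groupByFields:", g.names())
	fmt.Println("indexes for existing call sites:", g.indexes())
}
```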

Member

I'd still prefer to try to minimize changes beyond utility funcs for explain PRs.

Although technically this is a small change ([]int to []struct) as Fred pointed out, it does sprawl all over the implementation.

It's OK for now, but let's try to minimize this stuff in the future.

@shahzadlone shahzadlone force-pushed the lone/feat/explain-group-node-attributes branch from d021080 to 98a5d89 Compare July 23, 2022 02:48
@shahzadlone shahzadlone force-pushed the lone/feat/explain-group-node-attributes branch from 98a5d89 to 08a13d1 Compare July 23, 2022 02:51
Collaborator

@fredcarle fredcarle left a comment

LGTM!

@shahzadlone shahzadlone changed the title from "feat: Add ability to explain groupNode attribute(s)." to "feat: Add ability to explain groupNode and it's attribute(s)." Jul 23, 2022
@shahzadlone shahzadlone merged commit 355ee34 into develop Jul 23, 2022
@shahzadlone shahzadlone deleted the lone/feat/explain-group-node-attributes branch July 23, 2022 03:18
shahzadlone added a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
…cenetwork#641)
Labels
action/no-benchmark Skips the action that runs the benchmark. area/query Related to the query component feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explain the attributes of groupNode
3 participants