Skip to content

Return mutated uids which are part of an upsert operation in a map. #4145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Oct 18, 2019

Conversation

pawanrawal
Copy link
Contributor

@pawanrawal pawanrawal commented Oct 9, 2019

We now return a map of the variables that were part of an upsert mutation along with the list of uids that they map to when debug parameter is set to true.

The PR also refactors http_test.go to return a struct from mutationWithTs.


This change is Reviewable

@pawanrawal pawanrawal requested a review from a team October 9, 2019 05:07
Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ A review job has been created and sent to the PullRequest network.


@pawanrawal you can click here to see the review status or cancel the code review job.

Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • In "uids" map, it has keys as uid(v) whereas in "mutated", it has keys as variable name. That is not consistent.
  • I think mutated may not be a good choice for the key name, creation of new UIDs is also technically called mutation.
  • Not sure whether there was a test for bulk upsert too, just check once please.
  • Need to update the docs too
  • Ensure that dgo PR is merged and vendor-ed after the merge

Reviewable status: 0 of 10 files reviewed, 1 unresolved discussion (waiting on @mangalaman93, @manishrjain, and @pawanrawal)


x/x.go, line 60 at r1 (raw file):

// ContextKey is used to set options in the context object.
type ContextKey int

Adding this type doesn't seem very useful.

@mangalaman93 mangalaman93 self-requested a review October 9, 2019 06:50
Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the case when a variable is defined in the query and not used in mutation (may be used in the query itself). It will show up in the response of mutated, right?

Additionally, I was thinking how it would work with value variables. What about the case when uid(v) is used in the object, i.e. modifying edges.

For the current situation, we could call it vars instead of mutated because we are just returning values for the defined variables. Not sure whether that is what we want though.

Reviewable status: 0 of 10 files reviewed, 1 unresolved discussion (waiting on @mangalaman93, @manishrjain, and @pawanrawal)

@mangalaman93 mangalaman93 self-requested a review October 9, 2019 08:12
Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I am wondering whether it can be extended to multiple mutations where I can use same variable in multiple mutations. Just thinking out loud.

Reviewable status: 0 of 10 files reviewed, 1 unresolved discussion (waiting on @mangalaman93, @manishrjain, and @pawanrawal)

@mangalaman93 mangalaman93 self-requested a review October 9, 2019 09:02
Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff! Just a minor comment on debugKey.


Reviewed with ❤️ by PullRequest

}
}

updateUIDInMutations(gmu, varToUID)
updateValInMutations(gmu, qr)
return l, nil
// varToUID is returned to the client, lets delete the dummy var that we put in there.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's describe here why varToUID is there, and why we should delete it.

}
}

updateUIDInMutations(gmu, varToUID)
updateValInMutations(gmu, qr)
return l, nil
// varToUID is returned to the client, lets delete the dummy var that we put in there.
delete(varToUID, varName)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering, why is varToUID there to begin with?

x/x.go Outdated
@@ -108,6 +112,9 @@ const (
{"predicate":"dgraph.user.group","list":true, "reverse": true, "type": "uid"},
{"predicate":"dgraph.group.acl","type":"string"}
`

// DebugKey is the key used to toggle debug mode.
DebugKey ContextKey = iota
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DebugKey = 0, could easily overlap with another key in a context. It would be better if had DebugKey be a string.

@@ -361,7 +366,8 @@ func mutationHandler(w http.ResponseWriter, r *http.Request) {
req.StartTs = startTs
req.CommitNow = commitNow

ctx := attachAccessJwt(context.Background(), r)
ctx := context.WithValue(context.Background(), x.DebugKey, isDebugMode)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. I think it would be better if DebugKey is a string.

if err != nil {
return resp, err
}
parsingTime += l.Parsing
if x.IsDebugRequest(ctx) && len(varToUID) > 0 {
// There could be a lot of these uids which could blow up the response size, specially for
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: especially instead of specially

this comment should go above the if statement.

@@ -207,7 +207,7 @@ func queryHandler(w http.ResponseWriter, r *http.Request) {
return
}

ctx := context.WithValue(context.Background(), query.DebugKey, isDebugMode)
ctx := context.WithValue(context.Background(), x.DebugKey, isDebugMode)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted below, DebugKey is an int, and can get confusing. We should have x.DebugKey be a string, or have DebugKey be the value, with the field be something like mode.

Copy link
Contributor Author

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it to MutatedVars and we are only returning values of variables that were part of some mutation.

What about the case when a variable is defined in the query and not used in mutation (may be used in the query itself). It will show up in the response of mutated, right?

Good point, I now remove variables which were not part of a mutation.

Additionally, I was thinking how it would work with value variables. What about the case when uid(v) is used in the object, i.e. modifying edges.

Same thing, we only return uids corresponding to v in this case and not the value variables.

Also, I am wondering whether it can be extended to multiple mutations where I can use same variable in multiple mutations.

Should work with multiple mutations, we just take the union of the variables that were used in any mutation.

I have addressed the other questions @mangalaman93. Let me know if you think anything else needs to be changed.

Reviewed 2 of 10 files at r1, 8 of 9 files at r2.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @danielmai, @mangalaman93, @manishrjain, @MichaelJCompton, and @pullrequest[bot])


edgraph/server.go, line 681 at r1 (raw file):

Previously, pullrequest[bot] wrote…

Let's describe here why varToUID is there, and why we should delete it.

Done.


edgraph/server.go, line 682 at r1 (raw file):

Previously, pullrequest[bot] wrote…

just wondering, why is varToUID there to begin with?

added a comment.


x/x.go, line 60 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

Adding this type doesn't seem very useful.

reverted

Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I have added support for Go Modules now, can remove the changes in vendor folder now.

  • I am thinking just vars is fine instead of mutation_vars. mutation_vars just looks too long. vars looks more like uids that we already have in the response. In a normal mutation, we only return new UIDs and not the modified ones. Similarly, here, we return the UIDs stored in variables which are modified, not the rest. Calling it vars seems fine to me.

  • Just realized that, uid(u) is returned as key when its a variable whereas if it is a blank node we'd just return u. For example, look at the following response. Could you add this as a comment somewhere in the code.

{
  "data": {
    "code": "Success",
    "message": "Done",
    "mutation_vars": {
      "uid(u)": [
        "6",
        "9"
      ]
    },
    "uids": {
      "u": "0xf"
    }
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 24593,
      "processing_ns": 33807452
    },
    "txn": {
      "start_ts": 81,
      "commit_ts": 82,
      "preds": [
        "1-name"
      ]
    }
  }
}

for the query:

upsert {
  query {
    users(func: eq(name, "user1")) {
      u as uid
    }
  }

  mutation {
    set {
      uid(u) <name> "user1" .
      _:u <name> "user2" .
    }
  }
}
  • I ran the following query and got a response that has 6 instead of 0x6 in ratel. We should add a test for this too.

Steps to reproduce:
Schema:

name: string @index(exact) .
branch: string .
amount: float .

Mutation:

{
  set {
    _:user1 <name> "user1" .
    _:user1 <amount> "10" .

    _:user2 <name> "user2" .
    _:user2 <amount> "100" .

    _:user3 <name> "user3" .
    _:user3 <amount> "1000" .
  }
}

Upsert:

upsert {
  query {
    u as var(func: has(amount)) {
      amt as amount
    }
    me () {
      updated_amt as math(amt+1)
    }
  }

  mutation {
    set {
      uid(u) <amount> val(updated_amt) .
    }
  }
}

Result:

{
  "data": {
    "code": "Success",
    "message": "Done",
    "mutation_vars": {
      "uid(u)": [
        "6",
        "7",
        "8"
      ]
    },
    "uids": {}
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 17432,
      "processing_ns": 29050676
    },
    "txn": {
      "start_ts": 40,
      "commit_ts": 41,
      "preds": [
        "1-amount"
      ]
    }
  }
}

I also did the same thing using curl:

curl 'http://localhost:8180/mutate?commitNow=true' -H 'Content-Type: application/rdf' --data $'upsert {\n  query {\n    u as var(func: has(amount)) {\n      amt as amount\n    }\n    me () {\n      updated_amt as math(amt+1)\n    }\n  }\n\n  mutation {\n    set {\n      uid(u) <amount> val(updated_amt) .\n    }\n  }\n}'

Same result:

{"data":{"code":"Success","message":"Done","mutation_vars":{"uid(u)":["6","7","8"]},"uids":{}},"extensions":{"server_latency":{"parsing_ns":54287,"processing_ns":36651897},"txn":{"start_ts":46,"commit_ts":47,"preds":["1-amount"]}}}

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @danielmai, @manishrjain, @MichaelJCompton, @pawanrawal, and @pullrequest[bot])


edgraph/server.go, line 521 at r3 (raw file):

			// There could be a lot of these uids which could blow up the response size, especially
			// for bulk mutations, hence only return the first million.
			if numUids+len(uids) > 1e6 {

This addition is done twice, could be just done once.


edgraph/server.go, line 595 at r3 (raw file):

// doQueryInUpsert processes the query in upsert block.
// It return the latency and a map of variables => [ uids ...] used in the mutation.

returns*
in the upsert


edgraph/server.go, line 690 at r3 (raw file):

	updateUIDInMutations(gmu, varToUID)
	updateValInMutations(gmu, qr)
	// varToUID is returned to the client, lets delete the dummy var that we put in there for

Let's


wiki/content/mutations/index.md, line 875 at r3 (raw file):

* If the variable is empty i.e. no node matched the query, the `uid` function returns a new UID in case of a `set` operation and is thus treated similar to a blank node. On the other hand, for `delete/del` operation, it returns no UID, and thus the operation becomes a no-op and is silently ignored.
* If the variable stores one or more than one UIDs, the `uid` function returns all the UIDs stored in the variable. In this case, the operation is performed on all the UIDs returned, one at a time.
* A mapping of the uid variable to the list of uids that were used to execute the mutation are also returned as part of the response (as part of the `mutation_vars` key). Currently only upto a million uids are returned as part of the result.

Not sure why it was added here. This doesn't seem like the right place for this doc.

Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not sure about Multiple Mutations. Think about a use case like this:

upsert {
  query {
    v as ...
  }

  mutation1 @if(false condition) { ... }
  mutation2 @if(true condition) { .... }
}

I still can't figure out from the response which mutation was successfully executed.

Reviewable status: 2 of 8 files reviewed, 6 unresolved discussions (waiting on @danielmai, @manishrjain, @MichaelJCompton, @pawanrawal, and @pullrequest[bot])

Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Feel free to merge once the existing comments are addressed. Also, get the docs reviewed too.

Reviewed 1 of 10 files at r1, 1 of 9 files at r2, 6 of 6 files at r4.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @danielmai, @manishrjain, @MichaelJCompton, @pawanrawal, and @pullrequest[bot])


edgraph/server.go, line 521 at r3 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

This addition is done twice, could be just done once.

You could directly assign to numUids. No need for totalUids

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: But have a bunch of comments, so address those.

Reviewable status: 1 of 8 files reviewed, 7 unresolved discussions (waiting on @danielmai, @mangalaman93, @manishrjain, @MichaelJCompton, @pawanrawal, and @pullrequest[bot])


dgraph/cmd/alpha/upsert_test.go, line 36 at r5 (raw file):

		for _, uid := range uids {
			require.True(t, strings.HasPrefix(uid, "0x"), "uid: [%v] should be a hex encoded string", uid)
			// ParseUint throws an error if the string has 0x prefix, so we need to strip the prefix here.

100 chars.


edgraph/server.go, line 521 at r3 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

You could directly assign to numUids. No need for totalUids

I wouldn't even sum them up. Make a decision per variable. Only include the ones which have less than a mil results.

And do the string conv here. A fast way to convert to hex string.


edgraph/server.go, line 600 at r5 (raw file):

// It returns the latency and a map of variables => [ uids ...] used in the upsert mutation.
func doQueryInUpsert(ctx context.Context, req *api.Request, gmu *gql.Mutation) (
	*query.Latency, map[string][]string, error) {

map[string]*api.Uids


edgraph/server.go, line 675 at r5 (raw file):

		uids := make([]string, len(v.Uids.Uids))
		for i, u := range v.Uids.Uids {
			uids[i] = fmt.Sprintf("%#x", u)

fmt.Sprintf is slow. It acquires some sort of a global lock, don't quote me.

Also, if we have over a Mil results, don't do this conversion and don't return anything.


edgraph/server.go, line 695 at r5 (raw file):

	// varToUID is returned to the client, let's delete the dummy var that we put in there for
	// evaluating the conditional upsert.
	delete(varToUID, varName)

Rename varName to something else.

varName := fmt.Sprintf("__dgraph%d__", rand.Int())
// conditionalVar is a dummy var that we use to evaluate the result of
// conditional upsert.
conditionalVar := fmt.Sprintf("__dgraph%d__", rand.Int())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

G404: Use of weak random number generator (math/rand instead of crypto/rand) (from gosec)

@pawanrawal pawanrawal merged commit 6279251 into master Oct 18, 2019
@pawanrawal pawanrawal deleted the pawanrawal/return-mutated-uids-from-upsert branch November 28, 2019 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants