Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spanner): Adding GFE Latency and Header Missing Count Metrics #5199

Merged
merged 16 commits into from Jan 4, 2022

Conversation

asthamohta
Copy link
Contributor

@asthamohta asthamohta commented Dec 7, 2021

Benchmarking:
Each function was performed 100 times in a test and the time for it is noted . This is averaged out across 50 runs. Below is the summarisation of the numbers:. Following are the results:

Functionality ReadRow InsertOrUpdate Query
Time without adding GFE Latency Feature 18.47518476 23.79237414 17.92703158
Time with GFE Latency added and Disabled 18.84231938 21.38502053 23.93358834
Time with GFE Latency added and Enabled 19.45244371 22.03212286 21.28579487

For full details of each run, check here

tritone and others added 3 commits Dec 7, 2021
Allows users to supply a custom ErrorFunc which will be used
to determine whether the error is retryable.
Copy link
Contributor

@olavloite olavloite left a comment

You also need to change the mock server so it actually returns a server-timing header. To do this, add the following lines to just before the return statement of the BatchCreateSessions method in internal/testutil/inmem_spanner_server.go (before line 690):

	header := metadata.New(map[string]string{"server-timing": "123"})
	if err := grpc.SendHeader(ctx, header); err != nil {
		return nil, gstatus.Errorf(codes.Internal, "unable to send 'server-timing' header")
	}

(And this also needs to be added to all the other RPCs that should also return a server timing value)

spanner/oc_test.go Outdated Show resolved Hide resolved
spanner/oc_test.go Outdated Show resolved Hide resolved
spanner/oc_test.go Show resolved Hide resolved
spanner/sessionclient.go Outdated Show resolved Hide resolved
spanner/sessionclient.go Outdated Show resolved Hide resolved
@asthamohta asthamohta force-pushed the GFELatency branch 3 times, most recently from e4ae8e1 to 7c462cf Compare Dec 9, 2021
@asthamohta asthamohta force-pushed the GFELatency branch 3 times, most recently from 30558b1 to bcc20d2 Compare Dec 12, 2021
@asthamohta asthamohta force-pushed the GFELatency branch 4 times, most recently from 7e2430f to f2646e9 Compare Dec 13, 2021
spanner/batch.go Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/oc_test.go Outdated Show resolved Hide resolved
spanner/sessionclient.go Outdated Show resolved Hide resolved
spanner/stats.go Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/client.go Outdated Show resolved Hide resolved
spanner/internal/testutil/inmem_spanner_server.go Outdated Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
spanner/transaction.go Show resolved Hide resolved
@asthamohta asthamohta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 15, 2021
@vi3k6i5 vi3k6i5 added kokoro:force-run Add this label to force Kokoro to re-run the tests. and removed kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Dec 15, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 15, 2021
@asthamohta asthamohta requested a review from olavloite Dec 15, 2021
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated
if err != nil {
return client, err
}
md, errGFE := client.Header()
Copy link
Contributor

@olavloite olavloite Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer err instead of errGFE

Copy link
Contributor Author

@asthamohta asthamohta Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Knut, the reason why I did this was because in someplaces I was worries that the error was getting returned at the end of the function, example here . So will have to use a separate error in such places. Hence to seem consistent I used errGFE

Copy link
Contributor

@olavloite olavloite Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand that. But I think that it is better to follow the standard naming of Golang when possible, and only make an exception when you need to. In this case for example, you no longer need to return err at the end of the function anymore, as you know that it is nil because of the if statement that was added above. So the last statement in the function can now be changed to return client, nil

Copy link
Contributor Author

@asthamohta asthamohta Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I looked into it further and I realised if I define err inside the loop even in the case I described I can use it as the scope of that variable will be limited to the if condition loop. So i think it should be okay to do that. Wdyt?

Copy link
Contributor

@olavloite olavloite Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure I understand what you mean in this case. But if you mean that you can define err only inside the if-block, then that would be good.

Copy link
Contributor Author

@asthamohta asthamohta Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's what I mean

spanner/client.go Outdated Show resolved Hide resolved
@asthamohta asthamohta force-pushed the GFELatency branch 2 times, most recently from 13d936d to 1641bce Compare Dec 20, 2021
@asthamohta asthamohta requested a review from olavloite Dec 21, 2021
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated
if err != nil {
return client, err
}
md, errGFE := client.Header()
Copy link
Contributor

@olavloite olavloite Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand that. But I think that it is better to follow the standard naming of Golang when possible, and only make an exception when you need to. In this case for example, you no longer need to return err at the end of the function anymore, as you know that it is nil because of the if statement that was added above. So the last statement in the function can now be changed to return client, nil

spanner/client.go Outdated Show resolved Hide resolved
@@ -136,7 +140,13 @@ func (t *BatchReadOnlyTransaction) PartitionReadUsingIndexWithOptions(ctx contex
Columns: columns,
KeySet: kset,
PartitionOptions: opt.toProto(),
})
}, gax.WithGRPCOptions(grpc.Header(&md)))

Copy link
Contributor

@olavloite olavloite Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere: I think it is better to add an if err != nil { return err }` statement before checking for the latency values. I worry that we otherwise might start returning errors that are harder to interpret for users if the following happens:

  1. The client.PartitionRead returns an error. That error is not checked here at the moment.
  2. The RPC did return a header (so md != nil), but that header does not contain a GFE latency value. So instead of the error that was returned from PartitionedRead, we return an error that indicates that it could not find a GFE Latency value.

I worry that the above would be confusing for users, and make it harder to debug errors that occur because of for example invalid requests.

Copy link
Contributor Author

@asthamohta asthamohta Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually if there is no GFE Latency value it won't throw an error. Instead it will add to the header missing count. It only throws an error if it got a garbage value in the value for the key value pair "server-timing". Also currently in the Java implementation, they intercept every rpc call(even when there is a failed operation) and use it to increase the header missing count. I feel if we don't add to the Header missing count in this case then across languages the header missing count might be different in someway

Copy link
Contributor

@olavloite olavloite Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel if we don't add to the Header missing count in this case then across languages the header missing count might be different in someway

That's a good point. So agreed to keep this as is on that.

I still worry a little bit that we might be returning an error from extracting the metadata instead of the error from the actual operation. I think we had a discussion about that earlier, and agreed on returning the error from those methods. I think I'm starting to change my mind on that. It is not an error that the user can do anything about, and it might just cover other potentially more important errors. So I would suggest then ignoring those errors. What do you think?

Copy link
Contributor Author

@asthamohta asthamohta Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log these errors instead then? How do we let the user know that an error has occurred?

Copy link
Contributor

@olavloite olavloite Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have access to the logger, then logging it is probably a good idea. I don't think users can do anything with these errors anyways, so I don't think we need to actively inform them any more than that (and the chance that it happens should also be minimal, as it would mean that the data that is sent back is invalid)

Copy link
Contributor Author

@asthamohta asthamohta Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per our conversation going to add a trace instead of returning an error

spanner/batch.go Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
spanner/transaction.go Show resolved Hide resolved
spanner/transaction.go Show resolved Hide resolved
spanner/transaction.go Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
@asthamohta asthamohta added kokoro:force-run Add this label to force Kokoro to re-run the tests. kokoro:run Add this label to force Kokoro to re-run the tests. labels Dec 22, 2021
@kokoro-team kokoro-team removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Dec 22, 2021
@asthamohta asthamohta requested a review from olavloite Dec 22, 2021
spanner/batch.go Outdated
if err != nil {
return nil, err
}
Copy link
Contributor

@olavloite olavloite Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this early return is no longer needed now that we don't return an error if getting the latency stats fails

Copy link
Contributor Author

@asthamohta asthamohta Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still needed as there might be an error in getting the Header, and in that case we return the error

Copy link
Contributor

@olavloite olavloite Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that happens on line 333. The reason that we added the early return above in the first place, was that we would return the error that might be returned createContextAndCaptureGFELatencyMetrics. As we don't do that anymore, we can safely wait until line 333 before returning the error from the client.Header() method.

Copy link
Contributor Author

@asthamohta asthamohta Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then won't md, err := client.Header() override the value of the original error. I will have to add a separate check as md, errGFE : = client.Header()
if errGFE != nil {
return nil, errGFE
}

Copy link
Contributor

@olavloite olavloite Dec 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we mean the same return in this case. I think you can safely return the if err != nil block starting at line 325. The err on line 329 will not override the err from the call to client.Header(), as that is a different variable, although it does have the same name. That err is only visible in the statement on line 329.

So to summarize:

  1. The if err != nil on line 321 should remain.
  2. The if err != nil on line 325 can be removed. (And yes, the md, err := client.Header() does override the previous value of err, but that is not a problem as we know that err at that point is nil)
  3. The if err := createContextAndCaptureGFELatencyMetrics on line 329 can safely remain. That will not override anything, as it is a separate variable only defined on line 329.
  4. The return client, err on line 333 will therefore always return the value of err that was returned by client.Header().

Copy link
Contributor Author

@asthamohta asthamohta Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay makes sense. My bad. I thought you meant remove the error return on line 321. Will do that.

spanner/batch.go Outdated Show resolved Hide resolved
spanner/sessionclient.go Outdated Show resolved Hide resolved
spanner/stats.go Show resolved Hide resolved
spanner/stats.go Outdated Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
@asthamohta asthamohta marked this pull request as ready for review Dec 22, 2021
@asthamohta asthamohta requested review from a team as code owners Dec 22, 2021
@codyoss codyoss changed the title Feat: Adding GFE Latency and Header Missing Count Metrics feat(spanner): Adding GFE Latency and Header Missing Count Metrics Dec 22, 2021
@asthamohta asthamohta merged commit 3d8a9ea into googleapis:main Jan 4, 2022
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants