
fix: remove time series before adding it #766

Merged
merged 1 commit into master from remove-time-series-before-add on Jan 14, 2021

Conversation

@olavloite (Contributor) commented Dec 31, 2020

Adding the same time series multiple times will cause an error in OpenCensus. A time series could be added multiple times for the same Database client id if a database client was invalidated, for example because it was created for a database that did not exist.

This works with OpenCensusImpl 0.26 but not with 0.24, although that is the case both with and without this change. With this change the problem is solved for 0.26; in version 0.24 OpenCensusImpl will always throw an error if you try to add a metric with the same name twice.

Fixes #202
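The fix boils down to calling removeTimeSeries(..) with the same label values immediately before createTimeSeries(..). Below is a minimal, self-contained sketch of that pattern against the OpenCensus metrics API; the metric name, label key, client id value, and the SessionPoolStats type are illustrative placeholders, not the identifiers used in SessionPool.java.

```java
import java.util.Collections;
import java.util.List;

import io.opencensus.common.ToLongFunction;
import io.opencensus.metrics.DerivedLongGauge;
import io.opencensus.metrics.LabelKey;
import io.opencensus.metrics.LabelValue;
import io.opencensus.metrics.MetricOptions;
import io.opencensus.metrics.Metrics;

/** Illustrative sketch only; names are placeholders, not the ones used in SessionPool.java. */
public class RemoveBeforeAddSketch {

  /** Stand-in for the pool-internal object the derived gauge reads its value from. */
  static class SessionPoolStats {
    long numSessionsInUse() {
      return 0L;
    }
  }

  public static void main(String[] args) {
    DerivedLongGauge gauge =
        Metrics.getMetricRegistry()
            .addDerivedLongGauge(
                "example/in_use_sessions",
                MetricOptions.builder()
                    .setDescription("Number of sessions currently in use")
                    .setUnit("1")
                    .setLabelKeys(
                        Collections.singletonList(
                            LabelKey.create("client_id", "Database client id")))
                    .build());

    List<LabelValue> labels = Collections.singletonList(LabelValue.create("client-1"));
    SessionPoolStats stats = new SessionPoolStats();

    // Removing first is a no-op when the series was never registered in this JVM,
    // and it drops a series left behind by an earlier, invalidated client with the
    // same labels, so the createTimeSeries call below cannot collide with it.
    gauge.removeTimeSeries(labels);
    gauge.createTimeSeries(
        labels,
        stats,
        new ToLongFunction<SessionPoolStats>() {
          @Override
          public long applyAsLong(SessionPoolStats snapshot) {
            return snapshot.numSessionsInUse();
          }
        });
  }
}
```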

@olavloite olavloite requested review from skuruppu and sebright Dec 31, 2020
@olavloite olavloite requested a review from as a code owner Dec 31, 2020
@google-cla google-cla bot added the cla: yes label Dec 31, 2020
@olavloite olavloite added the do not merge label Dec 31, 2020
@olavloite olavloite removed request for skuruppu and sebright Dec 31, 2020
@olavloite olavloite added kokoro:force-run and removed do not merge labels Dec 31, 2020
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run label Dec 31, 2020
@olavloite olavloite requested review from skuruppu and sebright Dec 31, 2020
@olavloite olavloite added the kokoro:force-run label Dec 31, 2020
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run label Dec 31, 2020
@codecov (bot) commented Dec 31, 2020

Codecov Report

Merging #766 (5dc62bc) into master (3d9689c) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##             master     #766   +/-   ##
=========================================
  Coverage     85.00%   85.00%           
  Complexity     2562     2562           
=========================================
  Files           143      143           
  Lines         14007    14016    +9     
  Branches       1338     1338           
=========================================
+ Hits          11906    11915    +9     
  Misses         1538     1538           
  Partials        563      563           
Impacted Files                                          Coverage Δ                 Complexity Δ
...ain/java/com/google/cloud/spanner/SessionPool.java   88.92% <100.00%> (+0.09%)   71.00 <0.00> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@product-auto-label product-auto-label bot added the api: spanner label Jan 1, 2021
@sebright (Contributor) commented Jan 4, 2021

@olavloite Thanks for fixing issue #202! I tested this PR with my application, and it worked with OpenCensus 0.26.0 as well as 0.28.0.

@sebright (Contributor) left a comment

If I understand correctly, the issue with increasing client IDs could still occur if a different exception were thrown from [the linked code]. It might help to also make a separate change to the error handling in SpannerImpl.getDatabaseClient.

@olavloite (Contributor, Author) commented Jan 4, 2021

> If I understand correctly, the issue with increasing client IDs could still occur if a different exception were thrown from [the linked code].

That is correct.

> It might help to also make a separate change to the error handling in SpannerImpl.getDatabaseClient.

I'm somewhat reluctant to do that. As I see it, the real problem here is not that the client IDs are increasing, but the fact that createPool was throwing an exception. It is a method that should normally not throw any exceptions, and adding additional error handling in SpannerImpl#getDatabaseClient(DatabaseId) might mask other unexpected errors. Note that if the user has specified, for example, an invalid database name, createPool will not throw an exception. Instead, that error is delayed until the user tries to get a session from the pool. Such errors caused by invalid input arguments will not lead to ever-increasing client IDs.
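A hedged illustration of that delayed error (the project, instance, and database ids below are made up): getDatabaseClient, and the createPool call behind it, returns without throwing even for a non-existent database; the NOT_FOUND error only surfaces once a session from the pool is actually used.

```java
import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.DatabaseId;
import com.google.cloud.spanner.ResultSet;
import com.google.cloud.spanner.Spanner;
import com.google.cloud.spanner.SpannerException;
import com.google.cloud.spanner.SpannerOptions;
import com.google.cloud.spanner.Statement;

public class DeferredErrorSketch {
  public static void main(String[] args) {
    Spanner spanner =
        SpannerOptions.newBuilder().setProjectId("my-project").build().getService();
    try {
      // Returns without contacting the backend: no exception here, even though
      // the database does not exist.
      DatabaseClient client =
          spanner.getDatabaseClient(
              DatabaseId.of("my-project", "my-instance", "database-that-does-not-exist"));

      // The error is only observed once a session from the pool is actually used.
      try (ResultSet rs = client.singleUse().executeQuery(Statement.of("SELECT 1"))) {
        rs.next();
      } catch (SpannerException e) {
        System.out.println("Deferred error: " + e.getErrorCode()); // e.g. NOT_FOUND
      }
    } finally {
      spanner.close();
    }
  }
}
```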

@sebright (Contributor) commented Jan 4, 2021

That makes sense. I didn't realize that createPool was designed to not throw exceptions.

@thiagotnunes (Contributor) commented Jan 4, 2021

@olavloite Just for my understanding, won't this remove historical data (time series data) if the user redeploys their application?

@olavloite (Contributor, Author) commented Jan 5, 2021

> @olavloite Just for my understanding, won't this remove historical data (time series data) if the user redeploys their application?

Good question. It won't be a problem after a redeploy, as these time series are all in-memory on the local client, so these data structures will all be empty after a redeploy. But I'll take an extra look to check that it won't be a problem if a SessionPool is used for a while, then invalidated because, for example, the database is deleted, and then a new one with the same name is created.

@skuruppu skuruppu requested review from thiagotnunes and removed request for skuruppu Jan 6, 2021
@olavloite (Contributor, Author) commented Jan 14, 2021

> @olavloite Just for my understanding, won't this remove historical data (time series data) if the user redeploys their application?

I've been stepping through the OpenCensus implementation, and my conclusions based on that are:

  1. Removing a TimeSeries or other metric that has not already been added in the current local deployment is a no-op. This means that in the normal case, where a new SessionPool is created, the removeTimeSeries(..) calls are no-ops.
  2. Removing a TimeSeries that has previously been added in the current JVM will remove the local data structure from memory. This means that any data that has not yet been pushed to the server will be lost. It will not remove any data on the server.
  3. How much metric data might be lost depends on the exporter and how often it exports data to the server.

It is, however, important to realize that any in-memory sampling data will only be lost if:

  1. A SessionPool is created and can operate normally.
  2. The SessionPool is invalidated because the underlying database or instance is deleted.
  3. A new SessionPool is created in the same JVM for the same database that was deleted in step 2.
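A sketch of that scenario against the OpenCensus metrics API (the gauge name, label, and pool stand-ins are hypothetical, not code from SessionPool.java): only the remove call in step 3 discards anything, and what it discards is pool #1's not-yet-exported in-memory points.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

import io.opencensus.common.ToLongFunction;
import io.opencensus.metrics.DerivedLongGauge;
import io.opencensus.metrics.LabelKey;
import io.opencensus.metrics.LabelValue;
import io.opencensus.metrics.MetricOptions;
import io.opencensus.metrics.Metrics;

/** Hypothetical walk-through of the three steps above; all names are made up. */
public class InvalidatedPoolMetricsSketch {

  private static final ToLongFunction<AtomicLong> READ_VALUE =
      new ToLongFunction<AtomicLong>() {
        @Override
        public long applyAsLong(AtomicLong value) {
          return value.get();
        }
      };

  public static void main(String[] args) {
    DerivedLongGauge gauge =
        Metrics.getMetricRegistry()
            .addDerivedLongGauge(
                "example/sessions_in_use",
                MetricOptions.builder()
                    .setDescription("Sessions in use per database")
                    .setUnit("1")
                    .setLabelKeys(
                        Collections.singletonList(LabelKey.create("database", "Database name")))
                    .build());
    List<LabelValue> labels = Collections.singletonList(LabelValue.create("my-db"));

    // 1. Pool #1 for "my-db" starts up: removing first is a no-op because no
    //    series with these labels has been registered in this JVM yet.
    AtomicLong pool1InUse = new AtomicLong();
    gauge.removeTimeSeries(labels);
    gauge.createTimeSeries(labels, pool1InUse, READ_VALUE);

    // 2. "my-db" is deleted and pool #1 is invalidated. Any points for pool #1
    //    that the exporter has not pushed yet exist only in this JVM's memory.

    // 3. Pool #2 is created in the same JVM for a re-created "my-db": removing
    //    the stale series drops pool #1's unexported in-memory points (nothing
    //    on the server is touched) and lets the new registration succeed.
    AtomicLong pool2InUse = new AtomicLong();
    gauge.removeTimeSeries(labels);
    gauge.createTimeSeries(labels, pool2InUse, READ_VALUE);
  }
}
```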

@thiagotnunes (Contributor) commented Jan 14, 2021

Thanks for the analysis @olavloite. LGTM

@thiagotnunes thiagotnunes merged commit 90255ea into master Jan 14, 2021
20 checks passed
@thiagotnunes thiagotnunes deleted the remove-time-series-before-add branch Jan 14, 2021
thiagotnunes pushed a commit that referenced this issue May 6, 2021