Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Datastore Tests #24

Merged
merged 4 commits into from Nov 27, 2019
Merged

Add Datastore Tests #24

merged 4 commits into from Nov 27, 2019

Conversation

aschmahmann
Copy link
Contributor

@aschmahmann aschmahmann commented Nov 26, 2019

Uses the Datastore TestSuite to the test the Measure datastore. There are issues that it already identifies including, that this datastore does not properly implement the Batch interface.

I'm happy to fix the Batching in this PR as long as we come up with a reasonable solution.

Problem:

  • Currently creating a Batch separately tracks Puts + Deletes. Additionally, when Committing a Batch we separately commit the Deletes and then the Puts. This means that Put(x) followed by Delete(x) will result in a Put(x) to the underlying Datastore

Possible Solution:

  1. Track the result of Puts + Deletes
    a) Use a single Batch, from the passed in datastore, for Puts + Deletes and keep an accounting of the keys
    b) Use an internal map (or map datastore) for the batch
  2. Measurements
    a) Only log the net results (e.g. do not log a Put(x) for a Put(x) + Delete(x), after Put(x, val1) + Put(x,val2) only log the size of val2)
    b) Log the total number of puts + deletes (including duplicates), but only add latency observations for the net results
    c) Log all puts + deletes and latency observations for each of them

I'm not sure what metrics we actually want to get out of a Batch, since according to the documentation "Batches do NOT have transactional semantics". For now doing 1a + 2a seems reasonable, but I'm up for alternative approaches.

EDIT: From @Stebalien on IRC it sounds like we don't really care about logging the Puts/Gets that go into Batches, so for now we're just going to setup a separate metric for Batching.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gofmt?

@Stebalien
Copy link
Member

I'd say 1a and 2b. We're going to get pretty wonky per-key latencies otherwise given that many datastores will rate-limit batch commits.

Ideally, we'd track tuples of "(time, deletes, puts)" but I'm not sure how to do that with our current metrics library. I'm fine punting that for now.

We still ocationally do individual puts/deletes (I think?). Those should give us an accurate measurement of expected latency (which is all we really care about).

@Stebalien
Copy link
Member

We'll need to update CI to go 1.12.

@aschmahmann
Copy link
Contributor Author

aschmahmann commented Nov 26, 2019

@Stebalien how does this version look? I separately recorded all puts, deletes and commits from batches so we can figure out what to do with them later.

Getting other information can be a bit tricky. For example, Put(x, largeData) followed by Put(x, smallData) and Commit() will end up with smallData on disk, but bytes written might be largeData+smallData or just smallData.

@aschmahmann
Copy link
Contributor Author

aschmahmann commented Nov 26, 2019

@Stebalien Also, how much of the gx stuff should I remove? Just the CI build, or also the .gx folder, json.package and Makefile?

@Stebalien
Copy link
Member

@Stebalien Also, how much of the gx stuff should I remove? Just the CI build, or also the .gx folder, json.package and Makefile?

All of it except .gx.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if batch delete/put latencies are worth tracking but it doesn't hurt.

Once GX stuff is removed and CI passes, LGTM!

@aschmahmann aschmahmann marked this pull request as ready for review November 27, 2019 02:08
@aschmahmann aschmahmann merged commit 0ebe79f into master Nov 27, 2019
@aschmahmann aschmahmann deleted the feat/testing branch November 27, 2019 17:46
@aschmahmann aschmahmann restored the feat/testing branch November 27, 2019 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants