Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support and test mutation log queries at intermediate timestamps #1380

Merged
merged 7 commits into from Nov 12, 2019

Conversation

@gdbelvin
Copy link
Collaborator

gdbelvin commented Nov 7, 2019

Standardize on quantums of time.Microsecond

@gdbelvin gdbelvin requested a review from google/keytransparency as a code owner Nov 7, 2019
@googlebot googlebot added the cla: yes label Nov 7, 2019
@gdbelvin gdbelvin requested a review from mhutchinson Nov 7, 2019
Copy link
Contributor

pavelkalinnikov left a comment

Mostly nits, plus one "pleeeease".

core/integration/storagetest/mutation_logs_reader.go Outdated Show resolved Hide resolved
core/integration/storagetest/mutation_logs_reader.go Outdated Show resolved Hide resolved
core/integration/storagetest/mutation_logs_reader.go Outdated Show resolved Hide resolved
core/integration/storagetest/mutation_logs_reader.go Outdated Show resolved Hide resolved
core/integration/storagetest/mutation_logs_reader.go Outdated Show resolved Hide resolved
impl/memory/mutation_logs.go Outdated Show resolved Hide resolved
core/sequencer/server_test.go Outdated Show resolved Hide resolved
@gdbelvin gdbelvin requested review from pavelkalinnikov and google/keytransparency Nov 7, 2019
@gdbelvin gdbelvin force-pushed the gdbelvin:high_test branch from 0162955 to b4ba2ee Nov 8, 2019
@gdbelvin

This comment has been minimized.

Copy link
Collaborator Author

gdbelvin commented Nov 8, 2019

Ok, sooo...

I've removed all concepts of what "quantum" the storage layer has from tests, or anywhere else outside of the storage implementation itself. I think we've arrived at a place where the code in core can treat watermarks as opaque objects that it does NOT modify.

In order to have consistent behavior for the HighWatermarks test across storage implementations, I've settled on returning a timestamp "just beyond" the primary key of the highest item. Since we're using time.Time, it makes sense to use Add(1) regardless of the time fidelity of the storage layer.

It is now the storage layer's job to round timestamp inputs up to whatever quantum the storage layer itself decides to use. This preserves all the necessary semantics and (very helpfully) we now have tests that verify this behavior.

type LogsReadWriter interface {
sequencer.LogsReader
// Send submits the whole group of mutations atomically to a log.
// TODO(gbelvin): Create a batch level object to make it clear that this a batch of updates.

This comment has been minimized.

Copy link
@pavelkalinnikov

pavelkalinnikov Nov 9, 2019

Contributor

Is this TODO worth it? When a parameter is a list, it's already clear that it's a batch API.

This comment has been minimized.

Copy link
@gdbelvin

gdbelvin Nov 11, 2019

Author Collaborator

I think the idea was to create a distinct type to hold batches of mutations.
This would make the ReadLogs API much clearer since we could return []Batches rather than having the semi-awkard current situation of returning more than batchSize of mutations.

@mhutchinson -- your thoughts?

This comment has been minimized.

Copy link
@mhutchinson

mhutchinson Nov 12, 2019

Contributor

If a batch is a first-class object then it seems subtle and indirect that it is constructed by passing in a bunch of EntryUpdates at the same time. This feels like it should be a slice at a minimum.

For the purposes of just this method I wouldn't go beyond turning the varargs mutation into a slice, but an API shouldn't be designed a method at a time so I'd need to re-review the whole API & future plans to make a call on the situation.

core/integration/storagetest/mutation_logs_reader.go Outdated Show resolved Hide resolved
@@ -173,21 +176,25 @@ func (m *Mutations) HighWatermark(ctx context.Context, directoryID string, logID
ORDER BY Q.Time ASC
LIMIT ?
) AS T1`,
directoryID, logID, start, batchSize).
directoryID, logID, startQuery, batchSize).

This comment has been minimized.

Copy link
@pavelkalinnikov

pavelkalinnikov Nov 9, 2019

Contributor

I'm surprized the SQL query takes time.Time as a parameter. My understanding was that it was a microseconds int.

Have you considered not using the DATETIME type? There is much details on how it works, which, together with details on how time.Time works, make me nervous about missing some unexpected corner cases like leap years/seconds or timezones, or keeping in mind this implicit "microsecond quantum" quirk.

Is MySQL DATETIME type one of the reasons you decided to use time.Time? I wonder if we could avoid this complexity as follows:

  • Use BIGINT MySQL type for clarity
  • Use time.Now().UnixNano() timestamp when writing
  • Do not convert it back to time.Time when reading, just continue using the int
  • Use Watermark type to wrap this int in a type-safe manner

This comment has been minimized.

Copy link
@gdbelvin

gdbelvin Nov 11, 2019

Author Collaborator

The big motivation for all these changes was managing the discrepancy between the mysql implementation (which used to be BIGINT with UnixNanos), and Spanner, which uses microseconds. These were two very different kinds of "ints", which caused their own problems.

  • Trying to abstract away the time precision of the storage layer is how we got to time.Time.
  • Migrating to time.Time has also produced a whole new suite of tests to ensure that storage implementations implement these APIs correctly, and have removed quite a few sneaky modifications of these watermarks in the code.
  • In addition to abstracting away the time precision of the storage layer, I've also tried to match the precision of mysql with spanner (hence the use of DATETIME).

I think your concerns are very legitimate and worth discussing further.

@gdbelvin

This comment has been minimized.

Copy link
Collaborator Author

gdbelvin commented Nov 11, 2019

An interesting side effect of this PR is that is requires watermarks to have nanosecond precision.
Just thought I'd point that out for posterity.

@codecov

This comment has been minimized.

Copy link

codecov bot commented Nov 11, 2019

Codecov Report

Merging #1380 into master will increase coverage by 0.06%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1380      +/-   ##
==========================================
+ Coverage   65.54%   65.61%   +0.06%     
==========================================
  Files          52       52              
  Lines        3959     3961       +2     
==========================================
+ Hits         2595     2599       +4     
+ Misses        969      964       -5     
- Partials      395      398       +3
Impacted Files Coverage Δ
impl/sql/mutationstorage/mutation_logs.go 72.6% <100%> (+0.38%) ⬆️
core/sequencer/trillian_client.go 58.57% <0%> (-2.86%) ⬇️
core/sequencer/server.go 69.83% <0%> (-1.7%) ⬇️
impl/memory/mutation_logs.go 79.59% <0%> (+18.36%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36d41a9...b0363e7. Read the comment docs.

@mhutchinson

This comment has been minimized.

Copy link
Contributor

mhutchinson commented Nov 11, 2019

@gdbelvin is this ready for review now, or are you planning on making the other changes regarding testing semantics for high watermark?

@gdbelvin gdbelvin force-pushed the gdbelvin:high_test branch from db50acb to a46bb87 Nov 12, 2019
@gdbelvin gdbelvin force-pushed the gdbelvin:high_test branch 2 times, most recently from 5f323f1 to 4d44f1e Nov 12, 2019
Replace tests that wanted to predict the exact value of HighWatermark
with tests that rely on the behavior of ReadLog to assert that the value
returned is indeed correct.

This allows different storage layers to use their own strategies
for returning high watermarks.

It also removes the requirement that HighWatermark values themselves
be stored with nanosecond precision.
@gdbelvin gdbelvin force-pushed the gdbelvin:high_test branch from 4d44f1e to 711dcc8 Nov 12, 2019
@gdbelvin

This comment has been minimized.

Copy link
Collaborator Author

gdbelvin commented Nov 12, 2019

TestHighWatermark has been reworked to avoid any dependency on the time fidelity of the storage implementation.

This also has the happy side effect of not imposing a particular time fidelity on the batch definition table. (Previous versions of this PR were going to force nanosecond fidelity)

@gdbelvin gdbelvin requested a review from pavelkalinnikov Nov 12, 2019
if got := len(rows); got != tc.want {
t.Fatalf("ReadLog(%v): len: %v, want %v", tc.limit, got, tc.want)
}
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {

This comment has been minimized.

Copy link
@mhutchinson

mhutchinson Nov 12, 2019

Contributor

I feel like I saw Pavel use an empty string instead of the fmt.Sprintf description and the framework automatically generates a description?

This comment has been minimized.

Copy link
@gdbelvin

gdbelvin Nov 12, 2019

Author Collaborator

Hmm, I suppose that's true. For tests with the same name, the framework will postpend _$i to the name. That feels a bit hacky. Here I preserve an explicit link between the test name and the row it's testing.

type LogsReadWriter interface {
sequencer.LogsReader
// Send submits the whole group of mutations atomically to a log.
// TODO(gbelvin): Create a batch level object to make it clear that this a batch of updates.

This comment has been minimized.

Copy link
@mhutchinson

mhutchinson Nov 12, 2019

Contributor

If a batch is a first-class object then it seems subtle and indirect that it is constructed by passing in a bunch of EntryUpdates at the same time. This feels like it should be a slice at a minimum.

For the purposes of just this method I wouldn't go beyond turning the varargs mutation into a slice, but an API shouldn't be designed a method at a time so I'd need to re-review the whole API & future plans to make a call on the situation.

core/integration/storagetest/mutation_logs_reader.go Outdated Show resolved Hide resolved
core/integration/storagetest/mutation_logs_reader.go Outdated Show resolved Hide resolved
@gdbelvin gdbelvin force-pushed the gdbelvin:high_test branch from 6f633e3 to eff04ab Nov 12, 2019
@gdbelvin gdbelvin changed the title Define a Highwatermarks test Support and test mutation log queries at intermediate timestamps Nov 12, 2019
@gdbelvin gdbelvin force-pushed the gdbelvin:high_test branch from eff04ab to b0363e7 Nov 12, 2019
@gdbelvin gdbelvin merged commit ea1b47d into google:master Nov 12, 2019
5 checks passed
5 checks passed
GolangCI No issues found!
Details
Travis CI - Pull Request Build Passed
Details
cla/google All necessary CLAs are signed
codecov/patch 100% of diff hit (target 65.54%)
Details
codecov/project 65.61% (+0.06%) compared to 36d41a9
Details
@gdbelvin gdbelvin deleted the gdbelvin:high_test branch Nov 12, 2019
gdbelvin added a commit to gdbelvin/keytransparency that referenced this pull request Nov 13, 2019
* master:
  Reduce log spam (google#1382)
  Support and test mutation log queries at intermediate timestamps (google#1380)
gdbelvin added a commit to gdbelvin/keytransparency that referenced this pull request Nov 13, 2019
* master:
  Define watermarks as micros (google#1384)
  Library for converting time.Time to sequencer watermarks (google#1381)
  Reduce log spam (google#1382)
  Support and test mutation log queries at intermediate timestamps (google#1380)
  In memory logs implementation (google#1375)
  Fix generic comparisons on protobuf messages (google#1379)
  Do pagination the right way (google#1378)
  Move the responsibility to pick an input log from storage to the frontend (google#1376)
  Init metrics for whole test file (google#1373)
  Break layering violation by using native types (google#1374)
  Switch Timestamp storage to mysql DATETIME (google#1369)
  Use testdb in integration tests (google#1371)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.