
feat: beginnings of a mysql backend #22

Merged
merged 1 commit into master from feat/18 on Sep 11, 2018
Conversation

@pjenvey (Member) commented Sep 10, 2018

w/ some initial calls and a test suite migrated from the sqlite version

  • prefers raw DQL (note: not DML) queries vs diesel's query builder for
    potential reuse for other backends (spanner)

  • TODO: further fleshing out of the types, likely wanting i64 or wrappers
    everywhere (as all spanner has is INT64) -- nor should the db layer be
    responsible for conversions from unsigned

Issue #18
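For illustration, here is a minimal sketch of the raw-query style the description prefers, assuming diesel 1.x with the mysql feature; the struct, function, and exact SQL are illustrative, not this PR's actual code:

#[macro_use]
extern crate diesel;

use diesel::mysql::MysqlConnection;
use diesel::sql_types::{BigInt, Integer, Text};
use diesel::{sql_query, QueryResult, RunQueryDsl};

#[derive(QueryableByName)]
struct Collection {
    #[sql_type = "Integer"]
    id: i32,
    #[sql_type = "BigInt"]
    modified: i64,
}

// The SQL is a plain string rather than query-builder calls, so the same
// statement could be reused nearly verbatim against another backend (spanner).
fn get_collection(conn: &MysqlConnection, name: &str) -> QueryResult<Option<Collection>> {
    let mut rows = sql_query("SELECT id, modified FROM collections WHERE name = ?")
        .bind::<Text, _>(name)
        .load::<Collection>(conn)?;
    Ok(rows.pop())
}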

CREATE TABLE `collections` (
`id` INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
`name` VARCHAR(32) UNIQUE NOT NULL,
`modified` BIGINT DEFAULT '0' NOT NULL
Member

What kind of modification would this undergo?

Contributor

I don't think we need a modified timestamp here, because any modification timestamp is going to be per-user rather than global.

CREATE TABLE `user_collections` (
`user_id` INT NOT NULL,
`collection_id` INT NOT NULL,
`modified` BIGINT NOT NULL,
Member

Again, I'm curious about what kind of modifications are being tracked here. I realize this is likely due to something the Go/Python version tracks, but it'd be nice to document either inline or near here why this field exists.

Contributor

This should be the last-modified timestamp of the collection as a whole, and should change whenever an item is added, modified or deleted in the collection. It will be greater-than-or-equal-to the MAX() of the modified column for that collection in the bso table, with the greater-than case being because an item was deleted.

FWIW, the python server has a bug here in its handling of deleted collections:

mozilla-services/server-syncstorage#62

Basically, it tries to calculate the last-modified time of the storage as a whole by doing SELECT MAX(modified) FROM user_collections WHERE uid = X. That's incorrect in the case of a deleted collection, which should cause the last-modified time of the storage as a whole to increase, but won't affect the last-modified time of any remaining collections.

It's clearly an edge-case, because we haven't bothered to actually fix it in the python version. But for greenfield code it's probably worth doing it right the first time. I suggest adding a deleted_at timestamp field to the user_collections table, so that we can explicitly track the deletion of collections. But that can be a follow-up issue if necessary.
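A hedged sketch of that suggestion follows; the deleted_at column and the query are assumptions on top of this PR's schema, not part of it:

#[macro_use]
extern crate diesel;

use diesel::mysql::MysqlConnection;
use diesel::sql_types::{BigInt, Integer};
use diesel::{sql_query, QueryResult, RunQueryDsl};

#[derive(QueryableByName)]
struct LastModified {
    #[sql_type = "BigInt"]
    last_modified: i64,
}

// Storage-level last-modified that still advances when a collection is
// deleted: keep a tombstone row (deleted_at) instead of removing it, then
// take the max over both live and deleted collections.
fn storage_last_modified(conn: &MysqlConnection, user_id: i32) -> QueryResult<i64> {
    let row: LastModified = sql_query(
        "SELECT COALESCE(MAX(GREATEST(modified, COALESCE(deleted_at, 0))), 0) \
         AS last_modified FROM user_collections WHERE user_id = ?",
    )
    .bind::<Integer, _>(user_id)
    .get_result(conn)?;
    Ok(row.last_modified)
}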

Member Author

#23

}
}

impl Fail for DbError {
Member

Nice!

src/db/mysql/test.rs: two outdated review threads (resolved)
@rfk (Contributor) left a comment

This looks like a great start!

Architecturally, I'm a little worried about the way that the DB is taking transactions and querying the current time from behind its API boundary. This could lead to subtle edge-cases under high concurrency, such as two concurrent PUTs inserting items with the same timestamp into the same collection, which could cause the two clients to never sync down each other's changes.

Ideally, each HTTP request would be processed in a single logical transaction and at a single logical instant in time. (It might help to think of the modified integers here not as timestamps, but as opaque version numbers, with a new version number being generated for each change to the collection).

I wonder if there's a way to help enforce that at the DB trait API level here.
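One hedged sketch of enforcing it at the trait level (all names here are illustrative; only DbError comes from this PR):

// The HTTP layer opens a transaction pinned to a single timestamp/version,
// and every db call for that request goes through the transaction, so the
// timestamp is generated once, outside the db layer.
pub trait Db {
    type Tx: DbTransaction;
    fn begin(&self, timestamp: i64) -> Result<Self::Tx, DbError>;
}

pub trait DbTransaction {
    // The one "logical instant" for the whole request; every write uses it.
    fn timestamp(&self) -> i64;
    fn commit(self) -> Result<(), DbError>;
    fn rollback(self) -> Result<(), DbError>;
}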

@@ -0,0 +1,70 @@
CREATE DATABASE IF NOT EXISTS `syncstorage` /*!40100 DEFAULT CHARACTER SET latin1 */;
Contributor

The BSO payload can in theory be arbitrary unicode, since it's JSON. Should we set an explicit encoding like utf8mb4 to guard against any weirdness in unicode handling?
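For reference, a sketch of what the suggested explicit encoding might look like; the collation choice is an assumption, not something raised in this thread:

// Migration DDL carried as a raw statement. utf8mb4 covers the full unicode
// range that JSON payloads may contain; the collation is an assumption.
const CREATE_DB: &str =
    "CREATE DATABASE IF NOT EXISTS `syncstorage` \
     DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";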

`sortindex` INT DEFAULT NULL,

`payload` MEDIUMTEXT NOT NULL,
`payload_size` INT DEFAULT '0' NOT NULL,
Contributor

This payload_size is a separate column to (in theory) make it quicker to calculate total size of items stored in a collection. IIRC we don't ever do that in practice, so it may be worth considering whether that optimization still makes sense for us here.

Member Author

payload_size does appear to be utilized in the info/quota and info/collection_usage calls

Contributor

It might be interesting to compare SELECT SUM(payload_size) vs SELECT SUM(LENGTH(payload)) for this purpose to see whether having it as a separate column really provides much value in practice.

To be clear, I don't have any particular objection to keeping it, just wondering whether the extra complexity pays for itself or not.
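The two alternatives under comparison, written out as raw statements (a sketch using the table and column names from this PR's schema):

// Cached size column vs computing the size from the payload itself.
const SUM_CACHED: &str =
    "SELECT SUM(payload_size) FROM bso WHERE user_id = ? AND collection_id = ?";
const SUM_COMPUTED: &str =
    "SELECT SUM(LENGTH(payload)) FROM bso WHERE user_id = ? AND collection_id = ?";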

CREATE TABLE `batches` (
`user_id` INT NOT NULL,
`collection_id` INT NOT NULL,
`id` varchar(64) NOT NULL,
Contributor

The "id" here is the batch id, right? It may be worth naming it "batch_id" or similar to avoid confusion.

}

impl MysqlDb {
pub fn get_collection_id_sync(
Contributor

It's not obvious to me why some of these are suffixed with _sync and some are not; what's the significance?

Member Author

The higher-level db interface (trait) supplies async calls, so the MysqlDb impl of that trait calls these sync methods via tokio's blocking wrapper.
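A minimal sketch of that pattern, assuming futures 0.1 and tokio-threadpool (the async model in use at the time); the Clone bound, argument types, and DbError::internal are assumptions, not this PR's code:

use futures::{future, Future};
use tokio_threadpool::blocking;

impl MysqlDb {
    pub fn get_collection_id(&self, name: String) -> impl Future<Item = i32, Error = DbError> {
        let db = self.clone();
        future::poll_fn(move || {
            // Run the synchronous diesel call on tokio's blocking thread pool.
            blocking(|| db.get_collection_id_sync(&name))
                .map_err(|_| DbError::internal("blocking thread pool shut down"))
        })
        // blocking() hands back our Result<i32, DbError>; flatten it.
        .and_then(|result| result)
    }
}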

}
*/

// XXX: consider mysql ON DUPLICATE KEY UPDATE?
Contributor

Please do :-)
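A sketch of the suggested upsert as a raw statement (the column list is abridged from the schema above; whether every column should be overwritten on conflict would need checking against the Sync API's update semantics):

const PUT_BSO: &str =
    "INSERT INTO bso (user_id, collection_id, id, sortindex, payload, modified) \
     VALUES (?, ?, ?, ?, ?, ?) \
     ON DUPLICATE KEY UPDATE sortindex = VALUES(sortindex), \
     payload = VALUES(payload), modified = VALUES(modified)";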

.execute(&self.conn)?;
} else {
let payload = bso.payload.as_ref().map(Deref::deref).unwrap_or_default();
let sortindex = bso.sortindex.unwrap_or_default();
Contributor

It's not obvious to me whether this defaults sortindex to NULL or to 0; I believe NULL is the correct behaviour.

Member Author

Good catch, this is leftover from the port from Go, which instead defaulted to 0.
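A sketch of the fix, mirroring the fragment above:

// Keep the Option: diesel binds None as SQL NULL for nullable columns,
// instead of the ported Go behaviour of defaulting a missing sortindex to 0.
let sortindex: Option<i32> = bso.sortindex;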


// fetch an extra row to detect if there are more rows that
// match the query conditions
query = query.limit(if limit >= 0 { limit + 1 } else { limit });
Contributor

The docs aren't clear on what happens when limit < 0, does it default to "no limit"?
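A sketch of one way to make the intent explicit, rather than relying on diesel/MySQL behaviour for a negative LIMIT (a fragment mirroring the code above):

if limit >= 0 {
    // fetch an extra row to detect if there are more rows that match
    query = query.limit(limit + 1);
}
// otherwise apply no LIMIT clause at all, i.e. "no limit"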

assert_eq!(bso.payload, payload);
assert_eq!(bso.sortindex, Some(sortindex));
// XXX: go version assumes ttl was updated here?
//assert_eq!(bso.expiry, modified + ttl);
Contributor

FWIW I wouldn't expect the ttl to be updated unless you explicitly sent a new TTL in the update.

let mut query = bso::table
//.select(bso::table::all_columns())
.select((bso::id, bso::modified, bso::payload, bso::sortindex, bso::expiry))
.filter(bso::user_id.eq(user_id as i32)) // XXX:
Member

I assume the // XXX is about maybe doing a checked conversion with ? so we can ensure we catch casting errors?
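A sketch of what the XXX might become; note that TryFrom for integers was not yet stable when this PR landed, and the DbError constructor here is an assumption:

use std::convert::TryFrom;

// Checked conversion rather than `as`: an out-of-range id surfaces as an
// error instead of silently truncating.
fn user_id_param(user_id: u64) -> Result<i32, DbError> {
    i32::try_from(user_id).map_err(|_| DbError::internal("user_id out of range"))
}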

@bbangert previously approved these changes Sep 11, 2018
@pjenvey (Member Author) commented Sep 11, 2018

Will address further things later (likely switching the get_bsos limit to u64 and the table encoding; the batches table/architecture is still up in the air).

I'm already intending to have transactions work like you suggest, @rfk -- a lot like the python version: one transaction started per handler request (TODO: a transaction() call added to the Db trait), with all db calls taking place within it.

The modified timestamp will likely follow the same pattern.

@pjenvey force-pushed the feat/18 branch 2 times, most recently from 5722817 to 64d8a48 on September 11, 2018 at 23:28
@bbangert merged commit fbd314b into master on Sep 11, 2018
@bbangert deleted the feat/18 branch on September 11, 2018 at 23:53