DM-32131: Merge Cassandra branch of APDB #23

andy-slac · 2021-10-25T02:16:53Z

Adds a new implementation of the Apdb interface based on Apache Cassandra.

This merge contains a more or less complete history of all development and experimentation from the past couple of years. I decided not to squash that history because it may be useful later. It does not make sense to review individual commits; it's better to look at the final version.

There is also some refactoring of the SQL implementation and tests to reduce code duplication between the two implementations, but the functionality of ApdbSql did not change. The pixelId column was removed from the YAML schema definition; it is now added where needed by the schema class. This keeps the YAML schema reusable between SQL and Cassandra (the latter does not have pixelId, but it adds a different set of columns).
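
As a rough illustration of that split (not the actual schema classes in this PR), the shared column list can stay backend-neutral while each backend appends its own columns. The `Column` type, `load_shared_columns`, and `with_backend_columns` are hypothetical names, and the hard-coded column list stands in for the parsed YAML file.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Column:
    """Hypothetical, minimal column description."""
    name: str
    type: str


def load_shared_columns() -> List[Column]:
    # Stand-in for parsing the shared YAML schema; note there is no pixelId here.
    return [
        Column("diaObjectId", "BIGINT"),
        Column("ra", "DOUBLE"),
        Column("decl", "DOUBLE"),
    ]


def with_backend_columns(shared: Sequence[Column], extra: Sequence[Column]) -> List[Column]:
    """Return a new list: shared columns followed by backend-specific ones."""
    known = {col.name for col in shared}
    for col in extra:
        if col.name in known:
            raise ValueError(f"column {col.name!r} is already in the shared schema")
    return list(shared) + list(extra)


shared = load_shared_columns()
# The SQL backend adds its HTM index column; a Cassandra backend would pass
# its own (different) extra columns instead.
sql_columns = with_backend_columns(shared, [Column("pixelId", "BIGINT")])
```

The shared list is never modified, so the same definition can feed both backends.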

Cassandra does not store NULL, so a realistic test needs a non-NULL payload for all table columns. I added a new configuration option for the Cassandra backend that fills columns that are not explicitly set with random data. I'm doing it in the backend instead of ap_proto because it is much simpler to do here and I need a quick and dirty way to do it right now. The final implementation won't need it, so it will be removed when we settle on what the final implementation is going to look like.
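
A minimal sketch of the random-fill idea, assuming records are plain dicts and column types are simple strings; the real option works on the backend's own data structures. `fill_unset_columns`, `random_value`, and the type names are hypothetical.

```python
import random
import string


def random_value(type_name, rng):
    """Make a random non-NULL value for a (hypothetical) column type name."""
    if type_name == "int":
        return rng.randint(0, 2**31 - 1)
    if type_name == "float":
        return rng.random()
    if type_name == "str":
        return "".join(rng.choice(string.ascii_letters) for _ in range(8))
    if type_name == "bool":
        return rng.random() < 0.5
    raise ValueError(f"unsupported column type: {type_name}")


def fill_unset_columns(record, column_types, rng=None):
    """Return a copy of record where every missing or None column is random."""
    rng = rng or random.Random()
    filled = dict(record)
    for name, type_name in column_types.items():
        if filled.get(name) is None:
            filled[name] = random_value(type_name, rng)
    return filled


# Illustrative column names and types only.
column_types = {"diaSourceId": "int", "flux": "float", "filterName": "str"}
row = fill_unset_columns({"diaSourceId": 42, "flux": None}, column_types)
```

Explicitly set values are left alone; only gaps are filled, so the payload size is realistic without changing the meaningful columns.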

Previous commits on this branch come from rebasing of the testing branch (u/andy-slac/cassandra-2); the next step is to update the implementation for the new abstract interface. This commit adds a unit test for ApdbCassandra to make sure that it imports and can trivially function. This unit test can only be run against an actual Cassandra cluster and needs a special environment. I added protection for the case of a missing cassandra-driver package: it is still possible to import the module in that case, but instantiation of ApdbCassandra fails.

morriscb

Small stuff to change.

morriscb · 2021-10-26T04:07:34Z

config/apdb-cassandra.py

@@ -0,0 +1,86 @@
+import lsst.dax.apdb.apdbCassandra
+assert type(config)==lsst.dax.apdb.apdbCassandra.ApdbCassandraConfig, 'config is of type %s.%s instead of lsst.dax.apdb.apdbCassandra.ApdbCassandraConfig' % (type(config).__module__, type(config).__name__)


Put spaces around all = and == when setting and testing. https://developer.lsst.io/python/style.html#binary-operators-should-be-surrounded-by-a-single-space-except-for

This file is generated by pex_config, so I want to avoid fixing the formatting in it. If I re-generate it later, I'd have to re-format it every time.

morriscb · 2021-10-26T04:14:44Z

python/lsst/dax/apdb/apdb.py

+    )
+    extra_schema_file = Field(
+        dtype=str,
+        doc="Location of (YAML) configuration file with extra schema",


Might be worth explaining how an extra schema works relative to the standard schema.

morriscb · 2021-10-26T04:16:45Z

python/lsst/dax/apdb/apdbSql.py

-                  doc="If True then print/log timing information",
-                  default=False)
+    db_url = Field(
+        dtype=str, doc="SQLAlchemy database connection URI"


Did you want to newline the doc like the other config options?

morriscb · 2021-10-26T04:18:36Z

python/lsst/dax/apdb/apdbSql.py

+        default=64
+    )
+    htm_index_column = Field(
+        dtype=str, default="pixelId",


morriscb · 2021-10-26T04:18:44Z

python/lsst/dax/apdb/apdbSql.py

+        doc="Name of a HTM index column for DiaObject and DiaSource tables"
+    )
+    ra_dec_columns = ListField(
+        dtype=str, default=["ra", "decl"],


morriscb · 2021-10-26T05:15:37Z