DM-43097: Replication-related improvements #48

Merged
merged 8 commits into from Apr 16, 2024

andy-slac
Collaborator

A big refactoring happened as well:

  • Replication-related methods were moved from the Apdb class to a new ApdbReplica class.
  • The "InsertId" concept is gone, replaced by "ReplicaChunk"; many names were changed to match.
  • The schema for the tables that support replication has been updated, and the replication part of the schema now has its own version number.
  • Backend-specific code was moved to separate apdb.sql and apdb.cassandra packages.
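
To make the first bullet concrete, here is a toy sketch of the split, not the actual `lsst.dax.apdb` code. `ToyConfig`, `ToyApdb`, `ToyApdbReplica`, and all their methods are hypothetical names; only `use_insert_id` comes from this PR. The point is that the pipeline-facing class and the replication-facing class are built from the same configuration but expose different methods.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConfig:
    """Hypothetical shared config; only `use_insert_id` is a real field name."""

    use_insert_id: bool = True
    # Stand-in for the database the config points to: chunk id -> records.
    database: dict[int, list[str]] = field(default_factory=dict)


class ToyApdb:
    """Pipeline-facing side: stores records, has no replication read methods."""

    def __init__(self, config: ToyConfig) -> None:
        self._config = config

    def store(self, chunk_id: int, records: list[str]) -> None:
        # Writes to the main tables are omitted; only chunk bookkeeping is shown.
        if self._config.use_insert_id:
            self._config.database.setdefault(chunk_id, []).extend(records)


class ToyApdbReplica:
    """Replication-facing side: lists, reads, and deletes chunks."""

    def __init__(self, config: ToyConfig) -> None:
        if not config.use_insert_id:
            raise ValueError("replication is not enabled in this config")
        self._config = config

    def chunk_ids(self) -> list[int]:
        return sorted(self._config.database)

    def records(self, chunk_id: int) -> list[str]:
        return list(self._config.database[chunk_id])

    def delete_chunks(self, chunk_ids: list[int]) -> None:
        for chunk_id in chunk_ids:
            self._config.database.pop(chunk_id, None)


# Both objects are made from the same config instance.
config = ToyConfig()
apdb = ToyApdb(config)
apdb.store(1, ["a"])
apdb.store(1, ["b"])  # several store calls can land in one chunk
apdb.store(2, ["c"])
replica = ToyApdbReplica(config)
```

A replication tool would only ever need the second class, which is presumably why the methods were separated.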

Interfaces used by AP pipeline are unchanged and will work as before.

Replication tools are being implemented in dax_ppdb package.

@andy-slac andy-slac force-pushed the tickets/DM-43097 branch 6 times, most recently from 7c07e7a to 64cdd83 Compare April 16, 2024 16:26
Replication-related methods now appear in a separate class `ApdbReplica`.
Instances of `ApdbReplica` are made from the same ApdbConfig.
Since the code version for ApdbReplica classes is now separate from the Apdb
code version, we also need to store and check those numbers. The new
versions are stored and checked only if replica tables are enabled
(use_insert_id is true).
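
A minimal sketch of the conditional version check described above. The metadata keys (`apdb_code_version`, `replica_code_version`), the `Version` helper, and the compatibility rule (same major version, code minor not older than the stored minor) are all assumptions for illustration, not the rule the package actually applies.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def is_compatible(code: Version, stored: Version) -> bool:
    """Assumed rule: majors match and the code is not older in minor version."""
    return code.major == stored.major and code.minor >= stored.minor


def check_versions(
    metadata: dict[str, str],
    apdb_code: Version,
    replica_code: Version,
    use_insert_id: bool,
) -> None:
    """Raise if a stored version is missing or incompatible with the code."""
    required = {"apdb_code_version": apdb_code}
    if use_insert_id:
        # The replica version is only consulted when replica tables exist.
        required["replica_code_version"] = replica_code
    for key, code in required.items():
        if key not in metadata:
            raise KeyError(f"missing metadata key: {key}")
        stored = Version.parse(metadata[key])
        if not is_compatible(code, stored):
            raise RuntimeError(f"{key}: code {code} vs stored {stored}")
```

With `use_insert_id=False`, a database that has no replica version recorded passes the check; with it enabled, the same database is rejected.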
DiaInsertId structure changed, `id` now has integer type. One DiaInsertId
corresponds to multiple `store` calls. This is needed to reduce the number
of transfer chunks. One chunk corresponds to a configurable time window (10
minutes by default). Note that this update changes the version number for replica
classes to 1.0.0. No existing databases should have replication enabled, so
this should not be an issue.
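
One way to read the time-window rule above, as a sketch: derive an integer chunk id from the timestamp by integer division, so every `store` call inside the same window maps to the same chunk. `ChunkSketch` and `chunk_for_time` are hypothetical names, and the use of Unix seconds is an assumption; the real code may use a different time scale or id formula.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChunkSketch:
    """Hypothetical stand-in for a replica chunk record."""

    id: int  # integer chunk identifier
    window_start: int  # start of the window, seconds since the Unix epoch


def chunk_for_time(when: datetime, window_seconds: int = 600) -> ChunkSketch:
    """Map a timestamp to the chunk covering its time window (10 min default)."""
    seconds = int(when.timestamp())
    chunk_id = seconds // window_seconds
    return ChunkSketch(id=chunk_id, window_start=chunk_id * window_seconds)


# Two times inside one 10-minute window, and one just past its end.
t0 = datetime(2024, 4, 16, 16, 20, 0, tzinfo=timezone.utc)
t1 = datetime(2024, 4, 16, 16, 29, 59, tzinfo=timezone.utc)
t2 = datetime(2024, 4, 16, 16, 30, 0, tzinfo=timezone.utc)
```

Here `t0` and `t1` fall in the same chunk, while `t2` starts the next one, so the number of chunks to transfer depends on elapsed time rather than on the number of `store` calls.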

Change types of collections returned from ApdbTableData
@andy-slac andy-slac force-pushed the tickets/DM-43097 branch 3 times, most recently from b6705a9 to 5dcec20 Compare April 16, 2024 17:08
Collaborator Author

@andy-slac andy-slac left a comment

This is quite a big ticket with a lot of refactoring. I'm not going to burden anyone with a review, but I have spent a few hours reviewing it myself.

InsertId concept is replaced with ReplicaChunk. All names related
to InsertId have been changed to a combination of replica/chunk.
The only remaining mention of insert_id is the config field
`use_insert_id`. I do not want to change it yet; I will drop and
replace it when I migrate to YAML config. Relatedly, the frozen
config also has `use_insert_id` for now.
SQL code is now in `lsst.dax.apdb.sql`, Cassandra is in `...cassandra`.
`felis.simple` is going to be replaced by `felis.datamodel`, but `datamodel`
is not exactly the same as `simple` was. I copied `felis.simple` module here
and updated it to use `datamodel` classes as an input.
Move the parts of the ApdbSqlSchema class that convert the schema model
to an SQLAlchemy schema into a separate class, which can be
reused by dax_ppdb.
@andy-slac andy-slac merged commit 72ec62d into main Apr 16, 2024
8 checks passed
@andy-slac andy-slac deleted the tickets/DM-43097 branch April 16, 2024 20:49