Add partition awareness to base task components #16

Closed
morozov opened this issue Mar 2, 2021 · 1 comment · Fixed by #22
Labels: SQL Server (Debezium connector for SQL Server)


morozov commented Mar 2, 2021

#15 implemented TaskOffsetContext, which contains all partitions and offsets to be processed by a task. Next, we need to iterate over all task partitions in the event coordinator by introducing a parallel "partitioned" event source class hierarchy. There are two reasons for keeping this hierarchy separate:

  1. Not all connectors are partitioned by design (the MySQL one isn't).
  2. By introducing a new API, we'll be able to continue with the changes only for SQL Server and simplify the migration of other connectors to the new API later.

Acceptance criteria:

  1. Introduce the following classes and interfaces:
    • PartitionedChangeEventSourceCoordinator
    • PartitionedChangeEventSourceFactory
    • PartitionedSnapshotChangeEventSource
    • PartitionedStreamingChangeEventSource
      Apart from being parametrized with <O extends OffsetContext>, these APIs should be parametrized with <P extends TaskPartition>. This will allow the underlying components to use the database name of the currently processed partition via partition.getDatabaseName(). Copy/paste where extension is not possible (e.g. the coordinator).
  2. Alongside the OffsetContext offsetContext arguments introduced in #13 (DBZ-2975: Extract offset context from object states to method signatures), add a TaskPartition partition argument where necessary.
  3. Move the loop introduced in #15 (DBZ-2975: Prototype multiple offset contexts per task) to the partitioned coordinator: https://github.com/sugarcrm/debezium/blob/efc7d20e4d973ee5e04863b5923f154ae41a7c38/debezium-connector-sqlserver/src/main/java/io/debezium/connector/sqlserver/SqlServerConnectorTask.java#L128-L132 Leave the break in place temporarily, and repeat the loop for each source (snapshot, streaming).
  4. Update the SQL Server connector components to implement the partitioned API.
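
To make the acceptance criteria concrete, here is a minimal sketch of what the partitioned hierarchy and the coordinator loop might look like. Only the four Partitioned* names, the two type parameters, and partition.getDatabaseName() come from this issue. The method names (execute, getSnapshotChangeEventSource, getStreamingChangeEventSource), the Map of offsets, and the PartitionedCoordinatorDemo class are assumptions made for illustration and are not the actual Debezium signatures.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: only getDatabaseName() is named in the issue.
interface TaskPartition {
    String getDatabaseName();
}

// Hypothetical stand-in: the real offset context carries offset state.
interface OffsetContext {
}

// Assumed shape: each source receives the partition next to the offset context.
interface PartitionedSnapshotChangeEventSource<P extends TaskPartition, O extends OffsetContext> {
    void execute(P partition, O offsetContext);
}

interface PartitionedStreamingChangeEventSource<P extends TaskPartition, O extends OffsetContext> {
    void execute(P partition, O offsetContext);
}

interface PartitionedChangeEventSourceFactory<P extends TaskPartition, O extends OffsetContext> {
    PartitionedSnapshotChangeEventSource<P, O> getSnapshotChangeEventSource();

    PartitionedStreamingChangeEventSource<P, O> getStreamingChangeEventSource();
}

final class PartitionedChangeEventSourceCoordinator<P extends TaskPartition, O extends OffsetContext> {
    private final Map<P, O> offsets;
    private final PartitionedChangeEventSourceFactory<P, O> factory;

    PartitionedChangeEventSourceCoordinator(Map<P, O> offsets,
                                            PartitionedChangeEventSourceFactory<P, O> factory) {
        this.offsets = offsets;
        this.factory = factory;
    }

    void run() {
        // The loop is repeated for each source, as criterion 3 asks.
        PartitionedSnapshotChangeEventSource<P, O> snapshot = factory.getSnapshotChangeEventSource();
        for (Map.Entry<P, O> entry : offsets.entrySet()) {
            snapshot.execute(entry.getKey(), entry.getValue());
            break; // temporary: only the first partition is processed for now
        }
        PartitionedStreamingChangeEventSource<P, O> streaming = factory.getStreamingChangeEventSource();
        for (Map.Entry<P, O> entry : offsets.entrySet()) {
            streaming.execute(entry.getKey(), entry.getValue());
            break; // temporary: only the first partition is processed for now
        }
    }
}

// Hypothetical demo wiring two partitions into the coordinator.
final class PartitionedCoordinatorDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Map<TaskPartition, OffsetContext> offsets = new LinkedHashMap<>();
        TaskPartition db1 = () -> "db1";
        TaskPartition db2 = () -> "db2";
        offsets.put(db1, new OffsetContext() { });
        offsets.put(db2, new OffsetContext() { });

        PartitionedChangeEventSourceFactory<TaskPartition, OffsetContext> factory =
                new PartitionedChangeEventSourceFactory<TaskPartition, OffsetContext>() {
                    @Override
                    public PartitionedSnapshotChangeEventSource<TaskPartition, OffsetContext> getSnapshotChangeEventSource() {
                        return (partition, offset) -> log.add("snapshot:" + partition.getDatabaseName());
                    }

                    @Override
                    public PartitionedStreamingChangeEventSource<TaskPartition, OffsetContext> getStreamingChangeEventSource() {
                        return (partition, offset) -> log.add("streaming:" + partition.getDatabaseName());
                    }
                };

        new PartitionedChangeEventSourceCoordinator<>(offsets, factory).run();
        return log;
    }
}
```

Because the temporary break is still in place, calling PartitionedCoordinatorDemo.run() in this sketch touches only the first partition (db1) in each phase. Removing the two break statements would make both loops cover every partition.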

UPD: Work on this issue showed that more classes than just the coordinator need their APIs changed. Copy-pasting more code increases the likelihood that the duplicated code falls out of sync with upstream when upstream changes the originals. To expedite development and lower this risk, let's take a shortcut:

  1. Instead of introducing a new class hierarchy, add partition awareness to the existing one.
  2. Temporarily delete all plugins, except the SQL Server one, that would otherwise require code/API changes.
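
As a rough illustration of the shortcut, the sketch below adds the partition type parameter and argument directly to an existing-style interface instead of creating a parallel Partitioned* one. All types here are simplified, hypothetical stand-ins that merely borrow familiar names; the execute signature and the polledDatabases field are assumptions, not the real Debezium API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the base types.
interface TaskPartition {
    String getDatabaseName();
}

interface OffsetContext {
}

// Before (assumed): interface StreamingChangeEventSource<O extends OffsetContext> { void execute(O offsetContext); }
// After: the same interface gains the P type parameter and a partition argument.
interface StreamingChangeEventSource<P extends TaskPartition, O extends OffsetContext> {
    void execute(P partition, O offsetContext);
}

// Hypothetical SQL Server partition identified by its database name.
final class SqlServerTaskPartition implements TaskPartition {
    private final String databaseName;

    SqlServerTaskPartition(String databaseName) {
        this.databaseName = databaseName;
    }

    @Override
    public String getDatabaseName() {
        return databaseName;
    }
}

final class SqlServerOffsetContext implements OffsetContext {
}

// The SQL Server source implements the partition-aware API and reads the database name
// from the partition it is handed rather than from connector-wide state.
final class SqlServerStreamingChangeEventSource
        implements StreamingChangeEventSource<SqlServerTaskPartition, SqlServerOffsetContext> {

    // Illustration only: records which databases were processed.
    final List<String> polledDatabases = new ArrayList<>();

    @Override
    public void execute(SqlServerTaskPartition partition, SqlServerOffsetContext offsetContext) {
        polledDatabases.add(partition.getDatabaseName());
    }
}
```

Under this approach, every implementation of the changed interface has to adopt the new signature at once, which is why the other plugins are removed for the time being.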

mikekamornikov commented

Closed via #22.

morozov changed the title from "Introduce a partitioned event source class hierarchy" to "Add partition awareness to base task components" on Mar 8, 2021