Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Remove old-style (direct) TCP replication support #11728

Open
richvdh opened this issue Jan 12, 2022 · 10 comments
Open

Remove old-style (direct) TCP replication support #11728

richvdh opened this issue Jan 12, 2022 · 10 comments
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. Z-Cleanup Things we want to get rid of, but aren't actively causing pain

Comments

@richvdh
Copy link
Member

richvdh commented Jan 12, 2022

the old direct-to-master TCP replication protocol has been deprecated since Synapse 1.18.0 (nearly a year and a half, as of this writing).

We still have a bunch of code to support it (DirectTcpReplicationClientFactory etc), and it's still mentioned in various bits of documentation (in particular https://matrix-org.github.io/synapse/develop/tcp_replication.html needs an update). Having multiple ways to do it is confusing and hard to maintain, so we should remove it properly.

@richvdh
Copy link
Member Author

richvdh commented Jan 12, 2022

( https://matrix-org.github.io/synapse/develop/tcp_replication.html claims "Currently all streams are written by a single process", which is no longer true. )

@H-Shay H-Shay added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Jan 13, 2022
@reivilibre
Copy link
Contributor

Notes from Patrick

  1. You have to deal w/ removing things from the config. Are these now errors to config, or things to just ignore? (Note that our upgrade guide says "Hey just leave the old config settings they'll be ignored if you used Redis!")
  2. There are worker docs to update.
  3. It is slightly intertwined with Redis stuff / just touches a lot of stuff.

Sounds to me like step 1 might be altering documentation to forget about TCP replication. We can then remove the then-undocumented TCP replication 😇.

@clokep
Copy link
Member

clokep commented Feb 11, 2022

A big stopper to this is that many of the tests still assume TCP replication. I took a look at this and it isn't easy to untangle. 😢

There's a BaseStreamTestCase and a BaseMultiWorkerStreamTestCase, they have a lot of duplicate code between them but I think only the second one actually uses redis. I got a decent ways there by copying and pasting some code around, but still had a handful of failures that I would probably need @erikjohnston to help me figure out.

@clokep
Copy link
Member

clokep commented Feb 16, 2022

More concrete steps of what to do for this:

  1. Update unit tests to not use the TCP replication classes (instead using the Redis ones). See BaseStreamTestCase. -- Reduce the number of tests using TCP replication #13543
  2. Update sytest so that the Synapse runners use Redis instead of TCP replication. -- Always use Redis with Synapse workers. sytest#1282
  3. Update Synapse CI to always use Redis for workers. -- Remove configuration options for direct TCP replication. #13647
  4. Remove the ability to configure TCP replication listeners. -- Remove configuration options for direct TCP replication. #13647
  5. Remove all the TCP replication protocol code.
  6. Rename some things which refer to TCP replication but are also used by Redis. -- Rename get_tcp_replication. #12192
  7. Update documentation. -- Remove configuration options for direct TCP replication. #13647

@clokep
Copy link
Member

clokep commented Mar 24, 2022

Update unit tests to not use the TCP replication classes (instead using the Redis ones). See BaseStreamTestCase.

@erikjohnston and I started some work on this in the clokep/test-redis branch. There are some inline notes of what needs to be done, but the gist is that reconnecting to Redis doesn't seem to be working properly and something is going wrong with our fake transports and such. 😢

@JacksonChen666
Copy link

Documentation is still somehow outdated using options that seem to be related to TCP replication workers

@clokep
Copy link
Member

clokep commented Sep 13, 2022

Documentation is still somehow outdated using options that seem to be related to TCP replication workers

@JacksonChen666 Can you please be more specific about which part of that page is outdated? It is quite long.

@JacksonChen666
Copy link

@clokep i might be completely wrong (now that i double checked), but the options with the names like worker_replication_host might be the outdated ones, but i think in this case i'm wrong (probably because of confusion with worker_replication_port in the upgrade notes)

@clokep
Copy link
Member

clokep commented Sep 13, 2022

@clokep i might be completely wrong (now that i double checked), but the options with the names like worker_replication_host might be the outdated ones, but i think in this case i'm wrong (probably because of confusion with worker_replication_port in the upgrade notes)

worker_replication_host is still needed -- it is used to make HTTP requests back to the main process. worker_replication_port is no longer used (as the upgrade notes say). It is confusing -- this is the first step towards making this all a bit easier!

@DMRobertson
Copy link
Contributor

DMRobertson commented Sep 30, 2022

All of the steps in #11728 (comment) are done---I think we can close this now?

Edit: oh, maybe not. This one doesn't have an associated PR

Remove all the TCP replication protocol code.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. Z-Cleanup Things we want to get rid of, but aren't actively causing pain
Projects
None yet
Development

No branches or pull requests

6 participants