Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit ReconnectNodeRemover synchronization #9831

Closed
artemananiev opened this issue Nov 10, 2023 · 1 comment
Closed

Revisit ReconnectNodeRemover synchronization #9831

artemananiev opened this issue Nov 10, 2023 · 1 comment

Comments

@artemananiev
Copy link
Member

This issue was found during investigation for #9764.

Current node remover threading model is as follows:

  • LearnerThread receives all the nodes from the teacher
  • On the same thread, nodes are deserialized and put to a) node remover b) reconnect hasher
  • The remover doesn't handle the nodes right away, but puts them to a queue
  • The queue is processed on a different thread
  • Once LearnerThread is done with its main loop, the node remover is closed. Some elements may still be unprocessed in the queue
  • In the fix for ReconnectNodeRemover.getRecordsToDelete() loops forever #9764 a call to process remaining elements in the queue was added

After #9764, it seems to work, but synchronization is still fuzzy. For example, if the hasher calls ReconnectNodeRemover.getRecordsToDelete(), and the remover is closed in a parallel threads, the queue thread is stopped, but there are calls to workQueue.pause() and workQueue.resume() in getRecordsToDelete(). What do these calls do if the queue thread is stopped? Probably, no-ops + error log messages, probably not.

Is a separate queue thread needed in node remover at all? Can node requests be handled right on the learner thread? This ticket is to find answers to questions like this.

@artemananiev
Copy link
Member Author

ReconnectNodeRemover has been re-worked completely as a part of #11231, it now doesn't use any queues or threads. This issue is no longer relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant