Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[INDIS]-check whether new load balancer executor is shutdown before using in dual-read mode #984

Merged
merged 2 commits into from
Feb 29, 2024

Conversation

brycezhongqing
Copy link
Collaborator

@brycezhongqing brycezhongqing commented Feb 29, 2024

Background

For samza job, the LoadBalancer will be shutdown during running.
2024-02-28 22:28:49.469 [main] DynamicClient [INFO] shutting down dynamic client
2024-02-28 22:28:49.469 [main] DualReadLoadBalancer [INFO] New load balancer successfully shut down
2024-02-28 22:28:49.470 [main] ZKFSLoadBalancer [INFO] Shutting down
2024-02-28 22:28:49.470 [Indis xDS client executor-4-1] XdsClientImpl [INFO] Shutting down

More context could check from :
https://linkedin-randd.slack.com/archives/C033U0SJFEX/p1709165894672129?thread_ts=1709075763.043899&cid=C033U0SJFEX

SI-37856

As the newLb executor in DualReadLoadBalancer is shut down, so DualReadLoadBalancer throws exceptions when it’s already shutdown but gets new tasks submitted to it. Then Samza code force-exits when uncaught exception is found.

Solution

So for _newLbExecutor in dual-read mode. We need to check whether is it shutdown before using to avoid new tasks process.

Test done

Regression test in local on toki-restli-backend for dual-read mode

Here is the full log:

toki-restli-backend.log

curli -v "https://localhost:18793/TokiRestliBackend/resources/tokiBackend?action=tokiServer" -X POST --data '{"message": "Hello", "delay": "100"}' --dv-auth SELF

Xnip2024-02-28_20-14-55
Untitled

Follow up

As it’s hard to catch the timing for when the dual read LB is not shutdown yet, but the newLbExecutor is shut down. So we could not get the newLb executor is shutdown already. Skipping getClient on newLb executor. log using mint undeploy method.
Will double check with samza team to check their deployment is no problem.

@bohhyang
Copy link
Contributor

bohhyang commented Feb 29, 2024

Code LGTM. Please test locally with toki and attach the log here. You can run mint undeploy in one terminal, send a request to it in another terminal during the shutdown. The log will show the messages you added.

@brycezhongqing
Copy link
Collaborator Author

brycezhongqing commented Feb 29, 2024

Code LGTM. Please test locally with toki and attach the log here. You can run mint undeploy in one terminal, send a request to it in another terminal during the shutdown. The log will show the messages you added.

Hi, @bohhyang Attach the log
And I also tried

run mint undeploy in one terminal, send a request to it in another terminal during the shutdown. The log will show the messages you added

It seems not work. During mint undeploy, I continuesly sent reqeust to service, but I could not find any shutdown info log even following:

XdsClientImpl [INFO] Shutting down

@bohhyang
Copy link
Contributor

What responses you get for those requests you send at shutdown? Could you set a breakpoint at getClient and see if it stops for the requests at shutdown?

Please attach the log file.

@brycezhongqing
Copy link
Collaborator Author

What responses you get for those requests you send at shutdown? Could you set a breakpoint at getClient and see if it stops for the requests at shutdown?

Please attach the log file.

Thanks. Updated the description and attach the log file

Copy link
Contributor

@bohhyang bohhyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. In the future, upload the log file for long-term reference since pasteIn will expire in a few days.

@brycezhongqing brycezhongqing merged commit 4e5b95d into master Feb 29, 2024
2 checks passed
@brycezhongqing brycezhongqing deleted the zhonchen/fix_newLB_shutdown_case branch February 29, 2024 18:07
@brycezhongqing
Copy link
Collaborator Author

brycezhongqing commented Feb 29, 2024

thanks. In the future, upload the log file for long-term reference since pasteIn will expire in a few days.

Thanks for reminder. Upload the log file instead of pastein

@bohhyang
Copy link
Contributor

bohhyang commented Mar 5, 2024

toki-restli-backend.log
curli -v "https://localhost:18793/TokiRestliBackend/resources/tokiBackend?action=tokiServer" -X POST --data '{"message": "Hello", "delay": "100"}' --dv-auth SELF

@brycezhongqing you were hitting toki backend app, which is wrong, and looking at the wrong log file. It should be toki frontend app. The call stack is toki frontend uses d2 client to call toki backend. Toki backend doesn't make downstream calls at all so there is no d2 service discovery happening there.

Please check the readme of Toki https://github.com/linkedin-multiproduct/toki, and use the "Querying toki" curli call "curli -v "https://localhost:18792/Toki/resources/**toki**?action=**tokiRestClient**". And the valid log is toki-war.log.

You can deploy only toki frontend with "mint deploy -w toki-war -f qei-ltx1", which will send requests to EI toki-backend, so you won't mix the two apps.

@brycezhongqing
Copy link
Collaborator Author

brycezhongqing commented Mar 5, 2024

toki-restli-backend.log curli -v "https://localhost:18793/TokiRestliBackend/resources/tokiBackend?action=tokiServer" -X POST --data '{"message": "Hello", "delay": "100"}' --dv-auth SELF

@brycezhongqing you were hitting toki backend app, which is wrong, and looking at the wrong log file. It should be toki frontend app. The call stack is toki frontend uses d2 client to call toki backend. Toki backend doesn't make downstream calls at all so there is no d2 service discovery happening there.

Please check the readme of Toki https://github.com/linkedin-multiproduct/toki, and use the "Querying toki" curli call "curli -v "https://localhost:18792/Toki/resources/**toki**?action=**tokiRestClient**". And the valid log is toki-war.log.

You can deploy only toki frontend with "mint deploy -w toki-war -f qei-ltx1", which will send requests to EI toki-backend, so you won't mix the two apps.

Thanks @bohhyang for ur correction. Yes. U are right, toki backend will only echo the message (checked from the code)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants