Lazy Loading in Environments Results in Unknown Worker #979

Closed
lamxw2 opened this issue Oct 26, 2023 · 1 comment
Labels
area:oss Related to Oban OSS closed:wontfix This will not be worked on kind:bug Something isn't working

Comments

lamxw2 commented Oct 26, 2023

Precheck

Environment

  • Oban Version: 2.15.2
  • PostgreSQL Version: 15.3
  • Elixir & Erlang/OTP Versions (elixir --version): 1.12.2 & Erlang/OTP 24

Current Behavior

I can insert multiple jobs with the same worker, but the jobs occasionally fail with an "unknown worker" error in the staging environment.

The error recorded in the oban_jobs table is:

{"{\"at\": \"2023-10-24T11:29:35.194865Z\", \"error\": \"** (RuntimeError) unknown worker: <my worker module name>\", \"attempt\": 1}"}

The module definitely exists, as some jobs with the same worker execute successfully. Following this step from the above ElixirForum link also confirms that the module is valid.

When inspecting the oban_jobs table for differences between the successful and errored jobs, I noticed that jobs attempted by the node with my machine name (I think that is the local node? I'm not very familiar with OTP) start successfully, while jobs attempted by nodes named elixir-node fail.

I dug into the Oban code and noted that the worker string-to-module conversion is done using Module.safe_concat, which only resolves to atoms that already exist. The failure therefore (probably) indicates that the worker module had not been loaded yet on that node, causing the job to fail with the unknown worker error.

Based on our method of deployment, I think this issue would only occur locally or in staging and not in production (due to lazy module loading outside of a mix release). However, it would be great if job execution were not flaky, as it affects testing.
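To illustrate the behaviour described above, here is a minimal sketch of how a string-to-module lookup built on Module.safe_concat fails when the target atom has never been created in the running VM. ResolveSketch and the worker name are hypothetical names used for illustration only; this is not Oban's actual implementation.

```elixir
# Hypothetical module for illustration; not Oban's code.
defmodule ResolveSketch do
  # Module.safe_concat/1 relies on the atom already existing in the VM.
  # If the module was never loaded (and nothing else created the atom),
  # it raises ArgumentError, which is mapped to an error tuple here.
  def resolve(name) when is_binary(name) do
    {:ok, name |> String.split(".") |> Module.safe_concat()}
  rescue
    ArgumentError -> {:error, "unknown worker: #{name}"}
  end
end

# Enum is always loaded, so its atom exists and the lookup succeeds.
IO.inspect(ResolveSketch.resolve("Enum"))

# This name is only ever written as a string, so no matching atom exists
# and the lookup falls into the rescue clause.
IO.inspect(ResolveSketch.resolve("MyApp.Workers.NeverLoadedWorker"))
```

In interactive (lazy) mode a module's atom may not exist on a node until something references or loads that module, which would be consistent with the same worker succeeding on one node and failing on another.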

Expected Behavior

Ensure the worker module is loaded before worker resolution gives up and returns an unknown worker error.

@sorentwo
Member

Thanks for the detailed issue!

Calling Code.ensure_loaded for every worker resolution will hit a bottleneck at the code server. That bottleneck will slow down all job execution, which isn't acceptable when there are other solutions.

The correct way to handle this is to run in embedded mode (not lazy) in all production-ish environments to ensure all modules are loaded.
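For environments that cannot easily switch to embedded mode, one possible workaround is to load all of an application's modules once at boot, so the code server is not consulted on every job. The sketch below is an alternative technique, not the embedded-mode approach recommended above, and PreloadSketch and :my_app are hypothetical names. It relies on Application.spec/2 and :code.ensure_modules_loaded/1.

```elixir
# Hypothetical helper for illustration; not part of Oban.
defmodule PreloadSketch do
  # Eagerly loads every module listed in the application's spec.
  # Returns :ok, {:error, [{module, reason}]} from the code server,
  # or {:error, :app_not_loaded} when the application spec is unavailable.
  def preload(app) when is_atom(app) do
    case Application.spec(app, :modules) do
      nil -> {:error, :app_not_loaded}
      modules -> :code.ensure_modules_loaded(modules)
    end
  end
end
```

A call such as PreloadSketch.preload(:my_app) could run in the application's start callback before the Oban supervisor starts, assuming :my_app is the application that defines the workers.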

sorentwo closed this as not planned on Oct 26, 2023
sorentwo added the kind:bug, closed:wontfix, and area:oss labels on Oct 26, 2023