
race condition in bootstrapper #722

@dhellmann

Description


We have seen some failures in fromager's bootstrap command due to a race condition with re-resolving top-level dependencies.

The error is:

2025-08-21 15:13:01,699 DEBUG:fromager.__main__:258: llama_stack_provider_lmeval: could not handle toplevel dependency llama_stack_provider_lmeval (0.2.2)
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 488, in _handle_build_requirements
    self.bootstrap(req=dep, req_type=build_type)
  File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 135, in bootstrap
    self._add_to_graph(req, req_type, resolved_version, source_url)
  File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 934, in _add_to_graph
    self.ctx.dependency_graph.add_dependency(
  File "/opt/app-root/lib64/python3.12/site-packages/fromager/dependency_graph.py", line 235, in add_dependency
    raise ValueError(
ValueError: Trying to add setuptools==80.8.0 to parent llama-stack-provider-lmeval==0.2.2 but llama-stack-provider-lmeval==0.2.2 does not exist

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.12/site-packages/fromager/__main__.py", line 256, in invoke_main
    main(auto_envvar_prefix="FROMAGER")
  File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/click/decorators.py", line 46, in new_func
    return f(get_current_context().obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/click/decorators.py", line 34, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/fromager/commands/bootstrap.py", line 482, in bootstrap_parallel
    ctx.invoke(
  File "/opt/app-root/lib64/python3.12/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/click/decorators.py", line 46, in new_func
    return f(get_current_context().obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/fromager/commands/bootstrap.py", line 184, in bootstrap
    bt.bootstrap(req, requirements_file.RequirementType.TOP_LEVEL)
  File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 256, in bootstrap
    self._prepare_build_dependencies(req, sdist_root_dir, build_env)
  File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 433, in _prepare_build_dependencies
    self._handle_build_requirements(
  File "/opt/app-root/lib64/python3.12/site-packages/fromager/bootstrapper.py", line 490, in _handle_build_requirements
    raise ValueError(f"could not handle {self._explain}") from err
ValueError: could not handle toplevel dependency llama_stack_provider_lmeval (0.2.2)

I think the problem here is a race condition with resolving llama_stack_provider_lmeval.

Looking at the logs, I see it first pick up version 0.2.1 and then later 0.2.2. I recently made a change that makes the bootstrapper always use the same version for a given requirement specifier, so that inconsistency should now be eliminated.
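The version-pinning idea can be sketched as a cache keyed by requirement specifier, where the first resolution wins. `PinnedResolver` and its `resolve_fn` callback are hypothetical names for illustration, not fromager's actual API, and the lock is an assumption about how concurrent callers would be kept consistent.

```python
import threading
from typing import Callable


class PinnedResolver:
    """Hypothetical cache: the first resolution of a specifier is reused."""

    def __init__(self, resolve_fn: Callable[[str], str]) -> None:
        # resolve_fn stands in for whatever actually queries the package index.
        self._resolve_fn = resolve_fn
        self._pins: dict[str, str] = {}
        self._lock = threading.Lock()

    def resolve(self, specifier: str) -> str:
        # Holding the lock across the lookup and the resolution means two
        # callers racing on the same specifier cannot get different answers.
        with self._lock:
            if specifier not in self._pins:
                self._pins[specifier] = self._resolve_fn(specifier)
            return self._pins[specifier]


# Simulate an index that publishes 0.2.2 between two lookups.
answers = iter(["0.2.1", "0.2.2"])
resolver = PinnedResolver(lambda spec: next(answers))
first = resolver.resolve("llama_stack_provider_lmeval")
second = resolver.resolve("llama_stack_provider_lmeval")
print(first, second)
```

Holding one lock across the resolution serializes every lookup; a per-specifier lock would allow more parallelism at the cost of extra bookkeeping.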

It's troubling, though, that version 0.2.2 wasn't automatically added to the graph. I think that's because llama_stack_provider_lmeval is a top-level dependency, and top-level dependencies are added to the graph outside of the bootstrapper, when the bootstrap command starts up and resolves them all. Later, when the same requirement is resolved again, it gets a different answer, and that version is not already in the graph.
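The sequence above can be reproduced with a small model. This is a sketch only: `ToyGraph`, `add_to_graph`, and the requirement-type strings are hypothetical stand-ins rather than fromager's real classes; the early return mirrors the behavior described for `_add_to_graph`, and the error text copies the message from the traceback.

```python
from typing import Optional


class ToyGraph:
    """Hypothetical stand-in for the dependency graph, keyed by name==version."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.edges: list[tuple[str, str]] = []

    def add_dependency(self, parent: Optional[str], child: str) -> None:
        # Same invariant as in the traceback: a parent node must already exist.
        if parent is not None and parent not in self.nodes:
            raise ValueError(
                f"Trying to add {child} to parent {parent} but {parent} does not exist"
            )
        self.nodes.add(child)
        if parent is not None:
            self.edges.append((parent, child))


def add_to_graph(graph: ToyGraph, req_type: str, parent: Optional[str], child: str) -> None:
    # The early return described in the issue: top-level packages are assumed
    # to be in the graph already, whatever version they resolve to now.
    if req_type == "toplevel":
        return
    graph.add_dependency(parent, child)


graph = ToyGraph()
# 1. At startup the bootstrap command resolves the top-level requirement to 0.2.1.
graph.add_dependency(None, "llama-stack-provider-lmeval==0.2.1")
# 2. Later the same requirement resolves to 0.2.2, but the early return skips it.
add_to_graph(graph, "toplevel", None, "llama-stack-provider-lmeval==0.2.2")
# 3. Adding a build dependency under the 0.2.2 parent then fails.
message = ""
try:
    add_to_graph(
        graph, "build-system",
        "llama-stack-provider-lmeval==0.2.2", "setuptools==80.8.0",
    )
except ValueError as err:
    message = str(err)
print(message)
```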

In bootstrapper.py, _add_to_graph returns early if the dependency type is top-level, because it assumes that those packages are already in the graph. I think that's a mistake; it needs a more careful check that includes the version number.
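One possible shape for the more careful check, sketched with a plain set of `name==version` node keys and a list of edges. The function and its parameters are hypothetical, not fromager's code, and names are assumed to be normalized already (the graph keys in the traceback use hyphens, while the requirement uses underscores).

```python
from typing import Optional


def add_to_graph(
    nodes: set[str],
    edges: list[tuple[str, str]],
    req_type: str,
    parent: Optional[str],
    name: str,
    version: str,
) -> None:
    """Hypothetical version-aware replacement for the top-level early return."""
    key = f"{name}=={version}"
    if req_type == "toplevel":
        # Only skip the add when this exact name==version node is present;
        # a top-level requirement that re-resolved to a new version gets a node.
        if key not in nodes:
            nodes.add(key)
        return
    if parent is None or parent not in nodes:
        raise ValueError(
            f"Trying to add {key} to parent {parent} but {parent} does not exist"
        )
    nodes.add(key)
    edges.append((parent, key))


# The bootstrap command added 0.2.1 up front; the package later re-resolves to 0.2.2.
nodes = {"llama-stack-provider-lmeval==0.2.1"}
edges: list[tuple[str, str]] = []
add_to_graph(nodes, edges, "toplevel", None, "llama-stack-provider-lmeval", "0.2.2")
add_to_graph(
    nodes, edges, "build-system",
    "llama-stack-provider-lmeval==0.2.2", "setuptools", "80.8.0",
)
print(sorted(nodes))
```

With the version in the check, the build dependency attaches to the 0.2.2 node instead of failing.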

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
