docker storage deployment randomly fails No such file or directory #13490

Closed
4 tasks done
mgsnuno opened this issue May 21, 2024 · 2 comments
Labels
bug Something isn't working

Comments

mgsnuno commented May 21, 2024

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the Prefect documentation for this issue.
  • I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

  • A flow deployment with Docker image storage randomly fails while loading flow code from storage.
  • It looks as if the flow-code loading step randomly does NOT run in the worker, which causes this issue.
  • I saw similar random failures with GitHub storage, but there they were dependency import errors, as if the flow was not running in the specified image.

Reproduction

  • deploy a flow with docker storage
deployment_id = flow.deploy(
  name=self.deploy_name,
  image=self.image_name,
)
  • start a flow run in the prefect server
  • workers will randomly fail to start the flow with the error below

Error

Downloading flow code from storage at '.'
Flow could not be retrieved from deployment.
Traceback (most recent call last):
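  File "<frozen importlib._bootstrap_external>", line 879, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1016, in get_code
  File "<frozen importlib._bootstrap_external>", line 1073, in get_data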
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp1la5862wprefect/pipelines/flows/test.py'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 422, in retrieve_flow_then_begin_flow_run
    else await load_flow_from_flow_run(flow_run, client=client)
  File "/usr/local/lib/python3.10/site-packages/prefect/client/utilities.py", line 100, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect/deployments/deployments.py", line 317, in load_flow_from_flow_run
    flow = await run_sync_in_worker_thread(load_flow_from_entrypoint, str(import_path))
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 136, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect/flows.py", line 1668, in load_flow_from_entrypoint
    flow = import_object(entrypoint)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/importtools.py", line 201, in import_object
    module = load_script_as_module(script_path)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/importtools.py", line 164, in load_script_as_module
    raise ScriptError(user_exc=exc, path=path) from exc
prefect.exceptions.ScriptError: Script at 'pipelines/flows/test.py' encountered an exception: FileNotFoundError(2, 'No such file or directory')

Versions

Version:             2.19.1
API version:         0.8.4
Python version:      3.10.14
Git commit:          17a1b1d8
Built:               Thu, May 16, 2024 3:33 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         server

Additional context

  • kubernetes worker
  • using docker storage for flow code with these lines added at the end of the Dockerfile:
    COPY . /opt/prefect/flows/
    WORKDIR /opt/prefect/flows/
    
  • migrated to docker storage from github storage because of similar random errors while loading the flow code; with git storage they were dependency import errors, as if the flow was not running in the specified image.
  • no issues with old agent/blocks code
@mgsnuno mgsnuno added bug Something isn't working needs:triage labels May 21, 2024

mgsnuno commented May 21, 2024

  • on a normally working deployment run, the flow logs start with:
Worker 'KubernetesWorker bdbd6bb1-2083-4f55-ab1b-77ff81b20182' submitting flow run 'a4c220af-e62b-481b-bfab-64e5cd8a1fd6'
Creating Kubernetes job...
Completed submission of flow run 'a4c220af-e62b-481b-bfab-64e5cd8a1fd6'
Job 'transparent-llama-44pb7': Pod has status 'Pending'.
Job 'transparent-llama-44pb7': Pod has status 'Running'.
Opening process...
Downloading flow code from storage at '.'

....

  • on a failed deployment run, the flow logs start straight away with:
Downloading flow code from storage at '.'
Flow could not be retrieved from deployment.
Traceback (most recent call last):
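  File "<frozen importlib._bootstrap_external>", line 879, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1016, in get_code
  File "<frozen importlib._bootstrap_external>", line 1073, in get_data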
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp329n4crpprefect/pipelines/flows/test.py'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 422, in retrieve_flow_then_begin_flow_run
    else await load_flow_from_flow_run(flow_run, client=client)
  File "/usr/local/lib/python3.10/site-packages/prefect/client/utilities.py", line 100, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect/deployments/deployments.py", line 317, in load_flow_from_flow_run
    flow = await run_sync_in_worker_thread(load_flow_from_entrypoint, str(import_path))
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/asyncutils.py", line 136, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/prefect/flows.py", line 1668, in load_flow_from_entrypoint
    flow = import_object(entrypoint)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/importtools.py", line 201, in import_object
    module = load_script_as_module(script_path)
  File "/usr/local/lib/python3.10/site-packages/prefect/utilities/importtools.py", line 164, in load_script_as_module
    raise ScriptError(user_exc=exc, path=path) from exc
prefect.exceptions.ScriptError: Script at 'pipelines/flows/test.py' encountered an exception: FileNotFoundError(2, 'No such file or directory')

Summary: it is as if the worker skips some steps. Any pointers on where the problem might be?

EDIT: after further investigation I noticed that when a run errors, the logs shown in the Prefect server UI do not appear in the worker CLI logs at all. Any pointers on this erratic behavior?
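
The difference between the two log excerpts above can be checked mechanically: the working run begins with a "Worker '...' submitting flow run" line, while the failing run begins directly with the flow-code download message. A minimal sketch, assuming the flow-run log lines are already available as a list of strings. The function name and marker constants are hypothetical helpers, not part of Prefect:

```python
from typing import Iterable

# Markers copied from the working run's first log line in this issue.
WORKER_SUBMIT_PREFIX = "Worker '"
WORKER_SUBMIT_MARKER = "submitting flow run"


def submitted_by_worker(log_lines: Iterable[str]) -> bool:
    """Return True if the first non-empty log line is a worker submission line."""
    for line in log_lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines before the first real entry
        return (
            stripped.startswith(WORKER_SUBMIT_PREFIX)
            and WORKER_SUBMIT_MARKER in stripped
        )
    return False  # no log lines at all
```

A run for which this returns False was probably not submitted by the worker whose CLI logs are being watched, which is consistent with the observation that failing runs never show up there.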

mgsnuno commented May 24, 2024

SOLVED: an old agent (still running) was "stealing" flow runs from the worker's work pool, because the work queue names were the same.
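
For anyone migrating from agents to workers, a quick way to spot this kind of collision is to list which processes poll which queue names and flag any name that has more than one poller. The sketch below is plain Python over a hand-written inventory. The labels and queue names are hypothetical, and it does not call the Prefect API:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_queue_names(
    pollers: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each queue name polled by more than one process to those processes.

    `pollers` is an iterable of (poller_label, queue_name) pairs.
    """
    by_queue: Dict[str, List[str]] = defaultdict(list)
    for label, queue_name in pollers:
        by_queue[queue_name].append(label)
    # Keep only the queue names with two or more pollers.
    return {name: labels for name, labels in by_queue.items() if len(labels) > 1}


# Hypothetical inventory, not taken from the issue.
pollers = [
    ("legacy agent", "default"),
    ("kubernetes worker", "default"),
    ("kubernetes worker", "high-priority"),
]
print(find_shared_queue_names(pollers))
```

Any queue name reported here is a candidate for renaming, or the legacy agent can be stopped once the migration to workers is complete.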
