[BUG] loader must define exec_module() when running Databricks task #3853

rambrus opened this issue Jul 10, 2023 · 4 comments
Labels
bug Something isn't working flytekit FlyteKit Python related issue


rambrus commented Jul 10, 2023

Describe the bug

BACKGROUND
I'm trying to run a simplified Flyte task using the Databricks plugin.

PREREQUISITES:

import flytekit
from flytekit import Resources, task, workflow
from flytekitplugins.spark import Databricks


@task(
    task_config=Databricks(
        databricks_conf={
            "run_name": "dbx simplified example",
            "existing_cluster_id": "<my-existing-cluster-id>",
            "timeout_seconds": 3600,
            "max_retries": 1,
        }
    ),
    limits=Resources(mem="2000M"),
    cache_version="1",
)
def print_spark_config():
    # The Spark session is exposed through the task's execution context.
    spark = flytekit.current_context().spark_session
    print(spark.sparkContext.getConf().getAll())


@workflow
def my_databricks_job():
    print_spark_config()

STEPS:

  • Run workflow: pyflyte --verbose run --remote --destination-dir . dbx_simplified_example.py my_databricks_job

ISSUE:
The Databricks job run was triggered but failed with this error:
TypeError: loader must define exec_module()

ERROR LOG:

TypeError: loader must define exec_module()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<command--1> in <cell line: 12>()
     11 
     12 with open(filename, "rb") as f:
---> 13   exec(compile(f.read(), filename, 'exec'))
     14 

/tmp/tmptkt83uf9.py in <module>
      4 
      5 import click
----> 6 from flytekit.bin.entrypoint import fast_execute_task_cmd as _fast_execute_task_cmd
      7 from flytekit.bin.entrypoint import execute_task_cmd as _execute_task_cmd
      8 from flytekit.exceptions.user import FlyteUserException

/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
    169             # Import the desired module. If you’re seeing this while debugging a failed import,
    170             # look at preceding stack frames for relevant error information.
--> 171             original_result = python_builtin_import(name, globals, locals, fromlist, level)
    172 
    173             is_root_import = thread_local._nest_level == 1

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/flytekit/__init__.py in <module>
    226 from flytekit.core.workflow import ImperativeWorkflow as Workflow
    227 from flytekit.core.workflow import WorkflowFailurePolicy, reference_workflow, workflow
--> 228 from flytekit.deck import Deck
    229 from flytekit.image_spec import ImageSpec
    230 from flytekit.loggers import logger

/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
    169             # Import the desired module. If you’re seeing this while debugging a failed import,
    170             # look at preceding stack frames for relevant error information.
--> 171             original_result = python_builtin_import(name, globals, locals, fromlist, level)
    172 
    173             is_root_import = thread_local._nest_level == 1

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/flytekit/deck/__init__.py in <module>
     16 
     17 from .deck import Deck
---> 18 from .renderer import TopFrameRenderer

/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
    169             # Import the desired module. If you’re seeing this while debugging a failed import,
    170             # look at preceding stack frames for relevant error information.
--> 171             original_result = python_builtin_import(name, globals, locals, fromlist, level)
    172 
    173             is_root_import = thread_local._nest_level == 1

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/flytekit/deck/renderer.py in <module>
     10     import pyarrow
     11 else:
---> 12     pandas = lazy_module("pandas")
     13     pyarrow = lazy_module("pyarrow")
     14 

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/flytekit/lazy_import/lazy_module.py in lazy_module(fullname)
     25     # https://docs.python.org/3/library/importlib.html#implementing-lazy-imports
     26     spec = importlib.util.find_spec(fullname)
---> 27     loader = importlib.util.LazyLoader(spec.loader)
     28     spec.loader = loader
     29     module = importlib.util.module_from_spec(spec)

/usr/lib/python3.9/importlib/util.py in __init__(self, loader)
    280 
    281     def __init__(self, loader):
--> 282         self.__check_eager_loader(loader)
    283         self.loader = loader
    284 

/usr/lib/python3.9/importlib/util.py in __check_eager_loader(loader)
    271     def __check_eager_loader(loader):
    272         if not hasattr(loader, 'exec_module'):
--> 273             raise TypeError('loader must define exec_module()')
    274 
    275     @classmethod

TypeError: loader must define exec_module()

Expected behavior

The Databricks job is triggered and the workflow completes successfully.

Additional context to reproduce

I suspect the entrypoint.py referred to in the Databricks Plugin Setup guide is not compatible with Flyte 1.7.0.

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
rambrus added the bug and untriaged labels Jul 10, 2023

welcome bot commented Jul 10, 2023

Thank you for opening your first issue here! 🛠


rambrus commented Jul 26, 2023

I did some research and found that the problem comes from this change: flyteorg/flytekit#1590

The problem occurs when the lazy_module function is called on pandas.

I see that Flyte uses this approach for lazy imports: https://docs.python.org/3/library/importlib.html#implementing-lazy-imports

I tried to isolate the issue by creating a notebook with this function:

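import importlib.util
import sys


def lazy_import(name):
    spec = importlib.util.find_spec(name)
    # LazyLoader raises TypeError if spec.loader has no exec_module().
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # Execution is deferred until the first attribute access on the module.
    loader.exec_module(module)
    return module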

Then I tried to lazy_import several libraries:
lazy_import("numpy") -> OK
lazy_import("requests") -> OK
lazy_import("pyarrow") -> OK
lazy_import("scipy") -> OK
lazy_import("pandas") -> TypeError: loader must define exec_module()
lazy_import("sklearn") -> TypeError: loader must define exec_module()

It seems some libraries cannot be lazily imported in Databricks Runtimes. I could not reproduce the issue on my local machine, so I suspect Databricks overrides some of the import machinery in a way that conflicts with lazy_import.
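One way to narrow this down is to look at which loader find_spec returns for each library and whether it has exec_module. The snippet below is a minimal diagnostic sketch, not part of flytekit; describe_loader is a hypothetical helper name, and the module names are only examples to run in both the Databricks notebook and a local interpreter.

```python
import importlib.util


def describe_loader(name):
    """Return (loader type name, has exec_module) for a module, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    loader = spec.loader
    return type(loader).__name__, hasattr(loader, "exec_module")


# Compare the output between environments; a False in the second field means
# importlib.util.LazyLoader will refuse to wrap that loader.
for name in ("json", "pandas", "sklearn"):
    print(name, describe_loader(name))
```

A module that is not installed is reported as None, so the loop is safe to run anywhere.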

@pingsutw Does that sound familiar?


rambrus commented Jul 26, 2023

It seems Databricks defines a PostImportHook that is applied to some libraries (e.g. pandas, sklearn). A PostImportHook object is returned as the loader when looking up the spec, and that object does not have an exec_module attribute.
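The same TypeError can be reproduced outside Databricks with any loader object that lacks exec_module. The sketch below uses a hypothetical stand-in class, HookWithoutExecModule; it does not reproduce the real Databricks PostImportHook and only shares the one property described above.

```python
import importlib.util


class HookWithoutExecModule:
    """Hypothetical stand-in: an object used as a loader that has no exec_module()."""


def try_wrap(loader):
    """Try to wrap a loader in LazyLoader and report the outcome as a string."""
    try:
        importlib.util.LazyLoader(loader)
    except TypeError as exc:
        return str(exc)
    return "ok"


print(try_wrap(HookWithoutExecModule()))  # prints: loader must define exec_module()
```

This matches the last frames of the traceback: LazyLoader checks for exec_module in its constructor, before any module code runs.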

image


rambrus commented Jul 26, 2023

Adding import pandas to entrypoint.py is a quick-and-dirty workaround.
That said, I guess we want to avoid importing pandas eagerly.

Let me follow up on this issue with the Databricks team.
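A less invasive option might be to fall back to an eager import only when the loader cannot be wrapped. The function below is a sketch under that assumption, not the actual flytekit fix; lazy_module_with_fallback is a hypothetical name, and it assumes that a plain import still succeeds for the affected libraries, as it did with the import pandas workaround.

```python
import importlib
import importlib.util
import sys


def lazy_module_with_fallback(fullname):
    """Lazily import a module, importing eagerly if its loader lacks exec_module()."""
    if fullname in sys.modules:
        return sys.modules[fullname]
    spec = importlib.util.find_spec(fullname)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {fullname!r}")
    if spec.loader is None or not hasattr(spec.loader, "exec_module"):
        # LazyLoader would raise TypeError for this loader, so import eagerly.
        return importlib.import_module(fullname)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[fullname] = module
    # Module execution is deferred until the first attribute access.
    loader.exec_module(module)
    return module
```

With this shape, libraries such as numpy would stay lazy while pandas and sklearn would be imported eagerly only in environments where their loader has no exec_module.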

thomasjpfan added the flytekit label and removed the untriaged label Dec 5, 2023