[KED-2643] kedro install fails on pyspark starter #767

Closed
glebrh opened this issue May 16, 2021 · 5 comments
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed

Comments

@glebrh

glebrh commented May 16, 2021

Description

After creating a new Kedro project in a brand-new conda environment using the pyspark starter, kedro install fails.
It seems that Kedro tries to import the module containing the project context (which imports from pyspark) and fails, since PySpark is not yet installed.
Other CLI commands (e.g. kedro --version) also fail with the same error when executed inside the project's directory.

Steps to Reproduce

  • Create a new environment
  • pip install kedro
  • kedro new --starter=pyspark
  • cd to project's directory
  • kedro install

Expected Result

kedro install installs the packages specified in requirements.txt.
I'm not sure why the CLI loads the project settings here. I guess there are several CLI commands that do need to care about project specifics anyway.

Actual Result

Error with the following stacktrace:

Traceback (most recent call last):
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/bin/kedro", line 8, in <module>
    sys.exit(main())
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 268, in main
    cli_collection = KedroCLI(project_path=Path.cwd())
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 181, in __init__
    self._metadata = bootstrap_project(project_path)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/startup.py", line 181, in bootstrap_project
    configure_project(metadata.package_name)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/project/__init__.py", line 218, in configure_project
    _validate_module(settings_module)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/site-packages/kedro/framework/project/__init__.py", line 210, in _validate_module
    importlib.import_module(settings_module)
  File "/Users/glebsmolnik/anaconda3/envs/testpysparkstarter/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/glebsmolnik/PycharmProjects/testpysparkstarter/pyspark_test/src/pyspark_test/settings.py", line 30, in <module>
    from pyspark_test.context import ProjectContext
  File "/Users/glebsmolnik/PycharmProjects/testpysparkstarter/pyspark_test/src/pyspark_test/context.py", line 34, in <module>
    from pyspark import SparkConf
ModuleNotFoundError: No module named 'pyspark'
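
The chain in the traceback (settings module imported, which imports the context module, which imports a package that is not installed) can be reproduced without Kedro or PySpark. The following is only a minimal sketch of that mechanism: the package name `demo_project_pkg`, the module `hypothetical_missing_dependency`, and the helper `try_import_settings` are all made up for illustration and are not part of Kedro.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import_settings(module_name: str) -> str:
    """Import a settings module and report which dependency is missing, if any."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # exc.name is the module that could not be found; it may be a
        # transitive dependency rather than module_name itself.
        return f"missing dependency: {exc.name}"
    return "ok"


def demo() -> str:
    """Build a throwaway package whose settings.py indirectly needs a missing module."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_project_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # context.py plays the role of the starter's context.py importing pyspark.
        (pkg / "context.py").write_text("import hypothetical_missing_dependency\n")
        (pkg / "settings.py").write_text("from demo_project_pkg.context import *\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return try_import_settings("demo_project_pkg.settings")
        finally:
            # Undo the changes to the interpreter state made for the demo.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.startswith("demo_project_pkg")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the error surfaces while the settings module is being imported, before any command-specific code has a chance to run.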

Your Environment

macOS Catalina (originally encountered on Windows 10).
PyCharm CE 2020.3.2
Conda environment (Python 3.7.10)
Kedro 0.17.3

@glebrh glebrh added the Issue: Bug Report 🐞 Bug that needs to be fixed label May 16, 2021
@thver

thver commented May 17, 2021

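Maybe something like this in context.py of the pyspark starter:

import logging

try:
    from pyspark import SparkConf
    from pyspark.sql import SparkSession
except ModuleNotFoundError:
    # PySpark is not installed yet (e.g. before the first `kedro install`);
    # warn instead of crashing so that CLI commands can still run.
    logging.warning("This starter requires PySpark to function. "
                    "Run 'kedro install' to install project dependencies.")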

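A guarded import like the one suggested above can also be written as a small reusable helper. This is a rough sketch under assumptions, not the code that was eventually merged: `optional_import` is a hypothetical helper, and `hypothetical_spark_like_module` stands in for pyspark so that the example runs whether or not PySpark is installed.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional

logger = logging.getLogger(__name__)


def optional_import(module_name: str, hint: str) -> Optional[ModuleType]:
    """Return the imported module, or None (after logging a warning) if it is absent."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        logger.warning("%s is not installed. %s", module_name, hint)
        return None


INSTALL_HINT = "Run 'kedro install' to install project dependencies."

# Stand-in for a dependency that is not installed yet: a warning is logged, None is returned.
spark_module = optional_import("hypothetical_spark_like_module", INSTALL_HINT)

# A module that is always available (standard library) is returned as usual.
json_module = optional_import("json", INSTALL_HINT)
```

With either form of the guard, the guarded names are unavailable when the import fails, so any code that uses them still has to run only after the dependencies are installed.
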
@merelcht
Member

Hi @glebrh, thanks for flagging this issue! This indeed isn't working properly. I've created a ticket on our backlog to address it.

@merelcht merelcht changed the title from "kedro install fails on pyspark starter" to "[KED-2643] kedro install fails on pyspark starter" May 24, 2021
@stale

stale bot commented Jul 23, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@ignacioparicio
Contributor

Thank you @glebrh for bringing this up! We just merged this PR with a quick fix (thanks @thver for the inspiration!). It will become effective once Kedro 0.17.5 is released. We will also be revisiting the flow triggered when calling any CLI command to potentially replace this quick fix with a more robust solution.
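
One possible shape for such a flow is to load project settings only for commands that declare they need them. The sketch below is purely illustrative and is not Kedro's actual design or API: `Command`, `dispatch`, `failing_loader`, and `COMMANDS` are hypothetical names, and the failing loader simulates a settings import that breaks because pyspark is missing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Command:
    """A CLI command plus a flag saying whether it needs project settings."""
    handler: Callable[[Optional[dict]], str]
    needs_settings: bool


def dispatch(name: str, commands: Dict[str, Command], load_settings: Callable[[], dict]) -> str:
    """Run a command, importing project settings only when the command requires them."""
    command = commands[name]
    settings = load_settings() if command.needs_settings else None
    return command.handler(settings)


def failing_loader() -> dict:
    # Simulates importing settings.py before project dependencies are installed.
    raise ModuleNotFoundError("No module named 'pyspark'")


COMMANDS = {
    # Installing requirements should not depend on importing project code.
    "install": Command(handler=lambda settings: "installed requirements", needs_settings=False),
    # Running a pipeline genuinely needs the project settings.
    "run": Command(handler=lambda settings: f"ran with {settings}", needs_settings=True),
}
```

Under this arrangement, `dispatch("install", COMMANDS, failing_loader)` succeeds even though the settings cannot be imported, while `dispatch("run", COMMANDS, failing_loader)` still raises the original error.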

@limdauto
Contributor

@ignacioparicio can we close this as well? kedro-org/kedro-starters#38
