
tests fail in pycharm on spark tests when importing an external lib #1869

Closed
lev112 opened this issue Aug 21, 2024 · 7 comments

Labels: 🐞 bug Something isn't working

Comments

@lev112 (Contributor) commented Aug 21, 2024

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pixi, using pixi --version.

Reproducible example

full example here: https://github.com/lev112/pycharm-pixi-spark-test

pixi.toml:

[project]
channels = ["conda-forge"]
description = "Add a short description here"
name = "pixi_example"
platforms = ["osx-arm64"]
version = "0.1.0"


[dependencies]
python = ">=3.10, <3.11"
pixi-pycharm = "*"

[pypi-dependencies]
pyspark = "==3.4.3"
pytest = "*"
xxhash = "*"

tests/test_spark.py:

import xxhash
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType


def to_xxhash(x: int) -> str:
    hashed = str(xxhash.xxh64_intdigest(str(x)))

    return hashed

def test_spark():
    spark = (
        SparkSession.builder.appName("spark test")
        .master("local[1]")
        .getOrCreate()
    )

    my_udf = udf(to_xxhash, returnType=StringType())

    spark.range(1,5).withColumn("id", my_udf("id")).show()

pixi.lock:

version: 5
environments:
  default:
    channels:
    - url: https://conda.anaconda.org/conda-forge/
    indexes:
    - https://pypi.org/simple
    packages:
      osx-arm64:
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/bzip2-1.0.8-h99b78c6_7.conda
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/ca-certificates-2024.7.4-hf0a4a13_0.conda
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/libffi-3.4.2-h3422bc3_5.tar.bz2
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/libsqlite-3.46.0-hfb93653_0.conda
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/libzlib-1.3.1-hfb2fe0b_1.conda
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/ncurses-6.5-hb89a1cb_0.conda
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/openssl-3.3.1-hfb2fe0b_2.conda
      - conda: https://conda.anaconda.org/conda-forge/noarch/pixi-pycharm-0.0.6-unix_1234567_0.conda
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/python-3.10.14-h2469fbe_0_cpython.conda
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/readline-8.2-h92ec313_1.conda
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/tk-8.6.13-h5083fa2_1.conda
      - conda: https://conda.anaconda.org/conda-forge/noarch/tzdata-2024a-h0c530f3_0.conda
      - conda: https://conda.anaconda.org/conda-forge/osx-arm64/xz-5.2.6-h57fd34a_0.tar.bz2
      - pypi: https://files.pythonhosted.org/packages/02/cc/b7e31358aac6ed1ef2bb790a9746ac2c69bcb3c8588b41616914eb106eaf/exceptiongroup-1.2.2-py3-none-any.whl
      - pypi: https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl
      - pypi: https://files.pythonhosted.org/packages/08/aa/cc0199a5f0ad350994d660967a8efb233fe0416e4639146c089643407ce6/packaging-24.1-py3-none-any.whl
      - pypi: https://files.pythonhosted.org/packages/88/5f/e351af9a41f866ac3f1fac4ca0613908d9a41741cfcf2228f4ad853b697d/pluggy-1.5.0-py3-none-any.whl
      - pypi: https://files.pythonhosted.org/packages/10/30/a58b32568f1623aaad7db22aa9eafc4c6c194b429ff35bdc55ca2726da47/py4j-0.10.9.7-py2.py3-none-any.whl
      - pypi: https://files.pythonhosted.org/packages/6d/fe/3d8f6190536c4d3ed24540872c00f13ab9beb27b78dbae1703b5368838d4/pyspark-3.4.3.tar.gz
      - pypi: https://files.pythonhosted.org/packages/0f/f9/cf155cf32ca7d6fa3601bc4c5dd19086af4b320b706919d48a4c79081cf9/pytest-8.3.2-py3-none-any.whl
      - pypi: https://files.pythonhosted.org/packages/97/75/10a9ebee3fd790d20926a90a2547f0bf78f371b2f13aa822c759680ca7b9/tomli-2.0.1-py3-none-any.whl
      - pypi: https://files.pythonhosted.org/packages/16/e6/be5aa49580cd064a18200ab78e29b88b1127e1a8c7955eb8ecf81f2626eb/xxhash-3.5.0-cp310-cp310-macosx_11_0_arm64.whl
packages:
- kind: conda
  name: bzip2
  version: 1.0.8
  build: h99b78c6_7
  build_number: 7
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/bzip2-1.0.8-h99b78c6_7.conda
  sha256: adfa71f158cbd872a36394c56c3568e6034aa55c623634b37a4836bd036e6b91
  md5: fc6948412dbbbe9a4c9ddbbcfe0a79ab
  depends:
  - __osx >=11.0
  license: bzip2-1.0.6
  license_family: BSD
  purls: []
  size: 122909
  timestamp: 1720974522888
- kind: conda
  name: ca-certificates
  version: 2024.7.4
  build: hf0a4a13_0
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/ca-certificates-2024.7.4-hf0a4a13_0.conda
  sha256: 33a61116dae7f369b6ce92a7f2a1ff361ae737c675a493b11feb5570b89e0e3b
  md5: 21f9a33e5fe996189e470c19c5354dbe
  license: ISC
  purls: []
  size: 154517
  timestamp: 1720077468981
- kind: pypi
  name: exceptiongroup
  version: 1.2.2
  url: https://files.pythonhosted.org/packages/02/cc/b7e31358aac6ed1ef2bb790a9746ac2c69bcb3c8588b41616914eb106eaf/exceptiongroup-1.2.2-py3-none-any.whl
  sha256: 3111b9d131c238bec2f8f516e123e14ba243563fb135d3fe885990585aa7795b
  requires_dist:
  - pytest>=6 ; extra == 'test'
  requires_python: '>=3.7'
- kind: pypi
  name: iniconfig
  version: 2.0.0
  url: https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl
  sha256: b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374
  requires_python: '>=3.7'
- kind: conda
  name: libffi
  version: 3.4.2
  build: h3422bc3_5
  build_number: 5
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/libffi-3.4.2-h3422bc3_5.tar.bz2
  sha256: 41b3d13efb775e340e4dba549ab5c029611ea6918703096b2eaa9c015c0750ca
  md5: 086914b672be056eb70fd4285b6783b6
  license: MIT
  license_family: MIT
  purls: []
  size: 39020
  timestamp: 1636488587153
- kind: conda
  name: libsqlite
  version: 3.46.0
  build: hfb93653_0
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/libsqlite-3.46.0-hfb93653_0.conda
  sha256: 73048f9cb8647d3d3bfe6021c0b7d663e12cffbe9b4f31bd081e713b0a9ad8f9
  md5: 12300188028c9bc02da965128b91b517
  depends:
  - __osx >=11.0
  - libzlib >=1.2.13,<2.0a0
  license: Unlicense
  purls: []
  size: 830198
  timestamp: 1718050644825
- kind: conda
  name: libzlib
  version: 1.3.1
  build: hfb2fe0b_1
  build_number: 1
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/libzlib-1.3.1-hfb2fe0b_1.conda
  sha256: c34365dd37b0eab27b9693af32a1f7f284955517c2cc91f1b88a7ef4738ff03e
  md5: 636077128927cf79fd933276dc3aed47
  depends:
  - __osx >=11.0
  constrains:
  - zlib 1.3.1 *_1
  license: Zlib
  license_family: Other
  purls: []
  size: 46921
  timestamp: 1716874262512
- kind: conda
  name: ncurses
  version: '6.5'
  build: hb89a1cb_0
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/ncurses-6.5-hb89a1cb_0.conda
  sha256: 87d7cf716d9d930dab682cb57b3b8d3a61940b47d6703f3529a155c938a6990a
  md5: b13ad5724ac9ae98b6b4fd87e4500ba4
  license: X11 AND BSD-3-Clause
  purls: []
  size: 795131
  timestamp: 1715194898402
- kind: conda
  name: openssl
  version: 3.3.1
  build: hfb2fe0b_2
  build_number: 2
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/openssl-3.3.1-hfb2fe0b_2.conda
  sha256: dd7d988636f74473ebdfe15e05c5aabdb53a1d2a846c839d62289b0c37f81548
  md5: 9b551a504c1cc8f8b7b22c01814da8ba
  depends:
  - __osx >=11.0
  - ca-certificates
  constrains:
  - pyopenssl >=22.1
  license: Apache-2.0
  license_family: Apache
  purls: []
  size: 2899682
  timestamp: 1721194599446
- kind: pypi
  name: packaging
  version: '24.1'
  url: https://files.pythonhosted.org/packages/08/aa/cc0199a5f0ad350994d660967a8efb233fe0416e4639146c089643407ce6/packaging-24.1-py3-none-any.whl
  sha256: 5b8f2217dbdbd2f7f384c41c628544e6d52f2d0f53c6d0c3ea61aa5d1d7ff124
  requires_python: '>=3.8'
- kind: conda
  name: pixi-pycharm
  version: 0.0.6
  build: unix_1234567_0
  subdir: noarch
  noarch: generic
  url: https://conda.anaconda.org/conda-forge/noarch/pixi-pycharm-0.0.6-unix_1234567_0.conda
  sha256: 156dd789a87ea8b6d7c329efbc202d7f21564d5afabbad9c6ac14017fb47c641
  md5: c140222c6019ef59c509f4740663fa75
  depends:
  - __unix
  - python >=3.8
  license: BSD-3-Clause
  license_family: BSD
  purls: []
  size: 8833
  timestamp: 1719726313982
- kind: pypi
  name: pluggy
  version: 1.5.0
  url: https://files.pythonhosted.org/packages/88/5f/e351af9a41f866ac3f1fac4ca0613908d9a41741cfcf2228f4ad853b697d/pluggy-1.5.0-py3-none-any.whl
  sha256: 44e1ad92c8ca002de6377e165f3e0f1be63266ab4d554740532335b9d75ea669
  requires_dist:
  - pre-commit ; extra == 'dev'
  - tox ; extra == 'dev'
  - pytest ; extra == 'testing'
  - pytest-benchmark ; extra == 'testing'
  requires_python: '>=3.8'
- kind: pypi
  name: py4j
  version: 0.10.9.7
  url: https://files.pythonhosted.org/packages/10/30/a58b32568f1623aaad7db22aa9eafc4c6c194b429ff35bdc55ca2726da47/py4j-0.10.9.7-py2.py3-none-any.whl
  sha256: 85defdfd2b2376eb3abf5ca6474b51ab7e0de341c75a02f46dc9b5976f5a5c1b
- kind: pypi
  name: pyspark
  version: 3.4.3
  url: https://files.pythonhosted.org/packages/6d/fe/3d8f6190536c4d3ed24540872c00f13ab9beb27b78dbae1703b5368838d4/pyspark-3.4.3.tar.gz
  sha256: 8d7025fa274830cb6c3bd592228be3d9345cb3b8b1e324018c2aa6e75f48a208
  requires_dist:
  - py4j==0.10.9.7
  - pandas>=1.0.5 ; extra == 'connect'
  - pyarrow>=1.0.0 ; extra == 'connect'
  - grpcio>=1.48.1 ; extra == 'connect'
  - grpcio-status>=1.48.1 ; extra == 'connect'
  - googleapis-common-protos>=1.56.4 ; extra == 'connect'
  - numpy>=1.15 ; extra == 'connect'
  - numpy>=1.15 ; extra == 'ml'
  - numpy>=1.15 ; extra == 'mllib'
  - pandas>=1.0.5 ; extra == 'pandas-on-spark'
  - pyarrow>=1.0.0 ; extra == 'pandas-on-spark'
  - numpy>=1.15 ; extra == 'pandas-on-spark'
  - pandas>=1.0.5 ; extra == 'sql'
  - pyarrow>=1.0.0 ; extra == 'sql'
  - numpy>=1.15 ; extra == 'sql'
  requires_python: '>=3.7'
- kind: pypi
  name: pytest
  version: 8.3.2
  url: https://files.pythonhosted.org/packages/0f/f9/cf155cf32ca7d6fa3601bc4c5dd19086af4b320b706919d48a4c79081cf9/pytest-8.3.2-py3-none-any.whl
  sha256: 4ba08f9ae7dcf84ded419494d229b48d0903ea6407b030eaec46df5e6a73bba5
  requires_dist:
  - iniconfig
  - packaging
  - pluggy<2,>=1.5
  - exceptiongroup>=1.0.0rc8 ; python_version < '3.11'
  - tomli>=1 ; python_version < '3.11'
  - colorama ; sys_platform == 'win32'
  - argcomplete ; extra == 'dev'
  - attrs>=19.2 ; extra == 'dev'
  - hypothesis>=3.56 ; extra == 'dev'
  - mock ; extra == 'dev'
  - pygments>=2.7.2 ; extra == 'dev'
  - requests ; extra == 'dev'
  - setuptools ; extra == 'dev'
  - xmlschema ; extra == 'dev'
  requires_python: '>=3.8'
- kind: conda
  name: python
  version: 3.10.14
  build: h2469fbe_0_cpython
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/python-3.10.14-h2469fbe_0_cpython.conda
  sha256: 454d609fe25daedce9e886efcbfcadad103ed0362e7cb6d2bcddec90b1ecd3ee
  md5: 4ae999c8227c6d8c7623d32d51d25ea9
  depends:
  - bzip2 >=1.0.8,<2.0a0
  - libffi >=3.4,<4.0a0
  - libsqlite >=3.45.2,<4.0a0
  - libzlib >=1.2.13,<2.0.0a0
  - ncurses >=6.4.20240210,<7.0a0
  - openssl >=3.2.1,<4.0a0
  - readline >=8.2,<9.0a0
  - tk >=8.6.13,<8.7.0a0
  - tzdata
  - xz >=5.2.6,<6.0a0
  constrains:
  - python_abi 3.10.* *_cp310
  license: Python-2.0
  purls: []
  size: 12336005
  timestamp: 1710939659384
- kind: conda
  name: readline
  version: '8.2'
  build: h92ec313_1
  build_number: 1
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/readline-8.2-h92ec313_1.conda
  sha256: a1dfa679ac3f6007362386576a704ad2d0d7a02e98f5d0b115f207a2da63e884
  md5: 8cbb776a2f641b943d413b3e19df71f4
  depends:
  - ncurses >=6.3,<7.0a0
  license: GPL-3.0-only
  license_family: GPL
  purls: []
  size: 250351
  timestamp: 1679532511311
- kind: conda
  name: tk
  version: 8.6.13
  build: h5083fa2_1
  build_number: 1
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/tk-8.6.13-h5083fa2_1.conda
  sha256: 72457ad031b4c048e5891f3f6cb27a53cb479db68a52d965f796910e71a403a8
  md5: b50a57ba89c32b62428b71a875291c9b
  depends:
  - libzlib >=1.2.13,<2.0.0a0
  license: TCL
  license_family: BSD
  purls: []
  size: 3145523
  timestamp: 1699202432999
- kind: pypi
  name: tomli
  version: 2.0.1
  url: https://files.pythonhosted.org/packages/97/75/10a9ebee3fd790d20926a90a2547f0bf78f371b2f13aa822c759680ca7b9/tomli-2.0.1-py3-none-any.whl
  sha256: 939de3e7a6161af0c887ef91b7d41a53e7c5a1ca976325f429cb46ea9bc30ecc
  requires_python: '>=3.7'
- kind: conda
  name: tzdata
  version: 2024a
  build: h0c530f3_0
  subdir: noarch
  noarch: generic
  url: https://conda.anaconda.org/conda-forge/noarch/tzdata-2024a-h0c530f3_0.conda
  sha256: 7b2b69c54ec62a243eb6fba2391b5e443421608c3ae5dbff938ad33ca8db5122
  md5: 161081fc7cec0bfda0d86d7cb595f8d8
  license: LicenseRef-Public-Domain
  purls: []
  size: 119815
  timestamp: 1706886945727
- kind: pypi
  name: xxhash
  version: 3.5.0
  url: https://files.pythonhosted.org/packages/16/e6/be5aa49580cd064a18200ab78e29b88b1127e1a8c7955eb8ecf81f2626eb/xxhash-3.5.0-cp310-cp310-macosx_11_0_arm64.whl
  sha256: 3171f693dbc2cef6477054a665dc255d996646b4023fe56cb4db80e26f4cc520
  requires_python: '>=3.7'
- kind: conda
  name: xz
  version: 5.2.6
  build: h57fd34a_0
  subdir: osx-arm64
  url: https://conda.anaconda.org/conda-forge/osx-arm64/xz-5.2.6-h57fd34a_0.tar.bz2
  sha256: 59d78af0c3e071021cfe82dc40134c19dab8cdf804324b62940f5c8cd71803ec
  md5: 39c6b54e94014701dd157f4f576ed211
  license: LGPL-2.1 and GPL-2.0
  purls: []
  size: 235693
  timestamp: 1660346961024

Issue description

When running the test from the command line with:

pixi run pytest

the test passes.

But when running the same test from PyCharm, it fails with these logs:

24/08/21 15:54:24 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/lev/work/git/tmp/pixi_example/tests/test_spark.py", line 1, in <module>
    import xxhash
ModuleNotFoundError: No module named 'xxhash'

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:561)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:94)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:75)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:514)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:139)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
24/08/21 15:54:24 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (192.168.0.12 executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/lev/work/git/tmp/pixi_example/tests/test_spark.py", line 1, in <module>
    import xxhash
ModuleNotFoundError: No module named 'xxhash'

	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:561)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:94)
	at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:75)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:514)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:139)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

24/08/21 15:54:24 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job

test_spark.py:17 (test_spark)
def test_spark():
        spark = (
            SparkSession.builder.appName("spark test")
            .master("local[1]")
            .getOrCreate()
        )
    
        my_udf = udf(to_xxhash, returnType=StringType())
    
>       spark.range(1,5).withColumn("id", my_udf("id")).show()

test_spark.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../.pixi/envs/default/lib/python3.10/site-packages/pyspark/sql/dataframe.py:901: in show
    print(self._jdf.showString(n, 20, vertical))
../.pixi/envs/default/lib/python3.10/site-packages/py4j/java_gateway.py:1322: in __call__
    return_value = get_return_value(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

a = ('xro45', <py4j.clientserver.JavaClient object at 0x10621c100>, 'o44', 'showString')
kw = {}
converted = PythonException('\n  An exception was thrown from the Python worker. Please see the stack trace below.\nTraceback (mos...1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\t... 1 more\n')

    def deco(*a: Any, **kw: Any) -> Any:
        try:
            return f(*a, **kw)
        except Py4JJavaError as e:
            converted = convert_exception(e.java_exception)
            if not isinstance(converted, UnknownException):
                # Hide where the exception came from that shows a non-Pythonic
                # JVM exception message.
>               raise converted from None
E               pyspark.errors.exceptions.captured.PythonException: 
E                 An exception was thrown from the Python worker. Please see the stack trace below.
E               Traceback (most recent call last):
E                 File "/Users/lev/work/git/tmp/pixi_example/tests/test_spark.py", line 1, in <module>
E                   import xxhash
E               ModuleNotFoundError: No module named 'xxhash'

../.pixi/envs/default/lib/python3.10/site-packages/pyspark/errors/exceptions/captured.py:175: PythonException


============================== 1 failed in 3.71s ===============================

Process finished with exit code 1

Expected behavior

The test should pass.
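A likely mechanism behind this failure (an inference from the traceback, not something confirmed in the thread): PySpark launches separate worker processes to run UDFs, and those workers use the interpreter named by the PYSPARK_PYTHON environment variable rather than the driver's interpreter. If PyCharm starts the test without the pixi activation environment, the workers run a Python outside the pixi environment and cannot import xxhash. A minimal sketch of inspecting and pinning these variables, assuming the standard PySpark environment variables:

```python
# Sketch: Spark's Python workers run the interpreter named by
# PYSPARK_PYTHON; the driver uses PYSPARK_DRIVER_PYTHON. Pinning both
# to sys.executable makes workers import from the same site-packages
# as the process running the tests.
import os
import sys

for var in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
    print(var, "=", os.environ.get(var, "<unset>"))
    # Must be set before the SparkSession is created.
    os.environ[var] = sys.executable
```

This explains why `pixi run pytest` passes (the activated environment's Python is first on PATH) while an IDE-launched run can fail.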

@lev112 lev112 added the 🐞 bug Something isn't working label Aug 21, 2024
@ruben-arts (Contributor) commented:

It sounds like Spark is running some kind of logic to figure out the imports on its own. Could you make your example project public?

@lev112 (Contributor, Author) commented Aug 21, 2024

Could you make your example project public?

done

@ruben-arts (Contributor) commented:

I can't reproduce this; it gives me this error:

>                   raise RuntimeError("Java gateway process exited before sending its port number")

It looks like it expects me to have Java installed, and I've never run Spark. Could you make a more in-depth reproducer?

@lev112 (Contributor, Author) commented Aug 21, 2024

I've updated the repo and added the JVM. Can you please check?

@ruben-arts (Contributor) commented:

Sorry for the delay. I just tried it. It seems to work with this in pixi.toml:

[activation.env]
PYSPARK_PYTHON = "$PIXI_PROJECT_ROOT/.pixi/envs/default/bin/python"
PYSPARK_DRIVER_PYTHON = "$PIXI_PROJECT_ROOT/.pixi/envs/default/bin/python"

and starting PyCharm with pixi run pycharm .
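The activation variables point Spark's driver and worker processes at the pixi environment's interpreter. As a hypothetical alternative that does not depend on how PyCharm is launched (a sketch, not part of the fix from this thread), the same pinning could live in a conftest.py at the project root, where pytest loads it before any test imports pyspark:

```python
# conftest.py — hypothetical sketch, not the fix used in this thread.
# Pin Spark's worker and driver interpreters to whichever interpreter
# is running pytest, so executors resolve imports (e.g. xxhash) from
# the same environment as the driver.
import os
import sys

os.environ.setdefault("PYSPARK_PYTHON", sys.executable)
os.environ.setdefault("PYSPARK_DRIVER_PYTHON", sys.executable)
```

setdefault keeps any value already provided by pixi's activation environment, so the two approaches can coexist.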

@lev112 (Contributor, Author) commented Sep 20, 2024

Thanks @ruben-arts for checking!

For me, just starting PyCharm with pixi run pycharm . was enough to make the tests pass.

@ruben-arts (Contributor) commented:

Great, can this issue be closed?
