
bug: Throw FIELD_NOT_FOUND exception when lift a struct #9049

Closed · 1 task done
ted0928 opened this issue Apr 25, 2024 · 3 comments · Fixed by #9052
Labels
bug (Incorrect behavior inside of ibis) · pyspark (The Apache PySpark backend)

Comments

ted0928 (Contributor) commented Apr 25, 2024

What happened?

Here are my steps:

  1. Zip two arrays into one array of structs
  2. Unnest that array into a struct column
  3. Lift the struct, which throws the exception
import ibis
from pyspark.sql import SparkSession

ibis.options.interactive = True
spark = SparkSession.builder \
    .appName("spark paimon sql") \
    .getOrCreate()
connection = ibis.pyspark.connect(spark)

# Source table with two array columns of equal length
df = connection.create_view('source', ibis.memtable(dict(array1=[[1, 2, 3]], array2=[[4, 5, 6]])))

df = df.mutate(zipped=df.array1.zip(df.array2))  # 1. zip into an array of structs
df = df[df.zipped]
df = df.mutate(unnest=df.zipped.unnest())  # 2. unnest into one struct per row
df = df[df.unnest]
df = df.unnest.lift()  # 3. lift the struct fields into columns
print(df.compile())  # the compiled SQL prints fine
print(df)  # executing it raises FIELD_NOT_FOUND

The generated SQL is:

SELECT t2.unnest.f1 AS f1, t2.unnest.f2 AS f2 FROM (SELECT t1.zipped, EXPLODE(t1.zipped) AS unnest FROM (SELECT ARRAYS_ZIP(t0.array1, t0.array2) AS zipped FROM source AS t0) AS t1) AS t2

Then I tried the SQL manually, replacing f1/f2 with the actual struct field names:

SELECT t2.unnest.array1 AS f1, t2.unnest.array2 AS f2 FROM (SELECT t1.zipped, EXPLODE(t1.zipped) AS unnest FROM (SELECT ARRAYS_ZIP(t0.array1, t0.array2) AS zipped FROM source AS t0) AS t1) AS t2

That query worked: ARRAYS_ZIP names the struct fields after the source columns (array1, array2), while Ibis expects f1 and f2.
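Until this is fixed, one possible user-level workaround (an untested sketch, not part of the original report) is to rename the source columns so that Spark's field names match what Ibis expects; it assumes ARRAYS_ZIP names the struct fields after its input columns:

# Hypothetical workaround sketch (not verified): rename the source columns to
# f1/f2 before zipping, so ARRAYS_ZIP's struct field names match the f1/f2
# fields that Ibis generates.
t = connection.table("source").rename(f1="array1", f2="array2")
t = t.mutate(zipped=t.f1.zip(t.f2))
t = t.mutate(unnest=t.zipped.unnest())
t = t[t.unnest]
print(t.unnest.lift())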

What version of ibis are you using?

main

What backend(s) are you using, if any?

pyspark

Relevant log output

/Users/ning.ln/anaconda3/envs/ibis-dev-arm64/bin/python /Users/ning.ln/Java/ibis_demo/test.py 
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/04/25 17:11:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
/Users/ning.ln/anaconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/pyspark/sql/pandas/functions.py:407: UserWarning: In Python 3.6+ and Spark 3.0+, it is preferred to specify type hints for pandas UDF instead of specifying pandas UDF type which will be deprecated in the future releases. See SPARK-28264 for more details.
  warnings.warn(
SELECT `t2`.`unnest`.`f1` AS `f1`, `t2`.`unnest`.`f2` AS `f2` FROM (SELECT `t1`.`zipped`, EXPLODE(`t1`.`zipped`) AS `unnest` FROM (SELECT ARRAYS_ZIP(`t0`.`array1`, `t0`.`array2`) AS `zipped` FROM `source` AS `t0`) AS `t1`) AS `t2`
/Users/ning.ln/anaconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/pyspark/sql/pandas/functions.py:407: UserWarning: In Python 3.6+ and Spark 3.0+, it is preferred to specify type hints for pandas UDF instead of specifying pandas UDF type which will be deprecated in the future releases. See SPARK-28264 for more details.
  warnings.warn(
Traceback (most recent call last):
  File "/Users/ning.ln/Java/ibis_demo/test.py", line 19, in <module>
    print(df)
  File "/Users/ning.ln/Java/ibis/ibis/expr/types/core.py", line 77, in __repr__
    return self._interactive_repr()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/Java/ibis/ibis/expr/types/core.py", line 64, in _interactive_repr
    console.print(self)
  File "/Users/ning.ln/anaconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/rich/console.py", line 1700, in print
    extend(render(renderable, render_options))
  File "/Users/ning.ln/anaconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/rich/console.py", line 1312, in render
    render_iterable = renderable.__rich_console__(self, _options)  # type: ignore[union-attr]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/Java/ibis/ibis/expr/types/core.py", line 115, in __rich_console__
    raise e
  File "/Users/ning.ln/Java/ibis/ibis/expr/types/core.py", line 96, in __rich_console__
    rich_object = to_rich(self, console_width=console_width)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/Java/ibis/ibis/expr/types/pretty.py", line 271, in to_rich
    return _to_rich_table(
           ^^^^^^^^^^^^^^^
  File "/Users/ning.ln/Java/ibis/ibis/expr/types/pretty.py", line 342, in _to_rich_table
    result = table.limit(max_rows + 1).to_pyarrow()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/Java/ibis/ibis/expr/types/core.py", line 483, in to_pyarrow
    return self._find_backend(use_default=True).to_pyarrow(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/Java/ibis/ibis/backends/pyspark/__init__.py", line 793, in to_pyarrow
    self.execute(table_expr, params=params, limit=limit, **kwargs),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/Java/ibis/ibis/backends/sql/__init__.py", line 301, in execute
    with self._safe_raw_sql(sql) as cur:
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/Java/ibis/ibis/backends/pyspark/__init__.py", line 311, in _safe_raw_sql
    return self.raw_sql(query)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/Java/ibis/ibis/backends/pyspark/__init__.py", line 316, in raw_sql
    query = self._session.sql(query)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/anaconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/pyspark/sql/session.py", line 1631, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/anaconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/Users/ning.ln/anaconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/pyspark/errors/exceptions/captured.py", line 185, in deco
    raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [FIELD_NOT_FOUND] No such struct field `f1` in `array1`, `array2`.; line 1 pos 7

Process finished with exit code 1

ted0928 added the bug label on Apr 25, 2024
cpcloud (Member) commented Apr 25, 2024

Thanks for the report!

Taking a look now and will post findings.

cpcloud (Member) commented Apr 25, 2024

Definitely a bug, and looks like a simple cast to the output dtype Ibis expects is the fix!

PR incoming!

ted0928 (Contributor, Author) commented Apr 25, 2024

> Definitely a bug, and looks like a simple cast to the output dtype Ibis expects is the fix!
>
> PR incoming!

Thx

cpcloud added a commit that referenced this issue Apr 25, 2024
… schema (#9052)

Fix PySpark zip implementation to ensure that its output matches the
schema expected by Ibis. Fixes #9049.

---------

Co-authored-by: Gil Forsyth <gforsyth@users.noreply.github.com>
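
With that fix merged, the original pipeline should execute instead of raising FIELD_NOT_FOUND. A quick sanity-check sketch (not taken from the PR; it reuses the connection and source view from the report above):

# Hypothetical post-fix check: the schema Ibis reports (f1, f2) and the data
# Spark returns should now agree, so executing the lift succeeds.
t = connection.table("source")
t = t.mutate(zipped=t.array1.zip(t.array2))
t = t.mutate(unnest=t.zipped.unnest())
lifted = t[t.unnest].unnest.lift()
print(lifted)  # should print a two-column table (f1, f2) instead of raising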