
PySpark - Incompatible parameter type & Unsupported operand #798

Open
tomas-pihrt opened this issue Sep 26, 2023 · 1 comment

@tomas-pihrt

Pyre Bug

Bug description

The PySpark DataFrame operations below trigger `Unsupported operand [58]` and `Incompatible parameter type [6]` errors, even though they are valid and are the idioms suggested in the Spark documentation.

Reproduction steps

Python snippet sample.py:

from pyspark.sql import SparkSession, functions as f

spark = SparkSession.builder.getOrCreate()

df = spark.sql("select 1 as num")

(
    df
    .withColumn("num", f.col("num") + 2)
    .withColumn("num", f.col("num") - 2)
    .withColumn("num", f.col("num") * 2)
    .withColumn("num", f.col("num") / 2)
    .filter(f.col("num") > 1)
    .filter(f.col("num") >= 1)
    .filter(f.col("num") < 1)
    .filter(f.col("num") <= 1)
).show()

Expected behavior

Running `pyre check` should report no errors, since the code is valid.

See the docs:

Logs

$ pyre check

ƛ Found 16 type errors!

sample.py:9:23 Unsupported operand [58]: `+` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:9:23 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.withColumn`, for 2nd positional argument, expected `Column` but got `int`.
sample.py:10:23 Unsupported operand [58]: `-` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:10:23 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.withColumn`, for 2nd positional argument, expected `Column` but got `int`.
sample.py:11:23 Unsupported operand [58]: `*` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:11:23 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.withColumn`, for 2nd positional argument, expected `Column` but got `int`.
sample.py:12:23 Unsupported operand [58]: `/` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:12:23 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.withColumn`, for 2nd positional argument, expected `Column` but got `float`.
sample.py:13:12 Unsupported operand [58]: `>` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:13:12 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.filter`, for 1st positional argument, expected `Union[Column, str]` but got `bool`.
sample.py:14:12 Unsupported operand [58]: `>=` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:14:12 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.filter`, for 1st positional argument, expected `Union[Column, str]` but got `bool`.
sample.py:15:12 Unsupported operand [58]: `<` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:15:12 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.filter`, for 1st positional argument, expected `Union[Column, str]` but got `bool`.
sample.py:16:12 Unsupported operand [58]: `<=` is not supported for operand types `pyspark.sql.column.Column` and `int`.
sample.py:16:12 Incompatible parameter type [6]: In call `pyspark.sql.dataframe.DataFrame.filter`, for 1st positional argument, expected `Union[Column, str]` but got `bool`.

pyre_rage.log

@WangGithubUser
Contributor

WangGithubUser commented Sep 29, 2023

From my inspection, part (or even all) of this issue is caused by Pyre treating a function as a magic method only when it is defined with `def __add__(): ...`, whereas `Column` defines `__add__` and many of its other magic methods by assignment, like

__add__ = cast(
    Callable[["Column", Union["Column", "LiteralType", "DecimalLiteral"]], "Column"],
    _bin_op("plus"),
)

(see https://github.com/apache/spark/blob/187e9a851758c0e9cec11edab2bc07d6f4404001/python/pyspark/sql/column.py#L235-L274)
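A self-contained illustration (my own sketch, not from the thread; `Num` and the stand-in `_bin_op` are hypothetical names) of why this pattern is valid at runtime: Python happily dispatches `+` through a callable assigned to `__add__` in the class body, exactly as `Column` does, but Pyre only recognizes magic methods written as `def` statements.

```python
from typing import Callable, cast

def _bin_op(name: str) -> Callable[["Num", int], "Num"]:
    # Stand-in for pyspark's _bin_op helper: build an operator
    # from the name of an underlying int method.
    def op(self: "Num", other: int) -> "Num":
        return Num(getattr(self.value, name)(other))
    return op

class Num:
    def __init__(self, value: int) -> None:
        self.value = value

    # Operator assigned through cast(), mirroring Column.__add__;
    # valid at runtime, but Pyre does not model it as a magic method.
    __add__ = cast(Callable[["Num", int], "Num"], _bin_op("__add__"))

print((Num(1) + 2).value)  # → 3
```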

Here is a simpler demo of the same problem:

class MyInt:
    int_val: int = 0

    def real_add_func(self, other_int: int) -> int:
        return self.int_val + other_int

    __add__ = real_add_func


a_int: int = MyInt() + 0

pyre_playground here
Pyre gives a false positive: `10:13: Unsupported operand [58]: `+` is not supported for operand types `MyInt` and `int`.`
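As a possible stopgap (my suggestion, not something discussed in the thread): until Pyre models assigned magic methods, the false positive can be silenced per line with Pyre's own suppression comments, e.g. on the demo above:

```python
class MyInt:
    int_val: int = 0

    def real_add_func(self, other_int: int) -> int:
        return self.int_val + other_int

    # Assigned, not defined with `def`, so Pyre does not see it
    # as a magic method; Python dispatches through it fine.
    __add__ = real_add_func

# pyre-ignore[58]: the addition is valid at runtime; suppress
# Pyre's false positive on the following line.
a_int: int = MyInt() + 0
print(a_int)  # → 0
```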
