[kedro-datasets] Upgrade to PySpark >= 3.4, Pandas >= 2 in test_requirements.txt
#216
Comments
I can't comment on Spark, but be careful when forcing something like pandas >= 2.0, as users typically rely on other packages that might not be compatible with pandas 2.0 yet. One example is Great Expectations (see the explicit comment in the requirements file on their GitHub here). Furthermore, I would also set upper limits so we don't get into trouble later on.
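To make the "upper limits" suggestion concrete, here is a minimal sketch of what a bounded range (lower <= version < upper) means for a given release. The helper names and the bounds are hypothetical and chosen only for illustration; they are not taken from this repository. The parser only handles plain numeric versions such as 2.0.1, so real tooling such as pip covers far more cases.

```python
from typing import Tuple


def parse_version(text: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '2.0' into (2, 0, 0). Only plain numeric segments are supported."""
    parts = [int(part) for part in text.split(".")]
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def within_bounds(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper (upper bound is exclusive)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Hypothetical bounds, for illustration only.
print(within_bounds("1.5.3", "1.3", "2.0"))  # True: inside the range
print(within_bounds("2.0.1", "1.3", "2.0"))  # False: excluded by the upper limit
```

With an exclusive upper bound, a new major release is not picked up until the range is deliberately widened.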
This comment was marked as off-topic.
I also have doubts about pinning [...] The test suite is a separate problem. It's an additional question how we should test our datasets. In any case, I would say we should tackle this in our test suite rather than forcing it on our users. For example, if a user is using [...]
Okay, I just read the title carefully, so this is only about [...] Do we have some idea what's failing when we have [...]
Woops I also misread the title, thanks @noklam 👍
Haha I also misread the title 😅. Thanks @noklam for pointing it out!
This is some kind of collective hallucination 😂
Maybe a good remark to add here: no current version of Spark is compatible with Pandas >= 2! If you look at the Jira issue tracker of Spark, compatibility with Pandas 2.0 is planned for the next major version upgrade of Spark (Spark 4.0).
Do you have a link? I tried a quick search but Jira and I cannot be friends
@astrojuanlu: Sure, here is the link (note the affects version).
Description
Spark 3.4.0 was released in April. Our databricks and spark datasets should support this newer version of Spark, though it currently causes many tests to fail.
Also with this change, we should enforce Pandas >= 2, as earlier versions of Pandas are not compatible with Spark >= 3.4. This change will also enable us to upgrade delta-spark.
Context
This is an important change as it will ensure our datasets work with the latest version of Spark.