Figure out how to deal with the PySpark 2 extensions #35

Closed
MrPowers opened this issue Jul 18, 2020 · 7 comments

@MrPowers
Owner

The DataFrame#transform extension is useful for PySpark 2 users but should not be run for PySpark 3 users (because it's built into the API there).

When a user runs `from quinn.extensions import *`, we can either use the `spark.version` variable to programmatically skip over modules that shouldn't be imported for Spark 3, or we can design a separate import interface.

I'm still not sure which approach is better.
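
For illustration, a minimal sketch of the version-check approach, assuming a hypothetical `quinn.extensions.dataframe_ext` module and using `pyspark.__version__` rather than `spark.version` so no active session is needed at import time:

```python
# quinn/extensions/__init__.py (hypothetical layout, for illustration only)
import pyspark

# Only monkey-patch DataFrame#transform on PySpark 2.x; on 3.x the
# method is already built into the DataFrame API.
if int(pyspark.__version__.split(".")[0]) < 3:
    from quinn.extensions.dataframe_ext import *  # noqa: F401,F403
```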

@MrPowers
Owner Author

I am going to switch the project to PySpark 3 and remove DataFrame#transform.

@SemyonSinchenko
Collaborator

SemyonSinchenko commented Mar 8, 2023

We can replace `DataFrame.transform = transform` with something like this:

```python
DataFrame.transform = getattr(DataFrame, "transform", transform)
```

and it should work in both the 2.x and 3.x versions. I can open a PR with this.

P.S. I can do it for all extensions to avoid such problems or any unexpected behavior in the future.
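
A sketch of what that guard looks like for a single extension, using the PySpark 2 transform backport as the example (the function body here is an assumption, not necessarily quinn's exact code):

```python
from pyspark.sql import DataFrame

def transform(self, f):
    """Backport of DataFrame.transform for PySpark 2.x: apply f to this DataFrame."""
    return f(self)

# Keep the built-in method when it exists (PySpark 3.x); otherwise attach ours.
DataFrame.transform = getattr(DataFrame, "transform", transform)
```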

@MrPowers
Owner Author

MrPowers commented Mar 8, 2023

@SemyonSinchenko - would there be any way for PySpark 2 users to be able to import this function, but for the function to error out if a user on PySpark 3 or greater tries to import it? I'd prefer for PySpark 3 users to leverage the built-in function. Sidenote: they updated this particular function in PySpark 3.3, so the 3.3 method signature is different from the 3.1 method signature 🙃
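
A hypothetical sketch of that error-on-import behavior, placed at the top of the extension module (again using `pyspark.__version__` as a stand-in for `spark.version`):

```python
# Hypothetical guard at the top of the extension module: PySpark 2 users can
# import it, but PySpark 3+ users get an error pointing at the built-in method.
import pyspark

if int(pyspark.__version__.split(".")[0]) >= 3:
    raise ImportError(
        "DataFrame.transform is built into PySpark 3+; "
        "this quinn extension is only for PySpark 2."
    )
```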

@SemyonSinchenko
Collaborator

> would there be any way for PySpark 2 users to be able to import this function, but for the function to error out if a user on PySpark 3 or greater tries to import it?

That's exactly what my snippet of code will do. If there is a transform attribute on DataFrame, it will leave it as is, but if there is no such attribute, it will add it. So the behavior will depend on the version.
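
A quick way to see which implementation ended up attached (hypothetical session setup, for illustration only):

```python
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(3)

# On PySpark 3.x this is the built-in method; on 2.x (after the getattr guard
# runs) it is quinn's backport. Either way, the call below behaves the same.
print(DataFrame.transform)
print(df.transform(lambda d: d.withColumn("doubled", d["id"] * 2)).count())
```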

@MrPowers
Owner Author

MrPowers commented Mar 8, 2023

@SemyonSinchenko - your suggested solution sounds ideal in that case. Can you please send a PR?

@SemyonSinchenko
Collaborator

> @SemyonSinchenko - your suggested solution sounds ideal in that case. Can you please send a PR?

I'll do it.

@SemyonSinchenko
Collaborator

Work was done in #81
