Zero copy import when the schema is known #261
Conversation
Thanks @darabos, seems to work great! 🙌 We can merge this for now, but I've come across another datatype below. If it's trivial to fix, we can include it in this PR as well!
PS you might want to pin openjdk in
Haha, @lacca0 just discovered the same 2 minutes ago! We will look into changing the code to be compatible with Java 17. But this workaround is very useful until then!
Awesome, thanks a lot for testing!
Do you have a small test file or

```python
from pyspark.sql import functions as F
spark.range(100).withColumn('decimal_col', F.rand().cast('decimal(18,2)'))
```
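For reference, `decimal(18,2)` means up to 18 significant digits with 2 after the decimal point. A plain-Python sketch of the equivalent rounding, using the standard `decimal` module (the `ROUND_HALF_UP` mode here is an assumption about how the cast rounds, and the helper name is made up for illustration):

```python
from decimal import Decimal, ROUND_HALF_UP

def to_decimal_18_2(x: float) -> Decimal:
    # Round to two fractional digits, like cast('decimal(18,2)').
    # ROUND_HALF_UP is assumed here; Python's Decimal defaults to HALF_EVEN.
    return Decimal(str(x)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(to_decimal_18_2(0.125))    # 0.13
print(to_decimal_18_2(3.14159))  # 3.14
```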
Thanks! I've changed the code to support all types that Spark supports: I let Spark parse the schema you enter. If the table does not match the specified schema, you get an error. I changed the error formatting a bit to make sure it includes the important bit: what didn't match. I've also added documentation. I'll add unit tests on Monday and then we can merge it.
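A minimal sketch of the schema-check behavior described above, raising an error whose message names the mismatching column and both types. The dict-based schema representation and the `check_schema` helper are illustrative only; in the PR the actual parsing and matching is delegated to Spark:

```python
def check_schema(declared, actual):
    # Compare the declared schema against the table's actual column types
    # and fail with a message that includes what didn't match.
    # (Illustrative; not LynxKite's real implementation.)
    for col, expected in declared.items():
        found = actual.get(col)
        if found != expected:
            raise ValueError(
                f"Schema mismatch for column {col!r}: "
                f"declared {expected!r}, but the table has {found!r}")

declared = {"id": "bigint", "decimal_col": "decimal(18,2)"}
actual = {"id": "bigint", "decimal_col": "double"}
try:
    check_schema(declared, actual)
except ValueError as e:
    print(e)  # mentions 'decimal_col' and both type names
```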
@lacca0 I'm going to merge this PR to simplify resolving merge conflicts with the 1-jar change. But it's still open for your comments! (I'll just have to address them in a separate PR.) Thanks!
Resolves #258.
No import button! The corresponding Python code is:
Outstanding issues:

- `version` parameter, same as it's done with export operations.
- `imported_columns`, `limit`, and `sql`.