Local Spark not utilizing spark_config parameter from great_expectations.yml #1603
Comments
@jcampbell: @WesRoach I see this when using pyspark 3.0.0, which was just released today. Is that by chance the version you're running? I haven't looked yet to understand what changed.
@WesRoach: @jcampbell No, sorry - I should have specified - pyspark 2.4.5
Ok. Any chance that the cluster to which you're connected is running spark 3.0.0? I'm confused a bit because I see spark 3.0 as an official release on github and pypi, but not on spark.apache.org. And, it broke our CI tests (specifically for this feature) on release... I'll attempt to reproduce again locally on my environment.
Running in local mode, spark.master: local[*] - it's a standalone Spark instance running on a single multi-core machine.
@WesRoach any luck with this one? I also encountered this, and it seems that this option is not passed at all to the SparkDFDataset init method.
My colleague @alexsherstinsky looked into this more, and it appears to be related to when we open and (don't) close the SparkSession handle. He has a patch in the works.
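To see why a lingering SparkSession handle would make the yaml config appear ignored: pyspark's `SparkSession.builder.getOrCreate()` returns any already-active session and silently discards the new config options. The snippet below is a plain-Python sketch of that singleton behavior (the `Session`/`Builder` classes are illustrative stand-ins, not pyspark itself):

```python
class Session:
    """Minimal stand-in for pyspark's SparkSession (illustration only)."""
    _active = None  # module-wide singleton, like the active SparkSession

    def __init__(self, config):
        self.config = dict(config)

class Builder:
    """Stand-in for SparkSession.builder."""
    def __init__(self):
        self._options = {}

    def config(self, key, value):
        self._options[key] = value
        return self

    def get_or_create(self):
        # Like getOrCreate(): if a session already exists, return it
        # as-is and ignore any config options set on this builder.
        if Session._active is None:
            Session._active = Session(self._options)
        return Session._active

# A session created earlier (e.g. via a handle that was never closed)
# wins over the config supplied later from great_expectations.yml:
first = Builder().config("spark.driver.memory", "1g").get_or_create()
second = Builder().config("spark.driver.memory", "8g").get_or_create()
print(second.config["spark.driver.memory"])  # prints 1g, not 8g
```

This matches the symptom in the issue: if anything creates a session before Great Expectations applies `spark_config`, the yaml settings never take effect.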
Bumping into this issue, I think. It looks like [...]
I see that this was also observed earlier by @mgorsk1.
Found the error: adding [...] causes it to deserialize the value from yaml and pass it to the [...]
Created a PR for this, see #1713 |
Describe the bug
Running Spark in local mode. I've added the spark_config key-value dict to my Spark datasource in great_expectations.yml. Setting spark.driver.memory in the yaml results in a Java heap space error. I'm using the semi-example from https://docs.greatexpectations.io/en/latest/how_to_guides/configuring_datasources/how_to_configure_a_spark_filesystem_datasource.html
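For reference, a datasource entry with spark_config in great_expectations.yml looks roughly like this (a sketch following the how-to guide linked above; the datasource name and memory values are placeholders, not the reporter's actual config):

```yaml
datasources:
  my_spark_datasource:
    class_name: SparkDFDatasource
    spark_config:
      spark.driver.memory: 4g
      spark.executor.memory: 4g
```

The bug report is that these spark_config keys are not applied to the SparkSession that Great Expectations actually uses.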
Also tried:
To Reproduce
Expected behavior
Expected GE's Spark Session to utilize spark_config from great_expectations.yml.
Environment (please complete the following information):
Edit: Added pyspark, conda versions.