[BUGFIX] Add spark_context to DatasourceConfigSchema #1713
Conversation
@cla-bot check
Codecov Report
@@            Coverage Diff             @@
##           develop    #1713     +/-  ##
=========================================
  Coverage    77.86%   77.86%
=========================================
  Files          135      135
  Lines        15698    15699      +1
=========================================
+ Hits         12223    12224      +1
  Misses        3475     3475
Continue to review full report at Codecov.
Thank you!
Does this actually work? From what I read, Spark 3.0.0 maintains a single JVM, which makes the config immutable after it has been created. (Also, in my tests, this did not solve the problem.) Based on this reasoning, we have a patch in the works that sets the config once and reuses it for the duration of the session. Are you aware of different behavior? If so, please share articles, test results, or anything else helpful. Thanks!
Hi @alexherstinsky. When running locally (PySpark 2.4.5) this works. Maybe in your tests something different is happening?
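The "set the config once and reuse it" approach the maintainer describes can be illustrated with a minimal, stdlib-only sketch. Note that `SparkSessionStub` and `get_or_create_session` are hypothetical stand-ins for illustration, not PySpark or Great Expectations APIs:

```python
from functools import lru_cache
from types import MappingProxyType


class SparkSessionStub:
    """Hypothetical stand-in for a Spark session: the config is captured
    at construction time and exposed read-only, mirroring how Spark
    ignores config changes once the JVM is up."""

    def __init__(self, conf: dict):
        # Freeze the config so later mutation attempts fail loudly.
        self._conf = MappingProxyType(dict(conf))

    @property
    def conf(self):
        return self._conf


@lru_cache(maxsize=1)
def get_or_create_session() -> SparkSessionStub:
    # Built exactly once; every later call reuses the same instance,
    # analogous in spirit to SparkSession.builder.getOrCreate().
    return SparkSessionStub({"spark.master": "local[*]"})


first = get_or_create_session()
second = get_or_create_session()
assert first is second  # one session, one config, for the whole run
```

The point of the pattern: because the config cannot be changed after the session exists, the only safe place to apply `spark_context` settings is before the first (and only) session construction.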
Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [ENHANCEMENT], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.
Changes proposed in this pull request:
- Add spark_context to DatasourceConfigSchema
- See issue "Local Spark not utilizing spark_config parameter from great_expectations.yml" #1603; this makes spark_context deserialize from the project config file.

After submitting your PR, CI checks will run and @tiny-tim-bot will check for your CLA signature.
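A simplified, stdlib-only illustration of the bug and the fix (the real project uses a marshmallow schema; the key sets and the `deserialize` helper below are hypothetical stand-ins): if the schema does not declare `spark_context`, that key is silently dropped when the datasource config is loaded from `great_expectations.yml`.

```python
# Simplified sketch: model a schema as the set of keys it accepts.
# Before the fix, "spark_context" was missing from the schema, so the
# Spark settings never survived deserialization of the project config.
DATASOURCE_SCHEMA_BEFORE = {"class_name", "module_name", "data_asset_type"}
DATASOURCE_SCHEMA_AFTER = DATASOURCE_SCHEMA_BEFORE | {"spark_context"}


def deserialize(raw_config: dict, schema: set) -> dict:
    """Hypothetical helper: keep only the keys the schema declares."""
    return {k: v for k, v in raw_config.items() if k in schema}


raw = {
    "class_name": "SparkDFDatasource",
    "module_name": "great_expectations.datasource",
    "spark_context": {"spark.master": "local[*]"},
}

# Before: the Spark settings are dropped on load.
assert "spark_context" not in deserialize(raw, DATASOURCE_SCHEMA_BEFORE)
# After: they reach the datasource.
assert "spark_context" in deserialize(raw, DATASOURCE_SCHEMA_AFTER)
```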
For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.
In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in GitHub issues or Slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g.
closes #123
).

Previous Design Review notes:
Thank you for submitting!