Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] Add spark_context to DatasourceConfigSchema #1713

Merged
merged 1 commit into from Jul 21, 2020
Merged

[BUGFIX] Add spark_context to DatasourceConfigSchema #1713

merged 1 commit into from Jul 21, 2020

Conversation

Dandandan
Copy link
Contributor

Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [ENHANCEMENT], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.

Changes proposed in this pull request:

After submitting your PR, CI checks will run and @tiny-tim-bot will check for your CLA signature.

For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in github issues or slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

Previous Design Review notes:

Thank you for submitting!

@Dandandan Dandandan changed the title Add spark_context to DatasourceConfigSchema [BUGFIX] Add spark_context to DatasourceConfigSchema Jul 21, 2020
@Dandandan
Copy link
Contributor Author

@cla-bot check

@codecov
Copy link

codecov bot commented Jul 21, 2020

Codecov Report

Merging #1713 into develop will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #1713   +/-   ##
========================================
  Coverage    77.86%   77.86%           
========================================
  Files          135      135           
  Lines        15698    15699    +1     
========================================
+ Hits         12223    12224    +1     
  Misses        3475     3475           
Impacted Files Coverage Δ
great_expectations/data_context/types/base.py 92.76% <100.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fdd3700...5e0b499. Read the comment docs.

Copy link
Member

@jcampbell jcampbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@jcampbell jcampbell merged commit 8d6efc8 into great-expectations:develop Jul 21, 2020
Copy link
Contributor

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually work? From what I read, Spark 3.0.0 maintains a single JVM, which makes the config immutable after it was created. (Also, in my tests, this did not solve the problem.). Based on this reasoning, we have a patch in the work that sets the config once and reuses it throughout the duration of the session. Are you aware of a different behavior? If so, please share some articles, test results, or anything else helpful. Thanks!

@Dandandan
Copy link
Contributor Author

Dandandan commented Jul 22, 2020

Hi @alexherstinsky.

When running locally (pyspark 2.4.5) this works, and makes greatexpectations datasource profile pass the spark.jars property (or any property from the project config). It looks like it always generates a new session in this case, I can update configurations and they are changed too.

Maybe in your tests it's something different?
You are correct indeed session will change only when session is closed (if you use getOrCreate it will try to reuse an existing one). You could make it generate a new one if you have to "reload" it in some cases (interactive usage)?

alexsherstinsky pushed a commit to alexsherstinsky/great_expectations that referenced this pull request Feb 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants