[BUGFIX] Handle "persist" directive in "SparkDFExecutionEngine" properly. #7830

Conversation

alexsherstinsky
Contributor

Scope

  • Previously, the persist constructor argument to SparkDFExecutionEngine was defined but never used. (This was an omission carried over from GX V2.) Now the directive is honored as intended: dataframe.persist() is called when persist is True.
  • Existing tests pass.

Remark

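  • Since persist is True by default, processing of large Spark dataframes is expected to be more efficient, because a persisted dataframe is cached after it is first computed instead of being recomputed for each later action.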
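
The behavior described in Scope can be illustrated with a minimal sketch. It is not the actual SparkDFExecutionEngine code: `SketchEngine`, `load_batch`, and `FakeDataFrame` are hypothetical names, and `FakeDataFrame` stands in for a Spark dataframe so the example runs without a Spark installation. It only shows a constructor flag that is actually consulted before calling `persist()`.

```python
from typing import Any


class FakeDataFrame:
    """Hypothetical stand-in for a Spark dataframe; records persist() calls."""

    def __init__(self) -> None:
        self.persisted = False

    def persist(self) -> "FakeDataFrame":
        # Like pyspark's DataFrame.persist(), return the dataframe itself.
        self.persisted = True
        return self


class SketchEngine:
    """Hypothetical engine that honors a `persist` constructor argument."""

    def __init__(self, persist: bool = True) -> None:
        # Assumption: the flag defaults to True, as stated in the Remark.
        self._persist = persist

    def load_batch(self, dataframe: Any) -> Any:
        # The bug described above amounts to storing the flag
        # without ever reaching a check like this one.
        if self._persist:
            dataframe.persist()
        return dataframe


default_df = SketchEngine().load_batch(FakeDataFrame())
opt_out_df = SketchEngine(persist=False).load_batch(FakeDataFrame())
print(default_df.persisted, opt_out_df.persisted)
```

Passing `persist=False` remains the way to opt out, for example when cluster memory is too constrained to cache the dataframe.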

Changes proposed in this pull request:

  • JIRA: DX-469/DX-480

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

@netlify
netlify bot commented May 5, 2023

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit e8e2490
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/6455757beb4e2a0008633574
😎 Deploy Preview https://deploy-preview-7830--niobium-lead-7998.netlify.app
@alexsherstinsky alexsherstinsky marked this pull request as ready for review May 5, 2023 21:00
@alexsherstinsky alexsherstinsky requested review from a team May 5, 2023 21:00
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) May 5, 2023 21:00
…ete_persist_directive_from_sparkdf_execution_engine_instantiation_arguments-2023_05_05-18
Contributor

@Shinnnyshinshin Shinnnyshinshin left a comment

LGTM

@alexsherstinsky alexsherstinsky merged commit f76f8df into develop May 5, 2023
@alexsherstinsky alexsherstinsky deleted the maintenance/DX-469/DX-480/alexsherstinsky/link/delete_persist_directive_from_sparkdf_execution_engine_instantiation_arguments-2023_05_05-18 branch May 5, 2023 22:35
Shinnnyshinshin added a commit that referenced this pull request May 8, 2023
* develop: (79 commits)
  [DOCS] Removing datasource centric test_yaml_config doc (#7836)
  [FEATURE] Splitters work with Spark Fluent Datasources (#7832)
  [MAINTENANCE] New PR template (#7710)
  [DOCS] Update how to create a checkpoint with Test YAML config (#7835)
  [DOCS] Updating Checkpoint terms page (#7722)
  [BUGFIX] Adding support for Fluent Batch Requests to context.get_validator (#7808)
  [FEATURE] Plumbing of validation_result_url from cloud response (#7809)
  [MAINTENANCE] Enable S3/Spark Connecting To Your Data tests (#7828)
  [BUGFIX] Handle "persist" directive in "SparkDFExecutionEngine" properly. (#7830)
  [DOCS] Update docs for how_to_initialize_a_filesystem_data_context_in_python (#7831)
  [MAINTENANCE] Clean up: Remove duplicated fixture and utilize deeper filtering mechanism for configuration assertions. (#7825)
  [MAINTENANCE] Enable Spark-S3 Integration tests on Azure CI/CD (#7819)
  [MAINTENANCE] FDS - Datasources can rebuild their own asset data_connectors (#7826)
  [DOCS] Prerequisites Cleanup (#7811)
  [BUGFIX] Azure Package Presence/Absence Tests Strengthening (#7818)
  [RELEASE] 0.16.11 (#7824)
  [MAINTENANCE] Fix pin count. (#7823)
  [BUGFIX] Upper bound `pyathena` due to breaking API in V3 (#7821)
  [MAINTENANCE] Fix linting error. (#7820)
  [BUGFIX] Cloud - Fix FDS Asset has no attribute `_data_connector` (#7813)
  ...
2 participants