[ENHANCEMENT] Restore cli functionality for legacy checkpoints #2511

Merged
merged 18 commits into from Mar 16, 2021

Conversation

roblim
Member

@roblim roblim commented Mar 5, 2021

Changes proposed in this pull request:

  • Restore CLI functionality for legacy checkpoints (checkpoint new, checkpoint run, checkpoint script) when using a GE config >= 3.0

@roblim
Member Author

roblim commented Mar 5, 2021

We had originally removed support for great_expectations checkpoint new, great_expectations checkpoint run, and great_expectations checkpoint script for GE config versions >= 3.0. We assumed that users on those versions would expect the commands to create and run new-style checkpoints, which the CLI did not yet support. In addition, legacy checkpoints rely on validation operators, which were removed as a config requirement in config v3.0.

However, after conversations with @spbail, we determined that it makes sense to restore the above commands, since validation operators and legacy checkpoints are still supported with GE config versions >= 3.0. In addition, the CLI now has --v3-api and --v2-api flags, so the behavior is no longer ambiguous.

I confirmed that the commands still work with GE config v3.0, but some adjustments need to be made. Some notes from manual testing (assuming the default --v2-api):

  • great_expectations checkpoint new works with old-style datasources only, but the checkpoint creation flow lists all datasources found in the GE config (both old and new). If a user selects a new-style datasource, they'll get exceptions. We should filter the datasources list to include only old-style datasources.
  • great_expectations checkpoint new will still work if validation operators are not configured, even though the checkpoint that is created assumes a validation operator named action_list_operator exists in the config.
  • great_expectations checkpoint run and great_expectations checkpoint script work for checkpoints created using great_expectations checkpoint new, as long as a validation operator named action_list_operator exists in the config:
validation_operators:
  action_list_operator:
    # To learn how to configure sending Slack notifications during evaluation
    # (and other customizations), read: https://docs.greatexpectations.io/en/latest/reference/validation_operators/action_list_validation_operator.html
    class_name: ActionListValidationOperator
    action_list:
    - name: store_validation_result
      action:
        class_name: StoreValidationResultAction
    - name: store_evaluation_params
      action:
        class_name: StoreEvaluationParametersAction
    - name: update_data_docs
      action:
        class_name: UpdateDataDocsAction
  • If a user is starting fresh using great_expectations init, the default validation_operators are not added to the GE config.
  • Possible options to address the issues with validation_operators:
    • LegacyCheckpoint already has fallback logic to handle the case where no validation_operator_name is configured (see great_expectations.checkpoint.checkpoint.LegacyCheckpoint._run_default_validation_operator). We can extend this to also cover cases where action_list_operator is set as the validation_operator_name but is not found in the config (_run_default_validation_operator basically instantiates a validation operator with the above config, then uses it to run).
    • Also check whether action_list_operator exists in the config in the checkpoint new command.
    • If a command that relies on action_list_operator is used and the operator does not exist in the config, issue a warning to the user and offer to update the config automatically as part of the flow, or include a snippet in the warning.
    • Since the CLI currently defaults to --v2-api, we might want to restore validation operators in the config created with great_expectations init.
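
The existence check suggested in the options above could look roughly like the sketch below. It works on a plain dict that stands in for the parsed GE config, which is an assumption made for illustration. `find_missing_operator` and `DEFAULT_OPERATOR_NAME` are hypothetical names and are not part of the great_expectations API.

```python
from typing import Optional

# Operator name that the legacy checkpoint flow assumes exists (from the notes above).
DEFAULT_OPERATOR_NAME = "action_list_operator"


def find_missing_operator(
    project_config: dict, operator_name: str = DEFAULT_OPERATOR_NAME
) -> Optional[str]:
    """Return a warning message if operator_name is absent from the config.

    project_config is a plain dict standing in for the parsed GE config
    (hypothetical shape). Returns None when the operator is configured.
    """
    # A v3 config may omit the key entirely or leave it empty.
    operators = project_config.get("validation_operators") or {}
    if operator_name in operators:
        return None
    return (
        f"Validation operator '{operator_name}' was not found in the config; "
        "add it under 'validation_operators' or rely on a default."
    )
```

A command such as checkpoint new could call a helper like this before writing the checkpoint and print the returned message, if there is one, instead of failing later at run time.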

@spbail
Contributor

spbail commented Mar 9, 2021

Replies inline:

  • great_expectations checkpoint new works with old-style datasources only, but the checkpoint creation flow lists all datasources found in the GE config (both old and new). If a user selects a new-style datasource, they'll get exceptions. We should filter the datasources list to include only old-style datasources.

Tbh I'm actually ok with this leading to an error, since it's unlikely/not necessarily intended to have old- and new-style datasources in the same config.

  • great_expectations checkpoint new will still work if validation operators are not configured, even though the checkpoint that is created assumes a validation operator named action_list_operator exists in the config.

Tested and confirmed.

  • great_expectations checkpoint run and great_expectations checkpoint script work for checkpoints created using great_expectations checkpoint new, as long as a validation operator named action_list_operator exists in the config:
  • Possible options to address the issues with validation_operators:

    • LegacyCheckpoint already has fallback logic to handle case where no validation_operator_name is configured (see great_expectations.checkpoint.checkpoint.LegacyCheckpoint._run_default_validation_operator). We can extend this to also apply to cases where action_list_operator is the set validation_operator_name, but it is not found in config (_run_default_validation_operator basically instantiates a validation operator with the above config, then uses it to run)

Fallback logic makes the most sense to me. I see that checkpoint new on your branch now creates a LegacyCheckpoint, so if no validation_operator is found in the config (which will be the case in a v3 config, unless someone adds it back manually and ignores the warnings), it should just use a default like SimpleCheckpoint. I think that's what you're saying? :D (I might be misinterpreting this a little, let's chat!)

  • Also check if action_list_operator exists in config in the checkpoint new command
  • If command is used that relies on action_list_operator and it does not exist in config, issue warning to user and offer to update config automatically as part of the flow, or include snippet in warning.

I wouldn't necessarily go that far, I think it's fine to just fall back onto a default validation_operator when a LegacyCheckpoint is run and no configured validation_operator is found in the config. Maybe output a warning/info that it's using a default.

  • since CLI currently defaults to --v2-api, we might want to restore validation operators in config created with great_expectations init

We did retire validation operators with the v3 data context config version, so I would prefer to not revert that.

@roblim
Member Author

roblim commented Mar 9, 2021

Per conversation with @spbail, I will implement fallback logic so that LegacyCheckpoint uses the default action_list_operator above if its config specifies a validation operator name that cannot be found in the GE config.
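
A minimal sketch of that fallback, assuming the configured operators are available as a plain dict keyed by name. `resolve_operator_config` and `DEFAULT_ACTION_LIST_OPERATOR` are hypothetical names used only for illustration; the actual change lives in LegacyCheckpoint and may be structured differently.

```python
import copy
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Mirrors the default action_list_operator config quoted earlier in this thread.
DEFAULT_ACTION_LIST_OPERATOR = {
    "class_name": "ActionListValidationOperator",
    "action_list": [
        {
            "name": "store_validation_result",
            "action": {"class_name": "StoreValidationResultAction"},
        },
        {
            "name": "store_evaluation_params",
            "action": {"class_name": "StoreEvaluationParametersAction"},
        },
        {
            "name": "update_data_docs",
            "action": {"class_name": "UpdateDataDocsAction"},
        },
    ],
}


def resolve_operator_config(name: Optional[str], configured: dict) -> dict:
    """Return the config for `name`, or the default if it is unset or missing.

    `configured` stands in for the validation_operators section of the
    GE config (hypothetical shape: operator name -> operator config).
    """
    if name is not None and name in configured:
        return configured[name]
    # Warn rather than fail, so the user knows a default is in effect.
    logger.warning(
        "Validation operator %r not found in config; using default.", name
    )
    # Return a copy so callers cannot mutate the shared default.
    return copy.deepcopy(DEFAULT_ACTION_LIST_OPERATOR)
```

With this shape, a checkpoint that names action_list_operator in a v3 config with no validation_operators section would still run against the default, and a log message would indicate that the default was used.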

@roblim roblim marked this pull request as ready for review March 15, 2021 07:32
@roblim roblim requested a review from spbail March 15, 2021 07:32
@roblim roblim assigned roblim and unassigned spbail Mar 15, 2021
@roblim roblim changed the title [WIP][ENHANCEMENT] Restore cli functionality for legacy checkpoints [ENHANCEMENT] Restore cli functionality for legacy checkpoints Mar 15, 2021
Contributor

@spbail spbail left a comment

Looks good. I tested the following configs for checkpoint new, run, and script:

  • v3 config and v2 datasource
  • v2 config (with validation operators) and v2 datasource
  • v3 config and v3 datasource - correct error (need to use --v3-api flag for a v3 datasource)
  • v2 config (with validation operators) and v3 datasource - correct error (need to use --v3-api flag for a v3 datasource)

Sam Bail and others added 4 commits March 16, 2021 13:34
Co-authored-by: Sam Bail <sam@superconductive.com>
@roblim roblim enabled auto-merge (squash) March 16, 2021 17:37
@roblim roblim merged commit eceee60 into develop Mar 16, 2021
@roblim roblim deleted the legacy-checkpoint-cli branch March 16, 2021 18:56
peterdhansen pushed a commit to peterdhansen/great_expectations that referenced this pull request Mar 29, 2021
…-expectations#2511)

* Restore cli functionality for legacy checkpoints

* Nix legacy arg

* Update checkpoint script template

* Also use fallback validation operator if given name not found in context

* Nix validation_operator_name

* Update test

* Nix no-longer relevant tests

* Linting

* Update great_expectations/cli/v012/checkpoint.py

Co-authored-by: Sam Bail <sam@superconductive.com>

* Update great_expectations/cli/v012/checkpoint.py

Co-authored-by: Sam Bail <sam@superconductive.com>

* Update great_expectations/cli/v012/checkpoint_script_template.py

Co-authored-by: Sam Bail <sam@superconductive.com>

Co-authored-by: Sam Bail <sam@superconductive.com>

2 participants