Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE]/ Added pairwise expectation 'expect_column_pair_values_to_be_in_set' #6097

Conversation

Arnavkar
Copy link
Contributor

@Arnavkar Arnavkar commented Sep 27, 2022

[FEATURE]/ Added pairwise expectation 'expect_column_pair_values_to_be_in_set' sqlalchemy dataset.py file

Changes proposed in this pull request:

  • Allows users to make use of the pairwise expectation 'expect_column_pair_values_to_be_in_set' for SQLAlchemy backends in v2 GE
    -Provides the 'column_pair_map_expectation 'decorator for other pairwise expectations to be developed for SQLAlchemy backends in v2 GE

After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

@netlify
Copy link

netlify bot commented Sep 27, 2022

👷 Deploy request for niobium-lead-7998 pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 348bba3

@anthonyburdi anthonyburdi self-assigned this Oct 19, 2022
@anthonyburdi anthonyburdi self-requested a review October 19, 2022 21:16
@anthonyburdi
Copy link
Member

Hi @Arnavkar! Thank you for contributing to Great Expectations! I'll be taking a look at this contribution. It looks like the linter is picking up some issues, would you mind running invoke hooks from the repo root & merging develop into your branch? That will run all of our automated linting.

@anthonyburdi
Copy link
Member

Hi @Arnavkar! I was able to make commits to your branch for satisfying the linter and merging develop. When you have a moment would you be able to sign the CLA? I'll submit my code review soon as well.

@Arnavkar
Copy link
Contributor Author

Arnavkar commented Oct 20, 2022 via email

@anthonyburdi
Copy link
Member

@cla-bot check

@anthonyburdi
Copy link
Member

Hi @Arnavkar! If you don't mind can you also sign with arnav.shirodkar@morningstar.com? I am just seeing your .edu email address. If that doesn't work we can override.

Right now I'm enabling tests for the new expectation and verifying.

Copy link
Member

@anthonyburdi anthonyburdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution! I made one small edit to enable the tests for the new expectation and ran the linter, but otherwise LGTM!

@Arnavkar
Copy link
Contributor Author

Arnavkar commented Oct 21, 2022 via email

@anthonyburdi anthonyburdi enabled auto-merge (squash) October 24, 2022 13:33
@anthonyburdi
Copy link
Member

Hi @Arnavkar, thank you for signing the CLA, it looks like all went well there. I am noticing that after I enabled testing, the test for mssql is failing, is that something you would be able to look into?

@Arnavkar
Copy link
Contributor Author

Arnavkar commented Oct 24, 2022 via email

@Arnavkar
Copy link
Contributor Author

Arnavkar commented Oct 24, 2022 via email

@anthonyburdi
Copy link
Member

Hi @Arnavkar
Are you able to see this link of our CI pipeline? I'm looking at the details of the automated checks at the bottom of this PR. Specifically the failing dev (db_integration mssql) stage. No worries if you are not able to view that link. It looks like the following tests are failing for mssql:

FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls]
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:positive_test_with_nulls_and_ignore_row_if_either_value_is_missing]
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:negative_test_with_nulls]
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:another_positive_test_with_nulls]

The test command that is being run in that CI pipeline is pytest tests --mssql. There are a few other directives to ignore other tests - you can use --ignore 'tests/cli' --ignore 'contrib/cli/tests' --ignore 'tests/integration/usage_statistics' to do the same. You can also just run the failing tests. To run these tests will require setting up an mssql instance, e.g. running docker compose up -d in the great_expectations/assets/docker/mssql folder.

@Arnavkar
Copy link
Contributor Author

Arnavkar commented Oct 25, 2022 via email

@anthonyburdi
Copy link
Member

@Arnavkar I'm having trouble as well, but my error leads me to think the database is not being created when running the docker compose script. We create the database within our CI pipeline so let me see if I can get it created when creating the docker container so we can debug the failing tests locally.
In the worst case, we can skip support for mssql for this expectation if it is too involved and tackle that in a subsequent PR.

@anthonyburdi
Copy link
Member

@Arnavkar I was able to run the tests locally using the docker compose file to spin up a mssql database. I did have to manually create the test_ci database expected by our test suite, I tried a few things to add this to the docker compose without success. The steps I took were as follows:

  1. Run docker compose up -d in the great_expectations/assets/docker/mssql folder of the repository
  2. Run /opt/mssql-tools/bin/sqlcmd -U sa -P "ReallyStrongPwd1234%^&*" -Q "CREATE DATABASE test_ci;" in the docker container CLI (you can use a command or docker desktop)
  3. Run one of the failing tests via pytest --mssql tests/test_definitions/test_expectations.py -k "test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls]", replace the test name in quotes after the -k to run other tests.

Are you able to try this to see if it works for you? I am seeing the same failures as in the CI pipeline, namely the following for the test referenced in the command above:

E       sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'IS'. (156) (SQLExecDirectW)")
E       [SQL: INSERT INTO [#ge_temp_80395bba] (condition) SELECT CASE WHEN (NOT (x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL) AND NOT (x IS NULL AND x IS NULL)) THEN ? ELSE ? END AS condition
E       FROM [test_data_N74VdOGu]]
E       [parameters: (1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 1, 0)]
E       (Background on this error at: https://sqlalche.me/e/14/f405)

../../../../miniconda3/envs/pr_6097/lib/python3.8/site-packages/sqlalchemy/engine/default.py:719: ProgrammingError

@Arnavkar
Copy link
Contributor Author

Arnavkar commented Oct 25, 2022 via email

@Arnavkar
Copy link
Contributor Author

Arnavkar commented Oct 26, 2022 via email

@anthonyburdi
Copy link
Member

anthonyburdi commented Oct 26, 2022

Yes I think that is a good plan. Thank you for trying! I also spent a few hours trying to get mssql to work as well and I was not successful.
I will try to spend some time today or tomorrow adding in the skip mssql logic. It's a little tricky to explain (we need to only skip mssql for v2 and not v3) so I may just add it if that is OK with you.

I also hope to add a readme to the mssql docker compose folder with the command to create the test_ci database for easier set up in the future. (I tried to add this command to the docker_compose file, but I could not get it working). Submitted separately: #6211

@anthonyburdi anthonyburdi enabled auto-merge (squash) October 26, 2022 15:46
@Arnavkar
Copy link
Contributor Author

Arnavkar commented Oct 26, 2022 via email

@anthonyburdi anthonyburdi merged commit 7d689fa into great-expectations:develop Oct 26, 2022
Shinnnyshinshin pushed a commit that referenced this pull request Oct 27, 2022
…ue-z-scores-to-be-less-than

* develop:
  [BUGFIX] Patch issue with empty config variables file raising `TypeError` (#6216)
  [MAINTENANCE] Execution Engine linting & partial typing (#6210)
  [FEATURE]/ Added pairwise expectation 'expect_column_pair_values_to_be_in_set' (#6097)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants