
Consider returning expectation_type as part of expectation results #574

Closed
schrockn opened this issue Aug 5, 2019 · 7 comments

Comments

@schrockn

schrockn commented Aug 5, 2019

It would be useful to have the metadata in the expectations suite available in the return value of an individual expectation.

e.g.:

    ge_evr = df.expect_column_mean_to_be_between('num1', 0, 10)
    print(ge_evr)

Prints:

    {'success': True, 'result': {'observed_value': 2.0, 'element_count': 2, 'missing_count': 0, 'missing_percent': 0.0}}

It would be very nice to have another entry (expectation maybe?) that has the information that would be in the expectation_suite:

    {
        'success': True,
        'result': {'observed_value': 2.0, 'element_count': 2, 'missing_count': 0, 'missing_percent': 0.0},
        'expectation': {
            "expectation_type": "expect_column_mean_to_be_between",
            "kwargs": {
                "column": "num1",
                "min_value": 0,
                "max_value": 10
            }
        }
    }

Use case is building logging around ge that operates on results individually on a generic basis.
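
As a rough illustration of that use case, here is a minimal sketch of a generic logger that works on one result at a time. It assumes the result dict has the shape proposed above, including the hypothetical 'expectation' entry; describe_result, log_result, and the logger name are made up for this example and are not part of GE.

```python
import logging

logger = logging.getLogger("ge_results")  # hypothetical logger name


def describe_result(evr: dict) -> str:
    """Build a one-line summary from a result dict in the proposed shape."""
    # 'expectation' is the entry proposed in this issue (an assumption here).
    expectation = evr.get("expectation", {})
    name = expectation.get("expectation_type", "<unknown expectation>")
    kwargs = expectation.get("kwargs", {})
    args = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
    status = "PASSED" if evr.get("success") else "FAILED"
    observed = evr.get("result", {}).get("observed_value")
    return f"{status} {name}({args}) observed_value={observed!r}"


def log_result(evr: dict) -> None:
    """Log passing results at INFO and failing results at WARNING."""
    level = logging.INFO if evr.get("success") else logging.WARNING
    logger.log(level, describe_result(evr))


example_evr = {
    "success": True,
    "result": {"observed_value": 2.0, "element_count": 2,
               "missing_count": 0, "missing_percent": 0.0},
    "expectation": {
        "expectation_type": "expect_column_mean_to_be_between",
        "kwargs": {"column": "num1", "min_value": 0, "max_value": 10},
    },
}

log_result(example_evr)
```

The point of the sketch is that the logger never needs to be told which expectation produced the result; everything it prints comes from the result itself.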

@abegong
Member

abegong commented Aug 5, 2019

The fields you're looking for will be available if you use df.validate instead of running the expectation method alone.

The original idea was that the df.expect_whatever syntax was for exploration, and df.validate was for deployment.

We did NOT originally intend for you to call:

df.expect_something
df.expect_something_else
df.expect_another_thing

as part of your actual deployed code.

There are some pretty huge benefits to gathering up Expectations as persistent, sharable artifacts.

That said, we're rethinking workflow options for exploration and validation in GE, so if you have strong preferences, please speak up.
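
For readers following along, a minimal sketch of reading those fields back out of a df.validate report. It assumes the report is a dict with a "results" list whose entries each carry an "expectation_config" entry; that layout may differ between GE versions, and iter_labeled_results and sample_report are hypothetical names used only here.

```python
from typing import Iterator, Tuple


def iter_labeled_results(report: dict) -> Iterator[Tuple[str, bool]]:
    """Yield (expectation_type, success) for each result in a validation report."""
    for evr in report.get("results", []):
        # Assumed layout: each result carries the config of the expectation it came from.
        config = evr.get("expectation_config", {})
        yield config.get("expectation_type", "<unknown>"), bool(evr.get("success"))


# Hand-written stand-in for a validation report (not real GE output).
sample_report = {
    "success": True,
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_mean_to_be_between",
                "kwargs": {"column": "num1", "min_value": 0, "max_value": 10},
            },
            "result": {"observed_value": 2.0},
        },
    ],
}

for expectation_type, success in iter_labeled_results(sample_report):
    print(expectation_type, success)
```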

@schrockn
Author

schrockn commented Aug 5, 2019

While the expectation suite file format has advantages when using tooling that leverages it (the web-based editor, etc.), it is also useful just to be able to call the function directly. Calling the expectations directly seems like a perfectly fine thing to do in production code. Why introduce a new external DSL just to call a function? Supporting the expect_* methods in a first-class way will allow users to layer on top in novel ways that might have different tradeoffs than the expectation suite file format.

@jcampbell
Member

The expectation_type field is included in the returned data when include_config is true, which can be set as the default value on a data asset. I think it may be useful to invert the default; what do you think?

@schrockn
Author

schrockn commented Aug 7, 2019

I see. I'd recommend just including it unconditionally and then reducing your API surface area to not have to worry about include_config at all. It doesn't really cost you that much to always include it.

Full context is that I'm trying to have a simple generic function that takes the result of a GE expectation (I think you call them evrs?) and creates a dagster Expectation object, which is then rendered in UI and used by tooling.

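    import great_expectations as ge

    # file_relative_path is from dagster.utils; create_expectation_result is our own helper.
    df = ge.read_csv(file_relative_path(__file__, './num.csv'))
    er = create_expectation_result(
        'expect_column_mean_to_be_between', df.expect_column_mean_to_be_between('num1', 0, 10)
    )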

So right now we have to pass in a string of the function name in order to properly label the created expectation. It would be nasty to only have the function work if include_config=True.
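
To make the trade-off concrete, here is a minimal sketch of a helper that takes its label from the result when the config is present and only falls back to an explicit name otherwise. Everything in it is an assumption for illustration: LabeledExpectation is a hypothetical stand-in for the dagster object, label_expectation_result is not the actual helper shown above, and the "expectation_config" / "expectation" keys are guesses at where the config would live.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledExpectation:
    """Hypothetical stand-in for the object handed to downstream tooling."""
    label: str
    success: bool
    details: dict


def label_expectation_result(
    evr: dict, fallback_label: Optional[str] = None
) -> LabeledExpectation:
    """Label a single result, preferring the config embedded in the result."""
    # Accept either key; both are assumptions about the result layout.
    config = evr.get("expectation_config") or evr.get("expectation") or {}
    label = config.get("expectation_type") or fallback_label
    if label is None:
        raise ValueError("result has no expectation config and no fallback label was given")
    return LabeledExpectation(
        label=label,
        success=bool(evr.get("success")),
        details=evr.get("result", {}),
    )


with_config = {
    "success": True,
    "result": {"observed_value": 2.0},
    "expectation_config": {"expectation_type": "expect_column_mean_to_be_between"},
}
without_config = {"success": True, "result": {"observed_value": 2.0}}

print(label_expectation_result(with_config).label)
print(label_expectation_result(without_config, "expect_column_mean_to_be_between").label)
```

If the config were always included, the fallback parameter and the error branch could both go away, which is the API simplification argued for above.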

@stale

stale bot commented Mar 11, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the wontfix label Mar 11, 2020
@Aylr Aylr removed the wontfix label Mar 13, 2020
@jcampbell
Member

Expectation config is now returned by default!

@vaidehid1305

Hi, I am facing a similar issue while working with Great Expectations version 0.13.20.
I passed include_config=True to my individual expectations, but I still do not see the expectation suite info in the result set.
I am working with CSV and Athena datasources and observe the issue with both.

Any help on this would be appreciated.

Below is an example:
    batch.expect_column_values_to_not_be_null(column='date', result_format="COMPLETE", include_config=True)

    {
      "success": true,
      "result": {
        "element_count": 1000000,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "unexpected_percent_total": 0.0,
        "partial_unexpected_list": [],
        "unexpected_list": [],
        "unexpected_index_list": []
      },
      "exception_info": {
        "raised_exception": false,
        "exception_traceback": null,
        "exception_message": null
      },
      "meta": {}
    }

5 participants