Consider returning expectation_type as part of expectation results #574

schrockn · 2019-08-05T13:05:04Z

It would be useful to have the metadata in the expectations suite available in the return value of an individual expectation.

e.g.:

    ge_evr = df.expect_column_mean_to_be_between('num1', 0, 10)
    print(ge_evr)

Prints

    {'success': True, 'result': {'observed_value': 2.0, 'element_count': 2, 'missing_count': 0, 'missing_percent': 0.0}}

It would be very nice to have another entry (expectation maybe?) that has the information that would be in the expectation_suite:

    {
        'success': True,
        'result': {'observed_value': 2.0, 'element_count': 2, 'missing_count': 0, 'missing_percent': 0.0},
        'expectation': {
            "expectation_type": "expect_column_mean_to_be_between",
            "kwargs": {
                "column": "num1",
                "min_value": 0,
                "max_value": 10
            }
        }
    }

Use case is building logging around ge that operates on results individually on a generic basis.
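
A minimal sketch of that kind of generic logging, assuming each result takes the shape proposed above. The `expectation` key is the proposal in this issue rather than a confirmed Great Expectations field, and `describe_result` and `log_expectation_result` are hypothetical helper names:

```python
import logging

logger = logging.getLogger("ge_results")


def describe_result(evr):
    """Build a one-line summary from a single expectation result dict.

    Assumes the shape proposed in this issue: an 'expectation' entry that
    holds 'expectation_type' and 'kwargs'. That key is hypothetical.
    """
    expectation = evr.get("expectation") or {}
    name = expectation.get("expectation_type", "<unknown expectation>")
    kwargs = expectation.get("kwargs", {})
    status = "passed" if evr.get("success") else "failed"
    args = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
    return f"{name}({args}) {status}"


def log_expectation_result(evr):
    """Log passing results at INFO and failing results at WARNING."""
    level = logging.INFO if evr.get("success") else logging.WARNING
    logger.log(level, describe_result(evr))


# Example result in the proposed shape (values copied from the issue text).
example = {
    "success": True,
    "result": {"observed_value": 2.0, "element_count": 2,
               "missing_count": 0, "missing_percent": 0.0},
    "expectation": {
        "expectation_type": "expect_column_mean_to_be_between",
        "kwargs": {"column": "num1", "min_value": 0, "max_value": 10},
    },
}
summary = describe_result(example)
log_expectation_result(example)
```

Because the function only reads the result dict, it works for any expectation without knowing which method produced it.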

abegong · 2019-08-05T14:10:15Z

The fields you're looking for will be available if you use df.validate instead of running the expectation method alone.

The original idea was that the df._expect_whatever syntax was intended for exploration, and df.validate was intended for deployment.

We did NOT originally intend for you to call:

    df.expect_something
    df.expect_something_else
    df.expect_another_thing

as part of your actual deployed code.

There are some pretty huge benefits to gathering up Expectations as persistent, sharable artifacts.

That said, we're rethinking workflow options for exploration and validation in GE, so if you have strong preferences, please speak up.

schrockn · 2019-08-05T14:33:55Z

While the expectation suite file format has advantages when using tooling that leverages it (the web-based editor, etc.), it is also useful just to be able to call the function directly. Calling the expectations directly seems like a perfectly fine thing to do in production code. Why introduce a new external DSL just to call a function? Supporting the expect_* methods in a first-class way will allow users to layer on top in novel ways that might have different tradeoffs than the expectation suite file format.

jcampbell · 2019-08-07T01:50:44Z

The expectation_type field is included in the returned data when include_config is true, which can be set as the default value on a data asset. I think it may be useful to invert the default; what do you think?

schrockn · 2019-08-07T02:51:20Z

I see. I'd recommend just including it unconditionally and then reducing your API surface area to not have to worry about include_config at all. It doesn't really cost you that much to always include it.

Full context is that I'm trying to have a simple generic function that takes the result of a GE expectation (I think you call them evrs?) and creates a dagster Expectation object, which is then rendered in UI and used by tooling.

    df = ge.read_csv(file_relative_path(__file__, './num.csv'))
    er = create_expectation_result(
        'expect_column_mean_to_be_between', df.expect_column_mean_to_be_between('num1', 0, 10)
    )

So right now we have to pass in a string of the function name in order to properly label the created expectation. It would be nasty to only have the function work if include_config=True.
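
If the expectation type were always present in the result, the helper could label itself. A rough sketch, assuming the result carries the `expectation` entry proposed at the top of this issue; `ExpectationRecord` is a hypothetical stand-in for the downstream object, not dagster's actual API:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ExpectationRecord:
    """Hypothetical stand-in for the object built from a GE result."""
    label: str
    success: bool
    metadata: Dict[str, Any] = field(default_factory=dict)


def create_expectation_result(evr: Dict[str, Any],
                              label: Optional[str] = None) -> ExpectationRecord:
    """Build a record from one result dict, deriving the label when possible.

    Assumes the 'expectation' entry proposed in this issue. An explicit
    label still wins, so the current calling style keeps working.
    """
    config = evr.get("expectation") or {}
    resolved = label or config.get("expectation_type")
    if resolved is None:
        raise ValueError("result has no expectation_type; pass label explicitly")
    return ExpectationRecord(
        label=resolved,
        success=bool(evr["success"]),
        metadata=dict(evr.get("result", {})),
    )


# Result in the proposed shape: no function-name string needed.
with_config = {
    "success": True,
    "result": {"observed_value": 2.0},
    "expectation": {
        "expectation_type": "expect_column_mean_to_be_between",
        "kwargs": {"column": "num1", "min_value": 0, "max_value": 10},
    },
}
record = create_expectation_result(with_config)

# Result without the config: the caller must still supply the name.
without_config = {"success": True, "result": {"observed_value": 2.0}}
labeled = create_expectation_result(without_config, "expect_column_mean_to_be_between")
```

The fallback parameter shows the cost of making the field conditional: every generic consumer needs a second code path for results that arrive without it.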

stale · 2020-03-11T11:54:23Z

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

jcampbell · 2020-03-16T11:54:38Z

Expectation config is now returned by default!

vaidehid1305 · 2021-08-03T14:43:28Z

Hi, I am facing a similar issue while working with Great Expectations version 0.13.20.
I used include_config = True as part of my individual expectations, but I still do not see the expectation suite info in the result set.
I am working with CSV and Athena datasources and observing the issue with both.

Any help on this would be appreciated.

Below is an example:

    batch.expect_column_values_to_not_be_null(column='date', result_format="COMPLETE", include_config=True)

    {
        "success": true,
        "result": {
            "element_count": 1000000,
            "unexpected_count": 0,
            "unexpected_percent": 0.0,
            "unexpected_percent_total": 0.0,
            "partial_unexpected_list": [],
            "unexpected_list": [],
            "unexpected_index_list": []
        },
        "exception_info": {
            "raised_exception": false,
            "exception_traceback": null,
            "exception_message": null
        },
        "meta": {}
    }

stale bot added the wontfix label Mar 11, 2020

Aylr removed the wontfix label Mar 13, 2020

jcampbell closed this as completed Mar 16, 2020
