Job fields difference #158

Closed
5 tasks done
manycoding opened this issue Aug 27, 2019 · 0 comments

manycoding commented Aug 27, 2019

As in #157, write a rule that returns the difference in field values between two jobs.

difference(source_df, target_df, ["name", "url", "id"], dropna=False)
>>>Difference    FAILED

Details part of the output:

>>>500 new names, 500 same names
450 names missing y, u, i: 1,  50, 30
150 new urls, 850 same urls
100 urls missing x, y, z: 5, 10..
20 new ids, 980 same ids
10 ids missing a, b, c: 77, 11, 55
  • Fail condition: we assume this rule is used when a known share of values should be the same in both jobs. E.g. if more than 25% of values are different, the rule fails.
  • dropna option: whether NaN values are excluded from the comparison. What should the default value be? To be checked.
  • Any data type is allowed (numeric, NaN, strings, lists, dicts).
  • Lists and dicts are converted to str. Strings are normalized (lowercased, stripped).
  • Do we need to output the values themselves? See the example above.
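
A minimal sketch of how the points above could fit together, in plain Python so it runs without extra dependencies. Everything named here is hypothetical and not taken from the project: `field_difference`, `prepare`, `normalize`, `is_missing` and the `threshold` parameter. It also makes two assumptions the issue leaves open: "new" means values present in the source but not in the target, "missing" means the reverse, and the fail ratio is the share of differing unique values among all unique values of a field.

```python
import math


def is_missing(value):
    """True for None and float NaN, the two 'nan' forms handled in this sketch."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def normalize(value):
    """Convert lists and dicts to str, then lowercase and strip strings."""
    if isinstance(value, (list, dict)):
        value = str(value)
    if isinstance(value, str):
        return value.strip().lower()
    return value


def prepare(values, dropna):
    """Return the set of normalized values of one field."""
    prepared = set()
    for value in values:
        if is_missing(value):
            if not dropna:
                # NaN != NaN, so all missing values share one marker.
                prepared.add(None)
            continue
        prepared.add(normalize(value))
    return prepared


def field_difference(source, target, fields, dropna=False, threshold=0.25):
    """Compare unique field values; `source` and `target` map field name -> values."""
    details = {}
    for field in fields:
        src = prepare(source[field], dropna)
        tgt = prepare(target[field], dropna)
        new, missing, same = src - tgt, tgt - src, src & tgt
        total = len(src | tgt)
        ratio = (len(new) + len(missing)) / total if total else 0.0
        details[field] = {
            "new": sorted(new, key=str),
            "missing": sorted(missing, key=str),
            "same": len(same),
            "ratio": ratio,
            "failed": ratio > threshold,  # assumed fail condition
        }
    passed = not any(d["failed"] for d in details.values())
    return passed, details


# Made-up data, only to exercise the sketch.
source = {"name": ["Alice ", "BOB", "carol", None], "id": [1, 2, 3, 4]}
target = {"name": ["alice", "bob", "dave", float("nan")], "id": [1, 2, 3, 4]}

passed, details = field_difference(source, target, ["name", "id"])
print("Difference", "PASSED" if passed else "FAILED")
for field, d in details.items():
    print(f"{len(d['new'])} new {field}s, {d['same']} same {field}s")
    print(f"{len(d['missing'])} {field}s missing: {d['missing']}")
```

Because the function only indexes by field name and iterates over the values, it should also accept pandas DataFrames such as `source_df` and `target_df`, though that is untested here. Returning the differing values themselves, not only counts, is one possible answer to the last point.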
manycoding added the "Type: Feature" label on Aug 27, 2019
manycoding added this to the 0.3.7 milestone on Sep 4, 2019
manycoding self-assigned this on Sep 11, 2019