Fix #2981 - Update Profile to match TableProfile #2982
Schema Change Detected. Needs ingestion-core version bump Please run |
# Conflicts: # ingestion-core/src/metadata/_version.py
"description": "No.of null value proportion in columns.", | ||
"type": "number" | ||
}, | ||
"missingPercentage": { | ||
"missingRatio": { |
If these changes are an issue now, I can revert this and change the naming on the Python side. We can tackle it later if needed.
Let's not change it. The UI can fix it.
I'll revert the datatype changes and handle them separately, as it is a handful.
catalog-rest-service/src/main/resources/json/schema/entity/data/table.json
@pmbrull overall LGTM. Left a couple of comments. If we don't need numpy, let's not include it.
Also, we need to update the sample data.
This reverts commit 50702c3.
[open-metadata-ingestion] Kudos, SonarCloud Quality Gate passed! |
}, | ||
"max": { | ||
"description": "Maximum value in a column.", | ||
"type": ["number", "integer", "string"] | ||
}, | ||
"minLength": { |
Hi @harshach, I reverted all other changes and removed the numpy dependency. The only change I'd like to keep here is adding `minLength` and `maxLength` and updating the type for `min` and `max`. It looks like the tests are passing, and this way I can compute them for numbers and strings.
[catalog] Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
Describe your changes :
This PR fixes #2981 and closes #2980
Reviewing the API definitions, we noticed that our current approach to computing profiling data was not properly synced with the TableProfile definition.
Instead of adding yet another step that maps the results artificially, we refactored how the profiling data gets computed. We now rely on a single Profiler object, which computes both table and column metrics and handles all the columns in the table itself.
Its results can then be safely parsed into TableProfile without any issues.
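To illustrate the idea of a single object that computes both table and column metrics, here is a minimal sketch over in-memory rows. It is not the Profiler from this PR: the class body, the metric callables, and the result keys (`rowCount`, `columnProfile`, `nullCount`, `distinctCount`) are assumptions chosen for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class Profiler:
    """Hypothetical profiler: one object computes table and column metrics."""

    def __init__(
        self,
        rows: List[Row],
        table_metrics: Dict[str, Callable[[List[Row]], Any]],
        column_metrics: Dict[str, Callable[[List[Any]], Any]],
    ) -> None:
        self.rows = rows
        self.table_metrics = table_metrics
        self.column_metrics = column_metrics

    def execute(self) -> Dict[str, Any]:
        """Compute every metric and return a single TableProfile-shaped dict."""
        # Table-level metrics receive the whole row set.
        profile: Dict[str, Any] = {
            name: fn(self.rows) for name, fn in self.table_metrics.items()
        }
        # The profiler itself iterates over all columns in the table.
        columns = list(self.rows[0].keys()) if self.rows else []
        profile["columnProfile"] = [
            {
                "name": col,
                **{
                    name: fn([row[col] for row in self.rows])
                    for name, fn in self.column_metrics.items()
                },
            }
            for col in columns
        ]
        return profile


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": None},
    {"id": 3, "name": "c"},
]

profiler = Profiler(
    rows,
    table_metrics={"rowCount": len},
    column_metrics={
        "nullCount": lambda values: sum(v is None for v in values),
        "distinctCount": lambda values: len({v for v in values if v is not None}),
    },
)
profile = profiler.execute()
```

Because the caller only sees one result dictionary, parsing it into the target model becomes a single step rather than a per-metric mapping.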
In the JSON Schema, we slightly modified the types of the metrics to better represent the results, and added a couple of metrics that we were already computing but that were not present. We also renamed a few metrics. ⚠️ This might have an impact on the UI as well @Sachin-chaurasiya, @darth-coder00 ⚠️ However, we still need to work on the UI for this EPIC anyway, so we could maybe live with it for a while.
We also added a method to manually name the metrics, which will make it easier to map them to the TableProfile spec.
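One way manual naming could work is sketched below: each metric class exposes a `name()` classmethod, and results are keyed by that name. The `Metric` base class, the `run` helper, and the key strings are hypothetical and are not taken from this PR.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type


class Metric(ABC):
    """Hypothetical base class: each metric declares the key its result uses."""

    @classmethod
    def name(cls) -> str:
        # Default key derived from the class name; subclasses may override it.
        return cls.__name__

    @abstractmethod
    def compute(self, values: List[Any]) -> Any:
        ...


class NullCount(Metric):
    @classmethod
    def name(cls) -> str:
        return "nullCount"  # manually chosen key (illustrative)

    def compute(self, values: List[Any]) -> int:
        return sum(v is None for v in values)


class Max(Metric):
    @classmethod
    def name(cls) -> str:
        return "max"  # manually chosen key (illustrative)

    def compute(self, values: List[Any]) -> Any:
        present = [v for v in values if v is not None]
        return max(present) if present else None


def run(metrics: List[Type[Metric]], values: List[Any]) -> Dict[str, Any]:
    """Key every result by the metric's declared name."""
    return {metric.name(): metric().compute(values) for metric in metrics}


result = run([NullCount, Max], [3, None, 7])
```

With keys declared on the metric itself, the result dictionary can line up with the target field names without a separate translation table.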
An interesting addition is the function `add_props` in metrics/core. As we now perform the metrics instantiation inside the Profiler, we pass the class definition as an input parameter. However, some metrics require specific attributes, such as `bins` or `expression`. To keep the Profiler as generic as possible and avoid checking individual metrics for their particular requirements, `add_props` dynamically prepares a new class definition, overriding the `__init__` method to add any required attribute we pass as `**kwargs`.

This means that we can run `new_hist = add_props(bins=5)(Metrics.HISTOGRAM.value)` and `new_hist` will be a completely independent and new class definition that replicates `Metrics.HISTOGRAM.value` but will assign `bins=5` to all its instantiations.

Happy to discuss!
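For readers unfamiliar with the pattern, the following is a minimal sketch of how a helper like `add_props` could be written. It is an assumption about the approach, not the code in metrics/core: the `Histogram` class is a hypothetical stand-in for `Metrics.HISTOGRAM.value`, and the sketch derives a subclass via `type()` rather than copying the class.

```python
from typing import Any, Callable, Type


class Histogram:
    """Hypothetical stand-in for Metrics.HISTOGRAM.value."""

    bins = 10  # default used when no override is supplied

    def __init__(self, col: str) -> None:
        self.col = col


def add_props(**kwargs: Any) -> Callable[[Type], Type]:
    """Return a function that derives a new class carrying extra attributes."""

    def inner(cls: Type) -> Type:
        original_init = cls.__init__

        def _new_init(self: Any, *args: Any, **kw: Any) -> None:
            # Run the original constructor, then attach the extra attributes.
            original_init(self, *args, **kw)
            for key, value in kwargs.items():
                setattr(self, key, value)

        # type() builds a fresh subclass; the original class is left untouched.
        return type(cls.__name__, (cls,), {"__init__": _new_init})

    return inner


new_hist = add_props(bins=5)(Histogram)
```

Every instance of `new_hist` receives `bins=5` at construction time, while `Histogram` itself keeps its default, so the caller can instantiate either class without knowing which attributes each metric needs.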
Thanks
Type of change :
Frontend Preview (Screenshots) :
For frontend related change, please link screenshots of your changes preview! Optional for backend related changes.
Checklist: