Problem Description
As a user, I'd like to get the results of my metrics reports as quickly as possible.
We performed an audit on the QualityReport since it seemed to be slow. The conclusion was that most of the time is lost in the contingency similarity metric. More specifically, these lines: SDMetrics/sdmetrics/column_pairs/statistical/contingency_similarity.py, lines 45 to 54 in 685731f.
This is the performance report's visualization.
Expected behavior
Without changing the algorithm at all, the goal of this issue is to improve the performance of contingency_similarity. Optimizations that are in scope include (a sketch of one possibility follows the list):
- Trying different pandas or numpy functions instead of crosstab
- Trying to do any type conversions at a higher level (e.g. the astype(str) calls are happening multiple times on the same columns)
- Seeing if there is a more efficient way to compute the table
The optimizations should not change the overall algorithm of the metric.
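For illustration only, here is a minimal sketch of the kind of change that would be in scope, assuming the metric compares normalized two-column contingency tables and returns 1 minus their total variation distance. The function and variable names (contingency_similarity_sketch, real_data, synthetic_data) are hypothetical, DataFrame.value_counts(normalize=True) stands in for pd.crosstab(..., normalize='all'), and the astype(str) conversion happens once per DataFrame rather than per call:

```python
import pandas as pd


def contingency_similarity_sketch(real_data, synthetic_data):
    """Hypothetical alternative to a crosstab-based contingency comparison."""
    columns = list(real_data.columns[:2])

    # Convert to string once per DataFrame, instead of repeating astype(str)
    # for every intermediate computation.
    real = real_data[columns].astype(str)
    synthetic = synthetic_data[columns].astype(str)

    # value_counts(normalize=True) over the column pair gives the relative
    # frequency of each (value_a, value_b) combination, playing the role of
    # pd.crosstab(..., normalize='all') without building a 2-D table.
    real_freq = real.value_counts(normalize=True)
    synthetic_freq = synthetic.value_counts(normalize=True)

    # Align on the union of observed combinations; combinations present in
    # only one table contribute a frequency of 0 in the other.
    real_freq, synthetic_freq = real_freq.align(synthetic_freq, fill_value=0)

    # Total variation distance between the two distributions, turned into a
    # similarity score in [0, 1].
    total_variation = (real_freq - synthetic_freq).abs().sum() / 2
    return 1 - total_variation


# Toy usage with made-up data.
real = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'y', 'x']})
synthetic = pd.DataFrame({'a': [1, 2, 2], 'b': ['x', 'x', 'y']})
print(contingency_similarity_sketch(real, synthetic))  # a score in [0, 1]
```

Whether a variant like this is actually faster would have to be confirmed by profiling it against the existing crosstab-based code on large categorical inputs; the resulting scores should be identical.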
Additional context
If not many optimizations can be made, we can follow up with a different issue.