Commit
* initial commit
* Add Fisher's exact test
* Replace MetricHtmlInfo by BaseWidgetInfo. Make id uuid by default.
* New data drift metrics (#339)
* rework data drift metrics
* fix format and imports
* fix notebooks
* add empty check after data clean for drift + some refactoring
* fix imports
* add threshold for DatasetDriftMetric add tails in DatasetDriftMetric visual
* refactor data drift
* refactor data drift
* add tests for DatasetDriftMetric
* fix checks and titles for drift
* fix style
* update title in ColumnDriftMetric
* implement columns for DatasetDriftMetric and DataDriftTable
* fix data structure and json output for DataDriftTable
* fix data structure and json output for DatasetDriftMetric
* fix after main merge
* fix with black
* add reworked ColumnRegExpMetric (#348)
* add reworked ColumnRegExpMetric
* move ColumnRegExpMetric to a separate module, fix visual, add unittests
* fix table in html view, update an example
* fix ColumnRegExpMetric import in notebooks
* fix notebook imports
* add tabs for ColumnRegExpMetric
* fix after main merge
* fix after main merge
* fix imports with isort
* add anderson ksamp and its test
* fix doc
* fix description
* added hellinger_distance for drift detection
* isort
* Delete index.js.LICENSE.txt
* Delete index.js
* Added some examples of metrics and metric presets usage Added some examples of tests and test presets usage Removed outdated example with metrics
* move ColumnRegExpMetric data classes to the metric module (#360)
* fix warning about duplicated columns in data drift (#361)
* fix warning about duplicated columns in correlation calculation in data drift
* make a new list, do not modify num_feature_names
* Added the example of stattest specification for TestSuites
* Update readme.md
* Update readme.md
* add anderson example in notebook
* remove used features from wasserstein
* fix anderson not found
* check custom test
* Update all-tests.md
* Update run-tests.md
* Update run-tests.md
* Update README.md
* Add files via upload
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update examples.md
* Update examples.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* fix value error messages in data drift calculations (#367)
* fix value error messages in data drift calculations
* add error messages about missed column
* Update missing values metrics (#357)
* implement ColumnMissingValuesMetric and move DataIntegrityNullValuesMetrics to DatasetMissingValuesMetric
* fix isort and black
* fix notebook import and naming
* fix isort + black
* fix ColumnMissingValuesMetricRenderer and DatasetMissingValuesMetricRenderer
* ass sort in ColumnMissingValuesMetric
* fix ColumnMissingValuesMetric view
* fix DatasetMissingValuesMetric view
* some rename null values -> missed values
* fix flake8
* add ColumnMissingValuesMetric unit tests
* move DatasetMissingValuesMetric to a separate module
* add test_dataset_missing_values_metrics_value_error
* fix number_of_rows_with_nulls
* fix labels texts
* update notebook example
* initial commit
* Add Fisher's exact test
* Update test_stattests.py
* fix lint,sort
* Fix contingency matrix boundary cases, and add tests
* fix conflicts
* fix fisher's exact test
* fix mypy
* fix black and remove checks

Co-authored-by: Mert Bozkır <mert.bozkirr@gmail.com>
Co-authored-by: Vyacheslav Morov <v.morov@corp.mail.ru>
Co-authored-by: Tapot <novakche@yandex.ru>
Co-authored-by: inderpreetsingh01 <inderpreetsinghchhabra23@gmail.com>
Co-authored-by: inderpreetsingh01 <54892545+inderpreetsingh01@users.noreply.github.com>
Co-authored-by: Emeli Dral <emeli.dral@gmail.com>
Co-authored-by: elenasamuylova <67064421+elenasamuylova@users.noreply.github.com>
1 parent 1535059, commit 10ea83c
Showing 8 changed files with 242 additions and 2 deletions.
56 changes: 56 additions & 0 deletions
src/evidently/calculations/stattests/fisher_exact_stattest.py
@@ -0,0 +1,56 @@
from typing import Tuple

import numpy as np
import pandas as pd
from scipy.stats import fisher_exact

from evidently.calculations.stattests.registry import StatTest
from evidently.calculations.stattests.registry import register_stattest

from .utils import generate_fisher2x2_contingency_table


def _fisher_exact_stattest(
    reference_data: pd.Series, current_data: pd.Series, feature_type: str, threshold: float
) -> Tuple[float, bool]:
    """Calculate the p-value of Fisher's exact test between two arrays.

    Args:
        reference_data: reference data
        current_data: current data
        feature_type: feature type
        threshold: p-values below this threshold indicate data drift
    Raises:
        ValueError: If null or inf values are found in either reference_data or current_data
        ValueError: If reference_data or current_data is not binary (more than 2 unique values)
    Returns:
        p_value: two-tailed p-value
        test_result: whether drift is detected
    """

    if (
        (reference_data.isnull().values.any())
        or (current_data.isnull().values.any())
        or (reference_data.isin([np.inf, -np.inf]).any())
        or (current_data.isin([np.inf, -np.inf]).any())
    ):
        raise ValueError(
            "Null or inf values found in either reference_data or current_data. Please ensure that no null or inf values are present"
        )

    if (reference_data.nunique() > 2) or (current_data.nunique() > 2):
        raise ValueError("Expects binary data for both reference and current, but found unique categories > 2")

    contingency_matrix = generate_fisher2x2_contingency_table(reference_data, current_data)
    _, p_value = fisher_exact(contingency_matrix)
    return p_value, p_value < threshold


fisher_exact_test = StatTest(
    name="fisher_exact",
    display_name="Fisher's Exact test",
    func=_fisher_exact_stattest,
    allowed_feature_types=["cat"],
    default_threshold=0.1,
)

register_stattest(fisher_exact_test)
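To make the decision rule in this diff concrete, here is a minimal standard-library sketch of the same idea: build a 2x2 table from two binary samples, compute a two-sided Fisher p-value, and flag drift when the p-value falls below the threshold. It is not the code from this commit. `contingency_2x2` is a hypothetical stand-in for `generate_fisher2x2_contingency_table` (whose body is not shown here), `fisher_exact_two_sided` and `fisher_drift` are illustrative names, and the two-sided convention used (summing the probabilities of all tables no more likely than the observed one) is an assumption about what `scipy.stats.fisher_exact` does by default.

```python
from math import comb
from typing import List, Sequence, Tuple


def contingency_2x2(reference: Sequence, current: Sequence) -> List[List[int]]:
    """Hypothetical helper: count the two categories in each sample."""
    # Distinct values in first-seen order across both samples.
    values = list(dict.fromkeys([*reference, *current]))
    if not values or len(values) > 2:
        raise ValueError("Expected non-empty binary data")
    first = values[0]
    ref_first = sum(1 for v in reference if v == first)
    cur_first = sum(1 for v in current if v == first)
    # Rows are samples, columns are categories.
    return [
        [ref_first, len(reference) - ref_first],
        [cur_first, len(current) - cur_first],
    ]


def fisher_exact_two_sided(table: Sequence[Sequence[int]]) -> float:
    """Two-sided p-value for a 2x2 table under the hypergeometric null."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over tables at most as likely as the observed one; the small
    # relative tolerance guards against floating-point ties.
    p_value = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


def fisher_drift(reference: Sequence, current: Sequence, threshold: float = 0.1) -> Tuple[float, bool]:
    """Return (p_value, drift_detected); drift means p_value < threshold."""
    p_value = fisher_exact_two_sided(contingency_2x2(reference, current))
    return p_value, p_value < threshold


if __name__ == "__main__":
    # Same category balance in both samples: drift should not be flagged.
    print(fisher_drift([0, 1, 0, 1], [0, 1, 0, 1]))
    # Completely different category balance: drift should be flagged.
    print(fisher_drift([0] * 10, [1] * 10))
```

The drift flag is `p_value < threshold`, matching the return statement in the diff: a small p-value means the two samples are unlikely to share the same category proportions.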