Add fisher's exact #373
Merged · 68 commits · Oct 26, 2022
Conversation

@SangamSwadiK (Contributor) commented Oct 10, 2022

Issue:
#345 (Add fisher's exact test for binary data)

What does this implement/fix?
This adds Fisher's exact test, its documentation, and the necessary changes to the How-to examples.

  • test implementation
  • unittests
  • doc
  • examples

cc @mertbozkir

SangamSwadiK and others added 29 commits October 10, 2022 18:02
* rework data drift metrics

* fix format and imports

* fix notebooks

* add empty check after data clean for drift + some refactoring

* fix imports

* add threshold for DatasetDriftMetric
add tails in DatasetDriftMetric visual

* refactor data drift

* refactor data drift

* add tests for DatasetDriftMetric

* fix checks and titles for drift

* fix style

* update title in ColumnDriftMetric

* implement columns for DatasetDriftMetric and DataDriftTable

* fix data structure and json output for DataDriftTable

* fix data structure and json output for DatasetDriftMetric

* fix after main merge

* fix with black

* add reworked ColumnRegExpMetric

* move ColumnRegExpMetric to a separate module, fix visual, add unittests

* fix table in html view, update an example

* fix ColumnRegExpMetric import in notebooks

* fix notebook imports

* add tabs for ColumnRegExpMetric

* fix after main merge

* fix after main merge

* fix imports with isort
Added some examples of tests and test presets usage
Removed outdated example with metrics
* fix warning about duplicated columns in correlation calculation in data drift

* make a new list, do not modify num_feature_names
@SangamSwadiK changed the title from "[WIP] Add fisher's exact" to "Add fisher's exact" on Oct 16, 2022
@emeli-dral (Contributor) commented

Hey @SangamSwadiK,

The test looks very good!

It is designed to work with binary data, and I have a corner case which I believe is worth supporting.
Here is an example. Imagine I have a binary feature and suddenly one of the two values disappears from the data. In this case it is still binary data, but the test fails on such an input:

_fisher_test([1,0,1,0,1], [0,0,0,0,0], feature_type='cat', threshold=0.1)

ValueError                                Traceback (most recent call last)
Input In [40], in <cell line: 1>()
----> 1 _fisher_test([1,0,1,0,1], [0,0,0,0,0], feature_type='cat', threshold=0.1)

Input In [34], in _fisher_test(reference_data, current_data, feature_type, threshold)
      4 """Calculate the p-value of Fisher's exact test between two arrays
      5 Args:
      6     reference_data: reference data
   (...)
     12     test_result: whether the drift is detected
     13 """
     15 contingency_matrix = pd.crosstab(reference_data, current_data)
---> 17 _, p_value = fisher_exact(contingency_matrix)
     19 return p_value, p_value < threshold

File /opt/homebrew/Caskroom/miniconda/base/envs/evidently53/lib/python3.9/site-packages/scipy/stats/_stats_py.py:4278, in fisher_exact(table, alternative)
   4276 c = np.asarray(table, dtype=np.int64)
   4277 if not c.shape == (2, 2):
-> 4278     raise ValueError("The input `table` must be of shape (2, 2).")
   4280 if np.any(c < 0):
   4281     raise ValueError("All values in `table` must be nonnegative.")

ValueError: The input `table` must be of shape (2, 2).

Could you please take a look and support this scenario?

I also suggest raising a ValueError when there are more than 2 unique values in the input data, explicitly telling the user that the test should be applied to binary data only. I think that would be more straightforward compared to ValueError: The input `table` must be of shape (2, 2). What do you think?
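
A minimal sketch of the suggested check (assuming a hypothetical _check_binary helper; the actual implementation in the PR may differ):

import pandas as pd

def _check_binary(reference_data: pd.Series, current_data: pd.Series) -> None:
    # Hypothetical helper: fail fast with an explicit message instead of
    # letting scipy raise "The input `table` must be of shape (2, 2)".
    unique_values = set(reference_data.unique()) | set(current_data.unique())
    if len(unique_values) > 2:
        raise ValueError(
            "Fisher's exact test should be applied to binary data only, "
            f"but the input contains {len(unique_values)} unique values."
        )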

@SangamSwadiK (Contributor, Author) commented Oct 18, 2022

> [quoting @emeli-dral's comment above in full]

Hi @emeli-dral!
Thanks for pointing out the boundary cases I missed; I assumed pandas crosstab handled them, but it doesn't.
I found some more boundary cases and added them as well.

Here's a summary of the changes I made:

  • Removed crosstab and implemented a contingency matrix for the 2x2 case (see the sketch below)
  • Supported the boundary cases
  • Raised user-friendly error messages when the test or the contingency matrix is used incorrectly
  • Updated the tests for the contingency matrix and fisher's exact
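
A minimal sketch of the crosstab replacement (a hypothetical _contingency_matrix helper, not necessarily the exact code in the PR):

import numpy as np
import pandas as pd

def _contingency_matrix(reference_data: pd.Series, current_data: pd.Series) -> np.ndarray:
    # Count each category in both samples, so a category missing from one
    # sample still contributes a zero count instead of producing the
    # degenerate table shape that pd.crosstab returned.
    categories = sorted(set(reference_data.unique()) | set(current_data.unique()))
    ref_counts = reference_data.value_counts()
    cur_counts = current_data.value_counts()
    return np.array([[ref_counts.get(cat, 0), cur_counts.get(cat, 0)] for cat in categories])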

Please let me know if any changes are needed or if I missed something.

Thanks for reviewing!

@emeli-dral self-requested a review on October 20, 2022 10:11
@emeli-dral added the enhancement (New feature or request) and hacktoberfest (Accepted contributions will count towards your hacktoberfest PRs) labels on Oct 20, 2022
@emeli-dral (Contributor) commented

Hi @SangamSwadiK,

Thank you so much for the changes you made!
I like a lot how you implemented the exact test, and especially your approach of testing not only the stattest function but the utils as well.

I have a question about corner cases we discussed before.
Here is my example:

_fisher_exact_stattest(pd.Series(["a","a","a","a","a","a","a","a","a","a"]), 
                       pd.Series(["b","b","b","b","b","b","b","b","b","b"]), feature_type='cat', threshold=0.1)
(1.0, False)

I see that there are no failures, which is great! But I'm a little surprised that we do not detect a drift here.
What do you think?

@SangamSwadiK (Contributor, Author) commented Oct 21, 2022

> [quoting @emeli-dral's comment above in full]

Hi, I think the reason it does not detect a drift is the multiple zeroes in the contingency table, i.e. the table is not a proper 2x2, which leads to an odds ratio of NaN or inf. This is a limitation of Fisher's exact test, and the same thing happens with chi-square in some of the corner cases as well (where the p-value goes to NaN or the statistic goes to inf).

That is, if there are multiple zeros in the contingency table, e.g.:

[[4, 0],
 [0, 0]]

(or)

[[0, 4],
 [0, 0]]

(or)

[[0, 0],
 [4, 0]]

I think these cases have to be handled explicitly, with a warning telling the user that there are a lot of zeros in the contingency matrix.
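
For illustration, a quick check of how scipy behaves on such a degenerate table (same scipy.stats.fisher_exact as used in the PR):

from scipy.stats import fisher_exact

# With only one category present overall, the margins determine the table
# completely, so the test has no power: the odds ratio is undefined (NaN
# or inf) and the p-value is 1.0, i.e. p_value < threshold is False and
# no drift is reported.
odds_ratio, p_value = fisher_exact([[4, 0], [0, 0]])
print(p_value)  # 1.0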

@SangamSwadiK (Contributor, Author) commented Oct 21, 2022

I'm working on it, will update this by end of day.

@SangamSwadiK (Contributor, Author) commented Oct 21, 2022

I think there were some bugs 😅
I think I've covered the corner cases now and added them to the tests.

@SangamSwadiK force-pushed the fishers_exact branch 2 times, most recently from 88c4575 to a09e386 on October 21, 2022 18:45
@emeli-dral (Contributor) commented

Hi @SangamSwadiK,

I have a suggestion for how to act on it.

Indeed, the test itself cannot be computed when we have only 1 category in reference and 1 (different) category in current.

However, this might indeed happen in practice when you deal with binary features. From the user's standpoint, when this happens, it is something you want to know about. If all values are different between reference and current, we can safely assume that the feature has drifted.

I would suggest implementing this case separately. Instead of raising the ValueError, we can return the “drift detected” outcome when all values are different (without actually computing the statistical test).
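
As a rough sketch, reusing the hypothetical _contingency_matrix helper from above and assuming the binary check has already passed:

from scipy.stats import fisher_exact

def _fisher_exact_stattest(reference_data, current_data, feature_type, threshold):
    # Suggested short-circuit: when reference and current share no values
    # at all, report drift directly instead of computing the test.
    if not set(reference_data.unique()) & set(current_data.unique()):
        return 0.0, True  # "drift detected" outcome, no p-value actually computed
    _, p_value = fisher_exact(_contingency_matrix(reference_data, current_data))
    return p_value, p_value < threshold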

@emeli-dral (Contributor) commented

I am also labelling the issues as HF-accepted to acknowledge all the work done!

@SangamSwadiK (Contributor, Author) commented Oct 25, 2022

Hi @emeli-dral!! Apologies for tagging you repeatedly.

I think the tests cover the case when we have 1 category in reference and 1 (different) category in current. I thought of handling the all-different case explicitly, but since the tests covered it I stopped there.

I didn't write them explicitly because the p-values vary a lot with different lengths of both reference and current even though they are flipped entirely.

This is because of the permutation nature of Fisher's test: the p-values are affected by the permutations of the table. The longer the data, the more permutations of the table there are (since there are multiple ways of rearranging the elements to get the same marginals).

Please check the examples below:

reference = pd.Series(["b", "b", "b", "a", "a"] * 30)
current   = pd.Series(["a", "a", "a", "b", "b"] * 30)
# The p-value from Fisher's test is 0.000784

reference = pd.Series(["b", "b", "b", "a", "a"] * 10)
current   = pd.Series(["a", "a", "a", "b", "b"] * 10)
# The p-value from Fisher's test is 0.0713

reference = pd.Series(["b", "b", "b", "a", "a"] * 5)
current   = pd.Series(["a", "a", "a", "b", "b"] * 5)
# The p-value from Fisher's test is 0.2577

# Here both are entirely different:
reference = pd.Series(["b", "b", "b", "b", "b"])
current   = pd.Series(["a", "a", "a", "a", "a"])
# The p-value from Fisher's test is 0.00793
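
The last number can be checked by hand: among the C(10, 5) equally likely ways of splitting the 10 values between reference and current with these marginals, exactly two (the observed table and its mirror image) are this extreme, so the two-sided p-value is 2 / C(10, 5):

from math import comb

# Two-sided p-value for the all-different case with 5 values per sample.
print(2 / comb(10, 5))  # 0.00793650..., matching the 0.00793 above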

I had previously thought of implementing Hamming distance or its complement to cover these cases explicitly, as you just suggested.

Since the p-values differ based on the length of the input arrays, I hesitated to directly return a p-value of 0 and True for drift detected in the case of completely different reference and current. However, I can add the code for handling it explicitly. What do you suggest?

Thanks for reviewing!!

@emeli-dral (Contributor) commented

Hi @SangamSwadiK,

I checked the latest version and I like it a lot!

"I didn't write them explicitly because the p-values vary a lot with different lengths of both reference and current even though they are flipped entirely" - I agree with your approach. I think it is better to return p-value here, it might be useful for something like drift size comparison, when users need to order features by drift size and act on top of this order.

Thank you a lot for your contribution! I'm happy to merge 🥳


@elenasamuylova this is a very nice example of how pytests should be implemented.

@emeli-dral merged commit 10ea83c into evidentlyai:main on Oct 26, 2022