Add fisher's exact #373
Merged · 68 commits · Oct 26, 2022
Conversation

@SangamSwadiK (Contributor) commented Oct 10, 2022

Issue:
#345 (Add fisher's exact test for binary data)

What does this implement/fix?
This adds Fisher's exact test, its documentation, and the necessary changes to the How-to examples.

  • test implementation
  • unittests
  • doc
  • examples

cc @mertbozkir

SangamSwadiK and others added 29 commits October 10, 2022 18:02
* rework data drift metrics

* fix format and imports

* fix notebooks

* add empty check after data clean for drift + some refactoring

* fix imports

* add threshold for DatasetDriftMetric
add tails in DatasetDriftMetric visual

* refactor data drift

* refactor data drift

* add tests for DatasetDriftMetric

* fix checks and titles for drift

* fix style

* update title in ColumnDriftMetric

* implement columns for DatasetDriftMetric and DataDriftTable

* fix data structure and json output for DataDriftTable

* fix data structure and json output for DatasetDriftMetric

* fix after main merge

* fix with black

* add reworked ColumnRegExpMetric

* move ColumnRegExpMetric to a separate module, fix visual, add unittests

* fix table in html view, update an example

* fix ColumnRegExpMetric import in notebooks

* fix notebook imports

* add tabs for ColumnRegExpMetric

* fix after main merge

* fix after main merge

* fix imports with isort
Added some examples of tests and test presets usage
Removed outdated example with metrics
* fix warning about duplicated columns in correlation calculation in data drift

* make a new list, do not modify num_feature_names
@SangamSwadiK changed the title from "[WIP] Add fisher's exact" to "Add fisher's exact" on Oct 16, 2022
@emeli-dral (Contributor) commented

Hey @SangamSwadiK,

The test looks very good!

It is designed to work with binary data, and I have a corner case which I believe is worth supporting.
Here is an example. Imagine I have a binary feature and suddenly one of the two values disappears from the data. In this case it is still binary data, but the test fails on such an input:

_fisher_test([1,0,1,0,1], [0,0,0,0,0], feature_type='cat', threshold=0.1)

ValueError                                Traceback (most recent call last)
Input In [40], in <cell line: 1>()
----> 1 _fisher_test([1,0,1,0,1], [0,0,0,0,0], feature_type='cat', threshold=0.1)

Input In [34], in _fisher_test(reference_data, current_data, feature_type, threshold)
      4 """Calculate the p-value of Fisher's exact test between two arrays
      5 Args:
      6     reference_data: reference data
   (...)
     12     test_result: whether the drift is detected
     13 """
     15 contingency_matrix = pd.crosstab(reference_data, current_data)
---> 17 _, p_value = fisher_exact(contingency_matrix)
     19 return p_value, p_value < threshold

File /opt/homebrew/Caskroom/miniconda/base/envs/evidently53/lib/python3.9/site-packages/scipy/stats/_stats_py.py:4278, in fisher_exact(table, alternative)
   4276 c = np.asarray(table, dtype=np.int64)
   4277 if not c.shape == (2, 2):
-> 4278     raise ValueError("The input `table` must be of shape (2, 2).")
   4280 if np.any(c < 0):
   4281     raise ValueError("All values in `table` must be nonnegative.")

ValueError: The input `table` must be of shape (2, 2).

Could you please take a look and support this scenario?

I also suggest raising a ValueError when there are more than 2 unique values in the input data, explicitly telling the user that the test should be applied to binary data only. I think that would be more straightforward compared to ValueError: The input `table` must be of shape (2, 2). What do you think?
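
A minimal sketch of the suggested check (assuming a hypothetical _check_binary helper; the actual implementation in the PR may differ):

import pandas as pd

def _check_binary(reference_data: pd.Series, current_data: pd.Series) -> None:
    # Hypothetical helper: fail fast with an explicit message instead of
    # letting scipy raise "The input `table` must be of shape (2, 2)".
    unique_values = set(reference_data.unique()) | set(current_data.unique())
    if len(unique_values) > 2:
        raise ValueError(
            "Fisher's exact test should be applied to binary data only, "
            f"but the input contains {len(unique_values)} unique values."
        )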

@SangamSwadiK (Contributor, Author) commented Oct 18, 2022

> [quoting @emeli-dral's comment above in full]

Hi @emeli-dral!
Thanks for pointing out the boundary cases I missed; I assumed pandas crosstab handled them, but it doesn't.
I found some more boundary cases and added them as well.

Here's a summary of the changes I made:

  • Removed crosstab and implemented a contingency matrix for the 2x2 case (see the sketch below)
  • Supported the boundary cases
  • Raised user-friendly error messages when the test or the contingency matrix is used incorrectly
  • Updated the tests for the contingency matrix and fisher's exact
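
A minimal sketch of the crosstab replacement (a hypothetical _contingency_matrix helper, not necessarily the exact code in the PR):

import numpy as np
import pandas as pd

def _contingency_matrix(reference_data: pd.Series, current_data: pd.Series) -> np.ndarray:
    # Count each category in both samples, so a category missing from one
    # sample still contributes a zero count instead of producing the
    # degenerate table shape that pd.crosstab returned.
    categories = sorted(set(reference_data.unique()) | set(current_data.unique()))
    ref_counts = reference_data.value_counts()
    cur_counts = current_data.value_counts()
    return np.array([[ref_counts.get(cat, 0), cur_counts.get(cat, 0)] for cat in categories])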

Please let me know if any changes are needed or if I missed something.

Thanks for reviewing!

@emeli-dral self-requested a review on October 20, 2022 10:11
@emeli-dral added the enhancement (New feature or request) and hacktoberfest (Accepted contributions will count towards your hacktoberfest PRs) labels on Oct 20, 2022
@emeli-dral (Contributor) commented

Hi @SangamSwadiK,

Thank you so much for the changes you made!
I like a lot how you implemented the exact test, and especially your approach of testing not only the stattest function but the utils as well.

I have a question about corner cases we discussed before.
Here is my example:

_fisher_exact_stattest(pd.Series(["a","a","a","a","a","a","a","a","a","a"]), 
                       pd.Series(["b","b","b","b","b","b","b","b","b","b"]), feature_type='cat', threshold=0.1)
(1.0, False)

I see that there are no failures, which is great! But I'm a little surprised that we do not detect a drift here.
What do you think?

@SangamSwadiK (Contributor, Author) commented Oct 21, 2022

> [quoting @emeli-dral's comment above in full]

Hi, I think the reason it does not detect a drift is the multiple zeroes in the contingency table, i.e. the table is not a proper 2x2, which leads to an odds ratio of NaN or inf. This is a limitation of Fisher's exact test, and the same thing happens with chi-square in some of the corner cases as well (where the p-value goes to NaN or the statistic goes to inf).

That is, if there are multiple zeros in the contingency table, e.g.:

[[4, 0],
 [0, 0]]

(or)

[[0, 4],
 [0, 0]]

(or)

[[0, 0],
 [4, 0]]

I think these cases have to be handled explicitly, with a warning telling the user that there are a lot of zeros in the contingency matrix.
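
For illustration, a quick check of how scipy behaves on such a degenerate table (same scipy.stats.fisher_exact as used in the PR):

from scipy.stats import fisher_exact

# With only one category present overall, the margins determine the table
# completely, so the test has no power: the odds ratio is undefined (NaN
# or inf) and the p-value is 1.0, i.e. p_value < threshold is False and
# no drift is reported.
odds_ratio, p_value = fisher_exact([[4, 0], [0, 0]])
print(p_value)  # 1.0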

@SangamSwadiK (Contributor, Author) commented Oct 21, 2022

I'm working on it, will update this by end of day.

@SangamSwadiK (Contributor, Author) commented Oct 21, 2022

I think there were some bugs 😅
I think I've covered the corner cases now and added them to the tests.

@SangamSwadiK force-pushed the fishers_exact branch 2 times, most recently from 88c4575 to a09e386 on October 21, 2022 18:45
@emeli-dral (Contributor) commented

Hi @SangamSwadiK,

I have a suggestion for how to act on it.

Indeed, the test itself cannot be computed when we have only 1 category in reference and 1 (different) category in current.

However, this might indeed happen in practice when you deal with binary features. From the user's standpoint, when this happens, it is something you want to know about. If all values are different between reference and current, we can safely assume that the feature has drifted.

I would suggest implementing this case separately. Instead of raising the ValueError, we can return the “drift detected” outcome when all values are different (without actually computing the statistical test).
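
As a rough sketch, reusing the hypothetical _contingency_matrix helper from above and assuming the binary check has already passed:

from scipy.stats import fisher_exact

def _fisher_exact_stattest(reference_data, current_data, feature_type, threshold):
    # Suggested short-circuit: when reference and current share no values
    # at all, report drift directly instead of computing the test.
    if not set(reference_data.unique()) & set(current_data.unique()):
        return 0.0, True  # "drift detected" outcome, no p-value actually computed
    _, p_value = fisher_exact(_contingency_matrix(reference_data, current_data))
    return p_value, p_value < threshold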

@emeli-dral (Contributor) commented

I am also labelling the issues as HF-accepted to acknowledge all the work done!

@SangamSwadiK (Contributor, Author) commented Oct 25, 2022

Hi @emeli-dral!! Apologies for tagging you repeatedly.

I think the tests cover the case when we have 1 category in reference and 1 (different) category in current. I thought of handling the all-different case explicitly, but since the tests covered it I stopped there.

I didn't write them explicitly because the p-values vary a lot with different lengths of both reference and current even though they are flipped entirely.

This is because of the permutation nature of Fisher's test: the p-values are affected by the permutations of the table. The longer the data, the more permutations of the table there are (since there are multiple ways of rearranging the elements to get the same marginals).

Please check the examples below:

reference = pd.Series(["b", "b", "b", "a", "a"] * 30)
current   = pd.Series(["a", "a", "a", "b", "b"] * 30)
# The p-value from Fisher's test is 0.000784

reference = pd.Series(["b", "b", "b", "a", "a"] * 10)
current   = pd.Series(["a", "a", "a", "b", "b"] * 10)
# The p-value from Fisher's test is 0.0713

reference = pd.Series(["b", "b", "b", "a", "a"] * 5)
current   = pd.Series(["a", "a", "a", "b", "b"] * 5)
# The p-value from Fisher's test is 0.2577

# Here both are entirely different:
reference = pd.Series(["b", "b", "b", "b", "b"])
current   = pd.Series(["a", "a", "a", "a", "a"])
# The p-value from Fisher's test is 0.00793
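
The last number can be checked by hand: among the C(10, 5) equally likely ways of splitting the 10 values between reference and current with these marginals, exactly two (the observed table and its mirror image) are this extreme, so the two-sided p-value is 2 / C(10, 5):

from math import comb

# Two-sided p-value for the all-different case with 5 values per sample.
print(2 / comb(10, 5))  # 0.00793650..., matching the 0.00793 above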

I had previously thought of implementing Hamming distance or its complement to cover these cases explicitly, as you just suggested.

Since the p-values differ based on the length of the input arrays, I hesitated to directly return a p-value of 0 and True for drift detected in the case of completely different reference and current. However, I can add the code for handling it explicitly. What do you suggest?

Thanks for reviewing!!

@emeli-dral (Contributor) commented

Hi @SangamSwadiK,

I checked the latest version and I like it a lot!

"I didn't write them explicitly because the p-values vary a lot with different lengths of both reference and current even though they are flipped entirely" - I agree with your approach. I think it is better to return p-value here, it might be useful for something like drift size comparison, when users need to order features by drift size and act on top of this order.

Thank you a lot for your contribution! I'm happy to merge 🥳


@elenasamuylova this is a very nice example of how pytests should be implemented.

@emeli-dral merged commit 10ea83c into evidentlyai:main on Oct 26, 2022