ENH add treeple tutorials to website #256

PSSF23 · 2024-04-16T17:45:22Z

Add treeple tutorials for:

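single-view mutual information (MI), partial AUC (pAUC), sensitivity at 98% specificity (S@98), p-value
multi-view conditional mutual information (CMI), S@98, p-value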
gaussian mixture model truth

Open to suggestions and feedback. Some of this content overlaps with existing tutorials, which may be outdated and should be integrated or removed.

PSSF23

The tutorial simulation needs to be modified after #254 is merged and the change is pushed to pip.

examples/treeple/treeple_tutorial_pvalue.ipynb

codecov · 2024-04-16T18:48:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.04%. Comparing base (ee400da) to head (27e62b9).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #256   +/-   ##
=======================================
  Coverage   90.04%   90.04%           
=======================================
  Files          54       54           
  Lines        5105     5105           
=======================================
  Hits         4597     4597           
  Misses        508      508


examples/a_treeple/treeple_tutorial_pvalue.py

sampan501 · 2024-04-16T20:41:44Z

examples/a_treeple/treeple_tutorial_pvalue.py

+if pvalue < 0.05:
+    print("The null hypothesis is rejected.")
+else:
+    print("The null hypothesis is not rejected.")


This is very informative about how to compute a p-value, but I'm not sure we need all of this. I think the point of the tutorial is to show the user how to compute a p-value using the functions already in treeple.

Yes, I would just include a sentence at the very top of the example stating how to interpret the p-value, and then remove these lines of code.

PSSF23

TODO: update p-value calculation with native function.

adam2392

Overall great start!

I think the summary is:

Let's tighten up how each example is presented to the user. Assume the user knows very little: abbreviations and basic concepts should be defined.
We may also want to make the metrics public functions so users can just import them. If every public function takes a 2D posterior, this would closely mirror scikit-learn's metrics API. If the public functions need to take a 3D posterior over trees, that's fine too.

examples/a_treeple/README.txt

adam2392 · 2024-04-17T02:49:17Z

examples/a_treeple/treeple_tutorial_CMI.py

+====================================
+Treeple tutorial for calculating CMI
+====================================
+"""


We should add a short paragraph describing how we actually compute the estimate, and maybe what we will show. It will help users get acquainted with the example.

TODO: also link the reference to the paper once the preprint is online.

examples/a_treeple/treeple_tutorial_CMI.py

adam2392 · 2024-04-17T02:50:58Z

examples/a_treeple/treeple_tutorial_CMI.py

+# -----------------------
+
+
+def Calculate_MI(y_true, y_pred_proba):


Is it worth making the existing function we have public so you don't need to redefine one here?

That would be best.
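
For illustration, here is a minimal sketch of what a public, importable metric taking a 2D posterior could look like. The name `mutual_information` and its signature are hypothetical, not the existing scikit-tree function; the estimator shown is the plug-in form I(X; Y) = H(Y) - H(Y | X), with H(Y | X) approximated by the average entropy of the per-sample posteriors. It uses only the standard library to stay dependency-free.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability terms."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(y_true, y_pred_proba):
    """Hypothetical public metric: estimate I(X; Y) from a 2D posterior.

    y_true: sequence of class labels, length n_samples.
    y_pred_proba: n_samples rows, each a list of class probabilities.
    """
    n = len(y_true)
    # H(Y): entropy of the empirical class frequencies.
    h_y = entropy([count / n for count in Counter(y_true).values()])
    # H(Y | X): average entropy of the per-sample posteriors.
    h_y_given_x = sum(entropy(row) for row in y_pred_proba) / n
    return h_y - h_y_given_x


y = [0, 0, 1, 1]
confident = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5]] * 4
print(mutual_information(y, confident))      # ln 2, about 0.693
print(mutual_information(y, uninformative))  # 0.0
```

With a function like this exposed by the package, the tutorial would not need to redefine `Calculate_MI` locally.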

examples/a_treeple/README.txt

adam2392 · 2024-04-17T02:53:47Z

examples/a_treeple/treeple_tutorial_GMM.py

+    pAUC = roc_auc_score(y_true, y_pred_proba[:, 1], max_fpr=max_fpr)
+
+    pos = np.where(fpr == max_fpr)[0][-1]
+    plt.fill_between(


Do we want to separate plotting from obtaining the statistic? That would make it a bit cleaner to understand from a user perspective.

It would be best if the local function could be hidden, but Sphinx doesn't let me have a separate script file.
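
One possible way to separate the two concerns, sketched under assumptions: the function names below are hypothetical, and the statistic is the raw (unnormalised) trapezoidal area up to `max_fpr`, which differs from the standardized value that `roc_auc_score(..., max_fpr=...)` returns. The plotting helper imports matplotlib lazily, so the statistic can be computed without it.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists for binary labels, one point per distinct score."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied score group.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def partial_auc(fpr, tpr, max_fpr):
    """Raw trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    points = list(zip(fpr, tpr))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the end of the segment at max_fpr.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def plot_roc_with_cutoff(fpr, tpr, max_fpr):
    """Plot the ROC curve and mark the FPR cut-off; requires matplotlib."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    fig, ax = plt.subplots()
    ax.plot(fpr, tpr)
    ax.axvline(max_fpr, linestyle="--")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    return ax


fpr, tpr = roc_points([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(partial_auc(fpr, tpr, max_fpr=0.1))  # 0.1 for a perfect classifier
```

The tutorial could then call the statistic function first and pass its inputs to the plotting helper only where a figure is wanted.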

examples/a_treeple/treeple_tutorial_MI.py

examples/a_treeple/treeple_tutorial_SA98.py

examples/a_treeple/treeple_tutorial_pvalue.py

adam2392 · 2024-04-17T02:56:34Z

examples/a_treeple/treeple_tutorial_pvalue.py

+if pvalue < 0.05:
+    print("The null hypothesis is rejected.")
+else:
+    print("The null hypothesis is not rejected.")


Yes, I would just include a sentence at the very top of the example stating how to interpret the p-value, and then remove these lines of code.

examples/treeple/treeple_tutorial_pvalue_multiview.py

adam2392 · 2024-04-17T14:22:01Z

examples/treeple/treeple_tutorial_GMM.py

+# As we know the true priors of each class, we can generate a sufficient
+# amount of samples to estimate the true posteriors and corresponding
+# statistics like *MI*, *pAUC*, and *S@98*.


Suggested change

-# As we know the true priors of each class, we can generate a sufficient
-# amount of samples to estimate the true posteriors and corresponding
-# statistics like *MI*, *pAUC*, and *S@98*.
+# In many simulation settings, we are interested in knowing the true statistics. Sometimes these statistics
+# are not analytically computable, so we demonstrate how to approximate these numerically. Given enough
+# samples, these numerical approximations converge to the true statistics.
+#
+# As we know the true priors of each class, we can generate a sufficient
+# amount of samples to estimate the true posteriors and corresponding
+# statistics like *MI*, *pAUC*, and *S@98*.
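
To make the numerical approximation concrete, here is a minimal sketch assuming a one-dimensional Gaussian mixture with known priors, means, and standard deviations. All names and parameter values are illustrative and not taken from the tutorial. The true posterior follows from Bayes' rule, and mutual information is approximated by Monte Carlo as H(Y) minus the average posterior entropy over samples drawn from the mixture.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def true_posterior(x, priors, means, stds):
    """Bayes' rule: P(Y = k | X = x) for a 1D Gaussian mixture."""
    joint = [p * normal_pdf(x, m, s) for p, m, s in zip(priors, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def approximate_true_mi(priors, means, stds, n_samples=100_000, seed=0):
    """Monte Carlo estimate of I(X; Y) = H(Y) - E[H(Y | X)], in nats."""
    rng = random.Random(seed)
    h_y = -sum(p * math.log(p) for p in priors if p > 0)
    h_y_given_x = 0.0
    for _ in range(n_samples):
        # Draw a class from the priors, then a sample from that class.
        k = rng.choices(range(len(priors)), weights=priors)[0]
        x = rng.gauss(means[k], stds[k])
        post = true_posterior(x, priors, means, stds)
        h_y_given_x -= sum(q * math.log(q) for q in post if q > 0)
    return h_y - h_y_given_x / n_samples


# Identical components carry no information about the class: MI is about 0.
print(approximate_true_mi([0.5, 0.5], [0.0, 0.0], [1.0, 1.0], n_samples=2000))
# Well-separated components identify the class almost surely: MI is about ln 2.
print(approximate_true_mi([0.5, 0.5], [-10.0, 10.0], [1.0, 1.0], n_samples=2000))
```

Increasing `n_samples` tightens the estimate, which is the convergence the suggested text refers to.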

adam2392 · 2024-04-17T14:23:45Z

examples/treeple/treeple_tutorial_pAUC.py

+# .. math:: pAUC@r = \frac{100}{100 - r} \int_{T_r}^\infty \int_{\mathcal{X}} \mathbb{I}\{\eta(X_1) > \eta(X_0) \} dF_1 dF_0
+#
+# With a binary class simulation as an example, this tutorial will show
+# how to use ``treeple`` to calculate the statistic with 90% specificity


Suggested change

-# how to use ``treeple`` to calculate the statistic with 90% specificity
+# how to use ``treeple`` to calculate the AUC statistic at a specified
+# 90% specificity

adam2392 · 2024-04-17T14:24:25Z

examples/treeple/treeple_tutorial_pvalue.py

+# By computing the p-value using ``treeple``, we can test if :math:`H_0`
+# would be rejected, which confirms that X and Y are not independent. The p-value is
+# generated by comparing the observed statistic difference with permuted
+# differences, using mutual information as an example.


Suggested change

-# differences, using mutual information as an example.
+# differences, using mutual information as the test statistic in this example.
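
As a rough illustration of the idea of comparing an observed statistic against permuted ones, here is a generic label-permutation test. This is not the treeple procedure or its API: the function names are hypothetical, and a simple difference in class means stands in for mutual information to keep the sketch short.

```python
import random


def permutation_pvalue(x, y, statistic, n_permutations=1000, seed=0):
    """One-sided permutation p-value for H0: x and y are independent.

    Larger values of statistic(x, y) count as stronger evidence of dependence.
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    y_perm = list(y)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(y_perm)  # breaks any link between x and y
        if statistic(x, y_perm) >= observed:
            n_extreme += 1
    # The +1 terms count the observed data as one permutation, so p is never 0.
    return (1 + n_extreme) / (1 + n_permutations)


def abs_mean_difference(x, y):
    """Toy statistic: |mean of x in class 1 - mean of x in class 0|."""
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


x = list(range(40))
y = [0] * 20 + [1] * 20  # class is fully determined by x
pvalue = permutation_pvalue(x, y, abs_mean_difference)
# A p-value below the chosen level (0.05 here) means H0 is rejected.
print(pvalue, pvalue < 0.05)
```

A single sentence stating this interpretation at the top of the example would cover what the `if pvalue < 0.05` branch prints.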

adam2392 · 2024-04-17T14:26:21Z

examples/treeple/treeple_tutorial_pvalue_multiview.py

+if pvalue < 0.05:
+    print("The null hypothesis is rejected.")
+else:
+    print("The null hypothesis is not rejected.")


Should we remove these LOC? https://github.com/neurodata/scikit-tree/pull/256/files/2c487755d8a0827b7344b2bee6da8eabef1d400e#r1567919332

I don't think so. It could act as a conditional statement for users if they decide to modify the data.

adam2392 · 2024-04-17T14:26:29Z

examples/treeple/treeple_tutorial_pvalue.py

+if pvalue < 0.05:
+    print("The null hypothesis is rejected.")
+else:
+    print("The null hypothesis is not rejected.")


Should we remove these LOC? https://github.com/neurodata/scikit-tree/pull/256/files/2c487755d8a0827b7344b2bee6da8eabef1d400e#r1567919332

Co-authored-by: Adam Li <adam2392@gmail.com>

sampan501

Looks good to me, pending the larger changes in #257

PSSF23

I'll incorporate the comments in a new PR. Feel free to comment under #257 as well if you have new ideas.

ENH add treeple tutorials

37b2714

PSSF23 requested review from SUKI-O, adam2392, sampan501 and YuxinB April 16, 2024 17:45

PSSF23 added 2 commits April 16, 2024 14:04

ENH update the notebook cells

8105e34

FIX modify import order

2cfc182

PSSF23 commented Apr 16, 2024

YuxinB reviewed Apr 16, 2024

YuxinB approved these changes Apr 16, 2024

PSSF23 added 2 commits April 16, 2024 14:25

DOC optimize the text for tree permutations

9f9d12c

DOC add docstrings to local functions

7975f0b

PSSF23 added 6 commits April 16, 2024 14:52

FIX correct spelling

7997856

ENH attempt to add notebook support

f8bde5b

TST try p-value example

c9dab6a

STY fix style

e0b0d06

FIX remove unnecessary file

537eb41

FIX update script structure

2c48775

sampan501 requested changes Apr 16, 2024

PSSF23 added 2 commits April 16, 2024 17:04

DOC upload other converted tutorials

aae4e9b

DOC correct spelling

315664d

PSSF23 commented Apr 16, 2024

PSSF23 added 3 commits April 16, 2024 17:12

FIX correct import & spelling

8a46f4c

FIX add local function

f501492

STY update black formatting

49b99d6

adam2392 requested changes Apr 17, 2024

DOC move to original folder & address comments

e52e1e1

adam2392 reviewed Apr 17, 2024

Update examples/treeple/treeple_tutorial_pvalue_multiview.py

96adab4

Co-authored-by: Adam Li <adam2392@gmail.com>

Merge branch 'main' into treeple_tutorials

27e62b9

sampan501 self-requested a review April 17, 2024 16:16

sampan501 mentioned this pull request Apr 17, 2024

ENH treeple tutorials should use native functions in scikit-tree #257

Open

sampan501 approved these changes Apr 17, 2024

PSSF23 merged commit 964dd25 into main Apr 17, 2024
28 of 29 checks passed

PSSF23 deleted the treeple_tutorials branch April 17, 2024 16:39

PSSF23 commented Apr 17, 2024
