Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH add treeple tutorials to website #256

Merged
merged 19 commits into from
Apr 17, 2024
Merged

ENH add treeple tutorials to website #256

merged 19 commits into from
Apr 17, 2024

Conversation

PSSF23
Copy link
Member

@PSSF23 PSSF23 commented Apr 16, 2024

Add treeple tutorials for:

  • single-view MI, pAUC, S@98, p-value
  • multi-view CMI, S@98, p-value
  • gaussian mixture model truth

Open to suggestions and feedback. I realized that some content overlaps with existing tutorials, which could be outdated and should be integrated/removed.

Copy link
Member Author

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tutorial simulation needs to be modified after #254 is merged and the change is pushed to pip.

Copy link

codecov bot commented Apr 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.04%. Comparing base (ee400da) to head (27e62b9).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #256   +/-   ##
=======================================
  Coverage   90.04%   90.04%           
=======================================
  Files          54       54           
  Lines        5105     5105           
=======================================
  Hits         4597     4597           
  Misses        508      508           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

examples/a_treeple/treeple_tutorial_pvalue.py Outdated Show resolved Hide resolved
examples/a_treeple/treeple_tutorial_pvalue.py Outdated Show resolved Hide resolved
examples/a_treeple/treeple_tutorial_pvalue.py Outdated Show resolved Hide resolved
examples/a_treeple/treeple_tutorial_pvalue.py Outdated Show resolved Hide resolved
if pvalue < 0.05:
print("The null hypothesis is rejected.")
else:
print("The null hypothesis is not rejected.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very informative about how to compute a p-value, but not sure if we need all of this. I think that point of the tutorial is to show the user how to compute a p-value using the functions already in treeple

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I would just include a sentence in the very top of the example stating how to interpret the pvalue and then remove these lines of code

Copy link
Member Author

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: update p-value calculation with native function.

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall great start!

I think the summary is:

  • Let's try to tighten up how each example is presented to the user. Assume the user knows very little. I.e. abbreviations should be defined. Basic concepts should be defined.
  • We also maybe want to consider allowing metrics to be public functions so you can just import them? Assuming all public functions either take in a 2D posterior, this is very similar to scikitlearns metrics API. If we need the public functions to take a 3D posterior over trees that's fine too.

examples/a_treeple/README.txt Outdated Show resolved Hide resolved
====================================
Treeple tutorial for calculating CMI
====================================
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a short paragraph describing how we actually do the estimate. And maybe what we will show. It will help users acquaint with the example.

TODO: also link reference to paper when we make preprint online.

examples/a_treeple/treeple_tutorial_CMI.py Outdated Show resolved Hide resolved
# -----------------------


def Calculate_MI(y_true, y_pred_proba):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth making the existing function we have public so you don't need to redefine one here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be best.

examples/a_treeple/README.txt Outdated Show resolved Hide resolved
pAUC = roc_auc_score(y_true, y_pred_proba[:, 1], max_fpr=max_fpr)

pos = np.where(fpr == max_fpr)[0][-1]
plt.fill_between(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to separate plotting and obtaining the statistic? Makes it a bit cleaner to understand from a user perspective

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best if the local function could be hidden. But sphinx doesn't let me have a separate script file.

examples/a_treeple/treeple_tutorial_MI.py Outdated Show resolved Hide resolved
examples/a_treeple/treeple_tutorial_SA98.py Outdated Show resolved Hide resolved
examples/a_treeple/treeple_tutorial_pvalue.py Outdated Show resolved Hide resolved
if pvalue < 0.05:
print("The null hypothesis is rejected.")
else:
print("The null hypothesis is not rejected.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I would just include a sentence in the very top of the example stating how to interpret the pvalue and then remove these lines of code

examples/treeple/treeple_tutorial_pvalue_multiview.py Outdated Show resolved Hide resolved
Comment on lines +22 to +24
# As we know the true priors of each class, we can generate a sufficient
# amount of samples to estimate the true posteriors and corresponding
# statistics like *MI*, *pAUC*, and *S@98*.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# As we know the true priors of each class, we can generate a sufficient
# amount of samples to estimate the true posteriors and corresponding
# statistics like *MI*, *pAUC*, and *S@98*.
# In many simulation settings, we are interested in knowing the true statistics. Sometimes these statistics
# are not analytically computable, so we demonstrate how to approximate these numerically. Given enough
# samples, these numerical approximations converge to the true statistics.
#
# As we know the true priors of each class, we can generate a sufficient
# amount of samples to estimate the true posteriors and corresponding
# statistics like *MI*, *pAUC*, and *S@98*.

# .. math:: pAUC@r = \frac{100}{100 - r} \int_{T_r}^\infty \int_{\mathcal{X}} \mathbb{I}\{\eta(X_1) > \eta(X_0) \} dF_1 dF_0
#
# With a binary class simulation as an example, this tutorial will show
# how to use ``treeple`` to calculate the statistic with 90% specificity
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# how to use ``treeple`` to calculate the statistic with 90% specificity
# how to use ``treeple`` to calculate the AUC statistic at a specified
# 90% specificity

# By computing the p-value using ``treeple``, we can test if :math:`H_0`
# would be rejected, which confirms that X and Y are not independent. The p-value is
# generated by comparing the observed statistic difference with permuted
# differences, using mutual information as an example.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# differences, using mutual information as an example.
# differences, using mutual information as the test statistic in this example.

Comment on lines +235 to +238
if pvalue < 0.05:
print("The null hypothesis is rejected.")
else:
print("The null hypothesis is not rejected.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. It could act as a conditional statement for users if they decide to modify the data.

Comment on lines +186 to +189
if pvalue < 0.05:
print("The null hypothesis is rejected.")
else:
print("The null hypothesis is not rejected.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Co-authored-by: Adam Li <adam2392@gmail.com>
Copy link
Member

@sampan501 sampan501 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, pending the larger changes in #257

@PSSF23 PSSF23 merged commit 964dd25 into main Apr 17, 2024
28 of 29 checks passed
@PSSF23 PSSF23 deleted the treeple_tutorials branch April 17, 2024 16:39
Copy link
Member Author

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll incorporate the comments in a new PR. Feel free to comment under #257 as well if you have new ideas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants