Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor nb/nhanes survival model #3395

Merged
merged 7 commits into from
Feb 23, 2024

Conversation

CloseChoice
Copy link
Collaborator

Overview

Works towards #3036

Description of the changes proposed in this pull request:

  • fix a couple of typos
  • correct column names for interaction value calculation
  • add a description to the calculation of the interaction matrix

Note that there are a couple more warnings now, since I use xgboost 2.1.2 which deprecates the file format we are using. But this is an issue with a linked PR and will be fixed once this PR is merged.

Checklist

  • All pre-commit checks pass.
  • [ ] Unit tests added (if fixing a bug or adding a new feature)

@connortann
Copy link
Collaborator

connortann commented Nov 29, 2023

Thanks for the PR! I have a few suggestions.

Unfortunately I can't comment on the diff, so I'll leave a few suggestions here as screenshots.

  • This might no longer be true:
    image

  • Suggest moving this comment to its own line to avoid the unusual formatting:
    image

  • Suggest removing this, either by clearing that cell output or editing the JSON directly:
    image

  • This long paragraph could be broken up into multiple paragraphs to help readability:
    image

  • Same comment about the warning:
    image

  • May as well move these imports to the top:
    image

  • The final image has much many more columns than before, which I think is less readable. Can we recreate the previous image?
    image

Copy link
Collaborator

@connortann connortann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggested changes as above

@connortann connortann added the documentation Relating to readthedocs, notebooks, and exposition in docstrings label Nov 29, 2023
@connortann
Copy link
Collaborator

Hi @CloseChoice - I was just going through the PR backlog and saw this one. Are you happy to take another look at the touchups above? Or if you'd prefer, I can finish it off if you're stuck into other stuff at the moment.

@CloseChoice
Copy link
Collaborator Author

Hi @CloseChoice - I was just going through the PR backlog and saw this one. Are you happy to take another look at the touchups above? Or if you'd prefer, I can finish it off if you're stuck into other stuff at the moment.

Thanks for coming back to this. Will have a look sometime this week

@CloseChoice
Copy link
Collaborator Author

@connortann I added the requested changes. The dataset seems to contain more columns now but I limited the confusion matrix to only the 12 with the highest shap values.

Copy link
Collaborator

@connortann connortann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Exact Explainer" notebook was changed, is that intentional? It now has an exception:

Other than that, looks great! Thank you for making the changes I suggested above.

A minor observation, I notice the dependence plot of Red Blood Cells looks quite different. I wonder if the source data changed at some point: there seem to be values of red_blood_cells_unacceptable which might indicate "bad" measurements which were originally dropped from the dataset? I think the change is fine though, I don't think we need to reproduce the original images exactly.

image

@CloseChoice
Copy link
Collaborator Author

The "Exact Explainer" notebook was changed, is that intentional? It now has an exception:

Other than that, looks great! Thank you for making the changes I suggested above.

A minor observation, I notice the dependence plot of Red Blood Cells looks quite different. I wonder if the source data changed at some point: there seem to be values of red_blood_cells_unacceptable which might indicate "bad" measurements which were originally dropped from the dataset? I think the change is fine though, I don't think we need to reproduce the original images exactly.

image

Good catch. I removed the changes in the exact notebook but didn't look into the changed plot. I would suspect that it has to do with changes in the underlying data but wouldn't spend more time on that. Thanks for the review ;)

@connortann connortann merged commit bbb2b5e into shap:master Feb 23, 2024
7 checks passed
@CloseChoice CloseChoice deleted the refactor-nb/nhanes-survival-model branch February 23, 2024 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Relating to readthedocs, notebooks, and exposition in docstrings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants