Figure: Statistical Techniques #106
This is a tentative sketch for the figure depicting key takeaways from the newly rewritten "Manage model complexity" section. cc: @jaclyn-taroni @allaway @cgreene. We can use this as a starting point for the final figure for this section based on your comments. Happy to modify as needed.
I split this up into individual methods for the purposes of illustrating them. We can combine them once we are happy with how the individual methods are presented. They are very generic; as we iterate, we can make them more specific.

**Bootstrapping / Ensemble Learning:** I have annotated my comments.

**Regularization:** For this I used the same illustration as dimension reduction. It seemed to me that both are reducing the feature space, and that was what we wanted to show. Also, someone (maybe Robert? not sure) mentioned on the call today having two levels of abstraction for #116, and maybe here is a good place to present a more abstracted figure?

**One-class-at-a-time:** I have annotated my comments.
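For reference, a minimal sketch of why regularization can reasonably share an illustration with dimension reduction (synthetic data, assuming scikit-learn is available; this is not the figure content itself): L1 regularization drives most coefficients to exactly zero, so the fitted model effectively works in a much smaller feature space.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 100 features, only 10 of which are actually informative.
X, y = make_regression(n_samples=200, n_features=100, n_informative=10,
                       noise=5.0, random_state=0)

# L1 (lasso) regularization zeroes out most coefficients, which is why the
# end result can be drawn much like dimension reduction.
model = Lasso(alpha=1.0).fit(X, y)
print("features kept:", int(np.sum(model.coef_ != 0)), "of", X.shape[1])
```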
Thanks @dvenprasad! Below are my first thoughts:
I agree with @jaybee84 here, and it's important to note that the sampling is with replacement (at least for all of the bootstrapping I'm familiar with :) ), so you typically end up with replicates of some samples in every "bootstrap".
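For reference, a minimal toy sketch of that point (assuming numpy; the sample IDs are hypothetical): a single bootstrap draws n samples with replacement from the original n, so some samples show up as replicates and others are left out entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = np.arange(10)  # ten hypothetical sample IDs

# One "bootstrap": draw n samples *with replacement* from the original n.
boot = rng.choice(samples, size=samples.size, replace=True)

print(sorted(boot))                      # some IDs appear more than once (replicates)
print(sorted(set(samples) - set(boot)))  # IDs left out of this bootstrap
```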
**Bootstrapping:**

**Ensemble learning:** Model A has learned from a rectangular box and Model B has learned from a cube. So is regularization taking these two models and ranking them? So the end result would be that Model B is ranked higher than Model A?
Okay, took another pass at bootstrapping and ensemble learning after Monday's discussion.

**Bootstrapping:** A couple of notes:

**Ensemble Learning:** I've kept the average health of the models similar across the 3 runs, i.e. 2 good-health models and 1 poor-health model. But my question is: can you have 3 average-ish models and still get a good-health combined model, or have 2 poor-health models and 1 good-health model and get a good-health combined model? Do we want to show that kind of variation?
Re: ensemble modeling, here's an example of actually combining models. The first box is our "best" model alone, the 2nd is the "best + 2nd best", the 3rd is "best + 2nd best + 3rd best", and so on. The blue and yellow boxes are anything that is substantially "better health" than the best model alone; the red boxes are statistically indistinguishable from the best model alone. So, if we combine 'good health' models with each other, the resulting ensemble is better, but if we start adding in too many models that are of 'poor health', we actually don't get a model that's better than the sum of its parts. I'm not sure this is worth conveying in the figure; I think it might be beyond the scope of this manuscript because it's probably variable based on the problem, and I doubt this is specific to rare disease modeling. Also, I'm a little unsure of what the "runs" indicate? This looks like an ensemble of ensembles (which is still an ensemble, but might be unnecessary to convey the concept).
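For reference, a rough sketch of that stepwise "best", "best + 2nd best", "best + 2nd + 3rd" combination (synthetic data and arbitrary stand-in models, assuming scikit-learn; the statistical comparison against the best single model is omitted here).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# A few candidate models, ranked by their individual cross-validated score
# (stand-ins for the "best", "2nd best", "3rd best" models in the sketch).
candidates = [LogisticRegression(max_iter=1000), GaussianNB(),
              DecisionTreeClassifier(max_depth=3, random_state=0)]
scores = [cross_val_score(m, X, y, cv=5).mean() for m in candidates]
ranked = [m for _, m in sorted(zip(scores, candidates),
                               key=lambda pair: pair[0], reverse=True)]

# "Best alone", "best + 2nd best", "best + 2nd + 3rd", combined by soft voting.
for k in range(1, len(ranked) + 1):
    ensemble = VotingClassifier([(f"m{i}", m) for i, m in enumerate(ranked[:k])],
                                voting="soft")
    score = cross_val_score(ensemble, X, y, cv=5).mean()
    print(f"top {k} model(s): {score:.3f}")
```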
Re: bootstrap, suggest changing "aggregate" to "harmonize" :)
@dvenprasad I added a rough sketch re: ensemble learning in the above comment. Please let me know if you cannot view it or have questions.
I like the above figure... just a few notes:
Changed yellow -> purple because the contrast with white was really bad. (Updated sketches: Bootstrapping, Ensemble Learning, Regularization, One class at a time.)
It would be grand to get a figure on the statistical techniques we discuss and to link those to how they address challenges in the rare disease space.