
Why didn't you do model ensemble? #26

Closed
xuzhang5788 opened this issue May 6, 2019 · 3 comments

@xuzhang5788

Thank you for sharing your code.
You selected several strong models and compared their performance. In the end, you chose only the best one as your final model. I don't know why you didn't ensemble your models into a better final model. Possibly, an ensemble would perform better than your best single model.

@agitter
Member

agitter commented May 7, 2019

Thanks for commenting @xuzhang5788. The most general answer is that we wanted to be very careful to fully define the models we would use early in the project before training them so that our pipeline would be finalized in advance. We thought this was important to emphasize that our testing truly was prospective. There were several other supervised learning models we could have included, ensembles being a good example, that we left out initially and didn't want to add later once we started training.

After we saw the prospective performance (e.g. Figure 4), it became apparent that ensembles might have boosted performance further. The single-task neural network trained in a regression setting made fairly different top predictions than the other models. That's what we had in mind in this part of the Discussion:

In future work, we will explore whether ensembling classification and regression models, potentially in combination with structure-based VS algorithms, can further improve accuracy.

We did in fact do some retrospective testing of ensembles after finalizing the results in the manuscript. However, we did not include it in the manuscript because the manuscript followed the training pipeline and prospective setting we defined in advance. We thought keeping a purely prospective design to experimentally test the generalization of these models was more important than anything else. For our ongoing follow-up work, we are including ensembles.
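One simple way to ensemble a classifier and a regression model, whose outputs live on different scales (a probability versus a predicted activity value), is to convert each model's scores to ranks and average the ranks per compound. The sketch below is a hypothetical illustration of rank averaging, not the approach used in this project; the model names, the scores, and the "% inhibition" units are made up for the example, and rank averaging is only one of several ways to combine models.

```python
from typing import Dict, List, Sequence


def to_ranks(scores: Sequence[float]) -> List[float]:
    """Convert scores to ranks where rank 1 is the highest score.

    Tied scores receive the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_average_ensemble(model_scores: Dict[str, Sequence[float]]) -> List[float]:
    """Average each compound's rank across all models (lower is better)."""
    per_model = [to_ranks(s) for s in model_scores.values()]
    n = len(per_model[0])
    if any(len(r) != n for r in per_model):
        raise ValueError("all models must score the same compounds")
    return [sum(r[i] for r in per_model) / len(per_model) for i in range(n)]


def top_k(avg_ranks: Sequence[float], k: int) -> List[int]:
    """Indices of the k compounds with the best (lowest) average rank."""
    return sorted(range(len(avg_ranks)), key=lambda i: avg_ranks[i])[:k]


# Hypothetical scores for four compounds from two hypothetical models.
scores = {
    "classifier": [0.9, 0.2, 0.7, 0.4],     # predicted probability of activity
    "regressor": [35.0, 80.0, 60.0, 10.0],  # predicted % inhibition
}
avg = rank_average_ensemble(scores)
print(avg)            # [2.0, 2.5, 2.0, 3.5]
print(top_k(avg, 2))  # [0, 2]
```

Here compound 1 is the regressor's top pick but the classifier's last, so its averaged rank falls behind compounds 0 and 2. This shows how models that disagree on their top predictions can change the combined ranking; whether that helps in practice would have to be tested, as the thread notes.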

@xuzhang5788
Author

Thank you so much.

@agitter
Member

agitter commented May 7, 2019

Thanks for your interest in our work. Feel free to open more issues if you have other questions.
