Reinvent the predict wheel #26

Closed
kendonB opened this issue Jul 28, 2016 · 3 comments

Comments

kendonB commented Jul 28, 2016

As far as I can tell, the terms object in predict is only used for getting the original x-values, which aren't always wanted. Worse, a terms object carries around its entire environment when it is created outside .GlobalEnv, which can be terrible for memory when working with big data and/or generating many models. See https://stat.ethz.ch/pipermail/r-devel/2016-July/072924.html
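A minimal sketch of the memory problem described above, assuming a model fitted inside a function. The helper `fit_in_function` and the objects `big` and `d` are hypothetical names used only for illustration.

```r
# Hypothetical helper: fits a small model inside a function whose frame
# also holds a large, unrelated object.
fit_in_function <- function() {
  big <- rnorm(1e6)  # about 8 MB, never used by the model
  d <- data.frame(x = 1:10, y = 2 * (1:10) + sin(1:10))
  lm(y ~ x, data = d)
}

m <- fit_in_function()

# The terms object keeps a reference to the function's evaluation frame.
env <- attr(terms(m), ".Environment")
ls(env)  # lists the local objects, including "big" and "d"

# Serializing (and therefore saving) the model writes that frame too.
size_bytes <- length(serialize(m, NULL))
size_bytes
```

The model itself has ten observations, yet its serialized size is dominated by `big`, because the captured environment is written out along with it.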

My wish is that predict methods would let me pass newdata manually and use the terms object only when required, so that I can delete it from the original object and avoid carrying around all this data unnecessarily.
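As a rough illustration of predicting without the terms object, here is a sketch that builds the design matrix by hand. It assumes a plain `lm` fit with one untransformed numeric predictor; factors, interactions, or transformations in the formula are exactly the cases where the terms object does real work, so this is not a general replacement for `predict()`.

```r
d <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))
m <- lm(y ~ x, data = d)
newdata <- data.frame(x = c(6, 7))

# Design matrix built by hand: an intercept column plus x.
# The column order must match the order of coef(m).
X <- cbind(`(Intercept)` = 1, x = newdata$x)
manual <- drop(X %*% coef(m))

# Compare against the regular predict() method.
all.equal(unname(manual), unname(predict(m, newdata = newdata)))
```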

And, your package seems like a great place to fit this in for a variety of model objects.

leeper commented Jul 30, 2016

This looks really good. I think the framework I'm currently developing (on the "numDeriv" branch) will accommodate this. Basically, we can take a modelling object, drop all of the irrelevant stuff except what is needed by predict(), and then perform calculations from there. Alternatively, if you want to drop everything before passing to margins() (and/or a potential predict2() kind of function), it should be designed to still work correctly in those cases.

The current "master" branch implementation requires the formula (because of using symbolic derivatives), but the new approach (using numerical derivatives) only needs to be able to run predict(), so this should be really easy to make happen.
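To illustrate why numerical derivatives only need `predict()`, here is a hand-rolled central-difference sketch. The helper `numeric_slope` and its `eps` argument are hypothetical and are not taken from the package or from the numDeriv branch.

```r
# Hypothetical helper: central-difference slope of the prediction with
# respect to one variable, using nothing but predict().
numeric_slope <- function(model, data, variable, eps = 1e-6) {
  up <- data
  down <- data
  up[[variable]] <- up[[variable]] + eps
  down[[variable]] <- down[[variable]] - eps
  (predict(model, newdata = up) - predict(model, newdata = down)) / (2 * eps)
}

d <- data.frame(x = 1:20, z = rep(c(0, 1), 10))
d$y <- 1 + 2 * d$x + 0.5 * d$x^2 + d$z + sin(d$x)
m <- lm(y ~ x + I(x^2) + z, data = d)

slopes <- numeric_slope(m, d, "x")

# For this formula the analytic slope is b_x + 2 * b_x2 * x, so the two
# can be compared directly.
analytic <- coef(m)[["x"]] + 2 * coef(m)[["I(x^2)"]] * d$x
max(abs(slopes - analytic))
```

The numerical version never inspects the formula, which is why it should carry over to any model class with a working `predict()` method.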

Thanks for pointing me to that thread!

@leeper leeper self-assigned this Jul 30, 2016
@leeper leeper modified the milestone: Beta Jul 30, 2016
@leeper leeper assigned leeper and unassigned leeper Jul 30, 2016

kendonB commented Jul 30, 2016

Have you also seen this package? https://github.com/hadley/modelr

It would also be good to think about implementing with that philosophy in mind.

leeper commented Aug 3, 2016

In the code currently on the numDeriv branch, I remove the data and the terms object's environment before passing things around internally. This seems to have produced some performance gains and a reduction in object size. Unfortunately, we do apparently need terms for predict() to behave correctly, so I'll leave this ticket open as a longer-term possibility to completely redo predict() functionality without actually using predict().
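A sketch of that kind of slimming on an `lm` object, under stated assumptions: the stored model frame is dropped and the environment attached to the terms is pointed at the global environment rather than removed outright. This is an illustration, not the code on the branch, and `fit_in_function` is a hypothetical helper.

```r
# Hypothetical helper: the model is fitted in a frame holding a large object.
fit_in_function <- function() {
  big <- rnorm(1e6)
  d <- data.frame(x = 1:10, y = 2 * (1:10) + sin(1:10))
  lm(y ~ x, data = d)
}

m <- fit_in_function()
newdata <- data.frame(x = c(11, 12))
before <- predict(m, newdata = newdata)
size_before <- length(serialize(m, NULL))

slim <- m
slim$model <- NULL  # drop the stored model frame (the data)
# Assumption: re-point the terms environment at globalenv() so that the
# function frame holding `big` is no longer referenced.
attr(slim$terms, ".Environment") <- globalenv()

after <- predict(slim, newdata = newdata)
size_after <- length(serialize(slim, NULL))

all.equal(before, after)
c(before = size_before, after = size_after)
```

The terms object itself stays in place, which matches the observation that `predict()` still needs it; only the environment it drags along is replaced.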

@leeper leeper removed this from the Beta milestone Aug 3, 2016
@leeper leeper removed their assignment Aug 3, 2016
@leeper leeper closed this as completed in 5f8330c Sep 9, 2016