
"Coefficients" for Interpretation #802

Closed
PhilippPro opened this issue Mar 23, 2016 · 5 comments
@PhilippPro (Member)

I recently talked with a social scientist who wants to use Machine Learning methods in the social sciences. He created his own package for the interpretation of results of ML-algorithms (https://github.com/umasds/mlame) and I told him about mlr and partial dependence plots.

What he also wanted were some kind of coefficients and he also implemented something like average marginal effects (AME).
I thought about implementing measures like AME for mlr. What do you think about it? Unnecessary or interesting?
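For illustration only (mlr is an R package; this hypothetical sketch uses scikit-learn in Python as a stand-in): an AME-style summary for a black-box model can be approximated by averaging the finite-difference change in the prediction when one feature is shifted by a fixed step, over all observations.

```python
# Sketch: approximate an average marginal effect (AME) for a fitted
# black-box model via finite differences, averaged over the data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def average_marginal_effect(model, X, feature, h=1.0):
    """Mean change in prediction when `feature` increases by `h`."""
    X_shifted = X.copy()
    X_shifted[:, feature] += h
    return np.mean(model.predict(X_shifted) - model.predict(X))

ame = average_marginal_effect(model, X, feature=0)
```

The function names and the step size `h=1.0` are assumptions for the sketch; a real implementation would also need to handle categorical features and report uncertainty.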

@zmjones (Contributor) commented Mar 23, 2016

I think that entirely misses the utility of ML. If you want things like that, just fit a simple GLM. The whole point of using ML for explanatory purposes is to discover unexpected patterns.

@PhilippPro (Member, Author)

Ok, this is a good argument.
A counter-argument is that you may want the coefficients of *your* model, and a GLM may not capture the relationships as well as your model does.
Coefficients could also be interesting if you want statements like "when I increase x by 1, y increases by ...".

@zmjones (Contributor) commented Mar 24, 2016

I am not sure what you mean. Can you elaborate?

If you want a summary of the marginal relationship between a feature and the target, you can get one by doing partial predictions or some other means of projecting the learned function onto an appropriately low dimension. This is already an approximation to the learned function. For certain kinds of functions this decomposition works OK, but summarizing things with a single number like this would only work if the learned function happens to be linear, which in my experience is basically never the case.

I plan on adding additional methods when I get time (part of my dissertation) that allow more flexible approximations to the learned function (functional ANOVA in particular) and that would allow you, for example, to find the best additive decomposition of a learned function. Nonetheless, I think this entirely misses the usefulness of ML for social scientists, which is to discover patterns. Unless you have a by-design reason to impose a linear structural model (and I don't think theory works for this purpose), I see no reason to simplify the learned function beyond what is necessary for it to be human-interpretable.
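As a rough sketch of the "partial prediction" idea above (a hypothetical helper, written with scikit-learn in Python as a stand-in for mlr): clamp one feature to each value on a grid, average the model's predictions over the data, and read off the resulting one-dimensional curve.

```python
# Sketch: a partial prediction (partial dependence) curve for one feature,
# computed by clamping the feature to each grid value and averaging
# the model's predictions over all rows of the data.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=200, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

def partial_prediction(model, X, feature, grid):
    """Average prediction with `feature` clamped to each value in `grid`."""
    curve = []
    for v in grid:
        X_clamped = X.copy()
        X_clamped[:, feature] = v
        curve.append(model.predict(X_clamped).mean())
    return np.array(curve)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
curve = partial_prediction(model, X, feature=0, grid=grid)
```

The curve itself is already a lower-dimensional approximation of the learned function; collapsing it further to a single coefficient, as the thread notes, only makes sense if the curve happens to be linear.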

@PhilippPro (Member, Author)

> I am not sure what you mean. Can you elaborate?
>
> If you want a summary of the marginal relationship between a feature and the target, you can get one by doing partial predictions or some other means of projecting the learned function onto an appropriately low dimension. This is already an approximation to the learned function. For certain kinds of functions this decomposition works OK, but summarizing things with a single number like this would only work if the learned function happens to be linear, which in my experience is basically never the case.

I think you are mainly right. It could just be a simplification of what you see in the partial prediction plot, but I am myself also not very convinced that it is really necessary or useful. E.g., you could just make a new prediction, or use the partial prediction plot, if you want to know what happens when x increases by 1.

@berndbischl (Member)

The name of the social scientist is Arne. Zach and I are talking to him anyhow, and we might do this in mlr. Will close this now.

@PhilippPro
If you want to be kept in the loop, complain to me in person, and you will be kept in the loop :)
