Translation from MERF random effects nomenclature to align with other implementations #14

Closed
dstanner opened this issue Sep 5, 2018 · 2 comments

dstanner commented Sep 5, 2018

First, thank you for creating this package! It is kind of exactly what I am looking for. However, I have some questions that stem from my prior experience with mixed models on other platforms (notably the lme4 package for R), and the documentation and examples aren't helping me translate my knowledge of how random effects are specified/named in MERF.

For example, in lme4, random effects are designated as having slopes and intercepts (which I think correspond to "clusters" and "covariates" in MERF? With 1s in the covariates matrix indicating the intercepts?).

So if "subject" (in an experiment) or "county" (as in the radon example) is a grouping variable over which there are multiple observations, one could specify a random intercept for subject (or county). Then, one could additionally specify a random slope for some variable by the grouping variable: for example, a random slope for experimental condition by subject, which lets the model estimate the variance in how much subjects differ in their response to the experimental manipulation, or a random slope for floor by county, which lets counties differ in how much floor affects radon levels. I'm not quite sure if I'm translating these to the MERF nomenclature correctly.

Moreover, lme4 allows the researcher to specify multiple, crossed random effects (e.g., random intercepts for both experimental subjects and experimental items, as well as random slopes for variables by both subject and item).

Getting to the point:

  1. My looking through the examples leads me to think that the clusters argument is the column containing the IDs for which random intercepts are generated: Is this the case?
  2. I get the impression that the Z matrix includes 1s for the random intercepts, but can include a second column for random slopes (i.e., the covariates): Is this the case? Can there be more columns?

More generally, some comments in the notebooks or documentation about what these variables are (concretely, and what form they must/can/cannot take, and possibly relationships to how random effects are specified in other mixed modeling packages) would be very helpful.

  3. Can MERF handle crossed random effects structures?
  4. Finally, does MERF provide variable importance measures from the fit forest (analogous to those produced in sklearn, and from randomForest and party::cforest in R)? I couldn't find mention of that in the readme or the notebooks.

Thanks!

@resdntalien
Contributor

@dstanner Sorry for the super late response to this. You bring up very good points. Some comments covering some (maybe not all) of your points:

  1. This adheres to the sklearn model interface as much as possible. The goal was to make it easily usable in the Python community.

  2. I have implemented this so that Z (the random effects features) can have multiple dimensions. In my notebook examples I've only ever made Z a vector of all 1's, which effectively means we're only allowing random intercepts. You can make random slopes as well by adding in another feature variable, e.g. the floor in the Minnesota radon example. You can put in crossed variables as well: whatever you put into the feature matrix will be modeled as a random effect. Usually these are also in there as a fixed effect.
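
To make the point above concrete, here is a minimal sketch of how the two Z matrices could be built with NumPy. The toy data, the variable names, and the lme4 formulas in the comments are illustrative assumptions, not taken from the MERF notebooks. The commented-out `fit` call at the end shows an assumed call shape and is not executed, so check it against the MERF version you have installed.

```python
import numpy as np

# Toy data in the spirit of the radon example: one row per observation.
# (Hypothetical values, for illustration only.)
county = np.array(["A", "A", "B", "B", "B", "C"])  # grouping variable, passed as `clusters`
floor = np.array([0, 1, 0, 0, 1, 1], dtype=float)  # covariate that may get a random slope

n = len(county)

# Roughly lme4's `(1 | county)`: random intercept only,
# so Z is a single column of ones.
Z_intercept = np.ones((n, 1))

# Roughly lme4's `(1 + floor | county)`: random intercept plus a random
# slope for floor, so Z gains a second column holding the covariate.
Z_slope = np.column_stack([np.ones(n), floor])

print(Z_intercept.shape)
print(Z_slope.shape)

# Assumed call shape, not run here:
#   from merf import MERF
#   model = MERF()
#   model.fit(X, Z_slope, clusters, y)  # clusters: the county IDs, one per row
```

Each additional column appended to Z would add one more random slope per cluster, which is why the number of columns is not limited to two.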

And I like your comments. I will try to make cleaner notebooks in the upcoming months. Of course, I am now going to put this back on you: if you want to take a swag at updating some of the notebooks with the clearer nomenclature, please by all means do so and submit a PR. I would really appreciate that.

eacton commented May 28, 2019

Hi! In a similar vein, I was wondering how to extract information about feature importances, which can be easily accessed with other random forests in Python.
