[REVIEW]: insight: Easy Access to Model Information for Various Model Objects #1412

Open · whedon opened this issue Apr 25, 2019 · 24 comments

@whedon (Collaborator) commented Apr 25, 2019

Submitting author: @strengejacke (Daniel Lüdecke)
Repository: https://github.com/easystats/insight
Version: 0.2.1
Editor: @karthik
Reviewer: @alexpghayes
Archive: Pending

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/30851fde6e76f138423f8f1fd9082831"><img src="http://joss.theoj.org/papers/30851fde6e76f138423f8f1fd9082831/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/30851fde6e76f138423f8f1fd9082831/status.svg)](http://joss.theoj.org/papers/30851fde6e76f138423f8f1fd9082831)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@alexpghayes, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @karthik know.

Please try and complete your review in the next two weeks

Review checklist for @alexpghayes

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: Does the release version given match the GitHub release (0.2.1)?
  • Authorship: Has the submitting author (@strengejacke) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

@whedon (Collaborator, Author) commented Apr 25, 2019

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @alexpghayes it looks like you're currently assigned as the reviewer for this paper 🎉.

⭐️ Important ⭐️

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means, with GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

[screenshot: repository watch settings]

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

[screenshot: notification settings]

For a list of things I can do to help you, just type:

@whedon commands

@whedon (Collaborator, Author) commented Apr 25, 2019

Attempting PDF compilation. Reticulating splines etc...

@karthik (Contributor) commented Apr 25, 2019

Thanks again @alexpghayes! Be sure to unwatch the repo, and then work through the list. You can build the pdf anytime with @whedon generate pdf as a comment here.

@strengejacke commented May 7, 2019

We're planning to submit a package update to CRAN, which would change the version number. The changes won't affect the paper or the review process, but should we update the version number here somehow?

@alexpghayes (Collaborator) commented May 7, 2019

Review

This review is of the GitHub master branch on 2019-05-07.

insight is a well-written and well-documented package. Reading the source, the pkgdown website, and the tests was a pleasure. The goal of insight is to reduce the complexity of working with model objects by providing an easy and intuitive interface for extracting information from them.

My comments are largely about this interface. I think about this primarily in terms of complexity. In some sense insight is competing with the original modeling packages -- if the interface to insight is (1) simpler than fiddling with the model objects, and (2) consistent across model objects, then using insight reduces the overall complexity of interacting with model objects.

Interface complexity

My primary concern is that insight has a complicated interface. The homepage outlines the interface, which consists of:

  • find_algorithm()

  • find_formula()

  • find_variables()

  • find_terms()

  • get_data()

  • get_priors()

  • get_variance()

  • find_parameters() / get_parameters()

  • find_predictors() / get_predictors()

  • find_random() / get_random()

  • find_response() / get_response()

With such a broad interface, I had a hard time figuring out the entry point. Some questions I asked myself:

  • Where should I start? After some digging it seems like model_info() is the key function that returns all the information insight provides at once.

  • What is the difference between get_*() and find_*()?

I strongly recommend shrinking this interface. In particular, the pairs

  • find_parameters() / get_parameters()
  • find_predictors() / get_predictors()
  • find_random() / get_random()
  • find_response() / get_response()

provide nearly identical functionality. Also, I struggled to understand the difference between:

  • variables and predictors and response
  • terms and formula

This meant I had to read the documentation for each of these methods before figuring out which one did what I wanted. These distinctions should be explained clearly on the front page of the pkgdown website. The diagram in the JOSS paper helps somewhat but still needs an accompanying narrative.

For additional discussion of interface design and hiding complexity, I highly recommend A Philosophy of Software Design by Ousterhout, which I've been enjoying recently.

Consistency

While the documentation is very clear about function returns, I had difficulty understanding why behavior differs across different model objects.

  • In find_parameters(): aov objects fit linear models with conditional and random terms, but this information is instead stored in the between and within fields of the return value

  • In find_algorithm(): the returned object differs across object types, so I would still need to check which elements are in the list before trying to access them. Also, I believe the term algorithm should be replaced by estimator, which is more appropriate. Finally, for Bayesian models, describing the algorithm as "sampling" is too vague.

  • In find_formula(): how is random different from correlation? How is slopes different from conditional? I would have grouped these together. Being somewhat familiar with the model objects under consideration, my impression is that the returns here reflect choices that made the implementation easy for developers, but that do not necessarily provide a generalized interface for users.

On the whole, I get the impression that model components that I conceive of as conceptually similar often end up in different parts of a return object. If I need to understand how get_parameters() behaves for both lm and brm models, it's unclear how beneficial get_parameters() is.

Miscellanea

Examples: The documentation shows how a user can extract information from a model, but it doesn't show how the user can use this information to solve larger problems. Some applications showing how the various functions operate together (and how they are distinct) would be greatly beneficial.

  • /R is organized by method but /test is organized by model
  • The homepage and README should have links to information about contributing

Overall impression

My initial impression is that using insight is about as complicated as understanding a model object itself. I think the package needs narrative explaining at a conceptual level how it generalizes across models, and return objects with fewer special cases. Both of these are very difficult tasks due to the heterogeneity of models. insight may benefit from narrowing in on particular families of models. For instance, it seems like most existing functionality is targeted at GLMMs.

@strengejacke commented May 7, 2019

@alexpghayes Thanks for the detailed and comprehensive review! Your comments are very helpful for making the package more accessible to users. We'll address your issues point by point and respond here once we have a revised version that is ready for re-review.

@karthik (Contributor) commented May 7, 2019

@strengejacke We can update your version number before this review is completed.

@strengejacke commented Jun 4, 2019

Just a quick note that we are working on the revisions; there was some delay due to other deadlines and tasks.

@karthik (Contributor) commented Jun 4, 2019

Thanks @strengejacke. We appreciate the update.

@strengejacke commented Jun 5, 2019

@whedon generate pdf

Just to check whether the revised paper looks OK -- a quick interim build of the PDF...

@whedon (Collaborator, Author) commented Jun 5, 2019

Attempting PDF compilation. Reticulating splines etc...

@whedon (Collaborator, Author) commented Jun 5, 2019

PDF failed to compile for issue #1412 with the following error:

Error producing PDF.
! Missing number, treated as zero.

\futurelet
l.429 ...n the expression \texttt{x\ +\ I(x^{}2)}

Looks like we failed to compile the PDF

@strengejacke commented Jun 5, 2019

@karthik Any idea why compilation fails? A markdown code block (x + I(x^2)) seems to be causing the error, although the PDF builds fine when I compile it locally.

@strengejacke commented Jun 5, 2019

Dear @alexpghayes,

on behalf of the co-authors, let me thank you again for your thorough comments on our submission. We have substantially revised the paper, the package documentation, and the accompanying website, and updated package code and functions as suggested. Below is a more detailed response to your suggestions and our revisions.

Interface complexity: My primary concern is that insight has a complicated interface. With such a broad interface, I had a hard time figuring out the entry point. Where should I start?

Thank you for this comment. We have revised the paper accordingly, making it clearer for users where to start. We applied these revisions both to the README on the package's front page (https://github.com/easystats/insight) and to the online documentation (https://easystats.github.io/insight/articles/insight.html).

What is the difference between get_*() and find_*()?

We have added more details to the paper, the README, and the online documentation to make this essential distinction clearer to users. Furthermore, together with the figure showing how package functions relate to model components (Figure 1 in the paper), it should now be easier to grasp the differences between the get- and find-functions.

I strongly recommend shrinking this interface.

See our previous point. In revising the paper and the online documentation, the need for both get- and find-functions should now be clearer. When you have the data from one of the get-functions, it is not possible to extract parameter names for specific model parts (like random or fixed effects, or the zero-inflated or conditional component) without additional effort; and vice versa, having the names of parameters or predictors does not give you the underlying data. We therefore see the need for both find- and get-functions. There are several use cases where either one is needed, and getting the required information or data easily is one of the core aspects of the insight package.
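
To illustrate the distinction with a minimal sketch (a plain lm model; the return shapes shown follow the package documentation and may differ slightly across versions):

library(insight)

m <- lm(mpg ~ wt + hp, data = mtcars)

# find_*() returns the names of model components ...
find_predictors(m)      # list(conditional = c("wt", "hp"))
find_response(m)        # "mpg"

# ... while get_*() returns the underlying data
head(get_predictors(m)) # data frame with the wt and hp columns
head(get_response(m))   # the values of mpg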

Also, I struggled to understand the difference between: variables and predictors and response, terms and formula. This meant I had to read the documentation for each of these methods before figuring out which one did what I wanted. These distinctions should be explained clearly on the front page of the pkgdown website. The diagram in the JOSS paper helps somewhat but still needs an accompanying narrative.

Thanks for pointing this out. We have added a completely new section on the definition of model components to the paper. Furthermore, we added two figures that help make the differences between these terms clearer and show why these distinctions are necessary. We also added the new section to the README on the package's front page, as well as to the online documentation (including the new figures).

While the documentation is very clear about function returns, I had difficulty understanding why behavior differs across different model objects. In find_parameters(): aov objects fit linear models with conditional and random terms, but this information is instead stored in the between and within fields of the return value

We agree with this point and revised the affected package functions accordingly. There are now fewer exceptions, making it clearer and more predictable for users what will be returned. Regarding your particular example, the functions now return the elements “conditional” and “random”, in line with the return values for other model objects.

In find_algorithm(): the returned object differs across object types, so I would still need to check which elements are in the list before trying to access them. Also, I believe the term algorithm should be replaced by estimator, which is more appropriate. Finally, for Bayesian models, describing the algorithm as "sampling" is too vague.

We agree with this point. We implemented this function knowing that it is still a bit of a “work in progress”, and there is an open GitHub issue on this (easystats/insight#38). This is something we plan to address; however, we first want to ask other people with the relevant expertise for their opinion in order to address this issue adequately.

In find_formula(): how is random different from correlation? How is slopes different from conditional? I would have grouped these together.

This is a design decision for which we think there is no clear-cut answer. Our idea was to reflect the different parts of a model formula in the elements returned by the function. lme(), for instance, can have both a random and a correlation argument, and we wanted to preserve this distinction. For feis objects (fixed-effects regressions with individual slopes), the formula structure is complicated, with several components. Again, we wanted to preserve the distinction between these parts of the model formula. However, we are happy to reduce interface complexity here and are open to concrete suggestions that help users and developers.

Examples: The documentation shows how a user can extract information from a model, but it doesn't show how the user can use this information to solve larger problems. Some applications showing how the various functions operate together (and how they are distinct) would be greatly beneficial.

Thanks for this suggestion. We have added a new section with examples of practical use cases (in different R packages) to our paper.

/R is organized by method but /test is organized by model

This is due to performance reasons. Since each function should work for (almost) every model object, organizing the tests by function (as the code files are) would mean fitting all the models again for each test file. To reduce computational time, especially on CRAN, we decided to organize the tests differently from the code files in the R folder: each test file fits its models once and checks all functions against them.
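
As a minimal sketch of what this organization looks like (assuming testthat; the file name and expectations are illustrative, not copied from the package):

# tests/testthat/test-lm.R -- fit the model once, then exercise
# several insight functions against the same object
library(testthat)
library(insight)

m <- lm(mpg ~ wt + hp, data = mtcars)

test_that("find_response works for lm", {
  expect_identical(find_response(m), "mpg")
})

test_that("find_predictors works for lm", {
  expect_identical(find_predictors(m)$conditional, c("wt", "hp"))
})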

The homepage and README should have links to information about contributing

We have added explicit information on support and contributing to the README and the website.

I think the package needs narrative explaining at a conceptual level how it generalizes across models, and return objects with fewer special cases.

We have extensively revised the paper, README, and website (online documentation), providing a clearer narrative (also at the conceptual level) to make the idea and workings of the package clearer for users.

insight may benefit from narrowing in on particular families of models. For instance, it seems like most existing functionality is targeted at GLMMs.

One of the core aims of the insight package is to provide a unified syntax to access model information. Since this information is not easily accessible for many model objects, the idea is to work with as many models as possible. Against this background, it would run counter to the nature of the package to narrow the scope of supported models. We have added a section to the paper showing examples of use cases and hope that these examples explain why it makes sense to support many different models.

Again, thank you for this valuable review! We hope that we have addressed all your points sufficiently and are happy to receive your feedback on our revision.

Best
Daniel, on behalf of the co-authors

@strengejacke commented Jun 5, 2019

@whedon generate pdf

If the paper.md still does not compile, perhaps we can use a temporary PDF?
Here is the link: https://github.com/easystats/insight/blob/master/paper/paper.pdf

@whedon (Collaborator, Author) commented Jun 5, 2019

Attempting PDF compilation. Reticulating splines etc...

@whedon (Collaborator, Author) commented Jun 5, 2019

PDF failed to compile for issue #1412 with the following error:

Error producing PDF.
! Missing number, treated as zero.

\futurelet
l.435 expression \texttt{x\ +\ I(x\ ^{}\ 2)}

Looks like we failed to compile the PDF

@strengejacke commented Jun 5, 2019

@whedon generate pdf

@whedon (Collaborator, Author) commented Jun 5, 2019

Attempting PDF compilation. Reticulating splines etc...

@alexpghayes (Collaborator) commented Jun 16, 2019

Response to revisions

Response based on comments by @strengejacke above and the GitHub master branch on 2019-06-16.

I'm impressed by the changes you've made, especially to the documentation. After reading more carefully, I agree with your comments that the broad interface is warranted.

Something clicked for me when I realized the importance of differentiating between names and values. Some functions return the names of data columns; some functions return the values of data columns. The difference is especially clear with get_weights() and find_weights(). I believe get_*() and find_*() are an attempt to make this differentiation, but I am now convinced that a better approach is to give functions names like the following:

  • find_weights() -> weights_name()
  • get_weights() -> weights_values()

I suspect users will have an easier time with this sort of naming strategy and encourage you to consider this scheme, or something similar.

Documentation

I appreciated the new definitions of the various components of a model. Instead of presenting this material all at once, I recommend you group the definitions by components that are easy to confuse and set these groups visually apart.

Note that you have chosen to define terms and variables in a non-standard way for the R community. In an R model formula an entire chunk such as I(x^2) is referred to as a term, rather than a variable. I recommend following this definition, which amounts to flipping the current usage of term and variable.
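
Base R's own vocabulary supports this convention; a quick check using only base functions:

f <- y ~ x + I(x^2)

# "terms" in the base R sense are whole formula chunks:
attr(terms(f), "term.labels")
#> [1] "x"      "I(x^2)"

# "variables" are the underlying columns:
all.vars(f)
#> [1] "y" "x"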

Finally, I recommend grouping get_*() and find_*() together on the pkgdown site in the Reference section. That is, you want get_parameters() and find_parameters() (or parameter_names() and parameter_values()) to appear next to each other.

Miscellanea

  • I appreciate that there are fewer special cases in the returns.

  • The test file organization makes sense; I'm on board.

  • It makes me nervous that the example at https://easystats.github.io/insight/reference/clean_parameters.html warns about overwriting quosure methods. Can you comment on why this is happening and/or necessary?

  • When is clean_names() different from find_terms()? I would include examples so that users can tell when to use which function.

  • I appreciate that you've linked to packages that use insight. However, this is not especially useful to users -- instead of demonstrating how to solve a specific problem with insight, the user must now go read an entire package on their own, find a usage of insight, and infer the problem that is being solved. I would construct explicit, self-contained examples showing idiomatic usage. If these are inspired by the linked packages, fantastic, but merely linking to another code base isn't that helpful.

  • I recommend keeping the original paper title insight: Easy Access to Model Information for Various Model Objects rather than insight: A Unified Syntax for Accessing Model Information for Various Model Objects in R. Insight provides a unified interface, but does not introduce any new syntax, which is a language level feature. Domain specific languages such as dplyr and data.table introduce new syntax (i.e. := and !!) via metaprogramming tricks, but insight does not and makes use of standard evaluation and syntax.

  • I would rename all_models_equal() to all_models_have_same_class(). The current name makes it seem like all_models_equal() is also checking whether the parameter values match up.

  • Instead of providing a new generic n_obs() I would provide methods for the existing generic nobs() (see the sketch after this list).

  • I am about to respond with additional thoughts on find_algorithm() in easystats/insight#38.
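
Regarding nobs(), a minimal sketch of registering a method for the existing stats generic (the class name mymodel is hypothetical):

# reuse the existing stats::nobs() generic instead of a new n_obs()
# "mymodel" is an invented class, used only for illustration
nobs.mymodel <- function(object, ...) {
  length(object$residuals)
}

m <- structure(list(residuals = rnorm(32)), class = "mymodel")
nobs(m)
#> [1] 32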

@strengejacke commented Jun 17, 2019

@alexpghayes Thanks for your review! Since some of your remaining suggestions involve substantial rework or breaking changes, I would like to clarify some points and outline how we plan to address them before we actually start working. Could you give us short feedback on whether our planned steps are OK with you?

  1. Rename find_weights() -> weights_name() and get_weights() -> weights_values()

Since insight is part of the larger "easystats" project, we have developed some coding guidelines for our packages (https://github.com/easystats/easystats#convention-of-code-style). Your suggestion would break these guidelines, so for consistency within and across packages, we would prefer to keep the current names. However, we may add aliases to the existing functions.
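
Such aliases are cheap to add in R; a minimal sketch (the new names follow the reviewer's proposal and are not part of the current API):

library(insight)

# hypothetical aliases; the original names stay canonical
weights_name   <- find_weights
weights_values <- get_weights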

  2. Definition of "term" and "variable".

I'm not sure about this; I could not find clear definitions yet (if you have some references/links, I would be happy if you could share them). However, since these functions are currently used by only one package that depends on insight, we could flip the usage of term and variable here.

  3. Finally, I recommend grouping get_*() and find_*() together on the pkgdown site in the Reference section.

I'm not sure whether this is possible with pkgdown options alone; I think we would have to combine the functions' Rd files, so that both ?find_parameters and ?get_parameters appear on the same help page. Would you suggest going this route?

  4. It makes me nervous that the example at https://easystats.github.io/insight/reference/clean_parameters.html warns about overwriting quosure methods. Can you comment on why this is happening and/or necessary?

This seems to be new in R 3.6 (at least, that is when it started appearing for me): once namespaces are loaded, R warns about overwritten S3 methods. ggplot2 depends on rlang, afaik, and it seems ggplot2 overwrites some S3 methods from rlang (or vice versa). In short, this issue seems unrelated to insight.

  5. I appreciate that you've linked to packages that use insight. However, this is not especially useful to users -- instead of demonstrating how to solve a specific problem with insight, the user must now go read an entire package on their own, find a usage of insight, and infer the problem that is being solved.

We would add more comprehensive working examples that clearly show solutions to problems to the README (and as a further subsection of the paper) -- would that be sufficient?

@alexpghayes (Collaborator) commented Jun 19, 2019

Since insight is part of the larger "easystats" project, we have developed some coding guidelines for our packages (https://github.com/easystats/easystats#convention-of-code-style). Your suggestion would break these guidelines, so for consistency within and across packages, we would prefer to keep the current names. However, we may add aliases to the existing functions.

If you don't want to change these, I understand. Since the difference between get_*() and find_*() does seem to come down to names versus values, I really do recommend function names reflecting this, but I'm happy to recommend for publication either way.

Definition of "term" and "variable".

Definitely swap. It looks like you found some references in easystats/insight#56 (comment)?

Finally, I recommend grouping get_*() and find_*() together on the pkgdown site in the Reference section.

Yeah, you can do this all with pkgdown options, no need for @rdname or anything fancy. See the _pkgdown.yaml file (I think that's what it's called) in dplyr for examples: https://dplyr.tidyverse.org/reference/index.html
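
For reference, a sketch of what the relevant stanza in insight's _pkgdown.yml could look like (section titles invented):

# _pkgdown.yml: group paired functions in the reference index
reference:
  - title: "Parameters"
    contents:
      - find_parameters
      - get_parameters
  - title: "Predictors"
    contents:
      - find_predictors
      - get_predictors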

This seems to be new in R 3.6 (at least, that is when it started appearing for me): once namespaces are loaded, R warns about overwritten S3 methods. ggplot2 depends on rlang, afaik, and it seems ggplot2 overwrites some S3 methods from rlang (or vice versa). In short, this issue seems unrelated to insight.

Great, wouldn't worry about it then.

We would add more comprehensive working examples that clearly show solutions to problems to the README (and as a further subsection of the paper) -- would that be sufficient?

Yep, that'd be ideal! The current examples show how to access certain kinds of information. The open question is when users would want to access this kind of information. What common problem does insight provide a solution to? My guess is there's some problem where you need to map over many models at once for model comparison, and insight makes that process nicer.
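
For example, a minimal sketch of that kind of map-over-models workflow (models and summary invented for illustration; return shapes per the insight docs):

library(insight)

models <- list(
  gaussian = lm(mpg ~ wt + hp, data = mtcars),
  poisson  = glm(carb ~ wt, data = mtcars, family = poisson())
)

# the same accessors work across model classes
lapply(models, function(m) {
  list(
    response   = find_response(m),
    predictors = find_predictors(m)$conditional,
    n          = n_obs(m)
  )
})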
