A better model summary #214

noamross · 2023-03-07T21:40:42Z

I think gratia would be a nice home for an alternative GAM model summary table that would be a reasonable first pass for including in a paper or at least a supplement for common models, and be amenable to modification. Things that I think would be good to have:

A unified table for parametric and smooth terms, with NA where applicable
A nice print function
Columns for:
- Term name
- Knots
- Basis type (with possible expanded full names)
- P-value w/stars
- EDF
Optional columns of min and max effect size in data range, with CIs and associated x (covariate) values associated with each

I'm doing this for myself, and would be happy to start a PR based on it.

The text was updated successfully, but these errors were encountered:

gavinsimpson · 2023-03-09T08:35:23Z

There's currently an overview() that does some of this:

library("gratia")
library("mgcv")

eg1 <- data_sim("eg1", n = 1000,  dist = "normal", scale = 2, seed = 1)
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), 
         data = eg1, 
         method = "REML")

overview(m)

> overview(m)

Generalized Additive Model with 4 terms

  term  type    edf statistic p.value
  <chr> <chr> <dbl>     <dbl> <chr>  
1 s(x0) TPRS   4.23    16.1   < 0.001
2 s(x1) TPRS   3.50   169.    < 0.001
3 s(x2) TPRS   8.61   201.    < 0.001
4 s(x3) TPRS   1.00     0.599 0.43875

It currently doesn't return estimates for parametric terms:

eg4 <- data_sim("eg4", n = 400,  dist = "normal", scale = 2, seed = 1)
m_by <- gam(y ~ fac + s(x2, by = fac) + s(x0), 
            data = su_eg4, method = "REML")
overview(m_by)

> overview(m_by)

Generalized Additive Model with 5 terms

  term       type         edf statistic p.value  
  <chr>      <chr>      <dbl>     <dbl> <chr>    
1 fac        parametric  2     150.     < 0.001  
2 s(x2):fac1 TPRS        2.85    4.62   0.0021453
3 s(x2):fac2 TPRS        2.95   30.9    < 0.001  
4 s(x2):fac3 TPRS        6.49   30.9    < 0.001  
5 s(x0)      TPRS        1.00    0.0168 0.8996412

It wouldn't be hard to have "parametric" be "factor" or "continuous" say, but I hadn't thought about returning the actual estimates (and the constant term is missing here).

What do you mean by Knots? The number of basis functions used? That would be easy to add.

I'm not a big fan of stars on p values, so I'd resist including them.

Expanding on the basis names (type) could be done - I was trying to avoid having the output be too wide, but my short-names are not very consistent (some aren't very short for some bases).

The above output is a plain tibble with some custom formatting rather than a specific print method: https://github.com/gavinsimpson/gratia/blob/HEAD/R/overview.R

Is this (sort of) what you had in mind?

noamross · 2023-03-09T22:42:32Z

I did not know about overview()! Yes, this is a way towards what I'm looking for. I do think anything that makes it easier for people to see and report effect sizes is important - they're under-reported even in linear models and really rare/hard for people to extract with GAMs. Yes, by Knots I mean k - I think that it's useful to have those alongside edf for interpretation. I'm not big on stars either. In theory I hoped that printing them would encourage people to use the output, but if they want them I suppose they can add it themselves.

…utput; part of #214

gavinsimpson · 2023-03-11T11:14:35Z

I've added a few bits of this to overview(). As of 0.8.1.13 the output is:

> overview(m_by, stars = TRUE)                                                           

Generalized Additive Model with 5 terms

  term       type           k   edf statistic p.value   stars    
  <chr>      <chr>      <dbl> <dbl>     <dbl> <chr>     <noquote>
1 fac        parametric    NA  2     150.     < 0.001   ***      
2 s(x2):fac1 TPRS           9  2.85    4.62   0.0021453 **       
3 s(x2):fac2 TPRS           9  2.95   30.9    < 0.001   ***      
4 s(x2):fac3 TPRS           9  6.49   30.9    < 0.001   ***      
5 s(x0)      TPRS           9  1.00    0.0168 0.8996412

but stars is FALSE by default.

I was thinking about effect sizes, and while it's not too onerous to do this for univariate smooths (evaluating the smooth at 100 evenly spaced values should be fine for most cases), it gets slower when we're doing this for multidimensional smooths unless we keep the number of evaluation points low-ish. I haven't looked into where things are slowing down and an obvious place to speed things up would be to run each smooth in a separate thread/process with furrr and future, but I haven't implemented any of the effect-size bits yet because I want to think a little more about them.

noamross · 2023-03-11T14:01:15Z

For the 2 papers I'm doing this for right now, we decided to have min/max in the summary calculated only from the fitted values on the data the model was trained on, so there's no additional predictions to make and these min/max values don't interpolate or extrapolate at all. This is at least predictable. For 2D+ one needs to think about things like too.far anyway and I think that's too case-base-case to get into. I think cases where the 1D min/max is well outside the fitted values are edge cases that would need to be diagnosed by looking at partial effects anyway.

gavinsimpson added a commit that referenced this issue Mar 11, 2023

add basis dimension and option to turn on significance stars to the o…

3f1d9ff

…utput; part of #214

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A better model summary #214

A better model summary #214

noamross commented Mar 7, 2023

gavinsimpson commented Mar 9, 2023

noamross commented Mar 9, 2023

gavinsimpson commented Mar 11, 2023

noamross commented Mar 11, 2023 •

edited

A better model summary #214

A better model summary #214

Comments

noamross commented Mar 7, 2023

gavinsimpson commented Mar 9, 2023

noamross commented Mar 9, 2023

gavinsimpson commented Mar 11, 2023

noamross commented Mar 11, 2023 • edited

noamross commented Mar 11, 2023 •

edited