Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

goals for 0.0.7 #62

Closed
9 of 17 tasks
IndrajeetPatil opened this issue Oct 1, 2018 · 22 comments
Closed
9 of 17 tasks

goals for 0.0.7 #62

IndrajeetPatil opened this issue Oct 1, 2018 · 22 comments
Milestone

Comments

@IndrajeetPatil
Copy link
Owner

IndrajeetPatil commented Oct 1, 2018

(Goal for release date: last week of December)

To do:

  • Add groupedstats as dependencies and import shared functions from there
  • Go full rlang rather than using short-cuts
  • Refactor code to remove stats::na.omit(). Take a more fine-grained approach to remove NAs only from columns of interest.
  • Add results.subtitle argument to all functions
  • Get ggcoefstats to work with dataframe arguments
  • Showing both 50% and 95% CIs for ggcoefstats (like in Bayesian inference plots: e.g., https://twitter.com/tjmahr/status/1048226472710873089)
  • Add many more tests and get the code coverage to at least 50%
    (currently at 14%: https://github.com/IndrajeetPatil/ggstatsplot/tree/master/tests)
  • Check font size for theme_ggstatsplot function; give user arguments option to change all aspects of the theme?
  • Clean up Rmd using gramr package
  • Change k = 2 for all functions to follow APA guidelines
  • Add Bayes Factors to ggscatterstats, ggpiestats, and ggbetweenstats (anova designs)
  • When there are many levels in a factor, ggpiestats labels can overlap; give the option to have the labels to be either "internal" (current default) or "external" to the slices
  • Add group option for ggscatterstats to support grouped marginals (https://github.com/daattali/ggExtra/blob/master/inst/vignette_files/ggExtra_files/figure-markdown_strict/ggmarginal-grouping-1.png)
  • Add ggplot.function argument to grouped_ variants to make modifications with ggplot2 functions to customize the plot
  • Add pairwise comparisons support for ggbetweenstats
  • Add new function ggdotplotstats for dot plots/charts
  • Change 95% CI to have 95% as a subscript
@IndrajeetPatil
Copy link
Owner Author

@ibecav I think, for now, we can stop tweaking the vignettes because the package size is now 3.3 MB, way below the CRAN limit of 5 MB.

I think the thing that really needs to be implemented asap is tests; only 14% of the code has code coverage, which is far from ideal.

Do you mind working on adding more tests? I will work in parallel on adding a new function for dot plots.

@ibecav
Copy link
Contributor

ibecav commented Oct 1, 2018 via email

@IndrajeetPatil
Copy link
Owner Author

Sounds good. Also, note that purrr_examples vignette has also changed over the weekend.

@ibecav
Copy link
Contributor

ibecav commented Oct 1, 2018 via email

@IndrajeetPatil
Copy link
Owner Author

@ibecav In light of the discussion about lm_effsize_ci bug, I was thinking that it'll be a good idea if you can give the vignette for ggcoefstats a read. It can really benefit from your statistical expertise as I am not really well-versed in all nuances surrounding various kinds of regression models.

(Please don't remove any examples because, right now, this is the only way to check which regression models are supported by this function. We can downsize it only if the need arises.)

@ibecav
Copy link
Contributor

ibecav commented Oct 5, 2018 via email

@ibecav
Copy link
Contributor

ibecav commented Oct 5, 2018

I took a look today and definitely see the need. Wow that's a lot of vignette! I'll see if I can add some value.

@ibecav
Copy link
Contributor

ibecav commented Oct 5, 2018 via email

@IndrajeetPatil
Copy link
Owner Author

Yeah, I agree that that vignette is pretty big, but that was deliberate on my part. I just want the user to have the impression that any regression model they can think of is supported by this function!

Thanks for pointing me to CGPfunctions. Looks pretty cool!
I especially love the ggslopegraph function. I am definitely considering including slopegraphs in ggstatsplot to show categorical data since a lot of people don't want to use pie charts, for good reasons of course.

I've been thinking about adding support for more complex factorial designs (not just 2-way, but any kind of design the user can think of) in ggstatsplot by 1.0.0. release, something along the lines of ggbetweenstats in terms of plot design. The closest thing I have seen is the new afex_plot() function that is mighty general in terms of what kinds of factorial designs it can display and also satisfying with regards to the plot design:
https://cran.r-project.org/web/packages/afex/vignettes/afex_plot_introduction.html

@ibecav
Copy link
Contributor

ibecav commented Oct 8, 2018

Thanks for the tip on afex. A little disappointed that it doesn't accept many of the standard model objects.

I have been going over the vignette for ggcoefstats. Some points for discussion before I start editing:

  1. Is more or less a feature request. Would you consider supporting standardized coefficients? If the predictors are even one or two orders of magnitude different like wt & disp in mtcars it throws the plot off

  2. Any problem if I significantly reorder things?

  3. Long term would you consider breaking the vignette up into smaller chunks if it made sense topically?

Thanks.

set.seed(123)
# your current example
ggstatsplot::ggcoefstats(x = lm(formula = mpg ~ cyl * am, data = mtcars))

# a different exmaple some would say more classic regression
ggstatsplot::ggcoefstats(x = lm(formula = mpg ~ wt * disp, data = mtcars))

# Note R squared
summary(lm(formula = mpg ~ cyl * am, data = mtcars))
#> 
#> Call:
#> lm(formula = mpg ~ cyl * am, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.5255 -1.2820 -0.0191  1.6301  5.9745 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  30.8735     3.1882   9.684 1.95e-10 ***
#> cyl          -1.9757     0.4485  -4.405 0.000141 ***
#> am           10.1754     4.3046   2.364 0.025258 *  
#> cyl:am       -1.3051     0.7070  -1.846 0.075507 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.939 on 28 degrees of freedom
#> Multiple R-squared:  0.7852, Adjusted R-squared:  0.7621 
#> F-statistic: 34.11 on 3 and 28 DF,  p-value: 1.73e-09
# Note R squared
summary(lm(formula = mpg ~ wt * disp, data = mtcars))
#> 
#> Call:
#> lm(formula = mpg ~ wt * disp, data = mtcars)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -3.267 -1.677 -0.836  1.351  5.017 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 44.081998   3.123063  14.115 2.96e-14 ***
#> wt          -6.495680   1.313383  -4.946 3.22e-05 ***
#> disp        -0.056358   0.013239  -4.257  0.00021 ***
#> wt:disp      0.011705   0.003255   3.596  0.00123 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.455 on 28 degrees of freedom
#> Multiple R-squared:  0.8501, Adjusted R-squared:  0.8341 
#> F-statistic: 52.95 on 3 and 28 DF,  p-value: 1.158e-11
# stanardized coefficents mean the x axis scale is less problematic
# and make results clearer when thing are on different scales
lsr::standardCoefs(lm(formula = mpg ~ cyl * am, data = mtcars))
#>                b       beta
#> cyl    -1.975735 -0.5854553
#> am     10.175407  0.8424555
#> cyl:am -1.305116 -0.5871095
lsr::standardCoefs(lm(formula = mpg ~ wt * disp, data = mtcars))
#>                   b      beta
#> wt      -6.49567966 -1.054555
#> disp    -0.05635816 -1.158954
#> wt:disp  0.01170542  1.296692

Created on 2018-10-08 by the reprex package (v0.2.1)

@IndrajeetPatil
Copy link
Owner Author

  1. Yes, this is something I've would like to implement for sure. Maybe even for the 0.0.7 release.
    (With this approach: http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf)

This is already implemented in this function from dotwhisker package, but I am wondering if I should write it myself and avoid another dependency or just import this function:
https://github.com/fsolt/dotwhisker/blob/master/R/by_2sd.R

I was kind of hoping that the users would do this themselves. For instance, for the example you provided:

ggstatsplot::ggcoefstats(x = lm(
  formula = scale(mpg) ~ scale(wt) * scale(disp),
  data = mtcars
))

  1. No problem if you reorder, but please don't remove any of the existing models.

  2. I kind of like the current "one vignette per function" approach. If you are worried about the increasing size of the vignette and the toll it's going to take on the speed of CRAN checks, we can just move this vignette to a website article and point to it in the additional vignette.

I've already started doing this:
https://indrajeetpatil.github.io/ggstatsplot/articles/additional.html

@ibecav
Copy link
Contributor

ibecav commented Oct 8, 2018 via email

@IndrajeetPatil
Copy link
Owner Author

IndrajeetPatil commented Oct 8, 2018 via email

@ibecav
Copy link
Contributor

ibecav commented Oct 8, 2018

beta versus b hat has nothing to do with standardizing more about population versus sample

https://stats.stackexchange.com/questions/210543/what-is-the-difference-between-beta-1-and-hat-beta-1

@IndrajeetPatil
Copy link
Owner Author

IndrajeetPatil commented Oct 9, 2018 via email

@ibecav
Copy link
Contributor

ibecav commented Oct 9, 2018

No I don't think any practitioner will be confused. I just teach and tend to be very careful until it sinks into their heads. Today will be slow (meetings) but I'll keep slogging along.

And to your earlier comment about having users simply use scale as part of the formula to lm that does not produce the classic standardized coefficients I'm used to in my discipline but as Gelman notes there are actually multiple ways you can standardize. I'm used to this one:

screen shot 2018-10-09 at 08 43 19

lm(formula = mpg ~ wt * disp, data = mtcars)
#> 
#> Call:
#> lm(formula = mpg ~ wt * disp, data = mtcars)
#> 
#> Coefficients:
#> (Intercept)           wt         disp      wt:disp  
#>    44.08200     -6.49568     -0.05636      0.01171
lm(formula = scale(mpg) ~ scale(wt) * scale(disp), data = mtcars)
#> 
#> Call:
#> lm(formula = scale(mpg) ~ scale(wt) * scale(disp), data = mtcars)
#> 
#> Coefficients:
#>           (Intercept)              scale(wt)            scale(disp)  
#>               -0.2026                -0.6161                -0.3845  
#> scale(wt):scale(disp)  
#>                0.2355
lm.beta::lm.beta(lm(formula = mpg ~ wt * disp, data = mtcars))
#> 
#> Call:
#> lm(formula = mpg ~ wt * disp, data = mtcars)
#> 
#> Standardized Coefficients::
#> (Intercept)          wt        disp     wt:disp 
#>    0.000000   -1.054555   -1.158954    1.296692
lsr::standardCoefs(lm(formula = mpg ~ wt * disp, data = mtcars))
#>                   b      beta
#> wt      -6.49567966 -1.054555
#> disp    -0.05635816 -1.158954
#> wt:disp  0.01170542  1.296692

Created on 2018-10-09 by the reprex package (v0.2.1)

@IndrajeetPatil
Copy link
Owner Author

@ibecav Thinking of doing a minor release by the end of this month to make some important bug fixes available to the users. Will you have time to add a few more tests, especially for the subtitle making functions that are at the heart of everything (https://indrajeetpatil.github.io/ggstatsplot/reference/index.html#section-helper-functions-statistics-subtitles-)? Their code is currently not covered by any tests.

It will be pretty straightforward, just like with the tests you had written for the effect size functions and shouldn't take a lot of time.

I've been working on a new function to display post-hoc comparisons for ggbetweenstats and that will keep me occupied until the next release.

@ibecav
Copy link
Contributor

ibecav commented Nov 7, 2018

I'll try and get at least some done before Thanksgiving.

@IndrajeetPatil IndrajeetPatil added this to the 0.0.7 milestone Nov 10, 2018
@ibecav
Copy link
Contributor

ibecav commented Nov 13, 2018

I did an initial pull request for one test. Take a look and then I can work the others.

@IndrajeetPatil
Copy link
Owner Author

Just had a closer look at all the testthat files you added and modified few things.

For future reference, can you please structure all tests so that they follow the points listed below-

  1. Try not to have any lints in the code. So, for example, no line should go above 90 characters limit; better to have each argument on a separate line. The lintr bot will catch it, but better to do it on our side anyway.
  2. Add a comment for each test and not for a full block of tests.
  3. If you are copy-pasting tests from one file to another, make sure to change the comments (e.g., you were talking about checking bayes factor in a file containing tests for robust anovas and Pearson's chi-squared test).
  4. Always have a test for sample size n. This is going to be crucial when NAs are present in the dataset.
  5. All subtitle making functions are exported, so no need to have ::: call.

This will also make them easier to debug or change in case the functions being tested themselves change.

We are making a good headway towards increating the code coverage!
https://indrajeetpatil.github.io/ggstatsplot/articles/tests_and_coverage.html

@ibecav
Copy link
Contributor

ibecav commented Nov 14, 2018 via email

@IndrajeetPatil
Copy link
Owner Author

@ibecav I have added some new tests in for ggbetweenstats helpers, which brought to my attention that all the new helpers I had added for pairwise comparisons have 0% code coverage! :(
https://github.com/IndrajeetPatil/ggstatsplot/blob/master/R/helpers_pairwise_comparison.R

If you get time, can you add tests for these as well? I am working on adding tests for ggcoefstats, which is going to take a lot of tests to have complete code coverage.

I am planning to submit 0.0.7 to CRAN after thanksgiving, most probably on the 28th of November. So, if you add more tests until then, they will be part of this release. If not, 0.0.8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants