Add diagnose method #485

rok-cesnovar · 2021-04-18T13:31:39Z

Summary

Fixes #156

This is a WIP as I think we need to settle on some of the names and what to expose.

The basic functionality:

> library(cmdstanr)

> model_path <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
> data_path <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.data.json")
> mod <- cmdstan_model(model_path)
> r <- mod$diagnose(data = data_path)
> r$gradients()
  param_idx    value    model finite_diff       error
1         0 0.118369 -3.35469    -3.35469 -3.3885e-10

Another example:

> r <- cmdstanr_example("schools", method = "diagnose")
> r$gradients()
   param_idx     value      model finite_diff        error
1          0  1.157970 -1.4246900  -1.4246900  1.27292e-09
2          1  0.890422 -4.5675900  -4.5675900 -9.29061e-11
3          2  1.222960  0.1080590   0.1080590 -2.31706e-09
4          3 -0.566058  0.3761530   0.3761530  2.74081e-10
5          4  0.385916  0.1168620   0.1168620 -5.88578e-10
6          5 -1.114130  0.4498980   0.4498980 -3.96907e-09
7          6 -0.178851  0.2151120   0.2151120  2.91668e-09
8          7  0.958420  0.0339674   0.0339674  2.17283e-09
9          8  1.064610  0.1850840   0.1850840  4.19366e-09
10         9 -0.895713  0.3858390   0.3858390 -7.83429e-10
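Because $gradients() returns an ordinary data frame, checking the result needs nothing beyond base R. Below is a minimal sketch that copies the schools values above in by hand and leaves out the value column. The tolerance is an assumption, chosen to match CmdStan's default error of 1e-6.

```r
# Values copied by hand from the schools example output above (value column omitted)
grads <- data.frame(
  param_idx   = 0:9,
  model       = c(-1.4246900, -4.5675900, 0.1080590, 0.3761530, 0.1168620,
                  0.4498980, 0.2151120, 0.0339674, 0.1850840, 0.3858390),
  finite_diff = c(-1.4246900, -4.5675900, 0.1080590, 0.3761530, 0.1168620,
                  0.4498980, 0.2151120, 0.0339674, 0.1850840, 0.3858390),
  error       = c(1.27292e-09, -9.29061e-11, -2.31706e-09, 2.74081e-10,
                  -5.88578e-10, -3.96907e-09, 2.91668e-09, 2.17283e-09,
                  4.19366e-09, -7.83429e-10)
)

tol <- 1e-6  # assumed to match CmdStan's default `error` argument
failing <- grads[abs(grads$error) > tol, ]
nrow(failing)          # 0: every component agrees within tol
max(abs(grads$error))  # 4.19366e-09
```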

The diagnose method has the following diagnose-specific args:

epsilon,
error,

Both of these belong to the gradient argument group, but since diagnose only diagnoses gradients, I think this is fine.
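The following base R sketch shows what epsilon and error control. It reproduces the single bernoulli gradient shown earlier by writing that model's log density out by hand. It assumes that CmdStan estimates each gradient component with central differences of step epsilon and flags a component when the absolute difference between the autodiff and finite-difference values exceeds error, which is my reading of Stan's finite_diff_grad. The helper names are hypothetical.

```r
# Log density of CmdStan's bernoulli example on the unconstrained scale
# u = logit(theta), up to an additive constant. Assumes the example data
# y = (0,1,0,0,0,0,0,0,0,1) and the prior theta ~ beta(1, 1), and adds the
# log Jacobian of theta = inv_logit(u).
bernoulli_lp <- function(u) {
  y <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)
  theta <- plogis(u)
  sum(dbinom(y, size = 1, prob = theta, log = TRUE)) + log(theta) + log1p(-theta)
}

# Central finite differences: perturb one coordinate at a time by +/- epsilon
finite_diff_grad <- function(f, x, epsilon = 1e-6) {
  vapply(seq_along(x), function(k) {
    h <- replace(numeric(length(x)), k, epsilon)
    (f(x + h) - f(x - h)) / (2 * epsilon)
  }, numeric(1))
}

# Compare an exact gradient with the finite-difference estimate and flag
# components whose absolute difference exceeds `error`
check_gradients <- function(grad, fd, error = 1e-6) {
  data.frame(model = grad, finite_diff = fd, error = grad - fd,
             failed = abs(grad - fd) > error)
}

u <- 0.118369                # the initial value from the bernoulli output
exact <- 3 - 12 * plogis(u)  # d/du of 3*log(theta) + 9*log(1 - theta)
check_gradients(exact, finite_diff_grad(bernoulli_lp, u))
# the model column is about -3.35469, as in the bernoulli output
```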
The remaining common args are:

data,
seed,
init,
output_dir,
output_basename

Not sure we need output_dir and output_basename, but they were easy to add.
We could add threads, but I don't think it's a real use case for the diagnose method.

The run returns a CmdStanDiagnose object that has the following methods:

$gradients()
$lp() (returns the log probability reported by diagnose)
$metadata()
$init()
$output_files()
$save_output_files()
$data_file()
$save_data_file()
$print() (just prints the gradients data frame)

Copyright and Licensing

Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Rok Češnovar

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)

rok-cesnovar · 2021-04-18T13:38:57Z

Questions to resolve:

do we need the threads arg in the diagnose() call?

My opinion is we do not need it at the moment, but can always add it later if a use-case arises or someone wants to add it.

is the name $gradients() ok?
do we need $output()?

Currently, if the diagnose call stops, the entire stdout/stderr is printed. If diagnose finishes normally, there is no useful output, so I would say we don't need it?

do we need $print()? I am not sure it's really useful
do we need any other method in the result object?

codecov-commenter · 2021-04-18T14:03:38Z

Codecov Report

Merging #485 (667eff2) into master (c97b25c) will decrease coverage by 0.05%.
The diff coverage is 90.09%.

@@            Coverage Diff             @@
##           master     #485      +/-   ##
==========================================
- Coverage   93.31%   93.26%   -0.06%     
==========================================
  Files          12       12              
  Lines        2948     3043      +95     
==========================================
+ Hits         2751     2838      +87     
- Misses        197      205       +8

Impacted Files	Coverage Δ
R/example.R	`100.00% <ø> (ø)`
R/run.R	`95.90% <66.66%> (-1.08%)`	⬇️
R/fit.R	`98.18% <90.00%> (-0.31%)`	⬇️
R/args.R	`98.57% <100.00%> (+0.06%)`	⬆️
R/csv.R	`98.39% <100.00%> (+0.11%)`	⬆️
R/model.R	`92.91% <100.00%> (+0.31%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c97b25c...667eff2.

jgabry · 2021-04-19T15:53:13Z

Thanks for working on this.

Questions to resolve:

do we need the threads arg in the diagnose() call?

My opinion is we do not need it at the moment, but can always add it later if a use-case arises or someone wants to add it.

That sounds good to me.

is the name $gradients() ok?

Ok by me, but we should check with @mitzimorris. If CmdStanPy exposes the diagnose method we can coordinate names.

do we need $output()?

Currently, if the diagnose call stops, the entire stdout/stderr is printed. If diagnose finishes normally, there is no useful output, so I would say we don't need it?

Yeah I guess we don't need it.

do we need $print()? I am not sure it's really useful

If there's nothing useful to print then I guess it's fine to omit it.

do we need any other method in the result object?

I can't think of anything right now.

rok-cesnovar · 2021-04-19T16:10:33Z

cmdstanpy does not expose this: stan-dev/cmdstanpy#233

But we can agree on names here.

mitzimorris · 2021-04-27T21:31:29Z

Here's my $0.02:

Diagnose method has the following diagnose-specific args:

epsilon,
error,

Both of these are in the gradient argument group. But given that diagnose only diagnoses gradients, this is fine I think.

agree

The remaining common args are:

data,
seed,
init,
output_dir,
output_basename
Not sure we need output_dir and output_basename, but they were easy to add.
We could add threads, but I don't think it's a real use case for the diagnose method.

Also agree: threads don't make sense here; they really only make sense in the context of MCMC methods.

The methods on the CmdStanDiagnose object are fine. No opinion on print; that's an R thing.

Regarding the name gradients, the other option would be finite_diffs?
An R data frame seems appropriate. One thing that would be useful is a column of parameter names.
CmdStanPy would then return a pandas DataFrame.

CmdStan has a basic diagnostic feature that will calculate the gradients of the initial state and compare them with gradients calculated by finite differences.

I ran this on a model with more parameters - here's the output:

./bym2_islands diagnose data file=scotland_islands.data.json  output sig_figs=18
method = diagnose
  diagnose
    test = gradient (Default)
      gradient
        epsilon = 9.9999999999999995e-07 (Default)
        error = 9.9999999999999995e-07 (Default)
id = 0 (Default)
data
  file = scotland_islands.data.json
init = 2 (Default)
random
  seed = 471103111 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = 18
  profile_file = profile.csv (Default)

TEST GRADIENT MODE

 Log probability=-15346.3

 param idx           value           model     finite diff           error
         0       -0.946343        -1476.93        -1476.93    -4.09008e-07
         1       -0.464474         -536.62         -536.62    -1.83793e-07
         2      -0.0379859        -1110.81        -1110.81     1.00659e-07
         3        0.388117        -20026.8        -20026.8     1.30729e-06
         4         0.71313         10.4752         10.4752     4.49812e-07
         5       -0.540705         60.9843         60.9843     6.18526e-07
         6       -0.249176         17.2592         17.2592     2.35654e-07
         7         1.46106         7.89157         7.89157     5.31475e-07
         8         1.68691        -116.345        -116.345     -4.7295e-08
         9         1.55025         8.59678         8.59678    -3.97736e-07
        10       -0.982377         41.2775         41.2775     1.22317e-07
...
       112         1.41383         2965.13         2965.13     2.65591e-06
       113       -0.818459          3066.8          3066.8     3.04369e-06
       114        0.240622            3062            3062     2.14349e-06
       115         1.98949         3044.91         3044.91     3.61147e-08

hence the suggestion for a column of parameter names.
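With 116 parameters, the raw table is hard to scan. The sketch below parses an excerpt of the console output above into the same shape as $gradients() and then flags components. It is illustrative only, not how cmdstanr reads results, and the parser name is hypothetical. In this excerpt, the rows that exceed the absolute 1e-6 threshold are exactly the ones with gradients in the thousands. Scaling the error by the gradient magnitude, which is one possible convention, shows that those differences are tiny in relative terms.

```r
# Parse CmdStan's "TEST GRADIENT MODE" console text into the log probability
# and a data frame with one row per unconstrained parameter.
parse_diagnose_output <- function(lines) {
  lp_line <- grep("Log probability=", lines, value = TRUE)[1]
  lp <- as.numeric(sub(".*Log probability=", "", lp_line))

  # A table row is exactly five whitespace-separated numbers
  num <- "[-+]?[0-9.]+(?:e[-+]?[0-9]+)?"
  row_re <- paste0("^\\s*", paste(rep(num, 5), collapse = "\\s+"), "\\s*$")
  rows <- trimws(grep(row_re, lines, value = TRUE, perl = TRUE))
  vals <- matrix(as.numeric(unlist(strsplit(rows, "\\s+", perl = TRUE))),
                 ncol = 5, byrow = TRUE)

  grads <- data.frame(param_idx = as.integer(vals[, 1]), value = vals[, 2],
                      model = vals[, 3], finite_diff = vals[, 4],
                      error = vals[, 5])
  list(lp = lp, gradients = grads)
}

# An excerpt of the bym2_islands output above
out_text <- c(
  "TEST GRADIENT MODE",
  "",
  " Log probability=-15346.3",
  "",
  " param idx           value           model     finite diff           error",
  "         0       -0.946343        -1476.93        -1476.93    -4.09008e-07",
  "         3        0.388117        -20026.8        -20026.8     1.30729e-06",
  "         4         0.71313         10.4752         10.4752     4.49812e-07",
  "...",
  "       113       -0.818459          3066.8          3066.8     3.04369e-06",
  "       115         1.98949         3044.91         3044.91     3.61147e-08"
)

res <- parse_diagnose_output(out_text)
g <- res$gradients
g$param_idx[abs(g$error) > 1e-6]                     # 3 113
g$rel_error <- abs(g$error) / pmax(1, abs(g$model))
max(g$rel_error) < 1e-6                              # TRUE
```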

rok-cesnovar · 2021-04-28T06:53:29Z

Regarding the name gradients, the other option would be finite_diffs?

Maybe, though this data frame holds the gradients calculated with autodiff alongside the same gradients calculated with finite differences, so I think gradients may be the more appropriate name.

One thing that would be useful would be to add a column of parameter names.

I agree, but CmdStan does not output them. We would have to run an iteration of optimization or something similar to get the names, or change this in the Stan services. The next version of stanc3, which will be available with CmdStan 2.27, will be able to output the parameter names, so the wrappers could use that to get them. Either way, I think that belongs in a separate PR; this one does the bare minimum of what CmdStan offers.
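Once names are available from some source, attaching them is a small lookup on param_idx. The thread leaves that source open. The helper, names, and values below are hypothetical. Diagnose reports gradients with respect to the unconstrained parameters, so a name like tau would really refer to its unconstrained transform.

```r
# Attach parameter names to a gradients data frame. CmdStan's param_idx is
# 0-based, while R indexing is 1-based, hence the + 1.
add_param_names <- function(grads, param_names) {
  if (max(grads$param_idx) >= length(param_names)) {
    stop("fewer names than parameters")
  }
  grads$param_name <- param_names[grads$param_idx + 1]
  grads
}

# Hypothetical names and toy values, for illustration only
grads <- data.frame(param_idx = 0:2, model = c(-1.42, -4.57, 0.108))
add_param_names(grads, c("mu", "tau", "theta[1]"))
```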

mitzimorris · 2021-04-28T16:05:06Z

hi @rok-cesnovar - you're right about no names in the output.
I think it would be possible to get this info from the instantiated model in CmdStan
and put it into the output - will investigate

Also agree: gradients is the correct name for the method.

rok-cesnovar · 2021-05-01T17:50:41Z

This is ready for review.

jgabry

Looks good, just a few minor things plus this error I'm getting:

cmdstanr_example(method = "diagnose")

Error in processx::run(command = self$command(), args = self$command_args()[[1]],  : 
  argument 5 matches multiple formal arguments

Also would be good to add a test.

cmdstanr.Rproj

R/model.R

R/run.R

R/fit.R

jgabry · 2021-05-04T16:07:56Z

Ok I did the documentation. Any idea why the example errors? Or is that just on my computer?

jgabry · 2021-05-04T17:17:14Z

Any idea why the example errors? Or is that just on my computer?

Turns out I needed to update my processx version because previous versions didn't have the stdout and stderr arguments. I think we need processx >= 3.5.0 for this, so I just bumped the minimum version in DESCRIPTION. I also added a NEWS item.
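The "matches multiple formal arguments" error is consistent with R's partial argument matching. As far as I can tell, processx versions before 3.5.0 had stdout_line_callback and stdout_callback but no stdout argument. A call with stdout = ... therefore partially matches two formals. Below is a base R reproduction that uses a hypothetical stub in place of the real processx signature, followed by a hypothetical version guard.

```r
# Hypothetical stand-in for an older processx::run() signature: it has
# callbacks whose names start with "stdout", but no `stdout` argument.
old_run <- function(command, args = character(),
                    stdout_line_callback = NULL, stdout_callback = NULL) {
  invisible(NULL)
}

# R partially matches `stdout =` against both callbacks and refuses the call
msg <- tryCatch(old_run("cmd", args = "-x", stdout = "out.txt"),
                error = conditionMessage)
msg  # "... matches multiple formal arguments"

# A guard on the installed version avoids the confusing message
meets_minimum <- function(installed, required = "3.5.0") {
  package_version(installed) >= package_version(required)
}
meets_minimum("3.4.5")  # FALSE
meets_minimum("3.5.0")  # TRUE
```

In practice the installed version would come from as.character(utils::packageVersion("processx")). Declaring the minimum in DESCRIPTION, as done here, lets the package manager enforce it instead.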

rok-cesnovar · 2021-05-04T17:20:11Z

I think we need processx >= 3.5.0

Ah yes, great call, thanks. The ability to store stdout/stderr in a file was added in 3.5.0.

Will fix the rest of the minor stuff immediately.

rok-cesnovar · 2021-05-04T17:43:26Z

Addressed the rest of the comments. Thanks so much for the review and fixing up the comments!

jgabry

Looks good, thanks! Can you add a test (I think just a simple one checking that it doesn't error and the gradients() method works after)? Other than that I think it's ready to merge. If you don't have time to do the test let me know and I can probably do it tonight or tomorrow.


rok-cesnovar · 2021-05-04T19:55:11Z

Thanks! Will add one tomorrow.


rok-cesnovar · 2021-05-05T18:20:02Z

Added tests running the method and reading the CSV. Also did a minor change so that lp is not part of metadata in read_cmdstan_csv.

jgabry

Thanks! Looks good to me.

jgabry · 2021-05-05T20:00:32Z

Everything passed so merging now

add diagnose method

edc90a2

rok-cesnovar added 3 commits April 18, 2021 16:15

remove sig_figs

eead6fc

add lp()

c9da26b

update docs

2b0c039

rok-cesnovar added 2 commits April 22, 2021 09:51

Merge remote-tracking branch 'origin/master' into diagnose_method

db9d346

add some docs

9f97743

rok-cesnovar added 2 commits May 1, 2021 12:11

fix docs

3f60ae4

more doc fixes

5c91b3b

rok-cesnovar changed the title ~~[WIP] Add diagnose method~~ Add diagnose method May 1, 2021

Update args.R

7e4e36f

jgabry requested changes May 3, 2021


CmdStanDiagnose doc

f4ce01e

jgabry added 2 commits May 4, 2021 11:16

bump processx version

bd70edb

Add NEWS entry

84af8b1

rok-cesnovar added 3 commits May 4, 2021 19:31

remove cmdstanr.Rproj changes

7d6c868

tbb path checking cleanup

8fe72d7

fix arg order in diagnose

f795823

jgabry reviewed May 4, 2021


regenerate doc after changing argument order

1ffb548

@rok-cesnovar this should fix the failing check

rok-cesnovar added 4 commits May 5, 2021 10:20

Merge branch 'master' into diagnose_method

605cd5f

# Conflicts: # R/model.R

add basic diagnose tests

9e04d87

move lp from metadata and add csv test

42b0c44

add examples test

667eff2

jgabry approved these changes May 5, 2021


jgabry merged commit a2b36fe into master May 5, 2021

jgabry deleted the diagnose_method branch May 5, 2021 20:00
