Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiling #434

Merged
merged 32 commits into from
Jan 22, 2021
Merged

Profiling #434

merged 32 commits into from
Jan 22, 2021

Conversation

rok-cesnovar
Copy link
Member

Summary

Adds support for profiling in cmdstanr.

generator = function(seed = 0, n = 1000, k = 10) {
  
  inv_logit <- function(x) {
    1 / (1 + exp(-x))
  }
  
  set.seed(seed)
  X <- matrix(rnorm(n * k), ncol = k)
  
  y <- 3 * X[,1] - 2 * X[,2] + 1
  p <- runif(n)
  y <- ifelse(p < inv_logit(y), 1, 0)
  
  list(k = ncol(X), n = nrow(X), y = y, X = X)
}

model_code <- 'data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  int y[n];
}
parameters {
  vector[k] beta;
  real alpha;
}
model {
  profile("priors") {
    target += std_normal_lpdf(beta);
    target += std_normal_lpdf(alpha);
  }
  profile("likelihood") {
      target += bernoulli_logit_lpmf(y | X * beta + alpha);
  }
}'

temp_stan_file <- write_stan_file(code = model_code)
mod <- cmdstan_model(temp_stan_file)
fit <- mod$sample(data = generator(1, 1000, 50))
> print(fit$profiles())
[[1]]
name       thread_id total_time forward_time reverse_time chain_stack no_chain_stack autodiff_calls no_autodiff_calls
1 likelihood 140564926150464  2.1715600   1.77739000   0.39417500       77661       51774000          25887                 1
2     priors 140564926150464  0.0109287   0.00874673   0.00218202       51774              0          25887                 1

[[2]]
name       thread_id total_time forward_time reverse_time chain_stack no_chain_stack autodiff_calls no_autodiff_calls
1 likelihood 139824695863104 1.76786000   1.44602000   0.32183700       64524       43016000          21508                 1
2     priors 139824695863104 0.00891067   0.00712529   0.00178538       43016              0          21508                 1

[[3]]
name       thread_id total_time forward_time reverse_time chain_stack no_chain_stack autodiff_calls no_autodiff_calls
1 likelihood 140448024303424 1.84105000   1.50612000   0.33492800       67326       44884000          22442                 1
2     priors 140448024303424 0.00974464   0.00786197   0.00188267       44884              0          22442                 1

[[4]]
name       thread_id total_time forward_time reverse_time chain_stack no_chain_stack autodiff_calls no_autodiff_calls
1 likelihood 140609478141760 1.66333000   1.36427000    0.2990630       58035       38690000          19345                 1
2     priors 140609478141760 0.00878534   0.00712344    0.0016619       38690              0          19345                 1

Copyright and Licensing

Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Rok Češnovar

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

@rok-cesnovar
Copy link
Member Author

This is not just ready yet, missing docs and test but need feedback on the API. @bbbales2 if you have a few moments, mind looking at this?

fit$profiles() returns a list of data frames
fit$profile_files() returns files

@bbbales2
Copy link
Member

@rok-cesnovar works for me. There probably should be a $save_profile_files() too

@bbbales2
Copy link
Member

Currently the situation is:

> fit$profiles()
list()
> fit$profiles_files()
Error: attempt to apply non-function
> fit$save_profile_files()
Moved 0 files and set internal paths to new locations:
- /home/bbales2/cmdstanr/generate-profile-202101201633-1-39e034.csv
- /home/bbales2/cmdstanr/generate-profile-202101201633-2-39e034.csv
- /home/bbales2/cmdstanr/generate-profile-202101201633-3-39e034.csv
- /home/bbales2/cmdstanr/generate-profile-202101201633-4-39e034.csv

So $profiles() is good, and the other two could be slightly better. For comparison:

> fit$latent_dynamics_files()
 Error: No latent dynamics files found. Set 'save_latent_dynamics=TRUE' when fitting the model. 
> fit$save_latent_dynamics_files()
 Error: No latent dynamics files found. Set 'save_latent_dynamics=TRUE' when fitting the model. 

@rok-cesnovar
Copy link
Member Author

rok-cesnovar commented Jan 20, 2021

Thanks. I agree.

@jgabry are you good with these names

fit$profiles()
fit$profile_files()
fit$save_profile_files()

Just double checking with you. Ben agreed to review the functionalities.

@bbbales2
Copy link
Member

Update actually profile_files() works I think I hadn't reloaded session or something.

save_profile_files could still be less verbose tho:

> fit$save_profile_files()
Moved 0 files and set internal paths to new locations:
- /home/bbales2/cmdstanr/generate-profile-202101201640-1-1e8498.csv
- /home/bbales2/cmdstanr/generate-profile-202101201640-2-1e8498.csv
- /home/bbales2/cmdstanr/generate-profile-202101201640-3-1e8498.csv
- /home/bbales2/cmdstanr/generate-profile-202101201640-4-1e8498.csv

@bbbales2
Copy link
Member

@jgabry this all works great for me. It would be good to get this reviewed and merged ASAP so we can test before release (next Monday, the 25th)

Copy link
Member

@jgabry jgabry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few comments/questions

R/args.R Outdated Show resolved Hide resolved
R/run.R Outdated Show resolved Hide resolved
tests/testthat/test-fit-shared.R Outdated Show resolved Hide resolved
tests/testthat/test-profiling.R Outdated Show resolved Hide resolved
R/fit.R Show resolved Hide resolved
@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

Actually don't merge this yet. I think there's a problem:

fit <- cmdstanr_example(quiet = FALSE)

Compiling Stan program...
Running MCMC with 4 sequential chains...

Chain 1 profile_file=NA is either mistyped or misplaced.
Chain 1 A method must be specified!
Warning: Chain 1 finished unexpectedly!

Chain 2 profile_file=NA is either mistyped or misplaced.
Chain 2 A method must be specified!
Warning: Chain 2 finished unexpectedly!

Chain 3 profile_file=NA is either mistyped or misplaced.
Chain 3 A method must be specified!
Warning: Chain 3 finished unexpectedly!

Chain 4 profile_file=NA is either mistyped or misplaced.
Chain 4 A method must be specified!
Warning: Chain 4 finished unexpectedly!

Warning: Use read_cmdstan_csv() to read the results of the failed chains.
Warning messages:
1: All chains finished unexpectedly!
 
2: No chains finished successfully. Unable to retrieve the fit. 

@rok-cesnovar
Copy link
Member Author

I am getting a weird error with vignettes that I didnt touch. Checking.

@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

Yeah I think it's the same error as the one I got running cmdstanr_example()

@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

I get a lot of test failures too if I run the unit tests locally

@rok-cesnovar
Copy link
Member Author

Can you try again?

@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

Nice, that seems to fix the error I was seeing.

@rok-cesnovar
Copy link
Member Author

Once this is in I am going to try my luck making a vignette. What is the easiest way to build the HTML and everything we need to publish it here?

@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

Cool. Do you mean how do we include a new vignette on the CmdStanR website?

@rok-cesnovar
Copy link
Member Author

Yeah. Never made a vignette before..

@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

Not complicated but a few things required. How about I demonstrate by creating a branch for the new vignette and doing everything necessary except writing the content? Then you can see what I did so you can do it yourself next time you want to add a vignette. Does that work?

@rok-cesnovar
Copy link
Member Author

That works perfectly if you have the time. Thank you!

@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

Yeah I have a few min now so I can do that

@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

Done: #435

R/run.R Outdated Show resolved Hide resolved
@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

I think this is failing now because my previous suggestion was only half right! We need to check:

  • if length 0 (will be if version < 2.26)
  • if not length zero do the files exist?

So something like this:

    profile_files = function(include_failed = FALSE) {
      files <- private$profile_files_
      if (!length(files) || !any(file.exists(files))) {
        stop(
          "No profile files found. ",
          "The model that produced the fit did not use any profiling.",
          call. = FALSE
        )
      }
      if (include_failed) {
        files
      } else {
        ok <- self$procs$is_finished() | self$procs$is_queued()
        files[ok]
      }
    }

@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

Let's see what happens if I push that

@jgabry
Copy link
Member

jgabry commented Jan 21, 2021

Now I think the test coverage is only failing because it's not using the release candidate. I guess there are two options: we could change the installation to the RC like you did for the R CMD check workflow, or we could only run the failing tests if version >= 2.26. Preference?

@rok-cesnovar
Copy link
Member Author

we could only run the failing tests if version >= 2.26

This is probably better.

@codecov-io
Copy link

Codecov Report

Merging #434 (e6ae1ad) into master (5de494b) will decrease coverage by 3.19%.
The diff coverage is 43.13%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #434      +/-   ##
==========================================
- Coverage   93.12%   89.93%   -3.20%     
==========================================
  Files          12       12              
  Lines        2822     2870      +48     
==========================================
- Hits         2628     2581      -47     
- Misses        194      289      +95     
Impacted Files Coverage Δ
R/csv.R 98.16% <ø> (-1.05%) ⬇️
R/run.R 91.38% <34.28%> (-5.21%) ⬇️
R/args.R 98.91% <62.50%> (-0.65%) ⬇️
R/fit.R 97.24% <62.50%> (-1.13%) ⬇️
R/install.R 57.43% <0.00%> (-13.71%) ⬇️
R/utils.R 90.54% <0.00%> (-1.36%) ⬇️
R/model.R 91.70% <0.00%> (-0.95%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5de494b...e6ae1ad. Read the comment docs.

@rok-cesnovar rok-cesnovar merged commit deeebdc into master Jan 22, 2021
@rok-cesnovar rok-cesnovar deleted the profiling branch January 22, 2021 08:40
@wlandau
Copy link
Collaborator

wlandau commented Jan 22, 2021

Thank you so much for profiling functionality! I hope to give it a test drive soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants