Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/2759 model base #2785

Merged
merged 14 commits into from Jul 17, 2019

Conversation

@bob-carpenter
Copy link
Contributor

commented Jul 8, 2019

Submission Checklist

  • Run unit tests: ./runTests.py src/test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary

Add stan::model::model_base base class and and stan::model::model_crtp static adapter class for constructing models.

model_base defines pure virtual methods for all functionality used by the services and algorithms.

Model classes are now generated to extend model_crtp rather than prob_grad. They also define a factory function to create new instances. The implementation of the CRTP ensures that all the existing template functions and virtual functions are implemented in model_base by delegating to the implementation's template function. (There's an exmaple in the tests for model_crtp.)

Intended Effect

Speed up compile time. Specifically, allow functions like the services to be instantiated with model_base and compiled in their own translation units. Then models are compiled in their own translation unit and linked.

How to Verify

Check the matching branch in stan-dev/cmdstan that implements CmdStan using the new pattern . You can verify there that there is no slowdown.

Otherwise, there are unit tests for the instantiation and inheritance of the base classes and all other tests still pass.

Side Effects

This change maintains backward compatibility in that no functionality is lost from generated model code, which can still be used as before. All template programs were left intact---they can simply be instantiated with base_model rather than being hard coded for base_model.

The main side effect is that all model classes foo now contain virtual functions where there used to be non-virtual functions. As none of these are in tight loops, there is negligible effect on speed.

The model_name() function was changed from static to a virtual member function so that it could be used as part of the base model specification. Nothing in the services use the model name, but it might get used in RStan, PyStan, etc.

Documentation

Yes. The goal is to have the model_base fully documented so that clients can understand Stan's requirements for models.

Also added documentation for prob_grad that was previously missing.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):

Columbia University

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@stan-buildbot

This comment has been minimized.

Copy link
Contributor

commented Jul 8, 2019

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.98)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 1.0)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.97)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.98)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 0.78)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.0)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.03)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.02)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.03)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.01)
Result: 0.98587122603
Commit hash: da68b8b

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

Result: 0.98587122603

Does this mean it's a bit slower? There's an outlier, so now I'm curious about aggregation method:

stat_comp_benchmarks/benchmarks/arK/arK.stan, 0.78)

To save everyone the trouble of looking it up, there's nothing special about this particular Stan program:

data {
  int<lower=0> K;
  int<lower=0> T;
  real y[T];
}
parameters {
  real alpha;
  real beta[K];
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 2.5);
  for (t in (K+1):T) {
    real mu;
    mu = alpha;
    for (k in 1:K)
      mu = mu + beta[k] * y[t - k];
    y[t] ~ normal(mu, sigma);
  }
}
@wds15

This comment has been minimized.

Copy link
Contributor

commented Jul 8, 2019

Yes... values smaller than 1 mean its slower...bigger than 1 and you are speeding up. 0.78 for ark seems a lot (but that model runs super fast).

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

I ran 20 iterations of the old and new model code for arK.stan from the benchmarks repo and here are the results (mean time is 2.80s and 2.76s respectively---this is in the noise, but shows at least on my machine that there's not a big difference in performance). Both compiled with latest develop branch cmdstan at O=3.

# OLD
2.79759, 2.81587, 2.80003, 2.80729, 2.81272, 2.80357, 2.80182, 2.83747, 2.84059, 2.79978, 2.82007, 2.78515, 2.80253, 2.84306, 2.80799, 2.81918, 2.81565, 2.84919, 2.82136, 2.77996
# NEW
2.75693, 2.80307, 2.82399, 2.82531, 2.81886, 2.78611, 2.82901, 2.80264, 2.77021, 2.77916, 2.76571, 2.78418, 2.79107, 2.81126, 2.75938, 2.8003, 2.82604, 2.79684, 2.83821, 2.80198
@stan-buildbot

This comment has been minimized.

Copy link
Contributor

commented Jul 8, 2019

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.96)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.97)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 0.99)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 0.78)
(performance.compilation, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.98)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.0)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.01)
Result: 0.98079029754
Commit hash: 4040263

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

Hmm. The outlier is still an outlier despite not having different run time on my machine (Mac OS X, clang++). I have no idea why separate translation units would slow down this specific Stan program on one platform and not another.

How are these tests being run and on what platform? Is the config somewhere public?

@seantalts

This comment has been minimized.

Copy link
Member

commented Jul 8, 2019

You can see the config here: https://github.com/stan-dev/performance-tests-cmdstan/blob/master/Jenkinsfile#L102 which mostly sets up and calls https://github.com/stan-dev/performance-tests-cmdstan/blob/master/compare-git-hashes.sh.

It runs on the Mac pro (which does sometimes show performance degradations other Macs don't) with clang++-6.0. arK seems to run very fast and can have somewhat variable timing, though it got exactly the same ration the past two times, so who knows.

I'm pretty confused though - this had equal compilation speed to develop, so that suggests to me it isn't actually testing the full thing, right? Or does that not matter?

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

this had equal compilation speed to develop, so that suggests to me it isn't actually testing the full thing, right?

This PR just adds the virtual base class and changes the way code is generated to use it. The speedup will come in CmdStan when the services can be compiled in their own translation units. That'll also need to be benchmarked.

with clang++-6.0. arK seems to run very fast and can have somewhat variable timing,

I found very stable timing (as shown in the previous comment) running a single chain with the same seed.

Mac pro (which does sometimes show performance degradations other Macs don't)

Is the Mac pro buggy, or are those degradations from our code?

@seantalts

This comment has been minimized.

Copy link
Member

commented Jul 8, 2019

In the one case I saw (one of the TLS PRs) it showed a degradation that didn't show up on other macs (but maybe did on Linux with GCC?) when the code was changed to use a pointer instead of a reference, I believe. So I think the code was all as intended, the debate was about whether we cared about the performance on that CPU which maybe doesn't support some instruction or otherwise doesn't trigger optimal codegen from LLVM.

I think I was thinking of arma that had more variable performance, that one only takes 600ms. arK seems to take about 2000ms, which is a bit longer.

Perhaps the debate is again over whether we care about the old Mac Pro CPU. I personally care much more about the total time it takes to execute a model, which would mean that this is still a massive win for arK as its compilation time decreases by something like 25 seconds while its runtime increases by only about 400ms. I would argue (perhaps in a separate thread) that this is the metric we should all be considering (well, the most-measurable one and most applicable here - perhaps the best one is actually "total time to develop and run a Stan model," which is affected by a large multiplier of write-compile-run-debug feedback loop speed).

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

Thanks. That's good to know. I'm about to benchmark on gcc and will report back.

I don't care much about the old Mac CPU, but it may be indicative of others.

As for what we should optimize, I think it depends on use case. If you have something like a precompiled model that everyone uses (rstanarm, prophet, etc.), then you really don't care much about compilation time. And if it's in some kind of server setting, you probably do care about performance. On the other side, if you're fitting a bunch of little-ish models that are fast, that compile time of 40+s is a real drag.

@seantalts

This comment has been minimized.

Copy link
Member

commented Jul 8, 2019

True, rstanarm definitely deserves its own performance tests! cc @bgoodri and @jgabry :P

I think the wisdom in most of the software engineering shops I've worked in is that deployed workload resource consumption is worth very little compared with consuming the developer's time, even for models that go into production. If we end up with specific models that need to be faster in a corporate production environment, I think I and others would welcome the consulting income or github PRs to deal with that, haha. Of course there's a balance to be struck - I would just argue we could move a bit towards faster development and at the expense of more runtime performance than we are losing here.

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

Also no difference on GCC 6 (other than it runs in 2.3s in GCC 6 compared to 2.7s in clang from Xcode).

OLD (GCC 6):

2.33129, 2.30224, 2.51232, 2.32546, 2.32826, 2.32524, 2.34958, 2.29206, 2.34273, 2.33053, 2.30137, 2.52292, 2.29986, 2.29951, 2.31221, 2.30466, 2.33785, 2.32555, 2.29266, 2.31779

NEW (GCC 6):

2.31826, 2.35278, 2.36406, 2.32456, 2.32519, 2.3421, 2.3993, 2.36443, 2.30421, 2.3311, 2.31127, 2.34374, 2.38383, 2.32886, 2.3493, 2.33989, 2.35524, 2.38073, 2.35985, 2.32206
@seantalts

This comment has been minimized.

Copy link
Member

commented Jul 9, 2019

Sounds good to me.

@bob-carpenter bob-carpenter referenced this pull request Jul 9, 2019
2 of 2 tasks complete
@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 9, 2019

This is ready to review.

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 9, 2019

This is ready to review now.

@syclik

This comment has been minimized.

Copy link
Member

commented Jul 10, 2019

@bob-carpenter, it looks like the compile process hasn't changed for the tests in Stan. It seems like we would want to do it for multiple reasons:

  1. be able to verify this works without needing CmdStan
  2. catch problems if future changes break this (I can't see how, but what can break will)
  3. speed up compilation times for Stan tests. (!!!). The integration tests would all get faster, right?

I can help sort that out unless @mitzimorris wants to.

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 10, 2019

@mitzimorris

This comment has been minimized.

Copy link
Member

commented Jul 10, 2019

I agree with Bob that getting this PR in for 2.19 should take priority.
especially since at some point the mono-repo should remove the issue of whether to test via CmdStan or Stan. that said, I would be happy to do a follow-on PR to address makefile issues.

@wds15

This comment has been minimized.

Copy link
Contributor

commented Jul 15, 2019

I looked over this code and overall I think the approach is sound and makes sense. What I struggle with most in my head is the naming of model_base and model_crtp. On one hand I dislike naming things like a programming pattern, but maybe that's fine (wikipedia does the same in their example on CRTP). However, where we could maybe improve is to name the crtp bit mode_base_crtp. This way it makes it clear that this is still a model_base - it's just the CRTP part of it.

The compilation times indeed speedup seriously as I have verified on the cmdstan branch of this which goes with it. I would be ok to me with having this PR going into Stan and not take advantage of the speedups in the Stan repository in terms of faster test compilation times; we can address that later or just wait for the monorepo.

The restriction which we gain out of this approach is that any templated function which needs to be part of a model will have to be put into the CRTP thing where we have to flesh out all template variations.

I am happy to take another deep dive into this later today. What makes me a little hesitant to waive this PR is that I never have worked on Stan services / etc.. So I do lack a holistic overview probably. Do others think this is a problem or can someone else also take a look?

@seantalts

This comment has been minimized.

Copy link
Member

commented Jul 15, 2019

Maybe @betanalpha can take a quick look at the services part?

@betanalpha

This comment has been minimized.

Copy link
Contributor

commented Jul 15, 2019

@wds15

This comment has been minimized.

Copy link
Contributor

commented Jul 15, 2019

@betanalpha I think the answer is yes. The way I got is that we have now

model_base : virtual only class which has log_prob and write_array spelled out in all the different variations admitted by our templates
model_crtp : This thing is the base class for model instances. It provides an actual implementation of the virtual functions using the templated functions from the model instance itself.

So nothing changes for the services. It's just that adding templated functions to the model instance class gets a bit more tedious.

At least this is my understanding.

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 15, 2019

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 15, 2019

@wds15

This comment has been minimized.

Copy link
Contributor

commented Jul 15, 2019

model_base_crtp is great.

bob-carpenter added some commits Jul 15, 2019

@wds15
Copy link
Contributor

left a comment

I am leaving some requested changes as we anyway need to wait for the cmdstan PR to be in line (and have the makefile cleanups).

I have one more question: Do we still need the using stan::model::prob_grad; line in the generated hpp files? This is not needed at all as far as I can see.

Other than that I think this is all great. It would be super nice if @betanalpha could just shortly confirm that all of this is fine from his angle (sounded everything alright to him, but that was pending confirmation).

// test template version from base class reference
// long form assignment avoids test macro parse error with multi tparams
double v1 = bm.template log_prob<false, false>(params_r, msgs);
EXPECT_FLOAT_EQ(1, v1);

This comment has been minimized.

Copy link
@wds15

wds15 Jul 15, 2019

Contributor

You could use extra brackets to avoid the assignment like this:

EXPECT_FLOAT_EQ(1, (bm.template log_prob<false, false>(params_r, msgs)));

Same applies to the model_base_test.cpp

*/
class prob_grad {
protected:
protected:
// TODO(carpenter): roll into model_base; remove int members/vars

This comment has been minimized.

Copy link
@wds15

wds15 Jul 15, 2019

Contributor

This is not related to this PR, but rather to my ingorance... but why is the number of parameters in a model not declared const? These never change.


inline double log_prob(Eigen::VectorXd& theta,
std::ostream* msgs) const override {
return static_cast<const M*>(this)

This comment has been minimized.

Copy link
@wds15

wds15 Jul 15, 2019

Contributor

Would it be possible to declare a reference to the parent class once? So you could have the member

const M& model_;

This would be initialized with

model_(static_cast<const M&>(*this))

With that you can avoid all those static_cast calls and simply do model_.template ....

That's easier to read to me. Does that make sense?

This comment has been minimized.

Copy link
@wds15

wds15 Jul 15, 2019

Contributor

Ok... if that's the idiomatic CRTP way.. fine.

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 15, 2019

@stan-buildbot

This comment has been minimized.

Copy link
Contributor

commented Jul 15, 2019

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.99)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.97)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.95)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.01)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 0.98)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.04)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.98)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.02)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.0)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.9)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.08)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.94)
Result: 0.99226191263
Commit hash: ac7d51e

@betanalpha

This comment has been minimized.

Copy link
Contributor

commented Jul 15, 2019

@syclik
Copy link
Member

left a comment

@bob-carpenter, this is great. There were just a couple of minor doc things that I noticed. Happy to approve now if you don't want to address them (but I think it's easy enough to fix them now prior to them staying in our codebase).

src/stan/lang/generator/generate_class_decl.hpp Outdated Show resolved Hide resolved
src/stan/model/model_base.hpp Outdated Show resolved Hide resolved
src/stan/model/model_base_crtp.hpp Outdated Show resolved Hide resolved
src/stan/model/model_base_crtp.hpp Outdated Show resolved Hide resolved
src/stan/model/model_base_crtp.hpp Outdated Show resolved Hide resolved
src/stan/model/model_base_crtp.hpp Outdated Show resolved Hide resolved
@syclik

This comment has been minimized.

Copy link
Member

commented Jul 16, 2019

Also... I tried to get the compile times down for the tests. It didn't look like there was anything obvious that was fully linked, so I don't think there's anything to do. (@bob-carpenter, you were right... my thoughts on making things faster were misguided and you probably saw that when I asked the question.)

@wds15

wds15 approved these changes Jul 16, 2019

@wds15

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2019

Looks like we are all good. Addressing @syclik comments would be great... and I can approve once more when you touch the code again.

@stan-buildbot

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2019

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 1.0)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 0.99)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.97)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.01)
(performance.compilation, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.02)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 0.95)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.03)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.0)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 0.9)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.98)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.88)
Result: 0.98083235381
Commit hash: d55a88b

@bob-carpenter

This comment has been minimized.

Copy link
Contributor Author

commented Jul 16, 2019

I updated the doc as @syclik requested. Is there more you'd like me to do here, @wds15?

@syclik syclik merged commit ac4880d into develop Jul 17, 2019

2 checks passed

continuous-integration/jenkins/pr-merge This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@syclik syclik deleted the feature/2759-model-base branch Jul 17, 2019

@syclik

This comment has been minimized.

Copy link
Member

commented Jul 17, 2019

Thanks, @bob-carpenter! And @wds15 for reviewing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.