From f9d7d27327080fe11b680ff74dfef9ad2917969a Mon Sep 17 00:00:00 2001
From: Steve Bronder <stevo15025@gmail.com>
Date: Thu, 30 Apr 2020 23:12:00 -0400
Subject: [PATCH 1/7] Add design doc for static matrices

---
 designs/0005-static-matrices.md | 95 +++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 designs/0005-static-matrices.md
diff --git a/designs/0005-static-matrices.md b/designs/0005-static-matrices.md
new file mode 100644
index 0000000..b4c6052
--- /dev/null
+++ b/designs/0005-static-matrices.md
@@ -0,0 +1,95 @@
+- Feature Name: static_matrices
+- Start Date: April 30th, 2020
+- RFC PR: [Example Branch](https://github.com/stan-dev/math/compare/feature/var-template)
+- Stan Issue: [#1805](https://github.com/stan-dev/math/issues/1805)
+
+# Summary
+[summary]: #summary
+
+This proposes a `static_matrix` and `static_vector` type in the Stan language which on the backend will allow for significant performance opportunities. In Stan Math this will be represented by `var<Eigen::Matrix<double, 1, 1>>` or `var<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler the conditions for a `matrix` to be a `static_matrix` can be detected and applied to Stan programs automatically for users.
+
+# Motivation
+[motivation]: #motivation
+
+Currently, a differentiable matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix. Furthermore, because of the non-assignment condition the underlying matrix
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+At the language level, `static_matrix` can replace a `matrix` type if subsets of the matrix are never assigned to. That makes code such as the below illegal.
+
+```stan
+static_matrix[N, M] A = // Fill in A...
+A[10, 10] = 5;
+```
+
+Besides that they can be thought of as standard matrix.
+
+At the Stan Math level a `static_matrix` is a `var_type<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_type<Eigen::Matrix<double, -1, -1>>`\*. This means that accessors for the value and adjoint of `var_type` objects can access contiguous chunks of each part of the `var_type`. Any function that accept a `var_type<T>` will support static matrices.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+This feature will be added in the following stages.
+
+1. Replace `var` and `vari` with templated types. See the [example branch](https://github.com/stan-dev/math/compare/feature/var-template) here to see how this will be done. `var` and `vari` have been changed to `var_type<T>` and `vari_type<T>` with aliases for the current methods as `using var = var_type<double>` and `using vari = vari_type<double>`. These aliases allow for full backwards compatibility with existing upstream dependencies.
+
+2. Make the stack allocator conform to accept `vari_type<T>` objects.
+
+3. Additional PRs for the current arithmetic operators for `var` to accept `vari_type<T>` object types.
+
+4. Sort out and add any additional specialized methods for static matrix types such as `cholesky_decompose`.
+
+5. Add the `static_matrix` and `static_vector` types to the Stan language along with supported signatures.
+
+6. Support detection and substitution of `matrix` types for `static_matrix` types.
+
+7. ???
+
+8. Done!
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+More templates can be confusing and will lead to larger compile time. The confusion of templates can be mitigated by proper duty of care with documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times.
+
+With `sparse_matrix` and `complex_matrix` this will now add _another_ `*_matrix` type proposal to the language. There's no way to get around this in the Stan language currently, though some small side discussions exist to support attributes on types such as
+
+```stan
+@(sparse, static)
+matrix[N, M] X;
+
+@(complex, static)
+vector[N] Y;
+```
+
+though this is far down the line for now.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+Besides the tech burden I'm rather happy with this implementation, though I'm open to any criticisms/alternatives and will add them here. You can see some of the discussion on this as well in issue [#1805](https://github.com/stan-dev/math/issues/1805) and the prior art below.
+
+# Prior art
+[prior-art]: #prior-art
+
+Discussions:
+ - [Efficient `static_matrix` type?](https://discourse.mc-stan.org/t/efficient-static-matrix-type/2136)
+ - [Static/immutable matrices w. comprehensions](https://discourse.mc-stan.org/t/static-immutable-matrices-w-comprehensions/12641)
+ - [Stan SIMD and Performance](https://discourse.mc-stan.org/t/stan-simd-performance/10488/11)
+ - [A New Continuation Based Autodiff By Refactoring](https://discourse.mc-stan.org/t/a-new-continuation-based-autodiff-by-refactoring/5037/2)
+
+[Enoki](https://github.com/mitsuba-renderer/enoki) is a very nice C++17 library for automatic differentiation which under the hood can transform their autodiff type from an arrays of structs to structs of arrays. It's pretty neat! Though implementing something like their methods would require some very large changes to the way we handle automatic differentiation.
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+- What parts of the design do you expect to resolve through the RFC process before this gets merged?
+
+I'm interested in hearing about Stan language semantics and any difficulties I may have missed during the implementation section.
+
+- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
+
+Any intricacies of using the GPU via `var_type<matrix_cl<double>>` should be deferred to a separate discussions or can happen during the implementation.
+
+\*Interestingly, this also means that `var_type<float>` and `var_type<long double>` can be legal.

From db5e92d0532ededca36dd35c831a918d69d21d5d Mon Sep 17 00:00:00 2001
From: Steve Bronder <Stevo15025@gmail.com>
Date: Thu, 30 Apr 2020 23:17:42 -0400
Subject: [PATCH 2/7] Update 0005-static-matrices.md

---
 designs/0005-static-matrices.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/designs/0005-static-matrices.md b/designs/0005-static-matrices.md
index b4c6052..7542c4f 100644
--- a/designs/0005-static-matrices.md
+++ b/designs/0005-static-matrices.md
@@ -25,7 +25,7 @@ A[10, 10] = 5;
 
 Besides that they can be thought of as standard matrix.
 
-At the Stan Math level a `static_matrix` is a `var_type<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_type<Eigen::Matrix<double, -1, -1>>`\*. This means that accessors for the value and adjoint of `var_type` objects can access contiguous chunks of each part of the `var_type`. Any function that accept a `var_type<T>` will support static matrices.
+At the Stan Math level a `static_matrix` is a `var_type<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_type<Eigen::Matrix<double, -1, -1>>`*. This means that accessors for the value and adjoint of `var_type` objects can access contiguous chunks of each part of the `var_type`. Any function that accept a `var_type<T>` will support static matrices.
 
 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation
@@ -56,10 +56,10 @@ More templates can be confusing and will lead to larger compile time. The confus
 With `sparse_matrix` and `complex_matrix` this will now add _another_ `*_matrix` type proposal to the language. There's no way to get around this in the Stan language currently, though some small side discussions exist to support attributes on types such as
 
 ```stan
-@(sparse, static)
+@(sparse_type, static_type) // can't have static here because of C++ keyword
 matrix[N, M] X;
 
-@(complex, static)
+@(complex_type, static_type)
 vector[N] Y;
 ```
 
@@ -92,4 +92,4 @@ I'm interested in hearing about Stan language semantics and any difficulties I m
 
 Any intricacies of using the GPU via `var_type<matrix_cl<double>>` should be deferred to a separate discussions or can happen during the implementation.
 
-\*Interestingly, this also means that `var_type<float>` and `var_type<long double>` can be legal.
+*Interestingly, this also means that `var_type<float>` and `var_type<long double>` can be legal.

From e714e6bcd139cdaef940fc337a937f4887810c4c Mon Sep 17 00:00:00 2001
From: Steve Bronder <stevo15025@gmail.com>
Date: Thu, 30 Apr 2020 23:19:11 -0400
Subject: [PATCH 3/7] remove some words

---
 designs/0005-static-matrices.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/designs/0005-static-matrices.md b/designs/0005-static-matrices.md
index 7542c4f..3978e12 100644
--- a/designs/0005-static-matrices.md
+++ b/designs/0005-static-matrices.md
@@ -11,7 +11,7 @@ This proposes a `static_matrix` and `static_vector` type in the Stan language wh
 # Motivation
 [motivation]: #motivation
 
-Currently, a differentiable matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix. Furthermore, because of the non-assignment condition the underlying matrix
+Currently, a differentiable matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix. 
 
 # Guide-level explanation
 [guide-level-explanation]: #guide-level-explanation

From 19fa0cdec7c19325379bf6b5cc74a75415d2ab42 Mon Sep 17 00:00:00 2001
From: Steve Bronder <stevo15025@gmail.com>
Date: Thu, 7 May 2020 18:53:37 -0400
Subject: [PATCH 4/7] restrict type to not allow assignment after first
 declaration, propose moving language discussion to later

---
 designs/0005-static-matrices.md | 124 ++++++++++++++++++++++++--------
 1 file changed, 94 insertions(+), 30 deletions(-)

diff --git a/designs/0005-static-matrices.md b/designs/0005-static-matrices.md
index 3978e12..093fe61 100644
--- a/designs/0005-static-matrices.md
+++ b/designs/0005-static-matrices.md
@@ -6,69 +6,129 @@
 # Summary
 [summary]: #summary
 
-This proposes a `static_matrix` and `static_vector` type in the Stan language which on the backend will allow for significant performance opportunities. In Stan Math this will be represented by `var<Eigen::Matrix<double, 1, 1>>` or `var<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler the conditions for a `matrix` to be a `static_matrix` can be detected and applied to Stan programs automatically for users.
+This proposes a constant matrix type in stan that will allow for significant performance opportunities. In Stan Math this will be represented by `var<Eigen::Matrix<double, R, C>>` or `var<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler the conditions for a `matrix` to be a constant matrix can be detected and applied to Stan programs automatically for users. The implementation of this proposal suggests a staged approach where a stan language level feature is delayed until type attributes are allowed in the language and constant matrices have support for the same set of methods that the current `matrix` type has.
 
 # Motivation
 [motivation]: #motivation
 
-Currently, a differentiable matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix. 
+Currently, a differentiable matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix. See [here](https://github.com/stan-dev/design-docs/pull/21#issuecomment-625352581) for performance tests on a small subset of gradient evaluations using matrices vs. constant matrices.
 
 # Guide-level explanation
 [guide-level-explanation]: #guide-level-explanation
 
-At the language level, `static_matrix` can replace a `matrix` type if subsets of the matrix are never assigned to. That makes code such as the below illegal.
+At the Stan Math level a constant matrix is a `var_value<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_value<Eigen::Matrix<double, -1, -1>>`\*.  This means that accessors for the value and adjoint of `var_value` objects can access contiguous chunks of each part of the `vari_value`. Any function that accepts a `var_value<T>` will support static matrices.
+
+At the language and level, a `matrix` can be substituted for a constant matrix if the matrix is only constructed once and never reassigned to.
 
 ```stan
-static_matrix[N, M] A = // Fill in A...
-A[10, 10] = 5;
+const matrix[N, M] A = // Construct matrix A...
+A[10, 10] = 5; // illegal!
+A[1:10, ] = 5; // illegal!
+A[, 1:10] = 5; // illegal!
+A[1:10, 1:10] = 5; // illegal!
+A = X; // illegal!
 ```
 
-Besides that they can be thought of as standard matrix.
+Any function or operation in the stan language that can accepts a `matrix` as an argument can also accept a constant matrix.
 
-At the Stan Math level a `static_matrix` is a `var_type<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_type<Eigen::Matrix<double, -1, -1>>`*. This means that accessors for the value and adjoint of `var_type` objects can access contiguous chunks of each part of the `var_type`. Any function that accept a `var_type<T>` will support static matrices.
 
 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation
 
 This feature will be added in the following stages.
 
-1. Replace `var` and `vari` with templated types. See the [example branch](https://github.com/stan-dev/math/compare/feature/var-template) here to see how this will be done. `var` and `vari` have been changed to `var_type<T>` and `vari_type<T>` with aliases for the current methods as `using var = var_type<double>` and `using vari = vari_type<double>`. These aliases allow for full backwards compatibility with existing upstream dependencies.
+1. Replace `var` and `vari` with templated types. This has already been done in the example branch [here](https://github.com/stan-dev/math/compare/feature/var-template) (see [here](https://github.com/stan-dev/math/blob/d2967fe2bf6e0d4729d67a714ef40d95d907b18b/stan/math/rev/core/vari.hpp) for the `vari_value` implementation and [here](https://github.com/stan-dev/math/blob/d2967fe2bf6e0d4729d67a714ef40d95d907b18b/stan/math/rev/core/var.hpp) for `var_value`). `var` and `vari` have been changed to `var_value<T>` and `vari_value<T>` with aliases for the current methods as `using var = var_value<double>` and `using vari = vari_value<double>`. These aliases allow for full backwards compatibility with existing upstream dependencies.
 
-2. Make the stack allocator conform to accept `vari_type<T>` objects.
+2. Make the stack allocator conform to accept `vari_value<T>` objects. This is done by having `vari_type<T>` inherit from a `vari_base` class and defining the [`ChainableStack`](https://github.com/stan-dev/math/blob/d2967fe2bf6e0d4729d67a714ef40d95d907b18b/stan/math/rev/core/chainablestack.hpp) as
 
-3. Additional PRs for the current arithmetic operators for `var` to accept `vari_type<T>` object types.
+```cpp
+using ChainableStack = AutodiffStackSingleton<vari_base, chainable_alloc>;
+```
 
-4. Sort out and add any additional specialized methods for static matrix types such as `cholesky_decompose`.
+3. Additional PRs for the current arithmetic operators for `var` to accept `vari_value<T>` object types.
+   - Since the new operations work on matrix types, `chain()` methods of the current operators will need to be specialized for matrix, vector, and scalar inputs.
 
-5. Add the `static_matrix` and `static_vector` types to the Stan language along with supported signatures.
+4. Add additional specialized methods for constant matrices.
+ - All of our current reverse mode specializations require an `Eigen::Matrix<var>`, which then assumes it needs to generate `N * M` `vari`. Specializations will need to be added for `var_type<Eigen::Matrix<var>>`
 
-6. Support detection and substitution of `matrix` types for `static_matrix` types.
+5. In the stanc3 compiler, support detection and substitution of `matrix/vector/row_vector` types for constant matrix types.
 
-7. ???
+6. Create a design document for allowing type attributes in the stan language.
 
-8. Done!
+7. Add a `const` type attribute to the stan language.
 
-# Drawbacks
-[drawbacks]: #drawbacks
+Steps (1) and (2) have been completed in the example branch with some work done on (3). Step (5-7) have been chosen specifically to allow more time to discuss the stan language implementation. The compiler can perform an optimization step while parsing the stan program to see if a `matrix/vector/row_vector`:
 
-More templates can be confusing and will lead to larger compile time. The confusion of templates can be mitigated by proper duty of care with documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times.
+1. Does not perform assignment after the first declaration.
+2. Uses methods that have constant matrix implementations in stan math.
 
-With `sparse_matrix` and `complex_matrix` this will now add _another_ `*_matrix` type proposal to the language. There's no way to get around this in the Stan language currently, though some small side discussions exist to support attributes on types such as
+and then replace `Eigen::Matrix<var, R, C>` with `var_value<Eigen::Matrix<double, R, C>>`.
+
+As an example of when the compiler could detect a constant matrix substitution, brms will normally output code for hierarchical models such as
 
 ```stan
-@(sparse_type, static_type) // can't have static here because of C++ keyword
-matrix[N, M] X;
+parameters {
+  vector[Kc] b;  // population-level effects
+  // temporary intercept for centered predictors
+  real Intercept;
+  real<lower=0> sigma;  // residual SD
+  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
+  matrix[M_1, N_1] z_1;  // standardized group-level effects
+  // cholesky factor of correlation matrix
+  cholesky_factor_corr[M_1] L_1;
+}
+transformed parameters {
+  // actual group-level effects
+  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
+  // using vectors speeds up indexing in loops
+  vector[N_1] r_1_1 = r_1[, 1];
+  vector[N_1] r_1_2 = r_1[, 2];
+}
+model {
+  // initialize linear predictor term
+  vector[N] mu = Intercept + Xc * b;
+  for (n in 1:N) {
+    // add more terms to the linear predictor
+    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
+  }
+  // more stuff
+```
+
+This could be rewritten in brms to do the assignment of `mu` on one line. And because each `vector` and `matrix` do not perform assignment after the first declaration these would all be candidates to become constant matrices. With a type attribute of `const` in the language this would then look like:
 
-@(complex_type, static_type)
-vector[N] Y;
+```stan
+parameters {
+  const vector[Kc] b;  // population-level effects
+  // temporary intercept for centered predictors
+  real Intercept;
+  real<lower=0> sigma;  // residual SD
+  const vector<lower=0>[M_1] sd_1;  // group-level standard deviations
+  const matrix[M_1, N_1] z_1;  // standardized group-level effects
+  // cholesky factor of correlation matrix
+  const cholesky_factor_corr[M_1] L_1;
+}
+transformed parameters {
+  // actual group-level effects
+  const matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
+  // using vectors speeds up indexing in loops
+  const vector[N_1] r_1_1 = r_1[, 1];
+  const vector[N_1] r_1_2 = r_1[, 2];
+}
+model {
+  // predictor terms
+  const vector[N] mu = Intercept + Xc * b + r_1_1[J_1] .* Z_1_1 + r_1_2[J_1] * Z_1_2;
+  // more stuff
 ```
 
-though this is far down the line for now.
 
-# Rationale and alternatives
-[rationale-and-alternatives]: #rationale-and-alternatives
+Delaying release of the `const` type to the language allows for a slow buildup of the needed methods. Once constant matrices have the same available methods as the `matrix/vector/row_vector` types we can release it as a stan language feature.
+
+# Drawbacks
+[drawbacks]: #drawbacks
 
-Besides the tech burden I'm rather happy with this implementation, though I'm open to any criticisms/alternatives and will add them here. You can see some of the discussion on this as well in issue [#1805](https://github.com/stan-dev/math/issues/1805) and the prior art below.
+More templates can be confusing and will lead to longer compile time. The confusion of templates can be mitigated by adding additional documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times.
+
+Waiting for type attributes in the Stan language means this feature will be a compiler optimization until both type attributes are allowed in the language and an agreement is made on stan language naming semantics. This is both a drawback and feature. While delaying a `constant_matrix` type as a user facing feature, it avoids the combinatorial problem that comes with additional type names for matrices proposed in the language such as `sparse` and `complex` matrices.
 
 # Prior art
 [prior-art]: #prior-art
@@ -86,10 +146,14 @@ Discussions:
 
 - What parts of the design do you expect to resolve through the RFC process before this gets merged?
 
-I'm interested in hearing about Stan language semantics and any difficulties I may have missed during the implementation section.
+Whether the staged development process listed in the reference level explanation will suffice.
+
+If the restriction on not allowing assignment to the entire matrix such as `A = X` is too restrictive.
 
 - What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
 
-Any intricacies of using the GPU via `var_type<matrix_cl<double>>` should be deferred to a separate discussions or can happen during the implementation.
+Any intricacies of using the GPU via `var_value<matrix_cl<double>>` should be deferred to a separate discussions or can happen during the implementation.
+
+Methods involving changing the current `matrix` type in the Stan language
 
-*Interestingly, this also means that `var_type<float>` and `var_type<long double>` can be legal.
+*Interestingly, this also means that `var_value<float>` and `var_value<long double>` can be legal.

From 1726a61a7effe2fc9262c0de65fe01ca82f1558a Mon Sep 17 00:00:00 2001
From: Steve Bronder <stevo15025@gmail.com>
Date: Mon, 11 May 2020 16:21:30 -0400
Subject: [PATCH 5/7] fix grammar and add JAX document reference to prior art

---
 designs/0005-static-matrices.md | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/designs/0005-static-matrices.md b/designs/0005-static-matrices.md
index 093fe61..18aefe3 100644
--- a/designs/0005-static-matrices.md
+++ b/designs/0005-static-matrices.md
@@ -18,7 +18,7 @@ Currently, a differentiable matrix in Stan is represented as an Eigen matrix hol
 
 At the Stan Math level a constant matrix is a `var_value<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_value<Eigen::Matrix<double, -1, -1>>`\*.  This means that accessors for the value and adjoint of `var_value` objects can access contiguous chunks of each part of the `vari_value`. Any function that accepts a `var_value<T>` will support static matrices.
 
-At the language and level, a `matrix` can be substituted for a constant matrix if the matrix is only constructed once and never reassigned to.
+At the language and compiler level, a `matrix` can be substituted for a constant matrix if the matrix is only constructed once and never reassigned to.
 
 ```stan
 const matrix[N, M] A = // Construct matrix A...
@@ -37,7 +37,9 @@ Any function or operation in the stan language that can accepts a `matrix` as an
 
 This feature will be added in the following stages.
 
-1. Replace `var` and `vari` with templated types. This has already been done in the example branch [here](https://github.com/stan-dev/math/compare/feature/var-template) (see [here](https://github.com/stan-dev/math/blob/d2967fe2bf6e0d4729d67a714ef40d95d907b18b/stan/math/rev/core/vari.hpp) for the `vari_value` implementation and [here](https://github.com/stan-dev/math/blob/d2967fe2bf6e0d4729d67a714ef40d95d907b18b/stan/math/rev/core/var.hpp) for `var_value`). `var` and `vari` have been changed to `var_value<T>` and `vari_value<T>` with aliases for the current methods as `using var = var_value<double>` and `using vari = vari_value<double>`. These aliases allow for full backwards compatibility with existing upstream dependencies.
+1. Replace `var` and `vari` with templated types.
+   - This has already been done in the example branch [here](https://github.com/stan-dev/math/compare/feature/var-template) (see [here](https://github.com/stan-dev/math/pull/1877/files#diff-130c5a75cc427d7d41715e9fca8281f4R177) for the `vari_value` implementation and [here](https://github.com/stan-dev/math/pull/1877/files#diff-ab13c40c3f03b20efbba9d70d55b4dcdR34) for `var_value`). `var` and `vari` have been changed to `var_value<T>` and `vari_value<T>` with aliases for the current methods as `using var = var_value<double>` and `using vari = vari_value<double>`. These aliases allow for full backwards compatibility with existing upstream dependencies.
+   - The `vari_value` specialized for Eigen types holds two pointers of scalar arrays `val_mem_` and `adj_mem_` which are then accessed through `Eigen::Map` types `val_` and `adj_`.
 
 2. Make the stack allocator conform to accept `vari_value<T>` objects. This is done by having `vari_type<T>` inherit from a `vari_base` class and defining the [`ChainableStack`](https://github.com/stan-dev/math/blob/d2967fe2bf6e0d4729d67a714ef40d95d907b18b/stan/math/rev/core/chainablestack.hpp) as
 
@@ -45,19 +47,23 @@ This feature will be added in the following stages.
 using ChainableStack = AutodiffStackSingleton<vari_base, chainable_alloc>;
 ```
 
-3. Additional PRs for the current arithmetic operators for `var` to accept `vari_value<T>` object types.
+3. Additional PRs for the current arithmetic operators for `var` to accept `var_value<Eigen>` types.
    - Since the new operations work on matrix types, `chain()` methods of the current operators will need to be specialized for matrix, vector, and scalar inputs.
 
 4. Add additional specialized methods for constant matrices.
- - All of our current reverse mode specializations require an `Eigen::Matrix<var>`, which then assumes it needs to generate `N * M` `vari`. Specializations will need to be added for `var_type<Eigen::Matrix<var>>`
+ - All of our current reverse mode specializations require an `Eigen::Matrix<var>`, which then assumes it needs to generate `N * M` `vari`. Specializations will need to be added for `var_type<Eigen::Matrix<T>>`
 
-5. In the stanc3 compiler, support detection and substitution of `matrix/vector/row_vector` types for constant matrix types.
+5. Generalize the current autodiff testing framework to work with constant matrices.
 
-6. Create a design document for allowing type attributes in the stan language.
+6. Add specializations for `operands_and_partials`, `scalar_seq_view`, `value_of` and other helper functions needed for the distributions.
 
-7. Add a `const` type attribute to the stan language.
+7. In the stanc3 compiler, support detection and substitution of `matrix/vector/row_vector` types for constant matrix types.
 
-Steps (1) and (2) have been completed in the example branch with some work done on (3). Step (5-7) have been chosen specifically to allow more time to discuss the stan language implementation. The compiler can perform an optimization step while parsing the stan program to see if a `matrix/vector/row_vector`:
+8. Create a design document for allowing type attributes in the stan language.
+
+9. Add a `const` type attribute to the stan language.
+
+Steps (1) and (2) have been completed in the example branch with some work done on (3). Step (7-9) have been chosen specifically to allow more time to discuss the stan language implementation. The compiler can perform an optimization step while parsing the stan program to see if a `matrix/vector/row_vector`:
 
 1. Does not perform assignment after the first declaration.
 2. Uses methods that have constant matrix implementations in stan math.
@@ -126,9 +132,9 @@ Delaying release of the `const` type to the language allows for a slow buildup o
 # Drawbacks
 [drawbacks]: #drawbacks
 
-More templates can be confusing and will lead to longer compile time. The confusion of templates can be mitigated by adding additional documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times.
+More templates can be confusing and will lead to longer compile times. The confusion of templates can be mitigated by adding additional documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times.
 
-Waiting for type attributes in the Stan language means this feature will be a compiler optimization until both type attributes are allowed in the language and an agreement is made on stan language naming semantics. This is both a drawback and feature. While delaying a `constant_matrix` type as a user facing feature, it avoids the combinatorial problem that comes with additional type names for matrices proposed in the language such as `sparse` and `complex` matrices.
+Waiting for type attributes in the Stan language means this feature will be a compiler optimization until both type attributes are allowed in the language and an agreement is made on stan language naming semantics. This is both a drawback and feature. While delaying a `constant_matrix` type as a user facing feature, it avoids the combinatorial problem that comes with additional type names for matrices proposed in the language such as `sparse` and `complex`.
 
 # Prior art
 [prior-art]: #prior-art
@@ -139,6 +145,8 @@ Discussions:
  - [Stan SIMD and Performance](https://discourse.mc-stan.org/t/stan-simd-performance/10488/11)
  - [A New Continuation Based Autodiff By Refactoring](https://discourse.mc-stan.org/t/a-new-continuation-based-autodiff-by-refactoring/5037/2)
 
+[JAX](https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html#%F0%9F%94%AA-In-Place-Updates) is an autodiff library developed by google whose array type has similar constraints to the constant matrix type proposed here.
+
 [Enoki](https://github.com/mitsuba-renderer/enoki) is a very nice C++17 library for automatic differentiation which under the hood can transform their autodiff type from an arrays of structs to structs of arrays. It's pretty neat! Though implementing something like their methods would require some very large changes to the way we handle automatic differentiation.
 
 # Unresolved questions

From 5ac1cfd43194cdba023689c16d4c4e850adf00f0 Mon Sep 17 00:00:00 2001
From: Steve Bronder <stevo15025@gmail.com>
Date: Tue, 7 Jul 2020 16:46:12 -0400
Subject: [PATCH 6/7] update docs

---
 designs/0005-static-matrices.md | 26 +++-----------------------
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/designs/0005-static-matrices.md b/designs/0005-static-matrices.md
index 18aefe3..413fb63 100644
--- a/designs/0005-static-matrices.md
+++ b/designs/0005-static-matrices.md
@@ -18,7 +18,7 @@ Currently, a differentiable matrix in Stan is represented as an Eigen matrix hol
 
 At the Stan Math level a constant matrix is a `var_value<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_value<Eigen::Matrix<double, -1, -1>>`\*.  This means that accessors for the value and adjoint of `var_value` objects can access contiguous chunks of each part of the `vari_value`. Any function that accepts a `var_value<T>` will support static matrices.
 
-At the language and compiler level, a `matrix` can be substituted for a constant matrix if the matrix is only constructed once and never reassigned to.
+At the language and compiler level, a `matrix` can be substituted for a constant matrix if the matrix is only constructed once and it's subslices are never assigned to.
 
 ```stan
 const matrix[N, M] A = // Construct matrix A...
@@ -26,10 +26,9 @@ A[10, 10] = 5; // illegal!
 A[1:10, ] = 5; // illegal!
 A[, 1:10] = 5; // illegal!
 A[1:10, 1:10] = 5; // illegal!
-A = X; // illegal!
 ```
 
-Any function or operation in the stan language that can accepts a `matrix` as an argument can also accept a constant matrix.
+Any function or operation in the stan language that can accepts a `matrix` as an argument can also accept a constant matrix. Functions which can take in multiple matrices will only return a constant matrix if all of the other matrix inputs as well as the return type are static matrices.
 
 
 # Reference-level explanation
@@ -59,11 +58,7 @@ using ChainableStack = AutodiffStackSingleton<vari_base, chainable_alloc>;
 
 7. In the stanc3 compiler, support detection and substitution of `matrix/vector/row_vector` types for constant matrix types.
 
-8. Create a design document for allowing type attributes in the stan language.
-
-9. Add a `const` type attribute to the stan language.
-
-Steps (1) and (2) have been completed in the example branch with some work done on (3). Step (7-9) have been chosen specifically to allow more time to discuss the stan language implementation. The compiler can perform an optimization step while parsing the stan program to see if a `matrix/vector/row_vector`:
+Steps (1) and (2) have been completed in the example branch with some work done on (3). Step 7 has been chosen specifically to allow more time to discuss the stan language implementation. The compiler can perform an optimization step while parsing the stan program to see if a `matrix/vector/row_vector`:
 
 1. Does not perform assignment after the first declaration.
 2. Uses methods that have constant matrix implementations in stan math.
@@ -149,19 +144,4 @@ Discussions:
 
 [Enoki](https://github.com/mitsuba-renderer/enoki) is a very nice C++17 library for automatic differentiation which under the hood can transform their autodiff type from an arrays of structs to structs of arrays. It's pretty neat! Though implementing something like their methods would require some very large changes to the way we handle automatic differentiation.
 
-# Unresolved questions
-[unresolved-questions]: #unresolved-questions
-
-- What parts of the design do you expect to resolve through the RFC process before this gets merged?
-
-Whether the staged development process listed in the reference level explanation will suffice.
-
-If the restriction on not allowing assignment to the entire matrix such as `A = X` is too restrictive.
-
-- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
-
-Any intricacies of using the GPU via `var_value<matrix_cl<double>>` should be deferred to a separate discussions or can happen during the implementation.
-
-Methods involving changing the current `matrix` type in the Stan language
-
 *Interestingly, this also means that `var_value<float>` and `var_value<long double>` can be legal.

From 17f824f6bf1f678a2c0dc339ec51fe532c22b64f Mon Sep 17 00:00:00 2001
From: Steve Bronder <stevo15025@gmail.com>
Date: Tue, 4 Aug 2020 21:08:45 -0400
Subject: [PATCH 7/7] update docs

---
 designs/0005-static-matrices.md | 71 +++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 25 deletions(-)

diff --git a/designs/0005-static-matrices.md b/designs/0005-static-matrices.md
index 413fb63..8f80779 100644
--- a/designs/0005-static-matrices.md
+++ b/designs/0005-static-matrices.md
@@ -6,29 +6,39 @@
 # Summary
 [summary]: #summary
 
-This proposes a constant matrix type in stan that will allow for significant performance opportunities. In Stan Math this will be represented by `var<Eigen::Matrix<double, R, C>>` or `var<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler the conditions for a `matrix` to be a constant matrix can be detected and applied to Stan programs automatically for users. The implementation of this proposal suggests a staged approach where a stan language level feature is delayed until type attributes are allowed in the language and constant matrices have support for the same set of methods that the current `matrix` type has.
+This proposes a constant matrix type in stan that will allow for significant performance opportunities. In Stan Math this will be represented by `var_value<Eigen::Matrix<double, R, C>>` or `var_value<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler the conditions for a `matrix` to be a constant matrix can be detected and applied to Stan programs automatically for users. The implementation of this proposal suggests a staged approach where a stan language level feature is delayed until type attributes are allowed in the language and constant matrices have support for the same set of methods that the current `matrix` type has.
 
 # Motivation
 [motivation]: #motivation
 
-Currently, a differentiable matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix. See [here](https://github.com/stan-dev/design-docs/pull/21#issuecomment-625352581) for performance tests on a small subset of gradient evaluations using matrices vs. constant matrices.
+Currently, an `NxM` matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix. See [here](https://github.com/stan-dev/design-docs/pull/21#issuecomment-625352581) for performance tests on a small subset of gradient evaluations using matrices vs. constant matrices.
 
 # Guide-level explanation
 [guide-level-explanation]: #guide-level-explanation
 
-At the Stan Math level a constant matrix is a `var_value<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_value<Eigen::Matrix<double, -1, -1>>`\*.  This means that accessors for the value and adjoint of `var_value` objects can access contiguous chunks of each part of the `vari_value`. Any function that accepts a `var_value<T>` will support static matrices.
+At the Stan Math level a constant matrix is a `var_value<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_value<Eigen::Matrix<double, -1, -1>>`\*.  The value and adjoint of the matrix are stored separately in memory pointed to by the `vari_value`. Functions that currently support `Eigen::Matrix<var, R, C>` types will be extended to support `var_value<Eigen>` types
 
-At the language and compiler level, a `matrix` can be substituted for a constant matrix if the matrix is only constructed once and it's subslices are never assigned to.
+At compiler level, a `matrix` can be substituted for a constant matrix if the matrix is only constructed once and it's subslices are never assigned to.
 
 ```stan
-const matrix[N, M] A = // Construct matrix A...
-A[10, 10] = 5; // illegal!
-A[1:10, ] = 5; // illegal!
-A[, 1:10] = 5; // illegal!
-A[1:10, 1:10] = 5; // illegal!
+matrix[N, M] A = // Construct matrix A...
+A[10, 10] = 5; // Will be dynamic!
+A[1:10, ] = 5; // Will be dynamic!
+A[, 1:10] = 5; // Will be dynamic!
+A[1:10, 1:10] = 5; // Will be dynamic!
 ```
 
-Any function or operation in the stan language that can accepts a `matrix` as an argument can also accept a constant matrix. Functions which can take in multiple matrices will only return a constant matrix if all of the other matrix inputs as well as the return type are static matrices.
+However extracting subslices from a matrix will still allow it to be constant.
+
+```stan
+matrix[N, M] A = // Construct matrix A...
+real b = A[10, 10]; // Will be real
+matrix[10, M] C = A[1:10, ]; // Will be static!
+row_vector[10] D = A[, 1:10]; // Will be static!
+matrix[10, 10] F = A[1:10, 1:10]; // Will be static!
+```
+
+Any function or operation in the Stan language that can accepts a `matrix` as an argument can also accept a constant matrix. Functions which can take in multiple matrices will only return a constant matrix if all of the other matrix inputs as well as the return type are static matrices.
 
 
 # Reference-level explanation
@@ -37,16 +47,16 @@ Any function or operation in the stan language that can accepts a `matrix` as an
 This feature will be added in the following stages.
 
 1. Replace `var` and `vari` with templated types.
-   - This has already been done in the example branch [here](https://github.com/stan-dev/math/compare/feature/var-template) (see [here](https://github.com/stan-dev/math/pull/1877/files#diff-130c5a75cc427d7d41715e9fca8281f4R177) for the `vari_value` implementation and [here](https://github.com/stan-dev/math/pull/1877/files#diff-ab13c40c3f03b20efbba9d70d55b4dcdR34) for `var_value`). `var` and `vari` have been changed to `var_value<T>` and `vari_value<T>` with aliases for the current methods as `using var = var_value<double>` and `using vari = vari_value<double>`. These aliases allow for full backwards compatibility with existing upstream dependencies.
-   - The `vari_value` specialized for Eigen types holds two pointers of scalar arrays `val_mem_` and `adj_mem_` which are then accessed through `Eigen::Map` types `val_` and `adj_`.
+   - This has already been done in PR [#1915](https://github.com/stan-dev/math/pull/1915). `var` and `vari` have been changed to `var_value<T>` and `vari_value<T>` with aliases for the current methods as `using var = var_value<double>` and `using vari = vari_value<double>`. These aliases allow for full backwards compatibility with existing upstream dependencies.
 
-2. Make the stack allocator conform to accept `vari_value<T>` objects. This is done by having `vari_type<T>` inherit from a `vari_base` class and defining the [`ChainableStack`](https://github.com/stan-dev/math/blob/d2967fe2bf6e0d4729d67a714ef40d95d907b18b/stan/math/rev/core/chainablestack.hpp) as
+2. Add `var_value` and `vari_value` Eigen specializations
+   - This has been done in PR [#1952](https://github.com/stan-dev/math/pull/1952)
+   - The `vari_value` specialized for Eigen types holds two pointers of scalar arrays `val_mem_` and `adj_mem_` which are then accessed through `Eigen::Map` types `val_` and `adj_`.
 
-```cpp
-using ChainableStack = AutodiffStackSingleton<vari_base, chainable_alloc>;
-```
+3. Update `adj_jac_apply` to work with constant matrices.
+   - An example of this exists in [#1928](https://github.com/stan-dev/math/pull/1928). Using `adj_jac_apply` will provide a standardized and efficient API for adding new reverse mode functions for constant matrices.
 
-3. Additional PRs for the current arithmetic operators for `var` to accept `var_value<Eigen>` types.
+4. Additional PRs for the current arithmetic operators for `var` to accept `var_value<Eigen>` types.
    - Since the new operations work on matrix types, `chain()` methods of the current operators will need to be specialized for matrix, vector, and scalar inputs.
 
 4. Add additional specialized methods for constant matrices.
@@ -99,25 +109,25 @@ This could be rewritten in brms to do the assignment of `mu` on one line. And be
 
 ```stan
 parameters {
-  const vector[Kc] b;  // population-level effects
+  vector[Kc] b;  // population-level effects
   // temporary intercept for centered predictors
   real Intercept;
   real<lower=0> sigma;  // residual SD
-  const vector<lower=0>[M_1] sd_1;  // group-level standard deviations
-  const matrix[M_1, N_1] z_1;  // standardized group-level effects
+  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
+  matrix[M_1, N_1] z_1;  // standardized group-level effects
   // cholesky factor of correlation matrix
-  const cholesky_factor_corr[M_1] L_1;
+  cholesky_factor_corr[M_1] L_1;
 }
 transformed parameters {
   // actual group-level effects
-  const matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
+  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
   // using vectors speeds up indexing in loops
-  const vector[N_1] r_1_1 = r_1[, 1];
-  const vector[N_1] r_1_2 = r_1[, 2];
+  vector[N_1] r_1_1 = r_1[, 1];
+  vector[N_1] r_1_2 = r_1[, 2];
 }
 model {
   // predictor terms
-  const vector[N] mu = Intercept + Xc * b + r_1_1[J_1] .* Z_1_1 + r_1_2[J_1] * Z_1_2;
+  vector[N] mu = Intercept + Xc * b + r_1_1[J_1] .* Z_1_1 + r_1_2[J_1] * Z_1_2;
   // more stuff
 ```
 
@@ -131,6 +141,17 @@ More templates can be confusing and will lead to longer compile times. The confu
 
 Waiting for type attributes in the Stan language means this feature will be a compiler optimization until both type attributes are allowed in the language and an agreement is made on stan language naming semantics. This is both a drawback and feature. While delaying a `constant_matrix` type as a user facing feature, it avoids the combinatorial problem that comes with additional type names for matrices proposed in the language such as `sparse` and `complex`.
 
+One large consideration is the need for the use of `flto` by compilers in order to not have a performance regression for ODE models because of missed divirtualization from the compiler. All vari created in stan are tracked and stored as pointers in our autodiff reverse mode tape `ChainableStack`. The underlying implementation of `ChainableStack` requires the use of an `std::vector` which is a homogeneous container. so all `vari_value<T>` must inherit from a `vari_base` that is then used as the type for the reverse mode tape in `ChainableStack` as a `std::vector<vari_base*>`. Because of the different underlying structures of each `vari_value<T>` the method `set_zero_adjoint()` of the `vari_value<T>` specializations must be `virtual` so that when calling the reverse pass over the tape via the `vari_base*` we call the proper `set_zero_adjoint()` member function for each type. But because Stan's autodiff tape is a global object, compilers will not devirtualize these function calls unless the compiler can optimize over the entire program (see [here](https://stackoverflow.com/questions/48906338/why-cant-gcc-devirtualize-this-function-call) for more info and a godbolt example).
+
+Multiple other methods were attempted such as
+1. boost::variants instead of polymorphism
+  - This still leads to a lookup from a function table which causes similar problems to the virtual table lookup
+2. A different stack for every vari_base subclass
+3.  Don't use the chaining stack for zeroing. Instead of having a chaining/non-chaining stack have a chaining/zeroing stack. Everything goes in the zeroing stack. Only chaining things go in the chaining stack.
+  - Both (2) and (3) also lead to performance problems because of having to allocate memory on multiple stacks
+
+ The ODE models are particularly effected by this because of the multiple calls to `set_zero_adjoint()` that are used in the ODE solvers. This means that upstream packages will need to suggest (or automatically apply) the `-flto` flag to Stan programs so that these function calls can be devirtualized.
+
 # Prior art
 [prior-art]: #prior-art