Make Distributions GenerativeFunctions #274
base: master
Conversation
@georgematheos For these benchmark results, could you also run the experiment at 10x the number of datapoints? That will help ensure that nothing is sneakily asymptotically slower (though I don't see why it would be).
@alex-lew Here is some more benchmarking for asymptotics. I did 10x the datapoints for the static DSL, and 1/10 the datapoints for the dynamic DSL (the dynamic DSL is slow and scales superlinearly, so 10x the datapoints would take a very long time). I also did a few runs of each, since I found the results varied a decent amount. This PR:
Master branch:
So it looks like there is no asymptotic difference between the old implementation and the new one. Again we see that the dynamic DSL is slowed down a bit by this PR (I wouldn't be surprised if it is slowed down more in this small example, since there's less time for the compiler to optimize). In these runs, the static DSL speedup does not appear (though there doesn't seem to be a significant slowdown either).
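For reference, this kind of asymptotic spot-check can be sketched as a tiny timing harness. Everything below (the stand-in workload, the function names) is assumed for illustration and is not the PR's actual benchmark code:

```julia
# Hypothetical scaling check: time a workload at n and at 10n datapoints.
# Near-linear code should show roughly a 10x ratio; superlinear code
# (like the dynamic DSL case discussed above) shows noticeably more.
workload(n) = sum(log1p(abs(sin(i))) for i in 1:n)  # stand-in for one MH sweep

function scaling_ratio(n)
    workload(10)                    # warm-up call so compilation isn't timed
    t1 = @elapsed workload(n)
    t2 = @elapsed workload(10n)
    return t2 / t1
end

scaling_ratio(1_000_000)            # a ratio near 10 suggests linear scaling
```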
@marcoct I agree that changing the choicemap interface may be reasonable. Soon, I will implement the […]. In terms of implementing iteration behavior instead of […]
@georgematheos I'm interested in seeing if we can merge this in for the next breaking-changes release. (This PR mostly doesn't break anything for users, but it does change the ChoiceMap interface a bit, e.g. the behavior of […].) One question I have about the implementation is why the gradient-based GFI methods still special-case on `Distribution`.
@alex-lew I'm not totally sure why I did this -- I can take a look in more detail next week. If I remember correctly, at the time I wrote the pull request I did not understand Gen's backprop code, so I tried to keep it as similar to the previous version as possible, to make sure I didn't break anything.
src/distribution.jl (outdated):

```julia
    args
    score::Float64
end

@inline dist(::DistributionTrace{T, Dist}) where {T, Dist} = Dist()
```
@georgematheos Unfortunately, this implementation does not support more complicated distributions that are not 'singleton' structs, like the Mixture distributions. The problem is that they do not have zero-argument constructors. For example, `HomogeneousMixture`'s definition looks like this:

```julia
struct HomogeneousMixture{T} <: Distribution{T}
    base_dist::Distribution{T}
    dims::Vector{Int}
end
```
If someone creates such a distribution and then simulates a trace from it, the trace does not remember the `base_dist` and the `dims`, and so it cannot implement `Gen.get_gen_fn`.
One option is for the `DistributionTrace` to store a reference to the distribution object, and to have `dist(d::DistributionTrace) = d.dist`. That adds slight storage overhead, but maybe not enough to worry about -- it's one more pointer per choice in a trace.
Another option is to say that the Mixtures are not literally subtypes of Distribution -- they are just additional generative functions with ValueChoiceMaps.
I lean toward the first option -- thoughts?
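A minimal sketch of the first option, with assumed type and field names rather than the PR's actual code, storing the distribution object in the trace so it stays recoverable:

```julia
# Option 1 sketch (names assumed): the trace holds a reference to the
# distribution object, so non-singleton distributions like
# HomogeneousMixture remain recoverable from their traces.
abstract type Distribution{T} end

struct DistributionTrace{T, Dist <: Distribution{T}}
    dist::Dist        # one extra pointer per choice (free for singletons)
    args::Tuple
    retval::T
    score::Float64
end

# Reads the stored object instead of calling a zero-argument constructor:
dist(tr::DistributionTrace) = tr.dist

# Works for a parameterized, non-singleton distribution:
struct ScaledNormal <: Distribution{Float64}
    scale::Float64
end
tr = DistributionTrace{Float64, ScaledNormal}(ScaledNormal(2.5), (), 1.0, -0.5)
dist(tr).scale    # recovers 2.5 from the trace
```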
> One option is for the DistributionTrace to store a reference to the distribution object, and to have `dist(d::DistributionTrace) = d.dist`. That adds slight storage overhead, but maybe not enough to worry about -- it's one more pointer per choice in a trace.
I think this may actually add zero storage overhead if the `DistributionTrace` is parametrically typed and the type in question is a singleton (as it is for `Normal`, etc.)! My guess is that Julia optimizes that sort of thing away. But it's worth checking.
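The guess can be checked directly in Julia: singleton types have zero size, and a field whose type is a singleton adds no bytes to a struct's layout. A standalone sketch (the `NormalLike` stand-in type is assumed, not Gen's `Normal`):

```julia
# Stand-in singleton distribution type (no fields => a singleton):
struct NormalLike end

# A trace parametrically typed on the distribution, as suggested above:
struct TraceSketch{D}
    dist::D
    val::Float64
end

Base.issingletontype(NormalLike)        # true: exactly one instance exists
sizeof(TraceSketch{NormalLike})         # same as sizeof(Float64): the
                                        # singleton `dist` field is elided
```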
All right, I think I've removed most (all?) of the special-casing that the dynamic and static DSLs do on `Distribution`.

One interesting benefit of this PR's changes is that it is now possible to figure out, from the trace of a program, what distribution each choice was drawn from. This could be used to implement, e.g., automatic Gibbs inference for discrete choices without the user needing to pass in a list of valid values (the function could deduce the support of the choice from the distribution stored in the trace).

However, merging this PR would break external packages that examine the IR of Gen static DSL functions and expect to find […].
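As a rough illustration of the Gibbs idea, here is a self-contained sketch. The `Categorical` type and the `support` and `logpdf` helpers are assumed for the example; this is not Gen's API:

```julia
# With the distribution recoverable from a trace, a Gibbs move over a
# discrete choice can enumerate the support itself instead of asking the
# user for a list of valid values.
struct Categorical
    probs::Vector{Float64}
end
support(d::Categorical) = 1:length(d.probs)      # assumed helper
logpdf(d::Categorical, x) = log(d.probs[x])

# Conditional weights over the enumerated support (prior x likelihood):
function gibbs_weights(d::Categorical, loglike)
    logw = [logpdf(d, x) + loglike(x) for x in support(d)]
    w = exp.(logw .- maximum(logw))
    return w ./ sum(w)
end

gibbs_weights(Categorical([0.5, 0.5]), x -> x == 1 ? log(0.9) : log(0.1))
# -> approximately [0.9, 0.1]
```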
This builds on #263 and resolves #259.

I make `Distribution <: GenerativeFunction` true, and remove a lot of code in the static and dynamic DSLs specialized to distributions. Likewise, I remove the `ChoiceAt` combinator.

Note that I have not significantly modified the gradient code, since I have not yet taken the time to understand how it works. (So it still dispatches on whether something is a distribution or a generative function.) We may be able to further reduce the code footprint by removing specialization to distributions in the gradient calculations.
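The core change can be illustrated with a self-contained sketch; the type and function names here are assumed, and this shows only the shape of the unification, not Gen's actual code:

```julia
# Once Distribution <: GenerativeFunction holds, DSL code that used to
# branch on "distribution or generative function?" collapses into a
# single method on the common supertype.
abstract type GenerativeFunction end
abstract type Distribution <: GenerativeFunction end

struct NormalDist <: Distribution end
struct MyModel <: GenerativeFunction end

# One generic code path instead of two special-cased ones:
describe_call(gf::GenerativeFunction) = "traced call to $(nameof(typeof(gf)))"

describe_call(NormalDist())   # "traced call to NormalDist"
describe_call(MyModel())      # "traced call to MyModel"
```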
I have also added a `test/benchmark` folder with a couple of initial MH benchmarks adapted from the `examples`; we can add and revise benchmarks as we create them. Currently the benchmarks run MH on a static DSL model and a dynamic DSL model I took from the examples folder.

My initial benchmark results were sort of volatile (they varied a lot between runs), and more careful benchmarking should be done. It looks like the changes somewhat improve performance for static code, but slow down dynamic code somewhere between 1.1x and 1.6x. The dynamic performance slowdown appears to be caused by my previous PR (`ValueChoiceMap`), not the distributions-as-generative-functions PR. That said, I don't see how to do this PR without building on the other. I have not thoroughly investigated what causes the dynamic performance reduction; we may be able to improve this.

Benchmarking results:
After this PR:
After the `ValueChoiceMap` PR but before this PR:

Before either PR: