Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify ChoiceMap interface/architecture via ValueChoiceMaps #263

Draft
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

georgematheos
Copy link
Contributor

Note: this PR implements a breaking change, and is not for merging in the short-term; I am posting it to discuss this possible way of changing the choicemap interface, and possibly for inclusion in a "version number change" release in which we allow for breaking changes.


Big Picture

This PR is based on issue #258, and makes it so that the values stored in ChoiceMaps are wrapped in a subtype ValueChoiceMap <: ChoiceMap. This way, we can view the values as leaf nodes of a ChoiceMap, which simplifies some code and will make it possible in the future to make Distribution a subtype of GenerativeFunction (which is one of the biggest reasons for this change). To be consistent with this semantic that values are choicemaps, I have changed the meaning of get_submaps_shallow and the error-throwing behavior of get_submaps, as I will discuss in the next section.

Breaking Changes

  • get_submaps_shallow now returns an iterator over tuples (addr, submap), including for submaps which are a ValueChoiceMap. So collect(get_submaps_shallow(choicemap((:a => :b, 1), (:c, 2))) will now return [(:a, choicemap(:b, 1)), (c, ValueChoiceMap(2)].
  • get_submap(choicemap, addr) where choicemap[addr] contains a value no longer throws a KeyError; now it returns a ValueChoiceMap containing that value.

I have also changed some error throwing behavior to what seemed more natural to me in this implementation, though these could be reverted to the old behavior without changing the idea of the PR majorly:

  • Calling get_submap(choicemap, addr1 => addr2) where addr1 contains a value now returns an EmptyChoiceMap rather than throwing a KeyError.
  • Calling get_value(choicemap, addr) where there is no value stored at addr no longer throws a KeyError; instead it throws a custom error called a ChoiceMapGetValueError. This error does not include any information about which address did not possess a value. However, I do not think this makes the error significantly less informative than the way KeyErrors were being returned, since these KeyErrors always wrap the deepest part of the address, regardless of the first address not to be found. (Ie. if you called get_value(choicemap, :a => :b => :c => :d)and it threw aKeyError, it would always be a KeyError(:d), even if say :bwas the first key which had anEmptyChoiceMapunder it. So this address theKeyError` wraps is not very useful.)

Nonbreaking interface changes

  • Now get_nonvalue_submaps_shallow performs the functionality get_submaps_shallow used to.
  • Now get_value(choicemap) and has_value(choicemap) are valid calls. get_value(choicemap) returns the value on the choicemap if choicemap isa ValueChoiceMap and throws a ChoiceMapGetValueError otherwise; has_value(choicemap) returns true if choicemap isa ValueChoiceMap and false otherwise.
  • There is now a default implementation of _fill_array!, so custom choicemap types have a built-in implementation for to_array. (Users must still implement_from_array to use from_array.)
  • Default implementations for Base.isempty, has_value, get_value, get_values_shallow, and get_nonvalue_submaps_shallow are provided based on get_submap and get_submaps_shallow, so users should only need to implement 2 methods to create a custom choicemap type.

There are also many implementation changes.

Initial Benchmarking

I have run 3 benchmarks and included the files in the PR; they are test/static_choicemap_benchmark.jl in which I test the performance of lookups from static choicemaps, test/static_inference_benchmark.jl, in which I test the performance of running MH on a simple static model, and test/dynamic_choicemap_benchmark.jl, in which I test the performance of lookups from dynamic choicemaps.

The takeaways from this initial benchmarking are

  • Static choicemap lookup performance (in situations where the JIT compiler can propagate the address constants and so compile the lookups) is not affected by this PR for shallow value lookups, and is improved for nested lookups (ie. looking up :b => :c). However, this improvement may be achievable via some changes to the static IR code without the rest of this PR; it comes from changing the implementation of get_choices for static DSL traces so that it always returns a StaticTraceIRAssmt rather than sometimes returning an EmptyChoiceMap; this type-stability allows for deeper compilation. However, this full PR may be needed for the behavior of isempty to perform correctly if we get rid of this check to sometimes return EmptyChoiceMap; I'm not totally sure.
  • Dynamic choicemap lookup performance is made worse for shallow address lookups (ie. looking up :a) and is improved for nested lookups (ie. looking up :b => :c).
  • Running inference on the static model did not seem to have it's performance majorly effected by the PR.

I suspect we will want to do more benchmarking; any suggestions for what experiments would be useful to run?

Questions and requests for review

  • src/modeling_library/recurse/recurse.jl- I only made minor changes to this file, and all the tests are passing, but I have not taken the time to understand how this combinator works, so I am not 100% confident my implementation here is correct
  • As I mentioned in the benchmarking section, I removed some code related to tracking is_empty and num_nonempty in the static IR trace.jl file. I am not totally clear why these values are being tracked (is is just so we can return an EmptyChoiceMap if there are no values in the traces subtraces?), but I removed the check that has get_submap sometimes return an EmptyChoiceMap, since this type instability decreases lookup performance drastically. My new implementation should handle calling isempty on a staticIRtrace choicemap with no values properly even if it isn't an EmptyChoiceMap; does this mean we should get rid of tracking isempty and num_nonempty? Or are they being used for something else? Or have I actually broken something that just didn't trigger any errors in the test suite?
  • Please let me know what you think of the way I've changed the error-throwing behavior, and if you think it would be better to revert some of it.

@georgematheos
Copy link
Contributor Author

georgematheos commented May 19, 2020

Textdump from my initial benchmarking:

OLD VERSION:
static choicemap nonnested lookup:
  0.017679 seconds (56.41 k allocations: 3.131 MiB)
  0.000002 seconds
  0.000000 seconds
  0.000000 seconds
static choicemap nested lookup:
  0.020194 seconds (83.83 k allocations: 4.551 MiB)
  0.000002 seconds
  0.000000 seconds
  0.000000 seconds
static gen function choicemap nonnested lookup:
  0.018849 seconds (57.48 k allocations: 3.173 MiB)
  0.000002 seconds
  0.000000 seconds
  0.000000 seconds
static gen function choicemap nested lookup:
  0.029400 seconds (193.98 k allocations: 6.452 MiB)
  0.000507 seconds (100.00 k allocations: 1.526 MiB)
  0.000707 seconds (100.00 k allocations: 1.526 MiB)
  0.000640 seconds (100.00 k allocations: 1.526 MiB)

dynamic choicemap nonnested lookup:
  0.022475 seconds (61.28 k allocations: 3.335 MiB)
  0.001247 seconds
  0.001312 seconds
  0.001267 seconds
dynamic choicemap nested lookup:
  0.037073 seconds (168.30 k allocations: 6.755 MiB)
  0.007001 seconds (100.00 k allocations: 3.052 MiB)
  0.007292 seconds (100.00 k allocations: 3.052 MiB)
  0.006637 seconds (100.00 k allocations: 3.052 MiB)

running 1000 MH iters on small static function:
  1.174330 seconds (2.14 M allocations: 106.396 MiB, 2.03% gc time)
  0.005628 seconds (52.00 k allocations: 1.785 MiB)
  0.005626 seconds (52.00 k allocations: 1.785 MiB)
  0.005608 seconds (52.00 k allocations: 1.785 MiB)

NEW VERSION:

static choicemap nonnested lookup:
  0.027458 seconds (67.67 k allocations: 3.759 MiB)
  0.000002 seconds
  0.000000 seconds
  0.000001 seconds
static choicemap nested lookup:
  0.024796 seconds (78.58 k allocations: 4.330 MiB)
  0.000001 seconds
  0.000000 seconds
  0.000000 seconds
static gen function choicemap nonnested lookup:
  0.018428 seconds (70.05 k allocations: 3.819 MiB)
  0.000002 seconds
  0.000000 seconds
  0.000000 seconds
static gen function choicemap nested lookup:
  0.022049 seconds (87.10 k allocations: 4.614 MiB)
  0.000001 seconds
  0.000000 seconds
  0.000000 seconds

dynamic choicemap nonnested lookup:
  0.027989 seconds (72.75 k allocations: 3.944 MiB)
  0.003157 seconds
  0.003532 seconds
  0.003402 seconds
dynamic choicemap nested lookup:
  0.046181 seconds (80.35 k allocations: 4.294 MiB, 27.29% gc time)
  0.005809 seconds
  0.005882 seconds
  0.005703 seconds


running 1000 MH iters on small static function:
  1.117399 seconds (2.16 M allocations: 108.037 MiB, 2.05% gc time)
  0.005740 seconds (52.00 k allocations: 1.785 MiB)
  0.005562 seconds (52.00 k allocations: 1.785 MiB)
  0.005494 seconds (52.00 k allocations: 1.785 MiB)

@bzinberg
Copy link
Contributor

Thanks for this work @georgematheos!

I'm wondering if the following slight adjustment to the framing is equivalent to what you're proposing:

Julia's indexing syntax is that foo[a, b, c] lowers to Base.getindex(foo, a, b, c).

The current indexing syntax for choicemaps is, essentially, keys are assumed to be linked lists, and choicemap[a => b] := get_submap(choicemap, a)[b]. Similar to the Julia indexing syntax, but uses linked lists as the special "unpackable" part instead of tuples.

Julia indexable things can have zero-dimensional indexes: foo[], which lowers to Base.getindex(foo). The difference between foo[] and foo is important because the seemingly-pedantic notational consistency actually eliminates the need for special-casing in ways that matter -- see JuliaLang/julia#28866.

It seems that the ValueChoiceMaps you describe are essentially ChoiceMaps that know how to retrieve at an index of length zero (the empty linked list).

If we can pin that down as the semantics of choicemap indexing, I think that would simplify the API.

Again, I think I'm essentially saying the same thing as you. (Ok, that's my third "essentially," better stop now.) WDYT?

@bzinberg bzinberg marked this pull request as draft May 19, 2020 14:42
@georgematheos
Copy link
Contributor Author

georgematheos commented May 19, 2020

Hi Ben! Yes, that's a great way to think of it! I hadn't considered the choicemap[] syntax yet, but I'm really glad you pointed it out; that makes what's going on clearer, and already works in my implementation. I just added a test to the test suite to ensure we support that syntax.

I just added a change to the choicemap docs on this PR to try to articulate choicemap lookups along the lines of your comment (1bd705f). I did this mostly via examples, though; @bzinberg does this look about like what you had in mind? Do you think including these sorts of examples are sufficient, or should we try to find a better formal articulation?

@bzinberg
Copy link
Contributor

The email pings reminded me that I forgot to reply to this 🙂

Upon further thought, I think I've better figured out what the itch was that I was trying to scratch with my above comment. I was trying to figure out what the "more correct but slightly less convenient to write" version of the indexing formalism is for these choicemaps, along the same lines as I advocated in JuliaLang/julia#28866 (comment). Once we know what that is, we can make a syntax shortcut that is as close as possible to the surface, and not have a nagging sense that we're muddying the internals.

The key (hah) invariant that indexing should satisfy is, for any choicemap c,

c[concat(keypath1, keypath2)] == c[keypath1][keypath2]

where keypath1 and keypath2 are any linked lists, including the empty one (and concat is a fanciful name for linked list concatenation which I think doesn't exist as a Julia primitive).

This axiom then implies that c[] == c. So what we need (regardless of what syntax we eventually decide to use for it) is a method of "assert that I am a depth-zero ('leaf') choicemap and extract my value." The thing I didn't realize before is that this method has to be separate from getindex, because the invariant above already determines what getindex does. Or, we could rename the method that gets submaps to something other than getindex, if that results in a nicer surface syntax -- but it's a separate method.

So I guess we have a few choices.

  • We could say
    c[keypath] := assert_leaf_and_get_value(submap_at_keypath(c, keypath))
    and require the user to call submap_at_keypath if they want to see a submap.
  • Conversely, we could say
    c[keypath] := submap_at_keypath(c, keypath)
    and require the user to call assert_leaf_and_get_value (by a better name of course) if they want to extract the value. This has the huge, possibly-nonstarter downside that it will be unfamiliar to absolutely everyone, since dicts don't work this way.
  • We could have getindex try to guess whether the index is a linked list or a "bare" key, and choose its semantics accordingly. (There is genuine ambiguity since Gen currently allows keys to be of any type, including linked list.) Full disclosure, I really don't like this third approach because it puts a huge burden on anyone who wants to reason systematically about what Gen does -- hoping to nip it in the bud.

@bzinberg
Copy link
Contributor

bzinberg commented Jun 19, 2020

Oh how about this:

Define a class Gen.KeyPathPrefix, which is conceptually the same as a tuple whose elements are not allowed to be of type KeyPathPrefix. State that choice addresses also cannot be of type KeyPathPrefix. Then:

# For any T1, T2, ... that are not subtypes of KeyPathPrefix,
c[a1::T1, a2::T2, ...] === getindex(c, a1, a2, ...) =
        assert_leaf_and_get_value(c[KeyPathPrefix(a1, a2, ...)])

c[A::KeyPathPrefix] === getindex(c, A) = submap_at_keypath(c, A)
# Can also have a vararg overload of submap_at_keypath:
# if T1, T2, ... are not subtypes of KeyPathPrefix,
submap_at_keypath(c, a1::T1, a2::T2...) =
        submap_at_keypath(c, KeyPathPrefix(a1, a2, ...))

This might get the best of both worlds. If the user doesn't know about KeyPathPrefix, indexing behavior works like a dict as they expect. If they want to get a submap, they can look up the API and see that they have the options (neither one especially verbose) of

submap_at_prefix(c, a1, a2, ...)

or

c[KeyPathPrefix(a1, a2, ...)]

It might have to be a breaking change because it probably should supersede Gen's special treatment of linked lists formed by Pairs. I think that's a good thing -- cleaner to have the type Gen treats specially be a type that Gen controls rather than a Julia primitive.

WDYT?

@georgematheos
Copy link
Contributor Author

georgematheos commented Jun 19, 2020

@bzinberg thanks for bringing this up, I agree this is an important thing to figure out the right semantics for now. Also, check out the design doc I linked in the probcomp slack; part of my proposal is to merge ChoiceMaps and Selections into a single type called an AddressTree capturing the functionality of both.

Re should c[address] return a choicemap or a value:
I'm not totally sure, but my feeling is that it should be a value (or an error if no value is present). In the Gen code I write, I find I access values in choicemaps more often than I want to access submaps, so my guess is it makes more sense to have a get_submap function for getting submaps, and use getindex for getting values (ie. your "first proposal" above):

# where get_value throws an error if there is no value at this address
Base.getindex(c::ChoiceMap, addr) = get_value(c, addr)
Base.getindex(c::ChoiceMap) = get_value(c) # support valuechoicemap[] syntax

get_value(c::ChoiceMap, addr) = get_value(get_submap(c, addr))
get_value(v::ValueChoiceMap) = v.value
get_value(c::ChoiceMap) = error("There is not a value at this address!")

(This is pretty much what I've implemented in this PR. The addr... )

Re use special type instead of Pair for hierarchical addresses:
@bzinberg what do you think of the following structure for addresses, which I'm hoping will make it easier to specify what is a BaseAddress while allowing us to retain the old Pair syntax?

abstract type Address end
struct EmptyAddress <: Address end
struct BaseAddress{T} <: Address
    addr::T
end
struct HeirarchicalAddress{First, Rest} <: Address
    first::First
    rest::Rest
    # force first and rest to be addresses with this inside constructor:
    HeirarchicalAddress(f::Address, r::Address) = new(f, r)
end

# declaration for get_submap:
function get_submap(::ChoiceMap, ::Address) end
# implementations:
get_submap(c::ChoiceMap, ::EmptyAddress) = c
get_submap(c::ChoiceMap, h::HeirarchicalAddress) = get_submap(get_submap(c, h.first), h.rest)
# get_submap(::ChoiceMap, ::BaseAddress) is has a custom implementation for each ChoiceMap type

# same declarations for get_value as above

# getindex tries to convert to addresses:
Base.getindex(c::ChoiceMap, a::Address) = get_value(c, a)
Base.getindex(c::ChoiceMap, a) = get_value(c, Address(a))

# default converters:
Address() = EmptyAddress()
Address(a::Address) = a
Address(a) = BaseAddress(a)
Address(a::Pair) = HeirarchicalAddress(a[1], Address(a[2]))

If a user wanted to use a Pair as a BaseAddress, they could just do something like choicemap[:a => BaseAddress(:b => :c) => :d.

@bzinberg
Copy link
Contributor

bzinberg commented Jun 19, 2020

Ah, thanks for the pointer -- I'll take a look at your design doc about AddressTree.

Regarding your API sketch, it looks to me like the user-facing API will be similar to the one implied by #263 (comment), and you've suggested a different type layout for the internals (compared to "just use a thing that is conceptually a tuple").

FWIW, I think that the current use of Pairs is a design wrinkle. Example sticking points:

  • What is an empty linked list? It should be an identity element for Pair, i.e. (empty_linked_list => x) == x for all x; but no such value exists. (You speak to this above by introducing EmptyAddress.)
  • What is a one-element linked list, and how is it different from a bare value? (You speak to this above by introducing BaseAddress.) And what about a one-element linked list whose element is itself a one-element linked list?
  • Concatenation is confusing. You can't just concatenate two linked lists A and B by writing A => B. We could write a helper function for this, but that still increases the learning curve. Plus, AFAICS, this form of linked list is not idiomatic Julia.

I'm wondering if you agree it would be good to phase out Pair-based linked lists in the long term, and preserving it for now is a holdover, or whether you think we should stick with using Pair-based linked lists long term. I was speaking above mostly about what I consider ideal for the long term, without much thought on migration plan.

I also have two questions/comments about the type layout you proposed:

  • What advantage does having a first/rest based linked list confer over just having a single address class which is a wrapper around Tuple? (FWIW, Julia's treatment of Tuples is highly optimized, and I would expect your self-recursive implementation of get_submap above to incur a function call overhead proportional to the depth of the choicemap, which is avoidable.)
  • To make Address(a::Address) = a with Address(a) = BaseAddress(a) not result in confusing behavior, I think we need something like
    BaseAddress(a::BaseAddress) = error("Can't nest BaseAddress")
    In particular, having a behavior like BaseAddress(a::BaseAddress) = a.addr would require anyone doing metaprogramming to be careful about this special case -- so I think best to just rule it out. (Or have both versions, and have only one of them be a constructor -- the question of code ergonomics can be separate from the question of consistency.)

@georgematheos
Copy link
Contributor Author

georgematheos commented Jun 20, 2020

Okay, I think you have a good point @bzinberg that Pairs are imperfect for addresses. (I think they might be a reasonable syntactic sugar--but for now I will focus on the internal Address type since I think the internals are more important to "get right".)

Tuples Vs Linked-Lists

As you are saying, I'm essentially proposing to make the Address data structure look like a linked-list rather than a tuple as in the KeyPathPrefix you mentioned as a possibility. My intuition is that having addresses be "linked-list shaped" is more natural than tuples, since it makes it faster to "strip off" the top level addresses and pass around the subaddresses.

Tuples don't seem quite right to me for this use case. For example, it feels like I should be able to change a top-level address without reconstructing the whole hierarchical address. And it feels unnatural to have "indices" for the address nodes at different "depth"s. (I don't see any reason to need constant-time access to the nth node in an address...when do I ever need a node without needing to know the part of the address preceding it?)

While the performance for tuples is highly optimized, I think the JIT compiler should be able to produce equally fast code for linked lists for iteration (though of course not for indexing--but again, I don't think we need this). When I run simple experiments to iterate over linked-lists vs tuples, it looks like for iteration implemented via recursion, linked lists have better performance, vs "for-loop" recursion leads to better performance for tuples. (Pairs are faster with recursive iteration because it's slow to do tuple[2:end] but not do do pair.second. I'm not sure why tuples are faster in my loop example.)

Concatenation

(Note that I use => syntax for writing linked lists in this section for convenience; I don't mean to suggest we need to stick with julia Pairs.)

I agree that it's not obvious what the best way to do concatenation is. Ideally, we could concatenate any 2 addresses in constant time. I don't think this is possible if addresses are represented as tuples, since tuples are immutable so to concatenate we have to fill a new tuple with the values from the other ones. If they are linked-lists, we can sort of concatenate in constant time, but we get the wrong "nesting structure" as you are pointing out. (Ie. we'd want concatenate(:a, :b => :c => :d) == concatenate (:a => :b, :c => :d), but this would require :a => (:b => (:c => :d)) == (:a => :b) => (:c => :d).)

Interestingly, a lot of the time, having improper nesting wouldn't matter. For example, get_submap is associative over addresses: get_submap(cm, :a => (:b => :c)) == get_submap(cm, (:a => :b) => :c). (I'm pretty sure this holds from the implementation of get_submap I described above.)

However, there are cases when we need the first BaseAddress immediately, for example when updating a generative function. If we allow for construction of linked-list addresses with "improper nesting", one way to handle this would be to implement a function with the following signature:

(first::BaseAddress, rest::Address) = get_first_rest(a::HeirarchicalAddress)

and have this automatically do the (possibly linear-time) traversal needed to extract the base address from a. (I don't love the name get_first_rest; I'd want to come up with something better. Or maybe we could try to override indexing syntax?)

Alternatively, we could just do this iteration at construction-time, so every time we construct a HeirarchicalAddress, we make sure the first node is a BaseAddress. I'm not sure what the right answer is to this question of lazy-structuring vs immediate structuring. The lazy one has the advantage that sometimes we may never need to restructure an address, but the downside that we might need the properly structured address multiple times. (To "cache" the lazy restructure, we'd need to make addresses mutable. This feels like the wrong call to me, since it would mean the linked-list nodes are not type-stable.)

Another possibility would be to have one type ImproperHeirarchicalAddress node which allows for improperly-structured nodes, but which we can convert into a ProperHeirarchicalAddress to properly nest it. WDYT?

@bzinberg
Copy link
Contributor

bzinberg commented Jul 1, 2020

Aha - now that I've read your docs on AddressTree and UsingWorld, I have a lot more context for this.

The main things I want to advocate are API simplicity and deferred commitments. By deferred commitments I mean that the core conceptual machinery that users have to know should "be as unrestrictive as possible, except when it's not" -- wherever restrictions or complexity are introduced, they should be well justified. So what I'm most interested in understanding better (and didn't do a good job of focusing on in my previous comment) is, what commitments do the above API and Julia type hierarchy make in terms of how the user structures their programs, and are they all necessary? On first glance it looks to me like the type hierarchy itself is catered to inference algorithms that operate by tree traversal on the choicemap -- which might be exactly right but I want to better understand why it's right.

@bzinberg
Copy link
Contributor

bzinberg commented Jul 5, 2020

Btw, this is a minor and tangential thing, but just to clarify, when I said

I would expect your self-recursive implementation of get_submap above to incur a function call overhead proportional to the depth of the choicemap

I was referring to stack memory overhead (which can overflow the stack even if the system has plenty of physical memory), because AFAIK Julia does not do tail call optimization. I doubt that any reasonable Gen program would come close to having choicemaps this deep, I was just reflexively commenting on what appeared to be stack-based recursion in a language that isn't Scheme. (Sadly, even Clojure doesn't have proper tail call optimization, because it needs to follow Java calling conventions for interoperability.)

@georgematheos
Copy link
Contributor Author

Ah shoot, they don't flatten tail recursion?! Okay, that's good to know.

I agree most programs won't have choicemaps that deep, but it still might be a performance regression. We could consider if an iterative implementation like

function get_submaps_shallow(c::Choicemap, a)
  while !(a isa BaseAddress)
    c = get_submap(c, a.first)
    a = a.rest
  end
  return get_submap(c, a)

would improve performance.

Re commitments, I totally agree. I am pretty busy right now but at some point I think I might put together a presentation/writeup to discuss with the group about a way to look at Gen as a sort of "tree traversal" library, which motivates a lot of my suggestions and changes. This will help explain why I think that this type of tree-recursion is arguably fundamental to Gen, and why I think it may be reasonable to commit to it in the explicit design of the library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants