Add Higher-Order Function Instructions #147

Open
thelmuth opened this Issue Jul 13, 2015 · 75 comments

@thelmuth (Collaborator) commented Jul 13, 2015

It would be cool to have higher-order function instructions like map, reduce, and filter in Clojush. Maybe they'd only be defined on the array-like types (string, vector_integer, vector_boolean, etc.), or maybe they'd somehow be defined on other stacks as well.

The details are vague, and there are some non-trivial things to figure out, but if anyone wants to take this on it would be cool!

@lspector (Owner) commented Jul 14, 2015

We do already have code_map, which could be used as a model for others. Also, several exec instructions have related functionality.

@thelmuth (Collaborator) commented Jul 14, 2015

There are also exec_do*vector_integer and similar instructions for the other vector stacks.

@Vaguery (Contributor) commented Jul 14, 2015

I would be happy to add these. I've done them at least twice in push-forth and duck, and it's a good place for me to get a better sense of adding comprehensive midje tests as I work. :)

The bigger question, of course, is how do you want to handle big swathes of functionality like this? A few instructions at a time, or a suite all together?

@thelmuth (Collaborator) commented Jul 14, 2015

Personally, I'd say do it however you like! If you want feedback, a little at a time might be good, but most things "never get done", so if you're going to "get it done", then do it your way :)

@Vaguery (Contributor) commented Jul 14, 2015

So let it be written.

@lspector (Owner) commented Jul 14, 2015

Please be sure, though, that the execution of a single instruction is always well bounded in space and time, since we only check for limits being exceeded between instructions. I'm pretty sure we've been careful to do this -- for the definition of "well bounded" that has mattered for us in practice -- for every instruction currently in Clojush, but I also think (though I'm not sure) that @Vaguery has had a somewhat different philosophy on this (maybe because of different ways to catch things when they run out of control).

@Vaguery (Contributor) commented Jul 14, 2015

My idea right now is to bring over something like the :enumerator type I built in one of the other languages. It is a direct extension of the forms you are used to with the exec_do* and code_do* instructions, and so it's completely reentrant and iterative in the same way.

Briefly: an enumerator contains one immutable sequence (I will start with vectors here) and an associated persistent "pointer state". As with code there is no need for the sequence to be single-typed, and one enumerator may contain a vector_integer and another a vector_string or a string item.

When created, the enumerator's pointer is set to 0.

A standard library of instructions would pull the top enumerator from that stack and push the following to exec:

  • enumerator_first produces
    • the enumerator, with its pointer set to 0
    • a copy of the item at position 0 in the enumerator
  • enumerator_last produces
    • the enumerator, with its pointer set to the last index
    • a copy of the item at position -1 in the enumerator
  • enumerator_forward produces
    • the enumerator, with its pointer incremented by 1; if the pointer exceeds the length of the sequence, the enumerator is consumed without output
  • enumerator_back produces
    • the enumerator, with its pointer reduced by 1; if the counter was already 0, the enumerator is consumed without output
  • enumerator_next produces
    • the enumerator, with its pointer incremented by 1; if the pointer exceeds the length of the sequence, the enumerator is consumed without output
    • a copy of the item in the sequence at the new position
  • enumerator_prev produces
    • the enumerator, with its counter reduced by 1; if the counter was already 0, the enumerator is consumed without output
    • a copy of the item in the sequence at the new position
  • enumerator_history produces
    • the enumerator
    • a sequence (of the same type as its internal sequence) containing all the items from positions 0 to the pointer
  • enumerator_remaining produces
    • the enumerator
    • a sequence (of the same type as its internal sequence) containing all the items from just after the pointer to the end
  • enumerator_apply produces (with a code argument)
    • the enumerator
    • code_quote
    • the code item
    • a copy of the item at the pointer position
    • the code item
  • enumerator_map produces (with a code argument)
    • the enumerator, with the pointer advanced
    • code_quote
    • the code item
    • a copy of the item at the pointer position
    • the code item
    • (this is from memory; probably some other stuff to make it "keep going" in a Pushlike way)
  • enumerator_moveto produces (with an integer argument)
    • the enumerator, with the pointer set to the absolute position indicated by the int modulo the sequence length
  • enumerator_move produces (with an integer argument)
    • the enumerator, with the pointer moved relative to its current position by a number of steps determined by the int; if the pointer leaves the sequence from either end, the enumerator disappears.
  • enumerator_pos produces
    • the enumerator
    • the pointer value as an integer
  • enumerator_break produces
    • the sequence
    • the pointer value as an integer
  • [sequence]_to_enumerator creates a new enumerator from the indicated sequence
  • the usual suspects: enumerator_dup, enumerator_pop and so forth
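
For concreteness, the pointer/consumption semantics described above might be sketched like this in Python (an illustrative stand-in, not Clojush code; the names Enumerator and enumerator_next are hypothetical):

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass(frozen=True)
class Enumerator:
    seq: tuple        # the immutable sequence
    pointer: int = 0  # set to 0 when the enumerator is created

def enumerator_next(e: Enumerator) -> Tuple[Optional[Enumerator], Optional[Any]]:
    """Increment the pointer and return (enumerator, item at the new position);
    if the pointer would leave the sequence, the enumerator is consumed
    (returned as None) with no output item."""
    p = e.pointer + 1
    if p >= len(e.seq):
        return None, None
    return Enumerator(e.seq, p), e.seq[p]

e2, item = enumerator_next(Enumerator((8, 7, 2)))  # pointer -> 1, item -> 7
```

The frozen dataclass mirrors the "immutable sequence plus persistent pointer state" design: advancing never mutates an enumerator, it produces a new one.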

Note: in most of the implementations I've worked on where the goal was extending Push, I did not want anything reaching into the associated sequence and changing it. In duck things are more fluid, and as I recall the stack-affecting messages like :dup and :pop can be received by an Enumerator instance as well, so all kinds of madness happens.

Anyway, @lspector, there is no more than one additional copy of the sequence ever, and so I think the computational implications of this approach are in keeping with the Push philosophy at least as much as code_do_times and so forth are.

@thelmuth (Collaborator) commented Jul 14, 2015

I didn't fully digest all of the details, but this looks cool to me. I like the idea of the enumerator stack. It always felt a little scary to me that indices for the exec iteration instructions are kept on the exec and integer stack, where they can be messed around with easily. An enumerator stack that keeps track of that stuff without multipurposing the exec and integer stacks seems like a clean way to avoid that mess. Thumbs up!

@Vaguery (Contributor) commented Jul 16, 2015

Is there a pre-existing idiom (or convenience function) for adding a new type to clojush.globals/push-types? I would prefer to have the responsibility for registering a new type live in the definition of that type, so people who don't want to use it don't end up with extra chaff added to globals.

@thelmuth (Collaborator) commented Jul 16, 2015

Nope. The only current way that I'm aware of is to just add it to that list of stack types.

@Vaguery (Contributor) commented Jul 17, 2015

I could use some feedback on the work I've done so far. It's not in any sense "done", mainly because there's a huge amount of repetition in the instruction definitions, but I'd like to know if it makes sense so far to an external reader familiar with Clojush's overarching structure.

You can compare to the more-or-less intact Clojush state using the branch comparison view

@thelmuth (Collaborator) commented Jul 17, 2015

Definitely looking like a good start! A few comments:

  • Can you add a comment for each instruction describing what it should do? It's hard to tell if an instruction is correct when I don't know its intention.
  • All instructions must return a push state. It appears that some of your instructions will not, in certain conditions. For example, this if statement has no else argument, and will return nil if the collection is empty. It should instead either return state or popped_state, depending on your intention (which I think is the latter based on your Rules of Thumb). I'm actually seeing this in all of your instructions.

Otherwise, it's hard to tell if things look "right" without comments about what they're supposed to do.

@Vaguery (Contributor) commented Jul 17, 2015

Do you know if there is room inside define-registered for a docstring?

Good catch on the return state. I was just getting around to getting rid of all the repetition that masked that from me.

Do the midje tests, which explicitly say what every thing should do in every edge case, help clarify what the code is intended to do? I will also add comments because that's the standard and because the docs are generally pretty bad, but the intention is always best spelled out in the test suite.

@Vaguery (Contributor) commented Jul 17, 2015

Do you know if there are any instructions that don't start by setting the metadata of their requirements, followed by (fn [state] ...)?

(and I mean when they're "unrolled", too; I see for example the inner function of popper in the example is equivalent)

@thelmuth (Collaborator) commented Jul 17, 2015

Do you know if there is room inside define-registered for a docstring?

I don't know! You could always try and see if anything breaks.

I will also add comments because that's the standard and because the docs are generally pretty bad, but the intention is always best spelled out in the test suite.

I didn't check, but will. Though, that assumes there aren't any typos/bugs in the test suite. And also: English is easier to understand than someone else's code.

Do you know if there are any instructions that don't start by setting the metadata of their requirements, and (fn [state]...?

Not that I know of. But, there's no reason you couldn't use another idiom. Each instruction has to be a function that takes a push state and returns a push state, and needs metadata associated with it that tells what stacks it uses, but you can get there any way you like.

@lspector (Owner) commented Jul 17, 2015

The define-registered macro, as currently written, cannot take a doc string. This has annoyed me and I've resorted to using comments in the past, but it could certainly be fixed. define-registered would have to be written to take an optional doc string and to attach it as meta-data to the instruction name in the appropriate way (which I haven't investigated, but is probably not hard, modulo the usual mental orientation/clarity that's needed when writing macros).

@Vaguery (Contributor) commented Jul 17, 2015

Was just about to open this as an issue :)

@Vaguery (Contributor) commented Jul 18, 2015

I've written a little testing helper function called safe-execute in my midje tests which will

  1. apply execute-instruction to a state
  2. raise an exception if the returned value doesn't have all the push-types as its keys (e.g. when it's nil)
  3. return the result otherwise

I am making an effort to use this in every test call of an instruction, so that every returned value from every edge case is validated. Obviously this is too much work for the interpreter to do in the normal course of running a program (?), but it's really pretty spooky that this sort of thing (blithely ignoring nil arguments) is still lurking in there.
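
An analogous validator in Python might look like this (illustrative only; PUSH_TYPES is abbreviated, and safe_execute/bad_pop are hypothetical stand-ins for the midje helper described above):

```python
PUSH_TYPES = [":exec", ":integer", ":boolean"]   # abbreviated for the sketch

def safe_execute(instruction, state):
    """Run an instruction, then insist the result is still a full push state."""
    result = instruction(state)
    if result is None or not all(t in result for t in PUSH_TYPES):
        raise ValueError("instruction returned an invalid push state")
    return result

# A buggy instruction whose `if` has no else branch returns None when the
# integer stack is empty -- exactly the kind of bug this helper catches:
def bad_pop(state):
    if state[":integer"]:
        return {**state, ":integer": state[":integer"][1:]}
```

Wrapping every test call this way means every edge case's return value is validated, without paying that cost in the interpreter's inner loop.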

@thelmuth (Collaborator) commented Jul 18, 2015

Was that last comment supposed to go in #153?

@Vaguery (Contributor) commented Jul 18, 2015

Will do enumerator_map, enumerator_set and the remaining conversion instructions tomorrow.

A few questions of preference:

  • Do you want to permit "looping" behavior? I'm guessing this would be a boolean flag, settable by a toggle instruction, which when true would permit the Enumerator to "wrap" in both directions when the pointer moves beyond the currently permitted bounds.

  • Thinking more about map and apply functionality: My previous implementations were all in interpreters where every result went onto the equivalent of the :exec stack. Here I see many instructions jump straight to pushing results to their eventual stack. I am thinking the enumerator_map instruction should do something like this:

    1. push enumerator_map onto :exec, unless the Enumerator is done
    2. copy the top :code item onto :exec (leaving a copy on top of :code)
    3. push the item at the current Enumerator pointer onto :exec
    4. advance the Enumerator pointer

    And enumerator_apply (not my favorite name) would be slightly different in that it would take new items from the :code stack on each step:

    1. push enumerator_apply onto :exec, unless the Enumerator is done
    2. move the top :code item onto :exec (popping it from :code)
    3. push the item at the current Enumerator pointer onto :exec
    4. advance the Enumerator pointer
  • my plan is to build (and test)

    • enumerator_from_vector_integer
    • enumerator_from_vector_boolean
    • enumerator_from_vector_float
    • enumerator_from_vector_string
    • enumerator_from_string
    • enumerator_from_code

    That last is not intended to be a tree-traversing zipper kind of thing; it will just be treated like a "flat" list of subtrees.
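
The enumerator_map steps sketched above could be written as a pure rewrite of the stacks (a hedged Python stand-in, not Clojush; tops of stacks are at index 0, and an enumerator is modeled as a hypothetical (sequence, pointer) pair):

```python
def enumerator_map_step(enum, exec_stack, code_stack, enum_stack):
    """One step of the sketched enumerator_map:
    1. push enumerator_map back onto :exec, unless the enumerator is done;
    2. copy the top :code item onto :exec (leaving it on :code);
    3. push the item at the current pointer onto :exec;
    4. advance the pointer (the enumerator is consumed when done)."""
    seq, ptr = enum
    item, code_item = seq[ptr], code_stack[0]       # code is copied, not popped
    if ptr + 1 < len(seq):                          # not done yet: keep going
        exec_stack = ["enumerator_map"] + exec_stack
        enum_stack = [(seq, ptr + 1)] + enum_stack  # pointer advanced
    exec_stack = [item, code_item] + exec_stack     # item runs first, then code
    return exec_stack, code_stack, enum_stack
```

The enumerator_apply variant described above would differ only in popping the :code item (`code_stack[1:]`) instead of copying it.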

What have I missed?

@Vaguery (Contributor) commented Jul 18, 2015

@thelmuth No, it was intended for here, as a response to the bugs you caught!

@thelmuth (Collaborator) commented Jul 18, 2015

Do you want to permit "looping" behavior? I'm guessing this would be a boolean flag, settable by a toggle instruction, which when true would permit the Enumerator to "wrap" in both directions when the pointer moves beyond the currently permitted bounds.

This sounds a bit crazy. Good crazy? I don't know. It would make infinite looping behavior a lot easier to produce with the wrapping you're talking about.

re: enumerator_map: I would love to be able to use this instruction (and similar ones) without having to use the :code stack at all. The :code stack is great when you want to use it a lot, but adds a ton of extra complexity (and a larger instruction set) if you otherwise don't need it. Would it be possible for some of these instructions to use the :exec stack instead, or to have 2 versions, one which takes the code from :exec and one from :code, like exec_do*range and code_do*range?

@Vaguery (Contributor) commented Jul 18, 2015

No, I'm sorry, I can't possibly see how anybody could do that. Good grief, man, these aren't computers we're... oh wait.

Yes!

@Vaguery (Contributor) commented Jul 18, 2015

While we're discussing things: as I've mentioned, this isn't my first go on the Enumerator ride. One set of instructions (from push-forth I think) I had sort of set aside because I figured you'd think they were pretty weird. These take an entire stack and create an Enumerator out of that. enumerator_from_all_integers and so forth, I guess they would be here.

Just sayin'.

My favorite is one I suppose would be called enumerator_from_all_enumerators....

@thelmuth (Collaborator) commented Jul 18, 2015

While we're discussing things: as I've mentioned, this isn't my first go on the Enumerator ride. One set of instructions (from push-forth I think) I had sort of set aside because I figured you'd think they were pretty weird. These take an entire stack and create an Enumerator out of that. enumerator_from_all_integers and so forth, I guess they would be here.

I'm all for this! While it might be sort of crazy, it seems like it could be very useful at times.

@Vaguery (Contributor) commented Jul 18, 2015

There still should be some discussion of what reduce and filter might be in a Push setting. Realizing that one doesn't want to do them "all at once" (because that's the Push way), I'm not entirely clear what makes sense.

The sort of canonical reducer is injecting + over a list, right? I'm not sure what it means to "reduce" a list by "applying" an arbitrary Push code item—with unknown arity—to even one item, let alone "combining". Even the map I've sketched above feels a bit like making a sandwich by carefully putting a slice of bread between every item....

@Vaguery (Contributor) commented Jul 18, 2015

I will postpone "looping" for the time being.

@Vaguery (Contributor) commented Jul 18, 2015

I notice a lot of the instruction names are pretty long. Can I get away with renaming them enum_thingie rather than enumerator_thingie, or is the naming standard to always use the complete stack name? Either way is good for me, just loads of typing. I understand enum is probably an overloaded string for many programmers, and it's not at all the same thing as an enumerator.

@Vaguery (Contributor) commented Jul 18, 2015

Quick judgment call: Should instructions like enumerator_map_code leave the code item in place on that stack after they've exhausted the enumerator, or pop it?

@Vaguery (Contributor) commented Jul 18, 2015

@thelmuth Let's talk more about proper map and other functional instructions. Here are the things I can determine, thinking about the instruction set we have already:

For numeric vector types, the basic arithmetic would be easy to implement as map or reduce functions. The question, in terms of language design, is how to recognize and obtain the particular instructions without blindly proliferating a new suite of instructions: How would the process described by "map integer_add over this vector, taking an integer as the second argument" find the integer_add?

As I mentioned above, one approach would be to use metadata associated with the instructions themselves to facilitate this: What if a better map instruction looked at the code or exec stack, found an instruction there, and looked for an argument of the right kind to match the types of that instruction? That might work for a lot of them.

The trick here is the same one that interferes with (or complicates) matrix operations in Push: We need to choose between

  1. being willing to fail to find complementary arguments; for instance, we would need an integer->integer function and a vector_integer to happen, by chance, to be next to each other on some (?) stack.
  2. being willing to "stage" the second argument to suit the first: pick a function from wherever it comes from, and then examine its input type(s) to find the correct item(s) on the indicated stack.
  3. being willing to "scrounge" for second arguments to suit the type demanded by the first: if our function takes floats, we could pull the top thingie that is made of floats from a mixed source (this is like the matrix dimension constraint, where if we pick an m×n matrix as the first argument, we would need to hunt for the first n×k matrix to multiply it by).

The first one seems like it's not especially helpful, and unlikely to be very evolvable. The second one breaks a fundamental paradigm of the instructions we've written so far, which is that all arguments are pulled first and then we do the math. The third one opens a can of worms that makes this more like push-forth, where this sort of thing is built in (because there is only one stack for every type).

One way of thinking about the second and third paths, I suppose, would be to build a kind of "map closure" or "continuation" object out of a map instruction and its first (instruction or code) argument, and then give that the behavior that push-forth uses to shuffle through the stack looking for a match.

Call this mythic instruction supermap_exec for now. Suppose it takes a single argument from the exec stack, which must be an instruction or code item. It pops the next item on exec, and if that item isn't "compatible" it swaps position with it in the exec stack. If the popped item is "mappable", it consumes it and becomes a map_closure item, still on exec. When a map_closure item is executed it pops the next item from the exec stack, and if it's the right type it applies the map function to it (one set of arguments taken from the stacks), and we're done. If it doesn't find a compatible second argument in the next position on exec or wherever, it swaps position with it.

The problem I've seen in push-forth is that these little continuation "gliders" start crawling up the stack and consuming everything in turn. No instruction goes un-executed, but if there are a lot of mismatches, a large bolus of continuations can build up on the stack before you're done.

The only catches are that this feels pretty un-Pushy, and that it means there needs to be a lot more metadata associated with instructions (and possibly code items).

Let me walk through an example for clarity:

exec:(supermap 2 4 [8 7 2] integer_mod 8) int:() int_vec:()
exec:(2  supermap 4 [8 7 2] integer_mod 8) int:() int_vec:()  ; supermap don't like "2"
exec:(supermap 4 [8 7 2] integer_mod 8) int:(2) int_vec:()  
exec:(4 supermap [8 7 2] integer_mod 8) int:(2) int_vec:()  ; supermap don't like "4"
exec:(supermap [8 7 2] integer_mod 8) int:(4 2) int_vec:() 
exec:([8 7 2] supermap integer_mod 8) int:(4 2) int_vec:() ; supermap don't like "[8 7 2]"
exec:(supermap integer_mod 8) int:(4 2) int_vec:([8 7 2]) ; supermap likes integer_mod!
exec:((map integer_mod SCALAR INT onto INT_VECTOR) 8) int:(4 2) int_vec:([8 7 2])  ; closure formed
exec:(8) int:(2) int_vec:([8%4 7%4 2%4])  ; closure arguments taken from stacks
exec:() int:(8 2) int_vec:([8%4 7%4 2%4])  ; boring old 8
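
The trace above could be simulated with a toy interpreter like this (a hedged Python sketch, not Clojush; run_supermap and the literal-dispatch rules are invented for illustration, and the closure step assumes an integer and a vector are on their stacks when a mappable instruction surfaces):

```python
def run_supermap(exec_stack, mappable):
    ints, int_vecs = [], []              # tops of stacks at index 0
    while exec_stack:
        top = exec_stack.pop(0)
        if top == "supermap":
            if not exec_stack:
                continue                 # nothing left to scan; discard
            nxt = exec_stack.pop(0)
            if isinstance(nxt, str) and nxt in mappable:
                # closure formed and applied: map the function over the top
                # vector, taking the scalar argument from the integer stack
                n = ints.pop(0)
                int_vecs[0] = [mappable[nxt](x, n) for x in int_vecs[0]]
            else:
                # supermap doesn't like `nxt`: swap positions and rescan
                exec_stack[:0] = [nxt, "supermap"]
        elif isinstance(top, list):
            int_vecs.insert(0, top)      # vector literal goes to int_vec
        else:
            ints.insert(0, top)          # integer literal goes to int
    return ints, int_vecs
```

Running it on the traced program reproduces the final state: int:(8 2) and int_vec:([0 3 2]), i.e. [8%4 7%4 2%4].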
@thelmuth (Collaborator) commented Jul 18, 2015

Before reading too deep into your post, I think here's one way we're differing on these thoughts: You seem to want to ensure that the instruction(s) used as the mapping function output the right type and will find the right types. Let me come back to this idea in a second, at [1] below.

I would personally go down another track entirely. Yes, I would somehow indicate the collection of the output of the map. Not sure exactly how -- there used to be a :types stack in Push that would be helpful here, but maybe there's a better way. You could have "baked-in" input and output types -- instructions like enumerator_map_integer_vector_to_string. Maybe this would result in "too many" enumerator instructions. We currently have ~6 enumerable types which would mean 6^2 = 36 such map instructions, which might cross the "too many" threshold.

Another idea that just came to mind is to somehow use epigenetic markers to denote the input/output types for higher-order functions. This might be overkill/weird, but it could also potentially be a nice way to only have 1 map instruction, but be able to have it define any input/output combinations.

[1] Ok, back to this thought. Personally, I'd lean toward letting evolution "figure it out" regarding the correct output types. You give the map instruction a block of code, and if it doesn't leave the correct output on the right stack, tough noogies. I don't know if the penalty would simply be a failing map instruction, or a shorter output vector, or what. Same with a filter instruction -- if there's no boolean on the :boolean stack, default to false and don't include that item. It seems like this way would avoid a lot of the complications that you're getting into. Now that I've read more, maybe this suggestion is like your (1) in your list?
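
That "default to false" filter semantics might look like this (an illustrative Python sketch; push_filter and run_predicate are hypothetical stand-ins for running an evolved Push predicate against each item):

```python
def push_filter(items, run_predicate):
    """Keep an item only if the predicate leaves a truthy value on top of the
    :boolean stack; if it leaves nothing there, default to False and drop it."""
    kept = []
    for item in items:
        boolean_stack = run_predicate(item)   # the :boolean stack afterward
        verdict = boolean_stack[0] if boolean_stack else False
        if verdict:
            kept.append(item)
    return kept

# e.g. a predicate that "forgets" to push a boolean for odd items:
pred = lambda x: [True] if x % 2 == 0 else []
```

Here items whose predicate leaves the :boolean stack empty are simply dropped, matching the "tough noogies" penalty described above.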

Another problem I'm seeing with your suggestion is that you'd (well, I'd) often want to map a block of instructions, not just a single instruction. Your push-forthy ideas seem like they'd be even crazier to do with "more than one instruction functions".

Another crazy thought: what if we had a :function stack? The functions would be blocks of code with input and output types, and would execute in their own scope so as not to disrupt the rest of the program. AHA! This actually sounds like something we already have, believe it or not -- the :environments stack, which is something @lspector and I were working on year(s) ago, with the purpose of being able to execute a block of code without disrupting other stack contents -- a local scope. We were doing it for other reasons (I think it was geometric semantic crossover?), but this could totally be co-opted for this purpose. I don't have time to think about the ramifications right now, but this could be a pretty elegant solution.

Ugh, I should be working on my defense talk rather than this, but this is so much more interesting. In fact, I should probably put off all of this stuff until after my defense. But, I don't want to put you on hold if now is when you can work on it. Anyway, if I don't respond for a while, that's why. 😿

@thelmuth (Collaborator) commented Jul 18, 2015

On a more concrete note, I just realized that any instructions you define that you expect to use a parenthesized block of code from the :exec stack, such as enumerator_map_exec, need to have metadata indicating how many parenthesized blocks they require. See exec_if and exec_dup for examples.

@Vaguery (Contributor) commented Jul 18, 2015

Dude. DEFENSE.

@Vaguery (Contributor) commented Jul 18, 2015

Major clarification

I just was chatting with @NicMcPhee and I realize you might think I am considering this current branch for submission. This enumerator type was simply intended as a design spike for a more robust suite of types and instructions that actually do what we discover they should do. It's not going to be submitted for a Pull Request, and all the code will be thrown away! It's just a way of sketching in code, and eliciting points of "resistance" from the codebase and from you folks.

I quite like the idea of a proper higher-order function, along the lines Tom and I are sketching above. But that would be something I'd do in a test-driven way, and using a lot better refactoring techniques. Please recall this is the first Clojure (or Lisp or Java) code I've ever written.

Never use the first code!

@Vaguery (Contributor) commented Jul 19, 2015

@thelmuth after telling you to go practice, I now want to ask you to explain your idea for a sandboxed :function with a bit more of an example. Walk me through something by hand.

Also, I don't understand the comment about the "parenthesized block from :exec", above. I don't actually know what that is. Are you talking about :code literals on the exec stack? For instance in an interpreter

exec:  1 2 exec_quote ( 3 int_mod 4 ) int_add ( )

( 3 int_mod 4 ) and ( ) are both code literals. The Push interpreter unwraps those into their components when it encounters them, much like Lisp, I always assumed, right?

To determine the effective arity and/or type of a given code block, I assume you'd have to do a sandboxed dry run, especially with the contingent statefulness of certain instructions' outcomes. But (3 int_mod 4) has a (non-normalized) type of [integer]->[integer,integer], right?

@thelmuth (Collaborator) commented Jul 19, 2015

Also, I don't understand the comment about the "parenthesized block from :exec", above. I don't actually know what that is. Are you talking about :code literals on the exec stack? For instance in an interpreter

Ah, I was talking about how the new Plush genomes need to know how many code blocks to open. Let me explain.

When we transitioned into linear Plush genomes, a big reason we did so was to make it so that instructions that expect parenthesized code blocks (maybe what you're calling "code literals") will always have them. When evolving Push code directly, we often found that despite the ability to evolve semantic code blocks, we often (or even usually) found that :exec stack instructions that could manipulate code blocks were followed instead by a single instruction. This makes it a lot harder to do interesting things with :exec stack manipulations, since they often are operating on single instructions instead of code blocks.

So, now we get to Plush genomes, which are linear lists of instruction maps. Since they're linear, and we need to translate them into hierarchical Push programs, we have to have a way to indicate where parentheses should be. We considered a few options, but went with this: any instruction that can make use of a code block from the :exec stack implicitly opens a parenthesis pair. Then, in each instruction map, we have an epigene that is an integer that tells how many closing parentheses to put after that instruction -- the :close epigene.

Some instructions, for example exec_if, take more than one code block (exec_if takes 2). In this case, if the top :boolean is true, the second block of code is discarded, and if false the first block of code is discarded. Thus, exec_if will always operate on blocks of code. They might be empty or have only one instruction, but hopefully evolution will make use of code blocks with multiple instructions more often this way.

We specify the number of code blocks that an instruction uses by the :parentheses metadata in the instruction, like I linked earlier. So, if you want enumerate_map_exec to take one block of code off the :exec stack (which seems natural to me), then you'll want to have :parentheses 1 in its metadata.

Grok? We need to write this up, but haven't yet besides a small section of my dissertation.
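
The translation scheme described above might be sketched as follows (an illustrative Python approximation, not the actual Clojush translator; genes are simplified to (instruction, :parentheses, :close) tuples, excess closes are ignored, and any blocks still open at the end are simply left open):

```python
def plush_to_push(genes):
    """Translate a linear Plush genome into a nested Push program.
    An instruction whose :parentheses metadata is > 0 implicitly opens its
    first code block; each unit of the :close epigene closes one open paren,
    opening the next owed block if the instruction expects more than one."""
    program, stack = [], []     # stack holds (parent_list, blocks_still_owed)
    cur = program
    for instr, parens, close in genes:
        cur.append(instr)
        if parens > 0:          # implicitly open the first expected block
            block = []
            cur.append(block)
            stack.append((cur, parens - 1))
            cur = block
        for _ in range(close):  # apply the :close epigene
            if not stack:
                break           # extra closes are ignored
            parent, owed = stack.pop()
            if owed > 0:        # open the next block the instruction is owed
                block = []
                parent.append(block)
                stack.append((parent, owed - 1))
                cur = block
            else:
                cur = parent
    return program
```

For example, exec_if (with :parentheses 2) followed by two genes that each close one paren translates to exec_if followed by two parenthesized blocks, as described above.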

@Vaguery (Contributor) commented Jul 19, 2015

Partially grok. I totally believe I get how Plush genomes work (but yes, write it up)... but instructions that take arguments from :exec take items from :exec, right? For example you mention exec_if: it doesn't take two code blocks, it takes two items of any sort from the :exec stack, doesn't it?

So it seems like you're saying the Plush genome "wants" there to be two code literals ("parenthesized code blocks") on the :exec stack when the interpreter executes it, but it's definitely not guaranteed at runtime. Right? (follow links)

My confusion comes from a sense that one of the central design principles of Push is that the genome isn't anything to do with the interpreter's behavior: that instructions (as executed) have to be capable of handling any argument that might possibly exist on the stack from which it's taken. The interesting thing, though not unique by design, about :code and :exec is that items of any type can exist on them.

@thelmuth (Collaborator, Author) commented Jul 19, 2015

For example you mention exec_if: it doesn't take two code blocks, it takes two items of any sort from the :exec stack, doesn't it?

Correct. But, barring :exec stack manipulations, we force exec_if to be followed by two code blocks, so it will almost always be followed by them.

But yes, you are correct, the instructions you linked could make it so that an exec_if statement is not followed by a code block but a single instruction (or integer literal, etc.), which is fine. (Note, BTW, that the first two instructions you linked do require code blocks!)

To help you wrap your mind around this better, all of this prescription of parentheses is a translation-time guarantee, not a run-time guarantee. Does that help?

@lspector (Owner) commented Jul 19, 2015

Maybe helps to clarify in a different way (?): When @thelmuth says "we force exec_if to be followed by two code blocks" he means that the Plush->Push translation process will guarantee that in the post-translation Push code exec_if will be followed by two parenthesized sub-programs. This happens before Push program execution. When the Push program is actually executing, if exec and/or code instructions are present, then when an exec_if executes there might be anything (parenthesized or not) or nothing following it on the exec stack.

@Vaguery (Contributor) commented Jul 19, 2015

Yes, it does help, though I'm still not 100% sure I've grokked your idea for mappers as sketched in this comment.

It does seem, though, that we ("we" 😄) could implement a simple convolve instruction right now that

  1. took the next item off :exec
  2. stopped unless it was a bare instruction
  3. found a collection of the same type as its first argument (possibly subject to caveats below)
  4. found scalars of the same type as its other argument(s), if any
  5. applied the function to every element of the vector, using the scalars for every application

Caveats: This would probably be best if we only considered instructions that produce scalar outputs, because vector_of_environments, while very sexy in theory, is probably out of reach for the moment.

Concern: One wouldn't want to undertake a huge functional operation without building it out onto the :exec stack in stages, the way we do for :exec_do_* and so forth. Otherwise the counters would be off.
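Steps 3-5 amount to a map with the scalar argument(s) held fixed. A minimal host-language sketch (convolve and instr-fn are hypothetical names, not Clojush instructions, and this ignores the staged :exec unrolling raised in the concern above):

```clojure
;; Hypothetical sketch of steps 3-5: apply a scalar-producing
;; instruction to every element of a same-typed vector, reusing the
;; same scalar argument(s) for each application.
(defn convolve
  [instr-fn scalars coll]
  (mapv #(apply instr-fn (conj scalars %)) coll))

;; boolean_xor on two booleans behaves like not=:
(convolve not= [true] [false true false])
;; => [true false true]
```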

@Vaguery (Contributor) commented Jul 19, 2015

Working out an example:

exec:([false true false] true map boolean_xor)
exec:(true map boolean_xor) vecb:([false true false])
exec:(map boolean_xor) vecb:([false true false]) bool:(true)
exec:(boolean_xor) vecb:([false true false]) bool:(true)      <- map fires
...?...                                                       <- how does mapping "unroll"
exec:() vecb:([true false true]) bool:()                      <- [true xor false, true xor true, true xor false]
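One plausible way the mapping could unroll is a toy recursion in the spirit of the exec_do*vector_* instructions (not actual Clojush code; on the real :exec stack this would be built out as a continuation rather than host-language recursion):

```clojure
;; Toy model of the "?" step: apply the mapped instruction to the head
;; of the vector, then re-invoke the (shrunken) map on the rest,
;; collecting results. boolean_xor on two booleans behaves like not=.
(defn unroll-map
  [step v]
  (if (empty? v)
    []
    (into [(step (first v))] (unroll-map step (rest v)))))

(unroll-map #(not= true %) [false true false])
;; => [true false true]
```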
@lspector (Owner) commented Jul 19, 2015

Another crazy thought: what if we had a :function stack? The functions would be blocks of code with input and output types, and would execute in their own scope so as not to disrupt the rest of the program. AHA! This actually sounds like something we already have, believe it or not -- the :environments stack, which is something @lspector and I were working on year(s) ago, with the purpose of being able to execute a block of code without disrupting other stack contents -- a local scope. We were doing it for other reasons (I think it was geometric semantic crossover?), but this could totally be co-opted for this purpose. I don't have time to think about the ramifications right now, but this could be a pretty elegant solution.

This came up in a conversation at GECCO too (I forget with whom).

I definitely think there are exciting opportunities in using the environment stack plus a little bit of something that we haven't fully worked out yet. One idea is to provide a minimal way to get an environment and return instructions so that the package behaves as a function.

Environments get the full Push state, with all stacks, any parts of which they can use or ignore. But changes to that state are thrown away when we return from the environment, except for changes specified by explicit return... and return...pop instructions within the environment. (When leaving an environment we first restore the pre-environment state and then make the changes specified by return... and return...pop instructions.)

We had discussed packaging not only environment and return instructions into single function-creating instructions, but also tying this into the tag system so that the function would be tagged and therefore callable by tag reference.

I don't think we ever implemented any of this (beyond getting environments and return instructions to work on their own) but I still do think it's a worthwhile idea. And it might also open the door to more elegant formulations of ideas involving higher-order functions.

@thelmuth (Collaborator, Author) commented Jul 19, 2015

I was working on a post saying everything @lspector said, but then got interrupted by having to bathe baby Ben, etc. He gave a good overview of what current environments can do, and it sounds like his other ideas are along extremely similar lines to mine, including the parts about needing to make environments slightly more function-like and attaching tags to them.

For kicks here's the stuff I had already typed, which goes into some detail about how environments work:

after telling you to go practice, I now want to ask you to explain your idea for a sandboxed :function with a bit more of an example. Walk me through something by hand.

Ok, now this. First, let's talk about the :environment stack and environments, which is what we already have. The idea behind environments is to execute some code in a local scope, so that the code doesn't affect the other stacks.

There are two ways to start an environment. environment_new takes the top block of code on the :exec stack and executes it in a new environment. The environment_begin instruction does the same thing, but uses the entire rest of the :exec stack. Environments can be ended in two ways: running out of code on the :exec stack, and reaching an environment_end instruction.

When an environment starts, it pushes the current Push state onto the :environment stack. When it finishes, it pops the top of the environment stack, restoring the state as it was before the environment started. The only way for code inside an environment to affect the rest of the program is to use return_X and return_Y_pop instructions. The return_X instructions have the name of a stack instead of X. When such an instruction is executed, that type is put onto the :return stack, and when the environment finishes it will put the top item on the X stack onto the X stack in the base environment. The return_Y_pop instructions are similar, except they specify a stack to pop after returning from the environment, allowing a program to "consume" arguments.
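That save/restore-plus-returns behavior can be modeled in a few lines. This is a toy simplification in which the Push state is just a map of stacks and returns are recorded as explicit [stack item] pairs; none of these names are Clojush's:

```clojure
(defn run-in-environment
  "Toy model of an environment: run body-fn on a copy of the state,
  throw its changes away, and apply only the items it explicitly
  placed on the :return stack."
  [state body-fn]
  (let [inner (body-fn (assoc state :return []))]
    (reduce (fn [s [stack item]] (update s stack conj item))
            state
            (:return inner))))

(defn example-body
  [st]
  (-> st
      (update :integer conj 99)               ; local change, discarded
      (update :return conj [:integer 42])))   ; explicit return

(run-in-environment {:integer [1 2]} example-body)
;; => {:integer [1 2 42]}
```

Note that the 99 pushed inside the environment vanishes when the outer state is restored; only the explicitly returned 42 survives.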

@Vaguery (Contributor) commented Jul 19, 2015

Seems simple enough. Leaves room for some finesses, like pushing the entire state or just components.

We really need to work on the docs, since there's almost no way for anybody to suss all this unfulfilled potential out of the codebase as it exists right now :(

@lspector (Owner) commented Jul 21, 2015

I just copied Tom's description above to the Orientation category of push-language.hampshire.edu.

@Vaguery (Contributor) commented Jul 21, 2015

I stepped away for a couple of days, and came back this morning to finish this first design spike. Basically what I wanted from the exercise is to add a new type and some associated instructions, add a new problem definition that invokes those, and be able to run that... and then throw it away and do it again, with a goal of improving the experience.

90% there.

So now it's dying with a stack trace that strongly implies to me that the creation of random genomes requires some sort of extra information. Can we talk about that? Am I supposed to provide additional information for this random code generator, for instance? I haven't written any epigenetic markers, personally, on the assumption that defaults were in place.

[the instructions look like they're getting loaded up here]
...
Generating initial population...
Processing generation: 0
OutOfMemoryError Java heap space
    java.util.Arrays.copyOf (Arrays.java:2882)
    java.lang.AbstractStringBuilder.expandCapacity (AbstractStringBuilder.java:100)
    java.lang.AbstractStringBuilder.append (AbstractStringBuilder.java:390)
    java.lang.StringBuffer.append (StringBuffer.java:224)
    java.io.StringWriter.write (StringWriter.java:84)
    clojure.core/print-simple (core_print.clj:128)
    clojure.core/fn--5398 (core_print.clj:131)
    clojure.lang.MultiFn.invoke (MultiFn.java:231)
    clojure.core/pr-on (core.clj:3322)
    clojure.core/print-sequential (core_print.clj:58)
    clojure.core/fn--5453 (core_print.clj:279)
    clojure.lang.MultiFn.invoke (MultiFn.java:231)
    clojure.core/pr-on (core.clj:3322)
    clojure.core/print-sequential (core_print.clj:58)
    clojure.core/fn--5406 (core_print.clj:143)
    clojure.lang.MultiFn.invoke (MultiFn.java:231)
    clojure.core/pr-on (core.clj:3322)
    clojure.lang.Var.invoke (Var.java:419)
    clojure.lang.RT.print (RT.java:1748)
    clojure.lang.RT.printString (RT.java:1727)
    clojure.lang.ASeq.toString (ASeq.java:21)
    clojure.core/str (core.clj:513)
    clojush.util/open-close-sequence-to-list (util.clj:263)
    clojush.translate/translate-plush-genome-to-push-program/fn--977 (translate.clj:87)
    clojush.translate/translate-plush-genome-to-push-program (translate.clj:67)
    clojush.translate/population-translate-plush-to-push/fn--984/fn--985 (translate.clj:128)
    clojure.core/binding-conveyor-fn/fn--4107 (core.clj:1839)
    clojure.lang.Agent$Action.doRun (Agent.java:114)
    clojure.lang.Agent$Action.run (Agent.java:163)
    java.util.concurrent.ThreadPoolExecutor$Worker.runTask (ThreadPoolExecutor.java:895)
@Vaguery (Contributor) commented Jul 21, 2015

By the way, trying to suss out where that was happening, I set :max-generations 0 and it still dies with the same stack trace.

I've pushed the broken state to the current branch, so I can spend some time later trying to identify the problem.

@thelmuth (Collaborator, Author) commented Jul 21, 2015

The function you linked to is a single instruction map generator. If you are using it thinking it will create random genomes, you're going to have problems. What you want is random-plush-genome.

If you know all of this, let me know and I'll dig further. It would also help to have a pointer to the code where you're making the random code.

@Vaguery (Contributor) commented Jul 21, 2015

I'm not actively invoking anything, except via lein run clojush.problems.tozier.float-regression-with-enumerators. All I'm trying to do is run a pushgp problem that uses the new instructions and types.

See the file problems/tozier/float_regression_with_enumerators.clj in this branch

@thelmuth (Collaborator, Author) commented Jul 21, 2015

First of all, I'm not getting the same exception you did -- did you change the code since then?

The exception I'm getting is that you included the whole set of all instructions, @registered-instructions, in your list of instructions, instead of concat-ing it in. I'll make a pull request to fix this shortly. Once I fixed that, it seemed to run fine.

One last note: you can easily change arguments on the command line, which makes testing things easier, instead of having to stick things like :max-generations 0 in the problem file. Here's how:

lein run clojush.problems.tozier.float-regression-with-enumerators :max-generations 10 :population-size 20
@thelmuth (Collaborator, Author) commented Jul 21, 2015

Oh, I also forgot to mention that you're overwriting the default :epigenetic-markers, which is [:close] by default. The way you're doing it will result in programs with all closing parentheses coming at the end of the program. I'd recommend simply removing that line from your argmap.

@Vaguery (Contributor) commented Jul 21, 2015

LOL. I copied the argmap from another example, and AFAIK only changed it by putting the require all instructions thing.

@Vaguery (Contributor) commented Jul 21, 2015

The list concatenation did it.

Do you want to explain why this isn't an open issue?

@thelmuth (Collaborator, Author) commented Jul 21, 2015

Do you want to explain why this isn't an open issue?

The problems that explicitly overwrite the :epigenetic-markers with an empty vector do so because their instruction sets include only instructions that do not make use of semantic parentheses. So, including :close epigenetic markers wouldn't cause any problems, but would include a lot of unnecessary overhead, so we turn them off. All other problems use the default :epigenetic-markers from clojush/pushgp/pushgp.clj, which has them turned on.
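For illustration, the difference amounts to a single argmap key (fragment only, not a complete argmap; the default value is set in clojush/pushgp/pushgp.clj as described):

```clojure
;; default: genomes carry :close genes, enabling semantic parentheses
:epigenetic-markers [:close]

;; for instruction sets that never open parens, skip the overhead
:epigenetic-markers []
```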

@Vaguery (Contributor) commented Jul 21, 2015

Excellent! So now I've worked all the way from defining a new type, adding instructions, getting them registered, and finally using them in a running example. That completes the first pass!

Have you run the tests I wrote? I'd like to make sure (since you have a copy now) that they run—as long as you've made the adjustment spelled out at the top of the midje tutorial.

@thelmuth (Collaborator, Author) commented Jul 21, 2015

I just tried lein midje for the first time after adding that thing to my lein profile (which was non-trivial, since it turns out you have to merge it with existing plugins, of which I had one).

When I run lein midje in your enumerators branch, I get this:

Exception in thread "main" java.lang.Exception: Duplicate Push instruction defined:a0
at clojush.pushstate$register_instruction.invoke(pushstate.clj:42)
at clojush.problems.boolean.mux_6$eval13554.invoke(mux_6.clj:40)
at clojure.lang.Compiler.eval(Compiler.java:6619)
at clojure.lang.Compiler.eval(Compiler.java:6608)
at clojure.lang.Compiler.load(Compiler.java:7064)
at clojure.lang.RT.loadResourceScript(RT.java:370)
at clojure.lang.RT.loadResourceScript(RT.java:361)
at clojure.lang.RT.load(RT.java:440)
at clojure.lang.RT.load(RT.java:411)
at clojure.core$load$fn__5018.invoke(core.clj:5530)
at clojure.core$load.doInvoke(core.clj:5529)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$load_one.invoke(core.clj:5336)
at clojure.core$load_lib$fn__4967.invoke(core.clj:5375)
at clojure.core$load_lib.doInvoke(core.clj:5374)
at clojure.lang.RestFn.applyTo(RestFn.java:142)
at clojure.core$apply.invoke(core.clj:619)
at clojure.core$load_libs.doInvoke(core.clj:5413)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at clojure.core$apply.invoke(core.clj:619)
at clojure.core$require.doInvoke(core.clj:5496)
at clojure.lang.RestFn.invoke(RestFn.java:421)
at midje.repl$load_facts$fn__7980.invoke(repl.clj:206)
at midje.repl$load_facts.doInvoke(repl.clj:192)
at clojure.lang.RestFn.invoke(RestFn.java:397)
at user$eval8043.invoke(form-init5683710115900330494.clj:1)
at clojure.lang.Compiler.eval(Compiler.java:6619)
at clojure.lang.Compiler.eval(Compiler.java:6609)
at clojure.lang.Compiler.load(Compiler.java:7064)
at clojure.lang.Compiler.loadFile(Compiler.java:7020)
at clojure.main$load_script.invoke(main.clj:294)
at clojure.main$init_opt.invoke(main.clj:299)
at clojure.main$initialize.invoke(main.clj:327)
at clojure.main$null_opt.invoke(main.clj:362)
at clojure.main$main.doInvoke(main.clj:440)
at clojure.lang.RestFn.invoke(RestFn.java:421)
at clojure.lang.Var.invoke(Var.java:419)
at clojure.lang.AFn.applyToHelper(AFn.java:163)
at clojure.lang.Var.applyTo(Var.java:532)
at clojure.main.main(main.java:37)
Subprocess failed

Do you not get this error? Is it a midje thing, or is it because it's loading in a bunch of Digital Multiplier problem files, which is what it sounds like?

@Vaguery (Contributor) commented Jul 21, 2015

Somewhere (probably the top of every test file) I said to only run lein midje :autotest test until we figured out what the problem with clojush is. (It is a clojush error, in the sense that by default lein midje will try to run both your src and test trees, because many people who write tested Clojure apparently want tests intermingled with their source code.)

Also, lein midje and clojush may be the most irritating things in the world to type on a computer with autocorrect turned on.

@thelmuth (Collaborator, Author) commented Jul 21, 2015

Yes, you did mention that somewhere (and in the code), and I had the feeling I was doing something wrong but I had forgotten where to look for it.

After doing that, it looks like it's working great! It looks like midje runs continuously and checks things when files are saved? Nifty! It says it's passing all tests.

@Vaguery (Contributor) commented Jul 21, 2015

Good, now throw that branch away :)
