Add Higher-Order Function Instructions #147
We do already have |
There are also |
I would be happy to add these. I've done them at least twice in The bigger question, of course, is how do you want to handle big swathes of functionality like this? A few instructions at a time, or a suite all together? |
Personally, I'd say do it however you like! If you want feedback, a little at a time might be good, but most things "never get done", so if you're going to "get it done", then do it your way :) |
So let it be written. |
Please be sure, though, that the execution of a single instruction is always well bounded in space and time, since we only check for limits being exceeded between instructions. I'm pretty sure we've been careful to do this -- for the definition of "well bounded" that has mattered for us in practice -- for every instruction currently in Clojush, but I also think (though I'm not sure) that @Vaguery has had a somewhat different philosophy on this (maybe because of different ways to catch things when they run out of control). |
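To make the concern above concrete, here is a minimal Python sketch (not Clojush code) of an interpreter loop that checks limits only between instructions. The names `run`, `total_size`, `max_steps`, and `max_size` are hypothetical and chosen for illustration; the point is only that nothing interrupts an instruction once it starts, so each one has to bound its own work.

```python
from typing import Callable, Dict, List

State = Dict[str, list]               # hypothetical: stack name -> list, top at the end
Instruction = Callable[[State], State]


def total_size(state: State) -> int:
    """Count every item on every stack."""
    return sum(len(stack) for stack in state.values())


def run(program: List[Instruction], state: State,
        max_steps: int = 100, max_size: int = 1000) -> State:
    """Run instructions in order, checking limits only BETWEEN instructions."""
    steps = 0
    for instruction in program:
        if steps >= max_steps or total_size(state) > max_size:
            break                      # the only place a limit is ever noticed
        state = instruction(state)     # nothing can interrupt this call
        steps += 1
    return state


def dup_integer(state: State) -> State:
    """Well bounded: adds at most one item per execution."""
    ints = state["integer"]
    if not ints:
        return state
    return {**state, "integer": ints + [ints[-1]]}


final = run([dup_integer] * 10, {"integer": [7]}, max_steps=3)
print(final["integer"])
```

An instruction that looped internally over an unbounded structure would never reach the limit check, which is why the bound has to hold per instruction.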
My idea right now is to bring over something like the Briefly: am When created, the A standard library of pull the top
Note: in most of the implementations I've worked on where the goal was extending Push, I did not want anything reaching into the associated sequence and changing it. In Anyway, @lspector, there is no more than one additional copy of the sequence ever, and so I think the computational implications of this approach are in keeping with the Push philosophy at least as much as |
I didn't fully digest all of the details, but this looks cool to me. I like the idea of the enumerator stack. It always felt a little scary to me that indices for the exec iteration instructions are kept on the exec and integer stack, where they can be messed around with easily. An enumerator stack that keeps track of that stuff without multipurposing the exec and integer stacks seems like a clean way to avoid that mess. Thumbs up! |
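A rough Python sketch of the enumerator idea discussed above, under stated assumptions: the enumerator holds one private, immutable copy of the sequence plus a pointer, and lives on its own stack so the `exec` and `integer` stacks are not used for bookkeeping. The class name, the field names, and the choice to discard an exhausted enumerator are all illustrative guesses, not the design proposed in the thread.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Enumerator:
    items: Tuple[int, ...]   # the single private copy of the sequence; never mutated
    pointer: int = 0         # index of the next item to emit


def enumerator_from(seq) -> Enumerator:
    """Copy the sequence once, so later stack changes cannot reach into it."""
    return Enumerator(items=tuple(seq))


def enumerator_next(state: dict) -> dict:
    """Push the current item onto the integer stack and advance the pointer."""
    enums = state.get("enumerator", [])
    if not enums:
        return state                                  # no enumerator: no-op
    top = enums[-1]
    if top.pointer >= len(top.items):
        return {**state, "enumerator": enums[:-1]}    # assumption: discard when exhausted
    advanced = replace(top, pointer=top.pointer + 1)
    return {**state,
            "integer": state.get("integer", []) + [top.items[top.pointer]],
            "enumerator": enums[:-1] + [advanced]}


start = {"integer": [], "enumerator": [enumerator_from([10, 20])]}
step1 = enumerator_next(start)
step2 = enumerator_next(step1)
step3 = enumerator_next(step2)
```

Each call does a constant amount of work, and the loop position cannot be disturbed by instructions that manipulate the integer stack.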
Is there a pre-existing idiom (or convenience function) for adding a new type to |
Nope. The only current way that I'm aware of is to just add it to that list of stack types. |
I could use some feedback on the work I've done so far. It's not in any sense "done", mainly because there's a huge amount of repetition in the instruction definitions, but I'd like to know if it makes sense so far to an external reader familiar with Clojush's overarching structure. You can compare to the more-or-less intact Clojush state using the branch comparison view |
Definitely looking like a good start! A few comments:
Otherwise, it's hard to tell if things look "right" without comments about what they're supposed to do. |
Do you know if there is room inside Good catch on the return state. I was just getting around to getting rid of all the repetition that masked that from me. Do the midje tests, which explicitly say what everything should do in every edge case, help clarify what the code is intended to do? I will also add comments because that's the standard and because the docs are generally pretty bad, but the intention is always best spelled out in the test suite. |
Do you know if there are any instructions that don't start by setting the metadata of their requirements, and (and I mean when they're "unrolled", too; I see for example the inner function of |
I don't know! You could always try and see if anything breaks.
I didn't check, but will. Though that assumes there aren't any typos/bugs in the test suite. And also: English is easier to understand than someone else's code.
Not that I know of. But, there's no reason you couldn't use another idiom. Each instruction has to be a function that takes a push state and returns a push state, and needs metadata associated with it that tells what stacks it uses, but you can get there any way you like. |
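The contract described above (a function from push state to push state, plus metadata naming the stacks it uses) can be sketched in Python as follows. This is an illustration of the shape only, not Clojush's actual definition idiom: the decorator name `instruction` and the attribute `stack_types` are hypothetical, and returning the state unchanged when arguments are missing follows the usual Push convention.

```python
from typing import Callable, Dict

State = Dict[str, list]   # hypothetical: stack name -> list, top at the end


def instruction(*stack_types: str):
    """Attach the stacks an instruction uses as metadata (hypothetical attribute)."""
    def wrap(fn: Callable[[State], State]) -> Callable[[State], State]:
        fn.stack_types = tuple(stack_types)
        return fn
    return wrap


@instruction("integer")
def integer_add(state: State) -> State:
    """Pop two integers and push their sum; always returns a state."""
    ints = state.get("integer", [])
    if len(ints) < 2:
        return state      # not enough arguments: behave as a no-op
    return {**state, "integer": ints[:-2] + [ints[-2] + ints[-1]]}


print(integer_add({"integer": [1, 2, 3]}))
print(integer_add.stack_types)
```

Any idiom that ends with a state-to-state function carrying that metadata satisfies the contract, which is the flexibility the comment above points out.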
The |
Was just about to open this as an issue :) |
I've written a little testing helper function called
I am making an effort to use this in every test call of an instruction, so that every returned value from every edge case is validated. Obviously this is too much work for the interpreter to do in the normal course of running a program (?), but it's really pretty spooky that this sort of thing (blithely ignoring nil arguments) is still lurking in there. |
Was that last comment supposed to go in #153? |
Will do. A few questions of preference:
What have I missed? |
@thelmuth No, it was intended for here, as a response to the bugs you caught! |
This sounds a bit crazy. Good crazy? I don't know. It would be a lot easier to have infinite looping behavior with the looping you're talking about. re: |
No, I'm sorry, I can't possibly see how anybody could do that. Good grief, man, these aren't computers we're... oh wait. Yes! |
While we're discussing things: as I've mentioned, this isn't my first go on the Enumerator ride. One set of instructions (from Just sayin'. My favorite is one I suppose would be called |
I'm all for this! While it might be sort of crazy, it seems like it could be very useful at times. |
There still should be some discussion of what The sort of canonical reducer is injecting |
I will postpone "looping" for the time being. |
I notice a lot of the instruction names are pretty long. Can I get away with renaming them |
Quick judgment call: Should instructions like |
@thelmuth Let's talk more about proper For numeric As I mentioned above, one approach would be to use metadata associated with the instructions themselves to facilitate this: What if a better The trick here is the same one that interferes with (or complicates)
The first one seems like it's not especially helpful, and unlikely to be very evolvable. The second one breaks a fundamental paradigm of the instructions we've written so far, which is that all arguments are pulled first and then we do the math. The third one opens a can of worms that makes this more like One way of thinking about the second and third paths, I suppose, would be to build a kind of "map closure" or "continuation" object out of a Call this mythic instruction The problem I've seen in The only tricks are, this feels pretty un-Pushy, and it means there needs to be a lot more metadata associated with instructions (and possible Let me walk through an example for clarity:
|
Before reading too deep into your post, I think here's one way we're differing on these thoughts: You seem to want to ensure that the instruction(s) used as the mapping function output the right type and will find the right types. Let me come back to this idea in a second, when you get to [1]. I would personally go down another track entirely. Yes, I would somehow indicate the collection of the output of the map. Not sure exactly how -- there used to be a Another idea that just came to mind is to somehow use epigenetic markers to denote the input/output types for higher-order functions. This might be overkill/weird, but it could also potentially be a nice way to only have 1 map instruction, but be able to have it define any input/output combinations. [1] Ok, back to this thought. Personally, I'd lean toward letting evolution "figure it out" regarding the correct output types. You give the map instruction a block of code, and if it doesn't leave the correct output on the right stack, tough noogies. I don't know if the penalty would simply be a failing map instruction, or a shorter output vector, or what. Same with a Another problem I'm seeing with your suggestion is that you'd (well, I'd) often want to map a block of instructions, not just a single instruction. Your push-forthy ideas seem like they'd be even crazier to do with "more than one instruction functions". Another crazy thought: what if we had a Ugh, I should be working on my defense talk rather than this, but this is so much more interesting. In fact, I should probably put off all of this stuff until after my defense. But, I don't want to put you on hold if now is when you can work on it. Anyway, if I don't respond for a while, that's why. 😿 |
Dude. DEFENSE. |
Major clarification: I was just chatting with @NicMcPhee, and I realize you might think I am considering this current branch for submission. This I quite like the idea of a proper higher-order function, along the lines Tom and I are sketching above. But that would be something I'd do in a test-driven way, using much better refactoring techniques. Please recall this is the first Clojure (or Lisp or Java) code I've ever written. Never use the first code! |
@thelmuth after telling you to go practice, I now want to ask you to explain your idea for a sandboxed Also, I don't understand the comment about the "parenthesized block from
To determine the effective arity and/or type of a given code block, I assume you'd have to do a sandboxed dry run, especially with the contingent statefulness of certain instructions' outcomes. But |
Ah, I was talking about how the new Plush genomes need to know how many code blocks to open. Let me explain. When we transitioned into linear Plush genomes, a big reason we did so was to make it so that instructions that expect parenthesized code blocks (maybe what you're calling "code literals") will always have them. When evolving Push code directly, we often found that despite the ability to evolve semantic code blocks, we often (or even usually) found that So, now we get to Plush genomes, which are linear lists of instruction maps. Since they're linear, and we need to translate them into hierarchical Push programs, we have to have a way to indicate where parentheses should be. We considered a few options, but went with this: any instruction that can make use of a code block from the Some instructions, for example We specify the number of code blocks that an instruction uses by the Grok? We need to write this up, but haven't yet besides a small section of my dissertation. |
Partially grok. I totally believe I get how Plush genomes work (but yes, write it up)... but instructions that take arguments from So it seems like you're saying the Plush genome "wants" there to be two code literals ("parenthesized code blocks") on the My confusion comes from a sense that one of the central design principles of Push is that the genome isn't anything to do with the interpreter's behavior: that instructions (as executed) have to be capable of handling any argument that might possibly exist on the stack from which it's taken. The interesting thing, though not unique by design, about |
Correct. But, barring But yes, you are correct, the instructions you linked could make it so that an To help you wrap your mind around this better, all of this prescription of parentheses is a translation-time guarantee, not a run-time guarantee. Does that help? |
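The translation-time guarantee discussed above can be illustrated with a simplified Python sketch. It is not the actual Clojush translator: the gene keys `"instruction"` and `"close"`, and the `blocks_required` table, are hypothetical stand-ins for the markers and metadata mentioned in the thread. The sketch opens a block after any instruction that requires one, opens the next required block when the current one closes, and pads with empty blocks if the genome ends early.

```python
from typing import Dict, List


def translate(genome: List[dict], blocks_required: Dict[str, int]) -> list:
    """Turn a linear genome into a nested program (nested lists = parentheses)."""
    root: list = []
    frames: list = []      # each frame: (parent list, blocks still to open afterwards)
    current = root

    def close_one() -> None:
        nonlocal current
        parent, remaining = frames.pop()
        if remaining > 0:                  # the instruction still needs another block
            new_block: list = []
            parent.append(new_block)
            frames.append((parent, remaining - 1))
            current = new_block
        else:
            current = parent

    for gene in genome:
        name = gene["instruction"]
        current.append(name)
        needed = blocks_required.get(name, 0)
        if needed > 0:                     # open the first required block right away
            new_block = []
            current.append(new_block)
            frames.append((current, needed - 1))
            current = new_block
        for _ in range(gene.get("close", 0)):
            if frames:                     # extra close markers are ignored
                close_one()

    while frames:                          # genome ended: close and pad what is left
        close_one()
    return root


program = translate(
    [{"instruction": "exec_if"},
     {"instruction": "integer_add", "close": 1},
     {"instruction": "integer_sub", "close": 1},
     {"instruction": "integer_mult"}],
    {"exec_if": 2},
)
print(program)
```

Because the blocks are supplied during translation, the guarantee says nothing about what is on the stacks by the time the instruction actually runs, which matches the translation-time versus run-time distinction made above.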
Maybe helps to clarify in a different way (?): When @thelmuth says "we force |
Yes, it does help, though I'm still not 100% sure I've grokked your idea for mappers as sketched in this comment. It does seem, though, that we ("we" 😄) could implement a simple
Caveats: This would probably be best if we only considered instructions that produce scalar outputs. Because Concern: One wouldn't want to undertake a huge functional operation without building it out onto the |
Working out an example:
|
This came up in a conversation at GECCO too (I forget with whom). I definitely think there are exciting opportunities in using the environment stack plus a little bit of something that we haven't fully worked out yet. One idea is to provide a minimal way to get an environment and Environments get the full Push state, with all stacks, any parts of which they can use or ignore. But changes to that state are thrown away when we return from the environment, except for changes specified by explicit We had discussed packaging not only environment and return instructions into single function-creating instructions, but also tying this into the tag system so that the function would be tagged and therefore callable by tag reference. I don't think we ever implemented any of this (beyond getting environments and |
I was working on a post saying everything @lspector said, but then got interrupted by having to bathe baby Ben, etc. He gave a good overview of what current environments can do, and it sounds like his other ideas are along extremely similar lines to mine, including the parts about needing to make environments slightly more function-like and attaching tags to them. For kicks here's the stuff I had already typed, which goes into some detail about how environments work:
Ok, now this. First, let's talk about the There are two ways to start an environment. environment_new takes the top block of code on the When an environment starts, it pushes the current Push state onto the |
Seems simple enough. Leaves room for some finesses, like pushing the entire state or just components. We really need to work on the docs, since there's almost no way for anybody to suss all this unfulfilled potential out of the codebase as it exists right now :( |
I just copied Tom's description above to the Orientation category of push-language.hampshire.edu. |
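The environment behaviour described in the comments above (save the full state on entry, throw away changes on return except those explicitly marked for return) can be sketched in Python like this. It is a simplified illustration, not Clojush's implementation: the function names, the `"environment"` and `"return"` keys, and the `(stack_name, item)` encoding of returned values are assumptions made for the example.

```python
import copy
from typing import Dict

State = Dict[str, list]   # hypothetical: stack name -> list, top at the end


def environment_begin(state: State) -> State:
    """Save a copy of the whole state, then continue in a scratch copy."""
    saved = copy.deepcopy(state)
    inner = copy.deepcopy(state)
    inner.setdefault("environment", []).append(saved)
    inner["return"] = []               # nothing is marked for return yet
    return inner


def return_integer(state: State) -> State:
    """Mark the top integer so that it survives leaving the environment."""
    ints = state.get("integer", [])
    if not ints:
        return state
    marked = state.get("return", []) + [("integer", ints[-1])]
    return {**state, "integer": ints[:-1], "return": marked}


def environment_end(state: State) -> State:
    """Restore the saved state, keeping only explicitly returned items."""
    envs = state.get("environment", [])
    if not envs:
        return state
    restored = copy.deepcopy(envs[-1])
    for stack_name, item in state.get("return", []):
        restored.setdefault(stack_name, []).append(item)
    return restored


outer = {"integer": [1, 2]}
inner = environment_begin(outer)
inner["integer"] = inner["integer"] + [99, 5]   # work done inside the environment
inner = return_integer(inner)                   # only the 5 is marked for return
after = environment_end(inner)
print(after["integer"])
```

The 99 pushed inside the environment is discarded on return, while the explicitly returned 5 lands on the restored integer stack.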
I stepped away for a couple of days, and came back this morning to finish this first design spike. Basically what I wanted from the exercise is to add a new type and some associated instructions, add a new problem definition that invokes those, and be able to run that... and then throw it away and do it again, with a goal of improving the experience. 90% there. So now it's dying with a stack trace that strongly implies to me that the creation of random genomes requires some sort of extra information. Can we talk about that? Am I supposed to provide additional information for this random code generator, for instance? I haven't written any epigenetic markers, personally, on the assumption that defaults were in place.
|
By the way, trying to suss out where that was happening, I set I've pushed the broken state to the current branch, so I can spend some time later trying to identify the problem. |
The function you linked to is a single instruction map generator. If you are using it thinking it will create random genomes, you're going to have problems. What you want is random-plush-genome. If you know all of this, let me know and I'll dig further. It would also help to have a pointer to the code where you're making the random code. |
I'm not actively invoking anything, except via See the file |
First of all, I'm not getting the same exception you did -- did you change the code since then? The exception I'm getting is that you included the set of all instructions One last note: you can easily change arguments on the command line, which makes testing things easier, instead of having to stick things like
|
Oh, I also forgot to mention that you're overwriting the default |
LOL. I copied the argmap from another example, and AFAIK only changed it by putting the require all instructions thing. |
The list concatenation did it. Do you want to explain why this isn't an open issue? |
The problems that explicitly overwrite the |
Excellent! So now I've worked all the way from defining a new type, adding instructions, getting them registered, and finally using them in a running example. That completes the first pass! Have you run the tests I wrote? I'd like to make sure (since you have a copy now) that they run—as long as you've made the adjustment spelled out at the top of the midje tutorial. |
I just tried When I run
Do you not get this error? Is it a midje thing, or is it because it's loading in a bunch of Digital Multiplier problem files, which is what it sounds like? |
Somewhere (probably the top of every test file) I said to only run Also |
Yes, you did mention that somewhere (and in the code), and I had the feeling I was doing something wrong but I had forgotten where to look for it. After doing that, it looks like it's working great! It looks like midje runs continuously and checks things when files are saved? Nifty! It says it's passing all tests. |
Good, now throw that branch away :) |
It would be cool to have higher-order function instructions like map, reduce, and filter in Clojush. Maybe they'd only be defined on the array-like types (string, vector_integer, vector_boolean, etc.), or maybe they'd somehow be defined on other stacks as well.
The details are vague, and there are some non-trivial things to figure out, but if anyone wants to take this on it would be cool!
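As one possible reading of the request, here is a small Python sketch of a `map` over the top `vector_integer`. Every design choice in it is an assumption for illustration: the block of code is modelled as a list of state-to-state functions, each element is processed in a fresh scratch state, and an element whose block leaves no integer behind is simply dropped (one of the failure options floated in the discussion).

```python
from typing import Callable, Dict, List

State = Dict[str, list]               # hypothetical: stack name -> list, top at the end
Instruction = Callable[[State], State]


def run_block(block: List[Instruction], state: State) -> State:
    """Apply each instruction of the block in order."""
    for instruction in block:
        state = instruction(state)
    return state


def vector_integer_map(state: State, block: List[Instruction]) -> State:
    """Replace the top vector with the results of running block on each element."""
    vectors = state.get("vector_integer", [])
    if not vectors:
        return state                          # no vector: no-op
    results = []
    for item in vectors[-1]:
        scratch = run_block(block, {"integer": [item]})
        if scratch.get("integer"):            # keep only elements that produced a result
            results.append(scratch["integer"][-1])
    return {**state, "vector_integer": vectors[:-1] + [results]}


def integer_inc(state: State) -> State:
    ints = state.get("integer", [])
    if not ints:
        return state
    return {**state, "integer": ints[:-1] + [ints[-1] + 1]}


def integer_pop(state: State) -> State:
    return {**state, "integer": state.get("integer", [])[:-1]}


mapped = vector_integer_map({"vector_integer": [[1, 2, 3]]}, [integer_inc, integer_inc])
print(mapped["vector_integer"])
```

The work done is proportional to the vector length times the block length, so a version like this stays bounded as long as the block itself contains only bounded instructions.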