How to set up evolution steps #4

Closed
rogiervandergeer opened this issue Aug 14, 2017 · 16 comments

@rogiervandergeer
Collaborator

The evolution will contain a chain of steps, which can in sequence be applied to a population. In principle these steps will need to be no more than functions - we could simply do

for step in self.chain:
    population = step(population)

These step functions would simply look like:

def step(population):
    return population.evaluate()

but we shouldn't replace them with lambdas, since lambdas cannot be pickled.

In order to provide arguments to the methods of the population inside such functions, we could
re-define the functions with the proper arguments each time we use them.
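As a small illustration of the pickling concern, the sketch below compares a lambda with a callable that has its arguments baked in. It uses `operator.methodcaller` from the standard library as one possible way to bake in arguments; the `Population` class and the `is_picklable` helper are hypothetical stand-ins, not part of the library.

```python
import pickle
from operator import methodcaller


def is_picklable(obj):
    """Return True if pickle can serialise obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


class Population:
    """Hypothetical stand-in that only records how evaluate() was called."""

    def __init__(self):
        self.calls = []

    def evaluate(self, lazy=False):
        self.calls.append(("evaluate", lazy))
        return self


# methodcaller bakes the keyword argument into a picklable callable.
evaluate_lazily = methodcaller("evaluate", lazy=True)
population = evaluate_lazily(Population())

# The equivalent lambda does the same work but cannot be pickled.
lambda_step = lambda pop: pop.evaluate(lazy=True)

print(is_picklable(evaluate_lazily), is_picklable(lambda_step))
```

This only demonstrates the constraint; it says nothing yet about naming or logging the steps.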

Now on the other hand it would be nice to be able to give a name to such a step, for example to identify the step in logging or to make it easier to debug. We could achieve that by creating a namedtuple with name and function fields. Once we have a namedtuple, we could decide to put the arguments not baked into the function, but in their own field. The application of the steps would then look like

for step in self.chain:
    population = step.function(population, **step.kwargs)
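A minimal sketch of the namedtuple variant, assuming a population is just a list of numbers. The `Step` tuple and the `keep_fittest` and `scale` functions are illustrative names only.

```python
from collections import namedtuple

# Each step carries a name, a plain function, and that function's keyword arguments.
Step = namedtuple("Step", ["name", "function", "kwargs"])


def keep_fittest(population, n):
    """Keep the n largest individuals."""
    return sorted(population, reverse=True)[:n]


def scale(population, factor):
    """Multiply every individual by factor."""
    return [individual * factor for individual in population]


chain = [
    Step(name="survive", function=keep_fittest, kwargs={"n": 2}),
    Step(name="scale", function=scale, kwargs={"factor": 10}),
]

population = [3, 1, 2]
for step in chain:
    population = step.function(population, **step.kwargs)

print(population)
```

Because the functions are defined at module level and the arguments live in a plain dict, nothing here needs a lambda.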

We can take it one step further by defining an EvolutionStep class, and make classes for each of the step types inheriting from that class. This would look like:

class EvolutionStep:

    def __init__(self, name, **kwargs):
        self.name = name
        self.kwargs = kwargs


class EvaluationStep(EvolutionStep):

    def __init__(self, name, lazy=False):
        EvolutionStep.__init__(self, name=name, lazy=lazy)

    def apply(self, population):
        return population.evaluate(**self.kwargs)

which has the very nice property that the arguments are well taken care of, as the function is only defined once (as a method of the step class). In this case the Evolution.evaluate method can
look as simple as

def evaluate(self, name='evaluate'):
    result = copy(self)
    # copy() is shallow, so give the result its own chain list
    result.chain = self.chain + [EvaluationStep(name=name)]
    return result

On the other hand we will need to define a separate Step class for each operation.
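Putting the pieces together, a self-contained sketch of the class-based design might look as follows. The `Population` class, the `evolve` method, and the default step name are assumptions made for the example.

```python
from copy import copy


class Population:
    """Hypothetical stand-in for the real population class."""

    def __init__(self, individuals):
        self.individuals = individuals
        self.evaluated = False

    def evaluate(self, lazy=False):
        self.evaluated = True
        return self


class EvolutionStep:
    def __init__(self, name, **kwargs):
        self.name = name
        self.kwargs = kwargs

    def apply(self, population):
        raise NotImplementedError


class EvaluationStep(EvolutionStep):
    def __init__(self, name, lazy=False):
        super().__init__(name=name, lazy=lazy)

    def apply(self, population):
        return population.evaluate(**self.kwargs)


class Evolution:
    def __init__(self):
        self.chain = []

    def evaluate(self, name="evaluate", lazy=False):
        result = copy(self)
        # copy() is shallow: build a new list so the original chain is untouched.
        result.chain = self.chain + [EvaluationStep(name=name, lazy=lazy)]
        return result

    def evolve(self, population):
        """Apply every step in order (hypothetical method name)."""
        for step in self.chain:
            population = step.apply(population)
        return population


base = Evolution()
evo = base.evaluate(lazy=True)
population = evo.evolve(Population([1, 2, 3]))
```

Every object involved is an instance of a module-level class, so the whole evolution should remain picklable.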

@koaning, any preference?

@koaning
Contributor

koaning commented Aug 15, 2017

What would the repeat step look like? Will we make many many Step classes?

@rogiervandergeer
Collaborator Author

We would indeed need a step class for each of the operations, i.e. for evaluate, survive, filter, map, breed, repeat, etc.

@koaning
Contributor

koaning commented Aug 17, 2017

I really like the idea of being able to pickle everything [I recall multicore having very poor support for lambdas too]. The library might get a bit more verbose but this is more than compensated by clearer debugging and general code safety.

I'll make a new branch and a small start.

@koaning
Contributor

koaning commented Aug 17, 2017

Mhm. Just to check. @rogiervandergeer Don't we want this?

class EvolutionStep:

    def __init__(self, name, *args, **kwargs):
        self.name = name
        self.args = args
        self.kwargs = kwargs

class BreedStep(EvolutionStep):

    def __init__(self, name, *args, **kwargs):
        EvolutionStep.__init__(self, name, *args, **kwargs)

    def apply(self, population) -> Population:
        return population.breed(*self.args, **self.kwargs)

Explicit is better than implicit, but what happens if you pass the required parent_picker and combiner as unnamed parameters?

@koaning
Contributor

koaning commented Aug 17, 2017

Another thing: what is a good standard for the name that we attach to a step? Currently I am thinking about something like this:

from copy import copy
from .population import Population
from .step import EvaluationStep, ApplyStep

class Evolution:

    def __init__(self):
        self.steps = []

    def __iter__(self):
        return self.steps.__iter__()

    def __repr__(self):
        result = "<Evolution object with steps>"
        return "\n".join([result] + [f"  -{step.name}" for step in self.steps])

    def evaluate(self, name=None) -> 'Evolution':
        if not name:
            name = f"step-{len(self.steps)+1}-evaluate"
        result = copy(self)
        result.steps.append(EvaluationStep(name=name))
        return result

But it may be a good standard to enforce that order number so that the user cannot overwrite it.

@rogiervandergeer
Collaborator Author

Well, I'm not sure if we want to store the *args as well. I don't think we want users to create their own Step objects (although of course we cannot forbid it). The Evolution.breed() must translate all the relevant arguments to their keyword counterparts:

class Evolution:
    def breed(self, parent_picker, combiner, population_size=None, **kwargs):
        self.steps.append(BreedStep(parent_picker=parent_picker, combiner=combiner,
                                    population_size=population_size, **kwargs))

and voilà no more *args.
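To make the translation concrete, here is a hedged sketch in which positional arguments passed to `breed` end up as keyword arguments on the step. The `name` parameter, the minimal `BreedStep`, and the `pick_two` and `average` functions are assumptions for the example.

```python
class BreedStep:
    """Minimal step that only stores its name and keyword arguments."""

    def __init__(self, name, **kwargs):
        self.name = name
        self.kwargs = kwargs


class Evolution:
    def __init__(self):
        self.steps = []

    def breed(self, parent_picker, combiner, population_size=None,
              name="breed", **kwargs):
        # Whatever the caller passed positionally is stored by keyword.
        step = BreedStep(name=name, parent_picker=parent_picker,
                         combiner=combiner, population_size=population_size,
                         **kwargs)
        result = Evolution()
        result.steps = self.steps + [step]
        return result


def pick_two(population):
    return population[0], population[1]


def average(mum, dad):
    return (mum + dad) / 2


# Both required arguments are passed without names here.
evo = Evolution().breed(pick_two, average)
step = evo.steps[0]
print(sorted(step.kwargs))
```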

@rogiervandergeer
Collaborator Author

rogiervandergeer commented Aug 17, 2017

When it comes to names... that is a tough one. I like having a default name with an order number in there. I'm open to discussion on whether we want to enforce it. But then we have to decide what we do when we combine multiple evolutions.

Say we create an evolution:

evo = Evolution().survive(...).mutate(...).breed(...)

which will then have steps survive-0, mutate-1, breed-2. Then we create an Evolution that incorporates the first one (we didn't define append yet but I feel we need something like that):

appended_evo = Evolution().mutate().append(evo)

What will the steps of this evolution be called?

  1. mutate-0, survive-0, mutate-1, breed-2
  2. mutate-0, survive-1, mutate-2, breed-3
  3. mutate-0, evo-1-survive-0, evo-1-mutate-1, evo-1-breed-2

Option 1 is ugly, 2 is difficult, so perhaps 3 is best?
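For comparison, options 2 and 3 can be sketched on plain name strings. The helpers `renumber` and `prefix` are hypothetical, and `renumber` assumes every name ends in `-<number>`, which is part of why option 2 gets awkward once users choose their own names.

```python
def renumber(names, offset):
    """Option 2: replace each trailing order number, starting at offset."""
    return [f"{name.rsplit('-', 1)[0]}-{offset + i}"
            for i, name in enumerate(names)]


def prefix(names, label):
    """Option 3: keep the inner names and prepend a label."""
    return [f"{label}-{name}" for name in names]


evo_names = ["survive-0", "mutate-1", "breed-2"]

option_2 = ["mutate-0"] + renumber(evo_names, offset=1)
option_3 = ["mutate-0"] + prefix(evo_names, label="evo-1")

print(option_2)
print(option_3)
```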

Or suppose we create an evolution that contains the above:

grp_evo = Evolution().group(...).repeat(evo, n=100).ungroup()

What do we call these? Perhaps we don't even want to add these to the evolution, but make a RepeatStep that contains the entire evolution evo.
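A rough sketch of the RepeatStep idea, in which the outer evolution holds a single step that wraps the whole inner evolution. The toy `AddStep`, the `evolve` method, and the list-of-numbers population are assumptions for the example.

```python
class AddStep:
    """Toy step that adds a fixed amount to every individual."""

    def __init__(self, name, amount):
        self.name = name
        self.amount = amount

    def apply(self, population):
        return [individual + self.amount for individual in population]


class Evolution:
    def __init__(self, steps=()):
        self.steps = list(steps)

    def evolve(self, population):
        for step in self.steps:
            population = step.apply(population)
        return population


class RepeatStep:
    """Holds an entire evolution and applies it n times."""

    def __init__(self, name, evolution, n=1):
        self.name = name
        self.evolution = evolution
        self.n = n

    def apply(self, population):
        for _ in range(self.n):
            population = self.evolution.evolve(population)
        return population


inner = Evolution([AddStep("add-one", amount=1)])
outer = Evolution([RepeatStep("repeat", inner, n=100)])
result = outer.evolve([0, 5])
print(result)
```

With this layout the inner step names never need to be rewritten, since they stay inside their own evolution.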

@koaning
Contributor

koaning commented Aug 17, 2017

Is option 2 that hard?

    def __repr__(self):
        result = "<Evolution object with steps>"
        return "\n".join([result] + [f"  -{step}-{i}" for i, step in enumerate(self.steps)])

Sure if things are nested it may get tricky, but we can fix this by adding __repr__ to the EvolutionStep perhaps.

This means we may have a slightly different __repr__ for the RepeatStep such that we can have nesting.

@koaning
Contributor

koaning commented Aug 17, 2017

The true complexity will appear once we introduce grouping though ... but I am willing to let that one slide for now and worry about that once we're at the grouping stage.

@rogiervandergeer
Collaborator Author

Here you introduce the order number inside the Evolution, not inside the EvolutionStep. This means that if you have to throw an exception or log something from inside the step (e.g. when you have no survivors in a survive step) the number isn't available.

It isn't too hard to add the order number to a step when adding it (since we add it from the Evolution, and there the number of steps is available), but changing the numbers afterwards is difficult.
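As a sketch of what this could look like, the step below receives its numbered name when it is added, so an error raised inside the step can mention it. The `SurviveStep` signature and the `fraction` parameter are assumptions for the example.

```python
class SurviveStep:
    def __init__(self, name, fraction):
        self.name = name
        self.fraction = fraction

    def apply(self, population):
        survivors = population[: int(len(population) * self.fraction)]
        if not survivors:
            # The step knows its own name, order number included.
            raise ValueError(f"step {self.name!r}: no survivors")
        return survivors


class Evolution:
    def __init__(self):
        self.steps = []

    def survive(self, fraction):
        # The order number is available here, at the moment the step is added.
        name = f"survive-{len(self.steps)}"
        result = Evolution()
        result.steps = self.steps + [SurviveStep(name, fraction)]
        return result


evo = Evolution().survive(0.5).survive(0.1)
try:
    population = [5, 4, 3, 2]
    for step in evo.steps:
        population = step.apply(population)
    message = None
except ValueError as error:
    message = str(error)

print(message)
```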

@koaning
Contributor

koaning commented Aug 17, 2017

Ah. I was mainly concerning myself with the __repr__ of the Evolution. I see what you mean now.

I guess if you are really debugging, though, you're probably going to take the effort to add names to steps, right? I am wondering how much the number we add is going to help with debugging. If you need a number to tell you which mutate step is going wrong because your chain consists of 20 mutate steps, you're probably doing something wrong that we can't help you with.

@rogiervandergeer
Collaborator Author

True. Perhaps a random 4-digit uuid is good enough? Then we just need a nice function that prints the whole evolution in a human-readable format (explain?)
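One way to get such a short identifier is to truncate a random UUID from the standard library. The `default_name` helper is hypothetical, and note that four hex characters give only 65,536 distinct values, so collisions are possible in principle.

```python
import uuid


def default_name(kind):
    """Build a default step name such as 'mutate-3fa9' (hypothetical format)."""
    return f"{kind}-{uuid.uuid4().hex[:4]}"


name = default_name("mutate")
print(name)
```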

@koaning
Contributor

koaning commented Aug 18, 2017

What would we want to put in .explain() that we would not put in __repr__?

@rogiervandergeer
Collaborator Author

The idea of __repr__ is that it must be unambiguous, while __str__ must be readable. (See e.g. here). So for example, it is very nice if executing the output of __repr__ in Python gives you a copy of that exact same object (such as for dictionaries, etc). If that is not possible or would be too complicated (as one might argue with an Evolution), the idea is to be very concise about what object it is, i.e. provide a memory address like <Evolution object at 0x00000000>.

The .explain() (or __str__, but I like to provide an explicit method) should give a description as readable as possible, which could be something like

Evolution (name):
 - Survive (name)
 - Breed (name)
 - Mutate (name)

but we could provide much more information.
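A small sketch of the split between a terse `__repr__` and a readable `explain()`. The `Step` class with its `kind` attribute and the example names are made up for the illustration.

```python
class Step:
    def __init__(self, kind, name):
        self.kind = kind
        self.name = name


class Evolution:
    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)

    def __repr__(self):
        # Unambiguous but terse: identify the object by its memory address.
        return f"<Evolution object at {hex(id(self))}>"

    def explain(self):
        # Readable: one line per step.
        lines = [f"Evolution ({self.name}):"]
        lines += [f" - {step.kind} ({step.name})" for step in self.steps]
        return "\n".join(lines)


evo = Evolution("demo", [Step("Survive", "keep-best"), Step("Breed", "crossover")])
text = evo.explain()
print(text)
```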

@koaning
Contributor

koaning commented Aug 18, 2017

Would an ordered dict perhaps be better than a list for a chain? That way we might pickle the **kwargs and this may help in explaining too? It may be nice to also list HOW the .mutate() is working (with a named reference to the function we pass into it).
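A possible way to produce such a named reference is to read the function's `__name__`. This is another argument for named functions over lambdas, since every lambda reports the same name. The `describe_callable` helper and `add_noise` function are hypothetical.

```python
from functools import partial


def describe_callable(func):
    """Return a short, human-readable name for a callable."""
    if isinstance(func, partial):
        return f"partial({describe_callable(func.func)})"
    return getattr(func, "__name__", repr(func))


def add_noise(individual, sigma=1.0):
    """Placeholder mutation function; the body is irrelevant here."""
    return individual


named = describe_callable(add_noise)
wrapped = describe_callable(partial(add_noise, sigma=0.1))
anonymous = describe_callable(lambda individual: individual)

print(named, wrapped, anonymous)
```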

@rogiervandergeer
Collaborator Author

I don't see the benefit of an ordered dict. Would you want to store the name in the key?

I agree a keyword argument for mutate is good. Currently it is func; what would be a better name?


2 participants