.cv_results_ does not include info from first generation #27

Closed
ClimbsRocks opened this issue Jun 27, 2017 · 12 comments

@ClimbsRocks
Contributor

I think there's a fenceposting/off-by-one error somewhere.

When I pass in generations_number = 1, it's actually 0-indexed, and gives me 2 generations. Similarly, if I pass in 2 generations, I actually get 3.

Then, when I examined the cv_results_ property, I noticed that I only get the results from the generations after the first one (the 0-indexed generation).

This is most apparent if you set generations_number = 1.

I looked through the code quickly, but didn't see any obvious source of it. Hopefully someone who knows the library can find it more easily!
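In case it helps reproduce, here's roughly what I'm running. This is only a minimal sketch: the estimator, dataset, and param grid are placeholders, and the constructor arguments follow the style shown in the README.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from evolutionary_search import EvolutionaryAlgorithmSearchCV

X, y = load_iris(return_X_y=True)
paramgrid = {"kernel": ["rbf"],
             "C": np.logspace(-3, 3, 7),
             "gamma": np.logspace(-3, 3, 7)}

cv = EvolutionaryAlgorithmSearchCV(estimator=SVC(),
                                   params=paramgrid,
                                   scoring="accuracy",
                                   cv=StratifiedKFold(n_splits=3),
                                   population_size=10,
                                   generations_number=1,
                                   verbose=1)
cv.fit(X, y)

# Assuming cv_results_ is a dict of equal-length columns (like GridSearchCV's),
# any one column's length is the number of recorded candidates. It comes out
# smaller than expected because the initial (0-indexed) generation is missing.
n_recorded = len(next(iter(cv.cv_results_.values())))
print(n_recorded)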

@ClimbsRocks
Contributor Author

@rsteca or even @ryanpeach - any thoughts on how to get the first generation included in .cv_results_?

@ryanpeach
Contributor

I'll take a look.

@ryanpeach
Contributor

In the _fit function, where most of the work is done, the history object is updated after the mate+mutate step. As such, there's a trade-off: either we keep it the way it is and lose the first generation's information, or we move it above the mate+mutate step and lose the last generation. Maybe we should have a special "first run" condition which saves the first-generation data.
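To make the trade-off concrete, the control flow is roughly the following. This is only a sketch with trivial stand-in operators, not the actual _fit code; the point is just that if the snapshot only happens after mate+mutate, generation 0 is never captured, and the proposed "first run" save would capture it without dropping the last generation.

import random

history = []  # stand-in for the DEAP History/Statistics objects

def record(pop):
    history.append(list(pop))  # snapshot one generation

def mate_and_mutate(pop):
    return [x + random.choice([-1, 0, 1]) for x in pop]  # stand-in variation step

def evolve(population, generations_number):
    for gen in range(generations_number):
        if gen == 0:
            record(population)  # proposed "first run" save of the initial generation
        offspring = mate_and_mutate(population)
        record(offspring)  # existing save point, after the variation step
        population = sorted(offspring)[:len(population)]  # stand-in selection
    return population

evolve([0, 0, 0], generations_number=1)
print(len(history))  # 2 snapshots: the initial population plus one generation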

@ClimbsRocks
Contributor Author

That sounds good to me. Personally, I'm more interested in the first run than the last run (the first run is where we try all the crazy ideas and see the most variance across different hyperparameter combinations, while the last run is generally a safer, more boring set of combinations we've already tried before).

But I like your idea; it sounds like a pretty simple bit of code to get all the data people would expect. Thanks for finding that!

@ryanpeach
Contributor

Someone test this branch #29

@ryanpeach
Contributor

I basically discovered that we just hadn't included the evaluation step of the population in the history logger. I've now added both the evaluation and selection steps, but they need testing.

@ryanpeach
Contributor

Hey, so I think we have a misunderstanding. cv_results_ does not include "generation information"; it includes all generated individuals from all generations. It's a pretty big table...
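For example, you can dump it into pandas to see the size (assuming cv_results_ follows the same dict-of-columns convention as GridSearchCV.cv_results_; cv here is a fitted EvolutionaryAlgorithmSearchCV):

import pandas as pd

results = pd.DataFrame(cv.cv_results_)
print(results.shape)  # one row per recorded individual, across all generations
print(results.head())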

@ClimbsRocks
Contributor Author

@ryanpeach Yeah, I understand that we're including individuals in .cv_results_, not generation information. But from what I can tell, we're not including any of the individuals from the first generation right now.

I ran into this issue when I ran a pretty small search space that was only two generations, and the second generation was primarily just re-picking candidates from the first generation.

Try setting generations_number=1, and I think you'll see the issue I'm talking about.

Thanks for looking into this! It's a really cool project, and a pretty big improvement over grid search.

@ryanpeach
Contributor

ryanpeach commented Aug 16, 2017

@ClimbsRocks Great, ok just being clear. Wasn't sure.

I'm actually not super familiar with how DEAP works (which is the framework we use). I am following the code referenced here:

http://deap.readthedocs.io/en/master/api/tools.html

history = History()

# Decorate the variation operators
toolbox.decorate("mate", history.decorator)
toolbox.decorate("mutate", history.decorator)

# Create the population and populate the history
population = toolbox.population(n=POPSIZE)
history.update(population)

# Do the evolution, the decorators will take care of updating the
# history
# [...]

import matplotlib.pyplot as plt
import networkx

graph = networkx.DiGraph(history.genealogy_tree)
graph = graph.reverse()     # Make the graph top-down
colors = [toolbox.evaluate(history.genealogy_history[i])[0] for i in graph]
networkx.draw(graph, node_color=colors)
plt.show()

Here, in our _fit:

pop = toolbox.population(n=self.population_size)
hof = tools.HallOfFame(1)

# Stats
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("avg", np.nanmean)
stats.register("min", np.nanmin)
stats.register("max", np.nanmax)

# History
hist = tools.History()
toolbox.decorate("mate", hist.decorator)
toolbox.decorate("mutate", hist.decorator)
hist.update(pop)

And here

idxs, individuals, each_scores = zip(*[
    (idx, indiv, np.mean(indiv.fitness.values))
    for idx, indiv in list(gen.genealogy_history.items())
    if indiv.fitness.valid and not np.all(np.isnan(indiv.fitness.values))
])

Just for reference.

I'm not totally clear on how the history object works, but I think it contains all individuals ever added to the population, and the decorator commands then "decorate" the registered operators so that genealogy information (such as who was selected, or who mated with whom) gets recorded as they run. The evaluation step, I think, is saved in the history automatically.

I'll keep looking I guess, just thinking out loud.
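Here is a tiny standalone experiment I put together to check that understanding (not our code, just DEAP's History in isolation; the toy individual and operators are arbitrary): update() records whatever individuals you hand it, and the decorator records whatever the decorated operator returns, linking each child back to its parents in the genealogy.

from deap import base, creator, tools

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("individual", tools.initRepeat, creator.Individual, lambda: 1, n=3)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mutate", tools.mutFlipBit, indpb=1.0)

history = tools.History()
toolbox.decorate("mutate", history.decorator)  # offspring are recorded when mutate runs

pop = toolbox.population(n=3)
history.update(pop)  # the starting individuals are recorded here
print(len(history.genealogy_history))  # 3

child, = toolbox.mutate(toolbox.clone(pop[0]))  # the decorated call records the child too
print(len(history.genealogy_history))  # 4
print(history.genealogy_tree[child.history_index])  # the parent's history index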

@ryanpeach
Contributor

@ClimbsRocks

Hey, so I did what you said and I'm just not replicating the results. In the test.ipynb notebook (use my fork), if you set generations_number to 1 you still get some individuals. Note that they won't match the number population_size indicates, because if two individuals are functionally the same they are treated as one (so a population of three "111" individuals just shows up in the history as a single "111" individual). Are you sure you aren't just miscounting?

If you are sure this is still an issue, please provide an example jupyter notebook. Thanks!

@ryanpeach
Contributor

And... now I'm seeing it. I swear it worked just a minute ago...

@ryanpeach
Contributor

Nope, never mind, it works as expected. Here is a link to my notebook:

https://github.com/ryanpeach/sklearn-deap/blob/test_issue27/test.ipynb
