Does 'seed' ensure that the same children/mutations are generated for a yml configuration and program [using algorithm type genetic]? #135

Closed
pdreiter opened this issue Jul 16, 2019 · 4 comments

Comments

@pdreiter
Collaborator

I was able to reproduce the scenario: across runs that use the same seed, the STDOUT of Darjeeling repair shows not only different candidates but also some different repairs.
I launched several back-to-back Darjeeling runs with the same --seed value, like so:

for i in $(seq 0 1 10); do 
    mkdir -p seed.test.$i; 
    cd seed.test.$i; 
    darjeeling repair --seed 12345678 ../../no_seed.yml |& tee darjeeling.repair.log.seed_12345678; 
    cd ../; 
done

Please note that I'm using the genetic algorithm:

algorithm:
  type: genetic
  population: 200
  generations: 200
  tournament-size: 20
  mutation-rate: 0.8
  crossover-rate: 0.4
  # look at entire test suite for test sampling [subset of testsuite is 100%]
  test-sample-size: null

With this configuration, I'm seeing not only different candidates but also some different repairs across runs. I have a tarball of the logs and outputs and will upload it.
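For reference, here is a minimal sketch of the behaviour I would expect from a fixed seed. It is a toy stand-in and not Darjeeling's code: `toy_genetic_search`, the operator names, and the location range are all made up for illustration. It assumes every random choice is drawn from one seeded `random.Random` instance and that any set is sorted before sampling, since set iteration order for strings can vary between Python processes.

```python
import random


def toy_genetic_search(seed, population=5, generations=3):
    """Hypothetical stand-in for a seeded search; returns the generation trace."""
    # A private RNG, so no other code sharing the global RNG can disturb it.
    rng = random.Random(seed)
    # Illustrative operator names only; these are not Darjeeling identifiers.
    pool = {"delete-stmt", "replace-stmt", "prepend-stmt"}
    trace = []
    for generation in range(generations):
        for _ in range(population):
            # Sort first: iterating a set of strings directly may give a
            # different order in each Python process.
            operator = rng.choice(sorted(pool))
            location = rng.randrange(100)
            trace.append((generation, operator, location))
    return trace
```

Two calls with the same seed, e.g. `toy_genetic_search(12345678)`, should return identical traces; that is the property I expected from the runs above.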



@pdreiter
Collaborator Author

Here are the individual Darjeeling runs from the aforementioned for loop:
many_tests.tar.gz

@ChrisTimperley
Collaborator

Here's a quick way to find the patch evaluation order from a given log file:

$ grep "evaluating candidate" darjeeling.log.0 | cut -f7 -d:
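To compare the order between two runs, something like the following rough Python sketch may help. It approximates the grep/cut pipeline above and reports the first position at which two runs diverge. The function names are hypothetical, and it assumes the seventh colon-separated field is the part of the line worth comparing; unlike cut, it also trims surrounding whitespace and skips lines with fewer than seven fields. The comparison is only meaningful if that field is stable from one run to the next.

```python
def evaluation_order(log_lines):
    """Collect the 7th colon-separated field of each 'evaluating candidate' line."""
    order = []
    for line in log_lines:
        if "evaluating candidate" not in line:
            continue
        fields = line.rstrip("\n").split(":")
        if len(fields) >= 7:
            # Index 6 corresponds to `cut -f7 -d:`.
            order.append(fields[6].strip())
    return order


def first_divergence(order_a, order_b):
    """Return the first index where the two orders differ, or None if identical."""
    for index, (a, b) in enumerate(zip(order_a, order_b)):
        if a != b:
            return index
    if len(order_a) != len(order_b):
        # One run evaluated more candidates than the other.
        return min(len(order_a), len(order_b))
    return None
```

With two log files open, `first_divergence(evaluation_order(log_a), evaluation_order(log_b))` gives the position of the first mismatch, or `None` when the orders match.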

I've fixed the issue for exhaustive search. In doing so, I've found another bug, depending on your perspective. The ID reported for each candidate in the log file (e.g., #55afa924) is based on the hash value of that candidate. Python hashes are not consistent between runs (i.e., hash("foo") may have different values in different Python sessions), and so the ID can't be used to identify the same candidate across multiple runs. I'll create a new issue to assign a stable, unique identifier to each candidate.
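As a small illustration of the hashing point, here is a sketch; it is not the fix itself, and both function names are hypothetical. `builtin_hash_in_fresh_process` asks a new interpreter for `hash(text)`, and `stable_id` derives a short identifier from a SHA-256 digest of some canonical string describing the candidate. Which string would represent a candidate is an assumption left open here.

```python
import hashlib
import subprocess
import sys


def builtin_hash_in_fresh_process(text):
    """Return hash(text) as computed by a newly started Python interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


def stable_id(text, length=8):
    """Short identifier that depends only on the bytes of `text`."""
    # SHA-256 is not affected by Python's per-process string hash randomization.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digest[:length]
```

Calling `builtin_hash_in_fresh_process("foo")` twice will usually return two different values, unless PYTHONHASHSEED is fixed in the environment, whereas `stable_id("foo")` is the same in every session.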

@ChrisTimperley
Collaborator

Fixed by #138
