Does 'seed' ensure that the same children/mutations are generated for a yml configuration and program [using algorithm type genetic]? #135

pdreiter · 2019-07-16T21:11:17Z

I was able to reproduce the scenario: across runs, the STDOUT of darjeeling repair shows not only different candidates but also some different repairs.
I launched several back-to-back darjeeling runs with the same seed, like so:

for i in $(seq 0 1 10); do 
    mkdir -p seed.test.$i; 
    cd seed.test.$i; 
    darjeeling repair --seed 12345678 ../../no_seed.yml |& tee darjeeling.repair.log.seed_12345678; 
    cd ../; 
done

Please note that I'm using the genetic algorithm:

algorithm:
  type: genetic
  population: 200
  generations: 200
  tournament-size: 20
  mutation-rate: 0.8
  crossover-rate: 0.4
  # look at entire test suite for test sampling [subset of testsuite is 100%]
  test-sample-size: null

Across these runs I'm seeing not only different candidates but also some different repairs. I have a tarball of the logs and outputs and will upload it.

submitted from GitQ

pdreiter · 2019-07-16T21:12:58Z

Here are the individual darjeeling runs from the for loop above:
many_tests.tar.gz

ChrisTimperley · 2019-07-23T21:25:33Z

Here's a quick way to find the patch evaluation order from a given log file:

$ grep "evaluating candidate" darjeeling.log.0 | cut -f7 -d:
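To compare the evaluation order of two runs, the same extraction can be done in Python. This is only a rough sketch: it assumes, as the grep/cut pipeline does, that the candidate is the 7th colon-separated field of each "evaluating candidate" line. The helper names `evaluation_order` and `first_divergence` are made up for illustration and are not part of Darjeeling.

```python
from typing import Iterable, List, Optional


def evaluation_order(log_lines: Iterable[str]) -> List[str]:
    """Roughly mimic: grep "evaluating candidate" | cut -f7 -d:"""
    order = []
    for line in log_lines:
        if "evaluating candidate" not in line:
            continue
        fields = line.rstrip("\n").split(":")
        # cut -f7 picks the 7th field (1-indexed); unlike cut, lines
        # with fewer than 7 fields are skipped rather than printed empty.
        if len(fields) >= 7:
            order.append(fields[6])
    return order


def first_divergence(a: List[str], b: List[str]) -> Optional[int]:
    """Return the first index where two orders differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One run is a strict prefix of the other.
        return min(len(a), len(b))
    return None
```

Reading two log files with `open(path).readlines()` and passing the resulting orders to `first_divergence` shows where two runs with the same seed first part ways.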

ChrisTimperley · 2019-07-23T21:40:23Z

I've fixed the issue for exhaustive search. In doing so, I've found another bug, depending on your perspective. The ID reported for each candidate in the log file (e.g., #55afa924) is based on the hash value of that candidate. Python hashes are not consistent between runs (i.e., hash("foo") may have different values in different Python sessions), and so the ID can't be used to identify the same candidate across multiple runs. I'll create a new issue to assign a stable, unique identifier to each candidate.
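For illustration, one way to get an ID that is stable across sessions is to derive it from a cryptographic digest of a canonical string describing the candidate, rather than from the built-in `hash()`, whose value for strings is randomized per process unless `PYTHONHASHSEED` is fixed. This is a minimal sketch, not the approach Darjeeling actually adopted; `stable_id` and the example description string are hypothetical.

```python
import hashlib


def stable_id(description: str, length: int = 8) -> str:
    """Return a short hex ID that is identical in every Python session.

    `description` is assumed to be a canonical, deterministic string
    representation of the candidate (hypothetical; not Darjeeling's format).
    """
    digest = hashlib.sha256(description.encode("utf-8")).hexdigest()
    return digest[:length]


# hash("foo") can differ between interpreter runs; stable_id("foo") cannot.
candidate = "ReplaceStatement[foo.c:12 -> foo.c:40]"  # made-up example
print("#" + stable_id(candidate))
```

The ID is only as stable as the description string, so the string itself has to be built in a deterministic order.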

ChrisTimperley · 2019-07-23T22:08:27Z

Fixed by #138
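As a general illustration of why a seed alone may not be enough: a seed fixes the random number stream, but if the items being sampled are held in a hash-ordered container such as a set of strings, the same stream can pick different items in different sessions. Putting the pool in a fixed order before sampling removes that dependence. The sketch below is an assumption-laden example, not Darjeeling's code; `pick_mutations` and the pool contents are hypothetical.

```python
import random
from typing import Iterable, List


def pick_mutations(pool: Iterable[str], seed: int, k: int) -> List[str]:
    """Sample k items reproducibly for a given seed.

    The pool is sorted first so that the result depends only on the
    seed and the pool's contents, not on its iteration order.
    """
    rng = random.Random(seed)  # private RNG; no global state is touched
    ordered = sorted(pool)
    return rng.sample(ordered, k)


# The same seed and the same contents give the same picks,
# however the pool happened to be built.
pool_a = {"delete:12", "replace:40", "append:7", "delete:3"}
pool_b = {"delete:3", "append:7", "replace:40", "delete:12"}
print(pick_mutations(pool_a, 12345678, 2) == pick_mutations(pool_b, 12345678, 2))
```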

This was referenced Jul 23, 2019

Ensure deterministic ordering of test cases #136 (Closed)
Ensure that test suite coverage is deterministically ordered squaresLab/BugZoo#343 (Merged)
Ensure that candidate IDs are deterministic and stable between runs #139 (Open)
Ensure deterministic internal representation for fault localization and transformation list #138 (Merged)

ChrisTimperley closed this as completed Jul 23, 2019
