TSP_20 task reward function #10

wasd12345 · 2018-11-19T00:01:41Z

Hi, thanks for sharing your code.

In the file "tsp_task.py", I see the TSP reward defined in the usual way, as the sum of distances between consecutive vertices in the cycle. But it looks like you also started to generalize this, or maybe had to manually tune things a bit for tsp_20? I see this:

# For TSP_20 - map to a number between 0 and 1
# min_len = 3.5
# max_len = 10.
# TODO: generalize this for any TSP size
#tour_len = -0.1538*tour_len + 1.538 
#tour_len[tour_len < 0.] = 0.
return tour_len

So for the results you showed for the tsp_20 and tsp_50 tasks, did you use the usual sum of distances, which is what the current uncommented code computes, or did you find it helped to apply the affine-then-clip operations you have commented out? Are these values derived theoretically or empirically?
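For reference, the commented-out constants are consistent with a linear map that sends max_len to 0 and min_len to 1, since 1 / (10 - 3.5) ≈ 0.1538 and 10 / 6.5 ≈ 1.538. A minimal sketch of that map for arbitrary bounds follows; it assumes NumPy rather than whatever tensor library the repository uses, and the function name `scale_tour_length` is hypothetical, not something from tsp_task.py:

```python
import numpy as np

def scale_tour_length(tour_len, min_len=3.5, max_len=10.0):
    """Hypothetical helper: map max_len -> 0 and min_len -> 1, then clip negatives to 0.

    The defaults are the TSP_20 values quoted in the commented-out code.
    """
    tour_len = np.asarray(tour_len, dtype=float)
    # Affine step: equivalent to -a * tour_len + a * max_len with a = 1 / (max_len - min_len)
    scaled = (max_len - tour_len) / (max_len - min_len)
    # Clip step: mirrors `tour_len[tour_len < 0.] = 0.` (only the lower side is clipped)
    return np.maximum(scaled, 0.0)
```

Under this reading, generalizing to another TSP size only requires choosing min_len and max_len for that size; how to choose them is the open question.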

Any other thoughts here in terms of generalizing the reward to work with arbitrary tour length?

Thanks!


pemami4911 · 2018-11-20T17:07:21Z

I used the usual sum of distances for the results. I don't think the commented-out lines worked better, but it's been a while, sorry.
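As an illustration of what "the usual sum of distances" means here, the sketch below computes the length of a closed tour. It is not the code from tsp_task.py: it assumes NumPy, Euclidean distances, and a tour that returns to its starting city, and the name `tour_length` is hypothetical.

```python
import numpy as np

def tour_length(coords, tour):
    """Hypothetical helper: total Euclidean length of a closed tour.

    coords: (n, 2) array of city coordinates.
    tour:   sequence of n city indices giving the visiting order.
    """
    ordered = np.asarray(coords, dtype=float)[np.asarray(tour)]
    # Difference between each city and the next; np.roll wraps the last city back to the first
    steps = ordered - np.roll(ordered, -1, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())
```

For the four corners of a unit square visited in order, this returns the perimeter, 4.0.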

pemami4911 closed this as completed Nov 20, 2018
