Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seeded random instances #13

Open
dietmarwo opened this issue Aug 1, 2022 · 2 comments
Open

seeded random instances #13

dietmarwo opened this issue Aug 1, 2022 · 2 comments
Assignees

Comments

@dietmarwo
Copy link

dietmarwo commented Aug 1, 2022

Very impressed with this work. The grasp algorithm performs excellent considering the fact it is single threaded and is written in Python. Learned about anpcp two days ago and obj_func_module.py with its visualisation really helped me to understand what this is about.

May be you can add a "seed" argument to Instance.random so that it can generate repruducible
instances you no longer need to store:

import numpy as np
...
    def random(cls, n: int, m: int, x_max: int = 1000, y_max: int = 1000, seed: int = None) -> "Instance":
        distinct_coords = set()
        total = n + m
        if seed is None: 
            seed = randint(0, sys.maxsize)
        rng = np.random.default_rng(seed)
        while len(distinct_coords) < total:
            distinct_coords |= {
                (rng.integers(0, x_max), rng.integers(0, y_max))

see https://towardsdatascience.com/stop-using-numpy-random-seed-581a9972805f .

More general questions arise how easy this method can be adapted to problem variants.

Example a) Lets assume we are not only interested in the alpha-best distance, but in a (weighted) sum of the alpha best distances. May be the distance to the nearer facilities also matter for some users. Not always the alpha-1 nearer facilities are down.

Example b) May be a user has not yet selected a set of possible facility locations, only their number is known and we want to compute their optimal locations. Finding exact facility location options is deferred to a later stage using the determined optimal locations. This variant is a continuous optimization problem.

How hard is it to adapt your method to these variants?

Progress often is triggered by competition, so I tried to come up with an alternative Python implementation shining for some problem instances were grasp struggles. Main advantages of my implementation are:

  • Very easy to code (a few hours).
  • Easy to adapt to problem variants.
  • Performs good with the TSP sample problems.
  • No "swap" implementation required, just a 3 line fitness function.

Disadvantages are:

  • Performs less good at extremly large random samples (2000 users, 2000 facilities)
  • Requires a modern many-core CPU - tested with a 16 core AMD 5950x
  • No "partial" fitness evaluation, fitness needs to be reevaluated for each solution candidate.

Question is how relevant random problem instances are in the real world. May be they are misleading algorithm design in the wrong direction? Very good you already included samples derived from real world TSP instances.

For 'data/rl1323_993_330_4.anpcp.tsp' p=12, alpha=2 my algorithms finds

selection = [114 7 251 296 161 329 252 117 85 26 133 162]
value = 4190.0

In less than a minute. I tried grasp for 5000 iterations, the best it found was 4388. My algo performs about 150000 fitness evaluations / sec, using 32 parallel threads on a AMD 5950x CPU on linux. On windows it is only about 100000 evaluations / sec, because windows has a bad implementation of Python multithreading.

See https://github.com/dietmarwo/fast-cma-es/blob/master/examples/anpcp/anpcp.py for my code. You either can check out the whole repo or do a "pip install fcmaes" to try it out. https://github.com/dietmarwo/fast-cma-es/blob/master/examples/anpcp/anpcpc.py is the continuous variant (b), variant (a) is implemented in both cases but commented out.

@netotz
Copy link
Owner

netotz commented Aug 9, 2022

May be you can add a "seed" argument to Instance.random so that it can generate repruducible
instances you no longer need to store:

I agree, my revisors have suggested this as well.

a) Lets assume we are not only interested in the alpha-best distance, but in a (weighted) sum of the alpha best distances. May be the distance to the nearer facilities also matter for some users. Not always the alpha-1 nearer facilities are down.

What do you exactly mean by "alpha-best"? In the ANPCP, a user is assigned to its alpha-th closest facility, and the facilities closer to it don't matter for the objective function (although they do matter for the fast interchange). If you are using the same definition, then this variant would be something like "the alpha-neighbor p-median problem (ANPMP)", considering the objective function of the p-median problem which is a sum of distances as you suggest. In that case it would be interesting to adapt the fast interchange for the ANPMP, I would try to follow the same simplifications by Resende and Werneck.

If that's not the case, could you give more details about that variant?

b) ... This variant is a continuous optimization problem.

I don't think the fast interchange can be applied for a continuous problem, why considering an interchange when you could take any point in the plane? In my opinion, continuous problems have the disadvantage of being unrealistic, but there are plenty of them in the literature and I think it is an interesting variant too.

Progress often is triggered by competition, so I tried to come up with an alternative Python implementation shining for some problem instances were grasp struggles.

Nice! I'll be taking a look at the source code of your implementations, I'm interested in what is that fitness function. Regarding the quality of the code, I want to add that unfortunately this repo has a technical debt that I haven't addressed yet because it keeps getting more complex. It needs a refactorization.

Question is how relevant random problem instances are in the real world. May be they are misleading algorithm design in the wrong direction? Very good you already included samples derived from real world TSP instances.

Yes, random instances are not the best and that's why I have included some from the TSPLIB. The problem is that the TSPLIB considers only one set for users and facilities, while I'm tackling the problem by using 2 different sets, so I have split them by choosing random nodes as facilities, but these are not standard instances.

I like your approach of using multithreading, it's definitely a future work for my case. I'm currently experimenting with the parameters of GRASP and it seems like it's not the best metaheuristic for the ANPCP, for rl1323_882_441_4.anpcp.tsp there's only 1 improvement in 5000 iterations (p = 44, alpha = 3). I'll keep the repo and you updated when I consider other alternatives.

Thank you for taking your time to check out the code and contribute to the research of the ANPCP!

@dietmarwo
Copy link
Author

dietmarwo commented Aug 19, 2022

"the alpha-neighbor p-median problem (ANPMP)", considering the objective function of the p-median problem which is a sum of distances as you suggest. In that case it would be interesting to adapt the fast interchange for the ANPM

Indeed, this is no longer the original problem. What I wanted to point out is using my own approach described in Service.adoc this modification is trivial (a one-liner).

has a technical debt that I haven't addressed yet

There are very different approaches "solving this issue". For research work I think readability and simplicity is more important than a fancy complex class hierarchy. I had no issues at all understanding what your code does.

My own ideas are a bit "heretic" from the perspective of algorithm design:
I try to get rid of it completely. There are some similarities to machine-learning:
We replace fancy algorithms by quite stupid but generic deep neural networks powered by the parallelism supported by by modern GPUs.

For optimization I see a similar trend: New generic optimization algorithms specifically designed to be executed many times in parallel can do unexpected, fancy stuff, if you speed up fitness evaluation using numba.

Another related example: p-center-problem , a forked repo were I added the application of a generic algorithm and get much better results as the original one (although it needs a bit more wall time). Or Multi-UAV-Task-Assignment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants