# Sorting the travelling salesman problem

## Abstract

We look here at two related problems:
the travelling salesman problem (TSP)
and the sorting problem (SP), understanding the SP
as the problem that is solved by sorting algorithms.
We only consider the Euclidean TSP, where cities are points in the plane.
Both problems have very similar solution spaces,
and very similar ways of navigating these solution spaces.
Basically, the solution spaces are, in both cases,
the set of permutations of some given set of objects;
and we navigate them using a measure
on a generating set for the permutations,
which we try to minimize.
However, while this navigation results in solving the SP instances,
it only provides heuristics to approach the solution
in the case of instances of the TSP.
We try to explain this difference
by examining the topology of these solution spaces.

## Sets and orderings

For any instance of both the TSP and the SP,
the solution space is the set $\mathcal{P}$ of orderings of some set $\mathbf{A}$ of objects, understanding an *ordering* as a bijective function $\mathcal{p}: \mathbf{I} \rightarrow \mathbf{A}$,
with $|\mathbf{A}| = n$ and $\mathbf{I} = [1..n] \subset \mathbb{N}$ (so $|\mathcal{P}| = n!$).
In the case of the SP, the objects are (or can be trivially mapped to) real numbers,
and for each instance of the problem we have a finite subset $\mathbf{A} \subset \mathbb{R}$.
In the case of the TSP they are points in the plane
(here, for simplicity, we will take the plane to be the complex plane),
so for each instance of the problem we have a finite subset $\mathbf{A} \subset \mathbb{C}$.

As is customary when formulating these problems,
we start with an initial arbitrary ordering $\mathcal{o}$ on these sets.
In the SP, we are generally provided with a list of numbers to sort,
which is obviously a set of numbers and an initial ordering for them.
In the TSP, the usual formulation also includes an initial ordering
(or labelling of the cities with integers;
for a fairly up-to-date review of the TSP, see [1]).

Now we consider the symmetric group $\mathcal{S}_n$ acting on $\mathbf{I}$ in the usual way,
which allows us to obtain any possible ordering $\mathcal{p} \in \mathcal{P}$ of the elements of $\mathbf{A}$:

$$
\forall \mathcal{p} \in \mathcal{P} : \exists \sigma \in \mathcal{S}_n : \mathcal{o} \circ \sigma = \mathcal{p}
$$

[Here](Python-1.ipynb) we can find some Python code that provides an implementation of these concepts.
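
Independently of the linked notebook, the definitions above can be sketched in a few lines. This is only an illustration under some assumptions: indices run over `[0..n-1]` instead of `[1..n]`, an ordering and a permutation are both stored as tuples, and the helper names `apply_ordering` and `compose` are made up for this example.

```python
from itertools import permutations

def apply_ordering(o, sigma):
    """Return p = o ∘ sigma as a tuple, where p[i] = o[sigma[i]]."""
    return tuple(o[s] for s in sigma)

def compose(sigma, tau):
    """Return sigma ∘ tau, the permutation i -> sigma[tau[i]]."""
    return tuple(sigma[t] for t in tau)

# An initial ordering o of the set A = {3.5, 1.0, 2.25}.
o = (3.5, 1.0, 2.25)

# Letting S_n act on the indices reaches every ordering of A.
orderings = {apply_ordering(o, sigma) for sigma in permutations(range(len(o)))}
print(len(orderings))  # 3! = 6
```

Acting first with `tau` and then with `sigma` on the indices is the same as acting once with `compose(sigma, tau)`, which is the convention assumed for the product of permutations in this sketch.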

## Measures

Using all the information above, we can provide a *measure* - defined as a function $\mathcal{S}_n \rightarrow \mathbb{R}$ - for both the SP and the TSP.

In the case of the SP we will call this measure $\phi$, defined, for $\sigma \in \mathcal{S}_n$, by:

$$
\sigma \mapsto \sum_{\mathbf{i} = 1}^{n} ((\mathcal{o} \circ \sigma)(\mathbf{i}) - \mathbf{i})^2
$$

Note that when $\sigma$ is a transposition,
comparing the value of $\phi(\sigma)$ with that of $\phi(\mathbf{id}_{\mathcal{S}_n})$
is equivalent to comparing the values that are being transposed,
and so to performing the usual comparison used in sorting algorithms (i.e., using $<_{\mathbb{R}}$).

In the case of the TSP we will call this measure $\psi$, defined for $\sigma \in \mathcal{S}_n$ by:

$$
\sigma \mapsto \sum_{\mathbf{i} = 1}^{n - 1} |(\mathcal{o} \circ \sigma)(\mathbf{i}) - (\mathcal{o} \circ \sigma)(\mathbf{i} + 1)|^2
$$

In this case, note that $\psi$ just provides the length of the tour given by $\mathcal{p} = \mathcal{o} \circ \sigma$ (though, to keep the expressions simple, we do not require here that the salesman ends up in the same city they started from, only that they visit all cities).

[Here](Python-2.ipynb) we can find a Python implementation of these functions (note that in this implementation, the salesman ends up in the starting city).
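
As a rough sketch of the two measures, separate from the linked notebook, the formulas can be transcribed directly. The assumptions here are that indices are 0-based (so position `i` stands for $\mathbf{i} = i + 1$), that cities are Python `complex` numbers, and that $\psi$ is taken exactly as written above, with squared moduli and an open tour; dropping the exponent would give the ordinary Euclidean length.

```python
def phi(o, sigma):
    """SP measure: sum over positions of ((o ∘ sigma)(i) - i)^2, with i in 1..n."""
    return sum((o[s] - (i + 1)) ** 2 for i, s in enumerate(sigma))

def psi(o, sigma):
    """TSP measure as written in the formula: squared moduli, open tour."""
    p = [o[s] for s in sigma]
    return sum(abs(a - b) ** 2 for a, b in zip(p, p[1:]))

numbers = (3.0, 1.0, 2.0)
print(phi(numbers, (0, 1, 2)))  # 6.0 for the initial ordering
print(phi(numbers, (1, 2, 0)))  # 0.0 for the sorted ordering 1.0, 2.0, 3.0

cities = (0j, 1 + 0j, 1 + 1j)
print(psi(cities, (0, 1, 2)))   # 2.0
```

Swapping the values $a$ and $b$ held at positions $i < j$ changes $\phi$ by $2(i - j)(a - b)$, which is negative exactly when $a > b$; this is the comparison mentioned in the note on transpositions.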

## Brute force

All the above is enough to provide an algorithm to solve both problems, i.e., find the $\sigma \in \mathcal{S}_n$ that minimizes the quantity $\phi(\sigma)$ (or $\psi(\sigma)$).

This algorithm forces us to calculate the measure for every one of the $n!$ elements in $\mathcal{S}_n$,
which is intractable even for relatively small $n$. It is what is usually called the naïve or brute force algorithm.

[Here](Python-3.ipynb) we can find a Python demo of this algorithm.
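
A minimal sketch of the brute force search, not taken from the notebook: it enumerates all of $\mathcal{S}_n$ with `itertools.permutations` and keeps the minimizer. The function name `brute_force` and the 0-based indexing are assumptions of this example, and `phi` is a direct transcription of the SP measure defined earlier.

```python
from itertools import permutations

def phi(o, sigma):
    """SP measure: sum over positions of ((o ∘ sigma)(i) - i)^2, with i in 1..n."""
    return sum((o[s] - (i + 1)) ** 2 for i, s in enumerate(sigma))

def brute_force(o, measure):
    """Return the permutation minimizing `measure`, evaluating all n! candidates."""
    return min(permutations(range(len(o))), key=lambda sigma: measure(o, sigma))

numbers = (4.0, 2.0, 5.0, 1.0, 3.0)
sigma = brute_force(numbers, phi)
print([numbers[s] for s in sigma])  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

The same function accepts any other measure with the signature `measure(o, sigma)`, at the same factorial cost.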

## A smarter algorithm

To reduce the complexity of the algorithm,
the natural strategy is to, instead of considering every element in $\mathcal{S}_n$,
consider only the elements in a generating set $\mathcal{G} \subset \mathcal{S}_n$:
if we assume that $\tau \in \mathcal{S}_n$ provides the solution to some instance of either problem,
we want to find a sequence $(\mathcal{g}_1, \mathcal{g}_2, \dots, \mathcal{g}_k)$ of elements
of $\mathcal{G}$ such that $\mathcal{g}_1 \cdot \mathcal{g}_2 \cdots \mathcal{g}_k = \tau$.

Obviously we do not want to consider every possible such sequence; that would not even be naïve, it would be stupid.

What we want is to
check the measure on all the $\mathcal{g}_i \in \mathcal{G}$,
choose one of those elements, $\mathcal{g}_1$, that decreases the measure with respect to $\mathbf{id}_{\mathcal{S}_n}$,
and then repeat the procedure with $\mathcal{g}_1 \cdot \mathcal{g}_i$,
again for all the $\mathcal{g}_i \in \mathcal{G}$, to choose another $\mathcal{g}_2$
(not necessarily distinct from $\mathcal{g}_1$),
accumulating a sequence of elements $\mathcal{g}_1, \mathcal{g}_2, \dots \in \mathcal{G}$ until no
choice remains that would decrease the measure.
This is what we will call the smart algorithm.

As a first approximation, we can assume that the cardinality of the generating sets
will correspond directly with the complexity of the resulting algorithms.
In reality the correspondence will not be that simple,
as the number of steps needed to reach the solution
(the number of $\mathcal{g}_i$ in the solution)
will also be a factor in the complexity,
and it will decrease as the cardinality of $\mathcal{G}$ increases.
But not by much: we are now in the zone of sublinear effects.

[Here](Python-4.ipynb) we can find a Python implementation of this algorithm.
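
The procedure can also be sketched compactly, independently of the notebook. Several choices here are assumptions of the example rather than part of the description above: at each step it picks the generator giving the largest decrease (any decreasing generator would fit the description), it uses 0-based indices, and the helpers `compose`, `transposition` and `smart` are hypothetical names.

```python
def phi(o, sigma):
    """SP measure: sum over positions of ((o ∘ sigma)(i) - i)^2, with i in 1..n."""
    return sum((o[s] - (i + 1)) ** 2 for i, s in enumerate(sigma))

def compose(sigma, g):
    """Return sigma ∘ g, the permutation i -> sigma[g[i]]."""
    return tuple(sigma[k] for k in g)

def transposition(n, i, j):
    """Permutation of range(n) that swaps positions i and j."""
    g = list(range(n))
    g[i], g[j] = g[j], g[i]
    return tuple(g)

def smart(o, measure, generators):
    """Follow measure-decreasing generators from the identity until none is left."""
    sigma = tuple(range(len(o)))
    current = measure(o, sigma)
    path = []
    while True:
        best = min(generators, key=lambda g: measure(o, compose(sigma, g)))
        candidate = compose(sigma, best)
        value = measure(o, candidate)
        if value >= current:       # no generator decreases the measure: stop
            return sigma, path
        sigma, current = candidate, value
        path.append(best)

numbers = (4.0, 2.0, 5.0, 1.0, 3.0)
n = len(numbers)
# Contiguous transpositions, including the wrap-around one.
generators = [transposition(n, i, (i + 1) % n) for i in range(n)]
sigma, path = smart(numbers, phi, generators)
print([numbers[s] for s in sigma])  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Each iteration evaluates the measure $|\mathcal{G}|$ times, so the cost is $|\mathcal{G}|$ times the number of accepted steps, and the loop terminates because the measure strictly decreases at every step.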

## Generating sets

The smallest generating set for $\mathcal{S}_n$ is $\{(12), (123 \dots n)\}$; but this will not provide solutions for either problem using the smart algorithm above.

The smallest generating set that will provide solutions for the SP is the set of all contiguous transpositions $\mathcal{G} = \{(12), (23), \dots, (n1)\}$, so with $|\mathcal{G}| = n$.
And almost (for an exception, see below regarding 2-opt) any other more complex generating set (that does not distinguish any positions, as the smallest set in the previous paragraph does for 1 and 2) will also provide solutions; for example, the set of all transpositions.

If we look closely, and disregard a few clever optimizations, this algorithm, with appropriate choices of generating set, corresponds quite well with the usual sorting algorithms. See Wikipedia [2] for a few of them.

Additionally, it is not difficult to check that this algorithm, with the choices of generating sets
that do solve all instances of the SP, will not solve instances of the TSP,
but will only provide heuristic approaches.
It can also easily be seen that the heuristics commonly used
to approach instances of the TSP (k-opt and derivatives, see [1])
correspond well with this algorithm.

The k-opt algorithm essentially consists of repeatedly breaking the tour (it is originally defined for the TSP)
in $k$ places and rearranging the resulting pieces.
This procedure corresponds to generating sets $\mathcal{G}$ with cardinality $n \times (n - 1) \times ... \times (n - k + 1)$ (and thus to algorithms with such complexity).
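
To get a feeling for these sizes, the product $n \times (n - 1) \times ... \times (n - k + 1)$ is what `math.perm(n, k)` computes (Python 3.8 or later). The function name below is hypothetical, and the values of $n$ and $k$ are only illustrative.

```python
from math import factorial, perm

def k_opt_generating_set_size(n, k):
    """n * (n - 1) * ... * (n - k + 1), the cardinality quoted above."""
    return perm(n, k)

n = 20
for k in (2, 3, 5):
    print(k, k_opt_generating_set_size(n, k))  # 380, 6840, 1860480
print("n!", factorial(n))                      # 2432902008176640000
```

For $k = n$ the product is $n!$, the size of the whole group.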

It is also interesting to note that a 2-opt algorithm is not in general enough to solve every instance of the SP.

[Here](Python-5.ipynb) we can find Python functions to construct different generating sets, and code to demonstrate the ideas in this section.
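
Apart from the notebook, a small sketch can build three of the generating sets named in this section and check, by closing them under composition, that each one does generate all of $\mathcal{S}_n$. All function names are assumptions of this example, indices are 0-based, and the check is only practical for small $n$.

```python
from math import factorial

def transposition(n, i, j):
    """Permutation of range(n) that swaps positions i and j."""
    g = list(range(n))
    g[i], g[j] = g[j], g[i]
    return tuple(g)

def minimal_pair(n):
    """The two-element set {(12), (123...n)}."""
    return {transposition(n, 0, 1), tuple((i + 1) % n for i in range(n))}

def contiguous_transpositions(n):
    """{(12), (23), ..., (n1)}, including the wrap-around transposition."""
    return {transposition(n, i, (i + 1) % n) for i in range(n)}

def all_transpositions(n):
    return {transposition(n, i, j) for i in range(n) for j in range(i + 1, n)}

def generated(generators):
    """All permutations reachable from the identity by composing generators."""
    n = len(next(iter(generators)))
    identity = tuple(range(n))
    seen, frontier = {identity}, [identity]
    while frontier:
        sigma = frontier.pop()
        for g in generators:
            nxt = tuple(sigma[k] for k in g)   # sigma ∘ g
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

n = 4
for build in (minimal_pair, contiguous_transpositions, all_transpositions):
    G = build(n)
    print(build.__name__, len(G), len(generated(G)) == factorial(n))
```

Being a generating set is only a necessary condition here: all three sets reach every permutation, yet they behave differently under the smart algorithm, which only follows edges that decrease the measure.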

## Topologies

It seems to me that the language of topology is the one that best captures the information
we have available to discriminate between the algorithms that can solve instances of these problems
and those that can't. For all the conceptualization of finite topological spaces I follow here
the work of J.P. May, see [3] or in general the material in [4].

To lay out all this information topologically, we first turn our formulations of the instances of the problems into graphs.

First we use the group theory information to define a directed graph $\mathfrak{G}$,
in which the nodes are the elements in $\mathcal{P}$, and the edges are (compositions with) the elements in $\mathcal{G}$.
This is a regular directed graph (all nodes have the same number of in and out edges),
which is connected due to the fact that $\mathcal{G}$ is a generating set for $\mathcal{S}_n$.
So $\mathcal{g} \in \mathcal{G}$ would be an edge out of any $\mathcal{p} \in \mathcal{P}$
and into $(\mathcal{p} \circ \mathcal{g}) \in \mathcal{P}$.

Now we use our measure functions to define a subgraph $\mathfrak{M} < \mathfrak{G}$ in which,
for each node $\mathcal{p} \in \mathcal{P}$
we assume $\mathcal{p}$ to be the initial ordering,
and we leave at most one outgoing edge,
such that it has to end in a node with a smaller measure than $\mathcal{p}$,
and no other outgoing edge from $\mathcal{p}$ in $\mathfrak{G}$ can end in a node with smaller measure.

Clearly $\mathfrak{M}$ is no longer regular, nor necessarily connected.

Now we define two different topologies on $\mathfrak{M}$.

For the first topology $\mathcal{T}_\mathcal{G}$ we consider that, given $\mathcal{p}, \mathcal{q} \in \mathcal{P}$,
$\mathcal{p} < \mathcal{q}$ iff it is possible to go from $\mathcal{q}$ to $\mathcal{p}$
traversing any number of edges in the forward direction;
and we use the poset structure given by that relation to define the topology.

The second topology $\mathcal{T}_\mathbb{R}$ on $\mathcal{P}$
is given by $\phi$ or $\psi$ as embeddings of $\mathcal{P}$ into $\mathbb{R}$,
so we can inherit the standard topology from $\mathbb{R}$.
In this case the topology would correspond to a total order.

So given this, we can define a curve in $\mathcal{P}$ as a map $[1..k] \rightarrow \mathcal{P}$;
and use the defined topologies to distinguish continuous curves,
as monotonic curves with respect to the order associated to the topology.
Of course a curve that is continuous in one topology need not be so in the other.

Now, we can use $\mathcal{T}_\mathcal{G}$ to build continuous curves without
explicitly *constructing* the full space, in the sense of calculating the measure for all points in $\mathcal{P}$.
Building curves this way corresponds to using our smart algorithm.

But we cannot use $\mathcal{T}_\mathbb{R}$ to build continuous curves without
explicitly constructing the full space.
Building a continuous curve in $\mathcal{T}_\mathbb{R}$ would correspond
to using the naïve algorithm to find the solution.

The issue here is that in some cases, the continuous curves in $\mathcal{T}_\mathcal{G}$
are enough to find the topologically distinguished points in $\mathcal{T}_\mathbb{R}$
(in particular, the unique point which is in an open set in $\mathcal{T}_\mathbb{R}$ just by itself - the solution)
and in other cases they are not enough.

We need two topologies that share the same topologically distinguished points:
we need two topologies that are homotopy equivalent.

This is actually a lot simpler than it sounds:
the topologically distinguished points in $\mathcal{T}_\mathcal{G}$
in this sense and these spaces are just
the sinks in the $\mathfrak{M}$ graphs,
and since in $\mathcal{T}_\mathbb{R}$ we only ever have one point in an open set by itself,
we just need the companion $\mathfrak{M}$ graph to have just one sink.
This would put them in the same homotopy class.

Note that I am here only using topology to understand in what cases these problems can and cannot be solved.
It would also make sense to use this to explore how far certain instances are from being solvable,
depending on some distance between homotopy classes;
or to explore how many instances of each problem there are for each cardinality of $\mathbf{A}$,
by counting homeomorphism classes.

[Here](Python-6.ipynb) we can find some Python code to explore these ideas
and check the homotopy classes for different types of problem, generating sets, and measures.
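
Separately from the notebook, the single-sink criterion can be checked by exhaustion on small instances. The sketch below counts the nodes of $\mathcal{P}$ from which no generator decreases the measure. Its assumptions: 0-based indices, contiguous transpositions (with wrap-around) as $\mathcal{G}$, measures transcribed from the formulas for $\phi$ and $\psi$ as written, and an arbitrary illustrative set of six cities. The name `sinks` is hypothetical.

```python
from itertools import permutations

def phi(o, sigma):
    """SP measure, with positions counted from 1 as in the formula."""
    return sum((o[s] - (i + 1)) ** 2 for i, s in enumerate(sigma))

def psi(o, sigma):
    """TSP measure as written in the formula: squared moduli, open tour."""
    p = [o[s] for s in sigma]
    return sum(abs(a - b) ** 2 for a, b in zip(p, p[1:]))

def contiguous_transpositions(n):
    gens = []
    for i in range(n):
        g = list(range(n))
        j = (i + 1) % n
        g[i], g[j] = g[j], g[i]
        gens.append(tuple(g))
    return gens

def sinks(o, measure, generators):
    """Permutations from which no generator leads to a smaller measure."""
    found = []
    for sigma in permutations(range(len(o))):
        here = measure(o, sigma)
        if all(measure(o, tuple(sigma[k] for k in g)) >= here for g in generators):
            found.append(sigma)
    return found

numbers = (4.0, 2.0, 5.0, 1.0, 3.0)
sp_sinks = sinks(numbers, phi, contiguous_transpositions(len(numbers)))
print(len(sp_sinks))  # 1: only the sorted ordering is a sink

cities = (0j, 2 + 1j, 1 + 3j, 4 + 0j, 3 + 2j, 0.5 + 2.5j)
tsp_sinks = sinks(cities, psi, contiguous_transpositions(len(cities)))
print(len(tsp_sinks))  # the count depends on the instance
```

The exhaustive loop visits all $n!$ nodes, so this is a diagnostic for small instances, not a way of solving them.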

## More complex generating sets?

So, all the information we have, with which we can define these graphs and topologies,
is encapsulated for each instance of the problems in the measure function,
and in the generating set we use to look for the solution.

It is easy to see that most generating sets provide a topology
that is homotopy equivalent to $\mathcal{T}_\mathbb{R}$ in the case of instances of the SP
(after all, sorting algorithms do work).
It is also easy to see that in the case of instances of the TSP,
the usual generating sets do not do so (and so they just provide heuristics).

And so the question now is,
can we find a sufficiently complex generating set $\mathcal{G}$ for $\mathcal{S}_n$
that will provide a topology for any TSP instance
that is in the same homotopy class as the $\mathcal{T}_\mathbb{R}$ topology provided by $\psi$?
Say, for example, 20-opt or something like that,
in the same way that 2-opt is a heuristic for the SP but 3-opt solves it?

We consider here that a more complex $\mathcal{G}$ will essentially correspond to a higher $k$
in the k-opt nomenclature; i.e., a more complex $\mathcal{G}$ will break the tour into more pieces
to be rearranged into new tours; until we arrive at n-opt, breaking the tour into $n$ pieces,
and $\mathcal{G} = \mathcal{S}_n$ and we are back in the brute force algorithm.

However, we can provide a method ([check the code here](Python-7.ipynb)) to build an instance of the TSP
in which the two shortest tours are an arbitrary number of breaks away;
i.e., that will require an arbitrarily complex $\mathcal{G}$.

Therefore, we cannot find a unique $\mathcal{G}$ to generally solve the TSP;
we cannot vary $\mathcal{G}$ to find a polynomial time algorithm to solve it.

## Manifolds?

So the only other thing that remains that can be varied,
to steer the heuristic approaches to the TSP into the realm of solutions,
is the measure, $\psi$.
We can, for example, consider, as local data,
not only the measure on some node,
but also the measure on all its neighbouring nodes in the $\mathfrak{G}$ graph.

This would require, in my opinion, that we stop considering the measure as just providing a topology,
and start considering it as a *field* in the $\mathfrak{G}$ graph (provided with the *graph topology*).
We can then use discrete calculus to look at gradients and so on,
and try to find some kind of smooth structure that might allow us to navigate the space
with more finesse than just looking at continuous curves.
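
One possible reading of "local data" and "gradient", offered only as a sketch, is the vector of measure differences along every edge leaving a node of $\mathfrak{G}$. The name `local_gradient`, the 0-based indices and the use of the SP measure for the example are all assumptions made here for illustration.

```python
def phi(o, sigma):
    """SP measure, with positions counted from 1 as in the formula."""
    return sum((o[s] - (i + 1)) ** 2 for i, s in enumerate(sigma))

def contiguous_transpositions(n):
    gens = []
    for i in range(n):
        g = list(range(n))
        j = (i + 1) % n
        g[i], g[j] = g[j], g[i]
        gens.append(tuple(g))
    return gens

def local_gradient(o, sigma, measure, generators):
    """Measure differences measure(sigma ∘ g) - measure(sigma), one per generator."""
    here = measure(o, sigma)
    return [measure(o, tuple(sigma[k] for k in g)) - here for g in generators]

numbers = (3.0, 1.0, 2.0)
gens = contiguous_transpositions(len(numbers))
print(local_gradient(numbers, (0, 1, 2), phi, gens))  # [-4.0, 2.0, -4.0]
print(local_gradient(numbers, (1, 2, 0), phi, gens))  # no negative entries at a sink
```

In this reading, a node is a sink exactly when its vector has no negative component; the same vectors could be evaluated for $\psi$ on the neighbours of a tour.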

This would enter the field of differential geometry.
So how would this work?
We would need that the field,
which is just the measure applied on all the nodes,
be somehow smooth in the graph topology,
so that its local gradient can tell you something
about the global field.
The local gradient must happen in some kind of tangent space;
and in our case the tangents to our continuous curves,
as sequences of points,
are the edges.
So for the tangent space at each point we would have to consider the generating sets, $\mathcal{G}$.
Now, to have something approaching a finite manifold,
in which we can define such a gradient to navigate such a field,
we need the tangent spaces to have the same dimensionality as the base space.
The spaces $\mathcal{P}$ we are dealing with have $n$ $\mathbb{F}_n$ dimensions
(with the restriction that each dimension holds a different value)
and so we need a $\mathcal{G}$ that needs to be expressed in that dimension.
And that basically suggests to me that the generating sets for the tangent spaces need be
the full set of permutations.
Or, in other words, the only $\mathcal{G}$ that is not an arbitrary choice as tangent spaces
for $\mathcal{S}_n$ is $\mathcal{S}_n$ itself;
any other $\mathcal{G}$ will not be able to be taken as a microscopic, or linearized,
image of $\mathcal{S}_n$.
So my conclusion is that this would not provide an advantageous algorithm.

I have run a few very inconclusive tests of this, though,
using machine learning to train models to distinguish tours that are the solution
from tours that are a sink but not the solution,
training them with all the local information available.

Using contiguous transpositions as generating set,
for instances of the TSP with 20 cities I am getting a confusion matrix
(for instances of the problem that the model has never seen) like
$\left[\begin{smallmatrix}61 & 39 \\\\ 26 & 74\end{smallmatrix}\right]$
whereas for 30 cities I am getting
$\left[\begin{smallmatrix}53 & 47 \\\\ 34 & 66\end{smallmatrix}\right]$.
So it looks like for more cities we would need more complex generating sets,
supporting my conjecture.
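
For a single-number comparison of the two matrices, the overall accuracy can be computed as the diagonal sum over the total. This assumes, as is conventional but not stated above, that the diagonal entries are the correctly classified tours; the function name is hypothetical.

```python
def accuracy(confusion):
    """Fraction of cases on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

print(accuracy([[61, 39], [26, 74]]))  # 0.675 for 20 cities
print(accuracy([[53, 47], [34, 66]]))  # 0.595 for 30 cities
```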

This code is work in progress and the results are very preliminary
(since I don't have access to that much computing power),
but if anyone is interested I can clean it up and link it here.

## References

1. David S. Johnson and Lyle A. McGeoch, "The Traveling Salesman Problem: A Case Study in Local Optimization", in "Local Search in Combinatorial Optimization", E. H. L. Aarts and J. K. Lenstra (eds.), John Wiley and Sons, London, 1997, pp. 215-310.

2. https://en.wikipedia.org/wiki/Sorting_algorithm

3. J. Peter May, "Finite Topological Spaces", http://math.uchicago.edu/~may/FINITE/REUNotes2010/FiniteSpaces.pdf

4. J. Peter May, "Various Papers on Finite Spaces", http://math.uchicago.edu/~may/FINITE/

All code licensed under the GPLv3. Copyright by Enrique Pablo Pérez Arnaud.