# The Merry Movie Montage as Colored Traveling Salesman Problem

This notebook is an attempt to adapt a special type of the traveling salesman problem, the colored traveling salesman problem, to this competition.

# What is a Colored Traveling Salesman Problem?

The Colored Traveling Salesman Problem (CTSP) is a special case of the Multiple Traveling Salesman Problem (MTSP), which in turn is an extension of the Traveling Salesman Problem (TSP).

MTSP involves assigning m salesmen to n cities, and each city must be visited by a salesman while requiring a minimum total cost. ([citation](https://www.tandfonline.com/doi/full/10.1080/21642583.2019.1674220#:~:text=MTSP%20involves%20assigning%20m%20salesmen%20to%20n%20cities%2C%20and%20each%20city%20must%20be%20visited%20by%20a%20salesman%20while%20requiring%20a%20minimum%20total%20cost.))

However, MTSP is applicable to only the cases in which multiple executing individuals (traveling salesman) share the common workspace (city set). It cannot be used to handle many multi-machine engineering systems where multiple machines' workspaces are not the same and partially overlap with each other. [This paper](https://doi.org/10.3182/20140824-6-ZA-1003.01403) proposed and formulated a new MTSP called colored traveling salesman problem (CTSP). Each of its salesmen is assigned a private city set and all salesmen share a public city set. Every set of cities is colored differently. ([citation](https://www.sciencedirect.com/science/article/pii/S1474667016431289?via%3Dihub#:~:text=However%2C%20it%20is,is%20colored%20differently.))

CTSP looks like a good match for our problem. The 120 permutations beginning with 🎅🤶 must appear in every one of the three strings, so their three copies (one per string) will be our "private city sets", and all the other permutations will form the "public city set". Wildcards, however, do not translate easily to this setting.

Luckily, our old friend [LKH-3](http://webhotel4.ruc.dk/~keld/research/LKH-3/) supports CTSP instances. All we need to do is solve three problems:
1. figure out the LKH input format for CTSP instances;
2. understand the LKH output tour format for CTSP instances;
3. make LKH work with asymmetric CTSP instances.

Let's do it step by step.

# 1. CTSP Input Format

On the [LKH-3](http://webhotel4.ruc.dk/~keld/research/LKH-3/) page we can download CTSP instances by clicking the [CTSP](http://webhotel4.ruc.dk/~keld/research/LKH-3/BENCHMARKS/CTSP.tgz) link. Let's have a look at the CTSP instance `/CTSP/INSTANCES/Eil/eil21-3.ctsp`:

```
NAME : eil21-3
COMMENT : 21-city problem (Christofides/Eilon)
TYPE : CTSP
DIMENSION : 21
SALESMEN : 3
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
1 42 41
  *** 19 lines with nodes' coordinates are omitted ***
21 38 35
CTSP_SET_SECTION
1 2 3 4 5 -1
2 6 7 8 9 -1
3 10 11 12 13 -1
DEPOT_SECTION
1
-1
EOF
```

As we can see, the CTSP input format is pretty straightforward. Compared to the ATSP input format, all we need to do is:
* change the problem type: `TYPE : CTSP` instead of `TYPE : ATSP`;
* specify the number of salesmen: `SALESMEN : 3`;
* list the "private city sets" under the `CTSP_SET_SECTION` keyword: each set starts with its index \\((1, 2, 3)\\), followed by the IDs of the nodes in the set, and ends with \\(-1\\);
* specify the node all salesmen start their routes from under the `DEPOT_SECTION` keyword (this section is also terminated by \\(-1\\)).
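To make the format concrete, here is a minimal sketch that assembles a CTSP instance as text for an explicit weight matrix. It uses only the keywords that appear in this notebook (`EXPLICIT`/`FULL_MATRIX` is what the problem writer below uses for our instance); `ctsp_instance_text` is a hypothetical helper, not part of LKH, and the toy weights are made up.

```python
def ctsp_instance_text(name, weights, private_sets, depot):
    """Build the text of a CTSP instance with an explicit full matrix (sketch).

    weights      -- square list of lists of integer edge weights
    private_sets -- one list of 1-based node IDs per salesman
    depot        -- 1-based ID of the node all salesmen start from
    """
    lines = [
        f'NAME : {name}',
        'TYPE : CTSP',
        f'DIMENSION : {len(weights)}',
        f'SALESMEN : {len(private_sets)}',
        'EDGE_WEIGHT_TYPE : EXPLICIT',
        'EDGE_WEIGHT_FORMAT : FULL_MATRIX',
        'EDGE_WEIGHT_SECTION',
    ]
    # one line per matrix row
    lines += [' '.join(map(str, row)) for row in weights]
    lines.append('CTSP_SET_SECTION')
    # each set: its 1-based index, the node IDs in the set, and a terminating -1
    for k, nodes in enumerate(private_sets, 1):
        lines.append(' '.join(map(str, [k, *nodes, -1])))
    lines += ['DEPOT_SECTION', str(depot), '-1', 'EOF']
    return '\n'.join(lines) + '\n'


toy = ctsp_instance_text('toy', [[0, 1, 2], [1, 0, 3], [2, 3, 0]], [[2], [3]], depot=1)
print(toy)
```

Building the text in memory keeps the example independent of the file system; writing it out is a single `open(...).write(toy)`.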

# 2. CTSP Output Tour Format

Each CTSP instance downloaded above comes with a corresponding tour found by LKH. Let's examine the tour `/CTSP/TOURS/Eil/eil21-3.157477.tour` for the instance we looked at above:
```
NAME : eil21-3.157477.tour
COMMENT : Length = 157477
COMMENT : Found by LKH [Keld Helsgaun] Thu Nov  8 14:34:19 2018
TYPE : TOUR
DIMENSION : 23
TOUR_SECTION
1 <--- original depot
13
11
12
10
23 <--- not present in the problem, additional depot 
19
20
8
6
9
7
15
18
22 <--- not present in the problem, additional depot
14
3
4
5
2
17
16
21
-1
EOF
```

There is only one tour, not three separate ones (one per salesman) as one might expect. Moreover, the dimension of the tour \\((23)\\) differs from that of the problem \\((21)\\): the tour contains two nodes that are not present in the problem. The first salesman starts his route from the depot node, and each additional salesman starts his route from an additional depot node. So if the problem has \\(n\\) nodes and \\(m\\) salesmen, the output tour has \\(n+m-1\\) nodes, with the additional depot nodes numbered \\(n+1, n+2, \ldots, n+m-1\\).
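A small sketch of how such a tour can be cut back into one route per salesman, relying on the numbering described above (extra depots \\(n+1, \ldots, n+m-1\\)). The helper name `split_ctsp_tour` is ours, and the tour list is copied from `eil21-3.157477.tour`.

```python
def split_ctsp_tour(tour, n, m, depot=1):
    """Split a single LKH CTSP tour into one route per salesman (sketch).

    tour  -- node IDs in TOUR_SECTION order (without the final -1)
    n, m  -- number of nodes in the problem and number of salesmen
    depot -- ID of the original depot (node 1 in eil21-3)
    Each returned route starts with the depot node it was launched from.
    """
    depots = {depot, *range(n + 1, n + m)}
    # the tour is a cycle, so rotate it to begin at a depot
    start = next(i for i, node in enumerate(tour) if node in depots)
    tour = tour[start:] + tour[:start]
    routes = []
    for node in tour:
        if node in depots:
            routes.append([node])  # a new salesman starts here
        else:
            routes[-1].append(node)
    return routes


EIL21_3_TOUR = [1, 13, 11, 12, 10, 23, 19, 20, 8, 6, 9, 7, 15, 18,
                22, 14, 3, 4, 5, 2, 17, 16, 21]
for route in split_ctsp_tour(EIL21_3_TOUR, n=21, m=3):
    print(route)
# [1, 13, 11, 12, 10]
# [23, 19, 20, 8, 6, 9, 7, 15, 18]
# [22, 14, 3, 4, 5, 2, 17, 16, 21]
```

For our own instance, where the depot is the last node, the equivalent call would be `split_ctsp_tour(tour, n=10561, m=3, depot=10561)`.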

# 3. Dealing with an Asymmetric CTSP

Now that we know the CTSP input and output formats for LKH, we could write a CTSP instance file and an initial tour file for our problem and feed them to LKH. But all we'd get is garbage: LKH expects a CTSP instance to be symmetric, whereas our problem is asymmetric because the distance between permutations is not symmetric, i.e. there are permutations \\(p\\) and \\(q\\) such that \\(d(p,q) \ne d(q, p)\\), where \\(d(p, q)\\) is the distance between \\(p\\) and \\(q\\) (the number of letters that must be appended after \\(p\\) to obtain \\(q\\)).
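A quick check of the asymmetry, using the same `perm_dist` as the code below (copied here so the snippet runs on its own). It also confirms, for one fixed permutation, the values \\(d_{\min} = 1\\) and \\(d_{\max} = 7\\) used later; since relabeling the letters maps permutations to permutations, the same holds for every starting permutation.

```python
import itertools


def perm_dist(p, q):
    """Letters to append after p so that the string ends with q (7 if no overlap)."""
    i = p.index(q[0])
    return i if p[i:] == q[:7 - i] else 7


p = (1, 2, 3, 4, 5, 6, 7)
q = (2, 3, 4, 5, 6, 7, 1)
print(perm_dist(p, q), perm_dist(q, p))  # 1 6

# distances from p to every other permutation
dists = [perm_dist(p, r) for r in itertools.permutations(range(1, 8)) if r != p]
print(min(dists), max(dists))  # 1 7
```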

To deal with this problem we will use the approach from the [paper](http://home.eng.iastate.edu/~rkumar/PUBS/atsp.pdf) to convert an asymmetric TSP (ATSP) to a symmetric one (STSP) by doubling its size. Below is a short description of such conversion.

Let \\(D_{n \times n}\\) be an asymmetric distance matrix for a TSP of size \\(n\\). The authors define
\\[d_{\max }:=\max _{i \neq j} d_{i j} ; \quad d_{\min }:=\min _{i \neq j} d_{i j}\\]

They then define the distance matrix \\(D^{\prime}=\left[d_{i j}^{\prime}\right]_{n \times n}\\) as follows:
\\[\forall i, j: d_{i j}^{\prime}:= \begin{cases}0 & \text { if } i=j \\ d_{i j} & \text { if }\left[4 d_{\min }-3 d_{\max }\right]>0, i \neq j \\ d_{i j}+\left[3 d_{\max }-4 d_{\min }+\epsilon\right] & \text { otherwise }\end{cases}\\]

Using the asymmetric distance matrix \\(D^{\prime}\\), the authors define the desired symmetric distance matrix \\(\bar{D}=\left[\bar{d}_{i j}\right]_{2 n \times 2 n}\\):
\\[\bar{D}:=\left[\begin{array}{c|c}
\infty & \left(D^{\prime}\right)^{\top} \\
\hline D^{\prime} & \infty
\end{array}\right]\\]

For notational simplicity, given \\(i \leq n\\), the authors use \\(\left[i\right]\\) to denote \\(i + n\\). Thus \\(\left[1\right] = 1 + n\\), \\(\left[n\right] = 2n\\), etc. The authors call \\(i\\) and \\(\left[i\right]\\) a complementary pair of nodes. Furthermore, for each \\(i \leq n\\), the node \\(i\\) is called a real node, whereas the node \\(\left[i\right]\\) is called a virtual node. The authors note that for all \\(i, j\\),
\\[\bar{d}_{i j}=\bar{d}_{[i][j]}=\infty ; \quad \bar{d}_{[i] j}=\bar{d}_{j[i]}=d_{i j}^{\prime} ; \quad \bar{d}_{[i] i}=\bar{d}_{i[i]}=d_{i i}^{\prime}=0\\]

In other words, the distance between two real nodes or between two virtual nodes is infinite, the distance between a real and a virtual node is finite and symmetric, and the distance between the nodes of a complementary pair is zero.

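This way the tour \\[T=i_{1} \rightarrow i_{2} \rightarrow \ldots \rightarrow i_{n}\\] in the original asymmetric setting translates to the tour
\\[\bar{T}=i_{1} \rightarrow\left[i_{1}\right] \rightarrow i_{2} \rightarrow\left[i_{2}\right] \rightarrow \ldots \rightarrow i_{n} \rightarrow\left[i_{n}\right]\\] in the symmetric setting defined above. Each step \\(i_{k} \rightarrow \left[i_{k}\right]\\) costs zero and each step \\(\left[i_{k}\right] \rightarrow i_{k+1}\\) (with \\(i_{n+1} = i_{1}\\)) costs \\(\bar{d}_{[i_{k}] i_{k+1}} = d^{\prime}_{i_{k} i_{k+1}}\\), so \\(\bar{T}\\) costs the same under \\(\bar{D}\\) as \\(T\\) does under \\(D^{\prime}\\). The authors show that an optimal tour in the symmetric setting corresponds to an optimal tour in the original asymmetric setting.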
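The construction can be sketched in a few lines of NumPy on a made-up \\(3 \times 3\\) matrix, checking that \\(\bar{D}\\) is symmetric and that \\(\bar{T}\\) costs the same under \\(\bar{D}\\) as \\(T\\) does under \\(D^{\prime}\\). This is an illustration of the transformation, not code from the paper: indices are 0-based, so the virtual node \\([i]\\) is `i + n`, and using `inf=10**9` as a finite stand-in for \\(\infty\\) is an assumption that mirrors `INF` below.

```python
import numpy as np


def atsp_to_stsp(d, eps=1, inf=10**9):
    """Sketch of the ATSP -> STSP doubling: returns (D', D-bar)."""
    d = np.asarray(d, dtype=np.int64)
    n = d.shape[0]
    off = ~np.eye(n, dtype=bool)  # mask of the off-diagonal entries
    d_max, d_min = d[off].max(), d[off].min()
    d_prime = d.copy()
    if 4 * d_min - 3 * d_max <= 0:  # the "otherwise" case of the definition
        d_prime[off] += 3 * d_max - 4 * d_min + eps
    np.fill_diagonal(d_prime, 0)
    d_bar = np.full((2 * n, 2 * n), inf, dtype=np.int64)
    d_bar[:n, n:] = d_prime.T  # top-right block: (D')^T
    d_bar[n:, :n] = d_prime    # bottom-left block: D'
    return d_prime, d_bar


def tour_cost(mat, tour):
    """Cost of a closed tour given as a list of 0-based node indices."""
    return int(sum(mat[a, b] for a, b in zip(tour, tour[1:] + tour[:1])))


d = [[0, 1, 5],
     [4, 0, 2],
     [3, 6, 0]]
d_prime, d_bar = atsp_to_stsp(d)
tour = [0, 1, 2]
sym_tour = [v for i in tour for v in (i, i + len(d))]  # i -> [i] -> next i
print(tour_cost(np.array(d), tour))  # 6
print(tour_cost(d_prime, tour))      # 51
print(tour_cost(d_bar, sym_tour))    # 51
```

Here \\(d_{\max} = 6\\) and \\(d_{\min} = 1\\), so every off-diagonal entry is shifted by \\(18 - 4 + 1 = 15\\); the shift adds the same \\(3 \cdot 15\\) to every tour, which is why it cannot change which tour is optimal.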

# The Merry Movie Montage as CTSP

All that is left for us to do is to implement the above ideas in code.

We have \\(5280\\) real nodes (\\(2 \cdot 120 + 5040\\): two extra copies of the \\(120\\) mandatory permutations plus all \\(5040\\) permutations) and \\(5280\\) virtual nodes. The first \\(360\\) real nodes represent the three sets of mandatory permutations (the private city sets), and the depot node comes last, with the ID \\(2 \cdot 5280 + 1 = 10561\\).
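The numbering can be reproduced in a few lines. The helpers `real_node_id` and `virtual_node_id` below are hypothetical; they mirror the logic of `ctsp_initial_tour` further down (1-based IDs, letters coded 1 to 7).

```python
import itertools

SIZE = 5280  # real nodes: 2 extra copies of the 120 mandatory perms + all 5040 perms

all_perms = list(itertools.permutations(range(1, 8)))  # lexicographic order
mandatory = all_perms[:120]        # exactly the permutations starting with (1, 2)
nodes = mandatory * 2 + all_perms  # the real node with ID k is nodes[k - 1]
index = {p: k for k, p in enumerate(all_perms, 1)}  # 1-based rank of each permutation


def real_node_id(perm, string_index):
    """ID of the real node for perm; mandatory perms use the copy of string_index (0, 1 or 2)."""
    if perm[:2] == (1, 2):
        return string_index * 120 + index[perm]
    return 240 + index[perm]


def virtual_node_id(perm, string_index):
    """ID of the complementary virtual node [i] of real node i."""
    return real_node_id(perm, string_index) + SIZE


DEPOT = 2 * SIZE + 1  # 10561
```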

In our case \\(d_{\max } = 7\\) and \\(d_{\min } = 1\\).

Since \\(4 d_{\min} - 3 d_{\max} = 4 - 21 = -17 \le 0\\), we are in the "otherwise" case, and \\(3 d_{\max }-4 d_{\min } = 21 - 4 = 17\\). To keep distances integer we'll use \\(\epsilon = 1\\), so we need to add \\(18\\) to each non-zero distance in our distance matrix.

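But we'll use a slightly different approach. Below we scale the weights by a factor of \\(10\\), set all distances from/to the depot to \\(35\\), and add DW = 10**5 instead of \\(18\\) (after scaling, \\(3 d_{\max} - 4 d_{\min} = 3 \cdot 70 - 4 \cdot 10 = 170\\), so DW is more than large enough). A route that visits \\(k\\) permutations and produces a string of length \\(L\\) then costs \\(2 \cdot 35 + 10 (L - 7) + (k - 1) \cdot \mathrm{DW} = 10 L + (k - 1) \cdot \mathrm{DW}\\). This way, as long as the total length stays below \\(10^4\\), the last 5 digits of the LKH score are the sum of the resulting strings' lengths times 10.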
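The arithmetic can be checked on a toy route. The sketch below assumes, as in the matrix written by the code that follows, that every complementary edge \\(i \rightarrow [i]\\) costs \\(0\\), every step \\([p] \rightarrow q\\) costs \\(10 \cdot d(p, q) + \mathrm{DW}\\), and each of the two depot edges costs \\(35\\); `route_cost` and `string_length` are illustrative helpers.

```python
def perm_dist(p, q):
    """Letters to append after p so that the string ends with q (7 if no overlap)."""
    i = p.index(q[0])
    return i if p[i:] == q[:7 - i] else 7


DW = 10**5
DEPOT_WEIGHT = 35  # two depot edges: 2 * 35 = 10 * 7 pays for the first permutation


def route_cost(perms):
    """LKH cost of one salesman's route visiting perms in order (scaled weights)."""
    steps = sum(10 * perm_dist(p, q) + DW for p, q in zip(perms, perms[1:]))
    return 2 * DEPOT_WEIGHT + steps


def string_length(perms):
    """Length of the string obtained by chaining perms with maximal overlaps."""
    return 7 + sum(perm_dist(p, q) for p, q in zip(perms, perms[1:]))


route = [(1, 2, 3, 4, 5, 6, 7), (2, 3, 4, 5, 6, 7, 1), (3, 4, 5, 6, 7, 1, 2)]
cost = route_cost(route)
print(cost, string_length(route))              # 200090 9
print(cost % DW == 10 * string_length(route))  # True
```

Summed over the three routes, the DW part depends only on the number of real nodes visited, which is fixed as long as the tour alternates between real and virtual nodes, so only the \\(10 L\\) terms vary between tours.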

In [None]:
import functools
import glob
import itertools
import numpy as np
import pandas as pd

!wget http://webhotel4.ruc.dk/~keld/research/LKH-3/LKH-3.0.7.tgz &>/dev/null
!tar xvfz LKH-3.0.7.tgz &>/dev/null
!cd LKH-3.0.7; make &>/dev/null; cp LKH ..

In [None]:
SIZE = 5280 # total number of permutations
DW = 10**5 # additive weight; must exceed 3*d_max - 4*d_min = 170 for the scaled weights
INF = 10**9 - 1 # infinite edge weight
TIME_LIMIT = 3600 * 8 # time limit for LKH run, seconds
SEED = 2428 # LKH seed value

def perm_dist(p, q):
    i = p.index(q[0])
    return i if p[i:] == q[:7-i] else 7

def distances_matrix():
    all_perms = list(itertools.permutations(range(7), 7))
    mandatory_perms = all_perms[:120]
    nodes = mandatory_perms * 2 + all_perms
    m = np.zeros((SIZE, SIZE), dtype='int32')
    for i, p in enumerate(nodes):
        for j, q in enumerate(nodes):
            m[i, j] = perm_dist(p, q)
    m *= 10
    m[np.where(m > 0)] += DW
    m[np.where(m == 0)] = INF # forbid edges between copies of the same permutation in different mandatory sets
    np.fill_diagonal(m, 0) # restore zero weights at the main diagonal
    return m

def write_params_file(initial_tour=None):
    with open('santa.par', 'w') as f:
        printf = functools.partial(print, file=f)
        printf('SPECIAL')
        printf('PROBLEM_FILE = santa.ctsp')
        printf('INITIAL_TOUR_ALGORITHM = CTSP')
        printf('TOUR_FILE = best_tour_$.txt') # $ will be replaced with the tour cost
        printf('OUTPUT_TOUR_FILE = output_tour_$.txt') # save each improvement
        if initial_tour:
            printf('INITIAL_TOUR_FILE = initial_tour.txt')
        printf(f'MAX_CANDIDATES = 5281')
        printf(f'SEED = {SEED}')
        printf('MAX_TRIALS = 100000')
        printf(f'TIME_LIMIT = {TIME_LIMIT}') # seconds
        printf('TRACE_LEVEL = 2')
        printf('PRECISION = 1')

def write_problem_file():
    with open('santa.ctsp', 'w', buffering=-1) as f:
        printf = functools.partial(print, file=f)
        printf('TYPE: CTSP')
        printf(f'DIMENSION: {SIZE * 2 + 1}')
        printf('SALESMEN : 3')
        printf('EDGE_WEIGHT_TYPE: EXPLICIT')
        printf('EDGE_WEIGHT_FORMAT: FULL_MATRIX')
        printf('EDGE_WEIGHT_SECTION')
        # write distances matrix
        inf_row = ' '.join(itertools.repeat(str(INF), SIZE))
        distances = distances_matrix()
        # top half of the distances matrix
        for weights in distances.T: # iterate over columns
            # infinite weights, weights column, distance to depot
            printf(inf_row, ' '.join(map(str, weights)), 35)
        # bottom half of the distances matrix
        for weights in distances: # iterate over rows
            # weights row, infinite weights, distance to depot
            printf(' '.join(map(str, weights)), inf_row, 35)
        printf(' '.join(itertools.repeat('35', SIZE * 2)), INF) # distances from the depot
        # write "private city sets"
        printf('CTSP_SET_SECTION')
        for i in range(3):
            printf(i + 1, end=' ') # set index 
            for j in range(1, 121):
                printf(i * 120 + j, end=' ') # real node of mandatory permutations
                printf(i * 120 + j + SIZE, end=' ') # complementary virtual node
            printf(-1)
        printf('DEPOT_SECTION')
        printf(2 * SIZE + 1)
        printf(-1)
        printf('EOF')

def write_initial_tour_file(initial_tour=None):
    if initial_tour:
        with open('initial_tour.txt', 'w') as f:
            print('TOUR_SECTION', file=f)
            print(' '.join(str(_) for _ in initial_tour), -1, file=f)
    
def solve_ctsp(initial_tour=None, verbose=False):
    write_params_file(initial_tour)
    write_problem_file()
    write_initial_tour_file(initial_tour)
    
    # run LKH-3 to solve CTSP instance
    if verbose:
        !./LKH santa.par
    else:
        !touch lkh.log
        !./LKH santa.par >> lkh.log

We'll provide LKH with an initial tour to start the optimization from. As the initial tour we'll use the one found by [my previous notebook](https://www.kaggle.com/kostyaatarik/permutations-rebalancing).

In [None]:
LETTERS = {
    1: '🎅',  # father christmas
    2: '🤶',  # mother christmas
    3: '🦌',  # reindeer
    4: '🧝',  # elf
    5: '🎄',  # christmas tree
    6: '🎁',  # gift
    7: '🎀',  # ribbon
    8: '🌟',  # star
}
INV_LETTERS = {v: k for k, v in LETTERS.items()}

solution = pd.read_csv('../input/permutations-rebalancing/submission_no_wildcards_2497_2492_2491.csv')
strings = [[INV_LETTERS[c] for c in s] for s in solution.schedule]
strings.sort(key=len, reverse=True)
print(f'Strings lengths are {[len(_) for _ in strings]}.')

def find_strings_perms(strings, verbose=False):
    all_perms = set(itertools.permutations(range(1, 8), 7))
    perms = []
    for s in strings:
        perms.append([])
        for i in range(len(s)-6):
            p = tuple(s[i:i+7])
            if p in all_perms:
                perms[-1].append(p)
    if verbose:
        lens = [len(_) for _ in  perms]
        print(f'There are {lens} permutations in strings, {sum(lens)} in total.')
        lens = [len(set(_)) for _ in  perms]
        print(f'There are {lens} unique permutations in strings, {sum(lens)} in total.')
    return perms

def rebalance_perms(strings_perms, verbose=False):
    # convert to dicts for fast lookup and to keep permutations order
    strings_perms = [dict.fromkeys(_) for _ in strings_perms] 
    for p in strings_perms[0].copy():  # iterate over the copy to allow modification during iteration
        if p[:2] != (1, 2) and (p in strings_perms[1] or p in strings_perms[2]):
            strings_perms[0].pop(p)
    for p in strings_perms[1].copy():
        if p[:2] != (1, 2) and p in strings_perms[2]:
            strings_perms[1].pop(p)
    if verbose:
        lens = [len(_) for _ in  strings_perms]
        print(f'There are {lens} permutations left in strings after rebalancing, {sum(lens)} in total.')
    return [list(_) for _ in strings_perms]

strings_perms = find_strings_perms(strings, verbose=True)
strings_perms = rebalance_perms(strings_perms, verbose=True)

In [None]:
def ctsp_initial_tour(strings_perms):
    index = {p: i for (i, p) in enumerate(itertools.permutations(range(1, 8), 7), 1)}
    initial_tour = []
    for i, perms in enumerate(strings_perms):
        initial_tour.append(SIZE*2 + i + 1) # depot node for each string
        for p in perms:
            if p[:2] == (1, 2):
                initial_tour.append(i*120 + index[p])
            else:
                initial_tour.append(240 + index[p])
            initial_tour.append(initial_tour[-1] + SIZE) # a complementary virtual node
    return initial_tour


initial_tour = ctsp_initial_tour(strings_perms)

Write all the files and feed them to LKH.

In [None]:
solve_ctsp(initial_tour)

The CTSP objective is to minimize the total cost of the route, i.e. the sum of the lengths of the solution strings, so LKH may output tours with string lengths like \\([2400, 2400, 2700]\\), which is not great for our problem, where the score is the length of the longest string. So instead of examining only the best CTSP solution found by LKH, we'll check all the improved tours it found during the optimization.
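A sketch of that selection rule: among candidate solutions, prefer the one whose longest string is shortest, breaking ties by total length. The helper `pick_by_longest` and the second candidate's lengths are made up for illustration; only the \\([2400, 2400, 2700]\\) example comes from the text above.

```python
def pick_by_longest(candidates):
    """Return the key whose longest string is shortest (ties broken by total length)."""
    return min(candidates, key=lambda name: (max(candidates[name]), sum(candidates[name])))


candidates = {
    'tour_a': [2400, 2400, 2700],  # smallest total length, but a long third string
    'tour_b': [2500, 2500, 2510],  # larger total, shorter longest string
}
print(pick_by_longest(candidates))  # tour_b
```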

In [None]:
def read_strings(file_name):
    all_perms = list(itertools.permutations(range(1, 8), 7))
    mandatory_perms = all_perms[:120]
    nodes = mandatory_perms * 2 + all_perms
    
    with open(file_name, 'r') as f:
        lines = [l.strip() for l in f.readlines()]
    lines = lines[lines.index(f'{SIZE*2 + 1}'):-2]
    tour = [int(_) - 1 for _ in lines]
    i0, i1, i2 = sorted(tour.index(i) for i in range(SIZE*2, SIZE*2 + 3)) # depots
    strings = [tour[i0+1:i1], tour[i1+1:i2], tour[i2+1:]]
    for s in strings:
        s[:] = [nodes[_] for _ in s if _ < SIZE] # leave only real nodes
        s_forward, s_backward = [], []
        for directed_s in (s_forward, s_backward):
            directed_s.extend(s[0])
            for p, q in zip(s, s[1:]):
                d = perm_dist(p, q)
                directed_s.extend(q[-d:])
            s[:] = s[::-1]
        s[:] = min(s_forward, s_backward, key=len)
    return strings

def check_solution(strings):
    all_perms = set(itertools.permutations(range(1, 8), 7))
    mandatory_perms = {p for p in all_perms if p[:2] == (1, 2)}
    strings_perms = [set(_) for _ in find_strings_perms(strings)]
    for s in strings_perms:
        if mandatory_perms - s:
            print(mandatory_perms - s)
            return False
    if all_perms - set.union(*strings_perms):
        return False
    return True

def contain_wildcards(strings):
    for s in strings:
        if 8 in s:
            return True
    return False

def write_submission_csv(strings):
    sub = pd.DataFrame()
    sub['schedule'] = [''.join(LETTERS[x] for x in s) for s in strings]
    if contain_wildcards(strings):
        sub_name = f'submission_wildcards_{"_".join(str(len(_)) for _ in strings)}.csv'
    else:
        sub_name = f'submission_no_wildcards_{"_".join(str(len(_)) for _ in strings)}.csv'
    sub.to_csv(sub_name, index=False)
    return sub_name

tour_files = glob.glob('output_tour_*.txt') + glob.glob('best_tour_*.txt')
print("=" * 70)
for f in tour_files:
    strings = read_strings(f)
    strings.sort(key=len, reverse=True)
    print(f'File {f}, strings lengths are {[len(s) for s in strings]}.')
    if check_solution(strings):
        print(f'The solution is written to {write_submission_csv(strings)}')
    else:
        print('The solution is invalid.')
    print("=" * 70)


# Wildcards Optimization

We'll use the code from the [notebook](https://www.kaggle.com/yosshi999/wildcard-postprocessing-using-dynamic-programming) created by [Yosshi999](https://www.kaggle.com/yosshi999) to improve the solutions found above by inserting wildcards.
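`find_strings_perms` only counts windows that are already permutations, so it cannot confirm that a string with wildcards still covers everything. A minimal sketch of a wildcard-aware variant, assuming that a wildcard (coded as `8`, as in `LETTERS`) can stand for any letter; `covered_perms` is a hypothetical helper, not part of Yosshi999's code.

```python
import itertools

WILDCARD = 8  # letter code of the wildcard in this notebook


def covered_perms(s):
    """Return the set of permutations of 1..7 matched by some 7-letter window of s."""
    covered = set()
    for i in range(len(s) - 6):
        window = s[i:i + 7]
        fixed = [c for c in window if c != WILDCARD]
        if len(set(fixed)) != len(fixed):
            continue  # a repeated letter can never be part of a permutation
        missing = [c for c in range(1, 8) if c not in fixed]
        # try every way of assigning the missing letters to the wildcard slots
        for fill in itertools.permutations(missing):
            it = iter(fill)
            covered.add(tuple(next(it) if c == WILDCARD else c for c in window))
    return covered


print(covered_perms([8, 2, 3, 4, 5, 6, 7, 8]))
```

Such a function could stand in for `find_strings_perms` inside a `check_solution`-style validation of the wildcard submissions.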

In [None]:
import itertools
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F


perms = list(map(lambda p: "".join(p), itertools.permutations("1234567")))
perm2id = {p: i for i, p in enumerate(perms)}
perms_arr = np.array([list(map(int, p)) for p in perms])

perms_onehot = np.eye(7)[perms_arr-1, :].transpose(0, 2, 1)
assert np.allclose(perms_onehot[:,0,:].astype(np.int64), (perms_arr == 1).astype(np.int64))

# print("onehot 1234567:")
# print(perms_onehot[perm2id["1234567"]])

# print("onehot 5671234:")
# print(perms_onehot[perm2id["5671234"]])

# print("correlate between 1234567 and 5671234")
left = perms_onehot[perm2id["1234567"]]
right = perms_onehot[perm2id["5671234"]]
matches = F.conv2d(
    F.pad(torch.Tensor(left[None, None, :, :]), (7, 7)),
    torch.Tensor(right[None, None, :, :]),
    padding="valid"
).numpy().reshape(-1)
# print(matches)
must_match_left2right = np.array([-1, -1, -1, -1, -1, -1, -1, 7, 6, 5, 4, 3, 2, 1, 0])
must_match_right2left = np.array([0, 1, 2, 3, 4, 5, 6, 7, -1, -1, -1, -1, -1, -1, -1])
cost_ifmatch = np.array([7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7])
# print("cost of 1234567 -> 5671234:", min(cost_ifmatch[np.equal(must_match_left2right, matches)]))
# print("cost of 5671234 -> 1234567:", min(cost_ifmatch[np.equal(must_match_right2left, matches)]))

M = F.conv2d(
    F.pad(torch.Tensor(perms_onehot[:, None, :, :]), (7, 7)),
    torch.Tensor(perms_onehot[:, None, :, :]),
    padding="valid"
).squeeze().numpy()

must_match_left2right = np.array([-1, -1, -1, -1, -1, -1, -1, 7, 6, 5, 4, 3, 2, 1, 0])
must_match_left2right_wild = np.array([-1, -1, -1, -1, -1, -1, -1, 6, 5, 4, 3, 2, 1, 0, 0])

cost_ifmatch = np.array([7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7])

costMat = np.where(M == must_match_left2right, cost_ifmatch, np.inf).min(axis=-1).astype(np.int8)
costMatWild = np.minimum(costMat, np.where(M == must_match_left2right_wild, cost_ifmatch, np.inf).min(axis=-1)).astype(np.int8)

def optimize_wildcards(words):
    found_perms = find_strings_perms(words)
    balanced_perms = rebalance_perms(found_perms)
    balanced_perms = [[''.join(str(_) for _ in perm) for perm in perms] for perms in balanced_perms]
    nodes_list = []
    table_list = []
    for k in range(3):  # one DP table per string; `i` is reused below as the position index
        nodes = [perm2id[p] for p in balanced_perms[k]]

        table = np.zeros((len(nodes), 10), np.int64)
        table[0, :] = 7
        for i in range(1, len(nodes)):
            e = costMat[nodes[i-1], nodes[i]]
            ew = costMatWild[nodes[i-1], nodes[i]]
            table[i,0] = table[i-1,0] + e
            table[i,1] = min(table[i-1,1] + e, table[i-1,0] + ew)
            table[i,2] = min(table[i-1,2], table[i-1,1]) + e # TODO: better transition
            table[i,3] = min(table[i-1,3], table[i-1,2]) + e
            table[i,4] = min(table[i-1,4], table[i-1,3]) + e
            table[i,5] = min(table[i-1,5], table[i-1,4]) + e
            table[i,6] = min(table[i-1,6], table[i-1,5]) + e
            table[i,7] = min(table[i-1,7], table[i-1,6]) + e
            table[i,8] = min(table[i-1,8], table[i-1,7]) + e
            table[i,9] = min(table[i-1,9] + e, table[i-1,8] + ew)
#         print(table[-1].min(), table[-1])
        nodes_list.append(nodes)
        table_list.append(table)

    # backtrack
    new_words = []
    wilds = []
    for nodes, table in zip(nodes_list, table_list):
        ns = [perms[nodes[-1]]]
        track = np.argmin(table[-1])
        wild = []
        for i in range(len(nodes)-2, -1, -1):
            e = costMat[nodes[i], nodes[i+1]]
            ew = costMatWild[nodes[i], nodes[i+1]]
            if track == 0:
                ns.append(perms[nodes[i]][:e])
            elif track == 1:
                if table[i, 1] + e < table[i, 0] + ew:
                    ns.append(perms[nodes[i]][:e])
                else:
                    left = np.array(list(map(int, perms[nodes[i]][ew:])))
                    right = np.array(list(map(int, perms[nodes[i+1]][:-ew])))
                    mis = np.where(left != right)[0][0]
                    wild.append(table[i, track-1]-7+ew+mis)
                    ns.append(perms[nodes[i]][:ew])
                    track = track - 1
            elif 2 <= track <= 8:
                if table[i, track] >= table[i, track-1]:
                    track = track - 1
                ns.append(perms[nodes[i]][:e])
            elif track == 9:
                if table[i, 9] + e < table[i, 8] + ew:
                    ns.append(perms[nodes[i]][:e])
                else:
                    ns.append(perms[nodes[i]][:ew])
                    left = np.array(list(map(int, perms[nodes[i]][ew:])))
                    right = np.array(list(map(int, perms[nodes[i+1]][:-ew])))
                    mis = np.where(left != right)[0][0]
                    wild.append(table[i, track-1]-7+ew+mis)
                    track = track - 1
            else:
                assert False
        assert track == 0
        wilds.append(wild)
        nsw = list("".join(ns[::-1]))
        for w in wild:
            nsw[w] = "8"
        new_words.append("".join(nsw))
    return new_words

In [None]:
tour_files = glob.glob('submission_no_wildcards_*.csv')
print("=" * 71)
for f in tour_files:
    schedule = pd.read_csv(f).schedule.tolist()
    strings = [[INV_LETTERS[c] for c in s] for s in schedule]
    strings.sort(key=len, reverse=True)
    new_strings = optimize_wildcards(strings)
    new_strings = [[int(c) for c in s] for s in new_strings]
    new_strings.sort(key=len, reverse=True)
    print(f'File {f}.')
    print(f'Improved strings lengths from {[len(s) for s in strings]} to {[len(s) for s in new_strings]}.')
    print(f'The solution is written to {write_submission_csv(new_strings)}')
    print("=" * 71)

That's it, thank you for reading, please upvote if you find it useful.