Add ToddCoxeterBand method #691

tomcontileslie · 2020-07-09T17:23:28Z

Edit: for an update, see enormous comment below

This PR builds on @MTWhyte's PR #684 adding a ToddCoxeterBand method, also joint work with @reiniscirpons under @james-d-mitchell. This version is consistent with the properness proof we have written, the changes from 684 being:

new_coset now checks pre-existence of words via paths (i.e. checks value of tau(i, canon(wa)) rather than canon(wa) in words)
Twice as many implicit relations are now pushed: when working on word n, active cosets in {1, ..., n-1} are pushed through words[n]words[n] = words[n].
The canon function now ensures the canonical form is shortlex-minimal.

This is WIP until output format is decided and tests and docs are written. This function should eventually be implemented in libsemigroups, preferably with an analogous Felsch implementation (the strategy here is broadly HLT).


tomcontileslie · 2021-06-02T16:01:53Z

Update on the status of this pull request

File which I reference later: Band-Todd-Coxeter.pdf (note: there are examples with diagrams in this document which will hopefully make the stuff explained in this comment clearer!)

Most of the work here was done a while ago, so memory is a bit rusty, but the stuff here is still relevant although it needs a bit more work before being justified. I'm just going to write out what work led us here, and I'll explain what I'm still unsure about.

The code written in this PR is a first attempt at answering the question:

Given a FINITE band S given by a band presentation <A,R> (i.e. a list of generators and a list of relations, where it is implicitly assumed that all words are idempotent), can we devise an algorithm which will output some description of this band S - a coset table, Cayley digraph, etc...?

For general presentations (not band presentations), we can use the Todd-Coxeter (TC) algorithm to answer this question. The rigorous backing for this that I have familiarised myself with is the work of @james-d-mitchell, @flsmith, @mariatsalakou and Tom Coleman, which describes coset tables as labelled digraphs (called "R-digraphs"). The node set of an R-digraph is a subset of A* (these are the "cosets" of coset tables), its edges are labelled by elements of A, and the restriction is that if two paths over the graph lead to the same node, then the elements given by those paths must be equal in the presentation. They explain that any TC-style algorithm - referred to as a congruence enumeration process - can be implemented as a combination of three basic digraph manipulations, starting with the digraph at step i, G(i), and outputting a new digraph at step i+1, G(i+1). The algorithm also keeps track of a set of coincidences K(i), a set of pairs of words which are known to be equal in the presentation, but currently give different paths in G(i). (The idea is that eventually all coincidences should be eliminated, so that the R-digraph is consistent with the presentation it is computing.)

The three steps are, roughly:

(TC1: add a node). If a node w has no out-edge a, define a new node wa and an arrow labelled a from w to wa.
(TC2: push a relation). For a node w and a relation (u,v) in R, require that following paths u and v from w lead to the same node, adding a coincidence if they do not.
(TC3: process coincidences) Clear the set K by merging any nodes in the coincidence set and appropriately redefining paths through the digraph.

With some additional assumptions, the idea is that congruence enumeration processes for finite semigroups always terminate on a description of the semigroup. Our idea with @reiniscirpons and @MTWhyte was then to adapt this to work for bands too.

How do we do this? If we're given an alphabet A of size n and a set of relations R, we want to compute the smallest band which satisfies the relations R. The first thing we could try is to add "the band relations" to R, and then run the usual Todd-Coxeter algorithm on A and R. If, say, we have A = {x,y} and R = {xy=yx}, then the "band relations" (a list of relations which give a presentation for the free band on A) are {xx=x, yy=y, xyxy=xy, yxyx=yx}. So, running TC on <A, {xx=x, yy=y, xyxy=xy, yxyx=yx, xy=yx}> will give the result we're looking for. However, for n equal to 4 and above, the list of band relations becomes incredibly long. The idea is then to see whether we can tweak the TC algorithm so that we don't have to explicitly pass the band relations, but still get the right result.
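The explicit approach can be tried on the two-generator example with a naive sketch of TC1, TC2 and TC3 combined in an HLT-like loop. This is not the code in this PR: the class `WordGraph` and the function `count_elements` are hypothetical names, node 0 plays the role of the empty word (an adjoined identity), and the loop simply defines both sides of every relation in full rather than doing anything clever.

```python
from collections import deque


class WordGraph:
    """Naive word graph; node 0 is the empty word (an adjoined identity)."""

    def __init__(self):
        self.edges = [{}]            # edges[node][letter] -> target node
        self.parent = [0]            # union-find forest of merged nodes
        self.coincidences = deque()  # pairs of nodes known to be equal

    def find(self, node):
        while self.parent[node] != node:
            node = self.parent[node]
        return node

    def tc1(self, node, letter):
        """TC1: create a new node as the target of a missing edge."""
        self.edges.append({})
        self.parent.append(len(self.parent))
        self.edges[node][letter] = len(self.edges) - 1
        return self.edges[node][letter]

    def follow(self, node, word):
        """Follow `word` from `node`, defining missing edges with TC1."""
        node = self.find(node)
        for letter in word:
            target = self.edges[node].get(letter)
            if target is None:
                target = self.tc1(node, letter)
            node = self.find(target)
        return node

    def tc2(self, node, u, v):
        """TC2: push the relation u = v through `node`."""
        x, y = self.follow(node, u), self.follow(node, v)
        if x != y:
            self.coincidences.append((x, y))

    def tc3(self):
        """TC3: merge coincident nodes until no coincidences remain."""
        while self.coincidences:
            x, y = map(self.find, self.coincidences.popleft())
            if x == y:
                continue
            keep, drop = min(x, y), max(x, y)
            self.parent[drop] = keep
            for letter, target in self.edges[drop].items():
                if letter in self.edges[keep]:
                    # Two edges with the same label: their targets coincide.
                    self.coincidences.append((self.edges[keep][letter], target))
                else:
                    self.edges[keep][letter] = target
            self.edges[drop] = {}


def count_elements(alphabet, relations):
    """Run the loop to completion and count the non-identity nodes."""
    graph, node = WordGraph(), 0
    while node < len(graph.edges):
        if graph.find(node) == node:          # skip nodes merged away
            for letter in alphabet:
                graph.follow(node, letter)    # TC1 where an edge is missing
            for u, v in relations:
                graph.tc2(node, u, v)
                graph.tc3()
        node += 1
    active = sum(graph.find(n) == n for n in range(len(graph.edges)))
    return active - 1                         # do not count the empty word


band_relations = [("xx", "x"), ("yy", "y"), ("xyxy", "xy"), ("yxyx", "yx")]
size = count_elements("xy", band_relations + [("xy", "yx")])
```

Here `size` counts the classes x, y and xy of the commutative band on two generators. The point of the PR is to avoid having to pass `band_relations` at all.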

My first idea (see section 2.1 of attached doc) was to change the definition of TC1 so that it labels nodes in the free band canonical form. This can actually be generalised to work in any variety of semigroups, so let V be any variety (with the aim of later taking V = variety of bands). Suppose we have a function which, for any word w in A*, returns the shortlex-smallest word in A* which is equal to w in (the free object of) the variety V (call this its canonical form and denote it bar(w)). Then our new TC1 (call it TC1V) does something like:

(TC1V: cleverly add a node in the variety V). If a node w (in canonical form) has no out-edge a, then check whether there is a node bar(wa). If there is, draw an edge from w to bar(wa) labelled a. If not, define it (and all of its (also canonical) prefixes - some technicalities here) and draw an edge from w to the newly defined bar(wa).
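A minimal sketch of TC1V, assuming nodes are labelled by canonical words stored as Python strings and that `canon` is supplied by the caller. The function name `tc1v` and the data layout are hypothetical, and the sketch deliberately ignores the technicalities mentioned above (for instance, what happens to prefix edges once coincidences have been processed). The toy `canon` is for the variety of semilattices, where the shortlex-least form of a word is its sorted set of letters; it is only there so that the example runs without a free band canonical form.

```python
def tc1v(nodes, edges, w, a, canon):
    """Add the missing edge labelled `a` out of canonical node `w`.

    nodes: set of canonical words; edges: dict (word, letter) -> word.
    """
    assert w in nodes and (w, a) not in edges
    target = canon(w + a)
    # Define the target and any of its prefixes that are not nodes yet.
    # Prefixes of a shortlex-least word are themselves shortlex-least.
    for i in range(1, len(target) + 1):
        prefix = target[:i]
        if prefix not in nodes:
            nodes.add(prefix)
            edges.setdefault((prefix[:-1], prefix[-1]), prefix)
    edges[(w, a)] = target
    return target


def semilattice_canon(word):
    """Toy canonical form (an assumption for this example): sorted content."""
    return "".join(sorted(set(word)))


nodes, edges = {""}, {}
tc1v(nodes, edges, "", "x", semilattice_canon)
tc1v(nodes, edges, "", "y", semilattice_canon)
tc1v(nodes, edges, "y", "x", semilattice_canon)   # bar(yx) = xy is created
tc1v(nodes, edges, "xy", "x", semilattice_canon)  # bar(xyx) = xy already exists
```

The last call creates no node: it only draws an edge back to the existing node xy, which is exactly the saving TC1V is meant to provide over plain TC1.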

This was the first thing we naturally tried to do when devising a band Todd-Coxeter (BTC) algorithm: only label cosets using some canonical form. In the attached document I prove that this is valid. More specifically:

Result from Section 2.1. Suppose we have a presentation <A,R> implicitly given in a variety V. Let R' be the union of the explicit relations R with the set of all implicit relations generating the free object in the variety V (remember, as mentioned before, this set of relations may be significantly larger than R, so we want to avoid computing R' and/or passing it as input to the Todd-Coxeter algorithm). Then, any sequence of TC1V, TC2 and TC3 has exactly the same properties as a congruence enumeration process for R': namely, these three steps output R'-digraphs when given R'-digraphs, and any sequence of applications eventually stabilises.

The next step in the original paper is to devise specific combinations of TC1, TC2 and TC3 (algorithms or strategies) which not only stabilise, but stabilise exactly at a description of the semigroup. It is shown that this always happens provided the algorithm satisfies three conditions, in which case we call it a proper congruence enumeration process. These conditions essentially require that everything happens as you would hope; they roughly say:

Every path in the digraph is eventually defined.
Every possible coincidence is eventually noticed.
Coincidences are regularly processed.

Switching now to the band-specific setting (variety V=B), we have a canonical labelling function for free band elements and a clever band step TC1B from above. I have managed to show that a similar algorithm to the well-known HLT strategy satisfies properness conditions 1 and 3, but the issue I am having is that I am not sure whether it also satisfies properness condition 2.

The reason for this is as follows. We have a presentation <A,R>, and a larger set of relations R' which contains all of R as well as all the free band relations. The HLT approach guarantees that coincidences involving pairs in R are noticed for every coset, since these relations are passed explicitly. However, I am not sure this necessarily means all the implicit relations are also noticed (since they are never explicitly pushed through the cosets). This can be reduced to the following problem: given a canonical node w and a free band element u in canonical form, can we show that eventually either w is not a node or e(w,uu) = e(w,u)?

I don't think we ever found a counterexample - but I haven't managed to show that the implicit relations will always end up being somehow implicitly pushed so as to satisfy condition 2. So, in section 2.2 of the attached document, I detail a modified HLT strategy where, as words v are added to the node set, we also push the relation vv=v quite a few times through all the nodes. This is a shoddy fix because it means that actually, we end up explicitly treating band relations which we were hoping we could keep implicit. However, this is still better than explicitly passing all the band relations from the outset. At the end of the attached document I seem to have been close to proving that with these additional relations gradually becoming explicit, the process is indeed proper (I did this last summer so I'm not sure how far we are from a full proof). This algorithm reflects the state of the document, in that we have a coset creation function which cleverly creates new table entries, with the relation pusher and coincidence processor functions relatively unchanged. The actual loop that implements the strategy currently contains, as explained, some extra steps which push band relations.

Future work on this PR (likely not done by me since I am graduating and therefore my brain will be in decline) should aim to either prove that this shoddy approach is valid, then implement it in GAP and then C++ code - or, ideally, prove that the strategy is valid even if we never push implicit relations, and then implement that. There is also a Felsch analogue to be developed. Good luck!

Description of functions implemented in this PR

SEMIGROUPS.PrefixTupleOfFreeBandElement(word, n): returns the largest prefix of word with content size n-1.
SEMIGROUPS.ShortCanonicalFormOfFreeBandElement(word): returns the shortlex least representative of word in the free band (does this work? I'm not sure I ever proved it; @reiniscirpons is the person to ask). The strategy is to recursively take the canonical form of the prefix and suffix (in Howie's words, the initial and terminal) and then see whether they can overlap.
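For comparison, here is a brute-force sketch of what these two functions are meant to compute. It is not the GAP code in this PR and does not use its overlap strategy: `split_prefix`, `free_band_equal` and `shortlex_least` are hypothetical names, equality is tested with the standard recursive characterisation of the free band (same content, same first and last "new" letters, and recursively equal prefixes and suffixes), and the shortlex-least representative is found by exhaustive search, so it is only usable on very short words. Something like this could serve as a reference when testing the real function.

```python
from itertools import product


def split_prefix(word):
    """Return (p, a): p is the longest prefix of the nonempty `word` whose
    content has one letter fewer than `word`; a is the letter following p."""
    size, seen = len(set(word)), set()
    for i, letter in enumerate(word):
        seen.add(letter)
        if len(seen) == size:
            return word[:i], letter


def free_band_equal(u, v):
    """Decide whether u and v represent the same free band element."""
    if set(u) != set(v):
        return False
    if not u:                                  # both words are empty
        return True
    prefix_u, first_u = split_prefix(u)
    prefix_v, first_v = split_prefix(v)
    # The suffix data is the prefix data of the reversed word.
    suffix_u, last_u = split_prefix(u[::-1])
    suffix_v, last_v = split_prefix(v[::-1])
    return (first_u == first_v and last_u == last_v
            and free_band_equal(prefix_u, prefix_v)
            and free_band_equal(suffix_u[::-1], suffix_v[::-1]))


def shortlex_least(word):
    """Exhaustively search for the shortlex-least word equal to `word`."""
    letters = sorted(set(word))
    for length in range(len(word) + 1):
        for candidate in product(letters, repeat=length):
            candidate = "".join(candidate)
            if free_band_equal(candidate, word):
                return candidate


examples = [shortlex_least(w) for w in ("xyxy", "yxxy", "xyxxyx")]
```

The search always terminates because the word itself is a candidate of the maximum length tried.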

Functions inside the main feature, ToddCoxeterBand:

new_coset: carries out TC1B
tau(coset, word): returns, in JDM/FLS/MT/TC notation, the value e(coset, word)
tauf(coset, word): computes e(coset, word), and if edges are undefined, it defines them using new_coset
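The difference between tau and tauf can be sketched as follows, assuming the coset table is a list of dictionaries and that an undefined path is reported as `None` (both assumptions; the GAP code may represent these differently). The helper `plain_new_coset` is a hypothetical stand-in that performs an ordinary TC1 step, whereas the PR's new_coset carries out TC1B.

```python
def tau(table, coset, word):
    """Return e(coset, word), or None if some edge on the path is undefined."""
    for letter in word:
        coset = table[coset].get(letter)
        if coset is None:
            return None
    return coset


def tauf(table, coset, word, new_coset):
    """Return e(coset, word), defining any missing edges with `new_coset`."""
    for letter in word:
        target = table[coset].get(letter)
        if target is None:
            target = new_coset(table, coset, letter)
        coset = target
    return coset


def plain_new_coset(table, coset, letter):
    """Stand-in for new_coset: append a fresh coset and point the edge at it."""
    table.append({})
    table[coset][letter] = len(table) - 1
    return table[coset][letter]


table = [{}]                                   # coset 0 only, no edges yet
before = tau(table, 0, "xy")                   # path is not defined yet
forced = tauf(table, 0, "xy", plain_new_coset) # defines two new cosets
after = tau(table, 0, "xy")
```

Passing `new_coset` as a parameter keeps the traversal independent of how cosets are created, which is the part of the algorithm most likely to change.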

Then as mentioned, push_relation and process_coincidences do exactly what you'd expect. The only things that may need tweaking are new_coset, and then the last 50 or so lines of ToddCoxeterBand which currently push a ridiculous number of cosets in an effort to fix the issues in the proof.

Murray Whyte and others added 4 commits July 2, 2020 15:47

Add ToddCoxeterBand method

f22af37

Merge branch 'todd_coxeter_band' of https://github.com/MTWhyte/Semigr…

dd7083e

…oups into todd_coxeter_band

Add short canonical form function

53da03d

Update ToddCoxeterBand with more pushes and different new_coset

fc03ef9

tomcontileslie added the WIP label Jul 9, 2020

MTWhyte mentioned this pull request Jul 11, 2020

Add ToddCoxeterBand method #684

Closed

james-d-mitchell added the do not merge and hackathon-may-2021 labels May 26, 2021

tomcontileslie changed the title ~~Add ToddCoxeterBand method, with improvement~~ Add ToddCoxeterBand method Jun 2, 2021

james-d-mitchell removed the hackathon-may-2021 label Dec 2, 2021
