From d2fd6e2f3f853128a52f75f8cfde2240324fd1a9 Mon Sep 17 00:00:00 2001 From: reiniscirpons Date: Thu, 12 Feb 2026 15:35:26 +0100 Subject: [PATCH 1/5] Improve project ideas page --- content/en/gsoc.md | 335 ++++++++++++++++++++++++++------ hugo.toml | 7 + layouts/shortcodes/lastmod.html | 1 + layouts/shortcodes/toc.html | 1 + 4 files changed, 283 insertions(+), 61 deletions(-) create mode 100644 layouts/shortcodes/lastmod.html create mode 100644 layouts/shortcodes/toc.html diff --git a/content/en/gsoc.md b/content/en/gsoc.md index 586232b..3d8ebe4 100644 --- a/content/en/gsoc.md +++ b/content/en/gsoc.md @@ -1,71 +1,140 @@ --- layout: "gsoc" bgImage: "/images/smalloverlap_510x510.png" +lastmod: 2026-02-12 --- -# GSoC 2026 project ideas +# GSoC 2026 in libsemigroups -This page contains a list of ideas for GSoC in 2026 within the -[libsemigroups][] organisation: +{{< lastmod >}} -1. [Julia bindings for libsemigroups](#1-julia-bindings-semigroupsjl-for-libsemigroups) -2. [Formal verification of Knuth-Bendix and - Todd-Coxeter](#2-formal-verification-of-knuth-bendix-and-todd-coxeter) -3. [AI for auto-tuning undecidable - problems](#3-ai-for-auto-tuning-undecidable-problems) -4. [High-performance tropical algebra - library](#4-high-performance-tropical-algebra-library) -5. [Computing congruences of finite inverse - semigroups](#5-computing-congruences-of-finite-inverse-semigroups) -6. [Improving errors and their messages in - GAP](#6-improving-errors-and-their-messages-in-gap) -7. [Free bands](#7-free-bands) +## Introduction + +The [libsemigroups][] library for computational semigroup +theory contains efficient implementations of widely used and novel algorithms +in computational semigroup theory. +This library forms the core of the [GAP package Semigroups][] which is +used by researchers in semigroup theory for preliminary computational +explorations that motivate mathematical discoveries or for finding +counterexamples to conjectures. +[libsemigroups][] has also found applications in tooling for cryptography researchers and as +a computational oracle for mathematical formalization projects. + +The [libsemigroups][] project focuses on efficiency and ease of use. +For the second goal, we maintain bindings for the libsemigroups library +in Python in the [libsemigroups_pybind11][] project and the +[GAP package Semigroups][] which depends on [libsemigroups][]. There is +also an ongoing effort to provide Julia bindings as part of the +[Semigroups.jl][] project. + +The development of [libsemigroups][] occurs primarily at the +[University of St Andrews](https://www.st-andrews.ac.uk/) with multiple +external collaborators, and we would like to welcome many more as part of +the [Google summer of code](https://summerofcode.withgoogle.com/) program! +We believe it is a great opportunity for participants to contribute to an +open-source project in mathematics and gain insights into computational +semigroup theory. + +## Project ideas + +Below is a list of ideas for projects within the [libsemigroups][] +organization, along with potential mentors and a rough estimate of the length +and difficulty of the project. + +| Contents | +| ----------- | +| {{< toc >}} | + +--- ### 1. Julia bindings Semigroups.jl for libsemigroups -This project would involve porting as much as possible of the functionality of -[libsemigroups][] into [Semigroups.jl][]. Something similar already exists for -python in [libsemigroups_pybind11][]. The aim is for [Semigroups.jl][] to -expose the low-level functionality of [libsemigroups][] to be used as a -foundation for a separate higher level package. In particular, a longer term -aim of this project is to replace the kernel extension of the -[GAP package Semigroups][] with [Semigroups.jl][]. +**Description**: +The goal of this project is to create functioning bindings of the +[libsemigroups][] library into Julia aspart of the [Semigroups.jl][] project. +The [libsemigroups_pybind11][] project implements something similar for Python. +The aim is for [Semigroups.jl][] to expose the low-level functionality of +[libsemigroups][] to be used as a foundation for a separate higher level +package. In particular, a longer term aim of this project is to replace the +kernel extension of the [GAP package Semigroups][] with [Semigroups.jl][]. -This project may involve: +**This project may involve**: -- some familiarity with modern C++ and Julia; -- writing some tools to (at least partially) automate the generation of the +- writing tooling to (at least partially) automate the generation of the Julia bindings, documentation, and tests; - making design decisions about how to map C++ idioms to Julia. +**Mentors**: [Joe Edwards][] and [James Mitchell][] + +**Length**: 350h + +**Difficulty**: Easy-Medium + +**Prerequisites**: + +- some familiarity with modern C++ and Julia; + +--- + ### 2. Formal verification of Knuth-Bendix and Todd-Coxeter +**Description**: A computer algebra system may produce an answer, but how much confidence can we have that it is correct? We might compute the same solution using two different -algorithms, if they match, then we have higher confidence! This project is to -provide alternate implementations of some key parts of the [Knuth-Bendix][] and -[Todd-Coxeter][] algorithms in [libsemigroups][] to permit the generation of a -formal proof certificate in [rocq][] or [lean][] that _proves_ with 100% -certainty that the output is correct. - -This project may involve: - -- some familiarity with modern C++ and [rocq][] or [lean][]; +algorithms, if they match, then we have higher confidence. Another approach is to +use a [proof assistant](https://en.wikipedia.org/wiki/Proof_assistant) such as +[rocq][] or [lean][] to produce a formal proof of the result, however doing this +by hand is rather cumbersome and does not scale well. But of course, if the +underlying algorithm is correct, it should be possible to produce such proof +certificates automatically. + +The aim of this project is to extend the existing implementation of the +[Knuth-Bendix][] and the [Todd-Coxeter][] algorithms in [libsemigroups][] to have +them return a trace of the decisions made during their runtime. +This trace can then be converted into a formal proof of correctness, as is done +for example in the [MonoidPresentation][] project in [rocq][]. + +**This project may involve**: - implementing a logged rewriter in modern C++; - understanding the mathematics behind [Knuth-Bendix][] and [Todd-Coxeter][] as applied to finitely presented semigroups and monoids. +- learning some aspects of formalized mathematics and proof assistants. -### 3. AI for auto-tuning undecidable problems +**Mentors**: [Reinis Cirpons][], [Joe Edwards][] and [Florent Hivert][] -In this project we aim to use techniques from AI and machine learning to guide -computations related to undecidable problems. If a group or semigroup is -defined by a presentation, then a classical undecidable problem is whether or -not that semigroup is finite. Two key algorithms for attempting to determine -whether or not a finitely presented semigroup is finite are the -[Knuth-Bendix][] and [Todd-Coxeter][] algorithms. [Todd-Coxeter][] in -particular terminates if and only if the semigroup is finite. +**Length**: 175h or 350h -This project may involve: +**Difficulty**: Medium-Hard + +**Prerequisites**: + +- some familiarity with abstract algebra; +- good grasp of modern C++. + +--- + +### 3. AI for auto-tuning undecidable problems + +**Description**: +The aim of this project is to use techniques from AI and machine learning to +guide computations related to undecidable problems. A classical undecidable +problem is the _semigroup finiteness problem_, which asks whether or not a +finite presentation defines a finite semigroup. There is no program that can +always answer this question in finite time, but there are several semidecision +algorithms, which will always give an answer if a semigroup is finite, but may +run forever on an infinite input. + +Two key semidecision algorithms for the finiteness problem are the +[Knuth-Bendix][] and [Todd-Coxeter][] algorithms, both of which are implemented +in [libsemigroups][]. These algorithms have different trade-offs and often one +yields an answer to a particular question much faster than the other, but due to +the undecidable nature of the problem is it very hard to determine which algorithm +to apply first, or how long to wait before switching between them. Such situations +are very common in computational semigroup theory and we believe there is +potential to optimize the usage of these algorithms using a machine learning +based approach. + +**This project may involve**: - extensive testing of the existing implementation of [Knuth-Bendix][] and [Todd-Coxeter][] in [libsemigroups][]; @@ -73,8 +142,22 @@ This project may involve: for this purpose. - implementing these techniques in modern C++. +**Mentors**: [James Mitchell][] and [Finn Smith][] + +**Length**: 175h + +**Difficulty**: Easy-Medium + +**Prerequisites**: + +- some familiarity with Python or modern C++; +- some familiarity with AI or machine learning. + +--- + ### 4. High-performance tropical algebra library +**Description**: There are a number of high-performance linear algebra packages for linear algebra involving matrix and vector manipulation over fields (usually the real numbers or finite fields. Some well-known examples among many others are: @@ -89,15 +172,30 @@ replacing addition/multiplication with minimum/maximum/summation. As such a high-performance library implementing linear algebra over tropical semirings is likely to be widely applicable. -This project may involve: +**This project may involve**: -- developing some familiarity with linear algebra over tropical semirings; +- learning about linear algebra over tropical semirings; - implementing low-level high-performance SIMD accelerated linear algebra over tropical semirings; - developing a new C++ library (build system, documentation, testing framework). +**Mentors**: [Florent Hivert][] and [Finn Smith][] + +**Length**: 350h + +**Difficulty**: Medium-Hard + +**Prerequisites**: + +- some familiarity with linear algebra; +- mastery of modern C++; +- some familiarity with SIMD. + +--- + ### 5. Computing congruences of finite inverse semigroups +**Description**: In the paper - "Computing congruences of finite inverse semigroups" by L. Elliott, A. @@ -111,17 +209,30 @@ outperforms existing implementations by several orders of magnitude. In this project, you would implement this algorithm either in [GAP package Semigroups][] or within the C++ library [libsemigroups][]. -This project may involve: +**This project may involve**: - developing some familiarity with the theory of inverse semigroups; - implementing the algorithms described in [https://arxiv.org/abs/2406.09281][]; - thoroughly testing and documenting the implementation. +**Mentors**: [Reinis Cirpons][] and [James Mitchell][]. + +**Length**: 90h or 175h + +**Difficulty**: Medium + +**Prerequisites**: + +- some familiarity with modern C++ or GAP; +- good grasp of university level abstract algebra. + +--- + ### 6. Improving errors and their messages in GAP [GAP][] is a system for computational discrete algebra, with particular @@ -134,7 +245,7 @@ The aim of this project is to improve errors and their messages in [GAP][] in something of the same spirit as those introduced in [Python 3.13](https://docs.python.org/3/whatsnew/3.13.html#improved-error-messages). -This project may involve: +**This project may involve**: - doing a wide-scale review of error messages in the [GAP][] library; - identification of error messages that can be improved; @@ -143,15 +254,50 @@ This project may involve: - improvements to the error message infrastructure in [GAP][] arising from the above. - +**Length**: 90h or 175h + +**Difficulty**: Easy + +**Prerequisites**: + +- some familiarity with programming in general. + +--- + +### 7. Nielsen-Schreier free bases + +A first course in linear algebra proves that every subspace of a linear space +has a basis. There is an analogue of this statement for free groups, known as +the [Nielsen-Schreier theorem](https://en.wikipedia.org/wiki/Nielsen%E2%80%93Schreier_theorem), which +states that every subgroup of a free group has a basis. For finitely generated +subgroups, there exist rather constructive proofs, which lead to algorithms +for computing the basis explicitly, in the same way that Gaussian elimination +can be used to can be used to find the basis for a vector space. +The goal of this project would be to implement an algorithm for finding the +free basis of a finitely generated subgroup of a free group. + +**This project may involve**: + +- learning the theory behind the Nielsen-Schreier theorem; +- understanding the algorithmic aspects of certain proofs; +- implementing an algorithm for the Nielsen-Schreier theorem in C++. + +**Mentors**: [Reinis Cirpons][] + +**Length**: 175h -### 7. Free bands +**Difficulty**: Easy-Medium + +**Prerequisites**: + +- some familiarity with abstract algebra; +- some familiarity with modern C++. + +--- + +### 8. Free bands In the paper: @@ -165,20 +311,80 @@ algorithms from this paper is given in [https://doi.org/10.5281/zenodo.7071676](https://doi.org/10.5281/zenodo.7071676). In this project, the reference implementation would be re-implemented in modern C++ as part of [libsemigroups][]; and integrating subsemigroups of free bands -and their quotients into the framework established in [https://arxiv.org/abs/1510.01868](https://arxiv.org/abs/1510.01868). +and subsemigroups into the framework established in [https://arxiv.org/abs/1510.01868](https://arxiv.org/abs/1510.01868). -This project may involve: +**This project may involve**: +- learning the theory behind free bands; - object oriented programming in modern C++; -- some familiarity with automata and transducers; - integrating the C++ code into [libsemigroups_pybind11][], - [Semigroups.jl][], and [GAP package Semigroups][]; + [Semigroups.jl][], and [GAP package Semigroups][]. + +**Mentors**: [Reinis Cirpons][] and [James Mitchell][] + +**Length**: 350h + +**Difficulty**: Medium-Hard + +**Prerequisites**: + +- good grasp of abstract algebra; +- some familiarity with automata and transducers; +- good grasp of modern C++. + +--- + +### 9. Morphocompletion for Knuth-Bendix + +[Finitely presented semigroups](https://en.wikipedia.org/wiki/Presentation_of_a_monoid) +are the basic objects of study in combinatorial semigroup theory. +Each finitely presented semigroup is given by specifying an _alphabet_ together +with a set of equations that hold for some words over this alphabet, called the +_relations_. The elements of this semigroup are words over the alphabet, which +are compared by using the relations. In general equality in such a semigroup +is _undecidable_! + +The [Knuth-Bendix][] algorithm attempts to solve this issue by orienting and +extending the set of relations such that there is a straightforward method for +checking equality. Since the underlying problem is undecidable, the algorithm +does not necessarily terminate with respect to a given presentation. +To make matters worse, it may sometimes be necessary to increase the alphabet +by replacing certain words with new letters to make the Knuth-Bendix procedure +terminate when it otherwise wouldn't. One approach to doing this is the +_morphocompletion_ procedure developed in + +- "Morphocompletion for one-relation monoids" by J. Pedersen + [https://link.springer.com/chapter/10.1007/3-540-51081-8_141](https://link.springer.com/chapter/10.1007/3-540-51081-8_141) + +Roughly speaking, the morphocompletion procedure consists of repeatedly running +the Knuth-Bendix algorithm for a certain number of steps, picking some number +of subwords to replace by letters according to some heuristic such as the +frequency they occur with. + +The goal of this project would be to produce an implementation of the +morphocompletion procedure in C++ in the [libsemigroups][] library or +in Python as part of the [libsemigroups_pybind11][] library. As the +procedure depends on various parameters, a further goal for the project +would consist of fine-tuning these parameters to improve performance. + +**This project may involve**: + +- learning the theory behind the Knuth-Bendix algorithm; +- object oriented programming in modern C++; +- implementing functions relating to word statistics; +- fine-tuning algorithm performance by adjusting algorithm parameters. + +**Mentor**: [Joe Edwards][] + +**Length**: 90h + +**Difficulty**: Easy -