Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
GSoC 2012 Report Sergiu Ivanov: Category Theory Module
My name is Sergiu Ivanov and, at the beginning of the Google Summer of Code 2012 program, I was studying mathematics and informatics (with a bias for algebra). Almost immediately after the end of the program I moved to France to start my PhD on biologically-inspired models of computing. I am interested in functional programming, cell biology, abstract algebra and, of course, category theory :-)
This summer I was accepted with an idiosyncratic project of my own: starting a category theory module in SymPy, under the mentorship of Tom Bachmann. I was (and still am) deeply grateful to the SymPy community for their exceptionally open attitude to side ideas!
The full proposal can be found here; the proposal also includes a short theoretical introduction which should be sufficient for grasping the essential concepts behind the project idea. I will therefore only sketch the general roadmap that was in plan at the beginning of summer.
As it can be easily inferred from the name, the goal of my GSoC 2012 project was to start a category theory module in SymPy. Obviously, category theory is a relatively large and complex domain of mathematics, so I had no expectations to cover all of it, or even to go too far beyond the very basics. The initial plan was to implement the following three major components:
The base classes, i.e., the classes which would model some of the fundamental category theoretic notions: objects, morphisms, categories, and diagrams. Note that implementing functors was not planned; instead, I left functors for later stages of development of the category theory module.
Automated diagram plotting, i.e., the functionality that would automatically construct a graphical representation of a supplied diagram. Instead of actually producing a graphics file, I decided to produce
TikZcode. That would allow me to rely on the higher-level graphical abstractions offered by
TikZ, but also to have a nice embeddability into scientific articles.
Automated proof of commutativity basing on the fact of commutativity of a set of diagrams. This basically refers to writing the code that would start with a set diagrams known to be commutative, and that would then be able to decide whether a given diagram is commutative in the given circumstances or not.
In the following sections I will try to describe how the actual development workflow went, restricting the amount of technical details pertaining to the actual project, and instead focusing on the actual development activity.
The GSoC Workflow
The process of working on my GSoC project has largely gone through two
phases. In the first phase, I implemented the base classes and I've
also got automated diagram drawing done. I was one or two weeks late
on schedule, but everything was basically going as expected, the
retard being mostly caused by slight time mismanagement and
overenthusiastic attitude to polishing certain details. The second
phase started at the beginning of August, when I understood that I had
actually badly botched the
Diagram class and that automated proof of
commutativity wasn't just complicated, but very complicated :-)
I will describe the development workflow in two subsections, in accordance with the two phases I mentioned in the previous paragraph. To get a more immersive perspective, take a look at the SymPy-related articles on my blog.
Base Classes and Diagram Drawing
Sketching the base classes and submitting the first large pull request
was really straightforward and easy. I essentially followed the
proposal, adding the not-described-but-necessary bits of functionality
as I went. However, reviewing this pull request took much more time
than I initially expected: instead of working a half of a day, I
actually spend about five full days implementing the suggested
changes. The reason was that, even though I thought I had done
sufficiently much object-oriented modelling before, it turned out that
I was quite a noob and that I missed several important points, both in
modelling the mathematical notions themselves and in being compliant
with the SymPy class structure. With the kind help of my mentor and
SymPy people, I managed to refactor the original class
better represented the corresponding concepts and proved to be much
easier and clearer both to write and to use. More importantly, having
ideas grouped in tighter entities actually enforced very sharp
semantics on the classes, often forcing the user to do things right.
Having base classes done, I took up the next task of implementing
automated diagram drawing. Even though I had sketched the algorithm
in my proposal, a lot of details were missing which I had to figure
out during the development process itself. An important change was
the decision to use
Xy-pic instead of
The reason was that
Xy-pic is specifically meant for typesetting
category-theory-like diagrams, while
TikZ is more generic, and
therefore more stuff would have to be done by hand. So far, the
consequences of the change have been largely beneficial, especially in
what concerns the complexity of the output code.
I split the actual work on diagram drawing into two subtasks: object
layout and generation of actual
Xy-pic code. These two bits of
functionality turned out to be sufficiently independent to submit in
separate pull requests even. I spent quite a lot of time on object
layout because it influences the final aspect of the diagram so much.
Also, I wanted to give the user some control over how the diagram is
laid out, so I added the possibility to explicitly require the objects
to be arranged in a linear-like fashion, as well as the feature to
specify groups of objects. The trick with groups is that each group
is laid out independently from the others, and then they are laid out
together as opaque entities, forming a condensed diagram. I believe
this feature gives better control over the positioning of objects, but
it has not been that thoroughly tested yet.
The generation of actual
Xy-pic code seemed rather straightforward
at the beginning, and indeed, I was able to produce plots of simple
diagrams within a day or two after having finished the object layout
part. However, it was the details that required much more attention:
I spent about a week on getting curved arrows curve nicely and not
intersect too much with other arrows. I've spent another long
timespan on re-positioning morphism names in order to minimise the
number of ugly overlaps of labels and arrows.
I am very glad to state that both pull requests, the one concerning
object layout and the other one, dealing with generation of
code have already been merged into SymPy's
master branch. Diagram
drawing has not yet been thoroughly tested, though. Furthermore,
while working on my other tasks I discovered a lot of room for
improvement (i.e., I tried some diagrams and some of them were laid
out rather uglily). I do believe though that the heuristics I have
implemented can be extended or plugged sufficiently enough to get
decent automated diagram plots.
Automated Proof of Diagram Commutativity
I started working on automatic proof of diagram commutativity at the beginning of August. While this meant I was lagging behind my proposed schedule rather seriously, I wasn't too nervous, because I believed that reasoning about commutativity was rather well described in the proposal and only some minor details would have to be added to that description to get a decent implementation. It turned out I was wrong; very wrong :-)
First of all, before starting to write the code, I had to better
formalise my goals. It suddenly turned out to be far from being easy.
After some hours of meditation, I arrived at the conclusion that the
original implementation of
Diagram was badly screwed: it looked like
I had stuffed two concepts into a single class: the concept of a
commutative diagram and the concept of a commutative implication.
When I realised this, I quickly split
Diagram into two classes,
which has made the code clearer, of course.
However, writing the commutativity code meant that
have to model diagrams instead of just being a container for
morphisms, as it used to be in the diagram drawing period. I wasn't
too much concerned about this; yet, having implemented an algorithm
for find embeddings of diagrams, I discovered I had some strange
hash-randomisation-related failures. When I tracked down the source
of error, I found that
Diagram was still badly screwed: it was
supposed to store all the morphisms of a diagram, but instead it would
only add some of them; different morphisms every time. It felt like
slapping myself on the face and fixing a simple bug would have solved
the issue; however, when I actually fixed the addition of morphisms
Diagram would automatically add all morphisms to itself on
creation, it obviously failed in the cases where there were loops or
cycles. I desperately tried to fix everything in an ad-hoc way, but
that turned to be impossible. Now note that I realised this one week
before the pencils-down date :-)
So I re-thought and re-implemented
Diagram from scratch. This
resulted in a completely different, much more intelligent class, which
was still largely compatible with the old
Diagram, so the
fundamental change of internal workings did not affect the existing
code almost at all. After that I had to implement embeddings of all
kinds of diagrams (including infinite diagrams). I started working on
this one day before the pencils-down date and I've spent two days on
it, having written what I believe to be the most sophisticated and
well commented piece of code I've ever done, to only throw it away :-)
It turned out that I could use the concept of expanded generators
which I introduced myself to deal with infinite diagrams as if they
were finite. I have proved a couple lemmas on the way; they are all
in my notebook, but I will put them on my blog soon. I will not go
into details here; appropriate comments will be soon available in the
docstring of the new
Diagram class which will hopefully be merged in
a couple weeks (that mostly depends on how fast I fix the issues my
mentor Tom has pointed out).
Very happy with my diagram embedding functionality, I started writing the code for doing the actual proofs. I thought very little about optimisation in order to keep things as simple as possible. Unfortunately, this has resulted in prohibitively slow code. This meant that everything was so slow that I could not even check whether it worked or not. I am very seriously committed to optimising the existing approach and getting it to work though. See the "Further Plans" sections for more details.
I would like to state that this year's GSoC experience was extremely exciting, mainly because of the friendly atmosphere in the SymPy community. I found it very useful to post on the blog every week, although I was always late with my blog post :-)
I am very excited with having rediscovered test-driven development! I found out that this made it possible to write good tests, which depended very little on the structure of the code that implemented the actual functionality. And yet more fundamentally, with test-driven development one can understand much better the functionality that one eventually describes in code.
I would also like to thank my mentor Tom Bachmann for his tolerant
attitude towards my always being late and eventually towards my
failure to implement the reasoning about commutativity in time. I
would totally like to express my corresponding appreciation and
thankfulness by actually finalising the said task and by having it
merged into SymPy's
My further plans include responding to Tom's suggestions about the
pull request which introduces the new
Diagram class, as well as
bumping up test coverage of diagram drawing. I also intend to find
out why some diagrams are typeset in an ugly way and fix the issues.
In parallel with that, I will start posting to my blog the proofs necessary to explain why my idea about proving the commutativity of a diagram will actually work, just as I have promised before. After that I will work on optimising the reasoning procedure so that it runs much more efficiently.
I'd also plan to work a little bit on automated diagram drawing and
implement the possibility to force certain coordinates on some objects
in the layout of a diagram, as well as on extending the part which