description: A retrospective of 6 years of SoC, and lessons learned
As part of Google's [Summer of Code](!Wikipedia "Google Summer of Code") [program](, they sponsor 5-10 SoC [projects for Haskell](
The Haskell Summers of Code have often produced excellent results, but how excellent is excellent? Are there any features or commonalities between successful projects or unsuccessful ones?
# Example retrospective: Debian
In 2009, a blogger & Debian developer produced [a]( [three]( [part]( retrospective series on the Debian Summer of Code projects. The results are interesting: some projects were a failure and the relevant student drifted away and had little to do with Debian again; and some were great successes. I don't discern any particular lessons there, except perhaps one against hubris or filling unclear needs. I decided to compile my own series of retrospectives on the Haskell Summers of Code.
# Judging Haskell SoCs
Google describes SoC as
> "...a global program that offers students stipends to write code for open source projects. We have worked with the open source community to identify and fund exciting projects for the upcoming summer."^[<>]
> "...a global program that offers student developers stipends to write code for various open source software projects. We have worked with several open source, free software, and technology-related groups to identify and fund several projects over a three month period. Since its inception in 2005, the program has brought together over 4500 successful student participants and over 3000 mentors from over 100 countries worldwide, all for the love of code. Through _Google Summer of Code_, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all."^[<>]
It is intended to produce source code for the 'use and benefit of all'; it is not meant to produce academic papers, code curiosities, forgotten blog posts, or groundwork for distant projects, but 'exciting' new production code. This is the perspective I take in trying to assess SoC projects: did it ship *anything*? If standalone, are the results in active use by more than a few developers or other codebases? If a modification to an existing codebase, was it merged and is it now actively maintained^[The Haskell ecosystem evolves fast, and strong static typing means that a package can quickly cease to be compilable if not maintained.]? And so on. Sterling Clover argues that this is far too demanding and does not consider whether an involved student is energized by his contribution to go on and contribute still more[^sclv]; I disagree about the former, and I have not done the latter because it would be too labor-intensive to track down every student and assess their later contributions, which would involve still more subjective appraisals^[For example, how long must a student 'continue to participate/make contributions to Haskell community'? Spencer Janssen, a successful 2006 SoC student, went on to be one of the 2 main developers on the popular [Xmonad](!Wikipedia) window manager, but then wound down his Haskell contributions and stopped entirely ~2009 (much to my dismay as an Xmonad developer). Is he a success for SoC?]. (Perhaps in the future I or another Haskeller will do that.)
[^sclv]: From the 11 February 2011 Haskell-cafe thread, ["Haskell Summers of Code retrospective (updated for 2010)"](
> There was some discussion of this on [Reddit]( Below is a slightly cleaned-up version of my comments there.
> I really appreciate this roundup. But I think the bar is set somewhat too high for success. A success in this framework seems to be a significant and exciting improvement for the entire Haskell community. And there have certainly been a number of those. But there are also projects that are well done, produce results that live on, but which aren't immediately recognizable as awesome new things. Furthermore, GSoc explicitly lists a goal as inspiring young developers towards ongoing community involvement/open source development, and these notes don't really take that into account.
> For example, I don't know of any direct uptake of the code from the HaskellNet project, but the author did go on to write a small textbook on Haskell in Japanese. As another example, Roman (of Hpysics) has, as I understand it, been involved in a Russian language functional programming magazine.
> So I think there needs to be a slightly more granular scale that can capture some of these nuances. Perhaps something like the following:
> - \[ \] Student completed (i.e. got final payment)
> - \[ \] Project found use (i.e. as a lib has at least one consumer, or got merged into a broader codebase)
> - \[ \] Project had significant impact (i.e. wide use/noticeable impact)
> - \[ \] Student continued to participate/make contributions to Haskell community
> A few more detailed comments about projects that weren't necessarily slam dunks, but were at the least, in my estimation, modest successes:
> 1. GHC-plugins -- Not only was the work completed and does it stand a chance of being merged, but it explored the design space in a useful way for future GHC development, and was part of Max becoming more familiar with GHC internals. Since then he's contributed a few very nice and useful patches to GHC, including, as I recall, the magnificent TupleSections extension.
> 2. GHC refactoring -- It seems unfair to classify work that was taken into the mainline as unsuccessful. The improvement weren't large, but my understanding is that they were things that we wanted to happen for GHC, and that were quite time consuming because they were cross-cutting. So this wasn't exciting work, but it was yeoman's work helpful in taking the GHC API forward. It's still messy, I'm given to understand, and it still breaks between releases, but it has an increasing number of clients lately, as witnessed by discussions on -cafe.
> 3. Darcs performance -- by the account of Eric Kow & other core darcs guys, the hashed-storage stuff led to large improvements (and not only in performance)[[2]]( -- the fact that there's plenty more to be done shouldn't be counted as a mark against it.
## Haskell retrospective
Haskell wasn't part of the first Summer of Code in 2005, but it was accepted for 2006. We start there.
### 2006
The 2006 [homepage]( lists the following projects:
- ["Fast Mutable Collection Types for Haskell"](; Caio Marcelo de Oliveira Filho, mentored by Audrey Tang
**Unsuccessful**. This ultimately resulted in the [HsJudy](!Hackage) library ('fast mutable collection' here meaning 'array'). HsJudy was apparently used in Pugs at one time, but no more.
- ["Port Haddock to use GHC"](; David Waern, mentored by Simon Marlow
**Successful**. Haddock has used the GHC API ever since.[^complaints]
- ["A model for client-side scripts with HSP"](; Joel Björnson, mentored by Niklas Broberg
**Successful?** Was initially unsuccessful, but seems to've been picked up again.
- "GHCi based debugger for Haskell"; José Iborra López, mentored by David Himmelstrup
**Successful**. The [GHCi debugger]( was accepted into GHC HEAD, and is in production use.
- ["HaskellNet"](; Jun Mukai, mentored by Shae Erisson
**Unsuccessful**. HaskellNet is dead, was noted to be ["uncompleted"](, and none of it has propagated elsewhere. (I'm not entirely sure what happened with the HaskellNet code - I know of [two]( [repos](, but that's about it.) Shae tells me that this poor uptake is probably due to a lack of advertising, and not any actual defect in the HaskellNet code.
- ["Language.C - a C parser written in Haskell"](; Marc van Woerkom, mentored by Manuel Chakravarty
**Unsuccessful**. According to [Don Stewart's outline]( of the 2006 SoC, this project was not completed.
- ["Implement a better type checker for Yhc"](; Leon P Smith, mentored by Malcolm Wallace
**Unsuccessful**. See the Language.C SoC.
- ["Thin out cabal-get and integrate in GHC"](; Paolo Martini, mentored by Isaac Jones
**Successful**. Code lives on as [cabal-install](!Hawiki), which we all know and love.
- "Storable a => ByteString a"; Spencer Janssen, mentored by Don Stewart
**Successful**? (Again, per Don.) Currently exists as [storablevector](!Hackage), with [20 reverse dependencies](
4 successful; 2 unsuccessful; and 2 failures.
### 2007
The [2007 homepage]( lists:
- ["Darcs conflict handling"](; Jason Dagit, mentored by David Roundy
**Successful**. The work was successful in almost completely getting rid of the exponential conflict bug, and has been in released Darcs for years.
- ["Automated building of packages and generation of Haddock documentation"](; Sascha Böhme, mentored by Ross Paterson
**Successful**. The auto build and doc generation are long-standing and very useful parts of Hackage.
- ["Rewrite the typechecker for YHC and nhc98"](; Mathieu Boespflug, mentored by Malcolm Wallace
**Successful**? According to the TMR writeup, the type-checker code has made it into YHC. (I add a question mark because YHC is so little used.)
- ["Cabal Configurations"](; Thomas Schilling, mentored by Michael Isaac Jones
**Successful**. Cabal configurations are very useful for enabling/disabling things and are extremely common in the wild.
- ["Update the Hat tracer"](; Kenn Knowles, mentored by Malcolm Wallace
**Unsuccessful**. The update apparently happened, since the [Hat homepage]( says "Version 2.06 released 2nd Oct 2008", but it is [described]( as unmaintained, and I can't seem to find any examples of people actually using Hat.
- ["Generalizing Parsec to ParsecT and arbitrary input (ByteStrings)"](; Paolo Martini, mentored by Philippa Jane Cowderoy
**Successful?**. The performance is still so terrible that few people use it.
- ["Shared Libraries for GHC"](; Clemens Fruhwirth, mentored by Simon Marlow
**Successful**. The situation is unclear to me, but I know that for some period dynamic linking worked for some platforms. However, it's 2010 and I still have static linking, although GHC 6.12 apparently gets dynamic linking; so I'm going to chalk this one up as a mixed success.
- ["Libcurl"](; Mieczysław Bąk, mentored by Bryan O'Sullivan
**Unknown**. The archived [homepage]( and [repository]( indicate that the package name was [curl](!Hackage) and indeed a [curl](!Wikipedia "cURL") binding of that name exists - but none of the metadata points to Bąk as either author or maintainer; if it is the same package, it is pretty successful with [158 reverse dependencies](
- ["Extending GuiHaskell: An IDE for Haskell Hackers"](; Asumu Takikawa, mentored by Neil David Mitchell
**Unsuccessful**. GuiHaskell does not exist in any usable form. (The homepage summarizes the situation thusly: ["**Warning**: This project is fragile, unfinished, and I do not recommend that anyone tries using it."](
6 successes; 2 unsuccessful; 1 unknown.
#### See also
- [The Monad.Reader's](!Hawiki "The Monad Reader") [issue 9]( covers SoC projects
### 2008
The [2008 homepage]( isn't kind enough to list all the projects, but it does tell us that only 7 projects were accepted by Google.
So we can work from the []( page which lists 6:
- "C99 Parser/Pretty-Printer"; by Benedikt Huber, mentored by Iavor Diatchki
**Successful**. The first try failed, but the second won through, and now people are doing things like [parsing the Linux kernel]( with it.
- ["GMap - Fast composable maps"](; by Jamie Brandon, mentored by Adrian Charles Hey
**Unsuccessful**. GMap is on [Hackage](, but there are [0 users]( after 3 years.
- "Haskell API Search"; Neil Mitchell, mentored by Niklas Broberg
**Successful**. The improved performance and search capability have made it into [Hoogle](!Hackage "hoogle") releases, and Hoogle is one of the more popular Haskell applications (with [1.7m web searches](
- ["Cabal 'make-like' dependency framework"](; Andrea Vezzosi, mentored by Duncan Coutts
**Unsuccessful**. ([His code]( [wound]( [up]( becoming [hbuild](, which is not on Hackage or apparently used by anyone.)
- ["GHC plugins"](; Maximilian Conroy Bolingbroke, mentored by Sean Seefried
**Unsuccessful**? As of [January 2010](, the patch adding plugins functionality has yet to be accepted & applied; as of February 2011, the [ticket]( remains open and the code unmerged. The code is apparently not yet bitrotten by the passage of 3 years but how long can its luck last? The code was finally merged on 4 August 2011; [the docs]( do not list any users.
- "Data parallel physics engine"; Roman Cheplyaka, mentored by Manuel M. T. Chakravarty
**Unsuccessful**. It seems to be finished, but I can see no use of the actual engine mentioned on the [engine's blog]( (I would give reverse dependency statistics, but [Hpysics](!Hawiki) seems to have never been uploaded to Hackage.)
- "GHC API"; Thomas Schilling, mentored by Simon Marlow <!-- -->
**Unsuccessful**. Schilling's fixes went in, but they were in general minor changes (like adding the GHC monad) or bug-fixes; the GHC API remains a mess.
2 successful, 5 unsuccessful.
#### Don Stewart's view
[Don Stewart writes]( in reply to the foregoing:
> "We explicitly pushed harder in 2008 to clarify and simplify the goals of the projects, ensure adequate *prior Haskell experience* and to focus on libraries and tools that directly benefit the community.
> And our success rate was much higher.
> So: look for things that benefit the largest number of Haskell developers and users, and from students with proven Haskell development experience. You can't learn Haskell from zero on the job, during SoC."
#### See also
- The Monad.Reader's [Issue 12](
### 2009
5 projects were [accepted]( this year; Darcs tried to apply in its own right but was rejected.
In general, these looked good. Most of them will be widely useful -- especially the Darcs and Haddock SoCs -- or address longstanding complaints (many criticisms of laziness revolve around how unpredictable it makes memory consumption). The only one that bothers me is the EclipseFP project. I'm not sure Eclipse is common enough among Haskellers or potential Haskellers to warrant the effort^[In the [2010 survey]( of Haskellers, 3% reported ever using Eclipse for Haskell programming. In the [2011 survey](, 4% did.], but at least the project is focused on improving an existing plugin rather than writing one _ab initio_. The 5 were:
- ["Optimising Darcs for medium to large repositories"](; by Petr Ročkai; mentored by [Eric Kow](
**Unknown**. [hashed-storage]( exists and is used in Darcs, but from watching the bugtracker traffic, it's unclear whether Darcs saw a net gain from it.
- ["haskell-src-exts -> haskell-src"](; by Niklas Broberg; mentored by Neil Mitchell
**Successful**. Niklas added a large number of [patches]( but it's unclear to me what practical benefit it adds besides handling comments now (which was useful for hlint). Speaking practically, [haskell-src]( has 104 reverse dependencies, and [haskell-src-exts]( has 223; so the latter seems to have indeed surpassed its predecessor.
- ["Haddock improvements"](; by Isaac Dupree; mentored by David Waern
**Successful?**. Dupree's [patches]( have been applied to head and apparently make cross-package links [usually work](
- ["Improving space profiling experience"](; by Gergely Patai; mentored by Johan Tibell
**Successful**. [hp2any]( seems quite alive and usable.
- ["Extend EclipseFP functionality for Haskell"](; by Thomas ten Cate; mentored by Thomas Schilling
**Unsuccessful**. See [Cate's summing-up](
3 successful, 1 unknown, 1 unsuccessful.
### 2010
[7 projects]( were accepted:
- [Improvements to Cabal's test support](; Thomas Tuegel, mentored by Johan Tibell
**Successful**? The functionality is now in a released version of `cabal-install` and a number of packages use the provided test syntax.^[As of 18 March 2011, I have local copies of 9 repositories which seem to make use of the new syntax: `angle, cabal, concurrent-extra, hashable, rrt, safeint, spatialIndex, unordered-containers, wai-app-static`.] <!-- TODO: update using shell command in ~/bin: find . -name "*.cabal" -exec fgrep - - files-with-matches - - ignore-case 'test-suite ' {} \; | fgrep -v share | fgrep -v 'abal/tests/' | sort -->
- [Infrastructure for a more social Hackage 2.0](; Matthew Gruen, mentored by Edward Kmett
**Unknown**. [Gruen's blog]( was last updated October 2010, and Hackage still hasn't switched over and gotten the new features & benefit of the rewrite. But the code exists and there is a running [public demo](, so this may yet be a success.
- [A high performance HTML generation library](; Jasper Van der Jeugt, mentored by Simon Meier
**Successful**. [blaze-html](!Hackage) has been released and is actively developed; it has [50 total reverse dependencies]( and [blaze-builder](!Hackage) has [97 reverse dependencies]( though there's much overlap. (This site is built on [hakyll](!Hackage), which uses blaze-html.)
- [Improvements to the GHC LLVM backend](; Alp Mestanogullari, mentored by Maximilian Bolingbroke
**Unsuccessful**. Dan Peebles in #haskell says that Alp's SoC never got off the ground when his computer died at the beginning of the summer; with nothing written or turned in, this can't be considered a successful SoC, exactly. But could it have been?
The LLVM backend is still on track to become the default GHC backend^[A development that surprises me, since I had been under the impression that most GHC work ultimately winds up being scrapped or abandoned like [Liskell]( or [Mobile Haskell](], suggesting that it's popular in GHC HQ (and the [DDC]( dialect), and it seems to also be popular among [Haskell bloggers]( The scope is restricted to taking a working backend and optimizing it. In general, it seems like a decent SoC proposal, and better than the next one:
- [Implementing the Immix Garbage Collection Algorithm](; Marco Túlio Gontijo e Silva, mentored by Simon Marlow
**Unsuccessful**. The GHC repository history, as of 4 February, contains no patches adding Immix GC. Silva writes in his blog's [SoC summary]( that "Although the implementation is not mature enough to be included in the repository, I’m happy with the state it is now. I think it’s a good start, and I plan to keep working on it." (His [new blog](, begun in August 2010, contains no mention of Immix work.) The [GHC wiki]( says that "it's functional, doesn't have known bugs and gets better results than the default GC in the nofib suite. On the other hand, it gets worse results than the default GC for the nofib/gc suite." Marco said in a [Disqus comment]( on this page:
> "Hi. I wondered about continuing my work on the Immix GC collector, but Simon Marlow, my mentor, thought it was not a good idea to invest more effort on Immix. So I dropped it, and started working on other things. Greetings."
- ["Improving Darcs Performance"](; Adolfo Builes, mentored by Eric Kow
**Unknown**. This replaced a previous proposal to write a Haskell binding to the [GObject](!Wikipedia) library, which never started. Looking through the Darcs repository history, I see a number of new tests related to the global cache, but no major edits to cache-related modules.
- [Improving Darcs's network performance](; Alexey Levan, mentored by Petr Rockai
**Successful**. Levan divided his SoC into 2 parts, improving Darcs's performance in fetching the many small files that make up a repository's revision history, and writing 'a smart server that can provide clients with only files they need in one request'. The 'smart server' seems to have been abandoned as not being worthwhile, but the fetching idea was implemented and will be in the [2.8 release](
The [basic idea]( is to combine all the small files into a single [tarball](!Wikipedia "tar (file format)") which can be downloaded at full speed, and avoid the latency of many roundtrips. The 2.8 release description claims that when `darcs optimize --http` was used on the Darcs repository, a full download went from 40 minutes to 3 minutes. This feature would not be enabled by default, but the gain for larger repositories would be large enough that I feel comfortable classifying it as a successful SoC.
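A rough model, not taken from the Darcs release notes, shows why packing helps. Assume the files are fetched one request at a time, with $N$ files, a round-trip latency $r$ per request, total size $S$, and bandwidth $B$ (all four symbols are illustrative assumptions):

$$T_{\text{separate}} \approx N r + \frac{S}{B}, \qquad T_{\text{tarball}} \approx r + \frac{S}{B}$$

The saving is then roughly $(N-1)\,r$, independent of bandwidth. If the reported drop from 40 to 3 minutes came entirely from this term, then $(N-1)\,r \approx 37$ minutes $= 2220$ seconds; at a hypothetical $r = 0.1$ seconds, that would correspond to about 22,000 sequential requests. Real HTTP clients can pipeline or parallelize requests, so this is a simplification rather than a measurement.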
#### Predicting 2010 results
Borrowing from our [3 cardinal sins](#lessons-learned) of SoCs, and per my usual practice of testing my understanding by [making predictions](Prediction markets#calibration), what predictions do I make about the 2010 SoCs?
Most of the 7 SoCs are laudably focused on an existing application. You don't need to justify a speedup of normal Darcs operations because there's an installed base of Darcs users that will benefit; a new GC for GHC or a LLVM backend will benefit every Haskeller; better Cabal support for testing may go unused by many package authors who either have no tests or don't want to bother - but a fair number will bother, and it will get maintained as part of Cabal, and similarly for the Hackage 2.0 project.
The Immix GC strikes me as a very challenging summer project; a GC is one of the most low-level pieces of a functional language and is intertwined with all sorts of code and considerations. It would not surprise me if that project wound up just getting a little closer to a working Immix GC but not producing a production-quality GC scheduled to come to compilers near you.
2 in particular concern me as potentially falling prey to sins #2 & 3: the GObject-binder tool, and the high-performance HTML library:
1. Let's assume that the HTML library does wind up as being faster than existing libraries, and as useful - that compromises don't destroy its utility. Who will use it? It will almost surely have an API different enough from existing libraries that a conversion will be painful. There are roughly 42 users of the existing [xhtml]( library; will their authors wish to embrace a cutting-edge infant library? Is HTML generation even much of a bottleneck for them? (Speaking just for Gitit, Pandoc and its HTML generation are not usually a bottleneck.)
2. The case against the GObject project makes itself; GTK2Hs isn't as widely used as one would expect, and this seems to be due to the difficulty of installation and its general complexity. So there are few users of existing libraries; would there be more users for those libraries no one has bothered to bind nor yet clamored for? (This project might fall afoul of sin #1, but I do not know how difficult the GObject data is to interpret.)
#### 2010 results
As of February 2011, I grade the 7 SoCs for 2010 as follows: 3 successes, 2 unknown, and 2 unsuccessful. (One unknown, Hackage 2.0, will probably turn out to be a success once it goes live as the main Hackage site.) As one would hope, the results seem to be better than the results for 2008 or 2009.
Of my original predictions, I think I was right about the Immix GC & GObject & Darcs network optimization, semi-right about Hackage 2.0 & Cabal testing support, somewhat wrong about the LLVM work, and completely wrong about the HTML/`blaze` SoC. (I am not sure why I was wrong about the last, and don't judge myself harshly for not predicting the [exogenous](!Wikipedia) failure of the LLVM SoC.)
### 2011
[]( got 7 projects again for 2011. They are:
1. ["Improve EclipseFP"](; Alejandro Serrano, mentored by Thomas Schilling
> "Eclipse is one of the most popular IDEs in our days. EclipseFP is a project developing a plug-in for it that supports Haskell. Now, it has syntax highlighting, integration of GHCi and supports some properties of Cabal files. My idea is to extend the set of tools available, at least with:
> - Autocompletion and better links to documentation,
> - A way to run unit tests within Eclipse,
> - More support for editing Cabal files visually, including a browser of the available packages."
2. ["Simplified OpenGL bindings"](; Alexander Göransson, mentored by Jason Dagit
> "Modernize and simplify OpenGL bindings for Haskell. Focus on safety, shaders and simplicity."
3. ["Interpreter Support for the Cabal-Install Build Tool"](; anklesaria, mentored by Duncan Coutts
> "This project aims to provide cabal-install with an '[repl](!Wikipedia "Read–eval–print loop")' [[`cabal ghci`](] command by adding to the Cabal API. This would allow package developers to use GHCi and Hugs from within packages requiring options and preprocessing from Cabal. "
4. ["Convert the `text` package to use UTF-8 internally"](; Jasper Van der Jeugt, mentored by Edward Kmett ([detailed proposal](
> "For Haskell projects handling Unicode text, the `text` library offers both speed and simplicity-of-use. When it was written, benchmarks indicated that UTF-16 would be a good choice for the internal encoding in the library. However, these (rather artificial) benchmarks were did not take into account the time taken to
> 1. decode the 'Real World' data and
> 2. encode it to write it back.
> I propose to
> 1. benchmark and
> 2. convert the library to UTF-8 if it is a faster choice for 'Real World'-applications."
5. ["Build multiple Cabal packages in parallel"](; Mikhail Glushenkov, mentored by Johan Tibell
> "Cabal is a system for building and packaging Haskell libraries and programs. This project's aim is to augment Cabal with support for building packages in parallel. Many developers have multi-core machines, but Cabal runs the build process in a single thread, only making use of one core. If the build process could be parallelized, build times could be cut by perhaps a factor of 2-8, depending on the number of cores and opportunity of parallel execution available."
6. ["Darcs Bridge"](; Owen Stephens, mentored by Ganesh Sittampalam
> "My proposed project is to create a generic bridge that will enable easy interoperability and synchronisation between Darcs and other VCSs. The bridge will be designed to be generic, but the focus of this project will be Darcs2 ↔ Git and Darcs2 ↔ Darcs1. The bridge should allow loss-less, correct conversion to and from Darcs repositories, allowing users to use the tool that suits them and their project best, be that Darcs as it currently exists, or another tool."
7. ["Darcs: primitive patches version 3"]( ([expanded blog description](; Petr Ročkai, mentored by Eric Kow
> "Darcs, a revision control system, uses so-called patches to represent changes to individual version-controlled files, where the 'primitive' patches are the lowest level of this representation, capturing notions like 'hunks' (akin to what `diff(1)` produces), token replace and file and directory addition/removal. I propose to implement a different representation of these primitive patches, hoping to improve both performance and flexibility of darcs and to facilitate future development."
#### Predicting 2011 results
Which seem like good selections for SoC, and which seem less appropriate?
1. \#1 is the *second* EclipseFP SoC, after a failed [2009](#2009) attempt; why should we think this one will do better?
2. With #2, the fear is that the result will not be used; there is an OpenGL binding already, after all, and I haven't heard that there are very many people who want to do OpenGL graphics but were deterred by complexity or danger in it.
3. `cabal ghci` is a long-requested Cabal feature, and it sounds as if all the groundwork and experimentation has been done. I have no problem with this one.
4. Benchmarking sounds quite doable, and `text` is increasingly used; but if I had to criticize it, I would criticize it for *under*ambition, for sounding too modest and not a good use of a slot.
5. \#5 is a second crack at the parallel compilation problem (building on a [2008](#2008) SoC) and is troubling in the same way the EclipseFP SoC is.
6. There are multiple existing Darcs->other VCS programs, so the task is quite doable. An escape hatch would be very valuable for users (even if rarely used).
7. This one sounds tremendously speculative to me.
I respect Ročkai & Kow, but in idling on `#darcs` and reading the occasional Darcs-related emails & Reddit posts, I don't know of any fully worked out design for said patch design, which makes it a challenging theoretical problem (patch theory being general & powerful), a major implementation issue (since the existing primitive patches are naturally assumed all throughout the Darcs codebase), and difficult to verify that it will not backfire on users or legacy repositories. All in all, #7 sounds like the sort of project where the *best* case scenario is a repository branch/fork somewhere that few besides the author understand, which is better on some usecases and worse on others, but not actually in general use. That might be a success by the Darcs team's lights, but not in the sense I have been using in this history.
To summarize my feelings:
- \#1 seems a bit doubtful but is more likely to succeed (because presumably most of the heavy lifting was done previously).
- I predict #2 & #7 will likely fail.
- I would be mildly surprised if *both* #3 & #5 succeed - since they're challenging and long-requested Cabal features - but I expect at least one of them to succeed. Which, I am not sure.
- I expect with confidence that #4 & #6 will succeed.
#### 2011 results
1. "Improve EclipseFP"; Alejandro Serrano, mentored by Thomas Schilling
**Successful**. The [coding]( was finished, to the author's apparent satisfaction, and the work was included in the [2.1.0]( release.
2. "Simplified OpenGL bindings"; Alexander Göransson, mentored by Jason Dagit
**Unsuccessful**. Jason Dagit says Alexander never started for unknown personal reasons and so no work was ever done (no `OpenGLRawNice` library exists, a post-August 2011 Google search for "Alexander Göransson OpenGL" is dry, nothing on Hackage seems to mention OpenGL 4.0 support, etc.).
3. "Interpreter Support for the Cabal-Install Build Tool"; anklesaria, mentored by Duncan Coutts
**Unsuccessful**? anklesaria's final post, ["Ending GSoC"](, says the work is done and provides a repository with patches by `` - but no patches by that email appear in the Cabal repository as of 10 December 2011; nor does there appear to be any discussion in the [cabal-dev ML]( archives.
4. "Convert the `text` package to use UTF-8 internally"; Jasper Van der Jeugt, mentored by Edward Kmett
**Successful**. Jasper published 2 posts on benchmarking the converted `text` against the original (["Text/UTF-8: Initial results"]( & ["Text/UTF-8: Studying memory usage"](; discussing the results in ["Text/UTF-8: Aftermath"](, the upshot is that the conversion has a real but small advantage, potentially would cause interoperability problems, requires considerable testing, and won't be merged in (the fork will be maintained against hopes of future GHC optimizations). Jasper says the benefits wound up being a bigger & cleaner test/benchmark suite, and some optimizations made for the UTF-8 version can be applied to the original. Since Edward Kmett [seems pleased](, I have marked it a success (although I remain dubious about whether it was a good SoC).
5. "Build multiple Cabal packages in parallel"; Mikhail Glushenkov, mentored by Johan Tibell
**Unsuccessful**? Glushenkov reported in ["Parallelising cabal-install: Results"]( that the patches were done and people could play with his repository; the comments report that it basically works and does offer speedups. However, as before, no patch by him appears in the mainline Cabal, and the last discussion was 6 November 2011 where [he provides a patch bundle]( No one commented; Mikhail says the patches may be "too invasive" and need reworking before merging.[^Mikhail] Hopefully it will be merged in soon and I can mark it 'Successful'.
6. "Darcs Bridge"; Owen Stephens, mentored by Ganesh Sittampalam
**Successful**? Owen's [blog posts]( conclude with ["GSoC: Darcs Bridge - Results"]( summarizing the final features: he succeeded in most of the functionality. Brent Yorgey tells me that he has successfully used the tool to convert repositories to put onto Github, but says there are "some critical bugs" and use is still "clunky" (eg. currently requiring Darcs HEAD; see the usage guide on the [Darcs wiki]( Whether the bugs will be fixed and the package polished to the point where it will be widely used remains to be seen.
7. "Darcs: primitive patches version 3"; Petr Ročkai, by Eric Kow
**Unsuccessful**. Ročkai wrote two posts (["soc reloaded: progress 1"]( & ["soc reloaded: Outcomes"]( This seems to have turned out as I predicted above:
> "Since my last report, I have decided to turn somewhat more radical again. The original plan was to stick with the darcs codebase and do most (all) of the work within that, based primarily on writing tests for the testsuite and not exposing anything of the new functionality in a user-visible fashion. I changed my mind about this. The main reason was that the test environment, as it is, makes certain properties hard to express: a typical test-suite works with assertions (HUnit) and invariants (QuickCheck). In this environment, expressing ideas like 'the displayed patches are aesthetically pleasing' or 'the files in the repository have reasonable shape' is impractical at best. An alternative would have been to make myself a playground using the darcs library to expose the new code. But the fact is, our current codebase is entrenched in all kinds of legacy issues, like handling filenames and duplicated code. It makes the experimenter’s life harder than necessary, and it also involves rebuilding a whole lot of code that I never use, over and over. All in all, I made a somewhat bold decision to cut everything that lived under `Darcs.Patch` (plus a few dependencies, as few as possible) into a new library, which I named `patchlib`, in the best tradition of `cmdlib`, `pathlib` and `fslib`. At that point, I also removed custom file path handling from that portion of code, removed the use of a custom `Printer` (a pretty-printer implementation) module and a made few other incompatible changes."
The remaining work?
> "The obvious future work lies in the conflict handling. There are two main options in this regard: either re-engineer a patch-level, commute-based representation of conflicts (in the spirit of mergers and conflictors), as V3 'composite' patches, or alternatively, use a non-patch based mechanism for tracking conflicts and resolutions. It’s still somewhat early to decide which is a better choice, and they come with different trade-offs. Nevertheless, the decision, and the implementation, constitute a major step towards darcs 3. The other major piece of work that remains is the repository format: in this area, I have done some research in both the previous and this year’s project, but there are no definitive answers, even less an implementation. I think we now have a number of good ideas on how to approach this. We do need to sort out a few issues though, and the decision on the conflict layer also influences the shape of the repository.
> Each of these two open problems is probably about the size of an ambitious SoC project. On top of that, a lot of integration work needs to happen to actually make real use of the advancements. We shall see how much time and resources can be found for advancing this cause, but I am relatively optimistic: the primitive level has turned out fairly well, and to me it seems that shedding the shackles of legacy code sprawl can boost the project as a whole significantly forward."
As I wrote before, the Darcs team will disagree with my assessment, but I believe marking it 'Unsuccessful' is most consistent with how all previous SoCs have been judged[^IRC].
[^Mikhail]: 11 December 2011, [Google+](
> "Regarding the parallel cabal-install patches - Duncan is concerned that my changes are too invasive. I hope to get them merged in during the next few months after some reworking (we're currently discussing what needs to be done)."
[^IRC]: From [my conversation in `#darcs`]( with Eric Kow and other Darcs developers:
< kowey> mornfall [Petr Ročkai] and I did discuss the proposal beforehand... one
thing to clear up first of all is that this is very specifically about the primitive
patch level and not a wider patch theory project
< kowey> the difference being that it's easier to do in a SoC project
< owst> Also, mornfall has the advantage of being very experienced with the Darcs
code-base, and its concepts - he's not going to require time to "get used to it"
so I'd argue he's certainly not the average SoC student...
< kowey> I think mornfall has also put a good show of effort into thinking about
(A) building off previous thinking on the matter (see his proposal),
(B) fitting into the Darcs agenda -- particularly in aiming for this work to happen in mainline
with the help of recent refactors and also to result in some cleanups and
(C) making the project telescope
< gwern> owst: well, in a sense, that's a negative for the project as well as a positive implementation-wise
- SoCs are in part about bringing new people into communities
< kowey> by telescope I mean, have a sane ordering of can-do to would-be-awesome
< Heffalump> gwern: yeah, though the Haskell mentors didn't see it that way
< kowey> (the mental image being that you can collapse a telescope)
< gwern> owst: I didn't mention that because I'm trying to not be unrelentingly negative,
and because investigating backgrounds of everyone would require hours of work
< kowey> (sorry, I misread and see now that gwern did catch that this was primpatch specific)
< owst> gwern: in part, but not in full - they are ultimately also about "getting code written" for a project
and that's certainly going to happen for mornfall's project!
< gwern> owst: that's the same reason I don't also judge SoCs by whether the student continued on in the community
- because it'd be too damn much work
< owst> gwern: sure, I thought as much.
< gwern> owst: even though the student's future work would probably flip a number of projects from failure to success and vice-versa
(eg. what has Spencer Janssen been doing lately? how many of the SoC students you see on the page did that and have not
been heard from since like Mun of Frag?)
< gwern> so, I just judge on whether the code gets used a lot and whether it did something valuable
< kowey> it's a project that has long-term value for Darcs
< kowey> I think I agree with the last line of your prediction,
"That might be a success by the Darcs’s team’s lights, but not in the sense I have been using in this history."
< kowey> although I'm certainly hoping for something better in the middle bit: code that winds up in darcs mainline
plus specifications on the wiki
So, of the 7 SoCs in 2011:
- 4 were unsuccessful (3 possibly not)
- 3 were successful (1 possibly not)
My predictions were in general accurate; I remain hopeful that at least one of the Cabal SoCs will be merged in, which would give me a clean sweep and also render the final 2011 SoC record as good as the 2010 SoC record.
It troubles me that neither Cabal SoC has been merged in yet, in line with the historical trend for big Cabal SoC improvements to be partially done but never go into production. Duncan Coutts [says]( they are in the queue, but if neither gets merged in before the 2012 SoC starts, the lesson seems to be that Cabal is too dangerous and uncertain to waste SoCs on.
### Lessons learned
So, what lessons can we learn from the past years of SoCs? It seems to me like there are roughly 3 groups of explanations for failure. They are:
1. _Hubris_. GuiHaskell is probably a good example; it is essentially a bare-bones IDE, from its description. It is expecting a bit much of a single student in a single summer to write *that*!
2. _Unclear use_. HsJudy is my example here. There are already so many arrays and array types in Haskell! What does HsJudy bring to the table that justifies an FFI dependency? Who's going to use it? Pugs initially did, apparently, but perhaps that's just because it was there - when I looked at Pugs/HsJudy in 2007, certainly Pugs had no need of it. (The data parallel physics engine is probably another good example. Is it just a benchmark for the GHC developers? Is it intended for actual games? If the former, why is it a SoC project, and if the latter, isn't that a little hubristic?)
3. _Lack of propaganda_. One of the reasons Don Stewart's bytestring library is so great is his relentless evangelizing, which convinces people to actually take the effort to learn and use Bytestrings; eventually by network effects, the whole Haskell community is affected & improved[^academic]. Some of these SoC projects suffer from a distinct lack of community buy-in - who used HaskellNet? Who used Hat when it was updated? Indifference can be fatal, and can defeat the point of a project. What good is a library that no one uses? These aren't academic research projects which accomplish their task just by existing, after all. They're supposed to be useful to real Haskellers.
### Future SoC proposals
There are 2 major collections of ideas for future SoC projects, aside from the general frustrations expressed in the [annual survey](
- The [Haskell proposals]( [subreddit](, with ideas ranked by popularity
- The [Haskell Summer of Code]( [trac](!Wikipedia)
Let's look at the first 12 and see whether they're good ideas, bad ideas, or indifferent.
1. [port GHC to the ARM architecture]( It would be a good thing if we could easily compile our Haskell programs for ARM, which is used in many cellphones, but an even better idea would be [using the LLVM backend]( to [crosscompile](!Wikipedia). It would be somewhat tricky, but LLVM already has fairly solid [cross-compilation support](, and making GHC capable of using it seems like a reasonable project for a student to tackle.
2. ["Implement overlap and exhaustiveness checking for pattern matching"]( this seems both quite challenging and also a specialized use. I use [GADTs]( rarely, but I suspect that those writing GADT code rarely make overlap or omission errors.
3. [Incremental garbage collection]( this *may* be a good idea depending on how much of the code was already written. But I fear that this would go the way of the Immix GC SoC and would be a bad idea.
4. ["ThreadScope with custom probes"]( I don't understand the description and can't judge it.
5. ["A simple, sane, comprehensive Date/Time API"]( having puzzled over date-time libraries before, I'm all for this one! It's a well-defined problem, within the scope of a summer, and meets a need. Its only problem is that it doesn't sound sexy or cool.
6. ["Combine Threadscope with Heap Profiling Tools"]( Uncertain. Going by the [Arch download statistics](, Threadscope is downloaded more often than one would expect, so perhaps integration would be useful.
7. ["Haddock with embedded wiki feature, a la RWH, so we can collaborate on improving the documentation"]( This is a bad idea mostly because there are so many diverging ideas and possible implementations - it's just not clear what one would do. Is it some sort of Haddock server? A Gitit wiki with clever hooks? Some lightweight in-browser editor combined with Darcs?
8. ["HTTP Library Replacement"]( A good idea, assuming the linked attempts and alternate libraries haven't already solved the issue.
9. ["Using Type Inference to Highlight Code *Properly*"]( The difficult part is accessing the type information of an identifier inside a GHCi session - a problem probably already solved by [scion](!Hackage). Colorizing the display of a snippet is trivial. So this would make a bad SoC.
10. ["Transformation and Optimisation Tool"]( This initially sounds attractive, but previous refactoring tools have been ignored. The tools that have gotten uptake are things like GHC's `-Wall` (which warns about possible semantic issues) and [hlint](!Hackage) (which warns about style issues and redundancy with standard library functions) - not like Hera.
11. ["Webkit-based browser written in Haskell, similar in [plugin] architecture to Xmonad"]( This is probably the worst single idea in the whole bunch. A web browser these days is an entire operating system, but worse, one in which one must supply and maintain the userland as well; it is a thankless task that will not benefit the Haskell community (except incidentally through supporting libraries), nor a task it is uniquely equipped for. It is an infinite time sink - the only thing worse than this SoC failing would be it succeeding!
12. ["Add NVIDIA CUDA backend for Data Parallel Haskell"]( [DPH](!Hawiki "GHC/Data Parallel Haskell") is rarely used; a CUDA backend would be even more rarely utilized; [CUDA](!Wikipedia) has a reputation for being difficult to coax performance out of; and difficulties would likely be exacerbated by the usual Haskell issues with space usage & laziness. (DPH/CUDA use unboxed strict data, but there are interface issues with the rest of the boxed lazy Haskell universe.) All in all, there are probably better SoCs[^kamatsu].
[^kamatsu]: Liam O'Connor [begs to differ]( on the value of a DPH or CUDA SoC.
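For readers unsure what proposal 2 in the list above is asking for: "overlap" means a clause that can never be reached, and "exhaustiveness" means that every possible constructor is handled. The following is only a minimal sketch; the types `Colour` and `Expr` and the functions `describe`, `rank`, and `evalInt` are hypothetical names invented for illustration and come from no SoC proposal.

```haskell
{-# LANGUAGE GADTs #-}
{-# OPTIONS_GHC -Wall #-}
module Main where

-- Hypothetical example type, used only for illustration.
data Colour = Red | Green | Blue

-- Overlap: the wildcard in the second clause already matches Blue,
-- so the third clause can never be reached.
describe :: Colour -> String
describe Red  = "warm"
describe _    = "cool"
describe Blue = "cold"

-- Non-exhaustive: there is no clause for Blue, so `rank Blue`
-- would fail at run time with a pattern-match error.
rank :: Colour -> Int
rank Red   = 0
rank Green = 1

-- The GADT case: BoolE can never have type `Expr Int`, so the single
-- clause of evalInt covers every well-typed argument. A checker needs
-- type information to see that no clause is missing here.
data Expr a where
  IntE  :: Int  -> Expr Int
  BoolE :: Bool -> Expr Bool

evalInt :: Expr Int -> Int
evalInt (IntE n) = n

main :: IO ()
main = do
  putStrLn (describe Blue)       -- prints "cool": the "cold" clause is dead code
  print (map rank [Red, Green])  -- prints [0,1]; Blue is deliberately avoided
  print (evalInt (IntE 3))       -- prints 3
```

With warnings enabled, GHC should flag the unreachable clause in `describe` and the missing `Blue` case in `rank`. `evalInt` is the kind of function that makes the problem hard, and how precisely it is handled depends on the GHC version in use.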
# External links
It's difficult to quantify how 'useful' a package is; it's easier to punt and ask instead how 'popular' it is. There are a few different sources we can appeal to:
1. Package downloads:
i. Don Stewart provides, for [Arch Linux](!Wikipedia), [a status page]( which includes Arch download numbers
ii. The [Debian]( (and [Ubuntu]( Popularity Contest offers limited popularity data; eg. [xmonad](
iii. some 2006-2009 [Hackage statistics]( are available by [month]( & [ranking](; live Hackage statistics are the subject of an open [bug report]( which will be closed by Matthew Gruen's Hackage 2.0 (2010 SoC)
2. [Reverse dependencies]( can be examined several ways:
i. <>
ii. <>
iii. [cabal-query](!Hackage)
iv. [HackageOneFive](
3. Searching for mentions, blog posts, and unreleased packages elsewhere; key sites to search include:
i. [Haskell subreddit](
ii. [Github](
iii. [Google Code](
iv. the [Haskell wiki](
v. [Haskell mailing lists](
[^complaints]: I can hear the wankers in the peanut gallery - "Yeah, and it's been buggy ever since!" Hush you. ([Waern's reply](
[^academic]: Many good and worthwhile projects suffer this fate because of their academic origins. There's no reward for someone who creates a great technique or library and gets the wider community to adopt it as standard. As far as the Haskell community is concerned, one Don Stewart is worth more than a dozen Oleg Kiselyovs; [Oleg's work]( is mindblowingly awesome in both quantity and quality, everyone acknowledges, but how often does anyone actually *use* any of it?
([Iteratees]( may be the exception; although there are somewhere upwards of [5 implementations]( by Oleg and others, the original [iteratee](!Hackage) has picked up [4 reverse dependencies](, its most popular successor [33]( and iteratees in general may one day become as widely used as bytestrings.)