
Google Summer of Code (GSoC) 2015

B@rmaley.exe edited this page Mar 5, 2015 · 35 revisions

This is the page for coordination of the GSoC for scikit-learn.

Scikit-learn is a machine learning module in Python. See for more details.

Scikit-learn is taking part in GSoC through the Python Software Foundation:

Instructions to students: writing a good proposal

Difficulty: Scikit-learn is a technical project. Contributing via GSoC requires considerable expertise in Python coding as well as in numerical and machine learning algorithms.

Important: Read: Expectations for prospective students

Application template: Please follow this template.

Also important: A letter from Gaël to former applicants. His suggestions are just as relevant this year.

Hi folks,

The deadline for applications is nearing. I'd like to stress that scikit-learn will only be accepting high-quality applications: it is a challenging, though rewarding, project to work with. To maximize the quality of your application, here are a few pieces of advice:

  1. First discuss a pre-proposal on the mailing list. Make sure that both the scikit-learn team and you are enthusiastic about the idea. Try to have one or two possible mentors who hold a dialog with you.

  2. Satisfy the PSF requirements. Briefly:

    • Demonstrate to your prospective mentor(s) that you are able to complete the project you've proposed
    • Blog for your GSoC project.
    • Contribute at least one patch to the project

I'd add that the patch should be somewhat substantial, not just fixing typos.

To contribute a patch, please have a look at the contribution guide and the Easy issues in the tracker.

  3. In parallel with 2, start an online document (Google Doc, for instance) to elaborate your final proposal; if you manage to convince mentors, you can get feedback on it.

As a final note, I want to stress that GSoC projects are ambitious: we are talking about a few months of full-time work. Thus the ideas proposed are challenging, and the students are expected to draw up a battle plan, with difficult variants and less difficult variants. A GSoC project is a major set of contributions, not a single pull request.

Good luck, I am looking forward to seeing the proposals. You'll see, the scikit is a big, friendly and enthusiastic community.


A list of topics for Google Summer of Code (GSoC) 2015

Disclaimer: This list of topics is currently being updated from last year's, and some information (like the names of possible mentors) is not definitive. Please e-mail the list with any questions.

Online Low Rank Matrix Completion

Possible mentors: Olivier Grisel, Vlad Niculae, Peter Prettenhofer (backup)

Possible candidate:

Goal: Implement online or minibatch SGD (or a similar method) that minimizes a squared l2 reconstruction loss plus a low-rank penalty (nuclear norm) on a scipy.sparse matrix. The implicit entries of the sparse input representation would be interpreted by the algorithm as missing values rather than as zeros.
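To make the goal above more concrete, here is a minimal sketch, not a proposed design. It assumes a factorized model X ≈ U Vᵀ and replaces the nuclear norm with a squared Frobenius penalty on the factors, a commonly used surrogate; the penalty is applied per observed entry, which is a simplification. The function name `sgd_complete` and all parameter names are hypothetical. The input is given as COO triplets, as exposed by the `.row`, `.col` and `.data` attributes of a `scipy.sparse.coo_matrix`, and any entry not listed is treated as missing.

```python
import numpy as np


def sgd_complete(rows, cols, vals, shape, rank=2, lr=0.05, reg=0.01,
                 n_epochs=200, seed=0):
    """Hypothetical helper: SGD on squared error over observed entries only.

    Entries absent from the (rows, cols, vals) triplets are never visited,
    so they act as missing values rather than as zeros.
    """
    rng = np.random.RandomState(seed)
    U = 0.1 * rng.randn(shape[0], rank)
    V = 0.1 * rng.randn(shape[1], rank)
    for _ in range(n_epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for k in rng.permutation(len(vals)):
            i, j = rows[k], cols[k]
            err = U[i] @ V[j] - vals[k]
            # Gradients of 0.5 * err**2 + 0.5 * reg * (|U[i]|^2 + |V[j]|^2),
            # both computed before either factor is updated.
            grad_u = err * V[j] + reg * U[i]
            grad_v = err * U[i] + reg * V[j]
            U[i] -= lr * grad_u
            V[j] -= lr * grad_v
    return U, V


# Toy rank-1 matrix with two entries hidden from the solver.
u = np.array([1.0, 0.5, 1.5, 1.0])
v = np.array([1.0, 2.0, 0.5])
M = np.outer(u, v)
observed = np.ones(M.shape, dtype=bool)
observed[0, 1] = observed[2, 2] = False
rows, cols = np.nonzero(observed)
vals = M[rows, cols]

U, V = sgd_complete(rows, cols, vals, M.shape)
train_rmse = np.sqrt(np.mean(((U @ V.T)[rows, cols] - vals) ** 2))
```

The hidden entries can then be read off `U @ V.T`. A real contribution would also need minibatching, a proper treatment of the nuclear norm, and a scikit-learn compatible estimator interface.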

Application: Build a scalable recommender system example, e.g. on the movielens dataset.

TODO: find references in the literature. Matrix Factorization Jungle

Improve GMM

Possible mentors: Gael Varoquaux, Vlad Niculae, Andreas Mueller

Possible candidate: ???

  • Refurbish the current GMM code to bring it up to scikit-learn's standards
  • Implement a core-set strategy for GMM

Issue to get started:

Generalized Additive Models (GAMs)

Possible mentors: Paolo Losi, Alex Gramfort, (others?)

Possible candidate:

Goal: Add the additive (or additive_model) directory and implement a few additive models.

  • Help finish the PR by jcrudy on including pyearth in scikit-learn
  • Add Generalized Additive Model (GAM)
  • Add SpAM (Sparse Additive Model)
  • Add GAMLSS (GAM for Location, Scale and Shape)
  • Add LISO (LASSO ISOtone for High-Dimensional Additive Isotonic Regression)


Metric learning

Possible mentor: ???

Possible candidate: Barmaley-exe, ???

Goal: Add some metric learning algorithms (such as NCA, ITML, LMNN) that can be used with KNNs and as transformers. Brian Kulis has a survey and a tutorial on metric learning that seem to be a good place to start.
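As a rough illustration of what "used as a transformer" could mean, the sketch below learns a linear map so that plain Euclidean distance on the transformed data equals a Mahalanobis distance on the original data. It is deliberately not NCA, ITML, or LMNN: the class `WithinClassWhitening` is a hypothetical stand-in that simply whitens the pooled within-class covariance, chosen only because it fits in a few lines. A real contribution would presumably inherit from `sklearn.base.BaseEstimator` and `TransformerMixin`; that is omitted here to keep the example NumPy-only.

```python
import numpy as np


class WithinClassWhitening:
    """Hypothetical stand-in for a metric learner (NOT NCA, ITML, or LMNN)."""

    def __init__(self, reg=1e-6):
        self.reg = reg  # small ridge term keeping the covariance invertible

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Center every sample on its own class mean, then pool the covariance.
        centered = np.vstack(
            [X[y == c] - X[y == c].mean(axis=0) for c in np.unique(y)]
        )
        cov = centered.T @ centered / len(X) + self.reg * np.eye(X.shape[1])
        # components_ = cov^(-1/2), built from the symmetric eigendecomposition.
        eigval, eigvec = np.linalg.eigh(cov)
        self.components_ = (eigvec / np.sqrt(eigval)) @ eigvec.T
        return self

    def transform(self, X):
        # Euclidean distance between rows of the output is the learned metric.
        return np.asarray(X, dtype=float) @ self.components_.T


# Toy data: the first feature is pure high-variance noise, the second
# carries the class signal.
rng = np.random.RandomState(0)
y = np.repeat([0, 1], 100)
X = rng.randn(200, 2) * [10.0, 0.5]
X[:, 1] += 2.0 * y

metric = WithinClassWhitening().fit(X, y)
Z = metric.transform(X)  # feed Z to any nearest-neighbors method
```

Splitting the learner into `fit` and `transform` is what would let it sit in front of a KNN classifier in a pipeline, which matches the "with KNNs and as transformers" wording above.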

Possible mentors

Here are the people who have said that they might be available for mentoring:

Gaël Varoquaux, Vlad Niculae, Olivier Grisel, Andreas Mueller, Alexandre Gramfort, Arnaud Joly, Michael Eickenberg.
