Scikit-learn Paris 2013 sprint funding

This document serves to organize and motivate funding for the scikit-learn international sprint that will be held in Paris, July 22-28, 2013. In short, scikit-learn is an easy-to-use, general-purpose machine learning toolbox written in Python. It provides state-of-the-art implementations of many well-known machine learning algorithms, while maintaining an easy-to-use interface tightly integrated with the Python language. Development of the toolbox only started at the beginning of 2010, but within a few years the project has grown into a mature, worldwide open-source project. More than 50 developers have made non-trivial contributions, 28 have contributed in the last 4 months, and more than 44000 lines of code have been written.

International sprints are crucial in keeping scikit-learn a state-of-the-art, well-maintained, dynamic project. The last international sprint (Granada, Spain, 2011) had a striking effect on the quality of the code [1] and helped us integrate new developers (Andy Mueller, now release manager, Jake Vanderplas, Gilles Louppe...). The next international sprint will take place in Paris, from the 22nd to the 28th of July [2]: your sponsorship helps keep the sprint affordable for young and talented contributors. If you are interested in funding, please contact us at any time.


1   Scikit-learn: machine learning in Python

1.1   Project use cases and benefits to the Python community

What: The scikit-learn is a BSD-licensed project for machine learning using Python and SciPy. Its goal is to find a good trade-off between ease of use for non-specialists and computational performance.

Why: In the modern data-intensive computing landscape, machine learning algorithms underpin decision-making and data mining strategies. Python is increasingly used with big data, for instance by the web industries. It shines in these settings because it is a general-purpose language with, e.g., a wealth of web frameworks and connectors to many data sources such as relational databases. In parallel, Python enjoys growing success in the scientific computing communities. As a result, it benefits from excellent numerical computing libraries. For these reasons, the Python community has both the need and the tools for an easy-to-use yet state-of-the-art machine learning library such as the scikit-learn.

Ease of use: The scikit-learn tries to bridge the gap between academic research and general-purpose programming by being accessible to non-specialists. This goal motivates the following choices:

  • A strong focus on documentation and examples: http://scikit-learn.org
  • Ease of installation by keeping the dependencies low (only numpy and scipy) and limiting the use of compiled code
  • Simple API avoiding the use of machine-learning jargon

Performance: Computational performance is achieved by using recent algorithmic results from academic machine learning research and careful profiling-based optimization. The scikit-learn is consistently amongst the best-performing machine learning toolkits on mid-sized datasets [2].

[1]Automated Performance benchmarks: http://scikit-learn.github.io/scikit-learn-speed/
[2]Comparative Performance benchmarks: http://scikit-learn.github.com/ml-benchmarks/

1.2   Community-driven development

Community: The strength of the scikit-learn project has been its community. As of Sept 2011, the project has had at least 52 different contributors in total [3], and 20 contributors have made more than 10 commits during the 0.9 development period. Each new feature is developed collaboratively using pull requests and code review [4]. The level of collaborative work in the project is, in our experience, exceptional: feature additions are improved through an iterative effort involving many contributors. The project is managed under an open-governance model: features are contributed without any explicit solicitation.

[3]Contributor list: https://github.com/scikit-learn/scikit-learn/contributors
[4]Pull requests: https://github.com/scikit-learn/scikit-learn/pulls

Contributors: Contributors come from around the world. Historically, a kernel of developers was located in France, but the project quickly outgrew it. Most developers are located in Europe, followed by North America, but we also receive strong contributions from Asia and South America. Most of the developers are graduate students or young academics, although some work in startups. In summer 2011, the project benefited from a Google Summer of Code, which was hugely successful.

Sprints: Up to Sept 2011, 9 sprints have been organized [5], mostly located in Paris with remote contributors working over the Internet. The sprints open the door to integrating new contributors, as they improve communication and lower the barrier to entry to the project.

[5]Past sprints: https://github.com/scikit-learn/scikit-learn/wiki/Past-sprints

2   Paris sprint organization

Goals and technical topics of the sprint are discussed on the sprint planning wiki page [6].

[6]Sprint planning wiki: https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events

Local organization: the organization is handled by four active members of the French Python community: Nelle Varoquaux (president of the AFPy), Olivier Grisel (vice secretary of the AFPy), Gaël Varoquaux (chair of Scipy'08, Scipy'09, EuroScipy'10, EuroScipy'11) and Alexandre Gramfort (host of Software Carpentry 2011 and of the Scikit-learn sprint). It also has the support of two research teams: Parietal (INRIA) and Telecom Paris.

Budget: given the local support and investments from various academic institutions, the budget requirements for the sprint are fairly low. However, some contributors work on the project without support from their day job. In addition, some very active core contributors live far from Europe and need a significant travel budget.

Currently, the following trips need financing (in order of priority):

From         Estimated cost
Beijing      €1500
Liège        €200
Sydney       €1800
Utrecht      €200
Bucharest    €300
Vienna       €250
Bonn         €250
Surrey       €300

Thus, to finance all the trips, a total of €4800 is needed. In addition, accommodation will cost around €1500.
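
As a quick check of the arithmetic, the travel estimates can be summed in a few lines of Python. This is only an illustrative sketch: the figures are copied from the list of trips, the accommodation amount is the approximate figure quoted in the text, and `trips_covered` is a hypothetical helper that assumes trips are funded strictly in the listed priority order.

```python
# Estimated travel costs in euros, in the priority order given in the document.
TRAVEL_COSTS_EUR = [
    ("Beijing", 1500),
    ("Liège", 200),
    ("Sydney", 1800),
    ("Utrecht", 200),
    ("Bucharest", 300),
    ("Vienna", 250),
    ("Bonn", 250),
    ("Surrey", 300),
]
# The text gives this figure as "around" 1500, so treat it as approximate.
ACCOMMODATION_EUR = 1500


def travel_total(costs=TRAVEL_COSTS_EUR):
    """Sum the estimated cost of every trip."""
    return sum(cost for _, cost in costs)


def trips_covered(amount, costs=TRAVEL_COSTS_EUR):
    """Hypothetical helper: trips a given amount covers, in priority order.

    Assumes trips are funded strictly in the listed order and that funding
    stops at the first trip the remaining amount cannot cover.
    """
    covered = []
    for origin, cost in costs:
        if cost > amount:
            break
        covered.append(origin)
        amount -= cost
    return covered


if __name__ == "__main__":
    print("Travel total (EUR):", travel_total())
    print("Travel plus accommodation (EUR):", travel_total() + ACCOMMODATION_EUR)
    print("Trips covered by 2000 EUR:", trips_covered(2000))
```

The strict-priority rule is only one possible reading of "in order of priority"; the organizers may well allocate funds differently.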

3   Funding contact information

If you are interested in helping to fund this sprint, we would greatly appreciate any support for the flights or accommodation of our contributors, or for the venue of our sprint. Any amount above €200 (the lowest travel cost) is very welcome.

We will display all the sponsors' logos on the following page of the official project site, along with links to their respective sites.

If you would like to contribute by sponsoring, please contact Nelle Varoquaux at nelle dot varoquaux at afpy dot org.