Past sprints

GaelVaroquaux edited this page Mar 26, 2011 · 24 revisions

Paris coding Sprint, 8-9 Sept. 2010

Place:

INRIA research center in Saclay-Ile de France, also in channel #scikit-learn, on irc.freenode.org. Room to be determined.

Some ideas:

  • extend the tutorial with feature selection, cross-validation, etc.
  • design a sphinx template for the main web page. [here http://www.flickr.com/photos/fseoane/4573612893/] is a tentative design, but it has not been translated into a sphinx template.
  • Group lasso with coordinate descent in GLM module
  • Covariance estimators (Ledoit-Wolf) -> Regularized LDA
  • Add transform in LDA
  • PCA with fit + transform
  • preprocessing routines (center, standardize) with fit transform
  • K-means with Pybrain heuristic
  • Make Pipeline object work for real
  • FastICA

Anything you can think of, such as:

  • Spectral Clustering + manifold learning (MDS/PCA, Isomap, Diffusion maps, tSNE)
  • Canonical Correlation Analysis
  • Kernel PCA
  • Gaussian Process regression
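
Several of the ideas above (PCA, preprocessing, LDA) ask for the same fit + transform pattern: learn parameters from training data once, then apply them to any data. The following is only a minimal sketch of that pattern for the "center, standardize" case, written in plain Python so it runs without dependencies. The class name `Standardizer` and the attributes `mean_` and `std_` are hypothetical names chosen for illustration; they are not taken from the scikits.learn code base.

```python
from math import sqrt


class Standardizer:
    """Hypothetical preprocessing step: center each column, scale to unit variance."""

    def fit(self, X):
        # Learn per-column mean and standard deviation from the training data.
        n = len(X)
        columns = list(zip(*X))
        self.mean_ = [sum(col) / n for col in columns]
        self.std_ = [
            # Fall back to 1.0 for constant columns to avoid dividing by zero.
            sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, self.mean_)
        ]
        return self

    def transform(self, X):
        # Apply the parameters learned in fit() to any data set.
        return [
            [(v - m) / s for v, m, s in zip(row, self.mean_, self.std_)]
            for row in X
        ]

    def fit_transform(self, X):
        # Convenience shortcut: fit on X, then transform the same X.
        return self.fit(X).transform(X)


X_train = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
scaler = Standardizer().fit(X_train)
X_scaled = scaler.transform(X_train)
# New data is scaled with the statistics of the training data.
X_new = scaler.transform([[4.0, 10.0]])
```

Keeping `fit` and `transform` separate is what would let such a step be chained in a Pipeline-like object: the statistics come from the training data only and are reused unchanged on new data.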

0.4 Coding Sprint, 16 & 17 June 2010

Place:

channel #scikit-learn, on irc.freenode.org. If you do not have an IRC client or are behind a firewall, check out http://webchat.freenode.net/

Some ideas:

  • adapt the plotting features from the em module into the gmm module.
  • incorporate more datasets: the diabetes dataset from the lars R package, featured datasets from http://archive.ics.uci.edu/ml/datasets.html , etc.
  • anything from the issue tracker.
  • extend the tutorial with feature selection, cross-validation, etc.
  • profile and improve the performance of the gmm module.
  • submit a new classifier
  • refactor the ann module (artificial neural networks) to conform to the API in the rest of the modules, or submit a new ann module.
  • make it compatible with python3 (shouldn't be hard now that there's a numpy python3 release)
  • design a sphinx template for the main web page. [here http://www.flickr.com/photos/fseoane/4573612893/] is a tentative design, but it has not been translated into a sphinx template.
  • anything you can think of.

= Documentation Week, 14-18 March 2010 =

Place:

channel #learn, on irc.freenode.org. If you do not have an IRC client or are behind a firewall, check out http://webchat.freenode.net/

Possible Tasks:

  • Document our design choices (methods in each class, convention for estimated parameters, etc.). Most of this is in ApiDiscussion.
  • Documentation for neural networks (nonexistent)
  • Examples. We currently only have a few of them. Expand and integrate them into the web page.
  • Write a Tutorial.
  • Write a FAQ.
  • Documentation and Examples for Support Vector Machines. What's on the web is totally outdated. Integrate the documentation from gumpy, see ticket:27 (assigned: Fabian Pedregosa)
  • Review documentation.
  • Customize the sphinx generated html.
  • Create some cool images/logos for the web page.
  • Create some benchmark plots.

= Code sprint in Paris, 3 March 2010 =

Finished; see http://fseoane.net/blog/2010/scikitslearn-coding-spring-in-paris/

== Participants ==

  • Alexandre Gramfort
  • Olivier Grisel
  • Vincent Michel
  • Fabian Pedregosa
  • Bertrand Thirion
  • Gaël Varoquaux

== Goals ==

Implement a few targeted functionalities for penalized regressions.

== Target functionalities ==

  1. GLMnet
  2. Bayesian Regression (Ridge, ARD)
  3. Univariate feature selection function

Edouard: Most of the things we need are already in datamind; the main issue is to cut the dependence on FFF (nipy).

Extras, if time permits:

  1. LARS

== Proposed workflow ==

Pair programming:

  1. GLMNet (AG, OG)
  2. Bayesian regression (FP, VM)
  3. Feature selection (BT, GV)
  4. LARS: Whoever is finished first.

== Place in the repository ==

  1. I think GLMNet goes well in scikits.learn.glm.

Edouard: The GLM term is confusing. In GLMNet the G means "generalized", but in neuroimaging people understand it as "general", which is in fact a plain linear model.

  2. Bayesian regression: scikits.learn.bayes. It's short and explicit.

Edouard: Again the term Bayes might not lead to a clear organization of algorithms.

  3. Feature selection: featsel? selection? I'm not sure about this one.

AG: maybe univ?

Edouard: Maybe it is too early to decide the structure of the repository during your coding sprint. I think this organization should follow the discussions we had with Fabian, Gael and Bertrand. Below I tried to synthesize those discussions; however, it's just a proposal and many things are missing:

If there's code that we want to share and it does not fit into any of these schemes, it's ok to put it into sandbox/ (it does not yet exist)
