update ml

sdpython · Oct 8, 2016 · 346a211
1 parent d0809d8
commit 346a211
Showing 4 changed files with 49 additions and 2 deletions.
diff --git a/_doc/sphinxdoc/source/index.rst b/_doc/sphinxdoc/source/index.rst
@@ -79,7 +79,7 @@ Course contents
     :maxdepth: 1
 
     1. Algorithms and programming <td_1a>
-    2. Python for a Data Scientist <td_2a>
+    2. Python for a Data Scientist / Economist <td_2a>
     3. Software tools for processing massive data <td_3a>
     4. Computer science projects <projet_info>
     5. Exams <i_exams>

diff --git a/_doc/sphinxdoc/source/questions/images/xgboost.png b/_doc/sphinxdoc/source/questions/images/xgboost.png
diff --git a/_doc/sphinxdoc/source/questions/some_ml.rst b/_doc/sphinxdoc/source/questions/some_ml.rst
@@ -0,0 +1,46 @@
+
+Machine Learning
+================
+
+The reference that everyone recommends:
+`The Elements of Statistical Learning <http://statweb.stanford.edu/~tibs/ElemStatLearn/>`_, Trevor Hastie, Robert Tibshirani, Jerome Friedman
+
+Model or features?
+++++++++++++++++++++
+
+As a rule of thumb, about 90% of the time goes into creating new features and
+the remaining 10% into tuning the model's parameters:
+:ref:`Working on features or changing the model <mlfeaturesmodelrst>`.
+
+XGBoost
++++++++
+
+`XGBoost <https://github.com/dmlc/xgboost>`_
+is a machine learning library known for having won many
+`competitions <https://github.com/dmlc/xgboost/blob/master/demo/README.md#machine-learning-challenge-winning-solutions>`_.
+Excerpt from `XGBoost: A Scalable Tree Boosting System <https://arxiv.org/pdf/1603.02754.pdf>`_:
+
+.. image:: images/xgboost.png
+    :height: 400
+
+
+Several improvements were implemented to make training fast
+and able to handle large volumes of data:
+
+* *exact greedy:* standard split-finding algorithm, which enumerates every possible split
+* *approximate global:* each node is a threshold on one feature, chosen among candidate
+  quantiles rather than among all possible values; the candidates are fixed for a whole tree
+* *approximate local:* same idea, but the candidate quantiles are recomputed for each node
+* *out-of-core:* data that does not fit in memory is kept on disk in blocks, compressed
+  by column to reduce its footprint
+* *sparsity aware:* missing values are handled explicitly rather than treated like ordinary
+  values; each tree node has a default direction for them
+* *parallel:* some of the computations are parallelized
+
+Interpretability
+++++++++++++++++
+
+* `Making Tree Ensembles Interpretable <https://arxiv.org/pdf/1606.05390v1.pdf>`_
+* `Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife <http://jmlr.csail.mit.edu/papers/volume15/wager14a/wager14a.pdf>`_
+* `Random Rotation Ensembles <http://www.jmlr.org/papers/volume17/blaser16a/blaser16a.pdf>`_
+* `Understanding variable importances in forests of randomized trees <http://papers.nips.cc/paper/4928-understanding-variable-importances-in-forests-of-randomized-trees.pdf>`_
diff --git a/_doc/sphinxdoc/source/td_2a.rst b/_doc/sphinxdoc/source/td_2a.rst
@@ -185,15 +185,16 @@ Some excerpts:
 
     notebooks/_gs2a_statdes
     notebooks/_gs2a_ml
+    questions/some_ml
 
 
 *Readings*
 
 * :ref:`Working on features or changing the model <mlfeaturesmodelrst>`
 * :ref:`Getting a machine learning project off to a good start <l-debutermlprojet>`
 * :ref:`question_projet_2014`
-* `Making Tree Ensembles Interpretable <https://arxiv.org/pdf/1606.05390v1.pdf>`_
 * `XGBoost: A Scalable Tree Boosting System <https://arxiv.org/pdf/1603.02754.pdf>`_
+* `Making Tree Ensembles Interpretable <https://arxiv.org/pdf/1606.05390v1.pdf>`_
 * `Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife <http://jmlr.csail.mit.edu/papers/volume15/wager14a/wager14a.pdf>`_
 * `Random Rotation Ensembles <http://www.jmlr.org/papers/volume17/blaser16a/blaser16a.pdf>`_
 * `Understanding variable importances in forests of randomized trees <http://papers.nips.cc/paper/4928-understanding-variable-importances-in-forests-of-randomized-trees.pdf>`_
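
A note on the XGBoost bullet list added in some_ml.rst: the difference between exact greedy and approximate split finding, and the default direction for missing values, can be illustrated with a minimal sketch. This is not XGBoost's code or API. The function names (`split_gain`, `best_split`, `exact_candidates`, `quantile_candidates`) and the toy data are hypothetical, the gain follows the regularized objective of the paper without its gamma penalty, and plain unweighted quantiles stand in for the paper's weighted quantile sketch. Because it looks at a single node and a single feature, it does not distinguish the global and local variants.

```python
from typing import List, Optional, Sequence, Tuple


def split_gain(g_left: float, h_left: float, g_right: float, h_right: float,
               lam: float = 1.0) -> float:
    """Gain of a split from summed gradients (g) and hessians (h) on each side."""
    def score(g: float, h: float) -> float:
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))


def best_split(values: Sequence[Optional[float]], grads: Sequence[float],
               hess: Sequence[float], candidates: Sequence[float],
               lam: float = 1.0) -> Tuple[float, Optional[float], bool]:
    """Return (gain, threshold, missing_goes_left) over the candidate thresholds.

    Rows whose value is None are missing: both default directions are tried.
    """
    present = [(x, g, h) for x, g, h in zip(values, grads, hess) if x is not None]
    g_miss = sum(g for x, g in zip(values, grads) if x is None)
    h_miss = sum(h for x, h in zip(values, hess) if x is None)
    best: Tuple[float, Optional[float], bool] = (0.0, None, True)
    for t in candidates:
        gl = sum(g for x, g, _ in present if x < t)
        hl = sum(h for x, _, h in present if x < t)
        gr = sum(g for x, g, _ in present if x >= t)
        hr = sum(h for x, _, h in present if x >= t)
        for missing_left in (True, False):
            if missing_left:
                gain = split_gain(gl + g_miss, hl + h_miss, gr, hr, lam)
            else:
                gain = split_gain(gl, hl, gr + g_miss, hr + h_miss, lam)
            if gain > best[0]:
                best = (gain, t, missing_left)
    return best


def exact_candidates(values: Sequence[Optional[float]]) -> List[float]:
    """Exact greedy: every midpoint between consecutive distinct observed values."""
    xs = sorted({x for x in values if x is not None})
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def quantile_candidates(values: Sequence[Optional[float]], n_bins: int) -> List[float]:
    """Approximate: a few unweighted quantiles of the observed values (assumption)."""
    xs = sorted(x for x in values if x is not None)
    return sorted({xs[(len(xs) * k) // n_bins] for k in range(1, n_bins)})


# Toy data: one feature with two missing entries, squared-error loss,
# initial prediction 0, hence gradient = prediction - target and hessian = 1.
values = [1.0, 2.0, 3.0, None, 10.0, 11.0, 12.0, None]
targets = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0]
grads = [0.0 - y for y in targets]
hess = [1.0] * len(values)

exact = best_split(values, grads, hess, exact_candidates(values))
approx = best_split(values, grads, hess, quantile_candidates(values, n_bins=3))
print("exact :", exact)
print("approx:", approx)
```

On this toy data the exact search finds the threshold 6.5 that separates the two clusters, while three quantile bins only propose 3.0 and 11.0 and settle for a lower gain. In both cases the missing rows are sent to the right, since their targets resemble those of the large values.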