# Resources

---


# Related Projects

Projects implementing the scikit-learn estimator API are encouraged to use the  [scikit-learn-contrib template](https://github.com/scikit-learn-contrib/project-template)  which facilitates best practices for testing and documenting estimators. The  [scikit-learn-contrib GitHub organisation](https://github.com/scikit-learn-contrib/scikit-learn-contrib)  also accepts high-quality contributions of repositories conforming to this template.
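
As a rough illustration of the conventions such estimators follow, here is a minimal sketch of a custom classifier. The class `NearestCentroidSketch` and its `shrink` hyperparameter are hypothetical. The base classes and validation helpers (`BaseEstimator`, `ClassifierMixin`, `check_X_y`, `check_array`, `check_is_fitted`, `unique_labels`) are real scikit-learn utilities. The sketch shows the core rules: `__init__` only stores hyperparameters, `fit` returns `self`, and learned state gets a trailing underscore.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class NearestCentroidSketch(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: predict the class with the closest mean."""

    def __init__(self, shrink=0.0):
        # Only store hyperparameters here; validation belongs in fit().
        self.shrink = shrink

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Learned attributes end with an underscore.
        self.classes_ = unique_labels(y)
        self.n_features_in_ = X.shape[1]
        centroids = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Hypothetical 'shrink' pulls every centroid toward the global mean.
        self.centroids_ = (1.0 - self.shrink) * centroids + self.shrink * X.mean(axis=0)
        return self  # returning self allows clf.fit(X, y).predict(X)

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        # Squared Euclidean distance from each sample to each centroid.
        dists = ((X[:, np.newaxis, :] - self.centroids_[np.newaxis, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]
```

Because `BaseEstimator` derives `get_params` and `set_params` from the `__init__` signature, an estimator written this way can be cloned and tuned with tools such as `GridSearchCV` like a built-in one. scikit-learn's `sklearn.utils.estimator_checks.check_estimator` runs the full compatibility suite. This toy has not been validated against it.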

Below is a list of sister projects, extensions and domain-specific packages.

## Interoperability and framework enhancements

These tools adapt scikit-learn for use with other technologies or otherwise enhance the functionality of scikit-learn’s estimators.

**Data formats**

-   [Fast svmlight / libsvm file loader](https://github.com/mblondel/svmlight-loader)  Fast and memory-efficient svmlight / libsvm file loader for Python.
    
-   [sklearn_pandas](https://github.com/paulgb/sklearn-pandas/)  Bridge between scikit-learn pipelines and pandas DataFrames, with dedicated transformers.
    
-   [sklearn_xarray](https://github.com/phausamann/sklearn-xarray/)  provides compatibility of scikit-learn estimators with xarray data structures.
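
For reference, each line of an svmlight / libsvm file holds a label followed by sparse `index:value` pairs, optionally a `qid:` field, and optionally a `#` comment. The hypothetical `parse_svmlight` below is a minimal pure-Python sketch of that format. It assumes a single numeric label per line (no multilabel `1,3` labels). The dedicated loaders above, and scikit-learn's own `sklearn.datasets.load_svmlight_file`, are much faster and return a sparse matrix rather than dicts.

```python
def parse_svmlight_line(line):
    """Parse one svmlight/libsvm line into (label, {feature_index: value}).

    Returns None for blank or comment-only lines. Assumes one numeric label.
    """
    data = line.split("#", 1)[0].strip()  # drop trailing comments
    if not data:
        return None
    label, *pairs = data.split()
    features = {}
    for pair in pairs:
        key, value = pair.split(":", 1)
        if key == "qid":  # query id used by ranking datasets; ignored here
            continue
        features[int(key)] = float(value)  # indices kept as written (often 1-based)
    return float(label), features


def parse_svmlight(text):
    """Parse a whole svmlight/libsvm document into parallel label and row lists."""
    labels, rows = [], []
    for line in text.splitlines():
        parsed = parse_svmlight_line(line)
        if parsed is not None:
            labels.append(parsed[0])
            rows.append(parsed[1])
    return labels, rows
```

For example, `parse_svmlight("1 1:0.5 3:2\n")` returns `([1.0], [{1: 0.5, 3: 2.0}])`. Storing rows as dicts keeps the sketch sparse, like the real loaders, without depending on SciPy.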
    

**Auto-ML**

-   [auto-sklearn](https://github.com/automl/auto-sklearn/)  An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
    
-   [TPOT](https://github.com/rhiever/tpot)  An automated machine learning toolkit that optimizes a series of scikit-learn operators to design a machine learning pipeline, including data and feature preprocessors as well as the estimators. Works as a drop-in replacement for a scikit-learn estimator.
    

**Experimentation frameworks**

-   [REP](https://github.com/yandex/REP)  Environment for conducting data-driven research in a consistent and reproducible way.
    
-   [Scikit-Learn Laboratory](https://skll.readthedocs.io/en/latest/index.html)  A command-line wrapper around scikit-learn that makes it easy to run machine learning experiments with multiple learners and large feature sets.
    

**Model inspection and visualisation**

-   [dtreeviz](https://github.com/parrt/dtreeviz/)  A Python library for decision tree visualization and model interpretation.
    
-   [eli5](https://github.com/TeamHG-Memex/eli5/)  A library for debugging/inspecting machine learning models and explaining their predictions.
    
-   [mlxtend](https://github.com/rasbt/mlxtend)  Includes model visualization utilities.
    
-   [yellowbrick](https://github.com/DistrictDataLabs/yellowbrick)  A suite of custom matplotlib visualizers for scikit-learn estimators to support visual feature analysis, model selection, evaluation, and diagnostics.
    

**Model selection**

-   [scikit-optimize](https://scikit-optimize.github.io/)  A library to minimize (very) expensive and noisy black-box functions. It implements several methods for sequential model-based optimization, and includes a replacement for  `GridSearchCV`  or  `RandomizedSearchCV`  to do cross-validated parameter search using any of these strategies.
    
-   [sklearn-deap](https://github.com/rsteca/sklearn-deap)  Use evolutionary algorithms instead of grid search in scikit-learn.
    

**Model export for production**

-   [onnxmltools](https://github.com/onnx/onnxmltools)  Serializes many scikit-learn pipelines to  [ONNX](https://onnx.ai/)  for interchange and prediction.
    
-   [sklearn2pmml](https://github.com/jpmml/sklearn2pmml)  Serialization of a wide variety of scikit-learn estimators and transformers into PMML with the help of  [JPMML-SkLearn](https://github.com/jpmml/jpmml-sklearn)  library.
    
-   [sklearn-porter](https://github.com/nok/sklearn-porter)  Transpile trained scikit-learn models to C, Java, JavaScript and others.
    
-   [treelite](https://treelite.readthedocs.io/)  Compiles tree-based ensemble models into C code for minimizing prediction latency.
    

## Other estimators and tasks

Not everything belongs or is mature enough for the central scikit-learn project. The following are projects providing interfaces similar to scikit-learn for additional learning algorithms, infrastructures and tasks.

**Structured learning**

-   [tslearn](https://github.com/tslearn-team/tslearn)  A machine learning library for time series that offers tools for pre-processing and feature extraction as well as dedicated models for clustering, classification and regression.
    
-   [sktime](https://github.com/alan-turing-institute/sktime)  A scikit-learn compatible toolbox for machine learning with time series including time series classification/regression and (supervised/panel) forecasting.
    
-   [HMMLearn](https://github.com/hmmlearn/hmmlearn)  Implementation of hidden Markov models that was previously part of scikit-learn.
    
-   [PyStruct](https://pystruct.github.io/)  General conditional random fields and structured prediction.
    
-   [pomegranate](https://github.com/jmschrei/pomegranate)  Probabilistic modelling for Python, with an emphasis on hidden Markov models.
    
-   [sklearn-crfsuite](https://github.com/TeamHG-Memex/sklearn-crfsuite)  Linear-chain conditional random fields ([CRFsuite](http://www.chokkan.org/software/crfsuite/)  wrapper with sklearn-like API).
    

**Deep neural networks etc.**

-   [nolearn](https://github.com/dnouri/nolearn)  A number of wrappers and abstractions around existing neural network libraries.
    
-   [keras](https://github.com/fchollet/keras)  Deep Learning library capable of running on top of either TensorFlow or Theano.
    
-   [lasagne](https://github.com/Lasagne/Lasagne)  A lightweight library to build and train neural networks in Theano.
    
-   [skorch](https://github.com/dnouri/skorch)  A scikit-learn compatible neural network library that wraps PyTorch.
    

**Broad scope**

-   [mlxtend](https://github.com/rasbt/mlxtend)  Includes a number of additional estimators as well as model visualization utilities.
    

**Other regression and classification**

-   [xgboost](https://github.com/dmlc/xgboost)  Optimised gradient boosted decision tree library.
    
-   [ML-Ensemble](https://mlens.readthedocs.io/)  Generalized ensemble learning (stacking, blending, subsemble, deep ensembles, etc.).
    
-   [lightning](https://github.com/scikit-learn-contrib/lightning)  Fast state-of-the-art linear model solvers (SDCA, AdaGrad, SVRG, SAG, etc…).
    
-   [py-earth](https://github.com/scikit-learn-contrib/py-earth)  Multivariate adaptive regression splines.
    
-   [Kernel Regression](https://github.com/jmetzen/kernel_regression)  Implementation of Nadaraya-Watson kernel regression with automatic bandwidth selection.
    
-   [gplearn](https://github.com/trevorstephens/gplearn)  Genetic Programming for symbolic regression tasks.
    
-   [scikit-multilearn](https://github.com/scikit-multilearn/scikit-multilearn)  Multi-label classification with focus on label space manipulation.
    
-   [seglearn](https://github.com/dmbee/seglearn)  Time series and sequence learning using sliding window segmentation.
    
-   [libOPF](https://github.com/jppbsi/LibOPF)  Optimal path forest classifier.
    
-   [fastFM](https://github.com/ibayer/fastFM)  Fast factorization machine implementation compatible with scikit-learn
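
Nadaraya-Watson regression predicts a kernel-weighted average of the training targets, `ŷ(x) = Σ_i K((x − x_i)/h)·y_i / Σ_i K((x − x_i)/h)`, where `h` is the bandwidth. The sketch below is a one-dimensional pure-Python illustration that assumes a Gaussian kernel. It chooses `h` from a candidate grid by leave-one-out squared error. That selection rule is one common approach and is not necessarily the one the Kernel Regression package uses. All function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def nw_predict(x_train, y_train, x, h, skip=None):
    """Nadaraya-Watson estimate at x; 'skip' leaves one training point out."""
    num = den = 0.0
    for i, (xi, yi) in enumerate(zip(x_train, y_train)):
        if i == skip:
            continue
        w = gaussian_kernel((x - xi) / h)
        num += w * yi
        den += w
    return num / den if den > 0.0 else math.nan  # nan if every weight underflows


def select_bandwidth(x_train, y_train, candidates):
    """Return the candidate bandwidth with the lowest leave-one-out squared error."""
    def loo_error(h):
        total = 0.0
        for i in range(len(x_train)):
            pred = nw_predict(x_train, y_train, x_train[i], h, skip=i)
            if math.isnan(pred):
                return math.inf  # bandwidth too small to predict this point
            total += (y_train[i] - pred) ** 2
        return total
    return min(candidates, key=loo_error)
```

Leaving each point out matters: without it, the error of a very small bandwidth would approach zero because every point would predict itself.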
    

**Decomposition and clustering**

-   [lda](https://github.com/lda-project/lda/)  Fast implementation of latent Dirichlet allocation in Cython which uses  [Gibbs sampling](https://en.wikipedia.org/wiki/Gibbs_sampling)  to sample from the true posterior distribution. (scikit-learn’s  [`sklearn.decomposition.LatentDirichletAllocation`](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html#sklearn.decomposition.LatentDirichletAllocation "sklearn.decomposition.LatentDirichletAllocation")  implementation uses  [variational inference](https://en.wikipedia.org/wiki/Variational_Bayesian_methods)  to fit a tractable approximation of a topic model’s posterior distribution.)
    
-   [kmodes](https://github.com/nicodv/kmodes)  k-modes clustering algorithm for categorical data, and several of its variations.
    
-   [hdbscan](https://github.com/scikit-learn-contrib/hdbscan)  HDBSCAN and Robust Single Linkage clustering algorithms for robust variable density clustering.
    
-   [spherecluster](https://github.com/clara-labs/spherecluster)  Spherical K-means and mixture of von Mises Fisher clustering routines for data on the unit hypersphere.
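
To make the Gibbs-sampling approach mentioned for lda concrete, here is a minimal pure-Python sketch of collapsed Gibbs sampling for LDA. It is not the lda package's Cython code, and the function name and default hyperparameters are hypothetical. Each token's topic is resampled from its full conditional, which is proportional to `(n_dk + α)(n_kw + β) / (n_k + Vβ)`. Here the counts exclude the token being resampled, and `V` is the vocabulary size.

```python
import random


def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for latent Dirichlet allocation.

    docs: list of documents, each a list of word ids in range(n_words).
    Returns (doc_topic, topic_word) count tables from the final sweep.
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    doc_topic = [[0] * n_topics for _ in docs]      # n_dk
    topic_word = [[0] * n_words for _ in topics]    # n_kw
    topic_total = [0] * n_topics                    # n_k
    assignments = []                                # topic z of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token so the counts describe "all other tokens".
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised full conditional p(z_i = t | other topics, words).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word
```

Point estimates follow from the final counts. For example, the topic-word distribution is approximately `(n_kw + β) / (n_k + Vβ)`. In practice a burn-in period is discarded before reading these off.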
    

**Pre-processing**

-   [categorical-encoding](https://github.com/scikit-learn-contrib/categorical-encoding)  A library of sklearn-compatible categorical variable encoders.
    
-   [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn)  Various methods to under- and over-sample datasets.
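
The simplest of these resampling methods is random oversampling, which duplicates minority-class rows until the classes are balanced. The hypothetical `random_oversample` below is a plain-Python sketch of that idea only. For real work, use imbalanced-learn's own `RandomOverSampler`, or a more elaborate method such as SMOTE.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest one. Original rows come first."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(members)  # sample with replacement from this class
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out
```

Resample only the training split, for example inside each cross-validation fold. Oversampling before splitting lets duplicated rows leak into the evaluation data.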
    

## Statistical learning with Python

Other packages useful for data analysis and machine learning.

-   [Pandas](https://pandas.pydata.org/)  Tools for working with heterogeneous and columnar data, relational queries, time series and basic statistics.
    
-   [statsmodels](https://www.statsmodels.org/)  Estimating and analysing statistical models. More focused on statistical tests and less on prediction than scikit-learn.
    
-   [PyMC](https://pymc-devs.github.io/pymc/)  Bayesian statistical models and fitting algorithms.
    
-   [Sacred](https://github.com/IDSIA/Sacred)  Tool to help you configure, organize, log and reproduce experiments.
    
-   [Seaborn](https://stanford.edu/~mwaskom/software/seaborn/)  Visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
    

### Recommendation Engine packages

-   [implicit](https://github.com/benfred/implicit)  Library for implicit feedback datasets.
    
-   [lightfm](https://github.com/lyst/lightfm)  A Python/Cython implementation of a hybrid recommender system.
    
-   [OpenRec](https://github.com/ylongqi/openrec)  TensorFlow-based neural-network inspired recommendation algorithms.
    
-   [Spotlight](https://github.com/maciejkula/spotlight)  PyTorch-based implementation of deep recommender models.
    
-   [Surprise Lib](http://surpriselib.com/)  Library for explicit feedback datasets.
    

### Domain specific packages

-   [scikit-image](https://scikit-image.org/)  Image processing and computer vision in Python.
    
-   [Natural language toolkit (nltk)](https://www.nltk.org/)  Natural language processing and some machine learning.
    
-   [gensim](https://radimrehurek.com/gensim/)  A library for topic modelling, document indexing and similarity retrieval.
    
-   [NiLearn](https://nilearn.github.io/)  Machine learning for neuro-imaging.
    
-   [AstroML](https://www.astroml.org/)  Machine learning for astronomy.
    
-   [MSMBuilder](http://msmbuilder.org/)  Machine learning for protein conformational dynamics time series.



Tutorials
--

-   [Machine Learning for NeuroImaging in Python](https://nilearn.github.io/)
    
-   [Machine Learning for Astronomical Data Analysis](https://github.com/astroML/sklearn_tutorial)

---


## Videos

-   An introduction to scikit-learn  [Part I](https://conference.scipy.org/scipy2013/tutorial_detail.php?id=107)  and  [Part II](https://conference.scipy.org/scipy2013/tutorial_detail.php?id=111)  at SciPy 2013 by  [Gael Varoquaux](http://gael-varoquaux.info/),  [Jake Vanderplas](https://staff.washington.edu/jakevdp)  and  [Olivier Grisel](https://twitter.com/ogrisel). Notebooks on  [github](https://github.com/jakevdp/sklearn_scipy2013).
    
-   [Introduction to scikit-learn](http://videolectures.net/icml2010_varaquaux_scik/)  by  [Gael Varoquaux](http://gael-varoquaux.info/)  at ICML 2010
    
    > A three-minute video from a very early stage of scikit-learn, explaining the basic idea and approach we are following.
    
-   [Introduction to statistical learning with scikit-learn](https://archive.org/search.php?query=scikit-learn)  by  [Gael Varoquaux](http://gael-varoquaux.info/)  at SciPy 2011
    
    > An extensive tutorial, consisting of four one-hour sessions. The tutorial covers the basics of machine learning, many algorithms and how to apply them using scikit-learn. The corresponding material is now in the scikit-learn documentation section  [A tutorial on statistical-learning for scientific data processing](https://scikit-learn.org/stable/tutorial/statistical_inference/index.html#stat-learn-tut-index).
    
-   [Statistical Learning for Text Classification with scikit-learn and NLTK](https://pyvideo.org/video/417/pycon-2011--statistical-machine-learning-for-text)  (and  [slides](https://www.slideshare.net/ogrisel/statistical-machine-learning-for-text-classification-with-scikitlearn-and-nltk)) by  [Olivier Grisel](https://twitter.com/ogrisel)  at PyCon 2011
    
    > Thirty-minute introduction to text classification. Explains how to use NLTK and scikit-learn to solve real-world text classification tasks and compares against cloud-based solutions.
    
-   [Introduction to Interactive Predictive Analytics in Python with scikit-learn](https://www.youtube.com/watch?v=Zd5dfooZWG4)  by  [Olivier Grisel](https://twitter.com/ogrisel)  at PyCon 2012
    
    > A three-hour introduction to prediction tasks using scikit-learn.
    
-   [scikit-learn - Machine Learning in Python](https://newcircle.com/s/post/1152/scikit-learn_machine_learning_in_python)  by  [Jake Vanderplas](https://staff.washington.edu/jakevdp)  at the 2012 PyData workshop at Google
    
    > Interactive demonstration of some scikit-learn features. 75 minutes.
    
-   [scikit-learn tutorial](https://www.youtube.com/watch?v=cHZONQ2-x7I)  by  [Jake Vanderplas](https://staff.washington.edu/jakevdp)  at PyData NYC 2012
    
    > Presentation using the online tutorial, 45 minutes.

---

Challenges
--

If you are comfortable with the basics of data science in Python, taking part in challenges is a great way to put those skills into practice.

- [Kaggle Competitions](https://www.kaggle.com/competitions)

- [Iron Viz](https://tableau.com/iron-viz)

- [DrivenData](https://drivendata.org/competitions)

- [CodaLab](https://competitions.codalab.org)

- [ODS AI](https://ods.ai/competitions)

- [KDnuggets](https://www.kdnuggets.com/competitions/index.html)
