An Ubuntu VM with Jupyter notebooks and the Scikit and related Python libraries for machine learning

gavinln/scikit-vm


Scikit-vm

About

This project provides an Ubuntu (18.04) Vagrant virtual machine (VM) with numerical and scientific libraries for Python. It includes the following libraries.

There are Ansible scripts that automatically install the software when the VM is started.

Running

  1. Change to the root of the project
cd scikit-vm
  2. To start the virtual machine (VM), type
vagrant up
  3. Connect to the VM
vagrant ssh

Jupyter notebook

  1. Install Jupyter notebook extensions
jupyter contrib nbextension install --user
  2. From the Edit menu, choose the nbextensions config option to set up plugins

  3. Some useful plugins

  • Code prettify
  • Collapsible Headings
  • Comment/Uncomment Hotkey
  • ExecuteTime
  • Select CodeMirror Keymap
  • Table of Contents (2)
  4. Start the Jupyter notebook or JupyterLab environment
/vagrant/scripts/notebook-jupyter.sh
  5. Open the notebook in the browser at http://192.168.33.10:8888/

Jupyterlab extensions

JupyterLab 3.x supports installing extensions directly with pip, without building them with Node.js.

The following extensions are useful:

pipenv install jupyterlab-git
pipenv install jupyterlab_vim
pipenv install jupyterlab_lsp
pipenv install aquirdturtle_collapsible_headings  # less useful
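JupyterLab 3.x picks up these pip-installed ("prebuilt") extensions from a share/jupyter/labextensions folder in its data directories, where each extension ships a package.json. The sketch below lists the extensions found under one such folder. It is a simplification that assumes only the environment prefix (sys.prefix) matters; JupyterLab also checks user and system data directories, and jupyter labextension list remains the authoritative command.

```python
import json
import sys
from pathlib import Path


def list_prebuilt_extensions(prefix=sys.prefix):
    """Return {extension name: version} for prebuilt JupyterLab extensions.

    Simplified sketch: only <prefix>/share/jupyter/labextensions is scanned,
    not the user or system Jupyter data directories.
    """
    root = Path(prefix) / "share" / "jupyter" / "labextensions"
    found = {}
    if not root.is_dir():
        return found
    # Unscoped packages live at <root>/<name>/package.json,
    # scoped packages at <root>/@scope/<name>/package.json.
    manifests = list(root.glob("*/package.json")) + list(root.glob("@*/*/package.json"))
    for manifest in manifests:
        meta = json.loads(manifest.read_text(encoding="utf-8"))
        found[meta.get("name", manifest.parent.name)] = meta.get("version", "?")
    return found


if __name__ == "__main__":
    # Print what the current environment (e.g. the pipenv virtualenv) provides.
    for name, version in sorted(list_prebuilt_extensions().items()):
        print(f"{name} {version}")
```

Run it with pipenv run python so that sys.prefix points at the pipenv virtualenv where the extensions above were installed.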

JupyterLab extensions (old method, before 3.x)

  1. Install the extensions (this method requires Node.js)
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyterlab/shortcutui
jupyter labextension install @jupyterlab/toc
jupyter labextension install jupyterlab-jupytext
jupyter labextension install jupyterlab_vim
jupyter labextension install @krassowski/jupyterlab_go_to_definition
jupyter labextension install @aquirdturtle/collapsible_headings

# pip install nbresuse  # displays memory usage on the bottom status bar

# pip install jupyterlab_code_formatter
jupyter labextension install @ryantam626/jupyterlab_code_formatter
# jupyter serverextension enable --py jupyterlab_code_formatter
  2. Set up the keyboard shortcut in the "Advanced Settings Editor"
{
    "shortcuts": [
        {
            "command": "jupyterlab_code_formatter:yapf",
            "keys": ["Ctrl Shift G"],
            "selector": ".jp-Notebook.jp-mod-editMode"
        }
    ]
}
  3. Install the Jupyter language server

  4. Start the JupyterLab interface with ./scripts/lab-jupyter.sh

  5. Install these JupyterLab extensions

  • @jupyter-widgets/jupyterlab-manager
  • @jupyterlab/shortcutui
  • @jupyterlab/toc
  • jupyterlab-jupytext
  • jupyterlab_vim
  • @ryantam626/jupyterlab_code_formatter

nbresuse

Other extensions

From the PyData presentation

  • jupyterlab-git
  • nbdime
  • jupyterlab_code_formatter
  • jupyterlab-toc
  • jupyterlab-quickopen
  • jupyterlab-sidecar
  • jupyterlab-drawio
  • jupyterlab-topbar
  • jupyterlab-sql
  • jupyterlab-celltags
  • jupyterlab-go-to-definition
  • jupyterlab-lsp
  • voila
  • jupyterlab-matplotlib
  • jupyterlab-variableinspector
  • jupyterlab-templates

Scikit-learn notebooks

Scipy 2018

The SciPy 2018 conference has two tutorials on using the Scikit-learn library. There are two videos: Video 1 and Video 2.

The notebooks are in a GitHub project called scipy-2018-sklearn.

To get the notebooks, run the following:

  1. Change to the notebooks directory
cd /vagrant/notebooks
  2. Get the notebooks into the directory scipy2018
svn export https://github.com/amueller/scipy-2018-sklearn/trunk/notebooks scipy2018
  3. In your Jupyter notebook list at http://192.168.33.10:8888/, the notebooks will be in the scipy2018 directory.

Scipy 2017

To get the Scikit-learn notebooks from SciPy 2017, follow these steps. The video for this conference is on YouTube.

  1. Change to the notebooks directory
cd /vagrant/notebooks
  2. Get the notebooks into the directory scipy2017
svn export https://github.com/amueller/scipy-2017-sklearn/trunk/notebooks scipy2017
  3. In your Jupyter notebook list at http://192.168.33.10:8888/, the notebooks will be in the scipy2017 directory.

Scipy 2016

To get the Scikit-learn notebooks from SciPy 2016, follow these steps. The video for this conference is on YouTube.

  1. Change to the notebooks directory
cd /vagrant/notebooks
  2. Get the notebooks into the directory scipy2016
svn export https://github.com/amueller/scipy-2016-sklearn/trunk/notebooks scipy2016
  3. In your Jupyter notebook list at http://192.168.33.10:8888/, the notebooks will be in the scipy2016 directory.

Text tutorial

  1. Change to the text notebooks directory
cd /vagrant/notebooks/text_processing
  2. Get the text examples code and data
svn export https://github.com/scikit-learn/scikit-learn/trunk/doc/tutorial/text_analytics/skeletons skeletons
svn export https://github.com/scikit-learn/scikit-learn/trunk/doc/tutorial/text_analytics/data data
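GitHub removed its Subversion support in January 2024, so the svn export commands in these sections may no longer work. One alternative is to download a repository's ZIP archive and keep only the folder you need. The sketch below does the extraction with the standard library. The archive URL pattern https://github.com/<owner>/<repo>/archive/refs/heads/<branch>.zip and the branch name "master" in the example comment are assumptions to check on each repository page; download_archive and extract_subdir are hypothetical helper names.

```python
import io
import shutil
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath


def download_archive(owner, repo, branch="master"):
    """Download a GitHub branch archive into memory (branch name is an assumption)."""
    url = f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"
    with urllib.request.urlopen(url) as resp:
        return io.BytesIO(resp.read())


def extract_subdir(archive, subdir, dest):
    """Copy the files under `subdir` of a GitHub ZIP archive into `dest`.

    Returns the extracted paths relative to `dest`, with "/" separators.
    """
    prefix = PurePosixPath(subdir).parts
    dest = Path(dest)
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # GitHub wraps the repository in a "<repo>-<branch>/" folder; drop it.
            parts = PurePosixPath(info.filename).parts[1:]
            # Skip folders, unsafe ".." entries, and anything outside `subdir`.
            if (info.is_dir() or ".." in parts or len(parts) <= len(prefix)
                    or parts[:len(prefix)] != prefix):
                continue
            relative = PurePosixPath(*parts[len(prefix):])
            target = dest.joinpath(*relative.parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(relative.as_posix())
    return written


# Example (needs network access; the branch name "master" is an assumption):
#   archive = download_archive("amueller", "scipy-2018-sklearn", "master")
#   extract_subdir(archive, "notebooks", "/vagrant/notebooks/scipy2018")
```

Keeping the download and the extraction separate means the extraction step can be reused on an archive downloaded by hand in the browser.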

Plotting

Matplotlib

Seaborn

Plotly

Plotly.js is an open source JavaScript plotting library. Plotly.py is a Python wrapper over Plotly.js. Plotly Express is a high-level wrapper around the Plotly Python library for rapid data exploration and plotting. It takes pandas DataFrames as input and passes the data on to Plotly.

An introduction to Plotly Express is given in this article. Alternative libraries that wrap the low-level Plotly libraries are discussed on this page.

https://github.com/jonmmease/plotly_ipywidget_notebooks

Lolviz

Visualize data structures

Bokeh

Python visualization library

Plotnine

Grammar of graphics for Python

Dexplot

A plotting library like Seaborn that wraps matplotlib and data frames.

Dimensionality reduction

Dimensionality reduction similar to t-SNE

Uniform Manifold Approximation and Projection

Creating dashboards

Dashboards not only display data but also accept simple user input.

Additional Jupyter libraries

Qgrid: https://github.com/quantopian/qgrid

Serving machine learning models

Scikit-learn model with ONNX and FastAPI

Porting Flask to FastAPI

Pandas

Extending pandas

Learning pandas

Jupyterlab tips

Miscellaneous

Machine learning operational details

  1. Feature store from Google and GoJEK

  2. Machine learning formats

  3. Managing ML experiments

Generate requirements.txt file

python ansible\convert-ansible-to-requirements.py  > requirements.txt

Choose the correct Python version

To create a Pipfile on the Windows Subsystem for Linux (WSL) that uses the correct Python, run one of the following:

pipenv --python $(which python3)
pipenv install --python /usr/bin/python3

Install libraries manually

To install the libraries manually type:

pipenv install numpy
pipenv install scipy
pipenv install scikit-learn
pipenv install pandas
pipenv install watermark
pipenv install pydot3
pipenv install matplotlib
pipenv install statsmodels
pipenv install seaborn
pipenv install flake8
pipenv install yapf
pipenv install Pillow
pipenv install plotly
pipenv install tornado==5.1.1
pipenv install jupyter
pipenv install jupyter-contrib-nbextensions
pipenv install jupytext
pipenv install jupyterlab
pipenv install nbresuse
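After the installs, it helps to confirm that the interpreter chosen with pipenv --python actually sees the packages. The stdlib-only sketch below, run with pipenv run python, prints the interpreter path and each installed version. It relies on importlib.metadata, which needs Python 3.8 or newer, so it will not run on the Python 3.6 that ships with Ubuntu 18.04. The PACKAGES list simply mirrors the commands above and is meant to be edited.

```python
import sys
from importlib import metadata

# PyPI distribution names from the pipenv commands above
# (scikit-learn is the distribution that provides the sklearn module).
PACKAGES = [
    "numpy", "scipy", "scikit-learn", "pandas", "watermark", "pydot3",
    "matplotlib", "statsmodels", "seaborn", "flake8", "yapf", "Pillow",
    "plotly", "tornado", "jupyter", "jupyter-contrib-nbextensions",
    "jupytext", "jupyterlab", "nbresuse",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "at", sys.executable)
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:32} {version or 'NOT INSTALLED'}")
```

Inside a notebook, the watermark package from the list gives a similar report.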

Clean Jupyter notebooks

  1. Convert the notebook to a Python file
jupytext --to py 19_jupyter-widgets.ipynb
  2. Run flake8 on the Python file to check coding conventions
flake8 19_jupyter-widgets.py
  3. Automatically fix code style
autopep8 -i -a 19_jupyter-widgets.py
  4. Set up flake8 in vim
:set makeprg=flake8\ %
:make
:clast
:cprevious
  5. Convert the Python file back to a notebook
jupytext --to ipynb 19_jupyter-widgets.py
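flake8 cannot read an .ipynb file directly because the file is JSON: a top-level "cells" list whose entries have a "cell_type" and a "source" (either one string or a list of strings). The stdlib sketch below shows what that means by pulling only the code cells into a plain script. It is not jupytext's algorithm: markdown cells are dropped, so the result cannot be converted back, and IPython magics such as %matplotlib inline are copied verbatim, so flake8 will report them as syntax errors. notebook_code is a hypothetical helper name.

```python
import json
from pathlib import Path


def notebook_code(path):
    """Return the code cells of an .ipynb file joined into one Python source string."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # Notebooks store source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    # Separate cells with two blank lines, the most PEP 8 allows at top level.
    return "\n\n\n".join(chunks) + "\n"
```

For example, writing notebook_code("19_jupyter-widgets.ipynb") to a .py file gives flake8 something it can check, while jupytext remains the right tool when the notebook must be rebuilt afterwards.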

Vifm filtering files

  1. Add a filter to hide files ending in .py
:filter {*.py}
  2. Invert the filter to show only files ending in .py
:filter!
  3. Show the value of the filter
:filter?

AI tools

Articles

Tools

  • mlflow - manage machine learning models - 4761 stars
  • DVC - data version control - 3662 stars
  • Pachyderm - reproducible data science using Docker - 3965 stars
  • streamlit - tools for machine learning - 1389 stars
  • scikit libraries

Python libraries to try

  • loguru - more convenient logging
  • pyupgrade - upgrade Python code from 3.6 to 3.7/3.8/3.9
  • ntfy - cross-platform notifications

Other machine learning libraries

  • Shap - explain the output of machine learning models

PyCaret

PyCaret is a low-code machine learning library that supports multiple machine learning libraries, including sklearn on the CPU and others such as XGBoost, LightGBM and CatBoost on the GPU. It also integrates with MLflow.

PyCaret depends on an older version of scikit-learn (0.23.2) and cannot be used with scikit-learn (1.0.2) as of 2022-01-15.

The conflict occurs even when running poetry add pycaret --allow-prereleases.

TPOT

TPOT is an automated Python machine learning tool that optimizes machine learning pipelines using genetic programming. Besides sklearn, it supports libraries such as XGBoost and PyTorch on the GPU. It does not support LightGBM.

  1. Install prerequisite libraries
pip install deap update_checker tqdm stopit
  2. Install tpot
pip install tpot

Auto-sklearn

Auto-sklearn is an automated machine learning toolkit and a drop-in replacement for sklearn estimators. It does not support other libraries such as XGBoost or LightGBM. This tutorial explains how to use Auto-sklearn.

Lux visualization library

The Lux library allows visual exploration of pandas data frames in a Jupyter notebook.

  1. Install the Lux library
pip install lux-api
  2. Activate the Lux JupyterLab extension
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install luxwidget

Hyper-parameter optimization libraries

Feature engineering

Videos

https://www.youtube.com/watch?v=tfWzbhKX294

https://www.youtube.com/watch?v=vsKNxbP8R_8

Feature engineering books

http://www.feat.engineering/

Machine learning videos including feature engineering

https://www.youtube.com/c/HeatonResearch/search

Statsmodels

Videos

  1. Introduction to statsmodels
  2. Introduction to patsy
  3. Patsy design matrices

Setup pre-commit

https://pre-commit.com/

  1. Install pre-commit
pip install pre-commit
  2. Check the version
pre-commit --version
  3. Generate a sample .pre-commit-config.yaml file
pre-commit sample-config
  4. Save the sample, overwriting any existing .pre-commit-config.yaml
pre-commit sample-config > .pre-commit-config.yaml
  5. Run the pre-commit checks manually
pre-commit run --all-files
  6. Install the pre-commit hooks
pre-commit install
  7. Uninstall the pre-commit hooks
pre-commit uninstall

Prediction intervals

Interpretable machine learning

Ranking

Custom loss functions for gradient boosting

Requirements

The following software is needed to get the software from GitHub and run Vagrant to set up the Python development environment.
