Scikit-vm

Source code - Github
Author - Gavin Noronha - gavinln@hotmail.com

About

This project provides an Ubuntu (18.04) Vagrant Virtual Machine (VM) with numerical and scientific libraries for Python. It includes the following libraries.

There are Ansible scripts that automatically install the software when the VM is started.

Running

Change to the root of the project

cd scikit-vm

To start the virtual machine (VM), type

vagrant up

Connect to the VM

vagrant ssh

Jupyter notebook

Install Jupyter notebook extensions

jupyter contrib nbextension install --user

Go to the "nbextensions config" option in the Edit menu to set up plugins.
Some useful plugins

Code prettify
Collapsible Headings
Comment/Uncomment Hotkey
ExecuteTime
Select CodeMirror Keymap
Table of Contents (2)

Start the Jupyter notebook or Jupyterlab environment

/vagrant/scripts/notebook-jupyter.sh

Open the notebook in the browser at the URL
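If the page does not load, it can help to check whether the notebook server is reachable from the host before debugging the browser. The following is a minimal sketch, not part of the project: the helper name `is_port_open` is hypothetical, and the address 192.168.33.10 with port 8888 is taken from the notebook URLs used later in this document.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    # Address assumed from the notebook URLs elsewhere in this document.
    host, port = "192.168.33.10", 8888
    state = "reachable" if is_port_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

A result of "not reachable" usually means the VM is not running or the notebook script has not been started yet.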

Jupyterlab extensions

Jupyterlab 3.x supports installing extensions directly using pip, without building the extensions using node.

The following extensions are useful

pipenv install jupyterlab-git
pipenv install jupyterlab_vim
pipenv install jupyterlab_lsp
pipenv install aquirdturtle_collapsible_headings  # less useful

Jupyterlab extensions old

Enable the server extension

jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyterlab/shortcutui
jupyter labextension install @jupyterlab/toc
jupyter labextension install jupyterlab-jupytext
jupyter labextension install jupyterlab_vim
jupyter labextension install @krassowski/jupyterlab_go_to_definition
jupyter labextension install @aquirdturtle/collapsible_headings

# pip install nbresuse  # displays memory usage on the bottom status bar

# pip install jupyterlab_code_formatter
jupyter labextension install @ryantam626/jupyterlab_code_formatter
# jupyter serverextension enable --py jupyterlab_code_formatter

Set up the keyboard shortcut in the "Advanced Settings Editor"

{
    "shortcuts": [{
            "command": "jupyterlab_code_formatter:yapf",
            "keys": ["Ctrl Shift G"],
            "selector": ".jp-Notebook.jp-mod-editMode"
        }
    ]
}
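Before pasting the settings above into the Advanced Settings Editor, it can be useful to confirm that the snippet is well-formed JSON. The following is a minimal sketch, not part of the project: the helper name `check_shortcuts` is hypothetical, and it only checks for the three keys that appear in the example above.

```python
import json

SETTINGS = """
{
    "shortcuts": [{
            "command": "jupyterlab_code_formatter:yapf",
            "keys": ["Ctrl Shift G"],
            "selector": ".jp-Notebook.jp-mod-editMode"
        }
    ]
}
"""

# Keys used by the shortcut entry in the example above.
REQUIRED_KEYS = {"command", "keys", "selector"}


def check_shortcuts(text):
    """Parse the settings text and return the list of shortcut entries.

    Raises ValueError if the text is not valid JSON or if an entry
    lacks one of the keys used in the example.
    """
    settings = json.loads(text)  # json.JSONDecodeError is a ValueError
    shortcuts = settings.get("shortcuts", [])
    for entry in shortcuts:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"shortcut entry is missing keys: {sorted(missing)}")
    return shortcuts


if __name__ == "__main__":
    for entry in check_shortcuts(SETTINGS):
        print(entry["command"], "->", entry["keys"])
```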

Install the Jupyter language server
Start the Jupyter lab interface ./scripts/lab-jupyter.sh
Install these Jupyter extensions

@jupyter-widgets/jupyterlab-manager
@jupyterlab/shortcutui
@jupyterlab/toc
jupyterlab-jupytext
jupyterlab_vim
@ryantam626/jupyterlab_code_formatter

nbresuse

Other extensions

From the PyData presentation

jupyterlab-git
nbdime
jupyterlab_code_formatter
jupyterlab-toc
jupyterlab-quickopen
jupyterlab-sidecar
jupyterlab-drawio
jupyterlab-topbar
jupyterlab-sql
jupyterlab-celltags
jupyterlab-go-to-definition
jupyterlab-lsp
voila
jupyterlab-matplotlib
jupyterlab-variableinspector
jupyterlab-templates

Scikit-learn notebooks

Scipy 2018

The SciPy 2018 conference has two tutorials on using the Scikit-learn library. There are two videos: Video 1 and Video 2

The notebooks are in a Github project called scipy-2018-sklearn

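To get the notebooks, run the following. GitHub ended Subversion support in January 2024, so the svn export commands in this document may no longer work; cloning the repository with git is an alternative.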

Change to the notebooks directory

cd /vagrant/notebooks

Get the notebooks into the directory scipy2018

svn export https://github.com/amueller/scipy-2018-sklearn/trunk/notebooks scipy2018

In your Jupyter notebook list at http://192.168.33.10:8888/ the notebooks will be in the scipy2018 directory.

Scipy 2017

The following steps get the scikit-learn notebooks from SciPy 2017. The video for this conference is on YouTube.

Change to the notebooks directory

cd /vagrant/notebooks

Get the notebooks into the directory scipy2017

svn export https://github.com/amueller/scipy-2017-sklearn/trunk/notebooks scipy2017

In your Jupyter notebook list at http://192.168.33.10:8888/ the notebooks will be in the scipy2017 directory.

Scipy 2016

The following steps get the scikit-learn notebooks from SciPy 2016. The video for this conference is on YouTube.

Change to the notebooks directory

cd /vagrant/notebooks

Get the notebooks into the directory scipy2016

svn export https://github.com/amueller/scipy-2016-sklearn/trunk/notebooks scipy2016

In your Jupyter notebook list at http://192.168.33.10:8888/ the notebooks will be in the scipy2016 directory.

Text tutorial

Change to the text notebooks directory

cd /vagrant/notebooks/text_processing

Get the text examples code and data

svn export https://github.com/scikit-learn/scikit-learn/trunk/doc/tutorial/text_analytics/skeletons skeletons
svn export https://github.com/scikit-learn/scikit-learn/trunk/doc/tutorial/text_analytics/data data

Plotting

Matplotlib

Matplotlib

Seaborn

Seaborn

Plotly

Plotly.js is an open source JavaScript plotting library. Plotly.py is a Python wrapper over the Plotly JavaScript plotting library. Plotly Express is a high-level wrapper around the Plotly Python library for rapid data exploration and plotting. It uses pandas data frames to pass data to Plotly.

An introduction to plotly express is given in this article. Alternative libraries that wrap the low-level Plotly libraries are discussed on this page.

https://github.com/jonmmease/plotly_ipywidget_notebooks

Dexplot

A plotting library like Seaborn that wraps matplotlib and data frames.

Dimensionality reduction

Dimensionality reduction similar to t-SNE

Uniform Manifold Approximation and Projection

Creating dashboards

The dashboards not only display data but also accept simple user inputs.

Additional Jupyter libraries

Qgrid - https://github.com/quantopian/qgrid

Serving machine learning models

Scikit-learn model with ONNX and FastAPI

Porting Flask to FastAPI

Pandas

Extending pandas

Cyberpandas - supports IP addresses
GeoPandas - supports geographic data

Learning pandas

Jupyterlab tips

Miscellaneous

Machine learning operational details

Feature store from Google and GoJEK
Machine learning formats
Managing ML experiments

Generate requirements.txt file

python ansible/convert-ansible-to-requirements.py > requirements.txt

Choose the correct Python version

To create a Pipfile on the Windows Subsystem for Linux and choose the correct Python, run:

pipenv --python $(which python3)
pipenv install --python /usr/bin/python3
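To confirm which interpreter the environment actually picked up, a short check can be run inside it, for example with `pipenv run python`. This is a minimal sketch rather than part of the project; the function name `describe_interpreter` is hypothetical.

```python
import sys


def describe_interpreter():
    """Return the path and version of the running Python interpreter."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"{sys.executable} (Python {version})"


if __name__ == "__main__":
    print(describe_interpreter())
```

If the path printed is not the interpreter that was expected, the environment was created with a different Python than the one passed to `--python`.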

Install libraries manually

To install the libraries manually type:

pipenv install numpy
pipenv install scipy
pipenv install scikit-learn
pipenv install pandas
pipenv install watermark
pipenv install pydot3
pipenv install matplotlib
pipenv install statsmodels
pipenv install seaborn
pipenv install flake8
pipenv install yapf
pipenv install Pillow
pipenv install plotly
pipenv install tornado==5.1.1
pipenv install jupyter
pipenv install jupyter-contrib-nbextensions
pipenv install jupytext
pipenv install jupyterlab
pipenv install nbresuse
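After installing, it may be worth checking that the libraries can be found from Python. The sketch below is an illustration, not part of the project: the helper name `missing_modules` is hypothetical, and the list covers only some of the packages above. Import names can differ from install names (scikit-learn is imported as `sklearn`, Pillow as `PIL`).

```python
from importlib.util import find_spec

# Top-level import names for some of the packages installed above.
IMPORT_NAMES = [
    "numpy",
    "scipy",
    "sklearn",  # installed as scikit-learn
    "pandas",
    "matplotlib",
    "statsmodels",
    "seaborn",
    "PIL",  # installed as Pillow
    "plotly",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(IMPORT_NAMES)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All listed modules were found")
```

The check only locates the modules without importing them, so it is quick but does not prove that each library loads correctly.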

Clean Jupyter notebooks

Convert notebook to python file

jupytext --to py 19_jupyter-widgets.ipynb

Run flake8 to check coding conventions

flake8 19_jupyter-widgets.py

Automatically fix code style

autopep8 -i -a 19_jupyter-widgets.py

Set up flake8 in vim

:set makeprg=flake8\ %
:make
:clast
:cprevious

Convert back to notebook

jupytext --to ipynb 19_jupyter-widgets.py

Vifm filtering files

Add a filter to hide files

:filter {*.py}

Toggle filter to only show files ending in py

:filter!

Show the value of the filter

:filter?

AI tools

Articles

AI developer stack

Tools

mlflow - manage machine learning models - 4761 stars
DVC - data version control - 3662 stars
Pachyderm - reproducible data science using Docker - 3965 stars
streamlit - tools for machine learning - 1389 stars
scikit libraries

Python libraries to try

loguru - more convenient logging
pyupgrade - upgrade Python code from 3.6 to 3.7/3.8/3.9
ntfy - cross-platform notifications

Other machine learning libraries

Feature engine - feature transformers for scikit-learn

Shap - explain the output of machine learning models

Shap video - background behind approach

yellowbrick - visualize machine learning models

PyCaret

PyCaret is a low-code machine learning library that supports multiple machine learning libraries, including scikit-learn on the CPU and other libraries such as XGBoost, LightGBM and CatBoost on the GPU. It also supports integration with MLflow.

PyCaret depends on an older version of scikit-learn (0.23.2) and cannot be used with scikit-learn (1.0.2) as of 2022-01-15.

This happens even when running poetry add pycaret --allow-prereleases
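When diagnosing a conflict like this, it helps to see which versions are actually installed in the environment. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later); the helper name `installed_version` is hypothetical and not part of PyCaret or scikit-learn.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used when installing the packages.
    for name in ("scikit-learn", "pycaret"):
        print(name, installed_version(name) or "not installed")
```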

TPOT

TPOT is an automated Python machine learning tool that optimizes machine learning pipelines using genetic programming. In addition to scikit-learn, it supports libraries such as XGBoost and PyTorch on the GPU. It does not support LightGBM.

Install pre-requisite libraries

pip install deap update_checker tqdm stopit

Install tpot

pip install tpot

Auto-sklearn

Auto-sklearn is an automated machine learning toolkit and a drop-in replacement for scikit-learn estimators. It does not support other libraries such as XGBoost or LightGBM. This tutorial explains how to use Auto-sklearn.

Lux visualization library

The Lux library allows visual exploration of pandas data frames in a Jupyter notebook.

Install lux library

pip install lux-api

Activate the lux Jupyter lab extension

jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install luxwidget

Hyper-parameter optimization libraries

Feature engineering

Videos

https://www.youtube.com/watch?v=tfWzbhKX294

https://www.youtube.com/watch?v=vsKNxbP8R_8

Feature engineering books

http://www.feat.engineering/

Machine learning videos including feature engineering

https://www.youtube.com/c/HeatonResearch/search

Statsmodels

Videos

Setup pre-commit

https://pre-commit.com/

Install pre-commit

pip install pre-commit

Check version

pre-commit --version

Generate a sample .pre-commit-config.yaml file

pre-commit sample-config

Save and overwrite the existing .pre-commit-config.yaml

pre-commit sample-config > .pre-commit-config.yaml

Run the pre-commit checks manually

pre-commit run --all-files

Install the pre-commit hooks

pre-commit install

Uninstall the pre-commit hooks

pre-commit uninstall

Prediction intervals

Interpretable machine learning

https://christophm.github.io/interpretable-ml-book/counterfactual.html

Ranking

Custom loss functions for gradient boosting

Requirements

The following software is needed to get the source code from GitHub and run Vagrant to set up the Python development environment.

Oracle VM VirtualBox
Vagrant version 1.9.1 or higher

