- Source code - GitHub
- Author - Gavin Noronha - gavinln@hotmail.com
This project provides an Ubuntu (18.04) Vagrant Virtual Machine (VM) with numerical and scientific libraries for Python. It includes the following libraries.
There are Ansible scripts that automatically install the software when the VM is started.
- Change to the root of the project
cd scikit-vm
- To start the virtual machine (VM), type
vagrant up
- Connect to the VM
vagrant ssh
- Install Jupyter notebook extensions
jupyter contrib nbextension install --user
- Go to the Edit menu's nbextensions config option to set up plugins
- Some useful plugins
- Code prettify
- Collapsible Headings
- Comment/Uncomment Hotkey
- ExecuteTime
- Select CodeMirror Keymap
- Table of Contents (2)
- Start the Jupyter notebook or Jupyterlab environment
/vagrant/scripts/notebook-jupyter.sh
- Open the notebook in the browser at the URL
Jupyterlab 3.x supports installing extensions directly using pip, without building the extensions using node.
The following extensions are useful
pipenv install jupyterlab-git
pipenv install jupyterlab_vim
pipenv install jupyterlab_lsp
pipenv install aquirdturtle_collapsible_headings # less useful
- Enable the server extension
jupyter labextension install @jupyterlab/shortcutui
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyterlab/toc
jupyter labextension install jupyterlab-jupytext
jupyter labextension install jupyterlab_vim
jupyter labextension install @krassowski/jupyterlab_go_to_definition
jupyter labextension install @aquirdturtle/collapsible_headings
# pip install nbresuse # displays memory usage on the bottom status bar
# pip install jupyterlab_code_formatter
jupyter labextension install @ryantam626/jupyterlab_code_formatter
# jupyter serverextension enable --py jupyterlab_code_formatter
- Set up the keyboard shortcut in the "Advanced Settings Editor"
{
    "shortcuts": [
        {
            "command": "jupyterlab_code_formatter:yapf",
            "keys": ["Ctrl Shift G"],
            "selector": ".jp-Notebook.jp-mod-editMode"
        }
    ]
}
- Install the Jupyter language server
- Start the Jupyter lab interface ./scripts/lab-jupyter.sh
- Install these Jupyter extensions
- @jupyter-widgets/jupyterlab-manager
- @jupyterlab/shortcutui
- @jupyterlab/toc
- jupyterlab-jupytext
- jupyterlab_vim
- @ryantam626/jupyterlab_code_formatter
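The keyboard shortcut entry for the "Advanced Settings Editor" shown earlier is plain JSON, so its structure can be checked before pasting it in. The sketch below rebuilds that entry as a Python dictionary and checks that each shortcut has the three fields used in the example. The helper name `check_shortcuts` is hypothetical, and the check covers only those three fields, not the full Jupyterlab settings schema.

```python
import json

# The shortcut entry from the section above, written as a Python dict.
shortcut_settings = {
    "shortcuts": [
        {
            "command": "jupyterlab_code_formatter:yapf",
            "keys": ["Ctrl Shift G"],
            "selector": ".jp-Notebook.jp-mod-editMode",
        }
    ]
}


def check_shortcuts(settings):
    """Return a list of problems; an empty list means the structure looks right.

    Hypothetical helper: it only checks the fields used in the example above.
    """
    problems = []
    for index, entry in enumerate(settings.get("shortcuts", [])):
        for field in ("command", "keys", "selector"):
            if field not in entry:
                problems.append(f"shortcut {index} is missing '{field}'")
        if "keys" in entry and not isinstance(entry["keys"], list):
            problems.append(f"shortcut {index} 'keys' should be a list")
    return problems


# Serialize with indentation so the text can be pasted into the editor.
settings_text = json.dumps(shortcut_settings, indent=4)
print(settings_text)
print("problems:", check_shortcuts(shortcut_settings))
```

Building the settings in Python and serializing with `json.dumps` avoids hand-editing mistakes such as a trailing comma, which JSON does not allow.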
From the PyData presentation
- jupyterlab-git
- nbdime
- jupyterlab_code_formatter
- jupyterlab-toc
- jupyterlab-quickopen
- jupyterlab-sidecar
- jupyterlab-drawio
- jupyterlab-topbar
- jupyterlab-sql
- jupyterlab-celltags
- jupyterlab-go-to-definition
- jupyterlab-lsp
- voila
- jupyterlab-matplotlib
- jupyterlab-variableinspector
- jupyterlab-templates
The SciPy 2018 conference has two tutorials on using the scikit-learn library. There are two videos: Video 1 and Video 2
The notebooks are in a GitHub project called scipy-2018-sklearn
To get the notebooks run the following
- Change to the notebooks directory
cd /vagrant/notebooks
- Get the notebooks into the directory scipy2018
svn export https://github.com/amueller/scipy-2018-sklearn/trunk/notebooks scipy2018
- In your Jupyter notebook list at http://192.168.33.10:8888/ the notebooks will be in the scipy2018 directory.
To get the scikit-learn notebooks from SciPy 2017, run the following. The video for this conference is on YouTube.
- Change to the notebooks directory
cd /vagrant/notebooks
- Get the notebooks into the directory scipy2017
svn export https://github.com/amueller/scipy-2017-sklearn/trunk/notebooks scipy2017
- In your Jupyter notebook list at http://192.168.33.10:8888/ the notebooks will be in the scipy2017 directory.
To get the scikit-learn notebooks from SciPy 2016, run the following. The video for this conference is on YouTube.
- Change to the notebooks directory
cd /vagrant/notebooks
- Get the notebooks into the directory scipy2016
svn export http://github.com/amueller/scipy-2016-sklearn/trunk/notebooks scipy2016
- In your Jupyter notebook list at http://192.168.33.10:8888/ the notebooks will be in the scipy2016 directory.
- Change to the text notebooks directory
cd /vagrant/notebooks/text_processing
- Get the text examples code and data
svn export https://github.com/scikit-learn/scikit-learn/trunk/doc/tutorial/text_analytics/skeletons skeletons
svn export https://github.com/scikit-learn/scikit-learn/trunk/doc/tutorial/text_analytics/data data
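The three SciPy tutorial downloads above differ only in the year. As a small sketch, the command for each year can be generated from one template. The function name `svn_export_command` is hypothetical, and the sketch assumes an `https` URL for every year. It only builds and prints the commands and does not run them.

```python
def svn_export_command(year):
    """Build the `svn export` argument list for one SciPy tutorial year."""
    # Assumption: every year follows the same repository naming pattern
    # used in the commands above.
    url = f"https://github.com/amueller/scipy-{year}-sklearn/trunk/notebooks"
    return ["svn", "export", url, f"scipy{year}"]


# Print one command per tutorial year; run them from /vagrant/notebooks.
for year in (2016, 2017, 2018):
    print(" ".join(svn_export_command(year)))
```

Keeping the command as a list of arguments makes it easy to pass to `subprocess.run` later if the downloads should be scripted.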
Plotly.js is an open source JavaScript plotting library. Plotly.py is a Python wrapper over the Plotly JavaScript plotting library. Plotly Express is a high-level wrapper around the Plotly Python library for rapid data exploration and plotting. It uses Pandas dataframes to transfer data to Plotly.
An introduction to plotly express is given in this article. Alternative libraries that wrap the low-level Plotly libraries are discussed on this page.
https://github.com/jonmmease/plotly_ipywidget_notebooks
Visualize data structures
Python visualization library
Grammar of graphics for Python
A plotting library like Seaborn that wraps matplotlib and data frames.
Dimensionality reduction similar to t-SNE
Uniform Manifold Approximation and Projection
The dashboards not only display data but also accept simple user inputs.
[Qgrid](https://github.com/quantopian/qgrid)
Scikit-learn model with ONNX and FastAPI
- Cyberpandas - supports IP addresses
- GeoPandas - supports geographic data
- Working efficiently with Jupyterlab
- Upgrading to Jupyterlab
- Jupyterlab libraries & resources
- Convert notebook to Confluence
- Python library: Markdown to HTML
- Best practices with notebooks
- Feature store from Google and GoJEK
- Machine learning formats
- Managing ML experiments
python ansible\convert-ansible-to-requirements.py > requirements.txt
To create a Pipfile on the Windows Subsystem for Linux with the correct Python interpreter:
pipenv --python $(which python3)
pipenv install --python /usr/bin/python3
To install the libraries manually type:
pipenv install numpy
pipenv install scipy
pipenv install scikit-learn
pipenv install pandas
pipenv install watermark
pipenv install pydot3
pipenv install matplotlib
pipenv install statsmodels
pipenv install seaborn
pipenv install flake8
pipenv install yapf
pipenv install Pillow
pipenv install plotly
pipenv install tornado==5.1.1
pipenv install jupyter
pipenv install jupyter-contrib-nbextensions
pipenv install jupytext
pipenv install jupyterlab
pipenv install nbresuse
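After installing, it can be useful to confirm that the libraries are importable from the active environment. The sketch below uses only the standard library. The mapping covers a subset of the packages listed above, and the helper name `missing_packages` is hypothetical. Note that the install name and the import name differ for some packages, for example scikit-learn imports as `sklearn` and Pillow imports as `PIL`.

```python
import importlib.util

# Maps the name used with `pipenv install` to the top-level module name
# used with `import`. Only a subset of the list above is shown.
PACKAGE_TO_MODULE = {
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "statsmodels": "statsmodels",
    "seaborn": "seaborn",
    "Pillow": "PIL",
    "plotly": "plotly",
    "jupyterlab": "jupyterlab",
}


def missing_packages(package_to_module):
    """Return the package names whose module cannot be found.

    find_spec locates a module without importing it, so the check is fast
    and has no side effects.
    """
    return [
        package
        for package, module in package_to_module.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages(PACKAGE_TO_MODULE)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

Run it inside the environment, for example with `pipenv run python`, so that it checks the same interpreter the notebooks use.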
- Convert notebook to python file
jupytext --to py 19_jupyter-widgets.ipynb
- Run flake8 on the converted Python file to check coding conventions
flake8 19_jupyter-widgets.py
- Automatically fix code style
autopep8 -i -a 19_jupyter-widgets.py
- Set up flake8 in vim
:set makeprg=flake8\ %
:make
:clast
:cprevious
- Convert the Python file back to a notebook
jupytext --to ipynb 19_jupyter-widgets.py
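To illustrate what the notebook to Python conversion involves, here is a minimal standard-library sketch that pulls the code cells out of a notebook's JSON structure. It is not how jupytext is implemented, and real jupytext output also carries metadata and markdown cells. The function name `code_cells_to_script` and the example notebook are made up for illustration. The `# %%` cell marker follows the common "percent" script convention.

```python
def code_cells_to_script(notebook):
    """Join the source of every code cell, each preceded by a '# %%' marker."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells in this simplified sketch
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append("# %%\n" + source.rstrip("\n") + "\n")
    return "\n".join(chunks)


# A tiny in-memory notebook (illustrative only) in the nbformat 4 layout.
example_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": ["import math\n", "print(math.pi)"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": "x = 1\n",
        },
    ],
}

script = code_cells_to_script(example_notebook)
print(script)
```

For a notebook on disk, load it first with `json.load` and pass the resulting dictionary to the function. The resulting text is an ordinary Python file, which is why tools such as flake8 and autopep8 can work on it.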
- Add a filter to hide files
:filter {*.py}
- Toggle filter to only show files ending in py
:filter!
- Show the value of the filter
:filter?
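The filter commands above hide files that match a glob pattern, and toggling the filter shows only the matching files instead. The same logic can be sketched in Python with `fnmatch`. The function name `visible_files` and the file names are hypothetical, and the sketch mirrors only the behaviour described above, not any particular tool's full pattern syntax.

```python
from fnmatch import fnmatchcase


def visible_files(names, pattern, inverted=False):
    """Hide names matching the pattern; when inverted, show only matches."""
    if inverted:
        return [name for name in names if fnmatchcase(name, pattern)]
    return [name for name in names if not fnmatchcase(name, pattern)]


names = ["analysis.py", "analysis.ipynb", "README.md", "plot.py"]

# Filter active: files ending in .py are hidden.
print(visible_files(names, "*.py"))

# Filter toggled: only files ending in .py are shown.
print(visible_files(names, "*.py", inverted=True))
```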
Articles
Tools
- mlflow - manage machine learning models - 4761 stars
- DVC - data version control - 3662 stars
- Pachyderm - reproducible data science using Docker - 3965 stars
- streamlit - tools for machine learning - 1389 stars
- scikit libraries
- loguru - more convenient logging
- pyupgrade - upgrade Python code from 3.6 to 3.7/3.8/3.9
- ntfy - cross-platform notifications
- Feature engine - feature transformers for scikit-learn
- Shap - explain the output of machine learning models
- Shap video - background behind approach
- yellowbrick - visualize machine learning models
PyCaret is a low-code machine learning library that supports multiple machine learning libraries, including scikit-learn on the CPU and others such as XGBoost, LightGBM and CatBoost on the GPU. It also supports integration with MLflow.
PyCaret depends on an older version of scikit-learn (0.23.2) and cannot be used with scikit-learn (1.0.2) as of 2022-01-15.
This happens even when running poetry add pycaret --allow-prereleases
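A quick way to see whether an environment would run into this conflict is to compare the installed scikit-learn version with the one PyCaret expects. The sketch below uses only the standard library (Python 3.8 or later). It treats 0.23.2, the version named above, as an exact pin, which is an assumption about how PyCaret declares the dependency. The helper names are hypothetical, and the version parsing is deliberately simple.

```python
import re
from importlib import metadata

# Version named in the text above; treated here as an exact pin (assumption).
PYCARET_SKLEARN_PIN = "0.23.2"


def version_tuple(text):
    """Convert a version such as '0.23.2' to (0, 23, 2).

    Simplified: keeps the first three numeric components and ignores
    pre-release or local suffixes.
    """
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(number) for number in numbers[:3])


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def satisfies_pin(installed, pin=PYCARET_SKLEARN_PIN):
    """True only when a version is installed and equals the pinned version."""
    return installed is not None and version_tuple(installed) == version_tuple(pin)


current = installed_version("scikit-learn")
print("installed scikit-learn:", current)
print("matches the version PyCaret expects:", satisfies_pin(current))
```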
TPOT is an automated Python machine learning tool that optimizes machine learning pipelines using genetic programming. Besides scikit-learn, it supports libraries such as XGBoost and PyTorch on the GPU. It does not support LightGBM.
- Install pre-requisite libraries
pip install deap update_checker tqdm stopit
- Install tpot
pip install tpot
Auto-sklearn is an automated machine learning toolkit and a drop-in replacement for scikit-learn estimators. It does not support other libraries such as XGBoost or LightGBM. This tutorial explains how to use Auto-sklearn.
The Lux library allows visual exploration of pandas data frames in a Jupyter notebook.
- Install lux library
pip install lux-api
- Activate the lux Jupyter lab extension
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install luxwidget
- https://github.com/optuna/optuna
- https://github.com/scikit-optimize/scikit-optimize
- https://github.com/fmfn/BayesianOptimization
- https://github.com/hyperopt/hyperopt
https://www.youtube.com/watch?v=tfWzbhKX294
https://www.youtube.com/watch?v=vsKNxbP8R_8
https://www.youtube.com/c/HeatonResearch/search
Videos
- Install pre-commit
pip install pre-commit
- Check version
pre-commit --version
- Generate a sample .pre-commit-config.yaml file
pre-commit sample-config
- Save and overwrite the existing .pre-commit-config.yaml
pre-commit sample-config > .pre-commit-config.yaml
- Run the pre-commit checks manually
pre-commit run --all-files
- Install the pre-commit hooks
pre-commit install
- Uninstall the pre-commit hooks
pre-commit uninstall
- https://scikit-learn.org/stable/auto_examples/linear_model/plot_quantile_regression.html
- https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_quantile.html
- https://machinelearningmastery.com/prediction-intervals-for-machine-learning/
- http://www.xavierdupre.fr/app/mlinsights/helpsphinx/index.html
- http://www.legendu.net/misc/blog/learning-to-rank/
- https://everdark.github.io/k9/notebooks/ml/learning_to_rank/learning_to_rank.html
- https://github.com/microsoft/LightGBM/tree/master/examples/lambdarank
- https://stackoverflow.com/questions/64294962/how-to-implement-learning-to-rank-using-lightgbm
- https://towardsdatascience.com/custom-loss-functions-for-gradient-boosting-f79c1b40466d
- https://hippocampus-garden.com/lgbm_custom/
- https://maxhalford.github.io/blog/lightgbm-focal-loss/
- https://amaarora.github.io/2020/06/29/FocalLoss.html
The following software is needed to get the source code from GitHub and run Vagrant to set up the Python development environment.
- Oracle VM VirtualBox
- Vagrant version 1.9.1 or higher