# Python Virtual Environment(s) and Installed Libraries
Installing and managing various Python libraries used in data analysis and ML.

## Non-Conda Setup
After activating a Python virtual environment, `pip3` is no longer needed; plain `pip` works fine.

In the current setup, all Python environments live in the `~/environments` folder, not inside the project folders.
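
To see which environments exist in that folder, a small sketch like the following may help. It assumes each environment was created by `python -m venv` or a recent virtualenv release, both of which write a `pyvenv.cfg` file into the environment root; `list_environments` is a hypothetical helper name, not part of any library.

```python
from pathlib import Path


def list_environments(root):
    """Return names of sub-folders of `root` that look like virtual environments."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    # Assumption: an environment is recognised by the pyvenv.cfg file
    # that venv and recent virtualenv releases place in its top-level folder.
    return sorted(p.name for p in root.iterdir() if (p / "pyvenv.cfg").is_file())


print(list_environments("~/environments"))
```

Conda environments do not contain `pyvenv.cfg`, so this sketch would not list them.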

If libraries are not installed with `sudo` and `pip3`, then they are not visible inside Jupyter.
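
When a library is installed but Jupyter cannot import it, the usual cause is that the kernel runs a different interpreter than the one `pip` installed into. The sketch below, meant to be run in a notebook cell, prints which interpreter the kernel uses; `describe_interpreter` is a hypothetical helper name, and the virtual-environment check assumes Python 3.3+ with `venv` or a recent virtualenv.

```python
import sys


def describe_interpreter():
    """Collect the facts that decide where `pip install` puts packages."""
    return {
        # The Python binary running this kernel
        "executable": sys.executable,
        # Root folder of the active environment
        "prefix": sys.prefix,
        # True when running inside a venv/virtualenv environment
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

Running `<executable> -m pip install <package>` with the printed executable path should install into the same interpreter the kernel uses.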

Search for pip packages: [https://pypi.org/](https://pypi.org/)

```
sudo pip3 install xgboost
sudo pip3 install pandas
sudo pip3 install matplotlib
sudo pip3 install seaborn
sudo pip3 install scikit-learn
sudo pip3 install yellowbrick
sudo pip3 install pydotplus
sudo apt install graphviz
sudo pip3 install dtreeviz
sudo pip3 install rfpimp
sudo pip3 install xgbfir
sudo pip3 install xlrd
sudo pip3 install pandas-profiling
sudo pip3 install statsmodels
sudo pip3 install shap
sudo pip3 install phik
```

### LightGBM
1. [Install CMake](https://vitux.com/how-to-install-cmake-on-ubuntu-18-04/) - CMake is an open-source, cross-platform family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler-independent configuration files, and generate native makefiles and workspaces that can be used in the compiler environment of your choice. 

```
# Check the latest version on https://cmake.org/download/
$ wget https://github.com/Kitware/CMake/releases/download/v3.18.3/cmake-3.18.3.tar.gz

$ mkdir cmake
$ mv cmake-3.18.3.tar.gz cmake/cmake-3.18.3.tar.gz
$ cd cmake
$ tar -zxvf cmake-3.18.3.tar.gz
$ cd cmake-3.18.3
$ ./bootstrap
$ make -j12
$ sudo make install
$ cmake --version
```

2. [Install LightGBM](https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#linux)

```
$ cd ~/git
$ git clone --recursive https://github.com/microsoft/LightGBM ; cd LightGBM
$ mkdir build ; cd build
$ cmake ..
$ make -j4
```

3. Make it available to Jupyter

```
$ cd .. ; cd python-package
$ sudo python3 setup.py install
```

## TPOT
Consider [TPOT](https://epistasislab.github.io/tpot/) your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

```
sudo pip3 install deap update_checker tqdm stopit
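# Quote the specifiers so the shell does not treat ">=" as a redirection or "[...]" as a glob
sudo pip3 install "dask[delayed]" "dask[dataframe]" dask-ml "fsspec>=0.3.3"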
sudo pip3 install scikit-mdr skrebate
sudo pip3 install tpot
```

## Node.js
```
# Needed to be able to install Jupyter Lab extensions
# https://www.geeksforgeeks.org/installation-of-node-js-on-linux/

$ sudo apt install nodejs
$ node --version

# Node Package Manager (npm)
$ sudo apt install npm
$ npm --version

$ sudo apt-get update
$ sudo apt-get upgrade
```

## Missingno
```
# https://github.com/ResidentMario/missingno
$ sudo pip3 install missingno
$ sudo pip3 install quilt
$ sudo quilt install ResidentMario/missingno_data
```

## Keras
```
# https://keras.io/
# https://github.com/hsekia/learning-keras/wiki/How-to-install-Keras-to-Ubuntu-18.04

# With no GPU support
$ sudo pip3 install tensorflow
$ sudo pip3 install keras

```

## Catboost
```
# https://catboost.ai/docs/concepts/about.html
# https://catboost.ai/docs/installation/python-installation-method-pip-install.html

$ sudo pip3 install catboost

# visualization tools
$ sudo pip3 install ipywidgets 
$ sudo jupyter nbextension enable --py widgetsnbextension

```

## imblearn
```
# https://imbalanced-learn.readthedocs.io/en/stable/index.html
# https://imbalanced-learn.readthedocs.io/en/stable/install.html

$ sudo pip3 install -U imbalanced-learn

```

## scikit-plot
```
# https://scikit-plot.readthedocs.io/en/stable/index.html

$ sudo pip3 install scikit-plot

```

## missingpy
Missingpy is a library for missing data imputation in Python. It has an API consistent with scikit-learn, so users already comfortable with that interface will find themselves in familiar terrain. Currently, the library supports k-Nearest Neighbors-based imputation and Random Forest-based imputation (MissForest).
```
# https://pypi.org/project/missingpy/

$ sudo pip3 install missingpy

```

## plotnine
Plotnine is an implementation of a grammar of graphics in Python, based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.
```
# https://plotnine.readthedocs.io/en/stable/index.html

$ sudo pip3 install plotnine

```

# Deprecated:

## Conda Cheatsheet
[Conda Cheatsheet](https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf)

## Python virtual environment
Using conda and virtualenv together is known to cause problems. Because I use conda (and sometimes pip) to manage packages in WSL, I do not use the virtualenv library there. **Conda keeps environments in a common directory, whereas a virtualenv environment can be stored anywhere, usually inside the project folder.**

```
# I do not use it in WSL (Windows Ubuntu)
pip install virtualenv

# Create an environment while inside a git repository folder
virtualenv venv

# Activate the environment
source venv/bin/activate

# Register libraries available in the environment
pip freeze > requirements.txt

# Install libraries from a file
pip install -r requirements.txt

# Deactivate the environment
deactivate
```
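
The same create step can also be done from Python with the standard-library `venv` module, which is a different tool from the `virtualenv` package used above but produces a comparable environment. A minimal sketch under that assumption; `create_environment` is a hypothetical helper name:

```python
import venv
from pathlib import Path


def create_environment(path, with_pip=True):
    """Create a virtual environment at `path` and return the path of its pyvenv.cfg."""
    path = Path(path).expanduser()
    # EnvBuilder is the stdlib counterpart of running `python -m venv <path>`
    venv.EnvBuilder(with_pip=with_pip).create(path)
    return path / "pyvenv.cfg"
```

For example, `create_environment("venv")` called from inside a repository folder would mirror the `virtualenv venv` command; activation still happens in the shell with `source venv/bin/activate`.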

### In WSL use conda virtual environments:
```
# Create an empty environment while inside a git repository folder (not able to run Jupyter Lab)
conda create --name cenv

# Create an environment while inside a git repository folder (but copy the base environment with Jupyter Lab)
conda create --clone base --name venv

# Activate the environment
conda activate venv

# Register libraries available in the environment
pip freeze > requirements.txt

# Install libraries from a file
pip install -r requirements.txt

# Deactivate the environment
conda deactivate
```
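
Since `pip freeze` writes mostly `name==version` lines, a `requirements.txt` can be compared against what is actually installed in the active environment. The following is a sketch, assuming Python 3.8+ (for `importlib.metadata`); `parse_pins` and `find_mismatches` are hypothetical helper names, and lines that are not simple `==` pins (for example the `name @ file://...` entries that can appear for conda-installed packages) are skipped.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Map package name -> pinned version for every `name==version` line."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and entries that are not exact pins
        name, version = line.split("==", 1)
        version = version.split(";", 1)[0].strip()  # drop environment markers
        pins[name.strip().lower()] = version
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

A possible use is `find_mismatches(parse_pins(open("requirements.txt").read()))`, which returns an empty dictionary when every pinned package is installed at the pinned version.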

## Required Libraries
For Anaconda, use the conda package manager whenever possible to ensure all dependencies are managed properly.

```
python -m pip install --upgrade pip
conda install -c conda-forge pip

# Find the package name for a library name
# (pypi.org has since disabled the API behind `pip search`; search https://pypi.org/ instead)
pip search janitor

pip install --no-deps fastai
conda install -c fastai -c pytorch -c anaconda fastai gh anaconda

pip install umap-learn
conda install -c conda-forge umap-learn

pip install pandas
conda install -c conda-forge pandas

pip install pyjanitor
conda install -c conda-forge pyjanitor

pip install imbalanced-learn
conda install -c conda-forge imbalanced-learn


conda install -c conda-forge rfpimp

conda install -c conda-forge pydotplus

conda install -c anaconda py-xgboost

```

```
# auto-sklearn
# https://automl.github.io/auto-sklearn/master/installation.html
# For Ubuntu:
sudo apt-get install build-essential swig
# or conda
conda install gxx_linux-64 gcc_linux-64 swig
```

```
# Yellowbrick
# https://www.scikit-yb.org/en/latest/
conda install -c districtdatalabs yellowbrick
```

```
# Pandas Profiling
# https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/installation.html
conda install -c conda-forge pandas-profiling
```

```
# MLxtend
# http://rasbt.github.io/mlxtend/
conda install -c conda-forge mlxtend
```

```
# Graphviz - Graph Visualization Software
# https://graphviz.org/download/
sudo apt install graphviz
```

```
# dtreeviz: Decision Tree Visualization
# https://github.com/parrt/dtreeviz
pip install dtreeviz
# dtreeviz also needs the Graphviz binaries
sudo apt install graphviz
```

```
which python
```

In [2]:
import autosklearn
import yellowbrick
import pandas_profiling

In [3]:
for lib in ['autosklearn', 
            'yellowbrick', 
            'pandas_profiling',
            'catboost']:
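    try:
        # vars() only holds modules that were already imported above,
        # so a library that is installed but not imported is also reported as missing
        lib_var = vars()[lib]
        print(lib_var.__name__, lib_var.__version__)
    except (KeyError, AttributeError):
        print("-- Missing", lib)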

autosklearn 0.8.0
yellowbrick 1.1
pandas_profiling 2.9.0
-- Missing catboost
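
The loop above looks modules up in `vars()`, so it can only report libraries that an earlier cell imported. A variant that tries the import itself separates "not installed" from "not imported yet". This is a sketch using the standard-library `importlib`; `report_versions` is a hypothetical helper name.

```python
import importlib


def report_versions(module_names):
    """Return {module_name: version string, or None if the import fails}."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # the module cannot be imported in this environment
            continue
        # Not every package defines __version__, so fall back to a placeholder
        report[name] = getattr(module, "__version__", "unknown")
    return report


libs = ["autosklearn", "yellowbrick", "pandas_profiling", "catboost"]
for name, version in report_versions(libs).items():
    print(name, version if version is not None else "-- Missing")
```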
