# Bootstrap Your Local Setup

There are several ways to set up your environment to take advantage of Intel software optimizations. If you want to run these exercises locally, pick the method that best suits your style of development; one of them is required for the optimizations to be available.

* All of these installation methods assume Ubuntu 20.04.3 LTS, either on bare metal or under WSL2 on Windows 10/11.
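
Before picking a method, it can help to confirm which release you are actually running. The sketch below is one possible check, not part of the official instructions: it parses the standard `/etc/os-release` file and compares the `ID` and `VERSION_ID` fields against Ubuntu 20.04. The helper names `parse_os_release` and `is_supported_ubuntu` are hypothetical.

```python
def parse_os_release(text: str) -> dict:
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_supported_ubuntu(info: dict) -> bool:
    """True when the parsed fields describe Ubuntu 20.04."""
    return info.get("ID") == "ubuntu" and info.get("VERSION_ID") == "20.04"


if __name__ == "__main__":
    try:
        with open("/etc/os-release", encoding="utf-8") as fh:
            supported = is_supported_ubuntu(parse_os_release(fh.read()))
        print("Ubuntu 20.04 detected:", supported)
    except FileNotFoundError:
        print("/etc/os-release not found; this does not look like a Linux system")
```

Other releases may well work; the check only tells you whether you match the configuration these instructions were written for.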

## Sections
- [Anaconda Setup](#Anaconda-Setup)
- [Intel Distribution of Python](#Intel-Distribution-of-Python)
- [Intel AI Kits](#Intel-AI-Kits)
- [Intel Data Science Workstation Kit](#Intel-Data-Science-Workstation-Kit)
- [In a Hurry, use the Intel DevCloud](#Intel-DevCloud)

## Anaconda Setup

Anaconda and the Intel Distribution for Python are the bare minimum you need to install to take advantage of Intel optimizations.

* [Download and install the latest version of Anaconda](https://www.anaconda.com/products/distribution#Downloads)

## Intel Distribution of Python

* High-Performance Python:  

    * Take advantage of the most popular and fastest growing programming language with underlying instruction sets optimized for Intel® architectures.
    * Achieve near-native performance through acceleration of core Python numerical and scientific packages that are built using Intel® Performance Libraries.
    * Achieve highly efficient multithreading, vectorization, and memory management, and scale scientific computations efficiently across a cluster.
    * Core packages include Numba, NumPy, SciPy, and more.
    

    
* __Step 1__: Add the Intel channel to your Anaconda configuration. This gives the Intel packages priority over the default packages:

          conda config --add channels intel
       
* __Step 2__: Create an environment containing the latest full Intel Distribution for Python:

          conda create -n idp intelpython3_full python=3    
        
* __Step 3__: Activate your virtual environment with  

          conda activate idp  
    
* __Step 4__: The environment now has the Intel Distribution for Python installed. With the environment still activated, install JupyterLab into it:

 
          conda install -c conda-forge jupyterlab
               
    
    [The latest release notes and links to forums for the Intel Distribution for Python can be found here](https://www.intel.com/content/www/us/en/developer/articles/release-notes/distribution-for-python-release-notes.html)
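
Once the `idp` environment is active, a quick sanity check is to ask the interpreter what it is. The sketch below is a heuristic only: it assumes that Intel builds of Python include the word "Intel" in `sys.version`, which has been the case for releases I am aware of but is not a documented guarantee. The helper name `looks_like_intel_python` is hypothetical.

```python
import sys


def looks_like_intel_python(version: str = sys.version) -> bool:
    """Heuristic: assume Intel builds mention 'Intel' in the version banner."""
    return "Intel" in version


if __name__ == "__main__":
    # sys.prefix shows which environment the interpreter was started from.
    print("Interpreter prefix:", sys.prefix)
    print("Looks like an Intel build:", looks_like_intel_python())
```

If the check prints `False` inside the activated environment, confirm with `conda list` that the packages came from the `intel` channel before concluding anything.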

## Intel AI Kits

## Use APT Package Manager

If you want to use your distribution's package manager to keep all of the Intel-optimized software up to date, this method may be of interest. It installs somewhat more software, but has the added convenience of updating automatically along with the rest of your system.

## Add the Intel repositories to your system keyring

__Step 1:__

* Download the key to the system keyring:

        wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

* Add a signed entry to the APT sources so that the APT client uses the Intel repository:

        echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

__Step 2:__  

        sudo apt update
        sudo apt upgrade
        
* After the update, the Intel oneAPI Base Toolkit and all of the other [toolkits](https://www.intel.com/content/www/us/en/develop/documentation/installation-guide-for-intel-oneapi-toolkits-linux/top/installation/install-using-package-managers/apt.html) are available to install.

__Step 3:__

        sudo apt install oneapi-basekit
        
__Step 4:__

* To activate the environment, run:

        source /opt/intel/oneapi/setvars.sh
        
* Add this command to your `.bashrc`, or run it manually in each new shell. Once it has run, `conda env list` shows the following environments:

        base                  *  /opt/intel/oneapi/intelpython/latest
        2022.1.0                 /opt/intel/oneapi/intelpython/latest/envs/2022.1.0
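
If you are unsure whether `setvars.sh` has been sourced in the current shell, one way to check from Python is to look for an environment variable the script exports. The sketch below assumes that `setvars.sh` sets `ONEAPI_ROOT` to the install directory; treat that variable name as an assumption and verify it against your installation. The helper name `oneapi_root` is hypothetical.

```python
import os
from typing import Mapping, Optional


def oneapi_root(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the oneAPI root if ONEAPI_ROOT is set and non-empty, else None."""
    # Assumption: setvars.sh exports ONEAPI_ROOT (e.g. /opt/intel/oneapi).
    return env.get("ONEAPI_ROOT") or None


if __name__ == "__main__":
    root = oneapi_root()
    if root is None:
        print("ONEAPI_ROOT is not set; source setvars.sh in this shell first")
    else:
        print("oneAPI environment active, root:", root)
```

The variable is only visible to processes started from the shell in which `setvars.sh` was sourced, so run the check from that same terminal.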

__Step 5:__

* Change ownership of `~/.conda` to your user instead of root. This lets you add packages without using sudo.

        sudo chown -R $USER:$USER ~/.conda

* Create a new environment by cloning base. Jupyter and any other packages you want can then be added to the clone.

        conda create --clone base --name jupyter
        
        conda env list
        
        jupyter                  /home/user/.conda/envs/jupyter
        base                  *  /opt/intel/oneapi/intelpython/latest
        2022.1.0                 /opt/intel/oneapi/intelpython/latest/envs/2022.1.0
        
    
__Step 6:__

        conda activate jupyter
        
* If running on a local install of Ubuntu, enter the following in the terminal:

        jupyter lab
        
* If running on WSL2, enter:

        jupyter lab --no-browser
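
The choice between the two launch commands can be automated. The sketch below assumes that a WSL kernel identifies itself with the word "microsoft" in `/proc/version`, which is a common detection heuristic rather than an official interface. The helper names `is_wsl` and `jupyter_command` are hypothetical.

```python
def is_wsl(proc_version: str) -> bool:
    """Heuristic: WSL kernels mention 'microsoft' in /proc/version."""
    return "microsoft" in proc_version.lower()


def jupyter_command(proc_version: str) -> list:
    """Build the JupyterLab launch command for the detected platform."""
    cmd = ["jupyter", "lab"]
    if is_wsl(proc_version):
        # Under WSL2 there is no Linux browser to open, so skip the launch.
        cmd.append("--no-browser")
    return cmd


if __name__ == "__main__":
    try:
        with open("/proc/version", encoding="utf-8") as fh:
            text = fh.read()
    except OSError:
        text = ""
    print(" ".join(jupyter_command(text)))
```

With `--no-browser`, JupyterLab prints a URL that you can paste into a browser on the Windows side.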
        
* The contents of the oneAPI Base Toolkit are described below:

## Intel® oneAPI Base Toolkit

* Heterogeneous Development across CPUs, GPUs, and FPGAs

* The Intel® oneAPI Base Toolkit (Base Kit) is a core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures. It features an industry-leading C++ compiler that implements SYCL*, an evolution of C++ for heterogeneous computing.

* Domain-specific libraries and the Intel® Distribution for Python* provide drop-in acceleration across relevant architectures. Enhanced profiling, design assistance, and debug tools complete the kit.

* The contents of the kit are as follows:

* Intel oneAPI Base Toolkit
* General Compute

    * Intel® oneAPI Collective Communications Library
    * Intel® oneAPI Data Analytics Library
    * Intel® oneAPI Deep Neural Networks Library
    * Intel® oneAPI DPC++/C++ Compiler
    * Intel® oneAPI DPC++ Library
    * Intel® oneAPI Math Kernel Library
    * Intel® oneAPI Threading Building Blocks
    * Intel® oneAPI Video Processing Library
    * Intel® Advisor
    * Intel® Distribution for GDB*
    * Intel® Distribution for Python*
    * Intel® DPC++ Compatibility Tool
    * Intel® FPGA Add-on for oneAPI Base Toolkit
    * Intel® Integrated Performance Primitives
    * Intel® VTune™ Profiler

If you are doing strictly machine learning, you are done. To add frameworks such as PyTorch or TensorFlow, install the AI Analytics Toolkit.

__Step 1:__

        sudo apt install intel-aikit
        
* The contents of the AI-Kit are described below.  A number of additional conda environments are created after installation.

## Intel® AI Analytics Toolkit

The following additional packages will be installed:

  * intel-aikit-getting-started intel-oneapi-model-zoo intel-oneapi-modin intel-oneapi-neural-compressor intel-oneapi-pytorch intel-oneapi-tensorflow (1341 MB)
  
   * End-to-End AI and Machine Learning Acceleration

        * Intel® Distribution for Python* including highly-optimized scikit-learn and XGBoost libraries
        * Intel® Optimization for PyTorch*
        * Intel® Optimization for TensorFlow*
        * Intel® Optimization of Modin* (available through Anaconda* only)
        * Intel® Neural Compressor
        * Model Zoo for Intel® architecture

After installation, `conda env list` shows the additional environments:

```
user@Precision-5820-1:~$ conda env list
# conda environments:
#
jupyter                  /home/user/.conda/envs/jupyter
base                  *  /opt/intel/oneapi/intelpython/latest
2022.1.0                 /opt/intel/oneapi/intelpython/latest/envs/2022.1.0
modin                    /opt/intel/oneapi/intelpython/latest/envs/modin
modin-0.13.3             /opt/intel/oneapi/intelpython/latest/envs/modin-0.13.3
pytorch                  /opt/intel/oneapi/intelpython/latest/envs/pytorch
pytorch-1.10.0           /opt/intel/oneapi/intelpython/latest/envs/pytorch-1.10.0
tensorflow               /opt/intel/oneapi/intelpython/latest/envs/tensorflow
tensorflow-2.8.0         /opt/intel/oneapi/intelpython/latest/envs/tensorflow-2.8.0
                         /opt/intel/oneapi/modin/latest
                         /opt/intel/oneapi/pytorch/latest
                         /opt/intel/oneapi/tensorflow/latest 
```
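
A listing like the one above can also be read from a script, for example to confirm which environment is active. The sketch below is a minimal parser for the plain-text format shown here; it assumes paths contain no spaces and that the shell prompt line is not part of the input. The function name `parse_conda_env_list` is hypothetical.

```python
def parse_conda_env_list(output: str):
    """Return (environments, active_path) parsed from `conda env list` text."""
    envs = []       # list of (name, path); name is "" for unnamed prefixes
    active = None   # path of the environment marked with "*"
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        is_active = "*" in parts
        parts = [p for p in parts if p != "*"]
        if len(parts) == 1:
            name, path = "", parts[0]
        else:
            name, path = parts[0], parts[-1]
        envs.append((name, path))
        if is_active:
            active = path
    return envs, active


if __name__ == "__main__":
    sample = (
        "# conda environments:\n"
        "#\n"
        "jupyter                  /home/user/.conda/envs/jupyter\n"
        "base                  *  /opt/intel/oneapi/intelpython/latest\n"
    )
    environments, active_path = parse_conda_env_list(sample)
    print("active:", active_path)
```

For anything beyond a quick check, `conda env list --json` produces machine-readable output that is more robust than parsing the table.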

# Intel Data Science Workstation Kit

This package installs a great deal of software all at once. The methods above give you more control over your environments and always provide the latest versions. However, if you want all of the components listed below, validated together to help ensure they work smoothly across versions, this is a reasonable choice. The tradeoff is that this install method lags behind the latest releases on the Intel conda channel.

* The Data Science Workstation lineup from [Intel and Intel's OEM partners](https://www.intel.com/content/www/us/en/products/systems-devices/workstations/data-science-workstations.html) provides data scientists, data analysts, and developers productive and cost-effective AI development solutions to quickly generate insights for their organizations.

* Open, optimized software tools are coupled with optimal compute and memory hardware configurations to deliver the best out-of-the-box developer experience, whether you are prototyping or developing production AI.
* High-memory systems can fit large datasets for efficient preprocessing, considerably shortening the time required to sort, filter, label, and transform your data.
* Familiar Python* APIs deliver software accelerations of up to 10x to 100x for training and inference.
* Together, the hardware and software bundle inside the workstations enables data scientists to easily iterate and analyze data at scale.

* Central to the optimized AI software stack of the Data Science Workstation is the Intel oneAPI AI Analytics Toolkit (AI Kit) that accelerates end-to-end data science and machine-learning pipelines using Python-based tools and frameworks. The components of the toolkit are open and standards-based, while offering both drop-in acceleration with almost no code changes and seamless scaling to multiple nodes and architectures.



### Components of This Toolkit:
* PyTorch*: The Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) is included in PyTorch as the default math kernel library for deep learning.
* Intel® Optimization for TensorFlow*: This version integrates primitives from the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) into the TensorFlow runtime for accelerated performance.
* Intel® Distribution for Python*: Get faster Python application performance right out of the box, with minimal or no changes to your code. This distribution is integrated with Intel® Performance Libraries such as the Intel® oneAPI Math Kernel Library and the Intel® oneAPI Data Analytics Library. The distribution also includes daal4py, a Python module integrated with the Intel® oneAPI Data Analytics Library, as well as the Python Data Parallel Processing Library (PyDPPL), a lightweight Python wrapper for Data Parallel C++ and SYCL that provides a data parallel interface and abstractions to efficiently tap into device management features of CPUs and GPUs running on Intel® Architecture.
* Intel® Distribution of Modin*: Seamlessly scale preprocessing across multiple nodes using this intelligent, distributed dataframe library with an identical API to pandas. This distribution is only available by installing the Intel® AI Analytics Toolkit with the Conda* package manager.
* Low Precision Optimization Tool: Provides a unified, low-precision inference interface across multiple deep learning frameworks optimized by Intel with this open-source Python library.
* Intel® Extension for Scikit-learn*: A seamless way to speed up your scikit-learn application using the Intel® oneAPI Data Analytics Library (oneDAL). Patching scikit-learn accelerates stock scikit-learn with a single-line change.
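
The single-line change mentioned for the Intel® Extension for Scikit-learn* is a call to `patch_sklearn()` from the `sklearnex` package. The sketch below wraps that call so a script still runs where the extension is not installed; the wrapper name `enable_sklearn_acceleration` is hypothetical, and the fallback behavior is a design choice of this example rather than something the toolkit prescribes.

```python
def enable_sklearn_acceleration() -> bool:
    """Patch scikit-learn with the Intel extension if it is available."""
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Extension not installed: stock scikit-learn will be used as-is.
        return False
    patch_sklearn()
    return True


accelerated = enable_sklearn_acceleration()
print("Intel Extension for Scikit-learn active:", accelerated)

# Import estimators only after patching so the accelerated versions are used,
# for example:
#     from sklearn.cluster import KMeans
```

The order matters: estimators imported from `sklearn` before the patch is applied keep their stock implementations.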

# Installation

This instance was built using Ubuntu 20.04 LTS. If you are on Windows 10/11, jump to the addendum to set up the environment, then come back here to Step 1.

__Step 1:__ Update your system:

Launch a terminal (CTRL+ALT+T) and enter

        sudo apt update && sudo apt upgrade -y
        
__Step 2:__ A reboot is required after the update. Enter:

        sudo reboot

__Step 3:__ Launch a terminal (CTRL+ALT+T) and enter the command below. __Note__: The download is approximately 1.8 GB, so a stable network connection is recommended.

        wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18273/Intel-AIkit-2021.4.1-Linux-x86_64.sh
        
   [If the download fails check for the correct link here.](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html#gs.1bs6zw)

        
__Step 4:__ Change the permissions of the resulting download so that it is an executable.

        chmod 755 Intel-AIkit-2021.4.1-Linux-x86_64.sh
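
If you script the download, the same permission change can be made and verified from Python. The sketch below mirrors `chmod 755` with `os.chmod` and checks the owner-execute bit afterwards; the helper names `make_executable` and `is_executable` are hypothetical.

```python
import os
import stat


def make_executable(path: str) -> None:
    """Set mode 755 (rwxr-xr-x) on the file, mirroring `chmod 755`."""
    os.chmod(path, 0o755)


def is_executable(path: str) -> bool:
    """True when the owner-execute bit is set on the file."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


if __name__ == "__main__":
    installer = "Intel-AIkit-2021.4.1-Linux-x86_64.sh"
    if os.path.exists(installer):
        make_executable(installer)
        print(installer, "executable:", is_executable(installer))
    else:
        print(installer, "not found in the current directory")
```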

__Step 5:__ Install the Intel AIkit

        ./Intel-AIkit-2021.4.1-Linux-x86_64.sh

* Follow the prompts and make sure to answer yes to the final question, “Do you wish the installer to initialize the Intel-AIkit”.

__Step 6:__ Close the terminal and launch a new one.  This will result in the base conda environment being activated.

* __Note:__ This collection of packages has been validated together for functionality. At this point the environment is stable and should be treated as the base package environment. Make any modifications in a cloned environment so that changes are isolated and you can easily get back to a known good state.





## Add Jupyter to the Data Science Workstation Installation

### In this section we will:

* Create a new environment and add some additional packages.
* Clone the Intel samples repository.
* Run a sample project using a Jupyter Notebook.


__Step 1:__ Create New Environment

* Launch a terminal (CTRL+ALT+T) and clone the base environment under a name of your choice, for example `jupyter`:

        conda create --clone base --name jupyter

__Step 2:__

        conda env list

* You should now see two environments. The * denotes the active environment.
* Activate the new environment:

        conda activate jupyter

* If you list the environments again, the asterisk should be next to the new environment name, and the prompt should show the environment name as well.

        conda env list

__Step 3:__ Start Jupyter Lab

From the terminal enter:

        jupyter lab

* The browser should launch automatically. In the panel on the left, navigate to where you downloaded the notebooks.

## Addendum:  WSL2 on Windows 11:

__Step 1:__

* Tap the Windows key and enter update; Windows Update will be the first match. Go through the update process and reboot.

__Step 2:__

* Launch Windows Terminal with administrative privileges. Tap the Windows key and enter terminal. The down caret on the right exposes the option to run as administrator.


__Step 3:__

* At the prompt enter `wsl --install` (two dashes). This will automatically install Ubuntu 20.04.

__Step 4:__

* Restart. The install actually takes place during and after the reboot and takes a couple of minutes.

__Step 5:__

* Ubuntu will launch automatically; follow the steps to create a new user. It’s a good idea to use the same username in Ubuntu as you do in Windows.

__Step 6:__

* Jump back up to [Intel Data Science Workstation Kit](#Intel-Data-Science-Workstation-Kit) and follow along, including the update steps.
    * __Note:__ Rebooting in WSL2 means exiting the WSL2 session. To relaunch WSL2, tap the Windows key, enter terminal, then click the down caret and choose Ubuntu. This is a great way to have multiple terminal sessions in a tabbed environment.

# Intel DevCloud

If you are in a hurry and want to get going immediately with no software to install, [sign up for the Intel DevCloud](https://devcloud.intel.com/oneapi/get_started/). It takes just a few minutes, and all of the software and more is installed, preconfigured, and ready to run. It's free, and all you need for access is a browser. Once you have your account, execute the cell below to populate the datasets and notebooks.

        ! rsync -avhP /data/oneapi_workshops/AI_Kit_DT_XGBoost ~/AI_Kit_DT_XGBoost