# Unix - Conda - Pip

We normally run commands in a command-line interface (CLI) such as Terminal, but here we use a Jupyter Notebook to demonstrate:

1. Output from commands
2. That when experimenting or developing programs in the notebook, you can avoid switching tools.
3. That you can install packages at the beginning of the notebook (e.g., SageMaker) so it's clear which packages are used.

## Unix commands

### Basics

Some of the basic commands can be run directly in a cell:

In [1]:
pwd # print working directory

'/Users/sbezawada/Documents/Workspace/MLE-COURSE/4brainmle/mleassign/mle-assign1/assignments/week-1-mle-basictools/nb'

You always want to know "where you are" and how to look for help. Use `man` to display the user manual of any command that we can run on the terminal. 

In [2]:
man pwd # press ESC to exit

Further, the cell magic `%%bash` turns a cell into a bash script, so you can run multiple lines (see more cell/line magics [here](https://ipython.readthedocs.io/en/stable/interactive/magics.html)).

In [3]:
%%bash
pwd
ls -lah # WE DO: What are those flags for?

/Users/sbezawada/Documents/Workspace/MLE-COURSE/4brainmle/mleassign/mle-assign1/assignments/week-1-mle-basictools/nb
total 120
drwxr-xr-x  4 sbezawada  staff   128B Aug 13 13:34 .
drwxr-xr-x  8 sbezawada  staff   256B Aug 11 22:05 ..
drwxr-xr-x  3 sbezawada  staff    96B Aug 11 22:23 .ipynb_checkpoints
-rw-r--r--  1 sbezawada  staff    56K Aug 13 13:34 unix-conda-pip.ipynb
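As a reference for the flags question above: `-l` asks for the long format (permissions, owner, size, date), `-a` includes entries whose names start with a dot, and `-h` prints sizes in human-readable units. The sketch below mimics a small part of that in Python. `human_size` and `list_dir` are illustrative helper names, not standard functions, and the rounding only approximates what `ls` prints.

```python
from pathlib import Path


def human_size(num_bytes):
    """Format a byte count in powers of 1024, roughly like `ls -h`."""
    size = float(num_bytes)
    for unit in ["B", "K", "M", "G", "T"]:
        if size < 1024 or unit == "T":
            return f"{size:.0f}{unit}"
        size /= 1024


def list_dir(path="."):
    """Return (kind, size, name) for every entry in `path`."""
    rows = []
    # iterdir() yields dotfiles too (like -a), though not `.` and `..`
    for entry in sorted(Path(path).iterdir()):
        kind = "d" if entry.is_dir() else "-"
        rows.append((kind, human_size(entry.stat().st_size), entry.name))
    return rows


for kind, size, name in list_dir("."):
    print(kind, size.rjust(6), name)
```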


### Download a data file

Let's download some data from the [Census Income Data Set](https://archive.ics.uci.edu/ml/datasets/Census+Income). First, though, we make a new directory to store the data, at the same level as the `nb` directory.

In [5]:
%%bash
cd ..
#mkdir dat
# WE DO: make a new directory called `dat` under the project root directory
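The same step can also be done from Python, which is handy when the notebook should create its own folders. This is a minimal sketch: `ensure_dir` is a hypothetical helper, and the assumption is that the notebook runs from the `nb` directory, so the project root is `..`.

```python
from pathlib import Path


def ensure_dir(root, name="dat"):
    """Create `root/name` if it is missing and return its path."""
    target = Path(root) / name
    # parents=True creates missing parent directories;
    # exist_ok=True makes the call safe to repeat.
    target.mkdir(parents=True, exist_ok=True)
    return target


# From a notebook in `nb`, the project root is one level up:
# data_dir = ensure_dir("..")
```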


In [8]:
!brew install tree

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/sbezawada/opt/anaconda3/envs/mle-course

  added / updated specs:
    - conda-tree


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2022.6.15  |       h033912b_0         149 KB  conda-forge
    certifi-2022.6.15          |   py38h50d1736_0         155 KB  conda-forge
    colorama-0.4.5             |     pyhd8ed1ab_0          18 KB  conda-forge
    conda-4.13.0               |   py38h50d1736_1         989 KB  conda-forge
    conda-package-handling-1.8.1|   py38hed1de0f_1         1.7 MB  conda-forge
    conda-tree-1.0.5           |     pyhd8ed1ab_0          10 KB  conda-forge
    networkx-2.8.5             |     pyhd8ed1ab_0         1.5 MB  conda-forge
    openssl-1.1.1q             |       hfe4f2af_0         1.9 MB  conda-forge
    pycosa

Let's confirm that we just created an empty directory `dat` using the `tree` command. Not every command can be run in a cell directly; adding `!` in front of the command does the trick:

In [6]:
!tree .. -L 2

..
├── LICENSE
├── README.md
├── dat
│   └── adult.csv
├── md
│   └── git-more.md
├── nb
│   └── unix-conda-pip.ipynb
└── pandas-sklearn-basics
    └── pandas-sklearn-basics.ipynb

4 directories, 6 files


In [8]:
!brew install wget

==> Downloading https://ghcr.io/v2/homebrew/core/libunistring/manifests/1.0
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/libunistring/blobs/sha256:18a16
==> Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sh
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/libidn2/manifests/2.3.3
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/libidn2/blobs/sha256:1ed7a729a0
==> Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sh
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/wget/manifests/1.21.3
#################

In [7]:
# download a csv file
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data -O ../dat/adult.csv

--2022-08-13 13:30:52--  https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3974305 (3.8M) [application/x-httpd-php]
Saving to: ‘../dat/adult.csv’


2022-08-13 13:30:54 (5.08 MB/s) - ‘../dat/adult.csv’ saved [3974305/3974305]
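If `wget` is not available, the download can be sketched with Python's standard library instead. `download` is a hypothetical helper name; the sketch assumes the URL is reachable and that overwriting an existing destination file is acceptable.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download(url, dest):
    """Download `url` to `dest`, creating the parent directory if needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(dest))   # standard-library call that saves the URL to a local file
    return dest.stat().st_size    # size in bytes, comparable to the length wget reports


# Usage (requires network access):
# download(
#     "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
#     "../dat/adult.csv",
# )
```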



In [6]:
!tree .. -L 2 # check again

..
├── LICENSE
├── README.md
├── dat
│   └── adult.csv
├── md
│   └── git-more.md
├── nb
│   └── unix-conda-pip.ipynb
└── pandas-sklearn-basics
    └── pandas-sklearn-basics.ipynb

4 directories, 6 files


### Inspect data

Let's inspect the first 10 lines of the data:

In [7]:
!head ../dat/adult.csv

39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K
37, Private, 284582, Masters, 14, Married-civ-spouse, Exec-managerial, Wife, White, Female, 0, 0, 40, United-States, <=50K
49, Private, 160187, 9th, 5, Married-spouse-absent, Other-service, Not-in-family, Black, Female, 0, 0, 16, Jamaica, <=50K
52, Self-emp-not-inc, 209642, HS-grad, 9, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >5

How about the last 5 lines? (Wait, I only see 4 lines. What's wrong?)

In [8]:
!tail -n 5 ../dat/adult.csv

40, Private, 154374, HS-grad, 9, Married-civ-spouse, Machine-op-inspct, Husband, White, Male, 0, 0, 40, United-States, >50K
58, Private, 151910, HS-grad, 9, Widowed, Adm-clerical, Unmarried, White, Female, 0, 0, 40, United-States, <=50K
22, Private, 201490, HS-grad, 9, Never-married, Adm-clerical, Own-child, White, Male, 0, 0, 20, United-States, <=50K
52, Self-emp-inc, 287927, HS-grad, 9, Married-civ-spouse, Exec-managerial, Wife, White, Female, 15024, 0, 40, United-States, >50K



How many records are there?

In [9]:
!wc -l ../dat/adult.csv

   32562 ../dat/adult.csv
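Note that `wc -l` counts newline characters, so a blank line at the end of a file is included in the total. That may explain why this count is one higher than the number of rows pandas reports later in this notebook. A minimal sketch of the distinction, assuming a UTF-8 text file; `count_lines` is a hypothetical helper:

```python
def count_lines(path):
    """Return (newline_count, nonblank_count) for a text file."""
    newline_count = 0
    nonblank_count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.endswith("\n"):
                newline_count += 1    # what `wc -l` counts
            if line.strip():
                nonblank_count += 1   # lines that actually hold a record
    return newline_count, nonblank_count


# Usage, once the file has been downloaded:
# print(count_lines("../dat/adult.csv"))
```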


Challenge: how many columns are there?

In [10]:
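!head -1 ../dat/adult.csv | sed 's/[^,]//g' | wc -c
# sed is the Unix stream editor; here it deletes every character except commas
# wc -c counts bytes: 14 commas plus the trailing newline gives 15, the number of columns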

      15
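Counting commas works here because no field contains a comma of its own. A slightly more robust sketch uses Python's `csv` module, which also handles quoted fields. `count_columns` is a hypothetical helper, and `skipinitialspace=True` is an assumption based on the space that follows each comma in the output of `head` above.

```python
import csv


def count_columns(path):
    """Return the number of fields in the first row, plus the row itself."""
    with open(path, newline="", encoding="utf-8") as f:
        # skipinitialspace drops the blank that follows each comma
        reader = csv.reader(f, skipinitialspace=True)
        first_row = next(reader)
    return len(first_row), first_row


# Usage, once the file has been downloaded:
# n_columns, first_row = count_columns("../dat/adult.csv")
```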


## Conda: environment manager + package manager

You might have heard about Anaconda / Miniconda / Miniforge:

- What is the difference between `anaconda` and `miniconda`? [An answer](https://stackoverflow.com/questions/45421163/anaconda-vs-miniconda).
- What is the difference between `miniconda` and `miniforge`? [An answer](https://stackoverflow.com/questions/60532678/what-is-the-difference-between-miniconda-and-miniforge).

Though Conda is both a package manager and an environment manager, we focus on using it as the environment manager and use `pip` as the package manager.

### Installation

First, install `miniconda` (Feel free to skip if you have already installed it, or any flavor of conda).

Run the following commands in a terminal to download the latest Miniconda distribution and install it (Intel-based Mac).

```
cd ~
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh
```

For Macs with an M1 chip, install `miniforge` instead. If you are new to `brew`, follow the instructions [here](https://brew.sh) to install Homebrew first:

```
brew install miniforge
```

Output from these installations is rather long, so we ask you to run them in a terminal window instead.

Confirm that the installation is successful:

In [11]:
%%bash
which conda  # where is the conda executable on my PATH?
conda -V     # which version of conda?
which python # where is the python executable on my PATH?
python -V    # which version of python?

/Users/sbezawada/opt/anaconda3/envs/mle-course/bin/conda
conda 4.13.0
/Users/sbezawada/opt/anaconda3/envs/mle-course/bin/python
Python 3.8.13
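The same check can be made from Python. `shutil.which` is the standard-library counterpart of the `which` command; `locate` below is just an illustrative wrapper, and the printed paths will depend on your machine.

```python
import platform
import shutil


def locate(command):
    """Return the full path of `command` on PATH, or None if it is not found."""
    return shutil.which(command)


for name in ["conda", "python"]:
    print(name, "->", locate(name))

# Version of the interpreter that is running this code
print("running Python", platform.python_version())
```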


### Create a new env

Create a new conda environment named `py39_12` and pin its Python version to 3.9.12. Note the last flag, `--yes`, which skips the confirmation prompt.

In [4]:
!conda create --name py39_12 python=3.9.12 --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/sbezawada/opt/anaconda3/envs/py39_12

  added / updated specs:
    - python=3.9.12


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2022.6.15          |   py39hecd8cb5_0         154 KB
    pip-22.1.2                 |   py39hecd8cb5_0         2.4 MB
    python-3.9.12              |       hdfd78df_1        10.3 MB
    setuptools-61.2.0          |   py39hecd8cb5_0        1012 KB
    tzdata-2022a               |       hda174b7_0         109 KB
    ------------------------------------------------------------
                                           Total:        14.0 MB

The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/osx-64::ca-certificates-2022.07.19-hecd8cb5_0
  certifi            pkgs/main/osx-64::certifi-2022.6.1

In [24]:
!ipython kernel install --user --name=py39_12 --display-name "py39_12"

Installed kernelspec py39_12 in /Users/sbezawada/Library/Jupyter/kernels/py39_12


In [7]:
#!conda env list
#!source activate base
%%zsh conda activate py39_12
#!conda init zsh

UsageError: Line magic function `%%zsh` not found.


In [3]:
!conda activate py39_12


CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.
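Apart from the `conda init` message, there is a second reason why activating an environment from a cell does not stick: each `!` command runs in its own temporary subshell, so environment changes made there are gone once the command returns. The sketch below illustrates the same behavior with a made-up variable `DEMO_ENV` and a hypothetical helper `run_in_new_shell`; it assumes a POSIX `sh` is available.

```python
import subprocess


def run_in_new_shell(script):
    """Run `script` in a fresh `sh` process and return its stripped stdout."""
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# The variable is visible inside the shell that sets it...
first = run_in_new_shell('DEMO_ENV=py39_12; export DEMO_ENV; echo "$DEMO_ENV"')
# ...but a second shell starts from scratch and does not see it.
second = run_in_new_shell('echo "${DEMO_ENV:-unset}"')
print(first, second)
```

This is why the notebook asks you to activate the environment in a terminal, or to switch the notebook kernel, rather than calling `conda activate` from a cell.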




In [7]:
!echo $PATH

/Users/sbezawada/opt/anaconda3/bin:/Users/sbezawada/opt/anaconda3/condabin:/Users/sbezawada/bin:/Applications/SnowSQL.app/Contents/MacOS:/Library/Frameworks/Python.framework/Versions/3.9/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/Apple/usr/bin


In [12]:
%%bash
conda init bash

no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/condabin/conda
no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/bin/conda
no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/bin/conda-env
no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/bin/activate
no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/bin/deactivate
no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/etc/profile.d/conda.sh
no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/etc/fish/conf.d/conda.fish
no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/shell/condabin/Conda.psm1
no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/shell/condabin/conda-hook.ps1
no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/lib/python3.8/site-packages/xontrib/conda.xsh
no change     /Users/sbezawada/opt/anaconda3/envs/mle-course/etc/profile.d/conda.csh
no change     /Users/sbezawada/.bash_profile
No action taken.


In [14]:
# YOUR CODE HERE
# list all environments in conda
!conda env list

# conda environments:
#
                         /Users/sbezawada/opt/anaconda3
base                  *  /Users/sbezawada/opt/anaconda3/envs/mle-course
py39_12                  /Users/sbezawada/opt/anaconda3/envs/mle-course/envs/py39_12



You should see something like this:
```
# conda environments:
#
                         /Users/flora/miniforge3
                         /Users/flora/miniforge3/envs/tf38
base                  *  /usr/local/Caskroom/miniforge/base
py39                     /usr/local/Caskroom/miniforge/base/envs/py39
py39_12                  /usr/local/Caskroom/miniforge/base/envs/py39_12
```

### Activate an env

By default, you are in the `base` environment. To activate the new environment, run `conda activate py39_12` in a terminal.

If you are using VS Code, click the button at the top right to switch the Python kernel. You will be prompted to install `ipykernel`; follow the instructions to install the package. Alternatively, you can run `source activate`, followed by
`conda activate /usr/local/Caskroom/miniforge/base/envs/py39_12`, in a cell.

If you are using Jupyter Notebook on localhost (port 8888 by default), you can restart Jupyter Notebook after activating the new environment.

Now verify the python version:

In [13]:
#!conda init bash
!conda activate py39_12


CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.




In [1]:
%%bash

python --version
which python
conda env list

Python 3.9.12
/usr/local/Caskroom/miniforge/base/envs/py39_12/bin/python
# conda environments:
#
                         /Users/flora/miniforge3
                         /Users/flora/miniforge3/envs/tf38
base                     /usr/local/Caskroom/miniforge/base
py39                     /usr/local/Caskroom/miniforge/base/envs/py39
py39_12               *  /usr/local/Caskroom/miniforge/base/envs/py39_12
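Shell commands report what the shell sees, which is not always the interpreter the notebook kernel is running. A minimal sketch for asking the kernel itself is shown below. `describe_interpreter` is a hypothetical helper; `CONDA_DEFAULT_ENV` is the variable conda sets on activation, and it may be missing if the kernel was started some other way.

```python
import os
import sys


def describe_interpreter():
    """Collect a few facts about the Python process running this code."""
    return {
        "executable": sys.executable,    # path of the running interpreter
        "version": ".".join(map(str, sys.version_info[:3])),
        "prefix": sys.prefix,            # root of the active environment
        # set by `conda activate`; None if the variable is absent
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```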



<details>
<summary>Click to see how to add the Conda env to your Jupyter Notebook kernels</summary>

```
pip install ipykernel
python -m ipykernel install --user --name=py39_12
```

Next time you launch jupyter notebook, you will see `py39_12` as an option under Kernel/Change kernel.
</details>




### Delete an env

To keep a lean list of environments, we want to prune unused environments from time to time. Simply run the following:
```
conda remove --name old_env --all --yes
```

## Pip: python package manager

We recommend using pip as your Python package installer/manager ([fun read on pip vs conda](https://stackoverflow.com/questions/20994716/what-is-the-difference-between-pip-and-conda)). Note that you don't have to install pip explicitly, since it was installed along with miniconda/miniforge.

In [2]:
!which pip

/usr/local/Caskroom/miniforge/base/envs/py39_12/bin/pip


### Install package

We can certainly run `pip` commands the same way as earlier. Just for fun, let's try Jupyter line magic `%pip`. 

First, let's make sure we have the most recent version of `pip` installed.

In [3]:
%pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.


In [4]:
%pip install numpy pandas

Collecting numpy
  Downloading numpy-1.22.3-cp38-cp38-macosx_11_0_arm64.whl (12.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.7/12.7 MB 6.3 MB/s eta 0:00:00
Collecting pandas
  Downloading pandas-1.4.2-cp38-cp38-macosx_11_0_arm64.whl (9.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.9/9.9 MB 6.1 MB/s eta 0:00:00
Installing collected packages: numpy, pandas
Successfully installed numpy-1.22.3 pandas-1.4.2
Note: you may need to restart the kernel to use updated packages.
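To confirm what ended up in the kernel's environment without importing the packages, the standard-library `importlib.metadata` module can be queried. This is a small sketch; `installed_version` is a hypothetical helper, and it returns `None` for anything that is not installed.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ["numpy", "pandas"]:
    print(name, installed_version(name))
```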


Now you can use these packages:

In [4]:
# allow multiple outputs in a single cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [3]:
import numpy as np

# create a numpy array with random numbers
x = np.random.rand(10)
x.shape
x.reshape((5,-1))

array([[0.56683762, 0.83438395],
       [0.91538766, 0.16309203],
       [0.76082512, 0.49287772],
       [0.04812872, 0.03178841],
       [0.86329904, 0.79072763]])

In [5]:
%precision 3
x.reshape((5,-1))

'%.3f'

array([[0.567, 0.834],
       [0.915, 0.163],
       [0.761, 0.493],
       [0.048, 0.032],
       [0.863, 0.791]])

In [6]:
import pandas as pd
census_df = pd.read_csv(filepath_or_buffer='../dat/adult.csv', header=None)
census_df.head()
census_df.tail()
census_df.shape

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 32557 | 40 | Private | 154374 | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32558 | 58 | Private | 151910 | HS-grad | 9 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32559 | 22 | Private | 201490 | HS-grad | 9 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 20 | United-States | <=50K |
| 32560 | 52 | Self-emp-inc | 287927 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 15024 | 0 | 40 | United-States | >50K |


(32561, 15)

### Uninstall packages

In [9]:
# Uninstall a package
%pip uninstall numpy pandas --yes

Found existing installation: numpy 1.22.3
Uninstalling numpy-1.22.3:
  Successfully uninstalled numpy-1.22.3
Found existing installation: pandas 1.4.2
Uninstalling pandas-1.4.2:
  Successfully uninstalled pandas-1.4.2
Note: you may need to restart the kernel to use updated packages.


In [10]:
%pip cache purge

Files removed: 5
Note: you may need to restart the kernel to use updated packages.
