# Unix - Conda - Pip

We usually run these commands in a command-line interface (CLI) such as Terminal, but here we use a Jupyter Notebook to demonstrate:

1. The output of each command.
2. That when experimenting or developing programs in the notebook, you can avoid switching tools.
3. That you can install packages at the beginning of the notebook (e.g., in SageMaker), so it's clear which packages are used.

## Unix commands

### Basics

Some basic commands can be run directly in a cell:

In [3]:
pwd # print working directory

'/home/kidcoconut/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools/nb'

You always want to know "where you are" and how to look for help. Use `man` to display the user manual of any command you can run in the terminal.

In [2]:
man pwd # press ESC to close the pager (in a terminal, press q instead)

The cell magic `%%bash` runs the whole cell as a bash script, so you can run multiple lines at once (see more cell/line magics [here](https://ipython.readthedocs.io/en/stable/interactive/magics.html)).

In [4]:
%%bash
pwd
ls -lah # WE DO: What are those flags for?

#--- 202010_IM: the flags are for the following
#--- -l: use the long listing format (permissions, owner, size, date)
#--- -a: list all entries, including hidden ones whose names start with .
#--- -h: with -l, print human readable sizes;  eg 4.0K, 56K

/home/kidcoconut/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools/nb
total 68K
drwxr-xr-x 3 kidcoconut kidcoconut 4.0K Oct 24 16:44 .
drwxr-xr-x 6 kidcoconut kidcoconut 4.0K Oct 24 16:30 ..
drwxr-xr-x 2 kidcoconut kidcoconut 4.0K Oct 24 16:29 .ipynb_checkpoints
-rw-r--r-- 1 kidcoconut kidcoconut  56K Oct 24 16:44 unix-conda-pip.ipynb
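To see what `-h` does to the size column, here is a small Python sketch that approximates GNU `ls -lh` formatting. It assumes GNU coreutils behaviour as we understand it: sizes are divided by 1024 and rounded up, with one decimal shown for values below 10. `human_size` is a hypothetical helper, not a library function.

```python
import math

def human_size(num_bytes: int) -> str:
    """Approximate the size column of `ls -lh` (powers of 1024, rounded up)."""
    if num_bytes < 1024:
        return str(num_bytes)             # small sizes are printed as plain bytes
    value = float(num_bytes)
    for unit in ("K", "M", "G", "T", "P"):
        value /= 1024
        tenths = math.ceil(value * 10) / 10
        if tenths < 10:
            return f"{tenths:.1f}{unit}"  # one decimal below 10, e.g. 4.0K
        whole = math.ceil(value)
        if whole < 1024:
            return f"{whole}{unit}"       # whole numbers from 10 up, e.g. 56K
    return f"{math.ceil(value)}P"         # absurdly large: stay in petabytes

print(human_size(4096))     # 4.0K
print(human_size(57344))    # 56K
```

A 4096-byte directory entry comes out as `4.0K`, which matches the directory rows in the listing above.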


### Download a data file

Let's download some data from the [Census Income Data Set](https://archive.ics.uci.edu/ml/datasets/Census+Income). First, though, let's create a new directory at the same level as the `nb` directory to store the data.

In [1]:

#--- 202210_IM:  pre-reqs;  project root directory is /MLE-10;  prep dir
%cd ~/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools/nb
%cd ..


/home/kidcoconut/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools/nb
/home/kidcoconut/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools


In [7]:
%%bash
# WE DO: make a new directory called `dat` under the project root directory

#--- 202210_IM:  -p: no error if the dir already exists (and create any missing parent dirs)
pwd
mkdir -p dat

/home/kidcoconut/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools
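In Python, the closest equivalent of `mkdir -p` is `os.makedirs(..., exist_ok=True)`. The sketch below wraps it in a hypothetical helper, `make_data_dir`, and shows that calling it twice is harmless, just like re-running the cell above.

```python
import os
import tempfile

def make_data_dir(root: str, name: str = "dat") -> str:
    """Create <root>/<name> if it is missing, like `mkdir -p`; return its path."""
    path = os.path.join(root, name)
    # exist_ok=True: no error if the directory already exists (the -p behaviour);
    # makedirs also creates any missing parent directories, as -p does.
    os.makedirs(path, exist_ok=True)
    return path

# Usage: calling it twice is safe, just like running `mkdir -p dat` twice.
with tempfile.TemporaryDirectory() as tmp:
    first = make_data_dir(tmp)
    second = make_data_dir(tmp)
    print(first == second, os.path.isdir(first))  # True True
```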


Let's confirm that we just created an empty directory `dat` using the `tree` command. Not all commands can be run directly in a cell; adding `!` in front of a command (or using `%%bash`, as below) does the trick:


    #--- 202210_IM: PREREQ - make sure that the tree package is installed.
                         Ensure that name resolution is properly configured first (apt-get needs it):


    #--- UC:  ensure that /etc/resolv.conf does not get overwritten in WSL ...
    - edit /etc/wsl.conf to prevent the rewrite (under [network], set generateResolvConf = false)
    - in PowerShell, run:   wsl --shutdown
    - open Linux, create /etc/resolvconf/resolv.conf
    - add nameservers to /etc/resolv.conf (softlink)
    - reopen the shell and confirm there is no rewrite


    sudo apt-get install tree

In [8]:
%%bash

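#--- 202210_IM:  NOTE - for some reason, the dir has reverted to the orig
#---             Q:  are cells initialized each with their own bash env?
#---             A:  yes; each %%bash cell runs in a new bash subprocess, so a cd inside it does not carry over
#---             Q:  how do you preserve locs between cells?
#---             A:  change the kernel's working dir with %cd (as above); every new subprocess starts from there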
pwd
tree   #--- IM:  list dir and contents in a tree structure

/home/kidcoconut/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools
.
├── LICENSE
├── README.md
├── dat
│   └── adult.csv
├── md
│   └── git-more.md
├── nb
│   ├── dat
│   └── unix-conda-pip.ipynb
└── pandas-sklearn-basics
    └── pandas-sklearn-basics.ipynb

5 directories, 6 files
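The question in the comments above can be checked from Python. This is a minimal sketch, assuming a POSIX shell (`/bin/sh`) is available. Each `subprocess.run(..., shell=True)` call starts a fresh child shell, much like each `%%bash` cell, so a `cd` inside it is lost once the call returns. The variable names are illustrative only.

```python
import os
import subprocess
import tempfile

start = os.getcwd()              # the kernel's working dir (what %cd changes)
target = tempfile.gettempdir()   # any existing directory works for the demo

# First child shell: cd somewhere else and report where it ended up (-P = physical path).
inside = subprocess.run(
    f'cd "{target}" && pwd -P', shell=True, capture_output=True, text=True
).stdout.strip()

# Second child shell: a brand-new process that starts from the kernel's working dir again.
after = subprocess.run(
    "pwd -P", shell=True, capture_output=True, text=True
).stdout.strip()

print(os.path.samefile(inside, target))  # True: the cd worked inside the first shell
print(os.path.samefile(after, start))    # True: ...but it did not carry over
```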


In [9]:
%%bash

#--- download csv file
wget https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data -O ./dat/adult.csv

--2022-10-24 16:51:15--  https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3974305 (3.8M) [application/x-httpd-php]
Saving to: ‘./dat/adult.csv’

     0K .......... .......... .......... .......... ..........  1%  256K 15s
    50K .......... .......... .......... .......... ..........  2%  508K 11s
   100K .......... .......... .......... .......... ..........  3% 4.36M 8s
   150K .......... .......... .......... .......... ..........  5% 10.1M 6s
   200K .......... .......... .......... .......... ..........  6%  491K 6s
   250K .......... .......... .......... .......... ..........  7% 3.39M 5s
   300K .......... .......... .......... .......... ..........  9% 1.85M 5s
   350K .......... .......... .......... .......... .......... 10% 1.57M 4s
   4
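If `wget` is not available (macOS, for example, does not install it by default), Python's standard library can do the same job. This is a minimal sketch, assuming the legacy but still available `urllib.request.urlretrieve` is good enough for a single small file. `download` is a hypothetical helper name. It also creates the `dat` directory first, which combines the `mkdir -p` and `wget -O` steps.

```python
import os
import urllib.request

def download(url: str, dest: str) -> str:
    """Fetch `url` into `dest`, creating the parent dir first (mkdir -p + wget -O)."""
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    path, _headers = urllib.request.urlretrieve(url, dest)
    return path

# Usage (needs network access; same URL as the wget cell above):
# download("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
#          "./dat/adult.csv")
```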

In [12]:
%%bash
pwd            #--- working dir remains the project root;  
               #    expected ~/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools

tree . -L 2   #--- generate the tree, limiting the depth to 2 layers

/home/kidcoconut/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools
.
├── LICENSE
├── README.md
├── dat
│   └── adult.csv
├── md
│   └── git-more.md
├── nb
│   ├── dat
│   └── unix-conda-pip.ipynb
└── pandas-sklearn-basics
    └── pandas-sklearn-basics.ipynb

5 directories, 6 files


### Inspect data

Let's inspect the first 10 lines of the data (`head` prints 10 lines by default):

In [15]:
%%bash
pwd                    #--- check your starting point

head ./dat/adult.csv
echo '>>>END'

/home/kidcoconut/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K
37, Private, 284582, Masters, 14, Married-civ-spouse, Exec-managerial, Wife, White, Female, 0, 0, 40, United-States, <=50K
49, Private, 160187, 9th, 5, Married-spouse-absent, Other-service, Not-in-family, Black, Female, 0, 0, 16, Jamaica, <=50K
52, Self-emp-not-inc, 209642, HS-grad, 9, Marr
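For comparison, here is the same "first N lines" idea in Python. `itertools.islice` stops reading after N lines, so, like `head`, it does not load the whole file. `head` here is a hypothetical helper that mirrors the CLI default of 10 lines.

```python
from itertools import islice

def head(path: str, n: int = 10) -> list:
    """Return the first n lines of a text file, like `head -n n` (default 10)."""
    with open(path) as f:
        # islice stops after n lines, so the rest of the file is never read.
        return [line.rstrip("\n") for line in islice(f, n)]

# Usage:
# for line in head("./dat/adult.csv"):
#     print(line)
```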

How about the last 3 lines? (Wait, I only see 2 lines. What's wrong?)

#--- 202210_IM:  the last line of the file is empty (the file ends with an extra newline), so tail prints 2 records plus a blank line

In [16]:
%%bash
pwd                    #--- check your starting point

tail -n 3 ./dat/adult.csv
echo '>>>END'

/home/kidcoconut/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools
22, Private, 201490, HS-grad, 9, Never-married, Adm-clerical, Own-child, White, Male, 0, 0, 20, United-States, <=50K
52, Self-emp-inc, 287927, HS-grad, 9, Married-civ-spouse, Exec-managerial, Wife, White, Female, 15024, 0, 40, United-States, >50K

>>>END
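A Python sketch of `tail -n 3` makes the mystery visible. A `collections.deque` with `maxlen=n` keeps only the last n lines while streaming through the file. If the last returned line is `''`, the file ends with an empty line, which is what the blank line before `>>>END` suggests. `tail` is a hypothetical helper name.

```python
from collections import deque

def tail(path: str, n: int = 3) -> list:
    """Return the last n lines of a text file, like `tail -n n`."""
    with open(path) as f:
        # The deque discards older lines as new ones arrive, keeping only the last n.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]

# Usage:
# print(tail("./dat/adult.csv"))   # a trailing '' means the file ends with an empty line
```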


How many records are there?

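#--- syntax
- `wc`: word count command
- `-l`: count lines (newline characters); the empty line at the end of this file is counted too, so there is one fewer data record than reported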

In [17]:
!wc -l ./dat/adult.csv

32562 ./dat/adult.csv
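`wc -l` counts newline characters, not records, so a trailing empty line inflates the count by one. The sketch below (a hypothetical `count_lines` helper) reports both numbers side by side. For `adult.csv` we would expect them to differ by one, matching `wc -l` (32562) and the pandas row count further down (32561). That expectation is an inference from the outputs in this notebook.

```python
def count_lines(path: str) -> tuple:
    """Return (newline count, non-blank line count) for a text file."""
    newlines = 0
    non_blank = 0
    with open(path) as f:
        for line in f:
            if line.endswith("\n"):
                newlines += 1        # what `wc -l` counts
            if line.strip():
                non_blank += 1       # lines that actually hold data
    return newlines, non_blank

# Usage:
# print(count_lines("./dat/adult.csv"))
```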


Challenge: how many columns are there?

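#--- syntax
- `head -1`: output only the first line (equivalent to `head -n 1`)
- `| sed 's/[^,]//g'`: pipe the output to `sed`, the stream editor; `s/<regex>/<replacement>/` substitutes matches
- `[^,]`: negated character class; matches any character that is not a comma
- `//`: the replacement is empty, so every match is deleted
- `g`: apply the substitution globally, to every match on the line, not just the first
- `| wc -c`: pipe to the word count command; `-c` counts bytes (14 commas + 1 trailing newline = 15, which happens to equal the column count, commas + 1)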


In [18]:
%%bash

#--- step 1:  breaking down the #cols step
#--- get the first line from the data file
head -1 ./dat/adult.csv

39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K


In [19]:
%%bash

#--- step 2:  breaking down the #cols step
#--- regex parse the first line; substitute globally anything that is not a comma with nothing (i.e., delete it)
#--- NOTE:  there are only 14 commas!
head -1 ./dat/adult.csv | sed 's/[^,]//g'

,,,,,,,,,,,,,,


In [20]:
%%bash

#--- step 3:  breaking down the #cols step
#--- perform a word/byte count on the output stream
#--- WARN:  this code depends on the trailing newline byte for a correct tally (14 commas + 1 newline = 15 = #cols)
head -1 ./dat/adult.csv | sed 's/[^,]//g' | wc -c

15


In [21]:
!head -1 ./dat/adult.csv | sed 's/[^,]//g' | wc -c

15
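The `sed`/`wc` trick counts commas plus the trailing newline. Python's `csv` module counts fields directly and also copes with quoted commas, which the regex approach would miscount. `count_columns` is a hypothetical helper. `skipinitialspace=True` drops the space after each comma in this file.

```python
import csv
import io

def count_columns(first_line: str) -> int:
    """Count the fields in one CSV line, respecting quoted commas."""
    reader = csv.reader(io.StringIO(first_line), skipinitialspace=True)
    return len(next(reader))

line = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
        "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K")
print(count_columns(line))    # 15
print(line.count(",") + 1)    # 15: commas + 1, same answer for unquoted data
```

To apply it to the file, pass in the first line, e.g. `open("./dat/adult.csv").readline()`. The csv reader handles the trailing newline itself.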


## Conda: environment manager + package manager

You might have heard about Anaconda / Miniconda / Miniforge:

- What is the difference between `anaconda` and `miniconda`? [An answer](https://stackoverflow.com/questions/45421163/anaconda-vs-miniconda).
- What is the difference between `miniconda` and `miniforge`? [An answer](https://stackoverflow.com/questions/60532678/what-is-the-difference-between-miniconda-and-miniforge).

Although Conda is both a package manager and an environment manager, we focus on using it as an environment manager and use `pip` as the package manager.


    #--- 202210_IM:  note that conda and miniconda also perform pkg mgmt through 'conda install'.  Is this an alias to pip?
    Answer:  No. conda install pulls packages from conda channels and is not specific to python;  pip installs python pkgs (from PyPI by default)
    Recommended:  for our course, we will use pip for python pkg mgmt only

### Installation - Miniconda

First, install `miniconda` (Feel free to skip if you have already installed it, or any flavor of conda).



    #--- 202210_IM:  install miniconda for linux
    NOTE:  running this may replace your current conda, and jupyter (these instructions) may not run anymore
    NOTE:  you may need to re-run conda init <shell>
    https://educe-ubc.github.io/conda.html#:~:text=Installing%20Miniconda%201%20Install%20Miniconda%20by%20entering%3A%20bash,download%20any%20files%20using%20CLI%3A%20conda%20install%20wget


#--- install miniconda for MacOS
Run the following commands in a terminal to download the latest Miniconda distribution and install it (Intel-based Mac):

```
cd ~
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh
```

macOS does not ship with `wget`; if it is missing, `curl -O <url>` downloads the same file.

For Apple Silicon (M1) Macs, install `miniforge` instead. If you are new to `brew`, follow the instructions [here](https://brew.sh) to install Homebrew first:

```
brew install miniforge
```

The output of these installations is rather long, so please run them in a terminal window instead.

Confirm that the installation was successful:

In [22]:
%%bash
which conda  # where is the conda executable on my PATH?
conda -V     # which version of conda?
which python # where is the python executable on my PATH?
python -V    # which version of python?

/home/kidcoconut/miniconda3/bin/conda
conda 22.9.0
/home/kidcoconut/miniconda3/bin/python
Python 3.9.12
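`which python` in a `%%bash` cell reports whatever comes first on the shell's PATH. That is not necessarily the interpreter running the notebook kernel. The small sketch below asks the kernel itself, using `sys.executable`, `sys.version_info` and `shutil.which`. `interpreter_info` is a hypothetical helper name.

```python
import shutil
import sys

def interpreter_info() -> dict:
    """Describe the Python running this kernel and the conda found on PATH."""
    return {
        "executable": sys.executable,                            # the kernel's interpreter
        "version": "{}.{}.{}".format(*sys.version_info[:3]),     # e.g. 3.9.12
        "conda": shutil.which("conda"),                          # None if conda is not on PATH
    }

for key, value in interpreter_info().items():
    print(f"{key:>10}: {value}")
```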


### Create a new env

Create a new conda environment named `py39_12` with Python 3.9.12. The final flag `--yes` skips the confirmation prompt.

In [2]:
!conda create --name py39_12 python=3.9.12 --yes

Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/kidcoconut/miniconda3/envs/py39_12

  added / updated specs:
    - python=3.9.12


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pip-22.2.2                 |   py39h06a4308_0         2.3 MB
    python-3.9.12              |       h12debd9_1        19.2 MB
    ------------------------------------------------------------
                                           Total:        21.5 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main None
  _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu None
  ca-certificates    pkgs/main/li

In [23]:
# YOUR CODE HERE
# list all environments in conda
!conda info --envs

# conda environments:
#
                         /home/kidcoconut/anaconda3
                         /home/kidcoconut/anaconda3/envs/mle-course
base                     /home/kidcoconut/miniconda3
py39_12               *  /home/kidcoconut/miniconda3/envs/py39_12



You should see something like this:
```
# conda environments:
#
                         /Users/flora/miniforge3
                         /Users/flora/miniforge3/envs/tf38
base                  *  /usr/local/Caskroom/miniforge/base
py39                     /usr/local/Caskroom/miniforge/base/envs/py39
py39_12                  /usr/local/Caskroom/miniforge/base/envs/py39_12
```
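The listing format above is easy to read programmatically:

- lines starting with `#` are headers;
- `*` marks the active environment;
- environments that conda cannot name (for example, ones belonging to another conda installation) show only a path.

The sketch below (a hypothetical `parse_conda_envs`) assumes paths contain no spaces. If your conda version supports `conda env list --json`, that is likely the more robust route.

```python
from typing import List, NamedTuple, Optional

class CondaEnv(NamedTuple):
    name: Optional[str]   # None for environments listed by path only
    path: str
    active: bool          # True for the row marked with *

def parse_conda_envs(text: str) -> List[CondaEnv]:
    """Parse the plain-text output of `conda info --envs` / `conda env list`."""
    envs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                              # skip headers and blank lines
        tokens = line.split()
        active = "*" in tokens
        parts = [tok for tok in tokens if tok != "*"]
        if len(parts) == 1:
            envs.append(CondaEnv(None, parts[0], active))
        else:
            envs.append(CondaEnv(parts[0], parts[-1], active))
    return envs

sample = """# conda environments:
#
                         /Users/flora/miniforge3
base                  *  /usr/local/Caskroom/miniforge/base
py39_12                  /usr/local/Caskroom/miniforge/base/envs/py39_12
"""
for env in parse_conda_envs(sample):
    print(env)

# With conda on PATH you could feed it the live output instead:
# import subprocess
# text = subprocess.run(["conda", "info", "--envs"], capture_output=True, text=True).stdout
```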

### Activate an env

By default, you are in the `base` environment. To activate the new environment, run `conda activate py39_12` in a terminal.


    #--- 202210_IM:  NOTE that I had to get back into mle-course to launch the jupyter notebook (this)
    ~/miniconda3/bin/conda init
    conda activate py39_12


    #--- 202210_IM:  select the python interpreter in VSCode;  bottom right; in blue bar where it specifies the env and python version, e.g 3.8.2 64-bit.  Note:  changed to 3.9.12('py39_12':conda)



If you are using VS Code, click the button at the top right to switch the Python kernel. You will be prompted to install `ipykernel`; follow the instructions to install the package. Alternatively, you can run `source activate`, followed by `conda activate /usr/local/Caskroom/miniforge/base/envs/py39_12`, in a cell (note that this only affects the shell started by that cell, not the running kernel).

If you are running Jupyter Notebook on localhost (port 8888 by default), restart the notebook server after activating the new environment.

    #--- 202210_IM:  In VSCode I installed jupyter into miniconda; under activated py39_12;  pip install jupyter-core


Now verify the Python version:

In [24]:
%%bash

echo "INFO:  Python version ... (expected:  Python 3.9.12)"
python --version
echo 

echo "INFO:  which python ... (expected:  miniconda)"
which python
echo

echo "INFO:  which conda ... (expected:  miniconda)"
which conda
echo 

echo "INFO:  conda env list ..."
conda env list
echo


INFO:  Python version ... (expected:  Python 3.9.12)
Python 3.9.12

INFO:  which python ... (expected:  miniconda)
/home/kidcoconut/miniconda3/bin/python

INFO:  which conda ... (expected:  miniconda)
/home/kidcoconut/miniconda3/bin/conda

INFO:  conda env list ...
# conda environments:
#
                         /home/kidcoconut/anaconda3
                         /home/kidcoconut/anaconda3/envs/mle-course
base                     /home/kidcoconut/miniconda3
py39_12               *  /home/kidcoconut/miniconda3/envs/py39_12




<details>
<summary>Click to see how to add the Conda env to your Jupyter Notebook kernels</summary>
    
    #--- 202210_IM:  very cool;  TODO - look into jupyter metadata and options
                     Q:  do I run the below in my Terminal env, as well as in VSCode Terminal env?
                        - done in Windows WSL Terminal for py39_12 env
                        - done in VSCode Terminal for py39_12 env

```
pip install ipykernel
python -m ipykernel install --user --name=py39_12
```

Next time you launch jupyter notebook, you will see `py39_12` as an option under Kernel/Change kernel.
</details>


### Delete an env

To keep a lean list of environments, prune unused environments from time to time. Simply run the following:
```
conda remove --name old_env --all --yes
```

## Pip: python package manager

We recommend using pip as your Python package installer/manager ([fun read on pip vs conda](https://stackoverflow.com/questions/20994716/what-is-the-difference-between-pip-and-conda)). Note that you don't have to install pip explicitly, since it was installed along with miniconda/miniforge.

#--- 202210_IM:  to oversimplify ... use pip for python package mgmt; use conda for virtual env mgmt

In [25]:
%%bash

which pip       #--- determine which pip binary is being used
echo
pip --version   #--- determine the active version # for pip


/home/kidcoconut/miniconda3/envs/py39_12/bin/pip

pip 22.3 from /home/kidcoconut/miniconda3/envs/py39_12/lib/python3.9/site-packages/pip (python 3.9)


### Install package

We can certainly run `pip` commands the same way as earlier. Just for fun, let's try the Jupyter line magic `%pip`, which runs pip for the Python environment of the running kernel.

First, make sure that the most recent version of `pip` is installed:

In [26]:
%pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.


In [27]:
%pip install numpy pandas

Note: you may need to restart the kernel to use updated packages.
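`%pip` installs into the environment of the Python interpreter behind the kernel. A comparable, explicit approach is `python -m pip` with `sys.executable`. The sketch below only builds that command (hypothetical `pip_command` helper) and leaves the actual `subprocess.run` call commented out.

```python
import sys

def pip_command(*args: str) -> list:
    """Build a pip command bound to this kernel's interpreter: python -m pip ..."""
    return [sys.executable, "-m", "pip", *args]

cmd = pip_command("install", "numpy", "pandas")
print(" ".join(cmd))

# To actually run it (this installs packages into the kernel's environment):
# import subprocess
# subprocess.run(cmd, check=True)
```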


Now you can use these packages:

In [28]:
# allow multiple outputs in a single cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [29]:
import numpy as libNumpy

#--- create a numpy array and fill with random nums
print("INFO:  create array and initialize ...")
aryTemp = libNumpy.random.rand(10)     #--- create a random array with 10 elements

print("INFO:  output the array dims ...")
aryTemp.shape                          #--- output the array dims;  ie (10,), a 1-D array of 10 elements

print("INFO:  re-shape; re-dim the array to 5 rows")
aryTemp.reshape((5,-1))                #--- redim to 5 rows;  -1 lets numpy infer the cols: 10/5 = 2, same as (5,2)


INFO:  create array and initialize ...
INFO:  output the array dims ...


(10,)

INFO:  re-shape; re-dim the array to 5 rows


array([[0.59943858, 0.42000102],
       [0.24039002, 0.21928006],
       [0.5218054 , 0.9342311 ],
       [0.47931219, 0.273564  ],
       [0.59309076, 0.91393576]])

In [30]:
#--- adjust the precision of values to 3 decimals
%precision 3

#--- re-display the reshaped array (reshape returns a new array;  aryTemp itself keeps shape (10,))
aryTemp.reshape((5,-1))

'%.3f'

array([[0.599, 0.42 ],
       [0.24 , 0.219],
       [0.522, 0.934],
       [0.479, 0.274],
       [0.593, 0.914]])
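A short sketch of the two points in the comments above:

- `-1` tells `reshape` to infer that axis from the total size;
- `reshape` returns a new array (here a view sharing the same data) rather than changing `aryTemp` in place.

It uses a seeded `numpy.random.default_rng` so the run is reproducible. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded generator, so reruns give the same numbers
ary = rng.random(10)                  # 1-D array: shape (10,)
reshaped = ary.reshape((5, -1))       # -1: numpy infers this axis, 10 / 5 = 2

print(ary.shape)                        # (10,)
print(reshaped.shape)                   # (5, 2)
print(np.shares_memory(ary, reshaped))  # True: a view on the same data, ary is unchanged

try:
    ary.reshape((3, -1))                # 10 elements cannot be split into 3 equal rows
except ValueError as err:
    print("ValueError:", err)
```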

In [32]:
#--- 202210_IM:  ensure we are in the correct working dir;  expected:  ~/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools
!pwd
#%cd ~/myLurnins/fourthbrain.ai/code/whodunnit


/home/kidcoconut/myLurnins/fourthbrain.ai/code/MLE-10/assignments/week-01-mle-basictools


In [33]:
#--- dataframe creation and manipulation exercise
import pandas as libPandas
strFilPath = './dat/adult.csv'
dtfCensus = libPandas.read_csv(filepath_or_buffer=strFilPath, header = None)
dtfCensus.head()                 #--- output the first 5 rows (default) of the dataframe
dtfCensus.tail()                 #--- output the last 5 rows (default) of the dataframe
dtfCensus.shape                  #--- output the dimension of the dataframe (32561, 15)


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
32556,27,Private,257302,Assoc-acdm,12,Married-civ-spouse,Tech-support,Wife,White,Female,0,0,38,United-States,<=50K
32557,40,Private,154374,HS-grad,9,Married-civ-spouse,Machine-op-inspct,Husband,White,Male,0,0,40,United-States,>50K
32558,58,Private,151910,HS-grad,9,Widowed,Adm-clerical,Unmarried,White,Female,0,0,40,United-States,<=50K
32559,22,Private,201490,HS-grad,9,Never-married,Adm-clerical,Own-child,White,Male,0,0,20,United-States,<=50K
32560,52,Self-emp-inc,287927,HS-grad,9,Married-civ-spouse,Exec-managerial,Wife,White,Female,15024,0,40,United-States,>50K


(32561, 15)
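Note that `adult.csv` separates fields with `", "`, so with the default settings every text value keeps a leading space (e.g., `' State-gov'`). `read_csv` has a `skipinitialspace` parameter for exactly this case. The sketch below compares both settings on two rows copied from the output above, using `io.StringIO` instead of the file.

```python
import io

import pandas as pd

# Two rows in the same ", "-separated layout as adult.csv.
raw = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, "
    "White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, "
    "Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)

plain = pd.read_csv(io.StringIO(raw), header=None)                          # default settings
clean = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)   # strip the space after ","

print(repr(plain.loc[0, 1]))   # ' State-gov' (leading space kept)
print(repr(clean.loc[0, 1]))   # 'State-gov'
print(clean.shape)             # (2, 15)
```

For the real file, the same flag applies: `libPandas.read_csv(strFilPath, header=None, skipinitialspace=True)`.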

### Uninstall packages

In [2]:
# Uninstall a package
%pip uninstall numpy pandas --yes

Found existing installation: numpy 1.23.4
Uninstalling numpy-1.23.4:
  Successfully uninstalled numpy-1.23.4
Found existing installation: pandas 1.5.1
Uninstalling pandas-1.5.1:
  Successfully uninstalled pandas-1.5.1
Note: you may need to restart the kernel to use updated packages.
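To confirm the uninstall from Python, use `importlib.metadata.version`, which raises `PackageNotFoundError` for a distribution that is not installed. Keep in mind that modules already imported into a running kernel stay in memory until you restart it, which is why pip prints the restart note. `installed_version` is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

for name in ("numpy", "pandas"):
    # Expected to print None for both once the uninstall above has succeeded.
    print(name, installed_version(name))
```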


In [3]:
%pip cache purge

Files removed: 173
Note: you may need to restart the kernel to use updated packages.
