## Lecture 2: Environments

### Bash Environment

- The **Bash environment** is defined by a set of environment variables used by the shell and the OS.
- An **environment variable** is a key-value pair that programs and shell scripts read to configure their behavior.

In [1]:
%%bash
#to set an environment variable, we use the export command
export MY_VARIABLE="value1"
echo $MY_VARIABLE

value1


In [None]:
%%bash
# printenv lists the environment variables currently set in the bash session
# note that MY_VARIABLE does not appear here: in a Jupyter notebook, each %%bash cell starts a new bash session
printenv

VSCODE_CRASH_REPORTER_PROCESS_TYPE=extensionHost
TERM=xterm-color
SHELL=/bin/zsh
CLICOLOR=1
TMPDIR=/var/folders/qb/pd0pzd3x0jj2c_2nkn2hmrxw0000gn/T/
HOMEBREW_REPOSITORY=/opt/homebrew
CONDA_SHLVL=2
PYTHONUNBUFFERED=1
CONDA_PROMPT_MODIFIER=(data-science-general) 
ORIGINAL_XDG_CURRENT_DESKTOP=undefined
MallocNanoZone=0
PYDEVD_USE_FRAME_EVAL=NO
PYTHONIOENCODING=utf-8
_CONDA_EXE=/opt/anaconda3/bin/conda
USER=yuhaohuo
COMMAND_MODE=unix2003
CONDA_EXE=/opt/anaconda3/bin/conda
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.UWfEOVzF3i/Listeners
__CF_USER_TEXT_ENCODING=0x1F5:0x19:0x34
PAGER=cat
COLUMNS=80
ELECTRON_RUN_AS_NODE=1
_CE_CONDA=
CONDA_ROOT=/opt/anaconda3
CONDA_PREFIX_1=/opt/anaconda3
PATH=/opt/anaconda3/envs/data-science-general/bin:/opt/anaconda3/condabin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/
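
The behaviour noted in the cell above (a variable exported in one `%%bash` cell is not visible in the next) can be reproduced from Python. This is a minimal sketch, assuming `bash` is available on `PATH`; `run_bash` is a hypothetical helper, and `MY_VARIABLE` follows the earlier cell.

```python
import os
import subprocess


def run_bash(script, env=None):
    """Run a script in a fresh bash process and return its stdout (stripped)."""
    result = subprocess.run(
        ["bash", "-c", script],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout.strip()


# Start from a copy of the current environment without MY_VARIABLE.
base_env = {k: v for k, v in os.environ.items() if k != "MY_VARIABLE"}

# Process 1: export and read the variable within the same bash session.
same_session = run_bash('export MY_VARIABLE="value1"; echo "$MY_VARIABLE"', env=base_env)

# Process 2: a new bash session does not see what process 1 exported.
new_session = run_bash('echo "${MY_VARIABLE:-unset}"', env=base_env)

# A child process does inherit variables present in the environment passed to it.
child_env = dict(base_env, MY_VARIABLE="value1")
inherited = run_bash('echo "$MY_VARIABLE"', env=child_env)

print(same_session, new_session, inherited)
```

The three calls correspond to three separate shell sessions: only the first and the third have the variable in their own environment.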

The main environment variables include:
1. **PATH:** lists the directories the shell searches for executable programs (programs found there can be called by name in the terminal, regardless of the current directory)
2. **HOME:** the path to the user's home directory
3. **USER:** the username
4. **PYTHONPATH:** additional directories where Python looks for modules and packages (can be empty, since Python also searches the current working directory and the paths added by the virtual environment)
5. **LD_LIBRARY_PATH** (Linux) / **DYLD_LIBRARY_PATH** (macOS): additional directories where the dynamic linker looks for shared libraries (such as math libraries)
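
The variables listed above can also be inspected from Python through `os.environ`. The following is a small sketch, not tied to any particular machine; the names `path_dirs` and `pythonpath_dirs` are chosen here for illustration.

```python
import os

# HOME and USER are usually set on macOS/Linux; .get() avoids a KeyError if they are not.
home = os.environ.get("HOME")
user = os.environ.get("USER")

# PATH is a single string; os.pathsep is ":" on macOS/Linux.
path_dirs = os.environ.get("PATH", "").split(os.pathsep)

# PYTHONPATH is often unset, in which case this becomes an empty list.
pythonpath_dirs = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]

print("HOME:", home)
print("USER:", user)
print("first PATH entries:", path_dirs[:3])
print("PYTHONPATH entries:", pythonpath_dirs)
```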

In [3]:
%%bash
# print PATH values
echo $PATH

/opt/anaconda3/envs/data-science-general/bin:/opt/anaconda3/condabin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin


In [None]:
%%bash
# we can make our own executable programs callable by adding their directory to the PATH variable
export PATH=$PATH:/Users/yuhaohuo/Desktop/code/rc_class/research_computing_notes # add to the end
# export PATH=/Users/yuhaohuo/Desktop/code/rc_class/research_computing_notes:$PATH  # add to the beginning
# the order of directories in PATH matters: directories are searched from left to right, and the first match is used
echo $PATH
# note that any change made to an environment variable through export is temporary (it is lost when the bash session ends)

/opt/anaconda3/envs/data-science-general/bin:/opt/anaconda3/condabin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Users/yuhaohuo/Desktop/code/rc_class/research_computing_notes
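
Why the order of directories matters can be checked with `shutil.which`, which searches a path string from left to right in the same way as the shell. This is a sketch assuming a macOS/Linux system; the tool name `mytool` and the helper `make_executable` are hypothetical.

```python
import os
import shutil
import stat
import tempfile


def make_executable(directory, name, message):
    """Create a tiny shell script called `name` in `directory` and mark it executable."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(f"#!/bin/sh\necho {message}\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    tool_a = make_executable(dir_a, "mytool", "from_a")
    tool_b = make_executable(dir_b, "mytool", "from_b")

    # The same program name resolves differently depending on which directory comes first.
    found_a_first = shutil.which("mytool", path=os.pathsep.join([dir_a, dir_b]))
    found_b_first = shutil.which("mytool", path=os.pathsep.join([dir_b, dir_a]))

    a_wins = os.path.samefile(found_a_first, tool_a)
    b_wins = os.path.samefile(found_b_first, tool_b)

print(a_wins, b_wins)
```

Prepending a directory therefore lets it shadow programs of the same name further along `PATH`, while appending it only fills in names that were not found earlier.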


**Note:** PATH, PYTHONPATH, and DYLD_LIBRARY_PATH (LD_LIBRARY_PATH on Linux) are all path variables; new directories are added to each of them in the same way as shown for PATH.

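#### Making changes to environment variables persistent:
1. Find the shell startup file: `~/.bash_profile` on macOS or `~/.bashrc` on Linux (`~/.zshrc` if the shell is zsh, the default on recent macOS)
2. Use vim or VS Code to add a line such as `export PATH=/path/to/your/program:$PATH`, or type in the terminal `echo 'export PATH=/path/to/your/program:$PATH' >> ~/.bashrc` (single quotes keep `$PATH` literal, so it is expanded each time the file is loaded rather than once when the line is written)
3. Restart the bash session, or run `source ~/.bashrc` to reload the file.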
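
Appending with `>>` adds the line again every time the command is run. One way to avoid duplicates is sketched below; it writes to a temporary stand-in file rather than a real `~/.bashrc`, and `add_line_once` is a hypothetical helper.

```python
import os
import tempfile


def add_line_once(rc_path, line):
    """Append `line` to the file at `rc_path` unless an identical line is already there.

    Returns True if the line was appended, False if it was already present.
    """
    existing = []
    if os.path.exists(rc_path):
        with open(rc_path) as f:
            existing = f.read().splitlines()
    if line in existing:
        return False
    with open(rc_path, "a") as f:
        f.write(line + "\n")
    return True


# $PATH is written literally so the shell expands it each time the file is sourced.
export_line = "export PATH=/path/to/your/program:$PATH"

with tempfile.TemporaryDirectory() as tmp:
    rc_file = os.path.join(tmp, "bashrc_demo")  # stand-in for ~/.bashrc
    first = add_line_once(rc_file, export_line)
    second = add_line_once(rc_file, export_line)
    with open(rc_file) as f:
        contents = f.read()

print(first, second)
print(contents, end="")
```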

### Python Environment

A Python environment is a directory that contains:
1. Python interpreter
2. Curated set of installed packages

Packages can conflict with each other, and some packages require a specific version of the Python interpreter. It is therefore good practice to create a new virtual environment for each project.

The Python interpreter is the program that runs Python code. On macOS it can be installed with Homebrew: `brew install python`

In [None]:
%%bash
# check the Python version
python --version
# we can start a Python session by typing python in the terminal
# >>> is the Python prompt: type Python code after it and press Enter to execute
# quit the Python session with quit()

Python 3.13.5


In [None]:
# show which implementation of the Python interpreter is running
import platform
platform.python_implementation()
# the implementation is the program that executes the code we write in the Python language; CPython is the reference implementation, written in C

'CPython'

In [11]:
# check the version of Python inside python session
import sys
print(sys.version)

3.13.5 | packaged by Anaconda, Inc. | (main, Jun 12 2025, 11:23:37) [Clang 14.0.6 ]
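
Alongside the version, it is often useful to know which interpreter binary is running and whether it belongs to a virtual environment. The sketch below uses only the standard library; comparing `sys.prefix` with `sys.base_prefix` detects environments created by `venv`, and it is not expected to flag conda environments.

```python
import platform
import sys

# Path of the interpreter binary running this code.
print("executable:    ", sys.executable)
# Implementation and version of that interpreter.
print("implementation:", platform.python_implementation())
print("version:       ", platform.python_version())

# Inside a venv, sys.prefix points at the environment, while sys.base_prefix
# points at the installation the environment was created from.
in_venv = sys.prefix != sys.base_prefix
print("inside a venv: ", in_venv)
```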


#### Manage virtual environments using **venv**

In [None]:
%%bash
# To create a new virtual environment:
# 1. create a directory to store virtual environments
cd ..
mkdir rc_venvs
# 2. create the virtual environment in this directory
python3.12 -m venv rc_venvs/rc_computing_notes
# -m runs a module as a script; venv is the standard-library module that creates virtual environments
# note that the Python interpreter in the new virtual environment has the same version as the interpreter that runs the command (python3.12 here)
# the name of the venv we created is rc_computing_notes
# to use a different Python version, call that interpreter instead; a specific version can be installed with brew install python@3.12 (etc.)
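
The `python -m venv` step above can also be driven from Python through the standard-library `venv` module. This is a sketch under stated assumptions: the environment is created in a temporary directory, `demo_env` is a hypothetical name, and `with_pip=False` is used only to keep the example quick (the command-line form installs pip by default).

```python
import os
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "demo_env")  # hypothetical environment name

    # Create the environment directory (bin/, lib/, pyvenv.cfg, ...) without pip.
    venv.create(env_dir, with_pip=False)

    # pyvenv.cfg records which base installation the environment was created from.
    cfg_path = os.path.join(env_dir, "pyvenv.cfg")
    with open(cfg_path) as f:
        cfg = dict(
            (key.strip(), value.strip())
            for key, value in (line.split("=", 1) for line in f if "=" in line)
        )

print(cfg.get("home"))
print(cfg.get("version"))
```

The `home` entry points at the directory of the interpreter that created the environment, which is why the environment has the same Python version as that interpreter.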

In [14]:
%%bash
# we can check the virtual environment directory
cd ../rc_venvs/rc_computing_notes
tree -L 3

.
├── bin
│   ├── Activate.ps1
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── pip
│   ├── pip3
│   ├── pip3.12
│   ├── python -> python3.12
│   ├── python3 -> python3.12
│   └── python3.12 -> /opt/homebrew/opt/python@3.12/bin/python3.12
├── include
│   └── python3.12
├── lib
│   └── python3.12
│       └── site-packages
└── pyvenv.cfg

7 directories, 11 files


In [None]:
%%bash
# to activate the virtual environment, you can use the following command
source ../rc_venvs/rc_computing_notes/bin/activate
# to list current python packages in the virtual env
pip list
#check python version
python --version
# to deactivate
deactivate
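
Activation is mostly a matter of environment variables. The sketch below mimics the main effects of `activate` on a plain dictionary: it sets `VIRTUAL_ENV`, puts the environment's `bin` directory first on `PATH`, and drops `PYTHONHOME`. It is a simplification (the real script also changes the prompt), and `activated_env` is a hypothetical helper.

```python
import os


def activated_env(venv_dir, base_env):
    """Return a copy of `base_env` adjusted roughly the way `activate` adjusts the shell."""
    env = dict(base_env)
    env["VIRTUAL_ENV"] = venv_dir
    # Putting <venv>/bin first makes `python` and `pip` resolve to the venv's copies.
    bin_dir = os.path.join(venv_dir, "bin")
    env["PATH"] = bin_dir + os.pathsep + base_env.get("PATH", "")
    # PYTHONHOME is removed so it cannot redirect the interpreter elsewhere.
    env.pop("PYTHONHOME", None)
    return env


# Placeholder values, for illustration only.
base = {"PATH": "/usr/bin:/bin", "PYTHONHOME": "/somewhere"}
env = activated_env("/path/to/your/venv", base)
print(env["PATH"])
```

Because only `PATH` and a few variables change, `deactivate` can restore the previous state by putting the old values back.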

In [None]:
%%bash
# to install a package (such as numpy)
source ../rc_venvs/rc_computing_notes/bin/activate
pip install numpy
#the package is installed at rc_computing_notes/lib/python3.12/site-packages
# we can also use pip list to check the current packages
pip list
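
A listing similar to `pip list` can be produced from inside Python with `importlib.metadata`, which reads the metadata stored in `site-packages`. This is a minimal sketch; it reports whatever is installed for the interpreter that runs it, so the output depends on the active environment.

```python
from importlib import metadata

# Collect (name, version) for every distribution visible to this interpreter,
# skipping entries whose metadata has no name.
installed = sorted(
    (
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    ),
    key=lambda item: item[0].lower(),
)

# Print the first few entries in the same name==version form that pip freeze uses.
for name, version in installed[:10]:
    print(f"{name}=={version}")
```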

In [None]:
%%bash
# Jupyter provides an interactive environment to run Python code; it typically opens in a web browser or in VS Code
# It evolved from IPython ("interactive Python"), an interactive shell
# to use Jupyter notebooks, we first install the Jupyter metapackage (with the virtual environment activated)
pip install jupyter
# then register a kernel for this environment
python -m ipykernel install --user --name rc_computing_notes --display-name "rc_computing_notes"
# we can then start JupyterLab by typing jupyter-lab in the terminal

In [2]:
%%bash
# in the tree structure of the venv directory, we find entries like python -> python3.12 and python3.12 -> /opt/homebrew/opt/python@3.12/bin/python3.12
# these are symbolic links; a symbolic link is a file that points to another file or directory
# to create a symbolic link, use ln -s, e.g. ln -s /path/to/target /path/to/link
# to see which file a symbolic link points to, use ls -l
ls -l ../rc_venvs/rc_computing_notes/bin/python3

lrwxr-xr-x@ 1 yuhaohuo  staff  10 Nov 14 23:56 ../rc_venvs/rc_computing_notes/bin/python3 -> python3.12
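
The same link structure can be created and inspected from Python. This sketch works in a temporary directory on macOS/Linux; the file names imitate the venv layout but are placeholders, not real interpreters.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "python3.12")  # stand-in for the real interpreter
    link = os.path.join(tmp, "python3")

    with open(target, "w") as f:
        f.write("placeholder\n")

    # Like `ln -s python3.12 python3` run inside tmp: the link stores a relative target.
    os.symlink("python3.12", link)

    is_link = os.path.islink(link)
    points_to = os.readlink(link)  # the text `ls -l` shows after "->"
    same_file = os.path.samefile(link, target)  # following the link reaches the target

print(is_link, points_to, same_file)
```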


In [3]:
%%bash
# alias is a shortcut for a command; to list the aliases currently set in the bash session, we can use the alias command
alias

In [None]:
# To make life easier, we can write a bash script that loads a virtual environment; one script per environment.
# The script below sets environment variables and activates the corresponding Python virtual environment.

# Define environment variables
export MY_VAR1="value1"
export PATH="/my/custom/path:$PATH"
# Source the Python virtual environment
source /path/to/your/venv/bin/activate
# Print a message to confirm the environment is set
echo "Environment variables are set, and the virtual environment is activated."

In [None]:
%%bash
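# we can save the above script as activate_env.sh and then source it to load the environment into the current session
source activate_env.sh
# sourcing does not require execute permission; chmod +x is only needed to run the script directly (./activate_env.sh),
# and a script run that way executes in a child shell, so its exports and activation do not affect the current session
chmod +x activate_env.sh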

#### Manage virtual environment using **conda**

- Conda is an open-source package and environment management system; it installs packages mainly from Anaconda channels.
- Unlike venv, the Python versions installed by conda are independent of the system versions.
- However, conda is not recommended on remote clusters: it installs many non-Python libraries that can override the cluster's optimized system libraries, even if we only use it to manage Python environments.
- Otherwise, we can use either conda or venv.

When using conda for environment management, it is convenient to keep two files that record the Python packages and the Python version:
1. requirements.txt: for recording packages
2. environment.yml: for recording Python version and other metadata of the environment

In [4]:
%%bash
# export the package requirements of the current virtual env
pip freeze > requirements.txt
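
The file written by `pip freeze` is plain text, typically with one `name==version` pin per line, so it is easy to read back. The sketch below parses such pins into a dictionary; `parse_pinned` is a hypothetical helper, and the sample contents are made up for illustration rather than taken from this project.

```python
def parse_pinned(requirements_text):
    """Map package name -> version for lines of the form `name==version`.

    Other line types that pip freeze can emit (e.g. `name @ file://...`) are skipped.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


# Hypothetical file contents, for illustration only.
sample = """\
# produced by pip freeze
numpy==2.1.0
pandas==2.2.2
"""
pins = parse_pinned(sample)
print(pins)
```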

In [None]:
# create the environment.yml file like this
# channels are the sources of packages; defaults is the default channel of Anaconda, conda-forge is a community-driven channel
name: rc_comput_notes_conda
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.12
  - pip=25.3
  - pip:
    - -r requirements.txt

In [None]:
%%bash
# then we can replicate our venv environment created for research computing notebooks in conda
conda env create -f environment.yml
# activate the env
conda activate rc_comput_notes_conda
# deactivate the env
conda deactivate

In [None]:
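# alternatively, to create the env from scratch with conda:
# 1. create the env (specifying a Python version so that an interpreter is installed in it)
conda create -n name_of_env python=3.12
# 2. activate it, then install packages into it
conda activate name_of_env
conda install package_names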

In [None]:
# export the environment configuration file for collaboration
# Option 1:
conda env export --from-history --name rc_comput_notes_conda > environment.yml
# Option 2:
# if we want both a yml file and a txt file, we can:
## create an environment.yml file with the same content as the one used for this project (the Python version, pip version, and env name may change)
## pip freeze > requirements.txt
## the latter approach may be better, since some packages are not available through conda

In [None]:
# list all conda envs
conda env list
# remove a conda environment
conda env remove -n env_name