# 🐍 Managing packages and environments for Python: local and in Codespaces 🐍

### Why do you need to care?  

Some code or functions only work on certain versions of Python or Python packages. So you need to:

- test that your code works on multiple versions
- and/or keep the version that worked when you made your particular project with your particular requirements



# 🚀🚀🚀
# TLDR;

Python versions and packages are files in `/Users/lizre/.pyenv/versions/` 📁 🐍 
<img width="195" alt="image" src="https://user-images.githubusercontent.com/38010821/227729495-4b87c744-986f-4378-80c0-fde31363e6af.png">

Python versions (`site`s) come with an _interpreter_:  an "executable file" that runs your Python code on your computer. 🐍 ⬅️➡️ 💻

In [86]:
!which python

/Users/lizre/.pyenv/versions/3.7.3/bin/python


Each `site` has a `/lib`: your library of packages.  📁 🐍 📚
- Each `/lib` includes `site-packages`: the ones you've installed 📚 📦

In [90]:
lib = !cd ~/.pyenv/versions/3.7.3/lib/python3.7 && ls
lib[147:150]

['shutil.py', 'signal.py', 'site-packages']

In [94]:
site_packages = !cd ~/.pyenv/versions/3.7.3/lib/python3.7/site-packages && ls
site_packages[2:5]

['Cython-0.29.23.dist-info',
 'Flask-2.0.1.dist-info',
 'GitPython-3.1.24.dist-info']

So packages are always inside a particular python `site`. 📦 📥 📁 🐍

Your `PATH` _environment variable_, set in your `~/.bashrc`, tells your OS where to look for executable files.

In [96]:
!cat ~/.bashrc

export PATH="$HOME/.ipython/kernels/ijavascript/bin:$PATH" /Users/lizre/Downloads/harnesslib/examples/.venv/bin:/Library/Frameworks/Python.framework/Versions/3.11/bin:/usr/local/Cellar/pyenv-virtualenv/1.2.1/shims:/Users/lizre/.pyenv/shims:/Users/lizre/.pyenv/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/share/dotnet:/opt/X11/bin:~/.dotnet/tools

Your shell uses `PATH` to find `pip`, and that `pip` installs into the `site-packages` of the version/`site` it belongs to.
<br>
<br>

< end TLDR; >

# 🚀🚀🚀

<br>
<br>

# Python versions and packages are files  📁🐍 

- A _python version_ is a version of the Python programming language.
- A python _interpreter_ is an "executable file" that runs the language.
- There's a 1:1 relationship: when you install a version of the python language, it also installs an interpreter that supports that version

`python --version` shows the version the interpreter supports:

In [37]:
!python --version

Python 3.7.3


The interpreter is at this `path`:

In [36]:
!which python

/Users/lizre/.pyenv/versions/3.7.3/bin/python


(`which` means `path to executable`. For some reason it's not `where`.)

<br>


🔎 _Aside: executable files_ 🔍

Python source code is not something your hardware can run directly; it has to be translated into a "lower level" form first. 

The _interpreter_ compiles your Python into _bytecode_, a simpler set of instructions, and then runs that bytecode one instruction at a time. 

The interpreter itself is written in a totally different language, which for the standard Python interpreter (CPython) is C.

So the interpreter is a C program that was compiled to machine code (binary): that is the executable file. It translates the python into bytecode and then executes the bytecode.
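
To see the bytecode step for yourself, here is a minimal sketch using the standard library's `dis` module. The function `add` is a made-up example, and the exact instruction names differ between Python versions, so treat the printed list as illustrative only.

```python
import dis


def add(a, b):
    """A tiny example function to inspect."""
    return a + b


# The compiled bytecode is stored on the function's code object as raw bytes.
raw_bytecode = add.__code__.co_code

# dis decodes those bytes into human-readable instruction names.
instruction_names = [instruction.opname for instruction in dis.get_instructions(add)]

print(raw_bytecode)
print(instruction_names)
```

The raw bytes are what the interpreter actually executes; the names are only a readable view of them.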

<br>

## 📁📁📁 So if you have multiple python or package versions, you have multiple files 📁📁📁

`/Users/lizre/.pyenv/versions` contains multiple versions of python/interpreter:


In [4]:
!cd /Users/lizre/.pyenv/versions/ && ls

3.5.7  3.7.11 3.7.3  3.8.12


(They're in `pyenv` because `pyenv` lets you have multiple python versions. Otherwise a single version of python might be in default path `/usr/bin/python`.)

📚 In each version folder is **`lib/python3.X`, which holds your packages. It's your `lib`rary of packages.** 📚 

It has:
<br>1) the standard library files that are part of the Python installation

In [48]:
lib = !cd ~/.pyenv/versions/3.7.3/lib/python3.7 && ls
lib[53:57]

['ctypes', 'curses', 'dataclasses.py', 'datetime.py']

<br>2) `site-packages`:

In [49]:
lib[149]

'site-packages'

📦 `site-packages` is for third-party packages: 📦

In [72]:
site_packages = !cd ~/.pyenv/versions/3.7.3/lib/python3.7/site-packages && ls
site_packages[0:5]

['Babel-2.12.1.dist-info',
 'Cython',
 'Cython-0.29.23.dist-info',
 'Flask-2.0.1.dist-info',
 'GitPython-3.1.24.dist-info']

`site` means a specific python folder/version/interpreter. 📁🐍  

So these are the packages installed specifically for our python `3.7.3` folder/version/interpreter. 📦 📁 🐍

The packages are different inside the `3.8.12 site`:

In [22]:
site_packages = !cd ~/.pyenv/versions/3.8.12/lib/python3.8/site-packages && ls
site_packages[0:5]

['README.txt',
 '_distutils_hack',
 'distutils-precedence.pth',
 'pip',
 'pip-21.1.1.dist-info']

## So packages are inside a particular python `site`.  📦 📥 📁 🐍 



So when you `pip install` a package, it adds to the `/site-packages` of that python site. 

(`pip` knows which `site` because of `PATH`: we'll talk about what `PATH` is in a bit.)

It also means if you just copy and paste a package into `/site-packages`, you've installed it!

You can find out what interpreter `pip` thinks you're on:

In [25]:
!pip --version

pip 23.0.1 from /Users/lizre/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip (python 3.7)


It knows the path to the interpreter because of `$PATH`, which we'll talk about later.

Recall that `which` means `path to executable`, so here's the path to our `pip` executable:

In [13]:
!which pip

/Users/lizre/.pyenv/versions/3.7.3/bin/pip


`pip list` shows packages it has installed.

`pip list` does not show:
1) the standard library
<br>2) packages you manually installed by copying their files directly into `site-packages`

In [31]:
pip_list = !pip list
pip_list[180:184]

['pandas                               1.3.2',
 'pandas-profiling                     3.1.0',
 'pandas-stubs                         1.2.0.62',
 'pandocfilters                        1.4.3']
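
The `.dist-info` folders we saw in `site-packages` hold the metadata that tools use to know what's installed. You can read that metadata from Python too. This is a sketch, assuming Python 3.8 or newer (the 3.7.3 used above would need the `importlib_metadata` backport); `installed_packages` is a helper name invented here.

```python
from importlib import metadata


def installed_packages():
    """Return {name: version} for each distribution with metadata on sys.path."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip entries whose metadata is incomplete
            packages[name] = dist.version
    return packages


packages = installed_packages()
print(len(packages), "distributions found")
print(packages.get("pandas", "pandas is not installed in this site"))
```

Because it looks for metadata, a package copied into `site-packages` without its `.dist-info` folder will not show up here either.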

`pip show` tells you more about a package, including its dependencies and dependents:

In [102]:
!pip show pandas 

Name: pandas
Version: 1.3.2
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: The Pandas Development Team
Author-email: pandas-dev@python.org
License: BSD-3-Clause
Location: /Users/lizre/.pyenv/versions/3.7.3/lib/python3.7/site-packages
Requires: numpy, python-dateutil, pytz
Required-by: cmdstanpy, harnesslib, Kqlmagic, mlflow, mlxtend, octopy, pandas-profiling, phik, prophet, researchpy, seaborn, sklearn-pandas, statsmodels, visions


Store a project's required packages in requirements.txt.

In [8]:
!pipreqs --force

INFO: Successfully saved requirements file in /Users/lizre/Downloads/learn-py/requirements.txt


- `pipreqs` builds the file from the `import` statements in your project, so it lists only the third-party packages the project uses.
- Don't use `pip freeze > requirements.txt`: it is [harmful because it includes too many things](https://medium.com/@tomagee/pip-freeze-requirements-txt-considered-harmful-f0bce66cf895); the [pipreqs page on PyPI also explains why not to use pip freeze](https://pypi.org/project/pipreqs/).



In [10]:
!cat requirements.txt

pandas==1.3.2


### Install a package from source code with `pip install .` 📦

You usually install packages like `pip install pandas`. When you do this, pip looks in the PyPI package repository.

But sometimes, you want to install a package not in a registry, or you are making changes to the package yourself. Then you install it from its source code.

Go to the directory it’s in, like a cloned GitHub repo, and do `pip install .`. 
- `.` means current directory.
- Pip will look for a setup.py or pyproject.toml.
    - pyproject.toml is a simpler and more readable replacement for setup.py
    - toml: Tom's Obvious, Minimal Language.
- These files define the dependencies for building (packaging the source into an installable distribution) and installing. 
    - This is different from requirements.txt, which lists what's needed to RUN the project.
- You could instead do `pip install -e .`: an "editable install", where changes you make to the source take effect without reinstalling.


# 🛣️ Manage program versions by managing the `PATH` to their executable files 🛣️

Recall that programs like `python` and `pip` (and even `ls`!) have "executable files".

When you run `python`: 

In [63]:
!python script.py

15


it tells your OS to execute the python interpreter. So the OS searches for an executable file with the name `python`. 

It searches a list of directories in an environment variable called `PATH` (we'll explain _environment_ and _variable_ in a bit).

### So `PATH` tells our OS where to look for the executable files of commands. 📁 🛣️ 👀


In [74]:
!echo $PATH

/Users/lizre/.pyenv/versions/3.7.3/bin:/usr/local/Cellar/pyenv/2.3.15/libexec:/usr/local/Cellar/pyenv/2.3.15/plugins/python-build/bin:/Library/Frameworks/Python.framework/Versions/3.11/bin:/usr/local/Cellar/pyenv-virtualenv/1.2.1/shims:/Users/lizre/.pyenv/shims:/Users/lizre/.pyenv/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/share/dotnet:/opt/X11/bin:~/.dotnet/tools


List of directories separated by `:`.
- /Users/lizre/.pyenv/versions/3.7.3/bin <-- `!which python` returned `/Users/lizre/.pyenv/versions/3.7.3/bin/python`. **so our python executable is here!!**
- /Users/lizre/.pyenv/shims <-- (shims are like helpers for executables)
- /usr/local/bin
- /usr/bin
- /bin
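
Here is the same splitting done in Python, as a minimal sketch. It assumes macOS/Linux, where the separator (`os.pathsep`) is `:`; `path_directories` is a helper name made up for this example, and the example string is a shortened version of the `PATH` above.

```python
import os


def path_directories(path_value=None):
    """Split a PATH-style string into its directories, in search order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [directory for directory in path_value.split(os.pathsep) if directory]


example = "/Users/lizre/.pyenv/versions/3.7.3/bin:/usr/local/bin:/usr/bin:/bin"
for position, directory in enumerate(path_directories(example), start=1):
    print(position, directory)
```

Calling `path_directories()` with no argument does the same for your real `PATH`.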



Without `PATH`, you'd have to type the full path to the `python` executable:

In [68]:
!which python

/Users/lizre/.pyenv/versions/3.7.3/bin/python


In [69]:
! /Users/lizre/.pyenv/versions/3.7.3/bin/python script.py

15


Once it finds the executable, the operating system will execute it and pass `script.py` as an argument.
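
The search can be imitated with `shutil.which`, which walks a list of directories and returns the first executable with a matching name. This is a sketch under assumptions: it targets macOS/Linux, and `mytool`, `make_fake_executable`, `find_on_path` and `demo_first_match_wins` are all names invented for the demo.

```python
import os
import shutil
import stat
import tempfile


def make_fake_executable(directory, name):
    """Create a small script in `directory` and mark it as executable."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\necho hello\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def find_on_path(command, directories):
    """Return the first executable named `command` found in `directories`."""
    return shutil.which(command, path=os.pathsep.join(directories))


def demo_first_match_wins():
    """Put the same command in two directories; the earlier directory should win."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        make_fake_executable(first, "mytool")
        make_fake_executable(second, "mytool")
        found = find_on_path("mytool", [first, second])
        return os.path.dirname(found) == first


print(demo_first_match_wins())
```

This is why order in `PATH` matters: whichever directory comes first decides which `python` you get.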

### `pip` knows which `site`, because of `PATH`.

When you run `pip install`, your shell looks in the `PATH` directories for a `pip` executable, the same way it finds `python`.
- Remember how `!which python` returned `/Users/lizre/.pyenv/versions/3.7.3/bin/python`, and that was also the first directory in our `PATH`? That means our python executable is in this directory, and so is the `pip` that gets run (`!which pip` returned `/Users/lizre/.pyenv/versions/3.7.3/bin/pip`)!
    
Then that `pip` installs the package into the `site-packages` of the python it belongs to:

`~/.pyenv/versions/3.7.3/lib/python3.7/site-packages`
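
You can also ask a running interpreter where it lives and where its packages go. A minimal sketch with the standard `sys` and `sysconfig` modules; the variable names are made up, and the printed paths depend entirely on your machine.

```python
import sys
import sysconfig

# Path of the interpreter executable that is running this code.
interpreter_path = sys.executable

# Where this interpreter keeps its standard library and third-party packages.
paths = sysconfig.get_paths()
stdlib_path = paths["stdlib"]
site_packages_path = paths["purelib"]

print("interpreter:  ", interpreter_path)
print("standard lib: ", stdlib_path)
print("site-packages:", site_packages_path)
```

If `interpreter_path` is not the python you expected, the `PATH` order is the first thing to check.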


🔎 _Aside: Why more than one directory?_ 🔍

1) Manage multiple versions of a program, like python. Each version may be installed in a different directory. 

2) Find programs installed in non-standard locations, either because you installed them manually or a package manager used non-standard locations. 

3) Control search order: The PATH specifies the order in which the system searches for executables, so you can put your default first.


### Some `PATH` directories are called `bin`, which holds `binary` executables 🛣️🗑️

`/bin` is not like a garbage bin; it is short for `binary`. The executables in it are programs that have already been compiled to machine code (binary). 💡

It's a standard directory on macOS (and other Unix-like systems), for executable files that are essential for your computer to function.

So it's in `PATH` by default. 
- so when you get a new mac, `echo $PATH` will include `/bin`
- so you already have access to executables

In [8]:
!ls /bin

[         csh       echo      ksh       mkdir     realpath  stty      wait4path
bash      dash      ed        launchctl mv        rm        sync      zsh
cat       date      expr      link      pax       rmdir     tcsh
chmod     dd        hostname  ln        ps        sh        test
cp        df        kill      ls        pwd       sleep     unlink


See how basics like `echo` and `ls` are in there!

in Finder, they look like this:

<img width="489" alt="image" src="https://user-images.githubusercontent.com/38010821/227289808-041439e0-171a-4262-bb3e-28b299a95bb0.png">


The `exec` means they're executable!

#### `/usr/bin` is for ones that are standard, but not essential, like `git`. 

`/usr` is "Unix System Resources".

In [84]:
bin = !ls /usr/bin
bin[265:270]

['git',
 'git-receive-pack',
 'git-shell',
 'git-upload-archive',
 'git-upload-pack']

### An _environment variable_ like `PATH` is a variable that's available to a process. 

An environment is not a physical entity--it's just a set of settings (like system time, or user preferences, or language-specific settings like version) and variables.

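- Like how you set `aaa=2` in a jupyter notebook, but then you don't have `aaa` in another notebook or another session. That notebook session is an _environment_.

You access variables with `$`. (Running `$PATH` by itself makes the shell try to execute the value as a command, which is why the output below ends in `No such file or directory`; `echo $PATH` prints it without the error.)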

In [21]:
!$PATH

/bin/bash: /Users/lizre/.pyenv/versions/3.7.3/bin:/usr/local/Cellar/pyenv/2.3.15/libexec:/usr/local/Cellar/pyenv/2.3.15/plugins/python-build/bin:/Library/Frameworks/Python.framework/Versions/3.11/bin:/usr/local/Cellar/pyenv-virtualenv/1.2.1/shims:/Users/lizre/.pyenv/shims:/Users/lizre/.pyenv/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/share/dotnet:/opt/X11/bin:~/.dotnet/tools: No such file or directory
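
"Available to a process" also means a process hands its environment down to the processes it starts. Here is a small sketch of that using `os.environ` and `subprocess`; the variable `AAA` mirrors the notebook example above, and `read_in_child` is a helper name invented here.

```python
import os
import subprocess
import sys


def read_in_child(name, env=None):
    """Start a new Python process and return the value it sees for `name`."""
    code = "import os, sys; print(os.environ.get(sys.argv[1], '<not set>'))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        capture_output=True,
        text=True,
        check=True,
        env=env,  # None means "inherit my environment"
    )
    return result.stdout.strip()


# Set the variable in this process's environment.
os.environ["AAA"] = "2"

# A child process inherits it...
inherited = read_in_child("AAA")

# ...unless we start the child with an environment that leaves it out.
without_aaa = {key: value for key, value in os.environ.items() if key != "AAA"}
not_inherited = read_in_child("AAA", env=without_aaa)

print(inherited)
print(not_inherited)
```

This is the same mechanism that lets your shell pass `PATH` to every program it launches.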


### Set environment variables, like `$PATH`, in `zshrc` or `bashrc` _configuration files_ 🐚 🎛️

`bash` and `zsh` are `shell`s: command-line interfaces.
- they have differences in syntax and features (eg command completion)

Configure these `shell`s in their "rc" ("run commands") files: `~/.bashrc` and  `~/.zshrc`
- It's called "run commands", not "configuration", because it has commands that are run by `bash` or `zsh` when it starts up
- eg commands that set environment variables
- eg, when `bash` starts up, it `source`s `~/.bashrc`

In [76]:
!cat ~/.bashrc

export PATH="$HOME/.ipython/kernels/ijavascript/bin:$PATH" /Users/lizre/Downloads/harnesslib/examples/.venv/bin:/Library/Frameworks/Python.framework/Versions/3.11/bin:/usr/local/Cellar/pyenv-virtualenv/1.2.1/shims:/Users/lizre/.pyenv/shims:/Users/lizre/.pyenv/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/share/dotnet:/opt/X11/bin:~/.dotnet/tools

`~` means user's home directory (`/Users/lizre`), so identical to: 

In [77]:
!cat /Users/lizre/.bashrc

export PATH="$HOME/.ipython/kernels/ijavascript/bin:$PATH" /Users/lizre/Downloads/harnesslib/examples/.venv/bin:/Library/Frameworks/Python.framework/Versions/3.11/bin:/usr/local/Cellar/pyenv-virtualenv/1.2.1/shims:/Users/lizre/.pyenv/shims:/Users/lizre/.pyenv/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/share/dotnet:/opt/X11/bin:~/.dotnet/tools

`export` makes the variable available to programs started from this shell.

So when `zsh` starts up, or when you `source ~/.zshrc`, any commands in the file are executed as if you had typed them at the command prompt. Including that `export PATH=...` command.

### You edit shell run commands files (`zshrc` or `bashrc`) in a text editor 

or append with `>>`:

`echo 'export PATH="/opt/homebrew/opt/node@16/bin:$PATH"' >> ~/.zshrc`

You could even add other stuff. Like if you often run `ssh analytics-console.github.net` to get into the analytics console, you could add this:

`alias console="ssh username@analytics-console.github.net"`

Then you can just use `console`.

# _virtual environments_ are just folders with a `pip` executable, python interpreter, and `site-packages`. 

Recall that an environment is not a physical entity--it's just a set of settings (like system time, or user preferences, or language-specific settings like version) and variables.

So the word "virtual" is redundant here.

([difference between VM, docker, and virtual env](https://stephen-odaibo.medium.com/docker-containers-python-virtual-environments-virtual-machines-d00aa9b8475) -- we'll talk about docker later)

In [110]:
!python3.9 -m venv new_env


This creates: 

1) A new directory `new_env` 

2) Inside `new_env`, `bin` for `pip` and python executables
- It'll use whatever python interpreter you used to make it -- here, 3.9. So always include the full version (not just `python3`).

In [111]:
! cd new_env/bin && ls

Activate.ps1     activate.fish    pip              python
activate         easy_install     pip3             python3
activate.csh     easy_install-3.9 pip3.9           python3.9


3) a `pyvenv.cfg` file of metadata about the environment, like the version of python used and the options used in the `venv` command

4) a `bin/activate` script, so you can activate and deactivate the venv
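
The pieces listed above can be inspected from Python, since the `venv` module that powers `python -m venv` is importable. A minimal sketch: `describe_new_venv` is a name invented here, the environment is created in a temporary folder and thrown away, and `with_pip=False` is used only to keep the demo fast (the command line installs `pip` by default).

```python
import os
import tempfile
import venv


def describe_new_venv():
    """Create a throwaway venv and report its top-level contents and pyvenv.cfg."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "new_env")
        venv.create(env_dir, with_pip=False)

        top_level = sorted(os.listdir(env_dir))

        # pyvenv.cfg is a plain text file of "key = value" lines.
        config = {}
        with open(os.path.join(env_dir, "pyvenv.cfg")) as f:
            for line in f:
                if "=" in line:
                    key, value = line.split("=", 1)
                    config[key.strip()] = value.strip()
        return top_level, config


top_level, config = describe_new_venv()
print(top_level)
for key, value in config.items():
    print(key, "=", value)
```

The `home` entry in `pyvenv.cfg` points back at the python installation the environment was made from.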

<br>

🔎 _Aside: python interpreter as a `copy` vs a `symlink`_ 🔍

In Finder, the `pip` in `new_env/bin` is an executable, but the `python3.9` is just blank with a little arrow:

<img width="793" alt="image" src="https://user-images.githubusercontent.com/38010821/227731147-af50cea4-6ccb-4c04-a653-125d820089c9.png">

This is a _symbolic link_ (or `symlink`) to the actual Python executable.

That means if you share your env with someone else, they'll need to already have that version of the python interpreter, stored at the same path.

You could make it actually store a python executable by using `python3.9 -m venv myenv --copies`, instead of `python3.9 -m venv myenv --symlinks`.
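
You can check the difference without Finder. This sketch builds two throwaway environments with the `venv` module, one per option, and tests whether `bin/python` is a symlink. It assumes the macOS/Linux layout (a `bin` folder) and a filesystem that supports symlinks; `venv_python_is_symlink` is a made-up helper name.

```python
import os
import tempfile
import venv


def venv_python_is_symlink(symlinks):
    """Create a temporary venv and report whether its bin/python is a symlink."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "env")
        # symlinks=True mirrors --symlinks, symlinks=False mirrors --copies.
        venv.create(env_dir, symlinks=symlinks, with_pip=False)
        return os.path.islink(os.path.join(env_dir, "bin", "python"))


print("with symlinks:", venv_python_is_symlink(True))
print("with copies:  ", venv_python_is_symlink(False))
```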
<br>


### Create requirements.txt to record your packages/dependencies

In terminal:

In [8]:
!pipreqs --force

INFO: Successfully saved requirements file in /Users/lizre/Downloads/learn-py/requirements.txt


- You'll see requirements.txt appear in the project folder. It will not exactly match everything you `import` because pipreqs only includes packages that are not in the standard library.
- Can also use `pip freeze > requirements.txt` but it is [harmful because it includes too many things](https://medium.com/@tomagee/pip-freeze-requirements-txt-considered-harmful-f0bce66cf895); the [pipreqs page on PyPI also explains why not to use pip freeze](https://pypi.org/project/pipreqs/).



In [10]:
!cat requirements.txt

pandas==1.3.2


To use:

In [12]:
!pip install -r requirements.txt



In [15]:
!cd my_env && ls

bin        include    lib        pyvenv.cfg


Creates a folder called `my_env`, with Python, pip and `site-packages`:

In [None]:
├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── easy_install
│   ├── pip
│   ├── pip3
│   ├── python 
│   └── python3.7
├── include
├── lib
│   └── python3.7
│       └── site-packages
└── pyvenv.cfg

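`lib`: holds the environment's own `site-packages`. (The standard library still comes from the python installation the venv was made from.)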

## Enter/activate env

`activate` scripts tell your shell to use the venv’s Python executable and its `site-packages`, instead of the system ones.

So just `source` the `activate` script:

In [19]:
!source my_env/bin/activate 

### What activating does

Now prompt has env name:

![image](https://user-images.githubusercontent.com/38010821/153217272-690c3c2d-7035-474b-88ee-3ba1238a2d21.png)


When you activate a virtual environment, you are essentially telling the system to use the Python interpreter and other files located in the virtual environment directory, rather than the global Python installation on the system. 

And now, instead of looking for Python in `/Users/lizre/.pyenv/versions/3.7.3/bin/python`, it's looking in `my_env`:

In [92]:
!source my_env/bin/activate && which python

/Users/lizre/Downloads/learn-py/my_env/bin/python


And `my_env` is at the beginning of PATH, meaning the venv is the first directory used:

In [98]:
!source my_env/bin/activate && echo $PATH

/Users/lizre/Downloads/learn-py/my_env/bin:/Users/lizre/.pyenv/versions/3.7.3/bin:/usr/local/Cellar/pyenv/2.0.6/libexec:/usr/local/Cellar/pyenv/2.0.6/plugins/python-build/bin:/Users/lizre/.pyenv/shims:/Users/lizre/.pyenv/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin


Whereas outside the venv, `my_env` is not in PATH at all:

In [101]:
!echo $PATH

/Users/lizre/.pyenv/versions/3.7.3/bin:/usr/local/Cellar/pyenv/2.0.6/libexec:/usr/local/Cellar/pyenv/2.0.6/plugins/python-build/bin:/Users/lizre/.pyenv/shims:/Users/lizre/.pyenv/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin
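
Besides looking at `PATH`, a script can ask the interpreter directly whether it is running from a venv. A minimal sketch: in a venv, `sys.prefix` points at the environment folder while `sys.base_prefix` points at the python it was created from. `in_virtualenv` is a helper name invented here.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    return sys.prefix != sys.base_prefix


print("environment folder:", sys.prefix)
print("base installation: ", sys.base_prefix)
print("inside a venv?     ", in_virtualenv())
```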


### Install packages in the venv

`lib` &  `site-packages`: holds dependencies/packages you install in the venv:

In [34]:
!cd my_env/lib/python3.7/site-packages && ls

__pycache__                 pkg_resources
easy_install.py             setuptools
pip                         setuptools-40.8.0.dist-info
pip-19.0.3.dist-info


In [35]:
!source my_env/bin/activate && pip install numpy

Collecting numpy
  Using cached https://files.pythonhosted.org/packages/09/8c/ae037b8643aaa405b666c167f48550c1ce6b7c589fe5540de6d83e5931ca/numpy-1.21.5-cp37-cp37m-macosx_10_9_x86_64.whl
Installing collected packages: numpy
Successfully installed numpy-1.21.5
You are using pip version 19.0.3, however version 22.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


Now numpy's there:

In [36]:
!cd my_env/lib/python3.7/site-packages && ls

__pycache__                 pip-19.0.3.dist-info
easy_install.py             pkg_resources
numpy                       setuptools
numpy-1.21.5.dist-info      setuptools-40.8.0.dist-info
pip


## Exit env

In [None]:
!deactivate

Because a venv is just a folder, to delete one, just delete its folder.

# An environment for everyone: Codespaces


## 🚀🚀🚀 
## TLDR;

Use devcontainers to set up VSCode in Codespaces with all the things you need, like an OS, Python, and your requirements. The codespace user doesn't have to install anything, or `run` and `build` Docker. You just open the codespace and are ready to code.

Minimal requirements to run a devcontainer are just `/.devcontainer`, a dockerfile and a `devcontainer.json`.

But here are all the components we'll cover: 

### A `/.devcontainer` 🫙 with:

1) a **dockerfile 🐳**: instructions for which OS, Python version, and requirements.txt to use in the Codespace
2) a **`devcontainer.json`** that says to use the Dockerfile, and sets some other stuff up
3) optionally, an **`on-create-command.sh`**

### As needed, secret/key access 🔑 
Configured in Settings on github.com.

### Maintenance tools 
Like a playbook and unit tests for codespace-related files

### Optionally: 
- 🧪 **Tests** that codespace was built correctly 
- ⚙️ **VSCode settings** to, e.g., include certain extensions
- 🏗 **Prebuild**: makes building codespace faster



<br>

<br>

< end TLDR; >

## 🚀🚀🚀

<br>
<br>


# `/.devcontainer` 🫙

Examples: [base](https://github.com/microsoft/vscode-dev-containers/blob/v0.222.0/containers/python-3/.devcontainer/base.Dockerfile), [security-advisory-filtering](https://github.com/github/security-advisory-filtering/commit/3938d544b471ee1f81e7e663ee14fd8714c90ac7), [airflow](https://github.com/github/airflow-sources/tree/master/.devcontainer), [actions-aml](https://github.com/github/actions-aml/blob/main/.devcontainer/Dockerfile); [teaching template](https://github.com/education/codespaces-teaching-template-py/blob/main/.devcontainer/Dockerfile)




## `/.devcontainer` 🫙 component 1: dockerfile 🐳


Docker isolates not just the Python `site-packages`, but also the OS and the version of Python. 

[harnesslib one](https://github.com/github/harnesslib/blob/main/Dockerfile.harnesslib)


### Dockerfiles `build` into `image`s (templates), which `run` as `container`s 

First, you write instructions for what you want to install: usually an OS (you can run a Linux distribution even on a Mac!), dependencies, and your project. This is called a _dockerfile_. 🐳 📁 

Then you turn those instructions into a packaged snapshot of all those files: using `docker build` to _build_ the _dockerfile_ into an _image_ 🖼️.
- 💬 "image" is a metaphor for the idea of these files being like a snapshot. Let's think of it more like an _environment template_.

When you `docker run` an _image_ it creates a _container_ -- so a _container_  🫙 is a running _image_.
- Why call it "container" instead of "environment"? Because environments are more complex; e.g., it also includes your hardware and networking. Even the containerized OS is a simplified one. So consider a container a type of environment.

Generating and running containers is all done by the Docker _engine_/"daemon"
- 💬 "daemon" is from mythology of a guardian entity. Let's prefer "engine".


<img width="621" alt="image" src="https://user-images.githubusercontent.com/38010821/227788568-c28acc56-6c71-4cad-93b0-a0b14021546f.png">

### Codespaces devcontainers automatically install & run the Docker engine, do `docker build` (dockerfile --> image) and `docker run` (image --> container)! 

### 📄 So all you need is to write a dockerfile, like [this annotated example.](https://github.com/lizre/learn-py/blob/master/.devcontainer/Dockerfile) 


### Set a non-root user in your dockerfile or devcontainer

Every process on a computer is associated with a user account and its permissions to do actions and access resources.

The _root user_ has full permissions.
 
When a process is started, it has the permissions of the account that started it. 

☢️ So if you run a container as the root user, its processes have root permissions and can do whatever they want. Like access your personal documents.

Docker containers default to running as the _root user_.

But Codespaces defaults to a _non-root_ user.

👍 Still, it's considered best to **explicitly `useradd` a non-root user to your Dockerfile.** (but I can't figure out why. 🤔)

Alternatively, some examples add `"remoteUser": "vscode"` to the `devcontainer.json`. This is the default user that will be used when running the container. If not provided, the default user specified in the Dockerfile will be used.
Most examples seem to either do `useradd` in the Dockerfile, OR set `remoteUser` here, not both. There doesn't seem to be a clear advantage to either one.



# TODO: venv in docker file

- https://www.youtube.com/watch?v=qLvAHhJAVlI&list=PLmsFUfdnGr3wTl-NCblzcrEv2lFSX975-&index=15

examples that use venv in the docker file
- https://github.com/education/codespaces-teaching-template-py/blob/main/.devcontainer/Dockerfile
- https://github.com/microsoft/vscode-dev-containers/blob/v0.222.0/containers/python-3/.devcontainer/base.Dockerfile
    





## `/.devcontainer` 🫙 component 2: devcontainer.json

`devcontainer.json` configures a codespace. It tells VS Code how to build and run the Docker container, using the Dockerfile as a template.

All Codespaces have a configuration. If you create one without a `devcontainer.json` file, Codespaces uses a [default configuration](https://docs.github.com/en/codespaces/setting-up-your-project-for-codespaces/adding-a-dev-container-configuration/introduction-to-dev-containers#using-the-default-dev-container-configuration).

You can define multiple configurations.

### Write a `devcontainer.json` like [this annotated example](https://github.com/lizre/learn-py/blob/master/.devcontainer/devcontainer.json)

You need to refer to the dockerfile in the `devcontainer.json`, e.g., `"dockerfile": "Dockerfile"`. This means you can change the path to the Dockerfile. For example, if you want the Dockerfile to be in the root directory of the project instead of in `/.devcontainer`, you can do `"dockerfile": "../Dockerfile"`.
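
To make the shape of the file concrete, here is a sketch that builds a small configuration as a Python dict and prints it as JSON. It is not the annotated example linked above: the `name` value and the `make_devcontainer_config` helper are invented, and nesting `dockerfile` under a `build` key is an assumption about the dev container format that you should check against that example.

```python
import json


def make_devcontainer_config(dockerfile_path="Dockerfile"):
    """Return a minimal, illustrative devcontainer configuration as a dict."""
    return {
        "name": "learn-py (example)",  # hypothetical display name
        "build": {"dockerfile": dockerfile_path},  # path is relative to devcontainer.json
        "remoteUser": "vscode",  # the non-root user mentioned earlier
    }


# Dockerfile kept in the project root instead of in .devcontainer:
config = make_devcontainer_config("../Dockerfile")
config_text = json.dumps(config, indent=4)
print(config_text)
```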

# With just the `Dockerfile` and `devcontainer.json`, you can run your devcontainer!

Put them in .devcontainer like [this](https://github.com/lizre/learn-py/tree/master/.devcontainer):

![image](https://user-images.githubusercontent.com/38010821/228025282-e68bf0fe-feb1-4a6b-ad3b-475dc1713a63.png)


Open a codespace. It'll say you're on a "custom image":

![image](https://user-images.githubusercontent.com/38010821/228026931-f331aa76-134b-423c-a236-6e5d9ff5100b.png)

And `pip list` will have the stuff from your `requirements.txt`:

![image](https://user-images.githubusercontent.com/38010821/228028817-92e3123c-0a97-4940-add1-e2d72e5e7d08.png)

In addition to some other stuff, maybe that comes with our base image.

## `/.devcontainer` 🫙 component 3:  optional: `on-create-command.sh`

`on-create-command.sh` is a script that automatically runs various setup / install commands when the codespace is created.

https://github.com/github/airflow-sources/blob/master/.devcontainer/on-create-command.sh

https://github.com/github/airflow-sources/blob/777d30ba67f325b5fa72e8ded5f04fb70578362c/.devcontainer/devcontainer.json#LL74-L75

# Tests that codespace was set up correctly 🧪

## Things you might want to test

e.g., that pip is installed in the codespace.

## Ways to test them

### 1) Run each thing manually when developing devcontainer and/or when users build codespace

e.g., open the codespace and run "which pip".

### 2) Put tests in a [`.devcontainer/test_codespace.py`](https://github.com/lizre/learn-py/blob/master/.devcontainer/test_codespace.py), which user runs when they build the codespace

![image](https://user-images.githubusercontent.com/38010821/228383939-ed73b6c0-79d0-40dd-9f4d-6e3a1969ed36.png)
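
For a feel of what such a file can contain, here is a sketch in pytest style. It is not the contents of the linked `test_codespace.py`: the tool list, the minimum Python version, and the helper `missing_tools` are all assumptions made for illustration.

```python
import shutil
import sys

# Hypothetical list of commands the codespace is expected to provide.
REQUIRED_TOOLS = ["python", "pip"]


def missing_tools(tool_names):
    """Return the names from `tool_names` that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


def test_required_tools_are_on_path():
    # Equivalent to running `which pip` by hand, for every required tool.
    assert missing_tools(REQUIRED_TOOLS) == []


def test_python_is_recent_enough():
    # Hypothetical minimum; use whatever version your Dockerfile installs.
    assert sys.version_info >= (3, 7)
```

Running `pytest .devcontainer/test_codespace.py` in the codespace's terminal would then report anything missing.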


### 3) Automatically on every codespace creation, with on-create-command.sh? This doesn't seem common but could be cool

[example](https://github.com/github/airflow-sources/commit/fb89004ac8c3848ff3062841ccea592adf9cec57)
- Add to `devcontainer.json`: `"onCreateCommand": ".devcontainer/on-create-command.sh",`
- Make an [on-create-command.sh](https://github.com/github/airflow-sources/blob/master/.devcontainer/on-create-command.sh)

But it broke the codespace:

2023-03-28 17:46:16.792Z: /bin/sh: 1: .devcontainer/on-create-command.sh: Permission denied

and this

#16 0.472 chmod: cannot access '.devcontainer/on-create-command.sh': No such file or directory

potentially related:
- https://github.com/microsoft/vscode-remote-release/issues/5432
- https://stackoverflow.com/questions/38882654/docker-entrypoint-running-bash-script-gets-permission-denied


# As needed: Authentication 🔑

Sometimes in Codespaces you need to access services or resources outside the Codespace. Like Azure storage. To do that, you need to show that service or resource that you/the Codespace user are who you say you are. That's called _authentication_.

### authentication: verifying identity

Resources: [1](https://cloud.google.com/docs/authentication), [2](https://zapier.com/learn/apis/chapter-4-authentication-part-1/)

Different from authorization (permission to do things). 

_Principal_: an identity that can be granted access.
- users, services, apps. Your Codespace!

_Secret_: anything that you want to control access to. eg API keys, passwords.

_Credentials_: any info used to authenticate

_Password_ 
- is credential
- user-generated, stored in human memory, manually repeated with each use, and usually 1:1 with a human

_Key_
- is credential
- generated by API, usually used programmatically (in code, and used for non-human services)
- No standard way to include; sometimes add to URL, or put in request body, or auth header instead of username and pw.
- OAuth: automates key exchange so you don't have to type it out
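
As one example of "put it in an auth header", here is a sketch that attaches a key to a request without sending anything. The URL and key are placeholders, `build_authenticated_request` is a made-up helper, and the `Bearer` scheme is an assumption: each API documents its own way of passing keys.

```python
import urllib.request


def build_authenticated_request(url, api_key):
    """Build (but do not send) a request that carries the key in a header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
    )


request = build_authenticated_request("https://api.example.com/data", "not-a-real-key")
print(request.full_url)
print(request.get_header("Authorization"))
```

Keeping the key in a header, rather than in the URL, keeps it out of places where URLs tend to get logged.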


### authenticate in a codespace by adding secrets


Go to https://github.com/lizre/learn-py/settings/secrets/codespaces, then "New repository secret"


<img width="681" alt="image" src="https://user-images.githubusercontent.com/38010821/228086774-243d5615-f982-4a7f-b4f4-bce41c804526.png">

<br>

Then it's there: 

<br>

<img width="621" alt="image" src="https://user-images.githubusercontent.com/38010821/228086807-47e63196-ceb0-4510-b01d-01871a795865.png">


Then it's available in your Codespace!:

![image](https://user-images.githubusercontent.com/38010821/228087171-7b00d845-7136-403c-ba8f-cd188a094025.png)



You can access it in jupyter notebook with `os.environ.get('NOT_SO_SECRET')`.
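
A slightly more defensive version fails loudly when the secret is missing, which is easier to debug than a silent `None`. This is a sketch: `get_secret` is a helper name invented here, and the `setdefault` line exists only so the example runs outside a codespace.

```python
import os


def get_secret(name):
    """Return the secret stored in environment variable `name`, or raise if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Secret {name!r} is not set. Add it in the repository's Codespaces secrets "
            "and rebuild or restart the codespace."
        )
    return value


# Demo only: pretend the secret exists so this sketch runs anywhere.
os.environ.setdefault("NOT_SO_SECRET", "demo-value")

secret_value = get_secret("NOT_SO_SECRET")
print("secret loaded, length:", len(secret_value))
```

Printing the length instead of the value avoids leaking the secret into notebook output.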

# Optional: VSCode settings ⚙️
    
VSCode settings are stored in a json:

In [3]:
!cat ~/Library/Application\ Support/Code/User/settings.json

{
    "[python]": {
        "editor.defaultFormatter": null,
        "editor.formatOnSave": true,
    },
    "python.testing.unittestEnabled": false,
    "python.testing.pytestEnabled": true,
    "python.defaultInterpreterPath": "python",
}

You can put similar json in your repo in `/.vscode/settings.json`: https://github.com/lizre/learn-py/tree/master/.vscode:

In [19]:
!cat ~/Downloads/learn-py/.vscode/settings.json

{
    "editor.fontSize": 35,
    "python.testing.pytestEnabled": true,
    "python.defaultInterpreterPath": "python",
}

See how I made the font size huge, 35!

Now put it in the repo:

            
<img width="664" alt="image" src="https://user-images.githubusercontent.com/38010821/228092487-c4825b03-2dd1-4a69-a553-5b74f1b57cc1.png">


Then when you build or rebuild the Codespace, it'll apply my huge font size:

<img width="722" alt="image" src="https://user-images.githubusercontent.com/38010821/228093857-a30a42e3-edb6-477a-94f8-7d550ead11f1.png">



# Optional: prebuilds 🏗️

https://docs.github.com/en/codespaces/prebuilding-your-codespaces/about-github-codespaces-prebuilds

https://docs.github.com/en/codespaces/prebuilding-your-codespaces/configuring-prebuilds

TLDR; makes it faster to build a new codespace.

lower priority

# TODO:  


# Codespace maintenance and changes

### Write a maintenance playbook
Repos with a codespace should include a maintenance playbook that lists:
1) the files related to the codespace
2) actions to take and when
3) common troubleshooting 

### Write tests that apply to codespace-related files

eg, this found errors in my test_codespace.py:
https://github.com/github/harnesslib/blob/569f894b77bafa2c762eec2948cb44907be1983d/.github/workflows/notebook-integration-test.yaml

https://github.com/github/harnesslib/blob/569f894b77bafa2c762eec2948cb44907be1983d/.github/workflows/simple-batch-integration-test.yaml


Devcontainer updates when you create a codespace or rebuild the container. Use VS Code Command Palette (Shift+Command+P) --> `Codespaces: Rebuild Container`.

[each push to a branch that has a prebuild configuration results in a GitHub-managed GitHub Actions workflow run to update the prebuild.](https://docs.github.com/en/codespaces/prebuilding-your-codespaces/about-github-codespaces-prebuilds#about-pushing-changes-to-prebuild-enabled-branches)

Keeping keys updated
- eg https://github.com/github/harnesslib/issues/229

