# The Missing Semester to Your CS Education
[link](https://missing.csail.mit.edu/)

## Lecture 1: The Shell

Shell: bash. The prompt looks like `[name]@[machine] [path]`.

The shell finds programs by searching the directories listed in the `PATH` environment variable. On Mac and Linux, every path lives under a single root directory, `/`. On Windows, each drive partition (e.g. the C and D drives) has its own root, so a path depends on which partition you're in.
- `echo $SHELL`
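- `cd -` switches back to the previous directory
- `ls -l` shows full info, including permissions (change them with `chmod`)
- `< file` reads input from a file, `> file` overwrites the file with the output, `>> file` appends to it
- root user (user id 0): the super-user, use `sudo`
- a `#` prompt (instead of `$`) means you are the root user
- `echo 1060 | sudo tee brightness`: writes to a root-owned file; the lecture example changes the screen brightness this way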
- `curl --head --silent google.com | grep --ignore-case content-length | cut --delimiter=' ' -f2`
- `find -L ./datasets -maxdepth 4 -name '*1991*'`
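
A minimal sketch of the redirection operators and `tee` from the list above, assuming bash. The scratch directory and the file names `log.txt` and `brightness` are made up for the demo; nothing here touches a real system file.

```shell
# Scratch directory so the demo does not touch real files.
demo_dir=$(mktemp -d)

echo "first"  >  "$demo_dir/log.txt"       # > creates or overwrites the file
echo "second" >> "$demo_dir/log.txt"       # >> appends to the file
line_count=$(wc -l < "$demo_dir/log.txt")  # < feeds the file to stdin

# tee copies stdin to a file and to stdout. With `sudo tee`, it is tee
# (running as root) that opens the file, not your unprivileged shell.
echo 1060 | tee "$demo_dir/brightness" > /dev/null

echo "lines: $((line_count))"
cat "$demo_dir/brightness"
```

This is why `sudo echo 1060 > brightness` fails on a root-owned file: the redirection is performed by the shell before `sudo` ever runs.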

Where is conda?
- `conda`: No such file or directory
- `which conda`: /c/Users/vhli2/anaconda3/Scripts/conda
- Issue seems to be with the Git Bash integration w/ VSCode. I may have multiple versions of Git Bash.
- `conda` installation / activation works fine with command prompt and Git Bash outside VSCode

## Lecture 2
- Variable assignment: `foo=bar`
- `foo = bar` (with spaces) does not assign; it runs the program `foo` with the arguments `=` and `bar`
- Single quotes = literal string, double quotes = variables are expanded. `echo "$foo"` (substitutes the value) vs `echo '$foo'` (prints `$foo`)
- `tmp/missing/mcd.sh`
- `$0` is the script name; `$1` to `$9` are its arguments
- `$#` - number of arguments
- `$_`: access last argument of last command
- `!!`: execute the last command, can do `sudo !!` if permission denied
- `$@`: all arguments, `$?`: exit code of the previous command, `$$`: process ID of the current script
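
A small sketch of the special variables above, assuming bash. The notes only name the file `tmp/missing/mcd.sh`, so the body of `mcd` here is a guess at what such a script would contain, and `show_args` is a hypothetical helper.

```shell
# Assumed contents for a script like mcd.sh: make a directory, then enter it.
mcd() {
    mkdir -p "$1" && cd "$1"
}

# Hypothetical helper that reports the special variables from inside a function.
show_args() {
    echo "count: $#"    # $# = number of arguments
    echo "first: $1"    # $1..$9 = positional arguments
    echo "all: $*"      # $* joins all arguments; "$@" keeps them as separate words
}

show_args alpha beta gamma

false                   # a command that always fails
status_after_false=$?   # $? = exit code of the previous command
echo "exit code: $status_after_false"
echo "pid: $$"          # $$ = process ID of the current shell
```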

Storing results in variables
- **Command substitution**: `echo "We are in $(pwd)"`
- `for file in $(ls)`
- Show this and parent directory: `cat <(ls) <(ls ..)`
- **Process substitution**: `<( CMD )`: `diff <(ls foo) <(ls bar)` shows the difference btwn files in directories
- Globbing: `ls *.sh` lists anything ending in `.sh`
- `rm project?`: rm anything w/ project and one more character
- `echo foo{,1,2,10}`: same as `echo foo foo1 foo2 foo10`
- Convert png to jpg: `convert image.{png,jpg}`
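
The expansions above can be tried safely in a scratch directory. This is a sketch assuming bash (brace expansion and process substitution are not available in plain `sh`); all file names are made up.

```shell
work=$(mktemp -d)
touch "$work/a.sh" "$work/b.sh" "$work/notes.txt" "$work/project1" "$work/project42"

# Command substitution: capture a command's output in a string.
here="We are in $(pwd)"

# Globbing: * matches any string, ? matches exactly one character.
sh_files=$(cd "$work" && echo *.sh)
one_char=$(cd "$work" && echo project?)   # project42 has two extra characters, so it is skipped

# Brace expansion: the shell generates the words before the command runs.
expanded=$(echo foo{,1,2,10})

# Process substitution: each <(...) behaves like a temporary file.
mkdir "$work/foo" "$work/bar"
touch "$work/foo/x" "$work/bar/x" "$work/bar/y"
only_in_bar=$(diff <(ls "$work/foo") <(ls "$work/bar") | grep '^>')

echo "$sh_files | $one_char | $expanded | $only_in_bar"
```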

Python
- In the first line, the shebang `#!/usr/bin/env python` tells the system to run the script with whichever `python` is found on the `PATH`

find is OP
```
# Find all directories named src
find . -name src -type d
# Find all python files that have a folder named test in their path
find . -path '*/test/*.py' -type f
# Find all files modified in the last day
find . -mtime -1
# Find all .tar.gz archives with size in range 500k to 10M
find . -size +500k -size -10M -name '*.tar.gz'
```

```
# Delete all files with .tmp extension
find . -name '*.tmp' -exec rm {} \;
# Find all PNG files and convert them to JPG
find . -name '*.png' -exec convert {} {}.jpg \;
```

locate: `locate` uses a precompiled index / database (refreshed by `updatedb`) for quick searching

`tree`

The `xargs` command executes a command using STDIN as arguments. For example, `ls | xargs rm` deletes the files in the current directory.
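
One caveat: by default `xargs` splits its input on whitespace, so `ls | xargs rm` mishandles names that contain spaces. A common workaround, sketched here in a scratch directory with made-up file names, is to pair `find -print0` with `xargs -0`.

```shell
scratch=$(mktemp -d)
touch "$scratch/keep.txt" "$scratch/old.tmp" "$scratch/with space.tmp"

# -print0 and -0 separate names with NUL bytes, so "with space.tmp"
# reaches rm as one argument instead of two.
find "$scratch" -name '*.tmp' -print0 | xargs -0 rm

remaining=$(ls "$scratch")
echo "$remaining"
```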


wsl
- `sudo su` to get a root shell
- create aliases for python, pip, and conda in the `~/.bashrc` script

## Lecture 3

normal mode
- normal \<ESC\> <--> i insert
- R replace mode
- v visual mode
- S-V visual-line
- C-V visual-block
- : command-line mode 
- ^V = Ctrl-V = \<C-V\>

counts, modifiers

## Lecture 4: Data Wrangling

- Regex
- `sed`: wrangles data based on a Regex
- `sort`, `uniq`: sort, unique
- `awk`: columnar operations on data
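
A sketch of how these tools chain together, assuming bash. The log lines are invented for illustration; the pipeline counts how often each username appears.

```shell
# Made-up log lines for illustration.
log='Jan 17 10:01 sshd: Disconnected from invalid user admin 1.2.3.4
Jan 17 10:02 sshd: Disconnected from invalid user root 5.6.7.8
Jan 17 10:03 sshd: Disconnected from invalid user admin 9.9.9.9'

# sed keeps only the username, sort groups duplicates, uniq -c counts them,
# sort -nr puts the most frequent first, awk reorders the two columns.
summary=$(echo "$log" \
    | sed -E 's/.*invalid user ([^ ]+) .*/\1/' \
    | sort | uniq -c | sort -nr \
    | awk '{ print $2, $1 }')
echo "$summary"
```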

Two types of wrangling
- Command-line wrangling: something produces a list of arguments, can run through xargs on each argument
- Binary data wrangling: videos, images, etc.

## Lecture 5: Command-line Environment
- Job control
- Terminal multiplexers
- Dotfiles
- Efficiently work with remote machines

Job control: signals that can be sent
- `SIGHUP` - terminal line hangup (sent when the terminal is closed)
- `SIGINT` - interrupt a program (Ctrl-C)
- `SIGQUIT` - quit program (Ctrl-\\)
- `SIGTERM` - software termination signal
- My `^\` doesn't work
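
Most of these signals can be caught by the program that receives them. A minimal sketch, assuming bash: a child shell installs a handler with `trap`, sends itself `SIGTERM`, and keeps running. The messages are made up.

```shell
# Run the demo in a child bash so the signal never reaches the current shell.
result=$(bash -c '
    trap "echo caught SIGTERM" TERM   # handler replaces the default "terminate" action
    kill -TERM $$                     # send SIGTERM to this child shell
    echo "still running"
')
echo "$result"
```

`SIGKILL` is the exception: it cannot be trapped, which is why `kill -9` always ends the process.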

Terminal multiplexers: `tmux`
- Three core concepts
- Sessions
- Sessions have windows (like tabs)
- Windows have panes

Dotfile
- Aliases: a feature built into the shell that remaps a short sequence of characters into a longer sequence
- Default flag ex.: `alias ll="ls -lah"`
- Shorten long strings ex.: `alias gs="git status"`
- `alias ll`: prints out what the alias is
- How do you persist the aliases in your current environment? 
- `~/.bashrc` and `~/.vimrc` are configuration files
- Can search **dotfiles** on github
- Pro tip: Create a `dotfiles` folder in the home directory and keep the real config files there. Then create symbolic links, **symlinks**, at the default locations (`~/.bashrc`, `~/.vimrc`) that point to the files in the `dotfiles` folder
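
The symlink setup can be sketched as follows, assuming bash. A throwaway directory stands in for `$HOME` so nothing real is overwritten, and the file names inside `dotfiles` are an arbitrary choice.

```shell
# A throwaway directory stands in for $HOME.
fake_home=$(mktemp -d)
mkdir "$fake_home/dotfiles"
echo 'alias ll="ls -lah"' > "$fake_home/dotfiles/bashrc"
echo 'set number'         > "$fake_home/dotfiles/vimrc"

# ln -s TARGET LINK: the link sits where the program expects its config.
ln -s "$fake_home/dotfiles/bashrc" "$fake_home/.bashrc"
ln -s "$fake_home/dotfiles/vimrc"  "$fake_home/.vimrc"

ls -l "$fake_home/.bashrc"                # shows where the link points
link_content=$(cat "$fake_home/.bashrc")  # reading the link reads the real file
```

With this layout the `dotfiles` folder can be put under version control on its own.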

Remote machines: `ssh`, secure shell


## Lecture 6: Version-Control System
- Git models history as a series of snapshots of a collection of files and folders within some top-level directory (root).
- Git uses a **directed acyclic graph** to model history.
- Each snapshot points to its parents, the snapshots that preceded it. When you **merge**, the new node points back to all of the snapshots that were merged.

What data structure underlies git?

Pointers
- `type blob = array<byte>`
- `type tree = map<string, tree|blob>`
- `type commit = struct { parents: array<commit>, author: string, message: string, snapshot: tree }`

Actual data
- `type object = blob|tree|commit`
- `objects = map<string, object>`
- A store function uses sha1 to compute an id and stores the `object` into `objects`. A load function returns the `object` from an `id` key.
- `references = map<string, string>` allows naming a snapshot/node with a human-readable name instead of its hash.
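
The "id from sha1" idea can be checked by hand. This sketch assumes bash and the `sha1sum` tool from GNU coreutils; `git_blob_id` is a hypothetical helper, not a Git command. For a blob, Git hashes a short header (`blob <size>` plus a NUL byte) followed by the file's bytes.

```shell
# Compute the id Git would assign to a file stored as a blob.
git_blob_id() {
    local file=$1 size
    size=$(( $(wc -c < "$file") ))   # content length in bytes
    { printf 'blob %d\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

sample=$(mktemp)
echo 'hello world' > "$sample"
blob_id=$(git_blob_id "$sample")
echo "$blob_id"    # compare with: git hash-object "$sample"
```

Because the id depends only on the content, two files with identical bytes are stored as the same object.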

Some commands
- `git checkout`: lets you move around in version history
- `git diff`: compare differences

## Lecture 7: Debugging and Profiling

### Debugging
- Python: `mypy` and `flake8`, newer tools?
- `pdb`

### Profiling
- Real time: wall clock elapsed time from start to finish of the program, including the time taken by other processes and time taken while blocked (e.g. waiting for I/O or network)
- User: amount of time spent in the CPU running user code
- Sys: amount of time spent in the CPU running kernel code
- Generally **User + Sys** tells you how much time your process actually spent in the CPU
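
A quick way to see the difference, assuming bash: time a command that only waits. `sleep` spends its time blocked, so real time is about one second while user and sys stay near zero. The exact numbers vary by machine.

```shell
# bash's `time` keyword reports to stderr; TIMEFORMAT controls the layout.
TIMEFORMAT='real=%R user=%U sys=%S'
timing=$( { time sleep 1; } 2>&1 )
echo "$timing"
```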

Most of the time when people refer to profilers they actually mean *CPU profilers*. Two types
- Tracing: keep a record of every function call your program makes
- Sampling: probe your program periodically (e.g., every millisecond) and record the program's stack
- In Python, use the `cProfile` module to profile time per function call, e.g. `python -m cProfile -s tottime grep.py`

Other debuggers/profilers
- `perf` reports events outside the code, e.g., stalled cycles, page faults, etc.
- flame graph, call graph

Resource monitoring
- `htop` to monitor system usage
- `hyperfine` to benchmark command-line programs

## Lecture 8: Metaprogramming
The process surrounding programming.

### Build systems
If you write a paper in LaTeX, what are the commands you need to run to produce your paper? What about the ones used to run your benchmarks, plot them, and then insert that plot into your paper?

You're writing XXX, and you have a sequence of commands that you have to run to build it. A build system encodes these commands into a tool that you can use.

- Targets: things you want to build
- Dependencies: things that you need to build the targets
- Rules: how to go from dependencies to target
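
The core idea of a build system can be sketched in a few lines of bash: run the rule only when the target is missing or older than its dependency. This is an illustration of the concept, not how any particular tool is implemented; `build_if_stale` and the file names are made up.

```shell
build_dir=$(mktemp -d)
rebuilds=0

# Rule: how to produce the target from its dependency (here: upper-case it).
build_if_stale() {
    local dep=$1 target=$2
    # Rebuild when the target is missing or the dependency is newer.
    if [ ! -e "$target" ] || [ "$dep" -nt "$target" ]; then
        tr '[:lower:]' '[:upper:]' < "$dep" > "$target"
        rebuilds=$((rebuilds + 1))
    fi
}

echo "hello" > "$build_dir/input.txt"                          # dependency
build_if_stale "$build_dir/input.txt" "$build_dir/output.txt"  # target missing: builds
build_if_stale "$build_dir/input.txt" "$build_dir/output.txt"  # up to date: skipped
echo "rebuilds: $rebuilds"
```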

### `make`
- `make` is a tool that can be found on essentially any system. Not great for very complex software. Great for simple or medium-complexity software.
- `make` searches in the current directory for a file called `Makefile`

In make, `%` in a pattern rule is a wildcard: it matches the same string in the target and in its dependencies.

### Versioning
- `8.1.7`: major, minor, patch
- Patch: if the change you made is entirely backwards-compatible; externally, nothing changes; e.g. security fixes
- Minor: you add something in a backwards-compatible way; next is `8.2.0`; code that depends on `8.1.x` should still work
- Major: you make a backwards-incompatible change such as moving/renaming a function
- Lock files: a file that lists the exact version of each dependency you are currently depending on
- Another reason to use lock files is to get reproducible builds.
- Vendoring: you copy all the code of your dependencies into your own project; means you have to explicitly pull in any updates from the upstream maintainers over time
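
The major/minor/patch rules can be turned into a small compatibility check. This is a sketch assuming bash and plain numeric `major.minor.patch` strings; `semver_satisfies` is a hypothetical helper, and real package managers handle many more cases (pre-releases, ranges).

```shell
# Succeeds if version $1 can stand in for required version $2:
# same major, and not older in minor/patch.
semver_satisfies() {
    local have=$1 need=$2
    local h_major h_minor h_patch n_major n_minor n_patch
    IFS=. read -r h_major h_minor h_patch <<< "$have"
    IFS=. read -r n_major n_minor n_patch <<< "$need"
    [ "$h_major" -eq "$n_major" ] || return 1   # a major bump may break callers
    [ "$h_minor" -gt "$n_minor" ] && return 0   # a newer minor only adds things
    [ "$h_minor" -eq "$n_minor" ] && [ "$h_patch" -ge "$n_patch" ]
}

semver_satisfies 8.2.0 8.1.7 && echo "8.2.0 works for code needing 8.1.7"
semver_satisfies 9.0.0 8.1.7 || echo "9.0.0 may break code needing 8.1.7"
semver_satisfies 8.1.3 8.1.7 || echo "8.1.3 is older than 8.1.7"
```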

### Continuous integration systems
- A cloud-build system
- The project is hosted somewhere on the internet, and a service watches it and runs actions when events happen; e.g., release a library to PyPI whenever you push to a branch, run a test suite whenever someone submits a pull request, check code style whenever you commit. These are **event-triggered actions**.
- e.g., class website is set up using GitHub Pages. Every push to master runs the Jekyll blog software and makes the built site available on a particular GitHub domain.
- `dependabot`: updates dependency file
- Test suite: all of the tests in the program
- Unit test: small, self-contained test on a single feature, a "micro-test"
- Integration test: test interactions between subsystems of a program
- Regression test: test things that were broken in the past
- Mocking: replace a module with a dummy version
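
Unit testing and mocking can be sketched even in bash. Here a shell function named `curl` shadows the real program, so the function under test sees a fixed, made-up response and needs no network. `content_length` is a hypothetical helper built from the header pipeline in Lecture 1.

```shell
# Function under test: extract the Content-Length header value for a URL.
content_length() {
    curl --head --silent "$1" \
        | grep -i '^content-length' \
        | cut -d' ' -f2 \
        | tr -d '\r'
}

# Mock: replaces the real curl with a canned response (values are invented).
curl() {
    printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 219\r\n\r\n'
}

# Unit test: one small, self-contained check on a single feature.
if [ "$(content_length example.invalid)" = "219" ]; then
    test_result="PASS"
else
    test_result="FAIL"
fi
echo "content_length: $test_result"

unset -f curl   # remove the mock so later commands use the real curl
```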


## Lecture 10: Potpourri
- Keyboard re-mapping
- Daemons: programs running in the background, names generally end in `d`, e.g. `sshd` on a remote server
- `crond` (the cron daemon) already runs scheduled tasks; use `cron` to schedule your own
- FUSE: file systems in user space
- Backups
- API: JSON, curl, authentication token (OAuth) - look at documentation
- Command-line arguments
- Floating window manager
- VPNs: effectively just change who your internet service provider is; the lecture takes a negative view of them
- Markdown: works on Facebook messenger?
- Hammerspoon: auto-instantiate your windows layout, auto-mute speaker
- Booting + Live USBs
- Docker, Vagrant, VMs, Cloud OpenStack
- Notebook programming
- GitHub

## Lecture 9: Security and Cryptography


## Lecture 10/11: Potpourri and Q&A