# Lecture 2: Development Tools

How to "read" this lecture notebook
<details>
<summary>click to expand</summary>

As you go through this notebook (or any notebook for this class), you will encounter new concepts and code that implements them -- just like you would see in a textbook. Of course, in a textbook, it's easy to read code and an explanation of what it does and think that you understand it.
<br />

### Learn by doing
But this notebook is different from a textbook because it allows you to not just read the code, but play with it. **You can and should try out changing the code that you see**. In fact, in many places throughout this reading notebook, you will be asked to write your own code to experiment with a concept that was just covered. This is a form of "active reading" and the idea behind it is that we really learn by **doing**. 
<br />

### Change everything
But don't feel limited to only change code when I prompt you. This notebook is your learning environment and your playground. I encourage you to try changing and running all the code throughout the notebook and even to **add your own notes and new code blocks**. Adding comments to code to explain what you are testing, experimenting with or trying to do is really helpful to understand what you were thinking when you revisit it later. 
<br />

### Make this notebook your own
Write down your questions and thoughts. At the end of every reading notebook, I will ask the same set of questions to elicit your questions, reactions, and feedback. When we review the reading notebook in class, I encourage you to share!

</details>

## Learning Objectives

Before we get rolling, we have to install some tools and make sure your laptop has the right environment and access to the course code. But you don't yet have access to the instructions I'm showing you now.

We'll literally have to pull ourselves up by our own bootstraps.

By the end of this lecture, you will be able to:
- Master essential Git commands for version control and collaboration
- Install Python properly and set up virtual environments for reproducible development
- Use some essential terminal/command-line operations
- Code and run applications remotely with VS Code tunnels
- Install Docker on local machines (preparation for later lecture)

# 2.0 Why Development Tools Matter
<img alt="Development tools are like Doc Brown's garage - essential for building great things" src="../images/L02_bttf_tools.png" style="display:block;">

In *Back to the Future*, Doc Brown had a whole garage full of specialized tools to build the DeLorean time machine. Sure, the flux capacitor got all the glory, but without the right wrenches, soldering irons, and testing equipment, that DeLorean would never have hit 88 miles per hour.

The same is true for machine learning. You can have the most brilliant algorithm in the world, but if you can't:
- **Track your changes** (and undo disasters)
- **Work with virtual envs in Python** and switch between them
- **Navigate the command line** efficiently
- **Work on powerful remote machines** when your laptop isn't enough

...then you're going to have a bad time. These are the fundamental tools that professional data scientists and ML engineers use every single day.



# 2.1 Git and GitHub Basics
<img alt="Git is like a time machine for your code" src="../images/L02_git_time_machine.png" style="display:block;">

Git is a **version control system** - think of it as a time machine for your code. Every time you make a "commit," you're creating a snapshot that you can return to later. Made a terrible mistake? No problem - just go back in time!

**GitHub** is a website that hosts Git repositories online, making it easy to:
- Back up your code in the cloud
- Collaborate with others
- Share your work with the world

## Installing Git

First, make sure Git is installed on your machine:

**Windows**: Download from https://git-scm.com/download/windows and run the installer. Accept the defaults.

**macOS**: Git comes with Xcode Command Line Tools. Install by running `xcode-select --install` in Terminal, or download from https://git-scm.com/download/mac

**Linux**: Use your package manager:
```bash
sudo apt install git  # Ubuntu/Debian
sudo dnf install git  # Fedora
```

Verify installation:
```bash
git --version
```

## The Basic Git Workflow

Here's the typical workflow you'll use hundreds of times:

1. **Clone** a repository (get a copy on your machine)
2. **Make changes** to files
3. **Stage** the changes you want to save
4. **Commit** those changes with a message
5. **Push** your commits to GitHub

Let's look at each of these commands.

## Setting Up the Course Repository

For this class, you'll use a **fork and clone** workflow. This gives you your own copy of the course materials where you can take notes, complete exercises, and track your work - while still being able to pull in updates I release throughout the semester.

We'll talk about the Git commands, but let's first set up your repo for the course. You can run the commands in the terminal in the steps below, even if you don't understand them yet.

### Step 1: Fork the Repository

1. Go to the course repository on GitHub: `https://github.com/dylanwalker/bus675_s26`
2. Click the **Fork** button in the top-right corner
3. This creates a copy under your own GitHub account

### Step 2: Clone Your Fork
Open a terminal and run:

```bash
# Change to the directory where you keep your code 
cd ~/Code

# Clone YOUR fork (not the original!)
git clone https://github.com/YOUR-GITHUB-USERNAME/bus675_s26.git

# Navigate into the directory you just cloned
cd bus675_s26
```


### Step 3: Add the Upstream Remote

Next, we'll connect your local repo back to my original by adding it as a second remote. That way, you can pull from it whenever I add new files to the course repo. If I don't screw this part up, it should work perfectly. If not, you might get merge conflicts, but we'll talk about how to deal with them if they arise.

First, we'll add the course repo as a new remote in your local clone of YOUR fork:
```bash
# Add a new remote called "upstream"
git remote add upstream https://github.com/dylanwalker/bus675_s26.git
```

Now, we'll verify the remotes:
```bash
# Verify that you have both origin (your fork remote) and upstream (original course repo remote).
git remote -v
```


## Checking Status

The most useful command when you're not sure what's going on:

```bash
git status
```

This tells you:
- Which branch you're on
- Which files have been modified
- Which files are staged for commit
- Which files are untracked (new files Git doesn't know about yet)

**Pro tip**: Run `git status` frequently. It's like checking your mirrors while driving - do it often!
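
To see these states in action without touching the course repo, here is a minimal sketch in a throwaway directory. The file names (`notes.txt`, `model.py`), the placeholder identity, and the sandbox location are all made up for illustration; `git status --short` is just the compact form of the same command.

```bash
# Practice in a throwaway repo so nothing in the course repo changes.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Student Example"        # placeholder identity, local to this repo
git config user.email "student@example.com"

echo "first draft" > notes.txt
echo "print('hello')" > model.py

git add notes.txt                 # stage only one of the two new files
status_before=$(git status --short)
echo "$status_before"
# A  notes.txt   <- staged, will be part of the next commit
# ?? model.py    <- untracked, Git is not following it yet

git commit -q -m "Add first draft of notes"
status_after=$(git status --short)
echo "$status_after"              # only model.py is still listed
```

Running `git status` before and after each step is a cheap way to confirm that Git did what you expected.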

## Staging and Committing Changes

After you've modified files, you need to tell Git which changes to save. This is a two-step process:

**Step 1: Stage the changes**
```bash
# Stage a specific file
git add filename.py

# Stage multiple files
git add file1.py file2.py

# Stage all changes in the current directory
git add .
```

**Step 2: Commit with a message**
```bash
git commit -m "Add feature to calculate model accuracy"
```

The `-m` flag lets you write the commit message inline. Without it, Git will open a text editor for you to write a longer message.

## Writing Good Commit Messages

Your future self (and your collaborators) will thank you for writing clear commit messages. Here are some guidelines:

**Good commit messages:**
- `"Fix bug in data preprocessing that caused NaN values"`
- `"Add random forest model with hyperparameter tuning"`
- `"Update README with installation instructions"`

**Bad commit messages:**
- `"fixed stuff"` (What stuff?)
- `"changes"` (What changes?)
- `"asdfasdf"` (Come on, really?)

Think of it this way: if you need to find this change six months from now, what would help you find it?
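
One way to feel the difference is to search a history by message. The sketch below builds a tiny throwaway repo (the file names and identity are placeholders) and then uses `git log --grep` to find a commit by a keyword in its message.

```bash
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git config user.name "Student Example"        # placeholder identity, local to this repo
git config user.email "student@example.com"

# Three small commits with descriptive messages (file contents are placeholders).
echo "v1" > prep.py
git add prep.py
git commit -q -m "Add data preprocessing script"
echo "v2" > prep.py
git commit -q -am "Fix bug in data preprocessing that caused NaN values"
echo "# Setup" > README.md
git add README.md
git commit -q -m "Update README with installation instructions"

# Months later: search the commit messages for a keyword.
found=$(git log --oneline --grep="NaN")
echo "$found"    # one line: <short hash> Fix bug in data preprocessing that caused NaN values
```

A history full of `"fixed stuff"` gives this kind of search nothing to match.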

## Pushing to GitHub

After committing locally, push your changes to GitHub:

```bash
git push
```

If it's your first push to a new branch:
```bash
git push -u origin main
```

The `-u` flag sets up tracking so future pushes can just use `git push`.

## Pulling Changes

To get the latest changes from GitHub (made by you on another computer, or by collaborators):

```bash
git pull
```

**Important**: Always pull before you start working! This helps avoid merge conflicts.

A typical day might look like:
1. `git pull` - Get any new changes
2. Do your work
3. `git add .` - Stage changes
4. `git commit -m "Your message"` - Commit
5. `git push` - Push to GitHub


In this course, at the start of most classes, I will instruct you to run:

`git pull upstream main`

This will pull all of the new files I added for the lecture that day.
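
If you want to rehearse the whole fork-and-upstream cycle before relying on it, the sketch below fakes it entirely on your own machine. Local bare repositories stand in for GitHub, and every name (`upstream.git`, `fork.git`, `instructor`, `student`, the lecture files) is a placeholder rather than anything from the real course repo.

```bash
sandbox=$(mktemp -d)
cd "$sandbox"

# 1. The instructor's course repo ("upstream") with one lecture in it.
git init -q --bare upstream.git
git -C upstream.git symbolic-ref HEAD refs/heads/main
git init -q instructor
cd instructor
git symbolic-ref HEAD refs/heads/main         # call the first branch "main"
git config user.name "Instructor Example"
git config user.email "instructor@example.com"
echo "Lecture 1" > lecture1.md
git add . && git commit -q -m "Add lecture 1"
git remote add origin "$sandbox/upstream.git"
git push -q -u origin main

# 2. "Fork" it (a server-side copy), then clone the fork as the student.
cd "$sandbox"
git clone -q --bare upstream.git fork.git
git clone -q fork.git student
git -C student remote add upstream "$sandbox/upstream.git"

# 3. The instructor releases lecture 2.
cd "$sandbox/instructor"
echo "Lecture 2" > lecture2.md
git add . && git commit -q -m "Add lecture 2"
git push -q

# 4. Start of class: the student pulls the new material, then updates the fork.
cd "$sandbox/student"
git pull -q upstream main
git push -q origin main
student_files=$(ls)
echo "$student_files"             # lecture1.md and lecture2.md
```

Note that the student never pushes to `upstream`; new material flows from `upstream` into the local clone and from there to `origin`.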

<!-- Start Exercise 2.1 -->
<hr/>
<img src="../images/stop_right_margin.png" align="left">

<font size=3 color="darkred"> Exercise: Set Up Your Course Repository </font>
<div class="inclass_exercise_body" style="padding-left: 130px; width: 85%; text-align: justify;text-align-last: left;">

1. Make sure you have forked the course repo, cloned it to a local folder on your laptop, and added the course repo as a remote (as instructed above).
2. Create a new file called `yourname_setup.txt` with a brief message confirming setup.
3. Stage, commit, and push your change to your fork.
4. Verify your change appears on GitHub.
</div>

<hr/>
<!-- End Exercise 2.1 -->

## Viewing History

To see the history of commits:

```bash
# See commit history
git log

# See a condensed one-line-per-commit history
git log --oneline

# See the last 5 commits
git log -5
```

To see what changed in a specific file:
```bash
git log -p filename.py
```

## Basic Branching
<img alt="Branches are like alternate timelines" src="../images/L02_bttf_alt_timeline.png" style="display:block;">
<font size=2>Doc Brown explains alternate timelines in <i>Back to the Future Part II (1989)</i></font>

Branches let you work on features or experiments without affecting the main codebase. Think of it like creating an alternate timeline (very *Back to the Future Part II*). They can also be useful to let AI agents contribute to your codebase in a controlled manner that allows for human review of their work.

```bash
# See all branches
git branch

# Create a new branch
git branch experiment-new-model

# Switch to that branch
git checkout experiment-new-model

# Or create and switch in one command
git checkout -b experiment-new-model
```

When your experiment works, you can merge it back into the main branch. If it doesn't work, you can just delete the branch - no harm done to your main code!
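
Merging and deleting look like this in practice. This is a minimal sketch in a throwaway repo; the file contents and identity are placeholders, and it assumes the main branch had no new commits while the experiment was running.

```bash
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/main         # call the first branch "main"
git config user.name "Student Example"        # placeholder identity, local to this repo
git config user.email "student@example.com"
echo "baseline model" > model.py
git add model.py
git commit -q -m "Add baseline model"

# Try an idea on its own branch.
git checkout -q -b experiment-new-model
echo "random forest model" > model.py
git commit -q -am "Swap baseline for random forest"

# The experiment worked: merge it back into main, then tidy up.
git checkout -q main
git merge -q experiment-new-model     # main had no new commits, so this is a fast-forward
git branch -d experiment-new-model    # -d refuses to delete a branch with unmerged work

# Had the experiment failed, `git branch -D experiment-new-model` (capital D)
# would throw the branch away without merging.
merged_content=$(cat model.py)
echo "$merged_content"                # random forest model
```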



## Handling Merge Conflicts

Sometimes when you pull, Git will tell you there's a **merge conflict** - this means someone else changed the same lines you changed. Don't panic! Here's what you'll see:

```
<<<<<<< HEAD
your version of the code
=======
their version of the code
>>>>>>> branch-name
```

To resolve:
1. Open the file and look for `<<<<<<<`, `=======`, and `>>>>>>>`
2. Decide which version to keep (or combine them)
3. Delete the conflict markers
4. `git add` the file
5. `git commit`

The best way to handle conflicts? **Avoid them** by pulling frequently and communicating with your team!
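
Conflicts are much less scary once you have caused one on purpose. The sketch below does exactly that in a throwaway repo; the branch name `teammate`, the file `config.py`, and the values in it are invented for the demo.

```bash
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/main         # call the first branch "main"
git config user.name "Student Example"        # placeholder identity, local to this repo
git config user.email "student@example.com"
echo "learning_rate = 0.1" > config.py
git add config.py
git commit -q -m "Add config"

# Two branches edit the same line in different ways.
git checkout -q -b teammate
echo "learning_rate = 0.01" > config.py
git commit -q -am "Lower learning rate"
git checkout -q main
echo "learning_rate = 0.5" > config.py
git commit -q -am "Raise learning rate"

# The merge stops and writes conflict markers into the file.
git merge teammate || echo "merge stopped: conflict in config.py"
conflicted=$(cat config.py)
echo "$conflicted"

# Resolve: write the version you want (normally by editing the file), then add and commit.
echo "learning_rate = 0.05" > config.py
git add config.py
git commit -q --no-edit               # keeps Git's default merge message
resolved=$(cat config.py)
```

If you get lost halfway through a real conflict, `git merge --abort` returns the repo to the state it was in before the merge started.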

# 2.2 Python Installation and Virtual Environments

Now that you have the course repository set up, let's make sure you have Python configured correctly. This is foundational - everything else depends on having a properly configured Python environment.

## Installing Python (The Right Way)

**Important**: Do NOT use your operating system's pre-installed Python! Here's why:
- macOS and most Linux distributions ship with some version of Python, but it's often outdated
- System Python is used by your OS - messing with it can break things
- You want control over your Python version and packages

### Windows
1. Go to https://www.python.org/downloads/
2. Download the latest Python 3.x (3.12 recommended)
3. **IMPORTANT**: Check "Add Python to PATH" during installation
4. Choose "Customize installation" and ensure pip is included

### macOS
The easiest method is using [Homebrew](https://brew.sh/) (an awesome package manager for macOS). Run the following commands from a terminal:
```bash
# Install Homebrew if you haven't already
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Python
brew install python@3.12
```

Alternatively, download from https://www.python.org/downloads/

### Linux
```bash
# Ubuntu/Debian
sudo apt update
sudo apt install python3.12 python3.12-venv python3-pip
```

Verify your installation:
```bash
python --version   # or python3 --version
# Should show Python 3.12.x or similar
```

## Why Virtual Environments?

Imagine you're working on two projects:
- Project A needs pandas version 1.5
- Project B needs pandas version 2.0

Without virtual environments, you'd have a conflict. Virtual environments solve this by creating isolated Python installations for each project.

Think of it like having separate toolboxes for different jobs - your woodworking tools don't get mixed up with your electrical tools.

## Creating and Using Virtual Environments

Python's built-in `venv` module makes this easy:

```bash
# Create a virtual environment
python -m venv myenv

# Activate it
# On Windows:
myenv\Scripts\activate

# On macOS/Linux:
source myenv/bin/activate

# Your prompt will change to show the active environment:
# (myenv) $
```

Once activated, any packages you install go into that environment only:

```bash
# Install packages
pip install pandas numpy scikit-learn

# See what's installed
pip list

# Save your dependencies to a file
pip freeze > requirements.txt

# Later, recreate the environment from that file
pip install -r requirements.txt
```

To deactivate when you're done:
```bash
deactivate
```
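
If you're ever unsure whether an environment is active, you can ask the shell and Python directly. This is a rough sketch for macOS/Linux that assumes `python3` is on your PATH; `demo_env` is a throwaway name, and `--without-pip` is there only to keep the demo fast and offline (leave it out for a real project).

```bash
workdir=$(mktemp -d)
python3 -m venv --without-pip "$workdir/demo_env"

system_python=$(command -v python3)    # the interpreter found before activation

# Activate, then ask which interpreter is now first on PATH.
source "$workdir/demo_env/bin/activate"
active_python=$(command -v python3)
in_venv=$(python3 -c 'import sys; print(sys.prefix != sys.base_prefix)')

echo "before activation: $system_python"
echo "after activation:  $active_python"
echo "inside a virtual environment: $in_venv"    # True while the environment is active

deactivate
```

Inside a virtual environment, `sys.prefix` points at the environment while `sys.base_prefix` still points at the Python it was created from, which is why comparing them works.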

<div class="callout" style="
  width: 80%;
  background: rgba(127,127,127,0.15);
  border: 1px solid rgba(127,127,127,0.3);
  padding: 10px 30px;
  margin: 20px;
  border-radius: 6px;
  text-align: justify;
  text-align-last: left;
  font-size: 11pt;
">
  <span style="
    font-size: 60pt;
    line-height: 1;
    float: left;
    margin: 0px 0px 0px 0;
  ">
    💡
  </span>

**Pro Tip: Shared Environments**

Virtual environments can take up a lot of space (pandas, sklearn, pytorch, etc. are big!), particularly when you have one for each project. If you typically require the same set of Python libraries across many projects, you can create a common virtual environment and reuse it.

Create a directory to keep your virtual environments (e.g., `C:\Users\your_username\.venvs\` on Windows or `~/.venvs/` on Mac/Linux), then create a shared one there:

```bash
python -m venv ~/.venvs/base_ml
```

You can set this as your active Python or Jupyter kernel from within VS Code.
  
  <!-- clearfix -->
  <div style="clear: both;"></div>
</div>

# 2.3 Essential Terminal Commands

The terminal (also called command line, shell, or console) is your direct line of communication with the computer. While clicking through folders works fine for basic tasks, the terminal lets you:

- Work much faster once you know the commands
- Automate repetitive tasks
- Work on remote servers that don't have graphical interfaces
- Do things that simply aren't possible with a mouse

<img src="../images/L02_arnie_boots.png" width=600px style="display:block;">
<font size=2>Arnie getting some clothes in <i>Terminator 2: Judgment Day (1991)</i></font>

As Arnold said in *Terminator 2*: "I need your clothes, your boots, and your motorcycle." Well, we need `ls`, `cd`, and `grep`. Let's go!

<div class="callout" style="
  width: 80%;
  background: rgba(127,127,127,0.15);
  border: 1px solid rgba(127,127,127,0.3);
  padding: 10px 30px;
  margin: 20px;
  border-radius: 6px;
  text-align: justify;
  text-align-last: left;
  font-size: 11pt;
">
  <span style="
    font-size: 60pt;
    line-height: 1;
    float: left;
    margin: 0px 0px 0px 0;
  ">
    💡
  </span>

**Windows Users**: 
Before you proceed, you can (and should) install Windows Subsystem for Linux (WSL). Install it with `wsl --install` and then a distribution (Ubuntu recommended) using `wsl --install -d Ubuntu`. Start it by calling `wsl -d Ubuntu`.
  
  <!-- clearfix -->
  <div style="clear: both;"></div>
</div>

## Navigation Commands

**`pwd`** - Print Working Directory (Where am I?)
```bash
pwd
# Output: /home/username/projects
```

**`ls`** - List directory contents (What's here?)
```bash
ls           # Basic listing
ls -l        # Long format (shows permissions, size, date)
ls -la       # Long format including hidden files
ls -lh       # Long format with human-readable sizes
```

**`cd`** - Change Directory (Go somewhere else)
```bash
cd projects           # Go into 'projects' folder
cd ..                 # Go up one level
cd ~                  # Go to home directory
cd /                  # Go to root directory
cd -                  # Go back to previous directory
```

## File Operations

**`mkdir`** - Make a new directory
```bash
mkdir new_project
mkdir -p projects/ml/experiment1  # Create nested directories
```

**`cp`** - Copy files or directories
```bash
cp file.txt backup.txt           # Copy a file
cp -r folder1 folder2            # Copy a directory (-r for recursive)
```

**`mv`** - Move or rename files
```bash
mv old_name.txt new_name.txt     # Rename a file
mv file.txt ../other_folder/     # Move to another location
```

**`rm`** - Remove files or directories
```bash
rm unwanted_file.txt             # Delete a file
rm -r unwanted_folder            # Delete a directory and contents
rm -rf folder                    # Force delete (be VERY careful!)
```

⚠️ **Warning**: Unlike the Recycle Bin, `rm` is permanent! There's no undo. Double-check before you hit Enter.
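
A safe way to practice these commands is inside a scratch directory, where mistakes cost nothing. The sketch below strings them together; all file and folder names are placeholders.

```bash
# Work inside a scratch directory created just for this practice run.
scratch=$(mktemp -d)
cd "$scratch"

mkdir -p projects/ml/experiment1
echo "id,score" > results.csv

cp results.csv results_backup.csv           # keep a copy before changing anything
mv results.csv projects/ml/experiment1/     # move the original into the project
mv results_backup.csv old_results.csv       # rename the copy

ls -R                                       # look before you delete
rm old_results.csv                          # gone for good, no Recycle Bin
# For anything you are unsure about, `rm -i file` asks for confirmation first.

remaining=$(find . -type f)
echo "$remaining"                           # ./projects/ml/experiment1/results.csv
```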

## Viewing File Contents

**`cat`** - Display entire file contents
```bash
cat data.csv
```

**`head`** - Show first lines of a file
```bash
head data.csv          # First 10 lines (default)
head -20 data.csv      # First 20 lines
```

**`tail`** - Show last lines of a file
```bash
tail data.csv          # Last 10 lines
tail -f logfile.log    # Follow file in real-time (great for logs!)
```

**`less`** - View file with scrolling
```bash
less large_file.txt    # Press 'q' to quit, arrow keys to scroll
```

## Text Processing with `grep`

`grep` is one of the most powerful commands - it searches for patterns in files.

```bash
# Find lines containing "error" in a log file
grep "error" logfile.txt

# Case-insensitive search
grep -i "ERROR" logfile.txt

# Search recursively in all files in a directory
grep -r "import pandas" .

# Show line numbers
grep -n "def train_model" *.py

# Count matching lines
grep -c "WARNING" logfile.txt
```
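
To try these flags on something concrete, the sketch below writes a small made-up log file and runs a few searches against it. The log lines are invented for the example.

```bash
scratch=$(mktemp -d)
cd "$scratch"

# A made-up log file to search.
cat > logfile.txt <<'EOF'
INFO  loading data
WARNING missing values in column age
ERROR could not open model.pkl
INFO  retrying
error: retry failed
WARNING disk almost full
EOF

exact=$(grep -c "error" logfile.txt)         # case-sensitive: only "error: retry failed"
any_case=$(grep -ci "error" logfile.txt)     # -i also matches "ERROR could not open model.pkl"
warnings=$(grep -c "WARNING" logfile.txt)
first_warning=$(grep -n "WARNING" logfile.txt | head -1)

echo "exact=$exact any_case=$any_case warnings=$warnings"   # exact=1 any_case=2 warnings=2
echo "$first_warning"                                       # 2:WARNING missing values in column age
```

Flags can be combined, as in `-ci` above: count the matches and ignore case in one go.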

## The Power of Piping (`|`)

Here's where the terminal gets really powerful. The pipe character `|` takes the output of one command and feeds it as input to another. You can chain commands together like LEGO blocks!

```bash
# Take the first 100 lines of a CSV, then find rows with "California"
cat sales_data.csv | head -100 | grep "California"

# Count how many Python files mention "pytorch"  (wc stands for 'word count'; -l counts lines)
grep -l "pytorch" *.py | wc -l

# Find all unique values in the second column of a CSV
cat data.csv | cut -d',' -f2 | sort | uniq

# Show the 10 largest files in a directory (ls -l prints a "total" line first, hence 11)
ls -lS | head -11
```

This is like being able to say: "Hey computer, do this, THEN do this, THEN do this" - all in one line!
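
Here is one such chain built up on a tiny CSV so you can check each stage by eye. The data is invented for the example, and the sketch assumes a simple CSV with no commas inside fields (`cut` does not understand quoting).

```bash
scratch=$(mktemp -d)
cd "$scratch"

cat > sales_data.csv <<'EOF'
order_id,state,amount
1,California,120
2,Texas,80
3,California,45
4,Nevada,60
5,Texas,200
6,California,15
EOF

# Skip the header (tail -n +2), keep column 2, then count each distinct value.
tail -n +2 sales_data.csv | cut -d',' -f2 | sort | uniq -c | sort -rn

# The same chain, reduced to a single number: how many distinct states?
n_states=$(tail -n +2 sales_data.csv | cut -d',' -f2 | sort | uniq | wc -l | tr -d ' ')

# And the state with the most orders.
top_state=$(tail -n +2 sales_data.csv | cut -d',' -f2 | sort | uniq -c | sort -rn | head -1 | awk '{print $2}')

echo "$n_states distinct states; most orders: $top_state"   # 3 distinct states; most orders: California
```

`sort` has to come before `uniq` because `uniq` only collapses lines that are next to each other.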

## Getting Help

**`man`** - Manual pages (documentation)
```bash
man ls       # Full documentation for ls command
man grep     # Full documentation for grep
```

**`--help`** - Quick help for most commands
```bash
ls --help
grep --help
```

On macOS, many built-in commands (such as `ls`) don't accept `--help`; use `man` for those.

## Superuser Privileges with `sudo`

Some operations require administrator ("superuser") privileges:

```bash
# Install software (Linux)
sudo apt install some-package

# Edit a system file
sudo nano /etc/hosts
```

You'll be prompted for your password. Use `sudo` carefully - you can do serious damage to your system with superuser privileges!

As Uncle Ben (not the rice, the Spider-Man uncle) said: "With great power comes great responsibility."

## Environment Variables

Environment variables are like global settings for your terminal session:

```bash
# View all environment variables
env

# View a specific variable
echo $PATH
echo $HOME

# Set a variable for this session
export MY_API_KEY="secret123"

# Use it in a command
echo $MY_API_KEY
```

The `PATH` variable is especially important - it tells the terminal where to look for programs. If you ever see "command not found," it might be a PATH issue.
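
Two things trip people up here: the difference between a plain assignment and `export`, and how to read `PATH`. The sketch below illustrates both; the variable names `LOCAL_ONLY` and `SHARED_SETTING` are made up for the demo.

```bash
# A plain assignment stays in the current shell; `export` passes it on to child processes.
LOCAL_ONLY="stays here"
export SHARED_SETTING="visible to children"

child_sees_local=$(sh -c 'echo "${LOCAL_ONLY:-<unset>}"')
child_sees_shared=$(sh -c 'echo "${SHARED_SETTING:-<unset>}"')
echo "child sees LOCAL_ONLY:     $child_sees_local"     # <unset>
echo "child sees SHARED_SETTING: $child_sees_shared"    # visible to children

# PATH is a colon-separated list of directories; print one per line.
echo "$PATH" | tr ':' '\n'

# Ask the shell which file it would run for a command name (empty if it is not on PATH).
python_location=$(command -v python3)
echo "python3 resolves to: ${python_location:-not found on PATH}"
```

Programs you launch from the terminal, including Python scripts, are child processes, so they only see variables that were exported.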

<!-- Start Exercise 2.2 -->
<hr/>
<img src="../images/stop_right_margin.png" align="left">

<font size=3 color="darkred"> Exercise: Terminal Treasure Hunt (4 min) </font>
<div class="inclass_exercise_body" style="padding-left: 130px; width: 85%; text-align: justify;text-align-last: left;">

Use terminal commands to answer these questions about the course repository.

1. How many `.png` files are in the `images` directory? (Hint: use `ls` and `wc -l`)
2. What are the last 2 lines of `readme.md`?
3. How many lines in total are in that readme file?
4. Use `grep` and piping to find the third line mentioning 'learn' in lecture 1. (hint: use `grep`, `head` and `tail`)

</div>

Input your answers here:

1.  
2. 
3.
4.

<hr/>
<!-- End Exercise 2.2 -->

# 2.4 Remote Development

<img alt="Remote development lets you use powerful servers from your laptop" width="650" src="../images/L02_wargames.png" style="display:block;">
<font size=2>Matthew Broderick remote-accessing a government war simulation AI in <i>WarGames (1983)</i></font>

Sometimes your laptop just isn't enough. Maybe you need:
- A powerful GPU for training deep learning models
- More RAM for large datasets
- A Linux environment for certain tools
- To keep a long-running job going without tying up your laptop

This is where remote development comes in. You write code on your laptop, but it runs on a powerful server somewhere else.



## SSH: Secure Shell

SSH is the standard way to connect to remote servers:

```bash
# Basic connection
ssh username@server.address.com

# With a specific port
ssh -p 2222 username@server.address.com

# Using an SSH key (more secure than passwords)
ssh -i ~/.ssh/my_key.pem username@server.address.com
```

Once connected, you have a terminal on the remote machine. Everything you type runs on the server, not your laptop.
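
If you connect to the same server often, typing the port, key, and username every time gets old. OpenSSH reads per-host settings from `~/.ssh/config`. The sketch below writes an example stanza to a scratch file so nothing in your real config is touched; `gpu-box` is a made-up nickname and the other values simply mirror the placeholder examples above.

```bash
# Write the stanza to a scratch file first; copy it into ~/.ssh/config once it looks right.
scratch=$(mktemp -d)
cat > "$scratch/ssh_config_demo" <<'EOF'
# "gpu-box" is a nickname of your choosing; the remaining values are placeholders.
Host gpu-box
    HostName server.address.com
    User username
    Port 2222
    IdentityFile ~/.ssh/my_key.pem
EOF
cat "$scratch/ssh_config_demo"

# With this stanza in ~/.ssh/config, a short command replaces the long one:
#   ssh gpu-box
#   scp local_file.txt gpu-box:/path/to/destination/
```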

<div class="callout" style="
  width: 80%;
  background: rgba(127,127,127,0.15);
  border: 1px solid rgba(127,127,127,0.3);
  padding: 10px 30px;
  margin: 20px;
  border-radius: 6px;
  text-align: justify;
  text-align-last: left;
  font-size: 11pt;
">
  <span style="
    font-size: 60pt;
    line-height: 1;
    float: left;
    margin: 0px 0px 0px 0;
  ">
    💡
  </span>

**Edit text in a terminal**: 
Sometimes you're ssh'd into a server and you need to edit a file from the terminal. You may not have access to a GUI (Graphical User Interface), so you need something that works in the terminal. There are more opinions than ice cream flavors, but I like `nano` as a text editor for its simplicity.  
  <!-- clearfix -->
  <div style="clear: both;"></div>
</div>

## Transferring Files with `scp`

`scp` (secure copy) transfers files between your machine and a remote server.

The syntax is: `scp [source] [destination]` where remote paths include `username@server:`

Here's how you can use it:

```bash
# Copy a file TO a remote server
scp local_file.txt username@server:/path/to/destination/

# Copy a file FROM a remote server
scp username@server:/path/to/file.txt ./local_folder/

# Copy an entire directory
scp -r my_project/ username@server:/home/username/projects/
```



<!-- Start Exercise 2.3 -->

<hr/>
<img src="../images/stop_right_margin.png" align="left">

<font size=3 color="darkred"> Exercise: SSH and SCP practice </font>
<div class="inclass_exercise_body" style="padding-left: 130px; width: 85%; text-align: justify;text-align-last: left;">

We have access to `students.chapman.edu`.  Let's ssh into this. One weird quirk is that our username is our chapman email (which has an `@` in it) and we also use `@` to indicate username for the server, so our ssh command will look like this:

```bash
ssh youremail@chapman.edu@students.chapman.edu
```

1. Open up a terminal and try to ssh into `students.chapman.edu` (leave this terminal open)
2. Make the file `test.txt` somewhere on your local machine
3. From your local machine, use `scp` from a terminal to copy test.txt to your home directory on `students.chapman.edu`
4. Go back to the terminal with your open ssh session and verify that `test.txt` transferred. 

</div>

<hr/>
<!-- End Exercise 2.3 -->

## VS Code Remote Development

Here's where things get really cool. VS Code can connect directly to a remote server, giving you:
- Full editor experience (syntax highlighting, autocomplete, debugging)
- File explorer for the remote machine
- Integrated terminal running on the server
- Extensions running remotely

It feels like you're working locally, but everything runs on the server!

### Setting Up VS Code Tunnels

VS Code tunnels create a secure connection without needing to configure SSH keys or firewalls. Don't do it yet, but here are the steps you would take:

**On the remote server:**
```bash
# Download and run the VS Code CLI
curl -Lk 'https://code.visualstudio.com/sha/download?build=stable&os=cli-alpine-x64' --output vscode_cli.tar.gz
tar -xf vscode_cli.tar.gz

# Start the tunnel
./code tunnel
```

**On your laptop:**
1. Open VS Code
2. Install the "Remote - Tunnels" extension
3. Click the remote icon in the bottom-left corner
4. Select "Connect to Tunnel"
5. Sign in with your GitHub (preferred) or Microsoft account
6. Select your tunnel from the list

Now you're able to edit and run code on the remote server with your full VS Code setup on your local machine!

I'll demonstrate what this looks like in practice, then you'll set it up yourself in a couple of minutes.

## Managing Remote Python Environments

When working remotely, you'll use the same virtual environment workflow we covered earlier - just on the remote server:

```bash
# Create and activate a virtual environment on the remote server
python -m venv myenv
source myenv/bin/activate  # Linux/Mac (most servers are Linux)

# Install from your requirements file
pip install -r requirements.txt
```

**Pro tip**: Keep your `requirements.txt` in version control so you can easily recreate your environment on any server!

## Running Long Jobs

What if you want to disconnect from the server but keep your job running? Use `nohup` or a terminal multiplexer such as `screen` or `tmux`.

We won't cover these in detail, but I want you to know these tools exist.

**Using `nohup`:**
```bash
# Run a script that continues after you disconnect
nohup python train_model.py > output.log 2>&1 &
```

**Using `tmux` (recommended):**
```bash
# Start a new tmux session
tmux new -s training

# Run your command
python train_model.py

# Detach (Ctrl+B, then D)
# You can now disconnect from SSH safely

# Later, reattach to see progress
tmux attach -t training
```

`tmux` is like having multiple terminal windows that persist on the server, even when you're not connected.
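
You can get a feel for the `nohup` pattern on your own laptop before trying it on a server. In the sketch below, a short shell command stands in for `train_model.py` (the "epoch" messages are invented), and `wait` is there only so the demo finishes cleanly.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# A stand-in for train_model.py: prints two "epochs" one second apart.
nohup sh -c 'echo "epoch 1 done"; sleep 1; echo "epoch 2 done"' > output.log 2>&1 &
job_pid=$!                        # $! holds the process ID of the last background job
echo "started background job $job_pid"

# While it runs you could watch the log with: tail -f output.log  (Ctrl+C stops watching)
if kill -0 "$job_pid" 2>/dev/null; then    # signal 0 only tests whether the process exists
    echo "job is still running"
fi

wait "$job_pid"                   # block until the job finishes (only for this demo)
final_log=$(cat output.log)
echo "$final_log"
```

Jot down the process ID when you start a long job; `kill <pid>` is how you stop it later if something goes wrong.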

## Best Practices for Remote ML Workflows

1. **Always use version control** - Your code should be in Git
2. **Use environment files** - `requirements.txt` or `environment.yml`
3. **Log everything** - Save outputs to files, not just the screen
4. **Use configuration files** - Don't hardcode paths or parameters
5. **Test locally first** - Debug on a small sample before running big jobs remotely
6. **Monitor resource usage** - Use `htop` for CPU and memory, `nvidia-smi` for GPUs
7. **Clean up after yourself** - Delete temporary files and old environments

<!-- Start Exercise 2.4 -->

<hr/>
<img src="../images/stop_right_margin.png" align="left">

<font size=3 color="darkred"> Exercise: Remote Connection Practice </font>
<div class="inclass_exercise_body" style="padding-left: 130px; width: 85%; text-align: justify;text-align-last: left;">

Using `students.chapman.edu` let's set up a VS Code tunnel (this is not a machine we'll actually use for coding, it's just convenient for the demonstration).

1. Set up a VS Code tunnel on `students.chapman.edu` by following the instructions above (you will have to ssh into `students.chapman.edu` and run the commands to download the VS Code CLI and start the tunnel on that server)
2. Ensure the "Remote - Tunnels" extension is installed in VS Code on your local machine
3. Open a new VS Code window on your laptop and connect to your tunnel

</div>

<hr/>
<!-- End Exercise 2.4 -->

# 2.5 Docker Installation (Preparation)
<img alt="Docker containers are like shipping containers for software" width="700" src="../images/L02_docker_shipping.png" style="display:block;">
<font size=2>image credit: grzegorz petrykowski</font>

Docker is a tool that lets you package applications with all their dependencies into "containers." Think of it like a shipping container - everything the app needs is inside, and it runs the same way on any machine.

We'll use Docker extensively later in the course for:
- Creating reproducible ML environments
- Deploying models
- Running databases and other services

For now, let's just get it installed.



## Installing Docker Desktop

**Windows:**
1. Go to https://www.docker.com/products/docker-desktop
2. Download Docker Desktop for Windows
3. Run the installer
4. You may need to enable WSL 2 (Windows Subsystem for Linux)
5. Restart your computer

**macOS:**
1. Go to https://www.docker.com/products/docker-desktop
2. Download Docker Desktop for Mac (choose Apple Silicon or Intel)
3. Drag to Applications folder
4. Open Docker Desktop and complete setup

**Linux:**
```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install docker.io
sudo systemctl start docker
sudo systemctl enable docker
```

## Verifying Your Installation

Open a terminal and run:

```bash
docker --version
```

You should see something like:
```
Docker version 24.0.6, build ed223bc
```

Try running a test container:
```bash
docker run hello-world
```

If you see a friendly message from Docker, you're all set! We'll dive deeper into Docker in a later lecture.
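
If something doesn't work, it helps to know whether Docker is missing or just not running. The following is a rough diagnostic sketch, not an official check; it relies only on `docker info`, which fails when the Docker daemon can't be reached (this can also happen if your user lacks permission to talk to it).

```bash
if ! command -v docker >/dev/null 2>&1; then
    docker_state="not installed"                      # the docker command is not on your PATH
elif ! docker info >/dev/null 2>&1; then
    docker_state="installed, daemon not reachable"    # start Docker Desktop (or the docker service)
else
    docker_state="ready"
fi
echo "Docker: $docker_state"
```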

# 2.6 Summary and Key Takeaways

Today we covered the essential development tools that you'll use throughout your ML career:

**Git & GitHub:**
- Fork, clone, and add upstream - the course repo workflow
- `add`, `commit`, `push`, `pull` - the daily workflow
- Write meaningful commit messages
- Use branches for experiments
- Pull before you start working!

**Python & Virtual Environments:**
- Install your own Python from python.org or a package manager (don't rely on the system version!)
- Use virtual environments to isolate project dependencies
- `requirements.txt` for reproducibility

**Terminal Commands:**
- Navigation: `pwd`, `ls`, `cd`
- File operations: `cp`, `mv`, `rm`, `mkdir`
- Viewing files: `cat`, `head`, `tail`, `less`
- Searching: `grep`
- Piping commands with `|` for powerful workflows

**Remote Development:**
- `ssh` for connecting to servers
- `scp` for transferring files
- VS Code tunnels for a great remote coding experience
- `tmux` for persistent sessions

**Docker:**
- Installed and ready for later use

These tools might feel awkward at first, but with practice they'll become second nature. As Doc Brown would say: "If you put your mind to it, you can accomplish anything!"

# Quick Reference Cheat Sheet

| Task | Command |
|------|--------|
| Clone a repo | `git clone <url>` |
| Check status | `git status` |
| Stage all changes | `git add .` |
| Commit | `git commit -m "message"` |
| Push | `git push` |
| Pull | `git pull` |
| List files | `ls -la` |
| Change directory | `cd <folder>` |
| Copy file | `cp source dest` |
| Move/rename | `mv old new` |
| Delete | `rm file` |
| View file | `cat file` or `less file` |
| First 10 lines | `head file` |
| Search in files | `grep "pattern" file` |
| Connect to server | `ssh user@server` |
| Copy to server | `scp file user@server:path` |

Here's a [quick video](https://www.youtube.com/watch?v=i_23KUAEtUM) on performing some common Git actions in VS Code.

---

# Your Turn: Questions, Reactions, and Feedback

Before our next class, think about:

1. What commands or concepts were new to you?
2. What do you want more practice with?
3. How do you think these tools will help in your ML work?
4. Any questions or points of confusion?

**Write your thoughts below:**

*[Your reflections here]*