# HW2A - Terminal, Conda, and Environment Management

See Canvas for details on how to complete and submit this assignment.

## Introduction

This assignment establishes your professional development environment using terminal (macOS) or Git Bash (Windows) and conda. You'll move beyond Jupyter notebooks to understand the broader ecosystem that supports data science work - the command line tools, environment management, and utilities that make reproducible analysis possible.

### Learning Objectives
- Build skills with terminal navigation and file manipulation for efficient development workflows
- Create and manage isolated Python environments for reproducible analysis
- See how conda manages complete development environments, not just Python packages
- Create Python scripts and run them from the command line
- Discover new tools that enhance the command line experience

By the end, you'll have a properly configured development environment and the skills to maintain it. You'll understand why environments matter for reproducible work and have practical experience with tools that will make your future development more efficient.

This assignment should take 60-90 minutes to complete.

### Generative AI Allowance
You may use GenAI tools for brainstorming, explanations, and debugging if you disclose it, understand it, and validate it. Your submission must represent your own work and you are solely responsible for its correctness.

### Scoring
- Part 1: Environment Setup (10 pts)
- Part 2: Conda Beyond Python (10 pts)
- Part 3: Better Tools (10 pts)
- Environment Understanding Questions (10 pts, graduate only)

## Terminal, Conda, and Environment Management

### Part 1: Environment Creation and Directory Structure (10 points)

You should complete lectures 4b (terminal workshop) and 5a (conda workshop), and review the provided guides for both before proceeding. You may also find it helpful to refer to those guides if you get stuck during this part of the homework.

Note: throughout this assignment, when you are asked to copy and paste (copy-paste), we recommend the following keyboard shortcuts:

- macOS
  - `Command+C` for copy
  - `Command+V` for paste
- Windows
  - `Control+C` for copy
  - `Control+V` for paste

#### Task 1: Create and Configure Environment

First, if you have not already created the `insy6500` virtual environment as demonstrated in lecture 5a, do so now by following along with the lecture recording and referencing the instructions in the Conda guide. After activating that environment - you should see `(insy6500)` in the terminal prompt - start Jupyter Lab as instructed.

Once you have Jupyter Lab running, open a new notebook and paste the following code into it:

```python
# check insy6500 environment
import importlib

def check_package(name):
    """Check if a package is installed and print its version."""
    try:
        # import_module returns the module object, so no exec/eval is needed
        module = importlib.import_module(name)
        version = getattr(module, '__version__', 'version unknown')
        print(f"✓ {name:<15} {version}")
        return True
    except ImportError:
        print(f"✗ {name:<15} NOT INSTALLED")
        return False

# Check all required packages
print("Checking INSY 6500 required packages:\n")
packages = [
    'numpy',
    'pandas', 
    'jupyter',
    'seaborn',
    'matplotlib',
    'statsmodels',
    'scipy'
]

results = [check_package(pkg) for pkg in packages]

# Summary
print("\n" + "="*40)
if all(results):
    print("🎉 All packages installed! Environment ready!")
else:
    print("⚠️  Some packages missing. Please run:")
    print("conda activate insy6500")
    for i, pkg in enumerate(packages):
        if not results[i]:
            print(f"conda install {pkg}")
```

Run that code and copy-paste the output into the following cell.
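If a package shows as missing even though you installed it, the notebook may be running in a different environment than you expect. The following is a minimal sketch for checking which interpreter is active. The function name `describe_interpreter` is hypothetical, and the sketch assumes Jupyter Lab was started from a terminal where `conda activate` had already been run, since that is what sets the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys

def describe_interpreter():
    """Return the interpreter path, its prefix, and the active conda env name (if any)."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # CONDA_DEFAULT_ENV is set by `conda activate`; it may be absent outside conda
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }

info = describe_interpreter()
for key, value in info.items():
    print(f"{key:<12} {value}")
```

If everything is configured as described above, the `conda_env` line should read `insy6500` and the `prefix` path should end with that environment's folder.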

##### Confirm Conda Environment

#### Task 2: Create Project Directory Structure

From your INSY 6500 folder (`~/insy6500`), use the terminal commands you have learned to construct the following folder structure:

```text
~/insy6500/
├── homework/
│   └── notebooks/
├── scripts/
├── projects/
├── resources/
└── data/
    ├── raw/
    └── processed/
```

You will need a combination of `cd`, `mkdir`, and `ls` to accomplish this. You may also need `rm` (remove file) and `rmdir` (remove empty directory) to clean up folders or files created during the workshop. If you choose to use `rm -ri` or `rm -rI` for this task, as described in the workshop, *be very careful*: recursive deletion is destructive and cannot be undone!
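Once you have built the folders in the terminal, you can double-check your work from Python. This is only a sketch: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the relative paths are copied from the diagram above. It checks for folders and does not create anything.

```python
from pathlib import Path

# Folders expected under the course root, copied from the diagram above
EXPECTED_DIRS = [
    "homework/notebooks",
    "scripts",
    "projects",
    "resources",
    "data/raw",
    "data/processed",
]

def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected subfolders that do not exist under root."""
    root = Path(root).expanduser()
    return [rel for rel in expected if not (root / rel).is_dir()]

missing = missing_dirs("~/insy6500")
print("All folders present" if not missing else f"Missing: {missing}")
```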

Now try to visualize what you just created using different `ls` commands:

```bash
ls -la
ls -R  # Recursive list - hard to read, right?
```

Notice how `ls` doesn't clearly show the hierarchical structure? This is one limitation of basic Unix tools.
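To see what a hierarchical view involves, here is a rough Python sketch that walks the folders and indents each name by its depth. The function name `indented_listing` is hypothetical, and this is an illustration of the idea rather than a replacement for a dedicated tool. It lists folders only and will also show hidden folders if any exist.

```python
from pathlib import Path

def indented_listing(root, indent=0):
    """Return lines naming each subfolder of root, indented two spaces per level."""
    lines = []
    for child in sorted(Path(root).iterdir()):
        if child.is_dir():
            lines.append("  " * indent + child.name + "/")
            # Recurse so nested folders appear under their parent
            lines.extend(indented_listing(child, indent + 1))
    return lines

root = Path("~/insy6500").expanduser()
if root.is_dir():
    print("\n".join(indented_listing(root)))
```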

### Part 2: Conda Beyond Python Packages (10 points)

#### Task 1: Install System Tools via Conda

Conda can install more than just Python packages! Let's get a better directory visualization tool. Run the following command in your terminal to install the `tree` utility. Do this with the `insy6500` environment activated.

```bash
conda install -c conda-forge tree
```

Note the `-c conda-forge` option, which tells conda to use a different package repository. `tree` is not on the default Anaconda repo, which is focused on data science essentials. [conda-forge](https://conda-forge.org/) is community maintained, with a broader selection of over 25k packages, including many command line tools like `tree`.

Now verify the installation and use it:

```bash
which tree      # Should show path within your conda environment
tree --version  # gives version number and copyright info
tree            # much better visualization of your directory structure!
```

Notice how much clearer this is than `ls -R`? There are many tools like `tree` that can be installed with conda to improve your terminal experience. We will explore a few more later.
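The `which` check has a Python counterpart, `shutil.which`, that looks up a command on your `PATH` in a similar way. The sketch below is a rough check under stated assumptions: `tool_location` is a hypothetical name, and comparing against `sys.prefix` assumes Python itself is running from the active conda environment.

```python
import shutil
import sys
from pathlib import Path

def tool_location(name):
    """Return (path, inside_env) for a command found on PATH, or (None, False)."""
    found = shutil.which(name)
    if found is None:
        return None, False
    # Resolve both paths so symlinks do not cause a false mismatch
    env_root = Path(sys.prefix).resolve()
    inside_env = env_root in Path(found).resolve().parents
    return found, inside_env

for tool in ["python", "tree"]:
    path, inside = tool_location(tool)
    print(f"{tool:<8} {path}  (inside current environment: {inside})")
```

Running this with and without the environment activated is one way to see that conda-installed tools are only on your `PATH` while the environment is active.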

First, let's build a Python program to help us copy output from commands like `tree`.

#### Task 2: Running Python from the Command Line

So far, this class has focused on running Python in notebooks. But you can also run Python scripts from the command line. We'll start with a very simple example that shows how to write and run Python in this manner.

Change to the `scripts` folder you created earlier, create the file `hello.py` and edit it using the `nano` text editor by executing the following commands in your terminal:

```bash
cd ~/insy6500/scripts
touch hello.py
nano hello.py
```

Copy-paste the following code into nano.

```python
print("Hello, World!")
```

Then press `Control+O` to save your changes, press `Enter` to confirm the file name `hello.py`, and press `Control+X` to exit nano and return to the command line. There, run `cat hello.py` to display the file's contents and verify your changes. Finally, run the program with the following command:

```bash
python hello.py
```

You just ran Python outside of Jupyter! This is how most production Python code runs.
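Scripts meant to be run from the command line often read arguments and guard their entry point. The following is a minimal sketch of that pattern, not part of the assignment: the file name `greet.py` and the functions `build_greeting` and `main` are hypothetical.

```python
"""greet.py - a hypothetical follow-up to hello.py that accepts an optional name."""
import sys

def build_greeting(name="World"):
    """Return the greeting text for the given name."""
    return f"Hello, {name}!"

def main(argv):
    """Use the first command-line argument as the name, if one was given."""
    name = argv[0] if argv else "World"
    print(build_greeting(name))

if __name__ == "__main__":
    # sys.argv[0] is the script name; the remaining items are the arguments
    main(sys.argv[1:])
```

Saved as `greet.py`, running `python greet.py` would print `Hello, World!`, and `python greet.py Ada` would print `Hello, Ada!`. The `if __name__ == "__main__":` guard means the greeting only prints when the file is run as a script, not when it is imported by other code.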


#### Task 3: Building Command-Line Tools with Python

Let's take it up a notch. Python can create useful command-line utilities. We're going to create a tool that will capture output from any terminal command (like `ls` or `tree`) and put it in your clipboard. `pyperclip` is a cleverly named Python package that allows us to do just that on macOS and Windows systems.

Let's see if it is available on the default repo:

```bash
conda search pyperclip
```

You will see that it is not found. The current channels (default repos) are listed, along with some information about how to find alternative sources, like conda-forge, which we'll check next:

```bash
conda search pyperclip -c conda-forge
```

You will find that they have several versions of the package. Install the latest version (the default choice) of `pyperclip` to the active environment using conda:

```bash
conda install -c conda-forge pyperclip
```

Now use `touch` and `nano` as before to create the file `clip.py` in your `scripts` folder and open it in the editor. Copy-paste the following code into it:

```python
"""Copy stdin to clipboard - a simple command-line utility."""
import sys
import pyperclip

# Read all input from stdin (standard input)
input_text = sys.stdin.read()

# Copy to clipboard
pyperclip.copy(input_text)

# Print an informative confirmation message
print(f"Copied {len(input_text)} characters to clipboard")
```
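The key line in `clip.py` is `sys.stdin.read()`, which collects whatever text another program sends in. The sketch below imitates that hand-off without touching the clipboard, so it does not need `pyperclip`. The names `CHILD_CODE` and `pipe_text_to_python` are hypothetical, and the child process is a one-line stand-in for `clip.py`, not the script itself.

```python
import subprocess
import sys

# A one-line stand-in for clip.py: read all of stdin and report its length
CHILD_CODE = "import sys; text = sys.stdin.read(); print(f'Received {len(text)} characters')"

def pipe_text_to_python(text):
    """Send text to a child Python process on stdin, similar to how `|` feeds clip.py."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=text,           # becomes the child's standard input
        capture_output=True,  # collect the child's printed output
        text=True,
        check=True,
    )
    return result.stdout.strip()

print(pipe_text_to_python("homework\nscripts\n"))
```

The child process never knows where its input came from. That is why a script written this way can receive output from any command.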

Test your new tool by "piping" the output of `tree` to it:

```bash
# Copy directory structure to clipboard
tree -d ~/insy6500 | python ~/insy6500/scripts/clip.py
```

Now paste the tree output into the following cell.

##### Project Directory Structure

### Part 3: Better Help and Better Tools (10 points)

#### Task 1: The Problem with Traditional Help

Getting help with command-line tools can be overwhelming. Try the following:

```bash
conda --help  # Wall of text!
tree --help   # So many options!
```

These help pages are comprehensive but not beginner-friendly. Let's install a better solution, `tldr` (Too Long; Didn't Read), which gives practical examples instead of exhaustive documentation for most command line tools. First use the `--info` option to show more detailed results from `conda search`, then use `tail` to only see the latest version:

```bash
conda search tldr -c conda-forge --info | tail -n 22
```

Of the additional details, the most interesting may be the list of dependencies at the end. Install the latest version with:

```bash
conda install -c conda-forge tldr
```

Now try `tldr` for the same tools and compare the results:

```bash
tldr conda  # Practical examples!
tldr tree   # Just the useful stuff!
```

It is also quite useful for terminal essentials. Try these too:

```bash
tldr ls
tldr cd
```

#### Task 2: Upgrading Basic Tools

Let's compare basic and enhanced versions of common tools:

```bash
# First, look at your script with cat (the basic tool)
cat ~/insy6500/scripts/clip.py
```

Now install and try `bat`, a modern replacement for `cat`:

```bash
conda install -c conda-forge bat
bat ~/insy6500/scripts/clip.py
```

Notice the differences? Syntax highlighting (color coding), line numbers, and better formatting! Try:

```bash
tldr bat  # Learn what else bat can do
bat --plain ~/insy6500/scripts/clip.py  # When you want simple output
```

#### Task 3: Putting It All Together

Let's combine everything you've learned:

First, export and view all the packages installed in the active conda environment:

```bash
cd ~/insy6500
conda env export | bat
```

The result will be displayed by `bat` in a scrollable format. Control it with the following keys:

- `j`/`k` or the down/up arrow keys to scroll down/up
- `Space` for next page
- `q` to quit

Notice how many packages are installed - likely 200+! This list includes all the dependencies needed for the packages you explicitly installed.

Now export ONLY what you explicitly installed using the `--from-history` option. These commands will save the output to `env-6500.yml`, then view it using `bat` with line numbers and syntax highlighting.

```bash
conda env export --from-history > env-6500.yml
bat env-6500.yml
```

Much shorter, right? This shows only the packages you explicitly requested with `conda create` or `conda install`, not the dependencies conda pulled in for them. Comparing the two lists should help you appreciate the role that third-party packages play in the Python ecosystem.
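To put a number on the difference, you can count the entries in an exported file. The sketch below uses a simple line scan rather than a real YAML parser, so it is only approximate (for example, a nested `pip:` section would be counted too). `SAMPLE_EXPORT` is a hypothetical, shortened example and not your actual output, and `list_dependencies` is a hypothetical helper name.

```python
# Hypothetical, shortened example of exported environment text (not real output)
SAMPLE_EXPORT = """\
name: insy6500
channels:
  - conda-forge
  - defaults
dependencies:
  - numpy
  - pandas
  - tree
"""

def list_dependencies(yaml_text):
    """Return entries under the top-level `dependencies:` key using a simple line scan."""
    deps = []
    in_deps = False
    for line in yaml_text.splitlines():
        if not line.startswith((" ", "-")):
            # A new top-level key starts; track whether it is the dependencies list
            in_deps = line.strip() == "dependencies:"
        elif in_deps and line.strip().startswith("- "):
            deps.append(line.strip()[2:])
    return deps

print(list_dependencies(SAMPLE_EXPORT))
```

To try it on your own file, read the text first, for example with `Path("env-6500.yml").read_text()` from `pathlib`, and pass the result to `list_dependencies`.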

Finally, copy your minimal environment:

```bash
bat env-6500.yml | python scripts/clip.py
```

And paste the result in the markdown cell below.

##### Conda Primary Dependencies

### Environment Understanding Questions (10 points, Graduate Only)

These questions are for graduate students only.

In a markdown cell, answer:

1. **Environment Isolation Test**: Deactivate your conda environment with `conda deactivate` and try running `tldr conda`. What happens? Now reactivate with `conda activate insy6500` and try again. What does this tell you about where these tools live and when they're accessible?

2. **The Power of Piping**: We used commands like `tree | python scripts/clip.py` and `conda env export | bat`. What is the `|` doing here? Try running `tree > tree_output.txt` instead - how is `>` different from `|`? What kinds of workflows does this enable?

3. **Tool Discovery**: Compare your experience getting help with `conda --help` versus `tldr conda`. Which would you rather use as a beginner? What does this suggest about the importance of finding the right tools for your skill level?

4. **Beyond Python Packages**: We installed command-line tools (`tree`, `bat`, `tldr`) using conda, not just Python libraries. What advantage does this give you over installing these tools separately at the system level? Consider what happens when you work on multiple projects.