---
title: "Setting Up Your Toolkit"
jupyter: python3
execute:
    enabled: true
---

::: {.callout-note title="What you'll learn in this module"}
This module guides you through setting up your Python development environment. You'll learn how to create a virtual environment using uv, install all necessary packages for this course, and verify your installation by importing key libraries we'll use throughout the semester. You'll then practice a basic Git workflow: committing, branching, merging, resolving conflicts, and pushing to GitHub.
:::

## Why a Consistent Environment Matters

Before we dive into data analysis, we need to get our tools in order. A well-configured development environment saves you from dependency conflicts, version mismatches, and the dreaded "works on my machine" problem.

We'll use `uv` to manage our Python environment. If you're familiar with `pip` and `virtualenv`, think of `uv` as a faster, more reliable alternative. It handles package installation and dependency resolution with impressive speed.

The best part? It works seamlessly with the `pyproject.toml` file that defines all course dependencies. One command installs everything you need.

## Setting Up Your Environment

Let's walk through the setup process. The first step is creating a virtual environment. Open your terminal and navigate to the project directory. Then run this command:

```bash
uv venv
```

This creates an isolated Python environment in a `.venv` directory. Virtual environments keep project dependencies separate from your system Python, preventing conflicts between different projects.

::: {.column-margin}
**Note:** If you don't have `uv` installed yet, you can install it by following the instructions at [https://docs.astral.sh/uv/](https://docs.astral.sh/uv/). On macOS and Linux, you can use `curl -LsSf https://astral.sh/uv/install.sh | sh`. On Windows, use `powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"`.
:::

The second step is activating the environment. This tells your terminal to use the Python interpreter and packages from your virtual environment instead of the system-wide installation.

On macOS and Linux, use:

```bash
source .venv/bin/activate
```

On Windows, use:

```bash
.venv\Scripts\activate
```

You'll notice your terminal prompt changes to show `(.venv)` at the beginning. This indicates the environment is active.
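If you want a second check beyond the prompt, you can ask Python itself. The snippet below is a minimal sketch (the helper name `in_virtual_environment` is ours, not part of any library): inside a virtual environment, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base installation.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtual_environment()}")
```

With the environment active, the interpreter path should sit inside your project's `.venv` directory.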

The third step is installing all course dependencies. Since this project uses a `pyproject.toml` file, one command installs everything:

```bash
uv pip install -e .
```

The `-e` flag installs the project in "editable" mode: changes you make to the project's source code take effect without reinstalling, which is useful if you're developing or modifying code. The installation might take a few minutes because it pulls in many large libraries for data science, machine learning, and network analysis.

## Verifying Your Installation

Now comes the moment of truth. Let's verify that all required libraries are properly installed. The code cell below imports every major library we'll use in this course.

If everything is set up correctly, this cell will run without errors. If you encounter import errors, double check that you activated your virtual environment and ran the installation command.

```{python}
# Core data science libraries
import numpy as np
import pandas as pd
import scipy

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
import bokeh
import altair as alt

# Machine learning libraries
import sklearn
import torch
import torchvision

# Natural language processing libraries
import transformers
import sentence_transformers
import gensim

# Network analysis libraries
import networkx as nx
import igraph

# LLM and agentic AI tools
import ollama
import langchain_core
import langchain_ollama
import langgraph

# Utility libraries
import requests
from PIL import Image
from tqdm import tqdm
import pydantic

print("All libraries imported successfully!")
print(f"NumPy version: {np.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"Transformers version: {transformers.__version__}")
```

If you see "All libraries imported successfully!" along with version numbers, congratulations! Your environment is ready. You now have access to some of the most powerful tools in data science, machine learning, and AI.
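If the cell fails, the traceback only names the first missing library. As a rough troubleshooting aid, the sketch below tries each import separately and reports every failure at once. The helper `find_missing` and the shortened `COURSE_MODULES` list are our own illustrations; extend the list with the other import names from the cell above as needed.

```python
import importlib


def find_missing(module_names):
    """Try to import each module and return {name: error message} for failures."""
    missing = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            missing[name] = str(exc)
    return missing


# A few of the course libraries, by import name; extend the list as needed.
COURSE_MODULES = ["numpy", "pandas", "sklearn", "torch", "networkx"]

missing = find_missing(COURSE_MODULES)
if missing:
    for name, reason in missing.items():
        print(f"MISSING {name}: {reason}")
else:
    print("All listed modules imported successfully.")
```

Any module reported as missing usually means the environment was not active when you ran the installation command.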

## Version Control with Git

Now that your environment is ready, let's master version control. Think of Git as a time machine for your code. It lets you save snapshots of your work, experiment with new ideas without fear, and collaborate with others without stepping on each other's toes.

The pen-and-paper exercise introduced you to Git's conceptual model. Now we'll put those concepts into practice with hands-on exercises that mirror real-world workflows.

### Setting Up Your Repository

Before we start the Git exercises, you need to set up your repository through GitHub Classroom.

**Step 1: Accept the assignment**

Your instructor will provide an assignment invitation link on Brightspace. Click this link. You may need to sign in to GitHub if you haven't already. Then click "Accept this assignment."

GitHub Classroom will create a personal repository just for you. After a few moments, you'll see a confirmation message with a link to your repository. Click that link.

**Step 2: Find your repository clone URL**

You're now on your repository page on GitHub. The URL in your browser will look like:

```
https://github.com/sk-classroom/git-practice-assignment-YOUR-USERNAME
```

This is your repository's home page. To get your code onto your computer, you need to clone it. Look for the green **"<> Code"** button near the top right of the page (below the repository name and above the file list). Click it.

A dropdown appears showing three tabs: HTTPS, SSH, and GitHub CLI. Make sure **HTTPS** is selected (it should be underlined). You'll see a URL ending in `.git`:

```
https://github.com/sk-classroom/git-practice-assignment-YOUR-USERNAME.git
```

Click the copy icon (📋) next to the URL to copy it to your clipboard.

**Step 3: Clone the repository**

Open your terminal and navigate to where you want to work (like your Documents folder or a projects directory):

```bash
# If you copied the URL, paste it after 'git clone'
git clone https://github.com/sk-classroom/git-practice-assignment-YOUR-USERNAME.git
cd git-practice-assignment-YOUR-USERNAME
```

This creates a local copy of your repository that's already connected to GitHub. The repository is already initialized, and the remote connection is already set up. You can focus entirely on learning Git commands.

::: {.column-margin}
**Tip:** Make sure you're inside the cloned directory (`cd git-practice-assignment-YOUR-USERNAME`) before running any git commands in the exercises below.
:::

### Creating Your First Commits

Let's create a simple Python script called `analysis.py` and track it with Git:

```bash
echo 'def calculate_mean(data):
    total = sum(data)
    return total / len(data)' > analysis.py
```

Check the status of your repository:

```bash
git status
```

You'll see `analysis.py` listed as an "untracked file." Git sees the file exists but isn't tracking its history yet. This is Area 1 from the pen-and-paper exercise, where all your files live.

Add the file to the staging area:

```bash
git add analysis.py
```

This moves the file to Area 2, the staging area. You're telling Git "I want this file in my next snapshot." Check the status again:

```bash
git status
```

Now `analysis.py` is listed as "Changes to be committed." This is your staging area. You've chosen what goes into your next snapshot.

Create your first commit:

```bash
git commit -m "Add mean calculation function"
```

Congratulations! You've created your first snapshot. The `-m` flag lets you add a message describing what this snapshot contains. Always write clear, descriptive commit messages. Your future self will thank you.

### Fixing a Bug

The function above has a critical bug. It crashes when given an empty list. Let's fix it and commit the fix.
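To see the failure concretely, here is the first version of the function run on a normal list and on an empty one. The empty case divides by `len([])`, which is zero.

```python
def calculate_mean(data):
    # Same logic as the first version of analysis.py.
    total = sum(data)
    return total / len(data)


print(calculate_mean([2, 4, 6]))  # 4.0

try:
    calculate_mean([])
except ZeroDivisionError as exc:
    # sum([]) is 0 and len([]) is 0, so the division fails.
    print(f"Empty input fails: {exc}")
```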

Edit `analysis.py` to add error handling:

```bash
echo 'def calculate_mean(data):
    if len(data) == 0:
        return 0
    total = sum(data)
    return total / len(data)' > analysis.py
```

Check what changed:

```bash
git diff
```

The `git diff` command shows what has changed in your working files but has not yet been staged. Lines starting with `-` were removed. Lines starting with `+` were added. This makes it easy to review your work before committing.

Stage and commit the fix:

```bash
git add analysis.py
git commit -m "Fix crash when data is empty"
```

You now have two commits in your history. View them:

```bash
git log --oneline
```

Each commit has a unique identifier (a hash) and your descriptive message. This is your project's timeline.
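Where do these identifiers come from? Git derives each one from the content of the object it names. The sketch below reproduces the calculation for a file's contents (a "blob"), assuming the default SHA-1 object format; the function name `git_object_id` is our own. Commit objects are hashed the same way, with content that includes the snapshot, the parent commit, the author, the timestamp, and the message, so any change to history produces a different hash.

```python
import hashlib


def git_object_id(kind: str, content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to an object of the given kind."""
    # Git hashes a small header ("<kind> <size>\0") followed by the raw content.
    header = f"{kind} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Same value that `printf 'hello world\n' | git hash-object --stdin` reports.
print(git_object_id("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The short identifiers printed by `git log --oneline` are abbreviated prefixes of these full 40-character hashes.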

### Working with Branches

Branches let you experiment without breaking your stable code. Let's say you want to try a completely new approach to calculating statistics, but you're not sure it will work. Branches give you a safe playground.

Create a new branch called `add-median`:

```bash
git branch add-median
git checkout add-median
```

The `git branch` command creates the branch. The `git checkout` command switches to it. You can combine these steps with `git checkout -b add-median`.

Check which branch you're on:

```bash
git branch
```

The asterisk shows your current branch. Now add a new function to `analysis.py`:

```bash
echo '
def calculate_median(data):
    if len(data) == 0:
        return 0
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    return sorted_data[n//2]' >> analysis.py
```
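Before committing, a quick sanity check can be worthwhile. The sketch below copies the new function and compares it against `statistics.median` from the standard library on a few sample lists. Note one deliberate difference: this course's function returns 0 for empty input, whereas `statistics.median` raises an error.

```python
import statistics


def calculate_median(data):
    # Same logic as the function appended to analysis.py.
    if len(data) == 0:
        return 0
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    return sorted_data[n//2]


# Odd length, even length, and a single element.
for sample in ([3, 1, 2], [4, 1, 3, 2], [7]):
    assert calculate_median(sample) == statistics.median(sample)

print(calculate_median([3, 1, 2]))     # 2
print(calculate_median([4, 1, 3, 2]))  # 2.5
print(calculate_median([]))            # 0
```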

Stage and commit the new function:

```bash
git add analysis.py
git commit -m "Add median calculation function"
```

Now for the magic. Switch back to your main branch:

```bash
git checkout main
cat analysis.py
```

Notice that the median function disappeared! It's not gone; it lives on the `add-median` branch. Your main branch remains exactly as you left it. This is the power of branches.

### Merging Branches

Your median function works perfectly. Time to bring it into the main branch. This process is called merging.

Make sure you're on the main branch:

```bash
git checkout main
```

Merge the `add-median` branch:

```bash
git merge add-median
```

Git automatically combines the work from both branches. Check your file:

```bash
cat analysis.py
```

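The median function is now in main. Your experiment succeeded and you've integrated it into your stable code. This merge was smooth because main had no new commits since you branched, so Git simply moved main forward to the tip of `add-median` (a "fast-forward" merge).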
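One way to picture what just happened: a branch is a name that points at a commit, and each commit remembers its parent. When main's commit is already part of the other branch's history, Git can perform a fast-forward merge by moving the `main` pointer. The toy model below is only an illustration of that idea (the commit names `c1` to `c3` and both helper functions are made up, and merge commits with two parents are not modeled); it is not how Git is implemented.

```python
# Toy model: each commit records its parent; each branch is a name -> commit id.
parents = {"c1": None, "c2": "c1", "c3": "c2"}
branches = {"main": "c2", "add-median": "c3"}


def ancestors(commit):
    """Return the commit followed by all of its ancestors, newest first."""
    chain = []
    while commit is not None:
        chain.append(commit)
        commit = parents[commit]
    return chain


def can_fast_forward(target, source):
    """True when the target branch's tip is already in the source's history."""
    return branches[target] in ancestors(branches[source])


if can_fast_forward("main", "add-median"):
    # Nothing needs combining: just move the main pointer forward.
    branches["main"] = branches["add-median"]

print(branches)  # {'main': 'c3', 'add-median': 'c3'}
```

When both branches have commits the other lacks, this shortcut is not available and Git has to combine the changes.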

### Understanding Merge Conflicts

Real collaboration isn't always this smooth. Sometimes two branches modify the same lines of code in different ways. When this happens, Git can't automatically decide which version to keep. You must resolve the conflict manually.

Let's create a merge conflict deliberately so you know how to handle it. Create two branches that modify the same function in different ways.

First, create a branch that changes the mean function to use NumPy:

```bash
git checkout -b use-numpy-mean
echo 'import numpy as np

def calculate_mean(data):
    if len(data) == 0:
        return 0
    return np.mean(data)

def calculate_median(data):
    if len(data) == 0:
        return 0
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    return sorted_data[n//2]' > analysis.py
git add analysis.py
git commit -m "Use NumPy for mean calculation"
```

Now switch back to main and create a different branch that also modifies the mean function:

```bash
git checkout main
git checkout -b add-mean-docstring
echo 'def calculate_mean(data):
    """Calculate the arithmetic mean of a dataset.

    Returns 0 for empty datasets."""
    if len(data) == 0:
        return 0
    total = sum(data)
    return total / len(data)

def calculate_median(data):
    if len(data) == 0:
        return 0
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    return sorted_data[n//2]' > analysis.py
git add analysis.py
git commit -m "Add docstring to mean function"
```

Now merge the first branch into main:

```bash
git checkout main
git merge use-numpy-mean
```

This succeeds because main hasn't changed since we branched. Now try to merge the second branch:

```bash
git merge add-mean-docstring
```

Conflict! Git displays a message like "CONFLICT (content): Merge conflict in analysis.py." Git tried to merge both changes but couldn't decide which version of the mean function to keep.

### Resolving the Conflict

Open `analysis.py` in a text editor. You'll see conflict markers similar to these (the exact grouping of lines may differ slightly):

```
<<<<<<< HEAD
import numpy as np

def calculate_mean(data):
    if len(data) == 0:
        return 0
    return np.mean(data)
=======
def calculate_mean(data):
    """Calculate the arithmetic mean of a dataset.

    Returns 0 for empty datasets."""
    if len(data) == 0:
        return 0
    total = sum(data)
    return total / len(data)
>>>>>>> add-mean-docstring
```

Everything between `<<<<<<< HEAD` and `=======` is the current version (from main). Everything between `=======` and `>>>>>>> add-mean-docstring` is the incoming version (from the branch you're merging).
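The marker layout is regular enough to read programmatically. As an illustration of the structure only, the sketch below splits a conflicted snippet into its two sides. The function `split_conflict` is hypothetical and handles a single conflict block; in practice you would resolve conflicts in your editor.

```python
def split_conflict(text: str):
    """Split text containing one conflict block into (ours, theirs) line lists."""
    ours, theirs = [], []
    target = None  # the list that the current line should be added to
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            target = ours        # lines from the current branch (HEAD) follow
        elif line.startswith("=======") and target is ours:
            target = theirs      # lines from the incoming branch follow
        elif line.startswith(">>>>>>> "):
            target = None        # end of the conflict block
        elif target is not None:
            target.append(line)
    return ours, theirs


conflicted = """\
<<<<<<< HEAD
    return np.mean(data)
=======
    total = sum(data)
    return total / len(data)
>>>>>>> add-mean-docstring
"""

ours, theirs = split_conflict(conflicted)
print("HEAD:", ours)
print("incoming:", theirs)
```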

You must manually edit the file to keep what you want. Let's combine the best of both: use NumPy and add the docstring:

```bash
echo 'import numpy as np

def calculate_mean(data):
    """Calculate the arithmetic mean of a dataset using NumPy.

    Returns 0 for empty datasets."""
    if len(data) == 0:
        return 0
    return np.mean(data)

def calculate_median(data):
    if len(data) == 0:
        return 0
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    return sorted_data[n//2]' > analysis.py
```

After resolving the conflict, stage the resolved file:

```bash
git add analysis.py
```

Complete the merge by creating a commit:

```bash
git commit -m "Merge add-mean-docstring, combining NumPy and documentation"
```

Check your history:

```bash
git log --oneline --graph --all
```

The `--graph` flag shows your branch structure visually. You can see where branches diverged and where they merged back together.

### Pushing to GitHub

You've been making commits locally, but they only exist on your computer so far. Let's push them to GitHub so they're backed up and your instructor can see your work.

**This is critical:** Your instructor will review your work directly on GitHub. Work that exists only on your local computer cannot be seen or graded. You must push your commits to make them visible.

Since you cloned your repository from GitHub Classroom, the connection to GitHub (called the "remote") is already set up. The remote is named `origin` by convention. You can verify this:

```bash
git remote -v
```

This shows you the URL of your GitHub repository. You should see your GitHub Classroom repository URL listed.

Now push your commits to GitHub:

```bash
git push origin main
```

The `git push` command uploads your local commits to GitHub. The word `origin` refers to your GitHub repository, and `main` is the branch you're pushing.

::: {.column-margin}
**GitHub Classroom:** Our classroom is [Applied Soft Computing Spring 2026](https://classroom.github.com/classrooms/146034511-applied-soft-computing-spring-2026). Assignment links are distributed through Brightspace.
:::

Visit your repository page on GitHub and refresh it. You should now see your `analysis.py` file and all your commits! Click on "commits" to see your commit history. This is incredibly useful for reviewing what changed and when.

**Develop a habit of pushing frequently.** After completing each exercise section, push your work. This keeps your local repository and GitHub in sync.

Now push your other branches:

```bash
git push origin add-median
git push origin use-numpy-mean
git push origin add-mean-docstring
```

Visit your repository on GitHub (go back to the repository page from Step 2 in the setup section, or find it at `https://github.com/sk-classroom/git-practice-assignment-YOUR-USERNAME`). Click the "branches" dropdown near the top left (next to the branch icon). You'll see all your branches listed. GitHub provides a visual interface for viewing diffs, commit history, and branch structure.

### Working from Multiple Computers

GitHub serves as a central backup and collaboration hub. Because your work is on GitHub, you can work from any computer.

If you want to work from a different computer, just clone your repository there:

```bash
git clone https://github.com/sk-classroom/git-practice-assignment-YOUR-USERNAME.git
cd git-practice-assignment-YOUR-USERNAME
```

This downloads the entire repository, including all history and branches.

If you've been working on one computer and want to get the latest changes on another (or if you made edits directly on GitHub's website), use:

```bash
git pull origin main
```

This fetches and merges changes from the remote repository into your local branch.

::: {.callout-important title="Verify Your Work is Pushed"}
**IMPORTANT: Your instructor will review your work directly on GitHub Classroom.**

Before you finish, you must ensure everything is pushed to GitHub. Work that only exists on your local computer cannot be seen or graded.

**Check synchronization status:**

Run this command in your repository directory:

```bash
git status
```

You should see:

- `"Your branch is up to date with 'origin/main'"`
- `"nothing to commit, working tree clean"`

If `git status` instead reports that your branch is ahead of 'origin/main', or lists changes that are not yet committed, commit and push them now!

**Verify on GitHub:**

Visit your repository page on GitHub (you can always find it at `https://github.com/sk-classroom/git-practice-assignment-YOUR-USERNAME`). Check that you see:

- [ ] The `analysis.py` file with both `calculate_mean()` and `calculate_median()` functions
- [ ] At least 5 commits in your commit history (click "commits" to view)
- [ ] Multiple branches visible in the branches dropdown: `main`, `add-median`, `use-numpy-mean`, `add-mean-docstring`
- [ ] A merge commit showing conflict resolution in your history

**If anything is missing from GitHub, you need to push it!** Use the commands you learned:

```bash
# Push main branch
git push origin main

# Push any missing branches
git push origin branch-name
```

Your instructor will review your GitHub repository directly, so there's no need to submit a URL. Just make sure everything is pushed and visible on GitHub.
:::
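The synchronization check from the callout above can also be scripted. The sketch below is one possible approach, not a required part of the assignment: it reads Git's machine-readable status format (`git status --porcelain=v1 --branch`), whose first line looks like `## main...origin/main [ahead 2]`. The function names and the example string are our own. The ahead and behind counts reflect what your computer last learned about `origin`, so run `git fetch` first if you want them refreshed.

```python
import re
import subprocess


def summarize_status(porcelain: str) -> dict:
    """Summarize the output of `git status --porcelain=v1 --branch`."""
    lines = porcelain.splitlines()
    header = lines[0] if lines and lines[0].startswith("## ") else ""
    # Tracking info, when present, sits in square brackets at the end of the header.
    bracket = re.search(r"\[([^\]]*)\]$", header)
    tracking = bracket.group(1) if bracket else ""
    ahead = re.search(r"ahead (\d+)", tracking)
    behind = re.search(r"behind (\d+)", tracking)
    return {
        "ahead": int(ahead.group(1)) if ahead else 0,
        "behind": int(behind.group(1)) if behind else 0,
        # Every line after the "## ..." header describes a changed or untracked file.
        "uncommitted": [line for line in lines if not line.startswith("## ")],
    }


def current_status() -> dict:
    """Run git in the current directory and summarize the result."""
    result = subprocess.run(
        ["git", "status", "--porcelain=v1", "--branch"],
        capture_output=True, text=True, check=True,
    )
    return summarize_status(result.stdout)


# Made-up example: two unpushed commits, one modified file, one untracked file.
example = "## main...origin/main [ahead 2]\n M analysis.py\n?? notes.txt\n"
print(summarize_status(example))
```

Calling `current_status()` inside your repository should report `ahead` equal to 0 and an empty `uncommitted` list once everything is committed and pushed.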

::: {.callout-tip title="Try it yourself"}
Practice the complete workflow on your own:

1. Accept the assignment and clone your repository
2. Work through all the exercises, making commits as you go
3. Push your commits regularly to back up your work
4. Create branches to experiment with different approaches
5. Deliberately create merge conflicts to practice resolving them
6. Use `git log --oneline --graph --all` to visualize your work

The more you practice, the more natural Git becomes. Soon you'll be branching and merging without thinking twice. Remember: you can push as often as you want, so don't wait until the end!
:::

### Why This Matters

Version control transforms how you work. You can experiment fearlessly knowing you can always return to a working state. You can try multiple approaches in parallel. You can collaborate with teammates without coordination headaches. You can review your project's evolution to understand how you got here.

Every professional software project uses version control. Data science projects benefit just as much. When your analysis depends on dozens of interdependent scripts and notebooks, version control becomes essential for reproducibility and collaboration.

Throughout this course, we'll use Git to track our work. You'll create commits after completing exercises, branch when experimenting with different approaches, and merge when you've found solutions that work. These workflows will become second nature.

Let's get started.