# Week 6 Tuesday: Collaborating on Jupyter Notebooks with GitHub
## From Two-Person Turn-Taking to Real Parallel Teamwork

---

### Quick Recap: Where We Are
By the end of Week 5 you can:
- Create and clone repositories
- Stage, Commit, and Push in VS Code
- Pull changes from GitHub
- Fork a repository
- Add collaborators and work in turns with a partner
- Resolve a simple merge conflict

**Today's question:** What changes when the file is a `.ipynb` notebook and 3-4 people are working *at the same time*?

### Today's Roadmap
| Time | Activity |
|------|----------|
| 0:00–0:05 | Recap |
| 0:05–0:30 | Part 1 — Why notebooks are a special challenge |
| 0:30–0:55 | Part 2 — Branches: your personal sandbox |
| 0:55–1:15 | Part 3 — Hands-on partner exercises |
| 1:15–1:30 | Part 4 — Team patterns and assignment intro |

## Part 1: Why Notebooks Create Unique Git Challenges

### The Hidden Problem: Notebooks Are JSON Files

A notebook looks like a clean interface of cells and text, but on disk it's stored as a **JSON file** — structured text full of brackets, keys, and metadata.

Here's what one code cell looks like *inside* the `.ipynb` file:

```json
{
 "cell_type": "code",
 "execution_count": 3,
 "metadata": {},
 "outputs": [
  {
   "name": "stdout",
   "output_type": "stream",
   "text": ["Average: 3.54\n"]
  }
 ],
 "source": ["print(df.GPA.mean())"]
}
```

Git does not know this represents a notebook cell. It sees **lines of JSON text**. When two people change the same cell in any way, even just by running it, Git sees two people editing the same lines of text.
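You can see the same thing from Python. A notebook is ordinary JSON, so the standard `json` module can open it and list what each cell stores. This is a minimal sketch: the tiny notebook string is a made-up example, and `describe_cells` is a hypothetical helper. For a real file you would use `json.load` on the opened `.ipynb` instead.

```python
import json

# A made-up, minimal notebook in the same JSON shape as a real .ipynb file
NOTEBOOK_TEXT = """
{
 "cells": [
  {"cell_type": "markdown", "metadata": {}, "source": ["# GPA analysis"]},
  {"cell_type": "code", "execution_count": 3, "metadata": {},
   "outputs": [{"name": "stdout", "output_type": "stream",
                "text": ["Average: 3.54\\n"]}],
   "source": ["print(df.GPA.mean())"]}
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}
"""

def describe_cells(notebook):
    """Return one summary line per cell: its type, run counter, and output count."""
    lines = []
    for i, cell in enumerate(notebook["cells"]):
        if cell["cell_type"] == "code":
            lines.append(f"cell {i}: code, execution_count={cell['execution_count']}, "
                         f"{len(cell['outputs'])} output(s)")
        else:
            lines.append(f"cell {i}: {cell['cell_type']}")
    return lines

nb = json.loads(NOTEBOOK_TEXT)
for line in describe_cells(nb):
    print(line)
```

The outputs and the run counter live right next to your code in the same file, which is why running a cell counts as "editing" it as far as Git is concerned.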

---

### Three Things That Create Notebook Conflicts

**1. Running cells (the most common surprise)**

When you run a cell, the notebook writes the output *into the JSON*. If two people both run the same cell:

```
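Person A runs cell 1  →  output + execution count saved into their copy of the JSON
Person B runs cell 1  →  output + execution count saved into their copy of the JSON
Both commit; A pushes first  →  B's push is rejected, so B pulls
Git: CONFLICT (same lines, modified differently by two people)
```

Even if the printed output is **identical**, the two copies rarely match line for line (the execution counts usually differ), so Git still flags a conflict.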
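To see why identical output is not enough, the sketch below builds the same cell twice, changing only `execution_count` (as if Person A's kernel had run it third and Person B's second), writes both as JSON text, and compares them with Python's `difflib`. The cell contents are a made-up example.

```python
import difflib
import json

def cell_as_lines(execution_count):
    """Serialize one code cell as indented JSON text, one key per line."""
    cell = {
        "cell_type": "code",
        "execution_count": execution_count,
        "metadata": {},
        "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Average: 3.54\n"]}],
        "source": ["print(df.GPA.mean())"],
    }
    return json.dumps(cell, indent=1).splitlines()

person_a = cell_as_lines(3)   # same code, same printed output...
person_b = cell_as_lines(2)   # ...but a different run counter

# Keep only the changed lines, dropping the "---"/"+++" file headers
diff = [line for line in difflib.unified_diff(person_a, person_b, lineterm="")
        if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))]
print("\n".join(diff))
```

The only difference is the `"execution_count"` line, and that one line is enough for Git to see two people editing the same text.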

<font color="red">**GIT COLLABORATION:** **CLEAR OUTPUT BEFORE EVERY COMMIT**! Every time, no exceptions.</font>

---

**2. Editing the same cell**

Two people editing the same cell's code is a straightforward conflict — same as two people editing the same line in a `.py` file.

**Fix:** Divide the notebook into owned sections. Each person only edits their own cells.

---

**3. Execution counts**

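Each cell stores `execution_count`, the number shown in the cell's `[ ]` brackets. The kernel increments one counter on every run, so this number records *when* the cell last ran in that session, not how many times that cell has run. Person A's cell 3 might show `[3]` while Person B's shows `[5]`. The numbers differ. Conflict.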

**Fix:** Same as #1. Clearing outputs also resets all execution counts to `null`.

### The Most Important Habit: Clear Outputs Before Every Commit

```
Your commit ritual — memorize this sequence:

  1. Run your code and verify it works
  2. Clear All Outputs                 ← THE CRITICAL STEP
  3. Save  (Ctrl+S or Cmd+S)
  4. Stage changes in VS Code
  5. Write a clear commit message
  6. Commit and Push

Never skip step 2. Never commit a notebook with outputs.
```

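**In VS Code:** click **Clear All Outputs** in the toolbar at the top of the notebook  
**Or:** `Ctrl+Shift+P` / `Cmd+Shift+P` → type `Clear All Outputs` → choose **Notebook: Clear All Outputs**  
**In classic Jupyter Notebook:** `Cell → All Output → Clear`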

**What gets removed:** printed text, tables, charts, errors, execution counts  
**What stays:** your code, markdown text, cell structure
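If you want to double-check (or script) the clearing step, here is a minimal sketch using only Python's standard `json` module. It assumes the standard notebook layout shown earlier; `clear_notebook_outputs` and `clear_notebook_file` are hypothetical helpers, not VS Code or Jupyter commands. Like the menu command, it empties `outputs` and sets `execution_count` to `null` on every code cell, leaving code and markdown untouched.

```python
import json

def clear_notebook_outputs(notebook):
    """Empty outputs and reset run counters on every code cell (modifies in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None   # json writes None as null
    return notebook

def clear_notebook_file(path):
    """Hypothetical helper: clear a .ipynb file on disk, keeping Jupyter's 1-space indent."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    clear_notebook_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Usage: close the notebook in VS Code first (otherwise the editor may re-save its old outputs), then run `clear_notebook_file("output_demo.ipynb")` and stage the file as usual.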

---

### What a Notebook Conflict Looks Like (and How to Fix It)

If you get a conflict, you can open the raw file (right-click → Open With → Text Editor):

```json
{
 "cell_type": "code",
<<<<<<< HEAD
 "execution_count": 3,
 "outputs": [{"text": ["3.54\n"]}],
=======
 "execution_count": 2,
 "outputs": [{"text": ["3.54\n"]}],
>>>>>>> abc1234
 "source": ["print(df.GPA.mean())"]
}
```

**Resolution steps:**
1. The outputs are the same, so this is just a run-count conflict
2. Delete everything from `<<<<<<<` through `>>>>>>>`, including the three marker lines
3. In its place, write `"execution_count": null,` and `"outputs": [],` (keep the trailing commas so the JSON stays valid)
4. Save, stage, commit: `"Resolve notebook output conflict"`
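After resolving by hand, it is easy to leave a stray marker or drop a comma, and then the notebook will not open at all. A small check like this one (a hypothetical helper using only the standard library) scans the raw text for leftover markers and confirms it still parses as JSON:

```python
import json

# Git writes these markers at the very start of a line
CONFLICT_MARKERS = ("<<<<<<<", "=======", ">>>>>>>")

def resolution_problems(text):
    """Return a list of reasons the resolved notebook text is not ready to commit."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(CONFLICT_MARKERS):
            problems.append(f"line {number}: leftover conflict marker")
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        problems.append(f"not valid JSON: {err.msg} (line {err.lineno})")
    return problems

resolved = ('{"cell_type": "code", "execution_count": null, "outputs": [], '
            '"source": ["print(df.GPA.mean())"]}')
print(resolution_problems(resolved) or "looks good")   # prints: looks good
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text in; an empty list means it is safe to stage.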

## Part 2: Branches — Your Personal Sandbox

### The Problem With Everyone Pushing to Main

With two people taking turns, pushing directly to `main` is manageable. With 3–4 people working *simultaneously*, it breaks down:

```
10:00 AM: All three people clone the repo
10:15 AM: Person A pushes half-finished code to main
10:15 AM: Person B pulls it; their notebook crashes
10:20 AM: Person C's push is rejected; after pulling, their edits conflict with Person A's
10:20 AM: Everyone is confused, nothing runs
```

Branches solve this.

---

### What Is a Branch?

A **branch** is an independent line of work inside your repository: it starts from an existing commit, and you can commit to it freely without affecting anyone else until you're ready to share.

```
main:              ──●──●──●───────────────●─────────────●──   (always works)
                           |\             /             /
section-2-revenue:         | ●──●──●──●──●             /   (Person B)
                            \                         /
section-3-trends:            ●──●──●──●──●──●──●──●──●     (Person C)
```

Everyone works independently. When a section is complete and tested, it merges into `main`. Nobody's work in progress can break anyone else's code.

---

### Creating a Branch

**In VS Code (easiest):**
1. Look at the **bottom-left** status bar — you see `⎇ main`
2. Click on it
3. Select **Create new branch...** from the picker that opens at the top of the window
4. Type your branch name, press Enter

**In the terminal:**
```bash
git checkout -b section-2-revenue
```
The `-b` flag means: create AND switch to this branch.

---

### Pushing Your Branch to GitHub in VS Code

The first time you push a new branch, VS Code needs to publish it to GitHub.

**Option 1 — Publish Branch button (appears automatically):**  
After your first commit on the new branch, look at the bottom-left status bar.
You will see a **cloud icon with an up-arrow** next to your branch name.
Click it — VS Code publishes the branch to GitHub and sets up tracking.

**Option 2 — Source Control panel:**
1. Open Source Control (`Ctrl+Shift+G`)
2. Click the **`···` More Actions** menu (top-right of the panel)
3. Select **Publish Branch**

After the first publish, the **↕ Sync Changes** button in the status bar handles
all future pushes — click it after every commit.

> ✅ **Confirm it worked:** On GitHub.com, click the branch dropdown near the
> top-left of your repository. Your branch name should appear in the list.

---

### Pushing Your Branch to GitHub from Command Line

First push of a new branch:
```bash
git push -u origin section-2-revenue
```
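The `-u` flag sets `origin/section-2-revenue` as your branch's upstream, so afterwards a plain `git push` or `git pull` works without naming the branch. (In VS Code, the **Publish Branch** button does the same thing.)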

---

### Branch Naming for the Assignment

```
section-1-loading     ← Person 1
section-2-revenue     ← Person 2
section-3-trends      ← Person 3
section-4-reps        ← Person 4 (4-person teams)
```

### Merging Your Branch Back to Main in VS Code

When your section is complete and all cells run cleanly, follow these four steps
inside VS Code — no terminal needed.

**Step 1 — Switch to `main`**
1. Click the branch name in the **bottom-left status bar**
   (e.g., `⎇ section-2-revenue`)
2. A branch picker appears at the top of the screen
3. Click **`main`** — the status bar updates to `⎇ main`

**Step 2 — Pull the latest `main`**
1. Click the **↕ Sync Changes** button in the status bar
2. VS Code pulls any commits teammates pushed to `main` since you started

**Step 3 — Merge your branch into `main`**

1. With `main` checked out, open Source Control and click the **`···`** (More Actions) menu at the top of the panel
2. Pick **Branch**, then **Merge**, and select your branch from the list (e.g., `section-2-revenue`)

or

1. Open the Command Palette: `Ctrl+Shift+P` / `Cmd+Shift+P`
2. Type **`Git: Merge Branch`** and press Enter
3. Select your branch from the list (e.g., `section-2-revenue`)
4. VS Code performs the merge

**Step 4 — Push the updated `main` to GitHub**
1. Click **↕ Sync Changes** again (the status bar will show `↑1` or more)
2. VS Code pushes the newly merged `main` to GitHub

---

**If Step 3 produces a conflict:**  
VS Code highlights each conflict in the editor with inline Accept/Reject buttons.
Resolve each one (clear output blocks if the conflict is in notebook outputs),
then:
1. Open Source Control (`Ctrl+Shift+G` / `Cmd+Shift+G`)
2. Click **+** next to each resolved file to stage it
3. Type commit message: `Merge section-2-revenue into main, resolve conflicts`
4. Click **✓ Commit**, then **↕ Sync Changes** to push

---

### Merging Your Branch Back to Main from Command Line

When your section is complete and all cells run cleanly:

```bash
# Step 1: Switch to main
git checkout main

# Step 2: Get the latest main (teammates may have merged since you started)
git pull

# Step 3: Bring your branch's work into main
git merge section-2-revenue

# Step 4: Push the updated main to GitHub
git push
```

If step 3 reports a conflict, Git lists the conflicted files. Open them in VS Code, which highlights each conflict, resolve them (clear outputs if needed), then:
```bash
git add .
git commit -m "Merge section-2-revenue into main, resolve conflicts"
git push
```
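Before resolving, it helps to know whether the two versions differ in anything besides outputs and run counters. During a conflict, `git show :2:output_demo.ipynb` prints your side and `git show :3:output_demo.ipynb` prints your teammate's side; once each is parsed with `json.loads`, a hypothetical helper like this one compares only what people actually write (cell types and source code):

```python
def cell_contents(notebook):
    """Keep only what people write: each cell's type and source text."""
    # "source" may be a list of lines or a single string; "".join handles both
    return [(cell["cell_type"], "".join(cell["source"])) for cell in notebook["cells"]]

def only_outputs_differ(ours, theirs):
    """True if both versions have the same cells and code (outputs and counts ignored)."""
    return cell_contents(ours) == cell_contents(theirs)

# Made-up example mirroring the output conflict from Part 1
cell = {"cell_type": "code", "metadata": {}, "source": ["print(df.GPA.mean())"]}
ours = {"cells": [dict(cell, execution_count=3, outputs=[{"text": ["3.54\n"]}])]}
theirs = {"cells": [dict(cell, execution_count=2, outputs=[{"text": ["3.54\n"]}])]}
print(only_outputs_differ(ours, theirs))   # True
```

A `True` result means clearing outputs is a safe way to resolve; `False` means someone changed code or markdown, so read both sides before choosing.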

---

### Staying in Sync While Others Merge in VS Code

After a teammate merges to `main`, bring those changes into your own branch
before continuing. Do this every time someone posts "Main updated" in the group
chat:

1. **Switch to `main`** — click the status bar branch name, select `main`
2. **Pull** — click **↕ Sync Changes** to download the new commits
3. **Switch back to your branch** — click the status bar again, select your
   branch (e.g., `section-3-trends`)
4. **Merge `main` into your branch** — open Command Palette
   (`Ctrl+Shift+P` / `Cmd+Shift+P`), type **`Git: Merge Branch`**, select `main`

---

### Staying in Sync While Others Merge from Command Line

After Person 1 merges, Persons 2 and 3 should update their branches:

```bash
git checkout main               # switch to main
git pull                        # get the new merged changes
git checkout section-2-revenue  # go back to your branch
git merge main                  # bring those changes into YOUR branch
```

This prevents conflicts from accumulating: small differences get resolved as they appear instead of piling up into one large conflict at the end. Do it after every teammate merges.

## Part 3: Hands-On Exercises

### Exercise A: The Output Conflict (10 min, with a partner)

Experience exactly what happens when outputs are not cleared.

**Setup:** Person A creates a repo, adds Person B as collaborator. Both clone it.

**Person A — create a notebook called `output_demo.ipynb`, add this cell, RUN it, then commit WITHOUT clearing:**

In [None]:
# Person A: add this cell, run it, then commit WITH outputs showing
# (doing the wrong thing on purpose to see what happens)

scores = [88, 92, 79, 95, 83, 77, 91, 86]
average = sum(scores) / len(scores)
print(f'Average: {average:.1f}')
print(f'Highest: {max(scores)}')
print(f'Lowest: {min(scores)}')

# Commit message: 'Add score analysis - outputs NOT cleared (demo)'
# Push to GitHub

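**After Person A pushes:**
1. Person B pulls the changes, so both of you now have the same notebook
2. Person B runs the same cell three or four times (identical code, identical output), saves, and commits, but does not push yet
3. Meanwhile, Person A runs the cell once more, saves, commits, and pushes
4. Person B tries to push: Git rejects it because GitHub now has a commit Person B doesn't have
5. Person B pulls: read the message. What does it say?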

**Both together:**
- Right-click the notebook → Open With → Text Editor
- Find the `<<<<<<< HEAD` conflict markers
- What exactly is different between the two versions?
- Resolve it: delete both output blocks, leave `"outputs": []` and `"execution_count": null`
- Commit: `"Resolve output conflict - lesson learned"`

**Key insight:** The printed outputs were *identical*; only the saved run counters differed, yet that was enough for Git to flag a conflict. This is why we clear outputs. Always.
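To make "always" easier, a team could run a quick check before each commit. This is a minimal sketch, assuming your notebooks sit somewhere under the repository folder; `has_saved_outputs` and `find_notebooks_with_outputs` are hypothetical helpers, not Git or VS Code features. It lists every `.ipynb` file that still stores outputs or execution counts, skipping Jupyter's `.ipynb_checkpoints` autosave folder.

```python
import json
from pathlib import Path

def has_saved_outputs(notebook):
    """True if any code cell still stores outputs or a run counter."""
    return any(
        cell.get("outputs") or cell.get("execution_count") is not None
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    )

def find_notebooks_with_outputs(folder="."):
    """Return paths of notebooks under `folder` that should be cleared before committing."""
    flagged = []
    for path in sorted(Path(folder).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in path.parts:
            continue  # autosave copies, normally not committed
        with open(path, encoding="utf-8") as f:
            if has_saved_outputs(json.load(f)):
                flagged.append(str(path))
    return flagged
```

Usage: from the repository folder, run `find_notebooks_with_outputs()`. An empty list means every notebook is cleared and ready to commit.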

---

### Exercise B: Parallel Branch Work (15 min, same partner)

Now use branches to work at the same time without conflicts.

**Person A: create branch `feature-stats` and add this cell in `output_demo.ipynb`, *above* the existing score cell:**

In [None]:
# Person A: create branch feature-stats first
# In VS Code: click ⎇ main in the status bar → Create new branch... → feature-stats

def summarize_scores(scores):
    """Calculate summary statistics for a list of scores.

    Args:
        scores: list of numbers

    Returns:
        dict with keys: mean, median, high, low, count

    >>> result = summarize_scores([88, 92, 79, 95, 83])
    >>> result['count']
    5
    >>> result['high']
    95
    """
    if not scores:
        return {}
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    median = s[mid] if n % 2 != 0 else (s[mid - 1] + s[mid]) / 2
    return {'mean': round(sum(s)/n, 2), 'median': median,
            'high': s[-1], 'low': s[0], 'count': n}

scores = [88, 92, 79, 95, 83, 77, 91, 86]
stats = summarize_scores(scores)
for k, v in stats.items():
    print(f'  {k}: {v}')

# Clear outputs -> Save -> Commit: 'Add summarize_scores() - Person A'
# Then click the Publish Branch button in the status bar (cloud icon with up-arrow)

**Person B: simultaneously, on branch `feature-ranking`, add this cell *below* the existing score cell:**

In [None]:
# Person B: create branch feature-ranking first
# In VS Code: click ⎇ main in the status bar → Create new branch... → feature-ranking

def rank_students(names, scores):
    """Return students ranked by score, highest first.

    Args:
        names: list of student name strings
        scores: list of numbers (same length)

    Returns:
        list of (name, score) tuples sorted descending

    >>> rank_students(['Alice','Bob','Carol'], [88, 95, 79])
    [('Bob', 95), ('Alice', 88), ('Carol', 79)]
    """
    return sorted(zip(names, scores), key=lambda p: p[1], reverse=True)

names = ['Alice','Bob','Carol','Dave','Eve','Frank','Grace','Henry']
scores = [88, 92, 79, 95, 83, 77, 91, 86]
for rank, (name, score) in enumerate(rank_students(names, scores), 1):
    print(f'  {rank}. {name}: {score}')

# Clear outputs -> Save -> Commit: 'Add rank_students() - Person B'
# Then click the Publish Branch button in the status bar (cloud icon with up-arrow)

---

**After both push their branches, merge them (VS Code):**

Person A merges first:
1. Status bar → click `⎇ feature-stats` → select **`main`**
2. Click **↕ Sync Changes** to pull
3. `Ctrl+Shift+P` → **`Git: Merge Branch`** → select `feature-stats`
4. Click **↕ Sync Changes** to push

Person B then merges (same steps with `feature-ranking`):
1. Status bar → click branch name → select **`main`**
2. Click **↕ Sync Changes** to pull — this gets Person A's merged function
3. `Ctrl+Shift+P` → **`Git: Merge Branch`** → select `feature-ranking`
4. Click **↕ Sync Changes** to push

---

**Or, after both push their branches, merge them from the command line:**

Person A merges first:
```bash
git checkout main
git pull
git merge feature-stats
git push
```

Person B then merges:
```bash
git checkout main
git pull            # gets Person A's merged function
git merge feature-ranking
# probably no conflict — different cells = different JSON
git push
```

---

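> Usually no conflict: different cells are different JSON objects. The exception is when you both insert a new cell at the same spot (for example, both at the very end of the notebook). Git then sees two different insertions in one place and asks you to resolve them; keep both cells.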

**Check on GitHub → Insights → Network graph** — you'll see both branches meet at main. Both functions are present. Nobody waited.

**Why it worked:** Each person added a *new* cell, and new cells are new JSON objects. As long as the two cells land in different places in the file, the changes don't overlap, so Git can merge them automatically.
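Here is a simplified model of that rule. It is not Git's actual merge algorithm, just a sketch with made-up "lines" standing in for the notebook's JSON: Python's `difflib.SequenceMatcher` finds which lines of the common starting version each branch touched, and the two branches are treated as clashing if those regions overlap or meet at the same spot (Git generally reports a conflict when two edits touch the same lines or sit right next to each other).

```python
import difflib

def touched_regions(base, branch):
    """(start, end) ranges of `base` lines that `branch` replaced, deleted, or inserted at."""
    matcher = difflib.SequenceMatcher(a=base, b=branch, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _, _ in matcher.get_opcodes() if tag != "equal"]

def would_clash(base, branch_a, branch_b):
    """Simplified rule: clash if the two branches' edited regions overlap or touch."""
    for a_start, a_end in touched_regions(base, branch_a):
        for b_start, b_end in touched_regions(base, branch_b):
            if a_start <= b_end and b_start <= a_end:
                return True
    return False

# Made-up lines standing in for a notebook with one existing cell
base = ["cells: [", "existing cell", "]"]
a_top = ["cells: [", "A's new cell", "existing cell", "]"]     # A inserts above
b_bottom = ["cells: [", "existing cell", "B's new cell", "]"]  # B inserts below
a_append = ["cells: [", "existing cell", "A's new cell", "]"]  # A appends at the end instead

print(would_clash(base, a_top, b_bottom))     # False: the existing cell separates the edits
print(would_clash(base, a_append, b_bottom))  # True: both inserted at the same spot
```

The practical lesson matches the section layout of the assignment: when each person's cells live under their own heading, the edits stay apart and merge cleanly.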

**The takeaway:** Branches + separate cells = genuine parallel work.

## Part 4: Team Communication Protocol

### The Real Cause of Most Git Problems

It's not the technology. It's a lack of communication.

```
Without communication:
  'I'll just quickly fix that cell...'
  [partner edits same cell, nobody knows]
  [conflict, frustration, 20 minutes lost]

With communication:
  'Starting Section 2, on branch section-2-revenue'
  [30 min later]
  'Section 2 pushed. Ready for review. Merging tomorrow.'
```

**Use your group chat (Slack, Discord, text) with these templates:**

| When | Say |
|------|-----|
| Starting | `'Starting [section], on branch [name]'` |
| Pausing | `'Pausing on [section], last commit: [message]'` |
| Pushed | `'Pushed [branch] — [what's done]'` |
| Need review | `'[Name], can you review my branch?'` |
| About to merge | `'Merging [branch] to main — hold off 2 min'` |
| Done merging | `'Main updated — safe to pull'` |

---

### Division of Work: The Safe Approach

Divide by **notebook sections**, not by functions.

Each person owns one section — their own cells, their own branch. This is the safest pattern because:
- You're working in different parts of the JSON
- Git can automatically merge non-overlapping sections
- One person's merge doesn't break another's cells

**Avoid:**
- Two people editing the same cell at the same time
- Editing cells in someone else's section without asking
- Merging to main without announcing it in the group chat

---

### Week 6 Assignment Introduction

**Business Sales Data Analysis — teams of 3 or 4**  
Starts today, continues Thursday, due Monday 11:59 PM

| Section | Branch | Core Question |
|---------|--------|---------------|
| 1 — Loading | `section-1-loading` | What does the data look like? Any issues? |
| 2 — Revenue | `section-2-revenue` | Which products and regions earn the most? |
| 3 — Trends | `section-3-trends` | How did sales change month by month? |
| 4 — Sales Reps *(4-person teams)* | `section-4-reps` | Who are the top performers? |

Each section is merged to main via Pull Request — we'll learn PRs on Thursday.

In [None]:
# Fill in your team info — save and commit this as your first team action

team = {
    "team_name": "",
    "repo_name": "",  # will be: week6-sales-analysis-[team-name]
    "members": [
        {"name": "", "github": "", "section": "Section 1 - Loading",  "branch": "section-1-loading"},
        {"name": "", "github": "", "section": "Section 2 - Revenue", "branch": "section-2-revenue"},
        {"name": "", "github": "", "section": "Section 3 - Trends",  "branch": "section-3-trends"},
        # 4-person teams uncomment:
        # {"name": "", "github": "", "section": "Section 4 - Reps", "branch": "section-4-reps"},
    ]
}

print(f"Team: {team['team_name']}")
for m in team['members']:
    if m['name']:
        print(f"  {m['name']} ({m['github']}) -> {m['branch']}")
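Before committing the team cell, a quick check can catch typos in usernames and branch names. This sketch assumes the `team` dictionary above and the `section-<number>-<topic>` naming pattern from today's table (numbers 1 to 4, lowercase topic); `check_team` is a hypothetical helper, and the example team is made up so the block runs on its own.

```python
import re

# Assumed naming pattern: section-<1..4>-<lowercase topic>, e.g. section-2-revenue
BRANCH_PATTERN = re.compile(r"section-[1-4]-[a-z]+")

def check_team(team):
    """Return a list of problems with the filled-in team info (empty list = OK)."""
    problems = []
    filled = [m for m in team["members"] if m["name"]]
    for m in filled:
        if not m["github"]:
            problems.append(f"{m['name']}: missing GitHub username")
        if not BRANCH_PATTERN.fullmatch(m["branch"]):
            problems.append(f"{m['name']}: branch '{m['branch']}' does not match section-N-topic")
    branches = [m["branch"] for m in filled]
    if len(branches) != len(set(branches)):
        problems.append("two members share a branch")
    return problems

# Made-up example with two deliberate mistakes
example_team = {
    "team_name": "example",
    "members": [
        {"name": "Ana", "github": "ana-gh", "branch": "section-1-loading"},
        {"name": "Ben", "github": "", "branch": "section-2-revenue"},
        {"name": "Cy", "github": "cy-gh", "branch": "Section 3"},
    ],
}
for problem in check_team(example_team):
    print(problem)
```

In your notebook, call `check_team(team)` after filling in the cell; an empty list means everything looks consistent.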

### Repo Owner Checklist (do before Thursday)

The Section 1 person typically owns the repo:

- [ ] Create repo `week6-sales-analysis-[team-name]` on GitHub — set to Public
- [ ] Add all teammates: Settings → Collaborators → Add People
- [ ] Download starter notebook from Canvas and commit it to the repo
- [ ] Commit message: `'Initial commit: add assignment notebook'`
- [ ] Push and share the URL in your group chat

**Every team member before Thursday:**
- [ ] Accept the GitHub collaboration invitation (check email)
- [ ] `git clone [repo-url]`
- [ ] Open in VS Code — confirm the notebook loads
- [ ] `git checkout -b section-X-name` — create your branch
- [ ] Make one small test commit and push it (`git push -u origin section-X-name`) to confirm push access works

---

### Summary

- Notebooks are JSON — outputs and execution counts cause hidden conflicts
- Clear outputs before every commit — the single most important habit
- Branches give each person a safe sandbox for parallel work
- Separate sections of a notebook merge cleanly because they're separate JSON objects
- Communication in your group chat prevents most problems before they start

See you Thursday with repos ready and branches created!

---
*Week 6 Tuesday — Notebook Collaboration and Branching*