# Lab 2: Jupyter-JSC & Git (2 Hours)

## ⏱️ Time Allocation
- **Part 1 (35 min):** Launch Jupyter-JSC and environment setup
- **Part 2 (35 min):** Git fundamentals and repository cloning
- **Part 3 (30 min):** Python virtual environments and kernels
- **Part 4 (20 min):** Hands-on practice (optional advanced topics)

## 🎯 Learning Objectives

### Core (Essential - Everyone Should Complete)
- ✅ Launch and connect to a Jupyter-JSC session
- ✅ Understand the basic Git workflow (clone, pull)
- ✅ Clone the course repository
- ✅ Configure your Git identity
- ✅ Create a Python virtual environment
- ✅ Register a custom Jupyter kernel

### Optional (For Early Finishers)
- 🔵 Git branching and merging
- 🔵 Commit and push changes
- 🔵 Resolve merge conflicts
- 🔵 Jupyter extensions and customization
- 🔵 Multiple environment management

---

## Section 1: Introduction to Jupyter-JSC (5 min)

### What is Jupyter-JSC?
**Jupyter-JSC** is a web-based JupyterLab environment that runs directly on JSC supercomputers:
- Access JupyterLab through your browser
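- Start JupyterLab on a login node or on compute nodes from a web form
- Not limited by your laptop's CPU, memory, or GPU
- Direct access to the JURECA filesystems (home, project, and scratch directories)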

### Advantages Over Local Jupyter
- ✅ Direct access to HPC storage (no data transfer needed)
- ✅ GPU-accelerated notebooks (when the session runs on a GPU partition)
- ✅ Project directories shared with teammates
- ✅ Pre-installed scientific libraries
- ✅ Job submission handled by the portal (no batch scripts to write)

### Architecture
```
Your Browser → Jupyter-JSC Portal → Login Node (JupyterLab)                        [partition: login]
Your Browser → Jupyter-JSC Portal → SLURM Scheduler → Compute Node (JupyterLab)    [batch partitions]
```
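Whether your JupyterLab ended up on a login node (the `login` partition chosen in Section 2) or on a SLURM-allocated compute node can be checked from a notebook cell. A minimal sketch, assuming that SLURM exports `SLURM_JOB_ID` inside batch jobs and that JURECA login-node hostnames contain `login` (as in the `jrlogin04.jureca` hostname printed in Section 6); `describe_session` is a hypothetical helper, not part of Jupyter-JSC.

```python
import os
import socket


def describe_session(env=None, hostname=None):
    """Guess where this JupyterLab session runs (hypothetical helper).

    Assumptions: SLURM sets SLURM_JOB_ID inside batch jobs, and JURECA
    login-node hostnames contain "login" (e.g. "jrlogin04.jureca").
    """
    env = os.environ if env is None else env
    hostname = socket.gethostname() if hostname is None else hostname
    job_id = env.get("SLURM_JOB_ID")
    if job_id:
        return f"compute node {hostname} (SLURM job {job_id})"
    if "login" in hostname:
        return f"login node {hostname} (no SLURM job)"
    return f"{hostname} (no SLURM job detected)"


print(describe_session())
```

Passing `env` and `hostname` explicitly makes the helper easy to test away from JURECA.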

## Section 2: Launching Jupyter-JSC (10 min)

### Step 1: Access the Portal
1. Navigate to: https://jupyter-jsc.fz-juelich.de
2. Click **"Login"**
3. Select **"Judoor"** as authentication method
4. Enter your Judoor credentials

### Step 2: Configure Your Job
After login, you'll see a job configuration form:

**Required Settings:**
- **System:** `JURECA`
- **Project:** `training2600`
- **Partition:** `login` 
- **Time:** `120` (minutes, i.e. 2 hours)
- **Nodes:** `1`

**Advanced Settings (Optional):**
- **Memory:** Leave default (RAM per node)
- **Reservation:** Leave blank
- **Environment:** `Default` (or select a custom kernel if available)

### Step 3: Launch
1. Click **"Start"**
2. Wait for job to start (usually 1-2 minutes)
3. You'll see a progress bar
4. Once ready, click **"Connect"**

### ⚠️ Troubleshooting
- **Job pending for >5 min:** System may be busy, wait or reduce resources
- **Job fails immediately:** Check project membership in Judoor
- **Can't connect:** Clear browser cache, try incognito mode

## Section 3: Git Basics (10 min)

### Why Git?
Git is a version control system that:
- Tracks changes in your code
- Enables collaboration
- Allows you to revert mistakes
- Synchronizes code across machines

### Key Concepts
- **Repository (repo):** A project folder tracked by Git
- **Commit:** A snapshot of your code at a point in time
- **Branch:** A parallel version of your code
- **Clone:** Copy a remote repository to your machine
- **Pull:** Download updates from remote
- **Push:** Upload your commits to remote

### Git Workflow
```
Clone → Edit Files → Stage Changes → Commit → Push
```
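The workflow above can be rehearsed end to end without touching GitHub. This is a minimal sketch, assuming a `git` executable is on your `PATH`: a bare repository in a temporary directory plays the role of the remote, the commit uses a throwaway identity passed with `-c`, and `git`/`demo_workflow` are hypothetical helper names.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run a git command in `cwd` and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def demo_workflow():
    """Clone -> edit -> stage -> commit -> push, entirely inside a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # A bare repository stands in for the GitHub remote (assumption for this demo)
        git("init", "-q", "--bare", "remote.git", cwd=tmp)
        git("clone", "-q", "remote.git", "work", cwd=tmp)        # Clone
        work = tmp / "work"
        (work / "notes.txt").write_text("first line\n")          # Edit files
        git("add", "notes.txt", cwd=work)                         # Stage changes
        git("-c", "user.name=Demo Student", "-c", "user.email=student@example.com",
            "-c", "commit.gpgsign=false",
            "commit", "-q", "-m", "Add notes", cwd=work)           # Commit
        git("push", "-q", "origin", "HEAD", cwd=work)             # Push current branch
        # Count the commits that reached the "remote"
        remote = f"--git-dir={tmp / 'remote.git'}"
        return int(git(remote, "rev-list", "--count", "--all", cwd=tmp))


if shutil.which("git"):
    print("commits on remote:", demo_workflow())
```

`git push origin HEAD` pushes the current branch to a branch of the same name, so the sketch works whether your Git defaults to `main` or `master`.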

## Section 4: Cloning the Course Repository (5 min)

### From Jupyter-JSC Terminal
Once connected to Jupyter-JSC:

1. **Open a Terminal:**
   - Click **File → New → Terminal**

2. **Navigate to Your Workspace:**

In [None]:
# Run these commands in the Jupyter-JSC Terminal, not in a notebook cell
# Before cloning, fork the course repository on GitHub

# Navigate to your project directory
cd $PROJECT_training2600

# Create your workspace (if not already done in Lab 1) and enter it
mkdir -p $USER
cd $USER

# Clone your fork of the course repository
# (replace YOUR_ORG with the GitHub account that owns your fork)
git clone https://github.com/YOUR_ORG/iceland-ml-course.git

# Enter the repository
cd iceland-ml-course

# Check repository status
git status
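These commands belong in a terminal because, in a notebook, every `!` command runs in its own short-lived child shell: a `!cd` changes directory for that shell only, and the next cell starts where it was before. The sketch below reproduces the effect with Python's `subprocess` module (used here as a stand-in for `!`; the temporary directory is a placeholder for your workspace) and shows `os.chdir`, which does persist. IPython's `%cd` magic has the same lasting effect as `os.chdir`.

```python
import os
import shlex
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()  # placeholder for your workspace directory

# Like a notebook `!cd`, this `cd` runs in a child shell that exits immediately
subprocess.run(f"cd {shlex.quote(target)}", shell=True, check=True)
after_subshell = os.getcwd()
print("after subshell cd:", after_subshell)  # still the starting directory

# os.chdir changes the working directory of the Python process itself
os.chdir(target)
print("after os.chdir:   ", os.getcwd())
```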

### Configure Your Git Identity
Set your Git identity (only needed once):

In [None]:
# Set your name and email
!git config --global user.name "Your Name"
!git config --global user.email "your.email@hi.is"

# Verify configuration
!git config --list | grep user

### Common Git Commands Reference

```bash
# Clone a repository
git clone <repo_url>

# Check status of your changes
git status

# Pull latest changes from remote
git pull

# Stage changes for commit
git add <file_name>        # Stage specific file
git add .                   # Stage all changes

# Commit staged changes
git commit -m "Your message here"

# Push commits to remote
git push

# View commit history
git log --oneline

# Create a new branch
git checkout -b <branch_name>

# Switch branches
git checkout <branch_name>
```

## Section 5: Creating a Python Kernel (5 min)

### Why Custom Kernels?
Jupyter kernels define the Python environment for your notebooks:
- Different projects need different packages
- Avoid dependency conflicts
- Reproducible environments

### Create a Virtual Environment

In [None]:
# Run these in the Jupyter-JSC Terminal

# Load Python module
module load Python/3.12.3

# Create virtual environment
python -m venv /p/project1/training2600/$USER/envs/ml_eo_course

# Activate environment
source /p/project1/training2600/$USER/envs/ml_eo_course/bin/activate

# Upgrade pip
pip install --upgrade pip

# Install essential packages
pip install ipykernel numpy pandas matplotlib jupyterlab rasterio earthengine-api torch

# Register kernel with Jupyter
python -m ipykernel install --user --name=ml_eo_course --display-name="Python (ML-EO Course)"
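To confirm that a kernel really runs inside a virtual environment rather than the base Python module, compare `sys.prefix` with `sys.base_prefix`: the Python documentation notes they differ only when a venv is active. The sketch builds a throwaway venv with the standard-library `venv` module to demonstrate the check; its temporary path is illustrative and unrelated to the course environment created above.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base installation (e.g. the loaded Python module)
    return sys.prefix != sys.base_prefix


# Build a throwaway venv (without pip, to keep it fast) and ask its interpreter
env_dir = Path(tempfile.mkdtemp()) / "demo_env"
venv.create(env_dir, with_pip=False, symlinks=(sys.platform != "win32"))
env_python = env_dir / ("Scripts/python.exe" if sys.platform == "win32" else "bin/python")
out = subprocess.run(
    [str(env_python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
    capture_output=True, text=True, check=True,
).stdout.strip()

print("this kernel runs in a venv:", in_virtualenv())
print("demo venv interpreter reports:", out)
```

Running `in_virtualenv()` in a notebook using the "Python (ML-EO Course)" kernel should print `True`.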

### Verify Kernel Installation
1. In JupyterLab, click **Kernel → Change Kernel**
2. You should see **"Python (ML-EO Course)"** in the list
3. Select it to switch to your custom kernel
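Each entry in that list comes from a kernelspec: a small `kernel.json` whose `argv[0]` is the interpreter Jupyter launches, which should be your venv's `python`. Reading it is a quick way to check that the kernel points at the right environment. A minimal sketch, assuming the default Linux locations used by `ipykernel install --user` (`$JUPYTER_DATA_DIR/kernels` if set, otherwise `$XDG_DATA_HOME/jupyter/kernels`, falling back to `~/.local/share/jupyter/kernels`); `kernel_python` is a hypothetical helper, and `jupyter kernelspec list` shows the same directories.

```python
import json
import os
from pathlib import Path


def user_kernels_dir():
    """Default per-user kernelspec directory on Linux."""
    if os.environ.get("JUPYTER_DATA_DIR"):
        return Path(os.environ["JUPYTER_DATA_DIR"]) / "kernels"
    data_home = os.environ.get("XDG_DATA_HOME") or Path.home() / ".local" / "share"
    return Path(data_home) / "jupyter" / "kernels"


def kernel_python(name, kernels_dir=None):
    """Return (display_name, interpreter path) recorded in a kernelspec."""
    kernels_dir = user_kernels_dir() if kernels_dir is None else Path(kernels_dir)
    spec = json.loads((kernels_dir / name / "kernel.json").read_text())
    # argv[0] is the interpreter Jupyter starts for this kernel
    return spec["display_name"], spec["argv"][0]


try:
    print(kernel_python("ml_eo_course"))
except FileNotFoundError:
    print("kernel 'ml_eo_course' is not registered in", user_kernels_dir())
```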

## Section 6: First Notebook on HPC (5 min)

### Test Your Environment
Let's verify everything works:

In [1]:
import sys
import os
import platform
import socket

print("="*50)
print("HPC Environment Check")
print("="*50)
print(f"Hostname: {socket.gethostname()}")
print(f"Python Version: {sys.version}")
print(f"Python Executable: {sys.executable}")
print(f"Platform: {platform.platform()}")
print(f"User: {os.getenv('USER')}")
print(f"Home: {os.getenv('HOME')}")
print(f"Project Dir: {os.getenv('PROJECT_training2600')}")
print(f"Scratch Dir: {os.getenv('SCRATCH')}")
print("="*50)

HPC Environment Check
Hostname: jrlogin04.jureca
Python Version: 3.12.3 (main, Jul 26 2024, 17:40:49) [GCC 13.3.0]
Python Executable: /p/project1/training2600/hashim1/envs/ml_eo_course/bin/python
Platform: Linux-5.14.0-570.42.2.el9_6.x86_64-x86_64-with-glibc2.34
User: hashim1
Home: /p/home/jusers/hashim1/jureca
Project Dir: /p/project1/training2600
Scratch Dir: /p/scratch/training2600


In [2]:
# Check installed packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Matplotlib version: {plt.matplotlib.__version__}")

Matplotlib created a temporary cache directory at /tmp/matplotlib-gdtdnool because the default path (/p/home/jusers/hashim1/jureca/.cache/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.


NumPy version: 1.26.4
Pandas version: 2.2.2
Matplotlib version: 3.9.2
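The warning above names its own fix: point `MPLCONFIGDIR` at a writable directory. Matplotlib reads this variable when it is first imported, so the cell must run before `import matplotlib` (restart the kernel first if it was already imported). A minimal sketch, assuming your per-user folder in the project directory is writable; the `.config/matplotlib` subfolder and the temp-dir fallback are our own choices, and `set_mpl_config_dir` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def set_mpl_config_dir(base=None):
    """Point Matplotlib's config/cache directory at a writable location."""
    if base is None:
        # Assumption: <project>/<user> is writable; elsewhere use the temp dir
        project = os.environ.get("PROJECT_training2600")
        user = os.environ.get("USER", "user")
        base = Path(project) / user if project else Path(tempfile.gettempdir()) / user
    config_dir = Path(base) / ".config" / "matplotlib"
    config_dir.mkdir(parents=True, exist_ok=True)
    os.environ["MPLCONFIGDIR"] = str(config_dir)
    return config_dir


print("MPLCONFIGDIR =", set_mpl_config_dir())
# import matplotlib.pyplot as plt  # import only after MPLCONFIGDIR is set
```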


In [5]:
# Simple computation test
data = np.random.randn(1000, 1000)
mean = np.mean(data)
std = np.std(data)

print(f"Generated 1000x1000 random matrix")
print(f"Mean: {mean:.6f}")
print(f"Std: {std:.6f}")

# Quick visualization
plt.figure(figsize=(8, 4))
plt.hist(data.flatten(), bins=50, alpha=0.7)
plt.title("Distribution of Random Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.grid(True, alpha=0.3)
plt.show()

Generated 1000x1000 random matrix
Mean: 0.002182
Std: 0.999746
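The mean and standard deviation above change on every run because `np.random.randn` is not seeded here. When you want repeatable test data (for example, to compare numbers with a teammate), NumPy's `default_rng` accepts a seed; the value 42 below is arbitrary.

```python
import numpy as np

# A seeded Generator makes the "random" test data identical on every run
rng = np.random.default_rng(seed=42)  # the seed value is arbitrary
data = rng.standard_normal((1000, 1000))

# Re-creating the generator with the same seed reproduces the same numbers
again = np.random.default_rng(seed=42).standard_normal((1000, 1000))
print("identical:", np.array_equal(data, again))
print(f"Mean: {data.mean():.6f}  Std: {data.std():.6f}")
```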


## Hands-On Exercise: Create Your First Analysis Notebook

**Task:** Create a new notebook that:
1. Loads a sample CSV dataset
2. Performs basic statistical analysis
3. Creates a simple visualization
4. Saves results to your project directory

### Step-by-Step:

In [6]:
# 1. Create sample dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

# Simulate satellite acquisition dates
dates = pd.date_range('2024-01-01', '2024-12-31', freq='16D')  # one scene every 16 days (illustrative; Sentinel-2 revisits every 5 days with two satellites)
data = pd.DataFrame({
    'date': dates,
    'cloud_cover': np.random.uniform(0, 100, len(dates)),
    'ndvi_mean': np.random.uniform(0.2, 0.8, len(dates)),
    'scene_id': [f'S2A_TILE_{i:03d}' for i in range(len(dates))]
})

print("Sample Dataset:")
print(data.head())
print(f"\nTotal scenes: {len(data)}")

Sample Dataset:
        date  cloud_cover  ndvi_mean      scene_id
0 2024-01-01    18.684786   0.628798  S2A_TILE_000
1 2024-01-17    60.911040   0.799255  S2A_TILE_001
2 2024-02-02    86.555158   0.268527  S2A_TILE_002
3 2024-02-18     3.752319   0.561167  S2A_TILE_003
4 2024-03-05     9.498157   0.292190  S2A_TILE_004

Total scenes: 23
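The count of 23 follows directly from the 16-day spacing. Counting dates with the standard library shows how the same year would look at Sentinel-2's nominal revisit intervals (10 days for one satellite, 5 days with two). These are idealized counts: they ignore the extra swath overlap at high latitudes such as Iceland, where real revisits are more frequent.

```python
from datetime import date

start, end = date(2024, 1, 1), date(2024, 12, 31)
n_days = (end - start).days + 1  # 366, because 2024 is a leap year


def acquisitions(interval_days):
    """Number of dates from start to end inclusive, one every `interval_days`."""
    return len(range(0, n_days, interval_days))


for interval, label in [(16, "spacing used above"),
                        (10, "one Sentinel-2 satellite"),
                        (5, "two Sentinel-2 satellites")]:
    print(f"{interval:>2} days ({label}): {acquisitions(interval)} scenes")
```

This prints 23, 37, and 74 scenes respectively.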


In [7]:
# 2. Statistical analysis
print("Statistical Summary:")
print(data[['cloud_cover', 'ndvi_mean']].describe())

# Filter clear scenes (cloud cover < 20%)
clear_scenes = data[data['cloud_cover'] < 20]
print(f"\nClear scenes (< 20% clouds): {len(clear_scenes)} / {len(data)} ({len(clear_scenes)/len(data)*100:.1f}%)")

Statistical Summary:
       cloud_cover  ndvi_mean
count    23.000000  23.000000
mean     54.538379   0.521742
std      30.867102   0.162734
min       3.752319   0.246807
25%      33.638570   0.425928
50%      50.888110   0.505253
75%      84.972430   0.633513
max      93.601188   0.799255

Clear scenes (< 20% clouds): 5 / 23 (21.7%)


In [8]:
# 3. Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Cloud cover over time
axes[0].plot(data['date'], data['cloud_cover'], marker='o', linestyle='-', alpha=0.6)
axes[0].axhline(y=20, color='r', linestyle='--', label='20% threshold')
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Cloud Cover (%)')
axes[0].set_title('Cloud Cover Timeline')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
axes[0].tick_params(axis='x', rotation=45)

# Plot 2: NDVI distribution
axes[1].hist(data['ndvi_mean'], bins=20, alpha=0.7, edgecolor='black')
axes[1].set_xlabel('Mean NDVI')
axes[1].set_ylabel('Frequency')
axes[1].set_title('NDVI Distribution')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [9]:
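# 4. Save results
import os  # already imported in an earlier cell; repeated so this exercise also runs in a fresh notebook
output_dir = Path(os.getenv('PROJECT_training2600')) / os.getenv('USER') / 'results'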
output_dir.mkdir(parents=True, exist_ok=True)

# Save filtered data
clear_scenes.to_csv(output_dir / 'clear_scenes.csv', index=False)
print(f"✅ Saved clear scenes to: {output_dir / 'clear_scenes.csv'}")

# Save figure
fig.savefig(output_dir / 'analysis_plots.png', dpi=150, bbox_inches='tight')
print(f"✅ Saved plots to: {output_dir / 'analysis_plots.png'}")

✅ Saved clear scenes to: /p/project1/training2600/hashim1/results/clear_scenes.csv
✅ Saved plots to: /p/project1/training2600/hashim1/results/analysis_plots.png
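`Path(os.getenv('PROJECT_training2600'))` fails with an unhelpful `TypeError` when the variable is not set (for example, if the notebook runs outside Jupyter-JSC). A slightly more defensive sketch; `results_dir` is a hypothetical helper, and its default variable name matches this course's project.

```python
import os
from pathlib import Path


def results_dir(env=None, project_var="PROJECT_training2600"):
    """Build <project>/<user>/results, failing loudly if a variable is missing."""
    env = os.environ if env is None else env
    missing = [name for name in (project_var, "USER") if not env.get(name)]
    if missing:
        # Path(None) would otherwise raise a less helpful TypeError
        raise RuntimeError(f"environment variable(s) not set: {', '.join(missing)}")
    return Path(env[project_var]) / env["USER"] / "results"


example = results_dir({"PROJECT_training2600": "/p/project1/training2600", "USER": "hashim1"})
print(example)  # /p/project1/training2600/hashim1/results
```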


## Summary & Next Steps

### What We Covered
✅ Launched Jupyter-JSC and connected to JURECA  
✅ Understood Git basics and cloned repository  
✅ Created a custom Python kernel  
✅ Ran first notebook on HPC  
✅ Performed analysis and saved results  

### Key Takeaways
- Jupyter-JSC provides seamless HPC access via browser
- Git enables version control and collaboration
- Custom kernels isolate dependencies
- Save important data to `$PROJECT`; use `$SCRATCH` for large temporary files, since it is not backed up and old files are deleted automatically

### Git Cheat Sheet
```bash
git status           # Check what changed
git pull             # Get latest updates
git add .            # Stage all changes
git commit -m "msg"  # Commit with message
git push             # Upload to remote
```

### Prepare for Lab 3
Next lab: **Google Earth Engine & Sentinel-2 Data Acquisition**
- Create a GEE account beforehand: https://earthengine.google.com/signup
- Review Sentinel-2 basics: https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-2

### Additional Resources
- **Git Tutorial:** https://git-scm.com/docs/gittutorial
- **Jupyter-JSC Docs:** https://apps.fz-juelich.de/jsc/hps/jupyter/
- **Python Virtual Environments:** https://docs.python.org/3/tutorial/venv.html

---

**Great work!** You're now ready to acquire real Earth Observation data! 🌍

---

## 🔵 OPTIONAL: Advanced Topics (For Early Finishers)

## 🔵 Advanced Topic 1: Git Branching and Merging

### Why Use Branches?
- Work on features without affecting main code
- Experiment safely
- Collaborate without conflicts

### Basic Branching

In [None]:
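# Run these from inside your repository (use %cd to change into your iceland-ml-course clone first)

# Create a new branch
!git branch my-feature

# Switch to the branch
!git checkout my-feature

# Alternative: create and switch in one command (use it instead of the two commands above;
# running it after them fails because my-feature already exists)
# !git checkout -b my-feature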

# Make changes, then commit
!git add .
!git commit -m "Add new feature"

# Switch back to main
!git checkout main

# Merge feature branch into main
!git merge my-feature

# Delete branch when done
!git branch -d my-feature
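The same sequence can be rehearsed in a throwaway repository before trying it on the course repository. A minimal sketch, assuming `git` is installed: the commands run through `subprocess` in a temporary directory with a placeholder identity, and `demo_branch_merge` is a hypothetical helper. Because the base branch gets no new commits while `my-feature` exists, the merge is a fast-forward, and `git branch -d` succeeds because the branch is fully merged.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Throwaway identity so the demo does not depend on your global Git config
IDENTITY = ["-c", "user.name=Demo Student", "-c", "user.email=student@example.com",
            "-c", "commit.gpgsign=false"]


def git(*args, cwd):
    result = subprocess.run(["git", *IDENTITY, *args], cwd=cwd,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def demo_branch_merge():
    """Commit on a feature branch, merge it back, and delete the branch."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git("init", "-q", cwd=repo)
        (repo / "README.md").write_text("course notes\n")
        git("add", ".", cwd=repo)
        git("commit", "-q", "-m", "Initial commit", cwd=repo)
        base = git("rev-parse", "--abbrev-ref", "HEAD", cwd=repo)  # main or master

        git("checkout", "-q", "-b", "my-feature", cwd=repo)
        (repo / "feature.py").write_text("print('new feature')\n")
        git("add", ".", cwd=repo)
        git("commit", "-q", "-m", "Add new feature", cwd=repo)

        git("checkout", "-q", base, cwd=repo)
        git("merge", "-q", "my-feature", cwd=repo)
        git("branch", "-d", "my-feature", cwd=repo)  # -d refuses unmerged branches
        return sorted(p.name for p in repo.iterdir() if p.name != ".git")


if shutil.which("git"):
    print(demo_branch_merge())  # ['README.md', 'feature.py']
```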

## 🔵 Advanced Topic 2: Git Commit and Push

### Making Changes

In [None]:
# Check status of your repo
!git status

# Stage specific files
!git add file1.py file2.py

# ...or stage all changes
!git add .

# Commit with message
!git commit -m "Descriptive message about changes"

# Pull latest changes first, so the push is not rejected
!git pull origin main

# Push to remote
!git push origin main

### Writing Good Commit Messages
```
✅ Good: "Add preprocessing function for Sentinel-2 normalization"
✅ Good: "Fix bug in patch extraction causing IndexError"
❌ Bad: "fixed stuff"
❌ Bad: "update"
```

**Format:**
- Use imperative mood ("Add feature" not "Added feature")
- Keep the first line short (aim for about 50 characters, 72 at most)
- Explain *what* and *why*, not *how*
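Rules like these are easy to check mechanically. A rough sketch of a checker (a hypothetical helper, not a Git feature): it treats 72 characters as a hard limit on the first line, a common convention alongside the ~50-character target, flags one-word subjects, and uses a crude past-tense heuristic (a first word ending in "ed", which would also misfire on verbs such as "Embed").

```python
def check_commit_message(message, max_subject=72):
    """Return a list of style problems in a commit message (crude heuristics)."""
    lines = message.strip().splitlines()
    subject = lines[0].strip() if lines else ""
    if not subject:
        return ["empty message"]
    problems = []
    if len(subject) > max_subject:
        problems.append(f"subject is {len(subject)} characters (limit {max_subject})")
    words = subject.split()
    if len(words) < 2:
        problems.append("subject too vague: say what changed")
    first_word = words[0].lower()
    if first_word.endswith("ed"):
        problems.append(f"use imperative mood, not past tense: '{first_word}'")
    return problems


for msg in ["Add preprocessing function for Sentinel-2 normalization",
            "fixed stuff", "update"]:
    print(f"{msg!r}: {check_commit_message(msg) or 'OK'}")
```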

## 🔵 Advanced Topic 3: Resolving Merge Conflicts

### What is a Merge Conflict?
Happens when Git can't automatically merge changes (e.g., two people edited the same line).

### How to Resolve

In [None]:
# Pull (or merge); if both sides edited the same lines, Git stops with a conflict
!git pull origin main

# Git marks each conflict inside the affected file like this:
# <<<<<<< HEAD
# Your changes
# =======
# Their changes
# >>>>>>> branch-name

# Edit the file manually to resolve
# Remove conflict markers and choose what to keep

# Stage resolved files
!git add resolved_file.py

# Complete the merge
!git commit -m "Resolve merge conflict in resolved_file.py"
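Before staging a resolved file, make sure no conflict markers are left in it (`git diff --check` also warns about leftover markers). A small, hypothetical helper that finds marker blocks in the default two-way style shown above; it does not parse the three-way `diff3` style.

```python
from pathlib import Path


def find_conflicts(text):
    """Return (ours, theirs) line lists for each conflict block in `text`."""
    conflicts, ours, theirs, state = [], [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            state, ours, theirs = "ours", [], []
        elif line == "=======" and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>> ") and state == "theirs":
            conflicts.append((ours, theirs))
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts


sample = "x = 1\n<<<<<<< HEAD\nlr = 0.01\n=======\nlr = 0.001\n>>>>>>> my-feature\ny = 2\n"
print(find_conflicts(sample))  # [(['lr = 0.01'], ['lr = 0.001'])]
# On a real file: find_conflicts(Path("resolved_file.py").read_text()) should return []
```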

## 🔵 Advanced Topic 4: Jupyter Extensions

### Useful JupyterLab Extensions

1. **Table of Contents**
   - Navigate long notebooks easily
   - Usually pre-installed

2. **Variable Inspector**
   - See all variables in memory
   - Install: `pip install lckr-jupyterlab-variableinspector`

3. **Code Formatter (Black)**
   - Auto-format Python code
   - Install: `pip install jupyterlab-code-formatter black`

4. **Git Extension**
   - Visual Git interface in JupyterLab
   - Usually pre-installed

### Enable Extensions

In [None]:
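# JupyterLab extensions are loaded by the environment that runs the JupyterLab server, not by your kernel.
# On Jupyter-JSC that server environment is provided by JSC, so packages installed into your kernel's
# virtual environment may not show up as extensions; check which extensions are already enabled first.
%pip install lckr-jupyterlab-variableinspector
%pip install jupyterlab-code-formatter black isort

# Restart your Jupyter-JSC session to load newly installed extensions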

## 🔵 Advanced Topic 5: Jupyter Keyboard Shortcuts

### Essential Shortcuts

**Command Mode** (press `Esc` to enter):
- `A` - Insert cell above
- `B` - Insert cell below
- `D D` - Delete cell
- `M` - Change to Markdown
- `Y` - Change to Code
- `Shift + Up/Down` - Select multiple cells

**Edit Mode** (press `Enter` to enter):
- `Ctrl + Shift + -` - Split cell at cursor
- `Tab` - Code completion
- `Shift + Tab` - Show documentation

**Both Modes:**
- `Shift + Enter` - Run cell and select next
- `Ctrl + Enter` - Run cell
- `Alt + Enter` - Run cell and insert below

### Practice!
Try these shortcuts while working through the course notebooks.

## 🔵 Advanced Topic 6: Managing Multiple Environments

### Why Multiple Environments?
- Different projects need different package versions
- Test compatibility
- Keep environments clean

### Create Multiple Environments

In [None]:
# Create environments for different purposes next to the course environment from Section 5
# (don't reuse the name ml_eo_course: registering it again would overwrite that kernel)
!python -m venv /p/project1/training2600/$USER/envs/data_analysis
!python -m venv /p/project1/training2600/$USER/envs/experiments

# A new venv has no ipykernel yet: install it, then register the environment as a kernel
!/p/project1/training2600/$USER/envs/data_analysis/bin/pip install ipykernel
!/p/project1/training2600/$USER/envs/data_analysis/bin/python -m ipykernel install --user --name=data_analysis --display-name="Data Analysis"
!/p/project1/training2600/$USER/envs/experiments/bin/pip install ipykernel
!/p/project1/training2600/$USER/envs/experiments/bin/python -m ipykernel install --user --name=experiments --display-name="Experiments"

### List and Remove Kernels

In [None]:
# List all available kernels
!jupyter kernelspec list

# Remove a kernel
!jupyter kernelspec uninstall kernel_name

---

## ✅ Lab 2 Completion Checklist

### Core Tasks (Must Complete)
- [ ] Successfully launched Jupyter-JSC session
- [ ] Cloned course repository
- [ ] Configured Git identity (name and email)
- [ ] Created Python virtual environment
- [ ] Registered custom Jupyter kernel
- [ ] Tested kernel in a notebook

### Optional Tasks (If Time Permits)
- [ ] Created a Git branch
- [ ] Made a commit
- [ ] Installed Jupyter extensions
- [ ] Learned keyboard shortcuts
- [ ] Created multiple environments

---

## 📝 Homework / Async Learning
- Practice Git commands (there are many online tutorials)
- Explore JupyterLab interface and features
- Install any extensions you find useful
- Review Git cheat sheet (provided in course materials)

## 🚀 Next Lab Preview
**Lab 3: Google Earth Engine & Sentinel-2 Acquisition**
- Authenticate with Google Earth Engine
- Define AOI in Iceland
- Query and download Sentinel-2 imagery

**Pre-Lab Action Required:**
- Apply for a Google Earth Engine account (approval can take 1-2 days!)
- Visit: https://earthengine.google.com/signup