# 2.2 AI-Assisted Coding in Google Colab with Gemini

## Course 3: Advanced Classification Models for Student Success

## Introduction

### What Is Google Colab?

**Google Colab** (Colaboratory) is a free, cloud-based Jupyter notebook environment hosted by Google. It provides:

- **Free GPU/TPU access** for computationally intensive tasks, subject to usage limits and availability
- **Zero setup** — runs entirely in your browser, no installation needed
- **Google Drive integration** — save, share, and collaborate on notebooks seamlessly
- **Pre-installed libraries** — pandas, scikit-learn, matplotlib, and more are ready to use

### What Is Gemini in Colab?

**Gemini** is Google's AI assistant built directly into Google Colab. Unlike Codex (which is a separate tool), Gemini lives *inside* the notebook environment where you're already writing code. It can:

- **Generate code cells** from natural language descriptions
- **Explain existing code** — highlight a cell and ask "what does this do?"
- **Debug errors** — when a cell throws an error, Gemini can diagnose and fix it
- **Autocomplete code** — suggests completions as you type (like GitHub Copilot)
- **Chat in a side panel** — ask questions without leaving your notebook

### Why Google Colab + Gemini?

For students and institutional researchers, Colab + Gemini offers several practical advantages:

1. **Free tier available** — no subscription required for basic use
2. **Familiar notebook interface** — if you've used Jupyter, you know Colab
3. **AI is embedded** — no context switching between tools
4. **Shareable** — share notebooks like Google Docs with colleagues
5. **Institutional friendly** — many universities already use Google Workspace

### Learning Objectives

1. Set up and navigate Google Colab for data science workflows
2. Use Gemini to generate, explain, and debug code inside Colab
3. Apply the vibecoding workflow (prompt → review → iterate → validate) in Colab
4. Practice building ML models using Gemini-assisted coding

## 1. Getting Started with Google Colab

### Accessing Colab

1. Go to [colab.research.google.com](https://colab.research.google.com)
2. Sign in with your Google account (university or personal)
3. Click **"New Notebook"** to create a blank notebook
4. To enable Gemini: click the **Gemini icon** (✨) in the toolbar or side panel

### The Colab Interface

```
┌──────────────────────────────────────────────────────┐
│  Google Colab                          [Gemini ✨]    │
├──────────────────────────────────────────────────────┤
│                                       │              │
│  [+ Code]  [+ Text]                  │  Gemini      │
│                                       │  Chat Panel  │
│  ┌─────────────────────────────┐     │              │
│  │  Code Cell 1                │     │  Ask me      │
│  │  import pandas as pd       │     │  anything... │
│  │  df = pd.read_csv(...)     │     │              │
│  └─────────────────────────────┘     │              │
│                                       │              │
│  ┌─────────────────────────────┐     │              │
│  │  Code Cell 2                │     │              │
│  │  # Your analysis here      │     │              │
│  └─────────────────────────────┘     │              │
│                                       │              │
└──────────────────────────────────────────────────────┘
```

### Uploading Data to Colab

There are three ways to get your data into Colab:

**Option A: Upload directly**
```python
from google.colab import files
uploaded = files.upload()  # Opens a file picker dialog
```

**Option B: Mount Google Drive**
```python
from google.colab import drive
drive.mount('/content/drive')
# Then read from: /content/drive/MyDrive/your_folder/training.csv
```

**Option C: URL download**
```python
!wget https://your-url.com/training.csv
```
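
Once the file is in the runtime, it still has to be parsed. With Options B and C the file sits on disk, so `pd.read_csv(path)` is enough. With Option A, `files.upload()` returns a dictionary that maps each filename to its raw bytes. The sketch below handles that dictionary case. `load_uploaded_csv` is a hypothetical helper name, and a tiny made-up dictionary stands in for the real upload so the cell also runs outside Colab.

```python
import io

import pandas as pd


def load_uploaded_csv(uploaded, filename="training.csv"):
    """Parse one entry of a {filename: bytes} mapping into a DataFrame."""
    if filename not in uploaded:
        raise KeyError(f"{filename!r} not uploaded; got {sorted(uploaded)}")
    return pd.read_csv(io.BytesIO(uploaded[filename]))


# In Colab this mapping would come from files.upload(). The bytes below are
# made-up stand-in data (including the 'W' status code) for illustration.
fake_upload = {"training.csv": b"HS_GPA,SEM_3_STATUS\n3.2,E\n2.1,W\n"}
df = load_uploaded_csv(fake_upload)
print(df.shape)
```

In a real session you would pass the `uploaded` variable from Option A instead of `fake_upload`.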

## 2. Using Gemini in Colab

### Method 1: The Gemini Chat Panel

Click the **Gemini icon (✨)** to open the side panel. Type natural language requests:

```
"Load training.csv, create a binary DEPARTED column where SEM_3_STATUS != 'E'
means departed, and show summary statistics."
```

Gemini will generate a code cell you can insert directly into your notebook.

### Method 2: Generate Code in a Cell

1. Click **"+ Code"** to add a new cell
2. Start typing a comment describing what you want:
```python
# Build a logistic regression with L2 regularization to predict DEPARTED
```
3. Gemini suggests code below your comment
4. Press **Tab** to accept, or keep typing to refine

### Method 3: Explain & Debug

- **Explain**: Select a code cell → right-click → **"Explain this code"**
- **Debug**: When a cell produces an error → click **"Fix with Gemini"** in the error output
- **Optimize**: Select code → ask Gemini to "make this more efficient"

### Example: Building a Model with Gemini

Here's how a typical Gemini interaction looks in Colab:

**You type in the Gemini panel:**
```
I have a CSV called training.csv with student records. The target is
SEM_3_STATUS where 'E' = enrolled. I want to:
1. Load the data
2. Create a binary DEPARTED column (1 if SEM_3_STATUS != 'E')
3. Select features: HS_GPA, UNITS_ATTEMPTED_1, UNITS_COMPLETED_1,
   DFW_UNITS_1, GPA_1, DFW_RATE_1
4. Scale features with StandardScaler
5. Build a logistic regression with L2, C=0.1
6. Show AUC, F1, precision, recall, and a confusion matrix
```

**Gemini generates a code cell** that you insert into your notebook, review, and run.
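
Gemini's output differs between runs, so treat the following as a sketch of the kind of cell this prompt might produce, not as its actual response. To stay self-contained it replaces `training.csv` with randomly generated data that reuses the prompt's column names. The synthetic values, the status code `'W'`, the 80/20 split, and `random_state=42` are all assumptions added for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 1 stand-in: synthetic rows instead of pd.read_csv("training.csv").
rng = np.random.default_rng(42)
n = 1000
attempted = rng.integers(6, 19, size=n)   # 6 to 18 units attempted
dfw = rng.binomial(attempted, 0.15)       # units ending in D, F, or W
df = pd.DataFrame({
    "HS_GPA": rng.uniform(2.0, 4.0, size=n).round(2),
    "UNITS_ATTEMPTED_1": attempted,
    "UNITS_COMPLETED_1": attempted - dfw,
    "DFW_UNITS_1": dfw,
    "GPA_1": rng.uniform(0.0, 4.0, size=n).round(2),
})
df["DFW_RATE_1"] = df["DFW_UNITS_1"] / df["UNITS_ATTEMPTED_1"]
# Made-up rule: lower first-term GPA and a higher DFW rate raise departure odds.
logit = 1.5 - 1.0 * df["GPA_1"] + 3.0 * df["DFW_RATE_1"]
departed_prob = 1 / (1 + np.exp(-logit))
df["SEM_3_STATUS"] = np.where(rng.random(n) < departed_prob, "W", "E")

# Steps 2-3: binary target and feature matrix.
df["DEPARTED"] = (df["SEM_3_STATUS"] != "E").astype(int)
features = ["HS_GPA", "UNITS_ATTEMPTED_1", "UNITS_COMPLETED_1",
            "DFW_UNITS_1", "GPA_1", "DFW_RATE_1"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["DEPARTED"], test_size=0.2,
    stratify=df["DEPARTED"], random_state=42)

# Steps 4-5: the pipeline fits the scaler on the training rows only.
# L2 is scikit-learn's default penalty for LogisticRegression.
model = make_pipeline(StandardScaler(),
                      LogisticRegression(C=0.1, max_iter=1000))
model.fit(X_train, y_train)

# Step 6: evaluate on the held-out rows.
y_prob = model.predict_proba(X_test)[:, 1]
y_pred = model.predict(X_test)
metrics = {
    "auc": roc_auc_score(y_test, y_prob),
    "f1": f1_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
cm = confusion_matrix(y_test, y_pred)
print(metrics)
print(cm)
```

When you review a generated cell like this, check that the scaler is fit only on training data (here the pipeline takes care of it) and that the metrics are computed on the test split, not on the rows the model was trained on.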

## 3. The CRISP Framework in Colab

The CRISP prompting framework from notebook 2.1 carries over to Gemini unchanged:

| Letter | Meaning | Colab Example |
|:-------|:--------|:-------------|
| **C** | Context | "I have training.csv uploaded to Colab with student records..." |
| **R** | Role | "Help me as an institutional research analyst..." |
| **I** | Instructions | "Build a Random Forest classifier to predict student departure..." |
| **S** | Specifics | "Use 200 trees, max_depth=12, StandardScaler on features..." |
| **P** | Product | "Show a classification report, ROC curve, and feature importance bar chart" |
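
If you reuse the same prompt structure often, it can help to assemble the five parts programmatically and paste the result into the Gemini panel. The sketch below is one way to do that; `build_crisp_prompt` is a hypothetical helper, not part of Colab or Gemini, and the example strings are the ones from the table.

```python
def build_crisp_prompt(context, role, instructions, specifics, product):
    """Join the five CRISP parts into one prompt, in C-R-I-S-P order."""
    parts = [context, role, instructions, specifics, product]
    if not all(part and part.strip() for part in parts):
        raise ValueError("Every CRISP part needs non-empty text")
    return "\n".join(part.strip() for part in parts)


prompt = build_crisp_prompt(
    context="I have training.csv uploaded to Colab with student records.",
    role="Help me as an institutional research analyst.",
    instructions="Build a Random Forest classifier to predict student departure.",
    specifics="Use 200 trees, max_depth=12, StandardScaler on features.",
    product="Show a classification report, ROC curve, and feature importance bar chart.",
)
print(prompt)
```

Rejecting empty parts is a deliberate choice here: a prompt that silently skips the Specifics or Product line tends to produce code you then have to re-prompt for.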

### Colab-Specific Prompting Tips

1. **Reference your cell outputs**: "The dataframe from cell 3 has 5,000 rows — now build a model on it"
2. **Ask for inline comments**: "Generate the code with comments explaining each step"
3. **Request visualizations**: Gemini can generate matplotlib/seaborn plots directly in the notebook
4. **Chain cells**: "Now take the model from the previous cell and plot SHAP values"
5. **Use the error context**: When debugging, Gemini can see the error traceback automatically

## 4. Hands-On Practice: Full Workflow in Colab

Follow these steps in your own Colab notebook to practice the vibecoding workflow.

### Step 1: Setup and Data Loading

Open the Gemini panel and type:
```
Help me set up a machine learning analysis. I need to:
- Import pandas, numpy, sklearn, matplotlib, and seaborn
- Upload a CSV file called training.csv
- Display the first 5 rows and shape of the data
```

### Step 2: Feature Engineering

```
The target variable is SEM_3_STATUS. Create a binary column called DEPARTED
where 1 = departed (SEM_3_STATUS != 'E') and 0 = enrolled (SEM_3_STATUS == 'E').
Show the class distribution.
```
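
One detail worth checking when you review the generated code: in pandas, a missing `SEM_3_STATUS` compares as not equal to `'E'`, so a plain `!=` labels students with no recorded status as departed. The sketch below uses a made-up five-row frame to count missing statuses before building the target. Dropping those rows is an assumption made for the example; whether they should be dropped or counted as departures depends on how your institution records enrollment. The status code `'W'` is a placeholder.

```python
import numpy as np
import pandas as pd

# Made-up rows: three enrolled, one non-enrolled, one with no status recorded.
df = pd.DataFrame({"SEM_3_STATUS": ["E", "E", "W", np.nan, "E"]})

n_missing = int(df["SEM_3_STATUS"].isna().sum())
print(f"Rows with no SEM_3_STATUS: {n_missing}")

# Assumption for this sketch: rows without a status are excluded, not labeled.
labeled = df.dropna(subset=["SEM_3_STATUS"]).copy()
labeled["DEPARTED"] = (labeled["SEM_3_STATUS"] != "E").astype(int)

# Share of each class among the labeled rows (0 = enrolled, 1 = departed).
class_share = labeled["DEPARTED"].value_counts(normalize=True).sort_index()
print(class_share)
```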

### Step 3: Model Building

```
Build three models to predict DEPARTED using these features:
HS_GPA, UNITS_ATTEMPTED_1, UNITS_COMPLETED_1, DFW_UNITS_1, GPA_1, DFW_RATE_1

Models:
1. Logistic Regression with L2 penalty, C=0.1
2. Random Forest with 200 trees
3. XGBoost with learning_rate=0.1

For each model: scale features with StandardScaler, use an 80/20 train-test split,
and show AUC, F1, precision, and recall.
```
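
A response to this prompt could take roughly the following shape. This is a sketch under stated assumptions, not Gemini's actual output: the data is synthetic and only borrows the feature names, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the cell runs with scikit-learn alone. In Colab you could swap in `xgboost.XGBClassifier(learning_rate=0.1)` at the marked line.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (f1_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["HS_GPA", "UNITS_ATTEMPTED_1", "UNITS_COMPLETED_1",
            "DFW_UNITS_1", "GPA_1", "DFW_RATE_1"]

# Synthetic data that only reuses the column names (assumption for the demo).
X_raw, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                               n_redundant=2, weights=[0.75], random_state=42)
X = pd.DataFrame(X_raw, columns=FEATURES)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    # L2 is the default penalty for LogisticRegression.
    "Logistic Regression": LogisticRegression(C=0.1, max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    # Stand-in for XGBoost; a different algorithm with the same learning rate.
    "Gradient Boosting": GradientBoostingClassifier(learning_rate=0.1,
                                                    random_state=42),
}

rows = []
for name, estimator in models.items():
    # Scaling inside the pipeline keeps test rows out of the scaler's fit.
    pipe = make_pipeline(StandardScaler(), estimator).fit(X_train, y_train)
    y_prob = pipe.predict_proba(X_test)[:, 1]
    y_pred = pipe.predict(X_test)
    rows.append({
        "model": name,
        "auc": roc_auc_score(y_test, y_prob),
        "f1": f1_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
    })

results = pd.DataFrame(rows).set_index("model").round(3)
print(results)
```

Scaling does not change tree-based models, but keeping one pipeline shape for all three makes the comparison easier to read and matches the prompt.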

### Step 4: Visualization

```
Plot all three ROC curves on the same chart with a legend.
Then create a bar chart comparing AUC scores across the three models.
```
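
The plotting code Gemini returns will vary, but it usually follows the pattern sketched here: compute one ROC curve per model, draw them on shared axes, then chart the AUC values. The data is synthetic, the models use mostly default settings, and `GradientBoostingClassifier` again stands in for XGBoost; all of these are assumptions chosen to keep the example self-contained.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and three quick models (assumptions for the demo).
X, y = make_classification(n_samples=600, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

fig, (ax_roc, ax_bar) = plt.subplots(1, 2, figsize=(12, 4))
auc_scores = {}
for name, model in models.items():
    y_prob = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, y_prob)
    auc_scores[name] = roc_auc_score(y_test, y_prob)
    ax_roc.plot(fpr, tpr, label=f"{name} (AUC = {auc_scores[name]:.3f})")

# Diagonal reference line: the ROC curve of random guessing.
ax_roc.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Chance")
ax_roc.set(xlabel="False positive rate", ylabel="True positive rate",
           title="ROC curves")
ax_roc.legend(loc="lower right")

ax_bar.bar(list(auc_scores), list(auc_scores.values()))
ax_bar.set(ylabel="AUC", ylim=(0, 1), title="AUC by model")
fig.tight_layout()
plt.show()
```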

### Step 5: Interpretation

```
For the best-performing model, show:
1. Feature importance (bar chart)
2. Confusion matrix (heatmap)
3. A brief written summary of results suitable for a provost
```
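
Before turning results into charts, it helps to look at the underlying numbers. The sketch below ranks a Random Forest's impurity-based feature importances and labels the confusion matrix so each cell is unambiguous. It assumes Random Forest was the best model and uses synthetic data that only borrows the feature names, so the ranking it prints says nothing about real students.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

FEATURES = ["HS_GPA", "UNITS_ATTEMPTED_1", "UNITS_COMPLETED_1",
            "DFW_UNITS_1", "GPA_1", "DFW_RATE_1"]

# Synthetic stand-in data (assumption for the demo).
X_raw, y = make_classification(n_samples=600, n_features=6, random_state=1)
X = pd.DataFrame(X_raw, columns=FEATURES)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)
model = RandomForestClassifier(n_estimators=200, random_state=1)
model.fit(X_train, y_train)

# Impurity-based importances, largest first; they sum to 1 across features.
importance = (pd.Series(model.feature_importances_, index=FEATURES)
              .sort_values(ascending=False))

# Rows are actual classes, columns are predicted classes.
cm = pd.DataFrame(
    confusion_matrix(y_test, model.predict(X_test)),
    index=["actual enrolled", "actual departed"],
    columns=["predicted enrolled", "predicted departed"],
)
print(importance.round(3))
print(cm)
```

From here, `importance.plot.barh()` draws the bar chart and `seaborn.heatmap(cm, annot=True, fmt="d")` draws the heatmap. The written summary for a provost is a part you should draft or at least verify yourself against these numbers.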

## 5. Colab-Specific Features for Data Science

### Free GPU for Larger Analyses

For computationally intensive tasks that can run on a GPU (neural networks, or XGBoost configured for GPU training, for example with `device="cuda"`):

1. Go to **Runtime → Change runtime type**
2. Select **T4 GPU** (free tier) or **A100** (Colab Pro)
3. Expect a speedup only from GPU-aware libraries; scikit-learn models such as Random Forest still train on the CPU
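
To confirm that the runtime actually has a GPU attached after you change the runtime type, you can query the NVIDIA driver tool. The sketch below wraps that check in a hypothetical helper, `gpu_available`, which returns `False` instead of failing when `nvidia-smi` is not installed (as on a CPU runtime or most laptops).

```python
import shutil
import subprocess


def gpu_available():
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    # "nvidia-smi -L" prints one line per GPU, each starting with "GPU <n>:".
    result = subprocess.run(["nvidia-smi", "-L"],
                            capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU attached:", gpu_available())
```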

### Saving and Sharing

- **Auto-save to Google Drive**: Notebooks save automatically
- **Share with collaborators**: Click "Share" → add email addresses (just like Google Docs)
- **Download as .ipynb**: File → Download → Download .ipynb (compatible with Jupyter)
- **Version history**: File → Revision history (track changes over time)

### Installing Additional Packages

Colab comes with most data science packages pre-installed. For extras:

```python
!pip install xgboost shap lightgbm
```

> **Note**: Package installations reset when the runtime disconnects. Add install commands at the top of your notebook so they run on every session.
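
Because installs take time on every new session, a setup cell can first check which packages are actually missing. The sketch below does this with the standard library; `missing_packages` is a hypothetical helper, and it expects import names, which sometimes differ from the name you pass to pip (for example `sklearn` versus `scikit-learn`).

```python
import importlib.util


def missing_packages(import_names):
    """Return the names that cannot be imported in this runtime."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


to_install = missing_packages(["xgboost", "shap", "lightgbm"])
print("Needs installing:", to_install or "nothing")
```

If the list is non-empty, run the `!pip install` line for just those packages.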

### Secrets Management

For API keys or credentials (never hardcode in notebooks):

1. Click the **🔑 key icon** in the left sidebar
2. Add secrets as key-value pairs
3. Access in code:
```python
from google.colab import userdata
api_key = userdata.get('MY_API_KEY')
```
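
If the same notebook also needs to run outside Colab (for example after downloading the `.ipynb`), `google.colab` will not be importable. One possible pattern, sketched below, is to fall back to an environment variable. `get_secret` is a hypothetical helper, and the placeholder value is set only so the example runs on its own; inside Colab the secret must already exist in the Secrets panel with notebook access enabled.

```python
import os


def get_secret(name):
    """Read a secret from Colab's store, or from the environment elsewhere."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        value = os.environ.get(name)
        if value is None:
            raise KeyError(f"Secret {name!r} is not set in the environment")
        return value
    return userdata.get(name)


# Demo only: a placeholder so the fallback path has something to return.
os.environ.setdefault("MY_API_KEY", "placeholder-for-demo")
api_key = get_secret("MY_API_KEY")
print("Key loaded, length:", len(api_key))
```

Printing the length instead of the value keeps the key out of the saved notebook output.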

## Summary

### Key Takeaways

1. **Google Colab** is a free, cloud-based notebook environment — no setup required
2. **Gemini** is built directly into Colab, providing AI assistance without context switching
3. The **same CRISP framework** from Codex (notebook 2.1) works with Gemini
4. Colab offers **free GPU access**, Google Drive integration, and easy sharing
5. The vibecoding workflow (prompt → review → iterate → validate) is the same in both tools

### What's Next

In **notebook 2.3**, we compare Codex and Colab + Gemini side by side so you can choose the right tool for your workflow.

**Proceed to:** `2.3 Choosing Your AI Coding Tool`