# Mod 7 L2 Code Assignment: Prompt Engineering Practice 

### Goal: 
Practice writing reliable prompts for data modeling tasks (no coding).
### Dataset context: 
One local CSV named `hvfhv_2023-07_sample.csv`, read with `nrows=5000`. **If your CSV file has a different name, update the file name in the prompt.**
### Model context (already known): 
Multiple Linear Regression on total_amount.

### You’ll write and run four prompt types:

- Planning

- Instruction

- Role (persona)

- Verification & Red-Team (self-check + stress test)

## REQUIRED: Pre-Setup (PLEASE READ)
- Open Google Colab.
- Upload your data file into Google Colab.
- The planning prompt should tell you which libraries to use and how to call `pd.read_csv` with `nrows=5000`. If it does not, add these to the notebook yourself.

## Prompting Resources

- **OpenAI Prompt Engineering Guide (free):** https://platform.openai.com/docs/guides/prompt-engineering  
- **Anthropic Prompt Library:** https://www.anthropic.com/prompt-library  
- **Google: Gemini Prompting (best practices):** https://ai.google.dev/gemini-api/docs/prompting  
- **DeepLearning.AI Short Course: Prompt Engineering:** https://www.deeplearning.ai/short-courses/  
- **Cohere Prompting & RAG Basics:** https://docs.cohere.com/docs/prompting-overview


## Instructor + Class: Planning Prompt

**Purpose:** Get a concise, auditable plan before any code is generated.  
**What to do:** Paste this prompt into your AI assistant (Gemini in Colab). Read the plan together and refine it if anything is missing. Then **share your thoughts about the output**.

**Planning Prompt (copy/paste ALL the text below including the requirements):**

First output a concise 5-step plan (one line per step) to avoid data leakage and dtype errors for a single-CSV HVFHV regression task using only the first 5,000 rows.

Requirements:

- CSV name: hvfhv_2023-07_sample.csv (local file)

- Use columns exactly: pickup_datetime, trip_miles, trip_time, base_passenger_fare, tips, total_amount

- Engineer exactly: miles_sq = trip_miles**2, miles_time = trip_miles*trip_time

- Split: train_test_split with test_size=0.2, random_state=42

- Use a Pipeline with: SimpleImputer(median) -> StandardScaler -> LinearRegression

- All transforms occur inside the Pipeline (no leakage)
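For orientation only (this is not part of the prompt to paste): the requirements above might translate into code along the following lines. This is a minimal sketch under stated assumptions, not the expected answer. The helper names (`load_sample`, `add_features`, `fit_and_score`), the choice of predictor columns, and the synthetic stand-in data are all illustrative assumptions; your assistant's plan may differ.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

COLUMNS = ["pickup_datetime", "trip_miles", "trip_time",
           "base_passenger_fare", "tips", "total_amount"]
# Assumption: every numeric column except the target is a predictor, and
# pickup_datetime is left out because the prompt does not say how to encode it.
FEATURES = ["trip_miles", "trip_time", "base_passenger_fare", "tips",
            "miles_sq", "miles_time"]
TARGET = "total_amount"


def load_sample(path="hvfhv_2023-07_sample.csv"):
    """Read only the first 5,000 rows and the six required columns."""
    return pd.read_csv(path, nrows=5000, usecols=COLUMNS)


def add_features(df):
    """Row-wise features: each value depends only on its own row, so no leakage."""
    out = df.copy()
    out["miles_sq"] = out["trip_miles"] ** 2
    out["miles_time"] = out["trip_miles"] * out["trip_time"]
    return out


def fit_and_score(df):
    """Split first, then fit imputer, scaler, and model on the training rows only."""
    df = add_features(df).dropna(subset=[TARGET])
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=0.2, random_state=42)
    model = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("regress", LinearRegression()),
    ])
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)  # R^2 on the held-out 20%


# Synthetic stand-in data (an assumption) so the sketch runs without the CSV.
rng = np.random.default_rng(0)
n = 200
miles = rng.uniform(0.5, 20.0, n)
duration = miles * 180 + rng.normal(0, 60, n)
fare = 2.5 + 1.8 * miles + 0.01 * duration + rng.normal(0, 1.0, n)
tips = rng.uniform(0.0, 5.0, n)
demo = pd.DataFrame({
    "pickup_datetime": pd.date_range("2023-07-01", periods=n, freq="min"),
    "trip_miles": miles,
    "trip_time": duration,
    "base_passenger_fare": fare,
    "tips": tips,
    "total_amount": fare + tips + rng.normal(0, 0.5, n),
})
demo.loc[::25, "trip_time"] = np.nan  # a few gaps for the imputer to fill

model, r2 = fit_and_score(demo)
print(f"held-out R^2: {r2:.3f}")
```

With the real file uploaded to Colab, `fit_and_score(load_sample())` would replace the synthetic `demo` frame. Because the imputer and scaler live inside the `Pipeline`, their statistics are learned from the training rows only.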

## You Do (Persona, Verification, Red-Team Prompts)

### You Do: Role Prompt (Persona)

**Purpose:** Make the AI format and comment like a senior analytics engineer: concise and auditable.

**Prompt (copy/paste, then share the output cell underneath the prompt):**

Adopt the role of a senior analytics engineer. Communication rules:

- Be concise and auditable

- Use explicit headings for each section

- Add brief in-line comments for any non-obvious step

- Avoid renaming columns or inventing fields

Apply this role to the prior Instruction Prompt. Re-emit the single Python code cell under these communication rules. Output a single cell only, no extra prose.

### You Do: Verification Prompt (Self-Check)

**Purpose:** Append checks that catch common failures before you run.

**Prompt (copy/paste, then share the output cell underneath the prompt):**

Append a "# Self-Check" section to the single code cell that prints:

- Total rows after cleaning and % rows dropped due to NA in model columns

- Confirmation that all transforms occur inside the Pipeline (no leakage)

- dtype report for X and y confirming numeric dtypes only

- A simple note if the residuals plot shows potential heteroscedasticity (cone shape) and one mitigation (e.g., log-transform y, add interaction)

If any check fails, revise the code within the same cell and reprint the Self-Check.


Output: Single updated Python cell only.
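For reference (not part of the prompt): a self-check of the kind requested above could look roughly like this. It is a hedged sketch covering only the row-count and dtype items; the function name `self_check` and the tiny demo frame are hypothetical, and the AI's version will likely be longer.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def self_check(raw, model_cols):
    """Summarize rows kept, % rows dropped for NA, and any non-numeric model columns."""
    clean = raw.dropna(subset=model_cols)
    pct_dropped = 100 * (len(raw) - len(clean)) / len(raw)
    non_numeric = [c for c in model_cols if not is_numeric_dtype(clean[c])]
    return {
        "rows_after_cleaning": len(clean),
        "pct_dropped": pct_dropped,
        "non_numeric_columns": non_numeric,
    }


# Hypothetical four-row frame with one missing trip_miles value.
demo_raw = pd.DataFrame({
    "trip_miles": [1.0, 2.0, None, 4.0],
    "total_amount": [10.0, 20.0, 30.0, 40.0],
})
report = self_check(demo_raw, ["trip_miles", "total_amount"])
print("# Self-Check")
for name, value in report.items():
    print(f"{name}: {value}")
```

An empty `non_numeric_columns` list is what you want to see before fitting; anything listed there points to a dtype problem to fix first.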

### You Do: Red-Team Prompt (Stress Test)

**Purpose:** Identify realistic production risks and add one-line guards.

**Prompt (copy/paste, then share the output underneath the prompt):**

List three realistic production failure modes for this pipeline (e.g., missing columns, negative or zero trip_miles, schema drift/date parsing).
Then modify the same single Python code cell to add one-line guards or asserts for each.
Keep guards minimal and readable.
Reprint the final single code cell only.
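For reference (not part of the prompt): the three example failure modes named above could each be guarded with a single assert, roughly as sketched here. The function name `apply_guards` and the demo frame are hypothetical, and whether to fail loudly or filter bad rows instead is a design choice the prompt leaves open.

```python
import pandas as pd

REQUIRED = ["pickup_datetime", "trip_miles", "trip_time",
            "base_passenger_fare", "tips", "total_amount"]


def apply_guards(df):
    """Raise AssertionError on missing columns, unparseable dates, or bad distances."""
    missing = sorted(set(REQUIRED) - set(df.columns))
    assert not missing, f"missing columns: {missing}"
    # errors="coerce" turns unparseable timestamps into NaT so they can be detected.
    assert pd.to_datetime(df["pickup_datetime"], errors="coerce").notna().all(), \
        "unparseable pickup_datetime values"
    # NaN compares as False here, so missing distances also trip this guard.
    assert (df["trip_miles"] > 0).all(), "trip_miles must be positive"
    return df


# Hypothetical two-row frame that satisfies all three guards.
demo_ok = pd.DataFrame({
    "pickup_datetime": ["2023-07-01 00:10:00", "2023-07-01 00:25:00"],
    "trip_miles": [2.4, 7.1],
    "trip_time": [600, 1500],
    "base_passenger_fare": [11.5, 27.0],
    "tips": [0.0, 3.0],
    "total_amount": [13.2, 33.4],
})
apply_guards(demo_ok)
print("all guards passed")
```

Calling `apply_guards` immediately after loading the CSV, before any feature engineering, makes a schema or data-quality problem surface as a clear message rather than as a confusing downstream error.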

## We Share: Reflection (3–5 minutes)

- **Which prompt** (Instruction, Role, Verification, Red-Team) changed the AI’s output the most? How?
- **One improvement** you’d make to your prompts next time (be specific).