

# Week 2.1 — Prompt Practice: Git, GitHub, and Google Colab

**Course:** MGMT 467 — AI‑Assisted Big Data Analytics in the Cloud  
**Session:** Tuesday (2.1) — Developer Environment Setup

### How to use this notebook
- This is a **practice and planning** notebook: most cells are **Markdown** with copy‑pasteable prompt templates you will run in your AI tool (e.g., Gemini).  
- After you run a prompt in your AI tool, **summarize what you learned** in the provided **Reflection** cells here.  
- When a task asks for a short code snippet (e.g., Git or Colab), paste the **final, validated** snippet in the designated cell and add a one‑sentence explanation.

> **Validate everything.** Cross‑check AI outputs with official docs or a second prompt. If two sources disagree, note it and explain which you chose and why.



---
## Prompt Patterns Quick Reference

Use these as starting points and **adapt** them to your context.

### 1) Zero‑Shot (definition/explanation)
```
Act as a clear, concise tutor for first‑year CS students.
Explain {TOPIC} in 5 bullet points max. Include one analogy and one pitfall to avoid.
```

### 2) Few‑Shot (guided answers consistent with examples)
```
You will answer in the same style as the examples.

Q: What is a "commit" in Git?  
A: A snapshot of tracked file changes with a message explaining why.

Q: What is "pushing" in Git?  
A: Sending local commits to a remote repository so others can see them.

Q: {YOUR QUESTION}
A:
```

### 3) Step‑by‑Step Reasoning (show key steps)
```
I need a **numbered, step‑by‑step plan** for {TASK}.
For each step: the goal, one command (if applicable), and a 1‑line verification check.
Avoid hidden steps; keep it to 6–8 steps total.
```
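The templates above use `{PLACEHOLDER}` slots. If you keep them in a notebook, a small helper can fill the slots and fail loudly when one is left empty, so a half-filled prompt never reaches the AI tool. The sketch below is a hypothetical `fill_template` function, not part of any library, and it assumes placeholders are uppercase words in braces, as in the templates here.

```python
import re

# Matches slots such as {TOPIC}, {TASK}, or {YOUR QUESTION} in the templates above.
PLACEHOLDER = re.compile(r"\{([A-Z][A-Z ]*)\}")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every {SLOT} in `template` with values[SLOT]; raise KeyError if any slot is missing."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"No value for placeholder(s): {', '.join(missing)}")
    # A function replacement avoids re.sub treating backslashes in the values as escapes.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


zero_shot = (
    "Act as a clear, concise tutor for first-year CS students.\n"
    "Explain {TOPIC} in 5 bullet points max. Include one analogy and one pitfall to avoid."
)
print(fill_template(zero_shot, {"TOPIC": "Git branching"}))
```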



---
## Group A — Git Fundamentals (3 questions)

### A1. What problem does Git solve? How is it different from file syncing?
**Use:** Zero‑Shot, then Few‑Shot for refinement.  
**Run this prompt:**
```
Act as a version control coach.
Explain what Git is and the specific problem it solves compared to simple file syncing (e.g., Drive).
List 3 concrete benefits for a small analytics team.
End with a 2‑sentence analogy.
```
**Reflection (2–4 sentences):** What did you learn that you didn’t already know?


I learned that Git is a distributed version control system that supports collaborative software development and the management of complex project histories. With Git, working together on large projects becomes less complex because every change is tracked, so developers do not overwrite content created by other developers.


### A2. Commit → Branch → Merge: the minimal workflow
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Create a minimal, step‑by‑step workflow to:
1) initialize a repo, 2) create and switch to a feature branch, 3) commit changes,
4) merge back to main locally, 5) push to a remote named "origin".
For each step include: goal, command, and a quick verification.
```
**Paste final validated commands below and add one sentence on when to branch.**


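```python
# Validated minimal Git workflow (run in a terminal, or prefix each line with ! in Colab):
# git init
# git add README.md                          # any starting file works
# git commit -m "Initial commit"             # main needs at least one commit before you can branch and come back to it
# git branch -M main                         # make sure the first branch is named main (older Git defaults to master)
# git checkout -b feature/your-feature-name  # create and switch to a feature branch
# git add .
# git commit -m "Descriptive commit message"
# git status                                 # verify: "nothing to commit, working tree clean"
# git log --oneline                          # verify: the new commit is on the feature branch
# git checkout main
# git merge --no-ff feature/your-feature-name
# git log --oneline --graph                  # verify: a merge commit appears on main
# git remote add origin <REMOTE_URL>
# git push -u origin main
```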


Always branch for any significant new feature, bug fix, or experiment to keep your work isolated from your main codebase and facilitate collaboration.


### A3. Resolving a simple merge conflict
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
I have a merge conflict in README.md after merging a feature branch into main.
Give a 6-step recipe to resolve it safely:
- how to open the file, identify conflict markers, choose/merge lines,
- add/commit the resolution, verify the merge, and push.
Include one common pitfall and how to avoid it.
```
**Reflection:** What’s your personal checklist to avoid conflicts getting messy?


Personal checklist to avoid conflicts getting messy:

- Pull the latest changes from main before starting new work on a feature branch.
- Make small, frequent commits with clear messages.
- Merge from main into your feature branch regularly to integrate upstream changes early.
- Communicate with your team about what you are working on to avoid multiple people modifying the same parts of a file simultaneously.
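A final safeguard for this checklist is making sure no conflict markers survive into a commit. Git can flag them itself with `git diff --check`; the Python sketch below does a similar check on a file's text, which is handy inside a notebook. It assumes Git's default marker style (`<<<<<<<`, `=======`, `>>>>>>>`, plus `|||||||` for the diff3 style). A Markdown heading underlined with exactly seven `=` characters would be a false positive.

```python
from pathlib import Path

# Git writes these around conflicting hunks; the ||||||| line appears only with merge.conflictStyle=diff3.
MARKER_PREFIXES = ("<<<<<<<", "|||||||", ">>>>>>>")
SEPARATOR = "======="


def find_conflict_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that look like leftover Git conflict markers."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if line.startswith(MARKER_PREFIXES) or line == SEPARATOR
    ]


def assert_no_conflicts(path: str) -> None:
    """Raise ValueError listing any conflict markers still present in the file at `path`."""
    hits = find_conflict_markers(Path(path).read_text(encoding="utf-8"))
    if hits:
        listing = "\n".join(f"  line {number}: {line}" for number, line in hits)
        raise ValueError(f"{path} still has conflict markers:\n{listing}")


# Usage before `git add README.md`: assert_no_conflicts("README.md")
```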


---
## Group B — GitHub Collaboration (3 questions)

### B1. Branch vs. Fork vs. Clone
**Use:** Few‑Shot to drive crisp distinctions with examples.  
**Run this prompt:**
```
Answer using this format:
Term — One-sentence definition — When to use — One example.

Branch —
Fork —
Clone —
```
**Reflection:** Which one will your team use for this course and why?


Our team will use branches and forks. A branch is a separate line of development within a single repository, which we will use to work on a new feature, bug fix, or experiment in isolation. A fork is a copy of a repository under a different user or account, which we will use when we want to base our project on an existing repository that we do not own.


### B2. Pull Request (PR) checklist for this course
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Write a "PR Checklist" for a university analytics course team repo.
Include: naming convention, description template, screenshots policy, reviewers, CI checks (if any),
and a revert plan. Limit to 8 concise checklist items.
```
**Paste your final checklist below.**


```python
# Example (edit to your team's needs)
pr_checklist = [
    "PR title: <unit>-<lab>-<short-desc> (e.g., u1-lab2-eda-trends)",
    "Description includes: problem, approach, key files, and how to test",
    "Attach 1–2 screenshots (plots/dashboards) if visuals changed",
    "Link related issue or assignment requirement",
    "Request review from 1 teammate; no self-merge",
    "Passes notebook re-run without errors (Runtime > Run all)",
    "No secrets, tokens, or PII in code or outputs",
    "Revert plan: how to roll back quickly if needed"
]
pr_checklist
```



Our team's final PR checklist:

*   **PR Title:** Descriptive, following a pattern like `<unit>-<lab>-<short-desc>` (e.g., `u1-lab2-eda-trends`).
*   **Description:** Briefly explains the problem solved, the approach taken, key files modified, and how to test the changes.
*   **Visuals (if applicable):** Include 1–2 screenshots (plots, dashboards) if the PR introduces or changes visualizations.
*   **Linked Issue/Assignment:** Clearly link to the relevant course assignment or team issue being addressed.
*   **Reviewers:** Request review from at least one designated teammate; avoid self-merging.
*   **CI Checks (if implemented):** Ensure all automated checks (like code formatting or basic tests) pass.
*   **Secrets/PII:** Verify that no secrets, API keys, or Personally Identifiable Information are accidentally included in code or output.
*   **Revert Plan:** Briefly describe how to quickly revert the changes if necessary (e.g., "revert this PR").
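If the team wants the title convention to be checkable rather than just a guideline, a regular expression is enough. The pattern below is an assumption inferred from the single example `u1-lab2-eda-trends` (unit as `u<number>`, lab as `lab<number>`, then a lowercase, hyphen-separated description); adjust it if your convention differs.

```python
import re

# Hypothetical pattern inferred from "u1-lab2-eda-trends": u<n>-lab<n>-<lowercase-words>
PR_TITLE = re.compile(r"u\d+-lab\d+-[a-z0-9]+(?:-[a-z0-9]+)*")


def check_pr_title(title: str) -> list[str]:
    """Return a list of problems with a PR title; an empty list means it passes."""
    problems = []
    if title != title.strip():
        problems.append("remove leading/trailing spaces")
    if not PR_TITLE.fullmatch(title.strip()):
        problems.append("expected <unit>-<lab>-<short-desc>, e.g. u1-lab2-eda-trends")
    return problems


for candidate in ["u1-lab2-eda-trends", "Fixed stuff", "u3-lab1-bq-auth"]:
    print(f"{candidate!r}: {check_pr_title(candidate) or 'OK'}")
```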


### B3. Protected `main` workflow
**Use:** Zero‑Shot + Step‑by‑Step.  
**Run this prompt:**
```
Explain how to protect the main branch in a GitHub repo for a class team:
- Require PRs, at least one review, and passing checks
- Disallow force-pushes
Provide a numbered setup guide and a 3-line "why this matters" explanation.
```
**Reflection:** Which protection rules will you actually enable first, and why?


We will enable "Require a pull request before merging" first, in a branch protection rule with the branch name pattern `main`, because it blocks direct pushes to `main` and the review rules build on it. Next we will enable "Require approvals" (at least one), then "Require status checks to pass before merging" and "Require linear history". Finally, we will check the box for "Include administrators" and leave "Allow force pushes" unchecked.
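The same rules can also be applied from code through GitHub's REST API (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`), which helps if several team repos need identical settings. The sketch below builds the request body and, only if you call `apply_protection`, sends it using the standard library. The owner, repo, and token are placeholders; the token needs admin rights on the repository and belongs in Colab Secrets, not in the notebook. Treat the field names as something to double-check against GitHub's current REST documentation before relying on them.

```python
import json
import urllib.request


def protection_payload(required_checks: list[str], approvals: int = 1) -> dict:
    """Build a branch-protection request body matching the rules chosen above."""
    return {
        # None disables the status-check rule while the repo has no CI yet.
        # strict=True would additionally require branches to be up to date with main.
        "required_status_checks": (
            {"strict": False, "contexts": required_checks} if required_checks else None
        ),
        "enforce_admins": True,  # "Include administrators"
        "required_pull_request_reviews": {"required_approving_review_count": approvals},
        "restrictions": None,  # no extra limits on who can push
        "required_linear_history": True,
        "allow_force_pushes": False,
        "allow_deletions": False,
    }


def apply_protection(owner: str, repo: str, token: str, payload: dict, branch: str = "main") -> int:
    """Send the payload to GitHub and return the HTTP status code (200 on success)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Inspect the body before sending anything:
print(json.dumps(protection_payload(required_checks=[]), indent=2))
```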


---
## Group C — Google Colab for Analytics (3 questions)

### C1. Why Colab? Benefits & limits for this course
**Use:** Zero‑Shot.  
**Run this prompt:**
```
Act as a data science tech advisor.
List 5 advantages and 3 limitations of Google Colab for analytics coursework.
Tailor to a class that uses BigQuery and dashboards. Keep it to bullet points.
```
**Reflection:** Which two advantages will help *you* most this semester?


1. Easy sharing and collaboration: I can share notebooks with my peers, which makes team projects and instructor feedback easier.
2. Pre-installed libraries: common data science libraries are ready to use, so I spend less time on package management.


### C2. Authenticate to GCP in Colab and query BigQuery
**Use:** Step‑by‑Step Reasoning for a minimal working snippet.  
**Run this prompt:**
```
Provide a minimal Colab snippet to:
1) authenticate to Google Cloud,
2) run a simple BigQuery SQL (e.g., SELECT 1),
3) get results into a pandas DataFrame,
4) print row count.
Include a one-line note on costs and safe use of LIMIT.
```
**Paste your final validated code below.**


```python
# 1) Authenticate to Google Cloud
# This prompts you to log in and grant Colab access to your Google account
from google.colab import auth
auth.authenticate_user()

# 2) Run a simple BigQuery SQL query
from google.cloud import bigquery
from IPython.display import display

# Replace "mgmt467-71800" with your own GCP project ID; queries run and are billed in this project
client = bigquery.Client(project="mgmt467-71800")

# Cost note: on-demand BigQuery pricing is based on bytes scanned, not rows returned.
# LIMIT keeps the result small in memory, but on most tables it does not reduce bytes billed,
# so select only the columns you need and check the bytes estimate in the BigQuery console first.
sql = "SELECT 1 AS test_col"  # Minimal example; reads no table data
# Or sample your own table (placeholder names):
# sql = "SELECT * FROM `your_project_id.your_dataset.your_table` LIMIT 10"

# 3) Get results into a pandas DataFrame
df = client.query(sql).result().to_dataframe()

# 4) Print row count and display the first few rows
print("Rows:", len(df))
display(df.head())
```

Rows: 1

| | test_col |
|---|---|
| 0 | 1 |


```python
# Minimal BigQuery test in Colab (paste your validated version)
# from google.colab import auth
# auth.authenticate_user()
#
# from google.cloud import bigquery
# client = bigquery.Client(project="<YOUR_PROJECT_ID>")
# sql = "SELECT 1 AS test_col"
# df = client.query(sql).result().to_dataframe()
# print("Rows:", len(df))
# df.head()
```
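Because on-demand BigQuery charges by bytes scanned rather than rows returned, a safer habit than relying on LIMIT is to dry-run a query first and cap what a real run may bill. The sketch below uses `QueryJobConfig(dry_run=True)` and `maximum_bytes_billed` from the same `google-cloud-bigquery` client used in the snippet above. The 1 GiB default budget and the helper names are assumptions to adjust. The two pure-Python helpers run anywhere; `dry_run_bytes` and `safe_query` need Colab authentication and a `client` like the one created earlier.

```python
try:
    from google.cloud import bigquery  # preinstalled in Colab
except ImportError:  # lets the pure helpers below run outside Colab
    bigquery = None

GIB = 1024 ** 3


def human_bytes(num_bytes: int) -> str:
    """Format a byte count, e.g. 1536 -> '1.50 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TiB"


def within_budget(bytes_estimate: int, max_gib: float) -> bool:
    """True if the estimated scan fits within the budget given in GiB."""
    return bytes_estimate <= max_gib * GIB


def dry_run_bytes(client, sql: str) -> int:
    """Ask BigQuery how many bytes `sql` would scan, without actually running it."""
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    return client.query(sql, job_config=config).total_bytes_processed


def safe_query(client, sql: str, max_gib: float = 1.0):
    """Dry-run first, refuse if over budget, then run with a hard billing cap."""
    estimate = dry_run_bytes(client, sql)
    print("Estimated scan:", human_bytes(estimate))
    if not within_budget(estimate, max_gib):
        raise RuntimeError(f"Query would scan {human_bytes(estimate)} (> {max_gib} GiB budget)")
    # maximum_bytes_billed makes BigQuery fail the job instead of billing past the cap.
    config = bigquery.QueryJobConfig(maximum_bytes_billed=int(max_gib * GIB))
    return client.query(sql, job_config=config).result().to_dataframe()


# Usage in Colab, after authenticating and creating `client`:
# df = safe_query(client, "SELECT 1 AS test_col")
```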



### C3. Save notebooks to GitHub from Colab
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Give two safe workflows to keep Colab notebooks versioned in GitHub:
(A) using "File > Save a copy in GitHub",
(B) local git with Drive sync (brief).
Provide steps and cautions (e.g., large outputs, secrets) for each.
```
**Reflection:** Which workflow will your team adopt and why?


Our team will adopt workflow (A) because it saves directly from Colab to GitHub without any local Git setup, which is simple and sufficient for basic versioning of our notebooks.
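Both workflows share the same caution: cell outputs, including large plots and anything accidentally printed, are stored inside the `.ipynb` file. A minimal sketch for clearing them before a commit edits the notebook's JSON directly, assuming the nbformat v4 layout where each code cell has `outputs` and `execution_count` keys. Established tools such as `nbstripout` or `jupyter nbconvert --clear-output --inplace` do the same job more thoroughly; the file name in the usage line is a placeholder.

```python
import json
from pathlib import Path


def strip_outputs(path: str) -> int:
    """Clear outputs and execution counts from every code cell; return how many cells had outputs."""
    notebook_path = Path(path)
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    cleared = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs
        if cell.get("outputs"):
            cleared += 1
        cell["outputs"] = []
        cell["execution_count"] = None
    # indent=1 mirrors the layout Jupyter writes, which keeps Git diffs small.
    notebook_path.write_text(json.dumps(notebook, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
    return cleared


# Usage (placeholder file name):
# print(strip_outputs("your_notebook.ipynb"), "cells cleared")
```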


---
## Capstone Synthesis (end of class)

**Scenario:** Your team needs a reproducible workflow for this course: team repo on GitHub, branching, Colab auth to BigQuery, and a PR checklist.

**Run this prompt:**
```
Act as a DevEx lead for a university analytics team.
Produce a one-page "Runbook" with:
- Repo structure (folders for notebooks, data, dashboards, docs)
- Branching model (who creates branches, when to merge)
- Colab ↔ BigQuery quickstart (auth, sample query, cost-safe LIMIT)
- PR checklist (max 8 bullets) and protection rules for main
- Two risks + mitigations (e.g., secrets leakage, merge conflicts)
Use concise bullets and keep it classroom-ready.
```

**Paste your final runbook below (or attach as a Markdown file in your repo) and add a 3‑bullet reflection on what you changed after validation.**


# University Analytics Team Runbook

This runbook provides a quick guide to our team's development workflow using GitHub, Colab, and BigQuery.

## 1. Repository Structure

Organize your project within the repository with clear folders:

*   `/notebooks`: For Colab notebooks (`.ipynb`). Organize further by unit or lab (e.g., `/notebooks/unit1/lab2`).
*   `/data`: For small datasets used directly in notebooks (avoid committing large files; use cloud storage for large data).
*   `/dashboards`: For dashboard definitions or code (e.g., Streamlit apps or links to Looker Studio reports).
*   `/docs`: For project documentation, reports, or presentations.
*   `/src`: (Optional) For reusable Python scripts or modules.
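To create this layout in a fresh clone, a short script can make the folders and drop a placeholder file in each empty one, since Git does not track empty directories. The `.gitkeep` file name is a common convention rather than a Git feature, and `notebooks/unit1/lab2` is only an example subfolder.

```python
from pathlib import Path

# Folders from the repository structure above; notebooks/unit1/lab2 is an example subfolder.
FOLDERS = ["notebooks/unit1/lab2", "data", "dashboards", "docs", "src"]


def scaffold(repo_root: str) -> list[Path]:
    """Create the team folders under repo_root and return any placeholder files created."""
    created = []
    for folder in FOLDERS:
        directory = Path(repo_root) / folder
        directory.mkdir(parents=True, exist_ok=True)
        if not any(directory.iterdir()):  # a placeholder is only needed while the folder is empty
            keep = directory / ".gitkeep"
            keep.touch()
            created.append(keep)
    return created


# Usage from the repo root: scaffold(".")
```

Running it twice is safe: the second run finds every folder non-empty and creates nothing.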

## 2. Branching Model

We will use a simple feature branching model:

*   **`main` Branch:** This branch is protected and represents the stable, working version of our project. No direct commits are allowed.
*   **Feature Branches:** For every new feature, bug fix, or experiment, create a new branch off of `main`. Name branches descriptively (e.g., `feature/add-eda-plots`, `fix/bq-auth-error`).
*   **Who Creates Branches:** Any team member working on a task creates their own feature branch.
*   **When to Merge:** Merge your feature branch into `main` only after it has been reviewed and approved via a Pull Request and passes all checks.

## 3. Colab ↔ BigQuery Quickstart

Connecting Colab to BigQuery:

Run the following in a code cell to authenticate, run a test query, and load the result into a pandas DataFrame:

```python
# 1) Authenticate to Google Cloud
from google.colab import auth
auth.authenticate_user()

# 2) Run a simple BigQuery SQL query
from google.cloud import bigquery
from IPython.display import display

# Replace "mgmt467-71800" with your own GCP project ID; queries run and are billed in this project
client = bigquery.Client(project="mgmt467-71800")

# Cost note: on-demand BigQuery pricing is based on bytes scanned, not rows returned.
# LIMIT keeps the result small in memory, but on most tables it does not reduce bytes billed,
# so select only the columns you need and check the bytes estimate in the BigQuery console first.
sql = "SELECT 1 AS test_col"  # Minimal example; reads no table data
# Or sample your own table (placeholder names):
# sql = "SELECT * FROM `your_project_id.your_dataset.your_table` LIMIT 10"

# 3) Get results into a pandas DataFrame
df = client.query(sql).result().to_dataframe()

# 4) Print row count and display the first few rows
print("Rows:", len(df))
display(df.head())
```

Rows: 1

| | test_col |
|---|---|
| 0 | 1 |


## 4. Pull Request (PR) Checklist & Main Branch Protection

To maintain code quality and a stable `main` branch:

**PR Checklist (Before Requesting Review):**

*   **PR Title:** Descriptive (`<unit>-<lab>-<short-desc>`).
*   **Description:** Problem, approach, key files, testing notes.
*   **Visuals (if applicable):** Include 1–2 relevant screenshots.
*   **Linked Issue/Assignment:** Reference the relevant task.
*   **Passes Checks:** Notebook runs top-to-bottom without errors; CI checks pass.
*   **Clean Code:** No secrets, tokens, or PII in code or outputs.
*   **Revert Plan:** Briefly note how to revert if needed.

**Main Branch Protection Rules (Configured in GitHub Settings > Branches):**

*   Require a pull request before merging.
*   Require at least 1 approval.
*   Require status checks to pass (if configured).
*   Disallow force pushes.
*   Include administrators.

## 5. Risks and Mitigations

Here are two common risks and how to mitigate them:

*   **Risk:** Secrets or sensitive data (e.g., GCP credentials, API keys, PII) are accidentally committed to the repository.
    *   **Mitigation:** Never hardcode secrets directly in notebooks or scripts. Use Colab's Secrets manager or environment variables. Be cautious with printing variables that might contain sensitive data. Review PRs carefully for secrets.

*   **Risk:** Frequent or complex merge conflicts, especially in shared notebook files.
    *   **Mitigation:** Work on separate branches for distinct tasks. Pull the latest changes from `main` frequently into your feature branch. Make small, focused commits. Communicate with teammates about who is working on which files. Use `nbdime` locally to help visualize notebook diffs.
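For the secrets mitigation, values stored in Colab's Secrets panel are read with `google.colab.userdata.get`. The sketch below falls back to an environment variable outside Colab and adds a rough pattern scan as a last check before opening a PR. The secret name in the usage line is hypothetical, and the patterns (Google API keys starting with `AIza`, classic GitHub tokens starting with `ghp_`, and private-key headers such as those in service-account JSON files) are heuristics, not a replacement for careful review or GitHub's own secret scanning.

```python
import os
import re


def get_secret(name: str) -> str:
    """Read a secret from Colab Secrets when running in Colab, else from an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        value = os.environ.get(name)
        if value is None:
            raise KeyError(f"Set the {name} environment variable (or a Colab secret) first")
        return value
    return userdata.get(name)


# Heuristic patterns only; a match means "look closer", not proof of a leak.
SECRET_PATTERNS = {
    "Google API key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "GitHub classic token": re.compile(r"ghp_[0-9A-Za-z]{36}"),
    "Private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_for_secrets(text: str) -> list[str]:
    """Return the names of secret patterns found in `text`, e.g. a notebook's raw JSON."""
    return [label for label, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


# Usage (hypothetical secret name): token = get_secret("GITHUB_TOKEN")
```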

The following list shows what was changed in the code after validation, or what could still be changed to improve the runbook:

*   **Replace the project ID:** change `"mgmt467-71800"` in the `bigquery.Client` line to your own Google Cloud project ID (the project in which queries run and are billed).
*   **Update the SQL query:** replace `sql = "SELECT 1 AS test_col"` with a query against your own dataset and table, for example ``sql = "SELECT * FROM `your_project_id.your_dataset.your_table` LIMIT 100"``.
*   **Use LIMIT for what it actually does:** LIMIT reduces the rows returned to the notebook, but BigQuery on-demand pricing is based on bytes scanned, so LIMIT alone usually does not lower cost. Select only the columns you need and check the bytes estimate before running large queries.





---
## Submission Checklist (to your team repo + Brightspace link)

- [ ] All **Reflection** sections completed (A1–A3, B1–B3, C1–C3, Capstone).
- [ ] Any code snippets pasted are **validated** and include a 1‑line explanation.
- [ ] Notebook runs top‑to‑bottom without errors (where code cells exist).
- [ ] Commit message: `week2.1-prompt-practice` and open a PR for review.
- [ ] Add this notebook path to your repo **README.md** under Week 2.1.
