<a href="https://colab.research.google.com/github/eugenechi/mgmt467-analytics-portfolio/blob/main/Week2_1_Prompt_Practice_(1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Week 2.1 — Prompt Practice: Git, GitHub, and Google Colab

**Course:** MGMT 467 — AI‑Assisted Big Data Analytics in the Cloud  
**Session:** Tuesday (2.1) — Developer Environment Setup

### How to use this notebook
- This is a **practice and planning** notebook: most cells are **Markdown** with copy‑pasteable prompt templates you will run in your AI tool (e.g., Gemini).  
- After you run a prompt in your AI tool, **summarize what you learned** in the provided **Reflection** cells here.  
- When a task asks for a short code snippet (e.g., Git or Colab), paste the **final, validated** snippet in the designated cell and add a one‑sentence explanation.

> **Validate everything.** Cross‑check AI outputs with official docs or a second prompt. If two sources disagree, note it and explain which you chose and why.



---
## Prompt Patterns Quick Reference

Use these as starting points and **adapt** them to your context.

### 1) Zero‑Shot (definition/explanation)
```
Act as a clear, concise tutor for first‑year CS students.
Explain {TOPIC} in 5 bullet points max. Include one analogy and one pitfall to avoid.
```

### 2) Few‑Shot (guided answers consistent with examples)
```
You will answer in the same style as the examples.

Q: What is a "commit" in Git?  
A: A snapshot of tracked file changes with a message explaining why.

Q: What is "pushing" in Git?  
A: Sending local commits to a remote repository so others can see them.

Q: {YOUR QUESTION}
A:
```

### 3) Step‑by‑Step Reasoning (show key steps)
```
I need a **numbered, step‑by‑step plan** for {TASK}.
For each step: the goal, one command (if applicable), and a 1‑line verification check.
Avoid hidden steps; keep it to 6–8 steps total.
```
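
The placeholder style used in the templates above can also be filled from code, which keeps the wording identical between runs. The sketch below is illustrative only: `ZERO_SHOT`, `EXAMPLES`, and `build_few_shot` are hypothetical names, not part of any AI tool's API, and the resulting string still has to be pasted into (or sent to) whichever tool you use.

```python
# Hypothetical helpers for filling the prompt templates; names are illustrative.

ZERO_SHOT = (
    "Act as a clear, concise tutor for first-year CS students.\n"
    "Explain {topic} in 5 bullet points max. "
    "Include one analogy and one pitfall to avoid."
)

# Worked examples taken from the few-shot template above.
EXAMPLES = [
    ('What is a "commit" in Git?',
     "A snapshot of tracked file changes with a message explaining why."),
    ('What is "pushing" in Git?',
     "Sending local commits to a remote repository so others can see them."),
]


def build_few_shot(examples, question):
    """Assemble a few-shot prompt: instruction, Q/A examples, then the open question."""
    lines = ["You will answer in the same style as the examples.", ""]
    for example_question, example_answer in examples:
        lines += [f"Q: {example_question}", f"A: {example_answer}", ""]
    # End with an unanswered "A:" so the model completes it in the same style.
    lines += [f"Q: {question}", "A:"]
    return "\n".join(lines)
```

For example, `print(build_few_shot(EXAMPLES, "What is a branch?"))` prints the two worked examples followed by the open question, and `ZERO_SHOT.format(topic="Git branching")` fills the zero-shot template.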



---
## Group A — Git Fundamentals (3 questions)

### A1. What problem does Git solve? How is it different from file syncing?
**Use:** Zero‑Shot, then Few‑Shot for refinement.  
**Run this prompt:**
```
Act as a version control coach.
Explain what Git is and the specific problem it solves compared to simple file syncing (e.g., Drive).
List 3 concrete benefits for a small analytics team.
End with a 2‑sentence analogy.

Q: What is Git and what does it solve?
A: A distributed version control system that records point-in-time snapshots (“commits”) of your project, lets people work in parallel on branches, and then merges changes with a full, searchable history.

Reproducibility: Pin notebooks/SQL/env files to a commit or tag so last week’s model run can be re-created exactly; bisect history to find when a metric drift started.

Safe experimentation: Create a branch for a new feature or model, get diff-based code review in a PR, and merge without breaking main.

Traceability & speed: Use blame/diffs to see who changed a metric definition and why; conflicts are explicit (no silent overwrites), which reduces debugging time.

Analogy: Drive is like one shared Google Doc everyone edits at once—quick, but easy to trip over each other. Git is like each teammate drafting in their own notebook, then a librarian merges the best parts and archives every edition forever.
```
**Reflection (2–4 sentences):** What did you learn that you didn’t already know?

I used to think Drive-style syncing and Git were basically the same, but I learned that Git is built for true concurrent work: it tracks changes line by line with authorship and supports safe branching and merging. I also didn’t realize how commits and tags, combined with env files, make analyses reproducible, or how blame and diff can pinpoint when a metric definition changed.


### A2. Commit → Branch → Merge: the minimal workflow
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Create a minimal, step‑by‑step workflow to:
1) initialize a repo, 2) create and switch to a feature branch, 3) commit changes,
4) merge back to main locally, 5) push to a remote named "origin".
For each step include: goal, command, and a quick verification.
```
**Paste final validated commands below and add one sentence on when to branch.**


In [None]:

# Paste your validated minimal Git workflow commands here as comments, e.g.:
# git init -b main
# git commit --allow-empty -m "Initial commit"   # main needs a first commit before you can switch back to it
# git checkout -b feature/readme-polish
# git add README.md
# git commit -m "Clarify setup steps"
# git checkout main
# git merge feature/readme-polish --no-ff
# git remote add origin <REMOTE_URL>
# git push -u origin main

# Validated version (uses the newer `git switch` command):
# git init -b main
# git remote add origin <REMOTE_URL>
# git commit --allow-empty -m "Initial commit"   # main needs a first commit before you can switch back to it
# git switch -c feature/readme-polish
# git add README.md
# git commit -m "Clarify setup steps"
# git switch main
# git merge feature/readme-polish --no-ff
# git push -u origin main
# When to branch: Create a new branch before you start any scoped change (feature, bug fix, or doc tweak) so your work stays isolated and reviewable.


### A3. Resolving a simple merge conflict
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
I have a merge conflict in README.md after merging a feature branch into main.
Give a 6-step recipe to resolve it safely:
- how to open the file, identify conflict markers, choose/merge lines,
- add/commit the resolution, verify the merge, and push.
Include one common pitfall and how to avoid it.

Confirm the conflict
Goal: See unmerged files.

Open and resolve content
Goal: Inspect markers and choose/merge correct lines.

Stage the resolution
Goal: Mark the conflict as resolved.

Create the merge commit
Goal: Finalize the merge.

Double-check the result
Goal: Ensure the repo is clean and content is correct.

Push the resolved merge
Goal: Publish the fix.

Common pitfall (and fix): Blindly accepting one side (“ours/theirs”) drops changes—always manually merge, remove all markers, and skim the final README.md before staging.
```
**Reflection:** What’s your personal checklist to avoid conflicts getting messy?
Pull or rebase onto main before starting, keep branches short-lived and commits small, run tests and linters, resolve carefully, search for leftover `<<<<<<<` markers, and re-read the diff once more before pushing.
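
The "search for leftover markers" step in that checklist can be automated. This is a minimal sketch, assuming the default seven-character conflict markers; `find_conflict_markers` is a hypothetical helper, not a Git command. A line of exactly seven `=` characters can also be a legitimate Markdown heading underline, so treat hits as prompts to look, not as proof of a problem.

```python
import re

# A conflict marker line starts with exactly seven <, =, or > characters,
# followed by whitespace or the end of the line.
MARKER = re.compile(r"^(<{7}|={7}|>{7})(\s|$)")


def find_conflict_markers(text):
    """Return (line_number, marker) pairs for every leftover conflict marker."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = MARKER.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

To check a file before staging it, pass its contents, for example `find_conflict_markers(open("README.md", encoding="utf-8").read())`; an empty list means no markers were found.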


---
## Group B — GitHub Collaboration (3 questions)

### B1. Branch vs. Fork vs. Clone
**Use:** Few‑Shot to drive crisp distinctions with examples.  
**Run this prompt:**
```
Answer using this format:
Term — One-sentence definition — When to use — One example.

Branch —
Fork —
Clone —

Branch — A separate line of development within a single repository. — Use when working on a new feature or bug fix without affecting the main project. — git checkout -b feature/my-new-feature
Fork — A copy of a repository under a different user's account. — Use when you want to contribute to a project you don't have push access to, or to use a project as a starting point for your own. — Creating a fork on GitHub.
Clone — Creating a local copy of a remote repository. — Use to get a working copy of a repository onto your local machine. — git clone <repository_url>
```
**Reflection:** Which one will your team use for this course and why?

We will experiment to see which works best for our team. For now, branches make the most sense because we all work in one shared repository.



### B2. Pull Request (PR) checklist for this course
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Write a "PR Checklist" for a university analytics course team repo.
Include: naming convention, description template, screenshots policy, reviewers, CI checks (if any),
and a revert plan. Limit to 8 concise checklist items.
```
**Paste your final checklist below.**


In [None]:
# Example (edit to your team's needs)
pr_checklist = [
    "PR title: <unit>-<lab>-<short-desc> (e.g., u1-lab2-eda-trends)",
    "Description includes: problem, approach, key files, and how to test",
    "Attach 1–2 screenshots (plots/dashboards) if visuals changed",
    "Link related issue or assignment requirement",
    "Request review from 1 teammate; no self-merge",
    "Passes notebook re-run without errors (Runtime > Run all)",
    "No secrets, tokens, or PII in code or outputs",
    "Revert plan: how to roll back quickly if needed"
]
pr_checklist
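
The naming convention in the first checklist item can be checked mechanically. The pattern below is an assumption inferred from the single example `u1-lab2-eda-trends` (lowercase, `u<number>-lab<number>-` followed by hyphen-separated words); `TITLE_PATTERN` and `is_valid_pr_title` are hypothetical names, so adjust them to whatever convention your team settles on.

```python
import re

# Assumed convention: u<unit>-lab<lab>-<short-desc>, lowercase, hyphen-separated.
TITLE_PATTERN = re.compile(r"u\d+-lab\d+-[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_pr_title(title):
    """Return True if the whole title follows the assumed naming convention."""
    return TITLE_PATTERN.fullmatch(title) is not None
```

A title such as `u1-lab2-eda-trends` passes, while `Fix stuff` or a title with a trailing hyphen does not.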


### B3. Protected `main` workflow
**Use:** Zero‑Shot + Step‑by‑Step.  
**Run this prompt:**
```
Explain how to protect the main branch in a GitHub repo for a class team:
- Require PRs, at least one review, and passing checks
- Disallow force-pushes
Provide a numbered setup guide and a 3-line "why this matters" explanation.

Go to your repository on GitHub.
Click on Settings.
In the left sidebar, click on Branches.
Under "Branch protection rules", click Add rule.
In the "Branch name pattern" field, type main.
Check the box for Require pull request reviews before merging.
Set "Required approving reviews" to 1.
(Optional) Check the box for Require status checks to pass before merging. You would need to set up CI checks (like running tests or linters) for this to be effective.
Check the box for Include administrators.
Check the box for Require linear history.
Check the box for Do not allow bypassing the above settings.
Click Create.
```
**Reflection:** Which protection rules will you actually enable first, and why?

I would first enable the rule that restricts branch deletion, because protecting and preserving our work matters most.


---
## Group C — Google Colab for Analytics (3 questions)

### C1. Why Colab? Benefits & limits for this course
**Use:** Zero‑Shot.  
**Run this prompt:**
```
Act as a data science tech advisor.
List 5 advantages and 3 limitations of Google Colab for analytics coursework.
Tailor to a class that uses BigQuery and dashboards. Keep it to bullet points.

Advantages:
Free access to GPUs/TPUs: Accelerates computation for machine learning tasks.
Pre-installed libraries: Many common data science libraries are already available.
Easy sharing and collaboration: Notebooks can be easily shared and worked on collaboratively.
Integration with Google Drive and BigQuery: Seamless access to data stored in Drive and easy querying of BigQuery data.
No setup required: No need to install Python or libraries locally.

Limitations:
Session limits: Sessions have usage limits and can time out.
Dependency on internet connection: Requires a stable internet connection to work.
Limited storage: Free tier has limited storage space.
```
**Reflection:** Which two advantages will help *you* most this semester?

Colab connects directly to Drive and runs as a web notebook with no local setup, which makes it extremely easy to use. I expect it to save me a lot of time over the semester.


### C2. Authenticate to GCP in Colab and query BigQuery
**Use:** Step‑by‑Step Reasoning for a minimal working snippet.  
**Run this prompt:**
```
Provide a minimal Colab snippet to:
1) authenticate to Google Cloud,
2) run a simple BigQuery SQL (e.g., SELECT 1),
3) get results into a pandas DataFrame,
4) print row count.
Include a one-line note on costs and safe use of LIMIT.
```
**Paste your final validated code below.**


In [3]:

# Minimal BigQuery test in Colab (paste your validated version)
# from google.colab import auth
# auth.authenticate_user()
#
# from google.cloud import bigquery
# client = bigquery.Client(project="<YOUR_PROJECT_ID>")
# sql = "SELECT 1 AS test_col"
# df = client.query(sql).result().to_dataframe()
# print("Rows:", len(df))
# df.head()
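
For the cost note the prompt asks for, it helps to know that on-demand BigQuery billing depends on the bytes a query scans, which a dry run reports without running the query. The sketch below is hedged: `estimate_query_cost` and `dry_run_bytes` are hypothetical helper names, and `usd_per_tib` is deliberately a required argument because the current rate should be looked up on the official pricing page. The dry-run call uses `QueryJobConfig(dry_run=True)` and `total_bytes_processed` from the `google-cloud-bigquery` client and needs an authenticated client to work.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_query_cost(bytes_processed, usd_per_tib):
    """Convert scanned bytes into an estimated cost at a caller-supplied rate."""
    return bytes_processed / TIB * usd_per_tib


def dry_run_bytes(client, sql):
    """Ask BigQuery how many bytes `sql` would scan, without running it."""
    # Imported here so the estimator above works even without the library installed.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```

In Colab, after `auth.authenticate_user()` and creating `client`, a call like `estimate_query_cost(dry_run_bytes(client, sql), usd_per_tib=<CURRENT_RATE>)` gives a rough figure before you run anything large. Adding `LIMIT` to the query usually leaves the dry-run byte count unchanged, which is why selecting fewer columns matters more for cost.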



### C3. Save notebooks to GitHub from Colab
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Give two safe workflows to keep Colab notebooks versioned in GitHub:
(A) using "File > Save a copy in GitHub",
(B) local git with Drive sync (brief).
Provide steps and cautions (e.g., large outputs, secrets) for each.
```
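
For the "large outputs" caution in that prompt, one option is to clear cell outputs before committing. This is a minimal sketch that assumes the standard notebook JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); `strip_outputs` and `strip_file` are hypothetical helper names. It does not remove secrets typed into the code itself, so those still need a manual check.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with all code-cell outputs cleared."""
    clean = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


def strip_file(src_path, dst_path):
    """Read an .ipynb file, clear its outputs, and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    with open(dst_path, "w", encoding="utf-8") as handle:
        json.dump(strip_outputs(notebook), handle, indent=1)
        handle.write("\n")
```

Writing to a separate `dst_path` keeps the original notebook, with its outputs, available locally while the stripped copy goes into the repo.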
**Reflection:** Which workflow will your team adopt and why?



---
## Capstone Synthesis (end of class)

**Scenario:** Your team needs a reproducible workflow for this course: team repo on GitHub, branching, Colab auth to BigQuery, and a PR checklist.

**Run this prompt:**
```
Act as a DevEx lead for a university analytics team.
Produce a one-page "Runbook" with:
- Repo structure (folders for notebooks, data, dashboards, docs)
- Branching model (who creates branches, when to merge)
- Colab ↔ BigQuery quickstart (auth, sample query, cost-safe LIMIT)
- PR checklist (max 8 bullets) and protection rules for main
- Two risks + mitigations (e.g., secrets leakage, merge conflicts)
Use concise bullets and keep it classroom-ready.
```

**Paste your final runbook below (or attach as a Markdown file in your repo) and add a 3‑bullet reflection on what you changed after validation.**



---
## Submission Checklist (to your team repo + Brightspace link)

- [ ] All **Reflection** sections completed (A1–A3, B1–B3, C1–C3, Capstone).
- [ ] Any code snippets pasted are **validated** and include a 1‑line explanation.
- [ ] Notebook runs top‑to‑bottom without errors (where code cells exist).
- [ ] Commit message: `week2.1-prompt-practice` and open a PR for review.
- [ ] Add this notebook path to your repo **README.md** under Week 2.1.


In [1]:
# Run this prompt to generate a PR checklist
# Write a "PR Checklist" for a university analytics course team repo.
# Include: naming convention, description template, screenshots policy, reviewers, CI checks (if any),
# and a revert plan. Limit to 8 concise checklist items.

Here's how to protect the `main` branch in your GitHub repo for a class team:

**Setup Guide:**

1.  Go to your repository on GitHub.
2.  Click on **Settings**.
3.  In the left sidebar, click on **Branches**.
4.  Under "Branch protection rules", click **Add rule**.
5.  In the "Branch name pattern" field, type `main`.
6.  Check the box for **Require pull request reviews before merging**.
7.  Set "Required approving reviews" to `1`.
8.  (Optional) Check the box for **Require status checks to pass before merging**. You would need to set up CI checks (like running tests or linters) for this to be effective.
9.  Check the box for **Include administrators**.
10. Check the box for **Require linear history**.
11. Check the box for **Do not allow bypassing the above settings**.
12. Click **Create**.

**Why this matters:**

*   Prevents accidental or unauthorized direct pushes to `main`, ensuring all changes go through review.
*   Requires at least one other team member to approve changes, improving code quality and catching potential errors.
*   Maintains a clean and stable `main` branch, making it easier to track the project's progress and reducing merge conflicts.

In [2]:
# Minimal BigQuery test in Colab (paste your validated version)
# from google.colab import auth
# auth.authenticate_user()
#
# from google.cloud import bigquery
# client = bigquery.Client(project="<YOUR_PROJECT_ID>")
# sql = "SELECT 1 AS test_col"
# df = client.query(sql).result().to_dataframe()
# print("Rows:", len(df))
# df.head()

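# Note on costs and LIMIT: on-demand BigQuery billing is based on bytes scanned, not rows returned, so LIMIT alone usually does not lower the cost; select only the columns you need and check a dry run before querying large tables.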