<a href="https://colab.research.google.com/github/mnpoliakov/mgmt467-analytics-portfolio/blob/main/Week2_1_Prompt_Practice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Week 2.1 — Prompt Practice: Git, GitHub, and Google Colab

**Course:** MGMT 467 — AI‑Assisted Big Data Analytics in the Cloud  
**Session:** Tuesday (2.1) — Developer Environment Setup

### How to use this notebook
- This is a **practice and planning** notebook: most cells are **Markdown** with copy‑pasteable prompt templates you will run in your AI tool (e.g., Gemini).  
- After you run a prompt in your AI tool, **summarize what you learned** in the provided **Reflection** cells here.  
- When a task asks for a short code snippet (e.g., Git or Colab), paste the **final, validated** snippet in the designated cell and add a one‑sentence explanation.

> **Validate everything.** Cross‑check AI outputs with official docs or a second prompt. If two sources disagree, note it and explain which you chose and why.



---
## Prompt Patterns Quick Reference

Use these as starting points and **adapt** them to your context.

### 1) Zero‑Shot (definition/explanation)
```
Act as a clear, concise tutor for first‑year CS students.
Explain {TOPIC} in 5 bullet points max. Include one analogy and one pitfall to avoid.
```

### 2) Few‑Shot (guided answers consistent with examples)
```
You will answer in the same style as the examples.

Q: What is a "commit" in Git?  
A: A snapshot of tracked file changes with a message explaining why.

Q: What is "pushing" in Git?  
A: Sending local commits to a remote repository so others can see them.

Q: {YOUR QUESTION}
A:
```

### 3) Step‑by‑Step Reasoning (show key steps)
```
I need a **numbered, step‑by‑step plan** for {TASK}.
For each step: the goal, one command (if applicable), and a 1‑line verification check.
Avoid hidden steps; keep it to 6–8 steps total.
```
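If you reuse these templates often, a small helper can fill the `{PLACEHOLDER}` slots before you paste a prompt into your AI tool. This is a minimal sketch: `fill_prompt` is a hypothetical helper written for this notebook, not part of any library. It uses plain string replacement so placeholders that contain spaces (like `{YOUR QUESTION}`) also work, and it raises an error if any slot is left unfilled.

```python
import re

# Zero-Shot template from above, stored as a Python string
ZERO_SHOT = (
    "Act as a clear, concise tutor for first-year CS students.\n"
    "Explain {TOPIC} in 5 bullet points max. Include one analogy and one pitfall to avoid."
)

def fill_prompt(template: str, values: dict) -> str:
    """Replace each {KEY} with values[KEY]; raise if any {...} placeholder remains."""
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    leftover = re.findall(r"\{[^{}]+\}", template)
    if leftover:
        raise ValueError(f"Unfilled placeholders: {leftover}")
    return template

prompt = fill_prompt(ZERO_SHOT, {"TOPIC": "Git branching"})
print(prompt)
```

The check for leftover braces is the useful part: it stops you from pasting a prompt that still says `{TOPIC}`.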



---
## Group A — Git Fundamentals (3 questions)

### A1. What problem does Git solve? How is it different from file syncing?
**Use:** Zero‑Shot, then Few‑Shot for refinement.  
**Run this prompt:**
```
Act as a version control coach.
Explain what Git is and the specific problem it solves compared to simple file syncing (e.g., Drive).
List 3 concrete benefits for a small analytics team.
End with a 2‑sentence analogy.
```
**Reflection (2–4 sentences):** What did you learn that you didn’t already know?


The main idea I learned from this prompt is the real power of Git: it keeps a much more detailed history than file syncing and lets us work offline. File-syncing tools such as Drive keep only a limited version history, whereas Git shows the exact line-by-line changes made to each file. Offline work is also possible, because commits are made locally and can be pushed to the remote once we are back online.
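To make "exact changes" concrete: `git diff` shows the line-level differences between two versions of a file in unified-diff format. The sketch below uses Python's standard `difflib` to produce the same *style* of output for two made-up versions of an analysis script. It is an analogy for what Git displays, not how Git stores its history; the file name and contents are hypothetical.

```python
import difflib

# Two hypothetical versions of the same analysis script
v1 = ["df = load_sales()\n", "df = df.dropna()\n", "print(df.mean())\n"]
v2 = ["df = load_sales()\n", "df = df.dropna(subset=['revenue'])\n", "print(df.mean())\n"]

# Unified diff: "-" lines were removed, "+" lines were added, " " lines are unchanged context
diff = list(difflib.unified_diff(v1, v2, fromfile="a/analysis.py", tofile="b/analysis.py"))
print("".join(diff))
```

A file-sync tool would only tell you that `analysis.py` changed; the diff tells you which line changed and how.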


### A2. Commit → Branch → Merge: the minimal workflow
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Create a minimal, step‑by‑step workflow to:
1) initialize a repo, 2) create and switch to a feature branch, 3) commit changes,
4) merge back to main locally, 5) push to a remote named "origin".
For each step include: goal, command, and a quick verification.
```
**Paste final validated commands below and add one sentence on when to branch.**


In [2]:
# Validated minimal Git workflow (kept as comments so this cell has no side effects):

# 1. Initialize a repository with "main" as the default branch (-b needs Git 2.28+; on older
#    versions run `git init`, make the first commit, then `git branch -M main`).
#    Commit once so that "main" exists before you branch off it.
# git init -b main
# git add README.md   (any starter file works)
# git commit -m "Initial commit"

# 2. Create a new branch for your feature and switch to it. This keeps your work separate from the main line of development.
# git checkout -b feature/your-feature-name

# 3. Stage your changes and commit them to the current branch (verify with: git log --oneline).
# git add .
# git commit -m "Add descriptive message about your changes"

# 4. Switch back to main and merge the feature branch locally (--no-ff always records a merge commit).
# git checkout main
# git merge --no-ff feature/your-feature-name

# 5. Register the remote named "origin" (once per repo), then push main; -u sets the upstream for future pushes.
# git remote add origin <REMOTE_URL>
# git push -u origin main

# When to branch: create a new branch whenever you start a new feature, bug fix, or experiment that should stay separate from main until it is reviewed.

We first initialize a repository, which is the starting point for our project's history. When we start a new feature, we create a new branch to isolate that code from `main`. Once the feature is committed and working, we merge it back into `main` and push our local `main` branch to the remote repository. Finally, it is important to know when to create a branch: whenever we begin a bug fix, a new feature, or an experiment.
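One way to validate the Git commands above is to run them in a throwaway directory. The sketch below drives the real `git` CLI from Python with `subprocess`; it assumes `git` is installed and on your `PATH`, and the author identity, file names, and branch name are placeholders. It makes an initial commit on `main` first, because `git checkout main` fails in a brand-new repository where `main` has no commits yet, and it skips the push step because there is no real remote.

```python
import os
import shutil
import subprocess
import tempfile

def git(*args: str, cwd: str, env: dict) -> str:
    """Run one git command and return its stdout; raise CalledProcessError on failure."""
    result = subprocess.run(["git", *args], cwd=cwd, env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def demo_workflow() -> list:
    repo = tempfile.mkdtemp()
    # Isolate from personal/system git config and use a placeholder identity.
    env = {**os.environ, "HOME": repo, "GIT_CONFIG_NOSYSTEM": "1",
           "GIT_AUTHOR_NAME": "Student", "GIT_AUTHOR_EMAIL": "student@example.com",
           "GIT_COMMITTER_NAME": "Student", "GIT_COMMITTER_EMAIL": "student@example.com"}
    try:
        # 1) Initialize and point HEAD at "main" (works on any git version), then commit once.
        git("init", "-q", cwd=repo, env=env)
        git("symbolic-ref", "HEAD", "refs/heads/main", cwd=repo, env=env)
        with open(os.path.join(repo, "README.md"), "w") as f:
            f.write("# Team repo\n")
        git("add", "README.md", cwd=repo, env=env)
        git("commit", "-q", "-m", "Initial commit", cwd=repo, env=env)

        # 2) Create and switch to a feature branch.
        git("checkout", "-q", "-b", "feature/eda", cwd=repo, env=env)

        # 3) Commit a change on the feature branch.
        with open(os.path.join(repo, "eda.py"), "w") as f:
            f.write("print('hello')\n")
        git("add", "eda.py", cwd=repo, env=env)
        git("commit", "-q", "-m", "Add EDA script", cwd=repo, env=env)

        # 4) Merge back into main with an explicit merge commit.
        git("checkout", "-q", "main", cwd=repo, env=env)
        git("merge", "-q", "--no-ff", "-m", "Merge feature/eda", "feature/eda", cwd=repo, env=env)

        # Verify: main's history now contains all three commit messages.
        return git("log", "--format=%s", cwd=repo, env=env).splitlines()
    finally:
        shutil.rmtree(repo, ignore_errors=True)

if shutil.which("git"):
    print(demo_workflow())
```

Running it end to end is a cheap way to catch missing steps (like the initial commit) before you rely on the workflow in the team repo.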



### A3. Resolving a simple merge conflict
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
I have a merge conflict in README.md after merging a feature branch into main.
Give a 6-step recipe to resolve it safely:
- how to open the file, identify conflict markers, choose/merge lines,
- add/commit the resolution, verify the merge, and push.
Include one common pitfall and how to avoid it.
```
**Reflection:** What’s your personal checklist to avoid conflicts getting messy?


My personal checklist to keep conflicts from getting messy includes:
- Review my work before committing, including proofreading the changes
- If a conflict occurs, read both versions of the conflicting lines carefully
- Identify the issue and analyze what caused it
- Brainstorm possible solutions, or use another tool to help brainstorm
- Document the issue, the discussion, and the solution
- Implement the solution
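A concrete "review before commit" step is making sure no conflict markers survive a resolution. Git marks conflicted regions with lines that start with `<<<<<<<`, `=======`, and `>>>>>>>`. The helper below is a hypothetical check written for this notebook, not a Git feature; Git's own `git diff --check` also warns about leftover conflict markers.

```python
import re

# A conflict marker is 7 identical characters at the start of a line,
# followed by a space (e.g. "<<<<<<< HEAD") or the end of the line ("=======").
# Note: a Markdown heading underlined with exactly 7 "=" would also be flagged.
MARKER = re.compile(r"^(<{7}|={7}|>{7})(?: |$)")

def find_conflict_markers(text: str) -> list:
    """Return the 1-based line numbers that still contain Git conflict markers."""
    return [i for i, line in enumerate(text.splitlines(), start=1) if MARKER.match(line)]

# Hypothetical README.md left in a conflicted state
readme = """# Project
<<<<<<< HEAD
Owner: Alice
=======
Owner: Bob
>>>>>>> feature/update-owner
"""

print(find_conflict_markers(readme))
```

An empty list means the file is safe to `git add`; any line numbers point you straight at the unresolved region.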


---
## Group B — GitHub Collaboration (3 questions)

### B1. Branch vs. Fork vs. Clone
**Use:** Few‑Shot to drive crisp distinctions with examples.  
**Run this prompt:**
```
Answer using this format:
Term — One-sentence definition — When to use — One example.

Branch —
Fork —
Clone —
```
**Reflection:** Which one will your team use for this course and why?


Our team will use branches in this course. The main reason is that branches let each of us work in isolation on new features while sharing one team repository. We plan to assign each person a specific task, and a branch per task keeps everyone's work in a separate bubble. This will also make it easier to keep track of the work done during the course.



### B2. Pull Request (PR) checklist for this course
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Write a "PR Checklist" for a university analytics course team repo.
Include: naming convention, description template, screenshots policy, reviewers, CI checks (if any),
and a revert plan. Limit to 8 concise checklist items.
```
**Paste your final checklist below.**


In [6]:
# PR Checklist for university analytics course team repo
pr_checklist = [
    "PR Title: <unit>-<lab>-<short-desc> (e.g., u1-lab2-eda-trends)",
    "Description: Include problem, approach, key files, and how to test.",
    "Screenshots: Attach 1-2 if visuals (plots/dashboards) changed.",
    "Linked Issue: Link related issue or assignment requirement.",
    "Reviewers: Request review from >=1 teammate; no self-merge.",
    "CI Checks: Ensure automated checks (if any) pass.",
    "Secrets/PII: No secrets, tokens, or PII in code/outputs.",
    "Revert Plan: Briefly note how to quickly revert if needed."
]

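# Display the checklist (optional; display() is built into Jupyter/Colab, imported explicitly for clarity)
from IPython.display import display
display(pr_checklist)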

['PR Title: <unit>-<lab>-<short-desc> (e.g., u1-lab2-eda-trends)',
 'Description: Include problem, approach, key files, and how to test.',
 'Screenshots: Attach 1-2 if visuals (plots/dashboards) changed.',
 'Linked Issue: Link related issue or assignment requirement.',
 'Reviewers: Request review from >=1 teammate; no self-merge.',
 'CI Checks: Ensure automated checks (if any) pass.',
 'Secrets/PII: No secrets, tokens, or PII in code/outputs.',
 'Revert Plan: Briefly note how to quickly revert if needed.']
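The title convention in the first checklist item can be checked automatically, for example in a CI step. Below is a minimal sketch; the regex encodes one reading of `<unit>-<lab>-<short-desc>` based on the example `u1-lab2-eda-trends` (lowercase, hyphen-separated), which is an assumption your team should adjust.

```python
import re

# Assumed pattern: u<digits>-lab<digits>-<lowercase words joined by hyphens>
PR_TITLE = re.compile(r"u\d+-lab\d+-[a-z0-9]+(?:-[a-z0-9]+)*")

def is_valid_pr_title(title: str) -> bool:
    """True if the whole title follows the <unit>-<lab>-<short-desc> convention."""
    return PR_TITLE.fullmatch(title) is not None

for title in ["u1-lab2-eda-trends", "Lab 2 EDA", "u1-lab2-"]:
    print(f"{title!r}: {is_valid_pr_title(title)}")
```

Using `fullmatch` rather than `search` means extra text before or after the pattern makes the title invalid.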


### B3. Protected `main` workflow
**Use:** Zero‑Shot + Step‑by‑Step.  
**Run this prompt:**
```
Explain how to protect the main branch in a GitHub repo for a class team:
- Require PRs, at least one review, and passing checks
- Disallow force-pushes
Provide a numbered setup guide and a 3-line "why this matters" explanation.
```
**Reflection:** Which protection rules will you actually enable first, and why?


The first protection rules we will enable are: require pull requests, require approvals, require status checks, and disallow force pushes. These are essential for our `main` branch because we want every update to be reviewed properly. Reviews will not catch every bug, but this system limits the number of issues we encounter.
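The same rules can also be applied through GitHub's REST API ("Update branch protection", `PUT /repos/{owner}/{repo}/branches/{branch}/protection`) instead of the Settings UI. This sketch only builds the request; `OWNER`, `REPO`, the token, and the `ci` check name are placeholders. You need admin rights on the repository, and protecting branches in private repositories may require a paid GitHub plan, so check the payload against the current API documentation before sending it.

```python
import json
import urllib.request

def protection_payload(required_checks: list) -> dict:
    """Require PRs with at least 1 approval, passing checks, and no force-pushes."""
    return {
        "required_status_checks": {"strict": True, "contexts": required_checks},
        "enforce_admins": True,  # apply the rules to repo admins too
        "required_pull_request_reviews": {"required_approving_review_count": 1},
        "restrictions": None,    # no extra limits on who can push
        "allow_force_pushes": False,
    }

def build_request(owner: str, repo: str, branch: str, token: str,
                  payload: dict) -> urllib.request.Request:
    url = f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )

# Placeholders: replace before use. "ci" is a hypothetical status-check name.
req = build_request("OWNER", "REPO", "main", "<TOKEN>", protection_payload(["ci"]))
print(req.get_method(), req.full_url)
# To apply for real: urllib.request.urlopen(req)
```

Keeping the payload in its own function makes it easy to review the rules in a PR before anyone runs the request.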


---
## Group C — Google Colab for Analytics (3 questions)

### C1. Why Colab? Benefits & limits for this course
**Use:** Zero‑Shot.  
**Run this prompt:**
```
Act as a data science tech advisor.
List 5 advantages and 3 limitations of Google Colab for analytics coursework.
Tailor to a class that uses BigQuery and dashboards. Keep it to bullet points.
```
**Reflection:** Which two advantages will help *you* most this semester?


The first advantage of Colab that will help me most is that common libraries come preinstalled. In the past I have had many issues installing libraries when working in different environments. In addition, because Colab lives in the Google ecosystem, I can access my notebooks from anywhere, and sharing files is much easier than through other routes.
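Before installing anything, you can confirm which libraries a runtime already provides by querying installed package versions with the standard library's `importlib.metadata`. The package list below is an assumption about what this course might use; edit it as needed.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Hypothetical list of distributions this course may rely on
PACKAGES = ["pandas", "google-cloud-bigquery", "matplotlib"]

def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name:25} {version or 'NOT INSTALLED'}")
```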


### C2. Authenticate to GCP in Colab and query BigQuery
**Use:** Step‑by‑Step Reasoning for a minimal working snippet.  
**Run this prompt:**
```
Provide a minimal Colab snippet to:
1) authenticate to Google Cloud,
2) run a simple BigQuery SQL (e.g., SELECT 1),
3) get results into a pandas DataFrame,
4) print row count.
Include a one-line note on costs and safe use of LIMIT.
```
**Paste your final validated code below.**


In [None]:
# 1) Authenticate to Google Cloud
from google.colab import auth
auth.authenticate_user()

# 2) Run a simple BigQuery SQL and 3) get results into a pandas DataFrame
from google.cloud import bigquery
client = bigquery.Client(project="<YOUR_PROJECT_ID>") # Replace with your GCP project ID
sql = "SELECT 1 AS test_col" # Simple test query
# sql = "SELECT * FROM `<YOUR_PROJECT_ID>.<YOUR_DATASET>.<YOUR_TABLE>` LIMIT 10" # Example with LIMIT

df = client.query(sql).result().to_dataframe()

# 4) Print row count
print("Rows returned:", len(df))

# Display the first few rows (optional)
# display(df.head())

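# Note on costs: BigQuery on-demand pricing charges for bytes processed, not rows returned. LIMIT shrinks the result set but usually does not reduce bytes scanned, so select only the columns you need, estimate bytes with a dry run first, and use LIMIT mainly to keep results small.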
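Because `LIMIT` does not reliably cap the bytes a query scans, a safer habit is to estimate cost first with a dry run (dry runs are free) and to set a hard cap with `maximum_bytes_billed`, which makes an over-budget query fail instead of billing. Both are `QueryJobConfig` options in the `google-cloud-bigquery` client. The 1 GB cap and the helper names are assumptions for illustration; the usage lines reuse the placeholder project ID from the cell above and a public sample table.

```python
try:
    from google.cloud import bigquery  # preinstalled in Colab
except ImportError:  # lets the pure helper below run outside Colab
    bigquery = None

MAX_BYTES_BILLED = 10**9  # assumed 1 GB safety cap; choose your own

def human_bytes(n: float) -> str:
    """Format a byte count using decimal units (1 KB = 1000 bytes)."""
    for unit in ["B", "KB", "MB", "GB"]:
        if n < 1000:
            return f"{n:.1f} {unit}"
        n /= 1000
    return f"{n:.1f} TB"

def estimate_bytes(client, sql: str) -> int:
    """Dry run: BigQuery validates the query and reports bytes it would scan, without running it."""
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    return client.query(sql, job_config=config).total_bytes_processed

def run_capped(client, sql: str):
    """Run the query, but fail (without charge) if it would bill more than the cap."""
    config = bigquery.QueryJobConfig(maximum_bytes_billed=MAX_BYTES_BILLED)
    return client.query(sql, job_config=config).result().to_dataframe()

# Usage in Colab, after auth.authenticate_user():
# client = bigquery.Client(project="<YOUR_PROJECT_ID>")
# sql = "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 10"
# print("Would scan:", human_bytes(estimate_bytes(client, sql)))
# df = run_capped(client, sql)
```

Comparing the dry-run estimate with and without `LIMIT` on the same query is a quick way to see for yourself that `LIMIT` does not shrink the scan.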


### C3. Save notebooks to GitHub from Colab
**Use:** Step‑by‑Step Reasoning.  
**Run this prompt:**
```
Give two safe workflows to keep Colab notebooks versioned in GitHub:
(A) using "File > Save a copy in GitHub",
(B) local git with Drive sync (brief).
Provide steps and cautions (e.g., large outputs, secrets) for each.
```
**Reflection:** Which workflow will your team adopt and why?


We will adopt workflow A because it is the most straightforward approach and reduces the chance of mistakes. Since most of us are fairly new to GitHub, this will make the transition much easier.
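Whichever workflow a team picks, the cautions about large outputs and secrets still apply: cell outputs (tables, plots, anything printed) are saved inside the `.ipynb` JSON. A minimal sketch for clearing them before committing is below. It relies on the standard notebook format, where each code cell has an `outputs` list and an `execution_count`; the in-memory notebook is a simplified placeholder. Tools such as `nbstripout`, or clearing all outputs from Colab's Edit menu, do the same job.

```python
import json

def strip_outputs(nb: dict) -> dict:
    """Clear outputs and execution counts from every code cell (modifies nb in place)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

def strip_file(src: str, dst: str) -> None:
    """Read a notebook file, strip its outputs, and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        nb = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")

# Simplified in-memory notebook: one code cell with an output, one Markdown cell
nb = {"cells": [{"cell_type": "code", "execution_count": 6, "source": ["display(pr_checklist)"],
                 "outputs": [{"output_type": "execute_result", "data": {"text/plain": ["[...]"]}}]},
                {"cell_type": "markdown", "source": ["# Notes"]}],
      "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
strip_outputs(nb)
print(nb["cells"][0]["outputs"], nb["cells"][0]["execution_count"])
```

Stripping outputs keeps diffs small and reviewable, but it does not remove secrets typed into code cells themselves, so those still need a manual check.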


---
## Capstone Synthesis (end of class)

**Scenario:** Your team needs a reproducible workflow for this course: team repo on GitHub, branching, Colab auth to BigQuery, and a PR checklist.

**Run this prompt:**
```
Act as a DevEx lead for a university analytics team.
Produce a one-page "Runbook" with:
- Repo structure (folders for notebooks, data, dashboards, docs)
- Branching model (who creates branches, when to merge)
- Colab ↔ BigQuery quickstart (auth, sample query, cost-safe LIMIT)
- PR checklist (max 8 bullets) and protection rules for main
- Two risks + mitigations (e.g., secrets leakage, merge conflicts)
Use concise bullets and keep it classroom-ready.
```

**Paste your final runbook below (or attach as a Markdown file in your repo) and add a 3‑bullet reflection on what you changed after validation.**


- Commented out the code so it does not fail, since it cannot run without our project, data, and files
- Added more documentation about the branching model
- Updated the commit and code-review procedure to include the entire team, not just one "Project Lead"

# Analytics Team Runbook

This runbook outlines the standard workflow and best practices for our university analytics team, ensuring smooth collaboration and reproducible analysis.

## Repository Structure

Our team repository on GitHub will follow this structure to keep our project organized:

*   `/notebooks/`: Store all Google Colab notebooks (`.ipynb` files) here.
*   `/data/`: (Optional) For small, versioned datasets used directly in notebooks. Note: Large datasets should remain in BigQuery.
*   `/dashboards/`: (If applicable) Store configuration or code related to dashboarding tools.
*   `/docs/`: For project documentation, reports, or other relevant notes (e.g., this runbook).
*   `/.gitignore`: Essential for excluding unnecessary files (like large outputs or temporary files) from Git tracking.
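A minimal sketch for creating this layout with `pathlib`. Git does not track empty folders, so each folder gets a `.gitkeep` placeholder (a common convention, not a Git feature). The `.gitignore` entries are assumptions; adjust them to your project.

```python
import tempfile
from pathlib import Path

FOLDERS = ["notebooks", "data", "dashboards", "docs"]

# Assumed starter ignore rules
GITIGNORE = """\
# Secrets and credentials (never commit these)
.env
*.key
# Python / Jupyter caches
__pycache__/
.ipynb_checkpoints/
"""

def scaffold(root: str) -> list:
    """Create the runbook folders (each with a .gitkeep) and a starter .gitignore."""
    base = Path(root)
    for name in FOLDERS:
        folder = base / name
        folder.mkdir(parents=True, exist_ok=True)
        (folder / ".gitkeep").touch()  # lets Git track the otherwise-empty folder
    gitignore = base / ".gitignore"
    if not gitignore.exists():  # never overwrite an existing .gitignore
        gitignore.write_text(GITIGNORE, encoding="utf-8")
    return sorted(p.name for p in base.iterdir())

# Demo in a temporary directory; in a real clone you would call scaffold(".")
with tempfile.TemporaryDirectory() as tmp:
    print(scaffold(tmp))
```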

## Branching Model

We will use a simple feature branching model:

*   **`main` branch:** This is our stable branch. Code here should always be production-ready (or assignment-ready). Direct commits to `main` are disallowed, as are force pushes.
*   **Feature Branches:** For every new feature, bug fix, or distinct task, create a new branch off of `main`.
    *   **Naming Convention:** Use descriptive names like `feature/add-new-viz`, `bugfix/fix-auth-error`, or `task/explore-dataset-x`.
    *   **Who Creates:** Any team member starting a new task creates a new branch.
    *   **When to Merge:** Merge back to `main` only after the work is complete, reviewed, and approved via a Pull Request. Every team member reviews submitted code to help catch issues early.
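The naming convention above can be enforced with a small check, for example in a local pre-push script. This sketch assumes only the three prefixes listed (`feature/`, `bugfix/`, `task/`) and lowercase, hyphen-separated descriptions, both inferred from the examples; `main` is intentionally rejected because nobody should work on it directly.

```python
import re

# Assumed convention: <prefix>/<lowercase-words-joined-by-hyphens>
BRANCH = re.compile(r"(feature|bugfix|task)/[a-z0-9]+(?:-[a-z0-9]+)*")

def is_valid_branch(name: str) -> bool:
    """True if the whole branch name follows the team convention."""
    return BRANCH.fullmatch(name) is not None

for name in ["feature/add-new-viz", "bugfix/fix-auth-error", "task/explore-dataset-x", "my-branch", "main"]:
    print(f"{name:25} {is_valid_branch(name)}")
```

In a script you could pass it the output of `git branch --show-current` to check the branch you are on.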

## Colab ↔ BigQuery Quickstart

Here's a quick guide to connect Colab to BigQuery:

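1.  **Authenticate and run a sample query:** Run this code to authenticate your Colab session with Google Cloud and run a small test query (uncomment the lines once your project, dataset, and table exist):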

In [8]:
#from google.colab import auth
#auth.authenticate_user()
#from google.cloud import bigquery
#client = bigquery.Client(project="<YOUR_PROJECT_ID>")
#sql = "SELECT * FROM `<YOUR_PROJECT_ID>.<YOUR_DATASET>.<YOUR_TABLE>` LIMIT 100" # Example query; LIMIT caps rows returned, not necessarily bytes scanned
#df = client.query(sql).result().to_dataframe()
#print("Rows returned:", len(df))


---
## Submission Checklist (to your team repo + Brightspace link)

- [ ] All **Reflection** sections completed (A1–A3, B1–B3, C1–C3, Capstone).
- [ ] Any code snippets pasted are **validated** and include a 1‑line explanation.
- [ ] Notebook runs top‑to‑bottom without errors (where code cells exist).
- [ ] Commit message: `week2.1-prompt-practice` and open a PR for review.
- [ ] Add this notebook path to your repo **README.md** under Week 2.1.
