
# DS2002 — GitHub Fundamentals in Notebooks  
## Lecture: GitHub From Zero (Absolutely No Assumed Knowledge)

**Instructor:** Jason Williamson  
**Course:** DS2002 — Data Science Systems

---

This notebook assumes you have **never used GitHub before**.

No assumptions.
No shortcuts.
No “you probably already know this.”

GitHub is one of those tools that everyone claims is “easy,” but is almost never explained from first principles. As a result, many people learn just enough to get by, without ever understanding what problem GitHub is actually solving.

Today, we fix that.



# What Problem Does GitHub Solve?

Before GitHub existed, people still wrote code, reports, papers, and data analysis scripts. The problem was not creating files. The problem was **collaboration, history, and trust**.

Imagine working on a project where:
- you make a change and accidentally break something
- you want to go back to “how it worked yesterday,” but you can’t
- two people edit the same file and overwrite each other
- you email files named `final_v3_REAL_FINAL_v2.py`

These problems scale badly. As projects grow, you need a system that remembers:
- what changed
- who changed it
- when it changed
- why it changed

GitHub exists to solve *exactly* this problem.



# What GitHub Is (Plain English)

GitHub is a website that stores **repositories**.

A repository is just:
- a folder of files
- with a full history of every change saved to it

A repository hosted on GitHub gives you three crucial things that a normal folder does not:
1. Version history
2. Collaboration
3. A shared source of truth

You can think of GitHub as:
> Google Docs, but for projects made of files instead of paragraphs.

The important mental shift is this: GitHub is not just storage. It is *memory*.



# Git vs GitHub (This Confuses Everyone)

Git and GitHub are related, but not the same thing.

**Git** is a version control system. It is the underlying technology that tracks changes to files.

**GitHub** is a website that hosts Git repositories and adds:
- a user interface
- collaboration tools
- visibility
- sharing

You can use Git without GitHub, but in this course we will use GitHub as the central place where your work lives.



# Why We Use GitHub in This Course

In DS2002, GitHub is not optional busywork. It is the backbone of how you will organize and submit your work.

We use GitHub because:
- it creates a permanent record of your progress
- it allows instructors to see your work evolve
- it mirrors real-world data science workflows
- it prevents lost files and silent overwrites

By the end of the course, your GitHub account will act as a **portfolio** of your work.



# Core Concepts (No Commands Yet)

Before touching buttons, you need the vocabulary.

- A **repository (repo)** is a project folder.
- A **commit** is a saved checkpoint.
- A **history** is a timeline of commits.
- A **remote** is the copy stored on GitHub’s servers.
- A **local copy** is the version on your computer.

You do not need to memorize these yet. Just recognize the pattern:
GitHub is about **checkpoints and history**, not about typing commands.
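
If it helps to see the vocabulary as data, here is a toy model in Python. It is only a sketch for building intuition: the `Commit` and `Repository` classes are made up for this notebook and are not how Git or GitHub store anything internally. The "remote" is faked by copying the local history.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """A saved checkpoint: a snapshot of the files plus who saved it and why."""
    message: str
    author: str
    files: dict  # filename -> file contents at this checkpoint


@dataclass
class Repository:
    """A project folder together with its timeline of commits."""
    name: str
    files: dict = field(default_factory=dict)    # current working files
    history: list = field(default_factory=list)  # oldest commit first

    def commit(self, message: str, author: str) -> Commit:
        # Copy the files so later edits cannot alter this checkpoint.
        checkpoint = Commit(message, author, dict(self.files))
        self.history.append(checkpoint)
        return checkpoint


# The "local copy" is the version on your computer.
local = Repository("DS2002-YourLastName")
local.files["README.md"] = "# DS2002 Coursework\n"
local.commit("Add README", author="student")

local.files["README.md"] += "\nNotebooks and assignments.\n"
local.commit("Describe the repository", author="student")

# The "remote" is the copy on GitHub's servers (faked here by copying).
remote = Repository(local.name, dict(local.files), list(local.history))

messages = [c.message for c in remote.history]
print(messages)

# Going back to an earlier checkpoint is a lookup in the history.
first_readme = local.history[0].files["README.md"]
print(first_readme)
```

Notice that nothing here required typing a Git command. The idea is the pattern: edit, checkpoint, and a timeline you can look back through.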



# Public vs Private Repositories

A public repository can be seen by anyone.
A private repository can only be seen by people you allow.

For this course:
- most of your work will be public
- public repos help you build a portfolio
- nothing sensitive should be stored on GitHub

This mirrors industry practice: share what should be shared, protect what should not.
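
What counts as "sensitive" is a judgment call, but some filenames are a warning sign. The sketch below flags a few of them before you publish. The pattern list is a hypothetical, deliberately short example, not an official or complete list, and a clean result does not prove a file is safe to share.

```python
from fnmatch import fnmatch

# Hypothetical examples of filename patterns that often hold secrets.
SENSITIVE_PATTERNS = [".env", "*.pem", "*.key", "id_rsa", "*password*", "*secret*"]


def looks_sensitive(filename: str) -> bool:
    """Return True if the filename matches any pattern in SENSITIVE_PATTERNS."""
    name = filename.lower()
    return any(fnmatch(name, pattern) for pattern in SENSITIVE_PATTERNS)


def files_to_review(filenames):
    """Return the filenames worth a second look before making them public."""
    return [f for f in filenames if looks_sensitive(f)]


candidates = ["README.md", "lab01.ipynb", ".env", "db_password.txt", "server.pem"]
flagged = files_to_review(candidates)
print(flagged)
```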



# Creating a GitHub Account

To use GitHub, you need an account.

Steps:
1. Go to https://github.com
2. Click “Sign up”
3. Choose a username you are comfortable sharing publicly
4. Use an email you will keep long-term
5. Verify your email

Your username becomes part of your professional identity. Choose accordingly.



# Your First Repository (Conceptually)

When you create a repository, you are saying:

“This is a project I care about, and I want its history tracked.”

A repository starts empty or with a few starter files. Over time, you will add:
- notebooks
- scripts
- data (when appropriate)
- documentation

In this course, **one repository** will act as the home for most of your work.



# Lab: Create Your First GitHub Repository

This is a guided lab. Go slowly.

Step 1:
Log in to GitHub.

Step 2:
Click the **+** icon in the top-right corner.
Choose **New repository**.

Step 3:
Name the repository:
`DS2002-YourLastName`

Step 4:
Add a description:
“DS2002 coursework and notebooks.”

Step 5:
Check:
✔ Public  
✔ Add a README file

Then click **Create repository**.
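
You do not need code for this lab, but it can help to see the choices you just made written down as data. The sketch below collects them using the field names that, to the best of my knowledge, GitHub's REST API uses when creating a repository (`name`, `description`, `private`, `auto_init`). It only builds and prints the settings and never contacts GitHub. The function name and the last name `Lovelace` are placeholders.

```python
import json


def new_repo_settings(last_name: str) -> dict:
    """Collect the lab's choices as a dictionary of repository settings."""
    return {
        "name": f"DS2002-{last_name}",                       # Step 3
        "description": "DS2002 coursework and notebooks.",   # Step 4
        "private": False,                                    # Step 5: Public
        "auto_init": True,                                   # Step 5: Add a README file
    }


settings = new_repo_settings("Lovelace")  # placeholder last name
print(json.dumps(settings, indent=2))
```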



# What Is a README?

A README is the front page of your repository.

It explains:
- what the project is
- why it exists
- how to navigate it

A good README turns a pile of files into a story.

For now, your README can be simple. We will improve it later.
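
As a rough illustration of the "what, why, how to navigate" structure, the sketch below assembles README text from those three pieces. The function name, the section headings, and the `labs/` and `assignments/` folders are assumptions made up for this example. Your repository does not need to match them.

```python
def build_readme(title: str, what: str, why: str, contents: dict) -> str:
    """Assemble README text: what the project is, why it exists, how to navigate it."""
    lines = [
        f"# {title}",
        "",
        what,
        "",
        "## Why this exists",
        "",
        why,
        "",
        "## How to navigate",
        "",
    ]
    for path, description in contents.items():
        lines.append(f"- `{path}`: {description}")
    return "\n".join(lines) + "\n"


readme_text = build_readme(
    title="DS2002 Coursework",
    what="This repository contains my notebooks and assignments for DS2002.",
    why="It keeps a tracked history of my work in one place.",
    contents={"labs/": "guided lab notebooks", "assignments/": "graded work"},
)
print(readme_text)
```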



# Editing Files Directly on GitHub

GitHub allows you to edit files directly in the browser.

This is useful for:
- small changes
- quick fixes
- learning without installing tools

Later, we will connect GitHub to your local machine. For now, browser-based editing is enough.



# Lab: Edit Your README

1. Click on `README.md`
2. Click the pencil (Edit) icon
3. Replace the contents with:

       # DS2002 Coursework

       This repository contains my notebooks and assignments for DS2002.

4. Scroll down
5. Write a short commit message:
“Update README with course description”
6. Click **Commit changes**



# What You Just Did (Important)

You just:
- edited a file
- created a commit
- added to your repository history

This is the core GitHub workflow.

Everything else builds on this.
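
To make "a commit records what changed" concrete, the sketch below compares a README before and after an edit using Python's standard `difflib` module. The "before" text is an assumption about what a starter README might contain. GitHub shows a similar line-by-line view when you open a commit, though this code is not how GitHub produces it.

```python
import difflib

# Assumed starting contents, used here only as a placeholder.
before = "# DS2002-YourLastName\n"
after = (
    "# DS2002 Coursework\n"
    "\n"
    "This repository contains my notebooks and assignments for DS2002.\n"
)

diff_lines = list(
    difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="README.md (before)",
        tofile="README.md (after)",
    )
)
print("".join(diff_lines))

# Lines starting with "-" were removed; lines starting with "+" were added.
# The "---" and "+++" lines are file headers, so they are skipped.
removed = [line for line in diff_lines if line.startswith("-") and not line.startswith("---")]
added = [line for line in diff_lines if line.startswith("+") and not line.startswith("+++")]
print(len(removed), "line removed,", len(added), "lines added")
```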



# Using This Repository for the Whole Course

From this point forward:
- all labs will live here
- all notebooks will live here
- this repo is your single source of truth

When you submit work, you will often submit a GitHub URL rather than a file upload.

This mirrors how professionals share work.
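
A repository URL on GitHub has the shape `https://github.com/<owner>/<repo>`. As a small sketch, the function below checks that a submitted link has that shape and pulls out the two parts. The function name and the example username are hypothetical, and the check says nothing about whether the repository actually exists.

```python
from urllib.parse import urlparse


def parse_repo_url(url: str):
    """Return (owner, repo) for https://github.com/<owner>/<repo>, else None."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or parsed.netloc.lower() != "github.com":
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    # Some links end in ".git"; drop it so the name matches the repository.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


submission = "https://github.com/example-student/DS2002-Lovelace"  # hypothetical
owner_and_repo = parse_repo_url(submission)
print(owner_and_repo)
```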



# Final Takeaways

GitHub exists to protect your work from chaos.

It gives you:
- history
- accountability
- collaboration
- a professional footprint

You do not need to master GitHub today.
You only need to understand why it exists and how to start.

Everything else builds naturally from here.
