# CS189 Content Downloader

This notebook works on **Google Colab** and **local machines**. It fetches **only the folder you ask for** from the public course repo and puts it under `<gdrive_base_folder>/<folder_to_fetch>` (the base folder defaults to `cs189`).

## Where files go
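- **Colab**: `/content/drive/MyDrive/<gdrive_base_folder>/<folder_to_fetch>` (by default `/content/drive/MyDrive/cs189/<folder_to_fetch>`)
- **Local**: `./<gdrive_base_folder>/<folder_to_fetch>` (by default `./cs189/<folder_to_fetch>`)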
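
As a rough illustration of the two cases above, the destination can be computed like this. `resolve_destination` and its `in_colab` flag are hypothetical names used only for this sketch; the notebook itself detects Colab automatically rather than taking a flag.

```python
import os

COLAB_DRIVE_ROOT = "/content/drive/MyDrive"  # where Colab mounts Google Drive


def resolve_destination(folder_to_fetch, base_folder="cs189", in_colab=False):
    """Hypothetical helper: return the directory a fetched folder should land in."""
    folder = folder_to_fetch.strip("/")  # tolerate leading/trailing slashes
    if in_colab:
        base = os.path.join(COLAB_DRIVE_ROOT, base_folder)
    else:
        base = os.path.abspath(base_folder)  # relative to the current directory
    return os.path.join(base, folder)


# On Colab (Linux) this prints /content/drive/MyDrive/cs189/hw/hw1
print(resolve_destination("hw/hw1", in_colab=True))
# Locally the result depends on your current working directory
print(resolve_destination("hw/hw1"))
```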

## How to use
1) Make a copy of this notebook!
2) Run the cells below to set up the content downloader
3) Set `folder_to_fetch` to the folder you want from the public course repo, and `gdrive_base_folder` to the location in *your* Google Drive where the files should be downloaded.
  > Ex: `folder_to_fetch` can be `"hw/hw1"` or `"lec/lec02"`
  >
  > Ex: `gdrive_base_folder` can be `"school/cs189"`. The content is then downloaded to `MyDrive/school/cs189/<folder_to_fetch>` in your Google Drive.

4) Call `fetch_repo_folder(folder_to_fetch, gdrive_base_dir=gdrive_base_folder)`.
  > Note: you can also use this notebook to fetch newly updated content from the public course repo. If `folder_to_fetch` already exists under `gdrive_base_folder`, `fetch_repo_folder` first copies the existing folder to `<folder_to_fetch>_legacy_<timestamp>` next to it as a backup, then replaces it with the latest version.
5) You should see a new folder in your Google Drive named whatever you set `gdrive_base_folder` to be. This folder will contain a `.repo` folder as well as any folders you fetched with the `fetch_repo_folder` function!
  > For context, the `.repo` folder is a barebones clone of the public course repo that contains all information needed to fetch the latest materials without actually copying the entire repo over. This way, you can download specific folders one-by-one without having to take up space in your drive by copying the whole public repo!
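
To make the `.repo` idea more concrete, here is a minimal sketch of the kind of git commands involved in a partial, sparse clone. `sparse_fetch_commands` is a hypothetical helper written for this illustration; it only builds and prints the commands and does not run git. It uses `git -C <dir>` to select the clone directory, which is an assumption of this sketch, and it leaves out the update, backup, and Git LFS handling that the notebook performs.

```python
import shlex


def sparse_fetch_commands(folder_to_fetch, repo_url, clone_dir=".repo", branch="main"):
    """Hypothetical helper: list the git commands for a sparse fetch of one folder."""
    folder = folder_to_fetch.strip("/")
    return [
        # Clone history without file contents and without checking anything out
        ["git", "clone", "--filter=blob:none", "--no-checkout", repo_url, clone_dir],
        # Turn on sparse checkout so only selected folders are materialized
        ["git", "-C", clone_dir, "sparse-checkout", "init", "--cone"],
        # Select the single folder we care about
        ["git", "-C", clone_dir, "sparse-checkout", "set", folder],
        # Check out the branch; only the selected folder appears on disk
        ["git", "-C", clone_dir, "checkout", branch],
    ]


for cmd in sparse_fetch_commands("hw/hw1", "https://github.com/BerkeleyML/fa25-student.git"):
    print(shlex.join(cmd))
```

Because blobs are filtered out at clone time and only one folder is checked out, the `.repo` folder stays small even when the course repo grows.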

### Example (homework 1)
```python
folder = "hw/hw1"
path = fetch_repo_folder(folder)  # -> /.../cs189/hw/hw1
print("Ready at:", path)
```

In [1]:
#@title Check for git LFS
import shutil

if "google.colab" in str(get_ipython()):
    # On Colab, install and initialize Git LFS directly
    print("This is a Google Colab notebook.")
    !apt-get install -qq git-lfs
    !git lfs install
elif shutil.which("git-lfs") is None:
    # On a local machine, only warn; installation is left to the user
    print("⚠️ Git LFS is not installed. Some large files may appear as pointer stubs.\n"
          "   To install:\n"
          "     • macOS:   brew install git-lfs\n"
          "     • Ubuntu:  sudo apt-get install git-lfs\n"
          "     • Windows: choco install git-lfs\n"
          "Then run: git lfs install && git lfs pull")

This is a Google Colab notebook.
Git LFS initialized.


In [2]:
#@title Setup
import os
import shutil
import subprocess
from datetime import datetime

def fetch_repo_folder(
    folder_to_fetch, # e.g. "hw/hw1"
    repo_owner="BerkeleyML",
    repo_name="fa25-student",
    branch="main",
    gdrive_base_dir="cs189"  # gdrive folder from MyDrive where folder_to_fetch will be placed
):
    """
    Sparse-checkout only `folder_to_fetch` from https://github.com/<repo_owner>/<repo_name>.git
    and copy it to <gdrive_base_dir>/<folder_to_fetch>. If that destination already exists,
    back it up as <folder_to_fetch>_legacy_<timestamp> before overwriting. Always pulls the latest.
    Returns the absolute path to the fetched folder.
    """
    # Detect Colab
    try:
        import google.colab
        from google.colab import drive
        IS_COLAB = True
    except Exception:
        IS_COLAB = False

    # Set base dir based on local or Colab context
    if IS_COLAB:
        if not os.path.ismount("/content/drive"):
            drive.mount("/content/drive", force_remount=False)
        gdrive_base_dir_abs = os.path.join("/content/drive/MyDrive", gdrive_base_dir) # /content/drive/MyDrive/<gdrive_base_dir>
    else:
        gdrive_base_dir_abs = os.path.abspath(gdrive_base_dir)

    os.makedirs(gdrive_base_dir_abs, exist_ok=True)
    print(f"Working base directory: {gdrive_base_dir_abs}")

    repo_url = f"https://github.com/{repo_owner}/{repo_name}.git"
    folder_to_fetch = folder_to_fetch.strip("/")
    hidden_repo_path = os.path.join(gdrive_base_dir_abs, ".repo")   # hidden clone

    def run(cmd, cwd=None, check=True):
        res = subprocess.run(cmd, cwd=cwd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
        if check and res.returncode != 0:
            raise RuntimeError(f"{' '.join(cmd)}\n---\n{res.stdout}")
        return res.stdout

    # Clone or update (always ensures .repo is up to date)
    if not os.path.exists(hidden_repo_path):
        run(["git", "clone", "--filter=blob:none", "--no-checkout", repo_url, hidden_repo_path])
    else:
        # Fetch and reset, so sparse-folder always reflects latest on GitHub
        run(["git", "fetch", "origin", branch], cwd=hidden_repo_path)
        run(["git", "reset", "--hard", f"origin/{branch}"], cwd=hidden_repo_path)
        run(["git", "clean", "-xdf"], cwd=hidden_repo_path)

    # Ensure sparse mode and select folder
    try:
        run(["git", "sparse-checkout", "init", "--cone"], cwd=hidden_repo_path)
    except RuntimeError:
        run(["git", "sparse-checkout", "init"], cwd=hidden_repo_path)

    # Remove post-checkout hook if it exists to avoid hook execution errors on Google Colab
    hook = os.path.join(hidden_repo_path, ".git", "hooks", "post-checkout")
    if os.path.exists(hook):
        os.remove(hook)

    run(["git", "sparse-checkout", "set", folder_to_fetch], cwd=hidden_repo_path)
    run(["git", "checkout", branch], cwd=hidden_repo_path)

    # Pull LFS-tracked files; `run` raises RuntimeError if git-lfs is missing or the pull fails
    try:
        run(["git", "lfs", "pull"], cwd=hidden_repo_path)
    except RuntimeError:
        print("⚠️ Some files may use Git LFS. If you see small pointer files, run:\n"
              "   !apt-get install git-lfs -qq && git lfs install && git lfs pull")

    src = os.path.join(hidden_repo_path, folder_to_fetch)
    dst = os.path.join(gdrive_base_dir_abs, folder_to_fetch)
    # If src not materialized for any reason, try forcing it
    if not os.path.exists(src):
        run(["git", "checkout", branch, "--", folder_to_fetch], cwd=hidden_repo_path, check=False)

    if not os.path.exists(src):
        raise FileNotFoundError(f"'{folder_to_fetch}' not found in {repo_owner}/{repo_name}@{branch}")

    # Backup existing destination folder_to_fetch if needed
    if os.path.exists(dst):
        timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        legacy_dst = f"{dst}_legacy_{timestamp}"
        print(f"Backing up existing folder {dst} to {legacy_dst}")
        shutil.copytree(dst, legacy_dst)

        # Remove existing so we can overwrite
        shutil.rmtree(dst)

    # Copy from .repo/folder_to_fetch into the requested location
    # (shutil.copytree is portable, unlike shelling out to `cp`)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copytree(src, dst)

    print(f"Fetched '{folder_to_fetch}' from GitHub branch '{branch}'...\n...to Google Drive location: {dst}")
    return os.path.abspath(dst)

In [5]:
#@title Run the following cell to download content from the public course repo to your Google Drive:

folder_to_fetch = "hw/hw4" # TODO: specify which folder you want to fetch from the cs189 student repo
gdrive_base_folder = "cs189" # TODO: specify the folder in your Google Drive where you want the repo to be cloned and folder_to_fetch to be located
path = fetch_repo_folder(folder_to_fetch, gdrive_base_dir=gdrive_base_folder)
print("Ready at:", path)

Working base directory: /content/drive/MyDrive/cs189
Fetched 'hw/hw4' from GitHub branch 'main'...
...to Google Drive location: /content/drive/MyDrive/cs189/hw/hw4
Ready at: /content/drive/MyDrive/cs189/hw/hw4


In [6]:
#@title List the content in your Google Drive folder
full_path = path  # absolute path returned by fetch_repo_folder; valid on Colab and locally

# Navigate to folder and list files
%cd {full_path}
%ls

/content/drive/MyDrive/cs189/hw/hw4
attention.png                   imagenet_class_id_to_name.json
chimpanzee_train.txt            requirements.txt
dna_test.txt                    resnet_figure2.png
dog_train.txt                   transformer_architecture.png
human_train.txt                 urbansound8k_fold1_train.zip
hw4_paper_question_student.pdf  urbansound8k_test.csv
hw4_paper_question_student.tex  urbansound8k_test.zip
hw4_part1.ipynb                 vision_transformer.png
hw4_part2.ipynb
