## Course Tools Notebook

This notebook consolidates some tools you can use for freeing up disk space and installing or updating the class `introdl` package.

### Update or Install Course Package

Run the following cell.  It is safe to run even if the package is already up to date.  If a kernel is already running in this or any other notebook, you may need to restart it to load the newest version of the package.

In [0]:
!pip install --upgrade introdl
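
To confirm which version of the package is installed, a quick check along these lines may help.  This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration and is not part of the course package.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints None if introdl is not installed in the current environment.
print("introdl version:", installed_version("introdl"))
```

A kernel that was already running keeps whatever version it imported earlier, so restart it if the number printed here is newer than what your notebook seems to be using.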

### Freeing Up Disk Space

Three locations hold files that you don't need to keep beyond the assignment you're currently working on:

* **home_workspace** - this directory contains model checkpoints, datasets, and pretrained model weights for things you ran on the home server (mostly from the earliest assignments).  This folder is synced between the home server and the compute servers.  There's not much need to save these things after the assignment has been submitted and graded.  Removing the data here does not affect your Homework or Lessons folders.
* **cs_workspace** - this directory exists separately on each of your compute servers and is where we try to save model checkpoints, datasets, and pretrained model weights.  It does not synchronize with the home server.  Again, there's no need to keep this stuff after an assignment has been submitted and graded.
* **Hugging Face Cache** - this directory exists separately on both the home and compute servers and is located at `~/.cache/huggingface/hub`.  It's where Hugging Face caches every model you download.  This seems to be the biggest consumer of disk space as we play with NLP models.  **NOTE:** If you update the course package, the cache will instead be in `~/home_workspace/downloads` on the home server and `~/cs_workspace/downloads` on a compute server.  You can then clean it with the instructions below for those directories.

**NOTE:**  Removing files from these directories does not affect your Homework or Lessons folders.  You can always rerun those notebooks to reproduce the results in the future so you don't really need to save all the old checkpoint files.

**NOTE 2:**  If you're running on your own machine, you've likely set up things differently, but I'll trust that you can sort it out :)
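
Before deleting anything, it can be useful to see how much space each of these locations is using.  The following is a minimal sketch, assuming the three default paths described above; the helper name `dir_size_bytes` is hypothetical.  It skips symbolic links so that files in the Hugging Face cache, which links snapshots to shared blobs, are not counted twice.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Total size in bytes of regular files under path (symlinks are skipped)."""
    root = Path(path).expanduser()
    total = 0
    # os.walk yields nothing for a missing directory, so the result is 0.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# Default locations from the list above; adjust if your setup differs.
for location in ["~/home_workspace", "~/cs_workspace", "~/.cache/huggingface/hub"]:
    size_gb = dir_size_bytes(location) / 1e9
    print(f"{location}: {size_gb:.2f} GB")
```

A location that does not exist on the current server simply reports 0.00 GB.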


#### Clear home_workspace

Changes made to this directory will be synced between your home server and each of your compute servers.

If you'd prefer to be more selective about what you remove, use the Explorer in CoCalc to remove files.  The only files you should consider keeping are checkpoint files saved in home_workspace/models.  Everything else is easily downloaded when needed.

Run the cell below to remove all the checkpoint files, datasets, pre-trained model weights, and also restore the original directory structure in home_workspace.  You can run this code from either the home server or a compute server. 

In [2]:
import shutil
from pathlib import Path

# Resolve the full path to ~/home_workspace
workspace_path = Path("~/home_workspace").expanduser().resolve()

# Ensure the directory exists before proceeding
if workspace_path.exists() and workspace_path.is_dir():
    # Remove all contents inside ~/home_workspace
    for item in workspace_path.iterdir():
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)  # Remove directory and its contents
        else:
            item.unlink()  # Remove file or symbolic link

    print(f"Cleared all contents inside: {workspace_path}")

    # Recreate the original subdirectories
    for subdir in ["data", "downloads", "models"]:
        new_dir = workspace_path / subdir
        new_dir.mkdir(exist_ok=True)
        print(f"Created: {new_dir}")

else:
    print(f"Directory does not exist: {workspace_path}")


Cleared all contents inside: /home/user/home_workspace
Created: /home/user/home_workspace/data
Created: /home/user/home_workspace/downloads
Created: /home/user/home_workspace/models
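
If you want to keep your checkpoints but clear everything else without clicking through the Explorer, something like the following could work.  It is only a sketch: the function name `clear_except` and its `keep` parameter are made up for illustration, and the call at the bottom is commented out so nothing is deleted until you choose to run it.

```python
import shutil
from pathlib import Path


def clear_except(workspace, keep=("models",)):
    """Delete top-level items in workspace except those named in keep.

    Returns the sorted names of the items that were removed and recreates
    the data, downloads, and models subdirectories afterwards.
    """
    root = Path(workspace).expanduser()
    removed = []
    if not root.is_dir():
        return removed
    for item in root.iterdir():
        if item.name in keep:
            continue  # Leave kept items (for example, models) untouched
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)  # Remove directory and its contents
        else:
            item.unlink()  # Remove file or symbolic link
        removed.append(item.name)
    # Restore the standard directory structure
    for subdir in ("data", "downloads", "models"):
        (root / subdir).mkdir(exist_ok=True)
    return sorted(removed)


# Uncomment to clear ~/home_workspace while keeping the models folder:
# print(clear_except("~/home_workspace"))
```

The same function can be pointed at `~/cs_workspace` when run on a compute server.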


#### Clear cs_workspace (must be on compute server)

This works the same as clearing home_workspace, but you must run this code on each compute server because this folder is not synced between servers.  Again, you can be more selective by using the Explorer running on the compute server.  The only files you should consider keeping are checkpoint files saved in cs_workspace/models.  Everything else is easily downloaded when needed.

Run the cell below to remove all the checkpoint files, datasets, pre-trained model weights, and also restore the original directory structure in cs_workspace.  You must run this code on the compute server whose cs_workspace you want to clear.

In [3]:
import shutil
from pathlib import Path

# Resolve the full path to ~/cs_workspace
workspace_path = Path("~/cs_workspace").expanduser().resolve()

# Ensure the directory exists before proceeding
if workspace_path.exists() and workspace_path.is_dir():
    # Remove all contents inside ~/cs_workspace
    for item in workspace_path.iterdir():
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)  # Remove directory and its contents
        else:
            item.unlink()  # Remove file or symbolic link

    print(f"Cleared all contents inside: {workspace_path}")

    # Recreate the original subdirectories
    for subdir in ["data", "downloads", "models"]:
        new_dir = workspace_path / subdir
        new_dir.mkdir(exist_ok=True)
        print(f"Created: {new_dir}")

else:
    print(f"Directory does not exist: {workspace_path}")


Cleared all contents inside: /home/user/cs_workspace
Created: /home/user/cs_workspace/data
Created: /home/user/cs_workspace/downloads
Created: /home/user/cs_workspace/models


#### Clear the Hugging Face Cache

Run the cell below.  Be careful about making changes to the path because you don't want to accidentally delete the wrong files. (Deletion is permanent.)  Even if you have updated the course package so that the cache now lives in one of your workspace directories, this cell is still worth running because it removes older models cached in the original location.

In [1]:
import shutil
from pathlib import Path

# Resolve the full path
folder_path = Path("~/.cache/huggingface/hub").expanduser().resolve()

# Ensure the folder exists before attempting deletion
if folder_path.is_dir():
    shutil.rmtree(folder_path)  # Remove the folder and everything in it
    print(f"Removed: {folder_path}")
else:
    print(f"Folder does not exist: {folder_path}")


Removed: /home/user/.cache/huggingface/hub
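
To see what is in the cache before deleting it (or to confirm it is empty afterwards), a listing like the one below may help.  This is a sketch that relies only on the standard library and on the cache's folder naming, where each downloaded repository sits in a folder such as `models--google--flan-t5-small`.  The helper name `list_cached_repos` is hypothetical, and a repository whose own name contains `--` would be displayed incorrectly.

```python
import os
from pathlib import Path


def list_cached_repos(hub_dir="~/.cache/huggingface/hub"):
    """Return (kind, repo_id, size_in_bytes) for each repo folder in the cache."""
    hub = Path(hub_dir).expanduser()
    if not hub.is_dir():
        return []
    repos = []
    for entry in sorted(hub.iterdir()):
        # Repo folders look like "models--org--name"; skip anything else.
        if not entry.is_dir() or "--" not in entry.name:
            continue
        kind, _, rest = entry.name.partition("--")
        repo_id = rest.replace("--", "/")
        size = 0
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                file_path = os.path.join(dirpath, name)
                # Snapshots are symlinks to blobs; count each blob only once.
                if not os.path.islink(file_path):
                    size += os.path.getsize(file_path)
        repos.append((kind, repo_id, size))
    return repos


for kind, repo_id, size in list_cached_repos():
    print(f"{kind:10s} {repo_id:50s} {size / 1e9:.2f} GB")
```

If the course package has moved your cache into a workspace `downloads` folder, pass that folder's path as `hub_dir` instead.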


#### Remove Datasets and Models from Other Folders

Use the Explorer on either a home or compute server and look for datasets or model files.  You can delete these.  Running your notebook again will download the necessary files.  For example, many of you created copies of the Flowers102 dataset in your Homework_05 folder.  You can delete those copies.  In the future, use DATA_PATH as the root directory for all torchvision datasets so that downloads land in your workspace instead.
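
If hunting through folders by hand is tedious, a short script can point you at the largest files.  The sketch below assumes your homework lives at `~/Homework_05`, which is a guess about your folder layout; the helper name `find_large_files` and the 50 MB threshold are also made up for illustration.  It only lists files and does not delete anything.

```python
import os
from pathlib import Path


def find_large_files(root, min_mb=50):
    """Return (size_in_bytes, path) for files of at least min_mb, largest first."""
    root = Path(root).expanduser()
    min_bytes = min_mb * 1024 * 1024
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if os.path.islink(file_path):
                continue  # Skip symbolic links
            size = os.path.getsize(file_path)
            if size >= min_bytes:
                found.append((size, file_path))
    return sorted(found, reverse=True)


# Assumed location; change this to the folder you want to inspect.
for size, path in find_large_files("~/Homework_05"):
    print(f"{size / 1e6:10.1f} MB  {path}")
```

Review the list and then delete what you don't need with the Explorer.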

### Reset Your Compute Server

* Click the Servers button on the left side of CoCalc.
* Click the Compute Servers tab.
* Click Settings on the compute server you wish to reset.
* Click Deprovision at the bottom of the popup window and agree to the terms.
* Restart the server.  Wait several minutes.
* Make sure this notebook is running on the compute server (use the button at the top labeled Server).
* Run the cell at the top of the notebook to reinstall the course package.