<a href="https://colab.research.google.com/github/lerouxl/Wandb_files_cleaning/blob/main/Clean_WandB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Clean WandB

This code removes unused artifacts (saved model weights) from WandB, freeing up storage space.

In [2]:
!pip install wandb

Successfully installed GitPython-3.1.27 docker-pycreds-0.4.0 gitdb-4.0.9 pathtools-0.1.2 sentry-sdk-1.5.12 setproctitle-1.2.3 shortuuid-1.0.9 smmap-5.0.0 wandb-0.12.16


In [87]:
import wandb
from tqdm.notebook import tqdm
import time

Create the function that will remove files

In [83]:
def clean_model_weight_files(file: wandb.apis.public.File, every_x_epochs: int = 5, dry_run : bool = True, verbos : bool = True) -> None:
  """
  Remove saved model weights from the WandB storage API.
  Files whose name starts with "best" or "latest" are ignored.
  Args:
    - file: a wandb API file. It is removed unless its epoch number is a multiple
            of every_x_epochs (n * every_x_epochs, with n an int).
            The file name has to contain the epoch number.
    - every_x_epochs: int. The model weights are kept every x epochs. Other weights are removed.
    - dry_run: bool. If True, no file is deleted; the messages (if verbos is True) only show which files would be removed.
    - verbos: bool. If True, print a message for every file removed.
  """
  # Weight files whose name has no epoch number (to skip)
  # UPDATE HERE to match your file naming convention
  SKIP_FILES = ["latest", "best"]

  file_name = file.name

  # Extract the epoch number.
  # UPDATE HERE to match your file naming convention
  # My naming convention is X_net.pth, so I only grab what comes before the first _
  epoch_number = file_name.split("_")[0]

  # Check if this is the "best" or "latest" checkpoint
  if epoch_number in SKIP_FILES:
    # If so, there is nothing to do
    return

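  # Raises ValueError if the prefix is not a number (unexpected file name)
  epoch_number = int(epoch_number)
  # Delete only when the epoch is not a multiple of every_x_epochs
  to_del = bool(epoch_number % every_x_epochs)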

  if to_del:
    message = f"Removing {file_name} from wandb"

    if dry_run:
      message = "DRY RUN " + message
    else:
      file.delete()
    
    if verbos:
      print(message)
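
The keep-or-delete rule above can be isolated into a pure function, which makes it easy to check against a few file names before touching remote storage. This is a minimal sketch, assuming the `X_net.pth` naming convention described above. `should_delete` is a hypothetical helper (not part of this notebook or of the wandb API), and unlike the function above it keeps a file with a non-numeric prefix instead of raising an error.

```python
SKIP_PREFIXES = ("latest", "best")  # same prefixes skipped by the function above


def should_delete(file_name: str, every_x_epochs: int = 5) -> bool:
    """Return True if a weight file named like '<epoch>_net.pth' should be removed.

    Hypothetical helper, for illustration only.
    """
    prefix = file_name.split("_")[0]
    if prefix in SKIP_PREFIXES:
        return False
    try:
        epoch_number = int(prefix)
    except ValueError:
        # Unknown naming pattern: keep the file rather than guess.
        return False
    # Keep epochs that are multiples of every_x_epochs, delete the others.
    return epoch_number % every_x_epochs != 0


for name in ["5_net.pth", "7_net.pth", "latest_net.pth", "model.pth"]:
    print(name, should_delete(name))
```

Only `7_net.pth` is flagged for deletion here: epoch 5 is a multiple of 5, `latest` is skipped, and `model.pth` has no epoch number.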

Select the project, then, for each run, remove all saved models except one every 5 epochs.

In [96]:
# When using artifact API methods that don't have an entity or project
# argument, you must provide that information when instantiating wandb.Api
api = wandb.Api(overrides={"project": "Tolerance-prediction-final"})  # , "entity": "geoff"})

for run in tqdm(api.runs(), desc="Run progression"):
  for file in tqdm(run.files(), leave=False, desc=f"Files progression for {run.name}"):
    if file.name.endswith(".pth"):
      # dry_run=False deletes the files for real; use dry_run=True to preview first
      clean_model_weight_files(file=file, every_x_epochs=5, dry_run=False, verbos=False)
    time.sleep(0.2)  # Throttle requests to avoid being banned by the WandB API

[Output: tqdm progress bars, one "Run progression" bar over 23 runs and one "Files progression" bar per run, with 14 to 28 files per run]
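
Before running the loop above with `dry_run = False`, it can help to estimate what would be removed. The sketch below is one possible way to do that, not part of the original notebook. `summarize_deletions` is a hypothetical helper, and it assumes each file object exposes a `name` and a `size` in bytes (an assumption about the wandb file objects that should be checked against your installed version). Stand-in objects are used so the example runs without network access.

```python
from types import SimpleNamespace


def summarize_deletions(files, every_x_epochs: int = 5):
    """Return (names, total_bytes) of the '.pth' files the cleaning rule would remove.

    `files` is any iterable of objects with `.name` and `.size` attributes
    (assumed to mirror wandb file objects; size in bytes).
    """
    names, total_bytes = [], 0
    for f in files:
        if not f.name.endswith(".pth"):
            continue
        prefix = f.name.split("_")[0]
        # Skip "latest"/"best" and any name without a numeric epoch prefix.
        if prefix in ("latest", "best") or not prefix.isdecimal():
            continue
        if int(prefix) % every_x_epochs:
            names.append(f.name)
            total_bytes += f.size
    return names, total_bytes


# Stand-in objects (hypothetical names and sizes) instead of real wandb files.
fake_files = [
    SimpleNamespace(name="4_net.pth", size=1000),
    SimpleNamespace(name="5_net.pth", size=1000),
    SimpleNamespace(name="latest_net.pth", size=1000),
    SimpleNamespace(name="config.yaml", size=10),
]
names, total_bytes = summarize_deletions(fake_files)
print(names, total_bytes)
```

With real data, `run.files()` could be passed in place of `fake_files`, provided the assumption about `size` holds.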

List all runs available

In [None]:
for i in api.runs():
  print(i)

List all files of one run (here, the second run)

In [None]:
for i in api.runs()[1].files():
  print(i)