# 24h Medal Plan: APTOS 2019 (QWK target ≥ 0.915)

Objective
- Reach ≥0.915 QWK via stronger single models at higher resolution, robust CV, and disciplined ensembling.

Core Training Recipe (apply to all unless overridden)
- Head: single-logit regression; thresholds post-hoc.
- Optimizer: AdamW lr=2e-4 (3e-4 for smaller backbones), wd=1e-5.
- Schedule: cosine decay + 1 epoch linear warmup.
- Loss: SmoothL1/Huber (delta=1.0). If plateau, +2 epochs with MSE.
- EMA: timm ModelEmaV2 decay=0.9996 (heavy: 0.9998), start after epoch 1. Validate/save EMA weights.
- Augmentations (Albumentations):
  - Train: RandomResizedCrop(size, scale=(0.90,1.0) at 768–896 or (0.88,1.0) at 640, ratio=(0.95,1.05)); HorizontalFlip(0.5);
    Affine(scale=(0.95,1.05), translate=(0,0.05), rotate=(-12,12), border=Reflect, p=0.7);
    RandomBrightnessContrast(0.15,0.15,p=0.7); HueSaturationValue(h=5,s=8,v=8,p=0.3); optional GaussianBlur(p=0.2);
    Normalize(ImageNet); ToTensorV2.
  - Valid: Resize(size); Normalize; ToTensorV2.
- Progressive resize (heavy): 3–4 epochs @640 then 5–6 @768/896 (halve lr at jump); tighten RRC scale min ≥0.92 at target size.
- Epochs: 8–10 effective @768/896; 12 @640; patience=2–3 on val loss (EMA). Log val QWK@[0.5,1.5,2.5,3.5] each epoch.
- Folds/Seeds: 3 folds for heavy at 768/896; 5 folds for 640; seed=42. If time remains, train second seed on best model.
- Mixed precision & memory: torch.amp fp16; channels_last; grad checkpointing when supported; gradient accumulation to effective batch≈16.
- DataLoader: num_workers=2 (train) / 4 (infer); pin_memory=True; persistent_workers=False; drop_last=True (train).
- cudnn: deterministic=True, benchmark=False; env: PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
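
The warmup-plus-cosine schedule in the recipe can be sketched as a pure function of the optimizer-step index. This is a minimal per-step illustration under stated assumptions, not the training loop itself: the name `lr_at_step`, the `steps_per_epoch` argument, and the decay floor of 0.1 × base lr are choices made for the example.

```python
import math

def lr_at_step(step, steps_per_epoch, epochs, base_lr=2e-4, eta_min_factor=0.1):
    """Learning rate for a 0-based step: linear warmup over epoch 0, cosine decay afterwards."""
    warmup_steps = steps_per_epoch
    total_steps = steps_per_epoch * epochs
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase already completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps - 1)
    eta_min = base_lr * eta_min_factor  # assumed floor for the decay
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    # Hypothetical run: 100 optimizer steps per epoch, 9 epochs.
    for s in (0, 99, 100, 500, 899):
        print(s, f"{lr_at_step(s, steps_per_epoch=100, epochs=9):.2e}")
```

Halving the learning rate at the resolution jump then amounts to calling the same function with `base_lr` divided by two for the second stage.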

Preprocessing
- Keep circle crop + Ben Graham enhancement + light CLAHE.
- Add Shades-of-Gray/Gray-World color constancy before CLAHE.
- Per-image percentile normalization: map L-channel 99th percentile to ~0.95; clamp tails.
- Zero black borders; avoid elliptical masks at inference.
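
The per-image percentile normalization can be illustrated on a single luminance channel. The following is a minimal NumPy sketch, assuming the L channel has already been extracted and scaled to [0, 1]. The function name `percentile_normalize` and the choice to clamp the lower tail at the 1st percentile (mapped to 0) are assumptions; the plan only fixes the 99th-percentile target.

```python
import numpy as np

def percentile_normalize(l_chan, hi_pct=99.0, target=0.95, lo_pct=1.0):
    """Rescale a [0, 1] luminance channel so its hi_pct percentile lands on `target`."""
    l_chan = np.asarray(l_chan, dtype=np.float32)
    lo = np.percentile(l_chan, lo_pct)  # assumed lower clamp point
    hi = np.percentile(l_chan, hi_pct)
    if hi - lo < 1e-6:
        # Flat image (e.g. all black): nothing to stretch.
        return l_chan.copy()
    # Clamp the tails first so outliers do not drive the scale.
    clipped = np.clip(l_chan, lo, hi)
    scaled = (clipped - lo) / (hi - lo) * target
    return scaled.astype(np.float32)
```

Values above the 99th percentile saturate at the target, so bright specular spots cannot dominate the dynamic range.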

Caching
- Build 768px cache (train/test) immediately; verify 2-iteration smoke train for memory headroom.

Model Shortlist (timm model names, sizes, batches, targets on 1x T4 16GB)
1) tf_efficientnetv2_l.in21k_ft_in1k
   - 640→768 (3→6 epochs). Batch: 640 bs=6; 768 bs=3; accum to eff 12–18.
   - 3 folds. Target single OOF: 0.895–0.902 (3f). ~2.8–3.2 h/fold.
2) tf_efficientnet_b6_ns
   - 640→768 (3→6 epochs). Batch: 640 bs=8; 768 bs=4; accum as needed.
   - 3 folds. Target: 0.892–0.900 (3f). ~2.8–3.2 h/fold.
3) convnext_large.fb_in22k_ft_in1k
   - 768 flat (8–9 epochs). Batch 2–3; accum to eff 12–16; drop_path_rate=0.3.
   - 3 folds. Target: 0.888–0.895. ~2.6–3.0 h/fold.
4) seresnext101_32x8d.ah_in1k
   - 768 flat (8–9 epochs). Batch 4; accum to eff 12–16.
   - 3 folds. Target: 0.886–0.892. ~2.3–2.7 h/fold.
Optional: resnest101e.in1k @640/768; target 0.884–0.890.

Triage Rules (stop early to save time)
- After 1st epoch @640: val QWK@default ≥0.80 required.
- After 1–2 epochs @768: ≥0.85 required.
- If fold0 <0.82 by epoch 3 (640) or <0.85 by epoch 2 (768), stop that model.
- If adding a model to NNLS lowers blended OOF by >0.002, exclude it from final blend.
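
The triage rules can be encoded as two small predicates so they are applied the same way on every run. This is a sketch only: `should_stop` and `keep_in_blend` are hypothetical helper names, and the epoch numbering (1-based within a stage) is an assumption.

```python
def should_stop(size, epoch, val_qwk):
    """Return True if fold-0 validation QWK misses the triage floor for this stage.

    `epoch` is 1-based within the stage; the floors follow the triage rules.
    """
    floors = {
        640: {1: 0.80, 3: 0.82},  # after epoch 1 and by epoch 3 at 640px
        768: {2: 0.85},           # by epoch 2 at 768px
    }
    floor = floors.get(size, {}).get(epoch)
    return floor is not None and val_qwk < floor

def keep_in_blend(oof_without, oof_with, tol=0.002):
    """Keep a model unless adding it lowers the blended OOF QWK by more than `tol`."""
    return (oof_without - oof_with) <= tol
```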

Inference, Calibration, Ensembling
- TTA: orig+hflip only per model (default). Multi-rot/crop only if OOF-neutral; max 5 views.
- Calibration: Prefer per-model per-fold isotonic (out_of_bounds='clip'); transform val fold and its corresponding test pass; average test transforms across folds. If not feasible, use per-model global isotonic.
- Blending: NNLS on calibrated per-model OOF EVs; clip weights to [0.05, 0.70], renormalize; cap sum of highly correlated seeds to ≤0.30–0.35.
- Thresholds:
  1) 4D Nelder–Mead with gap ≥0.12, th in [0.3,3.7]
  2) 2D grid refine on th2, th3 (±0.18 around NM solution, step=0.005)
  3) Bootstrap 200–300x and take median with gap constraints
  4) Optional th3 +0.015–0.02 safety nudge if OOF drop ≤0.0005
- Distribution alignment: optional monotonic CDF alignment (isotonic/quantile map test→OOF). Blend 0.8 aligned + 0.2 original. Use only if OOF-neutral (≤0.0005 delta).
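
The blending step can be sketched with `scipy.optimize.nnls`. This is an illustration under assumptions, not the final blender: `blend_weights` is a hypothetical helper, the inputs are assumed to be already-calibrated per-model OOF predictions, and the cap on correlated seeds is not handled here.

```python
import numpy as np
from scipy.optimize import nnls

def blend_weights(oof_matrix, y_true, w_min=0.05, w_max=0.70):
    """Fit non-negative weights on per-model OOF predictions, then clip and renormalize.

    oof_matrix: shape (n_samples, n_models), one calibrated prediction column per model.
    """
    a = np.asarray(oof_matrix, dtype=np.float64)
    b = np.asarray(y_true, dtype=np.float64)
    w, _ = nnls(a, b)  # least squares subject to w >= 0
    if w.sum() <= 0:
        # Degenerate fit: fall back to a uniform blend.
        w = np.ones_like(w)
    w = w / w.sum()
    w = np.clip(w, w_min, w_max)
    # Note: renormalizing after the clip can move a weight slightly outside the bounds.
    return w / w.sum()
```

The blended prediction is then `oof_matrix @ w`, and the same weights are applied to the per-model test predictions before thresholding.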

Semi-supervised (only if early blend ≥0.905 OOF and ≥4h left)
- Select test pseudo-labels by EV margin vs thresholds: keep samples with min distance ≥0.25 (or class 0 EV<0.2 / class 4 EV>3.8).
- Weight pseudo 0.3–0.5 vs labeled 1.0.
- Finetune top-2 models (EffNetV2-L and B6/ConvNeXt-L): 2–3 epochs at target size, lr=1e-5–5e-5, same EMA/augs; re-infer, reblend NNLS; rerun thresholds.
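
The pseudo-label selection rule can be written as a mask over the blended test predictions. A minimal sketch, assuming `ev` holds one regression output per test image and `thresholds` are the four tuned cut points; `select_pseudo_labels` is a hypothetical name.

```python
import numpy as np

def select_pseudo_labels(ev, thresholds, min_margin=0.25, lo_ev=0.2, hi_ev=3.8):
    """Return (mask, labels): which test samples are confident, and their hard labels."""
    ev = np.asarray(ev, dtype=np.float64)
    th = np.asarray(thresholds, dtype=np.float64)
    # Distance from each prediction to its nearest threshold.
    margin = np.abs(ev[:, None] - th[None, :]).min(axis=1)
    # Keep far-from-threshold samples, plus very confident class 0 / class 4 predictions.
    mask = (margin >= min_margin) | (ev < lo_ev) | (ev > hi_ev)
    labels = np.digitize(ev, bins=th)
    return mask, labels
```

Selected samples would then enter finetuning with the reduced sample weight (0.3–0.5) listed in the plan.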

Memory/OOM Guardrails @768–896
- AMP fp16; channels_last; grad checkpointing=True; accum to reach effective batch ~12–16.
- num_workers=2 train / 4 infer; pin_memory=True; persistent_workers=False.
- cudnn deterministic=True; benchmark=False.
- Expected per-GPU batch (no accum):
  - v2-L: 640 bs=6; 768 bs=3
  - b6-ns: 640 bs=8; 768 bs=4
  - convnext_large: 768 bs=2–3
  - seresnext101_32x8d: 768 bs=4
- If OOM: reduce bs by 2; remove GaussianBlur; raise RRC scale min by +0.02; disable EMA as last resort.
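
The accumulation arithmetic behind these guardrails is small enough to check by hand. The sketch below uses hypothetical helper names (`accum_steps`, `oom_fallback`) and a target effective batch of 16.

```python
import math

def accum_steps(per_gpu_batch, target_effective=16):
    """Number of micro-batches to accumulate so the effective batch reaches the target."""
    return max(1, math.ceil(target_effective / per_gpu_batch))

def oom_fallback(batch_size, step=2):
    """Next batch size to try after an OOM, never dropping below 1."""
    return max(1, batch_size - step)

if __name__ == "__main__":
    for name, bs in (("v2-L@640", 6), ("v2-L@768", 3), ("b6-ns@768", 4), ("convnext_large@768", 2)):
        n = accum_steps(bs)
        print(f"{name}: bs={bs} accum={n} effective={bs * n}")
```

Because the step count is rounded up, the effective batch can overshoot the target slightly (for example bs=6 gives 3 steps and an effective batch of 18).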

Ordered 24h Run-List (single T4)
0:00–0:20  Build 768 cache (train/test); smoke 2 iters to verify memory.
0:20–7:30  tf_efficientnetv2_l (3 folds) 640→768; triage per rules.
            After 2 folds, isotonic+NNLS with legacy best; if blend OOF <0.900, proceed but keep tight.
7:30–13:30 tf_efficientnet_b6_ns (3 folds) 640→768; reblend; aim ≥0.910 OOF.
13:30–18:30 convnext_large (3 folds) @768; reblend; drop if reduces OOF >0.002.
18:30–22:00 seresnext101_32x8d (3 folds) @768 or resnest101e if faster; reblend.
22:00–24:00 Final inference (hflip-only), per-model isotonic (per-fold if ready), NNLS (weight caps), thresholds (NM→2D th2/th3 grid→bootstrap), optional th3 nudge; optional 0.8 aligned + 0.2 raw if OOF-neutral; write 2 submissions (with/without alignment).

Logging/Discipline
- Print fold indices, epoch times, val QWK, EMA vs non-EMA, and memory stats.
- Save OOF/test EVs per model; cache blends and thresholds to .npy.
- Keep a run log (model, size, epochs, OOF mean/std, LB delta).

Stop/Abandon Criteria
- Heavy model with fold0 QWK@default <0.82 by epoch 3 (640) or <0.85 by epoch 2 (768).
- New model worsens NNLS OOF by >0.002.

Expected Outcome
- Stronger 3–4 model NNLS ensemble with robust thresholds should reach ≥0.915 QWK.

In [None]:
# Training template: tf_efficientnetv2_l.in21k_ft_in1k 640->768, 3-fold, RRC+EMA, SmoothL1, Warmup+Cosine, Grad Checkpointing
import os, sys, time, json, math, random, gc, warnings, subprocess
from pathlib import Path
import numpy as np, pandas as pd
warnings.filterwarnings('ignore')

# Ensure deps
def _pip_if_missing(pkg, import_name=None, extra=''):
    try:
        __import__(import_name or pkg)
    except Exception:
        subprocess.run([sys.executable, '-m', 'pip', 'install', pkg, *([extra] if extra else [])], check=True)

_pip_if_missing('albumentations', 'albumentations')
_pip_if_missing('timm')
_pip_if_missing('opencv-python', 'cv2')

import cv2
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import Dataset, DataLoader
from torch.cuda.amp import autocast, GradScaler
import albumentations as A
from albumentations.pytorch import ToTensorV2
import timm
from timm.utils import ModelEmaV2
from sklearn.metrics import cohen_kappa_score

# Threading/env tuning for loader throughput
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
try:
    cv2.setNumThreads(0)
    try:
        cv2.ocl.setUseOpenCL(False)
    except Exception:
        pass
except Exception:
    pass
try:
    torch.set_num_threads(4)
except Exception:
    pass

# Repro
SEED = 42
def seed_everything(seed=SEED):
    random.seed(seed); np.random.seed(seed); torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
seed_everything()
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
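# Point the HF cache at a writable dir. huggingface_hub reads these variables when it is first
# imported, so setting them after `import timm` may have no effect; build_model() also passes cache_dir explicitly.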
os.environ['HF_HOME'] = str(Path('hf_cache').resolve())
os.environ['HF_HUB_CACHE'] = str(Path('hf_cache').resolve())
os.environ['HUGGINGFACE_HUB_CACHE'] = str(Path('hf_cache').resolve())
Path('hf_cache').mkdir(parents=True, exist_ok=True)

# Paths
DF_FOLDS = 'folds.csv'
TRAIN_DIR_512 = 'cache512/train'
TEST_DIR_512 = 'cache512/test'
TRAIN_DIR_640 = 'cache640/train'
TEST_DIR_640 = 'cache640/test'
TRAIN_DIR_768 = 'cache768/train'
TEST_DIR_768 = 'cache768/test'

# Prefer 640 cache for stage1; enforce 768 cache existence for stage2
SIZE_640_OK = Path(TRAIN_DIR_640).exists() and Path(TEST_DIR_640).exists()
IMG_DIR_TRAIN_S1 = TRAIN_DIR_640 if SIZE_640_OK else TRAIN_DIR_512
IMG_DIR_TEST_S1  = TEST_DIR_640 if SIZE_640_OK else TEST_DIR_512
print('Stage1 using cached dir:', IMG_DIR_TRAIN_S1, '->', IMG_DIR_TEST_S1, flush=True)

# Config
CFG = {
  'model': 'tf_efficientnetv2_l.in21k_ft_in1k',
  'folds': 3,
  'size_stage1': 640,
  'size_stage2': 768,
  'epochs_s1': 3,
  'epochs_s2': 6,
  'batch_s1': 6,  # per-GPU
  'batch_s2': 3,
  'accum_target': 16,  # effective batch target
  'lr': 2e-4,
  'wd': 1e-5,
  'ema_decay': 0.9998,
  'delta': 1.0,  # SmoothL1 beta
  'num_workers_train': 4,
  'num_workers_infer': 4,
  'eta_min_factor': 0.1,
  'patience': 2,
}

# Data
df = pd.read_csv(DF_FOLDS)
assert 'id_code' in df.columns and 'fold' in df.columns and 'diagnosis' in df.columns, 'folds.csv must have id_code, fold, diagnosis'

class RetinopathyDS(Dataset):
    def __init__(self, df, img_dir, size=640, train=True):
        self.df = df.reset_index(drop=True)
        self.img_dir = img_dir
        self.size = size
        self.train = train
        if train:
            if size == 768:
                scale_min = 0.94; ratio = (0.97, 1.03); rot = (-7, 7)
            else:
                scale_min = 0.88; ratio = (0.95, 1.05); rot = (-12, 12)
            self.tf = A.Compose([
                A.RandomResizedCrop(size=(size, size), scale=(scale_min, 1.0), ratio=ratio, interpolation=cv2.INTER_LINEAR),
                A.HorizontalFlip(p=0.5),
                A.Affine(scale=(0.95,1.05), translate_percent=(0,0.05), rotate=rot, fit_output=False, mode=cv2.BORDER_REFLECT, interpolation=cv2.INTER_LINEAR, p=0.5),
                A.RandomBrightnessContrast(0.10, 0.10, p=0.5),
                A.HueSaturationValue(hue_shift_limit=5, sat_shift_limit=8, val_shift_limit=8, p=0.2),
                A.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
                ToTensorV2(),
            ])
        else:
            self.tf = A.Compose([
                A.Resize(height=size, width=size, interpolation=cv2.INTER_LINEAR),
                A.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
                ToTensorV2(),
            ])
    def __len__(self):
        return len(self.df)
    def __getitem__(self, idx):
        r = self.df.iloc[idx]
        img_path = os.path.join(self.img_dir, f"{r['id_code']}.png")
        img = cv2.imread(img_path, cv2.IMREAD_COLOR)
        if img is None:
            raise FileNotFoundError(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        out = self.tf(image=img)['image']
        y = float(r['diagnosis']) if 'diagnosis' in r and not np.isnan(r['diagnosis']) else -1.0
        return out, torch.tensor(y, dtype=torch.float32)

# Model
def build_model(model_name):
    m = timm.create_model(model_name, pretrained=True, num_classes=1, in_chans=3, cache_dir=str(Path('hf_cache').resolve()))
    if hasattr(m, 'set_grad_checkpointing'):
        try:
            m.set_grad_checkpointing(True)
        except Exception:
            pass
    return m

def preds_to_classes(p, th):
    return np.digitize(p, bins=[th[0], th[1], th[2], th[3]])

def optimize_thresholds_fast(y_true, preds, init_th=None):
    # Light coordinate descent around defaults for monitoring only
    th = np.array(init_th if init_th is not None else [0.5,1.5,2.5,3.5], dtype=float)
    for _ in range(2):
        for i in range(4):
            best_q = -1; best_v = th[i]
            for dv in (-0.10, -0.05, -0.02, -0.01, -0.005, 0.0, 0.005, 0.01, 0.02, 0.05, 0.10):
                tmp = th.copy()
                tmp[i] = np.clip(tmp[i] + dv, 0.3, 3.7)
                tmp = np.sort(tmp)
                q = cohen_kappa_score(y_true, preds_to_classes(preds, tmp), weights='quadratic')
                if q > best_q:
                    best_q, best_v = q, tmp[i]
            th[i] = best_v
    return th

# Train one stage (size, epochs, batch) with 1-epoch linear warmup then cosine schedule; EMA gated after epoch 1
def train_stage(model, ema, train_loader, val_loader, epochs, lr, wd, accum_steps, device, epoch_offset=0, early_stop_patience=0):
    scaler = GradScaler(enabled=True)
    opt = optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
    sched = CosineAnnealingLR(opt, T_max=max(1, epochs-1), eta_min=lr*CFG['eta_min_factor'])
    loss_fn = nn.SmoothL1Loss(beta=CFG['delta'])  # with beta=1.0 this coincides with Huber loss at delta=1.0
    best = {'q': -1.0, 'state': None, 'val_loss_ema': float('inf')}
    hist_rows = []
    no_improve = 0
    for epoch in range(epochs):
        t0 = time.time()
        model.train()
        if torch.cuda.is_available():
            torch.cuda.reset_peak_memory_stats()
        running = 0.0; n_seen = 0; opt.zero_grad(set_to_none=True)
        iters = len(train_loader)
        for it, (x, y) in enumerate(train_loader):
            try:
                # Per-iter linear warmup during epoch 0
                if epoch == 0:
                    warmup_frac = float(it + 1) / max(1, iters)
                    for pg in opt.param_groups:
                        pg['lr'] = lr * warmup_frac
                x = x.to(device, non_blocking=True).to(memory_format=torch.channels_last)
                y = y.to(device, non_blocking=True).view(-1, 1)
                with autocast(dtype=torch.float16):
                    p = model(x)
                    loss = loss_fn(p, y)
                scaler.scale(loss / accum_steps).backward()
                do_step = ((it + 1) % accum_steps == 0) or ((it + 1) == iters)
                if do_step:
                    scaler.unscale_(opt)
                    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                    scaler.step(opt); scaler.update(); opt.zero_grad(set_to_none=True)
                    if ema is not None and epoch >= 1:
                        ema.update(model)
                running += loss.item() * x.size(0); n_seen += x.size(0)
                if it == 0 or (it+1) % 20 == 0:
                    cur_lr = opt.param_groups[0]['lr']
                    mem = (torch.cuda.max_memory_allocated()/(1024**3)) if torch.cuda.is_available() else 0.0
                    print(f"  iter {it+1}/{iters} loss={running/max(1,n_seen):.4f} lr={cur_lr:.6f} mem={mem:.2f}GB", flush=True)
            except RuntimeError as e:
                # Log a hint on OOM, then re-raise in every case
                if 'out of memory' in str(e).lower():
                    print('OOM encountered during train step; consider reducing batch or accum.', flush=True)
                raise
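        if epoch == 0 and epoch_offset == 0 and ema is not None:
            # EMA updates are gated during the warmup epoch; seed the EMA copy from the
            # warmed-up weights so it does not keep averaging from the untrained head.
            ema.set(model)
        if epoch >= 1:
            sched.step()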
        # Validate EMA and base
        def _eval(m_eval):
            m_eval.eval()
            preds = []; targs = []; vloss_sum = 0.0; vcount = 0
            with torch.no_grad():
                for x, y in val_loader:
                    x = x.to(device, non_blocking=True).to(memory_format=torch.channels_last)
                    y = y.to(device, non_blocking=True).view(-1, 1)
                    with autocast(dtype=torch.float16):
                        pr = m_eval(x)
                        vloss = loss_fn(pr, y)
                    preds.append(pr.float().cpu().numpy().ravel())
                    targs.append(y.float().cpu().numpy().ravel())
                    vloss_sum += float(vloss.item()) * x.size(0); vcount += x.size(0)
            p = np.concatenate(preds) if len(preds) else np.zeros(0)
            y_true = np.concatenate(targs) if len(targs) else np.zeros(0)
            th_def = np.array([0.5,1.5,2.5,3.5], dtype=float)
            q = cohen_kappa_score(y_true, preds_to_classes(p, th_def), weights='quadratic') if len(y_true) else -1.0
            th_opt = optimize_thresholds_fast(y_true, p, th_def) if len(y_true) else th_def
            q_opt = cohen_kappa_score(y_true, preds_to_classes(p, th_opt), weights='quadratic') if len(y_true) else -1.0
            return q, (vloss_sum/max(1, vcount)), p, y_true, q_opt, th_opt
        # timm's ModelEmaV2 keeps the averaged copy in `.module` (the older ModelEma used `.ema`)
        q_ema, vloss_ema, p_ema, y_ema, q_opt_ema, th_opt_ema = _eval(ema.module if ema is not None else model)
        q_base, vloss_base, _, _, q_opt_base, _ = _eval(model)
        elapsed = time.time() - t0
        max_mem = torch.cuda.max_memory_allocated() / (1024**3) if torch.cuda.is_available() else 0.0
        cur_lr = opt.param_groups[0]['lr']
        print(f"Epoch {epoch_offset+epoch+1}/{epoch_offset+epochs}: val_QWK_EMA={q_ema:.5f} (opt {q_opt_ema:.5f}) val_QWK_BASE={q_base:.5f} val_loss_EMA={vloss_ema:.5f} lr={cur_lr:.6f} time={elapsed/60:.1f}m mem={max_mem:.2f}GB", flush=True)
        hist_rows.append({'epoch': int(epoch_offset+epoch+1), 'qwk_ema': float(q_ema), 'qwk_base': float(q_base), 'qwk_opt_ema': float(q_opt_ema), 'val_loss_ema': float(vloss_ema), 'lr': float(cur_lr), 'time_min': float(elapsed/60.0), 'max_mem_gb': float(max_mem)})
        if vloss_ema + 1e-6 < best['val_loss_ema']:
            best['val_loss_ema'] = vloss_ema
            no_improve = 0
        else:
            no_improve += 1
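        if q_ema > best['q']:
            best['q'] = q_ema
            # state_dict() returns live references, so snapshot the tensors on CPU;
            # ModelEmaV2 exposes the averaged model as `.module`
            src = ema.module if ema is not None else model
            best['state'] = {k: v.detach().cpu().clone() for k, v in src.state_dict().items()}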
        # Save per-epoch val EVs for later calibration
        try:
            np.save(f'val_ev_e{epoch_offset+epoch+1}.npy', p_ema.astype(np.float32))
            if not Path('val_targets.npy').exists():
                np.save('val_targets.npy', y_ema.astype(np.float32))
        except Exception:
            pass
        # Early stopping (only when enabled, e.g., stage2)
        if early_stop_patience > 0 and epoch >= 1 and no_improve >= early_stop_patience:
            print(f"Early stopping triggered (no improvement {no_improve} epochs).", flush=True)
            break
    # Save per-stage log
    try:
        pd.DataFrame(hist_rows).to_csv('train_history_stage.csv', index=False)
    except Exception:
        pass
    return best

def _worker_init(_):
    try:
        import cv2 as _cv2
        _cv2.setNumThreads(0)
    except Exception:
        pass

def make_loader(ds, batch_size, shuffle, num_workers, infer=False):
    kwargs = dict(batch_size=batch_size, shuffle=shuffle, num_workers=num_workers,
                  pin_memory=True, drop_last=not infer)
    if num_workers and num_workers > 0:
        kwargs['persistent_workers'] = True
        kwargs['prefetch_factor'] = 4 if not infer else 2
        kwargs['worker_init_fn'] = _worker_init
    return DataLoader(ds, **kwargs)

def smoke_test_768(model, ds_trn2, bs_try, accum_target, device, max_iters=50):
    dl = make_loader(ds_trn2, batch_size=bs_try, shuffle=True, num_workers=CFG['num_workers_train'], infer=False)
    loss_fn = nn.SmoothL1Loss(beta=CFG['delta'])
    opt = optim.AdamW(model.parameters(), lr=1e-5, weight_decay=CFG['wd'])
    scaler = GradScaler(enabled=True)
    accum_steps = max(1, math.ceil(accum_target / bs_try))
    model.train()
    iters = 0
    try:
        for it, (x, y) in enumerate(dl):
            x = x.to(device).to(memory_format=torch.channels_last); y = y.to(device).view(-1,1)
            with autocast(dtype=torch.float16):
                p = model(x); loss = loss_fn(p, y)
            scaler.scale(loss/accum_steps).backward()
            if ((it+1) % accum_steps == 0) or ((it+1) == len(dl)):
                scaler.unscale_(opt); torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                scaler.step(opt); scaler.update(); opt.zero_grad(set_to_none=True)
            iters += 1
            if iters >= max_iters:
                break
        return True
    except RuntimeError as e:
        if 'out of memory' in str(e).lower():
            return False
        raise
    finally:
        del dl; gc.collect(); torch.cuda.empty_cache()

# Fold loop (skeleton); saves per-fold OOF EV and best weights
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
all_oof = np.zeros(len(df), dtype=np.float32)
folds = sorted(df['fold'].unique())[:CFG['folds']]
for fold in folds:
    trn = df[df['fold'] != fold].copy(); val = df[df['fold'] == fold].copy()
    # Stage 1 @640
    ds_trn = RetinopathyDS(trn, IMG_DIR_TRAIN_S1, size=CFG['size_stage1'], train=True)
    ds_val = RetinopathyDS(val, IMG_DIR_TRAIN_S1, size=CFG['size_stage1'], train=False)
    bs1 = CFG['batch_s1']; accum1 = max(1, math.ceil(CFG['accum_target'] / bs1))
    dl_trn = make_loader(ds_trn, batch_size=bs1, shuffle=True, num_workers=CFG['num_workers_train'], infer=False)
    dl_val = make_loader(ds_val, batch_size=max(1, bs1*2), shuffle=False, num_workers=CFG['num_workers_infer'], infer=True)
    # Early loader diagnostics
    print(f"Fold {fold}: len(train)={len(dl_trn)} len(val)={len(dl_val)}", flush=True)
    t0_fb = time.time()
    _x,_y = next(iter(dl_trn));
    del _x,_y
    print(f"Fold {fold}: first batch load {time.time()-t0_fb:.2f}s", flush=True)
    model = build_model(CFG['model']).to(device).to(memory_format=torch.channels_last)
    ema = ModelEmaV2(model, decay=CFG['ema_decay'], device=device)
    print(f"Fold {fold}: Stage1 640 - bs={bs1}, accum={accum1}", flush=True)
    best_s1 = train_stage(model, ema, dl_trn, dl_val, CFG['epochs_s1'], CFG['lr'], CFG['wd'], accum1, device, epoch_offset=0, early_stop_patience=0)
    # Load best EMA weights into both ema and model
    if ema is not None and best_s1['state'] is not None:
        ema.module.load_state_dict(best_s1['state'])  # ModelEmaV2 stores the averaged model in `.module`
        model.load_state_dict(best_s1['state'])
    # Stage 2 @768 (halve lr) using 768 cache from originals
    assert Path(TRAIN_DIR_768).exists() and Path(TEST_DIR_768).exists(), 'cache768 is required for stage2; build it from originals before training'
    ds_trn2 = RetinopathyDS(trn, TRAIN_DIR_768, size=CFG['size_stage2'], train=True)
    ds_val2 = RetinopathyDS(val, TRAIN_DIR_768, size=CFG['size_stage2'], train=False)
    bs2 = CFG['batch_s2']
    accum2 = max(1, math.ceil(CFG['accum_target'] / bs2))
    dl_trn2 = make_loader(ds_trn2, batch_size=bs2, shuffle=True, num_workers=CFG['num_workers_train'], infer=False)
    dl_val2 = make_loader(ds_val2, batch_size=max(1, bs2*2), shuffle=False, num_workers=CFG['num_workers_infer'], infer=True)
    print(f"Fold {fold}: len(train768)={len(dl_trn2)} len(val768)={len(dl_val2)}", flush=True)
    t0_fb2 = time.time()
    _x2,_y2 = next(iter(dl_trn2));
    del _x2,_y2
    print(f"Fold {fold}: first batch 768 load {time.time()-t0_fb2:.2f}s", flush=True)
    print(f"Fold {fold}: Stage2 768 - initial bs={bs2}, accum={accum2} (smoke test)", flush=True)
    # Smoke test @768
    ok = smoke_test_768(model, ds_trn2, bs2, CFG['accum_target'], device, max_iters=50)
    if not ok:
        print('768 smoke test failed at bs={}; retry bs=2'.format(bs2), flush=True)
        bs2 = 2
        accum2 = max(1, math.ceil(CFG['accum_target'] / bs2))
        dl_trn2 = make_loader(ds_trn2, batch_size=bs2, shuffle=True, num_workers=CFG['num_workers_train'], infer=False)
        dl_val2 = make_loader(ds_val2, batch_size=max(1, bs2*2), shuffle=False, num_workers=CFG['num_workers_infer'], infer=True)
        ok2 = smoke_test_768(model, ds_trn2, bs2, CFG['accum_target'], device, max_iters=50)
        if not ok2:
            print('768 smoke test still failing; consider raising RRC scale_min or removing blur.', flush=True)
    print(f"Fold {fold}: Stage2 768 - bs={bs2}, accum={accum2}", flush=True)
    best_s2 = train_stage(model, ema, dl_trn2, dl_val2, CFG['epochs_s2'], CFG['lr']*0.5, CFG['wd'], accum2, device, epoch_offset=CFG['epochs_s1'], early_stop_patience=CFG['patience'])
    if ema is not None and best_s2['state'] is not None:
        ema.module.load_state_dict(best_s2['state'])  # ModelEmaV2 stores the averaged model in `.module`
        model.load_state_dict(best_s2['state'])
    # Final fold inference on 768 val for OOF EV
    # ModelEmaV2 stores the averaged weights in `.module`
    (ema.module if ema is not None else model).eval()
    preds = []; targs = []
    with torch.no_grad():
        for x, y in dl_val2:
            x = x.to(device, non_blocking=True).to(memory_format=torch.channels_last)
            with autocast(dtype=torch.float16):
                pr = (ema.module if ema is not None else model)(x)
            preds.append(pr.float().cpu().numpy().ravel()); targs.append(y.cpu().numpy().ravel())
    p = np.concatenate(preds) if len(preds) else np.zeros(0); y_true = np.concatenate(targs) if len(targs) else np.zeros(0)
    all_oof[val.index.values] = p.astype(np.float32)
    # Save fold checkpoint & OOF snapshot
    ckpt_path = f"ckpt_{CFG['model'].replace('/', '_')}_f{fold}.pth"
    torch.save({'state_dict': (ema.module if ema is not None else model).state_dict(), 'fold': fold, 'best_q': best_s2['q']}, ckpt_path)
    print(f"Fold {fold} done. Best EMA QWK@def ~ {best_s2['q']:.5f}; saved {ckpt_path}", flush=True)
    del model, ema, ds_trn, ds_val, ds_trn2, ds_val2, dl_trn, dl_val, dl_trn2, dl_val2; gc.collect(); torch.cuda.empty_cache()

# Save OOF EVs
np.save(f"oof_ev_{CFG['model'].replace('/', '_')}_3f.npy", all_oof)
print('Saved OOF EVs:', f"oof_ev_{CFG['model'].replace('/', '_')}_3f.npy")

# Next: implement per-fold isotonic + NNLS with weight caps and robust thresholding (NM -> 2D th2/th3 grid -> bootstrap).

In [None]:
# Build 768px cache from original images with circle crop + Ben Graham + CLAHE + Gray-World
import os, sys, math, time, gc
from pathlib import Path
import numpy as np
import cv2

SRC_TR = Path('train_images')
SRC_TE = Path('test_images')
DST_TR = Path('cache768/train')
DST_TE = Path('cache768/test')
DST_TR.mkdir(parents=True, exist_ok=True)
DST_TE.mkdir(parents=True, exist_ok=True)

SIZE = 768

def gray_world(img):
    # Simple Gray-World color constancy
    imgf = img.astype(np.float32) + 1e-6
    means = imgf.reshape(-1, 3).mean(axis=0)
    gm = float(np.mean(means))
    scale = gm / means
    imgf *= scale
    imgf = np.clip(imgf, 0, 255)
    return imgf.astype(np.uint8)

def circle_crop(img):
    h, w = img.shape[:2]
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    mask = gray > 10
    if not np.any(mask):
        # Fallback to center square crop
        side = min(h, w)
        y0 = (h - side) // 2
        x0 = (w - side) // 2
        return img[y0:y0+side, x0:x0+side]
    ys, xs = np.where(mask)
    y_min, y_max = int(ys.min()), int(ys.max())
    x_min, x_max = int(xs.min()), int(xs.max())
    cy = (y_min + y_max) // 2
    cx = (x_min + x_max) // 2
    r = int(0.5 * max(y_max - y_min, x_max - x_min))
    side = 2 * r
    y0 = max(0, cy - r); y1 = min(h, cy + r)
    x0 = max(0, cx - r); x1 = min(w, cx + r)
    crop = img[y0:y1, x0:x1]
    # Pad to square if needed
    ch, cw = crop.shape[:2]
    side2 = max(ch, cw)
    top = (side2 - ch) // 2; bottom = side2 - ch - top
    left = (side2 - cw) // 2; right = side2 - cw - left
    crop = cv2.copyMakeBorder(crop, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(0,0,0))
    return crop

def ben_graham_enhance(img, sigma=10):
    # Expect BGR uint8
    blur = cv2.GaussianBlur(img, (0,0), sigma)
    out = cv2.addWeighted(img, 4, blur, -4, 128)
    return out

def apply_clahe(img):
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
    l2 = clahe.apply(l)
    lab2 = cv2.merge([l2, a, b])
    return cv2.cvtColor(lab2, cv2.COLOR_LAB2BGR)

def preprocess_one(img):
    img = gray_world(img)
    img = circle_crop(img)
    img = cv2.resize(img, (SIZE, SIZE), interpolation=cv2.INTER_CUBIC)
    img = ben_graham_enhance(img, sigma=10)
    img = apply_clahe(img)
    return img

def process_dir(src_dir: Path, dst_dir: Path, limit=None):
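    # Sort for a reproducible order; apply the limit before counting so progress totals are accurate
    names = sorted(p.name for p in src_dir.glob('*.png'))
    if limit is not None:
        names = names[:limit]
    total = len(names)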
    t0 = time.time()
    done = 0
    for i, name in enumerate(names, 1):
        src = src_dir / name
        dst = dst_dir / name
        if dst.exists():
            done += 1
            if i % 200 == 0:
                elapsed = time.time() - t0
                print(f"{dst_dir.name}: {i}/{total} (skipped exist) elapsed {elapsed/60:.1f}m", flush=True)
            continue
        img = cv2.imread(str(src), cv2.IMREAD_COLOR)
        if img is None:
            print('WARN: failed to read', src, flush=True)
            continue
        try:
            out = preprocess_one(img)
            cv2.imwrite(str(dst), out, [cv2.IMWRITE_PNG_COMPRESSION, 3])
            done += 1  # count only images that were actually processed
        except Exception as e:
            print('ERR on', src, e, flush=True)
        if i % 100 == 0:
            elapsed = time.time() - t0
            print(f"{dst_dir.name}: {i}/{total} processed elapsed {elapsed/60:.1f}m", flush=True)
    elapsed = time.time() - t0
    print(f"Done {dst_dir} | processed {done}/{total} in {elapsed/60:.1f}m", flush=True)

print('Building cache768 ...', flush=True)
process_dir(SRC_TR, DST_TR)
gc.collect()
process_dir(SRC_TE, DST_TE)
gc.collect()
print('cache768 build complete.', flush=True)

In [1]:
# GPU diagnostics and (if needed) install CUDA-enabled PyTorch
import sys, subprocess, os, importlib, platform
print('Python:', sys.version)
try:
    import torch
    print('Torch pre-imported:', torch.__version__)
except Exception as e:
    print('Torch not importable before install:', e)
    torch = None

def print_cuda_info():
    import torch
    print(f"GPU Available: {torch.cuda.is_available()}")
    print(f"GPU Count: {torch.cuda.device_count()}")
    if torch.cuda.is_available():
        print(f"GPU Name: {torch.cuda.get_device_name(0)}")
        props = torch.cuda.get_device_properties(0)
        print(f"GPU Memory: {props.total_memory / 1024**3:.1f} GB")
        print('CUDA runtime version (torch):', torch.version.cuda)
    else:
        print('CUDA not available in torch; attempting to install cu121 wheels...')

print('=== Before install ===')
try:
    print_cuda_info()
except Exception as e:
    print('Error checking CUDA info:', e)

need_install = False
try:
    import torch as _t
    need_install = not _t.cuda.is_available()
except Exception:
    need_install = True

if need_install:
    print('Installing CUDA 12.1 wheels for torch/torchvision/torchaudio ...')
    cmd = [sys.executable, '-m', 'pip', 'install', '--upgrade', '--index-url', 'https://download.pytorch.org/whl/cu121', 'torch', 'torchvision', 'torchaudio']
    print('RUN:', ' '.join(cmd))
    subprocess.run(cmd, check=True)
    # Reloading torch in-process after an upgrade is unreliable; restart the kernel if the check below still fails
    torch = importlib.reload(importlib.import_module('torch'))
    print('Re-imported torch:', torch.__version__)
    print('=== After install ===')
    print_cuda_info()
else:
    print('CUDA is already available in torch. No install required.')

Python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]


Torch pre-imported: 2.5.1+cu121
=== Before install ===
GPU Available: False
GPU Count: 0
CUDA not available in torch; attempting to install cu121 wheels...
Installing CUDA 12.1 wheels for torch/torchvision/torchaudio ...
RUN: /usr/bin/python3.11 -m pip install --upgrade --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio




Looking in indexes: https://download.pytorch.org/whl/cu121


Collecting torch
  Downloading https://download.pytorch.org/whl/cu121/torch-2.5.1%2Bcu121-cp311-cp311-linux_x86_64.whl (780.5 MB)

Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.20.1%2Bcu121-cp311-cp311-linux_x86_64.whl (7.3 MB)
Collecting torchaudio
  Downloading https://download.pytorch.org/whl/cu121/torchaudio-2.5.1%2Bcu121-cp311-cp311-linux_x86_64.whl (3.4 MB)
Collecting filelock (from torch)
  Downloading https://download.pytorch.org/whl/filelock-3.13.1-py3-none-any.whl.metadata (2.8 kB)
Collecting typing-extensions>=4.8.0 (from torch)
  Downloading https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting networkx (from torch)
  Downloading https://download.pytorch.org/whl/networkx-3.3-py3-none-any.whl.metadata (5.1 kB)
Collecting jinja2 (from torch)
  Downloading https://download.pytorch.org/whl/Jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting fsspec (from torch)
  Downloading https://download.pytorch.org/whl/fsspec-2024.6.1-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
Collecting nvidia-curand-cu12==10.3.2.106 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
Collecting nvidia-nccl-cu12==2.21.5 (from torch)
  Downloading https://download.pytorch.org/whl/nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl (188.7 MB)
Collecting nvidia-nvtx-cu12==12.1.105 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)


Collecting triton==3.1.0 (from torch)
  Downloading https://download.pytorch.org/whl/triton-3.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (209.5 MB)
Collecting sympy==1.13.1 (from torch)
  Downloading https://download.pytorch.org/whl/sympy-1.13.1-py3-none-any.whl (6.2 MB)
Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch)
  Downloading https://download.pytorch.org/whl/nvidia_nvjitlink_cu12-12.9.86-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy==1.13.1->torch)
  Downloading https://download.pytorch.org/whl/mpmath-1.3.0-py3-none-any.whl (536 kB)
Collecting numpy (from torchvision)
  Downloading https://download.pytorch.org/whl/numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
Collecting pillow!=8.3.*,>=5.3.0 (from torchvision)
  Downloading https://download.pytorch.org/whl/pillow-11.0.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (9.1 kB)


Collecting MarkupSafe>=2.0 (from jinja2->torch)
  Downloading https://download.pytorch.org/whl/MarkupSafe-2.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (28 kB)
Downloading https://download.pytorch.org/whl/pillow-11.0.0-cp311-cp311-manylinux_2_28_x86_64.whl (4.4 MB)
Downloading https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading https://download.pytorch.org/whl/filelock-3.13.1-py3-none-any.whl (11 kB)
Downloading https://download.pytorch.org/whl/fsspec-2024.6.1-py3-none-any.whl (177 kB)
Downloading https://download.pytorch.org/whl/Jinja2-3.1.4-py3-none-any.whl (133 kB)
Downloading https://download.pytorch.org/whl/networkx-3.3-py3-none-any.whl (1.7 MB)
Installing collected packages: mpmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━[0m [32m20/26[0m [nvidia-cudnn-cu12][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━[0m [32m20/26[0m [nvidia-cudnn-cu12]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━[0m [32m20/26[0m [nvidia-cudnn-cu12][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━[0m [32m20/26[0m [nvidia-cudnn-cu12]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━[0m [32m20/26[0m [nvidia-cudnn-cu12][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━[0m [32m22/26[0m [nvidia-cusolver-cu12]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━[0m [32m22/26[0m [nvidia-cusolver-cu12][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━[0m [32m22/26[0m [nvidia-cusolver-cu12]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━[0m [32m22/26[0m [nvidia-cusolver-cu12][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━[0m [32m22/26[0m [nvidia-cusolver-cu12]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━[0m [32m22/26[0m [nvidia-cusolver-cu12][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━[0m [32m22/26[0m [nvidia-cusolver-cu12]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━[0m [32m22/26[0m [nvidia-cusolver-cu12][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━[0m [32m22/26[0m [nvidia-cusolver-cu12]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━[0m [32m23/26[0m [torch][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━[0m [32m24/26[0m [torchvision]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━[0m [32m24/26[0m [torchvision][2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━[0m [32m24/26[0m [torchvision]

[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━[0m [32m25/26[0m [torchaudio][2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m26/26[0m [torchaudio]
[?25h[1A[2KSuccessfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.3 numpy-1.26.3 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.21.5 nvidia-nvjitlink-cu12-12.9.86 nvidia-nvtx-cu12-12.1.105 pillow-11.0.0 sympy-1.13.1 torch-2.5.1+cu121 torchaudio-2.5.1+cu121 torchvision-0.20.1+cu121 triton-3.1.0 typing-extensions-4.12.2


RuntimeError: Only a single TORCH_LIBRARY can be used to register the namespace triton; please put all of your definitions in a single TORCH_LIBRARY block.  If you were trying to specify implementations, consider using TORCH_LIBRARY_IMPL (which can be duplicated).  If you really intended to define operators for a single namespace in a distributed way, you can use TORCH_LIBRARY_FRAGMENT to explicitly indicate this.  Previous registration of TORCH_LIBRARY was registered at /dev/null:2504; latest registration was registered at /dev/null:2504