# ALASKA2 Image Steganalysis — Plan (updated per expert advice)

Goal: 4-class training (Cover, JMiPOD, JUNIWARD, UERD). Submission Label = 1 - p_cover (probability of stego). Target weighted-AUROC medal.

Protocol:
- Env check first (nvidia-smi); install PyTorch cu121 stack only after confirming GPU.
- Data audit: build manifest with columns [path, label_4c, is_stego, jpeg_qf, source_id, fold].
- CV: StratifiedGroupKFold(n_splits=5, stratify=4-class label, groups=source_id parsed from filename) to avoid leakage across images from same base cover.
- Preprocessing:
  - Use YCbCr (or Y-only) inputs. Start with 256 for smoke, move to 512 (8×8 JPEG grid aligned center-crop).
  - Augmentations: only flips/transpose (no rotations, no color jitter, no random-resized-crop).
  - Scale to [0,1]; avoid per-image z-scoring; keep JPEG artifacts intact; consistent decoder.

Baseline (hours 0–3):
- tf_efficientnet_b2 (timm), 4-class head, AMP, AdamW (lr=2e-4, wd=1e-4), cosine w/ warmup, label smoothing 0.1.
- Input 256 (RGB for fastest bring-up if YCbCr loader not ready), 3 epochs/fold, balanced batches (~50% cover, 50% stego split across 3).
- HFlip TTA at inference. Save OOF logits and per-fold weights.

Push to medal (hours 3–20):
- Switch to YCbCr/Y-only at 512 with grid-aligned center-crop.
- Stronger backbones: tf_efficientnet_b3/b4 or convnext_tiny.
- Add SRM residual channels (few fixed kernels or ~30 maps with TLU) concatenated to Y; adapt first conv (1×1) to pretrained backbone. Expect +0.01–0.03 CV.
- Train 2–3 diverse models/seeds/resolutions.

Ensembling (hours 20–24):
- Logit-average using OOF to set simple weights; if rushed, equal-weight logit avg.
- Diversity: SRM vs no-SRM, Y-only vs YCbCr, different backbones/resolutions/seeds.

Milestones & reviews:
1) After env check + manifest/folds (request expert).
2) After baseline 1–2 folds sanity and OOF.
3) After full 5-fold YCbCr/512 run.
4) After second backbone and/or SRM variant; then ensemble.

Next: add env/GPU check code cell, then implement manifest builder + folds.

In [None]:
# Env/GPU check and quick dataset audit
import os, subprocess, glob, time, shutil
print('=== nvidia-smi ===', flush=True)
subprocess.run(['bash','-lc','nvidia-smi || true'], check=False)
print('\n=== Dataset counts ===', flush=True)
base='.'
cls_dirs=['Cover','JMiPOD','JUNIWARD','UERD','Test']
for d in cls_dirs:
    t0=time.time()
    files=glob.glob(os.path.join(base,d,'*.jpg'))
    print(f'{d}: {len(files)} files (scan {time.time()-t0:.2f}s)', flush=True)
print('\nSample files:', flush=True)
for d in ['Cover','JMiPOD','JUNIWARD','UERD','Test']:
    files=glob.glob(os.path.join(base,d,'*.jpg'))[:3]
    print(d, [os.path.basename(x) for x in files], flush=True)
print('\nDisk usage of repo (du -h -d1):', flush=True)
subprocess.run(['bash','-lc','du -h -d1 . | sort -h | tail -n 10'], check=False)

In [None]:
# Build manifest and 5-fold StratifiedGroupKFold
import os, glob, pandas as pd
from sklearn.model_selection import StratifiedGroupKFold

label_map = {'Cover':0, 'JMiPOD':1, 'JUNIWARD':2, 'UERD':3}
rows = []
for cls in ['Cover','JMiPOD','JUNIWARD','UERD']:
    for p in glob.glob(os.path.join(cls,'*.jpg')):
        fname = os.path.basename(p)
        source_id = os.path.splitext(fname)[0]  # group by base id
        rows.append({
            'path': p,
            'filename': fname,
            'label_4c': label_map[cls],
            'is_stego': 0 if cls=='Cover' else 1,
            'jpeg_qf': -1,  # placeholder; optional extraction later
            'source_id': source_id
        })
df = pd.DataFrame(rows)
print('Manifest size:', df.shape, 'cover:', (df.label_4c==0).sum(), 'stego:', (df.label_4c!=0).sum(), flush=True)

# StratifiedGroupKFold by 4-class with grouping by source_id
n_splits = 5
skf = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=42)
df['fold'] = -1
for fold, (trn_idx, val_idx) in enumerate(skf.split(df, y=df['label_4c'], groups=df['source_id'])):
    df.loc[val_idx, 'fold'] = fold
print('Fold distribution (counts by label per fold):', flush=True)
print(df.groupby(['fold','label_4c']).size().unstack(1), flush=True)

out_csv = 'manifest.csv'
df.to_csv(out_csv, index=False)
print('Saved', out_csv, 'with columns:', list(df.columns), flush=True)

In [2]:
# Clean reinstall PyTorch cu124 and purge shadowed .pip-target dirs (install only; no torch import here)
import sys, subprocess, time, os, shutil
t0=time.time()
print('Uninstalling any existing torch stack...', flush=True)
subprocess.run([sys.executable,'-m','pip','uninstall','-y','torch','torchvision','torchaudio'], check=False)
print('Purging shadowing dirs under /app/.pip-target ...', flush=True)
purge_dirs = [
    '/app/.pip-target/torch',
    '/app/.pip-target/torch-2.4.1.dist-info',
    '/app/.pip-target/torch-2.5.1.dist-info',
    '/app/.pip-target/torchvision',
    '/app/.pip-target/torchvision-0.19.1.dist-info',
    '/app/.pip-target/torchvision-0.20.1.dist-info',
    '/app/.pip-target/torchaudio',
    '/app/.pip-target/torchaudio-2.4.1.dist-info',
    '/app/.pip-target/torchaudio-2.5.1.dist-info'
]
for d in purge_dirs:
    if os.path.exists(d):
        print('Removing', d, flush=True)
        shutil.rmtree(d, ignore_errors=True)
print('Installing torch/cu124...', flush=True)
subprocess.run([sys.executable,'-m','pip','install','--force-reinstall','--no-cache-dir',
                '--index-url','https://download.pytorch.org/whl/cu124',
                'torch==2.5.1','torchvision==0.20.1','torchaudio==2.5.1'], check=True)
print('Install done in %.1fs' % (time.time()-t0), flush=True)
print('NOTE: After this cell, restart kernel then run the verify cell (idx 5).', flush=True)

Uninstalling any existing torch stack...


Found existing installation: torch 2.4.1+cu121


Uninstalling torch-2.4.1+cu121:
  Successfully uninstalled torch-2.4.1+cu121
Found existing installation: torchvision 0.19.1+cu121
Uninstalling torchvision-0.19.1+cu121:
  Successfully uninstalled torchvision-0.19.1+cu121
Found existing installation: torchaudio 2.4.1+cu121


Uninstalling torchaudio-2.4.1+cu121:
  Successfully uninstalled torchaudio-2.4.1+cu121
Installing torch/cu124...


Looking in indexes: https://download.pytorch.org/whl/cu124


Collecting torch==2.5.1
  Downloading https://download.pytorch.org/whl/cu124/torch-2.5.1%2Bcu124-cp311-cp311-linux_x86_64.whl (908.3 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 908.3/908.3 MB 536.7 MB/s eta 0:00:00


Collecting torchvision==0.20.1
  Downloading https://download.pytorch.org/whl/cu124/torchvision-0.20.1%2Bcu124-cp311-cp311-linux_x86_64.whl (7.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.3/7.3 MB 125.1 MB/s eta 0:00:00


Collecting torchaudio==2.5.1
  Downloading https://download.pytorch.org/whl/cu124/torchaudio-2.5.1%2Bcu124-cp311-cp311-linux_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 257.9 MB/s eta 0:00:00


Collecting jinja2
  Downloading https://download.pytorch.org/whl/Jinja2-3.1.4-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 KB 418.1 MB/s eta 0:00:00
Collecting typing-extensions>=4.8.0
  Downloading https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl (37 kB)


Collecting nvidia-cusparse-cu12==12.3.1.170
  Downloading https://download.pytorch.org/whl/cu124/nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl (207.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 207.5/207.5 MB 522.7 MB/s eta 0:00:00


Collecting triton==3.1.0
  Downloading https://download.pytorch.org/whl/triton-3.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (209.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.5/209.5 MB 557.5 MB/s eta 0:00:00


Collecting nvidia-cuda-cupti-cu12==12.4.127
  Downloading https://download.pytorch.org/whl/cu124/nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (13.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.8/13.8 MB 454.4 MB/s eta 0:00:00


Collecting nvidia-nvjitlink-cu12==12.4.127
  Downloading https://download.pytorch.org/whl/cu124/nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.1/21.1 MB 343.2 MB/s eta 0:00:00


Collecting nvidia-nvtx-cu12==12.4.127
  Downloading https://download.pytorch.org/whl/cu124/nvidia_nvtx_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 KB 433.1 MB/s eta 0:00:00


Collecting nvidia-cudnn-cu12==9.1.0.70
  Downloading https://download.pytorch.org/whl/cu124/nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 664.8/664.8 MB 330.7 MB/s eta 0:00:00


Collecting filelock
  Downloading https://download.pytorch.org/whl/filelock-3.13.1-py3-none-any.whl (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127
  Downloading https://download.pytorch.org/whl/cu124/nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (24.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.6/24.6 MB 313.1 MB/s eta 0:00:00


Collecting nvidia-cuda-runtime-cu12==12.4.127
  Downloading https://download.pytorch.org/whl/cu124/nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (883 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 883.7/883.7 KB 462.6 MB/s eta 0:00:00


Collecting fsspec
  Downloading https://download.pytorch.org/whl/fsspec-2024.6.1-py3-none-any.whl (177 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.6/177.6 KB 479.9 MB/s eta 0:00:00


Collecting nvidia-nccl-cu12==2.21.5
  Downloading https://download.pytorch.org/whl/nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl (188.7 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 188.7/188.7 MB 553.3 MB/s eta 0:00:00


Collecting networkx
  Downloading https://download.pytorch.org/whl/networkx-3.3-py3-none-any.whl (1.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 498.4 MB/s eta 0:00:00


Collecting sympy==1.13.1
  Downloading https://download.pytorch.org/whl/sympy-1.13.1-py3-none-any.whl (6.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 345.5 MB/s eta 0:00:00


Collecting nvidia-cusolver-cu12==11.6.1.9
  Downloading https://download.pytorch.org/whl/cu124/nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl (127.9 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 127.9/127.9 MB 544.5 MB/s eta 0:00:00


Collecting nvidia-cublas-cu12==12.4.5.8
  Downloading https://download.pytorch.org/whl/cu124/nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl (363.4 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 363.4/363.4 MB 524.5 MB/s eta 0:00:00


Collecting nvidia-curand-cu12==10.3.5.147
  Downloading https://download.pytorch.org/whl/cu124/nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl (56.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.3/56.3 MB 457.7 MB/s eta 0:00:00


Collecting nvidia-cufft-cu12==11.2.1.3
  Downloading https://download.pytorch.org/whl/cu124/nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl (211.5 MB)


     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 211.5/211.5 MB 341.7 MB/s eta 0:00:00


Collecting numpy
  Downloading https://download.pytorch.org/whl/numpy-1.26.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 293.7 MB/s eta 0:00:00


Collecting pillow!=8.3.*,>=5.3.0
  Downloading https://download.pytorch.org/whl/pillow-11.0.0-cp311-cp311-manylinux_2_28_x86_64.whl (4.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 421.2 MB/s eta 0:00:00
Collecting mpmath<1.4,>=1.1.0
  Downloading https://download.pytorch.org/whl/mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 492.2 MB/s eta 0:00:00


Collecting MarkupSafe>=2.0
  Downloading https://download.pytorch.org/whl/MarkupSafe-2.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (28 kB)


Installing collected packages: mpmath, typing-extensions, sympy, pillow, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, torchvision, torchaudio


Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.3 numpy-1.26.3 nvidia-cublas-cu12-12.4.5.8 nvidia-cuda-cupti-cu12-12.4.127 nvidia-cuda-nvrtc-cu12-12.4.127 nvidia-cuda-runtime-cu12-12.4.127 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.1.3 nvidia-curand-cu12-10.3.5.147 nvidia-cusolver-cu12-11.6.1.9 nvidia-cusparse-cu12-12.3.1.170 nvidia-nccl-cu12-2.21.5 nvidia-nvjitlink-cu12-12.4.127 nvidia-nvtx-cu12-12.4.127 pillow-11.0.0 sympy-1.13.1 torch-2.5.1+cu124 torchaudio-2.5.1+cu124 torchvision-0.20.1+cu124 triton-3.1.0 typing-extensions-4.12.2




Install done in 53.9s


In [None]:
# CUDA diagnostics via subprocess with LD_LIBRARY_PATH set
import os, sys, subprocess, textwrap
env = os.environ.copy()
ld = env.get('LD_LIBRARY_PATH','')
prefix = '/usr/local/nvidia/lib:/usr/local/nvidia/lib64'
env['LD_LIBRARY_PATH'] = f"{prefix}:{ld}" if ld else prefix
print('LD_LIBRARY_PATH =', env['LD_LIBRARY_PATH'], flush=True)
code = textwrap.dedent('''
import os, torch, traceback
print('torch:', torch.__version__, 'built CUDA:', getattr(torch.version,'cuda',None))
print('CUDA available:', torch.cuda.is_available())
try:
    import ctypes, sys
    for p in ('/usr/local/nvidia/lib','/usr/local/nvidia/lib64'):
        try:
            ctypes.CDLL(p+'/libcudart.so', mode=ctypes.RTLD_GLOBAL)
        except Exception:
            pass
    if torch.cuda.is_available():
        print('GPU:', torch.cuda.get_device_name(0))
        print('Device count:', torch.cuda.device_count())
except Exception as e:
    print('Exception during CUDA check:', e)
    traceback.print_exc()
''')
print('--- Subprocess torch CUDA check ---', flush=True)
res = subprocess.run([sys.executable, '-c', code], env=env, text=True, capture_output=True)
print(res.stdout, flush=True)
print(res.stderr, flush=True)
print('Return code:', res.returncode, flush=True)

In [6]:
# Verify Torch CUDA after clean install (set env before import) + diagnostics (hard purge .pip-target shadowing)
import os, subprocess, sys, shutil
print('Python:', sys.executable)
print('Initial CUDA_VISIBLE_DEVICES=', os.getenv('CUDA_VISIBLE_DEVICES'))
print('Initial NVIDIA_VISIBLE_DEVICES=', os.getenv('NVIDIA_VISIBLE_DEVICES'))
if os.getenv('CUDA_VISIBLE_DEVICES') in (None, '', 'void'):
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'; print('Set CUDA_VISIBLE_DEVICES=0')
if os.getenv('NVIDIA_VISIBLE_DEVICES') in (None, '', 'void'):
    os.environ['NVIDIA_VISIBLE_DEVICES'] = 'all'; print('Set NVIDIA_VISIBLE_DEVICES=all')
os.environ['PYTHONNOUSERSITE'] = '1'
print('Set PYTHONNOUSERSITE=1 to ignore user site')
print('Effective CUDA_VISIBLE_DEVICES=', os.getenv('CUDA_VISIBLE_DEVICES'))
print('Effective NVIDIA_VISIBLE_DEVICES=', os.getenv('NVIDIA_VISIBLE_DEVICES'))
subprocess.run(['bash','-lc','ldconfig -p | grep libcuda || true'], check=False)

# Hard purge entire /app/.pip-target to avoid any shadowing
pip_target = '/app/.pip-target'
if os.path.exists(pip_target):
    print('Removing entire', pip_target)
    shutil.rmtree(pip_target, ignore_errors=True)

# Clean sys.modules if torch was previously imported in this kernel
for mod in list(sys.modules.keys()):
    if mod == 'torch' or mod.startswith('torch.'):
        del sys.modules[mod]

# Clean sys.path of any user directories and .pip-target entries
sys.path = [p for p in sys.path if p and '/.pip-target' not in p and '/app/.local' not in p]
print('sys.path cleaned (first 8):', sys.path[:8])

try:
    import importlib
    torch = importlib.import_module('torch')
    print('Torch:', torch.__version__, 'built CUDA:', getattr(torch.version,'cuda',None), flush=True)
    print('Torch file:', getattr(torch, '__file__', 'unknown'))
    try:
        from importlib.metadata import version
        print('importlib.metadata torch version:', version('torch'))
    except Exception as e:
        print('metadata read error:', e)
    print('CUDA available:', torch.cuda.is_available(), flush=True)
    if torch.cuda.is_available():
        print('GPU:', torch.cuda.get_device_name(0), 'Count:', torch.cuda.device_count(), flush=True)
    else:
        try:
            torch.zeros(1).cuda()
        except Exception as e:
            print('CUDA init error:', e)
except Exception as e:
    print('Torch import failed:', e)
print('Done verify.')

Python: /usr/bin/python3.11
Initial CUDA_VISIBLE_DEVICES= 0
Initial NVIDIA_VISIBLE_DEVICES= all
Set PYTHONNOUSERSITE=1 to ignore user site
Effective CUDA_VISIBLE_DEVICES= 0
Effective NVIDIA_VISIBLE_DEVICES= all
	libcudart.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12
	libcudart.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so
	libcudadebugger.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1
	libcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1
Removing shadowed torch dir: /app/.pip-target/torch


Removing /app/.pip-target/torch-2.4.1.dist-info
sys.path cleaned (first 8): ['/app', '/app/src', '/usr/lib/python311.zip', '/usr/lib/python3.11', '/usr/lib/python3.11/lib-dynload', '', '/usr/local/lib/python3.11/dist-packages', '/usr/lib/python3/dist-packages']
Torch: 2.4.1+cu121 built CUDA: 12.1


Torch file: /app/.pip-target/torch/__init__.py
metadata read error: No package metadata was found for torch
CUDA available: False


CUDA init error: No CUDA GPUs are available
Done verify.


In [12]:
# CPU fallback: DCT histogram features + LightGBM multiclass baseline
import os, gc, time, glob, numpy as np, pandas as pd
from joblib import Parallel, delayed
from PIL import Image
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.metrics import roc_auc_score
import lightgbm as lgb

def load_Y(path):
    im = Image.open(path).convert('YCbCr')
    y, _, _ = im.split()
    return np.asarray(y, dtype=np.float32)

# Precompute 8x8 DCT matrix (orthonormal)
def dct_matrix_8():
    N = 8
    C = np.zeros((N,N), dtype=np.float32)
    for k in range(N):
        for n in range(N):
            alpha = np.sqrt(1/N) if k==0 else np.sqrt(2/N)
            C[k,n] = alpha * np.cos((np.pi*(2*n+1)*k)/(2*N))
    return C
C8 = dct_matrix_8()

# Zigzag indices for 8x8
def zigzag_indices(n=8):
    idx = []
    for s in range(2*n-1):
        if s%2==0:
            for i in range(s, -1, -1):
                j = s - i
                if i<n and j<n: idx.append((i,j))
        else:
            for j in range(s, -1, -1):
                i = s - j
                if i<n and j<n: idx.append((i,j))
    return idx
zz = zigzag_indices(8)
ac_positions = zz[1:21]  # first 20 AC coefficients (skip DC at [0,0])

def dct_hist_features(y, clip=10, bins=21):
    H, W = y.shape
    H8, W8 = (H//8)*8, (W//8)*8
    y = y[:H8, :W8]
    # blockify to (nb, 8, 8)
    y_blocks = y.reshape(H8//8, 8, W8//8, 8).transpose(0,2,1,3).reshape(-1,8,8)
    # DCT for all blocks: C @ B @ C.T
    tmp = np.einsum('ab,nbc->nac', C8, y_blocks, optimize=True)
    dcts = np.einsum('nab,cb->nac', tmp, C8, optimize=True)  # (nb,8,8)
    # hist per selected AC
    feats = []
    for (i,j) in ac_positions:
        v = np.rint(dcts[:, i, j]).astype(np.int32)
        v = np.clip(v, -clip, clip)
        # bins centered at integers from -clip..clip
        hist = np.bincount(v + clip, minlength=bins).astype(np.float32)
        # L1 normalize
        s = hist.sum()
        feats.append(hist / (s if s>0 else 1.0))
    return np.concatenate(feats, axis=0)  # 20*21 = 420

def extract_one(path):
    try:
        y = load_Y(path)
        return dct_hist_features(y)
    except Exception as e:
        return np.full(420, np.nan, dtype=np.float32)

def _extract_in_chunks(paths, chunk=5000, n_jobs=36, tag='train'):
    all_parts = []
    t0 = time.time()
    for i in range(0, len(paths), chunk):
        t1 = time.time()
        part_paths = paths[i:i+chunk]
        part = Parallel(n_jobs=n_jobs, batch_size=64, prefer='threads')(delayed(extract_one)(p) for p in part_paths)
        part = np.vstack(part).astype(np.float32)
        all_parts.append(part)
        done = i + len(part_paths)
        print(f'[{tag}] processed {done}/{len(paths)} in chunk {i//chunk+1}, elapsed {time.time()-t1:.1f}s total {time.time()-t0:.1f}s', flush=True)
    return np.vstack(all_parts) if all_parts else np.zeros((0,420), dtype=np.float32)

def _save_paths(paths, fname):
    with open(fname, 'w') as f:
        for p in paths:
            f.write(str(p)+'\n')

def _load_paths(fname):
    if not os.path.exists(fname):
        return None
    with open(fname, 'r') as f:
        return [line.strip() for line in f.readlines()]

def build_features(manifest_csv='manifest.csv', cache_train='train_feats.npy', cache_test='test_feats.npy',
                   subsample_n=None, chunk=5000, n_jobs=36, seed=42):
    df = pd.read_csv(manifest_csv)
    # Optional stratified subsample for smoke runs
    if subsample_n is not None and subsample_n < len(df):
        per = max(1, subsample_n // 4)
        dfs = []
        for c in [0,1,2,3]:
            sub = df[df['label_4c']==c].sample(n=per, random_state=seed)
            dfs.append(sub)
        df = pd.concat(dfs, axis=0).sort_values('path').reset_index(drop=True)  # deterministic order
        print('Subsampled manifest to', len(df), 'rows (approx stratified by label)', flush=True)
        cache_train = f'train_feats_{len(df)}.npy'
    else:
        df = df.sort_values('path').reset_index(drop=True)
    train_paths = df['path'].tolist()
    train_paths_txt = cache_train + '.paths.txt'
    test_files = sorted(glob.glob('Test/*.jpg'))
    test_paths_txt = cache_test + '.paths.txt'
    # Train features with cache validation
    need_train = True
    if os.path.exists(cache_train) and os.path.exists(train_paths_txt):
        cached = _load_paths(train_paths_txt)
        if cached == train_paths:
            need_train = False
    if need_train:
        t0=time.time()
        X = _extract_in_chunks(train_paths, chunk=chunk, n_jobs=n_jobs, tag='train')
        np.save(cache_train, X)
        _save_paths(train_paths, train_paths_txt)
        print('Saved', cache_train, X.shape, 'in %.1fs' % (time.time()-t0), flush=True)
    else:
        X = np.load(cache_train); print('Loaded', cache_train, X.shape)
    # Test features with cache validation
    need_test = True
    if os.path.exists(cache_test) and os.path.exists(test_paths_txt):
        cached_t = _load_paths(test_paths_txt)
        if cached_t == test_files:
            need_test = False
    if need_test:
        t0=time.time()
        XT = _extract_in_chunks(test_files, chunk=chunk, n_jobs=n_jobs, tag='test')
        np.save(cache_test, XT)
        _save_paths(test_files, test_paths_txt)
        print('Saved', cache_test, XT.shape, 'in %.1fs' % (time.time()-t0), flush=True)
    else:
        XT = np.load(cache_test); print('Loaded', cache_test, XT.shape)
    return df, X, test_files, XT

def train_lgbm(df, X, n_splits=5, seed=42):
    params = dict(objective='multiclass', num_class=4, learning_rate=0.05,
                  max_depth=-1, num_leaves=255, feature_fraction=0.9, bagging_fraction=0.9,
                  bagging_freq=1, min_data_in_leaf=50, lambda_l1=0.0, lambda_l2=0.0,
                  n_jobs=36, verbose=-1)
    oof = np.zeros((len(df), 4), dtype=np.float32)
    skf = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (trn_idx, val_idx) in enumerate(skf.split(X, y=df['label_4c'], groups=df['source_id'])):
        t0=time.time()
        Xtr, Xva = X[trn_idx], X[val_idx]
        ytr, yva = df['label_4c'].values[trn_idx], df['label_4c'].values[val_idx]
        dtr = lgb.Dataset(Xtr, label=ytr)
        dva = lgb.Dataset(Xva, label=yva)
        print(f'[Fold {fold}] train {Xtr.shape} val {Xva.shape}', flush=True)
        # Use fixed rounds to avoid early_stopping API mismatch
        bst = lgb.train(params, dtr, num_boost_round=600, valid_sets=[dtr, dva], valid_names=['tr','va'])
        oof[val_idx] = bst.predict(Xva)
        # AUROC over stego vs cover
        p_cover = oof[val_idx, 0]
        y_bin = (df.iloc[val_idx]['label_4c'].values!=0).astype(np.int32)
        try:
            auc = roc_auc_score(y_bin, 1.0 - p_cover)
            print(f'[Fold {fold}] bin AUC: {auc:.4f} | time {time.time()-t0:.1f}s', flush=True)
        except Exception:
            print(f'[Fold {fold}] AUC failed | time {time.time()-t0:.1f}s', flush=True)
        del Xtr, Xva, ytr, yva, dtr, dva, bst; gc.collect()
    return oof

def fit_and_predict(subsample_n=None, n_splits=5, chunk=5000, n_jobs=36):
    df, X, test_files, XT = build_features(subsample_n=subsample_n, chunk=chunk, n_jobs=n_jobs)
    oof = train_lgbm(df, X, n_splits=n_splits, seed=42)
    # Train final model on full (subsampled) data
    params = dict(objective='multiclass', num_class=4, learning_rate=0.05,
                  max_depth=-1, num_leaves=255, feature_fraction=0.9, bagging_fraction=0.9,
                  bagging_freq=1, min_data_in_leaf=50, lambda_l1=0.0, lambda_l2=0.0,
                  n_jobs=36, verbose=-1)
    dfull = lgb.Dataset(X, label=df['label_4c'].values)
    print('Training final model on full data...', flush=True)
    bst = lgb.train(params, dfull, num_boost_round=800)
    Ptest = bst.predict(XT)
    # Submission: Label = 1 - p_cover
    p_stego = 1.0 - Ptest[:,0]
    sub = pd.DataFrame({'Id': [os.path.basename(p) for p in test_files], 'Label': p_stego})
    sub = sub.sort_values('Id')
    sub.to_csv('submission.csv', index=False)
    print('Saved submission.csv', sub.shape, flush=True)

# To run: fit_and_predict(subsample_n=20000, n_splits=3)  # smoke; then scale up
print('CPU DCT-hist pipeline cell ready. Call fit_and_predict(subsample_n=..., n_splits=...) to start.', flush=True)

CPU DCT-hist pipeline cell ready. Call fit_and_predict(subsample_n=..., n_splits=...) to start.


In [17]:
# Start CPU feature extraction/training/submission (smoke run first)
import os
# Avoid BLAS over-subscription when using joblib threads
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
print('Launching fit_and_predict(subsample_n=20000, n_splits=3) ...', flush=True)
fit_and_predict(subsample_n=20000, n_splits=3, chunk=5000, n_jobs=36)

Launching fit_and_predict(subsample_n=20000, n_splits=3) ...


Subsampled manifest to 20000 rows (approx stratified by label)


[train] processed 5000/20000 in chunk 1, elapsed 34.5s total 34.5s


[train] processed 10000/20000 in chunk 2, elapsed 34.5s total 69.0s


[train] processed 15000/20000 in chunk 3, elapsed 34.6s total 103.6s


[train] processed 20000/20000 in chunk 4, elapsed 34.4s total 138.0s


Saved train_feats_20000.npy (20000, 420) in 138.0s


Loaded test_feats.npy (5000, 420)


[Fold 0] train (13332, 420) val (6668, 420)


[Fold 0] bin AUC: 0.4974 | time 81.7s


[Fold 1] train (13336, 420) val (6664, 420)


[Fold 1] bin AUC: 0.5147 | time 78.0s


[Fold 2] train (13332, 420) val (6668, 420)


[Fold 2] bin AUC: 0.5180 | time 79.2s


Training final model on full data...


Saved submission.csv (5000, 2)


In [15]:
# Sanity checks: alignment, splits, and cache validity for subsample_n=20000
import os, glob, numpy as np, pandas as pd
from sklearn.model_selection import StratifiedGroupKFold

manifest_csv = 'manifest.csv'
cache_train = 'train_feats_20000.npy'
paths_txt = cache_train + '.paths.txt'

df = pd.read_csv(manifest_csv)
# Recreate the deterministic 20k subsample logic (per cell 6): per-class sample then sort by path
per = max(1, 20000 // 4)
dfs = []
for c in [0,1,2,3]:
    sub = df[df['label_4c']==c].sample(n=per, random_state=42)
    dfs.append(sub)
df_sub = pd.concat(dfs, axis=0).sort_values('path').reset_index(drop=True)
train_paths_expected = df_sub['path'].tolist()

print('Subsampled df_sub:', df_sub.shape, 'cover:', (df_sub.label_4c==0).sum(), flush=True)

# Compare cached paths (if exists)
cached_ok = False
if os.path.exists(paths_txt):
    with open(paths_txt, 'r') as f:
        cached_paths = [line.strip() for line in f.readlines()]
    mism = sum(a!=b for a,b in zip(cached_paths, train_paths_expected)) if len(cached_paths)==len(train_paths_expected) else -1
    cached_ok = (mism == 0 and len(cached_paths)==len(train_paths_expected))
    print('Cache paths exists:', True, '| same length:', len(cached_paths)==len(train_paths_expected), '| mismatches:', mism, flush=True)
else:
    print('Cache paths file missing:', paths_txt, flush=True)

# Basic split/group sanity on df_sub
skf = StratifiedGroupKFold(n_splits=3, shuffle=True, random_state=42)
fold_ids = np.full(len(df_sub), -1, dtype=int)
for k, (tr, va) in enumerate(skf.split(df_sub, y=df_sub['label_4c'], groups=df_sub['source_id'])):
    fold_ids[va] = k
df_sub['fold'] = fold_ids
print('Fold/label counts:\n', df_sub.groupby(['fold','label_4c']).size().unstack(1), flush=True)
# Ensure no source_id in both train and val per fold
for k in range(3):
    tr_sid = set(df_sub.loc[df_sub['fold']!=k, 'source_id'])
    va_sid = set(df_sub.loc[df_sub['fold']==k, 'source_id'])
    inter = tr_sid.intersection(va_sid)
    print(f'Fold {k} group leakage count:', len(inter), flush=True)

# Feature health if cache present
if os.path.exists(cache_train):
    X = np.load(cache_train)
    print('Loaded X:', X.shape, flush=True)
    if X.shape[0] != len(df_sub):
        print('ROW COUNT MISMATCH: X rows vs df_sub rows ->', X.shape[0], len(df_sub), flush=True)
    n_nan = int(np.isnan(X).sum())
    n_zero_var = int((X.std(0)==0).sum())
    print('NaNs:', n_nan, '| zero-variance feats:', n_zero_var, flush=True)
else:
    print('Cache feature file missing:', cache_train, flush=True)

# If cache paths missing or mismatch, advise deletion
if (not os.path.exists(paths_txt)) or (not cached_ok):
    print('Recommendation: delete stale cache to force regeneration:', cache_train, 'and', paths_txt, flush=True)

Subsampled df_sub: (20000, 7) cover: 5000


Cache paths file missing: train_feats_20000.npy.paths.txt


Fold/label counts:
 label_4c     0     1     2     3
fold                            
0         1667  1667  1667  1667
1         1666  1666  1666  1666
2         1667  1667  1667  1667


Fold 0 group leakage count: 0


Fold 1 group leakage count: 0


Fold 2 group leakage count: 0


Loaded X: (20000, 420)


NaNs: 0 | zero-variance feats: 0


Recommendation: delete stale cache to force regeneration: train_feats_20000.npy and train_feats_20000.npy.paths.txt


In [16]:
# Force-regenerate cached features for 20k subset
import os
targets = ['train_feats_20000.npy', 'train_feats_20000.npy.paths.txt']
for t in targets:
    if os.path.exists(t):
        try:
            os.remove(t)
            print('Deleted', t, flush=True)
        except Exception as e:
            print('Failed to delete', t, e, flush=True)
    else:
        print('Not found (ok):', t, flush=True)

Deleted train_feats_20000.npy


Not found (ok): train_feats_20000.npy.paths.txt


In [19]:
# SRM-lite residuals + co-occurrences (CPU) and LightGBM baseline
import os, time, glob, numpy as np, pandas as pd, gc
from PIL import Image
from joblib import Parallel, delayed
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.metrics import roc_auc_score
import lightgbm as lgb
from scipy.signal import convolve2d

def load_Y(path):
    im = Image.open(path).convert('YCbCr')
    y, _, _ = im.split()
    return np.asarray(y, dtype=np.float32)

# 6 SRM-lite kernels (as per expert advice) with symmetric padding
K3_1 = np.array([[0,1,0],[1,-4,1],[0,1,0]], dtype=np.float32)
K3_2 = np.array([[-1,2,-1],[2,-4,2],[-1,2,-1]], dtype=np.float32)
K3_H = np.array([[-1,2,-1],[-1,2,-1],[-1,2,-1]], dtype=np.float32)
K3_V = K3_H.T.copy()
K3_D = np.array([[ 2,-1, 0],[-1, 0, 1],[ 0, 1,-2]], dtype=np.float32)
K5_S = (1.0/12.0) * np.array([[-1, 2,-2, 2,-1],
                               [ 2,-6, 8,-6, 2],
                               [-2, 8,-12, 8,-2],
                               [ 2,-6, 8,-6, 2],
                               [-1, 2,-2, 2,-1]], dtype=np.float32)
SRM_KERNELS = [K3_1, K3_2, K3_H, K3_V, K3_D, K5_S]

def _quantize(res, T=3):
    q = np.rint(res)
    return np.clip(q, -T, T).astype(np.int8)  # values in [-T..T]

def _hist7(v):
    # v int in [-3..3] -> bins 0..6
    idx = (v + 3).ravel().astype(np.int32)
    h = np.bincount(idx, minlength=7).astype(np.float32)
    s = h.sum();
    return h / (s if s>0 else 1.0)

def _cooc2(v, axis=1):
    # 2nd-order co-occurrence along axis: returns 49-dim (7x7) L1-normalized
    if axis==1:
        a = v[:, :-1].astype(np.int32) + 3
        b = v[:,  1: ].astype(np.int32) + 3
    else:
        a = v[:-1, :].astype(np.int32) + 3
        b = v[ 1:, :].astype(np.int32) + 3
    idx = (a * 7 + b).ravel()
    h = np.bincount(idx, minlength=49).astype(np.float32)
    s = h.sum();
    return h / (s if s>0 else 1.0)

def srm_features_from_Y(y, T=3):
    feats = []
    for k in SRM_KERNELS:
        r = convolve2d(y, k, mode='same', boundary='symm')
        q = _quantize(r, T=T)
        feats.append(_hist7(q))           # 7
        feats.append(_cooc2(q, axis=1))   # 49 horizontal
        feats.append(_cooc2(q, axis=0))   # 49 vertical
    return np.concatenate(feats).astype(np.float32)  # 6*(7+49+49) = 630

def srm_extract_one(path):
    try:
        y = load_Y(path)
        return srm_features_from_Y(y, T=3)
    except Exception:
        return np.full(630, np.nan, dtype=np.float32)

def _paths_save(paths, fname):
    with open(fname, 'w') as f:
        for p in paths: f.write(str(p)+'\n')
def _paths_load(fname):
    if not os.path.exists(fname): return None
    with open(fname,'r') as f: return [line.strip() for line in f.readlines()]

def _extract_in_chunks_srm(paths, chunk=4000, n_jobs=36, tag='train'):
    from joblib import Parallel, delayed
    all_parts = []; t0=time.time()
    for i in range(0, len(paths), chunk):
        t1=time.time(); part_paths = paths[i:i+chunk]
        part = Parallel(n_jobs=n_jobs, batch_size=64, prefer='threads')(delayed(srm_extract_one)(p) for p in part_paths)
        part = np.vstack(part).astype(np.float32); all_parts.append(part)
        print(f'[{tag}-srm] processed {i+len(part_paths)}/{len(paths)} chunk {i//chunk+1} elapsed {time.time()-t1:.1f}s total {time.time()-t0:.1f}s', flush=True)
    return np.vstack(all_parts) if all_parts else np.zeros((0,630), dtype=np.float32)

def build_features_srm(manifest_csv='manifest.csv', subsample_n=None, seed=42, n_jobs=36, chunk=4000):
    df = pd.read_csv(manifest_csv)
    if subsample_n is not None and subsample_n < len(df):
        per = max(1, subsample_n // 4)
        dfs = []
        for c in [0,1,2,3]:
            dfs.append(df[df['label_4c']==c].sample(n=per, random_state=seed))
        df = pd.concat(dfs, axis=0).sort_values('path').reset_index(drop=True)
        cache_train = f'train_feats_srm_{len(df)}.npy'
    else:
        df = df.sort_values('path').reset_index(drop=True)
        cache_train = 'train_feats_srm.npy'
    train_paths = df['path'].tolist()
    train_paths_txt = cache_train + '.paths.txt'
    test_paths = sorted(glob.glob('Test/*.jpg'))
    cache_test = 'test_feats_srm.npy'
    test_paths_txt = cache_test + '.paths.txt'
    # Train cache validate
    need_train = True
    if os.path.exists(cache_train) and os.path.exists(train_paths_txt):
        if _paths_load(train_paths_txt) == train_paths:
            need_train = False
    if need_train:
        X = _extract_in_chunks_srm(train_paths, chunk=chunk, n_jobs=n_jobs, tag='train')
        np.save(cache_train, X); _paths_save(train_paths, train_paths_txt)
        print('Saved', cache_train, X.shape, flush=True)
    else:
        X = np.load(cache_train); print('Loaded', cache_train, X.shape, flush=True)
    # Test cache validate
    need_test = True
    if os.path.exists(cache_test) and os.path.exists(test_paths_txt):
        if _paths_load(test_paths_txt) == test_paths:
            need_test = False
    if need_test:
        XT = _extract_in_chunks_srm(test_paths, chunk=chunk, n_jobs=n_jobs, tag='test')
        np.save(cache_test, XT); _paths_save(test_paths, test_paths_txt)
        print('Saved', cache_test, XT.shape, flush=True)
    else:
        XT = np.load(cache_test); print('Loaded', cache_test, XT.shape, flush=True)
    return df, X, test_paths, XT, cache_train, cache_test

def train_lgbm_srm(df, X, n_splits=5, seed=42):
    params = dict(objective='multiclass', num_class=4, learning_rate=0.03,
                  max_depth=8, num_leaves=128, min_data_in_leaf=100,
                  feature_fraction=0.6, bagging_fraction=0.7, bagging_freq=1,
                  lambda_l1=1.0, lambda_l2=1.0, n_jobs=36, verbose=-1)
    oof = np.zeros((len(df), 4), dtype=np.float32)
    skf = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (tr, va) in enumerate(skf.split(X, y=df['label_4c'], groups=df['source_id'])):
        t0=time.time()
        dtr = lgb.Dataset(X[tr], label=df['label_4c'].values[tr])
        dva = lgb.Dataset(X[va], label=df['label_4c'].values[va])
        print(f'[SRM Fold {fold}] train {len(tr)} val {len(va)}', flush=True)
        bst = lgb.train(params, dtr, num_boost_round=1200, valid_sets=[dtr, dva], valid_names=['tr','va'])
        oof[va] = bst.predict(X[va])
        y_bin = (df.iloc[va]['label_4c'].values!=0).astype(np.int32)
        try:
            auc = roc_auc_score(y_bin, 1.0 - oof[va,0])
            print(f'[SRM Fold {fold}] bin AUC: {auc:.4f} | time {time.time()-t0:.1f}s', flush=True)
        except Exception:
            print(f'[SRM Fold {fold}] AUC failed | time {time.time()-t0:.1f}s', flush=True)
        del dtr, dva; gc.collect()
    return oof

def fit_and_predict_srm(subsample_n=20000, n_splits=3, chunk=4000, n_jobs=36):
    print('Launching SRM-lite pipeline...', flush=True)
    df, X, test_paths, XT, ctr, cte = build_features_srm(subsample_n=subsample_n, chunk=chunk, n_jobs=n_jobs)
    print('Features SRM:', X.shape, XT.shape, flush=True)
    oof = train_lgbm_srm(df, X, n_splits=n_splits, seed=42)
    # Final model
    params = dict(objective='multiclass', num_class=4, learning_rate=0.03,
                  max_depth=8, num_leaves=128, min_data_in_leaf=100,
                  feature_fraction=0.6, bagging_fraction=0.7, bagging_freq=1,
                  lambda_l1=1.0, lambda_l2=1.0, n_jobs=36, verbose=-1)
    print('Training final SRM model...', flush=True)
    dfull = lgb.Dataset(X, label=df['label_4c'].values)
    bst = lgb.train(params, dfull, num_boost_round=1500)
    P = bst.predict(XT)
    p_stego = 1.0 - P[:,0]
    sub = pd.DataFrame({'Id': [os.path.basename(p) for p in test_paths], 'Label': p_stego}).sort_values('Id')
    sub.to_csv('submission.csv', index=False)
    print('Saved submission.csv', sub.shape, flush=True)
    return oof

In [21]:
# Run SRM-lite smoke test (smaller subset for quick signal check)
import os
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
print('Launching fit_and_predict_srm(subsample_n=5000, n_splits=3) ...', flush=True)
oof_srm = fit_and_predict_srm(subsample_n=5000, n_splits=3, chunk=4000, n_jobs=36)

Launching fit_and_predict_srm(subsample_n=5000, n_splits=3) ...


Launching SRM-lite pipeline...


[train-srm] processed 4000/5000 chunk 1 elapsed 217.8s total 217.8s


[train-srm] processed 5000/5000 chunk 2 elapsed 53.7s total 271.5s


Saved train_feats_srm_5000.npy (5000, 630)


[test-srm] processed 4000/5000 chunk 1 elapsed 216.9s total 216.9s


[test-srm] processed 5000/5000 chunk 2 elapsed 54.2s total 271.1s


Saved test_feats_srm.npy (5000, 630)


Features SRM: (5000, 630) (5000, 630)


[SRM Fold 0] train 3332 val 1668


[SRM Fold 0] bin AUC: 0.5379 | time 10.0s


[SRM Fold 1] train 3332 val 1668


[SRM Fold 1] bin AUC: 0.5368 | time 7.6s


[SRM Fold 2] train 3336 val 1664


[SRM Fold 2] bin AUC: 0.5183 | time 9.4s


Training final SRM model...


Saved submission.csv (5000, 2)


In [22]:
# DCTR-lite JPEG-domain features (CPU) + LightGBM
import os, time, glob, numpy as np, pandas as pd, gc, subprocess, sys
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.metrics import roc_auc_score
import lightgbm as lgb

# Ensure jpeg2dct installed
try:
    import jpeg2dct
except Exception:
    subprocess.run([sys.executable, '-m', 'pip', 'install', '--no-cache-dir', 'jpeg2dct'], check=True)
    import jpeg2dct

# First 12 AC zigzag positions (skip DC (0,0))
def zigzag_8x8():
    n=8; idx=[]
    for s in range(2*n-1):
        if s%2==0:
            for i in range(s, -1, -1):
                j=s-i
                if i<n and j<n: idx.append((i,j))
        else:
            for j in range(s, -1, -1):
                i=s-j
                if i<n and j<n: idx.append((i,j))
    return idx
zz = zigzag_8x8()[1:13]  # 12 AC subbands

def _ternary(x, thr=1):
    # map to {-1,0,1} with dead-zone
    t = np.zeros_like(x, dtype=np.int8)
    t[x >  thr] = 1
    t[x < -thr] = -1
    return t

def _cooc3_1d(arr, order=3, base=3):
    # arr values in {-1,0,1} -> map to {0,1,2}; 3^order bins
    v = (arr + 1).astype(np.int32)
    if order == 3:
        a = v[:, :-2]; b = v[:, 1:-1]; c = v[:, 2:]
        idx = (a * (base*base) + b * base + c).ravel()
        hist = np.bincount(idx, minlength=base**order).astype(np.float32)
    else:
        raise ValueError('order!=3 not supported')
    s = hist.sum()
    return hist / (s if s>0 else 1.0)

def dctr_lite_one(path, subbands=zz, thr=1):
    try:
        coefs = jpeg2dct.load(path)
        # coefs.y shape: (H/8, W/8, 8, 8) of quantized DCT ints
        Y = coefs.y  # int16/32
        feats = []
        for (i,j) in subbands:
            sub = Y[..., i, j].astype(np.int32)
            # inter-block horizontal/vertical diffs
            Dx = sub[:, 1:] - sub[:, :-1]
            Dy = sub[1:, :] - sub[:-1, :]
            tx = _ternary(Dx, thr=thr)
            ty = _ternary(Dy, thr=thr)
            # 3rd-order co-occ along rows for Dx and along cols for Dy
            hx = _cooc3_1d(tx, order=3, base=3)  # 27
            hy = _cooc3_1d(ty.T, order=3, base=3)  # also along rows after transpose -> 27
            feats.append(hx); feats.append(hy)
        return np.concatenate(feats).astype(np.float32)  # 12 * (27+27) = 648
    except Exception:
        return np.full(12*54, np.nan, dtype=np.float32)

def _paths_save(paths, fname):
    with open(fname, 'w') as f:
        for p in paths: f.write(str(p)+'\n')
def _paths_load(fname):
    if not os.path.exists(fname): return None
    with open(fname,'r') as f: return [line.strip() for line in f.readlines()]

def _extract_in_chunks_dctr(paths, chunk=6000, n_jobs=36, tag='train'):
    from joblib import Parallel, delayed
    all_parts = []; t0=time.time()
    for i in range(0, len(paths), chunk):
        t1=time.time(); part_paths = paths[i:i+chunk]
        part = Parallel(n_jobs=n_jobs, batch_size=64, prefer='threads')(delayed(dctr_lite_one)(p) for p in part_paths)
        part = np.vstack(part).astype(np.float32); all_parts.append(part)
        print(f'[{tag}-dctr] processed {i+len(part_paths)}/{len(paths)} chunk {i//chunk+1} elapsed {time.time()-t1:.1f}s total {time.time()-t0:.1f}s', flush=True)
    return np.vstack(all_parts) if all_parts else np.zeros((0,648), dtype=np.float32)

def build_features_dctr(manifest_csv='manifest.csv', subsample_n=None, seed=42, n_jobs=36, chunk=6000):
    df = pd.read_csv(manifest_csv)
    if subsample_n is not None and subsample_n < len(df):
        per = max(1, subsample_n // 4)
        dfs = [df[df['label_4c']==c].sample(n=per, random_state=seed) for c in [0,1,2,3]]
        df = pd.concat(dfs, axis=0).sort_values('path').reset_index(drop=True)
        cache_train = f'train_feats_dctr_{len(df)}.npy'
    else:
        df = df.sort_values('path').reset_index(drop=True)
        cache_train = 'train_feats_dctr.npy'
    train_paths = df['path'].tolist()
    train_paths_txt = cache_train + '.paths.txt'
    test_paths = sorted(glob.glob('Test/*.jpg'))
    cache_test = 'test_feats_dctr.npy'
    test_paths_txt = cache_test + '.paths.txt'
    # Train cache validate
    need_train = True
    if os.path.exists(cache_train) and os.path.exists(train_paths_txt):
        if _paths_load(train_paths_txt) == train_paths:
            need_train = False
    if need_train:
        X = _extract_in_chunks_dctr(train_paths, chunk=chunk, n_jobs=n_jobs, tag='train')
        np.save(cache_train, X); _paths_save(train_paths, train_paths_txt)
        print('Saved', cache_train, X.shape, flush=True)
    else:
        X = np.load(cache_train); print('Loaded', cache_train, X.shape, flush=True)
    # Test cache validate
    need_test = True
    if os.path.exists(cache_test) and os.path.exists(test_paths_txt):
        if _paths_load(test_paths_txt) == test_paths:
            need_test = False
    if need_test:
        XT = _extract_in_chunks_dctr(test_paths, chunk=chunk, n_jobs=n_jobs, tag='test')
        np.save(cache_test, XT); _paths_save(test_paths, test_paths_txt)
        print('Saved', cache_test, XT.shape, flush=True)
    else:
        XT = np.load(cache_test); print('Loaded', cache_test, XT.shape, flush=True)
    return df, X, test_paths, XT, cache_train, cache_test

def train_lgbm_dctr(df, X, n_splits=3, seed=42):
    params = dict(objective='multiclass', num_class=4, learning_rate=0.03,
                  max_depth=8, num_leaves=128, min_data_in_leaf=100,
                  feature_fraction=0.6, bagging_fraction=0.7, bagging_freq=1,
                  lambda_l1=1.0, lambda_l2=1.0, n_jobs=36, verbose=-1)
    oof = np.zeros((len(df), 4), dtype=np.float32)
    skf = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (tr, va) in enumerate(skf.split(X, y=df['label_4c'], groups=df['source_id'])):
        t0=time.time()
        dtr = lgb.Dataset(X[tr], label=df['label_4c'].values[tr])
        dva = lgb.Dataset(X[va], label=df['label_4c'].values[va])
        print(f'[DCTR Fold {fold}] train {len(tr)} val {len(va)}', flush=True)
        bst = lgb.train(params, dtr, num_boost_round=1500, valid_sets=[dtr, dva], valid_names=['tr','va'])
        oof[va] = bst.predict(X[va])
        y_bin = (df.iloc[va]['label_4c'].values!=0).astype(np.int32)
        try:
            auc = roc_auc_score(y_bin, 1.0 - oof[va,0])
            print(f'[DCTR Fold {fold}] bin AUC: {auc:.4f} | time {time.time()-t0:.1f}s', flush=True)
        except Exception:
            print(f'[DCTR Fold {fold}] AUC failed | time {time.time()-t0:.1f}s', flush=True)
        del dtr, dva; gc.collect()
    return oof

def fit_and_predict_dctr(subsample_n=5000, n_splits=3, chunk=6000, n_jobs=36):
    print('Launching DCTR-lite pipeline...', flush=True)
    df, X, test_paths, XT, ctr, cte = build_features_dctr(subsample_n=subsample_n, chunk=chunk, n_jobs=n_jobs)
    print('Features DCTR:', X.shape, XT.shape, flush=True)
    oof = train_lgbm_dctr(df, X, n_splits=n_splits, seed=42)
    # Final model
    params = dict(objective='multiclass', num_class=4, learning_rate=0.03,
                  max_depth=8, num_leaves=128, min_data_in_leaf=100,
                  feature_fraction=0.6, bagging_fraction=0.7, bagging_freq=1,
                  lambda_l1=1.0, lambda_l2=1.0, n_jobs=36, verbose=-1)
    print('Training final DCTR model...', flush=True)
    dfull = lgb.Dataset(X, label=df['label_4c'].values)
    bst = lgb.train(params, dfull, num_boost_round=1800)
    P = bst.predict(XT)
    p_stego = 1.0 - P[:,0]
    sub = pd.DataFrame({'Id': [os.path.basename(p) for p in test_paths], 'Label': p_stego}).sort_values('Id')
    sub.to_csv('submission.csv', index=False)
    print('Saved submission.csv', sub.shape, flush=True)
    return oof

Collecting jpeg2dct
  Downloading jpeg2dct-0.2.4.tar.gz (106 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 106.5/106.5 KB 4.7 MB/s eta 0:00:00
  Preparing metadata (setup.py): started


  Preparing metadata (setup.py): finished with status 'done'


Collecting numpy
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.3/18.3 MB 196.9 MB/s eta 0:00:00
Building wheels for collected packages: jpeg2dct
  Building wheel for jpeg2dct (setup.py): started


  Building wheel for jpeg2dct (setup.py): finished with status 'error'
  Running setup.py clean for jpeg2dct
Failed to build jpeg2dct


  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [63 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.11
      creating build/lib.linux-x86_64-3.11/test
      copying test/__init__.py -> build/lib.linux-x86_64-3.11/test
      creating build/lib.linux-x86_64-3.11/jpeg2dct
      copying jpeg2dct/__init__.py -> build/lib.linux-x86_64-3.11/jpeg2dct
      creating build/lib.linux-x86_64-3.11/test/tensorflow
      copying test/tensorflow/__init__.py -> build/lib.linux-x86_64-3.11/test/tensorflow
      copying test/tensorflow/test_decode.py -> build/lib.linux-x86_64-3.11/test/tensorflow
      creating build/lib.linux-x86_64-3.11/test/numpy
      copying test/numpy/test_load.py -> build/lib.linux-x86_64-3.11/test/numpy
      copying test/numpy/__init__.py -> build/lib.linux-x86_64-3.11/test/numpy
      creating build/lib.linu

Installing collected packages: numpy, jpeg2dct


  Running setup.py install for jpeg2dct: started


  Running setup.py install for jpeg2dct: finished with status 'error'


  error: subprocess-exited-with-error
  
  × Running setup.py install for jpeg2dct did not run successfully.
  │ exit code: 1
  ╰─> [65 lines of output]
      running install
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.11
      creating build/lib.linux-x86_64-3.11/test
      copying test/__init__.py -> build/lib.linux-x86_64-3.11/test
      creating build/lib.linux-x86_64-3.11/jpeg2dct
      copying jpeg2dct/__init__.py -> build/lib.linux-x86_64-3.11/jpeg2dct
      creating build/lib.linux-x86_64-3.11/test/tensorflow
      copying test/tensorflow/__init__.py -> build/lib.linux-x86_64-3.11/test/tensorflow
      copying test/tensorflow/test_decode.py -> build/lib.linux-x86_64-3.11/test/tensorflow
      creating build/lib.linux-x86_64-3.11/test/numpy
      copying test/numpy/test_load.py -> build/lib.linux-x86_64-3.11/test/numpy
      copying test/numpy/__init__.py -> build/lib.linux-x86_64-3.11/test/numpy
      creating build/li

CalledProcessError: Command '['/usr/bin/python3.11', '-m', 'pip', 'install', '--no-cache-dir', 'jpeg2dct']' returned non-zero exit status 1.

In [23]:
# SRM-lite v2: more kernels + diagonal co-occurrences (CPU) + LightGBM
import os, time, glob, numpy as np, pandas as pd, gc
from PIL import Image
from joblib import Parallel, delayed
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.metrics import roc_auc_score
import lightgbm as lgb
from scipy.signal import convolve2d

def load_Y(path):
    im = Image.open(path).convert('YCbCr')
    y, _, _ = im.split()
    return np.asarray(y, dtype=np.float32)

# Expanded kernel set (12):
# basic Laplacians
K3_L4 = np.array([[0,1,0],[1,-4,1],[0,1,0]], dtype=np.float32)
K3_L8 = np.array([[1,1,1],[1,-8,1],[1,1,1]], dtype=np.float32)
# directional (second-derivative-like)
K3_H2 = np.array([[-1,2,-1],[-1,2,-1],[-1,2,-1]], dtype=np.float32)
K3_V2 = K3_H2.T.copy()
K3_D1 = np.array([[ 2,-1, 0],[-1, 0, 1],[ 0, 1,-2]], dtype=np.float32)
K3_D2 = np.array([[ 0,-1, 2],[-1, 0, 1],[ 2, 1, 0]], dtype=np.float32)
# Sobel-like
K3_SX = np.array([[ 1, 2, 1],[ 0, 0, 0],[-1,-2,-1]], dtype=np.float32)
K3_SY = K3_SX.T.copy()
# Prewitt-like
K3_PX = np.array([[ 1, 1, 1],[ 0, 0, 0],[-1,-1,-1]], dtype=np.float32)
K3_PY = K3_PX.T.copy()
# strong 5x5
K5_S = (1.0/12.0) * np.array([[-1, 2,-2, 2,-1],[ 2,-6, 8,-6, 2],[-2, 8,-12, 8,-2],[ 2,-6, 8,-6, 2],[-1, 2,-2, 2,-1]], dtype=np.float32)
# small sharpening
K3_SH = np.array([[0,-1,0],[-1,5,-1],[0,-1,0]], dtype=np.float32)

SRM_KERNELS_V2 = [K3_L4, K3_L8, K3_H2, K3_V2, K3_D1, K3_D2, K3_SX, K3_SY, K3_PX, K3_PY, K5_S, K3_SH]

def _quantize(res, T=2):
    q = np.rint(res)
    return np.clip(q, -T, T).astype(np.int8)

def _hist_levels(v, T=2):
    # bins: [-T..T] -> 2T+1
    offset = T
    idx = (v + offset).ravel().astype(np.int32)
    h = np.bincount(idx, minlength=2*T+1).astype(np.float32)
    s = h.sum();
    return h / (s if s>0 else 1.0)

def _cooc2_axis(v, T=2, axis=1):
    Q = 2*T+1
    off = T
    if axis==1:
        a = v[:, :-1].astype(np.int32) + off
        b = v[:,  1: ].astype(np.int32) + off
    else:
        a = v[:-1, :].astype(np.int32) + off
        b = v[ 1:, :].astype(np.int32) + off
    idx = (a * Q + b).ravel()
    h = np.bincount(idx, minlength=Q*Q).astype(np.float32)
    s = h.sum();
    return h / (s if s>0 else 1.0)

def _cooc2_diag(v, T=2, diag_type='main'):
    # diag_type: 'main' for i,j with (i+1,j+1); 'anti' for (i+1,j-1)
    Q = 2*T+1; off=T
    if diag_type=='main':
        a = v[:-1, :-1].astype(np.int32) + off
        b = v[ 1:,  1:].astype(np.int32) + off
    else:
        a = v[:-1, 1:].astype(np.int32) + off
        b = v[ 1:, :-1].astype(np.int32) + off
    idx = (a * Q + b).ravel()
    h = np.bincount(idx, minlength=Q*Q).astype(np.float32)
    s = h.sum()
    return h / (s if s>0 else 1.0)

def srm_v2_features_from_Y(y, T=2):
    feats = []
    for k in SRM_KERNELS_V2:
        r = convolve2d(y, k, mode='same', boundary='symm')
        q = _quantize(r, T=T)
        # 1D hist
        feats.append(_hist_levels(q, T=T))                    # (2T+1)=5
        # 2nd-order co-occ: H, V, diag, anti-diag
        feats.append(_cooc2_axis(q, T=T, axis=1))             # 25
        feats.append(_cooc2_axis(q, T=T, axis=0))             # 25
        feats.append(_cooc2_diag(q, T=T, diag_type='main'))   # 25
        feats.append(_cooc2_diag(q, T=T, diag_type='anti'))   # 25
    return np.concatenate(feats).astype(np.float32)  # per map: 5+25*4=105; 12 maps -> 1260 dims

def srm_v2_extract_one(path):
    try:
        y = load_Y(path)
        return srm_v2_features_from_Y(y, T=2)
    except Exception:
        return np.full(1260, np.nan, dtype=np.float32)

def _paths_save(paths, fname):
    with open(fname, 'w') as f:
        for p in paths: f.write(str(p)+'\n')
def _paths_load(fname):
    if not os.path.exists(fname): return None
    with open(fname,'r') as f: return [line.strip() for line in f.readlines()]

def _extract_in_chunks_srm_v2(paths, chunk=3000, n_jobs=36, tag='train'):
    all_parts = []; t0=time.time()
    for i in range(0, len(paths), chunk):
        t1=time.time(); part_paths = paths[i:i+chunk]
        part = Parallel(n_jobs=n_jobs, batch_size=64, prefer='threads')(delayed(srm_v2_extract_one)(p) for p in part_paths)
        part = np.vstack(part).astype(np.float32); all_parts.append(part)
        print(f'[{tag}-srmv2] processed {i+len(part_paths)}/{len(paths)} chunk {i//chunk+1} elapsed {time.time()-t1:.1f}s total {time.time()-t0:.1f}s', flush=True)
    return np.vstack(all_parts) if all_parts else np.zeros((0,1260), dtype=np.float32)

def build_features_srm_v2(manifest_csv='manifest.csv', subsample_n=None, seed=42, n_jobs=36, chunk=3000):
    df = pd.read_csv(manifest_csv)
    if subsample_n is not None and subsample_n < len(df):
        per = max(1, subsample_n // 4)
        dfs = [df[df['label_4c']==c].sample(n=per, random_state=seed) for c in [0,1,2,3]]
        df = pd.concat(dfs, axis=0).sort_values('path').reset_index(drop=True)
        cache_train = f'train_feats_srmv2_{len(df)}.npy'
    else:
        df = df.sort_values('path').reset_index(drop=True)
        cache_train = 'train_feats_srmv2.npy'
    train_paths = df['path'].tolist()
    train_paths_txt = cache_train + '.paths.txt'
    test_paths = sorted(glob.glob('Test/*.jpg'))
    cache_test = 'test_feats_srmv2.npy'
    test_paths_txt = cache_test + '.paths.txt'
    need_train = True
    if os.path.exists(cache_train) and os.path.exists(train_paths_txt):
        if _paths_load(train_paths_txt) == train_paths: need_train = False
    if need_train:
        X = _extract_in_chunks_srm_v2(train_paths, chunk=chunk, n_jobs=n_jobs, tag='train')
        np.save(cache_train, X); _paths_save(train_paths, train_paths_txt)
        print('Saved', cache_train, X.shape, flush=True)
    else:
        X = np.load(cache_train); print('Loaded', cache_train, X.shape, flush=True)
    need_test = True
    if os.path.exists(cache_test) and os.path.exists(test_paths_txt):
        if _paths_load(test_paths_txt) == test_paths: need_test = False
    if need_test:
        XT = _extract_in_chunks_srm_v2(test_paths, chunk=chunk, n_jobs=n_jobs, tag='test')
        np.save(cache_test, XT); _paths_save(test_paths, test_paths_txt)
        print('Saved', cache_test, XT.shape, flush=True)
    else:
        XT = np.load(cache_test); print('Loaded', cache_test, XT.shape, flush=True)
    return df, X, test_paths, XT, cache_train, cache_test

def train_lgbm_srm_v2(df, X, n_splits=3, seed=42):
    params = dict(objective='multiclass', num_class=4, learning_rate=0.03,
                  max_depth=8, num_leaves=128, min_data_in_leaf=100,
                  feature_fraction=0.6, bagging_fraction=0.7, bagging_freq=1,
                  lambda_l1=1.0, lambda_l2=1.0, n_jobs=36, verbose=-1)
    oof = np.zeros((len(df), 4), dtype=np.float32)
    skf = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (tr, va) in enumerate(skf.split(X, y=df['label_4c'], groups=df['source_id'])):
        t0=time.time()
        dtr = lgb.Dataset(X[tr], label=df['label_4c'].values[tr])
        dva = lgb.Dataset(X[va], label=df['label_4c'].values[va])
        print(f'[SRMv2 Fold {fold}] train {len(tr)} val {len(va)}', flush=True)
        bst = lgb.train(params, dtr, num_boost_round=1500, valid_sets=[dtr, dva], valid_names=['tr','va'])
        oof[va] = bst.predict(X[va])
        y_bin = (df.iloc[va]['label_4c'].values!=0).astype(np.int32)
        try:
            auc = roc_auc_score(y_bin, 1.0 - oof[va,0])
            print(f'[SRMv2 Fold {fold}] bin AUC: {auc:.4f} | time {time.time()-t0:.1f}s', flush=True)
        except Exception:
            print(f'[SRMv2 Fold {fold}] AUC failed | time {time.time()-t0:.1f}s', flush=True)
        del dtr, dva; gc.collect()
    return oof

def fit_and_predict_srm_v2(subsample_n=5000, n_splits=3, chunk=3000, n_jobs=36):
    print('Launching SRM-lite v2 pipeline...', flush=True)
    df, X, test_paths, XT, ctr, cte = build_features_srm_v2(subsample_n=subsample_n, chunk=chunk, n_jobs=n_jobs)
    print('Features SRMv2:', X.shape, XT.shape, flush=True)
    oof = train_lgbm_srm_v2(df, X, n_splits=n_splits, seed=42)
    params = dict(objective='multiclass', num_class=4, learning_rate=0.03,
                  max_depth=8, num_leaves=128, min_data_in_leaf=100,
                  feature_fraction=0.6, bagging_fraction=0.7, bagging_freq=1,
                  lambda_l1=1.0, lambda_l2=1.0, n_jobs=36, verbose=-1)
    print('Training final SRMv2 model...', flush=True)
    dfull = lgb.Dataset(X, label=df['label_4c'].values)
    bst = lgb.train(params, dfull, num_boost_round=1800)
    P = bst.predict(XT)
    p_stego = 1.0 - P[:,0]
    sub = pd.DataFrame({'Id': [os.path.basename(p) for p in test_paths], 'Label': p_stego}).sort_values('Id')
    sub.to_csv('submission.csv', index=False)
    print('Saved submission.csv', sub.shape, flush=True)
    return oof

In [24]:
# Run SRM-lite v2 smoke test (5k)
import os
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
print('Launching fit_and_predict_srm_v2(subsample_n=5000, n_splits=3) ...', flush=True)
oof_srm_v2 = fit_and_predict_srm_v2(subsample_n=5000, n_splits=3, chunk=3000, n_jobs=36)

Launching fit_and_predict_srm_v2(subsample_n=5000, n_splits=3) ...


Launching SRM-lite v2 pipeline...


[train-srmv2] processed 3000/5000 chunk 1 elapsed 327.9s total 327.9s


[train-srmv2] processed 5000/5000 chunk 2 elapsed 218.0s total 545.9s


Saved train_feats_srmv2_5000.npy (5000, 1260)


[test-srmv2] processed 3000/5000 chunk 1 elapsed 327.0s total 327.0s


[test-srmv2] processed 5000/5000 chunk 2 elapsed 217.2s total 544.2s


Saved test_feats_srmv2.npy (5000, 1260)


Features SRMv2: (5000, 1260) (5000, 1260)


[SRMv2 Fold 0] train 3332 val 1668


[SRMv2 Fold 0] bin AUC: 0.5288 | time 17.1s


[SRMv2 Fold 1] train 3332 val 1668


[SRMv2 Fold 1] bin AUC: 0.5240 | time 19.4s


[SRMv2 Fold 2] train 3336 val 1664


[SRMv2 Fold 2] bin AUC: 0.5346 | time 19.5s


Training final SRMv2 model...


Saved submission.csv (5000, 2)


In [25]:
# Install JPEG dev libs and jpeg2dct (for DCTR-lite)
import subprocess, sys
print('Installing system deps...', flush=True)
subprocess.run(['bash','-lc','apt-get update && apt-get install -y --no-install-recommends libjpeg-dev zlib1g-dev build-essential'], check=True)
print('Installing jpeg2dct...', flush=True)
subprocess.run([sys.executable, '-m', 'pip', 'install', '--no-cache-dir', 'jpeg2dct==0.2.4'], check=True)
import jpeg2dct, numpy as np
print('jpeg2dct version OK, import success.')

Installing system deps...


Reading package lists...


E: List directory /var/lib/apt/lists/partial is missing. - Acquire (30: Read-only file system)


CalledProcessError: Command '['bash', '-lc', 'apt-get update && apt-get install -y --no-install-recommends libjpeg-dev zlib1g-dev build-essential']' returned non-zero exit status 100.

In [26]:
# Combine SRM (630) + SRM-v2 (1260) features and train LGBM (5k subset)
import numpy as np, pandas as pd, os, time, gc
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.metrics import roc_auc_score
import lightgbm as lgb

subsample_n = 5000
seed = 42
print('Loading cached SRM/SRMv2 features for 5k...', flush=True)
X1 = np.load('train_feats_srm_5000.npy') if os.path.exists('train_feats_srm_5000.npy') else None
X2 = np.load('train_feats_srmv2_5000.npy') if os.path.exists('train_feats_srmv2_5000.npy') else None
XT1 = np.load('test_feats_srm.npy') if os.path.exists('test_feats_srm.npy') else None
XT2 = np.load('test_feats_srmv2.npy') if os.path.exists('test_feats_srmv2.npy') else None
assert X1 is not None and X2 is not None and XT1 is not None and XT2 is not None, 'Missing cached features; run cells 11 and 14 first.'
assert X1.shape[0] == subsample_n and X2.shape[0] == subsample_n, f'Row mismatch: {X1.shape} {X2.shape}'
assert XT1.shape[0] == XT2.shape[0] == 5000, f'Test rows mismatch: {XT1.shape} {XT2.shape}'
X = np.concatenate([X1, X2], axis=1).astype(np.float32)
XT = np.concatenate([XT1, XT2], axis=1).astype(np.float32)
print('Combined shapes:', X.shape, XT.shape, flush=True)

# Rebuild the deterministic 5k df to align labels/folds
df_all = pd.read_csv('manifest.csv')
per = subsample_n // 4
dfs = []
for c in [0,1,2,3]:
    dfs.append(df_all[df_all['label_4c']==c].sample(n=per, random_state=seed))
df = pd.concat(dfs, axis=0).sort_values('path').reset_index(drop=True)
print('df shape:', df.shape, '| label counts:', df['label_4c'].value_counts().to_dict(), flush=True)

# CV train
params = dict(objective='multiclass', num_class=4, learning_rate=0.03,
              max_depth=8, num_leaves=128, min_data_in_leaf=100,
              feature_fraction=0.6, bagging_fraction=0.7, bagging_freq=1,
              lambda_l1=1.0, lambda_l2=1.0, n_jobs=36, verbose=-1)
skf = StratifiedGroupKFold(n_splits=3, shuffle=True, random_state=seed)
oof = np.zeros((len(df), 4), dtype=np.float32)
t0 = time.time()
for fold, (tr, va) in enumerate(skf.split(X, y=df['label_4c'], groups=df['source_id'])):
    dtr = lgb.Dataset(X[tr], label=df['label_4c'].values[tr])
    dva = lgb.Dataset(X[va], label=df['label_4c'].values[va])
    print(f'[COMB Fold {fold}] train {len(tr)} val {len(va)}', flush=True)
    bst = lgb.train(params, dtr, num_boost_round=1500, valid_sets=[dtr, dva], valid_names=['tr','va'])
    oof[va] = bst.predict(X[va])
    y_bin = (df.iloc[va]['label_4c'].values!=0).astype(np.int32)
    auc = roc_auc_score(y_bin, 1.0 - oof[va,0])
    print(f'[COMB Fold {fold}] bin AUC: {auc:.4f}', flush=True)
print('CV done in %.1fs' % (time.time()-t0), flush=True)

# Final model and submission
print('Training final combined model...', flush=True)
dfull = lgb.Dataset(X, label=df['label_4c'].values)
bst = lgb.train(params, dfull, num_boost_round=1800)
P = bst.predict(XT)
p_stego = 1.0 - P[:,0]
import glob
test_paths = sorted(glob.glob('Test/*.jpg'))
import pandas as pd, os
sub = pd.DataFrame({'Id': [os.path.basename(p) for p in test_paths], 'Label': p_stego}).sort_values('Id')
sub.to_csv('submission.csv', index=False)
print('Saved submission.csv', sub.shape, flush=True)

Loading cached SRM/SRMv2 features for 5k...


Combined shapes: (5000, 1890) (5000, 1890)


df shape: (5000, 7) | label counts: {0: 1250, 1: 1250, 2: 1250, 3: 1250}


[COMB Fold 0] train 3332 val 1668


[COMB Fold 0] bin AUC: 0.5350


[COMB Fold 1] train 3332 val 1668


[COMB Fold 1] bin AUC: 0.5259


[COMB Fold 2] train 3336 val 1664


[COMB Fold 2] bin AUC: 0.5331


CV done in 76.1s


Training final combined model...


Saved submission.csv (5000, 2)


In [58]:
# Rich SRM (per expert) + JPEG proxy (block DCT + boundaries + Qtable) features; LGBM trainer
import os, glob, time, numpy as np, pandas as pd, gc
from PIL import Image
from joblib import Parallel, delayed
from scipy.signal import convolve2d
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.metrics import roc_auc_score
import lightgbm as lgb

# --- Utils ---
def load_Y(path):
    im = Image.open(path).convert('YCbCr')
    y, _, _ = im.split()
    return np.asarray(y, dtype=np.float32)

def zigzag_indices(n=8):
    idx=[]
    for s in range(2*n-1):
        if s%2==0:
            for i in range(s,-1,-1):
                j=s-i
                if i<n and j<n: idx.append((i,j))
        else:
            for j in range(s,-1,-1):
                i=s-j
                if i<n and j<n: idx.append((i,j))
    return idx
ZZ_FULL = zigzag_indices(8)

def dct_matrix_8():
    N=8; C=np.zeros((N,N), dtype=np.float32)
    for k in range(N):
        for n in range(N):
            alpha = np.sqrt(1/N) if k==0 else np.sqrt(2/N)
            C[k,n] = alpha * np.cos((np.pi*(2*n+1)*k)/(2*N))
    return C
C8 = dct_matrix_8()

# --- Rich SRM subset (8 kernels), T=3, 2nd-order 4 dirs + 3rd-order (H,V,D,A) for top-4 maps ---
K3_L4 = np.array([[0,1,0],[1,-4,1],[0,1,0]], dtype=np.float32)
K3_L8 = np.array([[1,1,1],[1,-8,1],[1,1,1]], dtype=np.float32)
K3_H2 = np.array([[-1,2,-1],[-1,2,-1],[-1,2,-1]], dtype=np.float32)
K3_V2 = K3_H2.T.copy()
K3_D1 = np.array([[ 2,-1, 0],[-1, 0, 1],[ 0, 1,-2]], dtype=np.float32)
# Zero-sum diagonal kernel (fixed) to avoid saturation
K3_D2 = np.array([[ 0,-1, 2],
                  [-1, 0, 1],
                  [-2, 1, 0]], dtype=np.float32)
K5_S  = (1.0/12.0)*np.array([[-1,2,-2,2,-1],[2,-6,8,-6,2],[-2,8,-12,8,-2],[2,-6,8,-6,2],[-1,2,-2,2,-1]], dtype=np.float32)
K3_2  = np.array([[-1,2,-1],[ 2,-4, 2],[-1,2,-1]], dtype=np.float32)
SRM8 = [K3_L4, K3_L8, K3_H2, K3_V2, K3_D1, K3_D2, K5_S, K3_2]
TOP4_IDX = {0,1,2,3}  # indices in SRM8
# Expert-refined per-kernel scaling
SRM_SCALE = [2,8,2,2,3,5,1,2]

def _tlu_q(v, T=3):
    q = np.rint(v)
    return np.clip(q, -T, T).astype(np.int8)

def _hist_levels(q, T=3):
    Q = 2*T+1; off=T
    idx = (q + off).ravel().astype(np.int32)
    h = np.bincount(idx, minlength=Q).astype(np.float32)
    s = h.sum();
    return h / (s if s>0 else 1.0)

def _cooc2_dir(q, T=3, mode='H'):
    Q = 2*T+1; off=T
    if mode=='H':
        a = q[:, :-1].astype(np.int32)+off; b = q[:, 1:].astype(np.int32)+off
    elif mode=='V':
        a = q[:-1, :].astype(np.int32)+off; b = q[1:, :].astype(np.int32)+off
    elif mode=='D':
        a = q[:-1, :-1].astype(np.int32)+off; b = q[1:, 1:].astype(np.int32)+off
    else:  # 'A' anti-diag
        a = q[:-1, 1:].astype(np.int32)+off; b = q[1:, :-1].astype(np.int32)+off
    idx = (a*Q + b).ravel()
    h = np.bincount(idx, minlength=Q*Q).astype(np.float32)
    s = h.sum();
    return h / (s if s>0 else 1.0)

def _cooc3_axis(q, T=3, axis='H'):
    Q = 2*T+1; off=T
    if axis=='H':
        v = (q[:, :-2].astype(np.int32)+off, q[:, 1:-1].astype(np.int32)+off, q[:, 2:].astype(np.int32)+off)
    else:
        v = (q[:-2, :].astype(np.int32)+off, q[1:-1, :].astype(np.int32)+off, q[2:, :].astype(np.int32)+off)
    a,b,c = v
    idx = (a*(Q*Q) + b*Q + c).ravel()
    h = np.bincount(idx, minlength=Q**3).astype(np.float32)
    s = h.sum();
    return h / (s if s>0 else 1.0)

def _cooc3_diag(q, T=3, kind='D'):
    # 3rd-order along diagonals on q in [-T..T]; returns 343 dims
    Q = 2*T+1; off=T
    v = q.astype(np.int32) + off
    H, W = v.shape
    acc = np.zeros(Q**3, dtype=np.float64); total = 0
    if kind == 'D':
        # main diagonals
        for k in range(-(H-1), W):
            d = np.diag(v, k=k)
            if d.size >= 3:
                a, b, c = d[:-2], d[1:-1], d[2:]
                idx = (a*(Q*Q) + b*Q + c).ravel()
                h = np.bincount(idx, minlength=Q**3)
                acc += h; total += h.sum()
    else:
        # anti-diagonals: flip left-right then reuse main
        vf = np.fliplr(v)
        for k in range(-(H-1), W):
            d = np.diag(vf, k=k)
            if d.size >= 3:
                a, b, c = d[:-2], d[1:-1], d[2:]
                idx = (a*(Q*Q) + b*Q + c).ravel()
                h = np.bincount(idx, minlength=Q**3)
                acc += h; total += h.sum()
    if total == 0:
        out = np.zeros(Q**3, dtype=np.float32); out[0] = 1.0; return out
    return (acc / total).astype(np.float32)

def srm_rich_from_Y(y):
    feats = []
    for ki, (k, s) in enumerate(zip(SRM8, SRM_SCALE)):
        r = convolve2d(y, k, mode='same', boundary='symm')
        r = r / float(s)
        q = _tlu_q(r, T=3)
        feats.append(_hist_levels(q, T=3))             # 7
        feats.append(_cooc2_dir(q, T=3, mode='H'))     # 49
        feats.append(_cooc2_dir(q, T=3, mode='V'))     # 49
        feats.append(_cooc2_dir(q, T=3, mode='D'))     # 49
        feats.append(_cooc2_dir(q, T=3, mode='A'))     # 49
        if ki in TOP4_IDX:
            feats.append(_cooc3_axis(q, T=3, axis='H'))  # 343
            feats.append(_cooc3_axis(q, T=3, axis='V'))  # 343
            feats.append(_cooc3_diag(q, T=3, kind='D'))  # 343
            feats.append(_cooc3_diag(q, T=3, kind='A'))  # 343
    return np.concatenate(feats).astype(np.float32)  # 8*(7+4*49)+4*(4*343)=7112

# --- JPEG proxy from decoded Y: block DCT + interblock co-occ; boundaries; qtable stats ---
QTABLE_FALLBACKS = 0
def get_luma_qtable(path):
    global QTABLE_FALLBACKS
    try:
        im = Image.open(path)
        qd = getattr(im, 'quantization', None)
        if isinstance(qd, dict) and len(qd)>0:
            key = 0 if 0 in qd else sorted(qd.keys())[0]
            q = qd[key]
            if isinstance(q, list) and len(q)==64:
                t = np.zeros((8,8), dtype=np.float32)
                for k,(i,j) in enumerate(ZZ_FULL): t[i,j] = q[k]
                return t
    except Exception:
        pass
    # Fallback to standard luminance QF75 table
    QTABLE_FALLBACKS += 1
    tbl = [16,11,10,16,24,40,51,61,12,12,14,19,26,58,60,55,14,13,16,24,40,57,69,56,14,17,22,29,51,87,80,62,18,22,37,56,68,109,103,77,24,35,55,64,81,104,113,92,49,64,78,87,103,121,120,101,72,92,95,98,112,100,103,99]
    t = np.zeros((8,8), dtype=np.float32)
    for k,(i,j) in enumerate(ZZ_FULL): t[i,j] = tbl[k]
    return t

def _ternary_sign(x, thr):
    t = np.zeros_like(x, dtype=np.int8)
    t[x >  thr] = 1
    t[x < -thr] = -1
    return t

def _cooc3_1d(arr):
    # arr in {-1,0,1}, map to {0,1,2}, 3^3=27 bins along rows
    v = (arr + 1).astype(np.int32)
    if v.shape[1] < 3:
        h = np.zeros(27, dtype=np.float32); h[0]=1.0; return h
    a = v[:, :-2]; b = v[:, 1:-1]; c = v[:, 2:]
    idx = (a * 9 + b * 3 + c).ravel()
    h = np.bincount(idx, minlength=27).astype(np.float32)
    s = h.sum(); return h / (s if s>0 else 1.0)

def _cooc3_1d_cols(arr):
    # along columns -> operate on transposed to reuse 1D rows logic
    return _cooc3_1d(arr.T)

def _cooc3_diag_main(arr):
    # along main diagonal sequences (top-left to bottom-right)
    H, W = arr.shape
    if H < 3 or W < 3:
        h = np.zeros(27, dtype=np.float32); h[0]=1.0; return h
    feats = []
    for off in range(-(H-1), W):
        diag = np.diag(arr, k=off)
        if diag.size >= 3:
            feats.append(_cooc3_1d(diag[None, :]))
    if len(feats)==0:
        h = np.zeros(27, dtype=np.float32); h[0]=1.0; return h
    return np.mean(np.stack(feats, axis=0), axis=0)

def _cooc3_diag_anti(arr):
    # along anti-diagonal sequences (top-right to bottom-left)
    return _cooc3_diag_main(np.fliplr(arr))

def block_dct_qproxy_feats(y, tY, ac_count=29):
    # Includes DC band (zz[0]) + first 29 AC -> 30 bands total
    H8,W8 = y.shape
    Yb = y.reshape(H8//8,8,W8//8,8).transpose(0,2,1,3).reshape(-1,8,8)
    # JPEG level shift
    Yb = Yb - 128.0
    tmp = np.einsum('ab,nbc->nac', C8, Yb, optimize=True)
    D = np.einsum('nab,cb->nac', tmp, C8, optimize=True)  # (nb,8,8)
    h8=H8//8; w8=W8//8
    D = D.reshape(h8,w8,8,8)
    # Quantize by luminance table
    Q = np.rint(D / (tY + 1e-9)).astype(np.int32)
    feats = []
    bands = ZZ_FULL[0:1+ac_count]  # include DC
    for (i,j) in bands:
        S = Q[..., i,j]
        # Inter-block differences
        Dx = S[:,1:] - S[:,:-1] if S.shape[1] > 1 else np.zeros_like(S[:, :0])
        Dy = S[1:,:] - S[:-1,:] if S.shape[0] > 1 else np.zeros_like(S[:0, :])
        # Branch A: dead-zone |>1|
        tx1 = _ternary_sign(Dx, thr=1); ty1 = _ternary_sign(Dy, thr=1)
        hx1 = _cooc3_1d(tx1); hy1 = _cooc3_1d_cols(ty1)
        hd1 = _cooc3_diag_main(tx1)  # approximate diagonals on H diffs
        ha1 = _cooc3_diag_anti(tx1)
        feats.extend([hx1, hy1, hd1, ha1])
        # Branch B: no-deadzone (>0)
        tx0 = _ternary_sign(Dx, thr=0); ty0 = _ternary_sign(Dy, thr=0)
        hx0 = _cooc3_1d(tx0); hy0 = _cooc3_1d_cols(ty0)
        hd0 = _cooc3_diag_main(tx0)
        ha0 = _cooc3_diag_anti(tx0)
        feats.extend([hx0, hy0, hd0, ha0])
    return np.concatenate(feats).astype(np.float32)  # 30 bands * (2 branches * 4 dirs * 27) = 6480

def boundary_feats(y):
    H8,W8 = y.shape
    def hist_and_cooc_1d(arr, clip=5, bins=11):
        if arr.size==0:
            h = np.zeros(bins, dtype=np.float32); h[0]=1.0
            c = np.zeros(bins*bins, dtype=np.float32); c[0]=1.0
            return np.concatenate([h,c])
        v = np.clip(np.rint(arr).astype(np.int32), -clip, clip)
        h = np.bincount(v+clip, minlength=bins).astype(np.float32);
        h /= (h.sum() if h.sum()>0 else 1.0)
        if v.size>1:
            a = v[:-1]+clip; b = v[1:]+clip; idx=(a*bins+b).ravel()
            c = np.bincount(idx, minlength=bins*bins).astype(np.float32);
            c /= (c.sum() if c.sum()>0 else 1.0)
        else:
            c = np.zeros(bins*bins, dtype=np.float32); c[0]=1.0
        return np.concatenate([h,c])
    # Per-line vertical boundaries
    v_lines = []
    for k in range(7, W8-1, 8):
        diff = y[:,k] - y[:,k+1]
        v_lines.append(hist_and_cooc_1d(diff))
    if len(v_lines)==0:
        fv = np.concatenate([np.eye(1,11,0,dtype=np.float32).sum(0), np.eye(1,121,0,dtype=np.float32).sum(0)])
    else:
        fv = np.mean(np.stack(v_lines, axis=0), axis=0)
    # Per-line horizontal boundaries
    h_lines = []
    for k in range(7, H8-1, 8):
        diff = y[k,:] - y[k+1,:]
        h_lines.append(hist_and_cooc_1d(diff))
    if len(h_lines)==0:
        fh = np.concatenate([np.eye(1,11,0,dtype=np.float32).sum(0), np.eye(1,121,0,dtype=np.float32).sum(0)])
    else:
        fh = np.mean(np.stack(h_lines, axis=0), axis=0)
    return np.concatenate([fv, fh]).astype(np.float32)  # 2*(11+121)=264

def qtable_stats(tY):
    vals = tY.ravel().astype(np.float32)
    feats = [vals.min(), vals.max(), vals.mean(), np.median(vals), vals.std(), np.log1p(vals).sum()]
    m = vals.mean()
    bucket = [1.0 if m<=5 else 0.0, 1.0 if (m>5 and m<=10) else 0.0, 1.0 if m>10 else 0.0]
    return np.array(feats + bucket, dtype=np.float32)  # 9

# Updated lengths after adding SRM 3rd-order diagonals for TOP4
SRM_LEN = 7112
JPEG_LEN = 6480
BOUND_LEN = 264
QT_LEN = 9
TOTAL_LEN = SRM_LEN + JPEG_LEN + BOUND_LEN + QT_LEN  # 13865

def rich_features_one(path):
    try:
        y = load_Y(path)
        # Align all branches to same 8x8 grid
        H,W = y.shape; H8,W8 = (H//8)*8, (W//8)*8; y = y[:H8,:W8]
        # Rich SRM
        f_srm = srm_rich_from_Y(y)
        # JPEG proxy
        tY = get_luma_qtable(path)
        f_dct = block_dct_qproxy_feats(y, tY, ac_count=29)
        f_bnd = boundary_feats(y)
        f_qt  = qtable_stats(tY)
        return np.concatenate([f_srm, f_dct, f_bnd, f_qt]).astype(np.float32)
    except Exception:
        return np.full(TOTAL_LEN, np.nan, dtype=np.float32)

def _paths_save(paths, fname):
    with open(fname,'w') as f:
        for p in paths: f.write(str(p)+'\n')
def _paths_load(fname):
    if not os.path.exists(fname): return None
    with open(fname,'r') as f: return [line.strip() for line in f.readlines()]

def build_features_rich(manifest_csv='manifest.csv', subsample_n=5000, seed=42, n_jobs=36, chunk=1000, compute_test=True):
    df_all = pd.read_csv(manifest_csv)
    per = subsample_n//4
    df = pd.concat([df_all[df_all['label_4c']==c].sample(n=per, random_state=seed) for c in [0,1,2,3]], axis=0)
    df = df.sort_values('path').reset_index(drop=True)
    train_paths = df['path'].tolist()
    cache_train = f'train_feats_rich_{len(df)}.npy'
    train_paths_txt = cache_train + '.paths.txt'
    test_paths = sorted(glob.glob('Test/*.jpg'))
    cache_test = 'test_feats_rich.npy'
    test_paths_txt = cache_test + '.paths.txt'
    # Train cache
    need_train = True
    if os.path.exists(cache_train) and os.path.exists(train_paths_txt):
        if _paths_load(train_paths_txt) == train_paths: need_train=False
    if need_train:
        all_parts=[]; t0=time.time()
        for i in range(0, len(train_paths), chunk):
            t1=time.time(); part=train_paths[i:i+chunk]
            feats = Parallel(n_jobs=n_jobs, batch_size=32, prefer='threads')(delayed(rich_features_one)(p) for p in part)
            feats = np.vstack(feats).astype(np.float32); all_parts.append(feats)
            print(f'[train-rich] {i+len(part)}/{len(train_paths)} elapsed {time.time()-t1:.1f}s total {time.time()-t0:.1f}s', flush=True)
        X = np.vstack(all_parts) if all_parts else np.zeros((0,TOTAL_LEN), dtype=np.float32)
        np.save(cache_train, X); _paths_save(train_paths, train_paths_txt); print('Saved', cache_train, X.shape, flush=True)
    else:
        X = np.load(cache_train); print('Loaded', cache_train, X.shape, flush=True)
    # Test cache (optional)
    XT = None
    if compute_test:
        need_test = True
        if os.path.exists(cache_test) and os.path.exists(test_paths_txt):
            if _paths_load(test_paths_txt) == test_paths: need_test=False
        if need_test:
            all_parts=[]; t0=time.time()
            for i in range(0, len(test_paths), chunk):
                t1=time.time(); part=test_paths[i:i+chunk]
                feats = Parallel(n_jobs=n_jobs, batch_size=32, prefer='threads')(delayed(rich_features_one)(p) for p in part)
                feats = np.vstack(feats).astype(np.float32); all_parts.append(feats)
                print(f'[test-rich]  {i+len(part)}/{len(test_paths)} elapsed {time.time()-t1:.1f}s total {time.time()-t0:.1f}s', flush=True)
            XT = np.vstack(all_parts) if all_parts else np.zeros((0,TOTAL_LEN), dtype=np.float32)
            np.save(cache_test, XT); _paths_save(test_paths, test_paths_txt); print('Saved', cache_test, XT.shape, flush=True)
        else:
            XT = np.load(cache_test); print('Loaded', cache_test, XT.shape, flush=True)
    print('QTable fallbacks so far:', QTABLE_FALLBACKS, flush=True)
    return df, X, test_paths if compute_test else [], XT

def train_lgbm_rich(df, X, n_splits=3, seed=42):
    params = dict(objective='multiclass', num_class=4, learning_rate=0.02,
                  max_depth=-1, num_leaves=256, min_data_in_leaf=50,
                  feature_fraction=0.45, bagging_fraction=0.75, bagging_freq=1,
                  lambda_l1=1.5, lambda_l2=1.5, n_jobs=36, verbose=-1)
    oof = np.zeros((len(df),4), dtype=np.float32)
    skf = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (tr, va) in enumerate(skf.split(X, y=df['label_4c'], groups=df['source_id'])):
        t0=time.time()
        dtr = lgb.Dataset(X[tr], label=df['label_4c'].values[tr])
        dva = lgb.Dataset(X[va], label=df['label_4c'].values[va])
        print(f'[RICH Fold {fold}] train {len(tr)} val {len(va)}', flush=True)
        bst = lgb.train(params, dtr, num_boost_round=2600, valid_sets=[dtr, dva], valid_names=['tr','va'])
        oof[va] = bst.predict(X[va])
        yb = (df.iloc[va]['label_4c'].values!=0).astype(np.int32)
        try:
            auc = roc_auc_score(yb, 1.0 - oof[va,0]);
            print(f'[RICH Fold {fold}] bin AUC: {auc:.4f} | time {time.time()-t0:.1f}s', flush=True)
        except Exception:
            print(f'[RICH Fold {fold}] AUC failed | time {time.time()-t0:.1f}s', flush=True)
        del dtr, dva; gc.collect()
    return oof

def fit_and_predict_rich(subsample_n=5000, n_splits=3, n_jobs=36, chunk=1000, compute_test=True):
    print('Launching RICH features pipeline...', flush=True)
    df, X, test_paths, XT = build_features_rich(subsample_n=subsample_n, n_jobs=n_jobs, chunk=chunk, compute_test=compute_test)
    print('Features RICH:', X.shape, (None if XT is None else XT.shape), flush=True)
    oof = train_lgbm_rich(df, X, n_splits=n_splits, seed=42)
    if compute_test:
        # Final model
        params = dict(objective='multiclass', num_class=4, learning_rate=0.02,
                      max_depth=-1, num_leaves=256, min_data_in_leaf=50,
                      feature_fraction=0.45, bagging_fraction=0.75, bagging_freq=1,
                      lambda_l1=1.5, lambda_l2=1.5, n_jobs=36, verbose=-1)
        print('Training final RICH model...', flush=True)
        dfull = lgb.Dataset(X, label=df['label_4c'].values)
        bst = lgb.train(params, dfull, num_boost_round=2600)
        P = bst.predict(XT); p_stego = 1.0 - P[:,0]
        sub = pd.DataFrame({'Id':[os.path.basename(p) for p in test_paths], 'Label':p_stego}).sort_values('Id')
        sub.to_csv('submission.csv', index=False); print('Saved submission.csv', sub.shape, flush=True)
    return oof

print('Rich SRM + JPEG proxy cell ready. Call fit_and_predict_rich(subsample_n=5000, n_splits=3).', flush=True)

Rich SRM + JPEG proxy cell ready. Call fit_and_predict_rich(subsample_n=5000, n_splits=3).


In [43]:
# Patch: fix K3_D2 to zero-sum to prevent saturation; rebuild SRM8
import numpy as np

# Zero-sum diagonal kernel variant (corrected bottom-left to -2)
K3_D2 = np.array([[ 0,-1, 2],
                  [-1, 0, 1],
                  [-2, 1, 0]], dtype=np.float32)

# Rebuild SRM8 with corrected K3_D2
SRM8 = [K3_L4, K3_L8, K3_H2, K3_V2, K3_D1, K3_D2, K5_S, K3_2]

print('Patched K3_D2 sum =', float(K3_D2.sum()))
print('SRM8[5] updated; SRM_SCALE =', SRM_SCALE)

Patched K3_D2 sum = 0.0
SRM8[5] updated; SRM_SCALE = [2, 6, 2, 2, 4, 4, 1, 2]


In [None]:
# Run Rich features smoke test (5k, 3 folds)
import os, time
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
sub_n = 5000; folds = 3; jobs = 36; chunk = 300
print(f'Launching fit_and_predict_rich(subsample_n={sub_n}, n_splits={folds}, n_jobs={jobs}, chunk={chunk}, compute_test=False) ...', flush=True)
t0=time.time()
oof_rich = fit_and_predict_rich(subsample_n=sub_n, n_splits=folds, n_jobs=jobs, chunk=chunk, compute_test=False)
print('Rich smoke test done in %.1fs' % (time.time()-t0), flush=True)

Launching fit_and_predict_rich(subsample_n=5000, n_splits=3, n_jobs=36, chunk=300, compute_test=False) ...


Launching RICH features pipeline...


In [29]:
# Diagnostics for RICH features (run after 5k smoke or standalone on small subset)
import numpy as np, pandas as pd, glob, os, time
from scipy.stats import ks_2samp

def rich_feature_batch(paths, n_jobs=36):
    from joblib import Parallel, delayed
    return np.vstack(Parallel(n_jobs=n_jobs, batch_size=32, prefer='threads')(delayed(rich_features_one)(p) for p in paths)).astype(np.float32)

def single_pair_diff_probe():
    # pick a source present in all four classes
    ids = set(os.path.splitext(os.path.basename(p))[0] for p in glob.glob('Cover/*.jpg'))
    for cls in ['JMiPOD','JUNIWARD','UERD']:
        ids = ids.intersection(set(os.path.splitext(os.path.basename(p))[0] for p in glob.glob(f'{cls}/*.jpg')))
    sid = sorted(list(ids))[0]
    paths = {c: f"{c}/{sid}.jpg" for c in ['Cover','JMiPOD','JUNIWARD','UERD']}
    feats = {k: rich_features_one(v) for k,v in paths.items()}
    cover = feats['Cover']
    parts = {'SRM':(0,4368), 'JPEG':(4368,4368+1080), 'BOUND':(4368+1080, 4368+1080+264), 'QTABLE':(4368+1080+264, 4368+1080+264+9)}
    print('Probe source_id:', sid)
    for k in ['JMiPOD','JUNIWARD','UERD']:
        d = feats[k] - cover
        print(f'-- {k} -- L1:{np.sum(np.abs(d)):.2f} max|d|:{np.max(np.abs(d)):.2f} nz>{(np.abs(d)>1e-6).sum()}')
        for name,(a,b) in parts.items():
            dd = np.abs(d[a:b]); m = dd.mean();
            top = np.argsort(-dd)[:10] + a
            print(f'   {name}: mean|d|={m:.6f} top10_idx={top.tolist()}')

def per_feature_signal_probe(cache='train_feats_rich_5000.npy', manifest_csv='manifest.csv', subsample_n=5000, seed=42):
    if os.path.exists(cache):
        X = np.load(cache);
        df_all = pd.read_csv(manifest_csv)
        per = subsample_n//4
        df = pd.concat([df_all[df_all.label_4c==c].sample(n=per, random_state=seed) for c in [0,1,2,3]]).sort_values('path').reset_index(drop=True)
    else:
        print('Cache not found; building small subset (200 per class)...')
        df_all = pd.read_csv(manifest_csv)
        per = 200
        df = pd.concat([df_all[df_all.label_4c==c].sample(n=per, random_state=seed) for c in [0,1,2,3]]).sort_values('path').reset_index(drop=True)
        X = rich_feature_batch(df.path.tolist())
    y_bin = (df.label_4c.values!=0).astype(np.int32)
    Xc = X[y_bin==0]; Xs = X[y_bin==1]
    ks_stats = []; mad = []
    for j in range(X.shape[1]):
        try:
            ks = ks_2samp(Xc[:,j], Xs[:,j]).statistic
        except Exception:
            ks = 0.0
        ks_stats.append(ks);
        mad.append(float(np.abs(Xs[:,j].mean() - Xc[:,j].mean())))
    ks_stats = np.array(ks_stats); mad = np.array(mad)
    def topk(arr, k=20):
        idx = np.argsort(-arr)[:k];
        return list(zip(idx.tolist(), arr[idx].round(6).tolist()))
    print('Top20 KS:', topk(ks_stats))
    print('Top20 |mean diff|:', topk(mad))
    # block ownership
    blocks = np.array(['SRM']*4368 + ['JPEG']*1080 + ['BOUND']*264 + ['QTABLE']*9)
    for name in ['SRM','JPEG','BOUND','QTABLE']:
        mks = ks_stats[blocks==name].mean(); mmad = mad[blocks==name].mean()
        print(f'{name} mean KS={mks:.6f} mean |mean diff|={mmad:.6f}')

def shuffle_label_check(cache='train_feats_rich_5000.npy', manifest_csv='manifest.csv', subsample_n=5000, seed=42):
    import lightgbm as lgb
    from sklearn.model_selection import StratifiedGroupKFold
    from sklearn.metrics import roc_auc_score
    if os.path.exists(cache):
        X = np.load(cache)
        df_all = pd.read_csv(manifest_csv)
        per = subsample_n//4
        df = pd.concat([df_all[df_all.label_4c==c].sample(n=per, random_state=seed) for c in [0,1,2,3]]).sort_values('path').reset_index(drop=True)
    else:
        print('Cache not found; building small subset (200 per class)...')
        df_all = pd.read_csv(manifest_csv)
        per = 200
        df = pd.concat([df_all[df_all.label_4c==c].sample(n=per, random_state=seed) for c in [0,1,2,3]]).sort_values('path').reset_index(drop=True)
        X = rich_feature_batch(df.path.tolist())
    y4 = df.label_4c.values.copy()
    grp = df.source_id.values
    # true labels AUC
    skf = StratifiedGroupKFold(n_splits=3, shuffle=True, random_state=42)
    oof = np.zeros((len(df),4), np.float32)
    params = dict(objective='multiclass', num_class=4, learning_rate=0.03, max_depth=8, num_leaves=128, min_data_in_leaf=50, feature_fraction=0.55, bagging_fraction=0.75, bagging_freq=1, lambda_l1=1.5, lambda_l2=1.5, n_jobs=36, verbose=-1)
    for f,(tr,va) in enumerate(skf.split(X, y=y4, groups=grp)):
        dtr=lgb.Dataset(X[tr], label=y4[tr]); dva=lgb.Dataset(X[va], label=y4[va])
        bst=lgb.train(params, dtr, num_boost_round=600)
        oof[va]=bst.predict(X[va])
    auc_true = roc_auc_score((y4!=0).astype(int), 1.0 - oof[:,0])
    # shuffled labels AUC
    rng = np.random.RandomState(123); y4_shuf = y4.copy(); rng.shuffle(y4_shuf)
    oof_s = np.zeros_like(oof)
    for f,(tr,va) in enumerate(skf.split(X, y=y4_shuf, groups=grp)):
        dtr=lgb.Dataset(X[tr], label=y4_shuf[tr]); dva=lgb.Dataset(X[va], label=y4_shuf[va])
        bst=lgb.train(params, dtr, num_boost_round=600)
        oof_s[va]=bst.predict(X[va])
    auc_shuf = roc_auc_score((y4_shuf!=0).astype(int), 1.0 - oof_s[:,0])
    print(f'Shuffle check: true AUC={auc_true:.4f} | shuffled AUC={auc_shuf:.4f}')

print('Diagnostics ready: single_pair_diff_probe(); per_feature_signal_probe(); shuffle_label_check()')

Diagnostics ready: single_pair_diff_probe(); per_feature_signal_probe(); shuffle_label_check()


In [30]:
# Run RICH diagnostics (expects train_feats_rich_5000.npy cache present)
import os
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
print('=== Single cover-stego pair diff probe ===', flush=True)
single_pair_diff_probe()
print('\n=== Per-feature signal probe (KS, |mean diff|) ===', flush=True)
per_feature_signal_probe(cache='train_feats_rich_5000.npy', manifest_csv='manifest.csv', subsample_n=5000, seed=42)
print('\n=== Shuffle-label check ===', flush=True)
shuffle_label_check(cache='train_feats_rich_5000.npy', manifest_csv='manifest.csv', subsample_n=5000, seed=42)

=== Single cover-stego pair diff probe ===


Probe source_id: 00001
-- JMiPOD -- L1:1.46 max|d|:0.01 nz>4933
   SRM: mean|d|=0.000094 top10_idx=[3965, 3968, 3962, 4011, 3966, 4024, 111, 4164, 4060, 4091]
   JPEG: mean|d|=0.000869 top10_idx=[5434, 5353, 4894, 4867, 5056, 5164, 4766, 4892, 4858, 4921]
   BOUND: mean|d|=0.000413 top10_idx=[5580, 5458, 5585, 5457, 5589, 5588, 5700, 5587, 5591, 5449]
   QTABLE: mean|d|=0.000000 top10_idx=[5712, 5713, 5714, 5715, 5716, 5717, 5718, 5719, 5720]
-- JUNIWARD -- L1:1.72 max|d|:0.01 nz>4992
   SRM: mean|d|=0.000101 top10_idx=[3962, 3968, 3966, 3965, 4164, 4060, 4099, 4011, 4024, 3975]
   JPEG: mean|d|=0.001053 top10_idx=[5083, 5407, 5353, 4900, 5319, 5297, 4894, 5110, 5333, 4867]
   BOUND: mean|d|=0.000535 top10_idx=[5580, 5458, 5457, 5591, 5585, 5448, 5586, 5579, 5455, 5587]
   QTABLE: mean|d|=0.000000 top10_idx=[5712, 5713, 5714, 5715, 5716, 5717, 5718, 5719, 5720]
-- UERD -- L1:2.39 max|d|:0.02 nz>5007
   SRM: mean|d|=0.000228 top10_idx=[3968, 2765, 3965, 2729, 3962, 2814, 2778, 2863, 282

Top20 KS: [(2525, 0.0344), (4308, 0.029067), (4319, 0.028267), (2572, 0.027467), (3128, 0.026133), (4270, 0.025867), (3127, 0.025333), (2540, 0.025333), (2475, 0.025067), (2467, 0.025067), (4340, 0.024533), (4302, 0.024533), (4122, 0.024267), (4158, 0.024), (2624, 0.024), (2571, 0.024), (5500, 0.023733), (2409, 0.023733), (5499, 0.023733), (2912, 0.0232)]
Top20 |mean diff|: [(5407, 0.003737), (5434, 0.003558), (5110, 0.002681), (5380, 0.002678), (5353, 0.002649), (5164, 0.002599), (5083, 0.002551), (5137, 0.002548), (5458, 0.00248), (5448, 0.002468), (4867, 0.002417), (5580, 0.0024), (5590, 0.002376), (5299, 0.002333), (5326, 0.002322), (5029, 0.002318), (3962, 0.002254), (3968, 0.002244), (4678, 0.002242), (5002, 0.002228)]
SRM mean KS=0.011115 mean |mean diff|=0.000049
JPEG mean KS=0.008387 mean |mean diff|=0.000148
BOUND mean KS=0.015695 mean |mean diff|=0.000156
QTABLE mean KS=0.000000 mean |mean diff|=0.000000

=== Shuffle-label check ===


Shuffle check: true AUC=0.5293 | shuffled AUC=0.4995


In [59]:
# Clear RICH caches to recompute with fixes
import os
targets = [
    'train_feats_rich_5000.npy',
    'train_feats_rich_5000.npy.paths.txt',
    'test_feats_rich.npy',
    'test_feats_rich.npy.paths.txt'
]
for t in targets:
    if os.path.exists(t):
        try:
            os.remove(t); print('Deleted', t, flush=True)
        except Exception as e:
            print('Failed to delete', t, e, flush=True)
    else:
        print('Not found (ok):', t, flush=True)

Deleted train_feats_rich_5000.npy


Deleted train_feats_rich_5000.npy.paths.txt


Not found (ok): test_feats_rich.npy


Not found (ok): test_feats_rich.npy.paths.txt


In [37]:
# Updated diagnostics for RICH v2 dims (6261) + SRM bin-use check
import numpy as np, pandas as pd, glob, os
from scipy.stats import ks_2samp

SRM_LEN = 4368
JPEG_LEN = 1620
BOUND_LEN = 264
QT_LEN = 9
TOTAL_LEN = SRM_LEN + JPEG_LEN + BOUND_LEN + QT_LEN

def single_pair_diff_probe_v2():
    ids = set(os.path.splitext(os.path.basename(p))[0] for p in glob.glob('Cover/*.jpg'))
    for cls in ['JMiPOD','JUNIWARD','UERD']:
        ids &= set(os.path.splitext(os.path.basename(p))[0] for p in glob.glob(f'{cls}/*.jpg'))
    sid = sorted(list(ids))[0]
    paths = {c: f"{c}/{sid}.jpg" for c in ['Cover','JMiPOD','JUNIWARD','UERD']}
    feats = {k: rich_features_one(v) for k,v in paths.items()}
    cover = feats['Cover']
    parts = {
        'SRM': (0, SRM_LEN),
        'JPEG': (SRM_LEN, SRM_LEN+JPEG_LEN),
        'BOUND': (SRM_LEN+JPEG_LEN, SRM_LEN+JPEG_LEN+BOUND_LEN),
        'QTABLE': (SRM_LEN+JPEG_LEN+BOUND_LEN, TOTAL_LEN)
    }
    print('Probe source_id:', sid)
    for k in ['JMiPOD','JUNIWARD','UERD']:
        d = feats[k] - cover
        print(f'-- {k} -- L1:{np.sum(np.abs(d)):.4f} max|d|:{np.max(np.abs(d)):.4f} nz>{(np.abs(d)>1e-6).sum()}')
        for name,(a,b) in parts.items():
            dd = np.abs(d[a:b]);
            m = float(dd.mean());
            top = np.argsort(-dd)[:10] + a
            print(f'   {name}: mean|d|={m:.6f} top10_idx={top.tolist()}')

def per_feature_signal_probe_v2(cache='train_feats_rich_5000.npy', manifest_csv='manifest.csv', subsample_n=5000, seed=42):
    if os.path.exists(cache):
        X = np.load(cache)
        assert X.shape[1] == TOTAL_LEN, f'Unexpected feature dim: {X.shape}'
        df_all = pd.read_csv(manifest_csv)
        per = subsample_n//4
        df = pd.concat([df_all[df_all.label_4c==c].sample(n=per, random_state=seed) for c in [0,1,2,3]]).sort_values('path').reset_index(drop=True)
    else:
        print('Cache not found; exiting.')
        return
    y_bin = (df.label_4c.values!=0).astype(np.int32)
    Xc = X[y_bin==0]; Xs = X[y_bin==1]
    ks_stats = np.zeros(X.shape[1], dtype=np.float32)
    mad = np.zeros(X.shape[1], dtype=np.float32)
    for j in range(X.shape[1]):
        try:
            ks_stats[j] = ks_2samp(Xc[:,j], Xs[:,j]).statistic
        except Exception:
            ks_stats[j] = 0.0
        mad[j] = float(abs(Xs[:,j].mean() - Xc[:,j].mean()))
    def topk(arr, k=20):
        idx = np.argsort(-arr)[:k]
        return list(zip(idx.tolist(), np.round(arr[idx],6).tolist()))
    print('Top20 KS:', topk(ks_stats))
    print('Top20 |mean diff|:', topk(mad))
    blocks = np.array(['SRM']*SRM_LEN + ['JPEG']*JPEG_LEN + ['BOUND']*BOUND_LEN + ['QTABLE']*QT_LEN)
    for name in ['SRM','JPEG','BOUND','QTABLE']:
        mks = ks_stats[blocks==name].mean(); mmad = mad[blocks==name].mean()
        print(f'{name} mean KS={mks:.6f} mean |mean diff|={mmad:.6f}')

def srm_bin_use_check(sample_paths=None, max_show=1):
    # Inspect q-bin usage for key kernels to tune SRM_SCALE quickly
    if sample_paths is None:
        df = pd.read_csv('manifest.csv')
        # choose first few cover images for stability
        sample_paths = df[df.label_4c==0].sort_values('path').path.tolist()[:max_show]
    for p in sample_paths:
        y = load_Y(p); H,W=y.shape; y = y[:(H//8)*8, :(W//8)*8]
        print('Path:', p)
        for ki,(k,s) in enumerate(zip(SRM8, SRM_SCALE)):
            r = convolve2d(y, k, mode='same', boundary='symm')/float(s)
            q = np.clip(np.rint(r), -3, 3).astype(np.int8)
            vals, cnts = np.unique(q, return_counts=True)
            frac = (cnts / cnts.sum()).round(4)
            print(f'  Kernel {ki} scale {s}: bins {vals.tolist()} frac {frac.tolist()}')
        break

print('Updated diagnostics ready: single_pair_diff_probe_v2(); per_feature_signal_probe_v2(); srm_bin_use_check()')

Updated diagnostics ready: single_pair_diff_probe_v2(); per_feature_signal_probe_v2(); srm_bin_use_check()


In [44]:
# Execute updated diagnostics for RICH v2
import os
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
print('=== SRM bin-use check (1 cover) ===', flush=True)
srm_bin_use_check(max_show=1)
print('\n=== Per-feature signal probe v2 (KS, |mean diff|) ===', flush=True)
per_feature_signal_probe_v2(cache='train_feats_rich_5000.npy', manifest_csv='manifest.csv', subsample_n=5000, seed=42)

=== SRM bin-use check (1 cover) ===


Path: Cover/00001.jpg
  Kernel 0 scale 2: bins [-3, -2, -1, 0, 1, 2, 3] frac [0.299, 0.1093, 0.0402, 0.1232, 0.0387, 0.1022, 0.2874]


  Kernel 1 scale 6: bins [-3, -2, -1, 0, 1, 2, 3] frac [0.2859, 0.094, 0.0766, 0.1096, 0.0732, 0.0875, 0.2731]
  Kernel 2 scale 2: bins [-3, -2, -1, 0, 1, 2, 3] frac [0.3425, 0.0726, 0.0259, 0.0822, 0.0272, 0.0769, 0.3727]
  Kernel 3 scale 2: bins [-3, -2, -1, 0, 1, 2, 3] frac [0.3276, 0.0836, 0.0304, 0.0937, 0.031, 0.0871, 0.3466]
  Kernel 4 scale 4: bins [-3, -2, -1, 0, 1, 2, 3] frac [0.2536, 0.097, 0.0739, 0.1355, 0.0739, 0.0995, 0.2666]
  Kernel 5 scale 4: bins [-3, -2, -1, 0, 1, 2, 3] frac [0.3757, 0.0533, 0.0346, 0.0597, 0.0352, 0.055, 0.3866]
  Kernel 6 scale 1: bins [-3, -2, -1, 0, 1, 2, 3] frac [0.0858, 0.0981, 0.1858, 0.2415, 0.1941, 0.1043, 0.0904]
  Kernel 7 scale 2: bins [-3, -2, -1, 0, 1, 2, 3] frac [0.1897, 0.1431, 0.0641, 0.2076, 0.0632, 0.1443, 0.188]

=== Per-feature signal probe v2 (KS, |mean diff|) ===


Top20 KS: [(3170, 0.041600000113248825), (2912, 0.04106700047850609), (2366, 0.03946699947118759), (2624, 0.03840000182390213), (2961, 0.031199999153614044), (3169, 0.02826699987053871), (2575, 0.02826699987053871), (2623, 0.02800000086426735), (3121, 0.027732999995350838), (2415, 0.02693299949169159), (2359, 0.026667000725865364), (3177, 0.026667000725865364), (3010, 0.026399999856948853), (6040, 0.026399999856948853), (2554, 0.026399999856948853), (2574, 0.02613300085067749), (2424, 0.025867000222206116), (3084, 0.025333000347018242), (2367, 0.025333000347018242), (2905, 0.025066999718546867)]
Top20 |mean diff|: [(5866, 0.0037640000227838755), (5407, 0.003737000050023198), (5839, 0.0036410000175237656), (5434, 0.00355900009162724), (5461, 0.0032679999712854624), (5488, 0.0029430000577121973), (5785, 0.002747999969869852), (5812, 0.0027340000960975885), (5110, 0.002718999981880188), (5380, 0.00267699989490211), (5353, 0.0026489999145269394), (5164, 0.002597999991849065), (5083, 0.0025