# Fine & Golden Autoscan v19
Colab‑ready notebook integrating all current fixes and clustering options
*Last updated:* 2025-07-14

## Overview
This notebook combines the latest data‑loading fixes, the delta analysis, and the clustering options in one place. It is designed to run end‑to‑end in **Google Colab**. Before running, set `VERSION` and `DATA_DIR` in the configuration cell below and make sure both Excel files listed under *Expected inputs* are in that folder; the loading cell raises `FileNotFoundError` otherwise.

---
### Expected inputs
- `plateaus_raw_v{VERSION}.xlsx`
- `plateaus_ops_v{VERSION}.xlsx`

### Key steps
1. Mount Google Drive (if necessary)
2. Load raw and operations data
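3. Compute the fractional delta of each value against a reference constant (the inverse fine‑structure constant by default; φ, √2 and e are also defined)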
4. Cluster slices using the chosen distance metric & bandwidth
5. Visualise and export results

Feel free to adapt/extend as needed!

In [1]:

# === CONFIGURATION ======================================
VERSION = 19                     # numeric tag for data files
DATA_DIR = '/content/drive/MyDrive/fine_golden'  # change if different
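# Distance metric for clustering: 'euclidean' or 'manhattan' select DBSCAN;
# any other value (e.g. 'cosine') falls back to MeanShift, which ignores the metric.
DIST_METRIC = 'euclidean'
BANDWIDTH  = 0.001               # DBSCAN eps or MeanShift bandwidth (δ), in units of the clustered column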
# ========================================================
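
A quick sanity check on these settings can catch typos before any data is loaded. The sketch below is one possible approach and is not part of the notebook: `validate_config`, `chosen_algorithm` and `DBSCAN_METRICS` are hypothetical names, and the rule that only 'euclidean' and 'manhattan' select DBSCAN is taken from the clustering cell further down.

```python
# Hypothetical helpers (not in the original notebook) for checking the configuration.
DBSCAN_METRICS = {'euclidean', 'manhattan'}  # metrics the clustering cell routes to DBSCAN

def validate_config(version, bandwidth, dist_metric):
    """Return a list of problems found; an empty list means the settings look sane."""
    problems = []
    if not isinstance(version, int) or version < 0:
        problems.append(f'VERSION should be a non-negative integer, got {version!r}')
    if not isinstance(bandwidth, (int, float)) or bandwidth <= 0:
        problems.append(f'BANDWIDTH must be a positive number, got {bandwidth!r}')
    if not isinstance(dist_metric, str) or not dist_metric:
        problems.append(f'DIST_METRIC should be a non-empty string, got {dist_metric!r}')
    return problems

def chosen_algorithm(dist_metric):
    """Name of the algorithm the clustering cell would pick for this metric."""
    return 'DBSCAN' if dist_metric in DBSCAN_METRICS else 'MeanShift'

print(validate_config(19, 0.001, 'euclidean'))
print(chosen_algorithm('cosine'))
```

Returning a list of messages rather than raising on the first problem lets all misconfigured values be reported in one pass.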


In [2]:

# ⚠️ Only needed on Colab; outside Colab the import fails and the mount is skipped.
try:
    from google.colab import drive
    drive.mount('/content/drive')
    print('✔ Google Drive mounted.')
except ModuleNotFoundError:
    print('Running outside Colab – mounting skipped.')


Mounted at /content/drive
✔ Google Drive mounted.


In [3]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import DBSCAN, MeanShift
from pathlib import Path


In [4]:

# Physical / mathematical constants
ALPHA_INV = 137.035999084  # CODATA 2018 inverse fine‑structure constant
PHI       = (1 + 5**0.5) / 2
SQRT2     = 2**0.5
EULER_E   = np.e

def delta(x, ref=ALPHA_INV):
    """Absolute fractional difference between x and ref."""
    return abs(x - ref) / ref
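
To compare one value against several references at once, `delta` can be called once per constant. The sketch below is self-contained (it redefines `delta` and the constants using only the standard library); the `REFERENCES` dict, the `closest_reference` helper, and the sample value 137.0 are illustrative assumptions rather than notebook data.

```python
import math

ALPHA_INV = 137.035999084        # CODATA 2018 inverse fine-structure constant
PHI = (1 + math.sqrt(5)) / 2     # golden ratio
# Hypothetical lookup table of the reference constants defined in the notebook
REFERENCES = {'alpha_inv': ALPHA_INV, 'phi': PHI, 'sqrt2': math.sqrt(2), 'e': math.e}

def delta(x, ref=ALPHA_INV):
    """Absolute fractional difference between x and ref."""
    return abs(x - ref) / ref

def closest_reference(x, references=REFERENCES):
    """Return (name, delta) for the reference with the smallest fractional difference."""
    deltas = {name: delta(x, ref) for name, ref in references.items()}
    best = min(deltas, key=deltas.get)
    return best, deltas[best]

name, d = closest_reference(137.0)   # 137.0 is a made-up sample value
print(f'{name}: delta = {d:.3e}')
```

Because `delta` divides by the reference, the result is dimensionless, so deltas against constants of very different magnitude can be compared directly.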


In [5]:

def load_plateau_dfs(data_dir: str, version: int):
    """Load the raw and ops plateau workbooks for the given version tag."""
    raw_path = Path(data_dir) / f'plateaus_raw_v{version}.xlsx'
    ops_path = Path(data_dir) / f'plateaus_ops_v{version}.xlsx'
    # Report only the files that are actually missing
    missing = [p.name for p in (raw_path, ops_path) if not p.exists()]
    if missing:
        raise FileNotFoundError(f'Missing expected files: {" / ".join(missing)}')
    raw_df = pd.read_excel(raw_path)
    ops_df = pd.read_excel(ops_path)
    return raw_df, ops_df

raw_df, ops_df = load_plateau_dfs(DATA_DIR, VERSION)
print(f'✔ Loaded raw: {raw_df.shape} and ops: {ops_df.shape}')


FileNotFoundError: Missing expected files: plateaus_raw_v19.xlsx / plateaus_ops_v19.xlsx

In [None]:

# Example: compute delta (vs α⁻¹) for a 'slice' column in ops_df
if 'slice' in ops_df.columns:
    # delta() works element-wise on a Series, so no .apply() is needed
    ops_df['delta_alpha'] = delta(ops_df['slice'], ALPHA_INV)
    display(ops_df.head())
else:
    print('Column "slice" not found – update column name as needed.')


In [None]:

# Select the column to cluster and reshape it to (n_samples, 1) as scikit-learn expects
VALUE_COL = 'slice'  # or whichever numeric column you wish to cluster
vals = ops_df[VALUE_COL].to_numpy().reshape(-1, 1)

# Choose clustering algorithm
if DIST_METRIC in {'euclidean', 'manhattan'}:
    # DBSCAN with eps = BANDWIDTH (on a single column both metrics reduce to |a - b|)
    clustering = DBSCAN(eps=BANDWIDTH, metric=DIST_METRIC, min_samples=2).fit(vals)
else:
    # MeanShift fallback: it takes no metric argument, so DIST_METRIC is ignored here
    clustering = MeanShift(bandwidth=BANDWIDTH).fit(vals)
labels = clustering.labels_  # DBSCAN marks noise points with -1

ops_df['cluster'] = labels
print('Clusters assigned:')
print(ops_df['cluster'].value_counts())
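
To build intuition for what `BANDWIDTH` does in the DBSCAN branch: on a single column with `min_samples=2`, a point joins a cluster exactly when another point lies within `eps` of it, so the clusters should coincide with runs of sorted values whose consecutive gaps are at most `eps`, and isolated values become noise (`-1`). The standard-library sketch below illustrates that equivalence on made-up numbers; `gap_clusters` is a hypothetical helper, not a replacement for scikit-learn, and its label numbering (by ascending value) may differ from the order DBSCAN assigns.

```python
def gap_clusters(values, eps):
    """Label 1-D values: runs with consecutive gaps <= eps share a label; singletons get -1."""
    labels = [-1] * len(values)
    if not values:
        return labels
    # Indices of the values in ascending order of value
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Split the sorted sequence wherever the gap to the previous value exceeds eps
    runs = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] <= eps:
            runs[-1].append(cur)
        else:
            runs.append([cur])
    # Only runs with at least two members count as clusters (mirrors min_samples=2)
    next_label = 0
    for run in runs:
        if len(run) >= 2:
            for i in run:
                labels[i] = next_label
            next_label += 1
    return labels

# Made-up sample values, not taken from the plateau files
sample = [137.0360, 137.0365, 1.6180, 137.0368, 1.6185, 2.7183]
print(gap_clusters(sample, eps=0.001))  # → [1, 1, 0, 1, 0, -1]
```

Shrinking `eps` splits runs into smaller clusters and turns more values into noise, while growing it merges neighbouring runs, which is a cheap way to judge whether a chosen bandwidth is in a sensible range for the column.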


In [None]:

plt.figure(figsize=(10,5))
plt.scatter(ops_df.index, ops_df[VALUE_COL], c=ops_df['cluster'])
plt.title('Clustered slices')
plt.xlabel('Index')
plt.ylabel(VALUE_COL)
plt.show()


In [None]:

out_path = Path(DATA_DIR) / f'ops_with_clusters_v{VERSION}.csv'
ops_df.to_csv(out_path, index=False)
print(f'✔ Results exported to {out_path}')
