# EXFOR Data Loading & Verification with On-Demand AME2020/NUBASE2020 Enrichment

## Real Experimental Nuclear Data from IAEA EXFOR

This notebook demonstrates how to load and verify **real experimental nuclear data** from the IAEA EXFOR database using the X4Pro SQLite format.

**NUCML-Next uses ONLY real data:**
- ✅ Uses REAL experimental cross-section measurements
- ✅ X4Pro SQLite ingestion (efficient, single-file database)
- ✅ **Lean ingestion:** Only EXFOR data written to Parquet (~10x smaller files)
- ✅ **On-demand enrichment:** AME2020/NUBASE2020 merged when the dataset is loaded, ready for feature generation
- ✅ Production-grade data quality

## Lean Ingestion + On-Demand Enrichment Architecture

**Key Design:** AME2020/NUBASE2020 enrichment happens **once, when the dataset is loaded**, not during ingestion. Feature generation afterwards is just column selection.

**Benefits:**
- ⚡ **Faster ingestion**: No AME file parsing during ingestion
- 💾 **Smaller Parquet files**: ~10x size reduction (EXFOR data only; AME columns are not duplicated into the file)
- 🎯 **Flexible**: Load only the AME data tiers you need
- 📦 **Simple**: AME files auto-detected in `data/` directory

**Data Flow:**
```
1. Ingestion:    X4 SQLite → Lean Parquet (EXFOR only)
2. Loading:      Parquet → DataFrame with Z, A, MT, Energy, CrossSection
3. Enrichment:   AME files (data/*.mas20.txt) → Merged on (Z, A)
4. Features:     Select tier columns as needed
```

**What's in Lean Parquet:**
```
Core EXFOR only: Entry, Z, A, N, MT, Energy, CrossSection, Uncertainty
AME enrichment:  Loaded on-demand from data/*.mas20.txt files
```
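
The merge in step 3 of the data flow can be pictured with plain pandas. This is a minimal sketch, not NUCML-Next's implementation: the column name `Mass_Excess_keV` is hypothetical, and every number is a placeholder rather than an EXFOR or AME2020 value.

```python
import pandas as pd

# Lean EXFOR-style rows: one row per measurement (placeholder numbers).
exfor = pd.DataFrame({
    "Z": [92, 92, 17],
    "A": [235, 235, 35],
    "MT": [18, 102, 103],
    "Energy": [1.0, 1.0, 2.0],
    "CrossSection": [10.0, 20.0, 30.0],
})

# AME-style table: one row per nuclide (hypothetical column, placeholder values).
ame = pd.DataFrame({
    "Z": [92, 17],
    "A": [235, 35],
    "Mass_Excess_keV": [111.0, 222.0],
})

# A left merge keeps every measurement; validate= raises if the nuclide
# table ever contains duplicate (Z, A) keys.
enriched = exfor.merge(ame, on=["Z", "A"], how="left", validate="many_to_one")

print(enriched.shape)  # (3, 6)
```

Because the nuclide table has one row per (Z, A), storing it separately and merging at load time avoids writing the same nuclide properties next to every measurement.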

---

## Focus Isotopes for This Tutorial

We'll focus on two isotopes that represent different data availability scenarios:

### 1. **U-235 (Uranium-235) - Well-Understood**
- **Why:** Critical for nuclear reactors (LWR fuel)
- **Data Quality:** Extensive experimental measurements since the 1940s
- **Key Reactions:** (n,f) fission at MT=18, (n,γ) capture at MT=102
- **Expected Data:** 1000+ measurements across the full energy range
- **AME2020:** Full enrichment available (mass, energetics, structure, Q-values)

### 2. **Cl-35 (Chlorine-35) (n,p) - Research Interest**
- **Why:** Important for nuclear astrophysics, medical isotope production
- **Data Quality:** Limited experimental data, sparse measurements
- **Key Reaction:** (n,p) at MT=103 → Produces S-35 (medical tracer)
- **Expected Data:** 10-100 measurements (much sparser!)
- **AME2020:** Full enrichment available for comparison with U-235

**Educational Value:** These isotopes demonstrate ML performance on data-rich vs. data-sparse scenarios.

---

## Data Ingestion (One-Time Setup)

**Recommended: Lean Ingestion (Current Architecture)**

```bash
# Step 1: Ingest EXFOR data (lean Parquet, fast!)
python scripts/ingest_exfor.py \
    --x4-db data/x4sqlite1_sample.db \
    --output data/exfor_processed.parquet

# Step 2: Download AME2020/NUBASE2020 files to data/ directory (one-time)
cd data/
wget https://www-nds.iaea.org/amdc/ame2020/mass_1.mas20.txt
wget https://www-nds.iaea.org/amdc/ame2020/rct1.mas20.txt
wget https://www-nds.iaea.org/amdc/ame2020/rct2_1.mas20.txt
wget https://www-nds.iaea.org/amdc/ame2020/nubase_4.mas20.txt
cd ..

# That's it! AME enrichment happens automatically when the dataset is loaded with AME-backed tiers
```

**Using Full X4 Database**

```bash
# Download full EXFOR database from https://www-nds.iaea.org/x4/
python scripts/ingest_exfor.py \
    --x4-db ~/data/x4sqlite1.db \
    --output data/exfor_processed.parquet
```

**What Happens:**
1. **Ingestion:** Extracts EXFOR measurements from X4 SQLite → Lean Parquet (~10x smaller)
2. **Loading:** NucmlDataset loads Parquet and auto-detects AME files in `data/`
3. **Enrichment:** AME data merged on demand when any of tiers B, C, D, or E is requested
4. **Features:** Column selection for requested tiers (fast!)
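
The auto-detection in step 2 could look roughly like the sketch below, which uses only the standard library. The helper `find_ame_files` is hypothetical and not part of NUCML-Next; the four file names are the ones from the download step above.

```python
from pathlib import Path
import tempfile

# File names from the download step above.
AME_FILES = (
    "mass_1.mas20.txt",
    "rct1.mas20.txt",
    "rct2_1.mas20.txt",
    "nubase_4.mas20.txt",
)

def find_ame_files(data_dir):
    """Map each expected AME/NUBASE file name to its path, or None if absent."""
    data_dir = Path(data_dir)
    return {
        name: (data_dir / name) if (data_dir / name).is_file() else None
        for name in AME_FILES
    }

# Demo in a throwaway directory that holds only one of the four files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "mass_1.mas20.txt").write_text("placeholder")
    found = find_ame_files(tmp)
    missing = [name for name, path in found.items() if path is None]

print(missing)  # ['rct1.mas20.txt', 'rct2_1.mas20.txt', 'nubase_4.mas20.txt']
```

Reporting each file separately makes a partial download visible, instead of treating the presence of one file as proof that all four exist.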

---

In [None]:
import sys
sys.path.append('..')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

from nucml_next.examples import load_dataset, print_dataset_summary

print("✓ Imports successful")
print("✓ NUCML-Next: X4Pro SQLite ingestion for real EXFOR data")

## Step 1: Verify EXFOR Data Exists

In [None]:
# Check if EXFOR data has been processed
exfor_path = Path('../data/exfor_processed.parquet')

if not exfor_path.exists():
    print("❌ EXFOR data not found!")
    print("\nPlease run the ingestor first:")
    print("  python scripts/ingest_exfor.py --x4-db data/x4sqlite1_sample.db --output data/exfor_processed.parquet")
    print("\nOr use the quick_ingest helper in your code:")
    print("  from nucml_next.examples import quick_ingest")
    print("  df = quick_ingest()")
    raise FileNotFoundError(f"EXFOR data not found at {exfor_path}")
else:
    print(f"✓ Found EXFOR data at {exfor_path}")
    # Check size
    if exfor_path.is_dir():
        print(f"  Type: Partitioned dataset (directory)")
    else:
        size_mb = exfor_path.stat().st_size / (1024**2)
        print(f"  Size: {size_mb:.1f} MB")

## Step 2: Load Real EXFOR Data

**Note:** `data_path` is REQUIRED. If you omit it or the file does not exist, `NucmlDataset` raises an error immediately, so a missing or mistyped path cannot go unnoticed.
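
As an illustration of this fail-fast behavior, here is a minimal sketch of the kind of check a loader can run before touching any data. The helper `require_data_path` and the exception types are assumptions; the source does not say which exceptions `NucmlDataset` actually raises.

```python
from pathlib import Path

def require_data_path(data_path):
    """Return data_path as a Path, raising early if it is missing or absent on disk."""
    if data_path is None:
        raise ValueError("data_path is required")
    path = Path(data_path)
    if not path.exists():
        raise FileNotFoundError(f"EXFOR data not found at {path}")
    return path

# Omitting the path fails immediately instead of falling back to anything else.
try:
    require_data_path(None)
except ValueError as err:
    message = str(err)

print(message)  # data_path is required
```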

In [None]:
# Load EXFOR data focusing on our two isotopes
# DEMONSTRATION: Using legacy filters for simple isotope selection
# For production training, use DataSelection for physics-aware filtering!

# U-235: Z=92, A=235, MT=18 (fission), MT=102 (capture)
# Cl-35: Z=17, A=35, MT=103 (n,p)

from nucml_next.examples import load_dataset

dataset = load_dataset(
    data_path='../data/exfor_processed.parquet',
    mode='tabular',
    filters={  # Simple filters for demonstration
        'Z': [92, 17],     # Uranium and Chlorine
        'A': [235, 35],    # U-235 and Cl-35
        'MT': [18, 102, 103]  # Fission, capture, (n,p)
    },
)

print(f"\n✓ Loaded {len(dataset.df)} REAL experimental data points")
print(f"  Isotopes: {dataset.df[['Z', 'A']].drop_duplicates().shape[0]}")
print(f"  Reactions: {dataset.df['MT'].nunique()}")
print(f"  Energy range: {dataset.df['Energy'].min():.2e} - {dataset.df['Energy'].max():.2e} eV")

# Show breakdown by isotope
print("\n📊 Data Distribution by Isotope:")
for (z, a), group in dataset.df.groupby(['Z', 'A']):
    isotope_name = f"{'U' if z==92 else 'Cl'}-{a}"
    print(f"  {isotope_name:8s}: {len(group):>6,} measurements")
    for mt, mt_group in group.groupby('MT'):
        mt_name = {18: 'Fission', 102: 'Capture', 103: '(n,p)'}.get(int(mt), f'MT={mt}')
        print(f"    └─ {mt_name:12s}: {len(mt_group):>6,} points")

print("\n" + "="*80)
print("💡 TIP: For production training, use DataSelection for physics-aware filtering:")
print("="*80)
print("""
from nucml_next.data import DataSelection, NucmlDataset

selection = DataSelection(
    # PROJECTILE: 'neutron' | 'all'
    projectile='neutron',
    
    # ENERGY RANGE (eV)
    energy_min=1e-5,    # 1e-5 eV (well below thermal, ~0.0253 eV)
    energy_max=2e7,     # Fast (20 MeV)
    
    # MT MODE: 'reactor_core' | 'threshold_only' | 'fission_details' | 'all_physical' | 'custom'
    mt_mode='reactor_core',  # Essential reactions: MT 2,4,16,18,102,103,107
    # mt_mode='threshold_only',  # Threshold reactions: MT 16,17,103-107
    # mt_mode='fission_details',  # Fission channels: MT 18,19,20,21,38
    # mt_mode='all_physical',     # All physical (< 9000)
    # mt_mode='custom',           # Use custom_mt_codes below
    
    custom_mt_codes=None,  # Example: [2, 18, 102] when mt_mode='custom'
    
    # EXCLUSIONS
    exclude_bookkeeping=True,  # Exclude MT 0, 1, >= 9000
    drop_invalid=True,         # Drop NaN/non-positive
    
    # HOLDOUT for extrapolation testing
    holdout_isotopes=None      # Example: [(92, 235), (17, 35)]
)

dataset = NucmlDataset(
    data_path='data/exfor_processed.parquet',
    selection=selection
)

→ Enables predicate pushdown (90% I/O reduction!)
→ Scientifically defensible defaults
→ Explicit physics rationale
""")
print("="*80)
print()
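
The `mt_mode` options in the tip above amount to set membership on MT codes. The sketch below is a hypothetical reimplementation for illustration, not the `DataSelection` code: the code sets and the bookkeeping rule are copied from the comments in the tip, while `select_mt` and `is_bookkeeping` are invented names.

```python
# MT code sets copied from the mt_mode comments in the tip above.
MT_MODES = {
    "reactor_core": {2, 4, 16, 18, 102, 103, 107},
    "threshold_only": {16, 17, 103, 104, 105, 106, 107},
    "fission_details": {18, 19, 20, 21, 38},
}

def is_bookkeeping(mt):
    """Bookkeeping codes per the exclusion comment: MT 0, 1, and >= 9000."""
    return mt in (0, 1) or mt >= 9000

def select_mt(mt_codes, mt_mode, custom_mt_codes=None, exclude_bookkeeping=True):
    """Filter MT codes according to mt_mode (hypothetical helper)."""
    def allowed(mt):
        if mt_mode == "custom":
            return mt in set(custom_mt_codes or [])
        if mt_mode == "all_physical":
            return mt < 9000
        return mt in MT_MODES[mt_mode]

    return [
        mt for mt in mt_codes
        if allowed(mt) and not (exclude_bookkeeping and is_bookkeeping(mt))
    ]

observed = [1, 2, 18, 38, 102, 103, 9001]
print(select_mt(observed, "reactor_core"))     # [2, 18, 102, 103]
print(select_mt(observed, "fission_details"))  # [18, 38]
print(select_mt(observed, "all_physical"))     # [2, 18, 38, 102, 103]
```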

## Step 3: Inspect Real EXFOR Data

In [None]:
# Show sample of real experimental data
print("Sample EXFOR Data (first 20 rows):")
print(dataset.df.head(20))

# Check AME2020/NUBASE2020 availability (on-demand enrichment)
print("\n" + "="*80)
print("AME2020/NUBASE2020 ENRICHMENT STATUS (On-Demand Architecture)")
print("="*80)

# Check if AME files are available in data/ directory
from pathlib import Path

ame_search_paths = [
    Path('../data'),        # When running from the notebooks/ directory
    Path('data'),           # When running from the repository root
]

ame_files = {
    'mass_1.mas20.txt': 'Tier C (Mass & Binding Energy)',
    'rct1.mas20.txt': 'Tier C (Two-particle Separation Energies S_2n, S_2p)',
    'rct2_1.mas20.txt': 'Tier C (Separation Energies S_n, S_p)',
    'nubase_4.mas20.txt': 'Tier D (Nuclear Structure: Spin, Parity)'
}

ame_dir = None
for search_path in ame_search_paths:
    if search_path.exists() and (search_path / 'mass_1.mas20.txt').exists():
        ame_dir = search_path
        break

if ame_dir:
    print(f"✓ AME2020/NUBASE2020 files found in: {ame_dir.absolute()}")
    print("\nAvailable enrichment files:")
    for filename, description in ame_files.items():
        file_path = ame_dir / filename
        if file_path.exists():
            size_kb = file_path.stat().st_size / 1024
            print(f"  ✓ {filename:25s} ({size_kb:>6.1f} KB) - {description}")
        else:
            print(f"  ✗ {filename:25s} (missing) - {description}")
    
    print("\n✓ On-demand enrichment ready!")
    print("  → When you load data with tiers=['C', 'D'], AME files are automatically loaded")
    print("  → Enrichment happens once during NucmlDataset initialization")
    print("  → No file I/O during feature generation (just column selection)")
else:
    print("⚠️  AME2020/NUBASE2020 files not found in common locations:")
    for path in ame_search_paths:
        print(f"      - {path.absolute()}")
    print("\n  To enable AME enrichment (required for Tier B/C/D/E features):")
    print("      1. Download files:")
    print("         cd data/")
    print("         wget https://www-nds.iaea.org/amdc/ame2020/mass_1.mas20.txt")
    print("         wget https://www-nds.iaea.org/amdc/ame2020/rct1.mas20.txt")
    print("         wget https://www-nds.iaea.org/amdc/ame2020/rct2_1.mas20.txt")
    print("         wget https://www-nds.iaea.org/amdc/ame2020/nubase_4.mas20.txt")
    print("      2. Enrichment will happen automatically when loading data with tiers")

print("="*80)

# Check for uncertainties
has_uncertainty = dataset.df['Uncertainty'].notna().sum()
print(f"\n✓ {has_uncertainty}/{len(dataset.df)} points have experimental uncertainty")

# Show current loaded columns
print(f"\n📋 Columns in loaded DataFrame:")
print(f"   {', '.join(dataset.df.columns.tolist())}")
print(f"\n💡 TIP: To load with AME enrichment, use:")
print("   from nucml_next.data import NucmlDataset, DataSelection")
print("   dataset = NucmlDataset(")
print("       data_path='../data/exfor_processed.parquet',")
print("       selection=DataSelection(tiers=['A', 'C', 'D'])  # Auto-loads AME!")
print("   )")

## Step 4: Visualize Real Experimental Data

This shows **actual EXFOR measurements** with experimental scatter - comparing data-rich U-235 vs. data-sparse Cl-35.

In [None]:
# Create comparative visualization: U-235 (data-rich) vs Cl-35 (data-sparse)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# LEFT PLOT: U-235 Fission (data-rich)
u235_fission = dataset.df[(dataset.df['Z'] == 92) & 
                           (dataset.df['A'] == 235) & 
                           (dataset.df['MT'] == 18)]

if len(u235_fission) > 0:
    if 'Uncertainty' in u235_fission.columns:
        has_unc = u235_fission['Uncertainty'].notna()
        
        # Points with uncertainty
        ax1.errorbar(
            u235_fission[has_unc]['Energy'],
            u235_fission[has_unc]['CrossSection'],
            yerr=u235_fission[has_unc]['Uncertainty'],
            fmt='o', markersize=3, alpha=0.6,
            label=f'With uncertainty ({has_unc.sum()} pts)',
            color='blue', elinewidth=0.8
        )
        
        # Points without uncertainty
        if (~has_unc).sum() > 0:
            ax1.scatter(
                u235_fission[~has_unc]['Energy'],
                u235_fission[~has_unc]['CrossSection'],
                marker='x', s=15, alpha=0.5,
                label=f'No uncertainty ({(~has_unc).sum()} pts)',
                color='orange'
            )
    else:
        ax1.scatter(
            u235_fission['Energy'],
            u235_fission['CrossSection'],
            marker='o', s=8, alpha=0.6,
            label=f'EXFOR Data ({len(u235_fission)} pts)'
        )
    
    ax1.set_xlabel('Energy (eV)', fontsize=12, fontweight='bold')
    ax1.set_ylabel('Cross Section (barns)', fontsize=12, fontweight='bold')
    ax1.set_title('U-235 Fission: WELL-UNDERSTOOD (Data-Rich)\n' + 
                  f'{len(u235_fission):,} EXFOR measurements',
                  fontsize=13, fontweight='bold', color='darkblue')
    ax1.legend(fontsize=10)
    ax1.set_xscale('log')
    ax1.set_yscale('log')
    ax1.grid(True, alpha=0.3)
else:
    ax1.text(0.5, 0.5, 'No U-235 fission data in dataset\n(Check EXFOR ingestion)',
             ha='center', va='center', transform=ax1.transAxes, fontsize=12)
    ax1.set_title('U-235 Fission (No Data)', fontsize=13, fontweight='bold')

# RIGHT PLOT: Cl-35 (n,p) (data-sparse)
cl35_np = dataset.df[(dataset.df['Z'] == 17) & 
                      (dataset.df['A'] == 35) & 
                      (dataset.df['MT'] == 103)]

if len(cl35_np) > 0:
    if 'Uncertainty' in cl35_np.columns:
        has_unc = cl35_np['Uncertainty'].notna()
        
        # Points with uncertainty
        if has_unc.sum() > 0:
            ax2.errorbar(
                cl35_np[has_unc]['Energy'],
                cl35_np[has_unc]['CrossSection'],
                yerr=cl35_np[has_unc]['Uncertainty'],
                fmt='o', markersize=5, alpha=0.7,
                label=f'With uncertainty ({has_unc.sum()} pts)',
                color='green', elinewidth=1.2
            )
        
        # Points without uncertainty
        if (~has_unc).sum() > 0:
            ax2.scatter(
                cl35_np[~has_unc]['Energy'],
                cl35_np[~has_unc]['CrossSection'],
                marker='s', s=30, alpha=0.7,
                label=f'No uncertainty ({(~has_unc).sum()} pts)',
                color='red'
            )
    else:
        ax2.scatter(
            cl35_np['Energy'],
            cl35_np['CrossSection'],
            marker='o', s=25, alpha=0.7,
            label=f'EXFOR Data ({len(cl35_np)} pts)'
        )
    
    ax2.set_xlabel('Energy (eV)', fontsize=12, fontweight='bold')
    ax2.set_ylabel('Cross Section (barns)', fontsize=12, fontweight='bold')
    ax2.set_title('Cl-35 (n,p): RESEARCH INTEREST (Data-Sparse)\n' + 
                  f'{len(cl35_np):,} EXFOR measurements',
                  fontsize=13, fontweight='bold', color='darkgreen')
    ax2.legend(fontsize=10)
    ax2.set_xscale('log')
    ax2.set_yscale('log')
    ax2.grid(True, alpha=0.3)
else:
    ax2.text(0.5, 0.5, 'No Cl-35 (n,p) data in dataset\n(Check EXFOR ingestion or use the full X4 database)',
             ha='center', va='center', transform=ax2.transAxes, fontsize=11)
    ax2.set_title('Cl-35 (n,p) (No Data)', fontsize=13, fontweight='bold')

plt.tight_layout()
plt.show()

print("\n" + "="*80)
print("🔍 KEY OBSERVATIONS:")
print("="*80)
print(f"  LEFT (U-235 Fission):")
print(f"    • {'MANY' if len(u235_fission) > 100 else 'FEW'} measurements → ML models have rich training data")
print(f"    • Dense energy coverage → Good interpolation possible")
print(f"    • Well-characterized resonances → Physics well-understood")
print()
print(f"  RIGHT (Cl-35 (n,p)):")
print(f"    • {'SPARSE' if len(cl35_np) < 100 else 'MODERATE'} measurements → ML models face data scarcity")
print(f"    • Energy gaps → Interpolation/extrapolation challenging")
print(f"    • Active research area → Models can help guide new experiments!")
print("="*80)

## Step 5: Production Data Statistics

In [None]:
# Get comprehensive statistics
stats = dataset.get_statistics()

print("\n" + "="*70)
print("EXFOR PRODUCTION DATA STATISTICS")
print("="*70)
for key, value in stats.items():
    if isinstance(value, tuple):
        print(f"{key:25s}: {value[0]:.2e} - {value[1]:.2e}")
    else:
        print(f"{key:25s}: {value}")
print("="*70)

# Reaction breakdown
print("\nReaction Types in Dataset:")
reaction_counts = dataset.df.groupby('MT').size().sort_values(ascending=False)
for mt, count in reaction_counts.items():
    mt_name = {2: 'Elastic', 16: '(n,2n)', 18: 'Fission', 102: 'Capture', 103: '(n,p)'}.get(int(mt), f'MT={mt}')
    print(f"  {mt_name:20s}: {count:>8,} points")

## Step 6: Verify Data Quality

Production checks to ensure data is suitable for training.
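
One way to make such checks machine-readable is to count violations per check and treat any non-zero count as a failure. The sketch below is an assumption-laden illustration: `quality_report` is a hypothetical helper, and the demo frame holds placeholder values with one deliberately bad row per check.

```python
import numpy as np
import pandas as pd

def quality_report(df, critical_cols=("Z", "A", "Energy", "CrossSection")):
    """Count rows that violate each check; all zeros means the frame passed."""
    xs = df["CrossSection"]
    report = {f"nan_{col}": int(df[col].isna().sum()) for col in critical_cols}
    report["infinite_xs"] = int(np.isinf(xs).sum())
    report["negative_xs"] = int((xs < 0).sum())
    return report

# Placeholder frame with one deliberately bad row per check.
demo = pd.DataFrame({
    "Z": [92, 92, 17, 17],
    "A": [235, 235, 35, 35],
    "Energy": [1.0, 2.0, np.nan, 4.0],
    "CrossSection": [1.0, np.inf, 3.0, -1.0],
})

report = quality_report(demo)
failed = {name: count for name, count in report.items() if count > 0}
print(failed)  # {'nan_Energy': 1, 'infinite_xs': 1, 'negative_xs': 1}
```

Returning counts rather than printing makes it easy to assert `not failed` in a test or a training script.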

In [None]:
# Quality checks
print("\nData Quality Checks:")
print("="*70)

# 1. No infinite values
has_inf = np.isinf(dataset.df['CrossSection']).sum()
status = "✓" if has_inf == 0 else "❌"
print(f"{status} Infinite cross sections: {has_inf} (should be 0)")

# 2. No NaN in critical columns
critical_cols = ['Z', 'A', 'Energy', 'CrossSection']
for col in critical_cols:
    nan_count = dataset.df[col].isna().sum()
    status = "✓" if nan_count == 0 else "❌"
    print(f"{status} NaN in {col:20s}: {nan_count}")

# 3. No negative cross sections
negative = (dataset.df['CrossSection'] < 0).sum()
status = "✓" if negative == 0 else "❌"
print(f"{status} Negative cross sections: {negative} (should be 0)")

# 4. Energy range coverage
energy_decades = np.log10(dataset.df['Energy'].max() / dataset.df['Energy'].min())
print(f"✓ Energy range: {energy_decades:.1f} decades")

# 5. Natural targets flagged
if 'Is_Natural_Target' in dataset.df.columns:
    natural_count = dataset.df['Is_Natural_Target'].sum()
    print(f"✓ Natural targets flagged: {natural_count}")

print("="*70)

## Step 7: Ready for Production Training

This dataset is now ready to use with:
- Baseline models (XGBoost, Decision Trees)
- GNN-Transformer
- Physics-informed training
- OpenMC validation

**All with REAL experimental data!**

In [None]:
# Demonstrate production-safe usage with on-demand enrichment
print("\n🎯 Production Training Example (On-Demand Enrichment):")
print("="*80)
print("# Load EXFOR data with automatic AME enrichment")
print("from nucml_next.data import NucmlDataset, DataSelection")
print("from nucml_next.baselines import XGBoostEvaluator")
print("")
print("# Load data - AME files auto-detected in data/ directory")
print("dataset = NucmlDataset(")
print("    data_path='data/exfor_processed.parquet',")
print("    selection=DataSelection(")
print("        projectile='neutron',")
print("        energy_min=1e-5, energy_max=2e7,")
print("        mt_mode='reactor_core',")
print("        tiers=['A', 'C', 'D']  # → Triggers AME auto-load!")
print("    )")
print(")")
print("")
print("# Get tier-based features (MT codes → particle vectors + AME enrichment)")
print("df_features = dataset.to_tabular()")
print("")
print("# Train on real data with tier-based features")
print("xgb = XGBoostEvaluator()")
print("xgb.train(df_features)")
print("="*80)

print("\n" + "="*80)
print("ON-DEMAND ENRICHMENT BENEFITS")
print("="*80)
print("⚡ Faster ingestion:   No AME parsing during ingestion")
print("💾 Smaller Parquet:    ~10x size reduction (EXFOR data only)")
print("🎯 Flexible:           Load only the tier data you need")
print("📦 Auto-detection:     AME files found automatically in data/")
print("🔄 One-time merge:     Enrichment happens once per dataset load")
print("="*80)

print("\n✅ This dataset is production-ready!")
print(f"✅ {len(dataset.df):,} real EXFOR measurements")
print(f"✅ {dataset.df[['Z', 'A']].drop_duplicates().shape[0]} isotopes")
print(f"✅ {dataset.df['MT'].nunique()} reaction types")
print(f"✅ Lean Parquet ingestion + on-demand AME enrichment")
print(f"   → Fast ingestion, flexible feature selection")

print("\nContinue to baseline/GNN-Transformer training notebooks →")

---

## 🎓 Key Takeaway

> **NUCML-Next: Lean Ingestion + On-Demand Enrichment Architecture**
>
> **What we've learned:**
> - ✅ X4Pro SQLite provides single-file, efficient database access
> - ✅ **Lean ingestion:** Only EXFOR data written to Parquet (~10x smaller)
> - ✅ **On-demand enrichment:** AME2020/NUBASE2020 loaded during dataset initialization
> - ✅ **Auto-detection:** AME files found automatically in data/ directory
> - ✅ **Flexible:** Load only the tier columns you need
>
> **Architecture Benefits:**
> - ⚡ **Faster ingestion** (no AME file parsing)
> - 💾 **Smaller files** (~10x size reduction)
> - 🎯 **Flexible** (load only needed tiers)
> - 📦 **Simple** (auto-detection of AME files)
>
> **Data Flow:**
> ```
> 1. Ingestion:    X4 SQLite → Lean Parquet (EXFOR only)
>                 └─ Fast! No AME parsing
> 
> 2. Loading:      Parquet + AME files (data/*.mas20.txt)
>                 └─ Auto-detected and merged on (Z, A)
> 
> 3. Features:     Column selection for requested tiers
>                 └─ Fast! No file I/O, just DataFrame operations
> ```
>
> **Setup (One-Time):**
> ```bash
> # Step 1: Ingest EXFOR data (lean Parquet)
> python scripts/ingest_exfor.py \
>     --x4-db data/x4sqlite1_sample.db \
>     --output data/exfor_processed.parquet
> 
> # Step 2: Download AME files to data/ directory
> cd data/
> wget https://www-nds.iaea.org/amdc/ame2020/mass_1.mas20.txt
> wget https://www-nds.iaea.org/amdc/ame2020/rct1.mas20.txt
> wget https://www-nds.iaea.org/amdc/ame2020/rct2_1.mas20.txt
> wget https://www-nds.iaea.org/amdc/ame2020/nubase_4.mas20.txt
> cd ..
> ```
>
> **Usage (Automatic Enrichment):**
> ```python
> from nucml_next.data import NucmlDataset, DataSelection
> 
> # Load data - AME enrichment happens automatically!
> dataset = NucmlDataset(
>     data_path='data/exfor_processed.parquet',
>     selection=DataSelection(
>         projectile='neutron',
>         energy_min=1e-5,
>         energy_max=2e7,
>         mt_mode='reactor_core',
>         tiers=['A', 'C', 'D']  # ← Triggers AME auto-load
>     )
> )
> 
> # Enrichment done! AME columns now available
> df = dataset.df  # Has Z, A, MT, Energy, CrossSection + AME columns
> 
> # Generate features (just column selection)
> df_features = dataset.to_tabular(mode='physics')
> ```
>
> **Focus isotopes demonstrate different scenarios:**
> - **U-235 (data-rich)**: Extensive measurements, well-understood physics
> - **Cl-35 (n,p) (data-sparse)**: Limited measurements, active research area
> - **Both:** AME2020/NUBASE2020 enrichment for tier-based features

**Next:** See how classical ML handles these different data scenarios in `00_Baselines_and_Limitations.ipynb` →

---