diff --git a/hf_model_evaluation/SKILL.md b/hf_model_evaluation/SKILL.md index 3a82c11..ebec9d7 100644 --- a/hf_model_evaluation/SKILL.md +++ b/hf_model_evaluation/SKILL.md @@ -17,20 +17,36 @@ This skill provides tools to add structured evaluation results to Hugging Face m # Dependencies - huggingface_hub>=0.26.0 +- markdown-it-py>=3.0.0 - python-dotenv>=1.2.1 - pyyaml>=6.0.3 - requests>=2.32.5 - inspect-ai>=0.3.0 - re (built-in) +# IMPORTANT: Using This Skill + +**Use `--help` for the latest workflow guidance.** Works with plain Python or `uv run`: +```bash +uv run scripts/evaluation_manager.py --help +uv run scripts/evaluation_manager.py inspect-tables --help +uv run scripts/evaluation_manager.py extract-readme --help +``` +Key workflow (matches CLI help): +1) `inspect-tables` → find table numbers/columns +2) `extract-readme --table N` → prints YAML by default +3) add `--apply` (push) or `--create-pr` to write changes + # Core Capabilities -## 1. Extract Evaluation Tables from README -- **Parse Markdown Tables**: Automatically detect and parse evaluation tables in model READMEs -- **Multiple Table Support**: Handle models with multiple benchmark tables -- **Format Detection**: Recognize common evaluation table formats (benchmarks as rows/columns, or transposed with models as rows) -- **Smart Model Matching**: Find and extract scores for specific models in comparison tables -- **Smart Conversion**: Convert parsed tables to model-index YAML format +## 1. Inspect and Extract Evaluation Tables from README +- **Inspect Tables**: Use `inspect-tables` to see all tables in a README with structure, columns, and sample rows +- **Parse Markdown Tables**: Accurate parsing using markdown-it-py (ignores code blocks and examples) +- **Table Selection**: Use `--table N` to extract from a specific table (required when multiple tables exist) +- **Format Detection**: Recognize common formats (benchmarks as rows, columns, or comparison tables with multiple models) +- **Column Matching**: Automatically identify model columns/rows; prefer `--model-column-index` (index from inspect output). Use `--model-name-override` only with exact column header text. +- **YAML Generation**: Convert selected table to model-index YAML format +- **Task Typing**: `--task-type` sets the `task.type` field in model-index output (e.g., `text-generation`, `summarization`) ## 2. Import from Artificial Analysis - **API Integration**: Fetch benchmark scores directly from Artificial Analysis @@ -56,148 +72,42 @@ This skill provides tools to add structured evaluation results to Hugging Face m The skill includes Python scripts in `scripts/` to perform operations. ### Prerequisites -- Install dependencies: `uv add huggingface_hub python-dotenv pyyaml inspect-ai` +- Preferred: use `uv run` (PEP 723 header auto-installs deps) +- Or install manually: `pip install huggingface-hub markdown-it-py python-dotenv pyyaml requests` - Set `HF_TOKEN` environment variable with Write-access token - For Artificial Analysis: Set `AA_API_KEY` environment variable -- Activate virtual environment: `source .venv/bin/activate` - -### Method 1: Extract from README +- `.env` is loaded automatically if `python-dotenv` is installed -Extract evaluation tables from a model's existing README and add them to model-index metadata. 
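+**Example `.env` (optional):** a minimal sketch using the variable names this skill reads; the values below are placeholders.
+```bash
+# Loaded automatically when python-dotenv is installed.
+# HF_TOKEN needs Write access; AA_API_KEY is only required for `import-aa`.
+HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
+AA_API_KEY=aa_xxxxxxxxxxxxxxxxxxxx
+```
+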
+### Method 1: Extract from README (CLI workflow) -**Basic Usage:** +Recommended flow (matches `--help`): ```bash -python scripts/evaluation_manager.py extract-readme \ - --repo-id "username/model-name" +# 1) Inspect tables to get table numbers and column hints +uv run scripts/evaluation_manager.py inspect-tables --repo-id "username/model" + +# 2) Extract a specific table (prints YAML by default) +uv run scripts/evaluation_manager.py extract-readme \ + --repo-id "username/model" \ + --table 1 \ + [--model-column-index ] \ + [--model-name-override ""] # use exact header text if you can't use the index + +# 3) Apply changes (push or PR) +uv run scripts/evaluation_manager.py extract-readme \ + --repo-id "username/model" \ + --table 1 \ + --apply # push directly +# or +uv run scripts/evaluation_manager.py extract-readme \ + --repo-id "username/model" \ + --table 1 \ + --create-pr # open a PR ``` -**With Custom Task Type:** -```bash -python scripts/evaluation_manager.py extract-readme \ - --repo-id "username/model-name" \ - --task-type "text-generation" \ - --dataset-name "Custom Benchmarks" -``` - -**Dry Run (Preview Only):** -```bash -python scripts/evaluation_manager.py extract-readme \ - --repo-id "username/model-name" \ - --dry-run -``` - -#### Supported Table Formats - -**Format 1: Benchmarks as Rows** -```markdown -| Benchmark | Score | -|-----------|-------| -| MMLU | 85.2 | -| HumanEval | 72.5 | -``` - -**Format 2: Benchmarks as Columns** -```markdown -| MMLU | HumanEval | GSM8K | -|------|-----------|-------| -| 85.2 | 72.5 | 91.3 | -``` - -**Format 3: Multiple Metrics** -```markdown -| Benchmark | Accuracy | F1 Score | -|-----------|----------|----------| -| MMLU | 85.2 | 0.84 | -``` - -**Format 4: Transposed Tables (Models as Rows)** -```markdown -| Model | MMLU | HumanEval | GSM8K | ARC | -|----------------|------|-----------|-------|------| -| GPT-4 | 86.4 | 67.0 | 92.0 | 96.3 | -| Claude-3 | 86.8 | 84.9 | 95.0 | 96.4 | -| **Your-Model** | 85.2 | 72.5 | 91.3 | 95.8 | -``` - -In this format, the script will: -- Detect that models are in rows (first column) and benchmarks in columns (header) -- Find the row matching your model name (handles bold/markdown formatting) -- Extract all benchmark scores from that specific row only - -#### Validating Extraction Results - -**CRITICAL**: Always validate extracted results before creating a PR or pushing changes. - -After running `extract-readme`, you MUST: - -1. **Use `--dry-run` first** to preview the extraction: -```bash -python scripts/evaluation_manager.py extract-readme \ - --repo-id "username/model-name" \ - --dry-run -``` - -2. **Manually verify the output**: - - Check that the correct model's scores were extracted (not other models) - - Verify benchmark names are correct - - Confirm all expected benchmarks are present - - Ensure numeric values match the README exactly - -3. **For transposed tables** (models as rows): - - Verify only ONE model's row was extracted - - Check that it matched the correct model name - - Look for warnings like "Could not find model 'X' in transposed table" - - If scores from multiple models appear, the table format was misdetected - -4. **Compare against the source**: - - Open the model README in a browser - - Cross-reference each extracted score with the table - - Verify no scores are mixed from different rows/columns - -5. 
**Common validation failures**: - - **Multiple models extracted**: Wrong table format detected - - **Missing benchmarks**: Column headers not recognized - - **Wrong scores**: Matched wrong model row or column - - **Empty metrics list**: Table not detected or parsing failed - -**Example validation workflow**: -```bash -# Step 1: Dry run to preview -python scripts/evaluation_manager.py extract-readme \ - --repo-id "allenai/Olmo-3-1125-32B" \ - --dry-run - -# Step 2: If model name not found in table, script shows available models -# ⚠ Could not find model 'Olmo-3-1125-32B' in transposed table -# -# Available models in table: -# 1. **Open-weight Models** -# 2. Qwen-2.5-32B -# ... -# 12. **Olmo 3-32B** -# -# Please select the correct model name from the list above. - -# Step 3: Re-run with the correct model name -python scripts/evaluation_manager.py extract-readme \ - --repo-id "allenai/Olmo-3-1125-32B" \ - --model-name-override "**Olmo 3-32B**" \ - --dry-run - -# Step 4: Review the YAML output carefully -# Verify: Are these all benchmarks for Olmo-3-32B ONLY? -# Verify: Do the scores match the README table? - -# Step 5: If validation passes, create PR -python scripts/evaluation_manager.py extract-readme \ - --repo-id "allenai/Olmo-3-1125-32B" \ - --model-name-override "**Olmo 3-32B**" \ - --create-pr - -# Step 6: Validate the model card after update -python scripts/evaluation_manager.py show \ - --repo-id "allenai/Olmo-3-1125-32B" -``` +Validation checklist: +- YAML is printed by default; compare against the README table before applying. +- Prefer `--model-column-index`; if using `--model-name-override`, the column header text must be exact. +- For transposed tables (models as rows), ensure only one row is extracted. ### Method 2: Import from Artificial Analysis @@ -267,46 +177,42 @@ python scripts/run_eval_job.py \ ### Commands Reference -**List Available Commands:** +**Top-level help and version:** +```bash +uv run scripts/evaluation_manager.py --help +uv run scripts/evaluation_manager.py --version +``` + +**Inspect Tables (start here):** ```bash -python scripts/evaluation_manager.py --help +uv run scripts/evaluation_manager.py inspect-tables --repo-id "username/model-name" ``` **Extract from README:** ```bash -python scripts/evaluation_manager.py extract-readme \ +uv run scripts/evaluation_manager.py extract-readme \ --repo-id "username/model-name" \ + --table N \ + [--model-column-index N] \ + [--model-name-override "Exact Column Header or Model Name"] \ [--task-type "text-generation"] \ [--dataset-name "Custom Benchmarks"] \ - [--model-name-override "Model Name From Table"] \ - [--dry-run] \ - [--create-pr] + [--apply | --create-pr] ``` -The `--model-name-override` flag is useful when: -- The model name in the table differs from the repo name -- Working with transposed tables where models are listed with different formatting -- The script cannot automatically match the model name - **Import from Artificial Analysis:** ```bash -python scripts/evaluation_manager.py import-aa \ +AA_API_KEY=... 
uv run scripts/evaluation_manager.py import-aa \ --creator-slug "creator-name" \ --model-name "model-slug" \ --repo-id "username/model-name" \ [--create-pr] ``` -**View Current Evaluations:** -```bash -python scripts/evaluation_manager.py show \ - --repo-id "username/model-name" -``` - -**Validate Model-Index:** +**View / Validate:** ```bash -python scripts/evaluation_manager.py validate \ - --repo-id "username/model-name" +uv run scripts/evaluation_manager.py show --repo-id "username/model-name" +uv run scripts/evaluation_manager.py validate --repo-id "username/model-name" ``` **Run Evaluation Job:** @@ -354,41 +260,6 @@ model-index: WARNING: Do not use markdown formatting in the model name. Use the exact name from the table. Only use urls in the source.url field. -### Advanced Usage - -**Extract Multiple Tables:** -```bash -# The script automatically detects and processes all evaluation tables -python scripts/evaluation_manager.py extract-readme \ - --repo-id "username/model-name" \ - --merge-tables -``` - -**Custom Metric Mapping:** -```bash -# Use a JSON file to map column names to metric types -python scripts/evaluation_manager.py extract-readme \ - --repo-id "username/model-name" \ - --metric-mapping "$(cat metric_mapping.json)" -``` - -Example `metric_mapping.json`: -```json -{ - "MMLU": {"type": "mmlu", "name": "Massive Multitask Language Understanding"}, - "HumanEval": {"type": "humaneval", "name": "Code Generation (HumanEval)"}, - "GSM8K": {"type": "gsm8k", "name": "Grade School Math"} -} -``` - -**Batch Processing:** -```bash -# Process multiple models from a list -while read repo_id; do - python scripts/evaluation_manager.py extract-readme --repo-id "$repo_id" -done < models.txt -``` - ### Error Handling - **Table Not Found**: Script will report if no evaluation tables are detected - **Invalid Format**: Clear error messages for malformed tables @@ -399,15 +270,15 @@ done < models.txt ### Best Practices -1. **ALWAYS Validate Extraction**: Use `--dry-run` first and manually verify all extracted scores match the README exactly before pushing -2. **Check for Transposed Tables**: If the README has comparison tables with multiple models, verify only YOUR model's scores were extracted -3. **Validate After Updates**: Run `validate` and `show` commands to ensure proper formatting -4. **Source Attribution**: Include source information for traceability -5. **Regular Updates**: Keep evaluation scores current as new benchmarks emerge -6. **Create PRs for Others**: Use `--create-pr` when updating models you don't own -7. **Monitor Costs**: Evaluation Jobs are billed by usage. Ensure you check running jobs and costs -8. **One model per repo**: Only add one model's 'results' to the model-index. The main model of the repo. No derivatives or forks! -9. **Markdown formatting**: Never use markdown formatting in the model name. Use the exact name from the table. Only use urls in the source.url field. +1. **Always start with `inspect-tables`**: See table structure and get the correct extraction command +2. **Use `--help` for guidance**: Run `inspect-tables --help` to see the complete workflow +3. **Preview first**: Default behavior prints YAML; review it before using `--apply` or `--create-pr` +4. **Verify extracted values**: Compare YAML output against the README table manually +5. **Use `--table N` for multi-table READMEs**: Required when multiple evaluation tables exist +6. **Use `--model-name-override` for comparison tables**: Copy the exact column header from `inspect-tables` output +7. 
**Create PRs for Others**: Use `--create-pr` when updating models you don't own +8. **One model per repo**: Only add the main model's results to model-index +9. **No markdown in YAML names**: The model name field in YAML should be plain text ### Model Name Matching diff --git a/hf_model_evaluation/scripts/evaluation_manager.py b/hf_model_evaluation/scripts/evaluation_manager.py index 4ecc32f..b58e89b 100644 --- a/hf_model_evaluation/scripts/evaluation_manager.py +++ b/hf_model_evaluation/scripts/evaluation_manager.py @@ -2,6 +2,7 @@ # requires-python = ">=3.13" # dependencies = [ # "huggingface-hub>=1.1.4", +# "markdown-it-py>=3.0.0", # "python-dotenv>=1.2.1", # "pyyaml>=6.0.3", # "requests>=2.32.5", @@ -21,17 +22,61 @@ import argparse import os import re +from textwrap import dedent from typing import Any, Dict, List, Optional, Tuple -import dotenv -import requests -import yaml -from huggingface_hub import ModelCard -dotenv.load_dotenv() +def load_env() -> None: + """Load .env if python-dotenv is available; keep help usable without it.""" + try: + import dotenv # type: ignore + except ModuleNotFoundError: + return + dotenv.load_dotenv() + + +def require_markdown_it(): + try: + from markdown_it import MarkdownIt # type: ignore + except ModuleNotFoundError as exc: + raise ModuleNotFoundError( + "markdown-it-py is required for table parsing. " + "Install with `uv add markdown-it-py` or `pip install markdown-it-py`." + ) from exc + return MarkdownIt + + +def require_model_card(): + try: + from huggingface_hub import ModelCard # type: ignore + except ModuleNotFoundError as exc: + raise ModuleNotFoundError( + "huggingface-hub is required for model card operations. " + "Install with `uv add huggingface_hub` or `pip install huggingface-hub`." + ) from exc + return ModelCard + + +def require_requests(): + try: + import requests # type: ignore + except ModuleNotFoundError as exc: + raise ModuleNotFoundError( + "requests is required for Artificial Analysis import. " + "Install with `uv add requests` or `pip install requests`." + ) from exc + return requests + -HF_TOKEN = os.getenv("HF_TOKEN") -AA_API_KEY = os.getenv("AA_API_KEY") +def require_yaml(): + try: + import yaml # type: ignore + except ModuleNotFoundError as exc: + raise ModuleNotFoundError( + "PyYAML is required for YAML output. " + "Install with `uv add pyyaml` or `pip install pyyaml`." + ) from exc + return yaml # ============================================================================ @@ -275,7 +320,8 @@ def extract_metrics_from_table( header: List[str], rows: List[List[str]], table_format: str = "auto", - model_name: Optional[str] = None + model_name: Optional[str] = None, + model_column_index: Optional[int] = None ) -> List[Dict[str, Any]]: """ Extract metrics from parsed table data. 
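+
+    When `model_column_index` is given it is used directly as the score column,
+    taking precedence over matching a column by `model_name`.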
@@ -297,21 +343,28 @@ def extract_metrics_from_table( if is_transposed_table(header, rows): table_format = "transposed" else: - # Heuristic: if first row has mostly numeric values, benchmarks are columns - try: - numeric_count = sum( - 1 for cell in rows[0] if cell and - re.match(r"^\d+\.?\d*%?$", cell.replace(",", "").strip()) - ) - table_format = "columns" if numeric_count > len(rows[0]) / 2 else "rows" - except (IndexError, ValueError): + # Check if first column header is empty/generic (indicates benchmarks in rows) + first_header = header[0].lower().strip() if header else "" + is_first_col_benchmarks = not first_header or first_header in ["", "benchmark", "task", "dataset", "metric", "eval"] + + if is_first_col_benchmarks: table_format = "rows" + else: + # Heuristic: if first row has mostly numeric values, benchmarks are columns + try: + numeric_count = sum( + 1 for cell in rows[0] if cell and + re.match(r"^\d+\.?\d*%?$", cell.replace(",", "").strip()) + ) + table_format = "columns" if numeric_count > len(rows[0]) / 2 else "rows" + except (IndexError, ValueError): + table_format = "rows" if table_format == "rows": # Benchmarks are in rows, scores in columns # Try to identify the main model column if model_name is provided - target_column = None - if model_name: + target_column = model_column_index + if target_column is None and model_name: target_column = find_main_model_column(header, model_name) for row in rows: @@ -438,7 +491,9 @@ def extract_evaluations_from_readme( task_type: str = "text-generation", dataset_name: str = "Benchmarks", dataset_type: str = "benchmark", - model_name_override: Optional[str] = None + model_name_override: Optional[str] = None, + table_index: Optional[int] = None, + model_column_index: Optional[int] = None ) -> Optional[List[Dict[str, Any]]]: """ Extract evaluation results from a model's README. @@ -448,13 +503,17 @@ def extract_evaluations_from_readme( task_type: Task type for model-index (e.g., "text-generation") dataset_name: Name for the benchmark dataset dataset_type: Type identifier for the dataset - model_name_override: Override model name for matching (useful for transposed tables) + model_name_override: Override model name for matching (column header for comparison tables) + table_index: 1-indexed table number from inspect-tables output Returns: Model-index formatted results or None if no evaluations found """ try: - card = ModelCard.load(repo_id, token=HF_TOKEN) + load_env() + ModelCard = require_model_card() + hf_token = os.getenv("HF_TOKEN") + card = ModelCard.load(repo_id, token=hf_token) readme_content = card.content if not readme_content: @@ -468,28 +527,59 @@ def extract_evaluations_from_readme( else: model_name = repo_id.split("/")[-1] if "/" in repo_id else repo_id - # Extract all tables - tables = extract_tables_from_markdown(readme_content) + # Use markdown-it parser for accurate table extraction + all_tables = extract_tables_with_parser(readme_content) - if not tables: + if not all_tables: print(f"No tables found in README for {repo_id}") return None - # Parse and filter evaluation tables + # If table_index specified, use that specific table + if table_index is not None: + if table_index < 1 or table_index > len(all_tables): + print(f"Invalid table index {table_index}. 
Found {len(all_tables)} tables.") + print("Run inspect-tables to see available tables.") + return None + tables_to_process = [all_tables[table_index - 1]] + else: + # Filter to evaluation tables only + eval_tables = [] + for table in all_tables: + header = table.get("headers", []) + rows = table.get("rows", []) + if is_evaluation_table(header, rows): + eval_tables.append(table) + + if len(eval_tables) > 1: + print(f"\n⚠ Found {len(eval_tables)} evaluation tables.") + print("Run inspect-tables first, then use --table to select one:") + print(f' uv run scripts/evaluation_manager.py inspect-tables --repo-id "{repo_id}"') + return None + elif len(eval_tables) == 0: + print(f"No evaluation tables found in README for {repo_id}") + return None + + tables_to_process = eval_tables + + # Extract metrics from selected table(s) all_metrics = [] - for table_str in tables: - header, rows = parse_markdown_table(table_str) - - if is_evaluation_table(header, rows): - metrics = extract_metrics_from_table(header, rows, model_name=model_name) - all_metrics.extend(metrics) + for table in tables_to_process: + header = table.get("headers", []) + rows = table.get("rows", []) + metrics = extract_metrics_from_table( + header, + rows, + model_name=model_name, + model_column_index=model_column_index + ) + all_metrics.extend(metrics) if not all_metrics: - print(f"No evaluation tables found in README for {repo_id}") + print(f"No metrics extracted from table") return None # Build model-index structure - model_name = repo_id.split("/")[-1] if "/" in repo_id else repo_id + display_name = repo_id.split("/")[-1] if "/" in repo_id else repo_id results = [{ "task": {"type": task_type}, @@ -511,6 +601,185 @@ def extract_evaluations_from_readme( return None +# ============================================================================ +# Table Inspection (using markdown-it-py for accurate parsing) +# ============================================================================ + + +def extract_tables_with_parser(markdown_content: str) -> List[Dict[str, Any]]: + """ + Extract tables from markdown using markdown-it-py parser. + Uses GFM (GitHub Flavored Markdown) which includes table support. + """ + MarkdownIt = require_markdown_it() + # Disable linkify to avoid optional dependency errors; not needed for table parsing. 
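+    # The "gfm-like" preset is what enables table tokens; pipe tables inside fenced
+    # code blocks are tokenized as code, so they never appear in the walk below.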
+ md = MarkdownIt("gfm-like", {"linkify": False}) + tokens = md.parse(markdown_content) + + tables = [] + i = 0 + while i < len(tokens): + token = tokens[i] + + if token.type == "table_open": + table_data = {"headers": [], "rows": []} + current_row = [] + in_header = False + + i += 1 + while i < len(tokens) and tokens[i].type != "table_close": + t = tokens[i] + if t.type == "thead_open": + in_header = True + elif t.type == "thead_close": + in_header = False + elif t.type == "tr_open": + current_row = [] + elif t.type == "tr_close": + if in_header: + table_data["headers"] = current_row + else: + table_data["rows"].append(current_row) + current_row = [] + elif t.type == "inline": + current_row.append(t.content.strip()) + i += 1 + + if table_data["headers"] or table_data["rows"]: + tables.append(table_data) + + i += 1 + + return tables + + +def detect_table_format(table: Dict[str, Any], repo_id: str) -> Dict[str, Any]: + """Analyze a table to detect its format and identify model columns.""" + headers = table.get("headers", []) + rows = table.get("rows", []) + + if not headers or not rows: + return {"format": "unknown", "columns": headers, "model_columns": [], "row_count": 0, "sample_rows": []} + + first_header = headers[0].lower() if headers else "" + is_first_col_benchmarks = not first_header or first_header in ["", "benchmark", "task", "dataset", "metric", "eval"] + + # Check for numeric columns + numeric_columns = [] + for col_idx in range(1, len(headers)): + numeric_count = 0 + for row in rows[:5]: + if col_idx < len(row): + try: + val = re.sub(r'\s*\([^)]*\)', '', row[col_idx]) + float(val.replace("%", "").replace(",", "").strip()) + numeric_count += 1 + except (ValueError, AttributeError): + pass + if numeric_count > len(rows[:5]) / 2: + numeric_columns.append(col_idx) + + # Determine format + if is_first_col_benchmarks and len(numeric_columns) > 1: + format_type = "comparison" + elif is_first_col_benchmarks and len(numeric_columns) == 1: + format_type = "simple" + elif len(numeric_columns) > len(headers) / 2: + format_type = "transposed" + else: + format_type = "unknown" + + # Find model columns + model_columns = [] + model_name = repo_id.split("/")[-1] if "/" in repo_id else repo_id + model_tokens, _ = normalize_model_name(model_name) + + for idx, header in enumerate(headers): + if idx == 0 and is_first_col_benchmarks: + continue + if header: + header_tokens, _ = normalize_model_name(header) + is_match = model_tokens == header_tokens + is_partial = model_tokens.issubset(header_tokens) or header_tokens.issubset(model_tokens) + model_columns.append({ + "index": idx, + "header": header, + "is_exact_match": is_match, + "is_partial_match": is_partial and not is_match + }) + + return { + "format": format_type, + "columns": headers, + "model_columns": model_columns, + "row_count": len(rows), + "sample_rows": [row[0] for row in rows[:5] if row] + } + + +def inspect_tables(repo_id: str) -> None: + """Inspect and display all evaluation tables in a model's README.""" + try: + load_env() + ModelCard = require_model_card() + hf_token = os.getenv("HF_TOKEN") + card = ModelCard.load(repo_id, token=hf_token) + readme_content = card.content + + if not readme_content: + print(f"No README content found for {repo_id}") + return + + tables = extract_tables_with_parser(readme_content) + + if not tables: + print(f"No tables found in README for {repo_id}") + return + + print(f"\n{'='*70}") + print(f"Tables found in README for: {repo_id}") + print(f"{'='*70}") + + eval_table_count = 0 + for table in tables: + 
analysis = detect_table_format(table, repo_id) + + if analysis["format"] == "unknown" and not analysis.get("sample_rows"): + continue + + eval_table_count += 1 + print(f"\n## Table {eval_table_count}") + print(f" Format: {analysis['format']}") + print(f" Rows: {analysis['row_count']}") + + print(f"\n Columns ({len(analysis['columns'])}):") + for col_info in analysis.get("model_columns", []): + idx = col_info["index"] + header = col_info["header"] + if col_info["is_exact_match"]: + print(f" [{idx}] {header} ✓ EXACT MATCH") + elif col_info["is_partial_match"]: + print(f" [{idx}] {header} ~ partial match") + else: + print(f" [{idx}] {header}") + + if analysis.get("sample_rows"): + print(f"\n Sample rows (first column):") + for row_val in analysis["sample_rows"][:5]: + print(f" - {row_val}") + + if eval_table_count == 0: + print("\nNo evaluation tables detected.") + else: + print("\nSuggested next step:") + print(f' uv run scripts/evaluation_manager.py extract-readme --repo-id "{repo_id}" --table [--model-column-index ]') + + print(f"\n{'='*70}\n") + + except Exception as e: + print(f"Error inspecting tables: {e}") + + # ============================================================================ # Method 2: Import from Artificial Analysis # ============================================================================ @@ -527,12 +796,16 @@ def get_aa_model_data(creator_slug: str, model_name: str) -> Optional[Dict[str, Returns: Model data dictionary or None if not found """ + load_env() + AA_API_KEY = os.getenv("AA_API_KEY") if not AA_API_KEY: raise ValueError("AA_API_KEY environment variable is not set") url = "https://artificialanalysis.ai/api/v2/data/llms/models" headers = {"x-api-key": AA_API_KEY} + requests = require_requests() + try: response = requests.get(url, headers=headers, timeout=30) response.raise_for_status() @@ -650,12 +923,15 @@ def update_model_card_with_evaluations( Returns: True if successful, False otherwise """ - if not HF_TOKEN: - raise ValueError("HF_TOKEN environment variable is not set") - try: + load_env() + ModelCard = require_model_card() + hf_token = os.getenv("HF_TOKEN") + if not hf_token: + raise ValueError("HF_TOKEN environment variable is not set") + # Load existing card - card = ModelCard.load(repo_id, token=HF_TOKEN) + card = ModelCard.load(repo_id, token=hf_token) # Get model name model_name = repo_id.split("/")[-1] if "/" in repo_id else repo_id @@ -693,7 +969,7 @@ def update_model_card_with_evaluations( # Push update card.push_to_hub( repo_id, - token=HF_TOKEN, + token=hf_token, commit_message=commit_message, commit_description=commit_description, create_pr=create_pr @@ -711,7 +987,10 @@ def update_model_card_with_evaluations( def show_evaluations(repo_id: str) -> None: """Display current evaluations in a model card.""" try: - card = ModelCard.load(repo_id, token=HF_TOKEN) + load_env() + ModelCard = require_model_card() + hf_token = os.getenv("HF_TOKEN") + card = ModelCard.load(repo_id, token=hf_token) if "model-index" not in card.data: print(f"No model-index found in {repo_id}") @@ -756,7 +1035,10 @@ def show_evaluations(repo_id: str) -> None: def validate_model_index(repo_id: str) -> bool: """Validate model-index format in a model card.""" try: - card = ModelCard.load(repo_id, token=HF_TOKEN) + load_env() + ModelCard = require_model_card() + hf_token = os.getenv("HF_TOKEN") + card = ModelCard.load(repo_id, token=hf_token) if "model-index" not in card.data: print(f"✗ No model-index found in {repo_id}") @@ -805,28 +1087,85 @@ def 
validate_model_index(repo_id: str) -> bool: def main(): parser = argparse.ArgumentParser( - description="Manage evaluation results in Hugging Face model cards" + description=( + "Manage evaluation results in Hugging Face model cards.\n\n" + "Use standard Python or `uv run scripts/evaluation_manager.py ...` " + "to auto-resolve dependencies from the PEP 723 header." + ), + formatter_class=argparse.RawTextHelpFormatter, + epilog=dedent( + """\ + Typical workflows: + - Inspect tables first: + uv run scripts/evaluation_manager.py inspect-tables --repo-id + - Extract from README (prints YAML by default): + uv run scripts/evaluation_manager.py extract-readme --repo-id --table N + - Apply changes: + uv run scripts/evaluation_manager.py extract-readme --repo-id --table N --apply + - Import from Artificial Analysis: + AA_API_KEY=... uv run scripts/evaluation_manager.py import-aa --creator-slug org --model-name slug --repo-id + + Tips: + - YAML is printed by default; use --apply or --create-pr to write changes. + - Set HF_TOKEN (and AA_API_KEY for import-aa); .env is loaded automatically if python-dotenv is installed. + - When multiple tables exist, run inspect-tables then select with --table N. + - To apply changes (push or PR), rerun extract-readme with --apply or --create-pr. + """ + ), ) + parser.add_argument("--version", action="version", version="evaluation_manager 1.2.0") subparsers = parser.add_subparsers(dest="command", help="Command to execute") # Extract from README command extract_parser = subparsers.add_parser( "extract-readme", - help="Extract evaluation tables from model README" + help="Extract evaluation tables from model README", + formatter_class=argparse.RawTextHelpFormatter, + description="Parse README tables into model-index YAML. Default behavior prints YAML; use --apply/--create-pr to write changes.", + epilog=dedent( + """\ + Examples: + uv run scripts/evaluation_manager.py extract-readme --repo-id username/model + uv run scripts/evaluation_manager.py extract-readme --repo-id username/model --table 2 --model-column-index 3 + uv run scripts/evaluation_manager.py extract-readme --repo-id username/model --table 2 --model-name-override \"**Model 7B**\" # exact header text + uv run scripts/evaluation_manager.py extract-readme --repo-id username/model --table 2 --create-pr + + Apply changes: + - Default: prints YAML to stdout (no writes). + - Add --apply to push directly, or --create-pr to open a PR. + Model selection: + - Preferred: --model-column-index
+ - If using --model-name-override, copy the column header text exactly. + """ + ), ) extract_parser.add_argument("--repo-id", type=str, required=True, help="HF repository ID") - extract_parser.add_argument("--task-type", type=str, default="text-generation", help="Task type") + extract_parser.add_argument("--table", type=int, help="Table number (1-indexed, from inspect-tables output)") + extract_parser.add_argument("--model-column-index", type=int, help="Preferred: column index from inspect-tables output (exact selection)") + extract_parser.add_argument("--model-name-override", type=str, help="Exact column header/model name for comparison/transpose tables (when index is not used)") + extract_parser.add_argument("--task-type", type=str, default="text-generation", help="Sets model-index task.type (e.g., text-generation, summarization)") extract_parser.add_argument("--dataset-name", type=str, default="Benchmarks", help="Dataset name") extract_parser.add_argument("--dataset-type", type=str, default="benchmark", help="Dataset type") - extract_parser.add_argument("--model-name-override", type=str, help="Override model name for table matching") extract_parser.add_argument("--create-pr", action="store_true", help="Create PR instead of direct push") - extract_parser.add_argument("--dry-run", action="store_true", help="Preview without updating") + extract_parser.add_argument("--apply", action="store_true", help="Apply changes (default is to print YAML only)") + extract_parser.add_argument("--dry-run", action="store_true", help="Preview YAML without updating (default)") # Import from AA command aa_parser = subparsers.add_parser( "import-aa", - help="Import evaluation scores from Artificial Analysis" + help="Import evaluation scores from Artificial Analysis", + formatter_class=argparse.RawTextHelpFormatter, + description="Fetch scores from Artificial Analysis API and write them into model-index.", + epilog=dedent( + """\ + Examples: + AA_API_KEY=... uv run scripts/evaluation_manager.py import-aa --creator-slug anthropic --model-name claude-sonnet-4 --repo-id username/model + uv run scripts/evaluation_manager.py import-aa --creator-slug openai --model-name gpt-4o --repo-id username/model --create-pr + + Requires: AA_API_KEY in env (or .env if python-dotenv installed). + """ + ), ) aa_parser.add_argument("--creator-slug", type=str, required=True, help="AA creator slug") aa_parser.add_argument("--model-name", type=str, required=True, help="AA model name") @@ -836,71 +1175,114 @@ def main(): # Show evaluations command show_parser = subparsers.add_parser( "show", - help="Display current evaluations in model card" + help="Display current evaluations in model card", + formatter_class=argparse.RawTextHelpFormatter, + description="Print model-index content from the model card (requires HF_TOKEN for private repos).", ) show_parser.add_argument("--repo-id", type=str, required=True, help="HF repository ID") # Validate command validate_parser = subparsers.add_parser( "validate", - help="Validate model-index format" + help="Validate model-index format", + formatter_class=argparse.RawTextHelpFormatter, + description="Schema sanity check for model-index section of the card.", ) validate_parser.add_argument("--repo-id", type=str, required=True, help="HF repository ID") + # Inspect tables command + inspect_parser = subparsers.add_parser( + "inspect-tables", + help="Inspect tables in README → outputs suggested extract-readme command", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Workflow: + 1. 
inspect-tables → see table structure, columns, and table numbers + 2. extract-readme → run with --table N (from step 1); YAML prints by default + 3. apply changes → rerun extract-readme with --apply or --create-pr + +Reminder: + - Preferred: use --model-column-index . If needed, use --model-name-override with the exact column header text. +""" + ) + inspect_parser.add_argument("--repo-id", type=str, required=True, help="HF repository ID") + args = parser.parse_args() if not args.command: parser.print_help() return - # Execute command - if args.command == "extract-readme": - results = extract_evaluations_from_readme( - repo_id=args.repo_id, - task_type=args.task_type, - dataset_name=args.dataset_name, - dataset_type=args.dataset_type, - model_name_override=args.model_name_override - ) + try: + # Execute command + if args.command == "extract-readme": + results = extract_evaluations_from_readme( + repo_id=args.repo_id, + task_type=args.task_type, + dataset_name=args.dataset_name, + dataset_type=args.dataset_type, + model_name_override=args.model_name_override, + table_index=args.table, + model_column_index=args.model_column_index + ) - if not results: - print("No evaluations extracted") - return + if not results: + print("No evaluations extracted") + return + + apply_changes = args.apply or args.create_pr + + # Default behavior: print YAML (dry-run) + yaml = require_yaml() + print("\nExtracted evaluations (YAML):") + print( + yaml.dump( + {"model-index": [{"name": args.repo_id.split('/')[-1], "results": results}]}, + sort_keys=False + ) + ) + + if apply_changes: + if args.model_name_override and args.model_column_index is not None: + print("Note: --model-column-index takes precedence over --model-name-override.") + update_model_card_with_evaluations( + repo_id=args.repo_id, + results=results, + create_pr=args.create_pr, + commit_message="Extract evaluation results from README" + ) + + elif args.command == "import-aa": + results = import_aa_evaluations( + creator_slug=args.creator_slug, + model_name=args.model_name, + repo_id=args.repo_id + ) + + if not results: + print("No evaluations imported") + return - if args.dry_run: - print("\nPreview of extracted evaluations:") - print(yaml.dump({"model-index": [{"name": args.repo_id.split("/")[-1], "results": results}]}, sort_keys=False)) - else: update_model_card_with_evaluations( repo_id=args.repo_id, results=results, create_pr=args.create_pr, - commit_message="Extract evaluation results from README" + commit_message=f"Add Artificial Analysis evaluations for {args.model_name}" ) - elif args.command == "import-aa": - results = import_aa_evaluations( - creator_slug=args.creator_slug, - model_name=args.model_name, - repo_id=args.repo_id - ) - - if not results: - print("No evaluations imported") - return - - update_model_card_with_evaluations( - repo_id=args.repo_id, - results=results, - create_pr=args.create_pr, - commit_message=f"Add Artificial Analysis evaluations for {args.model_name}" - ) + elif args.command == "show": + show_evaluations(args.repo_id) - elif args.command == "show": - show_evaluations(args.repo_id) + elif args.command == "validate": + validate_model_index(args.repo_id) - elif args.command == "validate": - validate_model_index(args.repo_id) + elif args.command == "inspect-tables": + inspect_tables(args.repo_id) + except ModuleNotFoundError as exc: + # Surface dependency hints cleanly when user only needs help output + print(exc) + except Exception as exc: + print(f"Error: {exc}") if __name__ == "__main__":