Bug Description
When we run a benchmark with a custom dataset name, the run fails at the accuracy evaluation stage with a KeyError. Looking at sample_idx_map.json, it appears that for dataset names that are not predefined (such as open_orca), the generic key "Dataset" is written instead of the configured name.
Note:
The flow executes successfully if we modify the config snippet as follows (renaming both dataset entries to the generic "Dataset"):
```yaml
datasets:
  - name: "Dataset"
    type: "accuracy"
    samples: 24576
    path: "/home/anandhusooraj/endpoints/open_orca_gpt4_tokenized_llama.sampled_24576.parquet"
    accuracy_config:
      eval_method: "rouge"
      extractor: "identity_extractor"
      ground_truth: "output"
  - name: "Dataset"
    type: "performance"
    samples: 24576
    path: "/home/anandhusooraj/endpoints/open_orca_gpt4_tokenized_llama.sampled_24576.parquet"
```
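The workaround suggests a mismatch between how the sample index map is written and how it is read. Below is a minimal, hypothetical sketch of the suspected behavior (this is NOT the actual inference_endpoint code; `KNOWN_DATASETS`, both helper functions, and the fallback logic are assumptions for illustration only): the writer collapses any non-predefined name to a generic "Dataset" key, while the reader looks up the configured name verbatim.

```python
import json
import tempfile

# Hypothetical sketch of the suspected bug (NOT the real inference_endpoint
# code): the writer substitutes a generic "Dataset" key for any dataset name
# that is not predefined, while the reader uses the configured name as-is.

KNOWN_DATASETS = {"open_orca"}  # assumed set of predefined dataset names


def write_sample_idx_map(dataset_name, indices, path):
    # Custom names collapse to the generic "Dataset" key.
    key = dataset_name if dataset_name in KNOWN_DATASETS else "Dataset"
    with open(path, "w") as f:
        json.dump({key: indices}, f)


def load_sample_idx_map(dataset_name, path):
    with open(path) as f:
        d = json.load(f)
    return d[dataset_name]  # implicitly raises KeyError for unknown keys


with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    path = f.name

write_sample_idx_map("Dataset-openorca", [0, 1, 2], path)

try:
    load_sample_idx_map("Dataset-openorca", path)
except KeyError as e:
    print(f"KeyError: {e}")  # mirrors the failure seen in the logs

# Reading with the generic key succeeds, which would explain why renaming
# the dataset to "Dataset" in the config works around the crash:
print(load_sample_idx_map("Dataset", path))
```

If this is indeed the write-side behavior, the workaround in the Note section works because the configured name happens to match the generic key.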
Steps to Reproduce
- Launch a vLLM server with llama2-70b/llama2-7b
- Config file:
```yaml
# Online Latency Benchmark
name: "online-llama2-70b-orca-benchmark"
version: "1.0"
type: "offline"
#benchmark_mode: "online"

model_params:
  name: "meta-llama/Llama-2-7b-chat-hf"
  temperature: 0
  top_p: 1
  max_new_tokens: 1024

datasets:
  - name: "Dataset-openorca"
    type: "accuracy"
    samples: 24576
    path: "/home/anandhusooraj/endpoints/open_orca_gpt4_tokenized_llama.sampled_24576.parquet"
    accuracy_config:
      eval_method: "rouge"
      extractor: "identity_extractor"
      ground_truth: "output"
  - name: "Dataset-openorca"
    type: "performance"
    samples: 24576
    path: "/home/anandhusooraj/endpoints/open_orca_gpt4_tokenized_llama.sampled_24576.parquet"

settings:
  runtime:
    min_duration_ms: 600000 # 10 minutes
    #max_duration_ms: 600000 # 10 minutes
    scheduler_random_seed: 42 # For Poisson/distribution sampling
    dataloader_random_seed: 42 # For dataset shuffling
    n_samples_to_issue: 24576
  load_pattern:
    type: "max_throughput"
    #target_qps: 10
  client:
    num_workers: 4
  metrics:
    collect:
      - "throughput"
      - "latency"
      - "ttft"
      - "tpot"

endpoint_config:
  endpoints:
    - "http://localhost:9000"
  api_key: null

report_dir: results/llama2_70b_orca_benchmark_mlperf_parq/
```
- Run command:

```shell
inference-endpoint benchmark from-config -c examples/06_Llama2-70B_Example/online_llama2_70b_orca_backup.yaml --timeout 600000
```
Environment
OS: Ubuntu 24.04
Python: 3.12.3
Endpoints repo latest commit hash: 8c0c63d
Relevant Logs
Error log:
```
(endp) anandhusooraj@mlc2:~/endpoints$ inference-endpoint benchmark from-config -c examples/06_Llama2-70B_Example/online_llama2_70b_orca_backup.yaml --timeout 600000
2026-04-16 14:46:42,753 - inference_endpoint.endpoint_client.cpu_affinity - INFO - CPU affinity: 224 online CPUs available to process
2026-04-16 14:46:42,763 - inference_endpoint.endpoint_client.cpu_affinity - INFO - CPU affinity: 112 physical cores across 2 NUMA nodes, requesting 5 for loadgen, 4 workers
2026-04-16 14:46:42,772 - inference_endpoint.endpoint_client.cpu_affinity - INFO - LoadGen pinned to 10 CPUs (5 physical cores)
2026-04-16 14:46:42,777 - inference_endpoint.commands.benchmark.execute - INFO - Loading tokenizer for model: meta-llama/Llama-2-7b-chat-hf
2026-04-16 14:46:42,884 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 401 Unauthorized"
2026-04-16 14:46:42,947 - httpx - INFO - HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 401 Unauthorized"
2026-04-16 14:46:43,000 - httpx - INFO - HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
2026-04-16 14:46:43,064 - httpx - INFO - HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
2026-04-16 14:46:43,240 - inference_endpoint.commands.benchmark.execute - INFO - Tokenizer loaded successfully
2026-04-16 14:46:43,241 - inference_endpoint.commands.benchmark.execute - INFO - Streaming: disabled (off)
2026-04-16 14:46:43,757 - inference_endpoint.commands.benchmark.execute - INFO - Loaded <inference_endpoint.dataset_manager.dataset.Dataset object at 0x7958a9e0e810> - 24576 samples
2026-04-16 14:46:44,319 - inference_endpoint.commands.benchmark.execute - INFO - Loaded 24576 samples
2026-04-16 14:46:44,319 - inference_endpoint.commands.benchmark.execute - INFO - Mode: TestMode.PERF, Target QPS: None, Responses: False
2026-04-16 14:46:44,319 - inference_endpoint.commands.benchmark.execute - INFO - Min Duration: 600.0s, Expected samples: 49152
2026-04-16 14:46:44,320 - inference_endpoint.commands.benchmark.execute - INFO - Scheduler: MaxThroughputScheduler (pattern: max_throughput)
meta-llama/Llama-2-7b-chat-hf (Streaming: False): 0%| | 0/49152 [00:00<?, ?it/s]2026-04-16 14:46:44,327 - inference_endpoint.commands.benchmark.execute - INFO - Connecting: ['http://localhost:9000']
2026-04-16 14:46:46,889 - inference_endpoint.endpoint_client.http_client - INFO - EndpointClient initialized with num_workers=4, endpoints=['http://localhost:9000/v1/chat/completions'], adapter=OpenAIMsgspecAdapter, accumulator=OpenAISSEAccumulator, transport=zmq
2026-04-16 14:46:46,890 - inference_endpoint.commands.benchmark.execute - INFO - Running...
2026-04-16 14:46:47,550 - inference_endpoint.load_generator.session - INFO - All performance samples issued
2026-04-16 14:46:48,121 - inference_endpoint.load_generator.session - INFO - All accuracy samples issued
meta-llama/Llama-2-7b-chat-hf (Streaming: False): 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 49152/49152 [17:40<00:00, 46.37it/s]
----------------- Summary -----------------
Version: 0.1.0
Git SHA: 8c0c63d
Test started at: (timestamp_ns):4653411357432119, approx. wall-clock time: (2026-04-16 14:46:46)
Total samples issued: 24576
Total samples completed: 24576
Total samples failed: 0
Duration: 654.46 seconds
QPS: 37.55
TPS: 11089.98
----------------- End of Summary -----------------
2026-04-16 15:04:25,890 - inference_endpoint.load_generator.session - INFO - Report saved to results/llama2_70b_orca_benchmark_mlperf_parq/report.txt
2026-04-16 15:04:25,910 - inference_endpoint.commands.benchmark.execute - INFO - Cleaning up...
meta-llama/Llama-2-7b-chat-hf (Streaming: False): 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 49152/49152 [17:41<00:00, 46.30it/s]
2026-04-16 15:04:25,912 - inference_endpoint.endpoint_client.http_client - INFO - [bfdeb7ec] Shutting down...
2026-04-16 15:04:26,424 - inference_endpoint.endpoint_client.http_client - INFO - [bfdeb7ec] Shutdown complete.
Traceback (most recent call last):
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/inference_endpoint/main.py", line 128, in run
    app.meta()
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/cyclopts/core.py", line 1889, in __call__
    result = _run_maybe_async_command(command, bound, resolved_backend)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/cyclopts/_run.py", line 50, in _run_maybe_async_command
    return command(*bound.args, **bound.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/inference_endpoint/main.py", line 73, in launcher
    app(tokens)
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/cyclopts/core.py", line 1889, in __call__
    result = _run_maybe_async_command(command, bound, resolved_backend)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/cyclopts/_run.py", line 50, in _run_maybe_async_command
    return command(*bound.args, **bound.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/inference_endpoint/commands/benchmark/cli.py", line 112, in from_config
    _run(resolved, [], test_mode)
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/inference_endpoint/commands/benchmark/cli.py", line 54, in _run
    run_benchmark(config, mode)
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/inference_endpoint/commands/benchmark/execute.py", line 481, in run_benchmark
    finalize_benchmark(ctx, report, collector)
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/inference_endpoint/commands/benchmark/execute.py", line 405, in finalize_benchmark
    scorer_instance = eval_cfg.scorer(
                      ^^^^^^^^^^^^^^^^
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/inference_endpoint/evaluation/scoring.py", line 228, in __init__
    super().__init__(*args, **kwargs)
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/inference_endpoint/evaluation/scoring.py", line 112, in __init__
    self.sample_index_map = self._load_sample_index_map()
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhusooraj/endpoints/endp/lib/python3.12/site-packages/inference_endpoint/evaluation/scoring.py", line 123, in _load_sample_index_map
    return d[self.dataset_name] # Implicitly raises KeyError
           ~^^^^^^^^^^^^^^^^^^^
KeyError: 'Dataset-openorca'
```
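One possible read-side mitigation (a sketch only; the real method is `_load_sample_index_map` in `inference_endpoint/evaluation/scoring.py`, and this standalone function merely illustrates the idea, not the actual fix) would be to prefer the configured name but fall back to the generic "Dataset" key that the writer currently emits for non-predefined names:

```python
def load_sample_index_map(sample_idx_map: dict, dataset_name: str):
    """Tolerant lookup sketch (NOT the actual scoring.py method).

    Prefer the configured dataset name, fall back to the generic "Dataset"
    key, and otherwise fail with a message listing the available keys
    instead of a bare KeyError.
    """
    if dataset_name in sample_idx_map:
        return sample_idx_map[dataset_name]
    if "Dataset" in sample_idx_map:
        return sample_idx_map["Dataset"]
    raise KeyError(
        f"{dataset_name!r} not found in sample_idx_map.json "
        f"(available keys: {sorted(sample_idx_map)})"
    )


# With the map as it is currently written for a custom dataset name,
# the lookup would now succeed:
print(load_sample_index_map({"Dataset": [0, 1, 2]}, "Dataset-openorca"))
```

Alternatively (and probably more robustly), the write side could be changed to key the map by the configured dataset name so the two stages always agree.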