# Comparison of dataset generation parameters
This notebook compares two different backbones, qiskit and qudit-kit, for dataset generation. Moreover, it evaluates the impact of (not) optimizing circuits using `qiskit.transpile` during generation.

Comparison is done regarding
1. Time for generation and
2. Number of unqiue circuits vs generated circuits.

The notebook was run on a local machine without GPU.

## Results

| Backbone | Optimized | Num Samples | Time | Percentage of unique samples
| --- | --- | --- | --- | --- |
| Qiskit | True | 10k | 50.17s | 98.34%
| Qiskit | True | 100k | 511.54s | 94.47%
| Qiskit | False | 10k | 32.10s | 99.95%
| Qiskit | False | 100k | 710.96s | 99.75%
| --- | --- | --- | --- | --- |
| Qudit-Kit | True | 10k | 130.67s | 99.93%
| Qudit-Kit | True | 100k | 1182.70s | 99.74%
| Qudit-Kit | False | 10k | 31.94s | 99.98%
| Qudit-Kit | False | 100k | 298.21s | 99.75%

In [1]:
import sys
sys.path.append("..")

import time
import logging
logging.getLogger("stevedore").setLevel("WARNING")

from scripts.generate_dataset import main

## Import default configs

- gate_set: H, CX
- num_qubits: 5
- num_samples: 10000
- min_gates: 4
- max_gates: 20
- backbone: qiskit
- condition_type: SRV
- optimized: True
- output_path: ./datasets/srv_dataset
- device: auto

In [2]:
from hydra import initialize, compose

with initialize(version_base=None, config_path="../conf"):
    cfg = compose(config_name="config.yaml")

## Backbone: Qiskit

### Qiskit with optimization, 10.000 samples

#### Results
- Time: 50.17 seconds
- generated qubits: 9978
- unique qubits: 9812 (98.34%)

In [3]:
cfg.datasets.backbone = "qiskit"
cfg.datasets.optimized = True
cfg.datasets.num_samples = 10000
cfg.datasets.output_path = "./tests/datasets/qiskit_optimized_10k"

main(cfg)

2025-12-07 21:44:28 - scripts.generate_dataset - INFO - Dataset configuration: {'gate_set': ['h', 'cx'], 'num_qubits': 5, 'num_samples': 10000, 'min_gates': 4, 'max_gates': 20, 'backbone': 'qiskit', 'condition_type': 'SRV', 'optimized': True, 'output_path': './tests/datasets/qiskit_optimized_10k', 'device': 'auto'}
2025-12-07 21:44:28 - quantum_diffusion.data.dataset - INFO - Generating dataset with 10000 samples, 5 qubits
2025-12-07 21:44:28 - quantum_diffusion.data.dataset - INFO - Starting circuit generation for SRV...
[INFO]: Generated 9973 valid circuits.
[INFO]: After filtering unique circuits: 9802.
[INFO]: Saving tensor to `tests/datasets/qiskit_optimized_10k/dataset/ds_x.safetensors`.
[INFO]: Saving tensor to `tests/datasets/qiskit_optimized_10k/dataset/ds_y.safetensors`.
2025-12-07 21:46:32 - quantum_diffusion.data.dataset - INFO - SRV dataset saved to tests/datasets/qiskit_optimized_10k
2025-12-07 21:46:32 - scripts.generate_dataset - INFO - Dataset generation completed succ

### Qiskit with optimization, 100.000 samples

#### Results
- Time: 511.54 seconds
- generated qubits: 99846
- unique qubits: 94328 (94.47%)

In [4]:
cfg.datasets.backbone = "qiskit"
cfg.datasets.optimized = True
cfg.datasets.num_samples: 100000
cfg.datasets.output_path = "./tests/datasets/qiskit_optimized_100k"

main(cfg)

2025-12-07 21:46:32 - scripts.generate_dataset - INFO - Dataset configuration: {'gate_set': ['h', 'cx'], 'num_qubits': 5, 'num_samples': 10000, 'min_gates': 4, 'max_gates': 20, 'backbone': 'qiskit', 'condition_type': 'SRV', 'optimized': True, 'output_path': './tests/datasets/qiskit_optimized_100k', 'device': 'auto'}
2025-12-07 21:46:32 - quantum_diffusion.data.dataset - INFO - Generating dataset with 10000 samples, 5 qubits
2025-12-07 21:46:32 - quantum_diffusion.data.dataset - INFO - Starting circuit generation for SRV...
[INFO]: Generated 9977 valid circuits.
[INFO]: After filtering unique circuits: 9763.
[INFO]: Saving tensor to `tests/datasets/qiskit_optimized_100k/dataset/ds_x.safetensors`.
[INFO]: Saving tensor to `tests/datasets/qiskit_optimized_100k/dataset/ds_y.safetensors`.
2025-12-07 21:48:07 - quantum_diffusion.data.dataset - INFO - SRV dataset saved to tests/datasets/qiskit_optimized_100k
2025-12-07 21:48:07 - scripts.generate_dataset - INFO - Dataset generation completed 

### Qiskit without optimization, 10.000 samples

#### Results
- Time: 32.10 seconds
- generated qubits: 9984
- unique qubits: 9979 (99.95%)

In [5]:
cfg.datasets.backbone = "qiskit"
cfg.datasets.optimized = False
cfg.datasets.num_samples = 10000
cfg.datasets.output_path = "./tests/datasets/qiskit_not_optimized_10k"

main(cfg)

2025-12-07 21:48:07 - scripts.generate_dataset - INFO - Dataset configuration: {'gate_set': ['h', 'cx'], 'num_qubits': 5, 'num_samples': 10000, 'min_gates': 4, 'max_gates': 20, 'backbone': 'qiskit', 'condition_type': 'SRV', 'optimized': False, 'output_path': './tests/datasets/qiskit_not_optimized_10k', 'device': 'auto'}
2025-12-07 21:48:07 - quantum_diffusion.data.dataset - INFO - Generating dataset with 10000 samples, 5 qubits
2025-12-07 21:48:07 - quantum_diffusion.data.dataset - INFO - Starting circuit generation for SRV...
[INFO]: Generated 9984 valid circuits.
[INFO]: After filtering unique circuits: 9982.
[INFO]: Saving tensor to `tests/datasets/qiskit_not_optimized_10k/dataset/ds_x.safetensors`.
[INFO]: Saving tensor to `tests/datasets/qiskit_not_optimized_10k/dataset/ds_y.safetensors`.
2025-12-07 21:49:10 - quantum_diffusion.data.dataset - INFO - SRV dataset saved to tests/datasets/qiskit_not_optimized_10k
2025-12-07 21:49:10 - scripts.generate_dataset - INFO - Dataset generati

### Qiskit without optimization, 100.000 samples

#### Results
- Time: 710.96 seconds
- generated qubits: 99968
- unique qubits: 99715 (99.75%)

In [6]:
cfg.datasets.backbone = "qiskit"
cfg.datasets.optimized = False
cfg.datasets.num_samples = 100000
cfg.datasets.output_path = "./tests/datasets/qiskit_not_optimized_100k"

main(cfg)

2025-12-07 21:49:10 - scripts.generate_dataset - INFO - Dataset configuration: {'gate_set': ['h', 'cx'], 'num_qubits': 5, 'num_samples': 100000, 'min_gates': 4, 'max_gates': 20, 'backbone': 'qiskit', 'condition_type': 'SRV', 'optimized': False, 'output_path': './tests/datasets/qiskit_not_optimized_100k', 'device': 'auto'}
2025-12-07 21:49:10 - quantum_diffusion.data.dataset - INFO - Generating dataset with 100000 samples, 5 qubits
2025-12-07 21:49:10 - quantum_diffusion.data.dataset - INFO - Starting circuit generation for SRV...
[INFO]: Generated 99968 valid circuits.
[INFO]: After filtering unique circuits: 99713.
[INFO]: Saving tensor to `tests/datasets/qiskit_not_optimized_100k/dataset/ds_x.safetensors`.
[INFO]: Saving tensor to `tests/datasets/qiskit_not_optimized_100k/dataset/ds_y.safetensors`.
2025-12-07 22:00:09 - quantum_diffusion.data.dataset - INFO - SRV dataset saved to tests/datasets/qiskit_not_optimized_100k
2025-12-07 22:00:09 - scripts.generate_dataset - INFO - Dataset 

## Backbone: Quditkit

### Quditkit with optimization, 10.000 samples

#### Results
- Time: 130.67 seconds
- generated qubits: 9984
- unique qubits: 9977 (99.93%)

In [7]:
cfg.datasets.backbone = "quditkit"
cfg.datasets.optimized = True
cfg.datasets.num_samples = 10000
cfg.datasets.output_path = "./tests/datasets/quditkit_optimized_10k"

main(cfg)

2025-12-07 22:00:09 - scripts.generate_dataset - INFO - Dataset configuration: {'gate_set': ['h', 'cx'], 'num_qubits': 5, 'num_samples': 10000, 'min_gates': 4, 'max_gates': 20, 'backbone': 'quditkit', 'condition_type': 'SRV', 'optimized': True, 'output_path': './tests/datasets/quditkit_optimized_10k', 'device': 'auto'}
2025-12-07 22:00:09 - quantum_diffusion.data.dataset - INFO - Generating dataset with 10000 samples, 5 qubits
2025-12-07 22:00:09 - quantum_diffusion.data.dataset - INFO - Starting circuit generation for SRV...
[INFO]: Generated 9984 valid circuits.
[INFO]: After filtering unique circuits: 9980.
[INFO]: Saving tensor to `tests/datasets/quditkit_optimized_10k/dataset/ds_x.safetensors`.
[INFO]: Saving tensor to `tests/datasets/quditkit_optimized_10k/dataset/ds_y.safetensors`.
2025-12-07 22:02:09 - quantum_diffusion.data.dataset - INFO - SRV dataset saved to tests/datasets/quditkit_optimized_10k
2025-12-07 22:02:09 - scripts.generate_dataset - INFO - Dataset generation comp

### Quditkit with optimization, 100.000 samples

#### Results
- Time: 1182.70 seconds
- generated qubits: 99967
- unique qubits: 99706 (99.74 %)

In [8]:
cfg.datasets.backbone = "quditkit"
cfg.datasets.optimized = True
cfg.datasets.num_samples = 100000
cfg.datasets.output_path = "./tests/datasets/quditkit_optimized_100k"

main(cfg)

2025-12-07 22:02:09 - scripts.generate_dataset - INFO - Dataset configuration: {'gate_set': ['h', 'cx'], 'num_qubits': 5, 'num_samples': 100000, 'min_gates': 4, 'max_gates': 20, 'backbone': 'quditkit', 'condition_type': 'SRV', 'optimized': True, 'output_path': './tests/datasets/quditkit_optimized_100k', 'device': 'auto'}
2025-12-07 22:02:09 - quantum_diffusion.data.dataset - INFO - Generating dataset with 100000 samples, 5 qubits
2025-12-07 22:02:09 - quantum_diffusion.data.dataset - INFO - Starting circuit generation for SRV...
[INFO]: Generated 99968 valid circuits.
[INFO]: After filtering unique circuits: 99732.
[INFO]: Saving tensor to `tests/datasets/quditkit_optimized_100k/dataset/ds_x.safetensors`.
[INFO]: Saving tensor to `tests/datasets/quditkit_optimized_100k/dataset/ds_y.safetensors`.
2025-12-07 22:19:25 - quantum_diffusion.data.dataset - INFO - SRV dataset saved to tests/datasets/quditkit_optimized_100k
2025-12-07 22:19:25 - scripts.generate_dataset - INFO - Dataset generat

### Quditkit without optimization, 10.000 samples

#### Results
- Time: 31.94 seconds
- generated qubits: 9983
- unique qubits: 9981 (99.98 %)

In [9]:
cfg.datasets.backbone = "quditkit"
cfg.datasets.optimized = False
cfg.datasets.num_samples = 10000
cfg.datasets.output_path = "./tests/datasets/quditkit_not_optimized_10k"

main(cfg)

2025-12-07 22:19:25 - scripts.generate_dataset - INFO - Dataset configuration: {'gate_set': ['h', 'cx'], 'num_qubits': 5, 'num_samples': 10000, 'min_gates': 4, 'max_gates': 20, 'backbone': 'quditkit', 'condition_type': 'SRV', 'optimized': False, 'output_path': './tests/datasets/quditkit_not_optimized_10k', 'device': 'auto'}
2025-12-07 22:19:25 - quantum_diffusion.data.dataset - INFO - Generating dataset with 10000 samples, 5 qubits
2025-12-07 22:19:25 - quantum_diffusion.data.dataset - INFO - Starting circuit generation for SRV...
[INFO]: Generated 9984 valid circuits.
[INFO]: After filtering unique circuits: 9981.
[INFO]: Saving tensor to `tests/datasets/quditkit_not_optimized_10k/dataset/ds_x.safetensors`.
[INFO]: Saving tensor to `tests/datasets/quditkit_not_optimized_10k/dataset/ds_y.safetensors`.
2025-12-07 22:20:09 - quantum_diffusion.data.dataset - INFO - SRV dataset saved to tests/datasets/quditkit_not_optimized_10k
2025-12-07 22:20:09 - scripts.generate_dataset - INFO - Datase

### Quditkit without optimization, 100.000 samples

#### Results
- Time: 298.21 seconds
- generated qubits: 99968 
- unique qubits: 99714 (99.75 %)

In [10]:
cfg.datasets.backbone = "quditkit"
cfg.datasets.optimized = False
cfg.datasets.num_samples = 100000
cfg.datasets.output_path = "./tests/datasets/quditkit_not_optimized_100k"

main(cfg)

2025-12-07 22:20:09 - scripts.generate_dataset - INFO - Dataset configuration: {'gate_set': ['h', 'cx'], 'num_qubits': 5, 'num_samples': 100000, 'min_gates': 4, 'max_gates': 20, 'backbone': 'quditkit', 'condition_type': 'SRV', 'optimized': False, 'output_path': './tests/datasets/quditkit_not_optimized_100k', 'device': 'auto'}
2025-12-07 22:20:09 - quantum_diffusion.data.dataset - INFO - Generating dataset with 100000 samples, 5 qubits
2025-12-07 22:20:09 - quantum_diffusion.data.dataset - INFO - Starting circuit generation for SRV...
[INFO]: Generated 99967 valid circuits.
[INFO]: After filtering unique circuits: 99706.
[INFO]: Saving tensor to `tests/datasets/quditkit_not_optimized_100k/dataset/ds_x.safetensors`.
[INFO]: Saving tensor to `tests/datasets/quditkit_not_optimized_100k/dataset/ds_y.safetensors`.
2025-12-07 22:26:05 - quantum_diffusion.data.dataset - INFO - SRV dataset saved to tests/datasets/quditkit_not_optimized_100k
2025-12-07 22:26:05 - scripts.generate_dataset - INFO 