# Dataset Generation Tutorial

This notebook demonstrates how to generate quantum circuit datasets using the QuantumDiffusion framework.

## 1. Setup and Imports

In [None]:
import sys
sys.path.append('../src')

from quantum_diffusion.data import DatasetGenerator, PRESET_CONFIGS
from quantum_diffusion.utils import setup_logging, Logger

# Set up logging
setup_logging(log_level="INFO")
logger = Logger(__name__)

## 2. Generate a Simple Dataset

Let's start by generating a small dataset of Clifford circuits:

In [None]:
# Initialize the dataset generator
generator = DatasetGenerator()

# Define dataset parameters
config = {
    "gate_set": ['h', 'cx', 'cz', 's', 'x', 'y', 'z'],  # Clifford gates
    "num_qubits": 3,
    "num_samples": 100,  # Small dataset for demo
    "min_gates": 2,
    "max_gates": 10,
    "condition_type": "UNITARY",
    "output_path": "./demo_dataset"
}

# Generate the dataset
result = generator.generate_dataset(**config)
print("Dataset generated successfully!")
print(f"Configuration: {result}")
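
To make the roles of `gate_set`, `num_qubits`, `min_gates`, and `max_gates` concrete, here is a minimal standalone sketch of random circuit sampling. It does not use the framework and is not how `DatasetGenerator` is implemented. The `sample_circuit` helper, the tuple-based circuit representation, and the `TWO_QUBIT_GATES` table are all assumptions made for illustration.

```python
import random

# Hypothetical arity table; the framework's own gate definitions may differ.
TWO_QUBIT_GATES = {"cx", "cz"}


def sample_circuit(gate_set, num_qubits, min_gates, max_gates, rng):
    """Return a random circuit as a list of (gate_name, qubit_indices) tuples."""
    length = rng.randint(min_gates, max_gates)  # inclusive on both ends
    circuit = []
    for _ in range(length):
        gate = rng.choice(gate_set)
        arity = 2 if gate in TWO_QUBIT_GATES else 1
        # Pick distinct qubits so a two-qubit gate never acts on one qubit twice.
        qubits = tuple(rng.sample(range(num_qubits), arity))
        circuit.append((gate, qubits))
    return circuit


rng = random.Random(0)  # fixed seed so the sketch is reproducible
circuit = sample_circuit(['h', 'cx', 'cz', 's', 'x', 'y', 'z'], 3, 2, 10, rng)
print(f"Sampled {len(circuit)} gates: {circuit}")
```

Under these assumptions, every sampled circuit has between `min_gates` and `max_gates` gates, each drawn uniformly from `gate_set`.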

## 3. Using Preset Configurations

The framework provides several preset configurations for common use cases:

In [None]:
# Show available presets
print("Available preset configurations:")
# Use `preset` as the loop variable so the `config` dict from Section 2 is not overwritten
for preset_name, preset in PRESET_CONFIGS.items():
    print(f"\n{preset_name}:")
    for key, value in preset.items():
        print(f"  {key}: {value}")

In [None]:
# Use a preset configuration
preset_config = PRESET_CONFIGS["clifford_3q_unitary"].copy()
preset_config["num_samples"] = 50  # Reduce for demo
preset_config["output_path"] = "./preset_dataset"

result = generator.generate_dataset(**preset_config)
print("Preset dataset generated successfully!")
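
One caveat when customizing a preset: `dict.copy()` is a shallow copy. Reassigning a key, as done above, leaves the original untouched, but mutating a nested value in place (for example, appending to a `gate_set` list) also changes the original. The sketch below uses a hypothetical `preset` dictionary shaped like the configurations in this tutorial. The actual contents of `PRESET_CONFIGS` may differ.

```python
import copy

# Hypothetical preset, shaped like the configuration dictionaries in this tutorial.
preset = {"gate_set": ["h", "cx"], "num_samples": 1000}

shallow = preset.copy()
shallow["num_samples"] = 50        # rebinding a key: the original is unaffected
shallow["gate_set"].append("rz")   # in-place mutation: the original list changes too

deep = copy.deepcopy(preset)
deep["gate_set"].append("ry")      # fully independent of the original

print(preset)
print(deep)
```

If you plan to edit nested values such as the gate list, `copy.deepcopy` is the safer choice.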

## 4. Generate Multiple Datasets

You can generate multiple datasets with different configurations:

In [None]:
# Define multiple configurations
configs = [
    {
        "gate_set": ['h', 'cx'],  # Simple 2-gate set
        "num_qubits": 2,
        "num_samples": 50,
        "condition_type": "SRV",
        "output_path": "./dataset_2q_simple"
    },
    {
        "gate_set": ['h', 'cx', 'rz', 'ry'],  # Universal gate set
        "num_qubits": 3,
        "num_samples": 50,
        "condition_type": "UNITARY",
        "output_path": "./dataset_3q_universal"
    }
]

# Generate all datasets
results = generator.generate_multiple_datasets(configs)
print(f"Generated {len(results)} datasets successfully!")
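
The notebook does not define the `"SRV"` condition type. Assuming it stands for the Schmidt rank vector (an entanglement label that records, for each qubit, the rank of the state across the split between that qubit and the rest), the following NumPy sketch computes it for two small states. The `schmidt_rank_vector` helper is hypothetical and independent of the framework.

```python
import numpy as np


def schmidt_rank_vector(state, num_qubits, tol=1e-9):
    """Rank of each single-qubit-vs-rest bipartition of a pure state."""
    psi = np.asarray(state, dtype=complex).reshape([2] * num_qubits)
    ranks = []
    for q in range(num_qubits):
        # Move qubit q to the front and flatten the remaining qubits into columns.
        mat = np.moveaxis(psi, q, 0).reshape(2, -1)
        ranks.append(int(np.linalg.matrix_rank(mat, tol=tol)))
    return ranks


bell = np.array([1, 0, 0, 1]) / np.sqrt(2)  # state after h on qubit 0, then cx
product = np.array([1, 0, 0, 0])            # |00>, no entanglement

srv_bell = schmidt_rank_vector(bell, 2)
srv_product = schmidt_rank_vector(product, 2)
print(srv_bell, srv_product)
```

A rank of 1 means the qubit is unentangled with the rest, and a rank of 2 means it is entangled, so the `['h', 'cx']` gate set is already enough to produce both kinds of label on two qubits.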

## 5. Loading and Inspecting Datasets

After generating datasets, you can load and inspect them:

In [None]:
from quantum_diffusion.data import DatasetLoader

# Initialize dataset loader
loader = DatasetLoader()

# Load a dataset
dataset = loader.load_dataset("./demo_dataset")
print(f"Loaded dataset with {len(dataset)} samples")

# Inspect dataset metadata
metadata = loader.inspect_dataset("./demo_dataset")
print("\nDataset metadata:")
for key, value in metadata.items():
    if key != 'config':  # Skip the detailed config for brevity
        print(f"  {key}: {value}")

## 6. Creating Data Loaders for Training

Convert datasets to PyTorch data loaders for training:

In [None]:
# Create data loaders
dataloaders = loader.get_dataloaders(
    dataset,
    batch_size=16,
    split_ratio=0.2  # 20% for validation
)

print("Data loaders created successfully!")
print(f"Training batches: {len(dataloaders['train'])}")
print(f"Validation batches: {len(dataloaders['valid'])}")
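
As a sanity check on the printed batch counts, this short sketch works out the arithmetic by hand. It assumes the split rounds the validation size down and that the final partial batch is kept rather than dropped. Neither behavior is confirmed for `get_dataloaders`, and `expected_batches` is a hypothetical helper.

```python
import math


def expected_batches(num_samples, split_ratio, batch_size):
    """Batch counts for a train/validation split, keeping the last partial batch."""
    num_valid = int(num_samples * split_ratio)  # assumed: validation size rounds down
    num_train = num_samples - num_valid
    return {
        "train": math.ceil(num_train / batch_size),
        "valid": math.ceil(num_valid / batch_size),
    }


counts = expected_batches(num_samples=100, split_ratio=0.2, batch_size=16)
print(counts)
```

For the 100-sample demo dataset this gives 80 training and 20 validation samples, so 5 training batches and 2 validation batches at `batch_size=16`. Different numbers from the cell above would point to a different splitting or batching rule.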

## 7. Advanced Dataset Generation

For more complex scenarios, you can combine a larger gate set (the Clifford gates plus the rotation gates `rx`, `ry`, and `rz`) with more qubits and longer circuits:

In [None]:
# Larger gate set, more qubits, and longer circuits than the earlier examples
advanced_config = {
    "gate_set": ['h', 'cx', 'cz', 's', 'x', 'y', 'z', 'rx', 'ry', 'rz'],
    "num_qubits": 4,
    "num_samples": 200,
    "min_gates": 5,
    "max_gates": 25,
    "condition_type": "UNITARY",
    "output_path": "./advanced_dataset"
}

logger.info("Generating advanced dataset...")
result = generator.generate_dataset(**advanced_config)
print("Advanced dataset generated!")

# Show generation statistics
metadata = loader.inspect_dataset("./advanced_dataset")
print(f"\nGenerated {metadata['num_samples']} circuits")
print(f"Gate set: {advanced_config['gate_set']}")
print(f"Qubit count: {advanced_config['num_qubits']}")
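
Larger configurations take longer to generate, so it can help to catch obvious mistakes before calling the generator. The sketch below is a hypothetical pre-flight check and not part of the framework. The `validate_config` helper and its rules are assumptions based only on the configuration keys used in this notebook.

```python
def validate_config(config):
    """Return a list of problems found in a dataset configuration (empty if none)."""
    problems = []
    gate_set = config.get("gate_set") or []
    num_qubits = config.get("num_qubits", 0)

    if not gate_set:
        problems.append("gate_set must not be empty")
    if num_qubits < 1:
        problems.append("num_qubits must be at least 1")
    if config.get("num_samples", 0) < 1:
        problems.append("num_samples must be at least 1")

    min_gates, max_gates = config.get("min_gates"), config.get("max_gates")
    if min_gates is not None and max_gates is not None and min_gates > max_gates:
        problems.append("min_gates must not exceed max_gates")

    # Two-qubit gates (assumed here to be 'cx' and 'cz') need at least two qubits.
    if num_qubits < 2 and {"cx", "cz"} & set(gate_set):
        problems.append("two-qubit gates require num_qubits >= 2")
    return problems


good = {"gate_set": ["h", "cx"], "num_qubits": 4, "num_samples": 200,
        "min_gates": 5, "max_gates": 25}
bad = {"gate_set": ["h", "cx"], "num_qubits": 1, "num_samples": 200,
       "min_gates": 30, "max_gates": 25}

print(validate_config(good))
print(validate_config(bad))
```

An empty list means the configuration passed these basic checks. The framework may enforce its own, stricter rules.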

## Summary

This notebook showed how to:

1. Generate simple quantum circuit datasets
2. Use preset configurations for common scenarios
3. Generate multiple datasets with different parameters
4. Load and inspect existing datasets
5. Create data loaders for training
6. Configure advanced dataset generation options

The generated datasets can now be used for training diffusion models in the next tutorial!