# Quantum Synthetic Data Generation

This notebook demonstrates how to generate synthetic data using quantum computing techniques with Q#. We'll create realistic datasets that can be used for machine learning, testing, and privacy-preserving analytics.

## Key Features:
- **Quantum Superposition**: Generate probabilistic data distributions
- **Quantum Entanglement**: Create correlated features (e.g., age and income)
- **Quantum Walks**: Advanced categorical data generation
- **Variational Circuits**: Parameterized probability distributions
- **Quantum Noise**: Data augmentation and privacy enhancement
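
As a rough classical illustration of the superposition bullet, the sketch below encodes a categorical distribution in amplitudes (square roots of the probabilities) and samples it with the Born rule. The helper names `amplitudes_for` and `measure` are hypothetical, and the probabilities `[0.4, 0.3, 0.2, 0.1]` are borrowed from the classical baseline later in this notebook, not from the Q# generators.

```python
import math
import random
from collections import Counter


def amplitudes_for(probabilities):
    """Return real amplitudes whose squares equal the given probabilities."""
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return [math.sqrt(p) for p in probabilities]


def measure(amplitudes, rng):
    """Simulate one computational-basis measurement using the Born rule."""
    weights = [abs(a) ** 2 for a in amplitudes]
    return rng.choices(range(len(amplitudes)), weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
labels = ["A", "B", "C", "D"]
amps = amplitudes_for([0.4, 0.3, 0.2, 0.1])  # illustrative target only
shots = 10_000
counts = Counter(labels[measure(amps, rng)] for _ in range(shots))
print({label: counts[label] / shots for label in labels})
```

The printed frequencies should approach the target probabilities as the number of shots grows.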

In [None]:
import qsharp
from qsharp_widgets import Circuit, Histogram
import json
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

## Load the JSON Configuration

First, let's examine our synthetic data configuration that defines the structure and characteristics of the data we want to generate.

In [None]:
# Load the JSON configuration
with open('sample_data.json', 'r') as f:
    config = json.load(f)

print("Data Generation Configuration:")
print(json.dumps(config, indent=2))

## Basic Quantum Data Generation

Let's start with our basic quantum synthetic data generator that creates correlated features using quantum entanglement.

In [None]:
%%qsharp

// This will load our GeneratorEntryPoint.qs file
open GeneratorEntryPoint;

// Run the basic quantum data generator
operation RunBasicGenerator() : DataSample[] {
    return GenerateQuantumSyntheticData();
}

In [None]:
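# Generate basic synthetic data.
# qsharp.run() requires a shot count and returns one result per shot;
# qsharp.eval() evaluates the expression once and returns its value.
basic_samples = qsharp.eval("RunBasicGenerator()")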
print(f"Generated {len(basic_samples)} basic samples")

# Display first few samples
for i, sample in enumerate(basic_samples[:5]):
    print(f"Sample {i+1}: Age={sample['Age']}, Income=${sample['Income']:,}, Category={sample['Category']}, Flag={sample['BinaryFlag']}")
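
To see why entanglement yields correlated features, it may help to look at the simplest case: one Bell pair. The sketch below is a classical simulation, not the Q# generator itself; `bell_state` and `sample_bits` are hypothetical helper names. It samples measurement outcomes from the state (|00> + |11>)/sqrt(2).

```python
import math
import random


def bell_state():
    """Amplitudes of (|00> + |11>)/sqrt(2), indexed as 2*q0 + q1."""
    s = 1 / math.sqrt(2)
    return [s, 0.0, 0.0, s]


def sample_bits(state, shots, rng):
    """Sample (q0, q1) measurement outcomes from a 2-qubit state."""
    probs = [abs(a) ** 2 for a in state]
    outcomes = rng.choices(range(4), weights=probs, k=shots)
    return [(o >> 1, o & 1) for o in outcomes]


rng = random.Random(1)  # fixed seed for repeatability
pairs = sample_bits(bell_state(), 1000, rng)
agreement = sum(a == b for a, b in pairs) / len(pairs)
ones = sum(a for a, _ in pairs) / len(pairs)
print(f"bits agree in {agreement:.0%} of shots; first bit is 1 in {ones:.0%}")
```

Each bit on its own looks like a fair coin, yet the two bits always match. If one bit feeds the age encoding and the other the income encoding, the two features inherit that correlation.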

## Advanced Quantum Data Generation

Now let's use our advanced generator that includes quantum walks, variational circuits, and quantum fingerprinting.

In [None]:
%%qsharp

// Load the advanced generator
open QuantumDataGenerator;

// Run the advanced quantum data generator
operation RunAdvancedGenerator() : QuantumDataSample[] {
    return AdvancedQuantumDataGenerator();
}

In [None]:
# Generate advanced synthetic data.
# qsharp.eval() evaluates the expression once; qsharp.run() would need a shot count.
advanced_samples = qsharp.eval("RunAdvancedGenerator()")
print(f"Generated {len(advanced_samples)} advanced samples")

# Display first few samples with quantum fingerprints
for i, sample in enumerate(advanced_samples[:5]):
    fingerprint = ''.join(map(str, sample['QuantumFingerprint']))
    print(f"Sample {sample['Id']}: Age={sample['Age']}, Income=${sample['Income']:,}, "
          f"Category={sample['Category']}, Flag={sample['BinaryFlag']}, QF={fingerprint}")
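
The idea behind a variational (parameterized) circuit can be sketched with a single qubit: Ry(theta)|0> measures 1 with probability sin^2(theta/2), so theta controls a Bernoulli distribution. The example below is an assumption-laden illustration, not the contents of `QuantumDataGenerator`. It computes the angle for a target probability of 0.7 (the value used for `BinaryFlag` in the classical baseline later on) in closed form, then recovers it by gradient descent using the parameter-shift rule.

```python
import math


def prob_one(theta):
    """P(measure 1) for the state Ry(theta)|0>."""
    return math.sin(theta / 2) ** 2


def ry_angle_for(p_one):
    """Closed-form angle such that Ry(theta)|0> yields 1 with probability p_one."""
    if not 0.0 <= p_one <= 1.0:
        raise ValueError("p_one must lie in [0, 1]")
    return 2 * math.asin(math.sqrt(p_one))


def parameter_shift_gradient(theta):
    """Derivative of prob_one, estimated from two shifted circuit evaluations."""
    return (prob_one(theta + math.pi / 2) - prob_one(theta - math.pi / 2)) / 2


target = 0.7
theta_exact = ry_angle_for(target)

# Fit theta by minimizing the squared error (prob_one(theta) - target)^2.
theta_fit = 1.0        # arbitrary starting angle
learning_rate = 1.0    # hand-picked for this toy problem
for _ in range(200):
    error = prob_one(theta_fit) - target
    theta_fit -= learning_rate * 2 * error * parameter_shift_gradient(theta_fit)

print(f"closed form: {theta_exact:.4f} rad, fitted: {theta_fit:.4f} rad")
```

On hardware the gradient would be estimated from sampled measurements rather than exact probabilities, so the fit would be noisier than in this sketch.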

## Visualize the Quantum Circuit

Let's examine the quantum circuit used for generating entangled age and income features.

In [None]:
# Visualize the quantum circuit for entangled feature generation
circuit_qsharp = """
operation VisualizeEntanglementCircuit() : Unit {
    use qubits = Qubit[8];
    
    // Create a Bell pair (qubits 0 and 4) for correlation
    H(qubits[0]);
    CNOT(qubits[0], qubits[4]);
    
    // Age generation circuit
    H(qubits[1]);
    H(qubits[2]);
    H(qubits[3]);
    
    // Income generation circuit (entangled with age)
    H(qubits[5]);
    H(qubits[6]);
    H(qubits[7]);
    
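    // Add cross-correlations
    // (Note: control and target are both in |+> here, so these CNOTs leave
    // the state unchanged; they are drawn for illustration only.)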
    CNOT(qubits[1], qubits[5]);
    CNOT(qubits[2], qubits[6]);
    
    ResetAll(qubits);
}
"""

# Define the operation in the Q# interpreter, then generate and display its circuit
qsharp.eval(circuit_qsharp)
circuit = qsharp.circuit("VisualizeEntanglementCircuit()")
Circuit(circuit)
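
One detail of the circuit above is worth checking numerically. A CNOT entangles two qubits when only the control is in superposition (qubits 0 and 4), but it leaves the state unchanged when both control and target are in |+> (qubits 1 and 5, or 2 and 6). The two-qubit statevector sketch below, with hypothetical helpers `apply_h` and `apply_cnot`, illustrates both cases.

```python
import math


def apply_h(state, qubit):
    """Apply a Hadamard to `qubit` (0 = most significant) of a 2-qubit state."""
    s = 1 / math.sqrt(2)
    mask = 1 << (1 - qubit)
    new = [0.0] * 4
    for i, amp in enumerate(state):
        low, high = i & ~mask, i | mask
        if i & mask:   # bit is 1: H|1> = (|0> - |1>)/sqrt(2)
            new[low] += s * amp
            new[high] -= s * amp
        else:          # bit is 0: H|0> = (|0> + |1>)/sqrt(2)
            new[low] += s * amp
            new[high] += s * amp
    return new


def apply_cnot(state, control, target):
    """Flip `target` on every basis state where `control` is 1."""
    cmask, tmask = 1 << (1 - control), 1 << (1 - target)
    new = [0.0] * 4
    for i, amp in enumerate(state):
        new[i ^ tmask if i & cmask else i] += amp
    return new


def probabilities(state):
    """Measurement probabilities for |00>, |01>, |10>, |11>."""
    return [round(abs(a) ** 2, 6) for a in state]


zero = [1.0, 0.0, 0.0, 0.0]
bell = apply_cnot(apply_h(zero, 0), 0, 1)      # H on the control only
plus_plus = apply_h(apply_h(zero, 0), 1)       # H on both qubits
after_cnot = apply_cnot(plus_plus, 0, 1)

print("Bell pair:         ", probabilities(bell))
print("|+>|+> before CNOT:", probabilities(plus_plus))
print("|+>|+> after CNOT: ", probabilities(after_cnot))
```

In the first case only the outcomes 00 and 11 have nonzero probability; in the second, all four outcomes stay equally likely, so the extra CNOTs add no measurable correlation.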

## Data Analysis and Visualization

Let's analyze the generated synthetic data with summary statistics and plots that compare the output of the two generators.

In [None]:
# Convert quantum samples to pandas DataFrame for analysis
df_basic = pd.DataFrame(basic_samples)
df_advanced = pd.DataFrame(advanced_samples)

print("Basic Generator Statistics:")
print(df_basic.describe())
print("\nAdvanced Generator Statistics:")
print(df_advanced.describe())

In [None]:
# Create visualization plots
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
fig.suptitle('Quantum Synthetic Data Analysis', fontsize=16)

# Age distribution
axes[0,0].hist(df_basic['Age'], bins=15, alpha=0.7, label='Basic', color='blue')
axes[0,0].hist(df_advanced['Age'], bins=15, alpha=0.7, label='Advanced', color='red')
axes[0,0].set_title('Age Distribution')
axes[0,0].set_xlabel('Age')
axes[0,0].set_ylabel('Frequency')
axes[0,0].legend()

# Income distribution
axes[0,1].hist(df_basic['Income'], bins=15, alpha=0.7, label='Basic', color='blue')
axes[0,1].hist(df_advanced['Income'], bins=15, alpha=0.7, label='Advanced', color='red')
axes[0,1].set_title('Income Distribution')
axes[0,1].set_xlabel('Income ($)')
axes[0,1].set_ylabel('Frequency')
axes[0,1].legend()

# Category distribution
category_counts_basic = df_basic['Category'].value_counts()
category_counts_advanced = df_advanced['Category'].value_counts()
categories = ['A', 'B', 'C', 'D']
x = np.arange(len(categories))
width = 0.35

axes[0,2].bar(x - width/2, [category_counts_basic.get(cat, 0) for cat in categories], 
             width, label='Basic', color='blue', alpha=0.7)
axes[0,2].bar(x + width/2, [category_counts_advanced.get(cat, 0) for cat in categories], 
             width, label='Advanced', color='red', alpha=0.7)
axes[0,2].set_title('Category Distribution')
axes[0,2].set_xlabel('Category')
axes[0,2].set_ylabel('Count')
axes[0,2].set_xticks(x)
axes[0,2].set_xticklabels(categories)
axes[0,2].legend()

# Age vs Income correlation
axes[1,0].scatter(df_basic['Age'], df_basic['Income'], alpha=0.7, label='Basic', color='blue')
axes[1,0].scatter(df_advanced['Age'], df_advanced['Income'], alpha=0.7, label='Advanced', color='red')
axes[1,0].set_title('Age vs Income Correlation')
axes[1,0].set_xlabel('Age')
axes[1,0].set_ylabel('Income ($)')
axes[1,0].legend()

# Binary flag distribution
flag_counts_basic = df_basic['BinaryFlag'].value_counts()
flag_counts_advanced = df_advanced['BinaryFlag'].value_counts()
flags = [True, False]
x_flags = np.arange(len(flags))

axes[1,1].bar(x_flags - width/2, [flag_counts_basic.get(flag, 0) for flag in flags], 
             width, label='Basic', color='blue', alpha=0.7)
axes[1,1].bar(x_flags + width/2, [flag_counts_advanced.get(flag, 0) for flag in flags], 
             width, label='Advanced', color='red', alpha=0.7)
axes[1,1].set_title('Binary Flag Distribution')
axes[1,1].set_xlabel('Flag Value')
axes[1,1].set_ylabel('Count')
axes[1,1].set_xticks(x_flags)
axes[1,1].set_xticklabels(['True', 'False'])
axes[1,1].legend()

# Quantum fingerprint visualization (for advanced samples)
if len(df_advanced) > 0:
    fingerprints = np.array([sample['QuantumFingerprint'] for sample in advanced_samples])
    axes[1,2].imshow(fingerprints.T, cmap='Blues', aspect='auto')
    axes[1,2].set_title('Quantum Fingerprints')
    axes[1,2].set_xlabel('Sample ID')
    axes[1,2].set_ylabel('Fingerprint Bit')
else:
    axes[1,2].text(0.5, 0.5, 'No Advanced Samples', ha='center', va='center')
    axes[1,2].set_title('Quantum Fingerprints')

plt.tight_layout()
plt.show()
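
The plots show the distributions but do not quantify how close they are to a target. One simple option is the total variation distance between observed category frequencies and target probabilities. The sketch below is a minimal version; the function name is hypothetical, and the target probabilities are an assumption that would normally come from the configuration file.

```python
from collections import Counter


def total_variation_distance(observed_labels, target_probs):
    """Half the L1 distance between observed frequencies and target probabilities.

    Returns 0.0 for a perfect match and 1.0 for completely disjoint distributions.
    """
    n = len(observed_labels)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(observed_labels)
    labels = set(target_probs) | set(counts)
    return 0.5 * sum(
        abs(counts.get(label, 0) / n - target_probs.get(label, 0.0))
        for label in labels
    )


# Assumed target distribution, for illustration only
target = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}

matching = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"] * 1
skewed = ["A"] * 10

tvd_matching = total_variation_distance(matching, target)
tvd_skewed = total_variation_distance(skewed, target)
print(f"matching sample: {tvd_matching:.2f}, skewed sample: {tvd_skewed:.2f}")
```

With the DataFrames above, a call such as `total_variation_distance(df_advanced['Category'].tolist(), target)` would give one number per generator to compare.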

## Quantum vs Classical Comparison

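Let's compare our quantum-generated data with a classical random baseline. Note that the baseline below samples each feature independently, so its Age-Income correlation is close to zero by construction.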

In [None]:
# Generate classical random data for comparison
np.random.seed(42)
n_classical = len(df_advanced)

classical_data = {
    'Age': np.random.normal(40, 15, n_classical).astype(int),
    'Income': np.random.exponential(30000, n_classical).astype(int) + 20000,
    'Category': np.random.choice(['A', 'B', 'C', 'D'], n_classical, p=[0.4, 0.3, 0.2, 0.1]),
    'BinaryFlag': np.random.choice([True, False], n_classical, p=[0.7, 0.3])
}

# Clamp values to realistic ranges
classical_data['Age'] = np.clip(classical_data['Age'], 18, 80)
classical_data['Income'] = np.clip(classical_data['Income'], 20000, 150000)

df_classical = pd.DataFrame(classical_data)

print("Correlation Analysis:")
print(f"Quantum Age-Income Correlation: {df_advanced['Age'].corr(df_advanced['Income']):.3f}")
print(f"Classical Age-Income Correlation: {df_classical['Age'].corr(df_classical['Income']):.3f}")

# Visualize the comparison
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].scatter(df_advanced['Age'], df_advanced['Income'], alpha=0.7, color='red', label='Quantum')
axes[0].set_title('Quantum Generated: Age vs Income')
axes[0].set_xlabel('Age')
axes[0].set_ylabel('Income ($)')

axes[1].scatter(df_classical['Age'], df_classical['Income'], alpha=0.7, color='blue', label='Classical')
axes[1].set_title('Classical Random: Age vs Income')
axes[1].set_xlabel('Age')
axes[1].set_ylabel('Income ($)')

plt.tight_layout()
plt.show()
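
Because the classical baseline above draws Age and Income independently, it cannot show correlation. For a fairer comparison, a classical generator can also produce correlated features. The sketch below mixes two Gaussian draws with a chosen correlation `rho`. The age mean, age spread, and clipping ranges follow the baseline cell, while the Gaussian income model (mean 60,000 and spread 25,000) is an assumption made purely for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_baseline(n, rho, rng):
    """Draw (age, income) pairs whose underlying Gaussians have correlation rho."""
    rows = []
    for _ in range(n):
        z_age = rng.gauss(0, 1)
        # Mix in an independent draw so that corr(z_age, z_income) == rho
        z_income = rho * z_age + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        age = min(80, max(18, round(40 + 15 * z_age)))
        income = min(150_000, max(20_000, round(60_000 + 25_000 * z_income)))
        rows.append((age, income))
    return rows


rng = random.Random(42)
correlated = correlated_baseline(2000, 0.8, rng)
independent = correlated_baseline(2000, 0.0, rng)

corr_high = pearson(*zip(*correlated))
corr_zero = pearson(*zip(*independent))
print(f"rho=0.8 baseline: {corr_high:.3f}, rho=0.0 baseline: {corr_zero:.3f}")
```

Comparing the quantum samples against both baselines helps separate the correlation introduced by the circuit from what ordinary sampling can already reproduce.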

## Export Generated Data

Finally, let's export our quantum-generated synthetic data for use in machine learning or other applications.

In [None]:
# Export to CSV files
df_basic.to_csv('quantum_basic_data.csv', index=False)
df_advanced.to_csv('quantum_advanced_data.csv', index=False)
df_classical.to_csv('classical_random_data.csv', index=False)

print("Data exported successfully!")
print(f"- Basic quantum data: {len(df_basic)} samples")
print(f"- Advanced quantum data: {len(df_advanced)} samples")
print(f"- Classical random data: {len(df_classical)} samples")

# Create a summary report
summary = {
    "generation_method": "quantum_superposition_entanglement",
    "total_samples": len(df_advanced),
    "features": {
        "age": {
            "type": "numeric",
            "range": [int(df_advanced['Age'].min()), int(df_advanced['Age'].max())],
            "mean": float(df_advanced['Age'].mean()),
            "std": float(df_advanced['Age'].std())
        },
        "income": {
            "type": "numeric", 
            "range": [int(df_advanced['Income'].min()), int(df_advanced['Income'].max())],
            "mean": float(df_advanced['Income'].mean()),
            "std": float(df_advanced['Income'].std())
        },
        "category": {
            "type": "categorical",
            "values": df_advanced['Category'].value_counts().to_dict()
        },
        "binary_flag": {
            "type": "binary",
            "true_percentage": float(df_advanced['BinaryFlag'].mean() * 100)
        }
    },
    "quantum_properties": {
        "entanglement_used": True,
        "superposition_used": True,
        "quantum_walks_used": True,
        "variational_circuits_used": True,
        "quantum_fingerprints": True
    }
}

with open('quantum_data_summary.json', 'w') as f:
    json.dump(summary, f, indent=2)

print("\nGeneration Summary:")
print(json.dumps(summary, indent=2))
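
One caveat when reloading the exported CSV: a list-valued column such as `QuantumFingerprint` is written as its string representation, so it comes back as text rather than a list. The sketch below shows the round trip with the standard library, assuming the fingerprint is a list of integers as the display code earlier suggests. The sample rows are made up for illustration.

```python
import ast
import csv
import io

# Made-up rows standing in for exported samples
rows = [
    {"Id": 1, "QuantumFingerprint": [0, 1, 1, 0]},
    {"Id": 2, "QuantumFingerprint": [1, 1, 0, 0]},
]

# Write to an in-memory CSV; the list is stored as the text "[0, 1, 1, 0]"
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Id", "QuantumFingerprint"])
writer.writeheader()
writer.writerows(rows)

# Read it back and parse the fingerprint text into a real list again
buffer.seek(0)
restored = [
    {
        "Id": int(record["Id"]),
        "QuantumFingerprint": ast.literal_eval(record["QuantumFingerprint"]),
    }
    for record in csv.DictReader(buffer)
]
print(restored)
```

With pandas, the same parsing can be applied at load time via `pd.read_csv('quantum_advanced_data.csv', converters={'QuantumFingerprint': ast.literal_eval})`.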

## Conclusion

This quantum synthetic data generator illustrates several properties that distinguish it from simple independent classical sampling:

1. **Natural Correlations**: Quantum entanglement correlates features at generation time (here, the age-income dependency)
2. **Rich Distributions**: Quantum superposition and parameterized circuits encode complex probability distributions in measurement outcomes
3. **Authenticity**: Quantum fingerprints are intended to support data provenance and authenticity verification
4. **Privacy**: Quantum noise can enhance privacy preservation
5. **Scalability**: Quantum algorithms may generate large datasets efficiently, although this notebook does not benchmark that against classical generators

The generated data can be used for:
- Machine learning model training and testing
- Privacy-preserving analytics
- Database testing and development
- Research and academic studies

This approach opens new possibilities for synthetic data generation that leverages the unique properties of quantum computing.