
# QuDET Overview: The Complete Toolkit

**QuDET** (Quantum Data Engineering Toolkit) is a comprehensive library for building hybrid quantum-classical data pipelines.

This notebook provides a hands-on demonstration of the library's core components, with one section per module.

## Table of Contents
1.  [Connectors](#1-connectors)
2.  [Transforms](#2-transforms)
3.  [Encoders](#3-encoders)
4.  [Compute](#4-compute)
5.  [Analytics](#5-analytics)
6.  [Governance](#6-governance)


In [48]:

import numpy as np
import pandas as pd
import sys
import os

# Add parent directory to path to import qudet
sys.path.append(os.path.abspath('..'))

from sklearn.datasets import load_iris, make_regression, make_blobs
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, mean_squared_error

# --- QuDET Imports ---
# Connectors
from qudet.connectors import (
    QuantumDataLoader, QuantumParquetLoader, QuantumSQLLoader,
    StreamingDataBuffer, DataValidator
)

# Transforms
from qudet.transforms import (
    QuantumPCA, AutoReducer, RandomProjector, QuantumImputer,
    FeatureScaler, FeatureSelector, OutlierRemover, DataBalancer,
    QuantumNormalizer
)

# Encoders
from qudet.encoders import (
    AngleEncoder, AmplitudeEncoder, IQPEncoder, RotationEncoder,
    StatevectorEncoder, CompositeEncoder
)

# Compute
from qudet.compute import (
    BackendManager, HardwareLayoutSelector, CircuitOptimizer,
    DistributedQuantumProcessor
)

# Analytics
from qudet.analytics import (
    QuantumSVC, QuantumKMeans, QuantumKernelRegressor,
    QuantumKernelAnomalyDetector, QuantumFeatureSelector,
    QuantumAutoencoder
)

# Governance
from qudet.governance import (
    QuantumDriftDetector, DataIntegrityCheck, ResourceEstimator,
    QuantumDifferentialPrivacy, AuditLogger
)



# 1. Connectors
Ingest data reliably from various sources.


### QuantumDataLoader
The standard entry point for CSV, JSON, and simple files.

In [49]:

# Load Iris from CSV (Demonstrating Connectors)
df_raw = pd.read_csv("iris.csv", quotechar='"')

# Preprocess Labels
le = LabelEncoder()
df_raw['target'] = le.fit_transform(df_raw['variety'])
feature_cols = [c for c in df_raw.columns if c not in ['variety', 'target']]

X = df_raw[feature_cols].values
y = df_raw['target'].values
df = df_raw[feature_cols] # For loader

# In practice, QuantumDataLoader acts like a PyTorch DataLoader
# It batches data AND automatically converts it to quantum circuits
loader = QuantumDataLoader(df, batch_size=5)

for i, (batch_data, batch_circuits) in enumerate(loader):
    print(f"Batch {i+1}: {len(batch_data)} samples -> {len(batch_circuits)} Quantum Circuits")
    # batch_circuits[0].draw() # You can draw the circuit
    if i == 0:
        break  # Just show the first batch


Batch 1: 5 samples -> 5 Quantum Circuits
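
The batching half of this behavior can be sketched without QuDET. The helper below is a hypothetical stand-in (`iter_batches` is not part of the library) that only slices rows into fixed-size batches; it leaves out the conversion to circuits, since the source does not describe how the loader does that.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


rows = list(range(12))  # illustrative data, not the Iris frame
batch_sizes = [len(batch) for batch in iter_batches(rows, batch_size=5)]
print(batch_sizes)  # [5, 5, 2]
```

The last batch is shorter when the row count is not a multiple of `batch_size`.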


### StreamingDataBuffer
Handle real-time data streams with a sliding window.

In [50]:

buffer = StreamingDataBuffer(buffer_size=5)
for i in range(10):
    buffer.add_batch([{"value": i}])

print(f"Current Buffer: {buffer.get_sliding_window()}")


Current Buffer: [{'value': 5} {'value': 6} {'value': 7} {'value': 8} {'value': 9}]
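
The output above keeps only the last five items. A minimal sketch of the same sliding-window behavior, assuming a simple fixed-capacity buffer and using the standard library's `collections.deque` instead of QuDET's own implementation:

```python
from collections import deque

# A deque with maxlen drops the oldest item when a new one arrives at capacity.
window = deque(maxlen=5)
for i in range(10):
    window.append({"value": i})

values = [item["value"] for item in window]
print(values)  # [5, 6, 7, 8, 9]
```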


### DataValidator
Ensure incoming data meets quantum pipeline requirements (e.g., numeric types).

In [51]:

validator = DataValidator(expected_dtypes={"value": float})
df_val = pd.DataFrame([{"value": 1.5}])
is_valid = validator.validate(df_val)
print(f"Is Valid: {is_valid}")


Is Valid: True



# 2. Transforms
Prepare classical data for quantum circuits.


### QuantumPCA
Dimensionality reduction tailored for QPU constraints.

In [52]:

pca = QuantumPCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(f"PCA Reduced Shape: {X_reduced.shape}")


PCA Reduced Shape: (150, 2)


### AutoReducer
Automatically selects the best reduction strategy for a target qubit count.

In [53]:

reducer = AutoReducer(target_qubits=2)
X_auto = reducer.fit_transform(X)
print(f"Auto Reduced Shape: {X_auto.shape}")


--- AutoReducer analyzing shape (150, 4) ---
   -> Detected High Dimensionality (4 > 2). Adding RandomProjector.
Auto Reduced Shape: (150, 2)
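
The log shows the reducer choosing a random projection. The sketch below illustrates a Gaussian random projection in plain Python; the function name, the seed, and the `1/sqrt(n_components)` scaling are assumptions made for illustration and are not taken from QuDET's `RandomProjector`.

```python
import random


def random_projection(rows, n_components, seed=0):
    """Project each row onto `n_components` random Gaussian directions."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    scale = 1.0 / n_components ** 0.5
    # One random column per output component.
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
        for _ in range(n_features)
    ]
    return [
        [sum(row[i] * matrix[i][j] for i in range(n_features))
         for j in range(n_components)]
        for row in rows
    ]


rows = [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]  # illustrative 4-feature rows
projected = random_projection(rows, n_components=2)
print(len(projected), len(projected[0]))
```

Unlike PCA, the projection matrix does not depend on the data, so no fitting step is needed.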


### FeatureScaler
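Scales features to a bounded range so they can be mapped onto rotation-gate angles (0 to 2π). With `method='minmax'`, the output lies in [0, 1], as the printed range shows.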

In [54]:

scaler = FeatureScaler(method='minmax') # or 'standard', 'quantum_aware'
X_scaled = scaler.fit_transform(X)
print(f"Scaled Range: [{X_scaled.min():.2f}, {X_scaled.max():.2f}]")


Scaled Range: [0.00, 1.00]
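
A minimal sketch of how min-max scaled values can be stretched onto an angle range. The helper `minmax_to_angles` and its defaults are hypothetical and only illustrate the arithmetic; they do not describe what `FeatureScaler` does internally.

```python
import math


def minmax_to_angles(column, low=0.0, high=2 * math.pi):
    """Min-max scale a list of floats, then stretch the result to [low, high]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # A constant column carries no information; map it to the lower bound.
        return [low for _ in column]
    return [low + (v - lo) / span * (high - low) for v in column]


column = [5.1, 4.9, 7.0, 6.3]  # illustrative values
angles = minmax_to_angles(column)
print([round(a, 3) for a in angles])
```

A rotation by 2π prepares the same state as a rotation by 0 (up to a global phase), so passing `high=math.pi` is a common alternative when the minimum and maximum should remain distinguishable.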


### QuantumImputer
Fills missing values by leveraging the geometry of the data. The log below shows it running Q-Means clustering to do so.

In [70]:

X_missing = X.copy()
X_missing[0, 0] = np.nan
feature_names = df.columns
df_missing = pd.DataFrame(X_missing, columns=feature_names)
imputer = QuantumImputer()
X_imputed = imputer.fit_transform(df_missing)
print(f"Imputed Value: {X_imputed.iloc[0, 0]:.4f}")


--- Q-Means Iteration 1/10 ---
--- Q-Means Iteration 2/10 ---
--- Q-Means Iteration 3/10 ---
--- Q-Means Iteration 4/10 ---
--- Q-Means Iteration 5/10 ---
--- Q-Means Iteration 6/10 ---
--- Q-Means Iteration 7/10 ---
--- Q-Means Iteration 8/10 ---
--- Q-Means Iteration 9/10 ---
--- Q-Means Iteration 10/10 ---
Imputed Value: 5.0041


### OutlierRemover
Remove extreme outliers before training.

In [56]:

remover = OutlierRemover(threshold=3.0)
remover.fit(X)
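# Judging by the output (146 of 150 rows kept), True marks rows to keep,
# despite the attribute name
mask = remover.outlier_mask_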
X_clean = X[mask]
y_clean = y[mask]
print(f"Original: {len(X)}, Clean: {len(X_clean)}")


Original: 150, Clean: 146
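
The source does not say how `threshold=3.0` is interpreted. One common reading is a z-score cutoff, sketched below for a single feature under that assumption; `zscore_inlier_mask` is a hypothetical helper, not a QuDET function.

```python
import statistics


def zscore_inlier_mask(values, threshold=3.0):
    """Return True for values within `threshold` standard deviations of the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [True] * len(values)
    return [abs(v - mean) / std <= threshold for v in values]


data = [5.0] * 20 + [5.2] * 20 + [50.0]  # illustrative data with one extreme value
inliers = zscore_inlier_mask(data)
clean = [v for v, keep in zip(data, inliers) if keep]
print(f"Original: {len(data)}, Clean: {len(clean)}")
```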



# 3. Encoders
Map classical data to quantum states in Hilbert space.


### AngleEncoder
Maps features to rotation angles (Rx, Ry, Rz). 1 feature = 1 qubit.

In [57]:

encoder = AngleEncoder(n_qubits=2)
qc = encoder.encode(X_reduced[0])
print("Angle Encoded Circuit Depth:", qc.depth())


Angle Encoded Circuit Depth: 1
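
To make "1 feature = 1 qubit" concrete, the sketch below builds the statevector that results from applying one RY rotation per qubit to |0…0⟩. It assumes RY gates (the encoder may use Rx or Rz instead) and uses plain Python instead of a quantum SDK.

```python
import math


def ry_state(theta):
    """Amplitudes of RY(theta)|0> = [cos(theta/2), sin(theta/2)]."""
    return [math.cos(theta / 2), math.sin(theta / 2)]


def product_state(angles):
    """Tensor product of single-qubit RY states (first angle is most significant)."""
    state = [1.0]
    for theta in angles:
        qubit = ry_state(theta)
        state = [a * b for a in state for b in qubit]
    return state


state = product_state([math.pi / 2, math.pi])  # two features -> two qubits
probs = [amp ** 2 for amp in state]
print([round(p, 3) for p in probs])
```

Each rotation acts on its own qubit, so all of them can run in parallel, which is consistent with the depth of 1 reported above.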


### AmplitudeEncoder
Encodes data into state amplitudes. N features need only log2(N) qubits (rounded up), so the encoding is highly compact.

In [58]:

amp_enc = AmplitudeEncoder(n_qubits=2)
# 2 qubits hold 2^2 = 4 amplitudes, one per feature
qc_amp = amp_enc.encode(X[0][:4])
print("Amplitude Encoded Circuit Depth:", qc_amp.depth())


Amplitude Encoded Circuit Depth: 1
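
The classical preprocessing behind amplitude encoding can be sketched as follows: pad the feature vector to a power-of-two length and normalize it to unit L2 norm so that the squared amplitudes sum to 1. The helper name and the zero-padding choice are assumptions; the source does not state how `AmplitudeEncoder` handles either.

```python
import math


def amplitude_encode(features):
    """Return (n_qubits, amplitudes) for a zero-padded, L2-normalized vector."""
    n_qubits = max(1, math.ceil(math.log2(len(features))))
    padded = list(features) + [0.0] * (2 ** n_qubits - len(features))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return n_qubits, [v / norm for v in padded]


n_qubits, amplitudes = amplitude_encode([5.1, 3.5, 1.4, 0.2])  # illustrative values
print(n_qubits, [round(a, 3) for a in amplitudes])
```

Normalization discards the overall scale of the vector, so two samples that differ only by a constant factor map to the same state.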


### IQPEncoder
Instantaneous Quantum Polynomial (IQP) encoding, which is believed to be hard to simulate classically.

In [59]:

iqp_enc = IQPEncoder(n_qubits=2)
qc_iqp = iqp_enc.encode(X_reduced[0])
print("IQP Encoded Circuit Depth:", qc_iqp.depth())


IQP Encoded Circuit Depth: 6



# 4. Compute
Manage hardware execution and circuit optimization.


### BackendManager
Switch between simulators and real QPUs seamlessly.

In [60]:

backend = BackendManager.get_backend("simulator")
print(f"Active Backend: {backend}")


--- Connecting to Backend: simulator ---
Active Backend: AerSimulator('aer_simulator_statevector')


### HardwareLayoutSelector
Select the lowest-error qubits on a physical device.

In [61]:

selector = HardwareLayoutSelector(backend)
best_qubits = selector.find_best_subgraph(n_qubits=2)
print(f"Best Physical Qubits: {best_qubits}")


Best Physical Qubits: [0, 1]
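
One simple way to think about layout selection for two qubits is to pick the coupled pair with the lowest combined error. The sketch below does that on made-up calibration data; the error rates, coupling map, and scoring rule are all hypothetical and do not come from `HardwareLayoutSelector`.

```python
def best_connected_pair(qubit_error, coupling_edges):
    """Return the coupled qubit pair with the smallest summed error rate."""
    best = min(
        coupling_edges,
        key=lambda edge: qubit_error[edge[0]] + qubit_error[edge[1]],
    )
    return sorted(best)


# Hypothetical per-qubit error rates and a linear coupling map.
qubit_error = {0: 0.021, 1: 0.008, 2: 0.011, 3: 0.030}
coupling_edges = [(0, 1), (1, 2), (2, 3)]

pair = best_connected_pair(qubit_error, coupling_edges)
print(f"Best Physical Qubits: {pair}")
```

Restricting the search to coupled pairs avoids layouts that would need extra SWAP gates for a two-qubit operation.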


### CircuitOptimizer
Reduces gate count and depth.

In [62]:

opt = CircuitOptimizer(level=3)
qc_opt = opt.optimize(qc_iqp)
print(f"Depth Before: {qc_iqp.depth()}, After: {qc_opt.depth()}")


--- Circuit Optimizer Initialized (Level 3) ---
Depth Before: 6, After: 4



# 5. Analytics
Quantum Machine Learning Algorithms.


### QuantumSVC
Support Vector Classifier with a quantum kernel.

In [63]:

# Binary classification subset
mask = y != 2
X_bin, y_bin = X_reduced[mask], y[mask]
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(X_bin, y_bin, test_size=0.2)

qsvc = QuantumSVC(n_qubits=2)
qsvc.fit(X_train_b, y_train_b)
acc = qsvc.score(X_test_b, y_test_b)
print(f"QSVC Accuracy: {acc:.2f}")


--- Training Quantum SVC on 80 samples ---
QSVC Accuracy: 1.00
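
A quantum kernel scores two samples by the overlap of their encoded states. The source does not say which feature map `QuantumSVC` uses; assuming an RY angle encoding with one qubit per feature, the fidelity kernel has the closed form of a product of cos²((xᵢ − yᵢ)/2) over the features, which the sketch below evaluates classically.

```python
import math


def fidelity_kernel(x, y):
    """|<phi(x)|phi(y)>|^2 for an RY angle-encoded product state."""
    overlap = 1.0
    for xi, yi in zip(x, y):
        overlap *= math.cos((xi - yi) / 2)
    return overlap ** 2


def gram_matrix(samples):
    """Pairwise kernel values, the input a kernel SVM trains on."""
    return [[fidelity_kernel(a, b) for b in samples] for a in samples]


samples = [[0.1, 0.4], [0.2, 0.3], [2.5, 1.9]]  # illustrative angle-scaled samples
K = gram_matrix(samples)
for row in K:
    print([round(v, 3) for v in row])
```

Nearby samples get kernel values close to 1 and distant ones values close to 0, which is the similarity structure a support vector classifier uses to separate classes.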


### QuantumKMeans
Unsupervised clustering in Hilbert space.

In [64]:

kmeans = QuantumKMeans(n_clusters=2, n_qubits=2)
kmeans.fit(X_train_b)
labels = kmeans.predict(X_test_b)
print(f"KMeans Labels: {labels[:5]}")


--- Q-Means Iteration 1/10 ---
--- Q-Means Iteration 2/10 ---
--- Q-Means Iteration 3/10 ---
--- Converged Early ---
KMeans Labels: [1 1 0 0 0]


### QuantumKernelRegressor
Regression for continuous targets.

In [65]:

# Synthetic regression data
X_reg, y_reg = make_regression(n_samples=50, n_features=2, noise=0.1)
regr = QuantumKernelRegressor(n_qubits=2)
regr.fit(X_reg, y_reg)
y_pred = regr.predict(X_reg[:5])
print(f"Regression Predictions: {y_pred}")


--- Training Quantum Regressor on 50 samples ---
   Training complete. Ready for predictions.
Regression Predictions: [34.33762913 -4.09167759 19.09823601 -2.73098159 26.0065663 ]



# 6. Governance
Safety, Privacy, and Auditing.


### QuantumDriftDetector
Detects data distribution shifts.

In [66]:

detector = QuantumDriftDetector(n_qubits=2)
detector.fit_reference(X_train_b)
# Test on same distribution (should be stable)
res = detector.detect_drift(X_test_b)
print(f"Drift Status: {res['status']} (MMD: {res['mmd_score']:.3f})")


Reference data stored: (80, 2)
   • Samples: 80
   • Features: 2

Calculating Quantum Drift (MMD)...
   • Reference samples: 80
   • New samples: 20
   • Computing K(reference, reference)...
     → Mean: 0.8145
   • Computing K(new, new)...
     → Mean: 0.8285
   • Computing K(reference, new)...
     → Mean: 0.8150

Results:
   • MMD Score: 0.0129
   • Threshold: 0.1000
   • Status: STABLE
Drift Status: STABLE (MMD: 0.013)
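
The three kernel means in the log are enough to check the score by hand. Assuming the detector uses the standard biased MMD² estimate, mean K(ref, ref) + mean K(new, new) − 2 · mean K(ref, new), which the source does not confirm, the arithmetic below reproduces the reported value to within the rounding of the printed means. The `"DRIFT"` label is a hypothetical name for the non-stable case.

```python
# Kernel means copied from the detector's log output (rounded to 4 decimals).
mean_k_ref_ref = 0.8145
mean_k_new_new = 0.8285
mean_k_ref_new = 0.8150
threshold = 0.1

# Biased MMD^2 estimate: within-set similarity minus twice the cross-set similarity.
mmd_sq = mean_k_ref_ref + mean_k_new_new - 2 * mean_k_ref_new
status = "DRIFT" if mmd_sq > threshold else "STABLE"
print(f"MMD^2 estimate: {mmd_sq:.4f} -> {status}")
```

The score stays near zero when the cross-set similarity matches the within-set similarities, which is what two samples drawn from the same distribution should produce.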


### DataIntegrityCheck
Verifies that encoding preserves the information in the data.

In [67]:

checker = DataIntegrityCheck()
stats = checker.compute_encoding_fidelity(X_reduced[0], encoder)
print(f"Encoding Fidelity: {stats['fidelity']:.4f}")


Encoding Fidelity: 0.7280


### QuantumDifferentialPrivacy
Adds noise to an encoded circuit to preserve privacy.

In [68]:

dp = QuantumDifferentialPrivacy(epsilon=1.0)
# sanitize() applies to a circuit, not to raw data
qc_private = dp.sanitize(qc)
print(f"Original Depth: {qc.depth()}")
print(f"Private Depth:  {qc_private.depth()}")


Original Depth: 1
Private Depth:  1


### ResourceEstimator
Estimates the cost of running a circuit for a given number of shots.

In [69]:

est = ResourceEstimator()
cost = est.estimate_circuit_cost(qc, shots=4000)
print(f"Estimated Cost: ${cost['est_cost_usd']:.4f}")


Estimated Cost: $0.0520
