# QAOA Portfolio Optimization with Genetic Algorithms (DEAP) and DWE

This notebook implements a Genetic Algorithm (GA) using the DEAP framework to optimize QAOA parameters for portfolio optimization with Domain Wall Encoding (DWE). Instead of independent local optimizations, this approach uses a population-based search strategy, leveraging multiprocessing for parallel evaluation of individuals.

The DWE technique ensures the Hamming-weight constraint is handled as a penalty within the objective function, making the QAOA optimization simpler.

**Important Note on Multiprocessing in Jupyter:**
For multiprocessing to work reliably in Jupyter notebooks, especially with libraries like DEAP, it's crucial that the function being parallelized (`evaluate_qaoa_fitness`) resides in a separate Python file and that the `if __name__ == "__main__":` guard is correctly used when setting up the multiprocessing pool.

## 1. Setup and Imports

We'll import all necessary libraries, including DEAP and the refactored `evaluate_qaoa_fitness` function from `qaoa_ga_evaluation.py`.

In [13]:
# pip install --upgrade qokit

In [14]:
import numpy as np
import os 
import time 
import multiprocessing 
from functools import partial 

# Import DEAP modules
from deap import base, creator, tools, algorithms

# NEW: Import the evaluation function from the dedicated file
from qaoa_ga_evaluation import evaluate_qaoa_fitness

from qiskit_aer import AerSimulator
from qokit.portfolio_optimization import get_sk_ini # Still useful for initial parameter range insight, though GA initializes randomly
from qokit.qaoa_objective_portfolio import get_qaoa_portfolio_objective

# Optional: for later analysis of optimal bitstring
from qiskit.circuit import ParameterVector
from qiskit.primitives import Sampler
from qokit.qaoa_circuit_portfolio import get_parameterized_qaoa_circuit

## 2. Define Problem Parameters (with DWE)

This section remains exactly the same, defining your portfolio optimization problem with DWE. The `po_problem` dictionary will be passed to the QAOA objective and evaluation functions.

In [15]:
## Problem Statement & Setup for Dicke + DWE Technique

# Define problem parameters
N = 20
# Number of assets. Change N to your desired value (e.g., 20, 25, 30)
K = int(N * 0.4) # Example: select 40% of assets. Adjust as needed, ensure K < N.
p = 1 # Number of QAOA layers (start with p=1, higher p will be much slower for simulation)

# --- Generate random mu and Sigma for larger N (as you won't have specific data) ---
np.random.seed(42) # for reproducibility of the problem definition
mu = np.random.uniform(0.05, 0.20, N)
# Generate a random symmetric positive semi-definite matrix for Sigma
Sigma = np.random.uniform(0.001, 0.015, (N, N))
Sigma = (Sigma + Sigma.T) / 2 # Make it symmetric
# Add a small diagonal component to ensure it's positive semi-definite (for stability)
Sigma = Sigma + np.eye(N) * 0.005

q = 0.5 # Risk aversion parameter
lambda_sum = 100 # DWE-inspired penalty coefficient for sum constraint

# --- Calculate Coefficients for Total Cost Hamiltonian (H_C = H_C^objective + H_C^DWE-penalty) ---
# This logic is from your previous detailed plan
factor_J_obj = (2 * q) / (K**2) 
factor_h_linear_obj = -1 / K
factor_h_diagonal_obj = q / (K**2) 

J_coeffs_objective = {}
h_coeffs_objective = {}

for i in range(N):
    for j in range(i + 1, N):
        J_coeffs_objective[(i, j)] = factor_J_obj * Sigma[i, j]

for i in range(N):
    h_coeffs_objective[i] = factor_h_linear_obj * mu[i] + factor_h_diagonal_obj * Sigma[i, i]

J_coeffs_total = {} # J_coeffs_total calculated here
h_coeffs_total = {} # h_coeffs_total calculated here

for (i, j), val in J_coeffs_objective.items():
    J_coeffs_total[(i, j)] = val + 2 * lambda_sum

for i, val in h_coeffs_objective.items():
    h_coeffs_total[i] = val - 5 * lambda_sum

# Construct the custom po_problem dictionary with DWE coefficients
po_problem = {
    "N": N,
    "K": K,
    "q": q, 
    "J": J_coeffs_total, # This is the DWE-transformed QUBO J
    "h": h_coeffs_total, # This is the DWE-transformed QUBO h
    "means": mu,         # Storing original for later classical evaluation
    "cov": Sigma,        # Storing original for later classical evaluation
    "q_orig": q,         # Storing original q for later classical evaluation (renamed to avoid conflict with `q` above)
    "scale": 1.0         
}

print(f"--- Problem Parameters & Custom QUBO Defined for N={N} ---")
print(f"Number of assets (N): {N}")
print(f"Assets to select (K): {K}")
print(f"QAOA Layers (p): {p}")
print(f"DWE Penalty Lambda_sum: {lambda_sum}")

# --- Handle best_portfolio (from brute-force) ---
# Brute-force is not feasible for N > 20.
# We will set best_portfolio to None if N > 20, so AR calculation is skipped.
best_portfolio = (None, None) # Default placeholder
if N <= 20: # Example threshold for brute-force feasibility
    try:
        from qokit.portfolio_optimization import portfolio_brute_force
        print("Calculating classical brute-force solution (may take time for N up to 20)...")
        # For brute-force, use a simpler problem definition that doesn't have DWE penalty already
        # The brute-force solver expects the original mu/Sigma/q.
        original_po_problem_for_brute_force = {"N":N, "K":K, "q":q, "means":mu, "cov":Sigma}
        best_portfolio = portfolio_brute_force(original_po_problem_for_brute_force, return_bitstring=False)
        print(f"Brute-force classical optimal energy: {best_portfolio[0]:.6f}")
        print(f"Brute-force classical worst energy: {best_portfolio[1]:.6f}")
    except ImportError:
        print("qokit.portfolio_optimization.portfolio_brute_force not available or failed.")
        print("Approximation Ratio will not be calculated.")
    except Exception as e:
        print(f"Brute-force calculation failed: {e}")
        print("Approximation Ratio will not be calculated.")
else:
    print(f"Brute-force calculation is not feasible for N = {N} > 20. Approximation Ratio will not be calculated.")

print("-" * 50)

--- Problem Parameters & Custom QUBO Defined for N=20 ---
Number of assets (N): 20
Assets to select (K): 8
QAOA Layers (p): 1
DWE Penalty Lambda_sum: 100
Calculating classical brute-force solution (may take time for N up to 20)...
Brute-force calculation failed: 'scale'
Approximation Ratio will not be calculated.
--------------------------------------------------


## 3. `evaluate_qaoa_fitness` Function Reference

This is the core evaluation function that DEAP will use. It is now defined in `qaoa_ga_evaluation.py` and imported. It takes a list of QAOA parameters (`individual`) and returns a tuple containing the energy. Lower energy values are considered better fitness.

```python
# qaoa_ga_evaluation.py (contents of the new file)

import numpy as np
from qiskit_aer import AerSimulator
from qokit.qaoa_objective_portfolio import get_qaoa_portfolio_objective

def evaluate_qaoa_fitness(individual, po_problem_arg, p_layers):
    # Each parallel process needs its own simulator instance.
    simulator_backend_for_process = AerSimulator(method='statevector')
    simulator_backend_for_process.set_options(statevector_parallel_threshold=16)

    # Create the QAOA objective function for this specific evaluation.
    qaoa_obj_for_evaluation = get_qaoa_portfolio_objective(
        po_problem=po_problem_arg,
        p=p_layers,
        ini='dicke',
        mixer='trotter_ring',
        T=1,
        simulator=simulator_backend_for_process,
        mixer_topology='linear'
    )

    # Directly evaluate the QAOA objective with the given individual (parameters).
    energy = qaoa_obj_for_evaluation(individual).real

    return (energy,)
```

In [16]:
# This cell is for reference. The function is imported from qaoa_ga_evaluation.py

## 4. Execute Parallel Genetic Algorithm (DEAP)

This is the main block where the DEAP Genetic Algorithm is set up and executed. It uses `multiprocessing.Pool` to distribute the fitness evaluations of individuals across your CPU cores, enabling parallel search for the optimal QAOA parameters.

In [17]:
if __name__ == "__main__":
    try:
        # Ensure 'spawn' start method for robustness in Jupyter

        multiprocessing.set_start_method('spawn', force=True)
        print("Multiprocessing start method set to 'spawn'.")
    except RuntimeError:
        print("Multiprocessing start method already set.")

    # --- DEAP Type Definition ---
    # Define a minimizing fitness (lower energy is better)
    creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
    # Define an individual as a list of floats (our QAOA parameters)
    creator.create("Individual", list, fitness=creator.FitnessMin)

    # --- DEAP Toolbox Setup ---
    toolbox = base.Toolbox()

    # Register attribute generator for parameters (gamma and beta) - values typically 0 to 2*pi
    # There are 2*p parameters in total for p QAOA layers.
    # We initialize randomly within the expected range for QAOA angles.
    toolbox.register("attr_float", np.random.uniform, 0.0, 2*np.pi) 

    # Register individual and population creation
    toolbox.register("individual", tools.initRepeat, creator.Individual, 
                     toolbox.attr_float, n=2*p) # Each individual has 2*p parameters (p betas, p gammas)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)

    # Register the evaluation function (from qaoa_ga_evaluation.py)
    # We use functools.partial to pass the fixed po_problem and p to the evaluation function
    toolbox.register("evaluate", partial(evaluate_qaoa_fitness, po_problem_arg=po_problem, p_layers=p))

    # Register genetic operators
    toolbox.register("mate", tools.cxBlend, alpha=0.5) # Blend crossover is good for real-valued parameters
    # Gaussian mutation: mu=mean, sigma=std_dev of the added noise, indpb=independent probability for each attribute to be mutated
    toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=0.5, indpb=0.1) 
    toolbox.register("select", tools.selTournament, tournsize=3) # Tournament selection

    # --- Multiprocessing Pool Setup for DEAP ---
    num_parallel_processes = os.cpu_count() or 4 # Use all available cores, or default to 4
    print(f"Using {num_parallel_processes} processes for parallel evaluation of individuals.")
    pool = multiprocessing.Pool(processes=num_parallel_processes)
    toolbox.register("map", pool.map) # Register the multiprocessing pool with DEAP

    # --- Genetic Algorithm Parameters ---
    POPULATION_SIZE = 50  # Number of individuals in each generation
    NGEN = 100            # Number of generations to run the GA for
    CXPB = 0.7            # Probability of crossover (two individuals exchange parts of their genomes)
    MUTPB = 0.3           # Probability of mutation (an individual's genome is randomly altered)

    # --- Run the Genetic Algorithm ---
    print(f"Starting Genetic Algorithm for QAOA parameter optimization (N={N}, p={p})...")
    print(f"Population Size: {POPULATION_SIZE}, Generations: {NGEN}")

    start_time = time.perf_counter()

    # Initialize population
    population = toolbox.population(n=POPULATION_SIZE)

    # DEAP's HallOfFame to store the best individual found throughout the evolution
    hof = tools.HallOfFame(1) # Stores only the single best individual
    
    # Statistics to track progress
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("avg", np.mean)
    stats.register("std", np.std)
    stats.register("min", np.min)
    stats.register("max", np.max)

    # Run the simple genetic algorithm (eaSimple)
    population, logbook = algorithms.eaSimple(population, toolbox, cxpb=CXPB, mutpb=MUTPB,
                                              ngen=NGEN, stats=stats, halloffame=hof, verbose=True)

    end_time = time.perf_counter()
    print(f"Genetic Algorithm completed in {end_time - start_time:.2f} seconds.")

    # --- Cleanup Multiprocessing Pool ---
    pool.close()
    pool.join()

    # --- Collect and Display Results ---
    best_individual = hof[0]
    best_energy = best_individual.fitness.values[0] # Fitness is a tuple, energy is the first element
    best_overall_params = np.array(best_individual) # Convert to numpy array for consistency with later analysis

    # Calculate Approximation Ratio for N <= 20 if brute-force result is available
    best_overall_ar = 'N/A (Brute-force not feasible or not computed)'
    if best_portfolio[0] is not None and best_portfolio[1] is not None and N <= 20:
        # Assuming best_energy here is comparable to the objective function's energy.
        # If the DWE penalty significantly shifts the absolute scale, you might need
        # a separate function to evaluate the classical energy of the portfolio 
        # corresponding to the optimal bitstring derived from the GA solution for accurate AR.
        # For now, we use the energy directly from the GA.
        best_overall_ar = (best_energy - best_portfolio[1]) / (best_portfolio[0] - best_portfolio[1])

    print("\n" + "---" * 20)
    print("### Summary of Genetic Algorithm Optimization Results ###")
    print(f"Best overall energy found: {best_energy:.8f}")
    print(f"Corresponding Approximation Ratio (if N <= 20): {best_overall_ar}")
    print(f"Optimal QAOA parameters: {best_overall_params}")
    print("---" * 20)

    # Store results for optional bitstring analysis section
    # The 'all_successful_results' variable is now a list containing a single entry for the GA's best result.
    all_successful_results = [{'energy': best_energy, 'ar': best_overall_ar, 'optimized_params': best_overall_params}]

    # You can also inspect the logbook for generation-by-generation statistics:
    # import pandas as pd
    # stats_df = pd.DataFrame(logbook)
    # print("\nGeneration Statistics:")
    # print(stats_df)



Multiprocessing start method set to 'spawn'.
Using 10 processes for parallel evaluation of individuals.
Starting Genetic Algorithm for QAOA parameter optimization (N=20, p=1)...
Population Size: 50, Generations: 100


Process SpawnPoolWorker-26:
Process SpawnPoolWorker-30:
Process SpawnPoolWorker-24:
Process SpawnPoolWorker-21:
Process SpawnPoolWorker-17:
Process SpawnPoolWorker-19:
Process SpawnPoolWorker-18:
Process SpawnPoolWorker-20:
Process SpawnPoolWorker-15:
Process SpawnPoolWorker-10:
Process SpawnPoolWorker-7:
Process SpawnPoolWorker-28:
Process SpawnPoolWorker-3:
Process SpawnPoolWorker-27:
Process SpawnPoolWorker-25:
Process SpawnPoolWorker-23:
Process SpawnPoolWorker-22:
Process SpawnPoolWorker-29:
Process SpawnPoolWorker-4:
Process SpawnPoolWorker-12:
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-16:
Process SpawnPoolWorker-6:
Process SpawnPoolWorker-5:
Process SpawnPoolWorker-1:
Process SpawnPoolWorker-2:
Process SpawnPoolWorker-13:
Process SpawnPoolWorker-14:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Process SpawnPoolWorker-9:
Process SpawnPoolWorker-11:
Traceback (most recent call l

KeyboardInterrupt: 

In [12]:
import qokit
import qiskit
from qokit.qaoa_objective_portfolio import get_qaoa_portfolio_objective
import inspect

print("--- QOKit Version ---")
print(qokit.__version__)

print("\n--- Qiskit Version ---")
print(qiskit.__version__)

print("\n--- get_qaoa_portfolio_objective Signature ---")
try:
    signature = inspect.signature(get_qaoa_portfolio_objective)
    print(signature)
except Exception as e:
    print(f"Could not inspect signature: {e}")

--- QOKit Version ---


AttributeError: module 'qokit' has no attribute '__version__'

## 5. Optional: Analyze the Best Optimal Bitstring

This section remains the same, using the `best_overall_params` found by the Genetic Algorithm to sample the QAOA circuit and evaluate the classical energy of the most probable bitstring. This helps to understand the practical outcome of the optimized QAOA parameters.

In [None]:
if __name__ == "__main__" and all_successful_results and best_overall_params is not None:
    print("\n" + "---" * 20)
    print("### Analyzing the Single Best Result for Optimal Bitstring ###")
    
    optimal_parameters = best_overall_params 
    print(f"Using optimal parameters from best GA run: {optimal_parameters}")

    # --- Extracting the Optimal Bitstring (Measurement) ---
    gamma_opt = ParameterVector('gamma', p)
    beta_opt = ParameterVector('beta', p)

    optimal_circuit = get_parameterized_qaoa_circuit(
        po_problem=po_problem,
        depth=p,
        ini_type='dicke',
        mixer_type='trotter_ring',
        T=1,
        simulator=None, # Use the Sampler directly for measurement
        mixer_topology='linear',
        gamma=gamma_opt,
        beta=beta_opt
    )

    optimal_circuit_bound = optimal_circuit.assign_parameters({
        gamma_opt: optimal_parameters[:p],
        beta_opt: optimal_parameters[p:]
    })

    optimal_circuit_bound.measure_all()

    sampler_shots = 1024 
    sampler = Sampler()

    print(f"\n--- Sampling for Optimal Bitstring ({sampler_shots} shots) ---")
    sampler_job = sampler.run(optimal_circuit_bound, shots=sampler_shots)
    sampler_result = sampler_job.result()

    counts = sampler_result.quasi_dists[0].binary_probabilities()
    classical_counts = {k: v for k, v in counts.items()} 

    sorted_counts = sorted(classical_counts.items(), key=lambda item: item[1], reverse=True)

    print("Top 5 most frequent bitstrings and their probabilities:")
    optimal_bitstring = None
    for i, (bitstring, probability) in enumerate(sorted_counts[:5]):
        print(f"  {bitstring}: {probability:.4f}")
        if i == 0:
            optimal_bitstring = bitstring

    if optimal_bitstring:
        optimal_selection = np.array([int(b) for b in optimal_bitstring])
        selected_assets_count = np.sum(optimal_selection)

        print(f"\nMost probable portfolio selection (bitstring): {optimal_bitstring}")
        print(f"Number of selected assets: {selected_assets_count} (Expected K={K})")

        # Function to evaluate the classical objective energy for a given bitstring
        def evaluate_bitstring_energy(bitstring_array, original_mu, original_Sigma, original_q):
            x = bitstring_array
            portfolio_variance = np.dot(x, np.dot(original_Sigma, x))
            portfolio_return = np.dot(original_mu, x)
            return original_q * portfolio_variance - portfolio_return

        optimal_classical_energy_for_bitstring = evaluate_bitstring_energy(
            optimal_selection,
            po_problem['means'], 
            po_problem['cov'],    
            po_problem['q_orig'] 
        )
        print(f"Classical objective energy for the most probable bitstring: {optimal_classical_energy_for_bitstring:.6f}")
        print(f"Number of assets selected by most probable bitstring: {np.sum(optimal_selection)}")
    else:
        print("Could not determine optimal bitstring.")
