# QUBO Formulation for Quantum Credit Scoring

In the previous project reports, specifically for WP5, we defined the QUBO formulation for the rating scale definition problem. The aim of the project is to divide $n$ counterparts into $m$ grades subject to several constraints.

In this notebook we develop a toy model of the cost function and make several remarks about it. We then solve the problem with different methods and check whether the solutions obtained satisfy all the required constraints.

## Libraries and hyperparameters definition

In [32]:
import dimod
import hybrid
import math
import sys
import time
import itertools

import numpy as np  # used as np in every constraint function below

In [33]:
config = {
    'random_data': 'yes',       # select 'yes' to generate random dataset
    'data_path': 'data/dataset-ispq.csv', # could be omitted to generate random data
    'default_prob': 0.4,        # probability of default (range: 0 - 1)

    'attributes': {             # select optional attributes from the database
        'years': [2012, 2015],  # list of (not necessarily consecutive) years from 2012 to 2020, [] = ignore attribute
        'sector': 'no',
        'revenue': 'no',
        'geo_area': 'no',
    },

    'n_counterpart': 10,      # could be a number or 'all'
    'm_company': 4,

    'alpha_concentration': 0.05,
    'alpha_heterogeneity': 0.01,
    'alpha_homogeneity': 0.05,

    'shots': 100000,

    'constraints': {
        'one_class': True,
        'logic': True,
        'conentration': True,
        'min_thr': True,
        'max_thr': True,
        'monotonicity': True,
    },

    'test': {
        'logic': True,
        'conentration': True,
        'min_thr': True,
        'max_thr': True,
        'monotonicity': True,
        'heterogeneity': False,
        'homogeneity': False,
    },

    'mu': {
        'one_calss': 100,
        'logic': 100,
        'concentration': 20,
        'min_thr': 10,
        'max_thr': 10,
        'monotonicity': 10,
    }
}
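The solver cells further down use flat names such as `m`, `n`, `constr_logic` and `mu_logic`, while no cell shown here derives them from `config`. The following is a minimal sketch of one way to do so. The helper `unpack_config` and the `demo_config` stand-in are hypothetical, and reading `'n_counterpart'` as $n$ and `'m_company'` as $m$ is an assumption.

```python
def unpack_config(config):
    """Flatten the nested config into flat names such as n, m, constr_logic, mu_logic.

    Assumption: 'n_counterpart' is n and 'm_company' is m (the number of grades).
    Keys are copied verbatim, so the spelling used in the config carries over.
    """
    flat = {
        'n': config['n_counterpart'],   # may also be the string 'all'
        'm': config['m_company'],
        'shots': config['shots'],
    }
    # copy the alpha_* thresholds unchanged
    flat.update({k: v for k, v in config.items() if k.startswith('alpha_')})
    # one boolean switch per constraint, e.g. constr_logic
    flat.update({f'constr_{k}': v for k, v in config['constraints'].items()})
    # one penalty weight per constraint, e.g. mu_logic
    flat.update({f'mu_{k}': v for k, v in config['mu'].items()})
    return flat


# small stand-in with the same layout as the config above (hypothetical values)
demo_config = {
    'n_counterpart': 4,
    'm_company': 2,
    'alpha_concentration': 0.05,
    'shots': 1000,
    'constraints': {'one_class': True, 'logic': True},
    'mu': {'logic': 100, 'min_thr': 10},
}
params = unpack_config(demo_config)
print(params['n'], params['m'], params['constr_logic'], params['mu_min_thr'])
```

In a notebook, `globals().update(unpack_config(config))` would expose the same names as plain variables, at the cost of hiding where they come from.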

## Constraints definitions

Sections 1, 2 and 3 of the WP5 report describe the financial constraints of the rating scale definition problem. Chapter 6 reports their mathematical definition for the QUBO formulation.
Building on these studies, in the following sections we implement the functions for the problem's constraints, specifically:
* **Logic constraint**: the result of the QUBO problem must be a binary staircase matrix;
* **Monotonicity constraint**: the default rate of the grades increases as the grade index increases;
* **Concentration constraint**: the counterparts are divided into grades to avoid high concentrations;
* **Grade cardinality threshold constraints**: the number of counterparts per grade is limited from above and below.

We provide a brief description of the code for each constraint, especially for those that required slight changes from what was described in the report.

We chose not to implement the heterogeneity constraint in the cost function because the number of variables needed for this constraint alone would make the entire problem intractable. As with the homogeneity test described in Chapter 4 of the WP5 report, we decided to implement only the test for this constraint.

### Logic constraint

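The logic constraint is built from penalties over the binary variables $x_{ij}$ (counterpart $i$ assigned to grade $j$), stored at index `i*m+j`. `one_class_const` penalizes counterparts that are not assigned to exactly one grade, and `staircase_constr` penalizes assignments that do not form a staircase: the first counterpart must be in the first grade, the last counterpart in the last grade, and no forbidden 2x2 submatrix may appear.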
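For reference, the coefficients filled in by `one_class_const` are consistent with expanding a squared one-hot penalty for each counterpart $i$, using $x_{ij}^2 = x_{ij}$ for binary variables. This is a reading of the code, not a formula quoted from the report:

$$
\left(\sum_{j=0}^{m-1} x_{ij} - 1\right)^2 = -\sum_{j=0}^{m-1} x_{ij} + 2\sum_{j<k} x_{ij}\,x_{ik} + 1
$$

Each diagonal entry therefore receives $-1$, each pair $(j, k)$ receives a total of $2$ split symmetrically between `Q[tt][rr]` and `Q[rr][tt]`, and every counterpart adds $1$ to the offset `c`.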

In [34]:
def one_class_const(m, n, mu=1):
    Q = np.zeros([n*m, n*m])
    c = 0

    # penalty: "each counterpart in exactly one class"
    for ii in range(n):
        for jj in range(m):
            tt = ii*m+jj
            Q[tt][tt] += -1
        for jj in range(m-1):
            for kk in range(jj+1,m):
                tt = ii*m+jj
                rr = ii*m+kk
                Q[tt][rr] += 1
                Q[rr][tt] += 1
        c += 1
    return (mu*Q, mu*c)
    
def first_counterpart_const(m, n, mu=1):
    Q = np.zeros([n*m, n*m])
    
    # penalty: "first counterpart in first class"
    for jj in range(1, m):
        Q[jj][jj] += 1
        Q[0][jj] -= 0.5
        Q[jj][0] -= 0.5
    return mu*Q

def last_counterpart_const(m, n, mu=1):
    Q = np.zeros([n*m, n*m])

    # penalty: "last counterpart in the last class"
    for jj in range(m-1):
        tt = (n-1)*m+jj
        Q[tt][tt] += 1
        Q[(n*m)-1][tt] -= 0.5
        Q[tt][(n*m)-1] -= 0.5
    return mu*Q

def staircase_constr(m, n, mu=1):
    Q = first_counterpart_const(m,n) + last_counterpart_const(m,n)

    # penalty: "penalize not permitted submatrix"
    # a submatrix is
    # [[x1, x2], [x3, x4]]
    for ii in range(n-1):
        for jj in range(m-1):
            x1 = ii*m+jj
            x2 = x1+1
            x3 = (ii+1)*m+jj
            x4 = x3+1

            # add linear terms
            Q[x1][x1] += 1
            Q[x4][x4] += 1

            # add quadratic terms
            Q[x1][x2] += 0.5
            Q[x2][x1] += 0.5

            Q[x1][x3] -= 0.5
            Q[x3][x1] -= 0.5

            Q[x1][x4] -= 1
            Q[x4][x1] -= 1

            Q[x2][x3] += 0.5
            Q[x3][x2] += 0.5

            Q[x2][x4] -= 0.5
            Q[x4][x2] -= 0.5

            Q[x3][x4] += 0.5
            Q[x4][x3] += 0.5

    # penalty: "penalize restarting from class 0"
    for ii in range(n-1):
        x1 = ii*m
        x2 = x1+1
        x3 = (ii+1)*m

        Q[x3][x3] += 1

        Q[x1][x3] -= 0.5
        Q[x3][x1] -= 0.5

        Q[x2][x3] -= 0.5
        Q[x3][x2] -= 0.5

        Q[x1][x2] += 0.5
        Q[x2][x1] += 0.5

    return mu*Q
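To make the target of these penalties concrete, the sketch below checks the staircase property directly on an assignment matrix. It assumes that "staircase" means: exactly one 1 per row, first row in the first grade, last row in the last grade, and the grade either staying the same or advancing by one from each row to the next. The helper `is_staircase` is hypothetical and is not the project's `check_staircase`.

```python
def is_staircase(X):
    """Return True if the 0/1 matrix X (a list of rows) is a staircase
    under the assumption stated in the lead-in."""
    m = len(X[0])
    # every counterpart (row) must sit in exactly one grade (column)
    if any(sum(row) != 1 or set(row) - {0, 1} for row in X):
        return False
    grades = [list(row).index(1) for row in X]
    # first counterpart in the first grade, last counterpart in the last grade
    if grades[0] != 0 or grades[-1] != m - 1:
        return False
    # moving down one row, the grade stays the same or advances by one
    return all(b - a in (0, 1) for a, b in zip(grades, grades[1:]))


valid = [[1, 0], [1, 0], [0, 1], [0, 1]]
goes_back = [[1, 0], [0, 1], [1, 0], [0, 1]]   # returns to grade 0
skips_grade = [[1, 0, 0], [0, 0, 1]]           # jumps over grade 1
print(is_staircase(valid), is_staircase(goes_back), is_staircase(skips_grade))
```

The same function accepts a NumPy array converted with `X.tolist()`.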

### Monotonicity constraint

Placeholder: the monotonicity penalty is not implemented yet, so the function below is only a stub.

In [None]:
def monotonicity():
    
    return

### Concentration constraint

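`concentration_constr` adds a quadratic term for every pair of counterparts assigned to the same grade, scaled by `gamma = m/(m-1)`, and returns the constant offset `c = 1/(1-m)` alongside the matrix.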

In [35]:
def concentration_constr(m, n, mu=1):
    Q = np.zeros([n*m, n*m])

    u = np.zeros([n * n * m, 2], dtype=int)
    index = 0
    for i1 in range(n):
        for i2 in range(n):
            for j in range(m):
                u[index] = [(i1)*m+j, (i2)*m+j]
                index += 1

    # penalty: "concentration"
    c = 1/(1-m)
    gamma = m/(m-1)
    for (u1, u2) in u:
        if u1==u2:
            Q[u1][u2] += gamma
        else:
            Q[u1][u2] += gamma/2

    return (mu*Q, mu*c)
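The constants `gamma = m/(m-1)` and `c = 1/(1-m)` coincide with those of a Herfindahl index $H$ rescaled to $[0, 1]$, since $(mH - 1)/(m - 1) = \gamma H + c$. Assuming this is the concentration measure intended (the WP5 report remains the authority), the sketch below evaluates it directly on an assignment matrix. The helper names are hypothetical, and the use of shares $n_j/n$ is part of the assumption: no division by $n$ appears in `concentration_constr`.

```python
def grade_sizes(X):
    """Number of counterparts per grade (column sums of the 0/1 matrix X)."""
    return [sum(col) for col in zip(*X)]


def normalized_herfindahl(X):
    """gamma * H + c with H = sum_j (n_j / n)^2; requires at least 2 grades."""
    n, m = len(X), len(X[0])
    h = sum((size / n) ** 2 for size in grade_sizes(X))
    gamma = m / (m - 1)
    c = 1 / (1 - m)
    return gamma * h + c


balanced = [[1, 0], [1, 0], [0, 1], [0, 1]]       # two counterparts per grade
concentrated = [[1, 0], [1, 0], [1, 0], [1, 0]]   # everything in one grade
print(normalized_herfindahl(balanced), normalized_herfindahl(concentrated))
```

Under this reading, a perfectly even split scores 0 and a single occupied grade scores 1.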

### Grade cardinality threshold constraint

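`threshold_constr` builds the penalty for the lower (`'min'`) or upper (`'max'`) bound on the number of counterparts per grade. The bound is encoded with binary slack variables, which are placed after the first `offset` variables of the QUBO matrix.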

In [36]:
def compute_lower_thrs(n):
    return math.floor(n*0.01) if math.floor(n*0.01) != 0 else 1

def compute_upper_thrs(n, grades):
    return math.floor(n*0.15) if grades > 7 and math.floor(n*0.15) != 0 else (n-grades+1)
    
def threshold_constr(m, n, offset, minmax, mu=1):

    if minmax == 'min':
        thr = compute_lower_thrs(n)
        slack_vars = math.floor(1+math.log2(n-thr)) # to check
    elif minmax == 'max':
        thr = compute_upper_thrs(n, m)
        slack_vars = math.floor(1+math.log2(thr)) # to check
    else:
        raise ValueError(f"minmax must be 'min' or 'max', got {minmax!r}")

    # initialize Q and c
    dim = offset+slack_vars*m
    Q = np.zeros([dim, dim])
    c = m * thr * thr

    for i1 in range(n):
        for i2 in range(n):
            for j in range(m):
                u2 = [i1*m+j, i2*m+j]
                if u2[0]==u2[1]: # I changed this; there may have been a typo
                    Q[u2[0]][u2[1]] += 1
                else:
                    Q[u2[0]][u2[1]] += 0.5
                    Q[u2[1]][u2[0]] += 0.5

    for l1 in range(slack_vars):
        for l2 in range(slack_vars):
            for j in range(m):
                v2 = [l1*m+j, l2*m+j]
                tmp = math.pow(2,math.floor((v2[0]+1)/m)+math.floor((v2[1]+1)/m))
                if v2[0]==v2[1]:
                    Q[offset+v2[0]][offset+v2[1]] += tmp
                else:
                    Q[offset+v2[0]][offset+v2[1]] += 0.5*tmp
                    Q[offset+v2[1]][offset+v2[0]] += 0.5*tmp


    for i in range(n):
        for j in range(m):
            u = i*m+j
            Q[u,u] -= 2*thr

    index = 0
    for l in range(slack_vars):
        for j in range(m):
            Q[offset+index][offset+index] += thr*math.pow(2,1+math.floor((l*m+j+1)/m))
            index += 1

    for i in range(n):
        for l in range(slack_vars):
            for j in range(m):
                w2 = [i*m+j, l*m+j]
                tmp = math.pow(2,1+math.floor((w2[1]+1)/m))
                Q[w2[0]][offset+w2[1]] -= -0.5*tmp
                Q[offset+w2[1]][w2[0]] -= -0.5*tmp

    return (mu*Q, mu*c)
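The expression `math.floor(1+math.log2(K))` used for `slack_vars` is the number of bits needed to write the integer `K` in binary, so that a binary-encoded slack can reach every value from 0 to `K`. A minimal sketch with hypothetical helper names follows. Note that `math.log2` is undefined for `K = 0`, which the `'min'` branch would hit if `n` equalled `thr`.

```python
import math


def slack_bits(k):
    """Number of binary slack variables needed to represent every integer in 0..k."""
    return k.bit_length()


def slack_value(bits):
    """Integer encoded by the slack bits, least significant first: sum of 2**l * s_l."""
    return sum(s << l for l, s in enumerate(bits))


# bit_length agrees with floor(1 + log2(k)) for k >= 1 and also handles k = 0
agrees = all(slack_bits(k) == math.floor(1 + math.log2(k)) for k in range(1, 1025))
print(agrees, slack_bits(3), slack_value([1, 1]))   # True 2 3
```

With `bits` slack variables the reachable range is 0 to `2**bits - 1`, which can exceed `K` when `K + 1` is not a power of two.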

## Test the constraints

In the file `src/check_constraints.py` we implemented the tests for all the constraints described in the report: given a matrix and the appropriate hyperparameters, one function per constraint tests whether that matrix fulfills that specific requirement.

All the functions are commented appropriately, so we refer the reader directly to the code for further details.

In [37]:
from src.check_constraints import *
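The functions in `src.check_constraints` are not reproduced in this notebook. Purely as an illustration of the kind of check involved, the sketch below tests the grade cardinality bounds on an assignment matrix. `within_thresholds` is a hypothetical stand-in, not the project's `check_lower_thrs` or `check_upper_thrs`, and the bounds 1 and 3 are the values that `compute_lower_thrs(4)` and `compute_upper_thrs(4, 2)` return.

```python
def grade_sizes(X):
    """Number of counterparts per grade (column sums of the 0/1 matrix X)."""
    return [sum(col) for col in zip(*X)]


def within_thresholds(X, lower, upper):
    """True if every grade holds between lower and upper counterparts, inclusive."""
    return all(lower <= size <= upper for size in grade_sizes(X))


X = [[1, 0], [1, 0], [0, 1], [0, 1]]   # n = 4 counterparts, m = 2 grades
print(grade_sizes(X), within_thresholds(X, lower=1, upper=3))
```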

## Solve the QUBO formulation

(TO FIX FROM HERE)

In [None]:
def from_matrix_to_bqm(matrix, c):

    # map every matrix entry (i, j) to a QUBO coefficient; c is the constant offset
    Q_dict = {(i, j): matrix[i, j] for i in range(matrix.shape[0]) for j in range(matrix.shape[1])}
    bqm = dimod.BinaryQuadraticModel.from_qubo(Q_dict, c)

    return bqm

def exact_solver(bqm):
    
    sampler = dimod.ExactSolver()
    sampleset = sampler.sample(bqm)

    return sampleset

def annealer_solver(dim, bqm, shots):

    # Set up the sampler with an initial state
    sampler = hybrid.samplers.SimulatedAnnealingProblemSampler(num_sweeps=shots)
    state = hybrid.core.State.from_sample({i: 0 for i in range(dim)}, bqm)
 
    # Sample the problem
    new_state = sampler.run(state).result()
 
    return new_state

def brute_force_solver(Q, c, dim):

    # compute C(Y) = (Y^T)QY + c for every binary vector Y of length dim
    Cmin = float('inf')
    Ymin = None

    # iterate lazily instead of materializing all 2**dim candidates in a list
    for bits in itertools.product([0, 1], repeat=dim):
        Y = np.array(bits)
        Cy = Y.dot(Q).dot(Y) + c
        if Cy < Cmin:
            Cmin = Cy
            Ymin = Y.copy()

    return (Ymin, Cmin)
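`brute_force_solver` keeps only the first minimum it encounters, because it compares with a strict `<`. As a small self-contained illustration of the cost $C(Y) = Y^T Q Y + c$, the sketch below enumerates all minimizers of the one-hot penalty for a single counterpart and $m = 3$ grades. The helper names are hypothetical and the code uses plain Python lists rather than NumPy.

```python
import itertools


def qubo_energy(Q, y, c=0):
    """Evaluate C(y) = y^T Q y + c for a binary vector y and a square matrix Q."""
    dim = len(y)
    return sum(Q[a][b] * y[a] * y[b] for a in range(dim) for b in range(dim)) + c


def all_minimizers(Q, c=0):
    """Enumerate all 2**dim binary vectors; return (minimum energy, list of minimizers)."""
    best, argmins = float('inf'), []
    for y in itertools.product([0, 1], repeat=len(Q)):
        e = qubo_energy(Q, y, c)
        if e < best:
            best, argmins = e, [y]
        elif e == best:   # exact tie; use a tolerance for non-integer coefficients
            argmins.append(y)
    return best, argmins


# one-hot penalty: -1 on the diagonal, +1 on every off-diagonal entry, offset 1
Q_one_hot = [[-1, 1, 1],
             [1, -1, 1],
             [1, 1, -1]]
energy, solutions = all_minimizers(Q_one_hot, c=1)
print(energy, solutions)   # 0 [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
```

The minimum energy 0 is reached exactly by the three one-hot vectors, which is the behavior the penalty is meant to enforce.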

In [None]:
# Gen Q matrix
start_time = time.perf_counter_ns()
Q = np.zeros([m*n, m*n])
c = 0
if constr_one_class == True:
    (Q_one_class,c_one_class) = one_class_const(m,n,mu_one_calss)
    Q = Q + Q_one_class
    c = c + c_one_class
if constr_logic == True:
    Q = Q + staircase_constr(m,n,mu_logic)
if constr_conentration == True:
    (Q_conc,c_conc) = concentration_constr(m, n, mu_concentration)
    Q = Q + Q_conc
    c = c + c_conc
if constr_min_thr == True:
    (Q_min_thr, c_min_thr) = threshold_constr(m, n, Q.shape[0], 'min', mu_min_thr)
    pad = Q_min_thr.shape[0] - Q.shape[0]
    Q = np.pad(Q, pad_width=((0,pad), (0, pad)), mode='constant', constant_values=0) + Q_min_thr
    c = c + c_min_thr
if constr_max_thr == True:
    (Q_max_thr, c_max_thr) = threshold_constr(m, n, Q.shape[0], 'max', mu_max_thr)
    pad = Q_max_thr.shape[0] - Q.shape[0]
    Q = np.pad(Q, pad_width=((0,pad), (0, pad)), mode='constant', constant_values=0) + Q_max_thr
    c = c + c_max_thr

# BQM generation
bqm = from_matrix_to_bqm(Q, c)
end_time = time.perf_counter_ns()
print(f"Matrix size:{Q.shape}")
print(f"Time of generation: {(end_time - start_time)/1e9} s")  # ns -> s

Matrix size:(16, 16)
Time of generation: 0.002841545 s


In [15]:
# Solving with brute force
start_time = time.perf_counter_ns()
(result_bf, cost) = brute_force_solver(Q,c,Q.shape[0])
end_time = time.perf_counter_ns()
# keep only the m*n assignment variables, dropping any slack variables
result_bf = result_bf[:m*n]
print(f"\nBrute Force result:\n{result_bf.reshape(n,m)}")
print(f"Time of brute force solution: {(end_time - start_time)/1e9} s\n")  # ns -> s


Brute Force result:
[[1 0]
 [1 0]
 [0 1]
 [0 1]]
Time of brute force solution: 0.166132987 s



In [16]:
# Solving exactly with dwave
start_time = time.perf_counter_ns()
e_result = exact_solver(bqm)
df_result = e_result.lowest().to_pandas_dataframe()
end_time = time.perf_counter_ns()
elapsed_time_ns = end_time - start_time
# Print all the solutions
result_exact_solver = df_result.iloc[:, :m*n].to_numpy()
# print(f"All exact solutions:\n{df_result}")
print(f"Exact solutions with dwave: {int(result_exact_solver.size/(m*n))}")
for sol in result_exact_solver[:]:
    print(f"solution:\n{sol.reshape(n, m)}")
print(f"Time of all exact solutions: {elapsed_time_ns/1e9} s")  # ns -> s
# print(f"First solution:\n{result_exact_solver[0].reshape(n, m)}")

Exact solutions with dwave: 1
solution:
[[1 0]
 [1 0]
 [0 1]
 [0 1]]
Time of all exact solutions: 0.061611707 s


In [17]:
# Solving with annealing 
start_time = time.perf_counter_ns()
result = annealer_solver(Q.shape[0], bqm, shots)
end_time = time.perf_counter_ns()
result_ann = np.array([int(x) for x in result.samples.first.sample.values()])[:m*n]
annealing_matrix = result_ann.reshape(n, m)
print(f"\nAnnealing result:\n{annealing_matrix}")    
print(f"Time of annealing solution: {(end_time - start_time)/1e9} s\n")  # ns -> s


Annealing result:
[[1 0]
 [1 0]
 [0 1]
 [0 1]]
Time of annealing solution: 0.029548517 s



In [None]:
print("Result validation:")
verbose = True
check_staircase(annealing_matrix, verbose)
check_concentration(annealing_matrix, alpha_concentration, verbose)
check_lower_thrs(annealing_matrix, compute_lower_thrs(n), verbose)
check_upper_thrs(annealing_matrix, compute_upper_thrs(n,m), verbose)

Result validation:
Staircase matrix constraint checked
Concentration constraint checked
Lower threshold limit constraint checked
Upper threshold limit constraint checked


True