# Coding A Rank Order Assignment Algorithm

The goal of the project was to write software that matches N doctors with K hospitals that have open residency positions. Each doctor provides a ranked list of their hospital preferences; the hospitals do not rank the doctors.

### Section 1: Background
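We treat the matching as an assignment problem: given a square cost matrix, assign each row to exactly one column so that the total cost is as small as possible. The Hungarian Algorithm (also known as the Kuhn-Munkres algorithm) solves this problem exactly in polynomial time, using the reduce, cover, and adjust steps walked through in the sections below.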

For a more in-depth look at the Hungarian Algorithm, see <a href="https://www.topcoder.com/thrive/articles/Assignment%20Problem%20and%20Hungarian%20Algorithm" target="_parent">this article on topcoder.com</a>.
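To make the problem concrete, here is a brute-force sketch of the assignment problem the Hungarian Algorithm solves: it simply tries every one-to-one assignment of rows to columns on a tiny, made-up rank matrix. The function name brute_force_assignment and the data are hypothetical and not part of this project. The exhaustive search is O(n!), which is why a polynomial-time method is needed at realistic sizes.

```python
from itertools import permutations


def brute_force_assignment(cost):
    """Return (best_cost, best_perm), where best_perm[i] is the column given to row i.

    Hypothetical helper: tries every permutation of columns, so it is only
    usable for tiny square matrices.
    """
    n = len(cost)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[row][col] for row, col in enumerate(perm))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm


# Rows are doctors, columns are positions, entries are ranks (lower is better).
ranks = [
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
]
print(brute_force_assignment(ranks))  # (4, (0, 2, 1))
```

Here doctor 0 gets their first choice, while doctors 1 and 2 each settle for a column that avoids a conflict over column 0.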

### Section 2: The Data

We assume the data has been collected into a spreadsheet (a CSV file) organized so that each doctor has their own row, populated with that doctor's preference ranking in [1, K] for each hospital in its corresponding column. The first row (labeled Num) holds the number of open positions at each hospital.

We load this into a **pandas** DataFrame.
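As a minimal sketch of the loading step, assuming the CSV has a Hospital header column and a first data row labeled Num (as in the printout below), plain pandas can split the file into hospital capacities and doctor rankings. This is not the project's HAID.import_data, whose internals aren't shown here. csv_text is a small hypothetical stand-in for Large_Test.csv, and unlike the printout, this sketch uses the Hospital column as the row index.

```python
import io

import pandas as pd

# Hypothetical CSV with the same layout as Large_Test.csv: the "Num" row holds
# each hospital's number of open positions; the remaining rows are doctors.
csv_text = """Hospital,H1,H2,H3
Num,2,1,1
D1,1,3,2
D2,2,1,3
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Hospital")

capacities = df.loc["Num"]    # Series: open positions per hospital
ranks = df.drop(index="Num")  # DataFrame: one row of rankings per doctor

print(capacities.tolist())    # [2, 1, 1]
print(ranks.index.tolist())   # ['D1', 'D2']
```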

In [1]:
import Hungarian_Alg_Steps as HAS
import Hungarian_Alg_LineCheck as HAL
import HungarianAlgoImportData as HAID

import numpy as np
import pandas as pd

In [2]:
df = HAID.import_data('Large_Test.csv')
print(df)

  Hospital  H1  H2  H3  H4  H5  H6  H7
0      Num   3   2   1   2   4   3   1
1       D1   1  34  45  14  67  56  89
2       D2  29  83  72  63  23  19  14
3       D3  41  67  15  36  49  84  55
4       D4  53  19  97  22  78  80  11
5       D5  88  25  46  10  35  50  99


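From this DataFrame, we extract the number of open positions at each hospital (the Num row) and strip the row and column labels so that we can work with the cost matrix alone. prep_data uses the position counts to repeat each hospital's column once per open position (H1A, H1B, H1C, ...) and then pads the result with rows of zeros until the matrix is square.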
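The expansion and padding visible in the printout below can be sketched with NumPy: repeat each hospital's column once per open position, then append zero-cost rows (or columns) until the matrix is square. expand_and_pad is a hypothetical helper, not the project's prep_data. It orders the duplicated columns hospital by hospital (H1, H1, H2, ...) rather than in the H1A, H2A, ..., H1B order shown below; reordering columns only relabels positions and does not change the optimal total cost.

```python
import numpy as np


def expand_and_pad(ranks, capacities):
    """Build a square cost matrix from a (doctors x hospitals) rank array.

    Hypothetical helper. Each hospital column is repeated once per open
    position, then zero rows/columns are appended so the matrix is square.
    A row or column that is pure padding represents "nobody" in reality.
    """
    ranks = np.asarray(ranks)
    # For every individual position, the index of the hospital it belongs to.
    position_to_hospital = np.repeat(np.arange(ranks.shape[1]), capacities)
    expanded = ranks[:, position_to_hospital]
    n_rows, n_cols = expanded.shape
    size = max(n_rows, n_cols)
    padded = np.zeros((size, size), dtype=expanded.dtype)
    padded[:n_rows, :n_cols] = expanded
    return padded, position_to_hospital


ranks = np.array([[1, 3, 2],
                  [2, 1, 3]])
cost, position_to_hospital = expand_and_pad(ranks, capacities=[2, 1, 1])
print(position_to_hospital)  # [0 0 1 2]
print(cost)
```

position_to_hospital is kept so that a matched column can later be mapped back to its hospital.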

In [12]:
hi = HAID.get_hospital_info(df)

prepped_df = HAID.prep_data(df)
prepped_array = prepped_df.to_numpy(copy = True)

print(f'get_hospital_info returns the number of positions available per hospital\n\n {hi}')
print(f'\nAnd we use this information to fill out the matrix, which then gets padded to become square\n\n {prepped_df}')
print(f'\nAt the end of the importing process we are left with the 2D array:\n\n {prepped_array}')

get_hospital_info returns the number of positions available per hospital

    H1  H2  H3  H4  H5  H6  H7
0   3   2   1   2   4   3   1

And we use this information to fill out the matrix, which then gets padded to become square

     H1A  H2A  H3  H4A  H5A  H6A  H7  H1B  H1C  H2B  H4B  H5B  H5C  H5D  H6B  \
0     1   34  45   14   67   56  89    1    1   34   14   67   67   67   56   
1    29   83  72   63   23   19  14   29   29   83   63   23   23   23   19   
2    41   67  15   36   49   84  55   41   41   67   36   49   49   49   84   
3    53   19  97   22   78   80  11   53   53   19   22   78   78   78   80   
4    88   25  46   10   35   50  99   88   88   25   10   35   35   35   50   
5     0    0   0    0    0    0   0    0    0    0    0    0    0    0    0   
6     0    0   0    0    0    0   0    0    0    0    0    0    0    0    0   
7     0    0   0    0    0    0   0    0    0    0    0    0    0    0    0   
8     0    0   0    0    0    0   0    0    0    0    0    

### Section 3: Reduce the Matrix

With the cost matrix built, we begin the steps of the Hungarian Algorithm.

The first step is to reduce the matrix: subtract each row's minimum from every entry in that row, then repeat the process for each column of the row-reduced matrix.
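A minimal NumPy sketch of both reductions, assuming a square numeric array as input. reduce_matrix is a hypothetical name, not the project's HAS.step1_row_reduction/HAS.step2_col_reduction. Subtracting a constant from a whole row or column shifts the cost of every complete assignment by that same constant, so the optimal assignment is unchanged while zeros appear.

```python
import numpy as np


def reduce_matrix(cost):
    """Row reduction followed by column reduction (hypothetical helper).

    Afterwards every row and every column contains at least one zero.
    """
    cost = np.asarray(cost, dtype=float)
    # keepdims=True keeps the minima as a column/row vector so they broadcast.
    row_reduced = cost - cost.min(axis=1, keepdims=True)
    return row_reduced - row_reduced.min(axis=0, keepdims=True)


cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])
print(reduce_matrix(cost))
```

Row minima (1, 0, 2) are removed first, then the column minima of the result (1, 0, 0).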

In [14]:
row_reduced_matrix = HAS.step1_row_reduction(prepped_array)
print(f'Our original cost matrix was:\n {prepped_array}\n\n and following our row reduction we are given:\n {row_reduced_matrix}')

reduced_matrix = HAS.step2_col_reduction(row_reduced_matrix)
print(f'\nAfter column reduction, we obtain our final reduced matrix:\n {reduced_matrix}')



Our original cost matrix was:
 [[ 1 34 45 14 67 56 89  1  1 34 14 67 67 67 56 56]
 [29 83 72 63 23 19 14 29 29 83 63 23 23 23 19 19]
 [41 67 15 36 49 84 55 41 41 67 36 49 49 49 84 84]
 [53 19 97 22 78 80 11 53 53 19 22 78 78 78 80 80]
 [88 25 46 10 35 50 99 88 88 25 10 35 35 35 50 50]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]]

 and following our row reduction we are given:
 [[ 0. 33. 44. 13. 66. 55. 88.  0.  0. 33. 13. 66. 66. 66. 55. 55.]
 [15. 69. 58. 49.  9.  5.  0. 15. 15

### Section 4: Crossing Out the Zeros

With the matrix both row- and column-reduced, we next "cross out" its zeros, covering all of them with the minimum possible number of horizontal and vertical lines.

Unfortunately, we are not aware of a Python function that draws these lines on a NumPy array, so we implement the step with array manipulations.



*To begin, we convert the NumPy array into a boolean array whose entries indicate whether the corresponding value is 0 (True) or not (False).
A copy of this array is passed to a helper that marks an initial set of zeros simply by picking the first zero in each row.
The coordinates it returns are then split into separate arrays of row indices and column indices.*

*We then sift through the boolean matrix column by column to iteratively find the minimal set of horizontal and vertical "lines" needed to cover every zero.*
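For comparison, here is a sketch of the textbook way to find a minimum line cover. It is a different technique from the column-by-column sweep described above: find a maximum matching on the zero cells with augmenting paths (Kuhn's algorithm), then build the cover from that matching using König's theorem, which guarantees that the fewest lines needed equals the size of the matching. min_line_cover and max_zero_matching are hypothetical names. Unlike HAL.Step_3_Line_Check, this sketch returns only the line indices, not the marked zeros.

```python
import numpy as np


def max_zero_matching(zeros):
    """Maximum matching of rows to columns through zero cells (Kuhn's algorithm).

    Returns match_col, where match_col[c] is the row matched to column c, or -1.
    """
    n_rows, n_cols = zeros.shape
    match_col = [-1] * n_cols

    def try_row(r, seen):
        # Search for an augmenting path that starts at row r.
        for c in np.flatnonzero(zeros[r]):
            if not seen[c]:
                seen[c] = True
                # Take column c if it is free, or if its current row can be re-routed.
                if match_col[c] == -1 or try_row(match_col[c], seen):
                    match_col[c] = r
                    return True
        return False

    for r in range(n_rows):
        try_row(r, [False] * n_cols)
    return match_col


def min_line_cover(reduced):
    """Return (row_lines, col_lines): the fewest lines covering every zero."""
    zeros = np.isclose(reduced, 0)
    n_rows = zeros.shape[0]
    match_col = max_zero_matching(zeros)
    match_row = [-1] * n_rows
    for c, r in enumerate(match_col):
        if r != -1:
            match_row[r] = c

    # Mark unmatched rows, then alternate: a marked row marks every column where
    # it has a zero, and a marked column marks the row matched to it.
    marked_rows = {r for r in range(n_rows) if match_row[r] == -1}
    marked_cols = set()
    frontier = list(marked_rows)
    while frontier:
        r = frontier.pop()
        for c in map(int, np.flatnonzero(zeros[r])):
            if c not in marked_cols:
                marked_cols.add(c)
                matched = match_col[c]
                # A maximum matching leaves no marked column unmatched; the -1 check is defensive.
                if matched != -1 and matched not in marked_rows:
                    marked_rows.add(matched)
                    frontier.append(matched)

    # König's theorem: unmarked rows plus marked columns cover every zero,
    # and their count equals the size of the maximum matching.
    row_lines = [r for r in range(n_rows) if r not in marked_rows]
    col_lines = sorted(marked_cols)
    return row_lines, col_lines


reduced = np.array([[2, 0, 2],
                    [1, 0, 5],
                    [0, 0, 0]])
row_lines, col_lines = min_line_cover(reduced)
print(row_lines, col_lines)  # [2] [1]
```

Two lines suffice for this 3 x 3 matrix, so, just like the output below, this example would have to go through the adjustment step.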


In [15]:
row_lines, col_lines, marked_zeros = HAL.Step_3_Line_Check(reduced_matrix)

print(f'This step returns the rows and columns that have lines crossing through them:')
print(f'Rows: {row_lines}')
print(f'Columns: {col_lines}')
print(f'Marked Zeros:\n {marked_zeros}')
print('\nAnd by finding the length of the line arrays and summing them, we can determine whether we move on to step 4 or 5')
print(f'Row lines({len(row_lines)}) + Column lines({len(col_lines)}) = {len(row_lines) + len(col_lines)}.')

if len(row_lines) + len(col_lines) == len(reduced_matrix):
    print("\nWe're moving to Step 5!!")
    fork = 5
else:
    print("\nWe have to do Step 4!")
    fork = 4

This step returns the rows and columns that have lines crossing through them:
Rows: [ 0  2  4  5  6  7  8  9 10 11 12 13 14 15]
Columns: [6]
Marked Zeros:
 [[ 1  6]
 [ 2  2]
 [ 4  3]
 [ 0  0]
 [ 5  1]
 [ 6  4]
 [ 7  5]
 [ 8  7]
 [ 9  8]
 [10  9]
 [11 10]
 [12 11]
 [13 12]
 [14 13]
 [15 14]]

And by finding the length of the line arrays and summing them, we can determine whether we move on to step 4 or 5
Row lines(14) + Column lines(1) = 15.

We have to do Step 4!


### Section 5: Adjusting the Matrix

If fewer than N lines are needed to cover the zeros (N being the dimension of the square matrix), the zeros do not yet contain a complete assignment, so we must adjust the matrix to uncover new zeros.

The process works by identifying uncovered cells (those not included in any row or column covered by a line) and modifying their values relative to the smallest uncovered entry. The goal is to create new zeros in the uncovered region and preserve the zeros we already have. Let’s go through this process step by step.

We **first** determine the entries that are *not* covered by any of the drawn lines and select the smallest of these values, which we call m.

m is then *subtracted from every uncovered entry* and *added to every intersection entry* (an entry whose row and column are both covered by lines from Step 3). Entries covered by exactly one line are left unchanged.

Step 4 (step4_adjust_matrix) then returns the updated cost matrix, which is passed back to Step 3 (Step_3_Line_Check). We repeat the two steps until N lines are needed to cover every zero.
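The adjustment can be expressed with boolean masks in NumPy. This is a sketch assuming that row_lines and col_lines are lists (or arrays) of covered indices, as returned by the line check. adjust_matrix is a hypothetical name, not the project's HAS.step4_adjust_matrix. It should only be called when fewer than N lines were drawn, which guarantees at least one uncovered cell.

```python
import numpy as np


def adjust_matrix(reduced, row_lines, col_lines):
    """One adjustment step (hypothetical helper): subtract the smallest uncovered
    value m from every uncovered entry and add it to every intersection entry.
    """
    adjusted = np.asarray(reduced, dtype=float).copy()
    row_covered = np.zeros(adjusted.shape[0], dtype=bool)
    col_covered = np.zeros(adjusted.shape[1], dtype=bool)
    row_covered[np.asarray(row_lines, dtype=int)] = True
    col_covered[np.asarray(col_lines, dtype=int)] = True

    # Broadcast the 1-D masks into 2-D masks over the whole matrix.
    uncovered = ~row_covered[:, None] & ~col_covered[None, :]
    intersections = row_covered[:, None] & col_covered[None, :]

    m = adjusted[uncovered].min()  # raises ValueError if nothing is uncovered
    adjusted[uncovered] -= m       # creates at least one new zero
    adjusted[intersections] += m   # entries under exactly one line are unchanged
    return adjusted


reduced = np.array([[2, 0, 2],
                    [1, 0, 5],
                    [0, 0, 0]])
print(adjust_matrix(reduced, row_lines=[2], col_lines=[1]))
```

Here m = 1: the uncovered entries (rows 0-1, columns 0 and 2) drop by 1, creating a new zero at row 1, column 0, and the single intersection (row 2, column 1) rises by 1.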

In [16]:
if fork == 4:
    while len(row_lines) + len(col_lines) != len(reduced_matrix):
        print(f'First we have {len(row_lines) + len(col_lines), len(reduced_matrix)}')
        reduced_matrix = HAS.step4_adjust_matrix(reduced_matrix, row_lines, col_lines)
        row_lines, col_lines, marked_zeros = HAL.Step_3_Line_Check(reduced_matrix)

print(f'Then we move on to Step5 because we have {len(row_lines) + len(col_lines), len(reduced_matrix)}')

First we have (15, 16)
Then we move on to Step5 because we have (16, 16)


### Section 6: Finding the Optimal Assignments

Once the matrix has been reduced and N lines are needed to cover its zeros, the next step is to assign rows to columns. This process, implemented in the function STEP5_find_solution, assigns each row to a unique zero-valued column, so the matching is strictly one-to-one, while aiming to keep the search efficient.

To find the optimal assignments efficiently, the function works in passes. The first pass directly matches every doctor that has only one candidate for assignment (i.e., every row containing exactly one zero).

Each subsequent pass matches the rows with the next-fewest remaining options, repeating until no unmatched rows are left. If the matrix was padded, some of these matches pair a real doctor or position with padding, and those doctors or positions are, in reality, unmatched.

This gives us our final result.
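The "fewest options first" idea can be sketched as a small backtracking search: repeatedly pick the unmatched row with the fewest still-available zeros, try its zero columns in turn, and undo a choice if it leads to a dead end. The backtracking is an addition of this sketch and may differ from what STEP5_find_solution actually does; it guarantees a complete assignment whenever one exists, which is always the case once N lines are needed to cover the zeros. assign_from_zeros is a hypothetical name, and it works on the reduced matrix rather than on the marked_zeros array.

```python
import numpy as np


def assign_from_zeros(reduced):
    """Match each row to a distinct zero column, most-constrained row first.

    Hypothetical helper. Returns a dict {row: col}, or None if no complete
    assignment through zero cells exists.
    """
    zeros = np.isclose(reduced, 0)
    options = {r: set(np.flatnonzero(zeros[r]).tolist()) for r in range(zeros.shape[0])}

    def solve(assigned, used):
        remaining = [r for r in options if r not in assigned]
        if not remaining:
            return dict(assigned)
        # Pick the unmatched row with the fewest still-available zero columns.
        row = min(remaining, key=lambda r: len(options[r] - used))
        for col in sorted(options[row] - used):
            assigned[row] = col
            result = solve(assigned, used | {col})
            if result is not None:
                return result
            del assigned[row]  # dead end: undo this choice and try the next column
        return None

    return solve({}, frozenset())


adjusted = np.array([[1, 0, 1],
                     [0, 0, 4],
                     [0, 1, 0]])
print(assign_from_zeros(adjusted))  # {0: 1, 1: 0, 2: 2}
```

In the doctor/hospital setting, rows at or beyond the number of real doctors are padding, so their matches would be dropped before reporting results.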

In [None]:
assignments = HAL.STEP5_find_solution(marked_zeros)
num_to_match = len(HAID.get_doctors(df))
matches_df = HAL.match_residents(assignments, df)
matches_df = matches_df.sort_values("Doctor")
matches_numeric = HAL.match_residents_numeric(assignments, prepped_df, num_to_match)
score, max_score = HAID.get_score(prepped_df, matches_numeric)
print(matches_df)
print(f'\nWe have achieved a solution of: {(1 - score / max_score) * 100:.1f}%')

  Doctor Hospital Position
3     D1               H1A
4     D2               H6A
0     D3                H3
1     D4                H7
2     D5               H4A

We have achieved a solution of: 87.6%
