# Coding A Rank Order Assignment Algorithm

The goal of the project was to write software that matches N Doctors with K Hospitals that have open residency positions. Each Doctor provides a ranked list of their Hospital preferences; the Hospitals do not rank the Doctors.

### Section 1: Background
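We solve the matching with the Hungarian Algorithm, which finds a minimum-cost one-to-one assignment of the rows of a square cost matrix to its columns. In our case the rows are doctors, the columns are hospital positions, and each entry is the rank the doctor gave that hospital, so the optimal assignment makes the doctors' total rank as small (that is, as preferred) as possible. The algorithm works in steps: (1) subtract each row's minimum from that row; (2) subtract each column's minimum from that column; (3) cover every zero with the fewest possible horizontal and vertical lines; (4) if fewer than n lines are needed for an n × n matrix, subtract the smallest uncovered value from every uncovered entry, add it to every entry covered by two lines, and return to step 3; (5) once n lines are needed, pick n zeros with one in each row and each column, which gives an optimal assignment.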

For a more in-depth look at the Hungarian Algorithm, see <a href="https://www.topcoder.com/thrive/articles/Assignment%20Problem%20and%20Hungarian%20Algorithm" target="_parent">this article on topcoder.com</a>.
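
Formally, take an $n \times n$ cost matrix $C$. The assignment problem asks for a permutation $\sigma$ of the columns with the smallest total cost, where row $i$ is assigned column $\sigma(i)$ and $S_n$ is the set of all such permutations. Subtracting a constant $r_i$ from every entry of row $i$ lowers every permutation's total by the same amount, $\sum_i r_i$, so the optimal $\sigma$ does not change. The same argument covers a constant $c_j$ subtracted from column $j$, because each permutation uses every column exactly once. This is why the row and column reductions of the Hungarian Algorithm are safe:

$$
\min_{\sigma \in S_n} \sum_{i=1}^{n} C_{i,\sigma(i)},
\qquad
\sum_{i=1}^{n} \left( C_{i,\sigma(i)} - r_i \right) = \sum_{i=1}^{n} C_{i,\sigma(i)} - \sum_{i=1}^{n} r_i .
$$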

### Section 2: The Data

We assume our data has been collected into a spreadsheet (a CSV file) in which each doctor has their own row. That row holds the doctor's preference ranking (from 1 to K) for each hospital in that hospital's column. The first row, labelled `Num`, gives the number of open positions at each hospital.

We load this into a **pandas** DataFrame.
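
If `HAID.import_data` is essentially a wrapper around `pandas.read_csv` (an assumption, since its source isn't shown), the CSV behind the table below would look something like this. The inline string stands in for `BDD Test.csv` so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical file contents with the layout described above:
# a "Num" row of open positions, then one row of rankings per doctor.
csv_text = """Hospital,H1,H2,H3
Num,2,2,1
D1,1,2,3
D2,2,1,3
D3,3,2,1
D4,1,3,2
"""

# read_csv accepts any file-like object; with a real file, pass its path instead
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```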

In [1]:

    import HungarianAlgoImportData as HAID

    df = HAID.import_data('BDD Test.csv')
    print(df)

OUTPUT:

      Hospital  H1  H2  H3
    0      Num   2   2   1
    1       D1   1   2   3
    2       D2   2   1   3
    3       D3   3   2   1
    4       D4   1   3   2


From this DataFrame, we extract the information provided for each hospital (the `Num` row of open positions). We then strip the row and column labels so that we can work with just the cost matrix.
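
Below is a sketch of the same split using plain pandas indexing, assuming the layout above. `split_hospital_info` is a hypothetical name and is not part of `HungarianAlgoImportData`. The sketch stops at the 4 × 3 block of rankings. The `prepped_array` printed below is 4 × 4 with H1's column repeated, so `prep_data` evidently does additional column handling that this sketch does not attempt to reproduce.

```python
import pandas as pd

# The DataFrame printed above, rebuilt by hand so the snippet is self-contained
df = pd.DataFrame({
    "Hospital": ["Num", "D1", "D2", "D3", "D4"],
    "H1": [2, 1, 2, 3, 1],
    "H2": [2, 2, 1, 2, 3],
    "H3": [1, 3, 3, 1, 2],
})


def split_hospital_info(frame):
    """Hypothetical helper: return ({hospital: open positions}, rank array)."""
    labelled = frame.set_index("Hospital")  # row labels: Num, D1, ..., D4
    positions = {h: int(n) for h, n in labelled.loc["Num"].items()}
    # Drop the positions row; what remains is doctors x hospitals, without labels
    ranks = labelled.drop(index="Num").to_numpy(copy=True)
    return positions, ranks


positions, ranks = split_hospital_info(df)
print(positions)
print(ranks)
```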

In [2]:

    hi = HAID.get_hospital_info(df)

    prepped_df = HAID.prep_data(df)
    prepped_array = prepped_df.to_numpy(copy=True)

    print(f'get_hospital_info returns the number of positions available per hospital\n {hi}')
    print(f'At the end of the importing process we are left with the 2D array:\n {prepped_array}')

OUTPUT:

    get_hospital_info returns the number of positions available per hospital
       Hospital  H1  H2  H3
    0      Num   2   2   1
    At the end of the importing process we are left with the 2D array:
     [[1 2 3 1]
     [2 1 3 2]
     [3 2 1 3]
     [1 3 2 1]]


### Section 3: Reduce the Matrix

With our cost matrix created, we begin the steps of the Hungarian Algorithm.

As previously explained, the first step is to reduce the matrix. We subtract the minimum value of each row from every entry in that row, then do the same for each column.
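
NumPy broadcasting turns both reductions into one-liners. The sketch below is not the code in `Hungarian_Alg_Steps`, which isn't shown; `reduce_matrix` is a hypothetical name. `keepdims=True` keeps each set of minima as a column (or row) vector, so it is subtracted along the correct axis.

```python
import numpy as np


def reduce_matrix(cost):
    """Row reduction followed by column reduction (illustrative sketch)."""
    cost = np.asarray(cost, dtype=float)
    # Subtract each row's minimum from every entry in that row
    row_reduced = cost - cost.min(axis=1, keepdims=True)
    # Then subtract each column's minimum from every entry in that column
    return row_reduced - row_reduced.min(axis=0, keepdims=True)


cost = np.array([[1, 2, 3, 1],
                 [2, 1, 3, 2],
                 [3, 2, 1, 3],
                 [1, 3, 2, 1]])
print(reduce_matrix(cost))

# A case where the column step matters: both rows become [0, 1] after row
# reduction, then column 1's minimum (1) is removed, leaving all zeros.
print(reduce_matrix([[2, 3], [4, 5]]))
```

With the cost matrix from this example, every column already contains a zero after row reduction. That is why the column step leaves the matrix unchanged in the output below.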

In [3]:

    import Hungarian_Alg_Steps as HAS

    row_reduced_matrix = HAS.step1_row_reduction(prepped_array)
    print(f'Our original cost matrix was:\n {prepped_array}\n\n and following our row reduction we are given:\n {row_reduced_matrix}')

    reduced_matrix = HAS.step2_col_reduction(row_reduced_matrix)
    print(f'\nAfter column reduction, we obtain our final reduced matrix:\n {reduced_matrix}')

OUTPUT:

    Our original cost matrix was:
     [[1 2 3 1]
     [2 1 3 2]
     [3 2 1 3]
     [1 3 2 1]]

     and following our row reduction we are given:
     [[0. 1. 2. 0.]
     [1. 0. 2. 1.]
     [2. 1. 0. 2.]
     [0. 2. 1. 0.]]

    After column reduction, we obtain our final reduced matrix:
     [[0. 1. 2. 0.]
     [1. 0. 2. 1.]
     [2. 1. 0. 2.]
     [0. 2. 1. 0.]]


### Section 4: Crossing Out the Zeros

With the matrix both row and column reduced, we move on to "crossing out" every zero using the minimum number of horizontal and vertical lines.

Unfortunately, we are not aware of a Python function that draws lines through a NumPy array, so we turn to matrix manipulations instead.



*To begin, we convert the NumPy array into a boolean array whose entries indicate whether the corresponding value is 0 (True) or not (False).
A copy of this array is passed to a helper, `min_zero_row`, which marks an initial set of zeros by picking the first occurrence in each row.
The coordinates we get back are then split into separate lists of row indices and column indices.*
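
`min_zero_row` is called below but not shown. The following is a minimal sketch of one way it could work. It assumes a common pattern: pick the row with the fewest remaining zeros (the first such row on ties), mark the first zero in it, then clear that zero's row and column in the copy so the calling loop eventually ends. This is a guess at the helper, not its actual source. On the reduced matrix from Section 3, it happens to reproduce the order of `marked_zeros` printed later in this section.

```python
import numpy as np


def min_zero_row(zero_mat, marked_zero):
    """Hypothetical helper: mark one zero, then clear its row and column.

    zero_mat is a boolean array (True where the cost is 0); it is modified in place.
    """
    counts = zero_mat.sum(axis=1)                     # remaining zeros per row
    candidates = np.flatnonzero(counts > 0)           # rows that still have a zero
    row = candidates[np.argmin(counts[candidates])]   # fewest zeros, first on ties
    col = np.flatnonzero(zero_mat[row])[0]            # first zero in that row
    marked_zero.append((int(row), int(col)))
    zero_mat[row, :] = False                          # nothing else in this row...
    zero_mat[:, col] = False                          # ...or this column can be marked


reduced = np.array([[0., 1., 2., 0.],
                    [1., 0., 2., 1.],
                    [2., 1., 0., 2.],
                    [0., 2., 1., 0.]])
zero_copy = reduced == 0   # working boolean copy
marked = []
while zero_copy.any():
    min_zero_row(zero_copy, marked)
print(marked)  # [(1, 1), (2, 2), (0, 0), (3, 3)]
```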


CODE:

    import numpy as np

    def Step_3_Line_Check(mat):

        # Matrix --> Boolean Matrix (0 True, else False)
        current_mat = mat
        zero_bool_mat = (current_mat == 0)
        zero_bool_mat_copy = zero_bool_mat.copy()

        # Mark an initial set of zeros. min_zero_row (defined elsewhere) records
        # one zero per call in marked_zero and clears entries from the copy,
        # so the loop ends once no zeros remain.
        marked_zero = []
        while zero_bool_mat_copy.any():
            min_zero_row(zero_bool_mat_copy, marked_zero)

        # Separate row and col indices for later use
        marked_zero_row = [row for row, _ in marked_zero]
        marked_zero_col = [col for _, col in marked_zero]

        # Rows that did not receive a marked zero
        non_marked_row = list(set(range(current_mat.shape[0])) - set(marked_zero_row))

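*We then alternate two marking passes until a full pass changes nothing. Starting from the rows that received no marked zero, we mark every column that contains a zero in one of those rows. We then add to that set of rows every row whose marked zero lies in a marked column. The horizontal "lines" go through the rows left outside the set, and the vertical "lines" go through the marked columns.*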

CODE:

        marked_cols = []
        change_made = True
        while change_made:  # Stop once a full pass changes nothing
            change_made = False
            for curr_row in range(len(non_marked_row)):
                row_array = zero_bool_mat[non_marked_row[curr_row], :]
                for curr_col in range(row_array.shape[0]):
                    # Mark every column holding a zero in one of these rows
                    if row_array[curr_col] and curr_col not in marked_cols:
                        marked_cols.append(curr_col)
                        change_made = True

            for row_num, col_num in marked_zero:
                # A row whose marked zero lies in a marked column joins the set
                if row_num not in non_marked_row and col_num in marked_cols:
                    non_marked_row.append(row_num)
                    change_made = True

        # Horizontal lines go through every row outside that set
        marked_rows = list(set(range(mat.shape[0])) - set(non_marked_row))

        return np.array(marked_rows), np.array(marked_cols), np.array(marked_zero)
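
The marking loop above gives a *minimum* set of lines only when the first pass has found as many independent zeros as possible (König's theorem). A greedy first pass does not always manage that. For the zero pattern `[[0, 0], [0, 1]]`, taking row 0's first zero (column 0) leaves row 1 with nothing, even though (0, 1) and (1, 0) work together. Whether this affects `min_zero_row` depends on how it orders its choices.

The sketch below shows one way to make that step exact. It swaps in an augmenting-path matching (Kuhn's algorithm) for the greedy helper, then applies the same row/column marking. `max_zero_matching` and `min_line_cover` are hypothetical names and are not part of `Hungarian_Alg_LineCheck`.

```python
import numpy as np


def max_zero_matching(mat):
    """Largest set of independent zeros via augmenting paths (Kuhn's algorithm).

    Returns a dict mapping column index -> row index for each marked zero.
    """
    n_rows, n_cols = mat.shape
    match_col = {}

    def try_row(r, seen):
        # Give row r a zero, re-routing previously matched rows if needed
        for c in range(n_cols):
            if mat[r, c] == 0 and c not in seen:
                seen.add(c)
                if c not in match_col or try_row(match_col[c], seen):
                    match_col[c] = r
                    return True
        return False

    for r in range(n_rows):
        try_row(r, set())
    return match_col


def min_line_cover(mat):
    """Return (rows, cols) to draw lines through so every zero is covered."""
    mat = np.asarray(mat)
    match_col = max_zero_matching(mat)
    # Start from the rows that have no marked zero
    ticked_rows = set(range(mat.shape[0])) - set(match_col.values())
    ticked_cols = set()
    changed = True
    while changed:
        changed = False
        # Mark every column holding a zero in a ticked row
        for r in list(ticked_rows):
            for c in np.flatnonzero(mat[r] == 0):
                if int(c) not in ticked_cols:
                    ticked_cols.add(int(c))
                    changed = True
        # Tick every row whose marked zero sits in a marked column
        for c, r in match_col.items():
            if c in ticked_cols and r not in ticked_rows:
                ticked_rows.add(r)
                changed = True
    # Lines go through the unticked rows and the marked columns
    line_rows = sorted(set(range(mat.shape[0])) - ticked_rows)
    return line_rows, sorted(ticked_cols)


reduced = np.array([[0., 1., 2., 0.],
                    [1., 0., 2., 1.],
                    [2., 1., 0., 2.],
                    [0., 2., 1., 0.]])
print(min_line_cover(reduced))                            # ([0, 1, 2, 3], [])
print(min_line_cover([[0, 1, 2], [0, 3, 4], [0, 5, 6]]))  # ([], [0])
```

The second call is a case where one line suffices for a 3 × 3 matrix, so the Step 4 branch would be taken.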

In [11]:

    import Hungarian_Alg_LineCheck as HAL

    row_lines, col_lines, marked_zeros = HAL.Step_3_Line_Check(reduced_matrix)

    print('This step returns the rows and columns that have lines crossing through them:')
    print(row_lines)
    print(col_lines)
    print(marked_zeros)
    print('\nAnd by finding the length of the line arrays and summing them, we can determine whether we move on to step 4 or 5')
    print(f'Row lines({len(row_lines)}) + Column lines({len(col_lines)}) = {len(row_lines) + len(col_lines)}.')

    if len(row_lines) + len(col_lines) == len(reduced_matrix):
        print("\nWe're moving to Step 5!!")
    else:
        print("\nWe have to do Step 4!")

OUTPUT:

    This step returns the rows and columns that have lines crossing through them:
    [0 1 2 3]
    []
    [[1 1]
     [2 2]
     [0 0]
     [3 3]]

    And by finding the length of the line arrays and summing them, we can determine whether we move on to step 4 or 5
    Row lines(4) + Column lines(0) = 4.

    We're moving to Step 5!!
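
The marked zeros printed above occupy four different rows and four different columns, so they already form a complete assignment. A brute-force search over all 4! = 24 permutations gives a sanity check that is independent of the line-covering code. It is a sketch and only practical for small matrices. It should find the same minimum total rank on the original cost matrix: 4, meaning every doctor receives a hospital they ranked first.

```python
from itertools import permutations

# Cost matrix printed in Section 2 (rows = doctors D1..D4, columns as in prepped_array)
cost = [[1, 2, 3, 1],
        [2, 1, 3, 2],
        [3, 2, 1, 3],
        [1, 3, 2, 1]]


def total_cost(cost, assignment):
    """Sum of cost[row][col] over the (row, col) pairs in an assignment."""
    return sum(cost[r][c] for r, c in assignment)


def brute_force_assignment(cost):
    """Try all n! row-to-column permutations; only practical for small n."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return [(i, best[i]) for i in range(n)]


marked_zeros = [(1, 1), (2, 2), (0, 0), (3, 3)]  # as printed by Step_3_Line_Check above
best = brute_force_assignment(cost)
print(best)                                                    # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(total_cost(cost, marked_zeros), total_cost(cost, best))  # 4 4
```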
