# Coding A Rank Order Assignment Algorithm

The goal of the project was to write software that matches N Doctors with K Hospitals that have open residency positions. Each Doctor provides a ranked list of their preferred Hospitals; the Hospitals do not rank the Doctors.

### Section 1: Background
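This notebook walks through our implementation of the Hungarian Algorithm, which finds a minimum-cost one-to-one assignment between the rows (doctors) and columns (hospital positions) of a square cost matrix.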

For a more in-depth look at the Hungarian Algorithm, see <a href="https://www.topcoder.com/thrive/articles/Assignment%20Problem%20and%20Hungarian%20Algorithm" target="_parent">this article on topcoder.com</a>.

### Section 2: The Data

We assume our data has been collected into a spreadsheet (a CSV file) organized so that each doctor has their own row, populated with a preference ranking in [1, K] for each hospital, placed in that hospital's column.

We load this into a **pandas** DataFrame.

In [1]:
import HungarianAlgoImportData as HAID

df = HAID.import_data('Test_VarB.csv')
print(df)

  Hospital  H1
0      Num   1
1       D1   1
2       D2   2
3       D3   3


From this DataFrame, we extract the number of positions each hospital offers, then strip the DataFrame of its row and column labels so that we can work with just the cost matrix.
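To make the loading and label-stripping concrete, here is a minimal pandas-only sketch. It is not our `HAID.import_data` or `HAID.get_hospital_info`; the function name `load_preferences` and the small CSV are hypothetical, and the sketch assumes the layout visible in the output above: row labels in the first column and a `Num` row holding each hospital's number of positions.

```python
import io

import pandas as pd

# Hypothetical CSV in the layout shown above: the first column holds row labels,
# the "Num" row holds each hospital's number of positions, and every other row
# holds one doctor's rank for each hospital (1 = most preferred).
CSV_TEXT = """Hospital,H1,H2
Num,1,2
D1,1,2
D2,2,1
D3,1,2
"""


def load_preferences(csv_source):
    """Return (capacities, ranks) from a preference CSV.

    capacities: Series mapping hospital name -> number of positions.
    ranks: DataFrame of doctor ranks with the capacity row removed.
    """
    df = pd.read_csv(csv_source, index_col=0)
    capacities = df.loc["Num"].astype(int)
    ranks = df.drop(index="Num").astype(int)
    return capacities, ranks


capacities, ranks = load_preferences(io.StringIO(CSV_TEXT))
cost = ranks.to_numpy(copy=True)  # bare cost matrix, row/column labels stripped
print(capacities.to_dict())
print(cost)
```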

In [None]:
import numpy as np

hi = HAID.get_hospital_info(df)

prepped_df = HAID.prep_data(df)
prepped_array = prepped_df.to_numpy(copy = True)

print(f'get_hospital_info returns the number of positions available per hospital\n\n {hi}')
print(f'\nAnd we use this information to fill out the matrix, which then gets padded to become square\n\n {prepped_df}')
print(f'\nAt the end of the importing process we are left with the 2D array:\n\n {prepped_array}')

   H1  No Match
1   1         0
2   2         0
3   3         0
get_hospital_info returns the number of positions available per hospital

    H1
0   1

And we use this information to fill out the matrix, which then gets padded to become square

    H1  No Match
1   1         0
2   2         0
3   3         0

At the end of the importing process we are left with the 2D array:

 [[1 0]
 [2 0]
 [3 0]]


### Section 3: Reduce the Matrix

With the cost matrix created, we begin the steps of the Hungarian Algorithm.

The first step is to reduce the matrix: subtract each row's minimum value from every entry in that row, then do the same for each column using the column minimums.
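Steps 1 and 2 can be expressed with two broadcast subtractions in NumPy. This is a minimal sketch, not the code in `Hungarian_Alg_Steps`; `reduce_matrix` is a hypothetical name. Afterwards every row and every column contains at least one zero.

```python
import numpy as np


def reduce_matrix(cost):
    """Row-reduce, then column-reduce, a cost matrix (Steps 1 and 2)."""
    reduced = np.asarray(cost, dtype=float).copy()
    reduced -= reduced.min(axis=1, keepdims=True)  # Step 1: subtract each row's minimum
    reduced -= reduced.min(axis=0, keepdims=True)  # Step 2: subtract each column's minimum
    return reduced


example = np.array([[4, 1, 3],
                    [2, 0, 5],
                    [3, 2, 2]])
result = reduce_matrix(example)
print(result)
```

In this example the row minimums are 1, 0 and 2, and after subtracting them the column minimums are 1, 0 and 0, leaving `[[2, 0, 2], [1, 0, 5], [0, 0, 0]]`.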

In [3]:
import Hungarian_Alg_Steps as HAS

row_reduced_matrix = HAS.step1_row_reduction(prepped_array)
print(f'Our original cost matrix was:\n {prepped_array}\n\n and following our row reduction we are given:\n {row_reduced_matrix}')

reduced_matrix = HAS.step2_col_reduction(row_reduced_matrix)
print(f'\nAfter column reduction, we obtain our final reduced matrix:\n {reduced_matrix}')

Our original cost matrix was:
 [[0 0 0 0]
 [1 0 1 0]
 [0 0 0 0]
 [1 1 0 0]]

 and following our row reduction we are given:
 [[0. 0. 0. 0.]
 [1. 0. 1. 0.]
 [0. 0. 0. 0.]
 [1. 1. 0. 0.]]

After column reduction, we obtain our final reduced matrix:
 [[0. 0. 0. 0.]
 [1. 0. 1. 0.]
 [0. 0. 0. 0.]
 [1. 1. 0. 0.]]


### Section 4: Crossing Out the Zeros

With the matrix both row- and column-reduced, we proceed to "crossing out" all of its zeros using the minimum number of horizontal and vertical lines.

Since we are not aware of a Python function that draws lines through a NumPy array, we do this with matrix manipulations instead.



*To begin, we convert the NumPy array into a boolean array in which each entry indicates whether the original entry is 0 (True) or not (False).
A copy of this array is passed to a helper that selects an initial set of zeros by simply picking the first zero in each row.
The coordinates we get back are then split into their row indices and column indices.*

*We then sift through the boolean matrix column by column to iteratively find the minimum set of horizontal and vertical "lines" needed.*
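For comparison, a minimum set of covering lines can also be found with a standard alternative based on Kőnig's theorem: the minimum number of lines equals the size of a maximum matching on the zeros. This sketch does not reproduce `Step_3_Line_Check`; `min_line_cover` is a hypothetical helper. It first finds a maximum matching with augmenting paths (Kuhn's algorithm), then reads the cover off the rows and columns reachable from unmatched rows.

```python
import numpy as np


def min_line_cover(matrix):
    """Return (covered_rows, covered_cols), a minimum set of lines covering every zero.

    Kőnig's theorem: build a maximum matching of rows to columns through zero
    entries, then mark everything reachable by alternating paths from unmatched
    rows. Lines go through the UNmarked rows and the marked columns.
    """
    zeros = np.isclose(np.asarray(matrix, dtype=float), 0.0)
    n_rows, n_cols = zeros.shape
    match_col = [-1] * n_cols  # match_col[c] = row matched to column c, or -1

    def try_assign(r, seen):
        # Augmenting-path search from row r (Kuhn's algorithm).
        for c in range(n_cols):
            if zeros[r, c] and not seen[c]:
                seen[c] = True
                if match_col[c] == -1 or try_assign(match_col[c], seen):
                    match_col[c] = r
                    return True
        return False

    for r in range(n_rows):
        try_assign(r, [False] * n_cols)

    matched_rows = {r for r in match_col if r != -1}
    marked_rows = set(range(n_rows)) - matched_rows  # start from unmatched rows
    marked_cols = set()
    frontier = list(marked_rows)
    while frontier:
        r = frontier.pop()
        for c in range(n_cols):
            if zeros[r, c] and c not in marked_cols:
                marked_cols.add(c)  # reached via a zero (non-matching edge)
                r2 = match_col[c]   # follow the matching edge back to a row
                if r2 != -1 and r2 not in marked_rows:
                    marked_rows.add(r2)
                    frontier.append(r2)

    covered_rows = sorted(set(range(n_rows)) - marked_rows)
    covered_cols = sorted(marked_cols)
    return covered_rows, covered_cols


rows, cols = min_line_cover([[2, 0, 2], [1, 0, 5], [0, 0, 0]])
print(rows, cols)  # prints: [2] [1]
```

Here two lines suffice for a 3×3 matrix, so this example would go on to Step 4. On the 4×4 reduced matrix from Section 3 the same function needs four lines, matching the "move to Step 5" outcome above.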


In [4]:
import Hungarian_Alg_LineCheck as HAL

row_lines, col_lines, marked_zeros = HAL.Step_3_Line_Check(reduced_matrix)

print(f'This step returns the rows and columns that have lines crossing through them:')
print(f'Rows: {row_lines}')
print(f'Columns: {col_lines}')
print(f'Marked Zeros:\n {marked_zeros}')
print('\nAnd by finding the length of the line arrays and summing them, we can determine whether we move on to step 4 or 5')
print(f'Row lines({len(row_lines)}) + Column lines({len(col_lines)}) = {len(row_lines) + len(col_lines)}.')

if len(row_lines) + len(col_lines) == len(reduced_matrix):
    print("\nWe're moving to Step 5!!")
    fork = 5
else:
    print("\nWe have to do Step 4!")
    fork = 4

This step returns the rows and columns that have lines crossing through them:
Rows: [0 1 2 3]
Columns: []
Marked Zeros:
 [[1 1]
 [3 2]
 [0 0]
 [2 3]]

And by finding the length of the line arrays and summing them, we can determine whether we move on to step 4 or 5
Row lines(4) + Column lines(0) = 4.

We're moving to Step 5!!


### Section 5: Adjusting the Matrix

If fewer than N lines are enough to cover every zero, the current zeros do not yet contain a complete assignment, so we need to create new zeros.

The process works by identifying uncovered cells (those not in any row or column covered by a line) and modifying values relative to the smallest uncovered entry. The goal is to create new zeros in the uncovered region while leaving entries covered by exactly one line, including the zeros they hold, unchanged. Let’s go through this process step by step.

We **first** determine the entries that are *not* covered by any of the drawn lines, and select the smallest of these values, which we call m.

m is then *subtracted from all uncovered entries* and *added to intersection entries* (entries lying at the intersection of a row line and a column line from Step 3). Entries covered by exactly one line are left unchanged.

Step 4 then returns the updated cost matrix, which is passed back to Step 3; we repeat until N lines are needed to cover all the zeros.
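A single Step 4 update can be sketched with boolean masks. This is an illustration under our own naming (`adjust_matrix` is hypothetical, not `HAS.step4_adjust_matrix`); it takes the covered row and column indices from a Step 3 result.

```python
import numpy as np


def adjust_matrix(matrix, covered_rows, covered_cols):
    """One Step 4 update: subtract the smallest uncovered value m from every
    uncovered entry and add m to every entry covered by both a row and a column line.

    Assumes at least one entry is uncovered, which holds whenever fewer than N lines are drawn.
    """
    adjusted = np.asarray(matrix, dtype=float).copy()
    row_cov = np.zeros(adjusted.shape[0], dtype=bool)
    col_cov = np.zeros(adjusted.shape[1], dtype=bool)
    row_cov[np.asarray(covered_rows, dtype=int)] = True
    col_cov[np.asarray(covered_cols, dtype=int)] = True

    uncovered = ~row_cov[:, None] & ~col_cov[None, :]
    intersections = row_cov[:, None] & col_cov[None, :]
    m = adjusted[uncovered].min()  # smallest uncovered value
    adjusted[uncovered] -= m       # creates at least one new zero
    adjusted[intersections] += m   # entries covered once are untouched
    return adjusted


adjusted = adjust_matrix([[2, 0, 2], [1, 0, 5], [0, 0, 0]], [2], [1])
print(adjusted)
```

With row 2 and column 1 covered, m = 1, and the result `[[1, 0, 1], [0, 0, 4], [0, 1, 0]]` now needs three lines to cover its zeros.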

In [5]:
if fork == 4:
    while len(row_lines) + len(col_lines) != len(reduced_matrix):
        reduced_matrix = HAS.step4_adjust_matrix(reduced_matrix, row_lines, col_lines)
        row_lines, col_lines, marked_zeros = HAL.Step_3_Line_Check(reduced_matrix)

### Section 6: Finding the Optimal Assignments

Once the cost matrix has been reduced and its zeros identified, the next step is to assign rows to columns. This process, implemented in the function `STEP5_find_solution`, assigns each row to a unique column using only zero entries, so that every doctor receives at most one position and every position at most one doctor.

To find the assignments efficiently, the function works in passes. The first pass directly matches any doctor who has only one candidate position (i.e. rows containing exactly one zero).

Later passes match the doctors with the next-fewest remaining options, repeating until no unmatched rows are left. If the matrix was padded, some of these matches are to dummy rows or "no match" columns, so those doctors are in reality unmatched.
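The fewest-options-first idea can be sketched as follows. This is a hypothetical helper (`assign_from_zeros`), not `STEP5_find_solution`. Because a purely greedy choice can paint itself into a corner, the sketch adds backtracking: if a choice leaves some row with no free zero, it is undone and the next option is tried.

```python
import numpy as np


def assign_from_zeros(matrix):
    """Return sorted (row, col) pairs using only zero entries, one per row and column.

    Rows with the fewest available zeros are assigned first (a row with a single
    zero is forced); backtracking undoes a choice that leaves some row with no
    free zero. Returns None if no complete assignment exists among the zeros.
    """
    zeros = np.isclose(np.asarray(matrix, dtype=float), 0.0)
    n = zeros.shape[0]
    assignment = {}

    def solve(used_cols):
        if len(assignment) == n:
            return True
        # Options for each unassigned row: its zeros in columns not yet used.
        options = {
            r: [c for c in np.flatnonzero(zeros[r]) if c not in used_cols]
            for r in range(n) if r not in assignment
        }
        row = min(options, key=lambda r: len(options[r]))  # fewest options first
        for col in options[row]:
            assignment[row] = col
            if solve(used_cols | {col}):
                return True
            del assignment[row]  # undo and try the next zero in this row
        return False  # dead end: this row has no free zero left

    if not solve(frozenset()):
        return None
    return sorted((r, int(c)) for r, c in assignment.items())


print(assign_from_zeros([[1, 0, 1], [0, 0, 4], [0, 1, 0]]))
```

On the 4×4 reduced matrix from Section 3, this sketch returns the pairs (0, 0), (1, 1), (2, 3) and (3, 2), the same pairs as the marked zeros printed in Section 4.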

In [14]:
import pandas as pd

def match_residents(assignments, raw_data):
    # Get the number of actual doctors and hospital columns from the data
    doctors = HAID.get_doctors(raw_data)
    prepped_data = HAID.prep_data(raw_data)
    num_doctors = len(doctors)  # Exclude filler

    # Extract doctor and hospital names (note the hospital names will have position iteration)
    doctor_names = doctors.values  # Actual doctor names from raw data
    hospital_names = list(prepped_data.columns)  # Valid hospital columns only

    # Filter assignments to only include rows corresponding to actual doctors
    filtered_assignments = [pair for pair in assignments if pair[0] < num_doctors]

    # Prepare matches list
    matches = []
    for row, col in filtered_assignments:
        doctor = doctor_names[row]  # Map row index to actual doctor name
        hospital = hospital_names[col]  # Map column index to hospital position name

        # Append to matches list
        matches.append({"Doctor": doctor, "Hospital Position": hospital})

    # Convert matches to a DataFrame
    matches_df = pd.DataFrame(matches)

    return matches_df

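`match_residents` maps each (row, column) pair in the assignment back to the doctor's name and the hospital position they were assigned, which gives us our final result. `Match_Residents` below does the same using the row and column labels of the prepared matrix; its output is what we pass to `get_score`.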

In [13]:
def Match_Residents(assignments, prepped_data, num_doctors):

    doctor_names = list(prepped_data.index[:num_doctors])  # First `num_doctors` rows, rest are extra
    hospital_names = list(prepped_data.columns) 

    # Filter assignments to only include actual doctors (rows < num_doctors)
    filtered_assignments = [pair for pair in assignments if pair[0] < num_doctors]

    # Prepare list of matches
    matches = []
    for row, col in filtered_assignments:
        doctor = doctor_names[row]  # Map row index to actual doctor name
        hospital = hospital_names[col]  # Map column index to hospital position name

        # Append to matches list
        matches.append({"Doctor": doctor, "Hospital Position": hospital})

    # Convert matches list to a DataFrame
    matches_df = pd.DataFrame(matches)

    return matches_df

In [16]:
import pandas as pd
assignments = HAL.STEP5_find_solution(marked_zeros)
num_to_match = len(HAID.get_doctors(df))
matches_df = HAL.match_residents(assignments, df)
matches_df = matches_df.sort_values("Doctor")
matches_numeric = Match_Residents(assignments, prepped_df, num_to_match)
score, max_score = HAID.get_score(prepped_df, matches_numeric)
matches_df
score
max_score

np.int64(2)

### Section 1: Background on Hungarian Algorithm and Objective

The assignment problem is a classical optimization problem in which the goal is to find a one-to-one matching between two equal-sized sets (for example, agents and tasks) with the lowest possible total cost. In mathematical terms, it amounts to finding a minimum-weight perfect matching in a bipartite graph, where each agent–task pairing carries a weight that reflects how suitable that match is. To solve this type of problem efficiently, Harold Kuhn published the Hungarian algorithm in 1955.<sup>[1]</sup> The method transforms the cost matrix through row and column reductions and iteratively uncovers zeros until an optimal set of pairings is reached.<sup>[2]</sup> In our project, we applied this approach by converting doctors’ ranked hospital preferences into a cost matrix and running the algorithm to generate assignments that minimize overall dissatisfaction (equivalently, maximize doctor satisfaction), subject to hospital capacity limits.
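For reference, the standard textbook formulation of the problem is shown below, with n the size of the (padded) square matrix, $c_{ij}$ the cost of giving row i (a doctor) column j (a position), and $x_{ij} = 1$ when that pairing is chosen:

$$
\min_{x} \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\,x_{ij}
\quad\text{subject to}\quad
\sum_{j=1}^{n} x_{ij} = 1 \;\; \forall i,\qquad
\sum_{i=1}^{n} x_{ij} = 1 \;\; \forall j,\qquad
x_{ij} \in \{0, 1\}
$$

Because every feasible assignment uses exactly one entry from each row and each column, subtracting a constant from one row (or column) of $c$ lowers the cost of every assignment by that same constant. This is why the row and column reductions never change which assignment is optimal.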

### Section 2: Model Assumptions
- Each doctor provides a complete ranking of all hospitals with no ties or missing values
- Hospitals do not rank doctors, only doctors rank hospitals
- Each hospital has a fixed number of positions (its capacity), given in the CSV
- The total number of positions may differ from the number of doctors N. Our implementation handles this by:
  - Duplicating hospital columns to represent multiple slots
  - Adding “no match” dummy columns/rows to make the cost matrix square
- Doctors can be unmatched if no hospital slots remain
- Ties between equally optimal assignments are broken arbitrarily, except that rows with only one available zero are assigned first
- The objective considers rankings only, and no other factors
- We assume a reasonable penalty for “no match”, so it is only chosen when necessary **explicitly define the value - Jonathan**
- Ranks are used directly as costs, so the cost gap between a doctor's 1st and 2nd choice is the same as between their 2nd and 3rd (a linear penalty)
- The Hungarian algorithm always finds an optimal assignment for a square cost matrix
- If multiple optimal solutions exist, the particular assignment chosen depends on how zeros are marked/traversed in the algorithm
- The Hungarian algorithm is fine for small and medium test cases but can become slow for very large datasets: the classic version runs in O(n³) time on an n × n matrix, and simpler implementations can take O(n⁴) **quantitative size and duration - Jonathan**
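Since several optimal assignments can exist, a useful test is to compare the *total cost* of the Hungarian result against an exhaustive search, rather than comparing the exact pairs. The helper below (`brute_force_assignment`, a hypothetical name) tries all n! permutations, so it is only practical for very small matrices.

```python
from itertools import permutations


def brute_force_assignment(cost):
    """Exhaustively find a minimum-cost one-to-one assignment for a small square matrix.

    Returns (pairs, total). Runs in O(n!) time, so use it only as a test oracle.
    """
    n = len(cost)
    best_cols, best_total = None, float("inf")
    for cols in permutations(range(n)):  # cols[r] = column assigned to row r
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if total < best_total:
            best_cols, best_total = cols, total
    return [(r, c) for r, c in enumerate(best_cols)], best_total


pairs, total = brute_force_assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
print(pairs, total)
```

A test would then assert that the summed original costs of the pairs returned by the Hungarian pipeline equal `total`.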

### Section 3: Design Decisions

- We chose to represent the problem in a CSV file where each row corresponds to a doctor’s ranked preferences, columns correspond to hospitals, and the header row encodes hospital capacities. This made it easy to test with different datasets, using pandas for preprocessing.
- Hospitals with more than one slot are expanded into multiple columns, such as H1A and H1B, so that the problem becomes one-to-one and fits the Hungarian algorithm’s requirements<sup>[3]</sup>
- Because the Hungarian algorithm requires a square cost matrix, we pad the smaller dimension with dummy rows/columns when necessary. A special “no match” column represents doctors who cannot be placed when the number of doctors exceeds the number of available positions
- Doctors’ rankings are directly converted into costs (lower rank = lower cost). This makes the optimization objective “minimize cost” equivalent to “maximize doctor satisfaction.”
- We modularized the algorithm into steps based on a YouTube video explanation of the algorithm, which made the logic easier to debug and explain<sup>[4]</sup>:
     - Step 1: Row reduction
     - Step 2: Column reduction
     - Step 3: Identify zeros and cover them with minimal lines
     - Step 4: Adjust the matrix when coverage is insufficient
     - Step 5: Extract optimal assignments from zero positions
- We wrote helper functions to map matrix indices back to actual doctor and hospital names, and return results as a clean pandas DataFrame
- To evaluate results, we defined a scoring function that sums the rank costs of assigned matches and compares this to a maximum possible dissatisfaction score. This gave us a percentage measure of solution quality
- By separating data import, preprocessing, algorithm steps, and output formatting into different modules, we kept the code design flexible and easy to maintain
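The capacity expansion and square padding described above can be sketched in a few lines of NumPy. This is not our `prep_data`; `expand_and_pad` is a hypothetical helper, and `no_match_cost` is left as a parameter because its value is not fixed here.

```python
import numpy as np


def expand_and_pad(ranks, capacities, no_match_cost):
    """Build a square cost matrix from doctor ranks and hospital capacities.

    ranks: (n_doctors, n_hospitals) array of ranks (1 = most preferred).
    capacities: number of positions per hospital, one entry per column.
    no_match_cost: assumed cost of leaving a doctor unplaced.
    """
    ranks = np.asarray(ranks, dtype=float)
    # One column per position: a hospital with 2 slots contributes 2 identical columns.
    cost = np.repeat(ranks, capacities, axis=1)
    n_doctors, n_positions = cost.shape
    size = max(n_doctors, n_positions)
    padded = np.zeros((size, size))
    padded[:n_doctors, :n_positions] = cost
    # Extra columns act as "no match" slots for surplus doctors.
    # Extra rows (dummy doctors for unused positions) keep cost 0.
    padded[:n_doctors, n_positions:] = no_match_cost
    return padded


print(expand_and_pad([[1, 2], [2, 1], [1, 2]], [1, 1], 0))
```

One consequence of this padding scheme: when there are more doctors than positions, every complete assignment uses every "no match" column exactly once, so a constant `no_match_cost` adds the same amount to every assignment and does not change which doctors are placed.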

### Section 4: Labour Division

- Sakshi worked on implementing Step 1 (row reduction), Step 2 (column reduction), and Step 4 (matrix adjustment) functions of the Hungarian algorithm. She also prepared this project write-up and developed testing functions to check that the code was running correctly

- Jonathan focused on Step 3 of the Hungarian algorithm, which involved crossing out zeros and determining the minimum number of lines needed to cover them. He also put together the main demo script that showcased the full workflow of the algorithm on sample data. Additionally, he contributed to the implementation of Step 5 of the Hungarian Algorithm. 

- Nnemdi handled the initial data import and preprocessing, including setting up hospital capacities and padding the matrix so it could be used by the algorithm. In addition, she implemented Step 5 of the Hungarian algorithm, which determines the final assignments of doctors to hospitals from the reduced cost matrix

### Section 5: Organization of this Repository
Our code is split into several files to maintain an orderly, modular structure:
- Main.py: runs the full pipeline, tying everything together and printing the final matches and satisfaction score %
- Demo.ipynb: a notebook we used to show step-by-step outputs and explain the algorithm
- HungarianAlgoImportData.py: handles data loading, hospital capacities, prepping the matrix and scoring results
- Hungarian_Alg_Steps.py: contains the core steps of the Hungarian algorithm (row reduction, column reduction, and matrix adjustment)
- Hungarian_Alg_LineCheck.py: handles crossing out zeros (Step 3), the final assignment step (Step 5), and mapping results back to doctor and hospital names
- README.md: Contains this specific project write up and description
- CSV files: different test datasets we used to check the algorithm's implementation

### References (cited):

[1] The Hungarian Algorithm for the Assignment Problem. The Department of Computer Science. Accessed September 13, 2025. http://www.cs.emory.edu/~cheung/Courses/253/Syllabus/Assignment/algorithm.html.

[2] Efimov V. The Hungarian algorithm and its applications in computer vision. Towards Data Science. September 9, 2025. Accessed September 13, 2025. https://towardsdatascience.com/hungarian-algorithm-and-its-applications-in-computer-vision/.

[3] Hungarian algorithm for solving the assignment problem. Hungarian Algorithm - Algorithms for Competitive Programming. December 13, 2023. Accessed September 13, 2025. https://cp-algorithms.com/graph/hungarian-algorithm.html.

[4] The Munkres Assignment Algorithm (Hungarian Algorithm). YouTube. Accessed September 13, 2025. https://www.youtube.com/watch?v=cQ5MsiGaDY8&t=290s&pp=ygUdaHVuZ2FyaWFuIGFsZ29yaXRobSBleHBsYWluZWQ%3D.
