Ema's Quantum Computer

https://www.hackerrank.com/challenges/two-pluses/problem?isFullScreen=true

Implementing Ema's Quantum Computer is involved, because several interacting factors come into play. Currently, the proposed solutions brute force the problem, but we can do better. The problem parallels several others, and a number of tools and strategies can be used to sharply reduce the search time.

Problem description: Ema has an n x m grid of good 'G' and bad 'B' cells. We need to find the largest product of the volumes (cell counts) of two non-overlapping plus signs drawn on good cells. A valid plus has width 1, four arms of equal length, and a centerpoint at which all four arms meet. A single good cell counts as a plus with arm length 0.
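To make the definitions concrete, here is a minimal sketch of a plus as a set of cells. The helper names `plus_cells`, `plus_volume`, and `overlaps` are hypothetical and are not used by the notebook code further down; the sketch ignores the grid and only models the geometry.

```python
def plus_cells(row, col, arm):
    """Return the set of cells covered by a plus centered at (row, col)."""
    cells = {(row, col)}
    for k in range(1, arm + 1):
        # Each step outward adds one cell per arm
        cells.update({(row - k, col), (row + k, col), (row, col - k), (row, col + k)})
    return cells


def plus_volume(arm):
    """A plus with the given arm length covers 4 * arm + 1 cells."""
    return 4 * arm + 1


def overlaps(p, q):
    """p and q are (row, col, arm) triples; True if the plusses share a cell."""
    return not plus_cells(*p).isdisjoint(plus_cells(*q))
```

For instance, `overlaps((0, 0, 2), (1, 2, 2))` is true because both plusses cover cell (0, 2), while shrinking both arms to 1 makes them disjoint.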

A list of the most notable tools available:
- Dictionary Indexing
- L1 norm spatial sorting
- Key sorting and greedy algorithms
- Early stopping heuristics
- Spatially bounded algorithms that reduce n^4 computation to n^3
- Convex Hulls with reduced dimensionality

Step 1: Finding plusses
The comparison tools to find plusses in an optimal manner largely boil down to a single strategy: Trading space complexity for time complexity.
Any solution has to read all n * m cells, so O(n * m) is a lower bound on the processing time. The populate function below does just that.
- An initial row by row pass finds and memorizes all the rightmost indexes of a segment and notes both the segment length and rightmost index for each cell in the segment
- A second column by column pass finds the same for each column and enters the largest possible plus at each cell into the dictionary of class Plus
This strategy takes advantage of the fact that given two endpoints and a length, a cell centerpoint of a plus is able to compute its own maximal size in constant time.
As a result, the dictionary Plus.plusses and Plus.decr_plus are both populated in O(n * m) time
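As an illustration of the same O(n * m) idea, the sketch below computes the maximal arm length at every good cell with four directional run-length tables. This is a swapped-in variant, not the notebook's segment-endpoint approach, and the name `max_arms` is hypothetical.

```python
def max_arms(grid):
    """Return {(row, col): arm} for the largest plus centered at each good cell."""
    n, m = len(grid), len(grid[0])
    # Each table counts consecutive good cells ending at (r, c) in one direction
    left = [[0] * m for _ in range(n)]
    up = [[0] * m for _ in range(n)]
    right = [[0] * m for _ in range(n)]
    down = [[0] * m for _ in range(n)]

    # Forward pass fills the runs coming from the left and from above
    for r in range(n):
        for c in range(m):
            if grid[r][c] == 'G':
                left[r][c] = (left[r][c - 1] if c > 0 else 0) + 1
                up[r][c] = (up[r - 1][c] if r > 0 else 0) + 1

    # Backward pass fills the runs coming from the right and from below
    for r in range(n - 1, -1, -1):
        for c in range(m - 1, -1, -1):
            if grid[r][c] == 'G':
                right[r][c] = (right[r][c + 1] if c < m - 1 else 0) + 1
                down[r][c] = (down[r + 1][c] if r < n - 1 else 0) + 1

    arms = {}
    for r in range(n):
        for c in range(m):
            if grid[r][c] == 'G':
                # Runs include the center cell, so subtract 1 to get the arm length
                arms[(r, c)] = min(left[r][c], up[r][c], right[r][c], down[r][c]) - 1
    return arms
```

The volume at each center is then 4 * arm + 1, which is the value the notebook stores in Plus.plusses.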

Step 2: Finding the largest product.
A naive implementation would brute force every pair of plus signs, including the smaller plus signs nested inside each maximal plus.
Over the centers alone this is an ((n * m) choose 2) problem, leading to O(n^2 * m^2) pair comparisons.
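For reference, a brute-force baseline along these lines can be sketched as follows. It is meant as a correctness oracle for small grids rather than the notebook's method, and the name `brute_force_two_pluses` is hypothetical.

```python
from itertools import combinations


def brute_force_two_pluses(grid):
    """Try every pair of valid plusses (trimmed ones included) and keep the best product."""
    n, m = len(grid), len(grid[0])
    pluses = []  # (frozenset of cells, volume) for every valid plus
    for r in range(n):
        for c in range(m):
            if grid[r][c] != 'G':
                continue
            cells = {(r, c)}
            pluses.append((frozenset(cells), 1))
            k = 1
            # Grow the plus one step at a time while all four arm tips stay on good cells
            while (r - k >= 0 and r + k < n and c - k >= 0 and c + k < m
                   and grid[r - k][c] == grid[r + k][c] == grid[r][c - k] == grid[r][c + k] == 'G'):
                cells |= {(r - k, c), (r + k, c), (r, c - k), (r, c + k)}
                pluses.append((frozenset(cells), 4 * k + 1))
                k += 1

    best = 0
    for (cells_a, vol_a), (cells_b, vol_b) in combinations(pluses, 2):
        if cells_a.isdisjoint(cells_b):
            best = max(best, vol_a * vol_b)
    return best
```

On the small grid ['GGGGGG', 'GBBBGB', 'GGGGGG', 'GGBBGB', 'GGGGGG'] the only plus with arm length 1 is centered at (2, 4), so the best product is 5 * 1 = 5.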

Optimization: Heuristics
- Plus.decr_plus is keyed by the volume of the plus signs. It is provable that there are at most min(n, m) distinct plus sizes, since an arm cannot be longer than half the shorter grid side.
- As a result, sorting the keys takes O(min(n, m) * log(min(n, m))), well within bounds.

Doing so allows us to consider the largest possible candidates first and, if they overlap, move to successively smaller ones until further searches are no longer required.
In the vast majority of cases this is enough, leading to a near constant time search for the largest possible product.
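The descending search with early stopping can be sketched as below. This is a simplified illustration, not the notebook's find_max: it assumes the input dictionary lists each center under every volume it supports (trimmed plusses included), and the names `cells_of` and `best_product` are hypothetical.

```python
def cells_of(center, volume):
    """Cells covered by a plus of the given volume centered at `center`."""
    row, col = center
    arm = (volume - 1) // 4
    cells = {(row, col)}
    for k in range(1, arm + 1):
        cells |= {(row - k, col), (row + k, col), (row, col - k), (row, col + k)}
    return cells


def best_product(centers_by_volume):
    """centers_by_volume maps volume -> list of (row, col) centers supporting it."""
    volumes = sorted(centers_by_volume, reverse=True)
    best = 0
    for i, big in enumerate(volumes):
        if big * big <= best:
            break  # No remaining pair can beat the current best
        for small in volumes[i:]:
            if big * small <= best:
                break  # Products only shrink as `small` decreases
            # Any disjoint pair at these two volumes sets a new best
            if any(a != b and cells_of(a, big).isdisjoint(cells_of(b, small))
                   for a in centers_by_volume[big]
                   for b in centers_by_volume[small]):
                best = big * small
    return best
```

Because the volumes are visited in descending order, both loops can stop as soon as the ideal product of the current pair of volumes no longer exceeds the best product found.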

However, we want to do better than this and push the absolute limits of algorithmic optimization. Below is a concept of how to approach this.
To be implemented:

Bounding Box algorithm:
- Given an array of equal size plusses, a bounding box can be constructed where plus centers inside the box must intersect at least one other plus of the same size and plus signs that fall outside of the bounding box must necessarily be distinct. This alone massively reduces the compute time of the first pass of the algorithm.
- Taking advantage of the fact that plusses of equal size must overlap twice when they intersect, we get a fast solution for computing the largest product when they intersect:
    - Let x be the distance between two centers on the x axis, y be the distance between two centers on the y axis
    - Case 1: x == 0 or y == 0 -- The arms overlap, maximum product is when the arm length is equally divided between the two plus signs. Constant time solution.
    - Case 2: x == y -- The centers are directly diagonal to each other. Maximum product results in the trimming of one plus until there's no intersect. Constant time solution.
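    - Case 3: None of the above -- Both plusses are trimmed until they slip past each other, leaving arm length max(x, y) - 1 and a product of ((max(x, y) - 1) * 4 + 1) ^ 2. The alternative, which find_prod also checks, is to trim only one plus to arm length min(x, y) - 1 and keep the other whole. Linear distance metric.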
        - Key observation: A 1d convex hull can be recorded with 1d dimensionality and constant time insertion
        - Key observation: Instead of doing a full convex hull, a 90 degree convex hull from each corner of the bounding box is sufficient to reduce runtime.
        - Key observation: The convex hull created contains the mutually furthest distances (wrt maximum product) between a point and its corner.
        - A convex hull should be created for each corner.
    - Example: given a plus at (3, 3) with volume 9, a plus at (4,6) is identical to a plus at (5,6) because the optimal solution for both is to truncate to the largest of the x and y differences. Notable exceptions are overlapping plusses at (3, 3) and (3, 4) as well as directly diagonal plusses (3, 3) and (4, 4) which can be handled separately through a number of sweep line algorithms horizontally, vertically, and diagonally.
As a result, this bounding box algorithm runs in linear time for a fixed volume. Constructing the convex hull is also linear, because the constraints of the maximization function allow constant time insertion instead of the log(n) insertion of a Euclidean distance hull.
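The pairwise cases above can be cross-checked with a small sketch. It assumes two centers a and b with maximal arm lengths arm_a and arm_b, derives the trimming options directly from the overlap condition, and compares them against an exhaustive search over trimmed arms. All function names here are hypothetical, and the sketch ignores the grid entirely.

```python
def vol(arm):
    return 4 * arm + 1


def covered(center, arm):
    """Cells covered by a plus with the given arm length."""
    row, col = center
    cells = {(row, col)}
    for k in range(1, arm + 1):
        cells |= {(row - k, col), (row + k, col), (row, col - k), (row, col + k)}
    return cells


def best_pair_closed_form(a, arm_a, b, arm_b):
    """Best product of two disjoint plusses at a and b after optional trimming."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    if dx == 0 and dy == 0:
        return 0  # Same center: the plusses can never be disjoint
    if dx == 0 or dy == 0:
        # Collinear centers: the two arms must share the cells strictly between them
        space = max(dx, dy) - 1
        if arm_a + arm_b <= space:
            return vol(arm_a) * vol(arm_b)
        small = min(arm_a, arm_b, space // 2)
        return vol(small) * vol(space - small)
    lo, hi = min(dx, dy) - 1, max(dx, dy) - 1
    return max(
        vol(min(arm_a, lo)) * vol(arm_b),            # Trim only a below min(dx, dy)
        vol(arm_a) * vol(min(arm_b, lo)),            # Trim only b below min(dx, dy)
        vol(min(arm_a, hi)) * vol(min(arm_b, hi)),   # Trim both below max(dx, dy)
    )


def best_pair_brute(a, arm_a, b, arm_b):
    """Exhaustive check over every trimmed arm length, for validation only."""
    best = 0
    for ra in range(arm_a + 1):
        for rb in range(arm_b + 1):
            if covered(a, ra).isdisjoint(covered(b, rb)):
                best = max(best, vol(ra) * vol(rb))
    return best
```

For the example centers (3, 3) and (4, 6) with arm length 2 each, neither crossing occurs and the full product 9 * 9 = 81 is kept.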

For simplicity's sake, we'll consider n == m, with the brute force being O(n^4)

Now onto the final step:
- Each distinct volume can have its own bounding box created in linear runtime over n * m elements. -- current runtime O(n^2)
- There are n distinct bounding boxes and convex hulls in the worst case
- Each convex hull has sqrt(n^2) = n possible elements including overlaps. Convex hulls are dimensionally 1 smaller than the box they cover, so sqrt(n) is the most plusses.
- For each convex hull, at most n operations are permitted to reach a global runtime of O(n^3)
- Comparing two convex hulls for largest L1 and next largest L2 plus volumes:
    - As of now, the brute force method takes O(n^(7/2)), which is a major improvement over the previous n^4, but we can do better.
    - Shrink the bounding box of L1 to match L2 and take a sorted permutation: Convex hull might go outside bounding box.
        - For each of n bounding boxes, run a sorted permutation over all combinations L1 (larger) and L2 (smaller) volumes.
        - Case 1: Both convex hulls fall inside both bounding boxes
            - sqrt(n) recategorization of sqrt(n) plusses under 1d convex hull rules -- current runtime O(n^(5/2))
        - Case 2: Some convex hull plus centerpoints fall outside of the opposing bounding box
            - At minimum, we have L2 * L2 = max product
            - A final n runtime permutation brute force stays below the O(n^3) limit, -- current runtime O(n^(5/2))
        - Case 3: Some convex hull plus centerpoints fall outside of their own shrunk bounding box
            - At minimum, we have L2 * L2 = max product
            - A final n runtime permutation brute force stays below the O(n^3) limit, -- current runtime O(n^(5/2))
    
Final runtime: O(n^(5/2))


 


In [12]:
def twoPluses(grid):
    class Plus:
        def __init__(self):
            self.plusses = {}
            self.decr_plus = {}
            self._maxprod = 0
            self.sorted_at_volume = set()
            self.upper_bound = None

        # Populate class attributes when inserting a plus
        def add(self,row,col,volume):
            t = (row, col)
            self.plusses[t] = volume
            if volume not in self.decr_plus:
                self.decr_plus[volume] = []
            self.decr_plus[volume].append(t)

        # For use in exists conditions
        def __contains__(self, row_col_tuple):
            return row_col_tuple in self.plusses
        
        # Retrieve plusses at specific volume
        def plusses_at_volume(self, volume):
            if volume not in self.sorted_at_volume:
                self.decr_plus[volume].sort(key = lambda x: x[0] + x[1])  # L1 norm very likely to have disjoint plusses.
                self.sorted_at_volume.add(volume)
            return self.decr_plus[volume]
        
        # maxprod is a property whose setter only ever raises the stored maximum
        @property
        def maxprod(self):
            return self._maxprod
        @maxprod.setter
        def maxprod(self, value):
            if self._maxprod < value:
                self._maxprod = value

        # Main algorithm to compute max product after populating
        def find_max(self):
            vol_desc = sorted(list(self.decr_plus.keys()),reverse=True)
            num_unique_volumes = len(vol_desc)
            for i in range(num_unique_volumes):
                for j in range(i+1):
                    
                    vol_i, vol_j = vol_desc[i], vol_desc[j]
                    
                    # Calculate ideal product without any interference
                    ideal_prod = vol_i * vol_j

                    # Case: Same size plus comparison                    
                    if i == j:
                        plusses = self.plusses_at_volume(vol_desc[i])
                        for index_a in range(len(plusses)):
                            for index_b in range(index_a + 1, len(plusses)):
                                if ideal_prod <= self.maxprod:
                                    break
                                self.find_prod(plusses[index_a], plusses[index_b], vol_i, vol_j, ideal_prod)
                    # Case: Different size plus comparison (two arrays)
                    else:
                        i_plusses = self.plusses_at_volume(vol_desc[i])
                        j_plusses = self.plusses_at_volume(vol_desc[j])
                        for iplus in i_plusses:
                            for jplus in j_plusses:
                                if ideal_prod <= self.maxprod:
                                    break
                                self.find_prod(iplus, jplus, vol_i, vol_j, ideal_prod)

            return self.maxprod

        def find_prod(self, a, b, vol_a, vol_b, ideal_prod):           
            # Calculate arm lengths for both plusses
            arm1 = (vol_a-1)//4
            arm2 = (vol_b-1)//4
            
            # Immediate return if centers coincide
            if a == b: 
                return
            
            # Calculate absolute positional differences
            x = abs(a[0]-b[0])
            y = abs(a[1]-b[1])
            
            # Centers share a row or column: only the collinear arms can collide
            if not x or not y:
                combined_arms = arm1 + arm2
                max_distance = max(x, y)
                if combined_arms < max_distance:
                    self.maxprod = ideal_prod
                    return
                else:
                    space = max_distance - 1
                    arm_small = min(arm1, arm2, space//2)
                    self.maxprod = (arm_small * 4 + 1) * ((space - arm_small) * 4 + 1)
                    return

            # Longest arm that still stops short of the other center along each axis
            x_space = x - 1
            y_space = y - 1
            
            # Define and check conflict conditions
            axbyconflict = (arm1 >= x and arm2 >= y)
            aybxconflict = (arm1 >= y and arm2 >= x)
            if not axbyconflict and not aybxconflict:  # No conflict
                self.maxprod = ideal_prod
            elif axbyconflict and aybxconflict:  # Both crossings: trim one plus below min(x, y) or both below max(x, y)
                self.maxprod = max((min(x_space, y_space) * 4 + 1) * max(vol_a, vol_b), (max(x_space, y_space) * 4 + 1) ** 2)
            elif axbyconflict:  # Only AXBY conflict, require permutation
                self.maxprod = max(((x_space * 4 + 1) * vol_b), (vol_a * (y_space * 4 + 1)))
            else:  # Only AYBX conflict
                self.maxprod = max(((y_space * 4 + 1) * vol_b), (vol_a * (x_space * 4 + 1)))

        # Print method to visualize plusses
        def print(self,n = 1):
            print(f'printing plusses with volume {n} or larger')
            i = 0
            for key, value in self.plusses.items():
                if value >= n:
                    print(key,value,end=' ')
                    i +=1
                    if i % 8 ==  0: print()
            print(f'count {i}')
    # Main function to populate the dictionary of largest plusses at each row and column and their volumes
    def populate(grid):
        def process_segment(row, start, end, d):
            L = end - start + 1
            for col in range(start, end + 1):
                d[(row, col)] = (end, L)  # Populate flat dictionary
                
        def process_plus(col, start, end, d):
            L = end - start + 1
            for row in range(start, end + 1):
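                row_right, row_L = d[(row, col)]  # Rightmost column and length of the horizontal run
                col_right, col_L = end, L  # Bottom row and length of the vertical run
                row_left, col_left = row_right - row_L + 1, col_right - col_L + 1  # Leftmost column, top row
                # Arm length is the shortest distance from the center to any run end
                volume = min((col - row_left), (row_right - col), (row - col_left), (col_right - row)) * 4 + 1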
                dplus.add(row, col, volume)
        def parse_rows(grid):
            for rownum, row in enumerate(grid):
                start_i = None
                for i, val in enumerate(row):
                    if val == 'G':
                        if start_i is None:
                            start_i = i  # Start of a new horizontal segment
                    else:
                        if start_i is not None:  # End of the current segment
                            process_segment(rownum, start_i, i - 1, d)
                            start_i = None
                if start_i is not None:  # Process any segment extending to the end of the row
                    process_segment(rownum, start_i, len(row) - 1, d)

        def parse_columns(grid):
            num_rows = len(grid)
            num_cols = len(grid[0])
            for col in range(num_cols):
                start_i = None
                for i in range(num_rows):
                    if grid[i][col] == 'G':
                        if start_i is None:
                            start_i = i  # Start of a new vertical segment
                    else:
                        if start_i is not None:  # End of the current vertical segment
                            process_plus(col, start_i, i - 1, d)
                            start_i = None
                if start_i is not None:  # Process any segment extending to the end of the column
                    process_plus(col, start_i, num_rows - 1, d)
        dplus = Plus()
        d = {}
        parse_rows(grid)
        parse_columns(grid)

        return dplus
    if not grid or len(grid[0]) == 0:
        return 0

    dplus = populate(grid)
    dplus.print(2)
    return dplus.find_max()


In [13]:
grid = [
    'GGGGGGGGGGGG',
    'BGBGGGBGBGBG',
    'GGGGGGGGGGGG',
    'BGBGGGBGBGBG',
    'GGGGGGGGGGGG',
    'GGGGGGGGGGGG',
    'GGGGGGGGGGGG',
    'GGGGGGGGGGGG',
    'BGBGGGBGBGBG',
    'BGBGGGBGBGBG',
    'BGBGGGBGBGBG',
    'BGBGGGBGBGBG',
    'GGGGGGGGGGGG',
    'GGGGGGGGGGGG'
]

In [14]:
def visualize_easier(grid):
    grid2 = []
    for row in grid:
        temp = row.replace('G','O')
        grid2.append(temp.replace('B','-'))
    return grid2
x = visualize_easier(grid)
for i in x:
    print(i)

OOOOOOOOOOOO
-O-OOO-O-O-O
OOOOOOOOOOOO
-O-OOO-O-O-O
OOOOOOOOOOOO
OOOOOOOOOOOO
OOOOOOOOOOOO
OOOOOOOOOOOO
-O-OOO-O-O-O
-O-OOO-O-O-O
-O-OOO-O-O-O
-O-OOO-O-O-O
OOOOOOOOOOOO
OOOOOOOOOOOO


In [15]:
twoPluses(grid)

printing plusses with volume 2 or larger
(2, 1) 5 (4, 1) 5 (5, 1) 5 (6, 1) 5 (7, 1) 5 (12, 1) 5 (5, 2) 5 (6, 2) 5 
(2, 3) 9 (4, 3) 13 (5, 3) 13 (6, 3) 13 (7, 3) 13 (12, 3) 5 (1, 4) 5 (2, 4) 9 
(3, 4) 5 (4, 4) 17 (5, 4) 17 (6, 4) 17 (7, 4) 17 (8, 4) 5 (9, 4) 5 (10, 4) 5 
(11, 4) 5 (12, 4) 5 (2, 5) 9 (4, 5) 17 (5, 5) 21 (6, 5) 21 (7, 5) 21 (12, 5) 5 
(5, 6) 5 (6, 6) 5 (2, 7) 9 (4, 7) 17 (5, 7) 17 (6, 7) 17 (7, 7) 17 (12, 7) 5 
(5, 8) 5 (6, 8) 5 (2, 9) 9 (4, 9) 9 (5, 9) 9 (6, 9) 9 (7, 9) 9 (12, 9) 5 
(5, 10) 5 (6, 10) 5 count 50


189

In [285]:
grid = [
    'BBBGBGBBB',
    'BBBGBGBBB',
    'BBBGBGBBB',
    'GGGGGGGGG',
    'BBBGBGBBB',
    'BBBGBGBBB',
    'GGGGGGGGG',
    'BBBGBGBBB',
    'BBBGBGBBB',
    'BBBGBGBBB'
]

In [231]:
grid = [
    'GBBBBBBGGGBGGBB',
    'GBBBBBBGGGBGGBB',
    'GBBBBBBGGGBGGBB',
    'GBBBBBBGGGBGGBB',
    'GGGGGGGGGGGGGGG',
    'GGGGGGGGGGGGGGG',
    'GBBBBBBGGGBGGBB',
    'GBBBBBBGGGBGGBB',
    'GGGGGGGGGGGGGGG',
    'GBBBBBBGGGBGGBB',
    'GBBBBBBGGGBGGBB',
    'GGGGGGGGGGGGGGG',
    'GGGGGGGGGGGGGGG',
    'GBBBBBBGGGBGGBB'
]

In [208]:
grid = [
    'GBGBGGB',
    'GBGBGGB',
    'GBGBGGB',
    'GGGGGGG',
    'GGGGGGG',
    'GBGBGGB',
    'GBGBGGB'
]

In [116]:
grid = [
'GGGGGG',
'GBBBGB',
'GGGGGG',
'GGBBGB',
'GGGGGG'
]

In [144]:
grid = [
    'BBGBBBB',
    'BBGBBBB',
    'GGGGGBB',
    'BBGGGGB',
    'BBGBGBB'
]

In [168]:
grid = [
    'BGGGB',
    'GGGGG',
    'BGGGB',
]

In [None]:
grid = [
'GGGGGG',
'GBBBGB',
'GGGGGG',
'GGBBGB',
'GGGGGG',
]