# RLE_NUMBA Function Explained

**Description:**
 - rle_numba is a function used in a number of public HuBMAP notebooks during inference to run-length encode the predicted masks when creating a submission.
 - It seemed to me that this function was essential to making a submission, so I decided to comb through it line by line to better understand how it works.

**References:**
 - https://www.kaggle.com/leighplt/pytorch-fcn-resnet50
 - https://www.kaggle.com/joshi98kishan/hubmap-keras-pipeline-training-inference
 - https://www.kaggle.com/c/hubmap-kidney-segmentation/overview/supervised-ml-evaluation
 
**Function Expectation:**

 - "In order to reduce the submission file size, our metric uses run-length encoding on the pixel values.  Instead of submitting an exhaustive list of indices for your segmentation, you will submit pairs of values that contain a start position and a run length. E.g. '1 3' implies starting at pixel 1 and running a total of 3 pixels (1,2,3)."  
 - "Note that, at the time of encoding, the mask should be binary, meaning the masks for all objects in an image are joined into a single large mask. A value of 0 should indicate pixels that are not masked, and a value of 1 will indicate pixels that are masked.  The competition format requires a space delimited list of pairs. For example, '1 3 10 5' implies pixels 1,2,3,10,11,12,13,14 are to be included in the mask. The metric checks that the pairs are sorted, positive, and the decoded pixel values are not duplicated. The pixels are numbered from top to bottom, then left to right: 1 is pixel (1,1), 2 is pixel (2,1), etc."
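
To make the format concrete, here is a small sketch that expands an RLE string back into 1-indexed pixel positions, followed by a check that NumPy's `flatten(order="F")` follows the "top to bottom, then left to right" numbering on a made-up 2x3 mask. The helper `rle_decode_positions` is my own hypothetical name, not something from the competition or the referenced notebooks.

```python
import numpy as np

def rle_decode_positions(rle):
    """Hypothetical helper: expand 'start length start length ...'
    into the list of 1-indexed pixel positions it covers."""
    values = [int(v) for v in rle.split()]
    starts, lengths = values[0::2], values[1::2]
    positions = []
    for start, length in zip(starts, lengths):
        positions.extend(range(start, start + length))
    return positions

# The example from the competition description
print(rle_decode_positions("1 3 10 5"))   # -> [1, 2, 3, 10, 11, 12, 13, 14]

# Pixels are numbered top to bottom, then left to right (column-major),
# which matches NumPy's Fortran-order flattening.
mask = np.array([[1, 0, 0],
                 [1, 0, 1]])
flat = mask.flatten(order="F")
print(flat)                        # -> [1 1 0 0 0 1]
print(np.nonzero(flat)[0] + 1)     # 1-indexed masked pixels -> [1 2 6]
```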

# Imports

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os

# Full Function

In [None]:
def rle_numba(pixels):
    size = len(pixels)
    points = []
    if pixels[0] == 1: points.append(1)
    flag = True
    for i in range(1, size):
        if pixels[i] != pixels[i-1]:
            if flag:
                points.append(i+1)
                flag = False
            else:
                points.append(i+1 - points[-1])
                flag = True
    if pixels[-1] == 1: points.append(size-points[-1]+1)    
    return points

# Generate Fake Data

In [None]:
np.random.seed(0)
pixels = np.where(np.random.randint(0,100,20)>50,1,0)
print("pixels list:")
pixels

# Run Function

In [None]:
rle_numba(pixels)
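
As a sanity check, here is a brute-force reference encoder I wrote for this notebook (`rle_reference` is a hypothetical helper, not from the original notebooks). It scans the pixels once and records each run of 1's as a (1-indexed start, length) pair, so its output can be compared with `rle_numba` by eye. It uses a small made-up list named `demo_pixels` so it doesn't overwrite `pixels`.

```python
import numpy as np

def rle_reference(pixels):
    """Hypothetical reference encoder: each run of 1s becomes
    [start, length], with start as a 1-indexed pixel position."""
    points = []
    start = None
    for idx, value in enumerate(pixels, start=1):   # idx is 1-indexed
        if value == 1 and start is None:
            start = idx                             # a new run begins
        elif value == 0 and start is not None:
            points += [start, idx - start]          # run ended at idx - 1
            start = None
    if start is not None:                           # run reaches the last pixel
        points += [start, len(pixels) - start + 1]
    return points

demo_pixels = np.array([0, 0, 1, 1, 1, 0, 1, 0])
print(rle_reference(demo_pixels))   # -> [3, 3, 7, 1]
```

On a list that starts with 0, `rle_numba` and this reference should produce the same values.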

# Line by Line Walkthrough

* get the length of the sequence

In [None]:
size = len(pixels)
print("size of list 'pixels': ", size)

* if the first value is 1, append 1 to points (a mask starts at pixel 1), matching the full function above
* otherwise, do nothing

In [None]:
points = []
if pixels[0] == 1:
    points.append(1)
print("points list:")
points

* we start at index 1 of pixels and compare the current value to the previous one

In [None]:
i = 1
print("current value in list pixels: ", pixels[i])
print("previous value in list pixels: ", pixels[i-1])
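
Rather than stepping through every index, a vectorized sketch can list every index where this comparison is true at once. It uses a small made-up array (`demo_pixels`) rather than the seed-0 data, and is only meant to show which iterations of the loop actually append something.

```python
import numpy as np

demo_pixels = np.array([0, 0, 1, 1, 1, 0, 1, 0])   # small made-up example

# 0-indexed i (i >= 1) where demo_pixels[i] != demo_pixels[i-1]:
# these are exactly the iterations where the loop in rle_numba appends a value
changes = np.nonzero(demo_pixels[1:] != demo_pixels[:-1])[0] + 1
print(changes)        # -> [2 5 6 7]
print(changes + 1)    # the corresponding i+1 values -> [3 6 7 8]
```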

* set 'flag' to True
* if the current value is not equal to the previous value, append i+1 to points (the current index plus 1, i.e. its 1-indexed pixel position); then set flag to False
* otherwise, do nothing

**in this case, we do nothing because our previous and current values are equal (so flag is still True)**

In [None]:
flag = True
if pixels[i] != pixels[i-1]:
    if flag:
        points.append(i+1)
        flag = False
print("points list:")
points

* iterate to the next position and get previous and current values

In [None]:
i = 2
print("current value in list pixels: ", pixels[i])
print("previous value in list pixels: ", pixels[i-1])

* remember that flag is still set to True, and since the current value != previous value, we enter the first 'if flag' branch
* this time we append i+1 to points and set flag to False

### Note that when flag is True, the function appends 'the starting position of a mask' (as a 1-indexed pixel position) to our output list.

In [None]:
flag = True
if pixels[i] != pixels[i-1]:
    if flag:
        points.append(i+1)
        flag = False
    else:
        print(points[-1])
        print(i+1)
        print(i+1 - points[-1])
        points.append(i+1 - points[-1])
        flag = True
print("points list:")
print(points)

* iterate to the next position and get previous and current values

In [None]:
i = 3
print("current value in list pixels: ", pixels[i])
print("previous value in list pixels: ", pixels[i-1])

 * note that we previously set flag to False
 * since the previous and current values are equal, we do nothing

In [None]:
flag = False
if pixels[i] != pixels[i-1]:
    if flag:
        points.append(i+1)
        flag = False
    else:
        print(points[-1])
        print(i+1)
        print(i+1 - points[-1])
        points.append(i+1 - points[-1])
        flag = True
print("points list:")
print(points)

 * in fact, we do nothing until index 5, where the previous and current values are unequal

In [None]:
i = 5
print("current value in list pixels: ", pixels[i])
print("previous value in list pixels: ", pixels[i-1])

 * again, remember that flag is still False
 * since the previous and current values are unequal and flag is False, we enter the else branch
 * this time, we append {i+1 - 'last value'} and set flag back to True

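### First, note that when flag is False, the function appends 'the run length of the mask' to our output list. Also note that {i+1 - 'last value'} is another way of saying {where we are - where we were}: i+1 is the 1-indexed position of the first 0 after the mask, and 'last value' is the 1-indexed start of the mask, so their difference is the number of masked pixels.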

### Notice how we flip back and forth between appending {i+1} and {i+1 - 'last value'} every time the previous value and current value are unequal.

In [None]:
flag = False
if pixels[i] != pixels[i-1]:
    if flag:
        points.append(i+1)
        flag = False
    else:
        print("last values in list points: ", points[-1])
        print("i+1: ", i+1)
        print("i+1 - last values in list points: ", i+1 - points[-1])
        points.append(i+1 - points[-1])
        flag = True
print("points list:")
print(points)
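
The flip-flopping can also be seen by pairing up the change positions directly. This is a minimal sketch assuming the array starts and ends with 0 (so every mask has both an opening and a closing change): the 1st, 3rd, 5th, ... changes are mask starts, and each following change minus that start is the run length, the same {i+1 - 'last value'} as above. The names `demo_pixels`, `change_pos` and `demo_points` are illustrative only, chosen so the walkthrough's `pixels` and `points` are not overwritten.

```python
import numpy as np

demo_pixels = np.array([0, 0, 1, 1, 1, 0, 1, 0])   # made-up; starts and ends with 0

# The i+1 values (1-indexed positions) where the value changes
change_pos = np.nonzero(demo_pixels[1:] != demo_pixels[:-1])[0] + 2

starts = change_pos[0::2]   # flag True branch: a 0 -> 1 change, a mask starts here
ends = change_pos[1::2]     # flag False branch: a 1 -> 0 change, first 0 after the mask
lengths = ends - starts     # {i+1 - 'last value'}: where we are - where we were

demo_points = np.column_stack([starts, lengths]).ravel()
print(demo_points)          # -> [3 3 7 1]
```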

* We repeat this for the remaining values in list 'pixels' (flag is True again after the last step)

In [None]:
flag = True
for i in range(6, size):
    if pixels[i] != pixels[i-1]:
        if flag:
            points.append(i+1)
            flag = False
        else:
            points.append(i+1 - points[-1])
            flag = True
print("points list:")
print(points)

* Finally, if the last value in list 'pixels' is 1, we append a final value to points of {size - 'last value' + 1}

We do this because if the final mask runs to the end (i.e. the pixels list ends with a series of 1's), there is no value change at the end of the list, so the loop never appends the final run length. Here 'last value' is the start position of that final mask, so {size - 'last value' + 1} counts the pixels from the start of the mask through the last pixel.

In [None]:
pixels

In [None]:
pixels[-1]

In [None]:
if pixels[-1] == 1:
    print("size: ", size)
    print("last values in list points: ", points[-1])
    print("size - last value in list points + 1:", size-points[-1]+1)
    points.append(size-points[-1]+1)  
print("points list:")
print(points)
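
Both special cases (a mask at the very start and a mask running to the very end) disappear if the array is padded with a 0 on each side before looking for changes. Below is a sketch of this padding approach; it is a different, vectorized technique from rle_numba, and `rle_padded` is a hypothetical name I'm using here.

```python
import numpy as np

def rle_padded(pixels):
    """Vectorized RLE sketch: pad with a 0 on both ends so every run of 1s
    has both an opening and a closing change, then pair them up."""
    padded = np.concatenate([[0], np.asarray(pixels).ravel(), [0]])
    runs = np.nonzero(padded[1:] != padded[:-1])[0] + 1   # 1-indexed change positions
    runs[1::2] -= runs[0::2]                              # end position -> run length
    return runs.tolist()

print(rle_padded([0, 0, 1, 1, 1, 0, 1, 1]))   # run reaches the end -> [3, 3, 7, 2]
print(rle_padded([1, 1, 0, 0, 1, 0]))         # run starts at pixel 1 -> [1, 2, 5, 1]
```

Because the padded array always begins and ends with 0, the change positions always come in (start, end) pairs, so no flag is needed.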

# Caveats

 * If the 'pixels' list starts with a 1 (i.e. a mask begins at the very first pixel), then the rest of the output is off: the function appends 1 as a start position but leaves flag set to True, so the next change is appended as another start position instead of a run length, and every pair after that is misaligned
 * A mask touching the very first pixel is probably uncommon, but if your model produces something like this by chance, then you could have a problem
 
 * also note that the length of that first mask doesn't matter; a short or a long mask starting at pixel 1 causes the same misalignment

In [None]:
np.random.seed(1331)
pixels = np.where(np.random.randint(0,100,20)>50,1,0)
pixels

In [None]:
rle_numba(pixels)
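
To make the caveat reproducible without depending on the random seed, here is a hand-made 6-pixel example run through a copy of the original function. The mask covers pixels 1-2 and 5 (1-indexed), so the correct encoding would be [1, 2, 5, 1].

```python
import numpy as np

def rle_numba(pixels):
    # Copy of the original function from the top of this notebook
    size = len(pixels)
    points = []
    if pixels[0] == 1: points.append(1)   # start appended, but flag stays True
    flag = True
    for i in range(1, size):
        if pixels[i] != pixels[i-1]:
            if flag:
                points.append(i+1)
                flag = False
            else:
                points.append(i+1 - points[-1])
                flag = True
    if pixels[-1] == 1: points.append(size-points[-1]+1)
    return points

demo_pixels = np.array([1, 1, 0, 0, 1, 0])   # made-up mask starting at pixel 1
print(rle_numba(demo_pixels))                # -> [1, 3, 2, 6]
```

The output decodes to pixels 1 to 3 and 2 to 7, which overlap and run past the end of a 6-pixel list; the metric description above says decoded pixel values are checked for duplicates.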

# My Function Revision

I think the original function has a coding error. Below is an alternative function that appends 1 and, in addition, sets flag to False if the list 'pixels' starts with a mask. If we're appending 1 because we start with a mask, we want our next appended value to be a run length. By setting flag to False when pixels[0]==1, we accomplish this.

In [None]:
np.random.seed(1331)
pixels = np.where(np.random.randint(0,100,20)>50,1,0)
pixels

In [None]:
def rle_numba_v2(pixels):
    size = len(pixels)
    points = []
    if pixels[0] == 1: 
        points.append(1)
        flag = False
    else:
        flag = True
    for i in range(1, size):
        if pixels[i] != pixels[i-1]:
            if flag:
                points.append(i+1)
                flag = False
            else:
                points.append(i+1 - points[-1])
                flag = True
    if pixels[-1] == 1: points.append(size-points[-1]+1)    
    return points

In [None]:
rle_numba_v2(pixels)
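
To check the revision more broadly than one seed, here is a small round-trip test: encode random 1-D arrays (including ones that start or end with 1) with a copy of rle_numba_v2, decode them with a hypothetical `rle_decode` helper, and compare against the input. The last lines show, on a made-up 2x3 mask, how I would assume the function is used for a submission: flatten in column-major order to match the competition's pixel numbering, then join the values with spaces.

```python
import numpy as np

def rle_numba_v2(pixels):
    # Copy of the revised function above
    size = len(pixels)
    points = []
    if pixels[0] == 1:
        points.append(1)
        flag = False   # next appended value should be a run length
    else:
        flag = True    # next appended value should be a start position
    for i in range(1, size):
        if pixels[i] != pixels[i-1]:
            if flag:
                points.append(i+1)
                flag = False
            else:
                points.append(i+1 - points[-1])
                flag = True
    if pixels[-1] == 1: points.append(size-points[-1]+1)
    return points

def rle_decode(points, size):
    """Hypothetical helper: rebuild a 1-D 0/1 array from [start, length, ...]."""
    out = np.zeros(size, dtype=np.uint8)
    for start, length in zip(points[0::2], points[1::2]):
        out[start - 1:start - 1 + length] = 1   # starts are 1-indexed
    return out

# Round trip on many random arrays, including ones starting/ending with 1
rng = np.random.default_rng(0)
for _ in range(1000):
    n = int(rng.integers(1, 30))
    arr = rng.integers(0, 2, n)
    assert np.array_equal(rle_decode(rle_numba_v2(arr), n), arr), arr
print("round trip OK")

# Made-up 2x3 mask -> column-major flatten -> space-delimited string
mask = np.array([[1, 0, 0],
                 [1, 0, 1]])
print(" ".join(map(str, rle_numba_v2(mask.flatten(order="F")))))   # -> 1 2 6 1
```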

# The End

Thanks for reading :).  Would love to hear your thoughts and feedback in the comments.  Thanks again!