OK, do we really need another set of RLE decoding and encoding routines? I think so: I spent way too much time fiddling with the existing ones, so I wrote a new set from scratch.

These routines are
* memory-efficient
* fast
* pixel-exact for the specific RLE format used in this competition (see the round-trip check against an original train RLE at the end).

Some other routines I've found were off by one pixel (not that it matters much, but aren't we all a bit nitpicky in this line of work ;)) or had other limitations.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tifffile as tiff

In [None]:
def rle2mask(rle, mask_shape):
    ''' takes a space-delimited RLE string in column-first order
    and turns it into a 2d boolean numpy array of shape mask_shape.
    start values are used as-is as offsets into the flattened mask '''
    
    mask = np.zeros(np.prod(mask_shape), dtype=bool) # 1d mask array
    rle = np.array(rle.split()).astype(int) # rle values to ints
    starts = rle[::2] # even positions: run starts
    lengths = rle[1::2] # odd positions: run lengths
    for start, length in zip(starts, lengths):
        mask[start:start+length] = True
    return mask.reshape(np.flip(mask_shape)).T # flip because of column-first order


def mask2rle(mask):
    ''' takes a 2d boolean numpy array and turns it into a space-delimited RLE string '''
    
    mask = np.asarray(mask, dtype=bool) # ~ below needs booleans; no copy if mask is already bool
    mask = mask.T.reshape(-1) # make 1D, column-first
    mask = np.pad(mask, 1) # pad with one False on each side so every run has a start and an end
    starts = np.nonzero((~mask[:-1]) & mask[1:])[0] # False -> True transitions: start points
    ends = np.nonzero(mask[:-1] & (~mask[1:]))[0] # True -> False transitions: end points (exclusive)
    rle = np.empty(2 * starts.size, dtype=int) # interlacing...
    rle[0::2] = starts # ...starts...
    rle[1::2] = ends - starts # ...and lengths
    return ' '.join(map(str, rle)) # turn into space-separated string
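
To make the column-first convention concrete, here is a minimal sketch on a toy 3×4 mask, using NumPy only. The toy mask and all variable names are made up for illustration, and the start value is treated as a plain offset into the flattened mask, which is how the two functions above use it. It is meant as a worked example of the idea, not as a replacement for the routines.

```python
import numpy as np

# Toy 3x4 mask (rows x columns); values chosen purely for illustration.
toy = np.zeros((3, 4), dtype=bool)
toy[1:3, 0] = True  # rows 1 and 2 of column 0
toy[0, 1] = True    # row 0 of column 1

# Column-first flattening: column 0 top to bottom, then column 1, and so on.
flat = toy.T.reshape(-1)
run_pixels = np.flatnonzero(flat)  # offsets of the set pixels in the flat mask

# The three set pixels are adjacent in column-first order, so they form a
# single run even though they span two columns. (This shortcut only works
# because the toy mask has exactly one run.)
toy_rle = f"{run_pixels[0]} {run_pixels.size}"

# Decoding reverses the steps: fill the flat array, reshape to
# (columns, rows), then transpose back to (rows, columns).
start, length = (int(v) for v in toy_rle.split())
decoded = np.zeros(toy.size, dtype=bool)
decoded[start:start + length] = True
decoded = decoded.reshape(toy.shape[::-1]).T
```

Note how a run can wrap from the bottom of one column into the top of the next; that is why the mask has to be flattened column-first before looking for runs.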

Let's check with the original train RLEs:

In [None]:
df_enc = pd.read_csv('../input/hubmap-kidney-segmentation/train.csv')
df_enc

We'll take image 0486052bb:

In [None]:
enc_original = df_enc.iloc[3,1] # row 3, column 1: the RLE string of image 0486052bb
enc_original[:1000]

In [None]:
tiff_shape = tiff.TiffFile('../input/hubmap-kidney-segmentation/train/0486052bb.tiff').pages[0].shape[:2]
tiff_shape

Fast decoding:

In [None]:
%%time
mask = rle2mask(enc_original, tiff_shape)

The mask looks OK *and is in the correct numpy orientation*:

In [None]:
plt.imshow(mask[::50,::50]); # show only every 50th pixel in each direction to keep the plot light
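
Beyond eyeballing the plot, one cheap sanity check is to compare the number of set pixels in the decoded mask with the sum of the run lengths in the RLE string. The helper below is a hypothetical sketch (the name `rle_pixel_count` is not part of the routines above); it assumes the runs do not overlap and all lie inside the mask.

```python
def rle_pixel_count(rle):
    """Sum the run lengths (every second token) of a space-delimited RLE string."""
    return sum(int(token) for token in rle.split()[1::2])
```

With the variables from this notebook, `rle_pixel_count(enc_original) == mask.sum()` would be expected to hold under those assumptions.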

Fast encoding, no RAM was harmed in the process...

In [None]:
%%time
enc_reencoded = mask2rle(mask)

And the final check:

In [None]:
enc_reencoded[:1000]

In [None]:
enc_original == enc_reencoded
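
If a comparison like this ever comes back `False`, a plain string comparison does not say where the two encodings diverge. A small, hypothetical helper along the following lines (the name `first_rle_mismatch` is an assumption, not something from the routines above) could report the first run that differs:

```python
def first_rle_mismatch(rle_a, rle_b):
    """Return (run_index, run_a, run_b) for the first differing run, or None.

    A run is a (start, length) pair of string tokens. If one encoding simply
    has more runs than the other, the missing run is reported as None.
    """
    tokens_a = rle_a.split()
    tokens_b = rle_b.split()
    runs_a = list(zip(tokens_a[0::2], tokens_a[1::2]))
    runs_b = list(zip(tokens_b[0::2], tokens_b[1::2]))

    # Compare runs pairwise until the shorter list is exhausted.
    for index, (run_a, run_b) in enumerate(zip(runs_a, runs_b)):
        if run_a != run_b:
            return index, run_a, run_b

    # All shared runs agree; check whether one side has extra runs.
    if len(runs_a) != len(runs_b):
        index = min(len(runs_a), len(runs_b))
        extra_a = runs_a[index] if index < len(runs_a) else None
        extra_b = runs_b[index] if index < len(runs_b) else None
        return index, extra_a, extra_b

    return None
```

A start that differs by exactly one in the first mismatching run would point to an indexing-convention problem rather than a genuinely different mask.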