In [None]:
import numpy as np

## Notebook purpose:
- This notebook asks: when mapping 1D run-length coordinates onto a 2D mask, how should `np.reshape` be used?
- My understanding:
    - **np.reshape()** (default, `order='C'`) fills the 2D array row by row: left -> right, then top -> bottom.
    - The competition format instead calls for **np.reshape(order='F')**, which fills column by column: top -> bottom, then left -> right.
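
To make the difference between the two orders concrete, here is a minimal sketch on a toy array (the names `flat`, `row_major`, and `col_major` are just illustrative, not from the competition code):

```python
import numpy as np

flat = np.arange(1, 7)  # 1D values 1..6

# Default order='C': each row is filled left to right before moving down
row_major = flat.reshape((2, 3))

# order='F': each column is filled top to bottom before moving right
col_major = flat.reshape((2, 3), order='F')

print(row_major)  # [[1 2 3], [4 5 6]]
print(col_major)  # [[1 3 5], [2 4 6]]
```

The same six values end up in different cells, which is why the choice of order matters when decoding a mask.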

### According to evaluation page:
https://www.kaggle.com/c/sartorius-cell-instance-segmentation/overview/evaluation
- The competition format requires a space delimited list of pairs. For example, '1 3 10 5' implies pixels 1,2,3,10,11,12,13,14 are to be included in the mask. The pixels are one-indexed
and numbered from **top to bottom, then left to right**: 1 is pixel 1,1; 2 is pixel 2,1, etc.
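
As a quick sanity check of the pair format, the sketch below expands an RLE string into the one-indexed pixel numbers it covers. The helper name `rle_to_pixels` is hypothetical and only meant to illustrate the rule quoted above:

```python
def rle_to_pixels(rle):
    """Expand a 'start length start length ...' string into one-indexed pixel numbers."""
    values = [int(v) for v in rle.split()]
    pixels = []
    # Even positions are run starts, odd positions are run lengths
    for start, length in zip(values[0::2], values[1::2]):
        pixels.extend(range(start, start + length))
    return pixels

print(rle_to_pixels('1 3 10 5'))  # [1, 2, 3, 10, 11, 12, 13, 14]
```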

### Also see the explanation here:
https://www.kaggle.com/c/sartorius-cell-instance-segmentation/discussion/278936
- The first comment demonstrates how RLE works.

### Compare the 2 types of reshaping using the above example

![](https://user-images.githubusercontent.com/17668390/137576003-a887b201-7cc0-4e22-975f-d7727094d1d0.png)

There are 48 pixels. The top left is numbered 1 then you go down that column until you hit number 8. The top of the next column starts number 9, then down to 16, etc.

If you walk the pixels from 1 to 48, a line of yellow begins at pixel 11 for length 5, then another begins at 19 for length 5, then another begins at 27 for length 5, and the last begins at 37 for length 3.

So the RLE is "11 5 19 5 27 5 37 3".
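
The numbering described above can be reproduced with NumPy. This is a small sketch, assuming the 8 x 6 grid from the picture; the variable names are illustrative:

```python
import numpy as np

height, width = 8, 6

# Pixel numbers 1..48 laid out top to bottom, then left to right
numbering = np.arange(1, height * width + 1).reshape((height, width), order='F')
print(numbering)

# Convert each one-indexed run start into a zero-indexed (row, column) position
run_starts = (11, 19, 27, 37)
positions = []
for start in run_starts:
    row, col = np.unravel_index(start - 1, (height, width), order='F')
    positions.append((int(row), int(col)))

print(positions)  # [(2, 1), (2, 2), (2, 3), (4, 4)]
```

Each run starts in a different column, which is consistent with the yellow region being read column by column.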

In [None]:
arr2d = np.array([[0,0,0,0,0,0],
                  [0,0,0,0,0,0],
                  [0,1,1,1,0,0],
                  [0,1,1,1,0,0],
                  [0,1,1,1,1,0],
                  [0,1,1,1,1,0],
                  [0,1,1,1,1,0],
                  [0,0,0,0,0,0]])

In [None]:
encoded_str = "11 5 19 5 27 5 37 3"

### Default reshape (row-by-row)
- From notebook: https://www.kaggle.com/dschettler8845/sartorius-segmentation-eda-efficientdet-tf/notebook#helper_functions

In [None]:
def rle_decode(mask_rle, shape):
    '''
    mask_rle: run-length as string formatted (start length)
    shape: (height,width) of array to return
    Returns numpy array, 1 - mask, 0 - background

    '''
    s = mask_rle.split()
    # Even positions hold run starts, odd positions hold run lengths
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0::2], s[1::2])]
    starts -= 1  # RLE starts are one-indexed; NumPy slices are zero-indexed
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape)  # Default order='C': fills row by row

In [None]:
mask = rle_decode(encoded_str, shape=(8,6))
mask

In [None]:
# Check whether mask == arr2d (False: row-by-row filling puts the runs in the wrong cells)
(mask == arr2d).all()

### Proposed reshape (column-by-column):
- I think the decoder should reshape with `order='F'` instead:

In [None]:
def rle_decode_top_to_bot_first(mask_rle, shape):
    '''
    mask_rle: run-length as string formatted (start length)
    shape: (height,width) of array to return
    Returns numpy array, 1 - mask, 0 - background

    '''
    s = mask_rle.split()
    # Even positions hold run starts, odd positions hold run lengths
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0::2], s[1::2])]
    starts -= 1  # RLE starts are one-indexed; NumPy slices are zero-indexed
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape, order='F')  # Fill column by column: top -> bottom first

In [None]:
mask = rle_decode_top_to_bot_first(encoded_str, shape=(8,6))
mask

In [None]:
# Check whether mask == arr2d (True: column-by-column filling matches the competition numbering)
(mask == arr2d).all()
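
As a cross-check in the other direction, the sketch below encodes the example mask by flattening it in column-major order and confirms that it reproduces the RLE string from the discussion post. The function `rle_encode_column_major` is a hypothetical helper written for this notebook, not the competition's reference encoder:

```python
import numpy as np

def rle_encode_column_major(mask):
    """Encode a binary 2D mask as 'start length ...', reading top -> bottom, then left -> right."""
    flat = mask.flatten(order='F')
    # Pad with zeros so runs touching either end still produce a start and an end
    padded = np.concatenate([[0], flat, [0]])
    # One-indexed positions where the value changes (run start, position after run end, ...)
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    starts = changes[0::2]
    lengths = changes[1::2] - starts
    return ' '.join(f'{s} {l}' for s, l in zip(starts, lengths))

example = np.array([[0,0,0,0,0,0],
                    [0,0,0,0,0,0],
                    [0,1,1,1,0,0],
                    [0,1,1,1,0,0],
                    [0,1,1,1,1,0],
                    [0,1,1,1,1,0],
                    [0,1,1,1,1,0],
                    [0,0,0,0,0,0]])

print(rle_encode_column_major(example))  # 11 5 19 5 27 5 37 3
```

Flattening with the default order instead would give a different string, so the encoder and decoder need to agree on `order='F'`.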