### Thanks / reference

- [Sartorius - Cell Instance_Starter](https://www.kaggle.com/drcapa/sartorius-cell-instance-starter)

## Library

In [None]:
import numpy as np
import pandas as pd
import os
import time
import matplotlib.pyplot as plt
import seaborn as sns
import cv2


In [None]:
path = "../input/sartorius-cell-instance-segmentation/"
os.listdir(path)

The training annotations are provided as run length encoded masks, and the images are in PNG format. The number of images is small, but the number of annotated objects is quite high. The hidden test set is roughly 240 images.
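To make "run length encoded" concrete, here is a minimal toy sketch. It assumes the convention that the `rle_decode` function later in this notebook relies on: the string holds `start length` pairs, starts are 1-indexed, and pixels are counted over the flattened, row-major image. The 3x4 size and the RLE string are made up for illustration and do not come from the competition data.

```python
import numpy as np

# Hypothetical toy input: "start length" pairs on a 3x4 image.
toy_rle = "2 3 9 2"
height, width = 3, 4

# Decode onto a flat array first, then reshape to (height, width).
flat = np.zeros(height * width, dtype=np.uint8)
tokens = list(map(int, toy_rle.split()))
for start, length in zip(tokens[0::2], tokens[1::2]):
    # Starts are 1-indexed, so shift by one for numpy slicing.
    flat[start - 1 : start - 1 + length] = 1

toy_mask = flat.reshape(height, width)
print(toy_mask)
```

Each pair marks one horizontal run of mask pixels, which is why a single cell is usually described by many pairs.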

In [None]:
train = pd.read_csv(path + 'train.csv')
submission = pd.read_csv(path + 'sample_submission.csv')

### train
- id - unique identifier for object
- annotation - run length encoded pixels for the identified neuronal cell
- width - source image width
- height - source image height
- cell_type - the cell line
- plate_time - time plate was created
- sample_date - date sample was created
- sample_id - sample identifier
- elapsed_timedelta - time since first image taken of sample

In [None]:
train

In [None]:
submission

### Domain knowledge

cell line

Astrocyte
- An astrocyte is a star-shaped cell, a type of neuroglia found in the nervous system of both invertebrates and vertebrates. Astrocytes physically support neurons and supply them with nutrients.

Glioblastoma
- Glioblastoma is an aggressive type of cancer that can occur in the brain or spinal cord.

## EDA

### train data

##### Show height (all 520) / width (all 704)

In [None]:
print(train['height'].value_counts())
print(train['width'].value_counts())

##### utility function (countplot)

In [None]:
def util_countplot(column, input_data):
    # Draw a count plot for one column and print the underlying counts.
    fig, ax = plt.subplots(figsize=(20, 8))
    sns.countplot(x=column, data=input_data, ax=ax)
    plt.show()
    print(input_data[column].value_counts())

#### distribution of cell_type / id / plate_time / sample_date / elapsed_timedelta

In [None]:
column_list = ['cell_type', 'id', 'plate_time', 'sample_date', 'elapsed_timedelta']
for column in column_list:
    util_countplot(column, train)

In [None]:
row = 0
id_ = train.loc[row, 'id']
file = id_+'.png'
print(file)
print("id {} has {} annotations".format(id_, len(train[train['id'] == id_])))

Changing `row` to, for example, 400 gives id 0140b3c8f445, which has 108 annotations, so each image file holds a different number of annotated cells.
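Instead of checking one row at a time, the number of annotations per image can be counted for all ids at once. The sketch below uses a small made-up DataFrame (`toy_train`, with placeholder ids and RLE strings) as a stand-in for `train`, under the assumption that each row of `train.csv` is one annotated cell; swapping in the real `train` should work the same way.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: one row per annotated cell.
toy_train = pd.DataFrame({
    "id": ["img_a", "img_a", "img_a", "img_b", "img_b", "img_c"],
    "annotation": ["1 2", "5 3", "10 1", "2 2", "7 4", "3 3"],
})

# Number of rows (annotations) that share the same image id.
cells_per_image = toy_train.groupby("id").size().rename("n_cells")
print(cells_per_image)

# Summary statistics show how uneven the counts are across images.
print(cells_per_image.describe())
```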

#### Look at the image file

In [None]:
img = cv2.imread(path+'train/'+file)
print('Image shape:', img.shape)

In [None]:
fig, axs = plt.subplots(1, 1, figsize=(7, 7))
axs.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
axs.set_xticklabels([])
axs.set_yticklabels([])
plt.show()

In [None]:
# ref https://www.kaggle.com/ihelon/cell-segmentation-run-length-decoding
def rle_decode(mask_rle, shape, color=1):
    '''
    mask_rle: run-length encoding as a string of "start length" pairs
    shape: (height, width, channels) of the array to return
    color: value (scalar or per-channel) written into masked pixels
    Returns a numpy array (mask) of the given shape
    '''
    s = mask_rle.split()
    
    starts =  list(map(lambda x: int(x) - 1, s[0::2]))
    lengths = list(map(int, s[1::2]))
    ends = [x + y for x, y in zip(starts, lengths)]
    
    img = np.zeros((shape[0] * shape[1], shape[2]), dtype=np.float32)
    
    for start, end in zip(starts, ends):
        img[start : end] = color
    
    return img.reshape(shape)
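Going the other way (mask to RLE string) is a useful sanity check on the decoder. The following is a minimal sketch, not taken from the referenced notebook: `rle_encode_sketch` and `rle_decode_2d` are hypothetical helper names, and both assume a single-channel binary mask with 1-indexed, row-major runs.

```python
import numpy as np

def rle_encode_sketch(mask):
    """Encode a binary 2D mask as 1-indexed 'start length' pairs (row-major)."""
    pixels = mask.flatten()
    # Pad with zeros so runs touching the first or last pixel are closed.
    padded = np.concatenate([[0], pixels, [0]])
    # 1-indexed positions where the value flips: run starts and run ends alternate.
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    # Turn every run end into a run length.
    changes[1::2] -= changes[0::2]
    return " ".join(str(x) for x in changes)

def rle_decode_2d(mask_rle, shape):
    """Decode 'start length' pairs into a binary mask of shape (height, width)."""
    tokens = list(map(int, mask_rle.split()))
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(tokens[0::2], tokens[1::2]):
        flat[start - 1 : start - 1 + length] = 1
    return flat.reshape(shape)

# Round trip on a made-up 3x4 mask.
example_mask = np.array([[0, 1, 1, 1],
                         [0, 0, 0, 0],
                         [1, 1, 0, 0]], dtype=np.uint8)
encoded = rle_encode_sketch(example_mask)
restored = rle_decode_2d(encoded, example_mask.shape)
print(encoded)
print(np.array_equal(restored, example_mask))
```

If the round trip does not reproduce the input mask, the usual suspects are the 1-indexing offset and the flattening order.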

In [None]:
# Step-by-step walkthrough of rle_decode(mask_rle, shape, color=1)
shape=(520, 704, 3)
img = np.zeros((shape[0] * shape[1], shape[2]), dtype=np.float32)
print(img, img.shape)

color = 1
labels = train[train['id'] == '0030fd0e6378']['annotation'].tolist()
s = labels[0].split()
print("s is \n{}".format(s))

starts =  list(map(lambda x: int(x) - 1, s[0::2]))
print("start is {}".format(starts))

lengths = list(map(int, s[1::2]))
print("length is {}".format(lengths))

ends = [x + y for x, y in zip(starts, lengths)]
print("ends is {}".format(ends))

for start, end in zip(starts, ends):
    img[start : end] = color

print("img(masked) is \n{}".format(img))

In [None]:
def plot_masks(image_id, colors=True):
    labels = train[train['id'] == image_id]['annotation'].tolist()

    # Accumulate all instance masks; use one random color per cell if requested.
    mask = np.zeros((520, 704, 3))
    for label in labels:
        color = np.random.rand(3) if colors else 1
        mask += rle_decode(label, shape=(520, 704, 3), color=color)
    mask = mask.clip(0, 1)

    image = cv2.imread(path + f"train/{image_id}.png")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # One figure with three stacked panels: image, translucent overlay, opaque mask.
    plt.figure(figsize=(16, 32))
    plt.subplot(3, 1, 1)
    plt.imshow(image)
    plt.axis("off")
    plt.subplot(3, 1, 2)
    plt.imshow(image)
    plt.imshow(mask, alpha=0.4)
    plt.axis("off")
    plt.subplot(3, 1, 3)
    plt.imshow(image)
    plt.imshow(mask)
    plt.axis("off")

    plt.show()

In [None]:
plot_masks('0030fd0e6378', colors=True)

In [None]:
plot_masks('0140b3c8f445', colors=True)

#### train

#### LIVECell_dataset

LIVECell_dataset_2021 - A mirror of the data from the LIVECell dataset. LIVECell is the predecessor dataset to this competition. You will find extra data for the SH-SY5Y cell line, plus several other cell lines not covered in the competition dataset that may be of interest for transfer learning.