# Intro
Welcome to the [Sartorius - Cell Instance Segmentation](https://www.kaggle.com/c/sartorius-cell-instance-segmentation/code) competition.
![](https://storage.googleapis.com/kaggle-competitions/kaggle/30201/logos/header.png)
<span style="color: royalblue;">Please upvote the notebook if it helps you. Feel free to leave a comment. Thank you.</span>

# Libraries

In [None]:
import os
import pandas as pd
import numpy as np
import cv2
import matplotlib
import matplotlib.pyplot as plt

# Path

In [None]:
path = '/kaggle/input/sartorius-cell-instance-segmentation/'
os.listdir(path)

# Load Data

In [None]:
train_data = pd.read_csv(path+'train.csv')
samp_subm = pd.read_csv(path+'sample_submission.csv')

# Overview

In [None]:
print('Number of annotations (one row per cell): ', len(train_data.index))
print('Number of columns: ', len(train_data.columns))

* id: unique identifier for object
* annotation: run length encoded pixels for the identified neuronal cell
* width: source image width
* height: source image height
* cell_type: the cell line
* plate_time: time plate was created
* sample_date: date sample was created
* sample_id: sample identifier
* elapsed_timedelta: time since first image taken of sample
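The time-related columns are read as plain text. Below is a minimal sketch of converting them with pandas. It assumes `elapsed_timedelta` holds strings like `"0 days 19:31:02"` and `sample_date` holds ISO-style dates; the toy values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy rows (hypothetical values) mimicking the time-related columns
df = pd.DataFrame({
    'sample_date': ['2019-06-16', '2019-06-16'],
    'elapsed_timedelta': ['0 days 19:31:02', '0 days 00:00:00'],
})

# Parse the text into proper datetime / timedelta dtypes
df['sample_date'] = pd.to_datetime(df['sample_date'])
df['elapsed_timedelta'] = pd.to_timedelta(df['elapsed_timedelta'])

# Elapsed time in hours, convenient for plotting or binning
df['elapsed_hours'] = df['elapsed_timedelta'].dt.total_seconds() / 3600
print(df)
```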

In [None]:
train_data.head()

# Exploratory Data Analysis

There are 606 images in the train data set:

In [None]:
len(os.listdir(path+'train/'))

There are 3 cell types:

In [None]:
train_data['cell_type'].value_counts()
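`value_counts()` on the raw table counts rows, and each row is one annotated cell, so these numbers are cells per type, not images per type. The following minimal sketch uses a made-up toy table to show how to count images per cell type instead. It assumes every image contains a single cell type, because `drop_duplicates('id')` keeps only the first row of each image.

```python
import pandas as pd

# Toy annotation table: one row per annotated cell (hypothetical ids)
df = pd.DataFrame({
    'id': ['a', 'a', 'a', 'b', 'c', 'c'],
    'cell_type': ['shsy5y', 'shsy5y', 'shsy5y', 'astro', 'cort', 'cort'],
})

# Counting rows counts cells (annotations) per cell type ...
cells_per_type = df['cell_type'].value_counts()

# ... while keeping one row per image id counts images per cell type
images_per_type = df.drop_duplicates('id')['cell_type'].value_counts()
print(cells_per_type)
print(images_per_type)
```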

All images have the same shape:

In [None]:
train_data['height'].value_counts()

In [None]:
train_data['width'].value_counts()

# Setting
As we have seen, all images have the same shape (height 520, width 704), so we store it in a single variable:

In [None]:
shape = (520, 704)  # (height, width)
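Instead of hard-coding the values, you can read the shape from the table and check it at the same time. Below is a small sketch. `image_shape` is a hypothetical helper and is not part of the original notebook. It relies only on the `height` and `width` columns.

```python
import pandas as pd

def image_shape(df: pd.DataFrame) -> tuple:
    """Return the common (height, width), failing loudly if images differ."""
    # Unique (height, width) pairs across all annotation rows
    shapes = df[['height', 'width']].drop_duplicates()
    if len(shapes) != 1:
        raise ValueError(f'Expected one image shape, found {len(shapes)}')
    height, width = shapes.iloc[0]
    return int(height), int(width)

toy = pd.DataFrame({'height': [520, 520], 'width': [704, 704]})
print(image_shape(toy))  # (520, 704)
```

On the real data this would be `shape = image_shape(train_data)`.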

# Focus On Sample Id
We look at the image referenced in the first row of the train data and check that its file exists:

In [None]:
row = 0
id_ = train_data.loc[row, 'id']
file = id_+'.png'
file in os.listdir(path+'train/')

This image has 395 annotated cells (one row per cell):

In [None]:
len(train_data[train_data['id']==id_])
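You can compute the same count for every image at once with `groupby`, which quickly shows how crowded the images are. This is a sketch on a made-up toy table. On the real data, replace `df` with `train_data`.

```python
import pandas as pd

# Toy table (hypothetical ids): one row per annotated cell
df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'c', 'c']})

# Number of annotated cells per image
cells_per_image = df.groupby('id').size()
print(cells_per_image.describe())
```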

Load the image and show its shape:

In [None]:
img = cv2.imread(path+'train/'+file)
print('Image shape:', img.shape)

Plot Image:

In [None]:
fig, axs = plt.subplots(1, 1, figsize=(7, 7))
axs.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
axs.set_xticklabels([])
axs.set_yticklabels([])
plt.show()

Annotations:
To decode the masks (and later re-encode them) we use the functions from these examples:
* https://www.kaggle.com/paulorzp/run-length-encode-and-decode
* https://www.kaggle.com/inversion/run-length-decoding-quick-start
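Each annotation is a space-separated list of `start length` pairs, where `start` is a 1-based index into the flattened image. Below is a tiny worked example on a hand-made 2×5 mask. It assumes row-major flattening, the same order as the default `reshape` used in the decoder below.

```python
import numpy as np

rle = '2 3 8 2'          # pixels 2-4 and 8-9 (1-based) are foreground
height, width = 2, 5

flat = np.zeros(height * width, dtype=np.uint8)
values = [int(v) for v in rle.split()]
for start, length in zip(values[0::2], values[1::2]):
    flat[start - 1:start - 1 + length] = 1   # convert 1-based start to 0-based

mask = flat.reshape(height, width)           # row-major (C order)
print(mask)
# [[0 1 1 1 0]
#  [0 0 1 1 0]]
```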

In [None]:
def rle_decode(mask_rle, shape):
    '''
    mask_rle: run-length as string formatted (start length)
    shape: (height, width) of array to return
    Returns numpy array, 1 - mask, 0 - background
    '''
    s = mask_rle.split()
    # Even positions are 1-based start pixels, odd positions are run lengths
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1  # convert to 0-based indices
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    # Pixels are numbered row by row, matching reshape's default C order
    return img.reshape(shape)

We collect all masks of the image in a list and then decode them with the function above:

In [None]:
img_masks = train_data.loc[train_data['id']==id_, 'annotation'].to_list()
all_masks = np.zeros(shape, dtype=np.uint8)
for mask in img_masks:
    # Union of all cell masks; OR keeps the result binary even where cells overlap
    all_masks |= rle_decode(mask, shape)
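Summing (or OR-ing) the masks loses the information about which pixel belongs to which cell, and instance segmentation needs exactly that information. Below is a minimal sketch of an alternative that gives every cell its own integer label. The helper name `rle_to_labels` is hypothetical. A compact copy of the decoder is included so the block runs on its own. Where cells overlap, the later annotation overwrites the earlier one; that is a choice made for this sketch, not a rule of the competition.

```python
import numpy as np

def rle_decode(mask_rle, shape):
    # Same logic as the decoder above: 1-based "start length" pairs, row-major
    s = np.asarray(mask_rle.split(), dtype=int)
    starts, lengths = s[0::2] - 1, s[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, length in zip(starts, lengths):
        flat[lo:lo + length] = 1
    return flat.reshape(shape)

def rle_to_labels(rles, shape):
    """Build an instance map: 0 = background, k = k-th annotation (1-based)."""
    labels = np.zeros(shape, dtype=np.int32)
    for k, rle in enumerate(rles, start=1):
        labels[rle_decode(rle, shape) == 1] = k  # later cells overwrite overlaps
    return labels

labels = rle_to_labels(['1 2', '4 3'], (2, 3))
print(labels)
# [[1 1 0]
#  [2 2 2]]
```

With the real data, `rle_to_labels(img_masks, shape)` would produce one label per annotated cell of the current image.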

We plot the original image, the masks and the image with the masks:

In [None]:
# cv2 loads images in BGR order; convert once for matplotlib
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
fig, axarr = plt.subplots(1, 3, figsize=(15, 5))
for ax in axarr:
    ax.axis('off')
axarr[0].imshow(img_rgb)
axarr[0].set_title('Original Image')
axarr[1].imshow(all_masks)
axarr[1].set_title('Masks')
axarr[2].imshow(img_rgb)
axarr[2].imshow(all_masks, alpha=0.4)
axarr[2].set_title('Original Image And Masks')
plt.tight_layout(h_pad=0.1, w_pad=0.1)
plt.show()

The next step is to encode the masks back into run-length format, using the encoder from the links above:

In [None]:
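def rle_encode(img):
    '''
    img: numpy array, 1 - mask, 0 - background
    Returns run length as string formatted (start length)
    '''
    pixels = img.flatten()
    # Pad with zeros so runs touching the borders are detected
    pixels = np.concatenate([[0], pixels, [0]])
    # 1-based positions where the value changes: run starts and (exclusive) run ends alternate
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    # Turn every end position into a run length
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)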

In [None]:
rle_encode(all_masks)[:100]
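A quick way to check that the encoder and decoder agree is a round trip on random binary masks. This sketch uses self-contained copies of both functions with the same logic as above. The test mask size and random seed are arbitrary.

```python
import numpy as np

def rle_encode(img):
    pixels = np.concatenate([[0], img.flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def rle_decode(mask_rle, shape):
    s = np.asarray(mask_rle.split(), dtype=int)
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(s[0::2] - 1, s[1::2]):
        flat[start:start + length] = 1
    return flat.reshape(shape)

rng = np.random.default_rng(0)
for _ in range(100):
    mask = (rng.random((20, 30)) > 0.5).astype(np.uint8)
    # Decoding the encoded mask must reproduce it exactly
    assert np.array_equal(rle_decode(rle_encode(mask), mask.shape), mask)
print('round trip OK')
```

The encoder assumes a strictly binary mask. A summed mask containing values such as 2 (overlapping cells) would produce extra runs at every 1→2 transition.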