## Cell image analysis with Python via Google Colab

First, we need to mount Google Drive so that Colab can find the files stored there.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

We will use the scikit-image (skimage) Python package to load image files.

In [None]:
import skimage.io

In [None]:
img = skimage.io.imread('/content/drive/MyDrive/images/cell_stained.jpg') #simply change the file name to analyze a different image

Let's take a quick look at the image using matplotlib's pyplot

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
plt.imshow(img)

What data is stored in this img variable?

In [None]:
print(img.shape)
print(img[0,0,:])

This is a 3-channel 2D image (here 480x640x3): each pixel of the 480x640 grid has 3 numbers specifying its red, green, and blue intensities. The channels may carry different information, so let's look at each one individually.

In [None]:
plt.subplot(1,3,1); plt.imshow(img[:,:,0],cmap=plt.cm.Reds); plt.title('red channel 0'); plt.axis('off')
plt.subplot(1,3,2); plt.imshow(img[:,:,1],cmap=plt.cm.Greens); plt.title('green channel 1'); plt.axis('off')
plt.subplot(1,3,3); plt.imshow(img[:,:,2],cmap=plt.cm.Blues); plt.title('blue channel 2'); plt.axis('off')
plt.savefig('/content/drive/MyDrive/images/channels.png')
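
The array layout described above can be checked on a tiny synthetic image. This is a minimal sketch: `demo_img` and its values are made up for illustration and are not part of the analysis. It only shows how the (rows, columns, channels) indexing works.

```python
import numpy as np

# Hypothetical 4x6 RGB image standing in for a real photo: (rows, columns, channels)
demo_img = np.zeros((4, 6, 3), dtype=np.uint8)
demo_img[:, :, 0] = 200      # set the whole red channel to one bright value
demo_img[1:3, 2:4, 2] = 50   # add a small patch to the blue channel

print(demo_img.shape)        # (rows, columns, channels)
print(demo_img.dtype)        # uint8 images store intensities from 0 to 255
print(demo_img[0, 0, :])     # red, green, blue values of the top-left pixel

demo_red = demo_img[:, :, 0] # slicing out one channel gives a plain 2D array
print(demo_red.shape)
```

The same slicing, `img[:,:,0]`, is what the plotting cell above uses to show one channel at a time.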

To analyze individual cells, we first need to pick them out of the image. This task is referred to as segmentation. We will use the cellpose Python package, which provides a deep neural network pre-trained on many different kinds of cell images.

In [None]:
!pip install cellpose #the exclamation point is a shorthand in a jupyter notebook (which google colab notebooks are built on) to run system commands
!pip install "opencv-python-headless<4.3"

To use cellpose, we need to specify a pre-trained neural network model. Several models are available; we will use one designed to segment individual cells from images of the cytoplasm, or cell body.

In [None]:
from cellpose import models
model = models.Cellpose(model_type='cyto2') #other model types are available, see cellpose documentation https://github.com/MouseLand/cellpose

We can now run the model on our image and see what the results look like. This usually takes about a minute.

In [None]:
masks, flows, styles, diams = model.eval(img, diameter=None, flow_threshold=None, channels=[0,0], cellprob_threshold=-1) #this command runs the pre-trained neural network model on our cell image

Cellpose provides a plotting tool to show us the output of the segmentation. For the provided default image (images/cell_image.jpg) the model picks out what looks to be many individual cells, as well as some of the debris. For your image, how does the segmentation perform? Are there any cells that get missed, or cells that the neural network model segmented that don't appear to be real cells?

In [None]:
from cellpose import plot
fig = plt.figure(figsize=(12,5))
plot.show_segmentation(fig, img, masks, flows[0], channels=[0,0])
plt.tight_layout()
plt.savefig('/content/drive/MyDrive/images/cell_segmentation.png')
plt.show()


What is the output? Let's focus on the masks: an array of integer labels, one per pixel, which "mask" over each individual object picked out by the segmentation.

In [None]:
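print(np.unique(masks))
ncells = np.max(masks) + 1 #number of labels including the background label 0, so the cells themselves are labeled 1 to ncells-1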
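
To make the label layout concrete, here is a minimal sketch on a hand-made label array. `demo_masks` is hypothetical and stands in for the real `masks`; it follows the same convention, with 0 marking background and each positive integer marking one object.

```python
import numpy as np

# Hypothetical 4x4 label image: 0 is background, 1..3 are three separate objects
demo_masks = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 0],
    [3, 0, 0, 0],
])

demo_labels = np.unique(demo_masks)              # every label present, background included
demo_cell_labels = demo_labels[demo_labels > 0]  # drop the background label
n_cells_demo = len(demo_cell_labels)             # number of segmented objects
print(demo_labels, n_cells_demo)

# Comparing against one label gives a boolean image that is True only on that object
demo_single = demo_masks == 2
print(demo_single)
```

Counting only the positive labels gives the number of segmented objects, which is one less than the number of distinct labels when a background is present.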

We can take a look at these labels one-by-one to see what they are

In [None]:
from IPython.display import clear_output
for i in range(1,ncells):
    clear_output(wait=True)
    plt.imshow(masks==i) #the logical comparison masks==i is True only for pixels labeled i
    plt.pause(.2)


Let's measure the cell areas in pixels. To do so, we count how many pixels carry each unique cell label (the integers 1 to ncells-1; label 0 is the background).

In [None]:
cell_sizes = np.zeros(ncells-1) #initialize array for output, one entry per cell (label 0 is background and is not counted)
for i in range(1,ncells):
    cell_sizes[i-1] = np.sum(masks==i) #i-1 because cells are labeled with integers 1 to ncells-1 while python indexing goes from 0 to ncells-2

In [None]:
hist = plt.hist(cell_sizes,bins=np.linspace(0,np.max(cell_sizes),21)) #create a histogram with 20 bins (21 bin edges)
plt.plot([np.mean(cell_sizes),np.mean(cell_sizes)],[0,np.max(hist[0])],'k--',label='average cell area')
plt.legend(loc='upper right')
plt.title('cell area histogram')
plt.ylabel('number of cells')
plt.xlabel('cell area (pixels)')
plt.savefig('/content/drive/MyDrive/images/cellarea.png')
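
The per-label loop used for the cell areas can also be written as a single call to `np.bincount`, which counts how often each integer occurs. The sketch below uses a hypothetical `demo_masks` array rather than the real segmentation and checks that both approaches agree.

```python
import numpy as np

# Hypothetical label image: 0 is background, 1..3 are objects
demo_masks = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 0],
    [3, 0, 0, 0],
])

# bincount returns the pixel count for labels 0, 1, 2, ...; entry 0 is the background
demo_counts = np.bincount(demo_masks.ravel())
demo_areas = demo_counts[1:]  # keep only the object areas

# Same result with an explicit loop over the labels 1..max
demo_loop_areas = np.array(
    [np.sum(demo_masks == i) for i in range(1, demo_masks.max() + 1)]
)
print(demo_areas, demo_loop_areas)
```

On large images with many cells the `np.bincount` version scans the image once instead of once per label, so it is usually much faster.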

We can also measure single-cell intensities in each of the different image channels (Red, Green, Blue)

In [None]:
cell_intensities = np.zeros((ncells-1,3)) #one row per cell (labels 1 to ncells-1), one column per channel
for i in range(1,ncells):
    indcell = np.where(masks==i) #picks out image pixels where each single-cell is labeled
    for ichannel in range(3):
        cell_intensities[i-1,ichannel] = np.sum(img[indcell[0],indcell[1],ichannel])

colorSet = ['red','green','blue']
plt.figure()
ax=plt.gca()
for ichannel in range(3):
    vplot = ax.violinplot(cell_intensities[:,ichannel],positions=[ichannel+1],showmeans=True,showextrema=False)
    vplot['cmeans'].set_color(colorSet[ichannel])
    for pc in vplot['bodies']:
        pc.set_facecolor(colorSet[ichannel])

ax.set_xticks(range(1,4))
ax.set_xticklabels(['Red','Green','Blue'])
plt.ylabel('Total single-cell intensity')
plt.xlabel('image channels')
plt.savefig('/content/drive/MyDrive/images/channel_intensities.png')
plt.pause(.1)
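
Total intensity grows with cell area, so a large dim cell and a small bright cell can have similar totals. As a hedged sketch of one way to separate the two effects, the example below computes both the total and the mean intensity per cell with `np.bincount` and its `weights` argument. `demo_masks` and `demo_img` are small made-up arrays, and the sketch assumes every label from 1 to the maximum occurs at least once.

```python
import numpy as np

# Hypothetical 2x3 label image (0 = background) and a matching 2x3x3 RGB image
demo_masks = np.array([
    [0, 1, 1],
    [2, 2, 0],
])
demo_img = np.arange(18).reshape(2, 3, 3)

n_labels = demo_masks.max() + 1  # labels 0..max, background included
flat_labels = demo_masks.ravel()

# Pixel count per object, dropping the background entry
demo_areas = np.bincount(flat_labels, minlength=n_labels)[1:]

# Sum of pixel values per object and channel: weights adds intensities instead of counting
demo_totals = np.stack(
    [
        np.bincount(flat_labels, weights=demo_img[:, :, ch].ravel(), minlength=n_labels)[1:]
        for ch in range(3)
    ],
    axis=1,
)

# Mean intensity per object and channel (assumes no object has zero area)
demo_means = demo_totals / demo_areas[:, None]
print(demo_totals)
print(demo_means)
```

Whether totals or means are the better measurement depends on the question: totals reflect the overall amount of stain in a cell, while means reflect its concentration.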

Are these channel intensities correlated? If we make a 3D scatter plot with one axis per channel, we can see whether the channels carry independent information (points spread throughout the 3D space) or dependent information (points lie along a line in the 3D space).

In [None]:
plt.figure()
ax=plt.axes(projection='3d')
ax.scatter(cell_intensities[:,0],cell_intensities[:,1],cell_intensities[:,2],s=20,c=np.mean(cell_intensities,axis=1)/np.max(cell_intensities),cmap=plt.cm.jet)
plt.savefig('/content/drive/MyDrive/images/channel_scatter.png')
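
The scatter plot gives a visual impression. One possible way to put a number on it is the Pearson correlation coefficient between channels, computed with `np.corrcoef`. The sketch below uses synthetic data (`demo_intensities` is made up, not measured): the first two columns are perfectly dependent and the third is independent noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an (n_cells, 3) intensity table
x = rng.uniform(100, 1000, size=200)
demo_intensities = np.column_stack([
    x,                                  # "red": arbitrary values
    2.0 * x,                            # "green": exactly proportional to red
    rng.uniform(100, 1000, size=200),   # "blue": independent of the other two
])

# rowvar=False treats each column (channel) as one variable
demo_corr = np.corrcoef(demo_intensities, rowvar=False)
print(np.round(demo_corr, 2))
```

Values near 1 or -1 indicate channels that vary together, and values near 0 indicate little linear relationship. Applied to the real data, the same call would be `np.corrcoef(cell_intensities, rowvar=False)`. Keep in mind that total intensities in every channel grow with cell area, which by itself produces some positive correlation.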