# Single Cell Patterns - Train Set Analysis

The objective of this notebook is to compare the theory (single cell patterns as described in the original host notebook) with reality (examples from the train set). I hope these insights will help with modelling!

The train set examples were generated by segmenting the train images with HPA Cell Segmentator and saving the RGB channels as JPG images. They may contain a lot of noise, as individual cells do not necessarily reflect the image-level labels.
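
Since every cell crop inherits the labels of its parent image, it can help to check how many crops come from single-label images, where the inherited label is least ambiguous. The snippet below is a minimal sketch, assuming `image_labels` holds pipe-separated class ids as in `cell_df.csv`. The toy rows are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for cell_df.csv; ids and labels are invented for illustration.
toy = pd.DataFrame({
    "image_id": ["a", "a", "b", "c"],
    "cell_id": [1, 2, 1, 1],
    "image_labels": ["0", "0", "0|16", "11|10"],
})

# Number of image-level labels that each cell crop inherits.
toy["n_labels"] = toy["image_labels"].str.split("|").str.len()

# Crops from single-label images are the least ambiguous examples.
single = toy[toy["n_labels"] == 1]
print(f"{len(single)} of {len(toy)} crops come from single-label images")
```

Running the same two lines on the real `df` should give a rough idea of how much label noise to expect per category.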

### Nice people that enjoy this notebook also press the upvote button :)

In [None]:
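!pip install ipyplot -q
import ipyplot
from fastai.vision.all import *  # also brings in Path, pd and PILImage

# Dataset of segmented cell tiles and the csv describing them
path = Path('../input/hpa-cell-tiles-sample-balanced-dataset')
df = pd.read_csv(path/'cell_df.csv')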

# 0. Nucleoplasm

The nucleus is found in the center of the cell and can be identified with the help of the signal in the blue nucleus channel. A staining of the nucleoplasm may cover the whole nucleus, or the nucleus without the regions known as nucleoli (Class 2).

<div style="background-color:coral">
    <div style="text-align:center;background-color:white">
        <div style="display:inline-block;margin-left: auto;margin-right: auto">
            <img src="https://images.proteinatlas.org/63647/if_selected_medium.jpg" width="200" height="200"/>
        </div>
        <div style="display:inline-block;margin-left:10px;margin-right: auto">
            <img src="https://images.proteinatlas.org/52059/if_selected_medium.jpg" width="200" height="200"/>
        </div>
        <center>
        <a href="https://www.proteinatlas.org/ENSG00000270647-TAF15/cell/U-2+OS#img">More examples</a> 
        </center>
    </div>
</div>
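
One rough way to put a number on "staining inside the nucleus" is to measure how much of the green signal falls where the blue channel is bright. The sketch below is only an illustration, not part of the original analysis. It assumes the crops are RGB arrays with the protein in green and the nucleus in blue; `nuclear_green_fraction` and its `blue_threshold` default are hypothetical choices.

```python
import numpy as np

def nuclear_green_fraction(rgb, blue_threshold=50):
    """Fraction of the total green intensity that lies inside a crude nucleus mask."""
    rgb = np.asarray(rgb, dtype=np.float64)
    green, blue = rgb[..., 1], rgb[..., 2]
    # Crude nucleus mask: pixels where blue is bright (threshold is an assumption).
    nucleus = blue > blue_threshold
    total = green.sum()
    return float(green[nucleus].sum() / total) if total > 0 else 0.0

# Synthetic 4x4 crop: nucleus in the central 2x2 block, green only inside it.
crop = np.zeros((4, 4, 3), dtype=np.uint8)
crop[1:3, 1:3, 2] = 200
crop[1:3, 1:3, 1] = 100
frac = nuclear_green_fraction(crop)
print(frac)
```

A value close to 1 would be consistent with a nuclear pattern, while a cytosolic pattern should score much lower. A real crop would need a better mask than a fixed threshold.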


In [None]:
cat = '0'
print(f'Train set examples for category {cat}')
# Sample 10 cells from images labelled with this category only
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    # Cell crops are stored as <image_id>_<cell_id>.jpg
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
# Plot at most 9 of the sampled crops, labelled with their source image id
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)
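
The sampling cell above is repeated for every category, so it could be wrapped in a helper. The sketch below is one possible refactor and not part of the original notebook. `sample_cell_paths` is a hypothetical name, and it only builds the file paths, leaving loading and plotting to `PILImage.create` and `ipyplot.plot_images` as before. Unlike the original cell, it caps `n` at the number of available rows instead of raising for rare categories.

```python
from pathlib import Path
import pandas as pd

def sample_cell_paths(df, cat, root, n=10, seed=42):
    """Sample up to n cell crops whose image-level label is exactly `cat`.

    Returns (paths, ids): crop paths under root/'cells' and the matching image ids.
    """
    subset = df[df["image_labels"] == cat]
    # Cap n so that rare categories do not raise in DataFrame.sample.
    picked = subset.sample(n=min(n, len(subset)), random_state=seed)
    paths = [Path(root) / "cells" / f"{r.image_id}_{r.cell_id}.jpg"
             for r in picked.itertuples()]
    return paths, picked["image_id"].tolist()

# Toy frame standing in for cell_df.csv (invented rows).
toy = pd.DataFrame({
    "image_id": ["a", "a", "b"],
    "cell_id": [1, 2, 1],
    "image_labels": ["0", "0", "1"],
})
paths, ids = sample_cell_paths(toy, "0", "data", n=10)
print(paths, ids)
```

With the real data, the call would look like `sample_cell_paths(df, '0', path)`, followed by the same two plotting lines as in the cell above.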

# 1. Nuclear membrane

The nuclear membrane appears as a thin circle around the nucleus. It is not perfectly smooth and sometimes it is also possible to see the folds of the membrane as small circles or dots inside the nucleus.
                                                                           
<div style="background-color:coral">
    <div style="text-align:center;background-color:white">
        <div style="display:inline-block;margin-left: auto;margin-right: auto">
            <img src="https://images.proteinatlas.org/50524/if_selected_medium.jpg" width="200" height="200"/>
        </div>
        <div style="display:inline-block;margin-left:10px;margin-right: auto">
            <img src="https://images.proteinatlas.org/5269/950_C1_1_selected_medium.jpg" width="200" height="200"/>
        </div>
        <center>
        <a href="https://www.proteinatlas.org/ENSG00000113368-LMNB1/cell/MCF7#img">More examples</a> 
        </center>
    </div>
</div>


In [None]:
cat = '1'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 2. Nucleoli

Nucleoli can be seen as slightly elongated circular areas in the nucleoplasm, which usually display a much weaker staining in the blue DAPI channel. The number and size of nucleoli vary between cell types.
                                                                   
<div style="background-color:coral">
    <div style="text-align:center;background-color:white">
        <div style="display:inline-block;margin-left: auto;margin-right: auto">
            <img src="https://images.proteinatlas.org/35736/if_selected_medium.jpg" width="200" height="200"/>
        </div>
        <div style="display:inline-block;margin-left:10px;margin-right: auto">
            <img src="https://images.proteinatlas.org/35735/1888_E10_3_selected_medium.jpg" width="200" height="200"/>
        </div>
        <center>
        <a href="https://www.proteinatlas.org/ENSG00000155438-NIFK/cell/A-431#img">More examples</a> 
        </center>
    </div>
</div>

In [None]:
cat = '2'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 3. Nucleoli fibrillar center

The nucleoli fibrillar center can appear as a spotty cluster or as a single bigger spot in the nucleolus, depending on the cell type.

<p align="center">
  <img src="https://images.proteinatlas.org/37366/453_F3_1_selected_medium.jpg">
  <a href="https://www.proteinatlas.org/ENSG00000166197-NOLC1/cell/HEK+293#img">More examples</a> 
</p> 

In [None]:
cat = '3'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 4. Nuclear speckles

Nuclear speckles can be seen as irregular and mottled spots inside the nucleoplasm.
<p align="center">
  <img src="https://images.proteinatlas.org/66181/1277_H9_2_selected_medium.jpg">
  <a href="https://www.proteinatlas.org/ENSG00000167978-SRRM2/cell/A-431#img">More examples</a> 
</p> 

In [None]:
cat = '4'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 5. Nuclear bodies

Nuclear bodies are visible as distinct spots in the nucleoplasm. They vary in shape, size and numbers depending on the type of bodies as well as cell type, but are usually more rounded compared to nuclear speckles.

<p align="center">
  <img src="https://images.proteinatlas.org/58036/if_selected_medium.jpg">
  <a href="https://www.proteinatlas.org/ENSG00000102901-CENPT/cell/HEK+293#img">More examples</a> 
</p> 

In [None]:
cat = '5'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 6. Endoplasmic reticulum

The endoplasmic reticulum (ER) is recognized by a network-like staining in the cytosol, which is usually stronger close to the nucleus and weaker close to the edges of the cell. The ER can be identified with the help of the staining in the yellow ER channel. 

<p align="center">
  <img src="https://images.proteinatlas.org/47752/769_B10_1_selected_medium.jpg">
  <a href="https://www.proteinatlas.org/ENSG00000012660-ELOVL5/cell/A-431#img">More examples</a> 
</p> 

In [None]:
cat = '6'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 7. Golgi apparatus

The Golgi apparatus is a rather large organelle that is located next to the nucleus, close to the centrosome, from which the microtubules in the red channel originate. It has a folded ribbon-like appearance, but the shape and size can vary between cell types and in response to various cellular processes.

<p align="center">
  <img src="https://images.proteinatlas.org/56283/if_selected_medium.jpg">
  <a href="https://www.proteinatlas.org/ENSG00000114745-GORASP1/cell/HeLa#img">More examples</a> 
</p> 

In [None]:
cat = '7'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 8. Intermediate filaments

Intermediate filaments often exhibit a slightly tangled structure with strands crossing every so often. They can appear similar to microtubules, but do not match well with the staining in the red microtubule channel. Intermediate filaments may extend through the whole cytosol, or be concentrated in an area close to the nucleus.

<p align="center">
  <img src="https://images.proteinatlas.org/30877/if_selected_medium.jpg">
  <a href="https://www.proteinatlas.org/ENSG00000171401-KRT13/cell/A-431#img">More examples</a> 
</p> 

In [None]:
cat = '8'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 9. Actin filaments 

Actin filaments can be seen as long and rather straight bundles of filaments or as branched networks of thinner filaments. They are usually located close to the edges of the cells.


<div style="background-color:coral">
    <div style="text-align:center;background-color:white">
        <div style="display:inline-block;margin-left: auto;margin-right: auto">
            <img src="https://images.proteinatlas.org/9849/if_selected_medium.jpg" width="200" height="200"/>
        </div>
        <div style="display:inline-block;margin-left:10px;margin-right: auto">
            <img src="https://images.proteinatlas.org/51237/if_selected_medium.jpg" width="200" height="200"/>
        </div>
        <center>
        <a href="https://www.proteinatlas.org/ENSG00000117519-CNN3/cell/U-2+OS#img">More examples</a> 
        </center>
    </div>
</div>


In [None]:
cat = '9'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 10. Microtubules

Microtubules are seen as thin strands that stretch throughout the whole cell. It is almost always possible to detect the center from which they all originate (the centrosome). And yes, as you might have guessed, this overlaps the staining in the red channel.

<p align="center">
  <img src="https://images.proteinatlas.org/39323/if_selected_medium.jpg">
  <a href="https://www.proteinatlas.org/ENSG00000166153-DEPDC4/cell/U-2+OS#img">More examples</a> 
</p>  
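
The overlap with the red channel can be checked numerically with a simple pixel-wise correlation. The snippet below is a hedged sketch rather than a validated feature. It assumes RGB crops with microtubules in red and the protein of interest in green, and `green_red_correlation` is a hypothetical helper name.

```python
import numpy as np

def green_red_correlation(rgb):
    """Pearson correlation between the green and red channels of one crop."""
    rgb = np.asarray(rgb, dtype=np.float64)
    red, green = rgb[..., 0].ravel(), rgb[..., 1].ravel()
    if red.std() == 0 or green.std() == 0:
        return 0.0  # correlation is undefined for a constant channel
    return float(np.corrcoef(green, red)[0, 1])

# Synthetic crop in which the green channel is an exact copy of the red one.
rng = np.random.default_rng(0)
red = rng.integers(0, 256, size=(32, 32))
crop = np.stack([red, red, np.zeros_like(red)], axis=-1)
corr = green_red_correlation(crop)
print(round(corr, 3))
```

A microtubule pattern should give a high correlation. Patterns that look similar but do not follow the red channel, such as intermediate filaments, should score noticeably lower.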

In [None]:
cat = '10'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 11. Mitotic spindle

The mitotic spindle can be seen as an intricate structure of microtubules radiating from each of the centrosomes at opposite ends of a dividing cell (mitosis). At this stage, the chromatin of the cell is condensed, as visible by intense DAPI staining. The size and exact shape of the mitotic spindle change during mitotic progression, clearly reflecting the different stages of mitosis.

<p align="center">
  <img src="https://images.proteinatlas.org/5487/1825_A5_31_cr5ac33e6e0739c_selected_medium.jpg">
  <a href="https://www.proteinatlas.org/ENSG00000088325-TPX2/cell/HEL#img">More examples</a> 
</p>  

In [None]:
cat = '11'
print(f'Train set examples for category {cat}')
# Unlike the other categories, keep every cell whose pipe-separated image labels include 11
df['11'] = df.image_labels.apply(lambda r: '11' in r.split('|'))
dfs = df[df['11']].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    # Cell crops are stored as <image_id>_<cell_id>.jpg
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)
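
The cell above tests membership on the split label list rather than on the raw string. The short illustration below shows why that matters when the same idea is reused for other categories. The label strings are invented for the example.

```python
import pandas as pd

# Invented label strings in the same pipe-separated format as image_labels.
labels = pd.Series(["1", "11|16", "0|1", "10"])

# Substring test: wrongly matches "11|16" and "10" when looking for class 1.
substring_hits = labels.str.contains("1", regex=False)

# Split test, as in the cell above: matches only rows that really list class 1.
split_hits = labels.apply(lambda s: "1" in s.split("|"))

print(substring_hits.tolist())  # [True, True, True, True]
print(split_hits.tolist())      # [True, False, True, False]
```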

# 12. Centrosome

This class includes centrosomes and centriolar satellites. They can be seen as a more or less distinct staining of a small area at the origin of the microtubules, close to the nucleus. When a cell is dividing, the two centrosomes move to opposite ends of the cell and form the poles of the mitotic spindle. 

<div style="background-color:coral">
    <div style="text-align:center;background-color:white">
        <div style="display:inline-block;margin-left: auto;margin-right: auto">
            <img src="https://images.proteinatlas.org/44233/516_D8_1_selected_medium.jpg" width="200" height="200"/>
        </div>
        <div style="display:inline-block;margin-left:10px;margin-right: auto">
            <img src="https://images.proteinatlas.org/40778/1767_F9_33_selected.jpg" width="200" height="200"/>
        </div>
        <center>
        <a href="https://www.proteinatlas.org/ENSG00000125863-MKKS/cell/A-431#img">More examples</a> 
        </center>
    </div>
</div>

In [None]:
cat = '12'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 13. Plasma membrane
This class includes plasma membrane and cell junctions. Both are at the outer edge of the cell. Plasma membrane sometimes appears as a more or less distinct edge around the cell, occasionally with characteristic protrusions or ruffles. In some cell lines, the staining can be uniform across the entire cell. Cell junctions can be observed at contact sites between neighboring cells.

<p align="center">
  <img src="https://images.proteinatlas.org/21616/if_selected_medium.jpg">
  <a href="https://www.proteinatlas.org/ENSG00000092820-EZR/cell/A-431#img">More examples</a> 
</p>


In [None]:
cat = '13'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 14. Mitochondria
Mitochondria are small rod-like units in the cytosol, which are often distributed in a thread-like pattern along microtubules.

<p align="center">
  <img src="https://images.proteinatlas.org/36985/if_selected_medium.jpg">
  <br><br>
  <a href="https://www.proteinatlas.org/ENSG00000132463-GRSF1/cell/U-2+OS#img">More examples</a> 
</p>

In [None]:
cat = '14'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 15. Aggresome

An aggresome can be seen as a dense cytoplasmic inclusion, which is usually found close to the nucleus, in a region where the microtubule network is disrupted.

<p align="center">
  <img src="https://images.proteinatlas.org/65730/if_selected_medium.jpg">
  <br><br>
  <a href="https://www.proteinatlas.org/ENSG00000174010-KLHL15/cell/HEK+293#img">More examples</a> 
</p>


In [None]:
cat = '15'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 16. Cytosol

The cytosol extends from the plasma membrane to the nuclear membrane. It can appear smooth or granular, and the staining is often stronger close to the nucleus.

<p align="center">
  <img src="https://images.proteinatlas.org/54177/if_selected_medium.jpg">
  <br><br>
  <a href="https://www.proteinatlas.org/ENSG00000136371-MTHFS/cell/MCF7#img">More examples</a> 
</p>


In [None]:
cat = '16'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 17. Vesicles and punctate cytosolic patterns

This class includes small circular compartments in the cytosol: vesicles, peroxisomes (lipid metabolism), endosomes (sorting compartments), lysosomes (degradation of molecules or eating up dead molecules), lipid droplets (fat storage) and cytoplasmic bodies (distinct granules in the cytosol). They are highly dynamic, varying in number and size in response to environmental and cellular cues. They can be round or more elongated.
<p align="center">
  <img src="https://images.proteinatlas.org/6964/22_B1_1_selected_medium.jpg">
  <br><br>
  <a href="https://www.proteinatlas.org/ENSG00000122705-CLTA/cell/U-251+MG">Vesicles</a> |
  <a href="https://www.proteinatlas.org/ENSG00000115425-PECR/cell/A-431">Peroxisomes</a> |
  <a href="https://www.proteinatlas.org/ENSG00000185722-ANKFY1/cell/A549">Endosomes</a> |
  <a href="https://www.proteinatlas.org/ENSG00000075785-RAB7A/cell/U-2+OS">Lysosomes</a> |
  <a href="https://www.proteinatlas.org/ENSG00000177666-PNPLA2/cell/A549">Lipid droplets</a> |
  <a href="https://www.proteinatlas.org/ENSG00000038358-EDC4/cell/U-2+OS#img">Cytoplasmic bodies</a> 
</p>

In [None]:
cat = '17'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

# 18. Negative

This class includes negative stainings and unspecific patterns. This means that the cells either have no green staining (negative), or have staining from which no pattern can be deciphered (unspecific).

![Capture.PNG](attachment:Capture.PNG)
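
The "no green staining" half of this class can be approximated with a very simple intensity check. The snippet below is only a sketch under assumptions: the crops are RGB arrays with the protein in green, and `looks_negative` and its `green_threshold` default are hypothetical and would need tuning on real crops. It cannot detect the unspecific case, where staining is present but shows no pattern.

```python
import numpy as np

def looks_negative(rgb, green_threshold=10.0):
    """Flag a crop whose mean green intensity is below the chosen threshold."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return bool(rgb[..., 1].mean() < green_threshold)

# Synthetic crops: one without any green signal, one with uniform green signal.
empty = np.zeros((8, 8, 3), dtype=np.uint8)
stained = empty.copy()
stained[..., 1] = 120

print(looks_negative(empty), looks_negative(stained))  # True False
```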

In [None]:
cat = '18'
print(f'Train set examples for category {cat}')
dfs = df[df.image_labels == cat].sample(n=10, random_state=42)
images = []
ids = []
for i,r in dfs.iterrows():
    fname = r['image_id'] + '_' + str(r['cell_id']) +'.jpg'
    images.append(path/'cells'/fname)
    ids.append(r['image_id'])
images = [PILImage.create(f) for f in images]
ipyplot.plot_images(images, labels=ids, max_images=9, img_width=300)

Please upvote if you found this helpful, you will make my day :-)