This repository provides cytogenetics related datasets to investigate the semantic segmentation


Why Deep? Because of deep-learning.

Why FISH? Because of Fluorescent in-situ Hybridization.

This repository is intended to share data and code for resolving some problems encountered in cytogenetics imaging, such as overlapping chromosomes.

Problem description

In cytogenetics, experiments typically start from chromosomal preparations fixed on glass slides. Occasionally a chromosome falls on another one, yielding overlapping chromosomes in the image. Before computers and image processing, when photography was used, chromosomes were cut out of a paper picture and then classified (at least two paper pictures were required when chromosomes overlapped). Automatic segmentation methods were developed to overcome this problem; however, these methods rely on a geometric analysis of the chromosome contour and require some human intervention when partial overlap occurs.

QFISH on metaphase was classified as a low-throughput method for the quantitative analysis of telomere length by Vera and Blasco. One of the bottlenecks of the method is the resolution of overlapping chromosomes. Modern deep-learning techniques have the potential to provide a more reliable, fully automated solution.

Cytogenetics references

Libraries required to run this notebook:

This notebook is run from Jupyter with a Python 2 kernel on an Ubuntu 16.04 OS, inside a virtual environment using the Python packages available on the system. Several image processing libraries are used:


Up to now, the repository contains only Python notebooks. Their purpose is to produce a dataset large enough to train a supervised learning algorithm (semantic segmentation) capable of segmenting overlapping chromosomes. The generated overlaps involve only two chromosomes (this is a start). They are obtained by varying the relative positions and orientations of the two chromosomes.
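The idea of varying the relative position and orientation of two chromosomes can be sketched as follows. This is a minimal illustration and not the code used in the notebooks: the function name `make_overlap`, the canvas size, the additive blending of intensities and the label convention (0 background, 1 first chromosome only, 2 second chromosome only, 3 overlap) are all assumptions. To stay NumPy-only, orientation is restricted to quarter turns; arbitrary angles would need an interpolating rotation.

```python
import numpy as np


def make_overlap(chrom_a, chrom_b, quarter_turns=0, shift=(0, 0), canvas=64):
    """Combine two single-chromosome greyscale images into one overlap.

    Returns (grey, labels). The label convention is an assumption:
    0 = background, 1 = only A, 2 = only B, 3 = A and B overlapping.
    """
    def place(img, offset):
        # Centre the image on an empty canvas, then apply the offset.
        out = np.zeros((canvas, canvas), dtype=np.float64)
        h, w = img.shape
        top = (canvas - h) // 2 + offset[0]
        left = (canvas - w) // 2 + offset[1]
        if top < 0 or left < 0 or top + h > canvas or left + w > canvas:
            raise ValueError("chromosome does not fit on the canvas")
        out[top:top + h, left:left + w] = img
        return out

    a = place(chrom_a, (0, 0))
    # Orientation: rotate B by multiples of 90 degrees; position: shift it.
    b = place(np.rot90(chrom_b, quarter_turns), shift)

    mask_a, mask_b = a > 0, b > 0
    grey = a + b  # assumption: intensities simply add where they overlap
    labels = mask_a.astype(np.uint8) + 2 * mask_b.astype(np.uint8)
    return grey, labels


# Synthetic stand-in for a chromosome: a 20 x 6 bar of constant intensity.
bar = np.ones((20, 6))
grey, labels = make_overlap(bar, bar, quarter_turns=1)

# A small dataset obtained by scanning orientations and vertical shifts.
dataset = np.stack([
    make_overlap(bar, bar, quarter_turns=k, shift=(dy, 0))[1]
    for k in range(4)
    for dy in (-8, 0, 8)
])
```

With real data, `bar` would be replaced by chromosomes cropped from the DAPI image, and the scan over `quarter_turns` and `shift` would be made as fine as needed to reach the desired number of pairs.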

Project stages

The first stage would be to submit one dataset to a semantic segmentation algorithm such as SegNet. Different implementations of SegNet are available in the current deep-learning frameworks:

The latter deep-learning framework is supposed to be more efficient than the SegNet implementation:


  • DAPI.tif and Cy3.tif: 12-bit images of metaphase chromosomes. The telomeres marking the ends of the chromosomes are visible in the Cy3.tif image. The metaphase doesn't contain overlapping chromosomes.
  • lowres_82146_overlapping_pairs_grey_DAPI-GroundTruth.h5: 82146 pairs of low-resolution overlapping chromosomes (resolution decreased by 4: the overlaps were generated from a DAPI image 16 times smaller than the original image).

  • LowRes_13434_overlapping_pairs.h5: 13434 pairs of overlapping chromosomes generated from the DAPI.tif and Cy3.tif images. This dataset is intended to train a supervised learning algorithm to resolve overlapping chromosomes. The dataset is stored as a NumPy array and saved in an HDF5 file. Compared to the DAPI and Cy3 images, the resolution was decreased by a factor of two.

  • overlapping_chromosomes_examples.h5: smaller dataset (~2000 images). The resolution of the images is the same as that of the DAPI/Cy3 images.
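Since the datasets are NumPy arrays saved in HDF5 files, a first step is usually to check what a file contains. The sketch below assumes the `h5py` package is installed and is written for Python 3. The helper `summarize_h5` is hypothetical, and the demo file, its dataset name `pairs` and its array layout are invented for illustration; the names and shapes stored in the real files are not stated here, which is why the helper lists them instead of assuming them.

```python
import os
import tempfile

import h5py
import numpy as np


def summarize_h5(path):
    """Return {dataset_name: (shape, dtype)} for every dataset in the file."""
    summary = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file.
        if isinstance(obj, h5py.Dataset):
            summary[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary


# Demo on a tiny synthetic file; "pairs" and its layout are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_pairs.h5")
    with h5py.File(demo_path, "w") as f:
        f.create_dataset("pairs", data=np.zeros((5, 32, 32, 2), dtype=np.uint8))

    summary = summarize_h5(demo_path)

    # Once the dataset name is known, slices are read lazily from disk.
    with h5py.File(demo_path, "r") as f:
        first_pair = f["pairs"][0]
```

Calling `summarize_h5("LowRes_13434_overlapping_pairs.h5")` on a downloaded file would show the actual dataset names and array shapes before any large array is loaded into memory.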