Data

The data module gives access to a set of publicly available WSIs, stained with different techniques (H&E and IHC). In particular, slides in the data module are retrieved from the following repositories:

  • The Cancer Genome Atlas (TCGA): as detailed in each method's docstring, every WSI is retrieved by accessing the URL that points to its location within the portal, e.g. https://portal.gdc.cancer.gov/files/9c960533-2e58-4e54-97b2-8454dfb4b8c8;
  • OpenSlide, a repository of freely distributed test slides from different scanner vendors;
  • Image Data Resource (IDR): the WSIs are selected from the data collection provided by Schaadt et al. [1] and available at IDR under the accession number idr0073.
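The TCGA URL pattern in the example above can be sketched as a small helper. This is only an illustration: ``gdc_portal_url`` and ``GDC_PORTAL_FILES`` are hypothetical names, not part of histolab, and the data module's own download logic may differ.

```python
import uuid

# Prefix copied from the example URL in the text above.
GDC_PORTAL_FILES = "https://portal.gdc.cancer.gov/files/"


def gdc_portal_url(file_id: str) -> str:
    """Return the GDC portal URL for a file UUID (hypothetical helper)."""
    # uuid.UUID raises ValueError if the identifier is malformed,
    # so typos are caught before any request is made.
    canonical = str(uuid.UUID(file_id))
    return GDC_PORTAL_FILES + canonical


url = gdc_portal_url("9c960533-2e58-4e54-97b2-8454dfb4b8c8")
print(url)
# → https://portal.gdc.cancer.gov/files/9c960533-2e58-4e54-97b2-8454dfb4b8c8
```

Validating the identifier first keeps a malformed UUID from turning into a confusing HTTP error later.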

Note

The data module uses Pooch under the hood. Pooch is an optional requirement for histolab and must be installed separately with:

pip install pooch
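
Because Pooch is optional, code that depends on the data module may want to fail with a clear message when it is missing. The following is a minimal sketch of one common pattern for guarding an optional import. ``require_optional`` is a hypothetical helper, not a histolab function, and histolab's own handling may differ.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency or raise an ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: {install_hint}"
        ) from err


# Works whether or not Pooch is installed: the hint is printed if it is missing.
try:
    pooch = require_optional("pooch", "pip install pooch")
except ImportError as err:
    print(err)
```
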

Set of downloadable WSIs.

==================  ======================  =========  ==========  ========
Tissue              Dimensions (w x h, px)  Size (MB)  Repository  Staining
==================  ======================  =========  ==========  ========
Aorta               15374x17497             63.8       OpenSlide   H&E
CMU small sample    2220x2967               1.8        OpenSlide   H&E
Breast              96972x30682             299.1      TCGA-BRCA   H&E
Breast (black pen)  121856x94697            1740.8     TCGA-BRCA   H&E
Breast (green pen)  98874x64427             719.6      TCGA-BRCA   H&E
Breast (red pen)    60928x75840             510.9      TCGA-BRCA   H&E
Breast (IHC)        99606x7121              218.3      IDR         IHC
Heart               32672x47076             289.3      OpenSlide   H&E
Kidney              5179x4192               66.1       IDR         IHC
Ovary               30001x33987             389.1      TCGA-OV     H&E
Prostate            16000x15316             46.1       TCGA-PRAD   H&E
==================  ======================  =========  ==========  ========

TCGA-BRCA: TCGA Breast Invasive Carcinoma dataset; TCGA-PRAD: TCGA Prostate Adenocarcinoma dataset; TCGA-OV: TCGA Ovarian Serous Cystadenocarcinoma dataset.
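
Slide sizes range from under 2 MB to more than 1.7 GB, so it can help to choose a sample by size before downloading. The sketch below transcribes the table above into plain Python data and filters it. ``SlideInfo``, ``SLIDES`` and ``slides_within`` are illustrative names, not part of the histolab API.

```python
from typing import List, NamedTuple, Optional


class SlideInfo(NamedTuple):
    tissue: str
    width: int       # pixels
    height: int      # pixels
    size_mb: float
    repository: str
    staining: str


# Values transcribed from the table of downloadable WSIs.
SLIDES = [
    SlideInfo("Aorta", 15374, 17497, 63.8, "OpenSlide", "H&E"),
    SlideInfo("CMU small sample", 2220, 2967, 1.8, "OpenSlide", "H&E"),
    SlideInfo("Breast", 96972, 30682, 299.1, "TCGA-BRCA", "H&E"),
    SlideInfo("Breast (black pen)", 121856, 94697, 1740.8, "TCGA-BRCA", "H&E"),
    SlideInfo("Breast (green pen)", 98874, 64427, 719.6, "TCGA-BRCA", "H&E"),
    SlideInfo("Breast (red pen)", 60928, 75840, 510.9, "TCGA-BRCA", "H&E"),
    SlideInfo("Breast (IHC)", 99606, 7121, 218.3, "IDR", "IHC"),
    SlideInfo("Heart", 32672, 47076, 289.3, "OpenSlide", "H&E"),
    SlideInfo("Kidney", 5179, 4192, 66.1, "IDR", "IHC"),
    SlideInfo("Ovary", 30001, 33987, 389.1, "TCGA-OV", "H&E"),
    SlideInfo("Prostate", 16000, 15316, 46.1, "TCGA-PRAD", "H&E"),
]


def slides_within(max_mb: float, staining: Optional[str] = None) -> List[SlideInfo]:
    """Return slides no larger than max_mb, smallest first, optionally by staining."""
    matches = [
        s for s in SLIDES
        if s.size_mb <= max_mb and (staining is None or s.staining == staining)
    ]
    return sorted(matches, key=lambda s: s.size_mb)


print([s.tissue for s in slides_within(70)])
# → ['CMU small sample', 'Prostate', 'Aorta', 'Kidney']
print([s.tissue for s in slides_within(300, staining="IHC")])
# → ['Kidney', 'Breast (IHC)']
```
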

Thumbnails of available WSIs
.. toctree::
   :caption: API Reference
   :maxdepth: 2

.. automodule:: histolab.data
    :members:

.. toctree::

References

[1] Schaadt NS, Schönmeyer R, Forestier G, et al. "Graph-based description of tertiary lymphoid organs at single-cell level." PLoS Comput Biol. 2020.