The ``data`` module gives access to a set of publicly available whole-slide images (WSIs) stained with different techniques (H&E and IHC). Slides in the ``data`` module are retrieved from the following repositories:
- The Cancer Genome Atlas (TCGA): as detailed in the docstring of each method, for each WSI we access the URL pointing to its location within the portal, e.g. https://portal.gdc.cancer.gov/files/9c960533-2e58-4e54-97b2-8454dfb4b8c8, to retrieve the WSI;
- OpenSlide, a repository of freely distributed test slides from different scanner vendors;
- Image Data Resource (IDR): the WSIs are selected from the data collection provided by Schaadt et al. [1] and available at IDR under the accession number idr0073.
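
As a rough sketch of how one of these slides might be retrieved: each sample is exposed as a function in ``histolab.data`` which, to our understanding, returns an ``openslide.OpenSlide`` object together with the local path of the file. The helper name ``fetch_smallest_sample`` is hypothetical and only used for illustration. The import is deferred so that the snippet can be loaded even when Pooch or network access is missing.

```python
from typing import Tuple


def fetch_smallest_sample() -> Tuple[Tuple[int, int], str]:
    """Retrieve the CMU small sample and report its size and location.

    Hypothetical helper for illustration only. It assumes that histolab
    and Pooch are installed and that the slide can be downloaded (or is
    already cached locally).
    """
    # Deferred import: histolab.data relies on the optional Pooch dependency.
    from histolab.data import cmu_small_region

    # Assumed return value: (openslide.OpenSlide, path to the local file).
    slide, path = cmu_small_region()

    # OpenSlide exposes the level-0 size as a (width, height) tuple.
    return slide.dimensions, path


# Example call (may download about 1.8 MB on first use):
# dimensions, path = fetch_smallest_sample()
```

The CMU small sample is used here because it is the smallest file listed for this module, which keeps a first download short.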

.. note::
   We use Pooch under the hood. It is an optional requirement for histolab
   and needs to be installed separately with::

       pip install pooch
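
Because Pooch is optional, it can help to check for it before calling anything in the ``data`` module. The following is a minimal sketch using only the standard library; ``pooch_available`` and ``require_pooch`` are hypothetical helper names, not part of histolab.

```python
import importlib.util


def pooch_available() -> bool:
    """Return True if the optional Pooch package can be imported."""
    return importlib.util.find_spec("pooch") is not None


def require_pooch() -> None:
    """Raise a descriptive error when Pooch is not installed."""
    if not pooch_available():
        raise ImportError(
            "The data module needs Pooch to download slides; "
            "install it with: pip install pooch"
        )
```

Calling ``require_pooch()`` at the top of a script turns a missing dependency into an error message that names the fix.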
Tissue | Dimensions (width x height, px) | Size (MB) | Repository | Staining |
---|---|---|---|---|
Aorta | 15374x17497 | 63.8 | OpenSlide | H&E |
CMU small sample | 2220x2967 | 1.8 | OpenSlide | H&E |
Breast | 96972x30682 | 299.1 | TCGA-BRCA | H&E |
Breast (black pen) | 121856x94697 | 1740.8 | TCGA-BRCA | H&E |
Breast (green pen) | 98874x64427 | 719.6 | TCGA-BRCA | H&E |
Breast (red pen) | 60928x75840 | 510.9 | TCGA-BRCA | H&E |
Breast (IHC) | 99606x7121 | 218.3 | IDR | IHC |
Heart | 32672x47076 | 289.3 | OpenSlide | H&E |
Kidney | 5179x4192 | 66.1 | IDR | IHC |
Ovary | 30001x33987 | 389.1 | TCGA-OV | H&E |
Prostate | 16000x15316 | 46.1 | TCGA-PRAD | H&E |
TCGA-BRCA: TCGA Breast Invasive Carcinoma dataset; TCGA-PRAD: TCGA Prostate Adenocarcinoma dataset; TCGA-OV: TCGA Ovarian Serous Cystadenocarcinoma dataset.
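
File sizes in the table range from 1.8 MB to 1740.8 MB, so it can be worth choosing a slide by size before triggering a download. The sketch below copies the table into plain Python data and filters it by a size budget. ``SampleSlide``, ``parse_row`` and ``within_budget`` are hypothetical names introduced for illustration and are not part of histolab.

```python
from typing import Iterable, List, NamedTuple


class SampleSlide(NamedTuple):
    tissue: str
    width: int      # pixels
    height: int     # pixels
    size_mb: float  # file size in MB, as listed in the table


def parse_row(tissue: str, dimensions: str, size_mb: float) -> SampleSlide:
    """Split a 'WIDTHxHEIGHT' string from the table into integers."""
    width, height = (int(value) for value in dimensions.split("x"))
    return SampleSlide(tissue, width, height, size_mb)


# Values copied from the table above.
SAMPLES: List[SampleSlide] = [
    parse_row("Aorta", "15374x17497", 63.8),
    parse_row("CMU small sample", "2220x2967", 1.8),
    parse_row("Breast", "96972x30682", 299.1),
    parse_row("Breast (black pen)", "121856x94697", 1740.8),
    parse_row("Breast (green pen)", "98874x64427", 719.6),
    parse_row("Breast (red pen)", "60928x75840", 510.9),
    parse_row("Breast (IHC)", "99606x7121", 218.3),
    parse_row("Heart", "32672x47076", 289.3),
    parse_row("Kidney", "5179x4192", 66.1),
    parse_row("Ovary", "30001x33987", 389.1),
    parse_row("Prostate", "16000x15316", 46.1),
]


def within_budget(samples: Iterable[SampleSlide], max_mb: float) -> List[SampleSlide]:
    """Return the slides no larger than max_mb, smallest first."""
    selected = [s for s in samples if s.size_mb <= max_mb]
    return sorted(selected, key=lambda s: s.size_mb)


if __name__ == "__main__":
    for slide in within_budget(SAMPLES, max_mb=100):
        print(f"{slide.tissue}: {slide.width}x{slide.height} px, {slide.size_mb} MB")
```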
.. toctree::
   :caption: API Reference
   :maxdepth: 2

.. automodule:: histolab.data
   :members:

.. toctree::

[1] Schaadt NS, Schönmeyer R, Forestier G, et al. "Graph-based description of tertiary lymphoid organs at single-cell level." PLoS Comput Biol. (2020).