Add `DICOM Stack` data set #5

adam-grant-hendry · 2021-11-08T16:59:03Z

This dataset will be used to test improved volume rendering and DICOM stack reading per pyvista/pyvista-support issue #500.

~~![image](https://user-images.githubusercontent.com/59346180/140785116-3dc68d20-bca6-4d6f-85b6-6b832a576138.png)~~

The Cancer Imaging Archive (TCIA) is a service which de-identifies and hosts a large archive of medical images of cancer accessible for public download. The data are organized as “collections”; typically patients’ imaging related by a common disease (e.g. lung cancer), image modality or type (MRI, CT, digital histopathology, etc) or research focus. DICOM is the primary file format used by TCIA for radiology imaging. Supporting data related to the images such as patient outcomes, treatment details, genomics and expert analyses are also provided when available.

This dataset is a member of the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium Sarcomas (CPTAC-SAR) cohort. CPTAC is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. Radiology and pathology images from CPTAC patients are being collected and made publicly available by The Cancer Imaging Archive to enable researchers to investigate cancer phenotypes which may correlate to corresponding proteomic, genomic and clinical data.

This data has been published under the Creative Commons Attribution 3.0 Unported License and must adhere to the CPTAC Data Use Agreement. Per the TCIA Data Usage Policy (see License file), all oral or written presentations, disclosures, or publications must acknowledge the specific dataset(s) or applicable accession number(s) and the NIH-designated data repositories through which the investigator accessed any data. The appropriate citations are included in the Citations file. The metadata for this dataset is included in metadata.csv. Questions may be directed to help@cancerimagingarchive.net.

Title: Forearm Sarcoma
DataDescription URI: https://doi.org/10.7937/TCIA.2019.9bt23r95
Number of Images: 3
Total Size: 1.51 MB
File Format: DICOM

Files:
DICOM_Stack.zip
LICENSE.txt
CITATION.txt
metadata.csv

This dataset is a member of the Pancreatic-CT-CBCT-SEG collection and is distributed under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). Per the TCIA Data Usage Policy (see `License` file), all oral or written presentations, disclosures, or publications must acknowledge the specific dataset(s) or applicable accession number(s) and the NIH-designated data repositories through which the investigator accessed any data. The appropriate citations are included in the `Citations` file.
Specifically, the metadata for this dataset is as follows:

Series UID: 1.3.6.1.4.1.14519.5.2.1.302382790855582445722435410442490497846
Collection: Pancreatic-CT-CBCT-SEG
3rd Party Analysis: NO
DataDescription URI: NA
Subject ID: Pancreas-CT-CB_001
Study UID: 1.3.6.1.4.1.14519.5.2.1.21087345762211724523378497892240459677
Study Description: PANCREAS
Study Date: 7/6/2012
Series Description: PANCREAS DI iDose 3
Manufacturer: Philips
Modality: CT
SOP Class Name: CT Image Storage
SOP Class UID: 1.2.840.10008.5.1.4.1.1.2
Number of Images: 134
File Size: 70.58 MB
File Location: .\Pancreatic-CT-CBCT-SEG\Pancreas-CT-CB_001\07-06-2012-NA-PANCREAS-59677\201.000000-PANCREAS DI iDose 3-97846
Download Timestamp: 2021-11-07T16:51:32.384

[metadata.csv](https://github.com/pyvista/vtk-data/files/8467442/metadata.csv)

adam-grant-hendry · 2021-11-09T01:20:44Z

Additionally, TCIA has a REST API, so if we want to use more data from this site in the future, we could lazily load and convert data instead of adding it directly to this repo (making sure to include proper citations, of course).

MatthewFlamm · 2021-11-09T19:13:18Z

The license looks very permissive, but the citation requirement will have to be kept in mind if used for documentation. This is a somewhat large dataset for testing (~67 MB in total), so it might be prohibitive to be constantly downloading these files. But it certainly is an interesting dataset, which is also nice for the documentation.

I'm not familiar with DICOM datasets, but is it possible to reduce the number of files for testing purposes? This may be a middle ground here. If so, this looks good to me.

adam-grant-hendry · 2021-11-11T02:48:51Z

@MatthewFlamm

> the citation requirement will have to be kept in mind if used for documentation

Yes, agreed. The docstrings in the corresponding updated code and the Sphinx docs will need to contain the citations.

> This is a somewhat large dataset for testing (~67 MB in total), it might be prohibitive to be constantly downloading these files?

Yes, agreed; I'm concerned about that as well. Instead, I can write a private module that implements the TCIA REST API so we can download directly from their site rather than store data on GitHub. It behaves similarly to the URL request functions currently implemented in `pyvista/examples/downloads.py`:

You access a resource by sending an HTTP request to the TCIA API server. The server replies with a response that either contains the data you requested, or a status indicator.
You can access the metadata of an API by appending `/metadata` to the end of the query. The metadata is in JSON format and conforms to this schema.
Most APIs can return results as CSV/JSON/XML/HTML. You can specify the return format by including the query parameter `format`.
An API request takes the following structure:

`<BaseURL><Resource><QueryEndpoint>?<QueryParameters><Format>`

They also provide code examples and an SDK for Python hosted here on GitHub.
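As a rough illustration of the request structure described above, the following Python sketch assembles such a URL with the standard library only. The base URL, resource, and endpoint values are placeholders invented for this example (they are not actual TCIA endpoints), and `build_query_url` is a hypothetical helper name, not part of any existing module.

```python
from urllib.parse import urlencode


def build_query_url(base_url, resource, endpoint, params=None, fmt="json"):
    """Assemble <BaseURL><Resource><QueryEndpoint>?<QueryParameters><Format>.

    Hypothetical helper: the return format is sent as the `format`
    query parameter, as the API description above states.
    """
    query = dict(params or {})
    query["format"] = fmt
    path = "/".join(
        [base_url.rstrip("/"), resource.strip("/"), endpoint.strip("/")]
    )
    return f"{path}?{urlencode(query)}"


# Placeholder values for illustration only; not real TCIA endpoints.
url = build_query_url(
    "https://example.org/api",
    "resource",
    "endpoint",
    {"Collection": "Pancreatic-CT-CBCT-SEG"},
    fmt="csv",
)
print(url)
```

A real client would send this URL with an HTTP library and handle status responses; that part is left out here because it depends on the actual service.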

Would you and the team be alright if I made such a Python module to implement the API?

> I'm not familiar with DICOM datasets, but is it possible to reduce the number of files for testing purposes?

DICOM is simply an image file format standardized by the medical community to secure data transfer, particularly for patient files. A single standalone DICOM image is typically something like a chest x-ray, mammogram, bone fracture image, etc. Alternatively, an MRI or CT (computed tomography) machine can reconstruct a 3D volume by taking images of a body at multiple slices (hence the "tomography" in "computed tomography") along an axis.

Thus, unfortunately, a raw DICOM stack that represents a volume will always consist of multiple images, and often many files. A single-file 3D model, like a ".ply" or an ".stl", can be created from a stack, but raw data is important in research. A lot of cleanup occurs in generating the 3D model, and we typically want to experiment with and list the filters used to create it so that others can reproduce the results.
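To make the "stack of slices" idea concrete, here is a minimal pure-Python sketch that orders 2D slices by their position along the scan axis and stacks them into a 3D volume. It does not read DICOM files; the `(position, rows)` pairs and the `stack_slices` helper are assumptions made for illustration, standing in for what a real reader would obtain from each file's pixel data and position metadata.

```python
def stack_slices(slices):
    """Order 2D slices along the scan axis and stack them into a 3D volume.

    `slices` is a list of (position, rows) pairs, where `position` is the
    slice location along the axis and `rows` is a 2D list of pixel values.
    """
    ordered = sorted(slices, key=lambda item: item[0])
    # Every slice must have the same in-plane shape to form a regular volume.
    shapes = {(len(rows), len(rows[0])) for _, rows in ordered}
    if len(shapes) != 1:
        raise ValueError("all slices must share the same in-plane shape")
    return [rows for _, rows in ordered]


# Three tiny 2x2 "images", deliberately listed out of order.
slices = [
    (2.0, [[5, 6], [7, 8]]),
    (0.0, [[0, 0], [0, 1]]),
    (1.0, [[1, 2], [3, 4]]),
]
volume = stack_slices(slices)
dims = (len(volume), len(volume[0]), len(volume[0][0]))
print(dims)  # (3, 2, 2)
```

This is also why the number of files cannot simply be reduced to one: each file contributes one layer of the volume.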

I could try to find a smaller data set, which may very well exist, but I think the best solution would be to use the TCIA API and download from them.

What do you think?

MatthewFlamm · 2021-11-11T14:05:39Z

Having the data here is probably the best IMO if it is used for testing. Otherwise, the testing could be broken due to the other endpoint being down. I'm realizing that if we use all the DICOM layers for the documentation building, we already need to download the whole dataset for full testing anyway. So my question above about utilizing a partial dataset is probably not important.

Maybe the 50+ MB file size is no issue? Let's wait to get another opinion from @akaszynski.

adam-grant-hendry · 2021-11-14T18:28:07Z

@akaszynski Do you have any opinions?

adam-grant-hendry · 2021-11-28T20:25:14Z

@adeak @banesullivan Any thoughts on this as well?

banesullivan · 2021-11-28T20:35:28Z

Skimmed over this...

Any data that are used for testing/examples need to be hosted here, as GitHub is generally reliable. If that external service goes down or changes its URL, it will break our CI and create a significant burden for us, much like pyvista/pyvista#1226 did.

banesullivan · 2021-11-28T20:37:12Z

Ah, just saw the concerns about file size

Perhaps we should just make a separate repo for LFS files?

MatthewFlamm · 2021-11-29T17:50:56Z

There is also this dataset from vtk data testing that is much smaller. I know nothing about it.

https://data.kitware.com/#collection/55f17f758d777f6ddc7895b7/folder/5afd93708d777f15ebe1b516

adam-grant-hendry · 2021-11-29T22:05:09Z

> There is also this dataset from vtk data testing that is much smaller. I know nothing about it.

@MatthewFlamm The data appears corrupted (I cannot open it in ParaView as a volume). This appears to be several slices of a prostate CT scan exam. A prostate scan would definitely be smaller than a torso. I found a working one from TCIA that is 2.1MB. Would that be okay?

About the smallest useful set I can find is 2MB. Is that acceptable?

akaszynski · 2021-11-29T23:51:48Z

> About the smallest useful set I can find is 2MB. Is that acceptable?

2MB is quite acceptable.

MatthewFlamm · 2021-11-30T21:23:14Z

I found the inconsistency in the data set I linked. It is for a different reader: https://vtk.org/doc/nightly/html/classvtkDICOMImageReader.html

It is used in this example
https://kitware.github.io/vtk-examples/site/Cxx/IO/ReadDICOM/

This reader isn't available in my version of ParaView either.

I would agree that a smaller dataset would make a whole lot more sense.

Per contributor feedback, a smaller dataset (<= 2MB) is ideal.

adam-grant-hendry · 2022-04-11T20:53:04Z

@akaszynski @MatthewFlamm @banesullivan I've replaced the dataset with a 1.5 MB dataset. Please kindly review at your earliest convenience and let me know if this will work. Thank you!

MatthewFlamm · 2022-04-11T21:06:47Z

This seems much more reasonable. It looks like a rebase or merge got messed up, and will have to be fixed before merging. GitHub is saying there are 102 files changed.

adam-grant-hendry · 2022-04-12T01:11:48Z

@MatthewFlamm That should be the 102 files I deleted so that there are only 3 DICOM files. I forgot to add the upstream. I'll fix this.

adam-grant-hendry · 2022-04-12T01:34:03Z

@MatthewFlamm I accidentally deleted this PR. Please see #9 to continue. Sorry for the confusion.

banesullivan and others added 30 commits January 29, 2020 19:05

add FORGE data

0ba5ee0

add large volume

08fc998

add delaunay points

ccacd0e

add embryo

30f1efc

add antarctica_velocity.vtp

d10fbf7

add translucent room sruface mesh

19ba3fe

added tree.ply

cc80604

converted tree.ply to binary

720cbcd

added ensight example file

21a9e41

add skybox2

7e96002

add GPR example data

1208ea2

add thermal probes

d40f8fe

add turbineblade

eb9536d

add carburetor

467c610

add lobster.ply

e750530

add woman.stl

2999fd1

Add urn.stl

09e51db

add tigerfighter.obj

2fe4613

add pepper.ply and pepper.obj

cb98a81

add man_face.stl

2bf306a

Merge pull request pyvista#1 from pyvista/3dgallery-data

da06188

3dgallery data

renamed pepper.obj to drill.obj

b61d30c

add mars and stars

5bab966

Merge branch 'master' of https://github.com/pyvista/vtk-data

026a021

make mars smaller

3b58647

add notch stress FEA example

48f15a5

add notch displacement

b385832

notch_disp.vtk --> notch_disp.vtu

a102608

actually add file

3d58456

add louis

f4e81d8

MatthewFlamm and others added 6 commits August 20, 2021 16:53

add wave PVD files (pyvista#3)

38d8e9d

add osmnx graph

771a4c5

add pvd files from paraview data

f438364

Merge pull request pyvista#4 from MatthewFlamm/add-paraview-pvd

44ea287

Add pvd files from paraview data

add lucy

113b8b5

adam-grant-hendry mentioned this pull request Nov 8, 2021

PyVista add_volume: Garbled Output, Much Slower than ParaView, & Memory Hog pyvista/pyvista-support#500

Closed

banesullivan force-pushed the master branch from 7167394 to c9e5f34 Compare February 6, 2022 01:31

adam-grant-hendry closed this Apr 12, 2022

adam-grant-hendry deleted the feat/dicomstack branch April 12, 2022 01:19

This was referenced Apr 12, 2022

Add DICOM Stack data set #9

Merged

[BUGFIX] Move *.dcm Files to data Subdirectory #10

Merged

adam-grant-hendry mentioned this pull request Aug 16, 2022

fix(add_volume): use memory-efficient rescaling pyvista/pyvista#3170

Merged
