
Excessive resource demands #37

Open
mih opened this issue Sep 5, 2015 · 9 comments

Comments

@mih (Contributor) commented Sep 5, 2015

I am trying to convert Philips DICOMs from a ~1 hour scan: about 36k single-slice DICOMs in a single directory, with several image series mixed together. The total size of the tarball is ~850 MB (~160 MB gzipped). I convert via the following call:

% dcmstack -v -d --dest-dir . --file-ext '' study_20...

These DICOMs have no file name extensions, hence the --file-ext option.

At this point the process has been running for 40 min and consumes 18 GB of RAM. However, no files have been created yet, so I assume it will keep going.

The memory consumption is more than 20 times the size of the input data. This seems excessive. Any idea what is happening?

Thanks!

@mih (Contributor, Author) commented Sep 5, 2015

For the record: The conversion has now finished after about an hour, with a peak memory demand of ~24 GB.

@moloney (Owner) commented Sep 9, 2015

Is this with the master branch? I made a change where speed/memory use should be much better when the '--extract' option isn't being used.

@mih (Contributor, Author) commented Sep 9, 2015

Thanks for your response!

Yes, this was using the current master at that time. By --extract do you mean --extract-private?

@moloney (Owner) commented Sep 9, 2015

Ah, sorry. I meant --embed-meta (or --dump-meta). I guess you are not using those options though. Which version of pydicom?

Could you run dcm2nii on this data and check the speed and memory use? When I made the improvement for speed/memory I did some basic benchmarking and it seemed comparable to dcm2nii.
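For anyone reproducing this comparison, a generic way to capture wall time and peak memory of a converter run from Python is the resource module. This is just a sketch; the command and directory below are placeholders, not from this thread:

# Generic sketch (not dcmstack/dcm2nii specific) for measuring the wall time and
# peak memory of a converter run; the command and path are placeholders.
import resource
import subprocess
import time

cmd = ["dcm2nii", "study_dir/"]   # hypothetical invocation

start = time.time()
subprocess.check_call(cmd)
elapsed = time.time() - start

# ru_maxrss of waited-for children is reported in kilobytes on Linux
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print("%.1f s, peak RSS ~%.0f MB" % (elapsed, peak_kb / 1024.0))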

@moloney (Owner) commented Sep 10, 2015

Just started to look at this in some more depth. I missed earlier that you are using the '-d' flag (--dump-meta). This definitely slows things down and increases memory use.

I just redid some quick benchmarks against dcm2nii, and while the speed is similar when the --embed/--dump options are not used, the memory use is still quite a bit worse.

I think memory use can be improved, but it would be tricky and would cause some backwards compatibility issues for the Python API. One relatively simple thing that could be done is to remove any pydicom objects immediately after parsing them, and just keep the pixel array plus the extracted meta data in memory. Of course, if you are extracting almost all the meta data then memory use will still be higher than dcm2nii, unless the meta data is "simplified" (avoiding storage of duplicate values) on the fly.
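A minimal sketch of that idea (not dcmstack's actual internals), assuming plain pydicom: parse each file, keep only the pixel array and a few chosen elements, and drop the full dataset right away so it can be garbage collected.

# Rough sketch, not dcmstack's real code: keep only the pixel array and a few
# extracted elements per slice; the full pydicom dataset is discarded at once.
import glob
import dicom  # pydicom < 1.0; newer releases are imported as 'pydicom'

WANTED = ("SeriesInstanceUID", "InstanceNumber", "EchoTime", "RepetitionTime")

slices = []
for path in glob.glob("study_dir/*"):
    ds = dicom.read_file(path)
    meta = dict((key, getattr(ds, key, None)) for key in WANTED)
    pixels = ds.pixel_array.copy()  # copy so the array does not keep the dataset alive
    slices.append((meta, pixels))
    del ds                          # big pydicom object freed before the next file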

@moloney (Owner) commented Sep 25, 2015

@Hanke - I wrote some proof-of-concept code to do faster meta data summarization on the fly, using much less memory. Could you try benchmarking this script on your large dataset: https://gist.github.com/moloney/c3b3d46383f4618ae29e

It won't actually make a Nifti, but should give some idea about what the performance would be.

This does require the "bitarray" package as well.
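For readers who do not want to open the gist, the general "summarize on the fly" idea could look roughly like this. This is a toy sketch, not the gist's actual code, and it assumes every slice carries the same set of meta data keys:

# Toy sketch of on-the-fly meta data summarization (not the gist's code):
# constant values are stored once, and only values that actually vary across
# slices get a per-slice list, so nothing is duplicated ~36k times in memory.
const_meta = {}    # key -> single shared value
varying_meta = {}  # key -> list with one value per slice
n_slices = 0

def add_slice(slice_meta):
    """Fold one slice's meta data dict into the running summary."""
    global n_slices
    for key, value in slice_meta.items():
        if key in varying_meta:
            varying_meta[key].append(value)
        elif key not in const_meta:
            const_meta[key] = value
        elif const_meta[key] != value:
            # value differs from earlier slices: demote to per-slice storage
            varying_meta[key] = [const_meta.pop(key)] * n_slices + [value]
    n_slices += 1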

@mih (Contributor, Author) commented Sep 27, 2015

Thanks! I started running the code. I only had to adjust the glob (sketched below), as Philips DICOM filenames do not come with '.dcm' by default. While running I see:

gist/faststack.py:8: UserWarning: The DICOM readers are highly experimental, unstable, and only work for Siemens time-series at the moment
Please use with caution.  We would be grateful for your help in improving them
  from nibabel.nicom import dicomwrappers
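The exact pattern in the gist is an assumption on my part; the adjustment was just dropping the '.dcm' suffix from the glob, roughly:

import glob

# paths = glob.glob("study_dir/*.dcm")  # original-style pattern (assumed)
paths = glob.glob("study_dir/*")        # also matches extensionless Philips files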

Runtime was approximately the same as with stock dcmstack (1 h 06 min); the memory consumption, however, was ~800 MB, a fraction of what it was before.

I used the same DICOM tarball as for the previous test.

If you like, I can give you access to a DICOM tarball of similar size.

@moloney (Owner) commented Sep 28, 2015

Thanks! If you can share a similar data set that would be very helpful.

@moloney (Owner) commented Oct 1, 2015

I spent some time looking at the data you provided offline. Here are a couple of general findings:

  1. If we really want all the meta data extracted, the only way to speed that up further would be to use multiprocessing, or I guess to improve pydicom performance if that is possible...

  2. Extracting less meta data speeds things up considerably. I am seeing over a 3X speedup when I extract 20 specific elements rather than extracting everything.

  3. On Philips data, dcm2nii is incredibly fast. For Siemens data I found dcmstack to be about the same speed (or much faster in the case of mosaic data) provided the '--dump' and '--embed' options are not used. On Philips data it looks like dcm2nii is almost 6X faster.

Also, one general comment about your data. I guess you are trying to run the whole study through at once? I highly recommend you sort files into per-series folders first and then run dcmstack on those directories. This keeps peak memory use down and it allows you to convert multiple series in parallel to decrease run time. Of course if your data is not already sorted, the total run time may not improve much...
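A pre-sorting step along those lines could be as simple as the following sketch, using pydicom directly; this is not a dcmstack feature, and the directory names are placeholders:

# Sketch of sorting a flat directory of DICOM files into one folder per series,
# keyed on SeriesInstanceUID, so dcmstack can be run per series (or in parallel).
import glob
import os
import shutil

import dicom  # pydicom < 1.0; newer releases are imported as 'pydicom'

src_dir = "study_20xx"       # placeholder for the unpacked tarball
dst_dir = "sorted_series"

for path in glob.glob(os.path.join(src_dir, "*")):
    # stop_before_pixels skips the pixel data; only the series UID is needed here
    ds = dicom.read_file(path, stop_before_pixels=True)
    series_dir = os.path.join(dst_dir, str(ds.SeriesInstanceUID))
    if not os.path.isdir(series_dir):
        os.makedirs(series_dir)
    shutil.copy(path, series_dir)

Each per-series folder could then be converted with the same dcmstack call as before, which keeps peak memory bounded by the largest single series.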
