Harmonize pixel data representations #50

darcymason · 2014-11-27T01:10:42Z

From darcymason@gmail.com on May 27, 2009 22:58:02

A pydicom "gotcha" comes from disconnect between Dataset.pixel_array
property and Dataset.PixelData. The latter is actually in the Dataset (a
dict), the former is created from it but changes are not written back
unless ds.PixelData is explicitly set with e.g. pixel_array.tostring().

Possible solutions:

always require Numpy, always convert pixel data into numpy array. Problem
is this requires decompressing JPEG data (which is not available yet in
pydicom and would possibly waste time on a step that might never be used if
the code is not modifying pixels).
link pixel_array and PixelData together. if pixel_array is asked for,
keep reference to it in dataset instance and automatically do tostring()
before data is written. But what if user modified pixel_array and modified
PixelData directly (current code would do that using the tostring() method
mentioned above). Could see which one was modified most recently and use
that as the definitive final value, or could throw an error or warning of
possible inconsistent pixel data changes.
This idea means PixelData probably needs to be redefined as a property so
can flag writes to it. That makes it 'special' and different than other
items in the Dataset dict, but perhaps that is necessary.

I like the second idea better, but am hoping someone can come up with an
even cleaner solution.

Original issue: http://code.google.com/p/pydicom/issues/detail?id=49

darcymason · 2014-11-27T01:10:42Z

From martin.s...@gmail.com on September 17, 2009 16:37:11

The secound lazy approach sounds better to me too, however...

The modification of pixel_array may also effect; Rows, Columns, Bits Stored, Smallest
Image Pixel Value, Largest Image Pixel Value. Perhaps pixel_array should be a class
corresponding to an Image Pixel Module (PS 3.3 - 2008 C.7.6.3).

If the pixel_aray were requested, then the workflow could be to attempt to extract a
representation of an Image Pixel Module from a DICOM file, which could raise an
exception if there are problems with the tags.

To get around editing data in two places, and having it crash. In the case that
attempts were made to read one of these tags and the module had been changed, then
the module could write itself back to the tags. In the case that any attempts were
made to write, then the module could update itself or return to a state that it will
lazily evaluate itself again if required.

This concept could potentially be extended to other DICOM modules. With the various
modules extending from a common base.

darcymason · 2014-11-27T01:10:42Z

From darcymason@gmail.com on September 23, 2009 06:09:46

Martin, you've given some good ideas here -- the idea of a Module object is very
interesting. Conceptually I've always thought of pydicom as a low-level library, just
reading files and giving back a bag of data elements, requiring the user code to
structure their meaning and make sure they are self-consistent. However, perhaps
there is room to add a Module layer on top. I'd like to see it be explicit (as per
the usual "Explicit is better than Implicit" python philosophy) -- in other words,
the user calls code that is clearly operating at the Module level. This could mean a
new "SmartDataset" object, say, or a dataset.modules object that calls are made to
when you want self-consistency to be enforced. To do it in a general way (not just
for Pixel data) would require some thought and a significant amount of new code (and
IOD definitions files). I've added a new issue ( issue 57 ) for this.

darcymason · 2014-11-27T01:10:43Z

From darcymason@gmail.com on December 23, 2011 19:30:33

Changing status as this is not a priority for next release.

Labels: -Milestone-NextRelease Milestone-Release1.5

darcymason · 2015-01-14T00:44:33Z

This was also raised in #178. As noted there, we need a setter for pixel_array (unless actually use a new backward-incompatible PixelData directly). Question is whether the setter also keeps track of and sets other data elements through the Module concept.

jrkerns · 2015-01-14T21:54:46Z

So, having read this discussion now, maybe we can work toward a solution. Darcy, I like your original ideas, particularly the second. I guess an executive decision needs to be made whether to link PixelData and pixel_array.

(P.S. I retract this solution but keep it for discussion's sake)
Perhaps we can make a solution that accommodates both...
We could leave the pixel_array property alone for backwards-compatibility, but push a getter and setter (set_pixel_array(), get_pixel_array(); an internal function already exists for the getter). With an explicit setter we could also pass a parameter, something like propagate_changes for lack of a better term. This boolean flag, if true, would write to PixelData and update the Rows, etc tags to be consistent. False would simply be current behavior. This all assumes that pixel_array and PixelData are separate entities.

As I reread this it seems like things might get complicated. What if a user sets the pixel array with the setter but without propagating changes. What would the user expect? Given that a user can change any other tag directly and it's saved, e.g. with .save_as(), upon writing, shouldn't the user expect the same from pixel_array? The pixel_array setter and getter would require Numpy, but PixelData could still be edited directly. Given that there is no current setter for pixel_array there's no backwards-compatible issues, yes?

"Could see which one was modified most recently and use that as the definitive final value" If they are linked (pixel_array assignments (the setter) write to PixelData and pixel_array (the getter) reads from PixelData) then the first issue seems to be negated. IDK how fast .tostring() is, so maybe it's a performance hit.

"or could throw an error or warning of possible inconsistent pixel data changes." This seems like a good idea no matter what and could be incorporated into the setter. It seems like a good idea still though to set other relevant tags with the setter as well.

My hat goes to linking them, but my hat's pretty small ;-)

cancan101 · 2015-01-14T23:53:46Z

I think the settable property is a no-go if you ever want to support any form of data compression. In order to map from pixel_array to PixelData, you will need to know the transfer syntax (ie the compression scheme), cols, rows, # bit used per pixel, etc.

Assuming you don't want to require the user to have set those on the data set in advance, something like this makes sense:
dataset.set_image(pixel_array, rows, cols, transfer_syntax,...) where those values are then all set on the underlying dataset.

darcymason · 2015-01-15T03:36:01Z

@jrkerns, yes, there are a lot of complications to figure out. However, I think I may have found a potential way through all of it. See the iod branch for a proof of principle start on this. The idea is that a "Module" defines a list of tags that it "owns". The modules register themselves with Dataset -- If any of the tags is called in the dataset, it is hooked into the module class to get or set them. This can preserve backwards compatibilty, because user code can simply turn off the module hooks to run old code. Still lots to work out here in terms of pixels. For example I'm fairly sure a numpy array of pixels by itself does not uniquely determine number of planes and colour planes vs gray scale etc. So we might be back to a set_image() type function as Alex has mentioned.

@cancan101, you make a good point about transfer syntax. However, I read that with the emphasis on the word "transfer", meaning the syntax can be whatever we want until communicating it to someone else. So it should be possible to leave the pixel data in whatever representation we want internally, until the point where it is written somewhere else. I would argue that after reading a file, regardless of transfer syntax, if someone tried to access the pixels they would want them decompressed or else they cannot be sensibly interpreted anyway. So the ImagePixel module would keep track of the original syntax, convert when pixels requested, and when writing, reconvert if pixels have been changed or if the transfer syntax was changed. It would require some internal flags, but i think it wouldn't be too bad.

…OB values to file. Includes unit test setting PixelData to pixel_array before writing. Related to #50.

… raw data elements. Related to #50.

darcymason · 2017-07-15T14:41:51Z

There was a lot of discussion previously, now I'm trying to tidy this up for v1.0 release (or not).

Here is my 'gun to the head' gotta make a decision, quick solution:

make PixelData gettable or settable as a numpy array
pixel_array can stay as a 'getter', perhaps with DeprecationWarning, but is simply an alias for PixelData
as now, setting pixel data is done without enforcing connection to Rows, Columns, planes and all that. Like all other data elements, consistency across data elements is up to the user code.
notwithstanding the point above, we could warn on obvious inconsistencies

Explanation: Pydicom makes dicom file data element values available as the natural python type for that value. This proposal extends that to PixelData. The natural python type for that is a numpy array.

If that is the path, here are the implementation details:

on writing PixelData, if it is a numpy array, it is converted to bytes for the file (i.e. tostring() is done for the user transparently).
wherever PixelData has been used internally in pydicom (including unit tests), it will likely be need to be changed to pixel_bytes() or some name like that, that actually returns the raw bytes pydicom has previously expected.
pixel_array coded as alias for PixelData getter, as mentioned above
on setting PixelData, bytes are acceptable (for backwards compatibility), or a numpy array.

Compressed pixel data complicates this. For now we could just raise NotImplemented if the Dataset transfer syntax is a compressed one, i.e. the user is free to decompress and then write the file, but it is an error unless they change the transfer syntax to an uncompressed version.

Backwards-incompatibilty: with the above solution, PixelData no longer returns 1-D bytes, but a correctly dimensioned numpy array. If any old user code actually used the bytes directly, that code will probably break. I suspect this is very rare, as pixel_array would have been the natural way to
use pixel data.

I think this is a good solution all around. It minimizes backwards incompatibility, and is consistent with pydicom philosophy for other data types. I've also coded much of this before in test branches, so it can be implemented fairly quickly.

cancan101 · 2017-07-15T17:21:15Z

I disagree with the suggested changes. I think that PixelData should continue to return the raw bytes. This would be 1. consistent with previous behavior and 2. consistent with how the attr access works for all other dicom keywords. Further the idea of transparently calling tostring on the array when assigning is problematic. This works when creating uncompressed dicoms, but currently I am creating some compressed dicoms. this involves me creating the compressed binary representation and then assigning it to PixelData.

Further the "what" that is returned when accessing either of pixel_array or your new PixelData still seems somewhat ambiguous. There is the question as to whether you return the LUT value, the YBR value or the RGB value (see #273 and #263).

vsoch · 2017-07-15T17:31:19Z

I think having variables that are named clearly based on what they return makes a lot of sense. pixel_data, as a naive person like me would understand it, is the data from the pixels in some format I can work pixel_bytes is the same data in bytes. I can think of use cases for wanting both -
the one that is strongest for my groups are just wanting to extract pixel data for image processing, in which case the numpy array is much preferable. I'm not sure about using tostring (when moving from bytes to string I usually use decode('utf0-8') or something like that, but I suspect this is different).

darcymason · 2017-07-15T17:37:57Z

I think that PixelData should continue to return the raw bytes. This would be 1. consistent with previous behavior and 2. consistent with how the attr access works for all other dicom keywords

Where are raw bytes returned for other DICOM keywords? Binary numbers, for example, are changed to int, float, etc. Multi-valued items are split by the slash and returned as python data types. RawDataElement's carry the raw bytes, but only until they are accessed, then they are converted.

There is the question as to whether you return the LUT value, the YBR value or the RGB value (see #273 and #263).

Yes, this is an issue. But in any case we have a numpy array to return, whether it is called PixelData or pixel_array. Similarly if the user wants to modify it and write back to file, then we need a way to convert them to some binary representation. In any case, with my proposal, users would still be able to get the raw bytes (through pixel_bytes), and are able to write a bytes array to PixelData to have that control if they needed it.

cancan101 · 2017-07-15T17:57:00Z

I don't love the idea that depending on whether it is bytes or an array that is assigned to PixelData, the behavior is different.

darcymason · 2017-07-15T18:15:26Z

There is some concern there, I suppose, but in line with python's 'We are all consenting adults' philosophy, I see it that if someone assigns bytes, then they have full control, we write what they set. If they assign a numpy array, we are going to convert it to bytes when it is written. If that isn't what they wanted then they must have assigned the wrong array anyway, or have not set the other image-related data elements for resolution, transfer syntax, etc. correctly.

One concern is that if the user sets PixelData with bytes, then accesses the value, it will trigger a conversion to a numpy array, which will depend on the other data elements. But we could also use @cancan101's convenience function from a comment further up this thread:

dataset.set_image(pixel_array, rows, cols, transfer_syntax,...)

to help guide that down the right path. Maybe that should be the way that is documented as the 'correct' way to set a pixel array.

rhaxton · 2017-07-16T03:47:45Z

I would also prefer not to break existing code (meaning mine). I am used to the pixel_array/PixelData duality and I don't find it any worse than the horror that is character encodings in python 2.7. I like the idea that pydicom can read any tag right out of the box without needing numpy/scipy/pillow/etc.
Any change that breaks client code is a big step.

darcymason · 2017-07-16T18:37:40Z

I appreciate everyone's comments. I'm certainly willing to keep things as they are, which is obviously much easier.

Interestingly, though, I just tried something:

ds = pydicom.read_file("CT_small.dcm")
pix = ds.pixel_array
ds.PixelData = pix
ds.save_as("del_me.dcm")

And compared the output file with the original. Binary identical. So it seems that we can already set the PixelData to a numpy array and it seems to work fine. Reading on python's write() method, it can accept any "bytes-like object" supporting the 'Buffer Protocol' and a C-contiguous buffer. numpy seems to do C order by default, but there is also a ascontiguousarray() method, which makes me think that my example got lucky, and for larger arrays that are not contiguous in memory, it might not work. Could try some kind of slicing example with a step, I guess, to try to show that. Even if it were needed, ascontiguousarray() could be called on writing the file.

Meanwhile, numpy's tostring() doc says it is a compatibility alias for tobytes(). It would make a lot of sense for us to adopt the latter for all future documentation.

Anyway, I'm wondering about putting together a dev branch to at least get rid of the 'tostring' requirement, let people run it and see if it breaks backwards compatibility. I think it can be done so that it doesn't. And I think it can be done so that numpy is only required if the bytes are accessed. As I said, I'm okay to drop this, but v1.0 is our best time to make a change like this so I want to make sure before I drop it.

mrbean-bremen · 2018-03-03T17:35:38Z

@darcymason - is this still something that needs to be done, or is the current state sufficient? I got a bit lost in the discussion (vs the changes made lately in the implementation), not sure if this is still a valid issue.

darcymason · 2018-03-03T20:31:35Z

is this still something that needs to be done, or is the current state sufficient

Hmmm, I'd like to think about it a bit longer. Re-reading the discussion, my take-home is that at least part of this can be done without breaking backwards compatibility -- for example, assigning a numpy array to PixelData can just be handled the way the user currently does -- by converting it via tobytes(). That could save one step that is a little confusing for newcomers.

cancan101 · 2018-03-03T20:55:41Z

I would vote to avoid that sort of magic and keep pixeldata as the bytes for assignment and reading and use pixel array for handling numpy arrays.

darcymason · 2020-05-18T20:55:47Z

I still dream of resolving this but have pushed back to v3.0. Reading through quickly again, there was lots of concern about breaking code, but I believe my branch code was showing that it can be done without breaking. But assigning v3.0 just in case...

nooj · 2024-04-04T16:42:26Z

I would like to add a use case to the discussion of writing to pixel_array / PixelData: changing the shape of pixel_array.

That is, I have a dicom file with multiple slices in it (pixel_array.shape is (10,128,128)) and I want to generate a dicom file whose pixel_array.shape is (1,128,128) and preserves all the metadata and other properties of the original dicom. The contents of the new file are a calculation based on the original 10-slice data.

Example code demonstrating that the setter .PixelData does not adjust all of the metadata to match the size/shape of the new data:

my_dcm = pydicom.dcmread(my_dicom_path)
my_pixel_array = my_dcm.pixel_array[3,:,:]  # an example calculation
my_dcm.PixelData = my_pixel_array.tostring()
rdpa = raw_dcm.pixel_array

raw_dcm.save_as(filename = denoised_dicom_path)

This results in a value error: "The length of the pixel data in the dataset (32768 bytes) doesn't match the expected length (327680 bytes)."

scaramallion · 2024-06-18T07:05:37Z

Assigning PixelData using bytes

This will never be able to update the dataset's Image Pixel module elements:

ds.PixelData = arr.tobytes()

At best all we can do is check that the number of bytes matches the dataset, which we already do on trying to write

Assigning PixelData using ndarray

The following can theoretically update the Image Pixel elements in a (very) limited fashion:

ds.PixelData = arr

It can set (Rows, Columns) and Samples per Pixel for single frame, single sample arrays (rows, cols)
It can probably set Bits Allocated?
It can set Pixel Representation

It cannot do anything with (frames, rows, cols), (rows, cols, samples) or (frames, rows, cols, samples) due to Planar Configuration and the inherent ambiguity in the first two shapes. It also cannot set Bits Stored in a conformant manner and simply doesn't have the information required to set Photometric Interpretation.

Assigning PixelData using class method

The following would be required to set Pixel Data and the Image Pixel elements in a fully conformant manner (additional args/kwargs may be required)

def set_pixel_data(
    arr: np.ndarray, 
    bits_stored: int, 
    photometric_interpretation: str, 
    *,
    rows: int | None = None,
    columns: int | None = None,
    number_of_frames: int | None = None, 
    bits_allocated: int | None = None,  # maybe?
    planar_configuration: int | None = None,
    pack_bits: bool = False,
) -> None:
     ....

Which is not much better than setting the Image Pixel elements manually, but at least lets us add some validation checks.

So, as I see it, the choices to finally stake this issue through the heart and bury it at a crossroad are:

Do nothing, the user is responsible for setting the Image Pixel elements to match Pixel Data
Raise an exception for ds.PixelData = arr - nice and unambiguous!
Handle the arr.tobytes() conversion (but nothing else) - I don't really like this option, you risk this issue rising from the grave all to save users from typing .tobytes()
Add a Dataset.set_pixel_data() method

Or some combination thereof.

mrbean-bremen · 2024-06-19T05:16:31Z

I would prefer an explicit set_pixel_data method - this would be documented, helps the not-so-DICOM-savvy user, and no magic is involved.
Also probably raising an exception on ds.PixelData = arr that points to set_pixel_data makes sense.

darcymason added bug imported Milestone-Release1.5 Difficulty-Hard labels Nov 27, 2014

darcymason removed the Milestone-Release1.5 label Nov 29, 2014

darcymason added this to the v1.0.0 milestone Dec 17, 2014

darcymason mentioned this issue Dec 17, 2014

Add Module objects to group related data elements #58

Closed

darcymason mentioned this issue Jan 14, 2015

Pixel Array setter doesn't work #178

Closed

darcymason mentioned this issue Apr 23, 2015

Determining ambiguous VRs in implicit data sets #201

Closed

darcymason added a commit that referenced this issue Feb 28, 2016

Adds automatic tostring() conversion of numpy PixelData when writing …

eedb10e

…OB values to file. Includes unit test setting PixelData to pixel_array before writing. Related to #50.

darcymason self-assigned this Feb 28, 2016

darcymason added a commit that referenced this issue Mar 6, 2016

PixelData changes: all unit tests pass. More checks added for writing…

5d8c42f

… raw data elements. Related to #50.

darcymason mentioned this issue Jul 15, 2017

Classify issues #395

Closed

scaramallion modified the milestones: v1.0, v1.1 Mar 5, 2018

This was referenced May 15, 2018

In Sequences default_encoding is used instead of the parents one. #625

Closed

Release 1.1 #644

Closed

mrbean-bremen modified the milestones: v1.1, v1.2 Jul 18, 2018

mrbean-bremen removed this from the v1.2 milestone Sep 10, 2018

mrbean-bremen mentioned this issue Jun 25, 2019

Multi-frame Saving #865

Closed

darcymason added this to the v1.5 milestone Jun 29, 2019

darcymason modified the milestones: v2.0, v3.0 May 18, 2020

scaramallion added the pixel-data label Nov 1, 2020

darcymason mentioned this issue Jan 23, 2021

Cannot save decompressed multi-frame scans #1300

Closed

darcymason mentioned this issue May 5, 2021

[MRG] MNT+ENH: modernize the code and add channels support pydicom/contrib-pydicom#21

Merged

scaramallion mentioned this issue Jan 31, 2024

[MRG] Stage 3 of new pixels module #2000

Merged

7 tasks

scaramallion mentioned this issue Jun 25, 2024

[MRG] Add Dataset method and function to set Image Pixel elements using an ndarray #2082

Merged

6 tasks

darcymason closed this as completed in #2082 Jun 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harmonize pixel data representations #50

Harmonize pixel data representations #50

darcymason commented Nov 27, 2014

darcymason commented Nov 27, 2014

darcymason commented Nov 27, 2014

darcymason commented Nov 27, 2014

darcymason commented Jan 14, 2015

jrkerns commented Jan 14, 2015

cancan101 commented Jan 14, 2015

darcymason commented Jan 15, 2015

darcymason commented Jul 15, 2017

cancan101 commented Jul 15, 2017

vsoch commented Jul 15, 2017

darcymason commented Jul 15, 2017 •

edited

Loading

cancan101 commented Jul 15, 2017

darcymason commented Jul 15, 2017

rhaxton commented Jul 16, 2017

darcymason commented Jul 16, 2017

mrbean-bremen commented Mar 3, 2018

darcymason commented Mar 3, 2018

cancan101 commented Mar 3, 2018

darcymason commented May 18, 2020

nooj commented Apr 4, 2024

scaramallion commented Jun 18, 2024 •

edited

Loading

mrbean-bremen commented Jun 19, 2024

Harmonize pixel data representations #50

Harmonize pixel data representations #50

Comments

darcymason commented Nov 27, 2014

darcymason commented Nov 27, 2014

darcymason commented Nov 27, 2014

darcymason commented Nov 27, 2014

darcymason commented Jan 14, 2015

jrkerns commented Jan 14, 2015

cancan101 commented Jan 14, 2015

darcymason commented Jan 15, 2015

darcymason commented Jul 15, 2017

cancan101 commented Jul 15, 2017

vsoch commented Jul 15, 2017

darcymason commented Jul 15, 2017 • edited Loading

cancan101 commented Jul 15, 2017

darcymason commented Jul 15, 2017

rhaxton commented Jul 16, 2017

darcymason commented Jul 16, 2017

mrbean-bremen commented Mar 3, 2018

darcymason commented Mar 3, 2018

cancan101 commented Mar 3, 2018

darcymason commented May 18, 2020

nooj commented Apr 4, 2024

scaramallion commented Jun 18, 2024 • edited Loading

Assigning PixelData using bytes

Assigning PixelData using ndarray

Assigning PixelData using class method

mrbean-bremen commented Jun 19, 2024

darcymason commented Jul 15, 2017 •

edited

Loading

scaramallion commented Jun 18, 2024 •

edited

Loading