Harmonize pixel data representations #50
Comments
From martin.s...@gmail.com on September 17, 2009 16:37:11 The second lazy approach sounds better to me too, however... The modification of pixel_array may also affect: Rows, Columns, Bits Stored, Smallest If the pixel_array were requested, then the workflow could be to attempt to extract a To get around editing data in two places, and having it crash. In the case that This concept could potentially be extended to other DICOM modules. With the various |
From darcymason@gmail.com on September 23, 2009 06:09:46 Martin, you've given some good ideas here -- the idea of a Module object is very |
From darcymason@gmail.com on December 23, 2011 19:30:33 Changing status as this is not a priority for next release. Labels: -Milestone-NextRelease Milestone-Release1.5 |
This was also raised in #178. As noted there, we need a setter for pixel_array (unless we actually use a new, backward-incompatible PixelData directly). The question is whether the setter also keeps track of and sets other data elements through the Module concept. |
So, having read this discussion now, maybe we can work toward a solution. Darcy, I like your original ideas, particularly the second. I guess an executive decision needs to be made whether to link (P.S. I retract this solution but keep it for discussion's sake) As I reread this it seems like things might get complicated. What if a user sets the pixel array with the setter but without propagating changes. What would the user expect? Given that a user can change any other tag directly and it's saved, e.g. with "Could see which one was modified most recently and use that as the definitive final value" If they are linked ( "or could throw an error or warning of possible inconsistent pixel data changes." This seems like a good idea no matter what and could be incorporated into the setter. It seems like a good idea still though to set other relevant tags with the setter as well. My hat goes to linking them, but my hat's pretty small ;-) |
I think the settable property is a no-go if you ever want to support any form of data compression. In order to map from Assuming you don't want to require the user to have set those on the data set in advance, something like this makes sense: |
@jrkerns, yes, there are a lot of complications to figure out. However, I think I may have found a potential way through all of it. See the iod branch for a proof of principle start on this. The idea is that a "Module" defines a list of tags that it "owns". The modules register themselves with Dataset -- if any of the tags is called in the dataset, it is hooked into the module class to get or set them. This can preserve backwards compatibility, because user code can simply turn off the module hooks to run old code. Still lots to work out here in terms of pixels. For example, I'm fairly sure a numpy array of pixels by itself does not uniquely determine number of planes and colour planes vs gray scale etc. So we might be back to a set_image() type function as Alex has mentioned. @cancan101, you make a good point about transfer syntax. However, I read that with the emphasis on the word "transfer", meaning the syntax can be whatever we want until communicating it to someone else. So it should be possible to leave the pixel data in whatever representation we want internally, until the point where it is written somewhere else. I would argue that after reading a file, regardless of transfer syntax, if someone tried to access the pixels they would want them decompressed or else they cannot be sensibly interpreted anyway. So the ImagePixel module would keep track of the original syntax, convert when pixels are requested, and when writing, reconvert if pixels have been changed or if the transfer syntax was changed. It would require some internal flags, but I think it wouldn't be too bad. |
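The "module owns a list of tags" idea described above could look roughly like the following. This is only a minimal sketch and not the code in the iod branch: `Module`, `ImagePixelModule`, `HookedDataset`, `register_module` and `hooks_enabled` are hypothetical names invented for illustration, and a plain dict stands in for pydicom's Dataset.

```python
class Module:
    """Hypothetical base class: a module 'owns' a set of element keywords."""

    owned = frozenset()

    def get(self, store, keyword):
        return store[keyword]

    def set(self, store, keyword, value):
        store[keyword] = value


class ImagePixelModule(Module):
    """Owns a few Image Pixel keywords and validates writes to them."""

    owned = frozenset({"Rows", "Columns", "PixelData"})

    def set(self, store, keyword, value):
        if keyword in ("Rows", "Columns") and value <= 0:
            raise ValueError(f"{keyword} must be positive, got {value}")
        store[keyword] = value


class HookedDataset:
    """A dict-backed stand-in for Dataset that routes owned keywords to modules."""

    def __init__(self):
        self._store = {}
        self._owners = {}
        self.hooks_enabled = True  # switch off to get plain dict behaviour

    def register_module(self, module):
        for keyword in module.owned:
            self._owners[keyword] = module

    def _owner(self, keyword):
        return self._owners.get(keyword) if self.hooks_enabled else None

    def __getitem__(self, keyword):
        module = self._owner(keyword)
        if module is None:
            return self._store[keyword]
        return module.get(self._store, keyword)

    def __setitem__(self, keyword, value):
        module = self._owner(keyword)
        if module is None:
            self._store[keyword] = value
        else:
            module.set(self._store, keyword, value)


ds = HookedDataset()
ds.register_module(ImagePixelModule())
ds["Rows"] = 128          # goes through ImagePixelModule.set
ds["PatientName"] = "X"   # not owned by any module, stored directly
```

With `hooks_enabled` set to False every access falls straight through to the underlying dict, which is one way the "turn off the module hooks to run old code" behaviour might be provided.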
…OB values to file. Includes unit test setting PixelData to pixel_array before writing. Related to #50.
… raw data elements. Related to #50.
There was a lot of discussion previously, now I'm trying to tidy this up for v1.0 release (or not). Here is my 'gun to the head' gotta make a decision, quick solution:
Explanation: Pydicom makes dicom file data element values available as the natural python type for that value. This proposal extends that to If that is the path, here are the implementation details:
Compressed pixel data complicates this. For now we could just raise NotImplemented if the Backwards incompatibility: with the above solution, I think this is a good solution all around. It minimizes backwards incompatibility, and is consistent with pydicom philosophy for other data types. I've also coded much of this before in test branches, so it can be implemented fairly quickly. |
I disagree with the suggested changes. I think that Further the "what" that is returned when accessing either of |
I think having variables that are named clearly based on what they return makes a lot of sense. pixel_data, as a naive person like me would understand it, is the data from the pixels in some format I can work pixel_bytes is the same data in bytes. I can think of use cases for wanting both - |
Where are raw bytes returned for other DICOM keywords? Binary numbers, for example, are changed to int, float, etc. Multi-valued items are split on the backslash and returned as Python data types. RawDataElements carry the raw bytes, but only until they are accessed; then they are converted.
Yes, this is an issue. But in any case we have a numpy array to return, whether it is called |
I don't love the idea that depending on whether it is bytes or an array that is assigned to |
There is some concern there, I suppose, but in line with python's 'We are all consenting adults' philosophy, I see it that if someone assigns bytes, then they have full control, we write what they set. If they assign a numpy array, we are going to convert it to bytes when it is written. If that isn't what they wanted then they must have assigned the wrong array anyway, or have not set the other image-related data elements for resolution, transfer syntax, etc. correctly. One concern is that if the user sets
to help guide that down the right path. Maybe that should be the way that is documented as the 'correct' way to set a pixel array. |
I would also prefer not to break existing code (meaning mine). I am used to the pixel_array/PixelData duality and I don't find it any worse than the horror that is character encodings in python 2.7. I like the idea that pydicom can read any tag right out of the box without needing numpy/scipy/pillow/etc. |
I appreciate everyone's comments. I'm certainly willing to keep things as they are, which is obviously much easier. Interestingly, though, I just tried something:
import pydicom
ds = pydicom.read_file("CT_small.dcm")
pix = ds.pixel_array
ds.PixelData = pix
ds.save_as("del_me.dcm")
I then compared the output file with the original. Binary identical. So it seems that we can already set the PixelData to a numpy array and it seems to work fine. Reading on Python's write() method, it can accept any "bytes-like object" supporting the 'Buffer Protocol' and a C-contiguous buffer. numpy seems to do C order by default, but there is also a Meanwhile, numpy's Anyway, I'm wondering about putting together a dev branch to at least get rid of the 'tostring' requirement, let people run it and see if it breaks backwards compatibility. I think it can be done so that it doesn't. And I think it can be done so that numpy is only required if the bytes are accessed. As I said, I'm okay to drop this, but v1.0 is our best time to make a change like this so I want to make sure before I drop it. |
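The point about write() accepting any C-contiguous bytes-like object can be checked without a DICOM file. The sketch below uses only NumPy and an in-memory buffer; `as_c_contiguous` is a hypothetical helper name, and the small 3x4 array merely stands in for real pixel data.

```python
import io

import numpy as np


def as_c_contiguous(arr):
    """Return arr itself if it is already C-contiguous, else a C-ordered copy."""
    return np.ascontiguousarray(arr)


pixels = np.arange(12, dtype=np.uint16).reshape(3, 4)
transposed = pixels.T  # a view with swapped strides, not C-contiguous

# A C-contiguous array can be handed to write() directly via the buffer protocol.
buf = io.BytesIO()
buf.write(pixels)
written = buf.getvalue()

# A non-contiguous view is made contiguous first; the bytes then match tobytes().
fixed = as_c_contiguous(transposed)
```

Here `written` holds the same bytes as `pixels.tobytes()`, which is consistent with the binary-identical result reported above, at least for arrays that are already in C order.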
@darcymason - is this still something that needs to be done, or is the current state sufficient? I got a bit lost in the discussion (vs the changes made lately in the implementation), not sure if this is still a valid issue. |
Hmmm, I'd like to think about it a bit longer. Re-reading the discussion, my take-home is that at least part of this can be done without breaking backwards compatibility -- for example, assigning a numpy array to PixelData can just be handled the way the user currently does -- by converting it via |
I would vote to avoid that sort of magic and keep PixelData as the bytes for assignment and reading, and use pixel_array for handling numpy arrays. |
I still dream of resolving this but have pushed back to v3.0. Reading through quickly again, there was lots of concern about breaking code, but I believe my branch code was showing that it can be done without breaking. But assigning v3.0 just in case... |
I would like to add a use case to the discussion of writing to pixel_array / PixelData: changing the shape of pixel_array. That is, I have a DICOM file with multiple slices in it (pixel_array.shape is (10, 128, 128)) and I want to generate a DICOM file whose pixel_array.shape is (1, 128, 128) and which preserves all the metadata and other properties of the original DICOM file. The contents of the new file are a calculation based on the original 10-slice data. Example code demonstrating that the setter
import pydicom
my_dcm = pydicom.dcmread(my_dicom_path)
my_pixel_array = my_dcm.pixel_array[3, :, :]  # an example calculation
my_dcm.PixelData = my_pixel_array.tostring()
rdpa = my_dcm.pixel_array
my_dcm.save_as(filename=denoised_dicom_path)
This results in a value error: "The length of the pixel data in the dataset (32768 bytes) doesn't match the expected length (327680 bytes)." |
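The two byte counts in that error message can be reproduced with simple arithmetic. The sketch below assumes 16 bits allocated and one sample per pixel, which is inferred from the numbers in the message rather than stated in the comment; `expected_pixel_bytes` is a hypothetical helper and ignores special cases such as bit-packed or subsampled data.

```python
def expected_pixel_bytes(rows, columns, bits_allocated,
                         samples_per_pixel=1, number_of_frames=1):
    """Uncompressed pixel data length implied by the Image Pixel elements."""
    bytes_per_sample = bits_allocated // 8
    return rows * columns * samples_per_pixel * number_of_frames * bytes_per_sample


# The dataset still describes 10 frames of 128 x 128 16-bit pixels...
expected = expected_pixel_bytes(128, 128, 16, number_of_frames=10)

# ...but the assigned bytes only hold a single 128 x 128 frame.
actual = expected_pixel_bytes(128, 128, 16, number_of_frames=1)
```

Under these assumptions the mismatch comes from the frame count element still saying 10 after a single frame was assigned, so the elements describing the image would need to be updated along with the bytes.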
Assigning PixelData using bytes
This will never be able to update the dataset's Image Pixel module elements:
ds.PixelData = arr.tobytes()
At best all we can do is check that the number of bytes matches the dataset, which we already do on trying to write
Assigning PixelData using ndarray
The following can theoretically update the Image Pixel elements in a (very) limited fashion:
ds.PixelData = arr
It cannot do anything with
Assigning PixelData using class method
The following would be required to set Pixel Data and the Image Pixel elements in a fully conformant manner (additional args/kwargs may be required):
def set_pixel_data(
arr: np.ndarray,
bits_stored: int,
photometric_interpretation: str,
*,
rows: int | None = None,
columns: int | None = None,
number_of_frames: int | None = None,
bits_allocated: int | None = None, # maybe?
planar_configuration: int | None = None,
pack_bits: bool = False,
) -> None:
    ...
Which is not much better than setting the Image Pixel elements manually, but at least lets us add some validation checks. So, as I see it, the choices to finally stake this issue through the heart and bury it at a crossroads are:
Or some combination thereof. |
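To illustrate why an array on its own is not enough and why the proposed function takes a photometric interpretation, here is a rough sketch of shape-based inference. `describe_array` is a hypothetical helper, not part of pydicom; it assumes colour data has its samples in the last axis and treats everything other than the listed single-sample interpretations as three-sample colour.

```python
SINGLE_SAMPLE = ("MONOCHROME1", "MONOCHROME2", "PALETTE COLOR")


def describe_array(shape, itemsize, photometric_interpretation):
    """Guess Image Pixel values from an array's shape and item size in bytes."""
    colour = photometric_interpretation not in SINGLE_SAMPLE
    dims = list(shape)
    if colour:
        if dims[-1] != 3:
            raise ValueError("colour data is assumed to have 3 samples in the last axis")
        dims = dims[:-1]
    if len(dims) == 2:
        frames = 1
    elif len(dims) == 3:
        frames, dims = dims[0], dims[1:]
    else:
        raise ValueError(f"cannot interpret shape {shape}")
    rows, columns = dims
    return {
        "NumberOfFrames": frames,
        "Rows": rows,
        "Columns": columns,
        "SamplesPerPixel": 3 if colour else 1,
        "BitsAllocated": itemsize * 8,
    }


# The same shape means two different images depending on the interpretation.
as_gray = describe_array((3, 128, 3), 1, "MONOCHROME2")
as_rgb = describe_array((3, 128, 3), 1, "RGB")
```

The shape (3, 128, 3) reads as three grayscale frames of 128 x 3 in one case and as a single 3 x 128 colour frame in the other, which is the ambiguity an explicit argument resolves.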
I would prefer an explicit |
From darcymason@gmail.com on May 27, 2009 22:58:02
A pydicom "gotcha" comes from the disconnect between the Dataset.pixel_array
property and Dataset.PixelData. The latter is actually in the Dataset (a
dict), the former is created from it but changes are not written back
unless ds.PixelData is explicitly set with e.g. pixel_array.tostring().
Possible solutions:
is this requires decompressing JPEG data (which is not available yet in
pydicom and would possibly waste time on a step that might never be used if
the code is not modifying pixels).
keep reference to it in dataset instance and automatically do tostring()
before data is written. But what if user modified pixel_array and modified
PixelData directly (current code would do that using the tostring() method
mentioned above). Could see which one was modified most recently and use
that as the definitive final value, or could throw an error or warning of
possible inconsistent pixel data changes.
This idea means PixelData probably needs to be redefined as a property so
that writes to it can be flagged. That makes it 'special' and different than other
items in the Dataset dict, but perhaps that is necessary.
I like the second idea better, but am hoping someone can come up with an
even cleaner solution.
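The second idea (keep a reference to the array, convert it to bytes only when writing, and warn about conflicting changes) might be sketched as follows. This is a standard-library illustration only: `LazyPixels` and `bytes_to_write` are hypothetical names, `array.array` stands in for a numpy array, and the choice to let a later direct PixelData assignment win is just one of the policies mentioned above.

```python
import array
import warnings


class LazyPixels:
    """Stand-in for a dataset holding PixelData bytes plus a cached array."""

    def __init__(self, pixel_bytes):
        self._bytes = bytes(pixel_bytes)
        self._array = None                   # created on first request only
        self._bytes_set_after_array = False  # flag for a possible conflict

    @property
    def PixelData(self):
        return self._bytes

    @PixelData.setter
    def PixelData(self, value):
        self._bytes = bytes(value)
        if self._array is not None:
            self._bytes_set_after_array = True

    @property
    def pixel_array(self):
        if self._array is None:
            self._array = array.array("H", self._bytes)
        return self._array

    def bytes_to_write(self):
        """Return the bytes a writer would use, warning on conflicting edits."""
        if self._array is None:
            return self._bytes
        array_bytes = self._array.tobytes()
        if self._bytes_set_after_array and array_bytes != self._bytes:
            warnings.warn("pixel_array and PixelData both changed; using PixelData")
            return self._bytes
        return array_bytes


raw = array.array("H", [1, 2, 3, 4]).tobytes()
ds = LazyPixels(raw)
ds.pixel_array[0] = 99             # in-place edit, no explicit write-back
edited = ds.bytes_to_write()       # the edit is picked up at write time
```

A sketch like this cannot tell whether the array was edited before or after PixelData was assigned, so a real implementation would need more bookkeeping to apply a "most recently modified wins" rule.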
Original issue: http://code.google.com/p/pydicom/issues/detail?id=49