- DICOM files start with a 128-byte preamble which is unstructured (i.e., the first 128 bytes can contain anything). The spec says "File-set Readers or Updaters shall not rely on the content of this Preamble to determine that this File is or is not a DICOM File."
- A TIFF header is only 8 bytes long (well, not really, but for the purposes of this investigation, it is).
- Apparently there's a dual-format concept in DICOM, where the preamble may contain e.g. TIFF data so that applications can recognize the file as either a TIFF or a DICOM file (see section 7.5).
- Some DICOM files, including the official pydicom example files, do start with a TIFF header (the bytes 'II' followed by the 16-bit integer 42).
- So, technically, from filetype's perspective, the file is both a valid TIFF and a valid DICOM, but it selects TIFF because it checks for a match for TIFF first
- While these files are technically both valid TIFF and DICOM, I believe filetype would be more accurate if it checked for DICOM prior to checking for TIFF.
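To illustrate the ordering problem, here is a minimal standalone sketch using only the standard library. It is not filetype's actual implementation: the names guess_mime, is_tiff, and is_dicom are hypothetical, and the synthetic byte strings only imitate the first bytes of real files. It assumes the usual signatures: a TIFF starts with 'II*\0' or 'MM\0*', and a DICOM file has the 4 bytes 'DICM' immediately after the 128-byte preamble.

```python
# Hypothetical sketch of signature matching; not filetype's real code.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")  # little-endian and big-endian TIFF, magic number 42
DICOM_PREAMBLE_LEN = 128                # unstructured preamble, may hold anything
DICOM_MAGIC = b"DICM"                   # prefix that follows the preamble


def is_tiff(header: bytes) -> bool:
    """True if the buffer starts with a TIFF byte-order mark and the number 42."""
    return header[:4] in TIFF_MAGICS


def is_dicom(header: bytes) -> bool:
    """True if 'DICM' appears right after the 128-byte preamble."""
    return header[DICOM_PREAMBLE_LEN:DICOM_PREAMBLE_LEN + 4] == DICOM_MAGIC


def guess_mime(header: bytes):
    """Check DICOM before TIFF so dual-format files are reported as DICOM."""
    if is_dicom(header):
        return "application/dicom"
    if is_tiff(header):
        return "image/tiff"
    return None


# Synthetic headers: a dual-format file (TIFF magic inside the DICOM preamble)
# and a plain TIFF with no 'DICM' marker.
dual = b"II*\x00" + b"\x00" * 124 + b"DICM"
plain_tiff = b"II*\x00" + b"\x00" * 200

print(guess_mime(dual))
print(guess_mime(plain_tiff))
```

With this order, a plain TIFF is still reported as TIFF, because the 'DICM' check at offset 128 only succeeds for files that actually carry the DICOM prefix.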
Example:
import pydicom
import filetype
import tempfile
import os

with tempfile.TemporaryDirectory(suffix="_dcm_test") as tdir:
    dcm_path = os.path.join(tdir, "test.dcm")
    pydicom.examples.ct.save_as(dcm_path)
    print("pydicom.misc.is_dicom(dcm_path):", pydicom.misc.is_dicom(dcm_path))
    print("filetype.guess(dcm_path).mime:", filetype.guess(dcm_path).mime)
Results:
pydicom.misc.is_dicom(dcm_path): True
filetype.guess(dcm_path).mime: image/tiff