### Convert DICOM to JPG

Following the procedure used to build MIMIC-CXR-JPG, we convert the DICOM images to JPG.

The repos I checked earlier, TorchXrayVision and MedCLIP, assume JPG images. It is hard to tell exactly which conversion procedure they followed, so we proceed iteratively:
1. Convert the DICOM images to JPG.
2. Test whether these JPG images yield good accuracy.
3. If they do, BINGO! Otherwise, update the procedure and go back to step 1.
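Step 2 requires a full evaluation run. If you also have access to the official MIMIC-CXR-JPG release, a cheaper check is to compare each converted image pixel-wise against its official counterpart. This assumes the official files mirror the same `pXX/pSUBJECT/sSTUDY/<dicom_id>` layout with a `.jpg` extension. The sketch below works on two already-loaded single-channel uint8 arrays, for example from `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`. `compare_to_reference` and its `tol` default are hypothetical choices, not part of any library.

```python
import numpy as np


def compare_to_reference(ours: np.ndarray, ref: np.ndarray, tol: int = 2) -> dict:
    """Pixel-wise comparison of a converted JPG against a reference JPG.

    Both inputs are assumed to be single-channel uint8 arrays of equal shape.
    JPEG compression alone introduces small differences, so `tol` (a
    hypothetical default) sets how many gray levels still count as a match.
    """
    if ours.shape != ref.shape:
        raise ValueError(f"shape mismatch: {ours.shape} vs {ref.shape}")
    # Cast to a signed type so the subtraction cannot wrap around
    diff = np.abs(ours.astype(np.int16) - ref.astype(np.int16))
    return {
        "mean_abs_diff": float(diff.mean()),
        "max_abs_diff": int(diff.max()),
        "frac_within_tol": float((diff <= tol).mean()),
    }
```

A mean difference near zero suggests the conversion matches the reference. A large or systematic gap, such as an inverted or unequalized image, points at the step to revise before going back to step 1.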

In [1]:
from glob import glob

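# Sort so the example subsets written later are reproducible (glob order depends on the filesystem)
dcm_paths = sorted(glob("/datasets/mimic/physionet.org/files/mimic-cxr/2.0.0/files/p10/*/*/*.dcm"))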

In [2]:
len(dcm_paths), dcm_paths[0]

(36681,
 '/datasets/mimic/physionet.org/files/mimic-cxr/2.0.0/files/p10/p10000032/s50414267/02aa804e-bde0afdd-112c0b34-7bc16630-4e384014.dcm')

In [3]:
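# Map a DICOM path to a relative JPG path mirroring the MIMIC-CXR layout, e.g.
# p10/p10000032/s50414267/<dicom_id>.jpg (component 8 of the "/"-split path is "p10")
create_jpg_path = lambda dcm_p: "/".join(dcm_p.split("/")[8:]).split(".")[0] + ".jpg"
create_jpg_dir = lambda dcm_p: "/".join(dcm_p.split("/")[8:-1])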
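Indexing at position 8 ties these lambdas to this particular mount point. The path printed above ends in `pXX / pSUBJECT / sSTUDY / <dicom_id>.dcm`, so the following sketch keys off the last four path components instead. It should produce the same relative paths regardless of where the dataset lives, assuming every DICOM path follows that four-level layout.

```python
from pathlib import PurePosixPath


def create_jpg_path(dcm_p: str) -> str:
    # Assumption: the last four components are pXX / pSUBJECT / sSTUDY / <dicom_id>.dcm
    rel = PurePosixPath(*PurePosixPath(dcm_p).parts[-4:])
    # Swap the .dcm suffix for .jpg
    return str(rel.with_suffix(".jpg"))


def create_jpg_dir(dcm_p: str) -> str:
    # Directory part of the relative JPG path, e.g. p10/p10000032/s50414267
    return str(PurePosixPath(create_jpg_path(dcm_p)).parent)
```

Defining these as named functions also avoids assigning lambdas to names, which PEP 8 discourages.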

In [4]:
import os

import cv2
import numpy as np
import pydicom

def process_dicom_image_new(input_path, output_dir, output_path):
    # Load the DICOM file
    ds = pydicom.dcmread(input_path)

    # Extract pixel data and min-max normalize it to the range [0, 255]
    pixel_data = ds.pixel_array.astype(np.float32)
    pixel_data -= np.min(pixel_data)
    max_val = np.max(pixel_data)
    if max_val > 0:  # guard against division by zero on constant images
        pixel_data /= max_val
    pixel_data *= 255.0
    pixel_data = pixel_data.astype(np.uint8)

    # MONOCHROME1 means the minimum value is displayed as white, so invert it
    if ds.PhotometricInterpretation == "MONOCHROME1":
        pixel_data = 255 - pixel_data

    # Histogram equalization (cv2.equalizeHist expects a single-channel uint8 image)
    pixel_data = cv2.equalizeHist(pixel_data)

    # Encode as JPEG with quality factor 95
    encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 95]
    ok, jpeg_data = cv2.imencode(".jpg", pixel_data, encode_param)
    if not ok:
        raise RuntimeError(f"JPEG encoding failed for {input_path}")

    # Write the JPEG bytes to disk, creating the output directory if needed
    os.makedirs(output_dir, exist_ok=True)
    with open(output_path, "wb") as f:
        f.write(jpeg_data.tobytes())
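To test the pixel steps without a DICOM file or OpenCV, the same normalize, invert, and equalize sequence can be written as a pure NumPy function on an array. `to_uint8_cxr` and `equalize_hist` are hypothetical helpers. `equalize_hist` is standard CDF-based histogram equalization written to follow the lookup-table formula `cv2.equalizeHist` uses. Treat it as a close approximation for experiments, not a guaranteed bit-exact replacement.

```python
import numpy as np


def equalize_hist(img: np.ndarray) -> np.ndarray:
    """CDF-based histogram equalization for a single-channel uint8 image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    total = img.size
    # Pixel count of the lowest occupied gray level; it maps to 0
    cdf_min = cdf[np.flatnonzero(hist)[0]]
    if cdf_min == total:
        # Constant image: nothing to stretch, return it unchanged
        return img.copy()
    lut = np.round((cdf - cdf_min) * 255.0 / (total - cdf_min))
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]


def to_uint8_cxr(pixels: np.ndarray, photometric: str) -> np.ndarray:
    """Min-max normalize to uint8, invert MONOCHROME1, then equalize."""
    x = pixels.astype(np.float32)
    x -= x.min()
    peak = x.max()
    if peak > 0:
        x /= peak
    x = (x * 255.0).astype(np.uint8)
    if photometric == "MONOCHROME1":
        x = 255 - x
    return equalize_hist(x)
```

Feeding it `ds.pixel_array` and `ds.PhotometricInterpretation` from a real file, then comparing with the output of `process_dicom_image_new` before JPEG encoding, is one way to check that each step behaves as intended.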

In [None]:
from tqdm import tqdm

# Write the relative JPG paths of the first N examples to text files,
# so experiments can run on 5k / 10k / all images
example_counts = [5000, 10000, len(dcm_paths)]

for example_count in example_counts:
    with open(f"examples_{example_count}.txt", "w") as f:
        for dcm_path in dcm_paths[:example_count]:
            f.write(f"{create_jpg_path(dcm_path)}\n")

# Convert every DICOM file to JPG (slow: about 2.5 files/s in a single process)
for dcm_path in tqdm(dcm_paths):
    jpg_path = create_jpg_path(dcm_path)
    jpg_dir = create_jpg_dir(dcm_path)
    process_dicom_image_new(dcm_path, jpg_dir, jpg_path)

 99%|█████████▉| 36233/36681 [3:53:20<02:55,  2.55it/s]  
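At about 2.5 files/s, the single-process loop above needs close to four hours for `p10` alone. Each file is converted independently, so the following sketch takes a different approach. It skips files whose JPG already exists, so an interrupted run can resume, and spreads the rest over worker processes with `concurrent.futures.ProcessPoolExecutor`. `pending_conversions`, `convert_all`, and the `worker` callable are hypothetical. In this notebook, `worker` would be a module-level function that calls `process_dicom_image_new(p, create_jpg_dir(p), create_jpg_path(p))`.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Optional


def pending_conversions(dcm_paths: List[str], to_jpg_path: Callable[[str], str]) -> List[str]:
    """Return only the DICOM paths whose target JPG is not on disk yet."""
    return [p for p in dcm_paths if not os.path.exists(to_jpg_path(p))]


def convert_all(
    dcm_paths: List[str],
    worker: Callable[[str], None],
    to_jpg_path: Callable[[str], str],
    max_workers: Optional[int] = None,
    chunksize: int = 32,
) -> int:
    """Convert the remaining files in parallel and return how many were submitted."""
    todo = pending_conversions(dcm_paths, to_jpg_path)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # Iterating over map() waits for every task and re-raises any worker exception
        for _ in pool.map(worker, todo, chunksize=chunksize):
            pass
    return len(todo)
```

Two caveats. A JPG cut off mid-write by an interrupt would be counted as done, so delete the most recently written file before resuming. Process pools started from a notebook generally need `worker` to live in an importable `.py` module unless the `fork` start method is in use. Wrapping the `pool.map(...)` iterator in `tqdm(..., total=len(todo))` restores the progress bar.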