Introduction

The images were first converted from DICOM (.dcm) to PNG (.png) format in Google Colab, in the 
RSNA_Pneumonia_Detection_Challenge_dcm_to_png.ipynb notebook. 
That conversion prepares the chest X-rays for pneumonia detection by turning the DICOM files into a more 
accessible and widely supported image format.

This notebook picks up after that conversion and covers the following key steps:

Resize Images:
Resizes all PNG images to a fixed resolution of 800x800 pixels using OpenCV's Lanczos interpolation 
(cv2.INTER_LANCZOS4) to preserve detail when resampling.

Verify Image Properties:
Randomly samples five images and prints their dimensions (expected: 800x800 pixels) and their file sizes in KB, as a spot check of consistency across the dataset.

Clean Up Original Data:
Deletes the original folder of unresized images (or, as an alternative, moves it to persistent storage) to free up space after processing.

Inspect Directory Contents:
Lists the contents of the working directory to provide an overview of the processed files and folder structure.

This pipeline is designed to streamline the preparation of medical imaging datasets, ensuring uniformity in 
image size and organization, which is critical for training robust machine learning models.

In [1]:
import os
import cv2
from tqdm import tqdm

# Source folder (original images)
source_folder = "/notebooks/RSNA_PNG_Images/"

# Destination folder for resized images
destination_folder = "/notebooks/RSNA_Resized_800/"

# Create the destination folder if it doesn't exist
os.makedirs(destination_folder, exist_ok=True)

# Define target resolution (width x height)
TARGET_SIZE = (800, 800)

# Get the list of files from the source folder
image_files = os.listdir(source_folder)

# Process each image
for file in tqdm(image_files, desc="Resizing images"):
    if file.lower().endswith(".png"):
        src_path = os.path.join(source_folder, file)
        dst_path = os.path.join(destination_folder, file)

        # Read the image in grayscale (since the data are X-rays)
        image = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)
        if image is not None:
            # Resize the image using Lanczos method for better quality
            resized_image = cv2.resize(image, TARGET_SIZE, interpolation=cv2.INTER_LANCZOS4)
            # Save the resized image to the destination folder
            cv2.imwrite(dst_path, resized_image)
        else:
            print(f"Error loading image: {file}")

Resizing images: 100%|██████████| 26684/26684 [18:30<00:00, 24.04it/s]
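
The loop above filters by extension inside the loop, so the progress bar total counts every directory entry, including any non-PNG ones. A minimal sketch of filtering first, so the total matches the number of images actually processed. The helper name `list_png_files` is hypothetical and not part of the original notebook.

```python
import os


def list_png_files(folder):
    """Return the sorted names of regular files in `folder` ending in .png.

    Hypothetical helper (not in the original notebook). The extension check is
    case-insensitive, and subdirectories are skipped even if named like a PNG.
    """
    return sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(".png")
        and os.path.isfile(os.path.join(folder, name))
    )
```

Under this assumption, the resize loop could iterate over `tqdm(list_png_files(source_folder), desc="Resizing images")` and drop the inner `endswith` check.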


In [2]:
import random

# Folder where the resized images were saved
resized_folder = "/notebooks/RSNA_Resized_800/"

# Get the list of available images
image_files = os.listdir(resized_folder)

# Select up to 5 random images (fewer if the folder holds fewer than 5 files)
sample_images = random.sample(image_files, min(5, len(image_files)))

# Check the dimensions of the selected images
for img_name in sample_images:
    img_path = os.path.join(resized_folder, img_name)
    image = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)

    if image is not None:
        print(f"Image: {img_name} - Size: {image.shape}")
    else:
        print(f"⚠️ Error loading: {img_name}")

Image: 091fa211-36c0-42a4-bd7d-c8bced664d9d.png - Size: (800, 800)
Image: 7d042363-6a99-4925-a9ae-886d298fcdbf.png - Size: (800, 800)
Image: 7bb5c7df-51a0-4675-9962-6407d0160c65.png - Size: (800, 800)
Image: c76ee8f0-c26c-4538-883d-cd7994a52779.png - Size: (800, 800)
Image: 39d5e1bf-8092-4bc7-af98-f2a5dcd4c39d.png - Size: (800, 800)
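
Five random samples are only a spot check. As a sketch of checking every file, the width and height can be read directly from the PNG header (the IHDR chunk that follows the 8-byte signature) without decoding the pixels. The helper names `png_dimensions` and `find_wrong_size` are hypothetical and not part of the original notebook.

```python
import os
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(path):
    """Return (width, height) from the IHDR chunk, or None if not a PNG."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: signature (8) + chunk length (4) + b"IHDR" (4) + width (4) + height (4)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def find_wrong_size(folder, expected=(800, 800)):
    """Return names of .png files whose header size differs from `expected`.

    Files that are not valid PNGs are reported as well.
    """
    bad = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".png"):
            if png_dimensions(os.path.join(folder, name)) != expected:
                bad.append(name)
    return bad
```

Note that this helper reports (width, height), whereas `image.shape` from OpenCV is (height, width); for square 800x800 images the two coincide. An empty list from `find_wrong_size(resized_folder)` would indicate that every PNG header reports 800x800.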


In [3]:
# Folder where the resized images were saved
resized_folder = "/notebooks/RSNA_Resized_800/"

# Get the list of available images
image_files = os.listdir(resized_folder)

# Select up to 5 random images (fewer if the folder holds fewer than 5 files)
sample_images = random.sample(image_files, min(5, len(image_files)))

# Check the sizes of the selected images
for img_name in sample_images:
    img_path = os.path.join(resized_folder, img_name)

    # Get file size in KB
    file_size_kb = os.path.getsize(img_path) / 1024  # Convert bytes to KB

    print(f"Image: {img_name} - Size: {file_size_kb:.2f} KB")

Image: b39023e8-9092-44a9-a999-8f98ebae0958.png - Size: 291.33 KB
Image: 6ae40616-cee1-4d80-a5bc-878bce7f8ac9.png - Size: 270.73 KB
Image: c87992ba-0fed-475e-9391-a8d2acd82bde.png - Size: 228.25 KB
Image: d3fbcb3d-b39c-4c8b-a23d-ae786dc11bfe.png - Size: 240.15 KB
Image: 9f80d9c7-5ed4-40bf-8535-8289717d3264.png - Size: 286.21 KB
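
The same file-size check can be extended from five samples to the whole folder. The sketch below summarizes the sizes as count, minimum, mean, and maximum, which makes outliers such as empty or truncated files easier to notice. The helper name `summarize_file_sizes` is hypothetical and not part of the original notebook.

```python
import os
import statistics


def summarize_file_sizes(folder, suffix=".png"):
    """Return count/min/mean/max file size in KB for files ending in `suffix`.

    Returns None when the folder holds no matching files.
    """
    sizes_kb = [
        os.path.getsize(os.path.join(folder, name)) / 1024  # bytes to KB
        for name in os.listdir(folder)
        if name.lower().endswith(suffix)
    ]
    if not sizes_kb:
        return None
    return {
        "count": len(sizes_kb),
        "min_kb": min(sizes_kb),
        "mean_kb": statistics.mean(sizes_kb),
        "max_kb": max(sizes_kb),
    }
```

A call such as `summarize_file_sizes(resized_folder)` would return one dictionary for the entire dataset. A minimum close to 0 KB would be worth investigating before training.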


In [4]:
import shutil

# Delete the original folder (irreversible; run only after the resized images have been verified)
shutil.rmtree("/notebooks/RSNA_PNG_Images/")
print("✅ Original folder deleted successfully!")

# Alternative: instead of deleting, move the original folder to a backup location on Paperspace Drive (Persistent Storage)
# shutil.move("/notebooks/RSNA_PNG_Images/", "/storage/RSNA_Backup/")
# print("✅ Folder moved to /storage/RSNA_Backup/")

✅ Original folder deleted successfully!
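
Because `shutil.rmtree` cannot be undone, and the resize loop skips any image that fails to load, it may be worth confirming that every source PNG has a resized counterpart before deleting. A minimal sketch of such a guard follows. The helper names `missing_in_destination` and `delete_if_complete` are hypothetical and not part of the original notebook.

```python
import os
import shutil


def missing_in_destination(source, destination):
    """Return sorted names of source .png files with no same-named file in destination."""
    source_pngs = {n for n in os.listdir(source) if n.lower().endswith(".png")}
    destination_files = set(os.listdir(destination))
    return sorted(source_pngs - destination_files)


def delete_if_complete(source, destination):
    """Delete `source` only if every source PNG also exists in `destination`.

    Returns the list of missing names; an empty list means the folder was deleted.
    """
    missing = missing_in_destination(source, destination)
    if missing:
        return missing  # Nothing is deleted while files are unaccounted for.
    shutil.rmtree(source)
    return []
```

This only compares file names. It does not verify that the resized files are readable, so it complements the dimension check rather than replacing it.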


In [4]:
# List Contents of the /notebooks/ Directory
!ls -lh /notebooks/

total 56M
-rw-r--r-- 1 root root  213 Feb 19 13:38 README.md
-rw-r--r-- 1 root root 1.1M Feb 22 02:32 RSNA_Pneumonia_Detection.ipynb
-rw-r--r-- 1 root root  14K Mar  3 12:05 RSNA_Pneumonia_Detection.py
-rw-r--r-- 1 root root 9.0M Feb 20 15:13 RSNA_Pneumonia_Detection_rclone.ipynb
-rw-r--r-- 1 root root 2.6M Feb 23 18:38 RSNA_Pneumonia_Detection_v1.ipynb
-rw-r--r-- 1 root root 5.5M Feb 25 01:10 RSNA_Pneumonia_Detection_v2.ipynb
-rw-r--r-- 1 root root 6.9M Feb 26 16:32 RSNA_Pneumonia_Detection_v3.0.ipynb
-rw-r--r-- 1 root root 8.2M Feb 27 15:14 RSNA_Pneumonia_Detection_v3.1-IoU-Copy1.ipynb
-rw-r--r-- 1 root root 6.9M Mar  3 19:23 RSNA_Pneumonia_Detection_v3.2-Copy1.ipynb
-rw-r--r-- 1 root root 6.5M Mar  7 11:33 RSNA_Pneumonia_Detection_v3.3.ipynb
-rw-r--r-- 1 root root 6.3M Mar  9 15:19 RSNA_Pneumonia_Detection_v4.ipynb
drwxr-xr-x 2 root root  27K Feb 21 15:42 RSNA_Resized_800
-rw-r--r-- 1 root root 9.4K Mar  9 15:19 RSNA_Resized_800.ipynb
-rw-r--r-- 1 root root  20K Mar  3 12:16 pylint_