
In [None]:
import numpy as np
from PIL import Image
from tqdm import tqdm
import time
import os

Image.MAX_IMAGE_PIXELS = None

# Image Generation

In [None]:
# Define the new image dimensions
new_width = 100000
new_height = 100000

# Create an empty NumPy array of zeros with the new specified dimensions
new_image_array = np.zeros((new_height, new_width), dtype=np.uint8)

# Set the top-left quadrant to 1
new_image_array[:new_height // 2, :new_width // 2] = 1

# Enable compression and save the new image with compression
compression_method = "tiff_lzw"  # You can choose the compression method you prefer
with tqdm(total=3, desc="Processing images") as pbar:
    new_image_pil = Image.fromarray(new_image_array)
    new_image_pil.save("modified_mo1.tiff", compression=compression_method)
    pbar.update(1)

    # dye1: parasite quadrant plus the centre pixel, which lies just outside the quadrant (simulated dye leakage)
    new_image_array[new_height // 2, new_width // 2] = 1
    new_image_pil = Image.fromarray(new_image_array)
    new_image_pil.save("modified_dye1.tiff", compression=compression_method)
    pbar.update(1)

    # Reset the buffer to all zeros in place (avoids allocating a second 10 GB array)
    new_image_array.fill(0)
    # Save an empty new image (modified_dye2.tiff)
    new_image_pil = Image.fromarray(new_image_array)
    new_image_pil.save("modified_dye2.tiff", compression=compression_method)
    pbar.update(1)

In [None]:
def encode_img(img_path):
    """Load an image file into a NumPy array (despite the name, no encoding is applied)."""
    img = Image.open(img_path)
    img_array = np.array(img)
    return img_array


modified_mo1_array = encode_img('modified_mo1.tiff')
modified_dye1_array = encode_img('modified_dye1.tiff')
modified_dye2_array = encode_img('modified_dye2.tiff')

In [None]:
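def rle_encode(img_array):
    """Run-length encode a 2D array in row-major order.

    Returns a flat list [value, count, value, count, ...]; runs may span row
    boundaries. Each row is compared with vectorized NumPy operations, so only
    the run boundaries are handled in Python (a per-pixel Python loop over
    10^10 pixels is impractically slow).
    """
    rows, cols = img_array.shape
    rle_data = []

    prev_value = img_array[0, 0]
    count = 0

    # Add tqdm around the outer loop (rows) for progress tracking
    for r in tqdm(range(rows), desc="Encoding"):
        row = img_array[r]
        # Column indices where a new run starts inside this row
        changes = np.flatnonzero(row[1:] != row[:-1]) + 1
        starts = np.concatenate(([0], changes))
        ends = np.concatenate((changes, [cols]))

        for start, end in zip(starts, ends):
            current_value = row[start]
            if current_value == prev_value:
                count += int(end - start)  # extend the current run (also across rows)
            else:
                rle_data.extend([prev_value, count])
                prev_value = current_value
                count = int(end - start)

    rle_data.extend([prev_value, count])  # Add the last run
    return rle_data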


modified_mo1_rle = rle_encode(modified_mo1_array)

In [None]:
def rle_decode(rle_data, shape):
    """Decodes an RLE encoded array to its original form."""
    array = np.zeros(shape, dtype=np.uint8)
    position = 0

    for i in range(0, len(rle_data), 2):
        value = rle_data[i]
        count = rle_data[i + 1]

        # Mark the positions with the given value
        array.flat[position:position + count] = value
        position += count

    return array
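A quick way to sanity-check an encoder and decoder is a round trip on a small array. The sketch below uses hypothetical vectorized helpers (`rle_encode_vectorized`, `rle_decode_vectorized`, not part of the notebook) that keep values and counts in two parallel arrays instead of one interleaved list, with `np.repeat` replacing the decoding loop. Because it flattens the whole array at once, it assumes the image fits comfortably in memory, which is not true of the full 100,000x100,000 images.

```python
import numpy as np

def rle_encode_vectorized(img_array):
    """Row-major RLE returned as parallel (values, counts) arrays."""
    flat = img_array.ravel()
    # A new run starts at index 0 and wherever the value changes.
    starts = np.concatenate(([0], np.flatnonzero(flat[1:] != flat[:-1]) + 1))
    counts = np.diff(np.append(starts, flat.size))
    return flat[starts], counts

def rle_decode_vectorized(values, counts, shape):
    """Inverse of rle_encode_vectorized: repeat each value `count` times."""
    return np.repeat(values, counts).reshape(shape)

img = np.zeros((6, 8), dtype=np.uint8)
img[:3, :4] = 1  # small stand-in for the top-left parasite quadrant
values, counts = rle_encode_vectorized(img)
restored = rle_decode_vectorized(values, counts, img.shape)
assert np.array_equal(restored, img)
```

You can run the same round-trip check against the notebook's own functions on small arrays by comparing `rle_decode(rle_encode(img), img.shape)` with `img`.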

In [None]:
def has_cancer_rle(rle_encoded_array1, array2):
    start_time = time.time()

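    common_ones = 0
    total_ones_in_array1 = 0
    position = 0
    # RLE positions index the row-major flattened image, so slice the dye
    # matrix in the same flattened order (ravel() returns a view when possible)
    array2_flat = array2.ravel()

    for i in range(0, len(rle_encoded_array1), 2):
        value = rle_encoded_array1[i]
        count = rle_encoded_array1[i + 1]

        if value == 1:
            total_ones_in_array1 += count
            common_ones += np.count_nonzero(array2_flat[position:position + count])

        position += count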

    # Calculate the ratio
    if total_ones_in_array1 == 0:
        ratio = 0
    else:
        ratio = common_ones / total_ones_in_array1

    # Check if the ratio is greater than 0.1
    result = ratio > 0.1

    end_time = time.time()
    elapsed_time = end_time - start_time

    # Print the execution time
    print(f"Execution time: {elapsed_time} seconds")

    return result

In [None]:
image_data = {}

def process_images(microscope_image, dye_image):
    # Extract the base filename without the extension
    key = os.path.splitext(os.path.basename(microscope_image))[0]

    # Convert microscope_image and dye_image to NumPy arrays
    microscope_array = encode_img(microscope_image)
    microscope_rle = rle_encode(microscope_array)
    dye_array = encode_img(dye_image)

    # Check whether the parasite has cancer using has_cancer_rle
    if has_cancer_rle(microscope_rle, dye_array):
        # If True, store both images in the dictionary
        image_data[key] = [microscope_rle, dye_array]
    else:
        # If False, store only the microscope image
        image_data[key] = [microscope_rle, None]

In [None]:
def save_rle_as_tiff(rle_data, shape, filename, compression="tiff_lzw"):
    # Decode the RLE data
    img_array = rle_decode(rle_data, shape)
    # Convert to PIL Image
    img_pil = Image.fromarray(img_array)
    # Save as .tiff
    img_pil.save(filename, compression=compression)

# Answer 1

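For representing images generated by the microscope, the RLE (Run Length Encoding) format is a suitable choice. RLE stores each run of consecutive pixels with the same value as a (value, length) pair, which is efficient for images with large contiguous regions of the same value. If the parasite is a compact blob, each row crosses it in only a few places. A row-major scan then produces only a handful of runs per row, so the number of runs grows with $N$ rather than with $N \times M$.

In the worst case, where the image alternates between 0 and 1 at every pixel, every pixel is its own run. Assuming about 8 bytes per run for its value and length in a compact array-based layout, the storage size would be $8 \times N \times M$ bytes. That is 80 GB for a 100,000x100,000 image, eight times the size of a dense one-byte-per-pixel array. A Python list of objects, as built by `rle_encode` in this notebook, needs considerably more per run. Such extreme cases are unlikely for a microscope image of a single parasite, so in practice RLE offers significant storage savings.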
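The run counts behind these estimates can be checked on a scaled-down image. This sketch assumes 8 bytes per run, matching the $8 \times N \times M$ estimate. It compares a pixel-by-pixel alternating image with a single top-left quadrant like the one in `modified_mo1.tiff`. `count_runs` is a hypothetical helper that scans in row-major order.

```python
import numpy as np

BYTES_PER_RUN = 8  # assumption: each run's value and length fit in 8 bytes total

def count_runs(img):
    """Number of runs when the image is scanned in row-major order."""
    flat = img.ravel()
    return 1 + int(np.count_nonzero(flat[1:] != flat[:-1]))

n = m = 1_000  # scaled-down stand-in for 100,000 x 100,000

# Worst case: the value changes at every pixel of the row-major scan.
worst = (np.arange(n * m) % 2).astype(np.uint8).reshape(n, m)

# Compact blob: top-left quadrant set to 1, as in modified_mo1.tiff.
quadrant = np.zeros((n, m), dtype=np.uint8)
quadrant[: n // 2, : m // 2] = 1

for name, img in [("worst case", worst), ("quadrant", quadrant)]:
    runs = count_runs(img)
    print(f"{name}: {runs:,} runs -> ~{runs * BYTES_PER_RUN:,} bytes RLE "
          f"vs {img.nbytes:,} bytes dense")
```

On this 1,000x1,000 stand-in, the alternating image needs one run per pixel, which is 8 times the dense size. The quadrant needs 1,000 runs: one run of 1s and one run of 0s for each of the 500 rows that cross the parasite, with the all-zero bottom half merging into the last run.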

For representing images generated by the dye sensor, storing them as dense matrices (2D arrays) is a suitable choice. Each pixel in the image is represented by a single byte (uint8), resulting in a storage size of $N \times M$ bytes, which is 10 GB for a 100,000x100,000 image. Dense matrices are straightforward to work with and provide constant-time access to individual pixel values, making them suitable for storing images with varying dye concentrations across the entire image.
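To make the access-time contrast concrete: a dense array reads pixel (r, c) by direct indexing, whereas an RLE has to locate the run that contains the pixel's row-major position. The sketch below does this with a prefix sum of the run lengths and a binary search (`np.searchsorted`). `rle_pixel` is a hypothetical helper, and it stores values and counts as two parallel arrays rather than the notebook's interleaved list.

```python
import numpy as np

def rle_pixel(values, counts, shape, r, c):
    """Look up pixel (r, c) in a row-major RLE by binary search over run ends."""
    ends = np.cumsum(counts)   # exclusive end position of each run
    pos = r * shape[1] + c     # row-major flat index of the pixel
    return values[np.searchsorted(ends, pos, side="right")]

dense = np.zeros((4, 5), dtype=np.uint8)
dense[1, 2] = 7

# Same image as RLE: 7 zeros, one 7, then 12 zeros (row-major order).
values = np.array([0, 7, 0], dtype=np.uint8)
counts = np.array([7, 1, 12])

print(dense[1, 2])                                   # dense: direct O(1) indexing
print(rle_pixel(values, counts, dense.shape, 1, 2))  # RLE: search over the runs
```

Recomputing the cumulative sum on every call costs O(R) for R runs. Precomputing `ends` once leaves O(log R) per lookup, which is still slower than the O(1) dense access.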

# Answer 2

For the microscope image (mo1), the notebook simulates a simple scenario. The top-left quadrant is set to 1, marking the parasite (black), and the rest of the image is 0 (background). The correct answer for this shape is known in advance, which makes it a straightforward test case for the RLE encoding.

For the dye sensor images, dye1 (modified_dye1.tiff) copies the parasite quadrant, so dye covers the whole parasite. It also sets the centre pixel, which lies just outside the quadrant, diagonally adjacent to its bottom-right corner, to simulate dye leakage. dye2 (modified_dye2.tiff) is all zeros, meaning no dye at all. With mo1 as the microscope image, dye1 should therefore be classified as cancerous (dye ratio 1.0) and dye2 as not cancerous (ratio 0). This tests whether the algorithm handles dye both inside and outside the parasite area.

Encoding the microscope image (mo1) with RLE and loading the dye sensor images (dye1 and dye2) as dense NumPy matrices mirrors the representations chosen in Answer 1 at the full 100,000x100,000 scale. This allows the code to be tested on known inputs before working with real images.
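The same test images can be reproduced at a small size to check their intended properties before generating the 10 GB versions. `make_test_images` is a hypothetical helper that mirrors the notebook's construction: a parasite quadrant for mo1, the quadrant plus the centre pixel for dye1, and an empty dye2.

```python
import numpy as np

def make_test_images(n):
    """Scaled-down version of the notebook's mo1 / dye1 / dye2 test images."""
    mo1 = np.zeros((n, n), dtype=np.uint8)
    mo1[: n // 2, : n // 2] = 1               # parasite: top-left quadrant
    dye1 = mo1.copy()
    dye1[n // 2, n // 2] = 1                  # leaked pixel just outside the quadrant
    dye2 = np.zeros((n, n), dtype=np.uint8)   # no dye at all
    return mo1, dye1, dye2

mo1, dye1, dye2 = make_test_images(8)
leak = (dye1 == 1) & (mo1 == 0)
print(np.argwhere(leak))  # the only dye outside the parasite: [[4 4]]
```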

# Answer 3

Previous approach:

- Calculate Ratio: the function calculates the ratio of pixels that are nonzero in both images (dye within the parasite) to the total number of nonzero pixels in the microscope image (the parasite area). This ratio is the fraction of the parasite area that contains dye.
- Check Threshold: it checks whether the ratio exceeds the threshold of 0.1, that is, whether the total amount of dye detected within the parasite exceeds 10% of the parasite area.
- Determine Cancer: if the ratio is greater than 0.1, the function concludes that the parasite has cancer and returns True. Otherwise, it returns False, indicating that the parasite does not have cancer.
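The three steps can be sketched directly on two dense arrays, without RLE. `has_cancer_dense` is a hypothetical name. The sketch assumes that both images have the same shape and that any nonzero dye value counts as dye.

```python
import numpy as np

def has_cancer_dense(microscope, dye, threshold=0.1):
    """Ratio-and-threshold check on two dense arrays of equal shape.

    microscope: nonzero where the parasite is; dye: nonzero where dye is detected.
    """
    parasite = microscope != 0
    parasite_area = np.count_nonzero(parasite)
    if parasite_area == 0:
        return False  # no parasite: ratio treated as 0
    dyed_inside = np.count_nonzero(parasite & (dye != 0))
    return dyed_inside / parasite_area > threshold

mo = np.zeros((10, 10), dtype=np.uint8)
mo[:5, :5] = 1                     # 25-pixel parasite
dye = np.zeros_like(mo)
dye[0, :3] = 1                     # 3 dyed pixels inside the parasite: 12%
dye[9, 9] = 1                      # dye outside the parasite is ignored
print(has_cancer_dense(mo, dye))   # True, since 3 / 25 = 0.12 > 0.1
```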

# Answer 4

The `has_cancer_rle` function determines whether a parasite has cancer from the RLE-encoded microscope image and the dense dye sensor image, without decoding the microscope image.

The runs are laid out in row-major order: a run starting at position p with length k covers flat pixel indices p to p + k - 1. The dye matrix must therefore be sliced in the same flattened order. For each run whose value is 1 (parasite), the function adds the run length to the total parasite area. It also counts the dyed (nonzero) pixels in the matching slice of the dye image using NumPy's count_nonzero function.

After processing all runs, the function divides the number of dyed parasite pixels by the total parasite area. If this ratio is greater than 0.1, meaning the dye detected in the parasite's body covers more than 10% of the area occupied by the parasite, the function returns True, indicating that the parasite has cancer. Otherwise, it returns False. Background runs are skipped in constant time, so the work is proportional to the number of runs plus the parasite area rather than to the full $N \times M$ image.
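Because the runs index the row-major flattened image, the dye matrix has to be sliced in that same flattened order. Slicing the 2D array directly (`dye[p:p + k]`) would select whole rows instead of pixels. The sketch below checks a run-based ratio against a dense computation on a small example. `rle_encode_small` and `dye_ratio_rle` are hypothetical helpers that follow the same logic as the notebook's functions, and the random dye pattern is synthetic.

```python
import numpy as np

def rle_encode_small(img):
    """Row-major RLE as a flat [value, count, ...] list (small arrays only)."""
    flat = img.ravel()
    starts = np.concatenate(([0], np.flatnonzero(flat[1:] != flat[:-1]) + 1))
    counts = np.diff(np.append(starts, flat.size))
    out = []
    for v, k in zip(flat[starts], counts):
        out.extend([int(v), int(k)])
    return out

def dye_ratio_rle(rle_data, dye):
    """Fraction of parasite pixels (value 1 in the RLE) that carry dye."""
    dye_flat = dye.ravel()  # runs index the row-major flattened image
    position = parasite = dyed = 0
    for i in range(0, len(rle_data), 2):
        value, count = rle_data[i], rle_data[i + 1]
        if value == 1:
            parasite += count
            dyed += np.count_nonzero(dye_flat[position:position + count])
        position += count
    return dyed / parasite if parasite else 0.0

rng = np.random.default_rng(0)
mo = np.zeros((50, 60), dtype=np.uint8)
mo[10:30, 5:40] = 1                                   # rectangular parasite
dye = (rng.random(mo.shape) < 0.15).astype(np.uint8)  # synthetic dye speckle

rle = rle_encode_small(mo)
dense_ratio = np.count_nonzero(mo & dye) / np.count_nonzero(mo)
print(dye_ratio_rle(rle, dye), dense_ratio)  # the two ratios agree
```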

# Answer 5

Downsampling:

- Downsampling reduces the resolution of the image by averaging or subsampling pixels. This can significantly reduce the storage size of the image but may result in loss of detail.
- Downsampling by a factor of k along each side divides the number of pixels by k². For example, reducing a 100,000x100,000 image to 1,000x1,000 (k = 100) cuts the storage size by a factor of 10,000. However, this reduction in resolution may hurt the accuracy of cancer detection, especially for small parasites or small amounts of dye that get averaged away.
- Runtime impact: downsampling itself requires one pass over all $N \times M$ pixels, which is computationally expensive for images this large. Every later step (encoding, dye comparison) then runs on k² times fewer pixels.
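Block averaging, one common way to downsample, can be written with a reshape when the image size is divisible by the block size k. `downsample_mean` is a hypothetical helper. The example uses a 1,000x1,000 quadrant image with one extra pixel to show how a single-pixel feature is diluted.

```python
import numpy as np

def downsample_mean(img, k):
    """Average non-overlapping k x k blocks; both dimensions must be divisible by k."""
    n, m = img.shape
    if n % k or m % k:
        raise ValueError("image dimensions must be multiples of k")
    return img.reshape(n // k, k, m // k, k).mean(axis=(1, 3))

n = 1_000
img = np.zeros((n, n), dtype=np.uint8)
img[: n // 2, : n // 2] = 1   # parasite quadrant
img[n // 2, n // 2] = 1       # single isolated pixel (e.g. leaked dye)

small = downsample_mean(img, 10)
print(img.size // small.size)       # 100 = 10**2 fewer pixels
print(small[0, 0], small[50, 50])   # 1.0 inside the quadrant, 0.01 at the isolated pixel
```

A later threshold such as `small > 0.5` would erase the isolated pixel entirely. This is the kind of loss of detail that can affect detection accuracy.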

# Answer 6


Here are the tools and resources used for this assignment:

- GitHub: used for version control and reference code, and for storing code snippets, scripts, and project files for collaboration and sharing.
- Google: used to research handling large images, compression techniques, and optimization strategies, and to gather information, tutorials, and documentation related to image processing.
- ChatGPT: used for code assistance, debugging, and brainstorming ideas, including generating code snippets, explaining concepts, and providing guidance on various aspects of the assignment.
- Python: specifically, libraries such as tifffile and PIL (Python Imaging Library) were used for image processing tasks: tifffile for reading and writing TIFF images, and PIL for image manipulation operations such as resizing, encoding, and decoding.