# Blood Image Analysis

Analysis steps:

1. Load the image from the same directory as this notebook.
2. Convert it to grayscale.
3. Apply a median filter to reduce noise.
4. Perform adaptive thresholding to make the cells more distinguishable.
5. Detect contours and filter them by area to keep likely red blood cells.
6. Overlay the detected contours on the original image.
7. Display the image with overlaid contours in the notebook.
8. Extract and print out the areas of the filtered contours.

In [None]:
import os
os.getcwd()

In [None]:
import cv2
import numpy as np
from matplotlib import pyplot as plt

# Build the path to the image, which sits in the notebook's working directory
image_path = os.path.join(os.getcwd(), "pic1.jpeg")

In [None]:
ksize = 5
blocksize = 19

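print(image_path)
image = cv2.imread(image_path, cv2.IMREAD_COLOR)

# cv2.imread returns None instead of raising when the file cannot be read
if image is None:
    raise FileNotFoundError(f"Could not read image at {image_path}")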

# Convert the image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply a median filter to reduce noise while preserving edges
median_filtered = cv2.medianBlur(gray_image, ksize)

# Apply adaptive thresholding to emphasize the cells
adaptive_thresh = cv2.adaptiveThreshold(median_filtered, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                        cv2.THRESH_BINARY_INV, blocksize, 2)

# Find contours based on the thresholded image
contours, _ = cv2.findContours(adaptive_thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Define the area range for filtering contours
min_contour_area = 50
max_contour_area = 400

# Filter contours by area, considering the typical size of red blood cells
filtered_contours = [cnt for cnt in contours if min_contour_area < cv2.contourArea(cnt) < max_contour_area]

# Draw the filtered contours on a copy of the original image
image_with_filtered_contours = cv2.drawContours(image.copy(), filtered_contours, -1, (0,255,0), 1)

# Convert to RGB for matplotlib
image_with_filtered_contours_rgb = cv2.cvtColor(image_with_filtered_contours, cv2.COLOR_BGR2RGB)

# Display the image with contours in the notebook
plt.figure(figsize=(10, 10))
plt.imshow(image_with_filtered_contours_rgb)
plt.title(f'Red Blood Cells with Contours, k = {ksize}, b = {blocksize}')
plt.axis('off')
plt.show()

# Extract the areas of the filtered contours
contour_areas = [cv2.contourArea(cnt) for cnt in filtered_contours]
print(f"Areas of the first ten filtered contours: {contour_areas[:10]}")  # Print the first 10 contour areas as a sample
print(f"Number of contours: {len(contour_areas)}")

# Plotting the contour areas as a scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(range(len(contour_areas)), contour_areas, color='blue', alpha=0.5)
plt.title('Scatter Plot of Contour Areas')
plt.xlabel('Contour Index')
plt.ylim(0,500)
plt.ylabel('Area')
plt.grid(True)
plt.show()



In [None]:
from scipy.stats import norm

# Calculate mean and standard deviation of all contour areas (outliers still included)
mean_contour_area = np.mean(contour_areas)
std_dev_contour_area = np.std(contour_areas)

# Define outliers as points beyond mean ± 1.5 standard deviations
lower_bound = mean_contour_area - 1.5 * std_dev_contour_area
upper_bound = mean_contour_area + 1.5 * std_dev_contour_area

# Filter out outliers
filtered_contour_areas = [area for area in contour_areas if lower_bound <= area <= upper_bound]

# Recalculate mean and standard deviation without outliers
mean_filtered = np.mean(filtered_contour_areas)
std_dev_filtered = np.std(filtered_contour_areas)

# Scatter plot of all contour areas, with the filtered mean and a ±1.5σ band overlaid
plt.figure(figsize=(10, 6))
plt.scatter(range(len(contour_areas)), contour_areas, color='blue', alpha=0.5)
plt.axhline(y=mean_filtered, color='r', linestyle='-', label=rf'$\mu$: {mean_filtered:.2f}')
plt.fill_between(range(len(contour_areas)), mean_filtered - 1.5 * std_dev_filtered,
                 mean_filtered + 1.5 * std_dev_filtered, color='red', alpha=0.2, label=r'$\mu$ ± 1.5$\sigma$')
plt.title('Filtered Scatter Plot of Contour Areas')
plt.xlabel('Contour Index')
plt.ylabel('Area (px$^{2}$)')
plt.ylim(0, 500)
plt.legend()
plt.grid(True)
plt.show()

# Plot a histogram of the contour areas along with the Gaussian distribution
plt.figure(figsize=(10, 6))
# Histogram of the data with raw counts
n, bins, patches = plt.hist(filtered_contour_areas, bins=30, alpha=0.6, color='g')

# Generate the y-values for the Gaussian distribution equivalent with raw counts
# First we need the bin width to scale our Gaussian distribution
bin_width = bins[1] - bins[0]
gaussian_y_raw = norm.pdf(bins, mean_filtered, std_dev_filtered) * len(filtered_contour_areas) * bin_width

# Label the curve directly so the legend picks it up without a separate label list
plt.plot(bins, gaussian_y_raw, 'k', linewidth=2,
         label=rf'Gaussian distribution, $\mu = {mean_filtered:.0f}, \sigma = {std_dev_filtered:.0f}$')

plt.title('Histogram of Filtered Contour Areas with Gaussian Distribution')
plt.xlabel('Area (px$^{2}$)')
plt.ylabel('Number of Cells')
plt.legend(loc="upper left")
plt.grid(True)
plt.show()

# Return new mean and standard deviation without outliers for quoting error
mean_filtered, std_dev_filtered


# Analysis of Red Blood Cell Sizes Using Image Processing

This notebook presents an automated approach to quantify the sizes of red blood cells (RBCs) from microscope images. The goal is to extract meaningful statistics about the size distribution of RBCs, which can be crucial for various diagnostic purposes.

## Methodology

The process can be summarized in the following steps:

1. **Image Loading**: The image is loaded into the workspace, ensuring it retains its color properties for accurate processing.
2. **Grayscale Conversion**: The image is converted to grayscale to simplify the detection of RBCs by focusing on intensity rather than color.
3. **Noise Reduction**: A median filter is applied to the grayscale image to reduce noise while preserving edges, which makes the subsequent thresholding and contour detection more reliable.
4. **Adaptive Thresholding**: The filtered image undergoes adaptive thresholding, highlighting the cells against the background and improving contour detection.
5. **Contour Detection**: Contours are detected in the thresholded image, and those within a realistic size range for RBCs are filtered and retained for analysis.
6. **Overlay Contours**: The detected contours are overlaid on the original image to visually confirm the accuracy of cell detection.
7. **Area Extraction**: The areas of the filtered contours, representing individual RBCs, are extracted for statistical analysis.

## Results

The analysis identified RBC contours, and their areas are displayed in a scatter plot to visualize the spread across detections. A histogram of the contour areas was also plotted to show the shape of the distribution, including any visible skew or heavy tails.
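
The histogram gives only a visual impression of skewness and kurtosis. As a minimal sketch, both could be computed directly from the contour areas with NumPy. The helper name `shape_statistics` and the example values are hypothetical and are not part of the notebook above.

```python
import numpy as np

def shape_statistics(areas):
    """Return (skewness, excess kurtosis) from population (biased) moments."""
    x = np.asarray(areas, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # variance
    m3 = np.mean(centered ** 3)  # third central moment
    m4 = np.mean(centered ** 4)  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis

# Hypothetical areas in px^2, for illustration only
example_areas = [150.0, 180.0, 190.0, 200.0, 210.0, 320.0]
skewness, excess_kurtosis = shape_statistics(example_areas)
print(f"skewness = {skewness:.2f}, excess kurtosis = {excess_kurtosis:.2f}")
```

A positive skewness indicates a longer tail toward large areas, which is what overlapping or merged cells would tend to produce.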

## Statistical Analysis

Outliers, defined as areas more than 1.5 standard deviations from the mean of all detected contours, were identified and excluded to limit their influence on the mean size estimate. The remaining data were used to calculate a revised mean and standard deviation, providing measures of central tendency and dispersion that are less affected by extreme values.
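
The exclusion rule can be sketched as a small standalone function. This is an illustration under stated assumptions rather than the notebook's exact cell: the name `exclude_outliers`, the parameter `k`, and the example values are hypothetical. Like the notebook, it uses `np.std` with its default population standard deviation (`ddof=0`).

```python
import numpy as np

def exclude_outliers(areas, k=1.5):
    """Keep values within mean ± k standard deviations of the full sample.

    Returns the kept values plus their recomputed mean and standard deviation.
    """
    x = np.asarray(areas, dtype=float)
    mean, std = x.mean(), x.std()
    keep = (x >= mean - k * std) & (x <= mean + k * std)
    kept = x[keep]
    return kept, kept.mean(), kept.std()

# Hypothetical areas in px^2; the last value mimics two merged cells
example_areas = [180.0, 190.0, 200.0, 210.0, 220.0, 390.0]
kept, mean_kept, std_kept = exclude_outliers(example_areas)
print(f"kept {kept.size} of {len(example_areas)} values")
print(f"mean = {mean_kept:.2f}, std = {std_kept:.2f}")
```

Because the bounds are computed from the unfiltered sample, a single pass removes only the most extreme values; the rule is not applied iteratively here.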

The histogram of the filtered contour areas was then plotted against a Gaussian distribution with the same mean and standard deviation. This comparison shows how closely the size distribution of RBCs follows a normal distribution, which can have implications for the health of the blood sample.
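
To overlay a Gaussian on a histogram of raw counts, the probability density has to be scaled by the number of samples and the bin width. The sketch below shows that scaling with NumPy only. It evaluates the density at bin centers, whereas the notebook evaluates it at bin edges; the function name `gaussian_expected_counts` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def gaussian_expected_counts(values, bins=30):
    """Return observed counts, Gaussian-expected counts, and bin centers."""
    x = np.asarray(values, dtype=float)
    mu, sigma = x.mean(), x.std()
    counts, edges = np.histogram(x, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    bin_width = edges[1] - edges[0]
    # Normal probability density evaluated at each bin center
    pdf = np.exp(-0.5 * ((centers - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Density -> expected count per bin: multiply by sample size and bin width
    expected = pdf * x.size * bin_width
    return counts, expected, centers

# Synthetic, normally distributed areas (not measured data)
rng = np.random.default_rng(0)
synthetic = rng.normal(loc=194.0, scale=32.0, size=2000)
counts, expected, centers = gaussian_expected_counts(synthetic)
print(f"observed total = {counts.sum()}, expected total = {expected.sum():.1f}")
```

With this scaling the expected counts sum to roughly the sample size, so the curve and the bars share one vertical axis.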

## Conclusion

The mean area of RBCs, after excluding outliers, is approximately 194.17 px², with a standard deviation of about 32.48 px². This suggests that while there is variability in the sizes of RBCs, the majority cluster around the calculated mean size.
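
The contour areas are measured in squared pixels (the plot axes use px$^{2}$), so a length scale can be easier to interpret. As a rough sketch, the mean area can be converted to the diameter of a circle with the same area, $d = 2\sqrt{A/\pi}$. Treating a cell outline as a circle is a simplification, and the calibration constant `MICRONS_PER_PIXEL` is purely hypothetical: the notebook does not state the image scale.

```python
import math

def equivalent_diameter(area):
    """Diameter of a circle whose area equals `area` (same length unit as sqrt(area))."""
    return 2.0 * math.sqrt(area / math.pi)

mean_area_px2 = 194.17  # mean area reported above, in px^2
diameter_px = equivalent_diameter(mean_area_px2)
print(f"equivalent diameter = {diameter_px:.2f} px")

# Hypothetical calibration; replace with the real microscope scale if known
MICRONS_PER_PIXEL = 0.5
diameter_um = diameter_px * MICRONS_PER_PIXEL
print(f"equivalent diameter = {diameter_um:.2f} um (assuming {MICRONS_PER_PIXEL} um/px)")
```

Without a real calibration, only the pixel value is meaningful; the micron figure merely shows how the conversion would be applied.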

In a clinical context, the standard deviation provides a metric of the heterogeneity of RBC sizes in the sample. A higher standard deviation could indicate a condition known as anisocytosis, where the RBCs vary more significantly in size than normal.
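
Because the standard deviation depends on image scale, a scale-free measure such as the coefficient of variation (standard deviation divided by mean) can make samples easier to compare. The sketch below applies it to the values reported above. The function name is hypothetical, and the result describes 2D contour areas in this image only; it is not a clinically validated index.

```python
def coefficient_of_variation(mean, std):
    """Return std / mean, a unitless measure of relative spread."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return std / mean

# Mean and standard deviation of the filtered areas reported above (px^2)
cv_areas = coefficient_of_variation(194.17, 32.48)
print(f"coefficient of variation = {cv_areas:.1%}")
```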

Combining image processing with statistical analysis gives a quick, automated assessment of RBC sizes, which could be used to support diagnostic decision-making in medical practice.
