# Image Processing and Noise Cancelling

## Noise reduction by Gaussian blurring
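Gaussian blurring is a smoothing technique that reduces noise and fine detail by convolving the image with a Gaussian kernel. Unlike edge-preserving filters such as the bilateral filter, it also softens edges, so the kernel should stay small relative to the stroke width of the text. In OCR preprocessing it is useful for suppressing background noise before thresholding and segmentation.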

In [None]:
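import cv2

# Load the image as single-channel grayscale
img = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)

# Smooth with a 5x5 Gaussian kernel; sigma=0 lets OpenCV derive sigma from the kernel size
blur = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('blurred_image.png', blur)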

## Remove shadows and enhance contrast

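Shadows and uneven lighting reduce the contrast between text and background, which makes OCR models less accurate. Morphological transformations help here: the black-hat transform (closing minus the original image) keeps only dark details narrower than the structuring element, such as text strokes, and discards broad, slowly varying regions such as shadows. The result shows the text as bright strokes on a dark, evenly lit background.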

In [None]:
import cv2

# Load grayscale image
img = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)

# Define a 5x5 rectangular kernel; it should be wider than the text strokes
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))

# Apply the black-hat transform (closing minus image): keeps dark details
# narrower than the kernel (text strokes) and suppresses broad shadows
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)

# Save and display results
cv2.imwrite("blackhat_removed.png", blackhat)
cv2.imshow("Black Hat Transformation", blackhat)
cv2.waitKey(0)
cv2.destroyAllWindows()

## Contour detection

Contour detection is a key technique in image processing for OCR. It segments text blocks by tracing the outlines of connected regions, such as characters or words, in a binarized image. The bounding boxes of these contours can then be used to crop text areas before feeding them to an OCR model.

In [None]:
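import cv2

# `img` and `binary` were undefined in this cell; the loading and thresholding
# steps below are an assumed setup ("document.png" is a placeholder path)
img = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)

# Binarize with Otsu's threshold, inverted so text is white on black
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Detect only the outermost contours in the binary image
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Loop through each detected contour
for i, cnt in enumerate(contours):
    # Get bounding box coordinates (x, y, width, height)
    x, y, w, h = cv2.boundingRect(cnt)

    # Crop the detected text region from the image
    cropped = img[y:y+h, x:x+w]

    # Save each region under a unique index (x alone can repeat across contours)
    cv2.imwrite(f'cropped_{i}.png', cropped)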