# OCR on PAN Card

### Procedure followed

1. Converted the input image to grayscale. Colour carries no useful information for this task, and a single-channel image is faster to process in the edge-detection step.

2. Detected edges and straight line segments in the image to check whether the image is tilted.

3. Calculated the angle of each detected line and took the median of those angles as the tilt estimate. The median is less affected by a few stray lines than the mean would be.

4. Rotated the image if the median angle of the detected lines is not zero.

5. Applied the pytesseract library to extract text from the image.

6. Cleaned up the extracted text with ftfy's `fix_text`.

7. Applied regular expressions to find the date of birth and the PAN card number in the text.
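
Step 3 can be illustrated on its own, without any image. The following is a minimal sketch, assuming line segments are given as `(x1, y1, x2, y2)` tuples; the helper name `median_line_angle` and the sample segments are made up for illustration and are not part of the notebook.

```python
import math
from statistics import median

def median_line_angle(segments):
    """Return the median angle (in degrees) of (x1, y1, x2, y2) segments.

    Returns 0.0 when no segments are given (hypothetical fallback).
    """
    angles = [
        math.degrees(math.atan2(y2 - y1, x2 - x1))
        for x1, y1, x2, y2 in segments
    ]
    return median(angles) if angles else 0.0

# Made-up segments: two level lines and one stray diagonal line
segments = [(0, 10, 100, 10), (0, 50, 200, 50), (0, 0, 30, 30)]
print(median_line_angle(segments))  # 0.0
```

The single diagonal segment does not shift the result, which is why the median is a reasonable choice when most detected lines follow the card's text and borders.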



# Code

In [1]:
import os
import ftfy
import pytesseract
import re
import math
from scipy import ndimage
import cv2
import numpy as np

#os.chdir('C:\\Users\\santosh\\Desktop')
# reading file from path
def ocr_on_pan(image_path):
    
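    # `img` stays untouched for rotation and OCR;
    # `img_before` is a second copy used for drawing the detected lines
    img = cv2.imread(image_path)
    img_before = cv2.imread(image_path)
    if img is None:
        # cv2.imread returns None instead of raising when the file cannot be read
        raise FileNotFoundError(image_path)

    # Convert to grayscale: edge detection only needs a single channel
    img_gray = cv2.cvtColor(img_before, cv2.COLOR_BGR2GRAY)
    # Canny edge detector marks the edges in the image
    img_edges = cv2.Canny(img_gray, 100, 100, apertureSize=3)

    # Calculating rotation angle of the image.
    # The probabilistic Hough transform returns the straight line segments
    # found in the edge map, or None if it finds none.
    lines = cv2.HoughLinesP(img_edges, 1, math.pi / 180.0, 100, minLineLength=100, maxLineGap=5)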

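    # Calculating the angle of each detected line
    angles = []

    # HoughLinesP returns None when no line is detected, so guard the loop
    if lines is not None:
        for [[x1, y1, x2, y2]] in lines:
            cv2.line(img_before, (x1, y1), (x2, y2), (255, 0, 0), 3)
            angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
            angles.append(angle)
    # Fall back to 0.0 (no rotation) when there are no angles to take a median of
    median_angle = np.median(angles) if angles else 0.0
    print("image rotated by",median_angle,"angle")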
    
    

    # Rotate the image if it is tilted, then run pytesseract on the result.
    # Note: the rotation applied is (median_angle*2-1) degrees, not median_angle itself.
    if median_angle!=0:
        img_rotated = ndimage.rotate(img, (median_angle*2-1))
        # extracting text from image using tesseract
        text = pytesseract.image_to_string(img_rotated)
    else:
        text = pytesseract.image_to_string(img_gray)

    # Repair mis-encoded characters in the OCR output
    text = ftfy.fix_text(text)

    
    
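    # Applying regex to obtain the date (dob) and PAN number
    dob = re.search(r'\d{2}/\d{2}/\d{4}', text)            # regex for date
    pan_no = re.search(r'[A-Z]{5}[0-9]{4}[A-Z]', text)     # regex for PAN number
    # re.search returns None when nothing matches, so check before calling .group()
    print("pan card number:", pan_no.group() if pan_no else "not found")
    print("dob on pan card:", dob.group() if dob else "not found")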
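
The regex step can also be tried separately from the OCR. The sketch below is one possible variant, not the notebook's code: the helper `extract_pan_fields`, the word boundaries (`\b`), and the calendar check with `datetime.strptime` are assumptions added for illustration, and the date format is assumed to be DD/MM/YYYY as in the outputs shown here. The sample text is made up.

```python
import re
from datetime import datetime

# Same patterns as in the notebook, with word boundaries added (assumption)
PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

def extract_pan_fields(text):
    """Return the first PAN-like token and the first valid DD/MM/YYYY date.

    Missing fields are returned as None instead of raising.
    """
    pan = PAN_RE.search(text)
    dob = None
    for match in DATE_RE.finditer(text):
        try:
            # Skip strings that look like dates but are not real calendar dates
            datetime.strptime(match.group(), "%d/%m/%Y")
        except ValueError:
            continue
        dob = match.group()
        break
    return {"pan": pan.group() if pan else None, "dob": dob}

# Made-up OCR output for illustration only
sample = "INCOME TAX DEPARTMENT\nABCDE1234F\n99/99/9999 01/01/1990\n"
print(extract_pan_fields(sample))
```

Returning a dictionary rather than printing makes it easier to handle images where one of the fields could not be read.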


# For Image 1

In [2]:
ocr_on_pan('pan1.jpg')

image rotated by 0.0 angle
pan card number: ELWPM8089J
dob on pan card: 30/01/1997


# For Image 2

In [3]:
ocr_on_pan('pan2.jpg')

image rotated by 0.0 angle
pan card number: BXAPC1000L
dob on pan card: 13/09/1996


# For Image 3

In [4]:
ocr_on_pan('pan3.jpg')

image rotated by -45.0 angle
pan card number: BGYPJ0129A
dob on pan card: 18/03/1996
