
# Metrics

https://towardsdatascience.com/evaluating-ocr-output-quality-with-character-error-rate-cer-and-word-error-rate-wer-853175297510

Here we evaluate OCR output with two metrics: Character Error Rate (CER) and Word Error Rate (WER).

CER is based on the Levenshtein distance: we count the minimum number of character-level operations (insertions, deletions and substitutions) required to transform the ground truth text (aka reference text) into the OCR output, then divide that count by the number of characters in the reference.
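As a rough illustration of this definition, the following pure-Python sketch computes the Levenshtein distance with the standard dynamic-programming recurrence and turns it into a CER percentage. The function names `levenshtein` and `cer` are chosen here for illustration and are not part of any library used in this notebook; normalising by the reference length is the usual convention and is an assumption here, not something taken from the fastwer source.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and substitutions turning ref into hyp."""
    # prev is the previous row of the DP table: prev[j] is the distance between
    # the items of ref processed so far and the first j items of hyp
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate as a percentage of the reference length."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return 100.0 * levenshtein(ref, hyp) / len(ref)


print(levenshtein("kitten", "sitting"))  # 3
print(round(cer("my name is kenneth", "myy nime iz kenneth"), 4))  # 16.6667
```

Because `levenshtein` only compares items for equality, passing `ref.split()` and `hyp.split()` instead of the raw strings gives the word-level edit count that WER is built on.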

Since this project involves transcription of particular sequences (e.g., social security numbers, phone numbers, etc.), CER is the more relevant metric.
Word Error Rate, on the other hand, is more applicable when transcribing paragraphs and sentences of meaningful words (e.g., pages of books, newspapers).


In [1]:
import cv2
import pytesseract
import fastwer
import os
import numpy as np
import pandas as pd
pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\Harle\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'

In [2]:
ref = 'my name is kenneth'
output = 'myy nime iz kenneth'

# CER
print(fastwer.score_sent(output, ref, char_level=True))

# WER
print(fastwer.score_sent(output, ref))

16.6667
75.0
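
These two values can be checked by hand. The OCR output differs from the reference by three character edits (an extra `y` in `myy`, `a` replaced by `i` in `nime`, and `s` replaced by `z` in `iz`). The reference has 18 characters including spaces and 4 words, 3 of which are wrong. Assuming the usual normalisation by reference length:

$$
\mathrm{CER} = \frac{3}{18} \times 100 \approx 16.67\%, \qquad \mathrm{WER} = \frac{3}{4} \times 100 = 75\%
$$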


In [3]:
# Create empty dataframe to store output
df_output = pd.DataFrame(columns = ['img_filename', 'ocr_output'])

In [4]:
os.chdir("C:\\Users\\Harle\\OneDrive - York St John University\\YEAR 3\\Work Based Project\\Code\\Text Localisation\\metrics\\sample_imgs")
for img in os.listdir():
    image = cv2.imread(img)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    custom_config = r'--oem 3 --psm 6'
    output = pytesseract.image_to_string(gray,config=custom_config)
    # print(output)
    dictionary = {'img_filename':img, 'ocr_output':str(output)}
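    # DataFrame.append was removed in pandas 2.0; pd.concat is the supported replacement
    df_output = pd.concat([df_output, pd.DataFrame([dictionary])], ignore_index=True)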
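
`os.listdir()` returns every entry in the folder, and `cv2.imread` returns `None` rather than raising when it cannot decode a file, so a stray non-image file would make `cv2.cvtColor` fail. A minimal sketch of one way to guard the loop, assuming the samples are JPEG or PNG files; the helper name `list_image_files` and the extension set are illustrative and not part of the original notebook:

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed formats; extend as needed


def list_image_files(folder):
    """Return the sorted file names in `folder` whose extension looks like an image."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


# List the image files in the current working directory
for name in list_image_files("."):
    print(name)
```

The loop could then iterate over `list_image_files('.')` instead of `os.listdir()`.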

In [5]:
# Create new columns for reference, CER and WER
df_output['ref_text'] = ''
df_output['cer'] = ''
df_output['wer'] = ''

In [6]:
df_output

Unnamed: 0,img_filename,ocr_output,ref_text,cer,wer
0,bart-43253423.jpg,TVM No-.: Ho\nBART |\n\nsan Franeisco Int’l Ai...,,,
1,caffenero_20190831.jpg,Caffe Nero\n782 Edinburgh Airport Gate 12\nVAT...,,,
2,cheesecake-20191221_003.jpg,THE CHEESECAKE FACTORY\nLAS VEGAS\nOF27a TABLE...,,,
3,cowgirlcreamery-20190420_010.jpg,Cowgirl Creamery\n415-362-9354\nThe Ferry Plaz...,,,
4,gourmetmash_20190830.jpg,Makars Gourmet Mash Bar (West\nEnd)\n91 Shandw...,,,
5,haymarketedi_20190826.jpg,Haymarket\nEdinburgh\nEH12 SDS\n74179639~P0S-0...,,,
6,lebowski_20190828.jpg,"Lebouski’s\n""the dude abides’\nServer: MARCO 2...",,,
7,roxannes_599355.jpg,3\nRoxanne’s Cafe\n570 Powell Street\nSan fran...,,,
8,shakeshack_20181208_004.jpg,SHAKE SHACK\n3790 Las Vegas Blvd South\nHost: ...,,,
9,topgold-20191203.jpg,"TOPGOLF\n4627 Koval Lane\nLas Vegas, NV 89109\...",,,


In [7]:
# 1
df_output.loc[df_output['img_filename'] == 'bart-43253423.jpg', 'ref_text'] = '''TVM No-.: 5315
BART

san Francisco Int'l Airp
ort

497 North Link Road
DATE: O7/21/18

TIME: 10:11 PM

Debit Card Sale

CARD NO. 2825
AMOUNT $ 20.00
AUTHORIZATION DENIED
CONTACT BANK

Thanks for riding BART.'''

#2
df_output.loc[df_output['img_filename'] == 'caffenero_20190831.jpg', 'ref_text'] = '''Caffe Nero
782 Edinburgh Airport Gate 12
VAT: 795871659
1327 Barista 7
dK 30052
ake Away
1 Salted Caramel Cheesecake 2.95
1 Chai Latte 3.30
1 Latte Grande 3.05
Credit Card MPG GBP 9.30
1282516103619537900
1.06 VAT 20 % 6.35
Net Total: GBP 5.29
Subtotal GBP 9.30
Payment GBP 9.30
Change Due GBP 0.00
----------- Check Closed -----------
8/31/2019 11:15 AM
Tell us how we did today.
Visit www.mynerovisit com'''

#3
df_output.loc[df_output['img_filename'] == 'cheesecake-20191221_003.jpg', 'ref_text'] = '''THE CHEESECAKE FACTORY
LAS VEGAS
OF27a TABLE 82 #Party 1
BRANDON L SvrCk; 32 19:34 12/06/19
DINING ROOM
Separate checks: 2-of-4
| Great Basin Dr 7.50
1 Louisiana Chicken Pasta 18.50
1 Blue Moon Draft 7.50
1 Original Cheesecake 7.95
Sub Total: 41.45
Tax: 3.42
12/06 20:57 TOTAL : 44.87
Gratuity Not included
Suggested Gratuity:
22% 9.87
20% 8.97
18% 8.08
15% 6.73
We'd love to hear about your visit!
wow .ccfsurvey .com
Enter this code within 5 days:
1092-60171-02027
Join us for Brunch, Sat/Sun 10-2
For to-go orders, please visit
order thecheesecakefactory.com'''

#4
df_output.loc[df_output['img_filename'] == 'cowgirlcreamery-20190420_010.jpg', 'ref_text'] = '''Cowgirl Creamery
415-362-9354
The Ferry Plaza
One Embarcadero
TONY
Host: Gabino 04/08/2019
TONY 11:55 AM
40060
Far nouse GC 9.75
House Made Lemonade 2.95
Subtotal 12.70
Tax 1.08
Order Total 13.78
MC 13.78
Auth: NV97AY
Thank you for shopping at
Cowgirl Creamery!
Follow us on Instagram & Facebook
for the latest cheese news!
--- Check Closed ---'''

#5
df_output.loc[df_output['img_filename'] == 'gourmetmash_20190830.jpg', 'ref_text'] = '''Makars Gourmet Mash Bar (West
End)
91 Shandwick Place
Edinburgh, EH2 4SD
0131 228 5100
ORDER: Table 20
Cashier: Calli
30-Aug-2019 19:53:54
1 Lamb Shank £15.00
Bacon £0.00
1 Boar Sausage £12.00
Cheese £0.00
1 Stewarts Edinburgh Gold £4.00
2 Robert Burns Ale £9.00
1 Belhaven Craft Pilsner £4.00
2 Sticky Toffee £10.00
Total £54.00
VAT
% Net Tax Gross
20 45.00 9.00 54.00
EOTIS Ltd Reg# 342027 VAT No:
GB839829469
Online: https://eu.clover.com/r/
SOZHB8XAZKC3T
SOZHB8XAZKC3T
Order SOZHBBXAZKC3T
Clover Privacy Policy
https.//eu.clovet.com/pp/clover'''

#6
df_output.loc[df_output['img_filename'] == 'haymarketedi_20190826.jpg', 'ref_text'] = '''Haymarket
Edinburgh
EH12 SDS
ZA179639-P0S-04
Gary 26 Aug 2019 20: 16
Table: 26 c: 2 Acc No: 2747
4 Guinness Keg = 17.00
1 Steak Ale Pie = 11.00
| Chicken Mush Pie = 11.00
Product Group Summary
Food And Drink 39.00
20% VAT Net 32.50
20% VAT 6.50
20% VAT Total 39.00
Total £39.00
Nicholson's Gift Card - tue ° orfect
gift for pub lovers! wist °: a memuer
of our tea today ci gy ta:
wuw.nicholscnssuos.co.t+ iftcards
Tel 0121 228 2!
VAT 2¢2 185'''

#7
df_output.loc[df_output['img_filename'] == 'lebowski_20190828.jpg', 'ref_text'] = '''Lebowski’s
"the dude abides’
Server: MARCO 28/08/2019
Cashier: MICHEAL
Table 14/1 9:49 PM
Guests: 2 30044
MAC WAIN 9.95

FISH & CHIPS 10.95
Bell field ipa (4 @4.60) 18.40
Guinness Pt (2 @4.80) 9.60
Subtotal 40.75
Tax 8.15
Total 48.90
Balance Due 48.90

18 Morrison Street

Edinburgh, EH3 8BJ

WiFi Password: lebowskis1013

Tel No: 0131 466 1779

VAT No: 170 5101 51

www. lebowskis.co.uk'''

#8
df_output.loc[df_output['img_filename'] == 'roxannes_599355.jpg', 'ref_text'] = '''Roxanne’s Cafe
570 Powell Street
San francisco, California
Tel: (415) 989-5555
Check #: 599355
Server: Masha Date: 05/12/2019
Table: 21 Time: 09:29
Client: 1
1 Coffee 2.75
1 Grande 15.00
SUB-TOTAL : 17.75
S.F.H.O (0): 0.53
Sales Tax1: 1.60
Total: 19.88
For General Comments About
Your Dining Experience
Please Contact Us
We Are Obligated to
Collect a Surcharge for
the SF Health Care
Security Ordinance'''

#9
df_output.loc[df_output['img_filename'] == 'shakeshack_20181208_004.jpg', 'ref_text'] = '''SHAKE SHACK
3790 Las Vegas Blvd South
Host: Lisa 12/02/2018
166 WALTER 11:44 AM
60042
DBL ShackBurger 8.69
Bacon Cheese Fries 4.89
Shake 5.49
Vanilla Shake
Subtotal 19.07
Tax 1.57
To Stay Total 20.64
MasterCard #XXXXXXXXXXXX2825 20.64
Auth: NGJJ58
We wanna hear ya! Take our survey
for $5 off your next $20 App order.
http://bit.1ly/shack-survey-1130
--- Check Closed ---'''

#10
df_output.loc[df_output['img_filename'] == 'topgold-20191203.jpg', 'ref_text'] = '''TOPGOLF
4627 Koval Lane
Las Vegas, NV 89109

Check 465 Tab 111/WALTER
Caitlyn B. 12/3/2019
Guests 1 9:13 PM
4 Membership Charge (5.00) 20.00

Topgolf Gameplay 63.00

Bud Light Draft 7.25

Lagunitas IPA Pint 8.25

SA New England IPA 8.25

Heineken Can 8.25
Subtotal 115.00
Tax 2.64
TOTAL 117.64
BALANCE DUE 117.64

Thank you for visiting Topgolf! For

Upcoming events and promotions visit

Topgolf.com and follow us @topgolf

on Facebook, Twitter or Instagram.'''

In [8]:
# Replace new lines in output/ref text with spaces and convert to upper case
df_output['ocr_output'] = df_output['ocr_output'].apply(lambda x: x.replace('\n',' ').upper())
df_output['ref_text'] = df_output['ref_text'].apply(lambda x: x.replace('\n',' ').upper())
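
Note that replacing each `\n` with a single space turns a blank line (`\n\n`) into two consecutive spaces, and every extra space counts as a character when CER is computed. One possible variation, shown here as an assumption rather than what the notebook does, collapses every run of whitespace into a single space before upper-casing; `normalise_text` is a hypothetical helper name:

```python
def normalise_text(text):
    """Collapse all whitespace runs (spaces, tabs, newlines) to one space and upper-case."""
    return " ".join(text.split()).upper()


sample = "TVM No-.: 5315\nBART\n\nsan Francisco Int'l Airp\nort"
print(repr(sample.replace("\n", " ").upper()))  # the notebook's approach keeps the double space
print(repr(normalise_text(sample)))             # the variation collapses it
```

Applied to both columns, e.g. `df_output['ref_text'].apply(normalise_text)`, this would keep layout differences from being scored as OCR errors; the resulting scores would likely differ slightly from those reported in this notebook.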


In [9]:
for index, row in df_output.iterrows():
    ref = row['ref_text']
    output = row['ocr_output']
    # fastwer returns both scores as percentages
    cer = fastwer.score_sent(output, ref, char_level=True)
    wer = fastwer.score_sent(output, ref, char_level=False)
    # Round values to 2 decimal places
    df_output.loc[index, 'cer'] = round(cer, 2)
    df_output.loc[index, 'wer'] = round(wer, 2)

df_output

Unnamed: 0,img_filename,ocr_output,ref_text,cer,wer
0,bart-43253423.jpg,TVM NO-.: HO BART | SAN FRANEISCO INT’L AIRP ...,TVM NO-.: 5315 BART SAN FRANCISCO INT'L AIRP ...,8.21,19.51
1,caffenero_20190831.jpg,CAFFE NERO 782 EDINBURGH AIRPORT GATE 12 VAT: ...,CAFFE NERO 782 EDINBURGH AIRPORT GATE 12 VAT: ...,1.72,5.71
2,cheesecake-20191221_003.jpg,THE CHEESECAKE FACTORY LAS VEGAS OF27A TABLE 6...,THE CHEESECAKE FACTORY LAS VEGAS OF27A TABLE 8...,1.43,6.45
3,cowgirlcreamery-20190420_010.jpg,COWGIRL CREAMERY 415-362-9354 THE FERRY PLAZA ...,COWGIRL CREAMERY 415-362-9354 THE FERRY PLAZA ...,4.07,10.53
4,gourmetmash_20190830.jpg,MAKARS GOURMET MASH BAR (WEST END) 91 SHANDWIC...,MAKARS GOURMET MASH BAR (WEST END) 91 SHANDWIC...,0.54,1.23
5,haymarketedi_20190826.jpg,HAYMARKET EDINBURGH EH12 SDS 74179639~P0S-04 G...,HAYMARKET EDINBURGH EH12 SDS ZA179639-P0S-04 G...,1.35,3.49
6,lebowski_20190828.jpg,"LEBOUSKI’S ""THE DUDE ABIDES’ SERVER: MARCO 28/...","LEBOWSKI’S ""THE DUDE ABIDES’ SERVER: MARCO 28/...",1.28,5.63
7,roxannes_599355.jpg,3 ROXANNE’S CAFE 570 POWELL STREET SAN FRANCIS...,ROXANNE’S CAFE 570 POWELL STREET SAN FRANCISCO...,4.46,7.69
8,shakeshack_20181208_004.jpg,SHAKE SHACK 3790 LAS VEGAS BLVD SOUTH HOST: LI...,SHAKE SHACK 3790 LAS VEGAS BLVD SOUTH HOST: LI...,0.27,0.0
9,topgold-20191203.jpg,"TOPGOLF 4627 KOVAL LANE LAS VEGAS, NV 89109 C...","TOPGOLF 4627 KOVAL LANE LAS VEGAS, NV 89109 C...",0.65,1.2


In [10]:
# Overall performances
mean_cer = df_output['cer'].mean()
mean_wer = df_output['wer'].mean()
print(f'Mean CER = {round(mean_cer,3)}%, Mean WER = {round(mean_wer,3)}%')

Mean CER = 2.398%, Mean WER = 6.144%
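
The figure above is an unweighted mean of per-image rates, so a short receipt counts as much as a long one. An alternative is to pool edits and reference lengths over all images before dividing. The sketch below contrasts the two on made-up counts (the numbers are purely illustrative and not taken from the receipts), using hypothetical helper names:

```python
def macro_error_rate(edits, ref_lengths):
    """Mean of the per-document error rates, in percent."""
    rates = [100.0 * e / n for e, n in zip(edits, ref_lengths)]
    return sum(rates) / len(rates)


def micro_error_rate(edits, ref_lengths):
    """Total edits over total reference length, in percent."""
    return 100.0 * sum(edits) / sum(ref_lengths)


# Illustrative counts only: a short, badly read document and a long, well read one
edits = [5, 5]
ref_lengths = [50, 500]

print(macro_error_rate(edits, ref_lengths))  # 5.5
print(micro_error_rate(edits, ref_lengths))  # roughly 1.82
```

The fastwer README also documents a corpus-level `fastwer.score(hypo_list, ref_list)`; whether it pools counts in exactly this way should be checked against that documentation before relying on it.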

