# Train OCR Text Detector: Quick Example

To get training data, download the latest version of the OCR datasets from [https://nomeroff.net.ua/datasets/](https://nomeroff.net.ua/datasets/). Unpack the archive into **./datasets/ocr** and rename the extracted directory to the dataset name used below (`ua` in this example).
For example:
```bash
cd ./datasets/ocr
wget https://nomeroff.net.ua/datasets/autoriaNumberplateOcrUa-2020-07-14.zip
unzip autoriaNumberplateOcrUa-2020-07-14.zip
mv autoriaNumberplateOcrUa-2020-07-14 ua
```
Alternatively, use your own dataset.
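
Before training, it can help to confirm that the dataset folder contains the expected splits. The sketch below assumes the dataset has `train`, `val` and `test` subdirectories, which are the split names that appear in the data preparation log of this notebook. The helper `missing_splits` is hypothetical and is not part of NomeroffNet.

```python
import os

# Assumed split names, taken from the log output of this notebook.
EXPECTED_SPLITS = ("train", "val", "test")


def missing_splits(dataset_dir, splits=EXPECTED_SPLITS):
    """Return the split names that are not directories under dataset_dir."""
    return [s for s in splits if not os.path.isdir(os.path.join(dataset_dir, s))]


if __name__ == "__main__":
    # Path used in this example; adjust it for your own dataset.
    missing = missing_splits(os.path.join("datasets", "ocr", "ua"))
    print("missing splits:", missing or "none")
```

An empty result means all three split folders are present; it says nothing about the files inside them.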

In [1]:
import os
import sys
import warnings
warnings.filterwarnings('ignore')

# Path to the Nomeroff Net repository root; change it to match your setup
NOMEROFF_NET_DIR = os.path.abspath('../')

DATASET_NAME = "ua"
VERSION = "20"
MODE = "cpu"
PATH_TO_DATASET = os.path.join(NOMEROFF_NET_DIR, "datasets/ocr/", DATASET_NAME)
RESULT_MODEL_PATH = os.path.join(NOMEROFF_NET_DIR, "models/", 'anpr_ocr_{}_{}-{}.h5'.format(DATASET_NAME, VERSION, MODE))

FROZEN_MODEL_PATH = os.path.join(NOMEROFF_NET_DIR, "models/", 'anpr_ocr_{}_{}-{}.pb'.format(DATASET_NAME, VERSION, MODE))

sys.path.append(NOMEROFF_NET_DIR)

from NomeroffNet.Base import OCR, convert_keras_to_freeze_pb

Using TensorFlow backend.
W0716 13:21:26.974896 139624113284800 module_wrapper.py:139] From /mnt/data/var/www/html2/js/nomeroff-net_2/NomeroffNet/Detector.py:14: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

W0716 13:21:26.976128 139624113284800 module_wrapper.py:139] From /mnt/data/var/www/html2/js/nomeroff-net_2/NomeroffNet/Detector.py:16: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.



In [2]:
class eu_ua_2004_2015(OCR):
    def __init__(self):
        OCR.__init__(self)
        # Alphabet needed only when running an already trained model;
        # during training it is generated automatically from the dataset.
        self.letters = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "E", "H", "I", "K", "M", "O", "P", "T", "X"]
        
        self.EPOCHS = 1

In [12]:
ocrTextDetector = eu_ua_2004_2015()
model = ocrTextDetector.prepare(PATH_TO_DATASET, aug_count=0)

GET ALPHABET
Max plate length in "val": 8
Max plate length in "train": 8
Max plate length in "test": 8
Letters train  {'M', '4', '7', '5', 'P', '1', '3', 'O', 'I', 'A', '8', 'K', 'T', 'B', '6', 'H', 'E', '0', 'C', '2', '9', 'X'}
Letters val  {'M', '4', '7', '5', 'P', '1', '3', 'O', 'I', 'A', '8', 'K', 'T', 'B', '6', 'E', 'H', '0', 'C', '2', '9', 'X'}
Letters test  {'M', '4', '7', '5', 'P', '1', '3', 'O', 'I', 'A', '8', 'K', 'T', 'B', '6', 'H', 'E', '0', 'C', '2', '9', 'X'}
Max plate length in train, test and val do match
Letters in train, val and test do match
Letters: 0 1 2 3 4 5 6 7 8 9 A B C E H I K M O P T X

EXPLAIN DATA TRANSFORMATIONS
Text generator output (data which will be fed into the neutral network):
1) the_input (image)
2) the_labels (plate number): AT7514CI is encoded as [10, 20, 7, 5, 1, 4, 12, 15]
3) input_length (width of image that is fed to the loss function): 30 == 128 / 4 - 2
4) label_length (length of plate number): 8
START BUILD DATA
DATA PREPARED
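
The label encoding and the `input_length` value printed in the log above can be reproduced by hand. The sketch below is not NomeroffNet code: `encode_plate` and `decode_labels` are hypothetical helpers that assume each character maps to its index in the sorted alphabet, which agrees with the `AT7514CI` example in the log. The `input_length` expression is copied from the log line `128 / 4 - 2`, where 128 is the input image width shown in the model summary.

```python
# Alphabet as printed in the "Letters:" line of the log.
LETTERS = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
           "A", "B", "C", "E", "H", "I", "K", "M", "O", "P", "T", "X"]


def encode_plate(text, letters=LETTERS):
    """Map each character to its index in the alphabet (assumed encoding)."""
    return [letters.index(ch) for ch in text]


def decode_labels(labels, letters=LETTERS):
    """Inverse of encode_plate: turn indices back into a plate string."""
    return "".join(letters[i] for i in labels)


IMG_W = 128                    # image width from the model summary
input_length = IMG_W // 4 - 2  # expression taken from the log

print(encode_plate("AT7514CI"))  # [10, 20, 7, 5, 1, 4, 12, 15]
print(input_length)              # 30
```

A character that is missing from the alphabet makes `letters.index` raise `ValueError`, so this kind of round trip is also a quick way to spot labels that fall outside the alphabet.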


In [42]:
model = ocrTextDetector.train(mode=MODE)


START TRAINING
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
the_input_eu_ua_2004_2015 (Inpu (None, 128, 64, 1)   0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 128, 64, 16)  160         the_input_eu_ua_2004_2015[0][0]  
__________________________________________________________________________________________________
max1 (MaxPooling2D)             (None, 64, 32, 16)   0           conv1[0][0]                      
__________________________________________________________________________________________________
conv2 (Conv2D)                  (None, 64, 32, 16)   2320        max1[0][0]                       
_____________________________________________________________________________________________

In [14]:
ocrTextDetector.test(verbose=True)


RUN TEST

Predicted: 		 BC1210AT
True: 			 BC1210AM

Predicted: 		 BC9607CI
True: 			 BC9607CX

Predicted: 		 AA6935O
True: 			 AA6935TX

Predicted: 		 AO7026BO
True: 			 AC7026BO

Predicted: 		 CA0907CO
True: 			 CA0909CO

Predicted: 		 AI4637AE
True: 			 AI4637HE

Predicted: 		 BC4410AK
True: 			 BC4411AX

Predicted: 		 AC2846AP
True: 			 AC2046AP

Predicted: 		 AT0054BK
True: 			 AT0054BX

Predicted: 		 BK8390BT
True: 			 BK8393BT

Predicted: 		 CA1689BT
True: 			 CA1889BT

Predicted: 		 A46AE
True: 			 KA3616AE

Predicted: 		 CB5088BX
True: 			 CB5088BK

Predicted: 		 KA5577CA
True: 			 KA5517CA

Predicted: 		 CA3807TT
True: 			 CA3827TT

Predicted: 		 AI79979
True: 			 AI7907CP

Predicted: 		 T7079CA
True: 			 AM1031CA
Test processing time: 0.6961464881896973 seconds
acc: 0.9792429792429792
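
The reported `acc` value is consistent with plate-level exact-match accuracy: 17 mismatches are listed above, and 802 correct plates out of 819 gives the same figure. The test-set size of 819 is inferred from these numbers and is not stated in the log. The helpers below are hypothetical and are not the NomeroffNet implementation; the sample pairs are copied from the mismatch list.

```python
def accuracy_from_mismatches(total, mismatches):
    """Exact-match accuracy given the number of wrongly read plates."""
    return (total - mismatches) / total


def diff_positions(predicted, expected):
    """Indices where two equal-length strings differ, or None if lengths differ."""
    if len(predicted) != len(expected):
        return None
    return [i for i, (p, t) in enumerate(zip(predicted, expected)) if p != t]


# 819 is an inferred test-set size (assumption), 17 is the count of listed errors.
print(accuracy_from_mismatches(819, 17))

# Two mismatches from the log: one substitution, one length error.
print(diff_positions("BC1210AT", "BC1210AM"))  # [7]
print(diff_positions("AA6935O", "AA6935TX"))   # None
```

Separating single-character substitutions from length errors, as `diff_positions` does, makes it easier to see whether the model confuses similar glyphs or drops characters entirely.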


In [44]:
ocrTextDetector.save(RESULT_MODEL_PATH, verbose=True)
#model = ocrTextDetector.load(RESULT_MODEL_PATH)

SAVED TO /mnt/data/var/www/html2/js/nomeroff-net_2/models/anpr_ocr_ua_20-cpu.h5


In [13]:
model = ocrTextDetector.load(RESULT_MODEL_PATH)

### Convert the Keras OCR .h5 model to a .pb graph

In [6]:
import keras
keras.backend.clear_session()
model = ocrTextDetector.load(RESULT_MODEL_PATH)
convert_keras_to_freeze_pb(model, FROZEN_MODEL_PATH)

W0716 13:35:39.705186 139624113284800 module_wrapper.py:139] From /usr/local/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

W0716 13:35:39.705983 139624113284800 module_wrapper.py:139] From /usr/local/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:98: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.



OUTPUT: softmax_eu_ua_2004_2015/truediv
INPUT: the_input_eu_ua_2004_2015
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
the_input_eu_ua_2004_2015 (Inpu (None, 128, 64, 1)   0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 128, 64, 16)  160         the_input_eu_ua_2004_2015[0][0]  
__________________________________________________________________________________________________
max1 (MaxPooling2D)             (None, 64, 32, 16)   0           conv1[0][0]                      
__________________________________________________________________________________________________
conv2 (Conv2D)                  (None, 64, 32, 16)   2320        max1[0][0]                       
____________________________________

W0716 13:35:45.281709 139624113284800 deprecation.py:323] From /usr/local/lib64/python3.7/site-packages/tensorflow_core/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W0716 13:35:45.511345 139624113284800 deprecation.py:323] From /usr/local/lib64/python3.7/site-packages/tensorflow_core/python/tools/freeze_graph.py:233: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
W0716 13:35:45.511840 139624113284800 deprecation.py:323] From /usr/local/lib64/python3.7/site-packages/tensorflow_core/python/framework/graph_util_impl.py:277: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and wil