# Training the OCR Text Detector: Quick Example

To get training data, download the latest version of the OCR datasets from [https://nomeroff.net.ua/datasets/](https://nomeroff.net.ua/datasets/). Unpack the archive inside **./datasets/ocr** and rename the unpacked directory to the dataset name you will use in the notebook.
For example:
```bash
cd ./datasets/ocr
wget https://nomeroff.net.ua/datasets/autoriaNumberplateOcrUa-1995-2019-07-30.tar.gz
tar -xvf autoriaNumberplateOcrUa-1995-2019-07-30.tar.gz
mv autoriaNumberplateOcrUa-1995-2019-07-30 ua-1995
```
Alternatively, use your own dataset.
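
Before training, it can help to confirm that the dataset directory contains the three splits that the training log reports on (`train`, `val`, `test`). The following is a minimal sketch, not part of Nomeroff Net: the helper name `missing_splits` is hypothetical, and it assumes that each split is a subdirectory of the dataset directory.

```python
import os

# Split names are taken from the training log in this notebook. That each
# split is a subdirectory of the dataset root is an assumption of this sketch.
EXPECTED_SPLITS = ("train", "val", "test")


def missing_splits(dataset_dir, splits=EXPECTED_SPLITS):
    """Return the split names that are not subdirectories of dataset_dir."""
    return [s for s in splits if not os.path.isdir(os.path.join(dataset_dir, s))]


if __name__ == "__main__":
    dataset_dir = os.path.join("datasets", "ocr", "ua-1995")
    missing = missing_splits(dataset_dir)
    if missing:
        print("Missing splits in {}: {}".format(dataset_dir, ", ".join(missing)))
    else:
        print("All expected splits found in {}".format(dataset_dir))
```

If your own dataset is laid out differently, adjust `EXPECTED_SPLITS` or the path logic accordingly.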

In [1]:
import os
import sys
import warnings
warnings.filterwarnings('ignore')

# Change this path to point at the root of your Nomeroff Net checkout
NOMEROFF_NET_DIR = os.path.abspath('../')

DATASET_NAME = "ua-1995"
VERSION = "2021_01_12_tensorflow_v2"
PATH_TO_DATASET = os.path.join(NOMEROFF_NET_DIR, "datasets/ocr/", DATASET_NAME)
RESULT_MODEL_PATH = os.path.join(NOMEROFF_NET_DIR, "models/", 'anpr_ocr_{}_{}.h5'.format(DATASET_NAME, VERSION))

sys.path.append(NOMEROFF_NET_DIR)

from NomeroffNet.Base import OCR

In [2]:
class eu_ua_1995(OCR):
    def __init__(self):
        OCR.__init__(self)
        # Needed only when using an already trained model;
        # during training the alphabet is generated automatically
        self.letters = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'E', 'H', 'I', 'K', 'M', 'O', 'P', 'T', 'X']
        
        self.EPOCHS = 5

In [3]:
ocrTextDetector = eu_ua_1995()
model = ocrTextDetector.prepare(PATH_TO_DATASET, use_aug=0)

GET ALPHABET
Max plate length in "val": 7
Max plate length in "train": 7
Max plate length in "test": 7
Letters train  {'8', 'E', '0', 'C', 'T', '9', 'B', 'I', 'X', 'O', '7', 'M', '5', '6', '3', 'A', 'H', 'P', '1', 'K', '4', '2'}
Letters val  {'8', '0', 'E', 'C', 'T', '9', 'B', 'I', 'X', 'O', '7', 'M', '5', '6', '3', 'A', 'H', 'P', '1', 'K', '4', '2'}
Letters test  {'8', '0', 'E', 'C', 'T', '9', 'B', 'I', 'X', 'O', '7', 'M', '5', '6', '3', 'A', 'H', '1', 'P', 'K', '4', '2'}
Max plate length in train, test and val do match
Letters in train, val and test do match
Letters: 0 1 2 3 4 5 6 7 8 9 A B C E H I K M O P T X

EXPLAIN DATA TRANSFORMATIONS
Text generator output (data which will be fed into the neural network):
1) the_input (image)
2) the_labels (plate number): 20829BO is encoded as [2, 0, 8, 2, 9, 11, 18]
3) input_length (width of image that is fed to the loss function): 30 == 128 / 4 - 2
4) label_length (length of plate number): 7
START BUILD DATA
DATA PREPARED
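
The log above can be reproduced by hand. The sketch below is an illustration under stated assumptions, not the library's implementation: the helper names `build_alphabet`, `encode_plate` and `loss_input_length` are hypothetical, and the `divisor` and `offset` defaults simply mirror the `128 / 4 - 2` expression printed in the log.

```python
def build_alphabet(*label_sets):
    """Check that every split uses the same characters and return them sorted."""
    first = set(label_sets[0])
    for other in label_sets[1:]:
        if set(other) != first:
            raise ValueError("Letters in the splits do not match")
    return sorted(first)


def encode_plate(text, letters):
    """Map each character of a plate number to its index in the alphabet."""
    return [letters.index(ch) for ch in text]


def loss_input_length(image_width, divisor=4, offset=2):
    """Sequence length passed to the loss, following '128 / 4 - 2' in the log."""
    return image_width // divisor - offset


# Characters as printed in the "Letters:" line of the log.
chars = "0123456789ABCEHIKMOPTX"
letters = build_alphabet(chars, chars, chars)

print(encode_plate("20829BO", letters))  # [2, 0, 8, 2, 9, 11, 18]
print(loss_input_length(128))            # 30
```

Sorting the character set puts digits before letters, which is why `B` maps to 11 and `O` to 18 in the encoded label.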


In [4]:
model = ocrTextDetector.train(is_random=0)


START TRAINING
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
the_input_eu_ua_1995 (InputLaye [(None, 128, 64, 1)] 0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 128, 64, 16)  160         the_input_eu_ua_1995[0][0]       
__________________________________________________________________________________________________
max1 (MaxPooling2D)             (None, 64, 32, 16)   0           conv1[0][0]                      
__________________________________________________________________________________________________
conv2 (Conv2D)                  (None, 64, 32, 16)   2320        max1[0][0]                       
______________________________________________________________________________

KeyboardInterrupt: 

In [None]:
ocrTextDetector.test(verbose=True)

In [None]:
ocrTextDetector.save(RESULT_MODEL_PATH, verbose=True)

In [None]:
# Train with augmentation, one epoch per pass, saving a model after each pass
for i in range(0,12):
    ocrTextDetector = eu_ua_1995()
    ocrTextDetector.EPOCHS = 1

    model = ocrTextDetector.prepare(PATH_TO_DATASET, use_aug=True)

    model = ocrTextDetector.train(load_last_weights=True, is_random=i)
    
    ocrTextDetector.test(verbose=True)

    ocrTextDetector.save(os.path.join(NOMEROFF_NET_DIR, "models/", 'anpr_ocr_{}_{}_{}.h5'.format(DATASET_NAME, VERSION, i)), verbose=True)
    #!cp ../train/buff_weights_kg.h5 ../train/buff_weights.h5
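
Each pass of the loop saves its own model file, so twelve files accumulate in `models/`. The sketch below only reproduces the naming scheme used in the loop so the resulting file names are easy to see; `checkpoint_path` is a hypothetical helper, not a Nomeroff Net function.

```python
import os

# Same values as in the first cell of this notebook.
NOMEROFF_NET_DIR = os.path.abspath("../")
DATASET_NAME = "ua-1995"
VERSION = "2021_01_12_tensorflow_v2"


def checkpoint_path(iteration):
    """Build the per-iteration model path the same way the training loop does."""
    filename = "anpr_ocr_{}_{}_{}.h5".format(DATASET_NAME, VERSION, iteration)
    return os.path.join(NOMEROFF_NET_DIR, "models/", filename)


for i in range(12):
    print(os.path.basename(checkpoint_path(i)))
```

The first name printed is `anpr_ocr_ua-1995_2021_01_12_tensorflow_v2_0.h5` and the last ends in `_11.h5`.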


In [None]:
ocrTextDetector.test(verbose=True)

In [None]:
ocrTextDetector.save(RESULT_MODEL_PATH, verbose=True)