How to do CTC Decode on text-recognition output? #113

Closed
Nosferath opened this issue May 9, 2019 · 1 comment

Nosferath commented May 9, 2019

I am using the text-recognition model to perform OCR. The model has an output that I have been unable to decode.
This is the class I'm using to generate the model output.

import cv2
from openvino.inference_engine import IENetwork, IEPlugin

class TextRecognizer:
    def __init__(self):
        model_xml = 'models/text-recognition-0012.xml'
        model_bin = 'models/text-recognition-0012.bin'
        
        plugin = IEPlugin(device='CPU')
        net = IENetwork(model=model_xml, weights=model_bin)
        plugin.add_cpu_extension("/home/claudio/inference_engine_samples_build/intel64/Release/lib/libcpu_extension.so")
        
        supported_layers = plugin.get_supported_layers(net)
        not_supported_layers = [l for l in net.layers.keys() if l not in supported_layers]
        if not_supported_layers:
            print('Unsupported layers:', not_supported_layers)
        
        print("Preparing input blobs")
        self.input_blob = next(iter(net.inputs))
        self.out_blob = next(iter(net.outputs))
        
        print("Loading model to the plugin")
        self.exec_net = plugin.load(network=net)
        del net
        
    def process(self, img_source):
        img = img_source.copy()
        input_width = 120
        input_height = 32
        img_height = img.shape[0]
        img_width = img.shape[1]
        # rw = img_width/float(input_width)
        # rh = img_height/float(input_height)
        #img = cv2.resize(img, (input_width, input_height))
        plot_gray(img)
        blob = cv2.dnn.blobFromImage(img, 1.0, (input_width, input_height))
        tt.tic()
        
        res = self.exec_net.infer()
        res = res[self.out_blob]
        tt.toc("Infer")
        #print(res.shape)
        #print(res)
        return res

And this is the function I wrote for CTC Decoding.

import numpy as np

symbols = "0123456789abcdefghijklmnopqrstuvwxyz#"

def ctc_decoder(data, alphabet):
    # Greedy CTC decoding; the last alphabet symbol ('#') is the blank.
    result = ""
    prev_pad = False
    for i in range(data.shape[0]):
        symbol = alphabet[np.argmax(data[i])]
        if symbol != alphabet[-1]:
            # Keep the symbol unless it repeats the previous frame's symbol.
            if len(result) == 0 or prev_pad or symbol != result[-1]:
                prev_pad = False
                result = result + symbol
        else:
            prev_pad = True
    return result
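
One way to tell whether the nonsense comes from the decoder or from the inference call is to run the decoder on a hand-built score array whose correct transcription is known. The sketch below is a compact greedy (best-path) CTC decoder plus a helper for building such an array. The names greedy_ctc_decode and one_hot_scores are hypothetical, and the (T, 1, 37) layout is an assumption about the model output rather than something confirmed in this thread.

```python
import numpy as np

# Same alphabet as in the post; '#' (last symbol) is treated as the CTC blank.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz#"


def greedy_ctc_decode(scores, alphabet=ALPHABET):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    scores: array of shape (T, num_classes) or (T, 1, num_classes).
    The (T, 1, num_classes) layout is an assumption about the model output.
    """
    scores = np.asarray(scores)
    # Flatten a batch axis of size 1 so that each row is one time step.
    scores = scores.reshape(scores.shape[0], -1)
    if scores.shape[1] != len(alphabet):
        raise ValueError("class axis does not match the alphabet length")

    blank = len(alphabet) - 1
    best = scores.argmax(axis=1)

    chars = []
    prev = blank
    for idx in best:
        # Emit a symbol only when it is not blank and differs from the
        # previous frame; a blank in between allows a doubled letter.
        if idx != blank and idx != prev:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


def one_hot_scores(indices, num_classes):
    """Build a (T, 1, num_classes) array whose per-frame argmax is `indices`."""
    scores = np.zeros((len(indices), 1, num_classes), dtype=np.float32)
    scores[np.arange(len(indices)), 0, indices] = 1.0
    return scores


if __name__ == "__main__":
    # Frames spelling h h # e l l # l o (blank index is 36).
    frames = [17, 17, 36, 14, 21, 21, 36, 21, 24]
    print(greedy_ctc_decode(one_hot_scores(frames, len(ALPHABET))))
```

If the synthetic frames decode to "hello" but real model output does not decode sensibly, the problem is more likely in how the network is fed than in the decoding step.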

The text detection demo outputs the expected text, but with my decoder I only get nonsense. How do I properly use and decode the model?

Nosferath (author) commented

Solved. I was calling infer() without passing the input blob, so the network never received the image. The corrected line is:

res = self.exec_net.infer({self.input_blob: blob})
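
To make this kind of mistake harder to repeat, the input dictionary can be built by a small helper that also checks the blob shape before inference. This is only a sketch: build_feed is a hypothetical name, and the expected shape (1, 1, 32, 120) is an assumption based on the 120x32 grayscale input used in the class above, so it should be checked against the actual network input.

```python
import numpy as np


def build_feed(input_name, blob, expected_shape=(1, 1, 32, 120)):
    """Return the {input_name: blob} dict to pass to infer().

    expected_shape is an assumption (batch, channels, height, width) taken
    from the 120x32 grayscale input in the post; adjust it to the real model.
    """
    blob = np.asarray(blob, dtype=np.float32)
    if blob.shape != tuple(expected_shape):
        raise ValueError(
            "blob has shape %s, expected %s" % (blob.shape, tuple(expected_shape))
        )
    return {input_name: blob}
```

Inside process(), the call would then read res = self.exec_net.infer(build_feed(self.input_blob, blob)), which cannot silently run without an input.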
