
Any way to get "Confidence" metric? #19

Closed · mattfeury opened this issue Oct 9, 2017 · 11 comments

@mattfeury
Contributor

Hello,

I'm interested in getting a "confidence" value for a given prediction. Does anyone have ideas on the best way to tackle this? I assume there is some output in the graph (potentially per character) that I could tap into to calculate it. I hope to play with this later this week, but wanted to see if anyone had ideas first.

@mattfeury
Contributor Author

mattfeury commented Oct 16, 2017

So I dug in a little bit here and wanted to report back. I'm still seeing some funkiness, but here's where I'm at:

Looking at this code:

for l in xrange(len(self.attention_decoder_model.output)):
    guess = tf.argmax(self.attention_decoder_model.output[l], axis=1)
    num_feed.append(guess)

This seems to be where we get the predictions back for each AttentionDecoder step (the list looks to be equal in size to MAX_PREDICTION, which makes sense, and each entry in that list looks to be of size TARGET_VOCAB_SIZE, which also makes sense). To me these seem like the prediction values for each character at each decoder step, so I extended this code to add a node holding the entire list so I could fetch it at predict time:

num_feed = []
allProbabilities = []

for l in xrange(len(self.attention_decoder_model.output)):
    # Raw (pre-softmax) decoder output for step l: shape [batch, TARGET_VOCAB_SIZE].
    outputs = self.attention_decoder_model.output[l]
    guess = tf.argmax(outputs, axis=1)
    num_feed.append(guess)
    allProbabilities.append(outputs)

# Stack every decoder step into one named tensor so it can be fetched at predict time.
all_probs_output = tf.convert_to_tensor(allProbabilities, name="allProbabilities")

Then, at predict time, I'm able to get the output of this tensor, softmax each list to turn it into probabilities, take the max probability for each AttentionDecoder step, and either take the mean or the product of all the values, e.g.:

allProbs = graph.get_tensor_by_name('prefix/allProbabilities:0')
# i.e. np.mean(softmax(allProbs).max(axis=2)), where softmax is applied
# over the vocab axis (axis=2) of the fetched values.

with tf.Session(graph=graph) as sess:
    (y_out, probs_output) = sess.run([y, allProbs], feed_dict={
        x: [img]
    })

    return {
        "predictions": [{
            "ocr": str(y_out),
            "confidence": str(np.mean(softmax(probs_output).max(axis=2)))
        }]
    }

However, it seems like everything I'm getting is around 50-60%, which is wildly unhelpful. Any ideas on where I'm going wrong here? I tried np.prod, which seemed more accurate to me, but then I was getting mostly values below 1%.
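
For intuition on why the product collapses, here's a minimal NumPy sketch with made-up per-character probabilities (not numbers from an actual run): the raw product shrinks with sequence length, while the arithmetic mean or a length-normalized geometric mean stays on a per-character scale.

import numpy as np

# Hypothetical per-character max probabilities for a 12-step decoder,
# where the first ~6 steps are confident and the trailing steps are not.
char_probs = np.array([0.95, 0.92, 0.97, 0.90, 0.93, 0.96,
                       0.55, 0.50, 0.52, 0.48, 0.51, 0.49])

print(np.mean(char_probs))                             # ~0.72  arithmetic mean
print(np.prod(char_probs))                             # ~0.012 product, shrinks with length
print(np.prod(char_probs) ** (1.0 / len(char_probs)))  # ~0.69  geometric mean (length-normalized)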

@reidjohnson

reidjohnson commented Oct 17, 2017

The logic seems right (when using the product). I have a different implementation that I believe/hope is working that might serve as a useful (though perhaps inefficient) reference.

I basically populate a probability tensor in parallel with the prediction tensor:

num_feed = []
prb_feed = []

for l in range(len(self.attention_decoder_model.output)):
    guess = tf.argmax(self.attention_decoder_model.output[l], axis=1)
    proba = tf.reduce_max(
        tf.nn.softmax(self.attention_decoder_model.output[l]), axis=1)
    num_feed.append(guess)
    prb_feed.append(proba)

# Join the predictions into a single output string.
trans_output = tf.transpose(num_feed)
trans_output = tf.map_fn(
    lambda m: tf.foldr(
        lambda a, x: tf.cond(
            tf.equal(x, DataGen.EOS_ID),
            lambda: '',
            lambda: table.lookup(x) + a
        ),
        m,
        initializer=''
    ),
    trans_output,
    dtype=tf.string
)

# Calculate the total probability of the output string.
trans_outprb = tf.transpose(prb_feed)
trans_outprb = tf.gather(trans_outprb, tf.range(tf.size(trans_output)))
trans_outprb = tf.map_fn(
    lambda m: tf.foldr(
        lambda a, x: tf.multiply(tf.cast(x, tf.float64), a),
        m,
        initializer=tf.cast(1, tf.float64)
    ),
    trans_outprb,
    dtype=tf.float64
)

self.prediction = tf.cond(
    tf.equal(tf.shape(trans_output)[0], 1),
    lambda: trans_output[0],
    lambda: trans_output,
)
self.probability = tf.cond(
    tf.equal(tf.shape(trans_outprb)[0], 1),
    lambda: trans_outprb[0],
    lambda: trans_outprb,
)

self.prediction = tf.identity(self.prediction, name='prediction')
self.probability = tf.identity(self.probability, name='probability')

I then add it to the output feed at each step:

if not forward_only:
    output_feed += [
        self.summaries_by_bucket[0],
        self.updates[0],
        self.prediction,
    ]
else:
    output_feed += [
        self.prediction,
        self.probability,
    ]
    if self.visualize:
        output_feed += self.attention_decoder_model.attention_weights_history

outputs = self.sess.run(output_feed, input_feed)

res = {
    'loss': outputs[0],
}

if not forward_only:
    res['summaries'] = outputs[1]
    res['prediction'] = outputs[3]
else:
    res['prediction'] = outputs[1]
    res['probability'] = outputs[2]
    if self.visualize:
        res['attentions'] = outputs[3:]
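
For completeness, here's a minimal sketch of fetching those two named tensors from an exported graph at predict time. It assumes a frozen graph loaded under the same 'prefix/' naming, and the x / img inputs, from the snippet earlier in the thread; it is not part of the code above.

import tensorflow as tf

# Assumes `graph` was loaded from a frozen .pb with name='prefix',
# and that `x` / `img` are the input placeholder and image as before.
pred = graph.get_tensor_by_name('prefix/prediction:0')
prob = graph.get_tensor_by_name('prefix/probability:0')

with tf.Session(graph=graph) as sess:
    text, confidence = sess.run([pred, prob], feed_dict={x: [img]})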

Apologies for the long code snippets.

@mattfeury
Contributor Author

Thanks for the feedback! I just implemented yours and it's working fine, but some of the numbers still seem funny to me. For instance, on values that match 100%, I see probabilities ranging from 18% to 99% (e.g. 45%, 27%, 61%, 34%, 99%), and on values that are fairly wrong, I see anything from 17% to 99.8% (e.g. 54%, 99%, 95%, 32%, 70%).

It's possible it's just my dataset, but I'm surprised to see such disparity. Have you been able to run this with your dataset and feel confident in the results?

Theoretically everything makes sense, so I don't think it's an issue with the implementation. I'm just surprised to see such a wide range of results. I guess I don't have confidence in the confidence score.

@mattfeury
Contributor Author

Just piggybacking off my comment: I have my max prediction length set to 12, which gives me 12 probability lists. However, the overwhelming majority of my training set is ~6 characters, which is potentially why I'm seeing such crazy probability values. Perhaps it's those last 7-12 tensors that are just "not confident" dragging my prediction down? If I drop max_prediction_length to 8, I get more bearable values.
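
One quick way to test that hypothesis outside the graph is to stop accumulating the product at the first EOS. This is a rough sketch with assumed inputs, not code from the repo: it takes the per-step max probabilities and predicted character IDs as NumPy arrays, plus the dataset's EOS id.

import numpy as np

def sequence_confidence(step_probs, step_ids, eos_id):
    """Product of per-step max probabilities, truncated at the first EOS.

    step_probs: per-step max probability, shape [max_prediction_length]
    step_ids:   predicted character ids,  shape [max_prediction_length]
    """
    confidence = 1.0
    for prob, char_id in zip(step_probs, step_ids):
        if char_id == eos_id:
            break  # ignore the "not confident" steps after the string ends
        confidence *= prob
    return confidence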

@reidjohnson

Interesting. I can confirm that the probabilities make sense on my dataset and task (which has a max prediction length of 20). It might also be that the model simply hasn't run for long enough, so the probability range is still quite wide for any given label, but would narrow with additional training.

@ckirmse
Contributor

ckirmse commented Oct 18, 2017

It would be great to get the confidence per character exposed in the outputs via a PR, if you guys don't mind sharing the great work! It's cool that it seems we have a few of us actively using and improving this repo... let's help each other out :).

@emedvedev
Owner

It's cool that it seems we have a few of us actively using and improving this repo... let's help each other out :).

Thank you guys for that, your help on this project is really appreciated. ❤️ I don't have too much time for it, since it's just a side project for me, but I'll help out where I can — and of course happily review contributions and publish the new versions to pip.

@mattfeury
Contributor Author

I can submit a PR for overall confidence today. I'd love to get confidence per character at some point, because honestly I'd love to return a list of possible solutions instead of just the most probable one, e.g. here are 20 potential OCRs with their confidence scores. But that would be a later thing unless someone wants to jump on it.
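
If anyone wants to experiment with that, here's a rough sketch of what exposing per-character alternatives could look like. It's an assumption about how it might be wired into the decoder loop, not something that exists in the repo, and the tensor names are made up.

top_k = 3  # hypothetical number of alternatives per character

topk_probs = []
topk_ids = []

for l in range(len(self.attention_decoder_model.output)):
    # Per-step distribution over the vocabulary: shape [batch, TARGET_VOCAB_SIZE].
    step_probs = tf.nn.softmax(self.attention_decoder_model.output[l])
    # The k most likely characters and their probabilities for this step.
    values, indices = tf.nn.top_k(step_probs, k=top_k)
    topk_probs.append(values)
    topk_ids.append(indices)

# Name the stacked tensors so they can be fetched at predict time,
# like 'allProbabilities' / 'probability' above.
topk_probs = tf.identity(tf.stack(topk_probs), name='topKProbabilities')
topk_ids = tf.identity(tf.stack(topk_ids), name='topKIndices')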

@mattfeury
Contributor Author

PR submitted!

@ckirmse
Contributor

ckirmse commented Oct 18, 2017

Totally agreed that the list of possible solutions would be ideal; I'm running into a lot of '0' vs 'O' and '1' vs 'l' choices that don't always come out correctly. I'm looking forward to trying out the overall confidence too, though.

@emedvedev
Owner

Merged the overall probability, so I'll close this issue, and the multiple guesses are going to be tracked in #28.
