
vectorized viterbi decoding #1

Closed: wants to merge 2 commits

Conversation

@kmike (Collaborator) commented Aug 3, 2013

Hi,

Here is a vectorized viterbi function. I didn't figure out how to vectorize it using outer products, so I just rewrote the inner loop using numpy vectorized operations. For my data (20 states, 30k samples) it makes StructuredPerceptron.fit 4x faster.
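The idea can be sketched as follows. This is an illustrative reconstruction, not the exact PR code: the names (`viterbi`, `Phi`, `trans`, `init`, `final`, `dp`, `backp`) follow the diff below, but the backtrace step is assumed. Instead of looping over previous states, broadcasting builds the full (previous state, current state) score matrix in one shot and reduces it with `max`/`argmax`:

```python
import numpy as np

def viterbi(Phi, trans, init, final):
    """Viterbi decoding with a vectorized inner loop (sketch).

    Phi:   (n_samples, n_states) per-position state scores
    trans: (n_states, n_states) transition scores
    init, final: (n_states,) start/end scores
    """
    n_samples, n_states = Phi.shape
    dp = np.empty((n_samples, n_states))
    dp[0, :] = init + Phi[0, :]
    backp = np.empty((n_samples, n_states), dtype=np.int32)

    for t in range(1, n_samples):
        # Scores for every (previous, current) state pair at once:
        # row i, column j holds dp[t-1, i] + trans[i, j].
        cand = dp[t - 1, :, np.newaxis] + trans      # (n_states, n_states)
        backp[t] = cand.argmax(axis=0)               # best predecessor per state
        dp[t] = cand.max(axis=0) + Phi[t]

    dp[-1] += final
    # Backtrace the best path (assumed detail, not shown in the diff).
    path = np.empty(n_samples, dtype=np.int32)
    path[-1] = dp[-1].argmax()
    for t in range(n_samples - 1, 0, -1):
        path[t - 1] = backp[t, path[t]]
    return path
```

This replaces an O(n_states) Python-level inner loop with a single broadcasted add plus two reductions, which is where the reported 4x speedup would come from.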

Commit message: "For (20 states, 30k samples) data it makes StructuredPerceptron.fit 4x faster"
@@ -48,15 +48,13 @@ def viterbi(Phi, trans, init, final):
dp[0, :] = init + Phi[0, :]

backp = np.empty((n_samples, n_states), dtype=np.int32)
_idx = np.mgrid[0:n_states]
larsmans (Owner) commented on this line:

Is this different from arange(n_states) in any way?
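For this use, a quick check suggests it is not: indexing `np.mgrid` with a single 1-D slice yields the same values as `np.arange` (a sketch; `arange` is the more common idiom):

```python
import numpy as np

n_states = 5
# mgrid with one non-complex slice produces a plain 1-D index array,
# identical in values and dtype behavior to arange for this case.
a = np.mgrid[0:n_states]
b = np.arange(n_states)
assert np.array_equal(a, b)
```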


@larsmans (Owner) commented Aug 3, 2013

Thanks! What's the number of samples? In my profiling runs, I found that Viterbi was actually quite cheap compared to the sparse dot products in fit.

@kmike (Collaborator, author) commented Aug 3, 2013

X is the following:

<30640x28350 sparse matrix of type '<type 'numpy.int32'>'
    with 380735 stored elements in Compressed Sparse Row format>

Viterbi is still a bottleneck in this case, even after vectorizing:

231360 function calls (231225 primitive calls) in 6.115 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      135    2.314    0.017    2.781    0.021 _decode.py:24(viterbi)
      270    1.672    0.006    1.675    0.006 _utils.py:10(count_trans)
        1    1.112    1.112    6.113    6.113 perceptron.py:51(fit)
    68460    0.301    0.000    0.301    0.000 {method 'argmax' of 'numpy.ndarray' objects}
      540    0.217    0.000    0.217    0.000 {method 'ravel' of 'numpy.ndarray' objects}
      270    0.121    0.000    0.451    0.002 compressed.py:272(_mul_multivector)
    68461    0.104    0.000    0.104    0.000 {method 'reshape' of 'numpy.ndarray' objects}
    68325    0.059    0.000    0.360    0.000 fromnumeric.py:684(argmax)
      548    0.054    0.000    0.054    0.000 {numpy.core.multiarray.zeros}
      135    0.037    0.000    0.037    0.000 {_csr.csr_matvecs}
      135    0.023    0.000    0.023    0.000 {_csc.csc_matvecs}
      270    0.013    0.000    0.026    0.000 compressed.py:103(check_format)
      270    0.010    0.000    0.490    0.002 extmath.py:70(safe_sparse_dot)
      135    0.008    0.000    0.008    0.000 {_csr.get_csr_submatrix}
      270    0.008    0.000    0.009    0.000 compressed.py:633(prune)
      270    0.005    0.000    0.038    0.000 compressed.py:22(__init__)
      405    0.004    0.000    0.005    0.000 sputils.py:95(isintlike)
  270/135    0.003    0.000    0.042    0.000 csr.py:189(__getitem__)
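A table like the one above comes from the stdlib profiler; a minimal sketch of how to reproduce it (the `fit` here is just a stand-in for `StructuredPerceptron.fit`, not the real call):

```python
import cProfile
import io
import pstats

def fit():
    # Stand-in workload for StructuredPerceptron.fit.
    return sum(i * i for i in range(100000))

prof = cProfile.Profile()
prof.enable()
fit()
prof.disable()

# Sort by internal time, matching the "Ordered by: internal time" output above.
out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("tottime").print_stats(10)
print(out.getvalue())
```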

@kmike (Collaborator, author) commented Aug 3, 2013

ah, sequences are quite long in my case: the 30000 samples above correspond to just 75 sequences.

@larsmans (Owner) commented Aug 3, 2013

Ok, pulled as aa27b20, thanks!

@larsmans larsmans closed this Aug 3, 2013