
vectorized viterbi decoding #1

Closed: wants to merge 2 commits

Conversation

@kmike (Collaborator) commented Aug 3, 2013

Hi,

Here is a vectorized viterbi function. I didn't figure out how to vectorize it using outer products, so I just rewrote the inner loop using numpy vectorized operations. For my data (20 states, 30k samples) it makes StructuredPerceptron.fit 4x faster.
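The idea can be sketched as follows. This is an illustrative reconstruction, not the exact PR code: the names (`viterbi`, `Phi`, `trans`, `init`, `final`, `dp`, `backp`) follow the diff below, but the backtrace step is assumed. Instead of looping over previous states, broadcasting builds the full (previous state, current state) score matrix in one shot and reduces it with `max`/`argmax`:

```python
import numpy as np

def viterbi(Phi, trans, init, final):
    """Viterbi decoding with a vectorized inner loop (sketch).

    Phi:   (n_samples, n_states) per-position state scores
    trans: (n_states, n_states) transition scores
    init, final: (n_states,) start/end scores
    """
    n_samples, n_states = Phi.shape
    dp = np.empty((n_samples, n_states))
    dp[0, :] = init + Phi[0, :]
    backp = np.empty((n_samples, n_states), dtype=np.int32)

    for t in range(1, n_samples):
        # Scores for every (previous, current) state pair at once:
        # row i, column j holds dp[t-1, i] + trans[i, j].
        cand = dp[t - 1, :, np.newaxis] + trans      # (n_states, n_states)
        backp[t] = cand.argmax(axis=0)               # best predecessor per state
        dp[t] = cand.max(axis=0) + Phi[t]

    dp[-1] += final
    # Backtrace the best path (assumed detail, not shown in the diff).
    path = np.empty(n_samples, dtype=np.int32)
    path[-1] = dp[-1].argmax()
    for t in range(n_samples - 1, 0, -1):
        path[t - 1] = backp[t, path[t]]
    return path
```

This replaces an O(n_states) Python-level inner loop with a single broadcasted add plus two reductions, which is where the reported 4x speedup would come from.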

Commit message: "For (20 states, 30k samples) data it makes StructuredPerceptron.fit 4x faster"
@@ -48,15 +48,13 @@ def viterbi(Phi, trans, init, final):
dp[0, :] = init + Phi[0, :]

backp = np.empty((n_samples, n_states), dtype=np.int32)
_idx = np.mgrid[0:n_states]
larsmans (Owner) commented on this line:

Is this different from arange(n_states) in any way?
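For this use, a quick check suggests it is not: indexing `np.mgrid` with a single 1-D slice yields the same values as `np.arange` (a sketch; `arange` is the more common idiom):

```python
import numpy as np

n_states = 5
# mgrid with one non-complex slice produces a plain 1-D index array,
# identical in values and dtype behavior to arange for this case.
a = np.mgrid[0:n_states]
b = np.arange(n_states)
assert np.array_equal(a, b)
```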


@larsmans (Owner) commented Aug 3, 2013

Thanks! What's the number of samples? In my profiling runs, I found that Viterbi was actually quite cheap compared to the sparse dot products in fit.

@kmike (Collaborator, author) commented Aug 3, 2013

X is the following:

<30640x28350 sparse matrix of type '<type 'numpy.int32'>'
    with 380735 stored elements in Compressed Sparse Row format>

Viterbi is still a bottleneck in this case, even after vectorizing:

231360 function calls (231225 primitive calls) in 6.115 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      135    2.314    0.017    2.781    0.021 _decode.py:24(viterbi)
      270    1.672    0.006    1.675    0.006 _utils.py:10(count_trans)
        1    1.112    1.112    6.113    6.113 perceptron.py:51(fit)
    68460    0.301    0.000    0.301    0.000 {method 'argmax' of 'numpy.ndarray' objects}
      540    0.217    0.000    0.217    0.000 {method 'ravel' of 'numpy.ndarray' objects}
      270    0.121    0.000    0.451    0.002 compressed.py:272(_mul_multivector)
    68461    0.104    0.000    0.104    0.000 {method 'reshape' of 'numpy.ndarray' objects}
    68325    0.059    0.000    0.360    0.000 fromnumeric.py:684(argmax)
      548    0.054    0.000    0.054    0.000 {numpy.core.multiarray.zeros}
      135    0.037    0.000    0.037    0.000 {_csr.csr_matvecs}
      135    0.023    0.000    0.023    0.000 {_csc.csc_matvecs}
      270    0.013    0.000    0.026    0.000 compressed.py:103(check_format)
      270    0.010    0.000    0.490    0.002 extmath.py:70(safe_sparse_dot)
      135    0.008    0.000    0.008    0.000 {_csr.get_csr_submatrix}
      270    0.008    0.000    0.009    0.000 compressed.py:633(prune)
      270    0.005    0.000    0.038    0.000 compressed.py:22(__init__)
      405    0.004    0.000    0.005    0.000 sputils.py:95(isintlike)
  270/135    0.003    0.000    0.042    0.000 csr.py:189(__getitem__)
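A table like the one above comes from the stdlib profiler; a minimal sketch of how to reproduce it (the `fit` here is just a stand-in for `StructuredPerceptron.fit`, not the real call):

```python
import cProfile
import io
import pstats

def fit():
    # Stand-in workload for StructuredPerceptron.fit.
    return sum(i * i for i in range(100000))

prof = cProfile.Profile()
prof.enable()
fit()
prof.disable()

# Sort by internal time, matching the "Ordered by: internal time" output above.
out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("tottime").print_stats(10)
print(out.getvalue())
```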

@kmike (Collaborator, author) commented Aug 3, 2013

ah, sequences are quite long in my case: the 30000 samples above correspond to just 75 sequences.

@larsmans (Owner) commented Aug 3, 2013

Ok, pulled as aa27b20, thanks!

@larsmans larsmans closed this Aug 3, 2013