refactored _log_probability function to avoid gil in multithreading #1008

Closed

Conversation


@ghost ghost commented Nov 28, 2022

I noticed that when training a HiddenMarkovModel whose emissions are ExponentialDistribution objects wrapped in an IndependentComponentsDistribution, training was orders of magnitude slower than when using ExponentialDistribution directly. I was able to reproduce the same behaviour with NormalDistribution, so I narrowed it down to IndependentComponentsDistribution.

The slowdown is noticeable only when using multithreading in fit (n_jobs > 1).
I was able to narrow the cause down to the _log_probability function of IndependentComponentsDistribution.
The following version of the function leads to the slow behaviour:

cdef void _log_probability(self, double* X, double* log_probability, int n) nogil:
    cdef int i, j
    cdef double logp

    memset(log_probability, 0, n*sizeof(double))

    for i in range(n):
        for j in range(self.d):
            if self.cython == 1:
                (<Model> self.distributions_ptr[j])._log_probability(X+i*self.d+j, &logp, 1)
            else:
                with gil:
                    python_log_probability(self.distributions[j], X+i*self.d+j, &logp, 1)

            log_probability[i] += logp * self.weights_ptr[j]

But the following is fast:

cdef void _log_probability(self, double* X, double* log_probability, int n) nogil:
    cdef int i, j
    cdef double logp

    memset(log_probability, 0, n*sizeof(double))

    for i in range(n):
        for j in range(self.d):
            if True or self.cython == 1:  # notice the True here
                (<Model> self.distributions_ptr[j])._log_probability(X+i*self.d+j, &logp, 1)
            else:
                with gil:
                    python_log_probability(self.distributions[j], X+i*self.d+j, &logp, 1)

            log_probability[i] += logp * self.weights_ptr[j]

This happens even though self.cython is always 1, and so the else block is never executed.
Moreover, this is also slow:

cdef void _log_probability(self, double* X, double* log_probability, int n) nogil:
    cdef int i, j
    cdef double logp

    memset(log_probability, 0, n*sizeof(double))

    for i in range(n):
        for j in range(self.d):
            if self.cython == 1 or True:  # notice the True here, with the order reversed
                (<Model> self.distributions_ptr[j])._log_probability(X+i*self.d+j, &logp, 1)
            else:
                with gil:
                    python_log_probability(self.distributions[j], X+i*self.d+j, &logp, 1)

            log_probability[i] += logp * self.weights_ptr[j]

I think that branch in the inner loop somehow makes it harder for Cython to generate efficient code.
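One way such a refactor could look, purely as a sketch (the actual change in this commit may differ), is to hoist the self.cython check out of the per-sample loop so that the hot nogil path never reaches a with gil block:

cdef void _log_probability(self, double* X, double* log_probability, int n) nogil:
    cdef int i, j
    cdef double logp

    memset(log_probability, 0, n*sizeof(double))

    if self.cython == 1:
        # Pure C path: no GIL interaction anywhere inside the loops.
        for i in range(n):
            for j in range(self.d):
                (<Model> self.distributions_ptr[j])._log_probability(X+i*self.d+j, &logp, 1)
                log_probability[i] += logp * self.weights_ptr[j]
    else:
        # Python-backed distributions: acquire the GIL once per call
        # instead of once per (i, j) iteration.
        with gil:
            for i in range(n):
                for j in range(self.d):
                    python_log_probability(self.distributions[j], X+i*self.d+j, &logp, 1)
                    log_probability[i] += logp * self.weights_ptr[j]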
Here is a complete reproducible example that elicits the slow behaviour:

from pomegranate.distributions import ExponentialDistribution, IndependentComponentsDistribution
from pomegranate.base import State
from pomegranate.io import SequenceGenerator
from pomegranate.hmm import HiddenMarkovModel
import numpy as np

def get_model(A_d, B_d):
    model = HiddenMarkovModel()
   
    A_s = State(A_d, name="A")
    B_s = State(B_d, name="B")

    model.add_states([A_s, B_s])

    model.add_transition(
        a=model.start,
        b=A_s,
        probability=0.5,
    )
    model.add_transition(
        a=model.start,
        b=B_s,
        probability=0.5,
    )
    model.add_transition(
        a=A_s,
        b=A_s,
        probability=0.99,
    )
    model.add_transition(
        a=B_s,
        b=B_s,
        probability=0.99,
    )
    model.add_transition(
        a=A_s,
        b=B_s,
        probability=0.01,
    )
    model.add_transition(
        a=B_s,
        b=A_s,
        probability=0.01,
    )

    model.bake(verbose=True)
    
    return model

model = get_model(
    IndependentComponentsDistribution(
        [ExponentialDistribution(1), ExponentialDistribution(1), ExponentialDistribution(1)]
    ),
    IndependentComponentsDistribution(
        [ExponentialDistribution(100), ExponentialDistribution(100), ExponentialDistribution(100)]
    )
)
X = np.zeros((1000, 3))
X[:500] = 1
X[500:] = 1/100
X_set = [X]*1000
model.fit(X_set, min_iterations=10, max_iterations=10, verbose=True, n_jobs=-1)

model = get_model(
    ExponentialDistribution(1),
    ExponentialDistribution(100)
)
X = np.zeros((1000))
X[:500] = 1
X[500:] = 1/100
X_set = [X]*1000
model.fit(X_set, min_iterations=10, max_iterations=10, verbose=True, n_jobs=-1)

The first fit call is slow with the original pomegranate code:

[1] Improvement: 8114.942703745794	Time (s): 6.926
[2] Improvement: 0.0011873976327478886	Time (s): 8.598
[3] Improvement: 1.3969838619232178e-09	Time (s): 8.284
[4] Improvement: -6.426125764846802e-08	Time (s): 8.033
[5] Improvement: 6.472691893577576e-08	Time (s): 8.972
[6] Improvement: -6.472691893577576e-08	Time (s): 8.011
[7] Improvement: 6.472691893577576e-08	Time (s): 7.855
[8] Improvement: -6.472691893577576e-08	Time (s): 8.366
[9] Improvement: 6.472691893577576e-08	Time (s): 8.269
[10] Improvement: -6.472691893577576e-08	Time (s): 8.243
Total Training Improvement: 8114.943891080562
Total Training Time (s): 88.2151

The second fit call without IndependentComponentsDistribution is fast:

[1] Improvement: 8111.520942713367	Time (s): 0.1579
[2] Improvement: 1.6850003311410546	Time (s): 0.1475
[3] Improvement: 0.00018647708930075169	Time (s): 0.1554
[4] Improvement: 1.3271346688270569e-08	Time (s): 0.1509
[5] Improvement: -1.234002411365509e-08	Time (s): 0.1648
[6] Improvement: 2.3283064365386963e-10	Time (s): 0.1637
[7] Improvement: 0.0	Time (s): 0.1708
[8] Improvement: 0.0	Time (s): 0.1509
[9] Improvement: 0.0	Time (s): 0.1576
[10] Improvement: 0.0	Time (s): 0.1464
Total Training Improvement: 8113.206129522761
Total Training Time (s): 1.7545

With the modifications in this commit, the slow fit call becomes fast:

[1] Improvement: 8114.942703745794	Time (s): 0.1773
[2] Improvement: 0.0011873976327478886	Time (s): 0.1736
[3] Improvement: 1.3969838619232178e-09	Time (s): 0.1783
[4] Improvement: -6.426125764846802e-08	Time (s): 0.1667
[5] Improvement: 6.472691893577576e-08	Time (s): 0.1846
[6] Improvement: -6.472691893577576e-08	Time (s): 0.1884
[7] Improvement: 6.472691893577576e-08	Time (s): 0.1826
[8] Improvement: -6.472691893577576e-08	Time (s): 0.1638
[9] Improvement: 6.472691893577576e-08	Time (s): 0.1868
[10] Improvement: -6.472691893577576e-08	Time (s): 0.1695
Total Training Improvement: 8114.943891080562
Total Training Time (s): 1.9820

@ghost ghost marked this pull request as ready for review November 28, 2022 03:18
@ghost ghost changed the title refactored _log_probability functionto avoid gil in multithreading refactored _log_probability function to avoid gil in multithreading Nov 28, 2022
@jmschrei (Owner)

Thank you for your contribution. However, pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython, and so this PR is unfortunately out of date and will be closed.

@jmschrei jmschrei closed this Apr 16, 2023