
Parallel resampling of observation distributions #9

Closed
inti wants to merge 12 commits

Conversation


inti commented Apr 17, 2013

Hi Matt,
I have created a resample_model_parallel2 function which resamples the observation distributions in parallel. This came at the cost of pushing the model into the engines an additional time, but in my test that added only very minor overhead. I have added the code to parallel.py; that was easy, given the code you already had in there!

In my initial test with Nmax = 100 the execution time came down to half :)
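For concreteness, a minimal sketch of the idea behind such a function (hypothetical names like _resample_one and datas_per_state, not the code in this pull request), assuming IPython.parallel engines are already running and the data assigned to each state is available on the client:

from IPython.parallel import Client

def _resample_one(distn_and_data):
    # runs on an engine: resample one observation distribution given its assigned data
    distn, data = distn_and_data
    distn.resample(data)
    return distn

def resample_obs_distns_parallel(obs_distns, datas_per_state):
    # obs_distns: objects with a .resample(data) method (e.g. pyhsmm distributions)
    # datas_per_state: one array of assigned observations per state
    dv = Client()[:]  # DirectView over all running engines
    return dv.map_sync(_resample_one, zip(obs_distns, datas_per_state))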

I did not see a way to parallelise the sampling of the trans_dist, but if you think it is possible, let me know and I can work on it. Resampling of the initial state is super quick anyway, so there is no point in speeding that up.

Thanks a lot!

Inti


inti commented Apr 17, 2013

A bit more on timings.
On a larger piece of data with 11 data sequences, the resample_model_parallel2 function is about 10-15% faster, but both parallel functions are about 5 times slower than the non-parallel version ...

I am looking more into this.


mattjj commented Apr 17, 2013

I like your pull request! Should I merge it now, or wait for these general timing issues to be tracked down?

That line shouldn't be reloading the data; it should just be building up new states objects given the state sequences returned and the data that is already loaded locally. It should just be copying pointers.

Is that line definitely slow? If it is then I'm wrong about something… I suppose instead of building a new object we could re-use the old one and just set its stateseq pointer; that might avoid some object-creation overhead, though it would be good to understand if the object creation is really the problem.
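A rough sketch of that alternative (hypothetical code, not what is in this pull request), assuming each states object keeps the data_id that the engines report back:

def _build_states_parallel(self, states_to_resample):
    from pyhsmm import parallel
    raw_stateseq_tuples = parallel.build_states.map(
        [s.data_id for s in states_to_resample])
    states_by_id = dict((s.data_id, s) for s in states_to_resample)
    for data_id, stateseq in raw_stateseq_tuples:
        # re-use the existing states object and only swap in the new state sequence;
        # no new object is built and no data is copied
        states_by_id[data_id].stateseq = stateseq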

On Apr 17, 2013, at 8:36 AM, Inti Pedroso notifications@github.com wrote:

I think I may have found the issue.

def _build_states_parallel(self,states_to_resample):
    from pyhsmm import parallel
    raw_stateseq_tuples = parallel.build_states.map([s.data_id for s in states_to_resample])
    for data_id, stateseq in raw_stateseq_tuples:
        self.add_data(data=parallel.alldata[data_id],stateseq=stateseq)
        self.states_list[-1].data_id = data_id

Is it necessary to reload the data each time the state sequence is updated? I would have thought it is enough with

self.states_list[data_id].stateseq = stateseq

I think this is making it slower.
Inti




inti commented Apr 17, 2013

Hi,

self.add_data(data=parallel.alldata[data_id],stateseq=stateseq)

It is not really the problem, I think.
Pushing the data into 8 local engines takes 0.02 seconds, so that is not the issue either.

I still wonder why the parallel code, either the original or the one above, is slower. If I resample the obs_distns in parallel and the states locally, then it takes about 50% of the time of the non-parallel version and ~1/3 of the time of the parallel ones.
I have already increased the amount of data and it does not change the picture much.

I also tried with

def rss(s):
    s.resample()
    return s
parallel.dv.map_sync(rss, [ s for s in negative_binomial_model.states_list ])

but I am getting the following error

[0:apply]: 
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<string> in <module>()
<ipython-input-2-37caae605d17> in rss(s)
/home/inti/APP/pyhsmm/internals/states.pyc in resample(self)
    103 
    104     def resample(self):
--> 105         betal = self.messages_backwards()
    106         self.sample_forwards(betal)
    107 
/home/inti/APP/pyhsmm/internals/states.pyc in messages_backwards(self)
     81 
     82     def messages_backwards(self):
---> 83         return self._messages_backwards(self.model.trans_distn.A,self.aBl)
     84 
     85     @staticmethod
/home/inti/APP/pyhsmm/internals/states.pyc in _messages_backwards(trans_matrix, log_likelihoods)
    169         betal = np.zeros((T,M))
    170 
--> 171         scipy.weave.inline(hmm_messages_backwards_codestr,['AT','betal','aBl','T','M'],
    172                 headers=['<Eigen/Core>'],include_dirs=[eigen_path],
    173                 extra_compile_args=['-O3','-DNDEBUG'])
NameError: global name 'hmm_messages_backwards_codestr' is not defined

Do you have any idea why the parallel implementations are taking longer or where the bottleneck could be?
I think it is perhaps better not to merge yet, just in case there is another way of making this better.
All the best. Inti


inti commented Apr 17, 2013

The above error seems to be due to namespace issues with IPython parallel.
It is not solved by adding

@lbv.parallel(block=True)
@interactive
def rss(s):
    global hmm_messages_backwards_codestr, eigen_path
    s.resample()
    return s

to parallel.py.
Inti


inti commented Apr 17, 2013

defining the function like

@lbv.parallel(block=True)
@interactive
def rss(s):
    global global_model
    global_model.states_list[0].resample()
    #s.resample()
    return global_model.states_list[0]

(using index 0 just for testing)

also spits out the same error

[3:apply]: 
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<string> in <module>()
/home/inti/My_Soft/pyhsmm/parallel.py in rss(s)
     56 def rss(s):
     57     global global_model
---> 58     global_model.states_list[0].resample()
     59     #s.resample()
     60     return global_model.states_list[0]
/home/inti/My_Soft/pyhsmm/internals/states.pyc in resample(self)
    103 
    104     def resample(self):
--> 105         betal = self.messages_backwards()
    106         self.sample_forwards(betal)
    107 
/home/inti/My_Soft/pyhsmm/internals/states.pyc in messages_backwards(self)
     81 
     82     def messages_backwards(self):
---> 83         return self._messages_backwards(self.model.trans_distn.A,self.aBl)
     84 
     85     @staticmethod
/home/inti/My_Soft/pyhsmm/internals/states.pyc in _messages_backwards(trans_matrix, log_likelihoods)
    169         betal = np.zeros((T,M))
    170 
--> 171         scipy.weave.inline(hmm_messages_backwards_codestr,['AT','betal','aBl','T','M'],
    172                 headers=['<Eigen/Core>'],include_dirs=[eigen_path],
    173                 extra_compile_args=['-O3','-DNDEBUG'])
NameError: global name 'hmm_messages_backwards_codestr' is not defined

I am not sure whether this strategy would be faster or not; is your expectation that it will not be?

I am trying to approach the problem in different ways ... I just cannot see why splitting the states resampling across engines is slower than doing it sequentially on a single engine.

Inti


mattjj commented Apr 17, 2013

I think this might be a bug I introduced in the rewrite on Saturday. Basically, the string is no longer copied into the states instances, and therefore not pushed along when the global model is distributed.

I think I know a quick fix. Hold on and I'll put it into mattjj/master...



mattjj commented Apr 17, 2013

Hmm, actually there's another issue to worry about: the eigen_path variable for the engine processes needs to be set correctly (I don't remember how I did it before or whether it was even done right; maybe my tests were just getting lucky). I have some stuff to do this morning so I can't fix it right away; maybe I can fix it this afternoon or tonight. Sorry!

For the time being you might be able to work on a version from before Saturday afternoon.
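A rough workaround sketch for the eigen_path part (hypothetical, not the actual pyhsmm fix; the path is just an illustrative value), assuming IPython.parallel engines are already running:

from IPython.parallel import Client

dv = Client()[:]
# define the missing module-level name on every engine before any
# weave.inline call runs there
dv.execute("import pyhsmm.internals.states as states; "
           "states.eigen_path = '/usr/local/include/eigen3'")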



inti commented Apr 17, 2013

Thanks for looking into this. I have reverted to the Saturday morning code. Cheers, Inti


mattjj commented Apr 18, 2013

I think 0344fa0, which is the new master branch head, might fix the issue, but I haven't had a chance to test it. Let me know if you can give it a shot.

I got rid of the use_eigen stuff, so you should remove those calls from your script files. To use the Eigen-backed classes, just instantiate a models.HMMEigen object (the same way you used to instantiate a models.HMM object). The examples/hmm.py file should make it clearer. That stuff also isn't very well tested at the moment, so let me know if anything blows up.
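A minimal usage sketch of the switch (the keyword arguments are placeholders for whatever you were already passing to models.HMM; see examples/hmm.py for the real example):

import pyhsmm

# before: model = pyhsmm.models.HMM(obs_distns=obs_distns, ...)
model = pyhsmm.models.HMMEigen(obs_distns=obs_distns, alpha=6.0, init_state_concentration=1.0)
model.resample_model()  # everything else should work as with models.HMM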


inti commented Apr 19, 2013

Cool! I'll try it and let you know.
Thanks a lot,
Inti


inti commented Apr 19, 2013

Hi,
I got this error when I tried to run something with the latest version:

Traceback (most recent call last):
  File "cnv_calling.py", line 11, in <module>
    import pyhsmm
  File "/home/inti/My_Soft/pyhsmm/__init__.py", line 2, in <module>
    import models
  File "/home/inti/My_Soft/pyhsmm/models.py", line 8, in <module>
    import basic.distributions
  File "/home/inti/My_Soft/pyhsmm/basic/distributions.py", line 59, in <module>
    NegativeBinomialFixedRDuration = _start_at_one(NegativeBinomialFixedR)
NameError: name 'NegativeBinomialFixedR' is not defined

Let me know what you think.

BW, Inti
PS: and have a nice weekend!


mattjj commented Apr 19, 2013

Try running git submodule update --recursive


inti commented Apr 19, 2013

That did the trick. I'll carry on testing it ... thanks!


inti commented Apr 19, 2013

I'll finish testing this on Monday and let you know about more timing experiments.
BW and have a nice weekend.
Inti
Inti


inti commented Apr 30, 2013

Hi,
I have now tested this, though not timed it thoroughly, and it seems to work fine. It would be great if you could merge now.
In my quick timing it was a bit faster than using a single CPU, but it seems to get better the more sequences one has, so I will test this more to try to work out when it makes sense to use this approach.
Thanks a lot!

Inti


inti commented Apr 30, 2013

Hang on, I found a minor problem.


inti commented Apr 30, 2013

Hi,
I had a minor bug but it is OK now.
Let me know what you think.

BW, Inti

for i, idx in enumerate(idx_to_resample):
    self.states_list[idx] = copy.deepcopy(resampled_stateseqs[i])
parallel.c.purge_results('all')
parallel.dv.results.clear()
inti commented on the diff:
I am having serious trouble with memory build-up, and this line was left behind from some attempts to deal with it. You may want not to merge it for the moment.

    numtoresample = len(self.states_list)
elif numtoresample == 'engines':
    numtoresample = len(parallel.dv)
if numtoresample > len(self.states_list):
inti commented on the diff:
If the number to resample is more than the number of sequences, you get an error on line 155.
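A short sketch of one possible guard for this (hypothetical placement, using the names from the diff above):

# never ask to resample more sequences than actually exist
numtoresample = min(numtoresample, len(self.states_list))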


mattjj commented Jul 29, 2013

I just re-did the parallelism stuff so that this kind of thing should be easier and more robust. Are you still working on this?


inti commented Jul 29, 2013

Hi,
yes, I am still working on this, but it has somehow gone down the priority list until a few weeks' time.


mattjj commented Feb 18, 2014

The code has changed a lot since these issues were active, so I'm closing them.

mattjj closed this Feb 18, 2014