heads up - MKL error in parallel mode #444

Closed
satra opened this Issue Oct 10, 2014 · 10 comments

@satra
Member

satra commented Oct 10, 2014

latest anaconda on centos 6.5

Intel MKL ERROR: Parameter 4 was incorrect on entry to DGELSD.

Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.

i believe this happened in either:

response, ratio = auto_response(gtab, data, roi_radius=10, fa_thr=0.7)

or peaks_from_model for csd.

i had to move on, so i switched off parallel. at present this is just a heads up, but if i get a chance over the weekend i'll provide a better diagnosis.

@satra

Member

satra commented Oct 10, 2014

error happening somewhere in here.

^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
/om/user/satra/projects/mgh_35subjects/scripts/run_dipy_decon.py in <module>()
     77                          min_separation_angle=25,
     78                          parallel=num_threads > 1,
---> 79                          nbr_processes=num_threads)
     80 
     81 from dipy.reconst.dti import TensorModel

/om/user/satra/envs/testenv/lib/python2.7/site-packages/dipy/reconst/peaks.pyc in peaks_from_model(model, data, sphere, relative_peak_threshold, min_separation_angle, mask, return_odf, return_sh, gfa_thr, normalize_peaks, sh_order, sh_basis_type, npeaks, B, invB, parallel, nbr_processes)
    432             continue
    433 
--> 434         odf = model.fit(data[idx]).odf(sphere)
    435 
    436         if return_sh:

/om/user/satra/envs/testenv/lib/python2.7/site-packages/dipy/reconst/multi_voxel.pyc in new_fit(self, data, mask)
     16         # If only one voxel just return a normal fit
     17         if data.ndim == 1:
---> 18             return single_voxel_fit(self, data)
     19 
     20         # Make a mask if mask is None

/om/user/satra/envs/testenv/lib/python2.7/site-packages/dipy/reconst/csdeconv.pyc in fit(self, data)
    132         X = self.R.diagonal() * self.B_dwi
    133         shm_coeff, num_it = csdeconv(dwi_data, self.sh_order, X, self.B_reg,
--> 134                                      self.lambda_, self.tau)
    135         return SphHarmFit(self, shm_coeff, None)
    136 

/om/user/satra/envs/testenv/lib/python2.7/site-packages/dipy/reconst/csdeconv.pyc in csdeconv(dwsignal, sh_order, X, B_reg, lambda_, tau)
    430         # measurement, which can help to better estimate the fodf_sh, even if
    431         # you have more SH coeffcients to estimate than actual S measurements.
--> 432         M = np.concatenate((X, lambda_ * B_reg[k, :]))
    433         S = np.concatenate((dwsignal, np.zeros(k.shape)))
    434         try:
@samuelstjean

Contributor

samuelstjean commented Oct 10, 2014

I also have that problem from time to time. I don't think it's related to multiprocessing, but rather to the csd behaving badly on problematic, non-converging voxels. Try without parallel to see if it works or not (or did it work?).

@satra

Member

satra commented Oct 10, 2014

didn't work without parallel either. so not a parallel issue

@Garyfallidis

Member

Garyfallidis commented Oct 10, 2014

Can you share with us the datasets which had the problem?

@samuelstjean

Contributor

samuelstjean commented Oct 10, 2014

Yeah, if it's what I think it is, it's a low-level call error in blas/lapack because the data is misbehaved. No idea if it's fixable though.

Are you running it in a brain mask? The background can do weird stuff sometimes.

@Garyfallidis

Member

Garyfallidis commented Oct 10, 2014

Yeah this is a blas error. @satra do you get this problem only when you use MKL?

@samuelstjean

Contributor

samuelstjean commented Oct 10, 2014

I also had it on my computer, and I'm running ordinary blas, so it's not an mkl specific issue.

@arokem

Member

arokem commented Oct 10, 2014

Seems like a case that should be tested and have some reasonable error handling, similar to: #439
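
A minimal sketch of what such error handling could look like around the least-squares step. This is not dipy's actual code: the helper name safe_lstsq is hypothetical, and it assumes (not confirmed in this thread) that non-finite values in the inputs are one way to end up with the LAPACK parameter errors above. It only uses np.isfinite and np.linalg.lstsq.

```python
import numpy as np


def safe_lstsq(M, S, rcond=None):
    """Least-squares solve with input checks (hypothetical helper, not dipy API).

    Raises ValueError with a readable message instead of letting a
    low-level LAPACK error surface.
    """
    M = np.asarray(M, dtype=float)
    S = np.asarray(S, dtype=float)

    # Assumption: NaN/inf in the design matrix or the signal is a likely
    # trigger for "Parameter N was incorrect on entry to DGELSD".
    if not (np.all(np.isfinite(M)) and np.all(np.isfinite(S))):
        raise ValueError("non-finite values in least-squares input")

    try:
        coeffs = np.linalg.lstsq(M, S, rcond=rcond)[0]
    except np.linalg.LinAlgError as err:
        # Re-raise with context so the caller can skip or flag the voxel.
        raise ValueError("least-squares fit failed: %s" % err)
    return coeffs
```

A caller fitting voxel by voxel could catch the ValueError and mark the voxel as failed rather than aborting the whole run.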

@Garyfallidis

Member

Garyfallidis commented Oct 10, 2014

Yeah, sure. Please let's do that asap (before the release).

@satra

Member

satra commented Oct 11, 2014

it all works now. here are the updates:

  1. data - i can't redistribute this according to their terms - but this was tested on one of the mgh diffusion datasets (now available through the washu hcp website). note that the fsl-corrected gradient vectors don't work with dipy - the non-fsl vectors work fine - this might be an issue on the dataset side.
  2. i initially selected only volumes that had non-zero b-values. adding just one b=0 volume helped solve the primary issue. if this is a requirement, perhaps this can be checked in the function.
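
a minimal sketch of the kind of check suggested in point 2, assuming the b-values are available as a plain array. the function names and the b0_threshold parameter are hypothetical, not part of dipy; the sketch only illustrates failing early with a clear message when no b=0 volume is present.

```python
import numpy as np


def has_b0_volume(bvals, b0_threshold=0):
    """Return True if at least one b-value is at or below b0_threshold."""
    bvals = np.asarray(bvals, dtype=float)
    return bool(np.any(bvals <= b0_threshold))


def require_b0_volume(bvals, b0_threshold=0):
    """Raise ValueError if the acquisition has no b=0 volume.

    Hypothetical guard; a fitting function could call it on entry.
    """
    if not has_b0_volume(bvals, b0_threshold):
        raise ValueError(
            "no b=0 volume found (all b-values > %s); "
            "include at least one b=0 volume" % b0_threshold
        )
```

calling require_b0_volume on the b-values before fitting would turn the obscure lapack message into an explicit error about the missing b=0 volume.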

closing this for now.

@satra satra closed this Oct 11, 2014
