
sct_fmri_moco and sct_dmri_moco are slow #947

Closed
jcohenadad opened this issue Jul 27, 2016 · 8 comments · Fixed by #2642
Labels: enhancement, good first issue, priority:HIGH, sct_dmri_moco, sct_fmri_moco

Comments

jcohenadad (Member) commented Jul 27, 2016

When running on some stations (e.g., joplin), motion correction is very slow. Is it related to the numerous system calls? To the slow imports when using a network account? To slow I/O? We should investigate the bottleneck and fix it.

E.g., batch_processing on joplin during sct_fmri_moco (note the 62.39s/iter):

Z=0/0:  22%|#######4                          | 11/50 [11:07<40:33, 62.39s/iter]

EDIT 2020-03-19 16:16:13:

  • No problem on abbey: ~3s/iter
  • No problem on bireli: ~5s/iter
  • No problem on ferguson: ~2.5 iter/s (VERY FAST!)
  • No problem on parker: 1.44s/iter
@jcohenadad added the tests, priority:LOW, and travis labels on Jul 27, 2016
@peristeri removed the travis label on May 9, 2017
@jcohenadad changed the title from "testing_fmri_moco is slow" to "sct_fmri_moco and sct_dmri_moco are slow" on May 25, 2019
@jcohenadad added the enhancement, sct_dmri_moco, and sct_fmri_moco labels and removed the priority:LOW and tests labels on May 25, 2019
@jcohenadad added the good first issue label on Jul 24, 2019
jcohenadad (Member, Author) commented:

Priority changed to HIGH because of a recent user report.

jcohenadad (Member, Author) commented:

Further investigation on joplin:
The bottleneck is isct_antsSliceRegularizedRegistration. Interestingly, when looking at how the CPU load is distributed, instead of one core taking all the load, the jobs are spread across all cores (128 in the case of joplin):

[Screenshot, 2020-03-22 11:00 AM: CPU load spread across all cores]

The user reporting this issue also has many cores (n=64), so I am wondering if the problem could be related to having "a lot of cores" that are not being used properly.

A first thing to investigate is the forcing of ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1 before running ANTs functions, which was implemented to address #201.
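
For reference, a minimal sketch of what the forcing looks like when calling the binary by hand (filenames are placeholders; the arguments mirror the command used further down in this thread):

# Force ITK to a single thread (the workaround introduced for #201),
# then run the slice-wise registration and compare against a run without the variable.
export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1
isct_antsSliceRegularizedRegistration -t 'Translation[0.5]' \
  -m 'MeanSquares[dest.nii,src.nii,1,4,Regular,0.2]' \
  -p 5 -i 50 -f 1 -s 0 -v 1 -o '[step1,src_reg.nii]'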

jcohenadad (Member, Author) commented:

Tried changing ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS to 4 and 64: it did not change anything in terms of total or per-core CPU load (as shown with htop), nor in terms of speed (still around 70 s/iter).
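
A rough sketch of how such a sweep could be scripted (not necessarily how it was run here; filenames are placeholders):

for n in 1 4 64; do
  export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=$n
  echo "== ITK threads: $n =="
  # Time one slice-wise registration per thread setting and compare wall-clock times.
  time isct_antsSliceRegularizedRegistration -t 'Translation[0.5]' \
    -m 'MeanSquares[dest.nii,src.nii,1,4,Regular,0.2]' \
    -p 5 -i 50 -f 1 -s 0 -v 1 -o "[step_$n,src_reg_$n.nii]"
done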

jcohenadad (Member, Author) commented:

Compared isct_antsRegistration between joplin and my mac, with different transforms:

sct_register_multimodal -i mt1.nii.gz -d mt0.nii.gz -param step=1,type=im,algo=translation,iter=50:step=2,type=im,algo=syn,iter=10:step=3,type=im,algo=bsplinesyn,iter=10 -v 2

Both systems gave similar computational time across transforms.

jcohenadad (Member, Author) commented:

However, when trying specifically with isct_antsSliceRegularizedRegistration, the difference is striking:

sct_register_multimodal -i mt1.nii.gz -d mt0.nii.gz -param step=1,type=im,algo=slicereg,iter=50 -v 2

On mac: 5s
On Linux (joplin): 70s

So, there is clearly something wrong with the Linux binary. I'll explore older versions that were shipped for Linux...

jcohenadad (Member, Author) commented Mar 22, 2020

Re-built the ANTs binaries on Ubuntu 16.04 and OSX. Good news: processing time is much faster with the recent build on Ubuntu 16.04:

ANTs Version: 2.3.1.dev145-g7ed2b

time antsSliceRegularizedRegistration -t 'Translation[0.5]' -m 'MeanSquares[dest_RPI_crop.nii,src_reg_crop.nii,1,4,Regular,0.2]' -p 5 -i 50 -f 1 -s 0 -v 1 -o '[step1,src_reg_crop_regStep1-new.nii]'
Loop0 polyerr: 0.0995983 image-metric 0.225547
Loop1 polyerr: 0.00827533 image-metric 0.225138
real	0m4.414s
user	0m24.775s
sys	0m12.730s

ANTs Version: 2.1.0.post359-g3df42

time antsSliceRegularizedRegistration -t 'Translation[0.5]' -m 'MeanSquares[dest_RPI_crop.nii,src_reg_crop.nii,1,4,Regular,0.2]' -p 5 -i 50 -f 1 -s 0 -v 1 -o '[step1,src_reg_crop_regStep1.nii]'
Loop0 polyerr: 0.139908 image-metric 0.226171
 polyx -0.496811 1.50435 -0.990867 -0.177755 0.221031 iceptx 0.0654782
 polyy -1.83812 12.1132 -25.9729 23.1219 -7.39511 icepty -0.0206768
Loop1 polyerr: 0 image-metric 0.226171
 polyx -0.496811 1.50435 -0.990867 -0.177755 0.221031 iceptx 0.0654782
 polyy -1.83812 12.1132 -25.9729 23.1219 -7.39511 icepty -0.0206768

real	1m5.092s
user	0m43.118s
sys	2m8.049s

However, the difference in results between OSX and Linux is much larger with the more recent version: version 2.1.0.post359-g3df42 shows differences on the order of 10^-14, while version 2.3.1.dev145-g7ed2b shows differences on the order of 10^-2.

Data + Results here:
antsSliceRegularizedRegistration.zip

jcohenadad (Member, Author) commented Mar 22, 2020

Now testing integrity with batch_processing.sh, starting from the latest changes in the moco functions (#2634), using commit 4b05905.

--> looks fine
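
For the record, a rough sketch of how this integrity check can be reproduced from a local clone of the repository (paths and environment setup assumed):

# Check out the commit that includes the moco changes from #2634, then run the
# standard batch processing script and inspect the resulting QC/metrics.
git checkout 4b05905
./batch_processing.sh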

jcohenadad (Member, Author) commented:

To follow up on the test-retest variation: this is now fixed by specifying "None" as the sampling strategy, as instructed by the ANTs folks: ANTsX/ANTs#976
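
Concretely, the change amounts to swapping the sampling strategy in the metric specification, e.g. (a sketch based on the command used earlier in this thread; filenames are placeholders and the exact arguments used by SCT may differ):

# 'None' (dense) sampling avoids the randomized sample selection used by the
# Regular/Random strategies, which is what caused the run-to-run and
# cross-platform differences; the trailing sampling fraction is not relevant with 'None'.
isct_antsSliceRegularizedRegistration -t 'Translation[0.5]' \
  -m 'MeanSquares[dest.nii,src.nii,1,4,None,1]' \
  -p 5 -i 50 -f 1 -s 0 -v 1 -o '[step1,src_reg.nii]'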
