Blocked LU with StarPU-MPI #7

Closed
TommyUW opened this issue Feb 19, 2023 · 6 comments
Labels
bug (Something isn't working), question (Further information is requested)

Comments

@TommyUW

TommyUW commented Feb 19, 2023

To whom it may concern:
I am trying to run the MPI blocked LU example with StarPU. However, the running time keeps increasing as the number of processes increases, and even when the time does decrease, it does not decrease significantly. Why does this happen?
StarPU mpi blocked LU

@sthibaul sthibaul added the question and bug labels on Feb 20, 2023
@sthibaul
Collaborator

It depends a lot on the details of the platform, your execution script, etc.

Your output also shows a very low GFlop/s result, so something really odd seems to be happening on your machines.

@TommyUW
Author

TommyUW commented Feb 21, 2023

It depends a lot on the details of the platform, your execution script, etc.

Your output also shows a very low GFlop/s result, so something really odd seems to be happening on your machines.

Thank you for your reply. I ran this code on my laptop, my groupmate's laptop, and the cluster machine in the lab. However, on all of them the running time increased as the number of processes increased.
Here are all the commands that I entered before running this example:
Install MPI
Setting up StarPU:
$ apt-cache search starpu
$ sudo apt-get install libstarpu-1.3 libstarpu-dev
$ ./autogen.sh
$ ./configure
$ mkdir build
$ cd build
$ ./configure
$ make
$ make install
$ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$STARPU_PATH/lib/pkgconfig
Then I went into the mpi directory in starpu-master.
I ran the mpi_lu code with the command: mpirun -n 4 ./plu_example_double 8 -size 4096 -nblocks 8 -p 4 -q 1.
If the value of q is greater than 1, the running time is longer.
Besides, I changed the number of OpenMP threads with the command export OMP_NUM_THREADS = <num_of_threads>, from 1 to 16. The running time still did not decrease.
Would you please review the procedure above? Are there any steps that I am missing? Thank you very much.

@sthibaul
Collaborator

I ran this code on my laptop, my groupmate's laptop, and the cluster machine in the lab

But are you sure that they do get used? Does top show that they indeed get to use ample CPU time there, and no other program is running?

-size 4096

This size is quite small; it is better to use larger matrices.
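To put the matrix size in perspective: a dense LU factorization of an n x n matrix costs roughly 2n^3/3 floating-point operations. The sketch below turns a matrix size and a measured time in seconds into GFlop/s, so a result can be compared with what the hardware should deliver. The helper name lu_gflops and the 1.0-second run time are placeholders for illustration, not values from this thread.

```shell
# lu_gflops N SECONDS
# Approximate GFlop/s for an N x N LU factorization, using the
# standard 2*N^3/3 operation count. SECONDS is the measured wall time.
lu_gflops() {
    awk -v n="$1" -v t="$2" \
        'BEGIN { printf "%.2f\n", (2 * n * n * n / 3) / t / 1e9 }'
}

# Placeholder run time of 1.0 s: the result is then simply the total
# work of the factorization expressed in GFlop.
lu_gflops 4096 1.0
lu_gflops 40960 1.0
```

For n = 4096 the whole factorization is only about 46 GFlop, so start-up and communication costs can easily dominate the measurement; n = 40960 is 1000 times more work.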

I changed the number of OpenMP threads with the command export OMP_NUM_THREADS = <num_of_threads>, from 1 to 16. The running time still did not decrease.

The lu example does not support parallel tasks, so the number of OpenMP threads should be kept at 1.
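A minimal sketch of pinning the OpenMP thread count to 1 before launching the example. Note that a POSIX shell assignment takes no spaces around the equals sign; written with spaces, as in the quoted command, bash rejects it and the variable is not set to the intended value.

```shell
# Keep one OpenMP thread per task, as advised for the lu example.
# No spaces around '=' in a shell assignment.
export OMP_NUM_THREADS=1

# Confirm what child processes will inherit.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Assumption: with Open MPI, the variable can be forwarded explicitly to
# all ranks with the -x option, for example:
#   mpirun -x OMP_NUM_THREADS -n 4 ./plu_example_double ...
```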

You can also check with FxT traces whether the load balance is correct.

@TommyUW
Author

TommyUW commented Feb 22, 2023

Thank you for your reply.
We are sure that no other program is running on our machines.
We tried a 40960x40960 matrix this time. However, the performance is still not good:
single process: 20230.978518
two processes: 837848.915379
four processes: 541028.325787
We have tested matrices of various sizes, but regardless of the size, the run with two processes is always slower than the single-process run.
The load balance is correct. The whole program we used is included in the StarPU examples and was written by the StarPU developers.
We basically followed the StarPU manual and installed everything it required. We were able to compile and run StarPU programs successfully. However, the performance of all the LU programs from the StarPU-MPI examples was not good.
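The timings reported above can be summarized as speedups, that is, the single-process time divided by the multi-process time. The sketch below does that arithmetic; the helper name speedup is made up for illustration, and since the thread does not state the time unit, only the unit-free ratio is computed.

```shell
# speedup T1 TP
# Ratio of single-process time to P-process time.
# Values above 1 mean the parallel run is faster; below 1, slower.
speedup() {
    awk -v t1="$1" -v tp="$2" 'BEGIN { printf "%.3f\n", t1 / tp }'
}

speedup 20230.978518 837848.915379   # two processes
speedup 20230.978518 541028.325787   # four processes
```

Both ratios are far below 1 (roughly 41 times and 27 times slower than the single-process run), which is a much larger gap than a small problem size alone would usually explain.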

@nfurmento
Member

nfurmento commented Feb 22, 2023 via email

@sthibaul
Collaborator

We are sure that no other program is running on our machines

Yes, but are you sure that you are really using the different CPU cores?

For a start, you can try running e.g. /bin/hostname to make sure that the different MPI ranks actually go to different machines.
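A minimal sketch of that placement check: launch /bin/hostname through the MPI launcher and count how many ranks land on each host. The helper name count_ranks_per_host and the host names node01 and node02 are hypothetical; the sample input only stands in for real mpirun output.

```shell
# count_ranks_per_host
# Reads one hostname per line on stdin (one line per MPI rank)
# and prints how many ranks ran on each host.
count_ranks_per_host() {
    sort | uniq -c | awk '{ printf "%s: %d rank(s)\n", $2, $1 }'
}

# Real usage (requires an MPI launcher on the machine):
#   mpirun -n 4 /bin/hostname | count_ranks_per_host

# Demonstration with hypothetical output from four ranks on two hosts.
printf 'node01\nnode02\nnode01\nnode02\n' | count_ranks_per_host
```

If every rank reports the same host name, all processes are sharing one machine and competing for the same cores.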

@nfurmento nfurmento closed this as not planned on Apr 13, 2023