-
Notifications
You must be signed in to change notification settings - Fork 930
Topic/fcoll fixes #773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Topic/fcoll fixes #773
Conversation
|
@edgargabriel could you please review this ? here is a test file #include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <mpi.h>
int buf1[4] = {0, 1, 2, 3};
int buf2[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8 , 9};
int buf[4] = {-1, -1, -1, -1};
int main (int argc, char *argv[]) {
int rank, size;
int fd;
MPI_File file;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
fd = open("f1.data", O_WRONLY|O_TRUNC|O_CREAT, 0600);
write (fd, buf1, 4*sizeof(int));
close(fd);
fd = open("f2.data", O_WRONLY|O_TRUNC|O_CREAT, 0600);
write (fd, buf2, 10*sizeof(int));
close(fd);
MPI_File_open(MPI_COMM_WORLD, "f1.data", MPI_MODE_RDONLY, MPI_INFO_NULL, &file);
MPI_File_set_view(file, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
MPI_File_read_ordered(file,buf,1, MPI_INT,MPI_STATUS_IGNORE);
MPI_File_close(&file);
if (rank != buf[0]) {
fprintf(stderr, "%d/%d: got %d instead of %d\n", rank, size, buf[0], rank);
}
MPI_File_open(MPI_COMM_WORLD, "f2.data", MPI_MODE_RDONLY, MPI_INFO_NULL, &file);
MPI_File_set_view(file, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
MPI_File_read_ordered(file,buf,rank+1, MPI_INT,MPI_STATUS_IGNORE);
MPI_File_close(&file);
switch(rank) {
case 0:
if (0 != buf[0])
fprintf(stderr, "%d/%d: got %d instead of %d\n", rank, size, buf[0], 0);
break;
case 1:
if (1 != buf[0] || 2 != buf[1])
fprintf(stderr, "%d/%d: got %d,%d instead of %d,%d\n", rank, size, buf[0], buf[1], 1, 2);
break;
case 2:
if (3 != buf[0] || 4 != buf[1] || 5 != buf[2])
fprintf(stderr, "%d/%d: got %d,%d,%d instead of %d,%d,%d\n", rank, size, buf[0], buf[1], buf[2], 3, 4, 5);
break;
case 3:
if (6 != buf[0] || 7 != buf[1] || 8 != buf[2] || 9 != buf[3])
fprintf(stderr, "%d/%d: got %d,%d,%d,%d instead of %d,%d,%d,%d\n", rank, size, buf[0], buf[1], buf[2], buf[3], 6, 7, 8, 9);
break;
}
MPI_Finalize();
return 0;
}first commit fixes mpirun -np 1 a.out as a side note, ompio is failing several test from the mpi_test_suite (error or hang) |
|
@ggouaillardet can I ask what the motivation for both fixes are? I originally deliberatly set the individual component to also be selected for the size=2 case (after some performance evaluation done on two separate systems). And for the shared file pointer fix, was there a bug, i.e. the test shown above does not work? I will look into it. I will still look into them to see whether there is something new, we used to pass everything that was reasonable. Regarding the mpi_test_suite bugs, I debugged them during the spring quite a bit, and unfortunately the tests themselves are buggy and cannot be used for correctness evaluation in many cases, e..g the test_suite sets a file view with gaps, and reads them back with the default file view (i.e. without gaps) and expects the result to be correct. That cannot work. |
|
I see the problem. Your first fix is mostly correct, the only comment that I have is that the same change needs to be done to the sm and the other sharedfp components too. The second fix does not truly solve the problem, it circumvents it only. There is a fundamental issue when using the individual fcoll component form shared file pointer operations. Let me think there about a solution before we commit something |
|
ok, now I need the help of a git/gihub expert. I know that its easy to add another commit to the pr , e.g. the ones that are required for changing the other sharedfp components. We can also add a commit for the individual component (I found a simple solution). But the question is how to remove your second commit from this pr? |
|
@edgargabriel and I are meeting via webex in a few hours to go through this stuff. |
|
@ggouaillardet I created a new pr #779 please have a look at it, it contains your commit 3d1780f . If it works, I would suggest to close this one. |
|
@edgargabriel i misunderstood about the second commit, and naively thought individual module was only good for communicators of size 1. so you are right, the second commit just hide the issue, and you fixed it in #779 yes, mpi_test_suite is broken for "non dense" datatypes (e.g. MPI_DOUBLE_INT, MPI_CHAR_INT, ...). with romio (e.g. from now, i will close this PR and merge #779 fwiw, here is how i use git to review PR in this case, it did not work because i already have topic/fcoll_fixes branches in my local and remote (ggouaillardet) repository. |
…atus yalla: fix valgrind error due to uninitialized status field.
No description provided.