-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
romio: make contig_access_count MPI_Count instead of int #6928
Conversation
test:mpich/ch3/tcp |
Can ROMIO now handle a single request of size > INT_MAX made from a process rank? |
It is still in testing, but it should -- at least that's the goal :) |
@wkliao ROMIO has been able to handle large sizes for a few years now. Large counts are proving a bit harder. Requires lots of changes all over ROMIO, but I think I'm close. I'll push my changes to this branch. Do you have a test case we should try out? |
Most of my tests are through PnetCDF. Because of its use of type conversion and nonblocking APIs, PnetCDF calls MPI-IO APIs with MPI_BYTE datatype for buffers. Both MPI-IO and PnetCDF need to be modernized to add large count APIs. |
9c0ae76
to
b24c4ae
Compare
I just pushed a bunch of changes, but for review only. i'm trying to figure out how |
Below is the test program with the assertion message when running 2 processes.
|
Wei-keng I always love to see new testscases: can I add this to |
Sure. |
b24c4ae
to
3b653c5
Compare
FYI. To test reads, I added a call to MPI_File_read_all right after MPI_File_write_all in the test program (and file open mode changed to MPI_MODE_RDWR)
Then, I got this error when running just one MPI process.
|
Forgive me for parachuting in, but I'd like to get this PR merged and backported to 4.2.x for a release by the end of the month. I added a commit to hopefully address the issue currently blocking @wkliao. Could you take a look? |
test:mpich/ch4/most |
Hi, @raffenet I tested your patch on a ufs using the above test program.
I can see your patch fixed file ad_read_str.c and ad_write_str.c, but the error In addition, there are other ADIO drivers that also implement ReadStrided/WriteStrided. |
Please also update file |
Thanks for testing. I also enabled |
oh man last I checked (a few years ago) there were 15,000
-Wshorten-64-to-32 warnings
…On Sun, Mar 17, 2024 at 9:05 PM Ken Raffenetti ***@***.***> wrote:
Thanks for testing. I also enabled -Wshorten-64-to-32 on my laptop and
found a bunch more issues that need fixing. Will keep plugging at it when I
have time.
—
Reply to this email directly, view it on GitHub
<#6928 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AARMIENFIRAYKQQRVL3ZDJTYYZDYBAVCNFSM6AAAAABD5ANR4OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBSG42DONZZGY>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
I can make ROMIO happy about this large count but not MPICH itself:
(guess I need to rebuild with debug info) |
|
0f0d6f2
to
58fae4c
Compare
still in progress but pushed some more changes to the branch |
with a debug, no optimizaiton build:
|
hm. |
58fae4c
to
ebe1ee3
Compare
pushed another round of commits that seems to make the big-count IOR workload happy. Still investigating Wei-keng's test case |
going back to "invalid request"... I've added return-type checking to every isend and irecv in that path and still get other than checking return value of MPI_Isend and MPI_Irecv, how can I confirm the returned request is valid? I'd like to assert/error/dump-core then, not at waitall time. I could change to MPI_Send and MPI_recv but I am pretty sure that will deadlock. |
test:mpich/warnings |
da803b6
to
0b59b09
Compare
test:mpich/ch3/most |
@@ -30,6 +30,8 @@ noinst_PROGRAMS = \ | |||
external32_derived_dtype \ | |||
tst_fileview \ | |||
zero_count \ | |||
large_count \ | |||
large_dtype \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How big (in terms of file space and memory it needs) are these two tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the memory footprint, but the output file size is 18 GB when running on 2 MPI processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized that both tests are not added to the testlist so they won't be run during CI testing. I guess that is fine for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/usr/bin/time says large_count
needs ~26 MiB (26276 KiB) of memory (max RSS) -- that seems low.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. That is how the chunking algorithm is supposed to work, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, it allocates a local buffer of size ~4GB, so ~26MB does seem low
The rd_end failure is a regression from this PR:
|
ERROR("MPI_Type_free"); | ||
|
||
/* allocate a local buffer */ | ||
buf_len = (size_t) NVARS *LEN * LEN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, 1100 * 2048 * 2048 -> ~ 4GB?
User buffer malloc total size per process is 8 GB in large_dtype.c.
and
The former is for bash and latter mac. The value shown in "maximum resident set size" is the max footprint. |
/usr/bin/time -v mpiexec -n 2 large_dtype -f dummy
|
0b59b09
to
c0d63f4
Compare
test:mpich/ch3/most |
It is potentially possible to have more than INT_MAX number of noncontig segments, thus contig_access_count need be typed as MPI_Count in order to prevent overflow.
While the collective buffering routine will chunk up large requests, there is still a preliminary "what is everyone doing" step. Those exchanges might require large counts.
Derived from a pnetcdf-generated workload
These routines stored length values in int and unsigned variable types in some places. Update them to use MPI_Aint, which matches the eventual API call.
Huge patch set touching almost all of romio, but should be much fewer places where we store potentially large values in an int. Passes '-fsanitize=undefined' and also reduces '-Wshorten-64-to-32' warnings.
c0d63f4
to
dec5640
Compare
test:mpich/ch3/most |
the failing tests are mostly OS/X failing to find the right autoconf version. However, I don't know how to resolve https://jenkins-pmrs.cels.anl.gov/job/mpich-review-ch4-ucx/3351/jenkins_configure=debug,label=ubuntu22.04_review/testReport/junit/(root)/io/02482_____io_i_noncontig_coll2_4__/ -- that's a segfault in |
Let's ignore the osx tests for now. Most of the ch4 failures are expected. OFI failures are due to known performance issues in the sockets provider. The I think this PR is good to merge now. |
Pull Request Description
It is potentially possible to have more than INT_MAX number of noncontig segments, thus contig_access_count need be typed as MPI_Count to prevent overflow.
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your companies PR approval manager.