Replace manual mmap with llvm::MemoryBuffer #1032

Merged
merged 1 commit into from
Mar 24, 2020

Conversation

DavidTruby
Collaborator

Fixes #840

@DavidTruby
Collaborator Author

@sscalpone could you check if this has the defective fd behaviour you mentioned on the mailing list? I don't believe it should as llvm::MemoryBuffer doesn't keep the fd around after it has mmapped a file.

@DavidTruby DavidTruby force-pushed the memorybuffer branch 2 times, most recently from 370c29b to 4430289 Compare February 27, 2020 16:10
@klausler
Collaborator

@sscalpone could you check if this has the defective fd behaviour you mentioned on the mailing list? I don't believe it should as llvm::MemoryBuffer doesn't keep the fd around after it has mmapped a file.

If the open file descriptor is closed, then the file is not mapped into the virtual address space.

@DavidTruby
Collaborator Author

If the open file descriptor is closed, then the file is not mapped into the virtual address space.

Per the POSIX standard, this isn't the case. See https://pubs.opengroup.org/onlinepubs/7908799/xsh/mmap.html:
" The mmap() function adds an extra reference to the file associated with the file descriptor fildes which is not removed by a subsequent close() on that file descriptor. This reference is removed when there are no more mappings to the file. "
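The quoted POSIX behaviour can be checked with a small sketch. This is not code from the PR: the helper name `ReadViaMappingAfterClose` and the temporary-file path are assumptions made for illustration, and it uses raw POSIX calls rather than llvm::MemoryBuffer.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: writes `text` to a temporary file, maps the file,
// closes the descriptor, and only then reads the bytes back through the
// mapping. Returns an empty string on any failure.
std::string ReadViaMappingAfterClose(const std::string &text) {
  assert(!text.empty() && "mmap() with a length of zero fails");
  char path[] = "/tmp/mmap_demo_XXXXXX";  // assumed location
  int fd = mkstemp(path);
  if (fd < 0) {
    return {};
  }
  unlink(path);  // the file's storage lives on while references remain
  if (write(fd, text.data(), text.size()) !=
      static_cast<ssize_t>(text.size())) {
    close(fd);
    return {};
  }
  void *p = mmap(nullptr, text.size(), PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping holds its own reference to the file
  if (p == MAP_FAILED) {
    return {};
  }
  std::string result(static_cast<const char *>(p), text.size());
  munmap(p, text.size());
  return result;
}
```

If the mapping went away when the descriptor was closed, the read through `p` would fault instead of returning the original text.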

@@ -33,7 +33,7 @@ add_library(FortranParser
)

target_link_libraries(FortranParser
- FortranCommon
+ FortranCommon LLVMSupport
Collaborator

General style seems to be one entry per library/file.

Collaborator Author

Done!

@klausler
Collaborator

If the open file descriptor is closed, then the file is not mapped into the virtual address space.

Per the POSIX standard, this isn't the case. See https://pubs.opengroup.org/onlinepubs/7908799/xsh/mmap.html:
" The mmap() function adds an extra reference to the file associated with the file descriptor fildes which is not removed by a subsequent close() on that file descriptor. This reference is removed when there are no more mappings to the file. "

So the limit on the number of simultaneous open file descriptors won't be a problem? That's my only concern here. Have you tested this with a large number of INCLUDE files?

@DavidTruby
Collaborator Author

So the limit on the number of simultaneous open file descriptors won't be a problem? That's my only concern here. Have you tested this with a large number of INCLUDE files?

It shouldn't be a problem, as we shouldn't have that many fds open at any one time. I don't have a test case that would have enough files to reach the fd limit though.

Clang and Swift both use this MemoryBuffer class for their source file management, so I assume this has already been considered. We should check it though, I'll see if I can randomly generate a case.

@DavidTruby
Collaborator Author

DavidTruby commented Feb 27, 2020

We should check it though, I'll see if I can randomly generate a case.

I've tried it with a file that INCLUDEs 10,000 other files, which is well over my fd limit per process on my system. I don't get any errors and get correct output from -fget-symbols-sources.
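A stress case of this kind could be generated along the following lines. This is only a sketch: the function names, file names, and the contents of each include file are assumptions, not what was actually used for the test described above.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical generator: builds the text of a main program that
// INCLUDEs `count` separate files named inc0.f90, inc1.f90, ...
std::string MakeIncludeDriver(int count) {
  assert(count >= 0);
  std::string text = "program include_stress\n";
  for (int i = 0; i < count; ++i) {
    text += "  include 'inc" + std::to_string(i) + ".f90'\n";
  }
  text += "end program include_stress\n";
  return text;
}

// Writes the driver plus one tiny include file per INCLUDE line into
// `dir`, which must already exist. Returns false if a write fails.
bool WriteIncludeStressCase(const std::string &dir, int count) {
  for (int i = 0; i < count; ++i) {
    std::ofstream inc(dir + "/inc" + std::to_string(i) + ".f90");
    inc << "  integer :: v" << i << "\n";  // one declaration per file
    inc.flush();
    if (!inc) {
      return false;
    }
  }
  std::ofstream driver(dir + "/stress.f90");
  driver << MakeIncludeDriver(count);
  driver.flush();
  return driver.good();
}
```

Calling `WriteIncludeStressCase(dir, 10000)` and compiling the resulting `stress.f90` would exercise more include files than a typical per-process fd limit allows to be open at once.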

@@ -29,12 +29,13 @@ const SourceFile *Parsing::Prescan(const std::string &path, Options options) {
}
}

- std::stringstream fileError;
+ std::string fileError_buf;
+ llvm::raw_string_ostream fileError{fileError_buf};
Collaborator

How is llvm::raw_string_ostream better than std::stringstream?

Collaborator Author

Use of llvm::raw_ostream over std::ostream is mandated for new code going into LLVM, and this has been brought up on the mailing list with respect to f18, so if our goal is to submit to LLVM we need to change over.

Collaborator

There is no good reason not to use std::stringstream and it's not clear that's what that link says. Right above there it says:

Note that using the other stream headers (<sstream> for example) is not problematic in this regard

Collaborator Author

It does say that sstream is not problematic in the regard that it doesn't introduce global objects with non-static constructors. However, it goes on to say:

New code should always use raw_ostream for writing

Which I think is fairly clear, and we certainly are new code from LLVM's perspective.

Member

I believe the context of the quote "New code should always use...." pertains to writing files. In this case, stringstream seems better and appropriate.

Collaborator Author

I believe this was specifically clarified by Hal on a community call a few weeks ago, that the policy is that the use of any of the stream libraries including std::stringstream is highly discouraged for new code.

Could you explain why stringstream is better and more appropriate here? std::stringstream and llvm::raw_string_ostream have the same interface.

Collaborator

Could you explain why stringstream is better and more appropriate here?

It's part of the language.

std::stringstream and llvm::raw_string_ostream have the same interface.

This change shows a case where that's not the case.

Collaborator Author

Could you explain why stringstream is better and more appropriate here?

And llvm::raw_string_ostream is a part of the library of the project we are supposed to be a part of, and is preferred by that project for a number of technical reasons that are outlined in the documentation. I fail to see how this makes std::stringstream better.

This change shows a case where that's not the case.

Are you referring to the fact that a separate buffer needs to be stored here? That is a minor change in the construction of the class but the observable interface as used here is identical.

Contributor

We discussed this on the flang technical call on March 9 and agreed:

  • The LLVM Coding Guidelines are unclear on whether or not sstream should be allowed
  • The long-time LLVM community folks felt that the intention is for sstream not to be used at all, with LLVM APIs preferred instead.
  • We would make the change that David suggests in F18
  • Johannes would start a thread to clarify the wording in the LLVM Coding Standards to make clear the intent.

The use of std::stringstream in Polly was removed: llvm/llvm-project@0e93f3b.

Comment on lines 577 to 578
std::string error_buf;
llvm::raw_string_ostream error{error_buf};
Collaborator

Suggested change
- std::string error_buf;
- llvm::raw_string_ostream error{error_buf};
+ std::string errorBuf;
+ llvm::raw_string_ostream error{errorBuf};

Comment on lines 61 to 63
bool is_dir = false;
auto er = llvm::sys::fs::is_directory(path, is_dir);
if (!er && !is_dir) {
Collaborator

Suggested change
- bool is_dir = false;
- auto er = llvm::sys::fs::is_directory(path, is_dir);
- if (!er && !is_dir) {
+ bool isDir{false};
+ auto er{llvm::sys::fs::is_directory(path, isDir)};
+ if (!er && !isDir) {

@DavidTruby DavidTruby changed the title from "Replaced manual mmap with llvm::MemoryBuffer" to "Replace manual mmap with llvm::MemoryBuffer" Mar 2, 2020
}
return wrote;
std::size_t RemoveCarriageReturns(llvm::MutableArrayRef<char> buf) {
auto end = llvm::remove_if(buf, [](const char c) { return c == '\r'; });
Member

Do you know if the lambda is inlined? If not, we should probably check the performance vs the original loop & memmove.

Collaborator Author

The original loop actually has O(n^2) complexity if I am not reading it wrong, whereas the complexity of remove_if is O(n). Regarding the specific question though, the lambda does get inlined by both compilers I've tested it on.

Collaborator

Why do you think that the loop is O(n**2)? It makes one pass over the source, one iteration per \r, and moves each of the other bytes exactly once. And the new version doesn't use memchr, which has been highly optimized for SIMD. I'd like to see actual measured performance comparisons before signing off on this.

Collaborator Author

Memmove is an O(n) function and is called inside an O(n) loop. In the worst case where every character is a carriage return n^2 operations will have been performed.

Collaborator

O(#of lines) * O(average line length) == O(#of lines) * O(#total bytes / #lines) == O(#total bytes).
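For readers following the complexity argument, the two approaches can be sketched side by side. The first function is a reconstruction of a memchr/memmove loop in the style described in this thread and is an assumption, not the actual f18 source; the second uses std::remove as a standard-library stand-in for llvm::remove_if. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Sketch of a memchr/memmove loop. Each run of bytes between two
// carriage returns is scanned once by memchr and moved once by memmove,
// and the cursor only advances, so this sketch does work linear in `bytes`.
std::size_t RemoveCRsWithMemmove(char *buffer, std::size_t bytes) {
  assert(buffer != nullptr || bytes == 0);
  std::size_t wrote = 0;
  char *p = buffer;
  while (bytes > 0) {
    char *cr = static_cast<char *>(std::memchr(p, '\r', bytes));
    std::size_t chunk = cr ? static_cast<std::size_t>(cr - p) : bytes;
    std::memmove(buffer + wrote, p, chunk);  // shift this chunk down
    wrote += chunk;
    if (!cr) {
      break;  // no more carriage returns
    }
    p += chunk + 1;  // skip the chunk and the '\r' itself
    bytes -= chunk + 1;
  }
  return wrote;
}

// Single-pass alternative built on the standard algorithm.
std::size_t RemoveCRsWithRemove(char *buffer, std::size_t bytes) {
  assert(buffer != nullptr || bytes == 0);
  char *end = std::remove(buffer, buffer + bytes, '\r');
  return static_cast<std::size_t>(end - buffer);
}
```

Both functions compact the buffer in place and return the new length. Which is faster in practice depends on the standard library implementation, which is the point the measurements in this thread disagree on.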

Contributor

How can we move forward here?

Clearly we don't want to degrade performance unnecessarily, but that is not the only concern. The coding style of the proposed new implementation is certainly closer to what the LLVM community would expect and this particular code has been requested to be re-written - http://lists.llvm.org/pipermail/llvm-dev/2020-February/139464.html - before we upstream.

Given the data shown, I don't think it is clear that David's proposed new implementation is systematically worse than the current implementation. Results seem to vary, probably depending on how optimised the C and C++ standard libraries are for the system you are on.

If we agree on that, I think we should make the proposed change on the grounds of coding style alignment to LLVM to unblock upstreaming.

Collaborator

Dealing with DOS line endings is something that we have to do more often on Windows than elsewhere, so the x86 timings seem to indicate to me that we'll want to retain the fast approach in NVIDIA's product sources. You can do whatever you think you have to in your LLVM fork.

Collaborator Author

I think in order to make any statements about windows we would have to test there rather than Linux; their C and C++ standard library implementations are completely different to the ones on Linux and therefore very likely to have different performance characteristics. In addition my macOS results still show remove_if as being faster than the hand rolled function here, and that's also on x86. Again I suspect that this is because the C standard library on macOS is different to on Linux.

Collaborator Author

This function does also have to run on every platform, regardless of whether the input actually has carriage returns or not. So the performance matters everywhere not just on windows on x86.

Contributor

@sscalpone
So that we can progress the rest of the patch, David has removed this change from the review. Can we merge as-is?

@DavidTruby
Collaborator Author

@sscalpone @tskeith is this ready to merge now?

@DavidTruby DavidTruby force-pushed the memorybuffer branch 2 times, most recently from 8c6701a to 8d882cc Compare March 19, 2020 17:31
@DavidTruby
Collaborator Author

@sscalpone rebased on top of the LLVM streams patch

@sscalpone
Member

sscalpone commented Mar 22, 2020

@DavidTruby This PR causes a fatal check when compiling Nyx/Exec/Scaling in my sandbox. I don't know if the problem is with the PR or if the PR is exposing a different issue.

fatal internal error: CHECK(range_.Contains(at)) failed at /home/sjs/work/pgi/f18/pr1032/f18/lib/Parser/provenance.cpp(389)

I can't share the Nyx source tree that I'm using; however, I think it is based on this:
https://github.com/AMReX-Astro/Nyx/tree/master/Exec/Scaling.

The failing file is zero length.

% pwd
.../pr1032/build
% touch zero.f90
% ls -al zero.f90
-rw-rw-r-- 1 xxx yy 0 Mar 22 08:46 zero.f90
% tools/f18/bin/f18 zero.f90

fatal internal error: CHECK(range_.Contains(at)) failed at /home/sjs/work/pgi/f18/pr1032/f18/lib/Parser/provenance.cpp(389)
Aborted

@DavidTruby
Collaborator Author

@sscalpone I seem to be able to reproduce this with empty files. I'll fix it and add a lit test for empty files.

@DavidTruby
Collaborator Author

@sscalpone Should be fixed now. I've left it as a separate commit so you can review the fix, please let me know if it's ok to squash.

Member

@sscalpone sscalpone left a comment

@DavidTruby Please squash. Thanks!

The previous code had handling for cases when too many file descriptors may be
opened; this is not necessary with MemoryBuffer as the file descriptors are
closed after the mapping occurs. MemoryBuffer also internally handles the case
where a file is small and therefore an mmap is bad for performance; such files
are simply copied to memory after being opened.

Many places elsewhere in the code assume that the buffer is not empty, and the
old file opening code handles this by replacing an empty file with a buffer
containing a single newline. That behavior is now kept in the new MemoryBuffer
based code.
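
The empty-file handling described in this commit message can be illustrated with a minimal sketch. It uses std::ifstream as a stand-in for llvm::MemoryBuffer so that it is self-contained, and the helper names `NonEmptyContents` and `ReadSourceFile` are assumptions, not names from the f18 sources.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: later stages assume a non-empty buffer, so an
// empty file is replaced by a buffer holding a single newline.
std::string NonEmptyContents(std::string contents) {
  if (contents.empty()) {
    contents = "\n";
  }
  assert(!contents.empty());
  return contents;
}

// Reads the whole file at `path` into `out`, applying the rule above.
// Returns false if the file cannot be opened.
bool ReadSourceFile(const std::string &path, std::string &out) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return false;
  }
  std::string raw((std::istreambuf_iterator<char>(in)),
                  std::istreambuf_iterator<char>());
  out = NonEmptyContents(std::move(raw));
  return true;
}
```

With this rule in place, a zero-length input such as the `zero.f90` reproducer above yields a one-byte buffer rather than an empty range.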
@DavidTruby
Collaborator Author

DavidTruby commented Mar 24, 2020

@sscalpone I've squashed this to one commit and written a more descriptive commit message

@sscalpone sscalpone merged commit 35f7def into flang-compiler:master Mar 24, 2020
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Apr 9, 2020

Original-commit: flang-compiler/f18@d34df84
Reviewed-on: flang-compiler/f18#1032
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Apr 9, 2020
…morybuffer

Replace manual mmap with llvm::MemoryBuffer

Original-commit: flang-compiler/f18@35f7def
Reviewed-on: flang-compiler/f18#1032
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this pull request Oct 7, 2022

Original-commit: flang-compiler/f18@d34df84
Reviewed-on: flang-compiler/f18#1032
Development

Successfully merging this pull request may close these issues.

Replace Fortran::parser::SourceFile::ReadFile with llvm::sys::fs::readNativeFile
7 participants