Skip to content

Conversation

@hahnjo
Copy link
Member

@hahnjo hahnjo commented Aug 11, 2022

The overload taking a path opens the file and then mmap its contents. This can cause bus errors when another process truncates the file while we are trying to read it. Instead just read the first 1024 bytes, which should be enough for identify_magic to do its work.

@hahnjo hahnjo requested a review from vgvassilev August 11, 2022 15:04
@hahnjo hahnjo self-assigned this Aug 11, 2022
@hahnjo hahnjo requested a review from Axel-Naumann as a code owner August 11, 2022 15:04
@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@hahnjo
Copy link
Member Author

hahnjo commented Aug 11, 2022

FWIW I managed to make llvm::identify_magic crash with a bus error like this:

#include <llvm/ADT/Twine.h>
#include <llvm/BinaryFormat/Magic.h>
#include <llvm/Support/MemoryBuffer.h>

#include <iostream>

#include <unistd.h>

int main(int argc, char *argv[]) {
  if (argc < 2) {
    std::cerr << "Usage: " << argv[0] << " <file>" << std::endl;
    return 1;
  }

  llvm::Twine Path(argv[1]);

  auto FileOrError =
#ifdef FIXED
    llvm::MemoryBuffer::getFile(Path, /*IsText=*/false,
                                /*RequiresNullTerminator=*/true,
                                /*IsVolatile=*/true);
#else
    llvm::MemoryBuffer::getFile(Path, /*IsText=*/false,
                                /*RequiresNullTerminator=*/false);
#endif
  if (!FileOrError) {
    std::cerr << "Problem reading '" << argv[1] << "'!" << std::endl;
    return 1;
  }

  std::unique_ptr<llvm::MemoryBuffer> FileBuffer = std::move(*FileOrError);
  sleep(5);
  llvm::file_magic Magic = llvm::identify_magic(FileBuffer->getBuffer());

  std::cout << "Magic = " << Magic << std::endl;

  return 0;
}

Compile this into an executable, and then make a copy to use as input. While the program is sleeping, call truncate -s 0 on the input and it should crash. (I hope this is also what happens in Jenkins with the random crashes we've been seeing...) The version with FIXED calls getFile with IsVolatile=true (and RequiresNullTerminator=true) to avoid using mmap and the crash. For Cling, I decided to just go with a std::ifstream to read the first 128 bytes.

@phsft-bot
Copy link

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

@pcanal
Copy link
Member

pcanal commented Aug 11, 2022

For Cling, I decided to just go with a std::ifstream to read the first 128 bytes.

what is the advantage over the solution in the reproducer you have (using llvm::MemoryBuffer::getFile with different arguments)?

@vgvassilev
Copy link
Member

@hahnjo, this looks good to me but can we not fix the llvm::identify_magic as well or instead?

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@hahnjo
Copy link
Member Author

hahnjo commented Aug 11, 2022

For Cling, I decided to just go with a std::ifstream to read the first 128 bytes.

what is the advantage over the solution in the reproducer you have (using llvm::MemoryBuffer::getFile with different arguments)?

It's plain C++ and we can guarantee that it's not using mmap. For llvm::MemoryBuffer::getFile I had to look at the source code because I would have argued that IsVolatile=true is enough, but it also requires RequiresNullTerminator=true to read in the file contents immediately. Additionally, getFile would try to read in the entire file, which we don't need here (and the reason why mmap is so efficient).

@hahnjo
Copy link
Member Author

hahnjo commented Aug 11, 2022

@hahnjo, this looks good to me but can we not fix the llvm::identify_magic as well or instead?

The question is whether llvm::identify_magic should be prepared to deal with files that are modified concurrently. As far as I understand, Cling uses this code path to auto-detect libraries and distinguish them from random binary files. If the answer is yes, then I can try to fix llvm::identify_magic upstream so we get it with some future LLVM upgrade.

@phsft-bot
Copy link

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

@phsft-bot
Copy link

Build failed on mac1015/cxx17.
Running on macitois21.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@vgvassilev
Copy link
Member

@hahnjo, this looks good to me but can we not fix the llvm::identify_magic as well or instead?

The question is whether llvm::identify_magic should be prepared to deal with files that are modified concurrently. As far as I understand, Cling uses this code path to auto-detect libraries and distinguish them from random binary files. If the answer is yes, then I can try to fix llvm::identify_magic upstream so we get it with some future LLVM upgrade.

I think this is a valid use-case. Moreover, we are planning to upstream that functionality from cling which suggests which library is needed on a missing symbol to the jit infrastructure.

The overload taking a path opens the file and then mmap its contents.
This can cause bus errors when another process truncates the file
while we are trying to read it. Instead just read the first 1024 bytes,
which should be enough for identify_magic to do its work.
@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@hahnjo
Copy link
Member Author

hahnjo commented Aug 12, 2022

@hahnjo, this looks good to me but can we not fix the llvm::identify_magic as well or instead?

The question is whether llvm::identify_magic should be prepared to deal with files that are modified concurrently. As far as I understand, Cling uses this code path to auto-detect libraries and distinguish them from random binary files. If the answer is yes, then I can try to fix llvm::identify_magic upstream so we get it with some future LLVM upgrade.

I think this is a valid use-case. Moreover, we are planning to upstream that functionality from cling which suggests which library is needed on a missing symbol to the jit infrastructure.

Ok, I've added a TODO comment.

@vgvassilev
Copy link
Member

@hahnjo, this looks good to me but can we not fix the llvm::identify_magic as well or instead?

The question is whether llvm::identify_magic should be prepared to deal with files that are modified concurrently. As far as I understand, Cling uses this code path to auto-detect libraries and distinguish them from random binary files. If the answer is yes, then I can try to fix llvm::identify_magic upstream so we get it with some future LLVM upgrade.

I think this is a valid use-case. Moreover, we are planning to upstream that functionality from cling which suggests which library is needed on a missing symbol to the jit infrastructure.

Ok, I've added a TODO comment.

Note that we can fix it now, and backport the change. Otherwise I fear the TODO will just stay forever there.

@hahnjo
Copy link
Member Author

hahnjo commented Aug 12, 2022

Ok, I've added a TODO comment.

Note that we can fix it now, and backport the change. Otherwise I fear the TODO will just stay forever there.

We could, but I don't think this is the right approach: We're in the middle of an LLVM upgrade already and any change to LLVM prevents distributions from building against their vanilla LLVM libraries in the future. Moreover, I'm still not 100% convinced that it should be the original llvm::identify_magic that needs to handle this situation. In principle this would mean that mmap must never be used in LLVM since any file could randomly be truncated.

I can remove the TODO comment again if you prefer, I only added it because it sounded like you want a reminder.

@phsft-bot
Copy link

Build failed on mac1015/cxx17.
Running on macitois21.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@vgvassilev
Copy link
Member

Ok, I've added a TODO comment.

Note that we can fix it now, and backport the change. Otherwise I fear the TODO will just stay forever there.

We could, but I don't think this is the right approach: We're in the middle of an LLVM upgrade already and any change to LLVM prevents distributions from building against their vanilla LLVM libraries in the future. Moreover, I'm still not 100% convinced that it should be the original llvm::identify_magic that needs to handle this situation. In principle this would mean that mmap must never be used in LLVM since any file could randomly be truncated.

I can remove the TODO comment again if you prefer, I only added it because it sounded like you want a reminder.

Thanks for the clarification. Let's keep it as it is now.

@hahnjo hahnjo merged commit 588e13c into root-project:master Aug 12, 2022
@hahnjo hahnjo deleted the cling-identify branch August 12, 2022 12:14
@phsft-bot
Copy link

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@phsft-bot
Copy link

Build failed on ROOT-debian10-i386/soversion.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-08-15T11:48:11.413Z] CMake Error at /home/sftnight/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1088 (message):

@phsft-bot
Copy link

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2022-08-15T11:48:31.328Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1088 (message):

@phsft-bot
Copy link

Build failed on ROOT-performance-centos8-multicore/cxx17.
Running on olbdw-01.cern.ch:/data/sftnight/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-08-15T11:48:57.282Z] CMake Error at /data/sftnight/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1088 (message):

@phsft-bot
Copy link

Build failed on ROOT-ubuntu2004/python3.
Running on root-ubuntu-2004-3.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-08-15T11:49:05.340Z] CMake Error at /home/sftnight/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1088 (message):

@phsft-bot
Copy link

Build failed on mac1015/cxx17.
Running on macitois22.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-08-15T11:49:10.128Z] CMake Error at /Volumes/HD2/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1088 (message):

@phsft-bot
Copy link

Build failed on mac11/cxx14.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-08-15T11:49:19.465Z] CMake Error at /Users/sftnight/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1088 (message):

@phsft-bot
Copy link

Build failed on ROOT-ubuntu18.04/nortcxxmod.
Running on sft-ubuntu-1804-3.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Errors:

  • [2022-08-15T11:50:06.246Z] CMake Error at /mnt/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1088 (message):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants