Skip to content

Swift: open(2) interception #10447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 20, 2022
Merged

Swift: open(2) interception #10447

merged 4 commits into from
Sep 20, 2022

Conversation

AlexDenisov
Copy link
Contributor

No description provided.

@AlexDenisov AlexDenisov force-pushed the alexdenisov/open-interception branch from 9c5d078 to 4dba121 Compare September 16, 2022 09:57
@AlexDenisov AlexDenisov marked this pull request as ready for review September 16, 2022 09:57
@AlexDenisov AlexDenisov requested review from a team as code owners September 16, 2022 09:57
@AlexDenisov AlexDenisov force-pushed the alexdenisov/open-interception branch from 4dba121 to c638789 Compare September 16, 2022 10:02
@redsun82
Copy link
Contributor

We probably need a quick description for the PR, but if you prefer as I'm writing an internal document for this, I can later copy & paste the relevant part in the PR description (even after we merge it)

Copy link
Contributor

@redsun82 redsun82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I think we should at least solve the possible use-after-destructor problem, the other comments are not really required to merge (especially if we later do a cross-platform thing without fishhook)


namespace codeql {

static std::string scratchDir;
Copy link
Contributor

@redsun82 redsun82 Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a use-after-destructor problem with this, in case open is called after exit by some other destructor of an object with static storage duration. Even though this is not happening now, we cannot really guarantee this won't happen. For example, what if the swift library initializes some non-POD static global that writes down diagnostics at the end of the program?

We could either:

  • undo the interception more or less at the time of calling remapArtifacts, putting original_open back to open
  • use a static const char* for the scratchDir, making codeql_open auto fallback on original_open if scratchDir == nullptr, and set it back to nullptr in remapArtifacts. The SwiftExtractorConfiguration object owning the scratch dir string will be alive between calls to initInterception and remapArtifacts

(or maybe both for maximum cleanness)

I'm always wary of using non-const globals (which is probably unavoidable here because of having to interface with C via free functions), but the wariness doubles when using non-POD globals (const or non const) because of the subtle bugs that can happen during end of program destructor calls. So if there is a way to avoid that I would rather take it, and go for the second option above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a very good point and it would've been a nightmare to catch/debug afterwards!

However, SwiftExtractorConfiguration doesn't really own the scratch dir as we changed tempArtifactDir to be a function returning a new string every time it's called 😅. We can easily workaround this limitation, but IMO having a char * is way too fragile.

I'll see what's the best way to rebind the functions back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point on the const char lifetime!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you go for a RAII class you could use the lifetime of that class to make sure the raw pointer is alive while required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played a bit with the options and the best suggestion I have is to add a boolean flag, though as I'm writing I'm wondering if it should be an atomic boolean? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, does llvm or the swift frontend fire up threads? Even if they do, are they still not joined when we execute our code?

Another option could be maybe to make scratchDir indestructible. Declare it as

static const std::string* scratchDir = nullptr;

and initialize it with

scratchDir = new std::string(dir);

and just fuggedaboutit... but I'm ok with this solution. We will probably need to come back to this code if we generalize the interception to Linux.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, all the available options are more like "which footgun shall we pick" 😄

Comment on lines 8 to 9
void initInterception(const std::string& dir);
void remapArtifacts(const std::unordered_map<std::string, std::string>& mapping);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should consider an alternative API using RAII, something like

class Interceptor {
  const std::unordered_map<std::string, std::string>& mapping;

 public:
  // does initInterception, stores reference to mapping
  Interceptor(const std::string& dir, const std::unordered_map<std::string, std::string>& mapping); 

  // does remapArtifacts using stored mapping, does cleanup if needed
  ~Interceptor();
};

The mapping does not even need to be filled at construction time.

@@ -55,3 +56,11 @@ def codeql_workspace(repository_name = "codeql"):
"https://github.com/bazelbuild/rules_python/archive/refs/tags/0.8.1.tar.gz",
],
)

new_git_repository(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using http_archive with GitHubs codeload service, i.e.
https://github.com/facebook/fishhook/archive/aadc161ac3b80db07a9908851839a17ba63a9eb1.zip
tends to be much faster than using git to pull in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, this is not a drop-in replacement as I'm getting include path errors:

  /usr/bin/sandbox-exec -f /private/var/tmp/_bazel_alexdenisov/5115cfbf6c0d699ed00ee2a590e1d941/sandbox/darwin-sandbox/46/sandbox.sb /var/tmp/_bazel_alexdenisov/install/0c7899cb691a00c6ca493ede5765e1af/process-wrapper '--timeout=0' '--kill_delay=15' external/local_config_cc/wrapped_clang_pp '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -O0 -DDEBUG '-std=c++11' 'DEBUG_PREFIX_MAP_PWD=.' -iquote . -iquote bazel-out/darwin-fastbuild/bin -iquote external/swift_prebuilt_darwin_x86_64 -iquote bazel-out/darwin-fastbuild/bin/external/swift_prebuilt_darwin_x86_64 -iquote external/fishhook -iquote bazel-out/darwin-fastbuild/bin/external/fishhook -Ibazel-out/darwin-fastbuild/bin/external/swift_prebuilt_darwin_x86_64/_virtual_includes/swift-llvm-support -Ibazel-out/darwin-fastbuild/bin/external/fishhook/_virtual_includes/fishhook -MD -MF bazel-out/darwin-fastbuild/bin/swift/extractor/remapping/_objs/remapping/SwiftOpenInterception.macOS.d '-frandom-seed=bazel-out/darwin-fastbuild/bin/swift/extractor/remapping/_objs/remapping/SwiftOpenInterception.macOS.o' -isysroot __BAZEL_XCODE_SDKROOT__ -F__BAZEL_XCODE_SDKROOT__/System/Library/Frameworks -F__BAZEL_XCODE_DEVELOPER_DIR__/Platforms/MacOSX.platform/Developer/Library/Frameworks '-mmacosx-version-min=12.3' -no-canonical-prefixes -pthread '-std=c++17' -fno-rtti '-arch=x86_64' -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -target x86_64-apple-macosx -c swift/extractor/remapping/SwiftOpenInterception.macOS.cpp -o bazel-out/darwin-fastbuild/bin/swift/extractor/remapping/_objs/remapping/SwiftOpenInterception.macOS.o)
clang++: warning: argument unused during compilation: '-arch=x86_64' [-Wunused-command-line-argument]
swift/extractor/remapping/SwiftOpenInterception.macOS.cpp:2:10: fatal error: 'fishhook.h' file not found
#include <fishhook.h>
         ^~~~~~~~~~~~
1 error generated.
Error in child process '/usr/bin/xcrun'. 1
Target //swift/extractor:extractor failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 3.137s, Critical Path: 2.37s
INFO: 6 processes: 6 internal.
FAILED: Build did NOT complete successfully

With a broken symbolic link:

> file bazel-out/darwin-fastbuild/bin/external/fishhook/_virtual_includes/fishhook/fishhook.h
bazel-out/darwin-fastbuild/bin/external/fishhook/_virtual_includes/fishhook/fishhook.h: broken symbolic link to /private/var/tmp/_bazel_alexdenisov/5115cfbf6c0d699ed00ee2a590e1d941/external/fishhook/fishhook.h

The link points to a wrong place as there is an extra folder after archive extraction:

> ls /private/var/tmp/_bazel_alexdenisov/5115cfbf6c0d699ed00ee2a590e1d941/external/fishhook/
BUILD.bazel  fishhook-aadc161ac3b80db07a9908851839a17ba63a9eb1  WORKSPACE

With fishhook-aadc161ac3b80db07a9908851839a17ba63a9eb1 being the actual repo root:

> ls /private/var/tmp/_bazel_alexdenisov/5115cfbf6c0d699ed00ee2a590e1d941/external/fishhook/fishhook-aadc161ac3b80db07a9908851839a17ba63a9eb1
CODE_OF_CONDUCT.md  CONTRIBUTING.md  fishhook.c  fishhook.h  fishhook.podspec  LICENSE  README.md

Any ideas what I'm doing wrong?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to set the strip_prefix option to (presumably) strip_prefix = "fishhook-aadc161ac3b80db07a9908851839a17ba63a9eb1" - see the WORKSPACE.bazel file in semmle-code for examples where we already do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did the trick 🎉

@redsun82 redsun82 self-requested a review September 20, 2022 08:39

namespace codeql {

static std::string scratchDir;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, does llvm or the swift frontend fire up threads? Even if they do, are they still not joined when we execute our code?

Another option could be maybe to make scratchDir indestructible. Declare it as

static const std::string* scratchDir = nullptr;

and initialize it with

scratchDir = new std::string(dir);

and just fuggedaboutit... but I'm ok with this solution. We will probably need to come back to this code if we generalize the interception to Linux.

Copy link
Collaborator

@criemen criemen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bazel changes LGTM

@AlexDenisov AlexDenisov merged commit addab09 into main Sep 20, 2022
@AlexDenisov AlexDenisov deleted the alexdenisov/open-interception branch September 20, 2022 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants