Swift: open(2) interception #10447
Merged
4 commits
c638789 Swift: open(2) interception (AlexDenisov)
d6d8480 Swift: fix internal builds (AlexDenisov)
3c12644 Swift: add a guard around hashing to avoid use-after-destructor (AlexDenisov)
9401eda Swift: use http_archive instead of new_git_repository since it's faster (AlexDenisov)
    @@ -0,0 +1,23 @@
    load("//swift:rules.bzl", "swift_cc_library")

    swift_cc_library(
        name = "remapping",
        srcs = select({
            "@platforms//os:linux": [
                "SwiftOpenInterception.Linux.cpp",
            ],
            "@platforms//os:macos": [
                "SwiftOpenInterception.macOS.cpp",
            ],
        }),
        hdrs = glob(["*.h"]),
        visibility = ["//swift:__subpackages__"],
        deps = [
            "//swift/tools/prebuilt:swift-llvm-support",
        ] + select({
            "@platforms//os:linux": [],
            "@platforms//os:macos": [
                "@fishhook//:fishhook",
            ],
        }),
    )
    @@ -0,0 +1,8 @@
    #include "swift/extractor/remapping/SwiftOpenInterception.h"

    namespace codeql {
    // TBD
    void initRemapping(const std::string& dir) {}
    void finalizeRemapping(const std::unordered_map<std::string, std::string>& mapping) {}

    }  // namespace codeql
    @@ -0,0 +1,11 @@
    #pragma once

    #include <string>
    #include <unordered_map>

    namespace codeql {

    void initRemapping(const std::string& dir);
    void finalizeRemapping(const std::unordered_map<std::string, std::string>& mapping);

    }  // namespace codeql
    @@ -0,0 +1,82 @@
    #include "swift/extractor/remapping/SwiftOpenInterception.h"
    #include <fishhook.h>
    #include <llvm/Support/raw_ostream.h>
    #include <llvm/Support/FileSystem.h>
    #include <llvm/Support/Path.h>
    #include <fcntl.h>
    #include <unistd.h>

    namespace codeql {

    static std::string scratchDir;
    static bool interceptionEnabled = false;

    static int (*original_open)(const char*, int, ...) = nullptr;

    static std::string fileHash(const std::string& filename) {
      int fd = original_open(filename.c_str(), O_RDONLY);
      if (fd == -1) {
        return {};
      }
      auto maybeMD5 = llvm::sys::fs::md5_contents(fd);
      close(fd);
      if (!maybeMD5) {
        return {};
      }
      return maybeMD5->digest().str().str();
    }

    static int codeql_open(const char* path, int oflag, ...) {
      va_list ap = {0};
      mode_t mode = 0;
      if ((oflag & O_CREAT) != 0) {
        // mode only applies to O_CREAT
        va_start(ap, oflag);
        mode = va_arg(ap, int);
        va_end(ap);
      }

      std::string newPath(path);

      if (interceptionEnabled && llvm::sys::fs::exists(newPath)) {
        // TODO: check file magic instead
        if (llvm::StringRef(newPath).endswith(".swiftmodule")) {
          auto hash = fileHash(newPath);
          auto hashed = scratchDir + "/" + hash;
          if (!hash.empty() && llvm::sys::fs::exists(hashed)) {
            newPath = hashed;
          }
        }
      }

      return original_open(newPath.c_str(), oflag, mode);
    }

    void finalizeRemapping(const std::unordered_map<std::string, std::string>& mapping) {
      for (auto& [original, patched] : mapping) {
        // TODO: Check file magic instead
        if (!llvm::StringRef(original).endswith(".swiftmodule")) {
          continue;
        }
        auto hash = fileHash(original);
        auto hashed = scratchDir + "/" + hash;
        if (!hash.empty() && llvm::sys::fs::exists(patched)) {
          if (std::error_code ec = llvm::sys::fs::create_link(/* from */ patched, /* to */ hashed)) {
            llvm::errs() << "Cannot remap file '" << patched << "' -> '" << hashed
                         << "': " << ec.message() << "\n";
          }
        }
      }
      interceptionEnabled = false;
    }

    void initRemapping(const std::string& dir) {
      scratchDir = dir;

      struct rebinding binding[] = {
          {"open", reinterpret_cast<void*>(codeql_open), reinterpret_cast<void**>(&original_open)}};
      rebind_symbols(binding, 1);
      interceptionEnabled = true;
    }

    }  // namespace codeql
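
The variadic handling in `codeql_open` can be looked at in isolation. The following is a minimal sketch, not the PR's code: `forwarding_open` is a hypothetical name, and it calls `open(2)` directly instead of going through a symbol rebound by fishhook. It only illustrates that the third argument is read when `O_CREAT` is set and is otherwise left at zero.

```cpp
#include <cstdarg>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for an interposed open(2); it forwards to the real
// open() directly rather than to a rebound function pointer.
int forwarding_open(const char* path, int oflag, ...) {
  mode_t mode = 0;
  if ((oflag & O_CREAT) != 0) {
    // A mode argument is only expected when O_CREAT is set. Narrow integer
    // types are promoted when passed through "...", so read it back as int.
    va_list ap;
    va_start(ap, oflag);
    mode = static_cast<mode_t>(va_arg(ap, int));
    va_end(ap);
  }
  // Passing a zero mode without O_CREAT is harmless: open() ignores it.
  return open(path, oflag, mode);
}
```

A call such as `forwarding_open("/dev/null", O_RDONLY)` should behave like a plain `open` call. On Linux, `O_TMPFILE` also carries a mode argument, which a real Linux implementation would probably need to account for.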
Empty file.
    @@ -0,0 +1,7 @@
    cc_library(
        name = "fishhook",
        srcs = glob(["*.c"]),
        hdrs = glob(["*.h"]),
        strip_include_prefix = ".",
        visibility = ["//visibility:public"],
    )
I think there might be a use-after-destructor problem with this, in case `open` is called after `exit` by some other destructor of an object with static storage duration. Even though this is not happening now, we cannot really guarantee this won't happen. For example, what if the swift library initializes some non-POD static global that writes down diagnostics at the end of the program?

We could either:
- rebind again in `remapArtifacts`, putting `original_open` back to `open`
- use a `static const char*` for the `scratchDir`, making `codeql_open` auto fallback on `original_open` if `scratchDir == nullptr`, and set it back to `nullptr` in `remapArtifacts`. The `SwiftExtractorConfiguration` object owning the scratch dir string will be alive between calls to `initInterception` and `remapArtifacts`

(or maybe both for maximum cleanness)

I'm always wary of using non-const globals (which is probably unavoidable here because of having to interface with C via free functions), but the wariness doubles when using non-POD globals (const or non-const) because of the subtle bugs that can happen during end-of-program destructor calls. So if there is a way to avoid that I would rather take it, and go for the second option above.
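
The second option can be sketched roughly as follows. This is a simplified illustration under assumptions, not code from the PR: all names (`scratchDirPtr`, `resolvePath`, `enableRemapping`, `disableRemapping`) are hypothetical, and the remapping is reduced to a path prefix instead of the content hash used in the diff. What it shows is that a raw pointer global has no destructor, so a hook that runs during static destruction can still read it safely and fall back to the original path.

```cpp
#include <string>

// Hypothetical names; none of them come from the PR.
// A raw pointer is trivially destructible, so it stays readable even while
// other globals are being destroyed at program exit.
static const char* scratchDirPtr = nullptr;

// Returns the path the hook would hand to the original open():
// the remapped path while remapping is active, the untouched path otherwise.
std::string resolvePath(const char* path) {
  if (scratchDirPtr == nullptr) {
    return path;  // remapping inactive: behave exactly like the original
  }
  return std::string(scratchDirPtr) + "/" + path;
}

// The caller must keep `dir` alive until disableRemapping() is called.
void enableRemapping(const std::string& dir) { scratchDirPtr = dir.c_str(); }
void disableRemapping() { scratchDirPtr = nullptr; }
```

The sketch only works if whoever owns the string outlives the window between the two calls, which is the lifetime question raised in the replies.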
It's a very good point and it would've been a nightmare to catch/debug afterwards!

However, `SwiftExtractorConfiguration` doesn't really own the scratch dir as we changed `tempArtifactDir` to be a function returning a new string every time it's called 😅. We can easily work around this limitation, but IMO having a `char *` is way too fragile.

I'll see what's the best way to rebind the functions back.
Ah, good point on the const char lifetime!
If you go for a RAII class you could use the lifetime of that class to make sure the raw pointer is alive while required.
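
One way to read the RAII suggestion is sketched below. `RemappingScope`, `activeScratchDir` and `remappingActive` are hypothetical names chosen for illustration. The guard owns the directory string and publishes a raw pointer to it only for as long as the guard is alive.

```cpp
#include <string>
#include <utility>

// Hypothetical global read by the hook; nullptr means "remapping inactive".
static const char* activeScratchDir = nullptr;

// Owns the scratch directory string and exposes it through the raw pointer
// for exactly the lifetime of this object.
class RemappingScope {
 public:
  explicit RemappingScope(std::string dir) : dir_(std::move(dir)) {
    activeScratchDir = dir_.c_str();
  }
  ~RemappingScope() { activeScratchDir = nullptr; }

  // Copying or moving would leave the global pointing at the wrong buffer.
  RemappingScope(const RemappingScope&) = delete;
  RemappingScope& operator=(const RemappingScope&) = delete;

 private:
  std::string dir_;
};

bool remappingActive() { return activeScratchDir != nullptr; }
```

With this shape, a function-local `RemappingScope scope("/some/dir");` in the extractor's entry point would bound the interception window to that scope, assuming the hook checks `remappingActive()` before touching the pointer.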
I played a bit with the options and the best suggestion I have is to add a boolean flag, though as I'm writing I'm wondering if it should be an atomic boolean? 🤔
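
For reference, an atomic variant of the flag could look like the sketch below. The names are hypothetical and the hashing logic is omitted. `std::atomic<bool>` has a trivial destructor, so reading it from a hook that runs during static destruction is safe. Note that the atomic flag only guards the on/off decision: it does not by itself make concurrent access to a separate `std::string` global safe.

```cpp
#include <atomic>
#include <string>

// Hypothetical flag consulted by the hook before any remapping is attempted.
static std::atomic<bool> interceptionActive{false};

void startInterception() {
  interceptionActive.store(true, std::memory_order_release);
}

void stopInterception() {
  interceptionActive.store(false, std::memory_order_release);
}

// Picks the remapped path only while interception is switched on.
std::string pathToOpen(const std::string& requested, const std::string& remapped) {
  if (!interceptionActive.load(std::memory_order_acquire)) {
    return requested;
  }
  return remapped;
}
```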
Good question, does llvm or the swift frontend fire up threads? Even if they do, are they still not joined when we execute our code?

Another option could be maybe to make `scratchDir` indestructible. Declare it as [...] and initialize it with [...] and just fuggedaboutit... but I'm ok with this solution. We will probably need to come back to this code if we generalize the interception to Linux.
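
The snippets in the comment above were not preserved, so the following is not the reviewer's code. It is a sketch of one common idiom for an "indestructible" global, with the hypothetical name `scratchDirStorage`: the string lives on the heap and is intentionally never deleted, so no destructor runs for it at program exit.

```cpp
#include <string>

// Hypothetical accessor. The string is allocated once and deliberately
// leaked, so it remains valid for any code that runs during static
// destruction.
std::string& scratchDirStorage() {
  static std::string* dir = new std::string();  // never deleted on purpose
  return *dir;
}
```

Usage would be along the lines of `scratchDirStorage() = dir;` in the init function, with the hook reading `scratchDirStorage()` afterwards. The trade-off is a small, bounded leak that some leak checkers may report.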
Yeah, all the available options are more like "which footgun shall we pick" 😄