Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ASAN] Add "sanitized_padded_global" llvm ir attribute to identify sanitizer instrumented globals #68865

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

skc7
Copy link
Contributor

@skc7 skc7 commented Oct 12, 2023

#70166 requires a way to identify the globals instrumented by asan pass.
This patch introduces LLVM IR attribute "sanitized_padded_global", which can be attached to globals by asan pass.

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 12, 2023

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-ir

Author: Chaitanya (skc7)

Changes

#66666 requires a way to identify the globals instrumented by asan pass.
This patch introduces LLVM IR attribute "asan_instrumented", which can be attached to globals by asan pass.


Full diff: https://github.com/llvm/llvm-project/pull/68865.diff

8 Files Affected:

  • (modified) llvm/include/llvm/Bitcode/LLVMBitCodes.h (+1)
  • (modified) llvm/include/llvm/IR/Attributes.h (+3)
  • (modified) llvm/include/llvm/IR/Attributes.td (+3)
  • (modified) llvm/lib/AsmParser/LLLexer.cpp (+1)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+7)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+2)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+2)
  • (modified) llvm/lib/IR/Attributes.cpp (+7)
diff --git a/llvm/include/llvm/Bitcode/LLVMBitCodes.h b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
index 52e76356a892e45..bb15e01d08d0670 100644
--- a/llvm/include/llvm/Bitcode/LLVMBitCodes.h
+++ b/llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -713,6 +713,7 @@ enum AttributeKindCodes {
   ATTR_KIND_SKIP_PROFILE = 85,
   ATTR_KIND_MEMORY = 86,
   ATTR_KIND_NOFPCLASS = 87,
+  ATTR_KIND_ASAN_INSTRUMENTED = 88,
 };
 
 enum ComdatSelectionKindCodes {
diff --git a/llvm/include/llvm/IR/Attributes.h b/llvm/include/llvm/IR/Attributes.h
index db33b5400471685..eb02158f15a1254 100644
--- a/llvm/include/llvm/IR/Attributes.h
+++ b/llvm/include/llvm/IR/Attributes.h
@@ -251,6 +251,9 @@ class Attribute {
   /// Return the FPClassTest for nofpclass
   FPClassTest getNoFPClass() const;
 
+  /// Return if global variable is instrumented by AddrSanitizer.
+  bool isAsanInstrumented() const;
+
   /// The Attribute is converted to a string of equivalent mnemonic. This
   /// is, presumably, for writing out the mnemonics for the assembly writer.
   std::string getAsString(bool InAttrGrp = false) const;
diff --git a/llvm/include/llvm/IR/Attributes.td b/llvm/include/llvm/IR/Attributes.td
index aba1d718f7f72f9..1f81435d677d687 100644
--- a/llvm/include/llvm/IR/Attributes.td
+++ b/llvm/include/llvm/IR/Attributes.td
@@ -273,6 +273,9 @@ def SanitizeHWAddress : EnumAttr<"sanitize_hwaddress", [FnAttr]>;
 /// MemTagSanitizer is on.
 def SanitizeMemTag : EnumAttr<"sanitize_memtag", [FnAttr]>;
 
+/// Attribute to identify global variables instrumented by AddrSanitizer.
+def AsanInstrumented : EnumAttr<"asan_instrumented", []>;
+
 /// Speculative Load Hardening is enabled.
 ///
 /// Note that this uses the default compatibility (always compatible during
diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp
index 466bdebc001f589..d3cf6ce8626f921 100644
--- a/llvm/lib/AsmParser/LLLexer.cpp
+++ b/llvm/lib/AsmParser/LLLexer.cpp
@@ -582,6 +582,7 @@ lltok::Kind LLLexer::LexIdentifier() {
   KEYWORD(no_sanitize_address);
   KEYWORD(no_sanitize_hwaddress);
   KEYWORD(sanitize_address_dyninit);
+  KEYWORD(asan_instrumented);
 
   KEYWORD(ccc);
   KEYWORD(fastcc);
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index 04eabc94cfc6abe..4e49f38b16b9c1b 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -1525,6 +1525,13 @@ bool LLParser::parseFnAttributeValuePairs(AttrBuilder &B,
       continue;
     }
 
+    if (Token == lltok::kw_asan_instrumented) {
+      if (parseToken(lltok::kw_asan_instrumented, "expected 'asan_instrumented'"))
+        return true;
+      B.addAttribute(Attribute::AsanInstrumented);
+      continue;
+    }
+
     SMLoc Loc = Lex.getLoc();
     if (Token == lltok::kw_builtin)
       BuiltinLoc = Loc;
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index 1d1ec988a93d847..ff6e42970b9a8d4 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -1980,6 +1980,8 @@ static Attribute::AttrKind getAttrFromCode(uint64_t Code) {
     return Attribute::NoSanitizeCoverage;
   case bitc::ATTR_KIND_NULL_POINTER_IS_VALID:
     return Attribute::NullPointerIsValid;
+  case bitc::ATTR_KIND_ASAN_INSTRUMENTED:
+    return Attribute::AsanInstrumented;
   case bitc::ATTR_KIND_OPT_FOR_FUZZING:
     return Attribute::OptForFuzzing;
   case bitc::ATTR_KIND_OPTIMIZE_FOR_SIZE:
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index e991d055f33474b..0209882cc33f20e 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -789,6 +789,8 @@ static uint64_t getAttrKindEncoding(Attribute::AttrKind Kind) {
     return bitc::ATTR_KIND_SANITIZE_THREAD;
   case Attribute::SanitizeMemory:
     return bitc::ATTR_KIND_SANITIZE_MEMORY;
+  case Attribute::AsanInstrumented:
+    return bitc::ATTR_KIND_ASAN_INSTRUMENTED;
   case Attribute::SpeculativeLoadHardening:
     return bitc::ATTR_KIND_SPECULATIVE_LOAD_HARDENING;
   case Attribute::SwiftError:
diff --git a/llvm/lib/IR/Attributes.cpp b/llvm/lib/IR/Attributes.cpp
index 3d89d18e5822bee..7cef414f53ec357 100644
--- a/llvm/lib/IR/Attributes.cpp
+++ b/llvm/lib/IR/Attributes.cpp
@@ -408,6 +408,10 @@ FPClassTest Attribute::getNoFPClass() const {
   return static_cast<FPClassTest>(pImpl->getValueAsInt());
 }
 
+bool Attribute::isAsanInstrumented() const {
+  return hasAttribute(Attribute::AsanInstrumented);
+}
+
 static const char *getModRefStr(ModRefInfo MR) {
   switch (MR) {
   case ModRefInfo::NoModRef:
@@ -562,6 +566,9 @@ std::string Attribute::getAsString(bool InAttrGrp) const {
     return Result;
   }
 
+  if (hasAttribute(Attribute::AsanInstrumented))
+    return "asan_instrumented";
+
   // Convert target-dependent attributes to strings of the form:
   //
   //   "kind"

@github-actions
Copy link

github-actions bot commented Oct 12, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@skc7 skc7 changed the title [WIP][ASAN] Add "asan_instrumented" llvm ir attribute to identify AddressSanitizer instrumented globals [ASAN] Add "asan_instrumented" llvm ir attribute to identify AddressSanitizer instrumented globals Oct 12, 2023
@hctim
Copy link
Collaborator

hctim commented Nov 1, 2023

GlobalVariable sanitization is done as a subset of SanitizerMetadata (what you've touched here is the sanitizer attributes for functions). You can take a look at 8db981d which should guide you to the right IR and bitcode printing/parsing logic.

Other than that, can you please make the "we increased the global variable size" names non-asan-specific? We also increase GV size on -fsanitize=hwaddress and -fsanitize=memtag-globals as well. Maybe an appropriate name is sanitized_padded_global (or something like that).

@skc7
Copy link
Contributor Author

skc7 commented Nov 2, 2023

GlobalVariable sanitization is done as a subset of SanitizerMetadata (what you've touched here is the sanitizer attributes for functions). You can take a look at 8db981d which should guide you to the right IR and bitcode printing/parsing logic.

Other than that, can you please make the "we increased the global variable size" names non-asan-specific? We also increase GV size on -fsanitize=hwaddress and -fsanitize=memtag-globals as well. Maybe an appropriate name is sanitized_padded_global (or something like that).

Thanks for the feedback @hctim. Renamed the attribute to "sanitized_padded_global".
8db981d has parsing/priniting logic to fill the SanitizerMetadata structure and also print the metadata of globals. I want to attach the new attribute to global variables and don't want to extend the SanitizerMetadata.

@skc7 skc7 changed the title [ASAN] Add "asan_instrumented" llvm ir attribute to identify AddressSanitizer instrumented globals [ASAN] Add "sanitized_padded_global" llvm ir attribute to identify sanitizer instrumented globals Nov 2, 2023
@skc7
Copy link
Contributor Author

skc7 commented Nov 8, 2023

@hctim please let me know if the new changes for adding "sanitized_padded_global" is fine. I haven't extended the sanitizer metadata.

@skc7
Copy link
Contributor Author

skc7 commented Nov 23, 2023

@hctim please review.

@MaskRay
Copy link
Member

MaskRay commented Dec 19, 2023

(I will be out for about 2 weeks and my response time will be slow.)

Most object file formats don't have a symbol size field.
Although ELF introduced st_size, it's scarcely utilized.
Its primary use pertains to copy relocations, which are discouraged and gradually being phased out.
The other, albeit minor, usage involves symbolization, where identifying whether st_size is zero can occasionally be helpful.

Consequently, I feel a bit nervous seeing a substantial reliance on st_size.
I wonder whether you can avoid relying on this symbol field. I am not familiar with AMDGPU, but for other architectures, symbols are completely optional and can be freely stripped if they are not used for dynamic linking.

Here is an idea using a separate metadata section:

.quad sym[0].hash; .long sym[0].size; .quad sym[1].hash; .long sym[1].size; 

(You can even place .quad sym[0].hash; .long sym[0].size in a section SHF_LINK_ORDER linking to the global variable for linker garbage collection.)

The runtime can build a map correlating hashes to sizes, which can be used to answer variable size queries.

@skc7
Copy link
Contributor Author

skc7 commented Dec 20, 2023

(I will be out for about 2 weeks and my response time will be slow.)

Most object file formats don't have a symbol size field. Although ELF introduced st_size, it's scarcely utilized. Its primary use pertains to copy relocations, which are discouraged and gradually being phased out. The other, albeit minor, usage involves symbolization, where identifying whether st_size is zero can occasionally be helpful.

Consequently, I feel a bit nervous seeing a substantial reliance on st_size. I wonder whether you can avoid relying on this symbol field. I am not familiar with AMDGPU, but for other architectures, symbols are completely optional and can be freely stripped if they are not used for dynamic linking.

Here is an idea using a separate metadata section:

.quad sym[0].hash; .long sym[0].size; .quad sym[1].hash; .long sym[1].size; 

(You can even place .quad sym[0].hash; .long sym[0].size in a section SHF_LINK_ORDER linking to the global variable for linker garbage collection.)

The runtime can build a map correlating hashes to sizes, which can be used to answer variable size queries.

AMD language runtimes provide queries for the size of device global symbols and functions to copy data to and from device global variables. Currently, runtime gets the needed information form the ELF symbol sizes in the symbol table. So, in #70166 We have come up with approach of adding two symbols (at the same offset but with different sizes) for the same global, one symbol which reports actual global size and other symbol which reports instrumented size.

@skc7 skc7 force-pushed the skc7/gv_asan_attr branch 2 times, most recently from 339f3b5 to 6070f1d Compare January 22, 2024 06:23
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jan 25, 2024
@arsenm
Copy link
Contributor

arsenm commented May 9, 2024

(You can even place .quad sym[0].hash; .long sym[0].size in a section SHF_LINK_ORDER linking to the global variable for linker garbage collection.)
The runtime can build a map correlating hashes to sizes, which can be used to answer variable size queries.

AMD language runtimes provide queries for the size of device global symbols and functions to copy data to and from device global variables. Currently, runtime gets the needed information form the ELF symbol sizes in the symbol table. So, in #70166 We have come up with approach of adding two symbols (at the same offset but with different sizes) for the same global, one symbol which reports actual global size and other symbol which reports instrumented size.

Have you looked into switching to the suggested approach of having a separately emitted field?

@b-sumner
Copy link

b-sumner commented May 9, 2024

AMD language runtimes provide queries for the size of device global symbols and functions to copy data to and from device global variables. Currently, runtime gets the needed information form the ELF symbol sizes in the symbol table. So, in #70166 We have come up with approach of adding two symbols (at the same offset but with different sizes) for the same global, one symbol which reports actual global size and other symbol which reports instrumented size.

Have you looked into switching to the suggested approach of having a separately emitted field?

If you mean rewrite the runtime implementation, we have not. There is nothing wrong with the runtime implementation, and I'm still not understanding the concern being raised about the approach here which has been running without issue for many months.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category llvm:ir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants