Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LangRef] Try to formalize the definition of "odr" in LLVM IR. #92619

Merged
merged 2 commits into from
May 20, 2024

Conversation

efriedma-quic
Copy link
Collaborator

@efriedma-quic efriedma-quic commented May 17, 2024

The current definition is a bit fuzzy... replace it with something that's somewhat rigorous.

For functions, the definition is pretty narrow; as a consequence of language-level non-determinism, it's impossible to tell whether two functions are equivalent, so just embrace the non-determinism. For constants, we're pretty strict; otherwise you end up concluding constants can actually change value, which is bad for alias analysis. I think C++ standard don't allow any non-deterministic operations in constants, so we should be okay there? Poison is per-byte to allow some ambiguity in the way padding is defined.

The current definition is a bit fuzzy... replace it with something
that's somewhat rigorous.

For functions, the definition is pretty narrow; as a consequence of
language-level non-determinism, it's impossible to actually whether two
functions are equivalent, so just embrace the non-determinism.  For
constants, we're pretty strict; otherwise you end up concluding
constants can actually change value, which is bad for alias analysis.
I think C++ standard don't allow any non-deterministic operations in
constants, so we should be okay there?  Poison is per-byte to allow some
ambiguity in the way padding is defined.
@llvmbot
Copy link
Collaborator

llvmbot commented May 17, 2024

@llvm/pr-subscribers-llvm-ir

Author: Eli Friedman (efriedma-quic)

Changes

The current definition is a bit fuzzy... replace it with something that's somewhat rigorous.

For functions, the definition is pretty narrow; as a consequence of language-level non-determinism, it's impossible to tell whether two functions are equivalent, so just embrace the non-determinism. For constants, we're pretty strict; otherwise you end up concluding constants can actually change value, which is bad for alias analysis. I think C++ standard don't allow any non-deterministic operations in constants, so we should be okay there? Poison is per-byte to allow some ambiguity in the way padding is defined.


Full diff: https://github.com/llvm/llvm-project/pull/92619.diff

1 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+11-7)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index e2f4d8bfcaeed..d503f060b3d04 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -290,13 +290,17 @@ linkage:
     symbol is weak until linked, if not linked, the symbol becomes null
     instead of being an undefined reference.
 ``linkonce_odr``, ``weak_odr``
-    Some languages allow differing globals to be merged, such as two
-    functions with different semantics. Other languages, such as
-    ``C++``, ensure that only equivalent globals are ever merged (the
-    "one definition rule" --- "ODR"). Such languages can use the
-    ``linkonce_odr`` and ``weak_odr`` linkage types to indicate that the
-    global will only be merged with equivalent globals. These linkage
-    types are otherwise the same as their non-``odr`` versions.
+    The ``odr`` suffix indicates that all globals defined with the given name
+    are equivalent, along the lines of the C++ "one definition rule" ("ODR").
+    Informally, this means we can inline functions and fold loads of constants.
+
+    Formally, use the following definition: when an ``odr`` function is
+    called, one of the definitions is non-deterministically chosen to run. For
+    ``odr`` variables, if the value any byte is not equal in all initializers,
+    that byte is a :ref:`poison value <poisonvalues>`. For aliases and ifuncs,
+    apply the rule for the underlying function or variable.
+
+    These linkage types are otherwise the same as their non-``odr`` versions.
 ``external``
     If none of the above identifiers are used, the global is externally
     visible, meaning that it participates in linkage and can be used to


Formally, use the following definition: when an ``odr`` function is
called, one of the definitions is non-deterministically chosen to run. For
``odr`` variables, if the value any byte is not equal in all initializers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: if any byte in the value?

Copy link
Member

@DianQK DianQK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks! For me, the explanation of the ODR variable has become much clearer.

@nunoplopes
Copy link
Member

LGTM.

@efriedma-quic efriedma-quic merged commit c1d5cc9 into llvm:main May 20, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants