[ms] __LPREFIX feature of MSVC isn't supported in clang #27402

Open
llvmbot opened this issue Mar 22, 2016 · 13 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla clang:frontend Language frontend issues, e.g. anything involving "Sema" confirmed Verified by a second party extension:microsoft

Comments

@llvmbot
Collaborator

llvmbot commented Mar 22, 2016

Bugzilla Link 27028
Version trunk
OS Windows NT
Reporter LLVM Bugzilla Contributor
CC @dmpolukhin,@JVApen

Extended Description

This report is about the MSVC feature __LPREFIX, which adds the prefix L to its argument; for example, it changes the argument's type from char[n] to wchar_t[n]. Clang does not recognize the identifier __LPREFIX, whatever its argument.
=========Environment=============
OS: Win
Version: trunk

=========Reproducer==============
test.cpp

int main()
{
  wchar_t* C = __LPREFIX(__FUNCTION__);
  return 0;
}

===========Output================
MSVC compiles cleanly

$ clang-cl -c test.cpp
test.cpp(3,16) :  error: use of undeclared identifier '__LPREFIX'

It should work like a macro, but it is not actually a preprocessor macro:

$ cl -E test.cpp
int main()
{
  wchar_t* C = __LPREFIX(__FUNCTION__);
  return 0;
}
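
For comparison, here is a minimal sketch in standard C++ of the usual widening idiom that __LPREFIX resembles. WIDEN, WIDEN_IMPL, GREETING and wide_greeting are hypothetical names used only for illustration; they are not MSVC or Clang identifiers. The idiom works on real string literals, including ones produced by macros, but it cannot be applied to __func__, which is a variable rather than a literal.

```cpp
#include <cassert>
#include <cwchar>
#include <type_traits>

// Hypothetical helper macros (not MSVC or Clang built-ins). Two levels are
// needed so that the argument is macro-expanded before L is pasted onto it.
#define WIDEN_IMPL(x) L##x
#define WIDEN(x) WIDEN_IMPL(x)

#define GREETING "hello"

// Pasting L onto a narrow literal changes its type from const char[6]
// to const wchar_t[6].
static_assert(std::is_same<decltype(GREETING), const char (&)[6]>::value,
              "narrow literal expected");
static_assert(std::is_same<decltype(WIDEN(GREETING)), const wchar_t (&)[6]>::value,
              "wide literal expected");

// Returns the widened literal so callers can inspect it at run time.
inline const wchar_t *wide_greeting() { return WIDEN(GREETING); }
```

Calling wide_greeting() yields a pointer to the wide literal produced by the paste.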

=================================
Andrey Skripkin
Software Engineer
Intel Compiler Team

@JVApen

JVApen commented Jan 28, 2017

I've noticed that MSVC somehow even translates _T to this __LPREFIX (to my surprise).

The code I'm looking at is similar to:

#define LOGMESSAGE(text) _T(__FUNCTION__) _T(":") text

Result after preprocessing becomes:

__LPREFIX( __FUNCTION__) L":" "some text"

However, if I try to reproduce this in a small example, I don't get this behavior.

Replacing _T with __LPREFIX gives the same error as reported by Andrey Skripkin.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@cor3ntin
Contributor

It would be good to find some documentation for that, I can't find anything online.

@Endilll
Contributor

Endilll commented Aug 30, 2023

CC @EugeneZelenko FYI we now have extension:gnu, extension:microsoft, and extension:clang labels. I think this issue fits into the second one.

@EugeneZelenko
Contributor

@Endilll: Thank you for information!

@RIscRIpt
Member

To avoid duplicate work: I am working on this issue (in my branch).

@RIscRIpt
Member

RIscRIpt commented Sep 23, 2023

As part of this issue I am trying to add support for the following expressions in Clang (under MSVC compatibility mode): https://godbolt.org/z/nrj1j3bMY

#define _CONCAT(A, B) A##B
#define CONCAT(A, B) _CONCAT(A, B)

int main() {
   const char *x = __lPREFIX(L"Yes, " __LPREFIX(u8"it " __lPREFIX(U"is ")) L"insane function " CONCAT(L, __FUNCTION__));
   return 0;
}

I see the following levels of inseparable implementation:

  1. Preprocessor: transformation of CONCAT(L, __FUNCTION__) to __LPREFIX(__FUNCTION__)
  2. Sema, concatenation of these tokens/literals.
  3. Sema for __LPREFIX: re-encoding of string-literals.
  4. TemplateInstantiator: re-evaluation of predefined expressions like __FUNCTION__.

I think the implementation is inseparable because, if (1) is implemented, we need to handle __LPREFIX in Sema (2, 3). If we implement (2, 3) without taking (4) into account, the values of expanded __FUNCTION__ in templated functions are incorrect (explanation below).

I am writing this message because I see difficulty in combining (2, 3, 4), and (per my current plan) it would require significant changes in Sema, either by adding a new type of Expr to Clang or by reworking PredefinedExpr. Before making such changes, I want to coordinate the implementation.

1. Preprocessor. As per my observations and experiments with MSVC, CONCAT(L, __FUNCTION__) works the following way: B in _CONCAT(A, B) gets expanded to __FSTREXP __FUNCTION__. Later, tokens in L##__FSTREXP __FUNCTION__ are replaced as follows:

  ///         L##__FSTREXP   __FUNCTION__
  ///         || |           |
  ///         vv v           v
  /// __LPREFIX( __FUNCTION__)

All of these transformations are easily made in TokenLexer::ExpandFunctionArguments. I have no questions here, this is implemented in lprefix
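
The token rewrite pictured above can be modelled with a small standalone sketch. It operates on plain token spellings and does not use Clang's Token or TokenLexer classes; Tokens, prefixCastName and rewriteFstrexp are hypothetical names, and the pairing of u8 with __lPREFIX is an assumption drawn from the examples in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy token stream: each token is just its spelling.
using Tokens = std::vector<std::string>;

// Maps an encoding prefix to the matching cast name. Returns an empty
// string for anything that is not a known prefix.
inline std::string prefixCastName(const std::string &prefix) {
  if (prefix == "L") return "__LPREFIX";
  if (prefix == "u8") return "__lPREFIX"; // assumed pairing
  if (prefix == "u") return "__uPREFIX";
  if (prefix == "U") return "__UPREFIX";
  return "";
}

// Rewrites every `<prefix> ## __FSTREXP <tok>` into `<cast> ( <tok> )`
// and copies all other tokens through unchanged.
inline Tokens rewriteFstrexp(const Tokens &in) {
  Tokens out;
  std::size_t i = 0;
  while (i < in.size()) {
    if (i + 3 < in.size() && in[i + 1] == "##" && in[i + 2] == "__FSTREXP") {
      const std::string cast = prefixCastName(in[i]);
      if (!cast.empty()) {
        out.push_back(cast);
        out.push_back("(");
        out.push_back(in[i + 3]);
        out.push_back(")");
        i += 4;
        continue;
      }
    }
    out.push_back(in[i]);
    ++i;
  }
  return out;
}
```

Feeding it the four tokens L, ##, __FSTREXP, __FUNCTION__ produces __LPREFIX, (, __FUNCTION__, ).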

2, 3, 4. Sema and TemplateInstantiator. Unfortunately, the approach we followed in D153914 was erroneous: https://godbolt.org/z/vx9zY8aTj

template<class T> class A {
public:
    A() {
        static const char *X = __FUNCTION__; // A<class int>::A
        static const char *Y = "" __FUNCTION__; // A::A<T>
    }
};

int main() {
    A<int> a;
}

And thus we cannot blindly implement (2, 3) without taking (4) into account. An example of partial implementation is in lprefix.


My proposal for (2,3,4). Create a new kind of Expr, let's call it StringConcatExpr (if you have ideas for good names, let me know). This expression would be basically a container of StringLiterals, PredefinedExprs, and __LPREFIX exprs in an AST form. On Sema level we would create a StringConcatExpr, and pre-compute its value. Later, in TemplateInstantiator we can re-build StringConcatExpr by adjusting values of __FUNCTION__ tokens and re-computing its value.
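
To make the idea concrete, here is a toy model of such a container, written outside of Clang. Piece, CompositeString and rebuild are hypothetical names and do not correspond to any existing Clang class; the sketch only shows why keeping the pieces around allows the value to be recomputed when the enclosing function name changes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One element of the concatenation: either literal text or a placeholder
// for a predefined identifier such as __FUNCTION__.
struct Piece {
  enum Kind { Literal, PredefinedFunctionName } kind;
  std::string text; // used only when kind == Literal
};

struct CompositeString {
  std::vector<Piece> pieces;
  std::string value; // cached result of the last evaluation

  // Recomputes the cached value for the given enclosing function name.
  // In the proposal this would happen once in Sema and again during
  // template instantiation.
  void rebuild(const std::string &functionName) {
    value.clear();
    for (const Piece &p : pieces)
      value += (p.kind == Piece::Literal) ? p.text : functionName;
  }
};
```

Rebuilding the same object first with the template's name and then with the instantiated name gives two different strings without re-parsing anything.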

Pinging reviewers of D153914: @AaronBallman, @cor3ntin, @tahonermann

Edit: regarding the name of new Expr class. Taking a look at existing classes, I think a good name would be like MSStringLiteral / MSConcatStringLiteral / MSCompositeStringLiteral. I like the latter.

@RIscRIpt
Member

RIscRIpt commented Nov 25, 2023

tl;dr I wanted to add yet another AST node, MSCastStringLiteral, to represent __LPREFIX (and company), but after finding that I need to deal with user-defined string literals, I re-evaluated the whole approach and decided that I don't need such an AST node.


Recently I've learned that

  • Microsoft String Cast expressions (I am talking about __LPREFIX and company) ultimately get the type of the outermost cast (which is logical), e.g. https://godbolt.org/z/oqoGvxzj4

constexpr size_t operator""_len(const char*, size_t len) { return len; }
constexpr size_t operator""_len(const char8_t*, size_t len) { return len; }
constexpr size_t operator""_len(const char16_t*, size_t len) { return len; }
constexpr size_t operator""_len(const char32_t*, size_t len) { return len; }
constexpr size_t operator""_len(const wchar_t*, size_t len) { return len; }

size_t foo() {
    return __lPREFIX(__UPREFIX(__LPREFIX(U"wtf"_len) L"qwe" __LPREFIX(__FUNCTION__))) ""_len;
}

  • User-defined string literals may appear more than once, and they do not produce an immediate result. From cppreference:

When string literal concatenation takes place in translation phase 6, user-defined string literals are concatenated as well, and their ud-suffixes are ignored for the purpose of concatenation, except that only one suffix may appear on all concatenated literals

https://godbolt.org/z/PrY3cMWMP

#include <cstddef>

constexpr size_t operator""_len(const char*, size_t len) {
  return len;
}

size_t foo() {
    return ""_len "333"_len * "2"_len "2"_len - ""_len "6" "6"_len "6" "666"_len;
}

The latest implementation plan looks as follows: we scan the whole "string literal" (including string-like predefined macros like __FUNCTION__), verify that we don't concatenate incompatible types (e.g. u16"" __uPREFIX with u32"" __UPREFIX), then omit all "Microsoft String Casts" except the outermost. We pass everything to the StringLiteral builder (including the desired string type). If the list of tokens contains __FUNCTION__ (or other string-like macros), we create several StringLiterals and one or more PredefinedExprs, and store them in an MSCompositeStringLiteral. If there was any UDL, we pass the resulting literal (either StringLiteral or MSCompositeStringLiteral) to the UDL builder.

On template instantiation phase we can re-build MSCompositeStringLiteral changing the value of the containing PredefinedExpr.
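
The "verify that we don't concatenate incompatible types" step could look roughly like the sketch below. Enc, EncResult and commonEncoding are hypothetical names, and the rule applied (ordinary pieces adopt a neighbour's prefix, two different prefixes are rejected) is the one standard C++ uses for adjacent literals; whether MSVC applies exactly this rule to its casts is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical encoding tags for the pieces of one concatenated literal.
enum class Enc { Ordinary, UTF8, UTF16, UTF32, Wide };

// Result of checking a sequence of adjacent string pieces.
struct EncResult {
  bool ok;
  Enc enc; // meaningful only when ok is true
};

// Ordinary pieces adopt the prefix of their neighbours; two different
// non-ordinary encodings are reported as incompatible.
inline EncResult commonEncoding(const std::vector<Enc> &pieces) {
  Enc result = Enc::Ordinary;
  for (Enc e : pieces) {
    if (e == Enc::Ordinary || e == result)
      continue;
    if (result != Enc::Ordinary)
      return {false, Enc::Ordinary};
    result = e;
  }
  return {true, result};
}
```

A sequence of an ordinary piece followed by a wide piece is accepted as wide, while a UTF-16 piece next to a UTF-32 piece is rejected.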


Update: I haven't had much time to work on this until now. I am still interested in finishing this task. I'll try to finish it before the LLVM 19 release.

@Endilll Endilll added the confirmed Verified by a second party label Nov 25, 2023
@RIscRIpt
Member

RIscRIpt commented Feb 7, 2024

Currently I have implemented the transformation of string prefixes (u, u8, U, L) to the appropriate __LPREFIX-style macro-function (via the undocumented __FSTREXP helper macro). Basically this works as follows:

// In this example macros are defined in reverse order to be able to read from top to bottom.
STR2(__FUNCTION__);
#define STR2(A) #A STR1(A) // Would get expanded to: "__FUNCTION__" STR1(__FSTREXP __FUNCTION__)
#define STR1(A) #A         // Would get expanded to: "__FUNCTION__" "__FSTREXP __FUNCTION__"
                  WIDE(__FUNCTION__)
#define WIDE(X)  _WIDE(X)
#define _WIDE(X) L##X

/*
                  WIDE(            __FUNCTION__)
                                   |
                                  /|
                                 / |
                                /  |
                               /   |
                               v   v
                 _WIDE(__FSTREXP   __FUNCTION__)
                    L##__FSTREXP   __FUNCTION__
                    || |           |
                    vv v           v
            __LPREFIX( __FUNCTION__)
*/

Now I am at the point of implementing the semantics of the __LPREFIX (and other) macros. As I understand it, tokens inside the __LPREFIX() parentheses are treated by MSVC as an independent string literal. E.g.

U"Hello" __UPREFIX(L" " "World")

The concatenation of U"Hello" L" " "World" is not valid by itself,
unless we apply __UPREFIX conversion first, which makes it U"Hello" U" World".

See https://godbolt.org/z/hcc8KGf5e
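
The "independent string literal" behaviour can be sketched with a small tree model. Node, Flat and flatten are hypothetical names, text is restricted to ASCII so that re-encoding is just a change of tag, and the sketch uses plain recursion for brevity where a real parser would unroll it. Rejecting mixed prefixes inside a cast is an assumption, not observed MSVC behaviour.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model; these types are illustrative and not part of Clang.
enum class Enc { Ordinary, UTF8, UTF16, UTF32, Wide };

struct Node {
  bool isCast;                // true for a cast group, false for a literal
  Enc enc;                    // literal prefix, or cast target
  std::string text;           // literal contents (ASCII only in this sketch)
  std::vector<Node> children; // operands of a cast
};

struct Flat {
  bool ok;
  Enc enc;
  std::string text;
};

// Concatenates sibling nodes. A cast group is flattened on its own first
// and then takes the cast's encoding, so its inner prefixes never meet
// the prefixes outside the parentheses.
inline Flat flatten(const std::vector<Node> &nodes) {
  Flat out{true, Enc::Ordinary, ""};
  for (const Node &n : nodes) {
    std::string t = n.text;
    if (n.isCast) {
      Flat inner = flatten(n.children);
      if (!inner.ok)
        return inner;
      t = inner.text; // re-encoding is only a relabel for ASCII text
    }
    if (n.enc != Enc::Ordinary) {
      if (out.enc != Enc::Ordinary && out.enc != n.enc)
        return {false, out.enc, ""};
      out.enc = n.enc;
    }
    out.text += t;
  }
  return out;
}
```

With the cast in place, the U"Hello" example flattens to a single UTF-32 string; without it, the U and L prefixes collide and the sequence is rejected.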


Another difficulty is that the __LPREFIX() macros accept function-local macros such as __FUNCTION__ among their parameters. These function-local macros are in turn context dependent and must be re-evaluated in a templated context.

I didn't take this into account when I implemented 66c43fb; for more info see my previous comments in this issue. This problem is going to be fixed as part of the patch I am currently working on.

I want to implement support for this behavior by introducing a new AST node (as mentioned above) called MSCompositeStringLiteral (never mind MSCastStringExpr; we won't need it).

See https://godbolt.org/z/qhqr5Gsbx


Implementation decision 1

Given all of the above, I see several possible implementations (disclaimer: where I write "recursive" I mean unrolled recursion using some container; I am aware that Clang does not welcome recursion in parsers):

  1. Refactor StringLiteralParser so that it supports recursive parsing inside __LPREFIX().
  2. Introduce a new MicrosoftStringLiteralParser which inherits from StringLiteralParser. This new parser would do the recursive parsing of __LPREFIX() and fill the fields of the StringLiteralParser base class.
  3. Extract the interface methods of StringLiteralParser into StringLiteralParserBase, and derive a new MicrosoftStringLiteralParser from StringLiteralParserBase.

I am leaning towards the last option because the pure string literal parser would not need to know about Microsoft-specific details, and would therefore be easier to maintain.


Regarding __LPREFIX(__FUNCTION__) support in MicrosoftStringLiteralParser: the result of this new parser would not be a single string. Instead it would consist of a list of strings of a single type (e.g. ordinary, wide, etc.) and tokens like __FUNCTION__.

Sema::ActOnStringLiteral would take this possibility into account and construct an MSCompositeStringLiteral accordingly.


Implementation decision 2

There are two ways to handle a template-dependent context for MSCompositeStringLiteral:

  1. MSCompositeStringLiteral::getString(Decl *Context), which would construct the string on request with respect to the current Decl context.
  2. Pre-compute the string representation and re-build it only in TreeTransform.

I have not yet decided which is better.

@Endilll
Contributor

Endilll commented Feb 8, 2024

CC @cor3ntin ^

@RIscRIpt
Member

RIscRIpt commented Feb 20, 2024

Regarding "implementation decision 1": I was concerned that I should rather follow (1) and embed __LPREFIX support straight into StringLiteralParser, because theoretically MSFT could support __LPREFIX in places that are handled outside of ActOnStringLiteral in terms of Clang code. So I looked at usages of StringLiteralParser and tested relevant usages in MSVC (the latest version 19.39). Looks like approach (3) still holds:

  • Sema::ActOnStringLiteral; this is OK, we want to add support for __LPREFIX here as per the initial idea
  • StringLiteral::getLocationOfByte; okay-ish, it is possible to handle __LPREFIX(__FUNCTION__) here with additional code
  • Sema::CheckLiteralOperatorDeclaration; __lPREFIX("") is not allowed in declaration of user defined literals
  • Parser::ParseUnqualifiedIdOperator; __lPREFIX("") is not allowed in declaration of user defined literals
  • Preprocessor::FinishLexStringLiteral; used in pragma handlers - MSVC does not recognize __LPREFIX in pragmas.
  • PragmaDebugHandler::HandlePragma; MSVC does not recognize string-prefix in pragmas
  • Preprocessor::HandleLineDirective; MSVC recognizes neither __LPREFIX nor string concatenation here
  • Preprocessor::HandleDigitDirective; same as above
  • ReadOriginalFileName (hack to read line/digit directive); same as above
  • LexModuleNameComponent; MSVC does not recognize string-prefix in module
  • ModuleMapParser::consumeToken; ??? same as above ???
  • __identifier lexer; MSVC does not allow string literals there at all; but clang does, that's weird. I am going to investigate this, and open an issue if needed. MSVC does not recognize string-prefix in __identifier.

@AaronBallman
Collaborator

Thank you for the detailed investigation into improved compatibility here! It was clearly a lot of work and it is truly appreciated. That said, I have some concerns with how much effort we would still need to put into this feature, and the long term maintenance costs for something that would not be used a lot. It's not a documented API from Microsoft, it isn't used in any Windows SDK, MS CRT, or other system headers (that I've been able to find, anyway), and it's not commonly used in the wild (at least with the best tools we have to search over a large corpus of code: https://sourcegraph.com/search?q=context:global+__LPREFIX+lang:C+lang:C%2B%2B&patternType=keyword&case=yes&sm=0). So this looks like a very large amount of implementation effort for a nominal feature.

This topic has come up before in https://bugs.llvm.org/show_bug.cgi?id=11789 and I think we ended up with the amount of support we'd like in 3a691a3 (patch discussion found at https://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120618/059525.html). In short, we don't support __LPREFIX but we do support combining L with predefined identifiers that are macro-like. We end up supporting a bit more than Microsoft does, but you have to enable -fms-extensions mode to get the behavior: https://godbolt.org/z/xbqqe3Ma5

I think we may want to close this issue as Won't Fix given how much effort is required for exact MS compatibility and how little use this extension seems to have in the wild. If it starts showing up in system headers, we may want to reconsider at that point.

@RIscRIpt
Member

RIscRIpt commented Feb 21, 2024

tl;dr I am willing to take the risk that my patch might not end up in main.

Thank you for the feedback and references! I understand your concerns regarding the maintenance cost and the amount of effort required to develop such support.

Some context: The main reason I started working on this issue is that my employer's product would benefit from such support. Initially I worked on it once per workweek; however, I currently have more important tasks, so I now allocate a few hours on weekends to this issue, mainly because of combined interests: it benefits my employer's commercial product and could also be useful to the LLVM community. That is why it takes me a long time (I had the same problem with msvc::constexpr, which took me a year to merge into main). I hope that in the future I can allocate more time to LLVM during the workweek.

Why is it important for our commercial product? Keeping it short, Clang cannot parse a source file that was preprocessed by MSVC, if it contains __LPREFIX. See: https://godbolt.org/z/fqvf6edx3

More context: We have workarounds, and we are planning to switch to using my patch that supports __LPREFIX in the AST, but the patch lacks semantics for it. I am sure the LLVM community won't be interested in such a patch (similarly to no-op C++11 attributes). So, ultimately, the goal would be to upstream the maintenance cost. I believe it's a win-win situation: Clang gets __LPREFIX support, my employer upstreams the maintenance cost, and I get "internet points".

Why is it important for Clang? Clang-cl is not able to produce the same results as MSVC when __FUNCTION__ is concatenated with other strings in a templated context (as I mentioned above with regard to D153914); see https://godbolt.org/z/W74rvMd8v

There are ways to make it work, but this would add even more workarounds to the existing workaround approach with L__FUNCTION__. I believe it's possible to reach parity with MSVC by making a proper implementation, as well as removing the L__FUNCTION__ and 66c43fb workarounds.

Where is L##__FUNCTION__ used? The Windows TraceLogging macros produce L##__FUNCTION__ internally. By compiling a product that uses TraceLogging with clang-cl instead of MSVC, one would get different logs.

I would like to finish this task, and I am willing to take the risk of having the merge rejected if the community doesn't like the final patch series.

At the moment I see the final result as series of several patches:

  1. Done: Add support of L##__FUNCTION__ to __LPREFIX(__FUNCTION__) conversion via __FSTREXP without semantics (changes mostly in Lexer).
  2. TODO: Extract interfaces of StringLiteralParser into StringLiteralParserBase
  3. TODO: Implement __LPREFIX semantics in MicrosoftStringLiteralParser
  4. Partially done: Use MicrosoftStringLiteralParser in Sema::ActOnStringLiteral and create MSCompositeStringLiteral if needed
  5. Partially done: Make sure MSCompositeStringLiteral is processed differently from StringLiteral
  6. Partially done: Handle MSCompositeStringLiteral in TreeTransform

I believe (2) and (3) should not be difficult; I just need to come up with a proper implementation.

I am open to discussion if someone is interested in collaborating. I could also create a more detailed development plan (instead of keeping everything in my unstructured notes and thoughts) if someone would find it useful.

@RIscRIpt
Member

My plans have changed. I don't have time to work on it either at work or during my free time. I'll try to get back to it during the next half of the year.


7 participants