Skip to content

Conversation

@charles-zablit
Copy link
Contributor

@charles-zablit charles-zablit commented Dec 9, 2025

This patch improves the way lldb checks if the terminal it's opened in (if any) supports Unicode or not.

On POSIX systems, we check if LANG contains UTF-8.

On Windows, we always return true since we use the WriteToConsoleW api.

This is a relanding of #168603.

The tests failed because the bots support Unicode but the tests expect ASCII. To avoid different outputs depending on the environment the tests are running in, this patch always force ASCII in the tests.

@llvmbot
Copy link
Member

llvmbot commented Dec 9, 2025

@llvm/pr-subscribers-lldb

Author: Charles Zablit (charles-zablit)

Changes

This patch improves the way lldb checks if the terminal it's opened in (if any) supports Unicode or not.

On POSIX systems, we check if LANG contains UTF-8.

On Windows, we always return true since we use the WriteToConsoleW api.

This is a relanding of #168603.


Full diff: https://github.com/llvm/llvm-project/pull/171491.diff

5 Files Affected:

  • (modified) lldb/include/lldb/Host/Terminal.h (+12)
  • (modified) lldb/include/lldb/Host/common/DiagnosticsRendering.h (+18-1)
  • (modified) lldb/source/Host/common/DiagnosticsRendering.cpp (+5-6)
  • (modified) lldb/source/Host/common/Terminal.cpp (+15)
  • (modified) lldb/unittests/Host/common/DiagnosticsRenderingTest.cpp (+1-1)
diff --git a/lldb/include/lldb/Host/Terminal.h b/lldb/include/lldb/Host/Terminal.h
index da0d05e8bd265..3d66515c18812 100644
--- a/lldb/include/lldb/Host/Terminal.h
+++ b/lldb/include/lldb/Host/Terminal.h
@@ -68,6 +68,18 @@ class Terminal {
 
   llvm::Error SetHardwareFlowControl(bool enabled);
 
+  /// Returns whether or not the current terminal supports Unicode rendering.
+  ///
+  /// The value is cached after the first computation.
+  ///
+  /// On POSIX systems, we check if the LANG environment variable contains the
+  /// substring "UTF-8", case insensitive.
+  ///
+  /// On Windows, we always return true since we use the `WriteConsoleW` API
+  /// internally. Note that the default Windows codepage (437) does not support
+  /// all Unicode characters. This function does not check the codepage.
+  static bool SupportsUnicode();
+
 protected:
   struct Data;
 
diff --git a/lldb/include/lldb/Host/common/DiagnosticsRendering.h b/lldb/include/lldb/Host/common/DiagnosticsRendering.h
index dd33d671c24a5..3eea0647da37e 100644
--- a/lldb/include/lldb/Host/common/DiagnosticsRendering.h
+++ b/lldb/include/lldb/Host/common/DiagnosticsRendering.h
@@ -59,10 +59,27 @@ struct DiagnosticDetail {
 
 StructuredData::ObjectSP Serialize(llvm::ArrayRef<DiagnosticDetail> details);
 
+/// Renders an array of DiagnosticDetail instances.
+///
+/// \param[in] stream
+///     The stream to render the diagnostics to.
+/// \param offset_in_command
+///     An optional offset to the column position of the diagnostic in the
+///     source.
+/// \param show_inline
+///     Whether to show the diagnostics inline.
+/// \param details
+///     The array of DiagnosticsDetail to render.
+/// \param force_ascii
+///     Whether to force ascii rendering. If false, Unicode characters will be
+///     used if the output file supports them.
+///
+/// \see lldb_private::Terminal::SupportsUnicode
 void RenderDiagnosticDetails(Stream &stream,
                              std::optional<uint16_t> offset_in_command,
                              bool show_inline,
-                             llvm::ArrayRef<DiagnosticDetail> details);
+                             llvm::ArrayRef<DiagnosticDetail> details,
+                             bool force_ascii = false);
 
 class DiagnosticError
     : public llvm::ErrorInfo<DiagnosticError, CloneableECError> {
diff --git a/lldb/source/Host/common/DiagnosticsRendering.cpp b/lldb/source/Host/common/DiagnosticsRendering.cpp
index f2cd3968967fb..2c9d33a6c325c 100644
--- a/lldb/source/Host/common/DiagnosticsRendering.cpp
+++ b/lldb/source/Host/common/DiagnosticsRendering.cpp
@@ -7,6 +7,8 @@
 //===----------------------------------------------------------------------===//
 
 #include "lldb/Host/common/DiagnosticsRendering.h"
+#include "lldb/Host/Terminal.h"
+
 #include <cstdint>
 
 using namespace lldb_private;
@@ -85,7 +87,8 @@ static llvm::raw_ostream &PrintSeverity(Stream &stream,
 void RenderDiagnosticDetails(Stream &stream,
                              std::optional<uint16_t> offset_in_command,
                              bool show_inline,
-                             llvm::ArrayRef<DiagnosticDetail> details) {
+                             llvm::ArrayRef<DiagnosticDetail> details,
+                             bool force_ascii) {
   if (details.empty())
     return;
 
@@ -97,12 +100,8 @@ void RenderDiagnosticDetails(Stream &stream,
     return;
   }
 
-  // Since there is no other way to find this out, use the color
-  // attribute as a proxy for whether the terminal supports Unicode
-  // characters.  In the future it might make sense to move this into
-  // Host so it can be customized for a specific platform.
   llvm::StringRef cursor, underline, vbar, joint, hbar, spacer;
-  if (stream.AsRawOstream().colors_enabled()) {
+  if (Terminal::SupportsUnicode() && !force_ascii) {
     cursor = "˄";
     underline = "˜";
     vbar = "│";
diff --git a/lldb/source/Host/common/Terminal.cpp b/lldb/source/Host/common/Terminal.cpp
index 436dfd8130d9b..dd1dc75133f45 100644
--- a/lldb/source/Host/common/Terminal.cpp
+++ b/lldb/source/Host/common/Terminal.cpp
@@ -400,6 +400,21 @@ llvm::Error Terminal::SetHardwareFlowControl(bool enabled) {
 #endif // LLDB_ENABLE_TERMIOS
 }
 
+bool Terminal::SupportsUnicode() {
+  static std::optional<bool> result;
+  if (result)
+    return result.value();
+#ifdef _WIN32
+  return true;
+#else
+  const char *lang_var = std::getenv("LANG");
+  if (!lang_var)
+    return false;
+  result = llvm::StringRef(lang_var).lower().find("utf-8") != std::string::npos;
+#endif
+  return result.value();
+}
+
 TerminalState::TerminalState(Terminal term, bool save_process_group)
     : m_tty(term) {
   Save(term, save_process_group);
diff --git a/lldb/unittests/Host/common/DiagnosticsRenderingTest.cpp b/lldb/unittests/Host/common/DiagnosticsRenderingTest.cpp
index 851b478def32e..78fc52ae11ad1 100644
--- a/lldb/unittests/Host/common/DiagnosticsRenderingTest.cpp
+++ b/lldb/unittests/Host/common/DiagnosticsRenderingTest.cpp
@@ -10,7 +10,7 @@ class ErrorDisplayTest : public ::testing::Test {};
 
 std::string Render(std::vector<DiagnosticDetail> details) {
   StreamString stream;
-  RenderDiagnosticDetails(stream, 0, true, details);
+  RenderDiagnosticDetails(stream, 0, true, details, true);
   return stream.GetData();
 }
 } // namespace

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat unfortunate to have to have an argument just for the test, but every alternative I can think of is objectively worse.

std::string Render(std::vector<DiagnosticDetail> details) {
StreamString stream;
RenderDiagnosticDetails(stream, 0, true, details);
RenderDiagnosticDetails(stream, 0, true, details, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RenderDiagnosticDetails(stream, 0, true, details, true);
RenderDiagnosticDetails(stream, 0, true, details, /*force_ascii=*/true);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

}

bool Terminal::SupportsUnicode() {
static std::optional<bool> result;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLDB coding style would be g_result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@charles-zablit
Copy link
Contributor Author

Somewhat unfortunate to have to have an argument just for the test, but every alternative I can think of is objectively worse.

The only other reasonable thing I could think of was to have an if statement in the test and expecting one output or the other depending on whether unicode is supported or not.

@charles-zablit charles-zablit merged commit 8445909 into llvm:main Dec 10, 2025
10 checks passed
@charles-zablit charles-zablit deleted the charles-zablit/lldb/better-unicode-check branch December 10, 2025 13:00
@slydiman
Copy link
Contributor

It seems a number of lldb-shell tests failed on Windows hosts after this patch
around CHECK1:{{^ \^ \^}} or CHECK1:{{^ \^~}}
https://lab.llvm.org/buildbot/#/builders/211/builds/4468
https://lab.llvm.org/buildbot/#/builders/197/builds/11724
Please take a look.

@charles-zablit
Copy link
Contributor Author

I opened a fix here: #171685

charles-zablit added a commit that referenced this pull request Dec 10, 2025
This patch fixes issues introduced by
#171491 when running tests in
CI.

The shell tests expect certain characters when matching diagnostics.
With #171491, those characters
can either be Unicode specific characters or their ASCII equivalent. The
tests were always expecting the ASCII version. This patch fixes this by
using a regex to match one or the other.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Dec 10, 2025
…#171685)

This patch fixes issues introduced by
llvm/llvm-project#171491 when running tests in
CI.

The shell tests expect certain characters when matching diagnostics.
With llvm/llvm-project#171491, those characters
can either be Unicode specific characters or their ASCII equivalent. The
tests were always expecting the ASCII version. This patch fixes this by
using a regex to match one or the other.
charles-zablit added a commit to charles-zablit/llvm-project that referenced this pull request Dec 11, 2025
@slydiman
Copy link
Contributor

#171685 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants