@snnn commented Jun 8, 2025

This refactoring addresses fundamental issues with Unicode and cross-platform compatibility in our logging sinks. The previous implementation, which relied on C++ iostreams, was not robust enough for a library like ONNX Runtime that must operate correctly in diverse host environments. For example, it does not work at all in Python on Windows. When you run onnx_backend_test_series.py on Windows, you see logs like this:
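[Screenshot: onnx_backend_test_series.py log output on Windows in which every character is followed by an extra space]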

Every character appears to be followed by an extra space. This is not a display issue, however; it is a character encoding issue.
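One mechanism consistent with this symptom (an assumption; the description does not trace the exact code path): ASCII text encoded as UTF-16LE, the in-memory form of wchar_t strings on Windows, has a 0x00 byte after every character, and a reader that treats the output as 8-bit text may show each of those zero bytes as a blank. The sketch below makes that concrete. The helper names are hypothetical, and rendering NUL as a space only mimics what such a console might display.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encodes UTF-16 code units as little-endian bytes,
// the layout of wchar_t text in memory on Windows.
std::vector<std::uint8_t> ToUtf16LeBytes(const std::u16string& s) {
  std::vector<std::uint8_t> bytes;
  bytes.reserve(s.size() * 2);
  for (char16_t unit : s) {
    bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));         // low byte first
    bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF));  // then high byte
  }
  return bytes;
}

// Hypothetical helper: shows what a byte-oriented reader sees, rendering each
// zero byte as a space (an assumption about how the terminal displays NUL).
std::string RenderAsBytes(const std::vector<std::uint8_t>& bytes) {
  std::string out;
  for (std::uint8_t b : bytes) {
    out.push_back(b == 0 ? ' ' : static_cast<char>(b));
  }
  return out;
}
```

For u"Log", ToUtf16LeBytes yields the six bytes 4C 00 6F 00 67 00, which RenderAsBytes shows as "L o g ".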

Background

Several alternative approaches were considered and rejected due to significant drawbacks:

  • The std::wcout and _setmode Trap: A common way to print Unicode on Windows is to use std::wcout after switching the stream to a Unicode mode (e.g., _O_U16TEXT) with _setmode. This approach is unsuitable for a library, for two critical reasons:

    • It breaks the host application: Once the stream mode is changed, standard multibyte functions like printf will no longer work correctly on that stream. As a library, ONNX Runtime cannot make such a disruptive global change.
    • It is incompatible with Python: When ONNX Runtime runs inside Python, the stdout and stderr streams are often in O_BINARY mode. std::wcout relies on text-mode translation, so it cannot work correctly on a stream in that binary mode.
  • Per-Call Mode Switching and Race Conditions: One might consider switching the stream's mode to Unicode before each write and switching it back immediately afterward. While this avoids a permanent global change, it introduces a significant thread-safety issue. If a thread in the host application writes to stdout while an ONNX Runtime thread has temporarily switched its mode, the output will likely be garbled anyway. More importantly, a library cannot assume that no two threads use the same stream concurrently, and making that assumption would let the mode-switching calls themselves race with each other. A library must not impose such threading constraints on its host.

  • The Limitations of Multibyte APIs (ANSI Code Pages): Using the multibyte -A versions of the Windows API is also not a viable solution for modern applications.

    • Legacy Technology: ANSI code pages are a legacy concept from the 1990s. Modern applications are strongly encouraged to use Unicode, as all modern Microsoft products and APIs use it internally (as UTF-16).
    • Data Loss and Incompatibility: A single-byte ANSI code page can represent at most 256 characters, and even the double-byte East Asian code pages cover only a fixed set of scripts, so no code page can represent text from several languages at once (e.g., Cyrillic and Greek). This leads to data loss and "mojibake" (garbled text) when characters are not in the active code page.
    • No Universal Coverage: There are many languages and scripts for which no suitable code page exists. The code page model fundamentally cannot handle the full breadth of Unicode.
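To make the data-loss point concrete, here is a minimal sketch of a single-byte encoder. The function is hypothetical and models only a small part of Windows-1251: ASCII is kept, and the basic Cyrillic letters U+0410..U+044F map to bytes 0xC0..0xFF as they do in that code page. The real code page has more entries; this subset is enough to show what happens to characters that have no byte.

```cpp
#include <string>

// Hypothetical single-byte encoder modeled on part of Windows-1251.
// Characters outside ASCII and the basic Cyrillic block become '?'.
std::string EncodeSingleByte(const std::u32string& text) {
  std::string out;
  out.reserve(text.size());
  for (char32_t cp : text) {
    if (cp < 0x80) {
      out.push_back(static_cast<char>(cp));  // ASCII: same byte
    } else if (cp >= 0x0410 && cp <= 0x044F) {
      out.push_back(static_cast<char>(0xC0 + (cp - 0x0410)));  // basic Cyrillic block
    } else {
      out.push_back('?');  // no byte for it (e.g. Greek): the character is lost
    }
  }
  return out;
}
```

Encoding U"\u0414\u0430 \u039D\u03B1\u03B9" ("Да Ναι", "yes" in Russian and then in Greek) keeps the two Cyrillic letters as bytes C4 E0 but turns all three Greek letters into "???", so the original text cannot be recovered. UTF-8 and UTF-16 have no such limit.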

Main changes:

  1. Refactored OStreamSink:
  • The platform-specific OStreamSink and WOStreamSink have been consolidated into a single, unified OStreamSink class that uses C-style FILE* streams.
  • On Windows, the sink now checks whether it is writing to an interactive console.
    • If it is a console, it uses the Win32 WriteConsoleW API to ensure correct Unicode rendering without altering stream modes or affecting other threads.
    • If the output is redirected to a file, it writes the UTF-8 string directly, which is the expected behavior for files.
  2. Modernized FileSink:
  • The FileSink has also been rewritten to use C-style file I/O (fopen, fprintf, fclose) for consistency and control.
  • The constructor now takes a std::filesystem::path for the file path, and a std::mutex provides thread safety.
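The two sinks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the class names StreamSinkSketch and FileSinkSketch and their Write(std::string) methods are hypothetical; console detection uses _get_osfhandle plus GetConsoleMode, which is one common way to ask whether a handle is a console (the PR text does not say which check it uses); and on Windows the path is opened with _wfopen because std::filesystem::path holds a wide string there (the PR text mentions only fopen).

```cpp
#include <cstdio>
#include <filesystem>
#include <mutex>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <io.h>       // _get_osfhandle
#include <windows.h>  // GetConsoleMode, MultiByteToWideChar, WriteConsoleW
#endif

// Hypothetical stand-in for the unified OStreamSink (not ONNX Runtime's API).
// The mutex only serializes writes made through this object.
class StreamSinkSketch {
 public:
  explicit StreamSinkSketch(std::FILE* stream) : stream_(stream) {}

  void Write(const std::string& utf8) {
    std::lock_guard<std::mutex> lock(mutex_);
#ifdef _WIN32
    const int fd = _fileno(stream_);
    HANDLE h = fd >= 0 ? reinterpret_cast<HANDLE>(_get_osfhandle(fd)) : INVALID_HANDLE_VALUE;
    DWORD mode = 0;
    if (h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode) && !utf8.empty()) {
      // Interactive console: convert UTF-8 to UTF-16 and write through the
      // console API, leaving the CRT stream mode untouched.
      const int len = static_cast<int>(utf8.size());
      const int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, nullptr, 0);
      if (n > 0) {
        std::wstring wide(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, wide.data(), n);
        std::fflush(stream_);  // emit bytes already buffered by the CRT first
        DWORD written = 0;
        WriteConsoleW(h, wide.data(), static_cast<DWORD>(wide.size()), &written, nullptr);
        return;
      }
    }
#endif
    // Redirected to a file or pipe (or not on Windows): write the UTF-8 bytes unchanged.
    std::fwrite(utf8.data(), 1, utf8.size(), stream_);
    std::fflush(stream_);
  }

 private:
  std::FILE* stream_;
  std::mutex mutex_;
};

// Hypothetical stand-in for the rewritten FileSink.
class FileSinkSketch {
 public:
  explicit FileSinkSketch(const std::filesystem::path& path) {
#ifdef _WIN32
    file_ = _wfopen(path.c_str(), L"ab");  // wide path avoids the ANSI code page
#else
    file_ = std::fopen(path.c_str(), "ab");
#endif
    if (file_ == nullptr) throw std::runtime_error("cannot open log file");
  }
  ~FileSinkSketch() { std::fclose(file_); }
  FileSinkSketch(const FileSinkSketch&) = delete;
  FileSinkSketch& operator=(const FileSinkSketch&) = delete;

  void Write(const std::string& utf8) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::fprintf(file_, "%s", utf8.c_str());
    std::fflush(file_);
  }

 private:
  std::FILE* file_ = nullptr;
  std::mutex mutex_;
};
```

Flushing the FILE* before calling WriteConsoleW matters when the same stream is also written through stdio: without it, text buffered by the CRT could appear after text written directly to the console.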

C-style FILE* streams were chosen over C++ iostreams because the C functions have fewer layers. For example, we do not need to consider the effects of a locale imbued on the stream (imbue).
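One concrete effect of imbue: a locale installed on a C++ stream changes how that stream formats numbers, while C stdio ignores it (stdio follows the global C locale set by setlocale instead). The sketch below uses a hypothetical GroupedDigits facet as a stand-in for whatever locale a host application might imbue; the PR does not say which imbue effects it had in mind, so this is only one illustration.

```cpp
#include <cstdio>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: groups digits in threes with ','.
struct GroupedDigits : std::numpunct<char> {
  char do_thousands_sep() const override { return ','; }
  std::string do_grouping() const override { return "\3"; }
};

// iostream path: output depends on the locale imbued on the stream.
std::string FormatWithImbuedStream(long value) {
  std::ostringstream os;
  os.imbue(std::locale(os.getloc(), new GroupedDigits));  // locale takes ownership
  os << value;
  return os.str();
}

// stdio path: unaffected by any stream's imbue(); %ld never inserts grouping separators.
std::string FormatWithStdio(long value) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%ld", value);
  return buf;
}
```

With this facet, FormatWithImbuedStream(1234567) returns "1,234,567", whereas FormatWithStdio(1234567) returns "1234567".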

@snnn closed this Nov 20, 2025