@vivche vivche commented Jan 27, 2026

Fixes #644 - Windows Unicode Encoding Issue Report

Problem

The application crashes on Windows when processing or displaying Unicode characters beyond the Western European character set. This critical cross-platform compatibility issue occurs because:

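  • Windows Default: Python often falls back to a legacy code page such as cp1252 for stdout/stderr, typically when output is redirected or piped (a single-byte encoding limited to 256 code points, mostly Western European characters)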
  • Modern Web/Cloud: Azure services and web applications use UTF-8 encoding universally
  • Result: Application crashes when logging or displaying emojis, international characters, IPA symbols, or special formatting
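The failure can be reproduced on any platform without touching the console, by encoding a string directly with cp1252. This is a minimal sketch; `try_encode` and the sample text are hypothetical names chosen for illustration and are not taken from app.py.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# U+0259 is the IPA schwa; U+1F600 is an emoji. Neither exists in cp1252.
sample = "IPA: \u0259  emoji: \U0001F600"

cp1252_result = try_encode(sample, "cp1252")  # str: the 'charmap' error text
utf8_result = try_encode(sample, "utf-8")     # bytes: encoding succeeded

print(cp1252_result)
print(utf8_result)
```

The cp1252 attempt yields the same `'charmap' codec can't encode character` message that appears in the crash reports, while the UTF-8 attempt returns bytes.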

This affects multiple areas including:

  • ✅ Video transcripts with phonetic symbols
  • ✅ Chat messages containing emojis or international text
  • ✅ Agent responses with Unicode formatting
  • ✅ Debug logging across the entire application
  • ✅ Error messages and stack traces

Common Error: UnicodeEncodeError: 'charmap' codec can't encode character '\uXXXX'

Solution

Configured UTF-8 encoding globally at application startup for Windows platforms. This ensures:

  • Consistent encoding across all output streams
  • Support for all Unicode characters (over 1.1 million code points, versus 256 in cp1252)
  • Cross-platform compatibility (matches Linux/macOS behavior)

Changes

  • Modified app.py to reconfigure sys.stdout and sys.stderr to UTF-8 on Windows
  • Applied at top of file before any imports or print statements
  • Includes fallback for older Python versions (<3.7)
  • Platform-specific fix (only applies on Windows)
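A rough sketch of what such a startup block might look like follows. It is not the code merged into app.py: the function name `force_utf8_streams`, the `platform` parameter (added only so the logic can be exercised off Windows), and the `errors="backslashreplace"` choice are all assumptions made for illustration.

```python
import io
import sys


def force_utf8_streams(platform=sys.platform):
    """Switch sys.stdout/sys.stderr to UTF-8 on Windows.

    Returns True if at least one stream was changed, False otherwise.
    """
    if not platform.startswith("win"):
        return False  # Linux/macOS are left untouched

    changed = False
    for name in ("stdout", "stderr"):
        stream = getattr(sys, name)
        if stream is None:
            continue  # e.g. no console attached
        if hasattr(stream, "reconfigure"):
            # Python 3.7+: change the encoding of the existing stream in place
            stream.reconfigure(encoding="utf-8", errors="backslashreplace")
            changed = True
        elif hasattr(stream, "buffer"):
            # Older Python: wrap the underlying binary buffer instead
            wrapped = io.TextIOWrapper(
                stream.buffer,
                encoding="utf-8",
                errors="backslashreplace",
                line_buffering=True,
            )
            setattr(sys, name, wrapped)
            changed = True
    return changed


# Intended to run at the very top of the entry point, before other imports log anything
force_utf8_streams()
```

Checking for the `reconfigure` attribute rather than comparing version numbers also covers cases where `sys.stdout` has been replaced by an object that lacks the method.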

Testing

  • ✅ Video processing with IPA phonetic symbols in transcripts
  • ✅ Chat messages with emojis and international characters
  • ✅ Debug logging with Unicode content
  • ✅ Verified no impact on Linux/macOS deployments
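The manual checks above could be complemented by an automated one that does not need a Windows machine. The sketch below simulates a cp1252 output stream with `io.TextIOWrapper` over an in-memory buffer; `write_unicode` and its parameters are hypothetical names, and this is not a test from the repository.

```python
import io


def write_unicode(stream_encoding, text, reconfigure_to=None):
    """Write text to an in-memory text stream and return the bytes produced.

    Returns None if the stream's encoding cannot represent the text.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=stream_encoding)
    if reconfigure_to is not None:
        # Same call the fix applies to sys.stdout/sys.stderr (Python 3.7+)
        stream.reconfigure(encoding=reconfigure_to)
    try:
        stream.write(text)
        stream.flush()
        return raw.getvalue()
    except UnicodeEncodeError:
        return None
    finally:
        stream.detach()  # release the buffer without closing it


sample = "\u0259 \U0001F600"  # IPA schwa and an emoji

before_fix = write_unicode("cp1252", sample)
after_fix = write_unicode("cp1252", sample, reconfigure_to="utf-8")
```

Here `before_fix` is `None` because cp1252 cannot encode the sample, and `after_fix` holds the UTF-8 bytes once the stream has been reconfigured.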

- Added explicit UTF-8 encoding when reading file content on Windows
- Prevents UnicodeDecodeError when processing non-ASCII filenames
- Ensures consistent file handling across different operating systems
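As an illustration of the file-reading change, passing `encoding="utf-8"` to `open()` keeps Python from falling back to the platform's default encoding. The helper `read_text_utf8` and the temporary file below are hypothetical and only sketch the idea; the actual call sites in the repository are not shown in this pull request.

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the platform default encoding."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Round-trip a string that cp1252 could not represent
expected = "caf\u00e9 \u0259 \U0001F600"
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "transcript.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write(expected)
    round_trip = read_text_utf8(sample_path)
```

Without the explicit argument, the same read on Windows may use the ANSI code page and either raise `UnicodeDecodeError` or silently produce mojibake.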

@paullizer paullizer left a comment

Nice, straightforward.

@paullizer paullizer merged commit 0e0c437 into microsoft:Development Jan 30, 2026
3 of 4 checks passed
paullizer added a commit that referenced this pull request Jan 30, 2026
* Add custom subdomain support for OpenAI and Speech Service in Terraform

- Added custom_subdomain_name to OpenAI resource for managed identity authentication
- Created Speech Service resource with custom subdomain configuration
- Added RBAC role assignments for Speech Service (Managed Identity and App Service MI)
- Includes Cognitive Services Speech User and Speech Contributor roles
- Documentation: Azure Speech managed identity setup guide

* Fix Azure AI Search test connection with managed identity

Replaced REST API approach with SearchIndexClient SDK to properly handle managed identity authentication in Azure public cloud. The SDK automatically handles token acquisition and endpoint construction, eliminating the 'search_resource_manager is not defined' error that occurred with the REST API approach.

* Corrected file folder name

* Corrected the version number to reference 0.236.012

* Removed unneeded folder and document

* Revert terraform main.tf to upstream/Development version

* updated the logging logic when running retention delete with archiving enabled (#642)

* Corrected version to 0.236.011 (#645)

* v0.237.001 (#649)

* Use Microsoft python base image

* Add python ENV vars

* Install deps to system

* Add temp dir to image and pip conf support

* Add custom-ca-certificates dir

* Logo bug fix (#654)

* release note updating for GitHub Copilot

* fixed logo bug issue

* added 2, 3, 4, 5, 6, and 14 days to retention policy

* added retention policy time updates

* Retention policy (#657)

* Critical Retention Policy Deletion Fix

* Create RETENTION_POLICY_NULL_LAST_ACTIVITY_FIX.md

* fixed retention policy runtime bug and sidebar bug (#672)

* Fix: Windows Unicode encoding issue for video uploads (#662)

- Added explicit UTF-8 encoding when reading file content on Windows
- Prevents UnicodeDecodeError when processing non-ASCII filenames
- Ensures consistent file handling across different operating systems

Co-authored-by: Chen, Vivien <Vivien.Chen+ecolab@ecolab.com>

* Update docs/how-to/azure_speech_managed_identity_manul_setup.md (#675)

Co-authored-by: vivche <vivche@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add custom subdomain support for OpenAI and Speech Service in Terraform (#558)

* Add custom subdomain support for OpenAI and Speech Service in Terraform

- Added custom_subdomain_name to OpenAI resource for managed identity authentication
- Created Speech Service resource with custom subdomain configuration
- Added RBAC role assignments for Speech Service (Managed Identity and App Service MI)
- Includes Cognitive Services Speech User and Speech Contributor roles
- Documentation: Azure Speech managed identity setup guide

* Update docs/how-to/azure_speech_managed_identity_manul_setup.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Chen, Vivien <Vivien.Chen+ecolab@ecolab.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* 0.237.006 (#676)

* Update chat-sidebar-conversations.js

* 0.237.006

* Update release_notes.md

---------

Co-authored-by: Chen, Vivien <Vivien.Chen+ecolab@ecolab.com>
Co-authored-by: Ed Clark <clarked@microsoft.com>
Co-authored-by: Bionic711 <13358952+Bionic711@users.noreply.github.com>
Co-authored-by: vivche <vivche@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
paullizer added a commit that referenced this pull request Jan 30, 2026
* fixed sidebar race condition (#679)

---------

Co-authored-by: Chen, Vivien <Vivien.Chen+ecolab@ecolab.com>
Co-authored-by: Ed Clark <clarked@microsoft.com>
Co-authored-by: Bionic711 <13358952+Bionic711@users.noreply.github.com>
Co-authored-by: vivche <vivche@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>