Payloads as varchar and charset of payloads #164

Open
Francois-BellegardeOSF opened this issue Apr 12, 2023 · 7 comments
Labels
question Further information is requested

Comments

@Francois-BellegardeOSF

Using varchar(max) instead of nvarchar(max) for payloads causes issues when payloads contain characters that the varchar column's code page cannot represent.

The workaround would be to encode every message, but that solution is better suited to new apps, and even then it's a bit awkward.

Migrating existing apps could really be problematic.
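One way the "encode every message" workaround could look is sketched below in Python. It is only an illustration and not part of this project: `encode_payload` and `decode_payload` are hypothetical helper names, and the sketch assumes the payload is JSON-serializable. It relies on `json.dumps` escaping every non-ASCII character as `\uXXXX`, so the stored text is plain ASCII and fits in a varchar column under any collation.

```python
import json


def encode_payload(value) -> str:
    """Serialize a payload to ASCII-only JSON text (hypothetical helper).

    ensure_ascii=True makes json.dumps write every non-ASCII character
    as a \\uXXXX escape, so the result contains only ASCII characters.
    """
    return json.dumps(value, ensure_ascii=True)


def decode_payload(text: str):
    """Restore the original payload from the ASCII-only JSON text."""
    return json.loads(text)


if __name__ == "__main__":
    original = {"output": "done \U0001F600"}  # U+1F600 is an emoji
    stored = encode_payload(original)
    print(stored)                    # ASCII-only text, safe for varchar
    print(decode_payload(stored) == original)
```

The cost is the awkwardness described above: every writer and every reader of the payload has to agree on the extra encoding step, which is why it fits new apps better than existing ones.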

For example, I added an emoji to an activity output. Note the '??' that replaced it. With Azure Storage, there is no charset issue.

[Image: screenshot of the activity output with '??' in place of the emoji]
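A rough model of where the '??' could come from, as a Python sketch. It assumes the database uses a code page 1252 collation such as SQL_Latin1_General_CP1_CI_AS, and it assumes the conversion to varchar emits one '?' for each UTF-16 code unit it cannot map, which would match the two question marks seen for a single emoji. `to_code_page` is a hypothetical name for illustration, not a SQL Server or project API.

```python
def to_code_page(text: str, code_page: str = "cp1252") -> str:
    """Approximate a lossy conversion to a single-byte code page.

    Characters the code page can represent pass through unchanged.
    Any other character becomes one '?' per UTF-16 code unit
    (an assumption about the conversion, not documented behavior).
    """
    out = []
    for ch in text:
        try:
            ch.encode(code_page)
            out.append(ch)
        except UnicodeEncodeError:
            # Characters outside the BMP take two UTF-16 code units.
            units = len(ch.encode("utf-16-le")) // 2
            out.append("?" * units)
    return "".join(out)


if __name__ == "__main__":
    print(to_code_page("caf\u00e9"))          # representable in cp1252
    print(to_code_page("done \U0001F600"))    # emoji is not representable
```

Under this model, accented Latin characters survive the conversion, while an emoji (outside the Basic Multilingual Plane) is replaced by two question marks.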

@cgillum
Member

cgillum commented Apr 12, 2023

@Francois-BellegardeOSF what collation is your database using?

SELECT DB_NAME() As DB, DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS Collation

@cgillum
Member

cgillum commented Apr 12, 2023

The schema we use assumes you are using a _UTF8 database collation. The default setup uses Latin1_General_100_BIN2_UTF8, which should be able to handle non-ASCII characters fine in varchar columns.

Support for UTF-8 collation was added in SQL Server 2019: https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-utf-8-support-for-sql-server/ba-p/734928.
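As a small illustration of why a _UTF8 collation avoids the problem, the Python sketch below shows that UTF-8 can represent any Unicode character as a sequence of one to four bytes and round-trips it without loss. The helper names are hypothetical, and the sketch only models the encoding itself, not SQL Server's storage.

```python
def utf8_byte_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def survives_utf8(text: str) -> bool:
    """True if encoding to UTF-8 and decoding returns the same text."""
    return text.encode("utf-8").decode("utf-8") == text


if __name__ == "__main__":
    for sample in ["a", "\u00e9", "\U0001F600"]:
        print(repr(sample), utf8_byte_length(sample), survives_utf8(sample))
```

One consequence worth keeping in mind: non-ASCII characters take more than one byte each, so text with many of them occupies more bytes than its character count suggests.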

@Francois-BellegardeOSF
Author

Francois-BellegardeOSF commented Apr 12, 2023

When I create a DB, it still defaults to SQL_Latin1_General_CP1_CI_AS. And apparently, this is also what IT did in production.

I tested reimporting the DB after editing model.xml in the .bacpac and it solved this issue.
It will be unpleasant and maybe impractical for large databases, but in my case it's manageable.

Thank you for clearing this up. Could this not have been made to explicitly use a UTF8 collation on the columns?
If people edit their schema to use a different collation, would it cause an issue? I would think that as long as every column is changed, at least there would not be internal issues. I wonder how future-proof this would be.

@Francois-BellegardeOSF
Author

@cgillum
I just realized this is a case-sensitive collation. That makes it a huge breaking change for existing databases.
Would you say the schema is safe to use with a case-insensitive equivalent?

@cgillum
Member

cgillum commented Apr 14, 2023

@Francois-BellegardeOSF yes, this particular collation was chosen for performance reasons. I also like to use it at development time to make sure we maintain compatibility with case-sensitive databases. But you can still use our schema with case-insensitive databases. The UTF-8 issue should be the only compatibility concern.

@cgillum cgillum added question Further information is requested and removed Needs: Triage 🔍 labels Apr 14, 2023
@Francois-BellegardeOSF
Author

@cgillum
Thank you for this prompt and clear response. This is helping me immensely.

@jianjunwang2

@cgillum, we can't change the collation of our running production DB. Also, the DTF tables are in the same DB as other business tables, so changing the collation would break our other logic.
Is there another way to resolve this issue?

3 participants