Martin J. Evans authored
=head1 Some common unicode problems and solutions using Perl DBD::ODBC and MS SQL Server

=head2 Introduction

Here I have tried to collect useful information to help you use
Unicode in MS SQL Server from Perl DBD::ODBC. Many of the problems
listed came from questions on the dbi-users list, perlmonks or emails
direct to me.

I've tried very hard to make all the examples work on Windows
and Unix but the facts are:
=over

=item not all ODBC drivers are equal

Differences in ODBC drivers' support for the ODBC specification, and
their bugs, mean they all behave a little differently. Also, many ODBC
drivers for Unix have multiple ways to configure them for different
ways of supporting Unicode (e.g., the ODBC way, returning wide
characters, or converting these to UTF-8).

=item SQL Server is not constant

Different versions of MS SQL Server add new features or bug fixes e.g.,
MS SQL Server did not formally support supplemental characters
but now does if you use the correct collation.

=back
=head2 Terminology

In this document I repeatedly use some terminology which needs further
explanation.

=over

=item Encoding

By encoding, I mean how unicode characters are encoded e.g., they
could be encoded in UTF-8, UTF-16, UCS-2 etc. In Perl, unicode
characters are encoded internally in UTF-8, but you mostly don't need
to know that, although DBD::ODBC does.

=item Wide characters

In the ODBC API, wide characters were originally 2 bytes and UCS-2
encoded. However, Microsoft's idea of wide characters keeps changing
and now a wide character is sometimes 4 bytes in UTF-16 (a surrogate
pair of two 2-byte code units).

=item Wide APIs

The ODBC wide APIs are those called SQLxxxW e.g., SQLDriverConnectW. Any
string arguments to wide APIs expect UCS-2 (normally, and sometimes
UTF-16).

=item SQL_WCHAR and SQL_WVARCHAR

SQL_WCHAR and SQL_WVARCHAR are actually macros in the C ODBC API which
are assigned numbers and passed into some ODBC APIs to tell the
ODBC driver to return L</Wide characters>.

You can use these macros independently of using the ODBC L</Wide APIs>
i.e., you don't have to use SQLDriverConnectW, SQLPrepareW etc. just
to get L</Wide characters> back from an ODBC driver; SQLBindCol
and SQLBindParameter often support SQL_WCHAR/SQL_WVARCHAR as well.

=back
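The differences between these terms can be seen directly in Perl. This is a
minimal sketch using only the core Encode module; it shows the same Euro
character in the encodings mentioned above, and how a character above
U+FFFF grows to 4 bytes (two code units) in UTF-16:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $euro = "\x{20ac}";          # one unicode character

# Perl holds the Euro internally as UTF-8: 3 bytes
print "UTF-8   : ",
    join(",", map { sprintf "%02x", ord } split //, encode('UTF-8', $euro)),
    "\n";                       # e2,82,ac

# The ODBC wide APIs expect 2-byte code units (UCS-2/UTF-16)
print "UTF-16LE: ",
    join(",", map { sprintf "%02x", ord } split //, encode('UTF-16LE', $euro)),
    "\n";                       # ac,20

# A character outside the Basic Multilingual Plane needs a surrogate
# pair, i.e. two 2-byte code units (4 bytes) in UTF-16
print "UTF-16LE bytes for U+2070E: ",
    length(encode('UTF-16LE', "\x{2070E}")), "\n";   # 4
```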
=head2 Some DBD::ODBC background/history

DBD::ODBC has not always supported Unicode. It all started with a
patch from Alexander Foken, and around version 1.14 Alexander's
original patch was adapted to include optional Unicode support for
Unix.

For DBD::ODBC, building without unicode support really means "build as
DBD::ODBC worked before unicode support was added", to maintain
backwards compatibility.

The initial Unicode support was for Windows only and allowed you to
send/retrieve nchar/nvarchar columns as SQL_WCHARs, meaning DBD::ODBC:

o used Windows APIs like WideCharToMultiByte and MultiByteToWideChar
  to convert from UCS-2 to UTF-8 and vice versa.

o passed SQL_WCHAR to SQLBindCol for nchar columns, meaning UCS-2
  data could be retrieved from a column.

o marked converted UTF-8 data returned from the database as UTF-8 for Perl.

o unicode data in Perl which was bound to placeholders was bound
  as SQL_WCHAR and passed to SQLBindParameter, meaning you could
  write unicode data into columns.

Since then unicode support has grown to include:

o unicode support in SQL (1.16_2)
o unicode support in connection strings (1.16_2)
o unicode support in column names
o unicode support in metadata calls (1.32_3)

For full documentation on Unicode in DBD::ODBC see the Unicode
section in the DBD::ODBC pod.
=head2 Using non-Unicode aware components

The ODBC API has two main sets of APIs; the ANSI API (SQLxxxA APIs)
and the unicode API (SQLxxxW APIs). By default, DBD::ODBC uses the
unicode API on Windows and the ANSI API on non-Windows
platforms. There are good historical reasons for this beyond the scope
of this article. If you want to read/write/update/select unicode data
with DBD::ODBC and MS SQL Server from non-Windows platforms you need:

=over

=item unixODBC

Use the unixODBC ODBC Driver Manager. You will need the unixodbc and
unixodbc-dev packages, or you can build it yourself quite easily.

=item DBD::ODBC

Build DBD::ODBC for the unicode API i.e., build it with

  perl Makefile.PL -u

If you install DBD::ODBC from the CPAN shell (in Unix), by default
you'll get a DBD::ODBC which does not use the wide ODBC APIs.

=item ODBC Driver

Use a MS SQL Server ODBC driver which supports the unicode APIs. The
Easysoft ODBC driver for MS SQL Server supports all the wide ODBC APIs,
as does the Microsoft ODBC driver.

=back
=head2 How can you tell if what you've already got supports the unicode ODBC API?

=over

=item odbc_has_unicode

For DBD::ODBC you need to get a connection established to MS SQL
Server and then you can test the odbc_has_unicode attribute:

  perl -MDBI -le 'my $h = DBI->connect; print $h->{odbc_has_unicode};'

If this outputs 1, your DBD::ODBC was built for Unicode. Any false
value means DBD::ODBC is not using the unicode APIs.

=item Use a recent unixODBC

Use the most recent unixODBC you can get hold of. Most packaged
unixODBCs (e.g., for Ubuntu, Debian etc.) are quite old, but even those
support Unicode. We recommend a recent unixODBC because of important
bug fixes in recent releases.

=item Unicode/Wide APIs supported by ODBC Driver

Find out if your ODBC driver supports the unicode APIs (all Easysoft
ODBC drivers do). This is a lot harder than it sounds and you'll most
likely have to consult the driver documentation.

If you understand shared objects in Unix you can try looking for
SQLxxxW APIs being exported by the driver shared library e.g.,

  nm /usr/local/easysoft/sqlserver/lib/ | grep SQLDriverConnectW
  0001cf80 T SQLDriverConnectW

=back
=head2 What happens if I try and use unicode from Perl DBD::ODBC and a component in the chain does not support the unicode APIs?

The simple answer is you won't be able to insert/update/delete/select
Unicode data from MS SQL Server, but to be honest this is too
simplistic and it is worth looking at some examples.

ex1. Simple insert/select with non-unicode built DBD::ODBC
<code>
use 5.008001;
use strict;
use warnings;
use DBI qw{:utils};

my $unicode = "\x{20ac}"; # unicode euro symbol
my $h = DBI->connect or die $DBI::errstr;
$h->{RaiseError} = 1;

eval {$h->do(q/drop table unicode_test/)};
$h->do(q/create table unicode_test (a nvarchar(20))/);

my $s = $h->prepare(q/insert into unicode_test (a) values(?)/);
$s->execute($unicode);

my $r = $h->selectall_arrayref(q/select a from unicode_test/);
my $data = $r->[0][0];
print "DBI describes data as: ", data_string_desc($data), "\n";
print "Data Length: ", length($data), "\n";
print "hex ords: ";
foreach my $c (split(//, $data)) {
    print sprintf("%x,", ord($c));
}
print "\n";
</code>
which outputs:

<output>
DBI describes data as: UTF8 off, non-ASCII, 3 characters 3 bytes
Data Length: 3
hex ords: e2,82,ac,
</output>
and as you can see, we attempted to insert a unicode Euro symbol and
when we selected it back we got 3 characters and 3 bytes instead of 1
character and 3 bytes; this is confirmed by the fact that the Perl data
contains a UTF-8 encoded Euro.
An explanation of what happened above:

=over

=item 1

The column was created as an nvarchar so MS SQL Server should be happy
to accept unicode characters for the column data.

=item 2

DBD::ODBC prepared the SQL and asked the ODBC driver to describe the
parameter, where it was told it was a UNICODE VARCHAR of length 20
characters. However, it bound the parameter with a value type of
SQL_C_CHAR and a parameter type of SQL_WVARCHAR, so the driver
interpreted each byte as a character.

=item 3

When we read the data back we got the bytes back as Perl had encoded
the Euro internally (UTF-8).

=back
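What happened at the byte level can be reproduced without a database:
binding the value as SQL_C_CHAR effectively handed the driver each byte of
Perl's internal UTF-8 encoding as a separate character. A sketch using only
the core Encode module (no DBI required):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $euro  = "\x{20ac}";                 # 1 character in Perl
my $bytes = encode('UTF-8', $euro);     # "\xe2\x82\xac" - 3 bytes

# The driver saw 3 separate characters, and 3 "characters" came back
print "characters seen by the driver: ", length($bytes), "\n";   # 3
print "hex ords: ",
    join(",", map { sprintf "%x", ord } split //, $bytes), "\n"; # e2,82,ac

# Decoding the bytes as UTF-8 recovers the single Euro character
my $recovered = decode('UTF-8', $bytes);
print "after decode: ", length($recovered), " character\n";      # 1
```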
You might be asking yourself at this point why DBD::ODBC bound the
data with a value type of SQL_C_CHAR, and the answer is backwards
compatibility i.e., that is what it did for a long time before support
for the unicode API was added.
=head3 So what if we force DBD::ODBC to bind the data as SQL_WCHAR?

ex2. Simple insert/select with non-unicode built DBD::ODBC forcing SQL_WCHAR

The code for this is nearly identical to the above except we add a
bind_param call and import :sql_types from DBI.
<code>
use 5.008001;
use strict;
use warnings;
use DBI qw{:utils :sql_types};

my $unicode = "\x{20ac}"; # unicode euro symbol
my $h = DBI->connect or die $DBI::errstr;
$h->{RaiseError} = 1;

eval {$h->do(q/drop table unicode_test/)};
$h->do(q/create table unicode_test (a nvarchar(20))/);

my $s = $h->prepare(q/insert into unicode_test (a) values(?)/);
$s->bind_param(1, undef, {TYPE => SQL_WVARCHAR});
$s->execute($unicode);

my $r = $h->selectall_arrayref(q/select a from unicode_test/);
my $data = $r->[0][0];
print "DBI describes data as: ", data_string_desc($data), "\n";
print "Data Length: ", length($data), "\n";
print "hex ords: ";
foreach my $c (split(//, $data)) {
    print sprintf("%x,", ord($c));
}
print "\n";
</code>
and the output is:

<output>
DBI describes data as: UTF8 off, non-ASCII, 3 characters 3 bytes
Data Length: 3
hex ords: e2,82,ac,
</output>
Exactly the same as before. Why? The TYPE argument passed to bind_param
sets the SQL type (the parameter type) and not the value type in the
SQLBindParameter call.
=head3 Reading properly written unicode in a non-unicode built DBD::ODBC

Now what if the unicode Euro was inserted by something else correctly and we
want to read it using a non-unicode built DBD::ODBC?

ex3. Reading unicode from a non-unicode built DBD::ODBC

We've got a valid unicode Euro symbol in the database (don't worry about how
for now; this is just showing what happens when the data in the database
is correct but you use the wrong method to get it).
<code>
use 5.008001;
use strict;
use warnings;
use DBI qw{:utils};

my $h = DBI->connect or die $DBI::errstr;
$h->{RaiseError} = 1;

my $r = $h->selectall_arrayref(q/select a from unicode_test/);
my $data = $r->[0][0];
print "DBI describes data as: ", data_string_desc($data), "\n";
print "Data Length: ", length($data), "\n";
print "hex ords: ";
foreach my $c (split(//, $data)) {
    print sprintf("%x,", ord($c));
}
print "\n";
</code>
which outputs:

<output>
DBI describes data as: UTF8 off, non-ASCII, 1 characters 1 bytes
Data Length: 1
hex ords: 80,
</output>
To be honest, what you get back in the data here very much depends on the
ODBC driver and platform. On Windows you'd probably get the above
because 0x80 is the windows-1252 character for a Euro (if it had been
something not in windows-1252, it would probably have returned a
question mark). With some unix MS SQL Server ODBC drivers you could get
any of the following (and perhaps more):
=over

=item o

0xac (the low byte of 0x20ac)

=item o

0x3f (a question mark, because the driver cannot convert a wide
character to a SQL_C_CHAR)

=item o

0x80, if you set the client character-set to windows-1252

=item o

0xe2, 0x82, 0xac (a UTF-8 encoded Euro)

=back
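All of these outcomes are just different byte-level conversions of the same
stored wide character. Here is a sketch of each conversion using the core
Encode module (the driver-specific behaviour is simulated in Perl here, not
queried from any actual driver):

```perl
use strict;
use warnings;
use Encode qw(encode);

my $euro = "\x{20ac}";

# low byte of the UCS-2 code unit 0x20ac
printf "low byte          : %x\n", ord($euro) & 0xff;              # ac

# a target character set with no Euro substitutes a question mark
printf "as ascii          : %s\n", encode('ascii', $euro);         # ?

# client character set windows-1252 has the Euro at 0x80
printf "as windows-1252   : %x\n", ord(encode('cp1252', $euro));   # 80

# a driver converting to UTF-8 gives three bytes
printf "as UTF-8 (bytes)  : %d\n", length(encode('UTF-8', $euro)); # 3
```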
The point of the illustration is that this is a world of pain and you
don't really want to do any of the above unless you have absolutely no
choice.

You might be saying to yourself, "yes, but you can set a type in the
bind_col method so you can control how the data is returned to
you". Mostly that is not true for just about all the Perl DBDs I know,
and with DBD::ODBC you can only override the default type in a bind_col
call if the column is a decimal or a timestamp.
=head2 Using varchar columns instead of nvarchar columns for unicode data

Don't do this.
Generally speaking you should use nchar/nvarchar when you need to
support multiple languages in the same column, although even that isn't
always necessary e.g., you can support English, German and Italian in
one Windows codepage. A better recommendation would be to use
nchar/nvarchar columns for user-provided data which is unconstrained,
and varchar for columns which are constrained and which you control,
like a number plate or serial number.
However, in the spirit of describing why, let's look at some examples.
These examples assume we are now using a DBD::ODBC built using the
Unicode API (see above) and you have a unicode aware ODBC driver.

So we return to our first simple example but now run it with a
unicode built DBD::ODBC:

ex4. Simple insert/select with unicode built DBD::ODBC but using varchar
<code>
use 5.008001;
use strict;
use warnings;
use DBI qw{:utils :sql_types};

my $unicode = "\x{20ac}"; # unicode euro symbol
my $h = DBI->connect or die $DBI::errstr;
$h->{RaiseError} = 1;

eval {$h->do(q/drop table unicode_test/)};
$h->do(q/create table unicode_test (a varchar(20))/);

my $s = $h->prepare(q/insert into unicode_test (a) values(?)/);
# first insert with the default parameter binding
$s->execute($unicode);
# then insert again forcing the parameter to be bound as SQL_WVARCHAR
$s->bind_param(1, undef, {TYPE => SQL_WVARCHAR});
$s->execute($unicode);

my $r = $h->selectall_arrayref(q/select a from unicode_test/);
foreach my $row (@$r) {
    my $data = $row->[0];
    print "DBI describes data as: ", data_string_desc($data), "\n";
    print "Data Length: ", length($data), "\n";
    print "hex ords: ";
    foreach my $c (split(//, $data)) {
        print sprintf("%x,", ord($c));
    }
    print "\n";
}
</code>
which outputs:

<output>
DBI describes data as: UTF8 on, non-ASCII, 3 characters 6 bytes
Data Length: 3
hex ords: e2,201a,ac,
DBI describes data as: UTF8 on, non-ASCII, 1 characters 3 bytes
Data Length: 1
hex ords: 20ac,
</output>
Here again, you'll get different results depending on platform and
driver.

I imagine this is really going to make you wonder what on earth has
happened here. In Perl, the Euro is internally encoded in UTF-8 as
0xe2,0x82,0xac. DBD::ODBC was told the column is SQL_CHAR, but because
this is a unicode build it bound the column as SQL_WCHAR. As far as
MS SQL Server is concerned this is a varchar column; you wanted to
insert 3 characters with codes 0xe2, 0x82 and 0xac, and that is what is
in the database (which we could confirm by reading it back as binary
data). However, where did the character with code 0x201a come from?
When DBD::ODBC read the data back it bound the column as SQL_C_WCHAR
and hence asked SQL Server to convert the characters in the varchar
column to wide (UCS-2 or UTF-16) characters and, guess what, character
0x82 in the windows-1252 character-set (which I was using when running
this code) is a "curved quote" with unicode value 0x201A. 0xe2 and 0xac
in windows-1252 map to the unicode code points with the same values
(U+00E2 and U+00AC).
For the second row we bound the data as SQL_WCHAR for the insert and
SQL_WCHAR for the select B<and> the character is in windows-1252, so we
got back what we inserted. However, had we tried to insert a character
not in the windows-1252 codepage, SQL Server would have substituted
that character with a '?'.
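You can verify where U+201A came from without a database: interpreting the
Euro's UTF-8 bytes as windows-1252 characters (which is effectively what
SQL Server did when converting the varchar column to wide characters)
reproduces the first row of the ex4 output. A sketch with core Encode:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# the three bytes Perl uses internally for the Euro
my $utf8_bytes = encode('UTF-8', "\x{20ac}");   # e2, 82, ac

# interpret those bytes as windows-1252, as the server did
my $as_wide = decode('cp1252', $utf8_bytes);
printf "%x,", ord($_) for split //, $as_wide;
print "\n";   # e2,201a,ac, - 0x82 is the curved quote, U+201A
```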
Here is a Windows specific version of the above test with a few more
bells and whistles:
<code>
#
# A simple demonstration of why you cannot use char and varchar columns
# in MS SQL Server to store Unicode. char and varchar columns use a codepage
# and unicode characters inserted into them are converted to the codepage.
# If a conversion does not exist in the codepage the characters which don't
# convert will be '?'
#
# Show the difference between binding as SQL_CHAR and SQL_WCHAR.
#
# See
#
use 5.008001;
use strict;
use warnings;
use DBI qw(:sql_types);
use Data::Dumper;
use Win32::API;

# chcp on my machines normally gives 850
# we want windows-1252 so run: chcp 1252 first
#use open qw( :encoding(Windows-1252) :std );
binmode STDOUT, ":encoding(cp1252)";

# get the active codepage and ensure it is cp1252
Win32::API::More->Import("kernel32", "UINT GetConsoleOutputCP()");
my $cp = GetConsoleOutputCP();
print "Current active console code page: $cp\n";
if ($cp != 1252) {
    print "Please change to codepage 1252 - run chcp 1252\n";
    die "Incompatible active codepage - please change to codepage 1252 by running chcp 1252\n";
}

my $h = DBI->connect() or die $DBI::errstr;
$h->{RaiseError} = 1;
$h->{PrintError} = 1;

eval {$h->do(q/drop table varchar_test/)};
$h->do(q/create table varchar_test(a varchar(20) collate Latin1_General_CI_AS)/);

# note codepage 1252 is a single-byte codepage including the euro at 0x80.
# windows-1252 does not contain U+0187 but it does contain
# the euro symbol (U+20AC), the curved quote (U+201A),
# Latin Small Letter F with hook (U+0192) and dagger (U+2020).
# mixing code pages in SQL Server is not recommended
my $insert = $h->prepare(q/insert into varchar_test (a) values(?)/);
my $data = "\x{20ac}\x{201A}\x{192}\x{2020}\x{187}";
{
    use bytes;
    print "encoded length of our data is:", length($data), "\n";
    print "encoded data in hex is:";
    foreach my $b (split(//, $data)) {
        print sprintf("%x,", ord($b));
    }
    print "\n";
}
# this execute will discover the column is varchar and bind the perl scalar
# as SQL_CHAR meaning the UTF-8 encoded data in the perl scalar
# will be inserted as separate characters, not all of which will even
# be translatable to the current codepage.
$insert->execute($data);
# Now we force DBD::ODBC to insert the parameter as SQL_WVARCHAR
$insert->bind_param(1, undef, {TYPE => SQL_WVARCHAR});
$insert->execute($data);

print "\nNotice in the first row (which was inserted as SQL_CHAR), the ",
    "UTF-8 stored in the perl scalar is mostly stored as individual ",
    "characters, but then you will be wondering why a few of the ",
    "characters seem to come back as unicode. Windows sees individual ",
    "bytes in the UTF-8 sequence as characters in the windows-1252 ",
    "codepage and the UTF-8 sequence contains some characters in ",
    "windows-1252 which map back to unicode chrs e.g., the UTF-8 sequence ",
    "for the euro is e2, 82, ac and windows sees the 82 as the curved ",
    "quote in windows-1252, but when you ask for it back as wide/unicode ",
    "characters it maps it to U+201A (the curved quote unicode code point).\n";
print "\nNotice how in the second row the last character is a ?. That is ",
    "because U+0187 does not exist in the windows-1252 codepage our column ",
    "is using.\n";
my $r = $h->selectall_arrayref(q/select a from varchar_test/);
print Dumper($r);
foreach my $row (@$r) {
    print $row->[0], "\n";
}
$h->disconnect;
</code>
=head2 The correct way to do Unicode with DBD::ODBC and SQL Server and why

If you are on Windows, or on Unix and built DBD::ODBC for unicode
support, then your char, varchar, nchar and nvarchar columns should
all be correct. Even when you use char and varchar, which use a
codepage, because DBD::ODBC asks for the data as SQL_WCHAR, SQL Server
will convert any character in the codepage to a unicode codepoint and
DBD::ODBC will encode them as UTF-8 and mark them as unicode to Perl.
Also, when inserting unicode, DBD::ODBC will normally just do the
right thing i.e., use SQL_WCHAR for nchar/nvarchar columns, but if your
column is a char/varchar it cannot know if you really want to attempt
to insert unicode or just characters/bytes in the SQL Server codepage
(it could look at the perl scalars you are trying to insert and make a
guess based on whether the utf8 flag is on or not, but that would be
likely to break quite a lot of older scripts). See the example above for
inserting into varchar columns.
=head3 Surrogate pairs (or unicode code points above U+FFFF)

ODBC supports Unicode in the form of SQL_WCHAR types and Unicode
versions of the ODBC API. The encoding form that ODBC expects for data
used with Unicode API functions is UCS-2 (or at least that is how it
looked when Windows and MS SQL Server were using UCS-2).

In anticipation of things changing, when unicode support was added to
DBD::ODBC the internal code actually converts Perl UTF-8 encoded
strings into UTF-16, which for characters in the Basic Multilingual
Plane is identical to UCS-2. As there were no unicode drivers
supporting supplemental characters (above U+FFFF) at the time, this
could not be proven to be a good decision; however, at worst it meant
code using unicode outside the Basic Multilingual Plane would just
insert what looked to the database like more characters.
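The UTF-16 conversion can be sketched for one supplementary character:
U+2070E is split into a high and a low surrogate, i.e. two 2-byte code
units, which is why a surrogate-neutral server sees "more characters". The
arithmetic below follows the UTF-16 scheme, and core Encode agrees with it:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $cp = 0x2070E;                       # a code point above U+FFFF

# surrogate pair arithmetic as defined for UTF-16
my $v    = $cp - 0x10000;
my $high = 0xD800 + ($v >> 10);         # high surrogate: 0xD841
my $low  = 0xDC00 + ($v & 0x3FF);       # low surrogate:  0xDF0E
printf "surrogate pair: %04X %04X\n", $high, $low;

# Encode produces exactly those two code units
my $utf16 = encode('UTF-16BE', chr $cp);
print "UTF-16BE bytes: ",
    join(" ", map { sprintf "%02X", ord } split //, $utf16), "\n"; # D8 41 DF 0E
```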
=head4 Older versions of MS SQL Server and surrogate pairs

As it turned out, the decision in DBD::ODBC to use UTF-16 was exactly
what MS initially did, and versions of MS SQL Server like 2000, 2005
and 2008 are described as surrogate-neutral rather than
surrogate-aware. MS had this to say at the time:

I<Both SQL Server 2000 and SQL Server 2005 can store surrogate pairs,
even though UCS-2 is not aware of surrogates. SQL Server treats the
surrogate pairs as two undefined Unicode characters rather than as a
single character. Such applications are usually referred to as
surrogate-neutral or surrogate-safe, meaning that there is no
intrinsic ability to interact with the data, but at least the data can
be stored without loss.>
However, there are a few things you should be aware of when using
these older MS SQL Server versions that are only surrogate-neutral:

o for each surrogate pair inserted you'll need 1 extra character in the
  column e.g., inserting 3 surrogate pairs into an nvarchar requires an
  nvarchar(6), not an nvarchar(3).

o string operations are not aware of supplementary characters. So,
  watch out if you are using substring or len functions in SQL.

o sorting and searching behavior for supplementary characters may
  change depending on the collation.
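The extra column space needed follows from these servers measuring lengths
in 2-byte code units rather than characters. A sketch of the arithmetic
using core Encode:

```perl
use strict;
use warnings;
use Encode qw(encode);

# three supplementary characters: 3 perl characters
my $s = "\x{2070E}" x 3;
print "perl characters  : ", length($s), "\n";   # 3

# UCS-2/UTF-16 code units a surrogate-neutral server must store
my $code_units = length(encode('UTF-16LE', $s)) / 2;
print "UTF-16 code units: $code_units\n";        # 6 - needs an nvarchar(6)
```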
=head4 Newer versions of MS SQL Server and surrogate pairs

Newer versions of SQL Server (2012 and later, version >= 11) support
surrogate pairs, but you must set the collation to one ending in "_SC"
e.g., Latin1_General_100_CI_AS_SC. When you do this, string functions
will recognise surrogate pairs and all of the problems listed above
for older SQL Servers are fixed.

=head4 Is my SQL Server surrogate-neutral or surrogate-aware?

Here is a small script you can use to test whether your SQL Server is
surrogate-neutral or surrogate-aware:
<code>
# Test to see if your SQL Server is surrogate-aware or just surrogate-neutral
use 5.008001;
use strict;
use warnings;
use DBI qw(:utils);

my $h = DBI->connect() or die $DBI::errstr;
$h->{PrintError} = 0;

eval {$h->do(q/drop table surrogate_pairs/)};

# It is possible to set the collation at instance or database level.
# Set it on the column to make sure that, initially, we are using a
# non supplementary character collation.
$h->do(q/create table surrogate_pairs (a nvarchar(20) collate Latin1_General_100_CI_AI)/);

my $insert = $h->prepare(q/insert into surrogate_pairs values(?)/);

# Insert test supplementary character
print "Inserting unicode character U+2070E into db\n";
$insert->execute("\x{2070E}");

# now read it back and see what we get
print "\nNote when we select this character back it is still 1 unicode ",
    "character and 4 bytes and the ord is correct at 0x2070E. This is ",
    "because DBD::ODBC received a buffer of SQL_WCHAR chrs back from SQL ",
    "Server which it then decoded as UTF-16, which recognises the ",
    "surrogate pair. This is why SQL Servers using this collation (or ",
    "older SQL Servers) are known as surrogate-neutral.\n";
my $r = $h->selectrow_arrayref(q/select a from surrogate_pairs/);
print data_string_desc($r->[0]), "\n";
print "ord(chr): ", sprintf("0x%x", ord($r->[0])), "\n";

# This is a non _SC collation, so the len function returns 2.
print "\nNote in the following that len(a) returns 2, not 1, as SQL ",
    "Server has not recognised this as a surrogate pair.\n";
$r = $h->selectrow_arrayref(q/select len(a) from surrogate_pairs/);
print "length in database is: ", $r->[0], "\n";

# now try and alter the table to change the collation to Latin1_General_100_CI_AS_SC
# which only later SQL Servers (>= version 11, i.e., 2012) can do.
# Unfortunately older SQL Servers don't error if you try and change the collation
# to one they do not support, so we cannot test by just trying to change to a
# surrogate aware collation.
$h->do(q/alter table surrogate_pairs alter column a nvarchar(20) collate Latin1_General_100_CI_AS_SC/);

$r = $h->selectrow_arrayref(q/SELECT SERVERPROPERTY('ProductVersion')/);
my $version = $r->[0];
print "\nYour SQL Server is version: $version\n\n";

my ($major_version) = split(/\./, $version);
if ($major_version >= 11) {
    print "Your SQL Server is surrogate-aware\n";
    $r = $h->selectrow_arrayref(q/select a from surrogate_pairs/);
    print data_string_desc($r->[0]), "\n";
    print "ord(chr): ", sprintf("0x%x", ord($r->[0])), "\n";

    print "\nNote in the following that len(a) returns 1, as SQL Server ",
        "in this collation recognises surrogate pairs.\n";
    $r = $h->selectrow_arrayref(q/select len(a) from surrogate_pairs/);
    print "length in database is: ", $r->[0], "\n";
} else {
    print "Your SQL Server is surrogate-neutral but not surrogate-aware\n";
}
$h->disconnect;
</code>
=head2 References

o How do I insert Unicode supplementary characters into SQL Server from Perl?

o List of Unicode characters

o Windows-1252

o International Features in Microsoft SQL Server 2005