Encode strings as UTF-8 when it has wide characters #429

Merged
merged 1 commit into from

2 participants

@miyagawa
Owner

This has been controversial for a long time. My canned response has been that an app returning data that violates the PSGI specification is allowed to break, and that these kinds of errors should be caught by Lint during development.

Meanwhile, a) realistically it's not cool for app mistakes to be able to crash servers, and b) if some of these PSGI violations only happen occasionally at runtime, catching them with Lint might be difficult as well.

A line still has to be drawn somewhere: there are many other ways to break HTTP::Server::PSGI, for example by returning a hash for headers instead of an array, and handling all of these mistakes in the server isn't realistic (and could make the server slow).
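
To illustrate the kind of check that belongs in development rather than in the server, here is a minimal sketch of a response validator. `check_response` is a hypothetical helper written for this example; it is not Plack::Middleware::Lint's actual implementation and covers only the two mistakes mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of problems found in a PSGI
# response array ref. An empty list means neither check failed.
sub check_response {
    my ($res) = @_;
    return ('response is not an array ref') unless ref($res) eq 'ARRAY';

    my ($status, $headers, $body) = @$res;
    my @problems;

    # Headers must be an array ref, not a hash ref.
    push @problems, 'headers must be an array ref'
        unless ref($headers) eq 'ARRAY';

    # Body chunks must be bytes: no character above 0xFF.
    if (ref($body) eq 'ARRAY') {
        push @problems, 'body contains wide characters'
            if grep { defined($_) && /[^\x00-\xff]/ } @$body;
    }
    return @problems;
}

my @bad = check_response([200, { 'Content-Type' => 'text/plain' }, ["\x{3092}"]]);
my @ok  = check_response([200, [ 'Content-Type' => 'text/plain' ], ['ok']]);

print scalar(@bad), " problem(s) in the bad response\n";
print scalar(@ok),  " problem(s) in the good response\n";
```

Running every such check on each response in production is the cost the paragraph above is worried about, which is why this sort of validation is usually left to a development-only middleware.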

This should probably be documented as part of the spec or guideline, but here's a quick fix on the reference server to catch such errors.

> perl -S plackup  -E dev -e ' sub{[200,["Content-Type","text/plain"],["\x{3092}"]]}'
Wide character in syswrite at /Users/miyagawa/.plenv/versions/5.18.1/lib/perl5/5.18.1/darwin-2level/IO/Handle.pm line 481.
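
For comparison, a minimal sketch of the app-side fix: encode the body to bytes before returning it, so the server never sees a wide character. The `$app` variable, the `charset` parameter and the `Content-Length` header are assumptions added for illustration and are not part of this PR.

```perl
use strict;
use warnings;
use Encode ();

# A PSGI app that returns bytes rather than characters: the body is
# encoded to UTF-8 before it ever reaches the server's write call.
my $app = sub {
    my $env  = shift;
    my $body = Encode::encode_utf8("\x{3092}");   # U+3092 becomes 3 bytes
    return [
        200,
        [
            'Content-Type'   => 'text/plain; charset=utf-8',
            'Content-Length' => length $body,
        ],
        [ $body ],
    ];
};

# Call the app directly with an empty environment to inspect the result.
my $res = $app->({});
printf "body is %d bytes\n", length $res->[2][0];
```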
@coveralls

Coverage Status

Coverage decreased (-0.06%) when pulling 86602ce on server-encode-utf8 into 365d440 on master.

@miyagawa merged commit 6053f8c into master
Showing with 9 additions and 0 deletions.
  1. +9 −0 lib/HTTP/Server/PSGI.pm
9 lib/HTTP/Server/PSGI.pm
@@ -280,6 +280,7 @@ sub write_timeout {
 sub write_all {
     my ($self, $sock, $buf, $timeout) = @_;
     return 0 unless defined $buf;
+    _encode($buf);
     my $off = 0;
     while (my $len = length($buf) - $off) {
         my $ret = $self->write_timeout($sock, $buf, $len, $off, $timeout)
@@ -289,6 +290,14 @@ sub write_all {
     return length $buf;
 }
+# syswrite() will crash when given wide characters
+sub _encode {
+    if ($_[0] =~ /[^\x00-\xff]/) {
+        Carp::carp("Wide character outside byte range in response. Encoding data as UTF-8");
+        utf8::encode($_[0]);
+    }
+}
+
 1;
 __END__
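
The behaviour of the added check can be tried outside the server with a standalone sketch. `encode_if_wide` is a hypothetical stand-in for `_encode`: it uses the same regex and the same in-place `utf8::encode` call, but drops the `Carp::carp` warning and returns a flag so the effect is easy to observe.

```perl
use strict;
use warnings;

# Same test as the patch, minus the warning. $_[0] is an alias to the
# caller's variable, so utf8::encode() modifies the caller's string.
sub encode_if_wide {
    if ($_[0] =~ /[^\x00-\xff]/) {
        utf8::encode($_[0]);
        return 1;
    }
    return 0;
}

my $wide   = "\x{3092}";     # one character above 0xFF
my $latin1 = "caf\x{e9}";    # every character fits in a single byte

my $changed_wide   = encode_if_wide($wide);
my $changed_latin1 = encode_if_wide($latin1);

printf "wide:   changed=%d length=%d\n", $changed_wide,   length $wide;
printf "latin1: changed=%d length=%d\n", $changed_latin1, length $latin1;
```

Note that a string whose characters all fall in the 0x00-0xFF range does not match the regex, so it is passed through unchanged rather than encoded as UTF-8. Only buffers containing at least one character above 0xFF are rewritten.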