ack may ignore files with multi-byte encodings #128

anno5 opened this Issue Sep 11, 2010 · 2 comments



anno5 commented Sep 11, 2010

I noticed that ack may fail to report files that contain characters outside the ASCII range when the file is opened in utf8 mode. This may happen if $ENV{PERL_UNICODE} is set accordingly.

Specifically, the function needs_line_scan() compares the file size in bytes (from -s) against the return value of sysread(), which counts characters for utf8 handles. For a file containing multi-byte characters the two don't match, so it returns 0 and the file is skipped. My preliminary fix is to ensure byte semantics through

sub needs_line_scan {

    # ...

    # byte semantics for sysread and regex match
    use bytes;

    my $buffer;
    my $rc = do {
        sysread( $self->{fh}, $buffer, $size );
    };
    if ( not defined $rc ) {
        App::Ack::warn( "$self->{filename}: $!" );
        return 1;
    }
    return 0 unless $rc && ( $rc == $size );

    my $regex = $opt->{regex};
    return $buffer =~ /$regex/m;
}

...but "use bytes" is deprecated. Making perl start with -C0 also helps in my situation, but that's heavy-handed and doesn't play well with "#!/usr/bin/env perl". Offhand, I can't think of a satisfactory solution that keeps ack encoding-agnostic as it is, so I'm not proposing an actual patch.
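
The size mismatch described above can be reproduced in isolation. The following is a minimal sketch, not ack's code. It uses read() instead of sysread(), an assumption made so that it also runs on newer perls, where sysread() on a :utf8 handle is no longer allowed. The helper units_read() and the temporary file are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the UTF-8 bytes of "aouäöü\n": 3 ASCII bytes,
# 3 two-byte characters and 1 newline.
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out, ':raw';
print {$out} "aou\xc3\xa4\xc3\xb6\xc3\xbc\n";
close $out or die "close: $!";

# Size on disk, always measured in bytes.
my $size = -s $path;

# Hypothetical helper: open the file with the given I/O layer and
# return the count that read() reports for the whole file.
sub units_read {
    my ($layer) = @_;
    open my $fh, "<$layer", $path or die "open $path: $!";
    my $count = read( $fh, my $buffer, $size );
    close $fh;
    return $count;
}

my $chars = units_read(':encoding(UTF-8)');    # count is in characters
my $bytes = units_read(':raw');                # count is in bytes

printf "size=%d chars=%d bytes=%d\n", $size, $chars, $bytes;
```

On a Unix-like system this should print size=10 chars=7 bytes=10. A check of the form $rc == $size therefore only holds on the decoding handle when the file is pure ASCII, which matches the behaviour reported here.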

anno5 commented Sep 13, 2010

I forgot to demonstrate the problem. Here goes:

radom:aux anno$ export PERL_UNICODE=SDA
radom:aux anno$ perl -E'say "aou\N{U+e4}\N{U+f6}\N{U+fc}"' >multibyte
radom:aux anno$ cat multibyte
radom:aux anno$ ack u multibyte
radom:aux anno$ ack ü multibyte
radom:aux anno$ unset PERL_UNICODE
radom:aux anno$ ack u multibyte
radom:aux anno$ ack ü multibyte
radom:aux anno$

hoelzro commented Aug 28, 2013

Migrated to ack2 queue.

@hoelzro hoelzro closed this Aug 28, 2013