Fix UTF-8 bug in NSString_RegEx

This class used the location information provided by regex(3) as the
range for a substring. However, regex(3) reports byte offsets, while
NSString ranges are counted in characters (UTF-16 code units).

As a result, whenever the string contains non-ASCII characters (which
take more than one byte in UTF-8), the wrong substring is returned.

This is fixed by taking the string's UTF-8 byte sequence and extracting
the substring from that, rather than using NSString's own
-substringWithRange: method.
Commit 3324591e6cb3af729bad654b1772e3bc34d2986e (1 parent: 4544816), committed by @pieter on Sep 14, 2009
Showing 1 changed file with 3 additions and 1 deletion:
  NSString_RegEx.m (+3 −1)
@@ -57,7 +57,9 @@ - (NSArray *) substringsMatchingRegularExpression:(NSString *)pattern count:(int
break;
NSRange range = NSMakeRange(pmatch[i].rm_so, pmatch[i].rm_eo - pmatch[i].rm_so);
- NSString * substring = [self substringWithRange:range];
+ NSString * substring = [[[NSString alloc] initWithBytes:[self UTF8String] + range.location
+ length:range.length
+ encoding:NSUTF8StringEncoding] autorelease];
[outMatches addObject:substring];
if (ranges)
