
Fix UTF-8 bug in NSString_RegEx

This class used the match offsets (rm_so/rm_eo) returned by regex(3)
as the range for a substring. However, regex(3) reports byte offsets,
while NSString ranges count characters (UTF-16 code units).

This causes a problem when the string contains multibyte UTF-8
characters, as the wrong substring is returned.
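The mismatch can be reproduced in plain C, which is also valid Objective-C. This is an illustrative sketch only. `first_match_byte_offset` and `utf8_codepoints` are hypothetical helpers, not part of NSString_RegEx. The sketch also assumes the default "C" locale, where regexec(3) treats its input as a sequence of bytes. It reports the `rm_so` offset of a match and, for comparison, how many UTF-8 code points precede it.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: byte offset (regmatch_t.rm_so) of the first match of
   `pattern` in `text`, or -1 if the pattern fails to compile or does not match. */
long first_match_byte_offset(const char *text, const char *pattern)
{
    regex_t re;
    regmatch_t m[1];
    long offset = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, m, 0) == 0)
        offset = (long)m[0].rm_so;   /* a byte offset, not a character index */
    regfree(&re);
    return offset;
}

/* Hypothetical helper: number of UTF-8 code points in the first `nbytes`
   bytes of `s`. Every byte that is not a continuation byte (10xxxxxx)
   starts a new code point. */
size_t utf8_codepoints(const char *s, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

For `"h\xc3\xa9llo world"` ("héllo world" in UTF-8), the match of `world` starts at byte offset 7 but at character index 6, because "é" takes two bytes in UTF-8. Passing 7 to `-substringWithRange:` as an NSString location would therefore start one character past the match.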

This is fixed by taking the UTF-8 byte sequence ([self UTF8String]) and
extracting the substring from that, rather than using NSString's own
-substringWithRange: method.
commit 3324591e6cb3af729bad654b1772e3bc34d2986e 1 parent 4544816
Pieter de Bie authored September 14, 2009

Showing 1 changed file with 3 additions and 1 deletion.

NSString_RegEx.m
@@ -57,7 +57,9 @@ - (NSArray *) substringsMatchingRegularExpression:(NSString *)pattern count:(int
 			break;
 
 		NSRange range = NSMakeRange(pmatch[i].rm_so, pmatch[i].rm_eo - pmatch[i].rm_so);
-		NSString * substring = [self substringWithRange:range];
+		NSString * substring = [[[NSString alloc] initWithBytes:[self UTF8String] + range.location
+														 length:range.length
+													   encoding:NSUTF8StringEncoding] autorelease];
 		[outMatches addObject:substring];
 
 		if (ranges)
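The fix amounts to slicing the UTF-8 buffer by byte range instead of by character index. Below is a rough C analogue of the `-initWithBytes:length:encoding:` call in the diff. `copy_match_bytes` and `first_match` are hypothetical names. The sketch assumes, as the patch appears to, that regexec(3) ran on the same UTF-8 buffer the bytes are copied from. The rest of the method is not shown here.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the fix: copy the bytes [rm_so, rm_eo) of
   match `m` out of `utf8`, the same buffer regexec() searched, instead of
   treating the offsets as character indices. Caller frees; NULL on failure. */
char *copy_match_bytes(const char *utf8, regmatch_t m)
{
    if (m.rm_so < 0 || m.rm_eo < m.rm_so)
        return NULL;
    size_t len = (size_t)(m.rm_eo - m.rm_so);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, utf8 + m.rm_so, len);   /* byte offset + byte length */
    out[len] = '\0';
    return out;
}

/* Hypothetical wrapper: first match of `pattern` in `utf8`, copied by byte range. */
char *first_match(const char *utf8, const char *pattern)
{
    regex_t re;
    regmatch_t m[1];
    char *result = NULL;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, utf8, 1, m, 0) == 0)
        result = copy_match_bytes(utf8, m[0]);
    regfree(&re);
    return result;
}
```

With this approach, matching `w.*d` against "héllo wörld" in UTF-8 returns the bytes of "wörld" intact, because the offsets are never reinterpreted as character positions.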
