Incorrect text encoding with feeds containing "euc-kr" text encoding #39

Open
sylverb opened this Issue Nov 16, 2011 · 1 comment

sylverb commented Nov 16, 2011

Hello,
I had problems with feeds in certain text encodings when the HTTP headers did not indicate the encoding (for example: http://www.torrentrg.com/bbs/rss.php?bo_table=torrent_variety).
To fix this, I decided to read the encoding from the XML declaration when the HTTP headers don't provide it:

<?xml version="1.0" encoding="euc-kr"?>

If you want to check this, here is my code, in MWFeedParser.m, method - (void)startParsingData:(NSData *)data textEncodingName:(NSString *)textEncodingName:

        [...]
        // Not UTF-8 so convert
        MWLog(@"MWFeedParser: XML document was not UTF-8 so we're converting it");
        NSString *string = nil;

        // Attempt to detect encoding from response header
        NSStringEncoding nsEncoding = 0;
        [...]

becomes:

        [...]
        // Not UTF-8 so convert
        MWLog(@"MWFeedParser: XML document was not UTF-8 so we're converting it");
        NSString *string = nil;

        // If no text encoding indication was in the response header
        // then try to get encoding from the XML declaration
        if (textEncodingName == nil) {
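            // Only inspect the first 100 bytes, and never read past the end of a shorter document
            NSData* xmlEncodingData = [NSData dataWithBytesNoCopy:(void *)[data bytes]
                                                           length:MIN((NSUInteger)100, [data length])
                                                     freeWhenDone:NO];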
            NSString* xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSUTF8StringEncoding];
            if (!xmlEncodingString) xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSISOLatin1StringEncoding];
            if (!xmlEncodingString) xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSMacOSRomanStringEncoding];

            if ([xmlEncodingString hasPrefix:@"<?xml"]) {
                NSRange a = [xmlEncodingString rangeOfString:@"?>"];
                if (a.location != NSNotFound) {
                    NSString *xmlDec = [xmlEncodingString substringToIndex:a.location];
                    NSRange b = [xmlDec rangeOfString:@"encoding=\""];
                    if (b.location != NSNotFound) {
                        NSUInteger s = b.location+b.length;
                        NSRange c = [xmlDec rangeOfString:@"\"" options:0 range:NSMakeRange(s, [xmlDec length] - s)];
                        if (c.location != NSNotFound) {
                            textEncodingName = [xmlDec substringWithRange:NSMakeRange(s, c.location - s)];
                        }
                    }
                }
            }
            [xmlEncodingString release];
        }

        // Attempt to detect encoding from response header or XML declaration
        NSStringEncoding nsEncoding = 0;
        [...]
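
For reference, the declaration sniffing above can also be sketched as a small plain C helper, which compiles unchanged inside a .m file. This is only a sketch under a few assumptions: the name xml_declared_encoding and the 128-byte window are hypothetical (not part of MWFeedParser), it only works for ASCII-compatible encodings such as euc-kr, and unlike the code above it also accepts single-quoted values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of MWFeedParser.
 * Copies the value of encoding="..." from a leading XML declaration into
 * `out` (NUL-terminated). Returns 1 on success, 0 if nothing usable is found. */
static int xml_declared_encoding(const unsigned char *bytes, size_t len,
                                 char *out, size_t out_size)
{
    char head[129];  /* assumed window: first 128 bytes plus a terminator */
    size_t n, vlen;
    char *end, *attr, *close_quote, quote;

    if (bytes == NULL || out == NULL || out_size == 0) return 0;

    /* Work on a bounded, NUL-terminated copy so we never read past `len`. */
    n = len < sizeof head - 1 ? len : sizeof head - 1;
    memcpy(head, bytes, n);
    head[n] = '\0';

    /* The declaration must be the very first thing in the document. */
    if (strncmp(head, "<?xml", 5) != 0) return 0;
    end = strstr(head, "?>");
    if (end == NULL) return 0;
    *end = '\0';  /* restrict the search to the declaration itself */

    attr = strstr(head, "encoding");
    if (attr == NULL) return 0;
    attr += strlen("encoding");
    while (*attr == ' ' || *attr == '\t' || *attr == '\r' || *attr == '\n') attr++;
    if (*attr != '=') return 0;
    attr++;
    while (*attr == ' ' || *attr == '\t' || *attr == '\r' || *attr == '\n') attr++;

    /* The value may be wrapped in double or single quotes. */
    quote = *attr;
    if (quote != '"' && quote != '\'') return 0;
    attr++;
    close_quote = strchr(attr, quote);
    if (close_quote == NULL) return 0;

    vlen = (size_t)(close_quote - attr);
    if (vlen == 0 || vlen >= out_size) return 0;
    memcpy(out, attr, vlen);
    out[vlen] = '\0';
    return 1;
}
```

In startParsingData: it could be called with [data bytes] and [data length] when textEncodingName is nil, and the result turned into an NSString with [NSString stringWithUTF8String:], before the existing encoding lookup runs.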

dodyrw commented Apr 12, 2012

Thanks! The above code fixes the problem.