LineReader stops reading when it hits a character like "É" or "ñ" #5

Open
pkamb opened this Issue Sep 14, 2011 · 11 comments

pkamb commented Sep 14, 2011

Say you have a text file such as:

diner
restaurant
lunch-spot
greasy spoon
café // "é" character
coffee shop
cafeteria

LineReader stops reading when it hits the "café" line above. It never gets to "coffee shop".

Owner

johnjohndoe commented Sep 15, 2011

Maybe the file is not encoded as UTF-8? I use NSUTF8StringEncoding in the FileReader. See (NSString*)readLine in line 72. Maybe you can find a way to detect the file's character encoding before you start reading its content. You are welcome to fork the project.
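
A small illustration of why the encoding matters, written in Python rather than Objective-C for brevity. It shows that the same "café" line is valid UTF-8 when saved as UTF-8 but not when saved as Latin-1. Whether a failed decode is actually what ends the read in LineReader is an assumption (the source was not checked here), and the helper name `decodes_as_utf8` is made up for this sketch.

```python
# The same text saved with two different encodings.
latin1_bytes = "café\n".encode("latin-1")  # é is the single byte 0xE9
utf8_bytes = "café\n".encode("utf-8")      # é is the two bytes 0xC3 0xA9


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if `data` is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        # In UTF-8, 0xE9 announces a three-byte sequence, so a lone 0xE9
        # followed by a newline is malformed.
        return False


print(decodes_as_utf8(utf8_bytes))
print(decodes_as_utf8(latin1_bytes))
```

For comparison, Foundation's -[NSString initWithData:encoding:] returns nil when the bytes are not valid in the requested encoding, so a reader that treats nil as "end of file" would stop at such a line.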

ZuzooVn commented Feb 13, 2013

Hi, I still have this problem.

Owner

johnjohndoe commented Feb 13, 2013

Have you verified which character encoding is used by the file you are trying to read?

ZuzooVn commented Feb 13, 2013

Hi, it's Unicode (UTF-8)

Owner

johnjohndoe commented Feb 13, 2013

Could you upload a zipped sample somewhere? Then I will find the time to take a look at it in a few days.

ZuzooVn commented Feb 13, 2013

I think you can create a new document with some characters like í, é, ñ, etc. Or I will upload some sample data.

Owner

johnjohndoe commented Feb 13, 2013

I think you should really upload an example file somewhere. I can write an ñ into either an extended-ASCII (e.g. ISO-8859-1) or a UTF-8 encoded file.
You can also find out the file's character encoding yourself with an editor. If you are using Windows, I recommend Notepad++. On Mac OS X or Linux, run the following command in a shell: $ file filename

ZuzooVn commented Feb 14, 2013

This is the file's info: Non-ISO extended-ASCII English text, with very long lines, with CRLF line terminators.

This is the file: http://www.mediafire.com/?1cwr4if28w504md

It has an "î" character.
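
To make the `file` output quoted above easier to interpret, here is a rough Python approximation of the kind of byte-range check behind labels such as "ISO-8859" and "Non-ISO extended-ASCII". It is only a sketch: the real tool applies more rules than this, and the function name `classify` is hypothetical.

```python
def classify(data: bytes) -> str:
    """Roughly label a byte string the way `file` labels text encodings."""
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # ISO-8859-x leaves 0x80-0x9F to control codes, so printable text in
    # those encodings should not contain bytes from that range.
    if any(0x80 <= b <= 0x9F for b in data):
        return "non-ISO extended-ASCII"
    return "ISO-8859"


# "î" is byte 0x94 in Mac OS Roman but byte 0xEE in Windows-1252.
print(classify("î\r\n".encode("mac_roman")))
print(classify("î\r\n".encode("cp1252")))
```

A "non-ISO" result therefore hints at an encoding such as Mac OS Roman or Windows-1252 with characters in the 0x80-0x9F range, but it does not say which one.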

Owner

johnjohndoe commented Feb 15, 2013

Agreed. As I suspected, the file is not encoded as UTF-8.

[Image: notepadplusplus]

I converted the file to UTF-8 using Notepad++ (options are visible in the menu) so you can try again with this file.

ZuzooVn commented Feb 16, 2013

Maybe we should automatically convert every file to UTF-8 before reading its content.
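
A minimal sketch of such a conversion step, in Python for brevity. It assumes the source encoding is already known and passed in by the caller, which is exactly the part that is hard in practice. The function name `convert_to_utf8` and the file names in the usage comment are hypothetical.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Re-encode `src` from `source_encoding` to UTF-8 and write it to `dst`.

    The file is handled as raw bytes so that line terminators (for example
    CRLF) pass through unchanged.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


# Example call, assuming the input really is Windows-1252:
# convert_to_utf8(Path("words.txt"), Path("words-utf8.txt"), "cp1252")
```

If the assumed source encoding is wrong, the conversion still tends to succeed for single-byte encodings but produces the wrong characters, so a blind "convert everything" step is risky without detection.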

Owner

johnjohndoe commented Feb 16, 2013

I suggest that you look for a way to recognize the character encoding up front. Feel free to add it to the LineReader.
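
One possible shape for that up-front check, sketched in Python rather than Objective-C: try strict UTF-8 first, then fall back to a short list of single-byte encodings. The fallback list and the names `decode_with_fallback` and `read_lines` are assumptions made for this example, not part of LineReader.

```python
from typing import List, Sequence, Tuple


def decode_with_fallback(
    data: bytes,
    encodings: Sequence[str] = ("utf-8-sig", "cp1252", "latin-1"),
) -> Tuple[str, str]:
    """Decode `data` with the first encoding that accepts it.

    "utf-8-sig" is plain UTF-8 that also strips a leading byte order mark.
    Returns the decoded text and the name of the encoding that worked.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def read_lines(data: bytes) -> List[str]:
    """Split decoded text into lines, accepting LF or CRLF terminators."""
    text, _ = decode_with_fallback(data)
    lines = text.replace("\r\n", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty entry after a trailing newline
    return lines


sample = "greasy spoon\r\ncafé\r\ncoffee shop\r\n".encode("cp1252")
print(read_lines(sample))
```

Strict UTF-8 goes first because arbitrary non-UTF-8 text rarely passes UTF-8 validation, whereas the single-byte fallbacks accept almost any input and can therefore silently pick the wrong characters. The fallback order is a guess that should match the files you expect to see.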
