Spanish accents not parsing correctly when importing from CSV file to JSON #169

Closed
AlmogRnD opened this Issue Feb 10, 2015 · 16 comments

10 participants
@AlmogRnD

I'm parsing a CSV file that contains Spanish accented characters. I set the encoding to UTF-8, but the accents are not parsed correctly.

Here is what I'm getting back in the results object:
name: "GREGORIO BERNABE �LVAREZ"

and I have the following config:

Papa.parse(evt.target.files[0], {
    header: true,
    dynamicTyping: true,
    encoding: "utf-8"
});

@mholt

mholt Feb 10, 2015

Owner

Is the file really UTF-8 encoded? (And are all its characters UTF-8 encoded as well?) If not, be sure to specify a proper encoding that includes all the characters found in the file.
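One way to answer this question in the browser is to decode the file's bytes strictly and see whether that fails. A minimal sketch, assuming the standard TextDecoder API is available; the helper name isValidUtf8 and the sample bytes are made up for illustration and are not part of Papa Parse:

```javascript
// Hypothetical helper (not a Papa Parse API): returns true when the bytes
// decode as UTF-8 without errors.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw on malformed input instead of
    // silently substituting U+FFFD (the "�" character).
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// "ÁLVAREZ" as a single-byte export would store it: Á is the byte 0xC1.
const legacyBytes = new Uint8Array([0xc1, 0x4c, 0x56, 0x41, 0x52, 0x45, 0x5a]);
// The same name encoded as UTF-8: Á becomes the two bytes 0xC3 0x81.
const utf8Bytes = new TextEncoder().encode("\u00c1LVAREZ");

console.log(isValidUtf8(legacyBytes)); // false
console.log(isValidUtf8(utf8Bytes));   // true
// Lenient UTF-8 decoding of the single-byte data reproduces the reported symptom.
console.log(new TextDecoder("utf-8").decode(legacyBytes)); // "\uFFFDLVAREZ"
```

In a file input handler, the bytes could come from `new Uint8Array(await file.arrayBuffer())`.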

@AlmogRnD

AlmogRnD Feb 10, 2015

I think it has to do with the file reader not getting the encoding setting, or with the jQuery select causing an issue. Here is my code:

<input type="file"  id="csv-file" name="file"/>
function handleFileSelect(evt) {
    if ( !(evt.target && evt.target.files && evt.target.files[0]) ) {
        return;
    }

    Papa.parse(evt.target.files[0], {
        header: true,
        dynamicTyping: true,
        encoding: "UTF-8",
        before: function(file, inputElem)
        {
            console.log(file);
            // executed before parsing each file begins;
            // what you return here controls the flow
        },
        error: function(err, file, inputElem, reason)
        {
            console.log(err);
            // executed if an error occurs while loading the file,
            // or if before callback aborted for some reason
        },
        complete: function (results) {
            renderDataSet(results);
        }
    });
}

@mholt

mholt Feb 10, 2015

Owner

I've confirmed that the file reader is indeed getting the encoding setting correctly. What's your input file look like?

@AlmogRnD

AlmogRnD Feb 10, 2015

I'm saving the file as CSV

@mholt

mholt Feb 10, 2015

Owner

I mean what are its contents? Until I can reproduce the problem I can't produce a fix.

@AlmogRnD

AlmogRnD Feb 10, 2015

OK, the issue is with Microsoft Excel: it doesn't save the CSV in UTF-8 format. See http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding

I just copied everything into Google Sheets and saved it from there.
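If re-saving the file is not an option, a page could choose the encoding itself before parsing. A rough sketch, assuming the uploads are either UTF-8 or Windows-1252 (that second guess depends on the Excel locale and is an assumption, not something Papa Parse checks); decodeCsvBytes is a hypothetical helper:

```javascript
// Hypothetical helper: try strict UTF-8 first, fall back to Windows-1252.
// The fallback encoding is an assumption about where the file came from.
function decodeCsvBytes(bytes) {
  try {
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { encoding: "utf-8", text: text };
  } catch (e) {
    const text = new TextDecoder("windows-1252").decode(bytes);
    return { encoding: "windows-1252", text: text };
  }
}

// "name\nÁLVAREZ" stored with one byte per character (Á = 0xC1).
const excelStyle = new Uint8Array([
  0x6e, 0x61, 0x6d, 0x65, 0x0a, 0xc1, 0x4c, 0x56, 0x41, 0x52, 0x45, 0x5a,
]);
const fromExcel = decodeCsvBytes(excelStyle);
console.log(fromExcel.encoding); // "windows-1252"

// The same content stored as UTF-8 takes the first branch.
const fromSheets = decodeCsvBytes(new TextEncoder().encode("name\n\u00c1LVAREZ"));
console.log(fromSheets.encoding); // "utf-8"
console.log(fromExcel.text === fromSheets.text); // true
```

The resulting text can then be handed to Papa.parse as a string, in which case no encoding option is needed.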

@AlmogRnD AlmogRnD closed this Feb 10, 2015

@mholt

mholt Feb 10, 2015

Owner

Ah, good to know. You're not the first that has had that problem - for example, #64. Maybe I will make this an FAQ on the website.

Glad you got it working! And thanks for your reports.

@AlmogRnD

AlmogRnD Feb 10, 2015

Yeah, I would recommend adding this to the docs with some more info.

@fabiocbinbutter

fabiocbinbutter Feb 19, 2015

Deleted my last comment, that Excel workaround does nothing...

@Benczyk

Benczyk Feb 27, 2015

Had the same problem and lost a lot of time due to this weird Excel bug.

What works for me: instead of exporting from Excel to CSV, save the file as .txt in UTF-16.
I just tested such a file with the Papa Parse demo, and it works (with or without a header).

(I found the solution buried in the middle of this Stack Overflow thread: http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding)
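UTF-16 text files usually start with a byte order mark (BOM), so their encoding can often be recognised from the first two or three bytes. A sketch of that check; sniffBom is a hypothetical helper, and it returns null when no BOM is present, in which case the encoding still has to be guessed some other way:

```javascript
// Hypothetical helper: map a leading byte order mark to an encoding label.
function sniffBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf-16le";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf-16be";
  }
  return null; // no BOM found
}

// "ñ" (U+00F1) as UTF-16LE, preceded by the FF FE byte order mark.
const utf16Sample = new Uint8Array([0xff, 0xfe, 0xf1, 0x00]);
const sniffed = sniffBom(utf16Sample);
console.log(sniffed); // "utf-16le"

// TextDecoder strips the BOM by default, leaving only the text.
const decodedSample = new TextDecoder(sniffed).decode(utf16Sample);
console.log(decodedSample); // "ñ"
```

The label could be passed as the encoding option when parsing a File, or the decoded text could be parsed directly as a string.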

@arebena

arebena Aug 6, 2015

Try putting a line like:

encoding: "ISO-8859-1"

in your Papa config.
Works for me...

@rwilliams-scottlogic

rwilliams-scottlogic Aug 21, 2015

Looked into this as I was using it, some notes that may help people:

  • Excel will save in the system locale code page. For our users this will be Windows-1252.
  • So it should follow that we set this to "CP1252". This works in Chrome and Firefox but not IE10 and IE11.
  • Setting this to ISO-8859-1 (of which CP1252 is a superset) strangely works in all browsers, including for characters which exist in CP1252 but not ISO-8859-1.
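The last observation is consistent with the WHATWG Encoding Standard, which treats "iso-8859-1" as just another label for windows-1252, so runtimes that follow the standard decode both names the same way. A quick check, assuming a standards-compliant TextDecoder (this says nothing about how older IE versions resolve labels):

```javascript
// Ask for ISO-8859-1 and inspect which encoding the runtime actually uses.
const viaLatin1Label = new TextDecoder("iso-8859-1");
console.log(viaLatin1Label.encoding); // "windows-1252"

// "ÁLVAREZ" with Á stored as the single byte 0xC1.
const nameBytes = new Uint8Array([0xc1, 0x4c, 0x56, 0x41, 0x52, 0x45, 0x5a]);
const decodedName = viaLatin1Label.decode(nameBytes);
console.log(decodedName); // "ÁLVAREZ"
```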

@qroft

qroft Jul 20, 2016

I am still having trouble loading the two CSV files from this page correctly:
http://www.coflp.org/descargas/
If I download "Turnos de Guardia.csv", I can see that the format is correct, but in the console.log output the special characters are broken.

@mharn

mharn Jun 30, 2017

FWIW I've had success with using Encoding.js (https://github.com/polygonplanet/encoding.js) to detect some encodings including Japanese text (e.g. SJIS) and then run Papaparse based on that result.

Be warned that Encoding.js says that it mainly supports Unicode and Japanese formats, and not other European-centric ones.

`

<script type="text/javascript" src="./js/encoding.min.js"></script>

function onFileSelect(event) {
  var file = event.target.files[0];

  var reader = new FileReader();
  reader.onload = function(e) {
    // Detect the encoding from the raw bytes, then let Papa Parse read the file with it.
    var codes = new Uint8Array(e.target.result);
    var encoding = Encoding.detect(codes);
    console.log(encoding);
    Papa.parse(file, {
      skipEmptyLines: true, // without this, Papa Parse adds a blank entry (the CSV has only 5 lines, but a 6th, empty row is returned)
      header: true, // testing adding a source/target header
      encoding: encoding,
      complete: function(results) {
        console.log(results.data);
      }
    });
  };

  reader.readAsArrayBuffer(file);
}

document.getElementById('encoding.js').addEventListener('change', onFileSelect, false);
`

@laitilari

laitilari Aug 22, 2017

What user arebena said earlier worked for me with letters ä and ö that I needed.

Try to put a line like :
encoding: "ISO-8859-1"
in your papa config. Works for me...

@MichaelBiermann

MichaelBiermann Jan 20, 2018

I exported a CSV from Excel (for Mac 16.9) using the option "CSV - UTF8 (Comma delimited) (.csv)".
Afterwards, uploading it via PapaParse on a web page corrupted characters such as the German umlauts ü, ä, ö.
Replacing "reader.readAsBinaryString(file);" with "reader.readAsText(file);" solved it.

function provessCsvFile(file){
    // upload CSV data
    var reader = new FileReader();

    reader.onload = function(evt) {
        // parse
        var config = {
            header: true
//            encoding: "ISO-8859-1"
        };
...
    reader.readAsText(file);
//    reader.readAsBinaryString(file);    // do not use: it corrupts the encoding of CSV files exported from MS Excel, e.g. special chars like German 'ü'
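The difference comes from how the two methods treat bytes: readAsBinaryString returns one character per byte, so a multi-byte UTF-8 sequence is split into separate characters, while readAsText decodes the bytes (as UTF-8 unless another encoding is given). A sketch that simulates both on an in-memory byte array; toBinaryString is a hypothetical stand-in for the string readAsBinaryString would produce:

```javascript
// Hypothetical stand-in for readAsBinaryString: one character per byte.
function toBinaryString(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// "ü" (U+00FC) is two bytes in UTF-8: 0xC3 0xBC.
const umlautBytes = new TextEncoder().encode("\u00fc");

// Byte-per-character view: two unrelated characters, "Ã" and "¼".
const asBinaryString = toBinaryString(umlautBytes);
console.log(asBinaryString); // "Ã¼"

// Decoded view, comparable to what readAsText yields for a UTF-8 file.
const asText = new TextDecoder("utf-8").decode(umlautBytes);
console.log(asText); // "ü"
```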
