
Spanish accents not parsing correctly when importing from CSV file to JSON #169

Closed
isAlmogK opened this issue Feb 10, 2015 · 27 comments

Comments

@isAlmogK

I'm parsing a CSV file containing Spanish accents. I set the encoding to UTF-8, but it's not parsing correctly.

Here is what I'm getting back in the results object
name: "GREGORIO BERNABE �LVAREZ"

and I have the following config setup

Papa.parse(evt.target.files[0], {
    header: true,
    dynamicTyping: true,
    encoding: "utf-8",
});

@mholt
Owner

mholt commented Feb 10, 2015

Is the file really UTF-8 encoded? (And are all its characters UTF-8 encoded as well?) If not, be sure to specify a proper encoding that includes all the characters found in the file.

@isAlmogK
Author

I think it has to do with the file reader not getting the encoding setting, or with the jQuery select causing an issue. Here is my code:

<input type="file" id="csv-file" name="file"/>

function handleFileSelect(evt) {
    if ( !(evt.target && evt.target.files && evt.target.files[0]) ) {
        return;
    }

    Papa.parse(evt.target.files[0], {
        header: true,
        dynamicTyping: true,
        encoding: "UTF-8",
        before: function(file, inputElem)
        {
            console.log(file);
            // executed before parsing each file begins;
            // what you return here controls the flow
        },
        error: function(err, file, inputElem, reason)
        {
            console.log(err);
            // executed if an error occurs while loading the file,
            // or if before callback aborted for some reason
        },
        complete: function (results) {
            renderDataSet(results);
        }
    });
}

@mholt
Owner

mholt commented Feb 10, 2015

I've confirmed that the file reader is indeed getting the encoding setting correctly. What's your input file look like?

@isAlmogK
Author

I'm saving the file as CSV

@mholt
Owner

mholt commented Feb 10, 2015

I mean what are its contents? Until I can reproduce the problem I can't produce a fix.

@isAlmogK
Author

OK, the issue is with Microsoft Excel: it doesn't save the file in UTF-8 format - http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding

I just copied everything into google sheets and saved it there.

@mholt
Owner

mholt commented Feb 10, 2015

Ah, good to know. You're not the first to have that problem - see #64, for example. Maybe I'll make this an FAQ on the website.

Glad you got it working! And thanks for your reports.

@isAlmogK
Author

Yeah, I'd recommend adding this to the docs with some more info.

@fabiocbinbutter

Deleted my last comment, that Excel workaround does nothing...

@Benczyk

Benczyk commented Feb 27, 2015

Had the same problem and lost a lot of time to this weird Excel bug.

What works for me: instead of exporting your Excel file to CSV, save it as .txt with UTF-16 encoding.
Just tested a file with the Papa Parse demo, and it works (with or without header).

(Found the solution buried in the middle of this Stack Overflow thread: http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding)

@arebena

arebena commented Aug 6, 2015

Try putting a line like:

encoding: "ISO-8859-1"

in your Papa config.
Works for me...

@rwilliams-scottlogic

Looked into this as I was using it; some notes that may help people:

  • Excel will save in the system locale code page. For our users this is Windows-1252.
  • So it should follow that we set the encoding to "CP1252". This works in Chrome and Firefox, but not in IE10 and IE11.
  • Setting it to ISO-8859-1 (of which CP1252 is a superset) strangely works in all browsers, including for characters which exist in CP1252 but not in ISO-8859-1.
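The third bullet is actually specified behavior: per the WHATWG Encoding Standard, which browsers implement, the label "ISO-8859-1" is an alias for the windows-1252 decoder, so requesting ISO-8859-1 gives you CP1252's extra characters as well. A minimal sketch to verify this with the standard TextDecoder API, independent of Papa Parse:

```javascript
// Byte 0x80 is € in windows-1252 but unassigned in true ISO-8859-1.
// Per the WHATWG Encoding Standard, the "iso-8859-1" label maps to
// the windows-1252 decoder, so both labels decode it identically.
const bytes = new Uint8Array([0x80, 0x20, 0xc1]); // "€ Á" in windows-1252

const cp1252 = new TextDecoder('windows-1252').decode(bytes);
const latin1 = new TextDecoder('iso-8859-1').decode(bytes);

console.log(cp1252);           // "€ Á"
console.log(cp1252 === latin1); // true
```

This is why the "ISO-8859-1" setting works in all browsers even for CP1252-only characters.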

@qroft

qroft commented Jul 20, 2016

I'm still having trouble loading the two CSV files from this page:
http://www.coflp.org/descargas/
If I download "Turnos de Guardia.csv" I can see that the format is correct, but in the console.log the special characters show up broken.

@mharn

mharn commented Jun 30, 2017

FWIW, I've had success using Encoding.js (https://github.com/polygonplanet/encoding.js) to detect encodings, including Japanese text (e.g. SJIS), and then running Papa Parse based on that result.

Be warned that Encoding.js says it mainly supports Unicode and Japanese formats, not other European-centric ones.

```html
<script type="text/javascript" src="./js/encoding.min.js"></script>
```

```javascript
function onFileSelect(event) {
  var file = event.target.files[0];

  var reader = new FileReader();
  reader.onload = function(e) {
    var codes = new Uint8Array(e.target.result);
    var encoding = Encoding.detect(codes);
    console.log(encoding);
    var files = event.target.files;
    Papa.parse(files[0], {
      skipEmptyLines: true, // need this or Papa Parse adds a blank entry (despite the CSV only having 5 lines, it gives a 6th empty string)
      header: true,         // testing adding a source/target header
      encoding: encoding,
      complete: function(results) {
        console.log(results.data);
      }
    });
  };

  reader.readAsArrayBuffer(file);
}

document.getElementById('encoding.js').addEventListener('change', onFileSelect, false);
```
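If a full detection library is more than you need, a lighter heuristic is to attempt a strict UTF-8 decode and fall back to a legacy encoding when that fails. A minimal sketch using the standard TextDecoder API; the `detectEncoding` name and the windows-1252 fallback are assumptions for illustration, not part of Papa Parse or Encoding.js:

```javascript
// Try a strict UTF-8 decode; if the bytes are not valid UTF-8,
// fall back to a legacy single-byte encoding.
function detectEncoding(bytes) {
  try {
    // fatal: true makes decode() throw on malformed UTF-8 sequences
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return 'utf-8';
  } catch (e) {
    return 'windows-1252'; // assumed fallback; adjust for your users' files
  }
}

// "ÁL" as UTF-8 bytes decodes cleanly:
detectEncoding(new Uint8Array([0xc3, 0x81, 0x4c])); // → 'utf-8'
// The same text exported as windows-1252 does not:
detectEncoding(new Uint8Array([0xc1, 0x4c]));       // → 'windows-1252'
```

The detected label can then be passed straight to Papa Parse's `encoding` option.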

@laitilari

laitilari commented Aug 22, 2017

What user arebena said earlier worked for me with the letters ä and ö that I needed.

> Try to put a line like:
> encoding: "ISO-8859-1"
> in your Papa config. Works for me...

@MichaelBiermann

I exported CSV from Excel (for Mac 16.9) using the option "CSV UTF-8 (Comma delimited) (.csv)".
Afterwards, uploading via Papa Parse on the web page still corrupted German umlauts (üäö...).
Replacing "reader.readAsBinaryString(file)" with "reader.readAsText(file)" solved it.

function processCsvFile(file) {
    // upload CSV data
    var reader = new FileReader();

    reader.onload = function(evt) {
        // parse
        var config = {
            header: true
//            encoding: "ISO-8859-1"
        };
...
    reader.readAsText(file);
//    reader.readAsBinaryString(file);  // do not use -> reading exported MS Excel CSV files corrupts the encoding, i.e. special chars like German 'ü'

@alex22197

> Try to put a line like :
>
> encoding: "ISO-8859-1"
>
> in your papa config.
> Works for me...

Thank you so much, you saved me! 2 weeks working on this...

@Pala2812

Pala2812 commented Apr 9, 2020

Hello, I also had this issue. What worked for me: open the file in Microsoft Excel -> File -> Save As -> choose "CSV UTF-8" as the file type.

This should fix issues with UTF-8 errors.

@jdcaacbay

> Try to put a line like :
>
> encoding: "ISO-8859-1"
>
> in your papa config.
> Works for me...

Worked for me! What a life saver.

@daniellaera

encoding: "ISO-8859-1" works for me too, for French accents. For example, in Java:

InputStream is = new FileInputStream(csvFile);
fileReader = new BufferedReader(new InputStreamReader(is, StandardCharsets.ISO_8859_1));

@alshar

alshar commented Mar 15, 2021

> Ok the issue is with Microsoft excel it doesn't save it in UTF-8 format - http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding
>
> I just copied everything into google sheets and saved it there.

Don't be tempted to just change the encoding setting in Excel and save as UTF-8.
Even though you'd think it should just work, it doesn't :(
Google Sheets is definitely the answer.

@mharn

mharn commented Mar 16, 2021

> Ok the issue is with Microsoft excel it doesn't save it in UTF-8 format - http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding
> I just copied everything into google sheets and saved it there.

> Don't be tempted to just change the encoding setting in excel and save as UTF-8
> Even though you think it should just work, it doesn't :(
> Google sheets is definitely the answer.

What version of Excel did you use? Office 2011 for Mac just didn't work with UTF-8 for me, but I think the latest version worked OK? I'm not in my office, so I can't confirm my current version.

@alshar

alshar commented Mar 16, 2021

> What version of excel did you use? Office 2011 for Mac just didn’t work with UTF8 for me, but I think the latest version worked ok? Not in my office so I can’t confirm my current version.

The newest version of Excel; I believe it's 2019, on a Windows 10 PC.
The default encoding was Windows Western European or something like that, and I switched it to UTF-8 but didn't have luck with that. Uploading the original doc to Sheets and then saving it worked for me, though.

Good luck!

@auzaluis

auzaluis commented Aug 10, 2022

Try this:

  • Open the csv file in Excel
  • Save it as CSV/UTF-8
  • Import the new CSV file into R using read.csv(). Don't forget to specify encoding = "UTF-8"

@kairos666

I've got a similar issue with CSV generated by Papa.unparse and French accents: the string coming out of unparse is correct, but the accents are not preserved in the resulting CSV file.

All the fixes above applied, to no avail. Finally got it working by prefixing the unparsed result with a UTF-8 BOM:

const BOMprefix = "\uFEFF";
new File([`${BOMprefix}${csvString}`], csvFileName, { type: 'text/csv;charset=utf-8;' });
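Building on that, here is a sketch of the full save path; `withBom` and `downloadCsv` are illustrative helper names, not Papa Parse API. The BOM is what lets Excel auto-detect UTF-8 when it opens the downloaded file:

```javascript
// Prefix a CSV string with a UTF-8 BOM (unless one is already there)
// so that Excel auto-detects UTF-8 when opening the downloaded file.
const BOM = '\uFEFF';

function withBom(csvString) {
  return csvString.startsWith(BOM) ? csvString : BOM + csvString;
}

// Browser-side download wiring (illustrative):
function downloadCsv(csvString, fileName) {
  const blob = new Blob([withBom(csvString)], { type: 'text/csv;charset=utf-8;' });
  const a = document.createElement('a');
  a.href = URL.createObjectURL(blob);
  a.download = fileName;
  a.click();
  URL.revokeObjectURL(a.href);
}
```

Making the prefixing idempotent avoids stacking multiple BOMs if the string is processed twice.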

@zainabelsayed

> I've got a similar issue with generated CSV based on papa.unparse and french accents Correct String out of unparse but french accents not preserved in the resulting CSV file.
>
> All fixes above applied to no avail. Finally got it working by prefixing the unparsed result
>
> const BOMprefix = "\uFEFF";
> new File([`${BOMprefix}${csvString}`], csvFileName, { type: 'text/csv;charset=utf-8;' });

It worked for me, with the Arabic language, thank you for sharing this solution

@shirodkarpushkar

> encoding: "ISO-8859-1"

It solves the problem of recognizing special characters; however, some cases are still missed, and UTF-8 characters are sometimes still not parsed correctly.

Using the TypeScript solution below, I now read the CSV as text, and it works for all UTF-8 characters.

export const csvtoJSONv2 = (file: File) =>
  new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.readAsText(file, 'utf-8');
    reader.onload = (event) => {
      const csv = event.target?.result;
      const data = csv?.toString().split('\r');

      const headers = data?.[0].split(',');
      // remove new line
      const filteredData = data
        ?.slice(1)
        .filter((row) => row.split(',').some((v) => v.replace('\n', '')));
      const utf8Data = filteredData?.map((row) => {
        const utf8Row = row.split(',').reduce((acc, value, index) => {
          const key = headers?.[index] || '';
          acc[key] = value.replace('\n', '');
          return acc;
        }, {} as { [key: string]: string });
        return utf8Row;
      });
      const utf8FilterData = utf8Data?.filter((row) => Object.values(row).some((v) => v)) || [];
      resolve(utf8FilterData);
    };
    reader.onerror = (error) => {
      reject(error);
    };
  });
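One caveat with the hand-rolled splitting above: row.split(',') breaks as soon as a field contains a quoted comma, which is exactly why passing the decoded text to Papa.parse (it accepts a plain string as well as a File) is usually safer. For illustration, a quote-aware splitter sketch; `splitCsvLine` is a hypothetical helper, not part of Papa Parse:

```javascript
// Minimal quote-aware CSV field splitter: naive row.split(',') breaks
// when a field contains a quoted comma, e.g. `"ÁLVAREZ, GREGORIO",Madrid`.
function splitCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;
      else field += ch;
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

splitCsvLine('"ÁLVAREZ, GREGORIO",Madrid'); // → ["ÁLVAREZ, GREGORIO", "Madrid"]
```

Even this sketch skips multi-line quoted fields, which is why a battle-tested parser like Papa Parse is preferable for real data.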
