UTF-8 Character set/encoding of text stimuli not recognized in online experiment #2299

jvcasillas · 2019-02-22T03:24:33Z

Accented characters that work locally do not show up in online experiments when the stimuli are drawn from a conditions file in a loop. Example here: https://pavlovia.org/run/jvcasillas/lextale_sp_template/html/

This issue is referenced in the psychopy forums here: https://discourse.psychopy.org/t/including-utf-8-unicode-characters-in-online-experiments/6723

jvcasillas · 2019-03-15T17:01:21Z

I have resolved this issue by saving my conditions file as an Excel worksheet. I was using a .csv and saving it with UTF-8 encoding in Sublime Text, but apparently that wasn't working as I thought it was. After moving the list to an Excel file, the accented characters now show up as they should.

peircej · 2019-03-15T21:52:28Z

Thanks for the info. I think it suggests there's still something to fix here in our decoding of csv files but I'm glad it's now working for you.

hsogo · 2019-06-12T12:09:09Z

Hi, I've also experienced this issue.
I'm not familiar with JavaScript, but I guess the Byte Order Mark (BOM) may have something to do with this. According to the following pages, at least when exporting to xlsx, a BOM is necessary to re-open exported xlsx files in Excel.

I created a test experiment to confirm UTF-8 CSV file with BOM can be imported correctly. This is a Japanese Stroop task and three conditions files are prepared for this experiment. Conditions file can be changed by expInfo dialog at the beginning of the experiment.

cnd.xlsx: Conditions file saved as xlsx file.
cnd.csv: Conditions file saved as UTF-8 CSV file (without BOM).
cnd_with_bom.csv: Conditions file saved as UTF-8 CSV file with BOM.

The results were as follows. As jvcasillas reported, the xlsx file worked fine (1) while the UTF-8 CSV file didn't (2). The UTF-8 CSV file with BOM worked fine (3).

So, if we add BOM to CSV file without BOM, the CSV file would be read correctly, I guess.
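A minimal sketch of that idea, assuming the file contents are available as a Uint8Array. The helper names `hasUtf8Bom` and `addUtf8Bom` are hypothetical and not part of PsychoJS; the only fact relied on is that the UTF-8 BOM is the three bytes EF BB BF.

```javascript
// UTF-8 byte order mark: the three bytes EF BB BF.
const UTF8_BOM = [0xef, 0xbb, 0xbf];

// Hypothetical helper: true if the buffer starts with the UTF-8 BOM.
function hasUtf8Bom(bytes) {
  return bytes.length >= 3 && UTF8_BOM.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: return a copy of the buffer with a BOM prepended,
// or the same buffer unchanged if it already starts with one.
function addUtf8Bom(bytes) {
  if (hasUtf8Bom(bytes)) return bytes;
  const out = new Uint8Array(bytes.length + 3);
  out.set(UTF8_BOM, 0);
  out.set(bytes, 3);
  return out;
}
```

In practice it is usually simpler to let the editor or export tool write the BOM when saving, but the check can help confirm which variant of a file is actually being served.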

Another possible way to solve this issue would be to specify codepage (65001) when opening CSV file.

Problem with reading cyrillic CSV without BOM (ANSI as UTF-8)

Sorry that I'm not good enough at JavaScript to test this myself. I hope this information will be of some help.

lnnrtwttkhn · 2020-05-27T14:18:53Z

I had the same issue and @hsogo's solution (saving the .csv file with UTF-8 and BOM) solved it! In my case, I create the conditions.csv file with pandas, so I could simply add encoding='utf-8-sig' when saving the pandas dataframe to .csv (e.g., df.to_csv('conditions.csv', encoding='utf-8-sig')). Thanks @hsogo!

drakeasberry · 2020-06-08T17:51:49Z

@hsogo Thank you for the proposed solution; it is working for my online experiment. I was trying to understand the workings of the BOM a little better, and I noticed that the Python docs say that using a BOM with UTF-8 should be avoided.

Are there other side-effects that experimenters should be aware of when using BOM with utf-8 or is there a better alternative we should be using?

hsogo · 2020-06-11T10:51:13Z

Sorry, I'm not sure about potential issues of BOM with utf-8.

By the way, now that local debugging of PsychoJS works on my PC (Japanese Windows 10, PsychoJS 2020.1, Firefox 77.0.1), I tried to fix this problem. I found that line 297 of data-2020.1.js reads the conditions file.

const workbook = XLSX.read(new Uint8Array(resourceValue), { type: "array" });

Replacing this line with the following, Japanese characters in CSV files were correctly read regardless of BOM.

const workbook = XLSX.read((new TextDecoder).decode(new Uint8Array(resourceValue)), { type: "string" });

However, this modification caused an error when reading xlsx files, so I added an if statement as follows. This worked with xlsx, CSV without BOM, and CSV with BOM in my environment.

let workbook;
if (['csv'].indexOf(resourceExtension) > -1)
	workbook = XLSX.read((new TextDecoder).decode(new Uint8Array(resourceValue)), { type: "string" });
else
	workbook = XLSX.read(new Uint8Array(resourceValue), { type: "array" });
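A likely reason the string path is insensitive to the BOM is that `TextDecoder` defaults to UTF-8 and, with its `ignoreBOM` option left at `false`, drops a leading BOM from the decoded text. The sketch below illustrates only that decoding step and does not call XLSX; `decodeCsvBytes` and the sample CSV content are hypothetical.

```javascript
// Hypothetical helper, not part of PsychoJS: decode raw CSV bytes as UTF-8.
function decodeCsvBytes(bytes) {
  // TextDecoder defaults to UTF-8 and, with ignoreBOM left at false,
  // removes a leading BOM from the decoded string.
  return new TextDecoder("utf-8").decode(bytes);
}

// Made-up sample conditions content with non-ASCII characters.
const csvText = "word,color\nあか,red\n";
const withoutBom = new TextEncoder().encode(csvText);
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, ...withoutBom]);

console.log(decodeCsvBytes(withoutBom) === csvText); // true
console.log(decodeCsvBytes(withBom) === csvText);    // true
```

Both byte sequences decode to the same string, so whatever parser receives the text sees identical input with or without the BOM.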

Unfortunately, I don't know how to test this on Pavlovia server. @peircej What should I do?

peircej · 2020-06-11T12:48:55Z

We had some discussion a while ago (#2166) about whether using utf-8-sig was a problem for data files. In the end we implemented it as the default, and it does not appear to have introduced any problems. @hoechenberger tested on a range of software and couldn't find anything that tripped over when the BOM was present. One thing that's interesting is that the BOM is not technically needed for its original purpose in UTF-8 (because the byte order is a part of the encoding), but it is nonetheless useful in helping the receiving application detect that this is UTF-8.

It would obviously be ideal if @hsogo's fix means that people don't need BOM-encoded files. @hsogo, would you be able to submit a pull request on the https://github.com/psychopy/psychojs repository with your fix, so that @apitiot can review it and pull it in from there?

hsogo · 2020-06-12T07:58:07Z

I've sent a pull request. psychopy/psychojs#95

peircej assigned apitiot Feb 22, 2019

hoechenberger added the pavlovia label Feb 27, 2019

peircej mentioned this issue Jun 12, 2020

ENH: read CSV conditions files with or without BOM psychopy/psychojs#95

Closed

thewhodidthis mentioned this issue Aug 5, 2020

data/TrialHandler: PR#95 follow up decoding .csv conditions imports psychopy/psychojs#137

Merged

apitiot closed this as completed in psychopy/psychojs#137 Aug 20, 2020

drakeasberry mentioned this issue Jun 17, 2021

Special Characters drakeasberry/Dissertation#7

Closed
