UTF8 Support #712

Closed
wants to merge 26 commits into from

goehle
Member

@goehle goehle commented Jun 20, 2016

This pull request adds better support for UTF8 characters, from the rendered webpage all the way to the database. While it looks simple, it's got a surprising number of ramifications and will need a lot of testing.

To install: Check out both this pull request and pull request openwebwork/pg#278

To test:

  • Check that none of the existing functionality is broken.

  • Check that you can use the Russian language setting without long character warnings and that it looks correct. (This will test long character utf8 coming from the localization engine).

  • Check that you can use utf8 in anything that writes or reads from a file. A good string to test with is "á, é, í, ó, ú, ü, ñ, ¿, ¡ 한글 漢字". This will include:

    • The site_info box
    • The course_info box
    • Email files (as well as the grade message feature)
    • Saving and editing pg files.
    • Rendering pg files, both via problem pages and xmlrpc (although I make no promises about unicode correctly passing through macros)
  • Check that you can use "short" utf characters in text fields (like first name, last name, comment, set description, etc...) You can use "á, é, í, ó, ú, ü, ñ" as a test string.

  • Check that you can use "long" utf characters if you set up your mysql server to support it.

    • Log into your mysql server and run the following to change the default character set of webwork to utf8
    alter database webwork character set utf8 collate utf8_general_ci;
    
    • Create a new course. (Then maybe double check that the tables created have a default character set of utf8). Then check that you can save longer utf8 characters in text fields. You can test with " á, é, í, ó, ú, ü, ñ, ¿, ¡ 한글 漢字"
  • Check that achievements (and other features which use nfreeze) still work. Note: I changed achievements to use a base64 version of freeze and thaw. This will be compatible with older installations, but the functionality should be checked.
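
As a rough aid for the "short" versus "long" distinction in the checklist above, the sketch below classifies each character of the test string. It is written in Python purely for illustration and is not part of WeBWorK. It assumes that "short" means characters that fit in Latin-1 and "long" means everything else, which is an interpretation rather than something the pull request states. The helper name `describe` is hypothetical.

```python
import unicodedata

# Test string from the checklist, normalised so that each accented letter
# is a single code point.
TEST_STRING = unicodedata.normalize("NFC", "á, é, í, ó, ú, ü, ñ, ¿, ¡ 한글 漢字")


def describe(ch):
    """Return (fits_in_latin1, utf8_byte_length) for one character."""
    try:
        ch.encode("latin-1")
        fits = True
    except UnicodeEncodeError:
        fits = False
    return fits, len(ch.encode("utf-8"))


# Skip the separators so only the interesting characters are reported.
report = {
    ch: describe(ch)
    for ch in TEST_STRING
    if not ch.isspace() and ch != ","
}

for ch, (fits, nbytes) in report.items():
    print(f"{ch!r}: latin-1={fits}, utf-8 bytes={nbytes}")
```

Every character in the test string needs at most three bytes in UTF-8, so the string should fit within MySQL's three-byte `utf8` character set. Characters outside the Basic Multilingual Plane (most emoji, for example) need four bytes and are not covered by this test string.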

@heiderich
Member

heiderich commented Jul 10, 2016

I experience a bug with several non-ASCII characters in .pg files. They are no longer rendered correctly, although this worked previously. I experience this problem both with version 2.12 and with this pull request (together with pull request 278 of pg).

The following sample problem demonstrates this problem:

`
DOCUMENT(); # This should be the first executable line in the problem.

loadMacros(
"PG.pl",
"PGbasicmacros.pl",
);

TEXT(beginproblem());

BEGIN_TEXT

This is a utf8 test: ä ü ö Ä Ü Ö ß è é ñ
END_TEXT

TEXT(EV2(<<EOT));
EOT

ENDDOCUMENT(); # This should be the last executable line in the problem.
`

@goehle
Member Author

goehle commented Jul 13, 2016

Thanks for testing this @heiderich. I am not able to reproduce this bug, however. Could you confirm that you are doing two things:

  1. You should also be checking out the PG branch (UTF8 Support, pg#278). This branch adds support for reading pg files in utf8 mode (https://github.com/openwebwork/pg/pull/278/files), so it could be related to what you are experiencing.
  2. You should only add the utf8 characters to the pg file after you have checked out both branches and restarted your server. If you save the characters using the develop branch I don't think they make it to the file correctly, even if the file is later read as utf8.

@heiderich
Member

Thanks for your recommendations @goehle.

I might have forgotten to restart the webserver after checking out these branches and before testing the problems, so I did this now.

As for your second remark, I created the .pg files using a text editor and I believe that they are correctly encoded in utf8.

Now the problems are rendered correctly, but the following error message is shown on all pages I loaded (including even the login page of a course). On the top of the pages I see

Warning -- There may be something wrong with this question. Please inform your instructor including the warning messages below.

and at the bottom

Warning messages

utf8 "\xA9" does not map to Unicode at /opt/webwork/webwork2/lib/WeBWorK/Utils.pm line 188, <$dh> chunk 1.

As far as I can tell it seems that the content is displayed correctly.

@goehle
Member Author

goehle commented Jul 14, 2016

That is caused by an invalid character in a file somewhere. If you add warn($fileName); at this line

https://github.com/openwebwork/webwork2/blob/master/lib/WeBWorK/Utils.pm#L184

restart the server, and then reload a page, you should be able to figure out which file it is. I tracked some of those down and they tended to come from the copyright character, which was included in stuff like VERSION. I can't reproduce it on my end, but if you tell me which file it's in I can fix it.
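
An alternative to adding a `warn` is to scan the configuration files directly for bytes that are not valid UTF-8. The sketch below is a stand-alone Python illustration, not WeBWorK code. The function name `find_non_utf8_files`, the `.conf` suffix filter, and the demo file names are all assumptions made for the example. It relies on the fact that the byte `\xA9` is the copyright sign in Latin-1 but is not valid on its own in UTF-8.

```python
import os
import tempfile


def find_non_utf8_files(root, suffix=".conf"):
    """Return paths under `root` ending in `suffix` whose bytes are not valid UTF-8."""
    bad = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return bad


def demo():
    """Create one UTF-8 file and one Latin-1 file, then report the offender."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "good.conf"), "wb") as fh:
            fh.write("# \u00a9 WeBWorK\n".encode("utf-8"))
        with open(os.path.join(root, "bad.conf"), "wb") as fh:
            # Latin-1 stores the copyright sign as the single byte 0xA9.
            fh.write("# \u00a9 WeBWorK\n".encode("latin-1"))
        return [os.path.basename(p) for p in find_non_utf8_files(root)]


print(demo())
```

On a real server the function could be pointed at the courses directory instead of a temporary one; the path to use depends on the installation.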

@heiderich
Member

Thanks for the hint. I was able to figure out which file was responsible and your guess was good. The problem was caused by a copyright sign in a course.conf file. I will make a pull request for the corresponding file in the model course.

@heiderich
Member

I still experience a problem: While the user interface and the problem text are shown correctly, special characters are not shown correctly in the solutions of the problems. The following reproduces this problem for me.

DOCUMENT(); # This should be the first executable line in the problem.

loadMacros(
"PG.pl",
"PGbasicmacros.pl",
);

TEXT(beginproblem());

BEGIN_TEXT

This is a utf8 test: ä ü ö Ä Ü Ö ß è é ñ
END_TEXT

TEXT(EV2(<<EOT));
EOT

SOLUTION(EV3(<<'END_SOLUTION'));
Lösung: ä ü ö Ä Ü Ö ß è é ñ
END_SOLUTION

ENDDOCUMENT(); # This should be the last executable line in the problem.

@goehle
Member Author

goehle commented Jul 15, 2016

This was an issue with how knowls and pg interacted with their base64 encoding. I have updated the PG pull request with a fix.
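
The thread does not say exactly what went wrong in the knowl code, so the following is only a generic Python illustration of how a base64 round trip can garble non-ASCII text. Base64 operates on bytes, so the text has to be encoded to bytes before it goes in and decoded with the same character encoding after it comes out. The variable names are hypothetical.

```python
import base64

# Sample solution text with umlauts, written with escapes to avoid ambiguity.
solution = "L\u00f6sung: \u00e4 \u00fc \u00f6"

# Encode the text as UTF-8 bytes first, then as base64.
encoded = base64.b64encode(solution.encode("utf-8")).decode("ascii")

# Decoding base64 gives the UTF-8 bytes back.
raw = base64.b64decode(encoded)

# Interpreting those bytes as UTF-8 recovers the original text.
correct = raw.decode("utf-8")

# Interpreting the same bytes as Latin-1 turns each umlaut into two characters.
garbled = raw.decode("latin-1")

print(correct)
print(garbled)
```

The garbled variant shows the typical symptom of a mismatch: every accented letter is replaced by a pair of unrelated characters.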

@heiderich
Member

heiderich commented Jul 16, 2016

Thanks for the fix. It resolves the problem for me.

@heiderich
Member

heiderich commented Aug 2, 2016

Sometimes I experience the following error:

Error messages

Can't locate object method "renewed" via package "Encode::utf8" at /usr/lib/x86_64-linux-gnu/perl/5.20/Encode.pm line 217. 

Call stack

The information below can help locate the source of the problem.

in Encode::decode_utf8 called at line 64 of /opt/webwork/webwork2/lib/WeBWorK/Request.pm
in WeBWorK::Request::mutable_param called at line 183 of /opt/webwork/webwork2/lib/WeBWorK.pm

It might be related to what was reported here:

http://www.nntp.perl.org/group/perl.perl5.porters/2008/01/msg133694.html

and here

https://rt.cpan.org/Public/Bug/Display.html?id=53322

@goehle
Member Author

goehle commented Aug 3, 2016

Does this only happen when viewing problems, or on non problem pages? If it only happens when you view problems could you add

[qw(Encode::Encodings)]

to the list of PG modules (e.g. ${pg}{modules}) in defaults.conf and see if that fixes your problem? It seems like it might be caused by the PG safe compartment, but I can't reproduce it.

@heiderich
Member

Thank you. I only observed it when viewing problems. With this modification (though I used [qw(Encode::Encoding)] instead of [qw(Encode::Encodings)]) I do not observe the error any more. I just created a pull request.

@goehle
Member Author

goehle commented Aug 15, 2016

This is just an update for people using this patch early and for future patch notes. As discussed above, Latin1 encodings in conf files (e.g. the copyright character) will cause perl warnings. The new course.conf template file fixes this, but it's likely that established servers will have old files lying around. If people install the recode package (just called recode in both apt and yum systems), they can quickly fix this issue by recoding all of their course.conf files in utf8 with

find /opt/webwork/courses -name "course.conf" -exec recode L1..UTF8 {} \;
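
One caveat with a blanket Latin-1 to UTF-8 conversion is that it treats every input byte as Latin-1, so a file that is already UTF-8 would most likely end up double-encoded if it were converted again. A minimal Python sketch of a more cautious conversion is shown below. It is an illustration only, the function name `to_utf8` is hypothetical, and it assumes that anything that is not valid UTF-8 is Latin-1.

```python
def to_utf8(data: bytes) -> bytes:
    """Return `data` as UTF-8, treating input that is not valid UTF-8 as Latin-1."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8: leave it untouched
    except UnicodeDecodeError:
        return data.decode("latin-1").encode("utf-8")


latin1_line = b"# \xa9 1996-2016 The WeBWorK Project\n"

once = to_utf8(latin1_line)
twice = to_utf8(once)  # a second pass is a no-op

# For comparison: converting unconditionally a second time garbles the text.
blind_twice = once.decode("latin-1").encode("utf-8")

print(once == twice)
print(blind_twice == once)
```

Because the check happens per file, the conversion can be re-run safely after new courses are added.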

@bjornbe

bjornbe commented Nov 24, 2016

Thank you for this patch. There's still an issue with UTF-8 characters in the classlist editor.
Edit: This seems to be simply because the usernames were added with a different encoding; changing the names in the editor fixes this.

@heiderich
Member

heiderich commented Dec 28, 2016

I experience problems with non ASCII characters in Subject, Chapter and Section tags in the .pg files and the Taxonomy2 file. Problems first occur when running OPL-update. I partially solved these problems by

For details see my pull request goehle#15 to the repo of @goehle.

It seems that the database gets properly populated. However, the library browser would not list problems in subjects/chapters/sections containing special characters (in my case umlauts).

@heiderich
Member

There seems to be an encoding problem when the string

$error = $r->maketext("Your authentication failed.  Please try again. Please speak with your instructor if you need help.")

in lib/WeBWorK/Authen.pm is localized to something containing proper UTF-8 characters.

@mgage
Sponsor Member

mgage commented Oct 27, 2018

This is incorporated into #893
