verbatim delimiter #421
Comments
How about ASCII 127 (U+007F, DELETE)? The DELETE character is hard to get into a student's answer string, since pressing Delete performs an action rather than inserting the character. Alternatively, For example
should do it. This will usually end up with |
I gather from this reference that ASCII 127 is also not OK to use in XML (or, more specifically, in an attribute, which has the most restrictions in general). We do something similar in PreTeXt to what you describe for choosing a delimiter, with the aim of replacing the 0x85 with something friendlier to more output forms. (That's after the XML validation takes place, so it's not the case that we can repeat this with 0x1F.) There are a few complications, like I'd be inclined to go that way, except I want to check about using some other Unicode character first. I don't understand the character encoding issues beyond a surface level. Is a time coming when we could just do like the following?
I guess WeBWorK hardcopy would need to switch to use xelatex, but should it be moving that direction anyway to support more characters in PG problems? |
OK, the ranges seemed to indicate that U+007F was OK, but the non-restricted list seems to indicate not. Too bad.
True. I suppose you could use
to avoid the star.
I had an earlier version that used
Yes, I thought of that, but no matter what you end up doing along these lines, that will be a possibility, so you are going to crash one way or another. It looks like U+0085 is allowed, so Another possibility would be to use U+000D (RETURN), which LaTeX will allow as a delimiter for The
so that If using literal returns in the attributes is problematic (though it seems to work in my hand testing), then you could perhaps encode it as
This might be a possible solution. |
One quick note. While an XML file can have \n and \r in the file, they are not allowed in attribute values. So it would be the same validation issue to use them as to use 0x1F. |
I couldn't find the specification for what's allowed in an attribute. Can you provide a link? The best I can find is the definition for AttValue, which seems to indicate that there are no restrictions other than no literal If attributes are limited in what they can support, are you considering switching to a container whose content is the value instead? Essay answers and ones for |
I think you are right. Sorry, I was mixing up memories. In July, Sean Fitzpatrick and I spent some time thinking about this, and characters |
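One way to check claims like these is to feed small documents to a parser. The sketch below uses Python's xml.etree.ElementTree as a stand-in (an assumption: it is not necessarily the validator PreTeXt uses), with a hypothetical element name ans and a hypothetical helper parse_attr. It tries a literal 0x1F, a literal carriage return, and the character reference &#13; inside an attribute value.

```python
import xml.etree.ElementTree as ET


def parse_attr(raw_value: str):
    """Parse <ans value="..."/> and return the parsed attribute value.

    Returns None if the parser rejects the document as not well-formed.
    The element name 'ans' is made up for this illustration.
    """
    try:
        return ET.fromstring(f'<ans value="{raw_value}"/>').get("value")
    except ET.ParseError:
        return None


# Literal 0x1F is not a legal XML 1.0 character, so parsing fails.
print(repr(parse_attr("\\verb\x1ffoo\x1f")))    # None

# A carriage return written as a character reference survives intact.
print(repr(parse_attr("\\verb&#13;foo&#13;")))  # '\\verb\rfoo\r'

# A literal carriage return is accepted, but the parser normalizes it
# to other whitespace, so no CR reaches the application.
print(repr(parse_attr("\\verb\rfoo\r")))
```

If this matches what other parsers do, a literal carriage return in an attribute is well-formed but does not round-trip, while the character reference form does.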
Try |
Wonderful. That passes my testing too. Of the proposals, using Should I poll people for red flags? @mgage, @goehle, @taniwallach? I think you can skip reading this whole thread. The proposal is for this code in
to become (where 0x0D = CR = carriage return = \r)
except it would also process the input string to strip any 0x0D that somehow ended up in the string. My understanding is that If no one sees a red flag, I will open a PR for this. |
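A minimal Python sketch of the idea in this proposal, not the Perl in lib/Value/String.pm: strip any 0x0D from the answer string, then wrap it in \verb with 0x0D as the delimiter. The function name verb_string and its signature are hypothetical.

```python
CR = "\x0d"  # carriage return, the proposed \verb delimiter


def verb_string(answer: str, delimiter: str = CR) -> str:
    """Wrap `answer` in a LaTeX \\verb call delimited by `delimiter`.

    Any copy of the delimiter inside the answer is removed first, so the
    answer cannot end the \\verb argument early.
    """
    cleaned = answer.replace(delimiter, "")
    return f"\\verb{delimiter}{cleaned}{delimiter}"


print(repr(verb_string("foo")))      # '\\verb\rfoo\r'
print(repr(verb_string("fo\ro")))    # '\\verb\rfoo\r'
```

Stripping the delimiter before wrapping is what keeps a stray 0x0D in the input from terminating the verbatim text prematurely.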
You could include the braces in the Also, note that |
I read the thread. For now I would avoid using any multi-byte characters in LaTeX generated by WeBWorK by default, unless they come in from a UTF-8 encoded problem. Not everyone is using xelatex or something else that expects UTF-8 encoded TeX files, and such an approach is likely to cause trouble at the many sites that are in no rush to support UTF-8. I strongly support the proposal to change to I do think it might be advisable to preprocess the string being put inside the verbatim I cannot speak to the history behind the choice of ASCII 31 as the new BTW I needed to add |
Writing up a PR for this. Davide, the ArbitraryString snippet breaks the input string into an array split at instances of
where regex does the replacement of |
Yes. Because the string will be printed verbatim, the |
Oh, I see. I lost track of how the ArbitraryString re-applies |
I opened #422. |
@Alex-Jordan - I merged in #422 and unless you want to leave the issue open to track the multi-line case for future attention - I think this issue can be closed. |
Once upon a time (actually still in 2.14), in lib/Value/String.pm, there was this code:

The idea is that the \verb LaTeX command is going to be used on a string answer, and it needs a delimiter character. Character 0x85 (decimal 133) was chosen because it would be crazy for a student to have that as part of a string answer they "typed".

Then in 539406c, the character changed to 0x1F, ASCII 31, the "unit separator character". This brought it down into 7-bit ASCII, and Geoff's comment in the commit suggests this has something to do with the utf8 conversion.
So now we have string answers that use character 0x1F in their display. This is causing an issue with PreTeXt. When WW processes a problem with "PTX" display mode, it makes XML. For each answer of the problem, it makes a single XML element, with lots of attributes and values that correspond to the Perl answer hash's keys and values. For an example, see:
https://webwork-ptx.aimath.org/webwork2/html2xml?courseID=anonymous&userID=anonymous&password=anonymous&course_password=anonymous&answersSubmitted=0&displayMode=PTX&outputformat=ptx&problemSeed=8435&sourceFilePath=Library/PCC/BasicAlgebra/Geometry/CylinderVolume10.pg
and view source, since your web browser is likely to try to read the XML as HTML.
So you can see how a string like \verb<0x1F>foo<0x1F> could end up inside a value for an attribute of one of these XML elements. The problem is that XML does not allow this character in an attribute value. (Well, there are varying standards for what is allowed, but even when this one is allowed, its use is discouraged, and anyway, its presence causes the Python validator we use to declare this to be invalid XML.)

So. We want a character that a student will not be able to type with normal use of the keyboard. So nothing in ASCII 32--127. And we want a character that is valid for XML in an attribute value. So nothing in ASCII 00--31. So we have to leave 7-bit ASCII to meet both conditions. Is it possible to do this? Can a character be chosen somewhere else in utf8 and that be compatible with the utf8 conversion happening now?
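For weighing candidate delimiters against the XML side of this, a small Python sketch of the XML 1.0 Char production may help. The function names are hypothetical, and the "discouraged" check covers only the two control-character ranges that the XML 1.0 specification lists, as far as I recall it; it says nothing about what a student can type or what LaTeX accepts.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point `cp` matches the XML 1.0 `Char` production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml10_discouraged_control(cp: int) -> bool:
    """True for the control ranges XML 1.0 asks authors to avoid.

    Only the ranges 0x7F-0x84 and 0x86-0x9F are checked here; the
    specification lists further (non-control) ranges that are omitted.
    """
    return 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F


# Candidates raised in this issue: 0x1F, 0x0D, 0x7F, 0x85.
for cp in (0x1F, 0x0D, 0x7F, 0x85):
    print(f"0x{cp:02X}", is_xml10_char(cp), is_xml10_discouraged_control(cp))
```

Under these ranges, 0x1F is not a legal XML 1.0 character at all, 0x7F is legal but discouraged, and 0x0D and 0x85 are legal.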