Attempt to support some standard math operators when included in answers using Unicode characters #517

taniwallach · 2021-01-09T17:27:57Z

Students in my institution find it quite convenient to sometimes cut and paste pieces of equations shown using MathJax (and from external sites) into the input boxes. Many of them run into problems with Unicode characters being used for standard math operations and get an error message like:

Unexpected character '−'

for the Unicode minus sign, U+2212 (UTF-8 hex: e28892) https://www.fileformat.info/info/unicode/char/2212/index.htm
This one is easily triggered by copying a minus from MathJax.

Unexpected character 'ˆ'

for the Unicode "ˆ" (MODIFIER LETTER CIRCUMFLEX ACCENT, U+02C6; UTF-8 hex: cb86) https://www.fileformat.info/info/unicode/char/02C6/index.htm
I'm not sure from where the student copy-pasted this one.
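
As a quick check of the hex values quoted above, here is a minimal Perl sketch that prints the UTF-8 bytes of each code point. The helper name utf8_hex is made up for illustration; Encode is a core Perl module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the UTF-8 encoding of a code point as a lowercase hex string.
# (utf8_hex is a hypothetical helper name, not part of PG.)
sub utf8_hex {
    my ($code_point) = @_;
    return unpack('H*', encode('UTF-8', chr($code_point)));
}

# U+2212 MINUS SIGN, U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT,
# and the keyboard hyphen-minus U+002D for comparison.
printf "U+%04X -> %s\n", $_, utf8_hex($_) for (0x2212, 0x02C6, 0x002D);
# U+2212 -> e28892
# U+02C6 -> cb86
# U+002D -> 2d
```

The multi-byte encodings are what distinguish these characters from the single-byte keyboard forms the parser expects.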

I think we should try to have PG handle these types of Unicode character as alternate forms of the ISO-8859-1 / "keyboard" characters we would expect them to type by hand.

I'm not yet certain where we would do this, or whether it is best accomplished by a replacement in the answer string or by adding the Unicode character as a known math operator in MathObjects.

@dpvc Any thoughts on the best approach?

Alex-Jordan · 2021-01-09T17:46:12Z

The error message I see from this kind of thing is a database error message. Something trying to record the submitted string to the database rejects it. The message is like this:

Error messages

    DBD::mysql::st execute failed: Incorrect string value: '\xE2\x88\x922' for column 'answer_string' at row 1 at /opt/webwork/webwork2/lib/WeBWorK/DB/Schema/NewSQL/Std.pm line 837.

This was for submitting a −2.

taniwallach · 2021-01-09T18:39:02Z

@Alex-Jordan Was the test course used created as a new course once WW 2.15 was installed on the server?

Alex-Jordan · 2021-01-09T19:12:23Z

Ah, you got it. I just tried on a new course and do not get the error. Now that we're talking about this, I think I previously knew that new courses don't lead to the error, possibly with your help. But I since forgot.

OK, is it as simple as just adding − to the context as a subtraction and negation operator? I added these lines to an experiment problem:

Context()->operators->add(
    # Unicode minus (U+2212) as binary subtraction
    '−' => {precedence => 1, associativity => 'left', type => 'both', string => '-',
            class => 'Parser::BOP::subtract', rightparens => 'same'},
    # Unicode minus (U+2212) as unary negation
    'u−' => {precedence => 6, associativity => 'left', type => 'unary', string => '-',
             class => 'Parser::UOP::minus', hidden => 1, allowInfinite => 1, nofractionparens => 1},
);

and it had the desired effect. The answer was 4, and I could enter 6-2 or −−4 and it was accepted.

Unless there would be bad side effects, we could add this (and other similar things) to Parser/Context/Default.pm, effectively adding them to the Numeric context, and then everything downstream that builds off of Numeric.

Alex-Jordan · 2021-01-09T19:26:31Z

I do think I like your other idea better: converting to the keyboard version immediately upon receipt. That would avoid so much duplicate code, and give one place to clearly maintain all such conversions. Like in the year 2025 when some new emoji character is added that resembles an infinity sign, or whatever. But I do not know where that would be handled.
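
To make the "convert on receipt" idea concrete, here is a minimal sketch in plain Perl. The table %KEYBOARD_FORM and the function to_keyboard_form are hypothetical names, and where such a step would live in PG or webwork2 is exactly the open question above.

```perl
use strict;
use warnings;

# Hypothetical mapping table: Unicode character => keyboard form.
# The entries are illustrative choices, not an agreed list.
my %KEYBOARD_FORM = (
    "\x{2212}" => '-',      # MINUS SIGN
    "\x{221E}" => 'inf',    # INFINITY
    "\x{2264}" => '<=',     # LESS-THAN OR EQUAL TO
    "\x{2265}" => '>=',     # GREATER-THAN OR EQUAL TO
    "\x{2260}" => '!=',     # NOT EQUAL TO
);

# Replace every mapped character in the submitted string and leave
# everything else untouched.
sub to_keyboard_form {
    my ($answer) = @_;
    $answer =~ s/(.)/exists $KEYBOARD_FORM{$1} ? $KEYBOARD_FORM{$1} : $1/gse;
    return $answer;
}

print to_keyboard_form("(\x{2212}\x{221E},\x{221E})"), "\n";   # prints "(-inf,inf)"
print to_keyboard_form("6\x{2212}2"), "\n";                    # prints "6-2"
```

A single table like this is easy to maintain, but the replacement happens before the context ever sees the string, so the context cannot opt out of it.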

On the other hand, if these were handled the other way (as recognized by the context) then problem authors could use them too. For example, with a Mac, I can type option 5 to get ∞. So it would open the door to things like
$I = Interval("(-∞,∞)");
in code. A very very very small advancement in code readability.

taniwallach · 2021-01-09T20:21:07Z

@Alex-Jordan Thanks for providing the suggestion. Do you know if we can get this to take effect via PGcourse.pl so that it will affect the context loaded later by a problem?

Otherwise, I may try to make changes to Parser/Context/Default.pm on a development server to test it across a range of problems.

About the "mapping" approach - I suspect that any mapping might need to be handled at the webwork2 level. Another downside of that approach is that we will probably be modifying what gets recorded in the database from what the student actually sent - which I do not think is ideal.

dpvc · 2021-01-09T20:26:23Z

While the idea of adding more operators to the context would work, it has an important drawback beyond the additional code. If a context needs to be modified, that means you have to know all the versions of the operator that would need to be modified. For example, if you want to disable exponentiation, then you need to know all the characters that are tied to that (there currently are two versions, and that is problematic already). Many existing contexts modify the classes associated with existing operators, so you would need to go through and modify all of those to include any new characters that are added to the context. This seems impractical to me.

On the other hand, I would not want to see wholesale remapping of characters one to another outside of the context's control. There used to be some characters that WeBWorK would remove automatically (like dollar signs and other "special" characters), which made it impossible to write contexts that used those characters (like currency answers with dollar signs), and that was impossible to work around from within the problem. Making global replacements like this makes assumptions about the meanings of those characters in the context, and that is something I would not want to encourage.

I think the proper way is to extend the MathObject parser so that operators (and the other items) can list additional characters that also trigger the same operator. That is, you would only define - as minus, but it could have a list of additional forms that would also produce the same internal representation. That way, U+2212 would produce a - reference internally. The context would control that mapping, since it would be part of the definition of - in the context. The same thing could be done with U+221E for Infinity, U+2264 for <=, and so on. This would not require changes to any existing problems or contexts that modify copies of other contexts, as no new operators are being added. It would also allow the duplicates that already exist to be trimmed out (which would need to be handled carefully, as that could be a breaking change if not done properly).
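
The "one definition, several trigger tokens" idea can be sketched in plain Perl as follows. The data layout, the alternatives key, and build_token_table are assumptions made for illustration, not PG's actual internals.

```perl
use strict;
use warnings;

# Hypothetical context data (not PG's real structures): each operator is
# defined once under its keyboard name and lists the extra tokens that
# should trigger it.
my %operators = (
    '-'  => { class => 'subtract', alternatives => ["\x{2212}"] },   # MINUS SIGN
    '<=' => { class => 'lessEq',   alternatives => ["\x{2264}"] },   # LESS-THAN OR EQUAL TO
    '^'  => { class => 'power',    alternatives => ['**'] },
);

# Invert the definitions into a token => canonical-name lookup table.
sub build_token_table {
    my ($defs) = @_;
    my %table;
    for my $name (keys %$defs) {
        $table{$name} = $name;
        $table{$_} = $name for @{ $defs->{$name}{alternatives} };
    }
    return \%table;
}

my $table = build_token_table(\%operators);
print $table->{"\x{2212}"}, "\n";   # prints "-"
print $table->{'**'}, "\n";         # prints "^"

# Removing the single definition disables every spelling at once.
delete $operators{'^'};
$table = build_token_table(\%operators);
my $status = exists $table->{'**'} ? 'known' : 'unexpected';
print "$status\n";                  # prints "unexpected"
```

Because the alternatives hang off the one definition, a context that disables or reclassifies an operator does not need to know which extra characters exist.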

I would not, however, map U+02C6 (modifier letter circumflex accent) to ^. An accent should not be used for exponentiation, in my opinion. Would you also want to map U+0302 (combining circumflex accent)? I would not want to allow combining accents for this. How about U+FF3E (full-width circumflex accent)? U+2038 (caret)? Not everything that might look like the character you want necessarily has the same meaning.

All this may lead to requests for U+00B2 (superscript two) to be used for ^2, and so on (there are superscript parens and + and -, so would you want to be able to enter whole expressions that way? What about superscript n (U+207F)?). There are also subscript numbers, so would you want to allow those for entering _0, etc.? I think this becomes harder to handle.

drgrice1 · 2021-01-09T21:42:01Z

I am not entirely on board with supporting these characters. The majority of the time that students are entering these characters via cut and paste from other websites are cases where students are using websites to get answers to problems in ways that they should not be. I think that it would be better to fix the messages that students are given when these characters are encountered, but to still not accept them as correct. Instead of "Unexpected character '−'", give them some message about not using these methods. At least if we decide to support these sorts of characters, it should be made a course option, or something like that.

Alex-Jordan · 2021-01-09T22:36:53Z

When I first encountered these kinds of submissions, I assumed they were instances of copy-paste from Chegg or something like that. But when I investigated, it turned out that it was copy-paste from a WeBWorK screen. There are two scenarios:

* A student /did/ know the answer involved pi or infinity, but genuinely did not know to spell them out. They would copy the symbols from somewhere on the problem screen. Or perhaps from a different problem screen where they remembered those symbols were presented. It should be noted that even explicit messages that say "type 'pi' for π", etc., could be ineffective (the student wasn't reading them). This could even happen with a minus sign. For students of a certain persuasion using a cellphone, I found that copy-pasting a minus sign is preferable to moving their keypad to the secondary screen where the hyphen is.
* A student was using Show Me Another to see a different version of the problem (with solution, answer available) and copied something from the SMA screen. It's less clear in these situations if the SMA version helped them learn something, or if they are reverse engineering through copy-paste and editing numbers, and leaving the fancy minus sign in place.

In my situation, I still have pressure from faculty who would rather use MyMathLab or ALEKS because to them, there are all these little things that prop up a confirmation bias for WW's inferiority. (I don't actually know how the commercial platforms respond to these characters.) So I like the general idea here. But I wouldn't be opposed to making it configurable as a special PG environment variable.

-- Alex Jordan Mathematics Instructor Portland Community College

taniwallach · 2021-01-09T22:51:51Z

@drgrice1 wrote:

I am not entirely on board with supporting these characters. The majority of the time that students are entering these characters via cut and paste from other websites ... At least if we decide to support these sort of characters, it should be made a course option, or something like that.

The Unicode minus can be copy-pasted from MathJax equations in the same question, which seems pretty legitimate and innocuous. However, I do agree that if this can be enabled/disabled at the site and course level, that would be optimal. Although there may be some small didactic benefit to forcing students to type their answers themselves, forcing them to do so will not stop some students from utilizing external aids which we would prefer and recommend they not use.

About @Alex-Jordan's suggestion about '∞': With MathQuill displaying an '∞' for infinity, it is somewhat hard to tell students that they must type "inf" and cannot paste in the "character" which will be displayed. In many questions it will be in the question text, ex. in the \lim_{n \to \infty} displayed by MathJax.

Another recent issue I have noticed in some support requests is students using "mixed fraction" notation in answers, where WW treats the integer as multiplying the fraction. I think that is probably best handled by "educating" students about the expectations of syntax used. We have many local problems which explicitly remind students to use exp(kx) notation and not e^(kx) notation, as the former behaves far better (in my experience) in the grading code, and fallback AnswerHints grading using the e^(kx) notation does not always accept an answer which works fine in exp(kx) form.

The question is not whether to draw some lines about what we want to accept and what not - but where to draw those lines, and to what extent it should be controllable by the course staff.

@dpvc:

Thanks for the suggestion about how best to approach implementation. The idea of an existing operation having additional triggers sounds quite logical.
About how far we want to take this - I would suggest we start small and decide as a community where to set the lines.
- For now, the Unicode minus and Unicode infinity seem a good place to start.
- Maybe some Unicode relation symbols (\leq) are also something to consider early on.
- As an aside, I have found Unicode symbols a nice way to get some "math syntax" into the options in parserPopUp.pl, which otherwise does not support math syntax.
Regarding the "circumflex accent": as I wrote, I'm not sure where the student copied it from. Leaving it as not accepted is certainly a legitimate decision, and it bothers me less than the Unicode minus, which seems a more justifiable character to be included in a student answer (as WW used it via MathJax).

Alex-Jordan · 2021-01-09T23:10:33Z

Re: "starting small".

I have maybe had one or two instances ever of this happening with the ≤ or ≥ symbol.

We use the RestrictedDomain context a lot with answers like 1/x, x != 2. So there have been a few instances with ≠.

But every other time this has come up (dozens of times), it has been the minus sign or infinity sign. So starting with those two things would go a long way. One is an operator, so could become the model for alternative operators. (Or I guess ^ would become the model, given **). The other is a string.

Now that I look it up, infinity is the actual string, and inf is implemented as inf => {alias=>'infinity'}. So aside from the configurability consideration, the alias approach that strings already use could be expanded. You could have like ** => {alias => '^'}.
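
Here is a rough sketch of how alias entries like these could be resolved to their target. The hash layout and resolve_alias are hypothetical (only inf => {alias => 'infinity'} comes from the existing context), and following chains of aliases is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical string definitions modelled on the alias entry quoted above.
my %strings = (
    'infinity' => { infinite => 1 },         # the real definition (details invented)
    'inf'      => { alias => 'infinity' },   # as in the existing context
    "\x{221E}" => { alias => 'inf' },        # assumed: alias chains are followed
);

# Follow alias links until a real definition is reached.
# Returns the canonical name, or undef if the name is unknown.
sub resolve_alias {
    my ($defs, $name) = @_;
    my %seen;
    while (exists $defs->{$name} && defined $defs->{$name}{alias}) {
        die "alias loop detected\n" if $seen{$name}++;
        $name = $defs->{$name}{alias};
    }
    return exists $defs->{$name} ? $name : undef;
}

print resolve_alias(\%strings, 'inf'), "\n";        # prints "infinity"
print resolve_alias(\%strings, "\x{221E}"), "\n";   # prints "infinity"
```

The same lookup would apply unchanged to an operator table containing an entry such as '**' => {alias => '^'}.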

awmorp · 2021-01-10T02:54:25Z

Another common case we have encountered is 'full width' versions of standard punctuation symbols, in particular:
， U+FF0C FULLWIDTH COMMA
（ U+FF08 FULLWIDTH LEFT PARENTHESIS
） U+FF09 FULLWIDTH RIGHT PARENTHESIS
These are used on some Chinese language keyboards instead of the standard comma and parentheses, so when a student using a Chinese keyboard tries to type a comma or parentheses, they get these 'fullwidth' versions instead, which WeBWorK doesn't recognise. I would vote for these being included as alternatives for comma and parentheses.
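
For reference, the fullwidth forms U+FF01 through U+FF5E sit at a fixed offset (0xFEE0) from ASCII 0x21 through 0x7E, so they can be folded back with a single transliteration. This is a minimal sketch with a made-up function name, not the code PG uses.

```perl
use strict;
use warnings;

# Map fullwidth forms U+FF01..U+FF5E onto ASCII 0x21..0x7E.
# Both ranges contain 94 characters, so tr/// pairs them one to one.
# (fullwidth_to_ascii is a hypothetical helper name.)
sub fullwidth_to_ascii {
    my ($text) = @_;
    $text =~ tr/\x{FF01}-\x{FF5E}/\x{21}-\x{7E}/;
    return $text;
}

# FULLWIDTH LEFT PARENTHESIS, COMMA and RIGHT PARENTHESIS around digits.
print fullwidth_to_ascii("\x{FF08}1\x{FF0C}2\x{FF09}"), "\n";   # prints "(1,2)"
```

Unlike the look-alike symbols discussed earlier, these characters are defined as width variants of the ASCII ones, which makes a blanket conversion easier to justify.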

We have also seen cases where the student genuinely meant to enter an answer involving pi or infinity but didn't know the syntax so copied the unicode symbol from somewhere. It is also quite easy to generate symbols like π ∞ etc on modern systems, eg you can type '\pi' in MS Word to produce π then copy & paste it. So I think it is worth supporting common unicode symbols.

I too like the approach of each math operator having a list of alternate characters that trigger the same operation. I would like to see it configurable at a global or course level though (as well as at individual problem level), so that we can set up our desired symbols once rather than having to remember to set it up in every problem.

Alex-Jordan · 2021-01-13T15:57:56Z

One strategy to impede cheating by copy/paste would be to disable pasting and similar things for the input field. Like you see in some "confirm email address" forms. https://davidwalsh.name/prevent-paste For the students that are copy/pasting when they legitimately know an answer (like infinity) this would force them to learn and use the intended keyboard-only method. Supporting the ∞ character and some others would still be good because they can be typed.

taniwallach · 2021-01-13T16:04:27Z

One strategy to impede cheating by copy/paste would be to disable pasting and similar things for the input field.

There are questions where copy-paste can be used legitimately, e.g. when complicated expressions are needed multiple times in a question. One example is a change of variables in a question on integration, where we can ask for the new "ranges" and later use them as bounds of integration, and may ask for the Jacobian in advance, which may then be part of the integrand.

I would not want to see copy-paste disabled by default. Certainly, if we can set it as an option at the course level, that is something to consider.

Alex-Jordan · 2021-01-13T16:23:38Z

I would not want to see copy-paste disabled by default. Certainly, if we can set it as an option at the course level, that is something to consider.

When debugging a problem, pasting an answer can be very helpful. So if this were something to do, it could be something that only applies up to some permission level. Using it for 'guest' could be the default. (Or perhaps the permission hierarchy should have something lower than 'guest' called 'everyone', the opposite of 'nobody'.)

dpvc · 2021-01-13T16:37:39Z

I have made a PR (#518) to implement the alias and alternative token suggestions that I made earlier. It also adds the ability to convert the full-width Unicode characters to their ASCII equivalents (so there is no need to add those as alternatives directly). It may be worth trying that out to see what you think. There are course-wide configuration parameters for these features.

The aliasing took a bit of work, but the alternative tokens were pretty straightforward. It would be possible to separate out the aliasing and just do the other changes, if that is desired.

drgrice1 · 2021-01-13T16:44:12Z

I withdraw my reservations about this with regard to cheating and such. Your arguments to the contrary make a lot of sense. Furthermore, the instances where I have seen this and suspected foul play are very few.

I also would not want copy/paste disabled. I use that, and like @taniwallach I have cases where I have even told students to copy/paste answers from other answer boxes.

drgrice1 · 2021-01-13T16:45:07Z

@dpvc: Thanks for the pull request. I will try to do some testing with it soon.

dpvc · 2021-01-13T16:52:27Z

The issue of Infinity printing as '∞' is easily handled by setting the context's infiniteWord flag:

Context()->flags->set(infiniteWord => "\x{221E}");

Of course, there may be other symbols that one would want to output using Unicode characters (the cross product as U+00D7 rather than >< comes to mind) when the alternative tokens are being processed. There could be an alternativeString property that would be used when alternative input was being allowed (the string output should produce something that would be able to be entered as a correct answer, since it is used to create the correct answer string).

dpvc · 2021-01-13T16:58:46Z

I've added the webwork2 pull request (openwebwork/webwork2#1174) to add the configuration options to the course configuration page.

dpvc · 2021-01-13T17:02:34Z

Note: the MathObject parser would allow the alternative input tokens to be used in the PG problems themselves, but since this is controlled by a course-wide configuration option, if you actually used the alternatives, then your problem would become unusable if the option were turned off for a course.

We already have another example of something like this situation: whether log() is treated as log10() or ln() is controlled by a course parameter, so problems should never use just log(), but should always use log10() or ln() explicitly, since the meaning of log() can change without notice.

Perhaps that doesn't matter, but it is something to keep in mind.

taniwallach · 2021-01-13T17:33:05Z

We already have another example of something like this situation: whether log() is treated as log10() or ln() is controlled by a course parameter, so problems should never use just log(), but should always use log10() or ln() explicitly, since the meaning of log() can change without notice.

Perhaps that doesn't matter, but it is something to keep in mind.

It should be in the documentation somewhere reasonably prominent.

So should the log() issue.

taniwallach · 2021-01-19T22:25:22Z

@awmorp - Maybe take a look at the pull requests which @dpvc provided to address these issues.

taniwallach · 2021-03-03T21:31:17Z

Fixed by #518

dpvc mentioned this issue Jan 13, 2021

Add aliases to all context classes, and allow alternative tokens #518

Merged

dpvc mentioned this issue Jan 13, 2021

Add configuration options for parsing alternative input tokens and full-width unicode character support. openwebwork/webwork2#1174

Merged

taniwallach mentioned this issue Jan 19, 2021

Feature request: Add support for new specialPGEnvironmentVars settings to support Unicode alternatives drdrew42/renderer#35

Open

dpvc added the "MathObject issue" label Mar 1, 2021

taniwallach closed this as completed Mar 3, 2021
