Attempt to support some standard math operators when included in answers using Unicode characters #517

Closed
taniwallach opened this issue Jan 9, 2021 · 23 comments
Labels
MathObject issue Issue involving MathObects code

Comments

@taniwallach
Member

Students in my institution find it quite convenient to sometimes cut and paste pieces of equations shown using MathJax (and from external sites) into the input boxes. Many of them run into problems with Unicode characters being used for standard math operations and get an error message like:

Unexpected character '−'

Unexpected character 'ˆ'

I think we should try to have PG handle these types of Unicode characters as alternate forms of the ISO-8859-1 / "keyboard" characters we would expect them to type by hand.

I'm not yet certain where we would do this, or whether it is best accomplished by a replacement in the answer string or by adding the Unicode character as a known math operator in MathObjects.
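
For illustration, the first option (a replacement in the answer string before it is parsed) could look roughly like the standalone Perl sketch below. The `%KEYBOARD_FORM` table and the `normalize_answer` helper are hypothetical names made up for this example; they are not part of PG or webwork2.

```perl
use strict;
use warnings;

# Hypothetical table: Unicode character => keyboard form expected by the parser.
my %KEYBOARD_FORM = (
    "\x{2212}" => '-',      # U+2212 MINUS SIGN
    "\x{221E}" => 'inf',    # U+221E INFINITY
);

# Replace every mapped character in the submitted answer string.
sub normalize_answer {
    my ($answer) = @_;
    my $pattern = join '|', map { quotemeta($_) } keys %KEYBOARD_FORM;
    $answer =~ s/($pattern)/$KEYBOARD_FORM{$1}/g;
    return $answer;
}

print normalize_answer("6\x{2212}2"), "\n";    # prints 6-2
```

A replacement like this acts on the raw string and knows nothing about the context in which the answer is checked.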

@dpvc Any thoughts on the best approach?

@Alex-Jordan
Contributor

The error message I see from this kind of thing is a database error message. Something trying to record the submitted string to the database rejects it. The message is like this:

Error messages

    DBD::mysql::st execute failed: Incorrect string value: '\xE2\x88\x922' for column 'answer_string' at row 1 at /opt/webwork/webwork2/lib/WeBWorK/DB/Schema/NewSQL/Std.pm line 837. 

This was for submitting a −2.
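
The bytes quoted in that message are the UTF-8 encoding of U+2212 (MINUS SIGN) followed by the digit 2. A small standalone check, independent of WeBWorK and using only the core `Encode` module:

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+2212 MINUS SIGN followed by the digit 2, as in the submitted answer.
my $answer = "\x{2212}2";

# Encode to UTF-8 and show each byte in hexadecimal.
my $bytes = encode('UTF-8', $answer);
my $hex   = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

print "$hex\n";    # prints E2 88 92 32
```

The first three bytes match the `\xE2\x88\x92` in the error, and `32` is the ASCII code of the trailing `2`.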

@taniwallach
Member Author

@Alex-Jordan Was the test course used created as a new course once WW 2.15 was installed on the server?

@Alex-Jordan
Contributor

Ah, you got it. I just tried on a new course and do not get the error. Now that we're talking about this, I think I previously knew that new courses don't lead to the error, possibly with your help. But I since forgot.

OK, is it as simple as just adding − to the context as a subtraction and negation operator? I added these lines to an experiment problem:

Context()->operators->add(
    '−'  => {precedence => 1, associativity => 'left', type => 'both', string => '-',
             class => 'Parser::BOP::subtract', rightparens => 'same'},
    'u−' => {precedence => 6, associativity => 'left', type => 'unary', string => '-',
             class => 'Parser::UOP::minus', hidden => 1, allowInfinite => 1, nofractionparens => 1},
);

and it had the desired effect. The answer was 4, and I could enter 6-2 or −−4 and it was accepted.

Unless there would be bad side effects, we could add this (and other similar things) to Parser/Context/Default.pm, effectively adding them to the Numeric context, and then everything downstream that builds off of Numeric.

@Alex-Jordan
Contributor

I do think I like your other idea better: converting to the keyboard version immediately upon receipt. That would avoid so much duplicate code, and give one place to clearly maintain all such conversions. Like in the year 2025 when some new emoji character is added that resembles an infinity sign, or whatever. But I do not know where that would be handled.

On the other hand, if these were handled the other way (as recognized by the context) then problem authors could use them too. For example, with a Mac, I can type option 5 to get ∞. So it would open the door to things like
$I = Interval("(-∞,∞)");
in code. A very very very small advancement in code readability.

@taniwallach
Member Author

@Alex-Jordan Thanks for providing the suggestion. Do you know if we can get this to take effect via PGcourse.pl so that it will affect the context loaded later by a problem?

Otherwise, I may try to make changes to Parser/Context/Default.pm on a development server to test it across a range of problems.

About the "mapping" approach - I suspect that any mapping might need to be handled at the webwork2 level. Another downside of that approach is that we will probably be modifying what gets recorded in the database from what the student actually sent - which I do not think is ideal.

@dpvc
Member

dpvc commented Jan 9, 2021

While the idea of adding more operators to the context would work, it has an important drawback beyond the additional code. If a context needs to be modified, that means you have to know all the versions of the operator that would need to be modified. For example, if you want to disable exponentiation, then you need to know all the characters that are tied to that (there currently are two versions, and that is problematic already). Many existing contexts modify the classes associated with existing operators, so you would need to go through and modify all of those to include any new characters that are added to the context. This seems impractical to me.
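
The drawback can be seen in a toy model. The hash below is only a stand-in for a context's operator table (it is not the real PG data structure, and the `class` values are made up); it has two tokens for exponentiation, as mentioned above.

```perl
use strict;
use warnings;

# Toy operator table, NOT the real PG context: two tokens map to exponentiation.
my %operators = (
    '^'  => { class => 'power' },
    '**' => { class => 'power' },
    '+'  => { class => 'add' },
);

# An author who only thinks of '^' removes just that token.
delete $operators{'^'};

# Exponentiation is still reachable through the forgotten duplicate.
my @still_power = grep { $operators{$_}{class} eq 'power' } sort keys %operators;
print "still allowed: @still_power\n";    # prints still allowed: **
```

Every extra token registered as its own operator adds one more entry that each restricting context has to remember.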

On the other hand, I would not want to see wholesale remapping of characters one to another outside of the context's control. There used to be some characters that WeBWorK would remove automatically (like dollar signs and other "special" characters), which made it impossible to write contexts that used those characters (like currency answers with dollar signs), and that was impossible to work around from within the problem. Making global replacements like this makes assumptions about the meanings of those characters in the context, and that is something I would not want to encourage.

I think the proper way is to extend the MathObject parser so that operators (and the other items) can have additional characters that also trigger the same operator. That is, you would only define - as minus, but it could have a list of additional forms that would also produce the same internal representation. That way, U+2212 would produce a - reference internally. The context would control that mapping, since it would be part of the definition of - in the context. The same thing could be done with U+221E for Infinity, U+2264 for <=, and so on. This would not require changes to any existing problems or contexts that modify copies of other contexts, as no new operators are being added. It would also allow the duplicates that already exist to be trimmed out (which would need to be handled carefully, as that could be a breaking change if not done properly).
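
A minimal standalone sketch of this idea, assuming a hypothetical `alternatives` list stored inside each operator definition. The table layout, the key name, and the `canonical_operator` helper are all illustrative and are not taken from the PG source.

```perl
use strict;
use warnings;

# Toy operator table; the 'alternatives' key is an assumption for illustration.
my %operators = (
    '-'  => { class => 'subtract',   alternatives => ["\x{2212}"] },    # MINUS SIGN
    '<=' => { class => 'less_equal', alternatives => ["\x{2264}"] },    # LESS-THAN OR EQUAL TO
);

# Map an input token to the canonical operator name; returns undef if unknown.
sub canonical_operator {
    my ($ops, $token) = @_;
    return $token if exists $ops->{$token};
    for my $name (sort keys %$ops) {
        my @alts = @{ $ops->{$name}{alternatives} || [] };
        return $name if grep { $_ eq $token } @alts;
    }
    return;
}

print canonical_operator(\%operators, "\x{2212}"), "\n";    # prints -

# Removing '-' from the table disables its alternative forms as well.
delete $operators{'-'};
my $state = defined(canonical_operator(\%operators, "\x{2212}")) ? 'known' : 'unknown';
print "$state\n";    # prints unknown
```

Because the alternative form lives inside the definition of `-`, a context that removes or redefines `-` does not need to know which extra characters exist.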

I would not, however, map U+02C6 (circumflex accent) to ^. An accent should not be used for exponentiation, in my opinion. Would you also want to map U+0302 (combining circumflex accent) as well? I would not want to allow combining accents for this. How about U+FF3E (full-width circumflex accent)? U+2038 (caret)? Not everything that might look like the character you want necessarily has the same meaning.
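
To see that these look-alikes really are distinct characters, their official Unicode names can be listed with the core `charnames` module. This is a standalone check, not PG code; U+005E, the keyboard character, is included for comparison.

```perl
use strict;
use warnings;
use charnames ();

# Code points mentioned above, plus the keyboard character U+005E.
my @code_points = (0x005E, 0x02C6, 0x0302, 0xFF3E, 0x2038);

# Print each code point together with its official Unicode name.
for my $cp (@code_points) {
    printf "U+%04X %s\n", $cp, charnames::viacode($cp);
}
```

In the Unicode names list, CIRCUMFLEX ACCENT is the name of U+005E itself, while U+02C6 is MODIFIER LETTER CIRCUMFLEX ACCENT, a modifier letter rather than an operator.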

All this may lead to requests for U+00B2 (superscript two) to be used for ^2, and so on (there are superscript parens and + and -, so would you want to be able to enter whole expressions that way? What about superscript n (U+207F)?). There are also subscript numbers, so would you want to allow those for entering _0, etc.? I think this becomes harder to handle.

@drgrice1
Member

drgrice1 commented Jan 9, 2021

I am not entirely on board with supporting these characters. The majority of the time that students are entering these characters via cut and paste from other websites are cases where students are using websites to get answers to problems in ways that they should not be. I think that it would be better to fix the messages that students are given when these characters are encountered, but to still not accept it as correct. Instead of "Unexpected character '−'", give them some message about not using these methods. At least if we decide to support these sort of characters, it should be made a course option, or something like that.

@Alex-Jordan
Contributor

Alex-Jordan commented Jan 9, 2021 via email

@taniwallach
Member Author

@drgrice1 wrote:

I am not entirely on board with supporting these characters. The majority of the time that students are entering these characters via cut and paste from other websites ... At least if we decide to support these sort of characters, it should be made a course option, or something like that.

The Unicode minus can be copy-pasted from MathJax equations in the same question, which seems pretty legitimate and innocuous. However, I do agree that if this can be enabled/disabled at the site and course level - that would be optimal. Although there may be some small didactic benefit to forcing students to type their answers themselves, forcing them to do so will not stop some students from utilizing external aids which we would prefer and recommend they not use.

About @Alex-Jordan's suggestion about '∞': With MathQuill displaying an '∞' for infinity, it is somewhat hard to tell students that they must type "inf" and cannot paste in the "character" which will be displayed. In many questions it will be in the question text, ex. in the \lim_{n \to \infty} displayed by MathJax.

Another recent issue I have noticed in some support requests was students using "mixed fraction" notation in answers, where WW treats the integer as multiplying the fraction. I think that is probably best handled by "educating" students about the expected syntax. We have many local problems which explicitly remind students to use exp(kx) notation and not e^(kx) notation, as the former behaves far better (in my experience) in the grading code, and fallback AnswerHints grading using the e^(kx) notation does not always accept an answer which works fine in exp(kx) form.

The question is not whether to draw some lines about what we want to accept and what not - but where to draw those lines, and to what extent it should be controllable by the course staff.

@dpvc:

  1. Thanks for the suggestion about how best to approach implementation. The idea of an existing operation having additional triggers sounds quite logical.
  2. About how far we want to take this - I would suggest we start small and decide as a community where to set the lines.
    - For now, the Unicode minus, Unicode infinity, seem a good place to start.
    - Maybe some Unicode relation symbols (\leq) are also something to consider early on.
    - As an aside, I have found Unicode symbols a nice way to get some "math syntax" into the options in parserPopUp.pl, which otherwise does not support math syntax.
  3. Regarding the "circumflex accent" - as I wrote, I'm not sure where the student copied it from. Leaving it as not accepted is certainly a legitimate decision, and it bothers me less than the Unicode minus, which seems a more justifiable character to be included in a student answer (as WW used it via MathJax).

@Alex-Jordan
Contributor

Re: "starting small".

I have maybe had one or two instances ever of this happening with the or symbol.

We use the RestrictedDomain context a lot with answers like 1/x, x != 2. So there have been a few instances with .

But every other time this has come up (dozens of times), it has been the minus sign or infinity sign. So starting with those two things would go a long way. One is an operator, so could become the model for alternative operators. (Or I guess ^ would become the model, given **). The other is a string.

Now that I look it up, infinity is the actual string, and inf is implemented as inf => {alias=>'infinity'}. So aside from the configurability consideration, the alias approach that strings already use could be expanded. You could have like ** => {alias => '^'}.
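
How alias resolution of this kind behaves can be sketched in standalone Perl. The `%strings` table and the `resolve` helper below are toy stand-ins written for this example (they are not the PG implementation); the U+221E entry shows how a Unicode form could be added as one more alias.

```perl
use strict;
use warnings;

# Toy string table in the spirit of inf => {alias => 'infinity'}.
my %strings = (
    'infinity' => { infinite => 1 },
    'inf'      => { alias => 'infinity' },
    "\x{221E}" => { alias => 'infinity' },    # U+221E INFINITY
);

# Follow alias links until a real definition is reached; undef if unknown.
sub resolve {
    my ($table, $name) = @_;
    my %seen;
    while (exists $table->{$name} && defined $table->{$name}{alias}) {
        return undef if $seen{$name}++;    # guard against alias loops
        $name = $table->{$name}{alias};
    }
    return exists $table->{$name} ? $name : undef;
}

print resolve(\%strings, "\x{221E}"), "\n";    # prints infinity
```

All three spellings end up at the single `infinity` entry, so only that entry needs to carry the real definition.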

@awmorp

awmorp commented Jan 10, 2021

Another common case we have encountered is 'full width' versions of standard punctuation symbols, in particular:
, U+FF0C FULLWIDTH COMMA
( U+FF08 FULLWIDTH LEFT PARENTHESIS
) U+FF09 FULLWIDTH RIGHT PARENTHESIS
These are used on some Chinese language keyboards instead of the standard comma and parentheses, so when a student using a Chinese keyboard tries to type comma or parentheses, they get these 'fullwidth' versions instead, which Webwork doesn't recognise. I would vote for these being included as alternatives for comma and parentheses.
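
The full-width forms U+FF01 through U+FF5E mirror the printable ASCII characters U+0021 through U+007E at a fixed offset of 0xFEE0, so a conversion can be a single substitution. A standalone sketch (the `fullwidth_to_ascii` helper is a hypothetical name, not a PG function):

```perl
use strict;
use warnings;

# Shift full-width forms U+FF01..U+FF5E down to ASCII U+0021..U+007E.
sub fullwidth_to_ascii {
    my ($text) = @_;
    $text =~ s/([\x{FF01}-\x{FF5E}])/chr(ord($1) - 0xFEE0)/ge;
    return $text;
}

# Full-width parentheses and comma around ordinary digits.
print fullwidth_to_ascii("\x{FF08}1\x{FF0C}2\x{FF09}"), "\n";    # prints (1,2)
```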

We have also seen cases where the student genuinely meant to enter an answer involving pi or infinity but didn't know the syntax so copied the unicode symbol from somewhere. It is also quite easy to generate symbols like π ∞ etc on modern systems, eg you can type '\pi' in MS Word to produce π then copy & paste it. So I think it is worth supporting common unicode symbols.

I too like the approach of each math operator having a list of alternate characters that trigger the same operation. I would like to see it configurable at a global or course level though (as well as at individual problem level), so that we can set up our desired symbols once rather than having to remember to set it up in every problem.

@Alex-Jordan
Contributor

Alex-Jordan commented Jan 13, 2021 via email

@taniwallach
Member Author

One strategy to impede cheating by copy/paste would be to disable pasting and similar things for the input field.

There are questions where copy-paste can be used legitimately, e.g. when complicated expressions are needed multiple times in a question. One example is a change of variables in a question on integration, where we can ask for the new "ranges" and later use them as bounds of integration, and may ask for the Jacobian in advance so that it can then be part of the integrand.

I would not want to see copy-paste disabled by default. Certainly, if we can set it as an option at the course level, that is something to consider.

@Alex-Jordan
Contributor

Alex-Jordan commented Jan 13, 2021 via email

@dpvc
Member

dpvc commented Jan 13, 2021

I have made a PR (#518) to implement the alias and alternative token suggestions that I made earlier. It also adds the ability to convert the full-width Unicode characters to their ASCII equivalents (so there is no need to add those as alternatives directly). It may be worth trying that out to see what you think. There are course-wide configuration parameters for these features.

The aliasing took a bit of work, but the alternative tokens were pretty straightforward. It would be possible to separate out the aliasing and just do the other changes, if that is desired.

@drgrice1
Member

I withdraw my reservations about this as regards to cheating and such. Your arguments to the contrary make a lot of sense. Furthermore, the instances where I have seen this and suspect foul play are very few.

I also would not want copy/paste disabled. I use that, and like @taniwallach I have cases where I have even told students to copy/paste answers from other answer boxes.

@drgrice1
Member

@dpvc: Thanks for the pull request. I will try to do some testing with it soon.

@dpvc
Member

dpvc commented Jan 13, 2021

The issue of Infinity printing as '∞' is easily handled by setting the context's infiniteWord flag:

Context()->flags->set(infiniteWord => "\x{221E}");

Of course, there may be other symbols that would want to be output using unicode characters (the cross product as U+00D7 rather than >< comes to mind) when the alternative tokens are being processed. There could be an alternativeString property that would be used when alternative input was being allowed (the string output should produce something that would be able to be entered as a correct answer, since it is used to create the correct answer string).

@dpvc
Member

dpvc commented Jan 13, 2021

I've added the webwork2 pull request (openwebwork/webwork2#1174) to add the configuration options to the course configuration page.

@dpvc
Member

dpvc commented Jan 13, 2021

Note: the MathObject parser would allow the alternative input tokens to be used in the PG problems themselves, but since this is controlled by a course-wide configuration option, if you actually used the alternatives, then your problem would become unusable if the option were turned off for a course.

We already have another example of something like this situation: whether log() is treated as log10() or ln() is controlled by a course parameter, so problems should never use just log(), but should always use log10() or ln() explicitly, since the meaning of log() can change without notice.
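
How much the two readings differ is easy to check numerically. The snippet below is plain Perl, not PG: Perl's built-in `log` is the natural logarithm, and the `ln` and `log10` subs are defined here only for the comparison.

```perl
use strict;
use warnings;

# Perl's built-in log() is the natural logarithm.
sub ln    { return log($_[0]) }
sub log10 { return log($_[0]) / log(10) }

# The same input gives very different values under the two readings of log().
printf "ln(100)    = %.4f\n", ln(100);       # prints ln(100)    = 4.6052
printf "log10(100) = %.4f\n", log10(100);    # prints log10(100) = 2.0000
```

A correct answer written with a bare log() under one setting would therefore be graded against a different number under the other.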

Perhaps that doesn't matter, but it is something to keep in mind.

@taniwallach
Member Author

We already have another example of something like this situation: whether log() is treated as log10() or ln() is controlled by a course parameter, so problems should never use just log(), but should always use log10() or ln() explicitly, since the meaning of log() can change without notice.

Perhaps that doesn't matter, but it is something to keep in mind.

It should be in the documentation somewhere reasonably prominent.

So should the log() issue.

@taniwallach
Member Author

@awmorp - Maybe take a look at the pull requests which @dpvc provided to address these issues.

@taniwallach
Member Author

Fixed by #518
