Unicode equivalents of ASCII characters #690

christianp · 2020-06-02T13:59:02Z

A student typed 2ˆ3, which was rejected as invalid. The character ˆ is a modifier letter (U+02C6, "modifier letter circumflex accent"), but it looks very similar to the 'real' circumflex, ^.
Numbas should consider ˆ to be a synonym of ^.

The dictionaries opSynonyms and funcSynonyms in runtime/scripts/jme.js map alternative names for operators and functions onto their canonical names. Add entries to these dictionaries mapping Unicode symbols onto their ASCII equivalents.
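As a rough sketch of the idea: a synonym dictionary maps each alternative spelling onto a canonical one, and a lookup falls back to the input when no synonym is registered. The dictionary contents and the `canonicalOp` helper below are assumptions for illustration, not the actual code in `jme.js`.

```javascript
// Illustrative stand-in for opSynonyms; the real dictionary in
// runtime/scripts/jme.js may be shaped differently and has other entries.
const opSynonyms = {
    '\u02C6': '^' // MODIFIER LETTER CIRCUMFLEX ACCENT -> ASCII circumflex
};

// Hypothetical helper: return the canonical name of an operator,
// or the name itself when no synonym is registered.
function canonicalOp(name) {
    return Object.prototype.hasOwnProperty.call(opSynonyms, name)
        ? opSynonyms[name]
        : name;
}

console.log(canonicalOp('\u02C6')); // prints "^"
console.log(canonicalOp('+'));      // prints "+"
```

The `hasOwnProperty` check keeps inherited property names such as `toString` from being treated as synonyms.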

A good place to find Unicode symbols is graphemica.com.

NewmanJ1987 · 2020-06-07T01:03:23Z

Hi,
I'm new to Numbas and would like to get involved with the development. Do you know how I can reproduce this? I would love to work on it.

christianp · 2020-06-08T12:42:00Z

@NewmanJ1987 thanks for offering! I added this character in commit 36654b3, but there are others that would be good to have if you want to help. For example, there are many characters that look like - which a student might use for 'minus'. Graphemica displays these nicely: https://graphemica.com/search?q=minus
You could add these characters as synonyms for -, following commit 36654b3 as a template.
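One possible way to generate such entries is to list the lookalike characters once and build the synonym mapping from that list. The selection of characters and the names `minusLookalikes` and `minusSynonyms` are assumptions for illustration; they are not taken from commit 36654b3.

```javascript
// Characters that resemble the ASCII hyphen-minus. This selection is an
// assumption for illustration, not the list that ended up in Numbas.
const minusLookalikes = [
    '\u2212', // MINUS SIGN
    '\u2010', // HYPHEN
    '\u2013', // EN DASH
    '\uFE63', // SMALL HYPHEN-MINUS
    '\uFF0D'  // FULLWIDTH HYPHEN-MINUS
];

// Build synonym entries mapping each lookalike onto '-'.
const minusSynonyms = {};
for (const ch of minusLookalikes) {
    minusSynonyms[ch] = '-';
}

console.log(minusSynonyms['\u2212']); // prints "-"
```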

NewmanJ1987 · 2020-06-09T13:35:07Z

Sure, that looks like a good entry task. Thanks.

christianp · 2020-06-30T11:03:06Z

It looks like String.normalize can do a lot of the work.

abhijeetsharma200 · 2020-08-24T20:51:37Z

Hi, I am new to open source and would like to contribute to this issue if it is unassigned. However, I am unsure how you would like me to use String.normalize, given that synonyms are currently hard-coded in the opSynonyms dictionary.

christianp · 2020-08-25T07:09:26Z

@abhijeetsharma200 String.normalize is a built-in method in JavaScript. One way it could be used would be in Numbas.jme.Parser.tokenise, to normalise expr before tokenising it.
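A minimal sketch of that suggestion, assuming the NFKC form is the one wanted. `normaliseBeforeTokenising` is a hypothetical name, and nothing here is taken from the actual `Numbas.jme.Parser.tokenise` code.

```javascript
// Hypothetical pre-processing step, run on the expression string before
// it is handed to the tokeniser.
function normaliseBeforeTokenising(expr) {
    // NFKC replaces compatibility characters, such as fullwidth forms,
    // with their plain equivalents.
    return expr.normalize('NFKC');
}

// Fullwidth "2+x": U+FF12, U+FF0B, U+FF58
console.log(normaliseBeforeTokenising('\uFF12\uFF0B\uFF58')); // prints "2+x"
```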

grplyler · 2020-09-25T12:58:13Z

Hi there! I am looking to contribute to open source for Hacktoberfest 2020. Is this issue still up for grabs?

christianp · 2020-10-23T10:16:47Z

We should also accept the mathematical alphanumeric symbols, which are used by MathJax.

christianp · 2020-10-23T10:17:21Z

@grplyler sorry I didn't reply earlier - yes, this issue is still open, and there are plenty of easy things you can add in. Please submit a pull request!

see #824

The boolean `Numbas.jme.caseSensitive` controls whether name comparison is case-sensitive. The function Numbas.jme.normaliseName converts a name to lower-case if necessary. This function could do other stuff, too, such as normalising unicode characters (#690).
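A name normaliser that also handles Unicode might look roughly like the sketch below. The `settings` object stands in for `Numbas.jme.caseSensitive`, and the function signature is an assumption, not the real `Numbas.jme.normaliseName`.

```javascript
// Stand-in for the Numbas.jme.caseSensitive setting (assumption).
const settings = { caseSensitive: false };

// Hypothetical name normaliser: Unicode-normalise first, then lower-case
// unless name comparison is case-sensitive.
function normaliseName(name, caseSensitive = settings.caseSensitive) {
    let out = name.normalize('NFKC');
    if (!caseSensitive) {
        out = out.toLowerCase();
    }
    return out;
}

console.log(normaliseName('\uFF38'));      // fullwidth "X", prints "x"
console.log(normaliseName('Alpha', true)); // prints "Alpha"
```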

I've added all the left and right parenthesis characters that I could find on graphemica to Parser.(left|right)_parentheses. They're all synonyms for the ASCII parenthesis characters. see #690, and thanks to maths/moodle-qtype_stack#860 for pointing them out!

christianp · 2022-10-12T12:30:36Z

There are fullwidth equivalents of lots of punctuation marks, such as U+FF0C "Fullwidth Comma".

christianp · 2023-03-28T07:22:57Z

We had a student using a fullwidth parenthesis: （

(But that's already supported, so that's not the bug I was looking for!)

christianp · 2023-03-28T07:43:34Z

We have had a spate of students who wrote Greek letters as unicode and whose expressions were marked incorrect, because Numbas doesn't consider e.g. theta and θ to be equivalent.
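One way to make such names compare equal is to map each Greek letter onto its spelled-out name before comparing. The table below is a small illustrative subset, and `canonicalGreek` and `sameName` are hypothetical helpers, not Numbas functions.

```javascript
// Illustrative subset; a real table would cover the whole alphabet.
const greekNames = new Map([
    ['\u03B1', 'alpha'],
    ['\u03B2', 'beta'],
    ['\u03B8', 'theta'],
    ['\u03C0', 'pi']
]);

// Hypothetical helper: replace a single Greek letter with its name.
function canonicalGreek(name) {
    return greekNames.has(name) ? greekNames.get(name) : name;
}

// Two names are considered the same if their canonical forms match.
function sameName(a, b) {
    return canonicalGreek(a) === canonicalGreek(b);
}

console.log(sameName('theta', '\u03B8')); // prints true
```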

sangwinc · 2023-03-28T07:49:24Z

@christianp, we already have this issue and I plan to implement something this summer: maths/moodle-qtype_stack#860 Would you be interested in sharing lists of unicode between our projects on this issue, perhaps with a JSON file of "known equivalents"?

christianp · 2023-03-28T07:52:06Z

@sangwinc - good idea! I'm now looking at typing out a big list of character mappings. The generic Unicode normalisation algorithms don't really help, because they ignore some differences that are mathematically significant, and they don't treat as equivalent some things that it would be convenient for us to equate. I think we have to do it character by character (or character class by character class, at least).
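To illustrate both problems, the snippet below runs a few characters through NFKC normalisation. The choice of NFKC and of these sample characters is an assumption made for the example; the comment above does not say which cases were examined.

```javascript
// Hypothetical helper wrapping the built-in normaliser.
function nfkc(s) {
    return s.normalize('NFKC');
}

const samples = [
    ['\uFF08', '('],      // FULLWIDTH LEFT PARENTHESIS: folded to "(", which is helpful
    ['x\u00B2', 'x2'],    // SUPERSCRIPT TWO: folded to "2", so the exponent is lost
    ['\u2212', '\u2212'], // MINUS SIGN: left unchanged, not mapped to "-"
    ['\u02C6', '\u02C6']  // MODIFIER LETTER CIRCUMFLEX ACCENT: left unchanged
];

for (const [input, expected] of samples) {
    // Each line prints the normalised string followed by true.
    console.log(JSON.stringify(nfkc(input)), nfkc(input) === expected);
}
```

The second sample shows a mathematically significant difference being erased, and the last two show convenient equivalences that normalisation alone does not provide, which is why explicit per-character mappings are still needed.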

christianp · 2023-03-28T08:30:36Z

This big list of LaTeX to unicode mappings that I made for mathstodon, based on unicodeit.net, might help: https://github.com/christianp/mastodon/blob/mathstodon-4.1.0/app/javascript/mastodon/features/compose/util/autolatex/data.js

christianp · 2023-03-29T14:49:02Z

I'm working on this at https://github.com/numbas/unicode-math-normalization. I've produced a set of files giving explicit mappings from some Unicode characters to JME syntax, and identified some things that can be normalized using the standard normalization algorithm.

Tomorrow I'll try to integrate this with the JME parser.

christianp added the good first issue label Jun 2, 2020

christianp closed this as completed in 36654b3 Jun 9, 2020

christianp reopened this Jun 30, 2020

christianp changed the title ~~ˆ is "modifier letter circumflex accent", not the normal circumflex~~ Unicode equivalents of ASCII characters Jun 30, 2020

christianp mentioned this issue Nov 19, 2020

Add annotation var for .texNameAnnotations path #771

Closed

christianp mentioned this issue Dec 17, 2020

Allow more letters in variable names #787

Closed

This was referenced Nov 29, 2021

Support superscripts ² and ³ #868

Closed

Document the JME parser's synonyms #871

Closed

christianp mentioned this issue Jul 18, 2022

Allow umlauts in variable names #936

Closed

georgekinnear mentioned this issue Sep 27, 2022

Support for non-standard unicode symbols maths/moodle-qtype_stack#860

Closed

christianp closed this as completed in 6abf5dc Apr 4, 2023
