Unicode equivalents of ASCII characters #690

Closed
christianp opened this issue Jun 2, 2020 · 16 comments

@christianp
Member

christianp commented Jun 2, 2020

A student typed 2ˆ3, which was not interpreted as valid. The character ˆ (U+02C6, "modifier letter circumflex accent") is a modifier letter, but it looks very similar to the 'real' circumflex character, ^.
Numbas should consider ˆ to be a synonym of ^.

The dictionaries opSynonyms and funcSynonyms in runtime/scripts/jme.js map alternative names for operators and functions onto their canonical names. Add entries to these dictionaries mapping Unicode symbols onto their ASCII equivalents.

A good place to find Unicode symbols is graphemica.com.
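
As a rough sketch of the idea, assuming the synonym table is a plain object from alternative spelling to canonical operator. The single entry and the `canonicalOp` helper below are illustrative and are not copied from `jme.js`:

```javascript
// Hypothetical stand-in for the opSynonyms dictionary in runtime/scripts/jme.js.
// The entry is illustrative; the real dictionary's contents may differ.
const opSynonyms = {
  '\u02C6': '^', // U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
};

// Hypothetical helper: return the canonical operator for a token,
// or the token itself when no synonym is registered.
function canonicalOp(token) {
  return Object.prototype.hasOwnProperty.call(opSynonyms, token)
    ? opSynonyms[token]
    : token;
}

console.log(canonicalOp('\u02C6') === '^'); // true
console.log(canonicalOp('+') === '+');      // true
```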

@NewmanJ1987

Hi,
I'm new to Numbas and would like to get involved with the development. Do you know how I can reproduce this? I would love to work on this.

@christianp
Member Author

@NewmanJ1987 thanks for offering! I added this character in commit 36654b3, but there are others that would be good to have if you want to help. For example, there are many characters that look like - which a student might use for 'minus'. Graphemica displays these nicely: https://graphemica.com/search?q=minus
You could add these characters as synonyms for -, following commit 36654b3 as a template.
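
For illustration, a handful of minus-like characters could be registered in one loop. This is only a sketch: `minusSynonyms` is a hypothetical stand-in for the real synonym dictionary, and which characters Numbas should actually accept is a judgment call.

```javascript
// Characters a student might type for "minus". Code points and names are
// from the Unicode standard; treat the selection as a starting point only.
const minusLookalikes = [
  '\u2212', // MINUS SIGN
  '\u2010', // HYPHEN
  '\u2013', // EN DASH
  '\u2014', // EM DASH
  '\uFE63', // SMALL HYPHEN-MINUS
  '\uFF0D', // FULLWIDTH HYPHEN-MINUS
];

// Hypothetical synonym table, standing in for opSynonyms.
const minusSynonyms = {};
for (const ch of minusLookalikes) {
  minusSynonyms[ch] = '-';
}

console.log(minusSynonyms['\u2212'] === '-'); // true
```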

@NewmanJ1987

Sure, that looks like a good entry task. Thanks.

@christianp reopened this on Jun 30, 2020
@christianp changed the title from "ˆ is "modifier letter circumflex accent", not the normal circumflex" to "Unicode equivalents of ASCII characters" on Jun 30, 2020
@christianp
Member Author

It looks like String.normalize can do a lot of the work.

@abhijeetsharma200

Hi, I am new to open source and would like to contribute to this issue if it is unassigned. However, I am unsure how you would like me to use String.normalize, given that synonyms are currently hard-coded in the opSynonyms dictionary.

@christianp
Member Author

@abhijeetsharma200 String.normalize is a built-in method in JavaScript. One way it could be used would be in Numbas.jme.Parser.tokenise, to normalise expr before tokenising it.
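
A minimal sketch of that suggestion, assuming the NFKC form is used. `tokeniseNormalised` and `toyTokenise` are hypothetical names for this example; the real tokeniser in Numbas is not reproduced here.

```javascript
// Hypothetical wrapper: normalise the expression, then hand it to a tokeniser.
// The tokeniser is passed in so that the sketch runs on its own.
function tokeniseNormalised(expr, tokenise) {
  // NFKC folds compatibility characters (e.g. fullwidth forms) to their
  // ordinary equivalents.
  return tokenise(expr.normalize('NFKC'));
}

// Toy tokeniser for demonstration only: split into single characters.
const toyTokenise = (s) => Array.from(s);

// FULLWIDTH DIGIT TWO, FULLWIDTH CIRCUMFLEX ACCENT, FULLWIDTH DIGIT THREE
const tokens = tokeniseNormalised('\uFF12\uFF3E\uFF13', toyTokenise);
console.log(tokens.join('')); // 2^3
```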

@grplyler

Hi there! I am looking to contribute to open source for Hacktoberfest 2020. Is this issue still up for grabs?

@christianp
Member Author

We should also accept the mathematical alphanumeric symbols, which are used by MathJax.
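
For these characters, the built-in compatibility normalisation already gives the plain letter. A short check on two sample code points:

```javascript
// U+1D465 MATHEMATICAL ITALIC SMALL X and U+1D400 MATHEMATICAL BOLD CAPITAL A
// both carry a compatibility mapping to the plain letter.
const italicX = '\u{1D465}';
const boldA = '\u{1D400}';

console.log(italicX.normalize('NFKC') === 'x'); // true
console.log(boldA.normalize('NFKC') === 'A');   // true
```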

@christianp
Member Author

@grplyler sorry I didn't reply earlier - yes, this issue is still open, and there are plenty of easy things you can add in. Please submit a pull request!

christianp added a commit that referenced this issue Apr 28, 2021
see #824

The boolean `Numbas.jme.caseSensitive` controls whether name comparison
is case-sensitive.

The function Numbas.jme.normaliseName converts a name to lower-case if
necessary. This function could do other stuff, too, such as normalising
unicode characters (#690).
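
A sketch of how such a function might look if it also handled Unicode. It assumes NFKC is the chosen normalisation form, which the commit message leaves open, and it uses a local `settings` object in place of `Numbas.jme.caseSensitive`:

```javascript
// Stand-in for the Numbas.jme.caseSensitive flag.
const settings = { caseSensitive: false };

// Hypothetical name normaliser: fold Unicode compatibility forms first,
// then lower-case unless comparison is case-sensitive.
function normaliseName(name, opts = settings) {
  const folded = name.normalize('NFKC'); // assumption: NFKC is the chosen form
  return opts.caseSensitive ? folded : folded.toLowerCase();
}

console.log(normaliseName('\uFF38'));                         // x (from FULLWIDTH LATIN CAPITAL LETTER X)
console.log(normaliseName('Theta', { caseSensitive: true })); // Theta
```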
christianp added a commit that referenced this issue Sep 27, 2022
I've added all the left and right parenthesis characters that I could
find on graphemica to Parser.(left|right)_parentheses. They're all
synonyms for the ASCII parenthesis characters.

see #690, and thanks to maths/moodle-qtype_stack#860 for pointing them
out!
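
The approach can be pictured like this. The two arrays are small illustrative subsets, and `normaliseParentheses` is a hypothetical helper rather than the parser's actual code:

```javascript
// Illustrative subsets of parenthesis look-alikes:
// FULLWIDTH, SMALL, and MEDIUM ... ORNAMENT variants.
const leftParentheses = ['(', '\uFF08', '\uFE59', '\u2768'];
const rightParentheses = [')', '\uFF09', '\uFE5A', '\u2769'];

// Hypothetical helper: replace every listed variant with the ASCII character.
function normaliseParentheses(expr) {
  return Array.from(expr, (ch) => {
    if (leftParentheses.includes(ch)) return '(';
    if (rightParentheses.includes(ch)) return ')';
    return ch;
  }).join('');
}

console.log(normaliseParentheses('\uFF08x+1\uFF09')); // (x+1)
```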
@christianp
Member Author

There are fullwidth equivalents of lots of punctuation marks, such as U+FF0C "Fullwidth Comma".

@christianp
Member Author

christianp commented Mar 28, 2023

We had a student using a fullwidth parenthesis: （

(But that's already supported, so that's not the bug I was looking for!)

@christianp
Member Author

We have had a spate of students who typed Greek letters as Unicode characters and whose expressions were marked incorrect, because Numbas doesn't consider e.g. theta and θ to be equivalent.
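
One way to make the two spellings compare equal is a lookup from the Greek character to its ASCII name. The table below is a small illustrative subset, and `canonicalName` is a hypothetical helper:

```javascript
// Illustrative subset of Greek letters and the ASCII names students also type.
const greekNames = new Map([
  ['\u03B1', 'alpha'],  // GREEK SMALL LETTER ALPHA
  ['\u03B2', 'beta'],   // GREEK SMALL LETTER BETA
  ['\u03B8', 'theta'],  // GREEK SMALL LETTER THETA
  ['\u03BB', 'lambda'], // GREEK SMALL LETTER LAMDA
  ['\u03C0', 'pi'],     // GREEK SMALL LETTER PI
]);

// Hypothetical helper: map a variable name to a canonical ASCII spelling.
function canonicalName(name) {
  return greekNames.has(name) ? greekNames.get(name) : name;
}

console.log(canonicalName('\u03B8') === canonicalName('theta')); // true
```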

@sangwinc

@christianp, we already have this issue and I plan to implement something this summer: maths/moodle-qtype_stack#860. Would you be interested in sharing lists of Unicode characters between our projects on this issue, perhaps with a JSON file of "known equivalents"?

@christianp
Member Author

@sangwinc - good idea! I'm now looking at typing out a big list of character mappings. The generic Unicode normalisation algorithms don't really help, because they ignore some differences that are mathematically significant, and they don't treat as equivalent some things that it would be convenient for us to treat as equivalent. I think we have to do it character by character (or at least character class by character class).
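
Both problems can be seen with the built-in `String.prototype.normalize`. The examples are chosen for illustration and use the NFKC form:

```javascript
// A mathematically significant difference is lost: SUPERSCRIPT TWO becomes
// a plain digit, so "x squared" turns into the name "x2".
console.log('x\u00B2'.normalize('NFKC')); // x2

// Convenient equivalences are missing: MINUS SIGN (U+2212) and MODIFIER
// LETTER CIRCUMFLEX ACCENT (U+02C6) are not turned into ASCII - and ^.
console.log('\u2212'.normalize('NFKC') === '-'); // false
console.log('\u02C6'.normalize('NFKC') === '^'); // false
```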

@christianp
Member Author

christianp commented Mar 28, 2023

This big list of LaTeX to unicode mappings that I made for mathstodon, based on unicodeit.net, might help: https://github.com/christianp/mastodon/blob/mathstodon-4.1.0/app/javascript/mastodon/features/compose/util/autolatex/data.js

@christianp
Member Author

I'm working on this at https://github.com/numbas/unicode-math-normalization. I've produced a set of files giving explicit mappings from some Unicode characters to JME syntax, and identified some things that can be normalized using the standard normalization algorithm.

Tomorrow I'll try to integrate this with the JME parser.
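
A sketch of how the two parts described above could fit together, with explicit mappings taking priority and standard normalisation as the fallback. The entries in `explicitMappings` and the `normaliseMath` function are assumptions made for this example, not the contents of the repository:

```javascript
// Hypothetical explicit mappings from Unicode characters to JME-style syntax.
const explicitMappings = new Map([
  ['\u02C6', '^'],  // MODIFIER LETTER CIRCUMFLEX ACCENT
  ['\u2212', '-'],  // MINUS SIGN
  ['\u00D7', '*'],  // MULTIPLICATION SIGN
  ['\u00B2', '^2'], // SUPERSCRIPT TWO
]);

// Use an explicit mapping where one exists; otherwise fall back to NFKC
// for that character. Explicit mappings are checked first so that
// meaning-bearing characters are never flattened by the generic algorithm.
function normaliseMath(expr) {
  let out = '';
  for (const ch of expr) { // for...of iterates by code point
    out += explicitMappings.has(ch)
      ? explicitMappings.get(ch)
      : ch.normalize('NFKC');
  }
  return out;
}

console.log(normaliseMath('x\u00B2\u2212\uFF11')); // x^2-1
console.log(normaliseMath('2\u02C63'));            // 2^3
```

Normalising one character at a time keeps the sketch short, but it would not compose combining sequences that span several characters.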
