MathJax Arabic extension #20

Merged
merged 1 commit into from
Jun 15, 2016

Conversation

OmarIthawi
Contributor

@OmarIthawi OmarIthawi commented Dec 23, 2015

This pull request will be updated to get the Arabic MathJax solution from the original hack.

In this pull request the following will be done:

  • Convert the hack into a reusable Contrib extension
  • Split into three (or more) modules
    • Making the TeX jax RTL-aware and allowing language-dependent commands to be defined
    • Making the HTML-CSS jax render RTL well
    • Adding math-specific language-dependent units to the TeX
    • Adding physics-specific language-dependent units to the TeX
  • Make the language detection mechanism configurable for the \ar command.
    Note: Currently it is hard-coded to the lang attribute of the tag.
  • Support the Limit \lim (Unlike \sum and \int, this requires a completely different drawing)

I will not try to make a one-size-fits-all RTL solution. Instead, I will try to do the following:

  • Essential RTL support for Arabic
  • Arabic support through custom font (basic font support for now)
  • Localized physics units, variables, constants and others
  • Make the solution as modular as possible so that future expansion to other languages is possible

Extra Features:

  • Provide a new LaTeX command that chooses between an Arabic and an English string depending on the page's language, e.g. \transt{Radius}{نق} for text and \trans{\Pi}{\text{باي}} for generic TeX.
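
The selection logic behind such a command, combined with the configurable language detection mentioned earlier, could be sketched roughly as follows. This is only an illustration of the idea, not the extension's code: `makeTranslator` and `langFromElement` are hypothetical names, and a real implementation would be registered as TeX macros in the TeX input jax rather than called directly.

```javascript
// Hypothetical helper: builds a chooser around a pluggable language detector,
// so the detection mechanism is not hard-coded.
function makeTranslator(detectLang) {
  return function (english, arabic) {
    return detectLang() === "ar" ? arabic : english;
  };
}

// Hypothetical detector that reads a `lang` property from an element-like
// object and falls back to English when none is set.
function langFromElement(element) {
  return function () {
    return (element && element.lang) || "en";
  };
}

// Arabic strings are written as \uXXXX escapes so the script does not
// depend on the page encoding.
var transt = makeTranslator(langFromElement({ lang: "ar" }));
var label = transt("Radius", "\u0646\u0642"); // selects the Arabic string
```

Passing the detector in as a function keeps the choice of where the language comes from (a tag attribute, a configuration option, and so on) separate from the choice between the two strings.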

Pending TODOs from @dpvc 1st review:

  • Test MML.chars and MML.entity instead of overriding mi, mo and ms

    Result: Couldn't get it running, so I reverted to overriding mi, mo and ms.

  • Handle AMSmath array/tables

  • Test with the \tag{} for equations.

  • Refactor the AlignedArray and this.stack.env.lang to avoid the three nested loops.

@pkra
Contributor

pkra commented Dec 23, 2015

👍 awesome to see your work move forward!

@OmarIthawi
Contributor Author

OmarIthawi commented Dec 23, 2015

Thanks Peter :)

@OmarIthawi
Contributor Author

Hi @pkra, I've made some progress and modularized my extension. However, I'm using functions as modules. Do you think it's wise to use Startup hooks and events to declare modules dependency in extensions or is that incorrect?

My goal is to put each module in a separate file and perhaps $ cat arabic/unpacked/*.js > arabic/arabic.js then uglify it.

I'm not a big fan of huge JS files.

What do you think?

@pkra
Contributor

pkra commented Jan 11, 2016

cc @dpvc (traveling this week)

@pkra
Contributor

pkra commented Jan 11, 2016

(to clarify: I'm traveling this week.)

@OmarIthawi
Contributor Author

@pkra 👍

@dpvc
Member

dpvc commented Jan 18, 2016

Do you think it's wise to use Startup hooks and events to declare modules dependency in extensions or is that incorrect?

It is correct to use those signals and hooks. That's what they are there for. I haven't looked closely at your code, but it looks like the right thing to me.
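
As a rough sketch of what declaring module dependencies through startup hooks and signals might look like in MathJax v2: one module waits for the TeX input jax, posts its own signal when done, and a second module hooks into that signal. The signal name "Arabic TeX Ready", the two-module split, and the `registerArabicModules` wrapper are assumptions made for illustration; `MathJax` is passed in as a parameter only to keep the example self-contained.

```javascript
// Sketch only: wires two hypothetical modules together via startup signals.
// Returns an array that records the order in which the modules ran.
function registerArabicModules(MathJax) {
  var HUB = MathJax.Hub;
  var loaded = [];

  // Module 1: runs once the TeX input jax is ready, then announces itself.
  HUB.Register.StartupHook("TeX Jax Ready", function () {
    loaded.push("tex");
    // ... TeX-level setup would go here ...
    HUB.Startup.signal.Post("Arabic TeX Ready"); // hypothetical signal name
  });

  // Module 2: depends on module 1, so it waits for module 1's signal.
  HUB.Register.StartupHook("Arabic TeX Ready", function () {
    loaded.push("units");
    // ... setup that needs module 1 would go here ...
  });

  return loaded;
}
```

In a page, this would be called with the global `MathJax` object once the extension file is loaded. Because each module only names the signal it waits for, the modules can live in separate files and be concatenated in any order.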

@OmarIthawi
Contributor Author

Thanks @dpvc :)

Will do it and update the PR :)

@OmarIthawi
Contributor Author

@pkra Looks like I've completed most of the work, if not all of it.

It has been heavily refactored. Now it should be much easier to understand and to extend.

It would be great if someone could review it, and perhaps merge it ^_^.

I will be focusing on bringing this version into the edX platform so we can test it and hopefully use it for our students.

@OmarIthawi
Contributor Author

Hi @pkra and @dpvc,

I guess this is now ready for review. Could you please take a look and let me know if there are any issues?

@pkra
Contributor

pkra commented Feb 22, 2016

@OmarIthawi awesome -- we'll take a look!

@OmarIthawi
Contributor Author

Thanks @pkra 😄

@pkra
Contributor

pkra commented Mar 8, 2016

Sorry we dropped the ball on this, @OmarIthawi.

@dpvc PTAL?

@OmarIthawi
Contributor Author

No problem Peter :)

@dpvc
Member

dpvc commented Mar 22, 2016

@OmarIthawi, first of all, thanks for all your efforts in working on this, and putting it into shape for the 3rd-party repository. We really appreciate what you have done.

I do have some comments on the code, however. I will make some general comments here, and then add in-line comments to the code about specific issues. I know that receiving comments on your code can be traumatic, so please don't take what I say as criticism. We are glad to be working with you on this.

First, I should point out that the difference between the unpacked and packed versions of the code in MathJax is that the packed versions use YUIcompressor to remove white-space and comments, and reduce the code size via variable name substitutions (from long names to short ones) and other code-analysis techniques. Since this compressed version becomes nearly unreadable, the uncompressed version is retained for development purposes. So the uncompressed copy usually is a working copy of the code with the same file structure as the compressed version, but with all the white-space and comments in it.

You seem to have the uncompressed version as a collection of separate files that are combined into your "compressed" version. While I understand that setup, it is not what we usually do in MathJax, and so it is somewhat unusual in that respect. While you are free to manage your code however you wish, I did want to point out that this is not consistent with the rest of how MathJax works.

Second, I'm not sure I understand your use of MathJax.Hub.Config(), here. This function usually is used by the page author to configure MathJax, so that the author's options override the defaults in the various modules. But the author's calls to MathJax.Hub.Config() will come before your code is loaded (since it is through MathJax.Hub.Config() that the author indicates that your extension should be loaded), and so when your code runs, it will override any settings the author has made for your extension. I'm not sure if you want the author to be able to override the defaults (like the identifiers map or the operators map), but it seems like these would be good things to allow. The way things are currently set up, however, your values will overwrite any changes made by the page author.

The usual way to do this in MathJax would be to create an object in the MathJax.Extension object to hold your extension, and include a config property where your configuration is stored. You can then use MathJax.Hub.CombineConfig() to merge the author's configuration into your default configuration. Something like

MathJax.Extension.Arabic = {
  version: "1.0.0",
  config: MathJax.Hub.CombineConfig("Arabic", {
    // ... default configuration goes here ...
  })
};

This makes it a bit harder to break your default configuration into separate pieces in separate files, but it could be done, if you really want to do that.
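
To illustrate the intended precedence (the page author's values win over the extension's defaults, and untouched defaults survive), here is a minimal stand-alone sketch of that kind of merge. It is not MathJax's implementation of CombineConfig; `combineConfig` is a hypothetical helper, and the `x` entry in the map is made up for the example.

```javascript
// Illustrative recursive merge: author values override defaults,
// and defaults the author did not mention are kept.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function combineConfig(defaults, authorConfig) {
  var result = {};
  Object.keys(defaults).forEach(function (key) {
    result[key] = defaults[key];
  });
  Object.keys(authorConfig || {}).forEach(function (key) {
    var value = authorConfig[key];
    result[key] = (isPlainObject(value) && isPlainObject(result[key]))
      ? combineConfig(result[key], value) // merge nested maps key by key
      : value;                            // otherwise the author's value wins
  });
  return result;
}

// Extension defaults ("x" is a hypothetical entry).
var defaults = { identifiersMap: { "A": "\u0623", "x": "\u0633" } };

// The page author overrides a single identifier.
var merged = combineConfig(defaults, { identifiersMap: { "A": "\u0627" } });
```

With this ordering, the extension can ship a complete default map while the author changes only the entries they care about.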

I will make further comments in the code itself.

Arabic: {
identifiersMap: {
// Sets operations, and other stuff
'A': 'أ',
Member

You should probably use "\u0623" rather than 'أ', since the latter depends on the page encoding, while the former doesn't. Scripts are loaded using the encoding of the page (last time I checked), so this would require the page to be UTF-8, and we don't know what encoding may be used on the page. So it is best to use explicit unicode code points via \uXXXX instead.

Contributor Author

  • ✓ Sure, I can do that. I might do it as a postprocessing step along with the other gulp steps that I have.

It is really handy for me to see the original Arabic text instead of unicode points. In production this is not a requirement, so I can post process this.
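
A post-processing step of this kind could be as small as the following sketch, which rewrites every non-ASCII character in a source string as a \uXXXX escape. The function name `escapeNonAscii` is hypothetical, and how it would be wired into the gulp build is left out.

```javascript
// Replace each non-ASCII UTF-16 code unit with its \uXXXX escape.
// Characters outside the BMP are stored as surrogate pairs and therefore
// come out as two escapes, which is still a valid JavaScript string literal.
function escapeNonAscii(source) {
  return source.replace(/[^\x00-\x7F]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return "\\u" + ("0000" + hex).slice(-4);
  });
}

// A line like the one under review, written with the raw Arabic character.
var escapedLine = escapeNonAscii("'A': 'أ',");
```

Note that this simple version also escapes non-ASCII characters inside comments, so a reference character kept in a comment would need to be excluded or re-added separately.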

@dpvc
Member

dpvc commented May 12, 2016

OK, I've had the chance to go through your code in detail, and overall, I like the changes you have made. There are still a few items to consider, which I will mark in the code itself. Most of these are minor (often issues of style), and you can ignore them if you wish. But a couple are important.

You were correct to be concerned about the replacement for the Push() method. It turns out that the issue you were having with the arrays is actually a bug in the TeX input jax and how it handles the stack.env object. I haven't fixed it yet, but have opened an issue for it. Your hack does work around that, but is a bit too aggressive. I suggest an alternative in the code.

In the previous round of comments, I had suggested that you replace the raw Unicode characters by \uXXXX references, so that your file will not be dependent on the encoding of the page (which you don't control). You mentioned doing that in post-processing, but it doesn't seem to have been done. Are you still planning that? If you really want to see the Unicode characters, you could put them in comments, where I think they will not be a problem even in a different encoding.

className += ' mar'; // Keep the leading space
}

flipElement.className = className;
Member

The code from lines 96 to 102 could be simplified. There is really no need for a separate variable for this, and the initial space is not needed in line 96 (despite the comment) since there is no previous className value to worry about. You could just do

flipElement.className = 'mfliph';
if ('ar' === this.arabicFontLang) flipElement.className += ' mar';

Contributor Author

👍

@OmarIthawi
Contributor Author

I might take longer than I thought to work on the amendments you've suggested.

In the previous round of comments, I had suggested that you replace the raw Unicode characters by \uXXXX references, so that your file will not be dependent on the encoding of the page (which you don't control). You mentioned doing that in post-processing, but it doesn't seem to have been done. Are you still planning that? If you really want to see the Unicode characters, you could put them in comments, where I think they will not be a problem even in a different encoding.

I intentionally left the unpacked version without encoding these characters. However, the packed version has them encoded correctly. I mainly did this to enable easier debugging (which is what the unpacked version is for, right?).

If you still think that both of them need this, I don't mind adding it to the build script.

@dpvc
Member

dpvc commented May 17, 2016

I intentionally left the unpacked version without encoding these characters. However the packed version has them encoded correctly.

OK, great. I didn't check the minified version. You are right that the un-minified version is mostly for debugging, but ideally it would be the same code as the minified one (modulo the minification), so having one with Unicode characters and the other with \uXXXX does mean that they might cause different errors (one would only work in UTF-8 encoding and the other in all encodings). So that is a potential point of confusion. So my preference is to use \uXXXX in both, with comments having the actual Unicode character for reference. E.g.,

    'a': '\u0623',    //  أ

so that you get the best of both worlds.

But it is your code, so you should do what seems best for you. I am only making suggestions.

@OmarIthawi
Contributor Author

I had to work on many internal tasks. I have time now to get back to this PR.

OK, great. I didn't check the minified version. You are right that the un-minified version is mostly for debugging, but ideally it would be the same code as the minified one (modulo the minification), so having one with Unicode characters and the other with \uXXXX does mean that they might cause different errors (one would only work in UTF-8 encoding and the other in all encodings). So that is a potential point of confusion. So my preference is to use \uXXXX in both, with comments having the actual Unicode character for reference. E.g.,

I ended up escaping the characters in both the packed and the unpacked versions, while keeping the raw characters only in the source code.

@OmarIthawi
Contributor Author

@dpvc Thank you so much for your time and comments. I have addressed all the amendments you've suggested. Please let me know if you have any additional comments.

Thanks to you the extension is now much better.

I have tested the extension on 85 test cases (publicly accessible), and it looks good so far.

The other good news: we're already using it on Edraak.org for more than 15,000 enrollees in the first course:

[Screenshot, 2016-06-12]

@pkra
Contributor

pkra commented Jun 13, 2016

It's looking great, @OmarIthawi!


var escapeRegExp = (function () {
var reRegExpChar = /[\\^$.*+?()[\]{}|]/g;
var reHasRegExpChar = new RegExp(reRegExpChar.source);
Member

This variable is no longer used, and can be removed.
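
With the unused `reHasRegExpChar` dropped, the helper might reduce to something like the sketch below. Only the first lines of the original function appear in this thread, so the returned function body here is an assumption about how the rest is written.

```javascript
// Escapes characters that have a special meaning in regular expressions,
// so that an arbitrary string can be embedded in a RegExp literally.
var escapeRegExp = (function () {
  var reRegExpChar = /[\\^$.*+?()[\]{}|]/g;

  return function (string) {
    // "$&" re-inserts the matched character after a backslash.
    return string.replace(reRegExpChar, "\\$&");
  };
}());

// Example: build a pattern that matches the literal text "\frac{1}{2}".
var pattern = new RegExp(escapeRegExp("\\frac{1}{2}"));
```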

@dpvc
Member

dpvc commented Jun 14, 2016

Thanks for all your work on this. I did note two minor items (an unused variable that can be removed, and a left-over console.log() message). Other than that, I think it is good to go. I really appreciate your willingness to work through the issues that we had with the code, and am very excited about having this Arabic extension be part of the third-party repository!

@pkra
Contributor

pkra commented Jun 14, 2016

@OmarIthawi let me know if you want to update this PR or if I should merge -- finally! 🎆

@OmarIthawi OmarIthawi changed the title from "(WIP) MathJax Arabic extension" to "MathJax Arabic extension" on Jun 15, 2016
@OmarIthawi
Contributor Author

Thanks a lot David for your detailed review and feedback. It truly helped me to get the extension done correctly.

It is my pleasure to work with you and contribute to such a great and highly beneficial project as MathJax.

I have fixed the two issues, and finally now we're good to go.

@pkra pkra merged commit 837ea36 into mathjax:master Jun 15, 2016
@pkra
Contributor

pkra commented Jun 15, 2016

Woohoo!

@pkra
Contributor

pkra commented Jun 15, 2016

The extension should now be available on the CDN's contrib path (check http://cdn.mathjax.org/mathjax/contrib/arabic/arabic.js)

@pkra
Contributor

pkra commented Jun 15, 2016

Thanks again for this amazing contribution, @OmarIthawi!

@OmarIthawi
Contributor Author

🎉 🎉 🎉 🎉 🎉

Thanks for your efforts 😀

🎉 🎉 🎉 🎉 🎉
