WIP: Proper highlighting for the Wolfram Language (Mathematica) #2706
It seems Travis fails because now many languages are detected as Mathematica. Is there a way I can provide further meta-data that defines how to detect it correctly? |
30kb measured how? Raw size is irrelevant. What matters (somewhat) is the Brotli or gzip size.
Off the top of my head: no. This is pretty much unmaintainable - unless there was a whole build system internal to Highlight.js that took a human readable list, built the trie by hand, etc... and I don't think adding such a complex beast to core (just to support a single large language) makes sense. Out of curiosity what does that code to build the regex look like? Is this a 5-10 line snippet or a huge processing library? Although if you wanted to maintain your own 3rd party grammar module then you could do it however you wanted.
No idea, that's something we'd likely want some real numbers on before even considering something like this for core (if we were ever going to).
There are helpers in |
You have to look at what it's matching and tone down the match count or the relevance...
@joshgoebel Thanks for your very thorough replies. Let me answer some of your comments:
Gzipping the trie regex results in 43kB while the list of keywords would be 47kB. Not sure if this difference justifies the usage of a trie.
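For anyone who wants to reproduce this kind of comparison, here is a minimal sketch using Node's built-in `zlib` module. The two sample strings are tiny hypothetical stand-ins, not the real symbol list or the real trie regex, so the numbers they produce say nothing about the 43kB vs 47kB figures above.

```javascript
const zlib = require("zlib");

// Size in bytes of `text` after gzip compression.
function gzippedSize(text) {
  return zlib.gzipSync(Buffer.from(text, "utf8")).length;
}

// Hypothetical stand-ins for the two candidate encodings of the symbol list.
const plainList = ["Abs", "AbsArg", "Accumulate", "Plot", "Plot3D"].join(" ");
const trieRegex = "A(?:bs(?:Arg)?|ccumulate)|Plot(?:3D)?";

console.log("plain list:", plainList.length, "raw,", gzippedSize(plainList), "gzipped");
console.log("trie regex:", trieRegex.length, "raw,", gzippedSize(trieRegex), "gzipped");
```

Running the same function over the real generated files would give the figures that matter for bundle size.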
The human-readable list is not going to be maintainable either. No one is editing 6659 keywords by hand and for all highlighters I've worked on (intellij, google-prettify, rogue), the list was always created automatically through Mathematica. Everything else required for creating the regex is small, open-source and well documented. I've put the exact commands used in the comments. It only relies on a package of mine that I use for the Wolfram Language IntelliJ plugin, where the code for creating the trie is written in Kotlin. The gist is that you will need Mathematica either way to maintain the list of keywords, and the rest is open-source.
The reason I'm working on it is StackOverflow/StackExchange :)
I understand that. However, for Mathematica we're not really having different "variants". It's more like that you can prepend or append different things, e.g. if you have
Alright, I'll check the API. Question: How should I move on regarding the large list of keywords? Should I use a simple list or the trie regex? I'd also try to bring some more people to this PR to get different opinions because in the end, it should be nice for the users. @CarlQLange already agreed to discuss matters.
Yeah for such a tiny difference we definitely want a clean inline list of keywords.
Typically for longer lists (that are easily generated by the host language or script) we'll inline that code in the grammar itself (as a comment - usually these are only a few short lines of code)... so that the keywords are still listed and readable in the source, but maintainers can easily run the snippet themselves, generate a new list, copy and paste that into the file. Then we have clear readable diffs for what has changed from one release to another.
Nothing prevents them from using 3rd party grammar modules. :-)
I don't know what this means... doesn't it automatically get included with the switch to Highlight.js... just it's being "loaded as needed" rather than bundled - which makes sense if most sites aren't going to use it (only the Mathematica Stack). What am I missing here? I mean I obviously get the desire to improve it... but you lost me with "merged".
If there aren't truly variants then yes, we'd split it up into more easy to read (and maintain) chunks so that reading it at a glance it's obvious what the component parts are, etc... rather than just a wall of regex as it stands now.
Simple list, and it should originally be an array, one word per line (for maintainability/git diffs/etc - see |
And once we fall back to using a simple list you should be able to use the built in There is an |
Great! Then I'll know how to move and hope I'll find some time.
StackOverflow is still in the process of deciding how regularly they will update their version of highlight.js. The most upvoted user-wish on the official announcement however is that they do this more regularly. So I guess they will pull new versions of highlight.js to their side. I don't know specifics but here is the comment of the mod regarding the question "can the highlighter be updated more often?":
The only thing I do know is that SO will not provide Mathematica highlighting for the whole network but only for our StackExchange site because even in google-prettify days, our highlighter was huge compared to others. Here is the semi-official statement about this.
Yep, I got that. Use an array. Each keyword a line for diffing and joining it to a string in the code. Thanks again, I really appreciate your input. |
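A minimal sketch of the agreed layout: one symbol per line in an array, joined into the space-separated string that the keyword definition takes. The five symbols are an illustrative sample only (the real list is generated from Mathematica), and `SYMBOL_RE` is an assumed pattern for Mathematica symbols.

```javascript
// Illustrative sample; the real list has thousands of generated entries.
// One entry per line keeps git diffs readable when the list is regenerated.
const SYSTEM_SYMBOLS = [
  "$Failed",
  "Abs",
  "Accumulate",
  "Integer",
  "Plot"
];

// Assumed symbol pattern: a letter or $ followed by letters, digits or $.
const SYMBOL_RE = /[a-zA-Z$][a-zA-Z0-9$]*/;

// Keyword definition in the shape used by the grammar under review.
const KEYWORDS = {
  $pattern: SYMBOL_RE,
  keyword: SYSTEM_SYMBOLS.join(" ")
};

console.log(KEYWORDS.keyword);
```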
For matching built-in symbols, a sorted list of all names in the System context is now provided in a separate file which can be automatically recreated. The required Mathematica code is given in the comment. As suggested by @joshgoebel, the regular expression for matching Mathematica's numbers is now broken into (slightly more) readable chunks. To provide features we had on StackExchange with google-prettify, more rules were added. This includes
- explicit matching of operators and braces
- matching of patterns and slots
- matching of message names aka func::usage
This implementation requires additional CSS classes, but looks reasonable on the standard styles.
Overall this is awesome! :-)
src/languages/mathematica.js
Outdated
import * as regex from '../lib/regex.js';

// @ts-ignore
export default function(highlightJS) {
Please revert this to match every other grammer, hjls, etc...
OK. It wasn't a (camel-case) word and was flagged by my spell-check.
other grammer, hjls, etc
This is why I renamed it to something readable.. even you mistyped it 😂
Legacy, consistency, etc... and it's to match the fact that `hljs` is our global. :-) Not something we're going to change at this time. :)
/*
This dangerously looking beast of a regex was carefully assembled by Robert Jacobson.
See: https://wltools.github.io/LanguageSpec/Specification/Syntax/Number-representations/
This rather scary looking matching of Mathematica numbers is carefully explained by Robert Jacobson here:
Awesome. Much nicer.
src/languages/mathematica.js
Outdated
const number_re = /(\d*\.\d+|\d+\.\d*|\d+)/;
const base_number_re = regex.either(regex.concat(base_re, base_digits_re), number_re);

const accuracy_re = /``[+-]?(\d*\.\d+|\d+\.\d*|\d+)/;
Please use ALL_CAPS_CASE for constants like this.
Hmmm... so there is actually a pattern here... you're using lowercase for the building blocks but then cap case for the modes? Could you explain the thinking? I just realized it was mixed and now I'm trying to understand it. :)
I don't have a preference here. You guessed the idea correctly: lowercase for building blocks and upper case for the things that go into the final return value. I can make all uppercase if this is the convention for JS.
I can make all uppercase if this is the convention for JS.
I think it's a common convention and definitely one we typically follow. I like it now that I get it, but sadly it's inconsistent with every other grammar so I think we could make them all caps. Some grammars do end up with their own "micro-conventions" but overall I try and keep the style the same as they are part of a collection.
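As a rough sketch of how the number regex can be assembled from chunks under the ALL_CAPS convention: `NUMBER_RE` and `ACCURACY_RE` are taken from the diff above, while `BASE_RE`, `BASE_DIGITS_RE` and the three local helpers are assumptions made for illustration. The helpers are simple stand-ins, not the functions from `lib/regex.js`, and the real grammar covers more number forms than this.

```javascript
// Local stand-ins for regex helpers (hypothetical, defined here so the
// sketch is self-contained).
const source = (re) => (typeof re === "string" ? re : re.source);
const concat = (...parts) => parts.map(source).join("");
const either = (...alts) => "(" + alts.map(source).join("|") + ")";
const optional = (re) => "(" + source(re) + ")?";

// Assumed building blocks for numbers in another base, e.g. 16^^ff.
const BASE_RE = /\d+\^\^/;
const BASE_DIGITS_RE = /[0-9a-zA-Z]+(\.[0-9a-zA-Z]*)?/;

// Building blocks taken from the reviewed diff, renamed to ALL_CAPS.
const NUMBER_RE = /(\d*\.\d+|\d+\.\d*|\d+)/;
const BASE_NUMBER_RE = either(concat(BASE_RE, BASE_DIGITS_RE), NUMBER_RE);
const ACCURACY_RE = /``[+-]?(\d*\.\d+|\d+\.\d*|\d+)/;

// A number is a (possibly based) number with an optional accuracy suffix.
// Anchored here only so that whole strings can be tested.
const FULL_NUMBER_RE = new RegExp(
  "^" + concat(BASE_NUMBER_RE, optional(ACCURACY_RE)) + "$"
);

for (const sample of ["16^^ff", "1.5``3", ".5", "abc"]) {
  console.log(sample, FULL_NUMBER_RE.test(sample));
}
```

Every alternative in `NUMBER_RE` requires at least one digit, so the combined regex cannot produce an empty match.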
src/languages/mathematica.js
Outdated
  $pattern: symbol_re,
  keyword: Mathematica.SYSTEM_SYMBOLS.join(" ")
},
end: /[a-zA-Z0-9$]*/
Why wouldn't this all be a single match with `begin`? I see that the first character can be different but traditionally we'd love this with a single match regex in `begin` to make it clear we're matching a single term and that this isn't a block match with nested rules, etc.
I will give a more elaborate explanation in a separate comment. The summary is that I constantly ran into two issues:
- When I define keywords like all languages do at the top-level and all other symbols (that have the same regex) in the `contains:` section, I didn't get highlighting for the keywords because the "other symbols matching" took precedence.
- I ran into "0 match" exceptions more than I'd like to admit
When I define keywords like all languages do at the top-level and all other symbols (that have the same regex) in the contains: section, I didn't get highlighting for the keywords because the "other symbols matching" took precedence.
I think a specific example would help... but each level has its own keywords so once you drop into a contains you no longer have keywords from the parent so if you ALSO wanted the keywords to kick in you'd need to re-include them at that level.
It's possible you're inventing a new useful pattern, but I'd have to see the exact problem you're trying to solve to comment further - so a more detailed example would be appreciated. :)
I ran into "0 match" exceptions more than I'd like to admit
Just avoid any regex that can make a 0 width match. :-) That means complex regex with 20 optional parts (maybe digit) need to be pinned down to ONE core match... (like ALWAYS matching \d somewhere, etc)... often this can result in variants, i.e. `\d(\.\d)?` and `.\d` vs `[+-]?(\d)?(\.\d)`. This problem originally dates back to a complex digit match with all optional regex groups.
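The difference is easy to see in plain JavaScript. Both regexes below are made up for illustration and are not the ones from this PR: the first has only optional parts and therefore succeeds with an empty match, the second pins every alternative to at least one digit.

```javascript
// Every part is optional, so this can succeed without consuming any input.
const ALL_OPTIONAL = /[+-]?\d*(\.\d*)?/;

// Pinned down: each alternative requires at least one digit.
const PINNED = /[+-]?(\d+(\.\d*)?|\.\d+)/;

// Text of the first match, or null if the regex does not match at all.
function firstMatch(re, text) {
  const m = re.exec(text);
  return m === null ? null : m[0];
}

console.log(JSON.stringify(firstMatch(ALL_OPTIONAL, "x = 1.5"))); // ""
console.log(JSON.stringify(firstMatch(PINNED, "x = 1.5")));       // "1.5"
console.log(JSON.stringify(firstMatch(PINNED, "abc")));           // null
```

The all-optional version matches the empty string at position 0, before it ever reaches the number, which is the situation the "0 width match" error guards against.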
src/languages/mathematica.js
Outdated
className: 'symbol',
begin: new RegExp("\\\\\\[(?:A(?:Acute|Bar|Cup|DoubleDot|E|Grave|Hat|Ring|Tilde|kuz|l(?:eph|i(?:as(?:Delimiter|Indicator)|gnmentMarker)|pha|tKey)|n(?:d(?:y)?|g(?:le|strom))|quariusSign|riesSign|scendingEllipsis|uto(?:LeftMatch|Operand|Placeholder|RightMatch|Space))|B(?:ackslash|e(?:amed(?:EighthNote|SixteenthNote)|cause|t(?:a)?)|lack(?:Bishop|K(?:ing|night)|Pawn|Queen|Rook)|reve|ullet)|C(?:Acute|Cedilla|Hacek|a(?:ncerSign|p(?:ital(?:A(?:Acute|Bar|Cup|DoubleDot|E|Grave|Hat|Ring|Tilde|lpha)|Beta|C(?:Acute|Cedilla|Hacek|hi)|D(?:Hacek|elta|i(?:fferentialD|gamma))|E(?:Acute|Bar|Cup|DoubleDot|Grave|Ha(?:cek|t)|psilon|t[ah])|Gamma|I(?:Acute|Cup|DoubleDot|Grave|Hat|ota)|K(?:appa|oppa)|L(?:Slash|ambda)|Mu|N(?:Hacek|Tilde|u)|O(?:Acute|Double(?:Acute|Dot)|E|Grave|Hat|Slash|Tilde|m(?:ega|icron))|P(?:hi|i|si)|R(?:Hacek|ho)|S(?:Hacek|ampi|igma|tigma)|T(?:Hacek|au|h(?:eta|orn))|U(?:Acute|Double(?:Acute|Dot)|Grave|Hat|Ring|psilon)|Xi|YAcute|Z(?:Hacek|eta))|ricornSign)?)|e(?:dilla|nt(?:er(?:Dot|Ellipsis))?)|h(?:eck(?:edBox|mark(?:edBox)?)|i)|ircle(?:Dot|Minus|Plus|Times)|l(?:o(?:ckwiseContourIntegral|seCurly(?:DoubleQuote|Quote)|verLeaf)|ubSuit)|o(?:lon|mmandKey|n(?:ditioned|gruent|jugate(?:Transpose)?|stantC|t(?:inu(?:ation|edFractionK)|ourIntegral|rolKey))|p(?:roduct|yright)|unterClockwiseContourIntegral)|ross|u(?:p(?:Cap)?|r(?:l(?:y(?:CapitalUpsilon|Epsilon|Kappa|P(?:hi|i)|Rho|Theta))?|rency)))|D(?:Hacek|a(?:gger|let|sh)|e(?:gree|l(?:eteKey|ta)?|scendingEllipsis)|i(?:am(?:eter|ond(?:Suit)?)|fferen(?:ceDelta|tialD)|gamma|rectedEdge|s(?:cret(?:e(?:Ratio|Shift)|ionary(?:Hyphen|LineSeparator|Pa(?:geBreak(?:Above|Below)|ragraphSeparator)))|tributed)|v(?:ergence|i(?:de(?:s)?|sionSlash)))|o(?:t(?:Equal|less[IJ]|tedSquare)|uble(?:ContourIntegral|D(?:agger|o(?:t|wnArrow))|L(?:eft(?:Arrow|RightArrow|Tee)|ong(?:Left(?:Arrow|RightArrow)|RightArrow))|Prime|Right(?:Arrow|Tee)|Struck(?:A|B|C(?:apital[ABCDEFGHIJKLMNOPQRSTUVWXYZ])?|D|E(?:ight)?|F(?:ive|our)?|G|H|I|J|K|L|M|N(?:ine)?|O(?:ne)?|P|Q|R
|S(?:even|ix)?|T(?:hree|wo)?|U|V|W|X|Y|Z(?:ero)?)|Up(?:Arrow|DownArrow)|VerticalBar|d(?:Gamma|Pi))|wn(?:Arrow(?:Bar|UpArrow)?|Breve|Exclamation|Left(?:RightVector|TeeVector|Vector(?:Bar)?)|Pointer|Question|Right(?:TeeVector|Vector(?:Bar)?)|Tee(?:Arrow)?)))|E(?:Acute|Bar|Cup|DoubleDot|Grave|Ha(?:cek|t)|arth|ighthNote|l(?:ement|lipsis)|mpty(?:Circle|D(?:iamond|ownTriangle)|Rectangle|S(?:et|mall(?:Circle|Square)|quare)|UpTriangle|VerySmallSquare)|nt(?:erKey|ity(?:End|Start))|psilon|qu(?:al(?:Tilde)?|i(?:librium|valent))|rrorIndicator|scapeKey|t[ah]|uro|x(?:ists|p(?:ectationE|onentialE)))|F(?:i(?:Ligature|lled(?:Circle|D(?:iamond|ownTriangle)|LeftTriangle|R(?:ectangle|ightTriangle)|S(?:mall(?:Circle|Square)|quare)|UpTriangle|VerySmallSquare)|nalSigma|rstPage|vePointedStar)|l(?:Ligature|at|orin)|or(?:All|mal(?:A(?:lpha)?|B(?:eta)?|C(?:apital(?:A(?:lpha)?|B(?:eta)?|C(?:hi)?|D(?:elta|igamma)?|E(?:psilon|ta)?|F|G(?:amma)?|H|I(?:ota)?|J|K(?:appa|oppa)?|L(?:ambda)?|M(?:u)?|N(?:u)?|O(?:m(?:ega|icron))?|P(?:hi|i|si)?|Q|R(?:ho)?|S(?:ampi|igma|tigma)?|T(?:au|heta)?|U(?:psilon)?|V|W|X(?:i)?|Y|Z(?:eta)?)|hi|urly(?:CapitalUpsilon|Epsilon|Kappa|P(?:hi|i)|Rho|Theta))?|D(?:elta|igamma)?|E(?:psilon|ta)?|F(?:inalSigma)?|G(?:amma)?|H|I(?:ota)?|J|K(?:appa|oppa)?|L(?:ambda)?|M(?:u)?|N(?:u)?|O(?:m(?:ega|icron))?|P(?:hi|i|si)?|Q|R(?:ho)?|S(?:ampi|igma|tigma)?|T(?:au|heta)?|U(?:psilon)?|V|W|X(?:i)?|Y|Z(?:eta)?))|re(?:akedSmiley|eformPrompt)|unction)|G(?:amma|eminiSign|imel|othic(?:A|B|C(?:apital[ABCDEFGHIJKLMNOPQRSTUVWXYZ])?|D|E(?:ight)?|F(?:ive|our)?|G|H|I|J|K|L|M|N(?:ine)?|O(?:ne)?|P|Q|R|S(?:even|ix)?|T(?:hree|wo)?|U|V|W|X|Y|Z(?:ero)?)|r(?:a(?:dient|y(?:Circle|Square))|eater(?:Equal(?:Less)?|FullEqual|Greater|Less|SlantEqual|Tilde)))|H(?:Bar|a(?:cek|ppySmiley)|e(?:artSuit|rmitianConjugate)|orizontalLine|ump(?:DownHump|Equal)|yphen)|I(?:Acute|Cup|DoubleDot|Grave|Hat|m(?:aginary[IJ]|pli(?:citPlus|es))|n(?:dentingNewLine|finity|linePart|te(?:gral|rsection)|visible(?:Application|Comma|P(?:ostfix
ScriptBase|refixScriptBase)|Space|Times))|ota)|Jupiter|K(?:appa|e(?:rnelIcon|yBar)|oppa)|L(?:Slash|a(?:mbda|placian|stPage)|e(?:ft(?:A(?:ngleBracket|rrow(?:Bar|RightArrow)?|ssociation)|BracketingBar|Ceiling|Do(?:ubleBracket(?:ingBar)?|wn(?:TeeVector|Vector(?:Bar)?))|Floor|Guillemet|Modified|Pointer|Right(?:Arrow|Vector)|Skeleton|T(?:ee(?:Arrow|Vector)?|riangle(?:Bar|Equal)?)|Up(?:DownVector|TeeVector|Vector(?:Bar)?)|Vector(?:Bar)?)|oSign|ss(?:Equal(?:Greater)?|FullEqual|Greater|Less|SlantEqual|Tilde)|tterSpace)|i(?:braSign|ghtBulb|mit|neSeparator)|o(?:ng(?:Dash|Equal|Left(?:Arrow|RightArrow)|RightArrow)|wer(?:LeftArrow|RightArrow)))|M(?:a(?:rs|thematicaIcon|xLimit)|e(?:asuredAngle|diumSpace|rcury)|ho|i(?:cro|n(?:Limit|us(?:Plus)?))|o(?:d(?:1Key|2Key)|on)|u)|N(?:Hacek|Tilde|a(?:nd|tural)|e(?:gative(?:MediumSpace|Thi(?:ckSpace|nSpace)|VeryThinSpace)|ptune|sted(?:GreaterGreater|LessLess)|utralSmiley)|o(?:Break|nBreakingSpace|r|t(?:C(?:ongruent|upCap)|DoubleVerticalBar|E(?:lement|qual(?:Tilde)?|xists)|Greater(?:Equal|FullEqual|Greater|Less|SlantEqual|Tilde)?|Hump(?:DownHump|Equal)|Le(?:ftTriangle(?:Bar|Equal)?|ss(?:Equal|FullEqual|Greater|Less|SlantEqual|Tilde)?)|Nested(?:GreaterGreater|LessLess)|Precedes(?:Equal|SlantEqual|Tilde)?|R(?:everseElement|ightTriangle(?:Bar|Equal)?)|S(?:quareSu(?:bset(?:Equal)?|perset(?:Equal)?)|u(?:bset(?:Equal)?|cceeds(?:Equal|SlantEqual|Tilde)?|perset(?:Equal)?))|Tilde(?:Equal|FullEqual|Tilde)?|VerticalBar)?)|u(?:ll|mberSign)?)|O(?:Acute|Double(?:Acute|Dot)|E|Grave|Hat|Slash|Tilde|m(?:ega|icron)|p(?:enCurly(?:DoubleQuote|Quote)|tionKey)|r|ver(?:Brac(?:e|ket)|Parenthesis))|P(?:a(?:geBreak(?:Above|Below)|r(?:agraph(?:Separator)?|tialD))|er(?:mutationProduct|pendicular)|hi|i(?:ecewise|scesSign)?|l(?:aceholder|u(?:sMinus|to))|r(?:ecedes(?:Equal|SlantEqual|Tilde)?|ime|o(?:babilityPr|duct|portion(?:al)?))|si)|QuarterNote|R(?:Hacek|awEscape|e(?:gisteredTrademark|turn(?:Indicator|Key)|verse(?:DoublePrime|E(?:lement|quilibrium)|Prime|UpEquilibrium)
)|ho|ight(?:A(?:ngle(?:Bracket)?|rrow(?:Bar|LeftArrow)?|ssociation)|BracketingBar|Ceiling|Do(?:ubleBracket(?:ingBar)?|wn(?:TeeVector|Vector(?:Bar)?))|Floor|Guillemet|Modified|Pointer|Skeleton|T(?:ee(?:Arrow|Vector)?|riangle(?:Bar|Equal)?)|Up(?:DownVector|TeeVector|Vector(?:Bar)?)|Vector(?:Bar)?)|ound(?:Implies|SpaceIndicator)|u(?:le(?:Delayed)?|pee))|S(?:Hacek|Z|a(?:dSmiley|gittariusSign|mpi|turn)|c(?:orpioSign|ript(?:A|B|C(?:apital[ABCDEFGHIJKLMNOPQRSTUVWXYZ])?|D(?:otless[IJ])?|E(?:ight)?|F(?:ive|our)?|G|H|I|J|K|L|M|N(?:ine)?|O(?:ne)?|P|Q|R|S(?:even|ix)?|T(?:hree|wo)?|U|V|W|X|Y|Z(?:ero)?))|e(?:ction|lectionPlaceholder)|h(?:a(?:h|rp)|iftKey|ort(?:DownArrow|LeftArrow|RightArrow|UpArrow))|i(?:gma|xPointedStar)|keletonIndicator|mallCircle|p(?:a(?:ce(?:Indicator|Key)|deSuit|nFrom(?:Above|Both|Left))|hericalAngle|ooky)|q(?:rt|uare(?:Intersection|Su(?:bset(?:Equal)?|perset(?:Equal)?)|Union)?)|t(?:ar|e(?:pper(?:Down|Left|Right|Up)|rling)|igma)|u(?:bset(?:Equal)?|c(?:ceeds(?:Equal|SlantEqual|Tilde)?|hThat)|m|n|perset(?:Equal)?)|ystem(?:EnterKey|sModelDelay))|T(?:Hacek|a(?:bKey|u(?:rusSign)?)|ensor(?:Product|Wedge)|h(?:e(?:refore|ta)|i(?:ckSpace|nSpace)|orn)|i(?:lde(?:Equal|FullEqual|Tilde)?|mes)|r(?:a(?:demark|nspose)|ipleDot)|woWayRule)|U(?:Acute|Double(?:Acute|Dot)|Grave|Hat|Ring|n(?:d(?:er(?:Brac(?:e|ket)|Parenthesis)|irectedEdge)|ion(?:Plus)?|knownGlyph)|p(?:Arrow(?:Bar|DownArrow)?|DownArrow|Equilibrium|Pointer|Tee(?:Arrow)?|per(?:LeftArrow|RightArrow)|silon)|ranus)|V(?:e(?:ctor(?:Greater(?:Equal)?|Less(?:Equal)?)|e|nus|r(?:tical(?:Bar|Ellipsis|Line|Separator|Tilde)|yThinSpace))|i(?:lla|rgoSign))|W(?:a(?:rningSign|tchIcon)|e(?:dge|ierstrassP)|hite(?:Bishop|K(?:ing|night)|Pawn|Queen|Rook)|olf(?:ram(?:AlphaPrompt|LanguageLogo(?:Circle)?))?)|X(?:i|nor|or)|Y(?:Acute|DoubleDot|en)|Z(?:Hacek|eta))]") | ||
begin: /\\\[/,
end: /[$a-zA-Z][$a-zA-Z0-9]+]/ |
See note above regarding using just `begin` for simple matches.
Again, it was the version that worked and I'm happy to get a helping hand if this can be simplified.
It's possible I don't understand what you're matching but if it's a SINGLE unit then:
begin: /\\\[[$a-zA-Z][$a-zA-Z0-9]+]/,
Should work equally well.
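A quick check of the single-`begin` suggestion in plain JavaScript, using the regex exactly as given above. The mode object and the sample string are illustrative; the check runs the regex directly and does not go through highlight.js.

```javascript
// Mode sketch: one regex in `begin`, no `end`, so it reads as a single term.
const NAMED_CHARACTER = {
  className: "symbol",
  begin: /\\\[[$a-zA-Z][$a-zA-Z0-9]+]/
};

// Illustrative Mathematica input containing two named characters.
const sample = "\\[Gamma] + \\[Alpha] x";

// Collect all matches by re-creating the regex with the global flag.
const found = sample.match(new RegExp(NAMED_CHARACTER.begin.source, "g"));

console.log(found); // [ '\\[Gamma]', '\\[Alpha]' ]
```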
I think I know what you're trying to solve now, but just give me a code example and I'll take a closer look.
src/languages/mathematica.js
Outdated
begin: /([a-zA-Z$][a-zA-Z0-9$]*)?_+([a-zA-Z$][a-zA-Z0-9$]*)?/,
keywords: {
  $pattern: symbol_re,
  strong: Mathematica.SYSTEM_SYMBOLS.join(" ")
Please explain this variant and why its keywords would be "strong".... typically we try to use semantic use of classes, not visual... so this seems strange or else I simply don't understand what is happening here.
Yes, I'll post a screenshot. Basically, when we have a parameter of a function, Mathematica users love to see this in green. When this parameter has a type-specification like `Integer` which is a keyword, it should stay green but bold like keywords. This decision is not final and I'd like to discuss this with the community. It's quite possible that the `keywords` section in this goes away entirely.
When this parameter has a type-specification like Integer which is a keyword, it should stay green but bold like keywords.
But that's why we have "keyword" and variants - if it's a special type of keyword then we need to find the proper variant, not overload `strong`. `strong` simply means "strong/bold text" (as in something like Markdown or perhaps Latex)... also it's not always rendered bold. Themes can choose to render it however, and sometimes just use a color. That's what I mean when I say that these styles are semantic, not visual.
Your job (as the grammar) is to describe what something IS semantically, not describe how it should appear on the screen.
Perhaps semantically it's an 'important keyword' or something... and that's something I'm open to (see the issue for more nuanced styling), but we wouldn't do it by overloading "strong"...
@joshgoebel Let me try to explain the high-level problem that led to the awkward
These two forms can be mixed. So you can have a variable So what I want to have is:
I implemented all your suggestions on a separate branch and, for the love of God, it works. So in particular, I put the whole regex in the One thing I don't understand: I still need the Do you have an idea, why this happens? I know that Your other question was concerning the usage of So the parameter |
Glad to hear it. :)
The default keyword pattern doesn't include $, I don't think, and the pattern isn't magically recursive... so anywhere you have a keyword list you're going to need to pass $pattern also. Keywords are found by FIRST executing a match with $pattern and THEN seeing if any of those matches are in the keyword list. So getting $pattern correct is very important for languages with special characters in their keywords.
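The two-step lookup described here can be modelled in a few lines. This is a simplified model, not highlight.js internals: `findKeywords` is a hypothetical helper, and the `\w+` pattern standing in for the default is an assumption.

```javascript
// Step 1: split the text into candidate words using `pattern`.
// Step 2: keep only the candidates that appear in the keyword list.
function findKeywords(text, pattern, keywordList) {
  const keywords = new Set(keywordList.split(" "));
  const candidates = text.match(new RegExp(pattern.source, "g")) || [];
  return candidates.filter((word) => keywords.has(word));
}

const KEYWORD_LIST = "$Failed Integer";
const code = "f[x_Integer] := $Failed";

// Assumed default-like pattern: no `$`, and `_` counts as a word character.
const DEFAULT_LIKE = /\w+/;
// Symbol pattern with `$` allowed and `_` excluded.
const SYMBOL_RE = /[a-zA-Z$][a-zA-Z0-9$]*/;

console.log(findKeywords(code, DEFAULT_LIKE, KEYWORD_LIST)); // []
console.log(findKeywords(code, SYMBOL_RE, KEYWORD_LIST));    // [ 'Integer', '$Failed' ]
```

With the default-like pattern the candidates are `f`, `x_Integer` and `Failed`, none of which is in the list, so no keyword is found even though both appear in the code.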
Lists of keywords (as passed to
I know what you're getting at but I have a very hard time even parsing this - because the parsing step is VERY different from the highlighting.
Right - and it would be, and if a theme says "keywords" are orange, then it would be orange. That means some themes MIGHT be friendlier to some languages than other themes... simply because of language semantics. Since we must support an arbitrary number of themes (including user defined ones) the only way we can solve this properly is with proper semantic tags - and then themes being aware of them. It's possible what's needed here semantically is a Ie, this is a theme problem, not a grammar problem.
The closest comparison (I can think of) would be to Fortran where we have highlighting like:
The type simply being part of the number, not trying to apply a different style to it.
Of course that only solves your problem with Stack Overflow... if the general parsing looks "ugly/wrong" for most themes, then that's a separate issue and probably means the grammar may have to be artificially "crippled" (vs what might be considered perfection) to better fit with the larger world of themes. I'd suggest trying:
The latter would probably be most similar to many other languages (where the |
OK, you convinced me. I'm opting for this solution
because the underscore must be green. You know how users are. |
This is the option I would go for, personally. I'd also note that inside the front-end the pattern spec (e.g. the |
There might also need to be a larger discussion here if you're inventing new css classes that are not documented in We may have to fall back to generics and then look at expanding them along with the other issue relating to more complex grammar support. To start could you make a list of what any custom "classes" you have are, what they are (short explanation) and the closest thing that they would match with in our existing set of semantic classes? |
Even this type of talking/thinking is problematic. :-) For Stack Overflow it may be green (if you're ultimately in charge of the theming)... but it will definitely not be green as a general rule... it'll be whatever color the theme decides arguments are, etc.
I'm fine with that for now. Though it's possible someone raises an issue in the future and it gets changed (and I would probably support that)... that's why if a site wants a very specific look it's best that the grammar handles the semantics and the site handles the theming. Ie, it might be more "future proof" for the grammar to flag it as what it is and let site owners decide how they want to theme it... but I'm happy revisiting that on a different day. :) I just tried it as "keyword" with several themes and it looked a bit strange, but not terrible... but also hard to say since I already find the code very strange and weird to begin with. :-) If I was more familiar with Mathematica I might have a stronger opinion here. :-)
We understand that, of course, but 90% of Mathematica users don't. It's a weird programming community composed of the usual SWE types, but also a majority of people who use it as a fancy graphing program and who only ever see it in Wolfram's front-end. This means that to the best of our ability we want to get Stack Overflow to display it the way those users would see it so that we can have a maximally useful forum for that. If someone somewhere else decides to change the theme, who cares, but we want to make it possible for it to look right if we can convince SO of the utility. |
FYI: I type a lot because I'm verbose and try to be thorough, it isn't an attempt to overwhelm with more words than you. :-) I've been told sometimes it comes off that way. |
Sure, I think I understand the GOAL. :-) It just doesn't fit with HLJS as a generic highlighter for 200+ languages... to get what you want really involves controlling the grammar and theme tightly.
In this case perhaps you should consider #2 then:
And if someone finds that it looks "weird" with other themes, then we revisit THAT in the future. :) I found it "tolerable"... |
BTW from a pure semantic perspective when we have an expression like myFunc[i_, int_Integer, r_Real, q_?(Internal`RealValuedNumericQ)]:=...; that Pattern[int, Blank[Integer]] and so the From a formal grammar perspective the entire pattern expression syntax can be given as name:pat:default -> Pattern[name, Optional[pat, default] and |
Add
Sure, but we'd want to just tell it the type of the data (the whole variable is a Mode), it doesn't need to be annotated on-site. Resolved. Does this resolve on your side also? I've been a lot stricter with the type annotations with the source in the past than the grammars themselves.
Wait, I think I lost the thread about what we're talking about here. :)
By my count:
Without looking it's easy to find themes that blend I expect most sites/implementors typically use a single theme (or two) and if they have issues/disagreements with that theme then they can customize it. (using CSS or not aliases)
This kind of seals the deal for me... these are indeed more built-ins than they are keywords. You're reasoning from a purely visual perspective.
I think this is very dependent on how you define "better results". When your results become "the opposite of what the theme author intended" that is not better. I get that your heart is in the right place here, but I feel this is short-term thinking... perhaps it does make your theme slightly better today but it costs the whole library tomorrow in terms of muddier semantics, theme authors not being certain of what things mean what in which grammars, etc... ...now a theme author who already clearly expressed "really i want numbers and built-ins to look the same" (for whatever reason) has to add a special case CSS to handle Mathematica because you pulled the rug right out from underneath them by calling your built-ins keywords. If there is a true issue with our themes here then we need to work with the theme authors (or a visual designer) to actually fix the themes, not just let grammars one at a time redefine the semantic meaning of the terms in order to achieve a certain "look". This is the road to insanity. This probably also touches on the broader long-term initiative of higher fidelity grammars. We could surely use some help here, but the solution isn't grammars just going rogue like this. I will spin this off into a new issue. |
@halirutan Let's focus on the remaining technical issues here (if any). Are there any? I'm still not sure what your reference was to If you want to respond regarding the theming (built_in vs keyword) please continue that discussion over in the new thread I started.
@joshgoebel I think it's good to go now. |
@joshgoebel Ah, didn't you want to turn on autodetect? I tested it locally already and it seems that it doesn't break anything. |
….js into WIP_Wolfram_Language
@halirutan Thanks for all the great work on this!!! :-) Hopefully it serves the SE community (and others) well. |
This was before our new PR template... could you update the changelog (this might be worthy of several bullets if you wanted, you decide)... And if you wanted to provide me a short summary for the squashed commit that'd be great also. |
A list of additional names (besides the canonical one given by the filename) that can be used to identify a language in HTML classes and in a call to :ref:`getLanguage <getLanguage>`.

classNameAliases
@allejo Any thoughts on this naming? It seems clear to me... Other ideas were nesting, but that seems more complex:
themes: { aliases: {}}
I'd be fine with `classNameAliases`. Another suggestion would be `styleClassAliases` (or even `styleAliases`) which makes it a bit clearer what we're talking about. You should decide this having the newbie user in mind.
`className` has meaning though because it's the key we use to specify such things already... making className and classNameAliases consistent.
@joshgoebel Would something like this work?
- enh(mathematica) Rework entire implementation [Patrick Scheibe][]
  - Correct matching of the many variations of Mathematica's numbers
  - Matching of named-characters aka special symbols like `\[Gamma]`
  - Updated list of version 12.1 built-in symbols
  - Matching of patterns, slots, message-names and braces

How does the linking of author name to their GitHub page work? I guess it requires that the PR is merged first and then it automatically retrieves it from people who have committed in the repo? Never seen this before.

For squashed commit, you could use something like this which details the technical points more:

Fix several issues and implement additional features for the Wolfram Language (Mathematica)
- Include an up-to-date list of built-in symbols in a separate `lib/mathematica.js` file. It's one keyword per line and easier to maintain.
- Fix regexp to identify symbols/variables which requires special treatment and does not follow the common `IDENT_RE` matching.
- Replace generic `C_NUMBER_MODE` matching with dedicated regular expressions for all possible numbers in Mathematica.
- Include named-characters in the matching of symbols.
- Allow for dedicated styling of
  - pattern-like forms, e.g. `par_String`
  - slots of anonymous functions, e.g. `##3`
  - message names, e.g. `myFunc::usage`
  - braces, curly braces and brackets
- Introduce `classNameAliases` to map specific styles to general styles used by all themes. This allows for using built-in themes and writing sophisticated Mathematica themes. |
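To make the aliasing idea concrete, here is a small sketch of how grammar-specific class names could fall back to the general classes that themes already style. The particular mapping and the `resolveClass` helper are assumptions for illustration and are not taken from the merged grammar or from highlight.js internals.

```javascript
// Grammar sketch: specific class names mapped to general theme classes.
// The mapping shown here is an assumed example.
const GRAMMAR = {
  name: "Mathematica",
  classNameAliases: {
    "message-name": "string",
    "pattern": "type",
    "slot": "type",
    "brace": "punctuation"
  }
};

// Hypothetical helper: return the aliased class, or the class itself
// when the grammar defines no alias for it.
function resolveClass(grammar, className) {
  const aliases = grammar.classNameAliases || {};
  return Object.prototype.hasOwnProperty.call(aliases, className)
    ? aliases[className]
    : className;
}

console.log(resolveClass(GRAMMAR, "slot"));   // type
console.log(resolveClass(GRAMMAR, "number")); // number
```

A dedicated Mathematica theme could still target the specific names, while built-in themes only ever see the general ones.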
No magic, it's just a footnote link... see lines 35-41, etc... I'm gonna noodle on this a bit more to see if I come up with a better name for the alias stuff (or anyone comments). This should get merged in the next day or two though! :-) |
Huge thanks to both of you! 👏👏👏👏 |
@joshgoebel Thanks a bunch and thanks for guiding me so well along the way. I really enjoyed our discussions! |
Are there any common patterns from other languages that are ILLEGAL in Mathematica so we can add an
Often times comment patterns can be good illegals for this... |
This is work-in-progress and not ready for merging, but I wanted to start a discussion about what is feasible/wanted for highlight.js. The Wolfram Language (WL) is one of those languages that are very different in some points. Most notably, it has close to 7000 built-in symbols and none of the usual "keywords" that other languages have. I rewrote the entire Mathematica (it's the same as "WL", don't ask, it's confusing) highlighter and here are the major changes:
I'll attach an image of a more realistic test-case at the end, but first I'll have some questions:
begin:
and specifying the regex there? I mean, it works but what do I know? Any tips are appreciated.