Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More nuance for match classes (class title? function title? etc.) #2521

Closed
joshgoebel opened this issue Apr 30, 2020 · 24 comments
Closed

More nuance for match classes (class title? function title? etc.) #2521

joshgoebel opened this issue Apr 30, 2020 · 24 comments
Labels
discuss/propose Proposal for a new feature/direction enhancement An enhancement or new feature parser theme

Comments

@joshgoebel
Copy link
Member

joshgoebel commented Apr 30, 2020

Table of some specific suggestions:

Name Description
title.class A class name
title.function A function name
title.namespace A namespace name
comment.block A block comment
meta.shebang A shebang line for a script file

Is your request related to a specific problem you're having?

Yes, I'd like to have a way to label/highlight names of a given type outside of their original context.

For example:

class Cat extends Pet { ... 

Turns into <class><title>Cat</title><keyword>extends</keyword><title>Pet</title>. And if we wanted to style it a certain way we could target .hljs-class .hljs-title. Although Pet is semantically different here and that's not indicated at all.

The same is true for functions:

function cleanMyRoom() {}

Turns into <function><title>cleanMyRoom</title... and hence could be targeted by a theme with .hljs-function .hljs-title... but now we have the usage of these things outside of their definition:

var kitty = new Cat();
Cat.prototype.pet = function() { ... }

In both cases [re: Cat] we can easily figure out this is a "class" object and we could apply title to it... but is it a class? a namespace? a function name? We've lost all context.

And for functions:

let launchTheRockets = () => { ...

Technically launchTheRockets is a name/title here (and we could easily detect that and highlight)... but it's not the function itself (which is () => {...}... so again we have a name without context... and we're forced to fall back to the generic title.

And this problem get worse for languages that also have namespaces that could also be syntactically inferred... all we have is title and perhaps context at the definition site - although we have no namespace class...

The solution you'd prefer / feature you'd like to see added...

I think we may need to consider some new classes, or expanding existing ones with sub classes:

  • Alternative 1: New classes class_name, function_name, namespace_name
  • Alternative 2: Expand existing classes with subclass: title.function, title.class, title.namespace...

The way alternative 2 would unfolding during rendering CSS might look something like:

<span class="hljs-title hljs-title-function">launchTheRockets</span>

IE, the existing "broader" class is preserved. This would require no changes to any existing styles, while allowing new themes (or upgrades) to target items with more specificity. I'm a little more attached to system 2 as it seems more easily "backwards compatible".

Any alternative solutions you considered...

Nope, I outlined my two thoughts above.

Additional context...

I spent some time reviewing how Prism handles this and it has a lot more named options for tracking what "kind of thing" a given identifier might be:

  • constant
  • class-name
  • function-name (though most grammars use function)
  • namespace

This allows it to make some really nice distinctions when highlighting (like treating classes differently than function name, etc).

@joshgoebel joshgoebel added enhancement An enhancement or new feature theme parser discuss/propose Proposal for a new feature/direction labels Apr 30, 2020
@joshgoebel joshgoebel changed the title [Proposal] Improve our ability to highlight names distinctly [Proposal] Improve our ability to highlight important identifiers distinctly Apr 30, 2020
@joshgoebel
Copy link
Member Author

joshgoebel commented Apr 30, 2020

Option 2 also leaves a lot of runway for expanding other classes:

  • variable
  • variable.declaration
  • title.constant
  • title.function
  • function.invocation
  • meta.doctype
  • meta.shebang
  • comment.block
  • number.hex
  • literal.boolean

This allows grammars to start slowly becoming more expressive while also still thinking in terms of our initial list of classes to avoid all our existing themes from suddenly becoming invalid overnight. Some of these would even come for "free" to existing grammars simply by our upgrading the mode library to be more explicit, such as "comment.block"...

@joshgoebel
Copy link
Member Author

joshgoebel commented Apr 30, 2020

I should note that technically it's possible to do this today in a grammar:

      { // ES6 class
        className: 'class', beginKeywords: 'class', ...,
        contains: [
          { skipBefore:/extends\s+/, begin: IDENT_RE, className: "title title-class title-class-super" },
          hljs.inherit(hljs.UNDERSCORE_TITLE_MODE, {className: "title title-class"})
        ]
      },

Ok, scratch that ALMOST possible (or possible if you want to hack the CSS a bit)... we only add the hljs prefix at the beginning of the string so you end up with:

hljs-title title-class

If someone called this a bug though I'm not sure I'd disagree. But I'd like to formalize this and come up with naming guidance so that over time we can see themes start to taken advantage of these things.

@joshgoebel
Copy link
Member Author

joshgoebel commented Apr 30, 2020

I get tripped up on things like function.call/function.invoke vs title.function-call. I think one theme might decide it's nice to highlight function invocations while another might decide that's too noisy.

So filing them all under title.function is technically correct in both cases, yet semantically not enough information to be truly useful.

@joshgoebel joshgoebel pinned this issue Apr 30, 2020
@joshgoebel joshgoebel changed the title [Proposal] Improve our ability to highlight important identifiers distinctly [Proposal] Add additional nuance to highlight classes (Ex: class title? function title? constant title?) Apr 30, 2020
@joshgoebel joshgoebel changed the title [Proposal] Add additional nuance to highlight classes (Ex: class title? function title? constant title?) Propose: More nuance for match classes (class title? function title? etc.) Apr 30, 2020
@egor-rogov
Copy link
Collaborator

Nice idea.
I've had a look into the classes reference and saw names like meta-keywords and meta-string, that indicates a desire for some hierarchical system.

@joshgoebel
Copy link
Member Author

Yes, meta is the only thing we have broken up like that. It's a very good idea. TextMate goes a little overboard with it... going 3-5 levels deep - you can get VERY nuanced. :-)

And if we did it the way I'm suggesting above we could technically switch those to meta.keyword and meta.string and they'd output would still be exactly the same as it's always been.

@joshgoebel
Copy link
Member Author

Oh Actually we have builtin-name also, which pairs very poorly with built_in (dash vs underscore, separation vs one word)... so our naming now isn't the most consistent. :-)

@egor-rogov
Copy link
Collaborator

I'm not sure you really need that level of details in highlighting, especially given that you can't 100% correctly parse source code with regexps... But consistency in classes naming (and usage) certainly won't hurt us (:

@joshgoebel
Copy link
Member Author

Oh I wasn't suggesting we go that crazy, but it is easily possible for us to do things like contextually detect differences (in many cases) from a class name and say a function name, etc. So those are the easy wins.

@TheElectronWill
Copy link

TheElectronWill commented Aug 17, 2020

Being more precise is a good idea!
Is there a reason why Pet in class Cat extends Pet is labeled as a "title"? It does seem wrong to me: what we are declaring here is Cat, this is the title, this is what is important here. Pet should be a type IMHO.

@joshgoebel
Copy link
Member Author

joshgoebel commented Aug 17, 2020

Just how it's always been done? OTTOMH I think your logic sounds right. It's also still a class though, just not the one being defined... that might make it different than say string, not sure.

@joshgoebel
Copy link
Member Author

joshgoebel commented Oct 18, 2020

The title below would also become name.function as grammars were updated.

Screen Shot 2020-10-18 at 8 18 06 AM

return options.noHighlightRe.test(languageName);
//             \             ^-- name.function
//              `-- variable.property

Spending some time thinking about this... and it seems "variable.property" and "name.function" would add a lot of fidelity (often requested) and also be fairly easy to pattern recognize for at least most C-like languages. So instead of adding these and no themes having support, and then perhaps waiting forever (or living with only a very few themes that are "enhanced")... I think we could try our best to reuse the existing styling (even if it's not perfect)... so then immediately existing styles pick up the new highlighting and if someone wants to tweak them... I think this might cause some slight pain in the immediate future but have the best long-term result. So here is what I'm currently thinking:

  • title renamed to "name" => name, name.class, name.function, etc.
  • variable adds nuance => variable, variable.property, variable.constant, etc.
  • normal (unhighlighted) used to auto-generate => punctuation (~85% opacity)
  • string or number used to auto-generate => operator (tweaked opacity and small hue shift)

All existing themes would receive the new styles via programatic updates during the build process, ie these new styles will simply be added when missing. If a theme wants to opt-out (or override the defaults) they need only add the missing selector to the style to prevent the auto-upgrade from adding a selector.

The "hierarchy" here would be handled by the build process... for example a CSS rule for .name would be auto-expanded to cover .name-function (name.function), .name-class (name.class), etc... I like the . notation, so I'd probably support that in the grammars, but we wouldn't necessarily have to - could just use the older - syntax there. So the generated HTML would include only a single class:

class <span class="name-class">Person</span> 

The alternative (for hierarchy) is generating a bunch of extra HTML which doesn't seem like a good idea to me. Rather a few additional selectors than a bunch of new HTML.

Thoughts? @allejo @egor-rogov

@joshgoebel
Copy link
Member Author

At this point I think we could also perhaps consider allowing enhanced grammars to define their own semantic subgroups... Ie we dictate the top-level domain space "name" but then might allow name.whatever if 100% of programmers in that language would agree whatever is the appropriate contextual naming. This could also be solved with language aliases though (and is probably how we should do it)... So the fictional "Bicycle" language would just define an alias from "name.bicycle" to say "name.class"... then they could use the more descriptive name.bicycle when writing their grammar... this is a trick Prism uses to great effect.

@joshgoebel
Copy link
Member Author

One final note about what I mean when I say:

I think this might cause some slight pain in the immediate future

I worry that some (not all) of our themes may have skewed towards being optimized for "scarcity"... ie perhaps "title" pops more than it should because this class was handed out more rarely in the past. That some themes may not handle the sudden flood of color as well as others... but again I think this could be dealt with on a case by case basis - even using the auto-generation logic to purposely "tune" or "exclude" some themes that got flagged as doing very poorly. This isn't an entirely new problem though, I noticed this a bit with the recent change to start highlighting class functions, etc.

@Torxed
Copy link

Torxed commented Feb 26, 2021

Not sure if this would catch:

handle = WithContext()

For instance? It would be nice to be able to highlight the class instantiation of WithContect() in this case.
Much like github does here. Compared to highlight js:
image

@joshgoebel
Copy link
Member Author

I'm not sure what language that is meant to be... in many languages there isn't enough absolutely information in such a snippet to know WithContext is a constructor vs just any old function...

@Torxed
Copy link

Torxed commented Feb 27, 2021

I'm not sure what language that is meant to be... in many languages there isn't enough absolutely information in such a snippet to know WithContext is a constructor vs just any old function...

It was a subset of a larger code. Since higlight.js clearly detects function definitions and classes, it aught to be able to detect the initation/calls to those definitions? And by doing so add more nuance and highlight to the initiation calls.
The broader example where the snippet was taken from, was a Python code:

class WithContext():
	def __init__(self):
		self.value = 5

	def __enter__(self):
		return self

	def __exit__(self, *args, **kwargs):
		if args and args[1] == KeyError:
			pass
		return True

	def multiply(self):
		return self.value * 2

	def crash(self):
		raise KeyError("crashed")

for i in range(100000000):
	handle = WithContext()
	handle.multiply()

@joshgoebel
Copy link
Member Author

joshgoebel commented Feb 27, 2021

Since higlight.js clearly detects function definitions and classes, it aught to be able to detect the initiation/calls to those definitions?

We are not a full language parser and we therefore (with small exceptions) do not track any context or parsing state. Once we highlight class WithContext on line 1 we forget about it entirely and when we see WithContext later it has no special meaning (apart from "immediate" context). This is by design as we are a pattern matching highlighter at heart.

Now some languages you have "immediate" context such as JS:

let x = new Widget();

It's possible for us to understand what Widget is here in Javascript (with mere pattern matching) because it's immediately proceeded by new. In other languages this can be determined by casing alone. ThisIsAClass, THIS_IS_A_CONSTANT, this_is_something_else...

But that's not possible in all languages.

@joshgoebel
Copy link
Member Author

I've mentioned elsewhere that I'd be curious to see how we might allow 3rd party grammars to integrate a full parser of their own and pair that parser with our own output engine - but that is not a desire for the core library.

@Torxed
Copy link

Torxed commented Feb 28, 2021

I understand, I just started using highlight.js and came across this issue when I wanted to highlight the classes/functions in the code, not just the definition. I get that the parsing might be hell so I fully understand if that might never be implemented. It's fine the way it is :)

@joshgoebel
Copy link
Member Author

I wanted to highlight the classes/functions in the code, not just the definition

If all the classes in Python are camel case by STRONG convention it's still possible with just patten matching, and I think we'd be open to such a PR.

@joshgoebel
Copy link
Member Author

joshgoebel commented Mar 26, 2021

See #3078. We already have classes with - and _ in them:

  • template-tag
  • built_in
  • etc...

Since . isn't allowed in class names I wonder if we need a unique sequence for the new sub scope support? IE: title.class turns into what (in it's most expanded form)?

Right now I'm using .title-class... but I wonder if perhaps we shouldn't a bit grander:

  • .title--class
  • .title__class

@Turnerj
Copy link

Turnerj commented Mar 27, 2021

These classes will still have the leading hljs- right? If so probably hljs-title--class over hljs-title__class (in my opinion).

@joshgoebel joshgoebel changed the title Propose: More nuance for match classes (class title? function title? etc.) More nuance for match classes (class title? function title? etc.) Apr 5, 2021
@joshgoebel
Copy link
Member Author

This may still change again because of the discussion happening over in PrismJS/prism#2849.

joshgoebel added a commit that referenced this issue Apr 9, 2021
Removed the builtin-name replacing it with built_in in the few (4-5)
grammars it was used. Also only 4 themes made any distinction between
built_in and builtin-name, the rest choosing to style these exactly the
same - so it's never been clear what the difference is or there is none.

If more nuance is needed here it should be addressed via #2500 and #2521.

In only one grammar was both built_in and builtin-name used side by
side so other than that (in combo with very particular themes) this
should be a quiet change.
@joshgoebel
Copy link
Member Author

joshgoebel commented May 14, 2021

Progress has been made on this with v11 (see all the new advanced scopes, etc) and I'm closing this and considering it folded into #2500 which will remain open.

@joshgoebel joshgoebel unpinned this issue Sep 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss/propose Proposal for a new feature/direction enhancement An enhancement or new feature parser theme
Projects
None yet
Development

No branches or pull requests

7 participants
@joshgoebel @Torxed @Turnerj @TheElectronWill @egor-rogov and others