Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss: Change ignoreIllegals default to TRUE (or remove setting) #3149

joshgoebel opened this issue Apr 18, 2021 · 1 comment
discuss/propose Proposal for a new feature/direction enhancement An enhancement or new feature help welcome Could use help from community parser


Copy link

joshgoebel commented Apr 18, 2021

Context: #3148

A user was seeing bad behavior because of illegals causing their highlighting to fail... and they considered just turning it off... and I wondered if that isn't the best choice for 99% of users. Illegals are good for two things(?):

  1. Fail fast while doing auto-detect.

Illegal rules are probably one of the reasons our auto-detect system can be so fast despite literally trying almost 200 languages in a worse case scenario. They have immense value there as they can allow auto-detection to short circuit when a language seems to "make no sense".

  1. Possible doing some type of error checking or code validation?

I feel like I'm reaching here... but I feel like the only other good use for illegal is some type of code validation? Wanting to know "is this value C++ code or not?"... while it might have SOME use in that situation we also do not aim (at all) to be a 100% parser for complex languages, nor do we have much interest in highlighting incorrect or invalid code. (it's also pretty hard to do that without being a full parser)

IE, we do not think we are a good tool for this use case.

So here is the current end user experience. You say:

"Hey, here is some C++ code, please highlight."

And we say:

"Nope. Don't think so, doesn't look like C++ to me. I'm going to not highlight it at all."

It's very "all or nothing" and and if the problem is actually with our illegal ruleset itself (it often is) then we're failing to highlight valid C++ code for no obvious reason... when if we just ignored illegals entirely we'd probably make at least a passable attempt - perhaps even a good one.

I think for 99% of use cases you'd want us to still make our "best attempt" at highlighting rather than just highlight NOTHING because we get a little confused.

Given that illegal really only/mostly seems to serve the purposes of internal auto-detect I'd actually even consider removing it entirely (not exposing it externally)... but we can save that for another day... right now it seems clear to me that we should reverse the default (and possible rename the option) for v11...

If anyone reads this and has a valid use case or is doing something interested with illegal I'd certainly love it if you weighed in.

CC @highlightjs/core

@joshgoebel joshgoebel added enhancement An enhancement or new feature parser discuss/propose Proposal for a new feature/direction help welcome Could use help from community labels Apr 18, 2021
@joshgoebel joshgoebel added this to the 11.0 milestone Apr 18, 2021
Copy link
Member Author

joshgoebel commented Apr 20, 2021

Now I'm also wondering about the behavior of illegal... right now ignoreIllegal does that quite literally... it pretends they never matched... I wonder if a better behavior might be to terminate the current mode... this came up in the context of Markdown where if you have a "false" positive on *... like:

blah blah blah
blah blah 

Lots more doc.

You'll be stuck in emphasis forever... the first thing I thought of was illegal: /\n\n/, making paragraph breaks illegal inside of emphasis - thinking that would terminate the emphasis rule, but instead it's jut silently ignored. I wonder if a better behavior when an illegal is hit (and "ignored") would be to terminate the current mode.

Though now I'm wondering if this isn't just an regex.either on end... I gave up pursuing it because that doesn't actually solve the problem with Markdown... really you want the emphasis to never trigger in the first place, which requires some type of complex look-ahead.

It's possible then we might need a different name than "ignore though". Not sure what is right here, just journaling my thoughts over time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
discuss/propose Proposal for a new feature/direction enhancement An enhancement or new feature help welcome Could use help from community parser
None yet

No branches or pull requests

1 participant