Discuss: XML vs HTML - discoverability and one vs two grammars #2888

Closed
Neohiro79 opened this issue Nov 23, 2020 · 15 comments

@Neohiro79

Neohiro79 commented Nov 23, 2020

Opened this issue as requested by @joshgoebel in our conversation at

PDConSec/vsc-print#63

Could you please consider this input for any upcoming major versions:

  1. Put the language description file into the right folder (and maybe prefix its name with an underscore "_" so it is listed first), so that someone trying to find the correct language description files has at least a chance of finding help without asking:

https://github.com/highlightjs/highlight.js/blob/master/SUPPORTED_LANGUAGES.md
https://github.com/highlightjs/highlight.js/tree/master/src/languages

  2. Rename the XML description file into HTML-XML or at least XML-HTML, so that someone used to the term HTML finds the proper HTML language description file.

And finally, 3) just as a proposal:

Consider offering three basic bundles: one optimised for xml-html-css-js usage, one covering a broad set of commonly used programming languages, and one that is the full-metal-jacket release (which should of course only be used in rare cases). That would be much more convenient and would surely help a lot of beginners find the right way to use highlight.js from the start.

And for those geeks and nerds who really love to look into each file and tweak the shit out of everything there still is the option to bundle the language-files individually, which in turn will for sure find the _SUPPORTED_LANGUAGES.md file in the correct folder.

@joshgoebel changed the title from "Request to think about some minor but major changes in the overall language-description handling" to "Discuss: XML vs HTML - discoverability and one vs two grammars" on Nov 23, 2020
@joshgoebel
Member

joshgoebel commented Nov 23, 2020

rename the XML description file into HTML-XML or at least XML-HTML

This was the specific suggestion I thought had some legs. It's reasonable to think some people might be skimming the file list to see what is available and since the language is named "HTML, XML" you could make the argument that should carry on into the name also as in html-xml, etc. This is breaking change though and would have to wait until v11.

My larger thought though:

  • Should we have a MUCH simplified XML syntax vs overloading XML with HTML? Basically HTML would import XML and then add script and styles rules... and if you want JUST XML then you use xml. if you want html you use HTML.

Right now there is no way of highlighting pure XML if it should include script or style tags... possibly an edge case, but true.

@Neohiro79
Author

If you think of it as a class system then yes, I guess this makes sense, but the catch is that you would have to know that XML does NOT highlight script and styles, or in other words to KNOW that HTML needs XML to inherit from.

I would implement XML and HTML for example - but not in the correct order - so it breaks.
I would implement XML only and wonder why it does not highlight everything properly.
I would implement HTML only and wonder why it does not work at all.

Or in case you implement XML as a class inside the HTML file itself I potentially would include XML and HTML, although I might not need the XML part.

It's hard to answer - I mean why even split them? You could implement both in one file - the XML class and the HTML class beyond. The HTML class will get activated once "language-html" is used - the XML class will get activated once "language-xml" is used (as classname).

But "language-html" also automatically includes "css" and "js" - since this is the common thing to do in "html" (even php maybe).

You could maybe put this whole language-set under "language-web" or "language-common" whatever, if you want to keep the names strict - that I don't know.

@joshgoebel
Member

joshgoebel commented Nov 23, 2020

You could implement both in one file

We can't implement two DIFFERENT languages in one file - our build system does not allow this. The only good argument IMHO to split them is if they have utility as individual grammars. Not just because of naming issues - that's why we have docs and aliases, etc... and we could possibly rename the file (as you suggested) if that was the only issue.

If we did that I'd probably do it alphabetically, html-xml vs xml-html.

@Neohiro79
Author

Neohiro79 commented Nov 23, 2020

Yeah - I'm fine with that. You know best what works in your system and what does not. I just mean that I want to be able to make the right choice: if I want to use HTML, I will find the "html-xml" package when searching for HTML - or vice versa with XML.

@joshgoebel
Member

joshgoebel commented Dec 16, 2020

@allejo Any thoughts? I'm not sure there is a right answer. The reason it's xml is because HTML is (sort-of) a subset of XML (not technically but that's a useful way to think of it)... so technically we could just call it "XML" not "HTML" at all.

Although then it has HTML specific things like script and style... This can't be changed until v11 anyways so unless someone has an "everyone wins" idea I think either way someone loses - so I'm inclined to leave it alone. Renaming it just makes XML harder to find.

@Neohiro79
Author

@josh I think renaming it into "HTML_XML" would be just fine (if both are the "same" on the code side, and to make clear that it covers both), and it would be found when typing "HTML" OR "XML" into the browser search.

Alternatively, you might just keep things how they are until v11. I mean, I might be the only one who was searching for it.

@allejo
Member

allejo commented Dec 16, 2020

So I skimmed through the discussion behind this thread, and I hope I'm not too far off with my thoughts here.

Should we have a MUCH simplified XML syntax vs overloading XML with HTML? Basically HTML would import XML and then add script and styles rules... and if you want JUST XML then you use xml. if you want html you use HTML.

I like this idea the best, separating them into two separate languages. In order to keep things backward-compatible until v11:

  • Simplify the XML grammar significantly to be loyal to just XML
  • Introduce utilities for adding <script> and <style> support into a src/languages/lib/xml_like.js
  • Introduce a new html grammar, have it extend the xml grammar and call the utilities from lib/xml_like.js
  • In the simplified xml grammar, include the lib/xml_like.js utilities and throw out a deprecation message that behavior will change in v11

(another way would be to have a circular dependency of having xml.js depending on html.js and just extracting the <script> and <style> rules but I'm less fond of this)

And when v11 is finally released, move the lib/xml_like.js utilities into html and remove the deprecated stuff from xml.
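
A rough sketch of how the split described above might look, offered only as an illustration. The helper name `embeddedLanguageRule`, the trimmed-down rules, and the alias lists are all hypothetical and far simpler than the real `xml.js`; only the grammar keys (`begin`, `end`, `starts`, `returnEnd`, `subLanguage`, `contains`) follow the shape highlight.js grammars use. Real grammar modules also receive an `hljs` argument, which is left out here.

```javascript
// Hypothetical contents of src/languages/lib/xml_like.js (file name from the proposal).
// Builds a rule that hands the body of <tagName>...</tagName> to other grammars.
function embeddedLanguageRule(tagName, subLanguage) {
  return {
    className: 'tag',
    // Match "<script" or "<style" only when followed by whitespace or ">".
    begin: new RegExp('<' + tagName + '(?=\\s|>)'),
    end: />/,
    starts: {
      end: new RegExp('</' + tagName + '>'),
      returnEnd: true,
      subLanguage: subLanguage
    }
  };
}

// Simplified "pure XML" grammar (illustrative rules only): comments and generic tags.
function xml() {
  return {
    name: 'XML',
    aliases: ['xsd', 'xsl', 'svg'],
    case_insensitive: true,
    contains: [
      { className: 'comment', begin: /<!--/, end: /-->/ },
      { className: 'tag', begin: /<\/?[A-Za-z_][\w.:-]*/, end: /\/?>/ }
    ]
  };
}

// HTML reuses the XML rules and puts the <script>/<style> rules in front of them.
function html() {
  const base = xml();
  return {
    ...base,
    name: 'HTML',
    aliases: ['xhtml'],
    contains: [
      embeddedLanguageRule('script', ['javascript']),
      embeddedLanguageRule('style', ['css']),
      ...base.contains
    ]
  };
}
```

In this sketch `xml()` knows nothing about `<script>` or `<style>`, while `html()` adds exactly two rules on top of it, which is the separation the bullet points describe.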

@Neohiro79
Author

I'm not that deep into your codebase at this point, but that sounds pretty damn well thought through and straightforward!

@joshgoebel
Member

joshgoebel commented Dec 16, 2020

The problem with duplicating them is it is going to increase the bundle size for people building custom packages since XML and HTML are pretty much the same (other than script/style). And so far there has been no real reason to split them (unlike C which had major conflicts between C and C++ that have come up multiple times) - this naming thing is not a good reason in and of itself.

I think renaming it into "HTML_XML" would be just fine

How is that better than XML_HTML? Or are we really just saying the name should include both? IE, xml could be renamed to xml_and_html.js?

@Neohiro79
Author

@ JOSH

Or are we really just saying the name should include both?

Yeah, that was basically my request - but you said that it is not possible to include "more than one language" with your build tools:

We can't implement two DIFFERENT languages in one file - our build system does not allow this.

At least that's what I understood.

IE, xml could be renamed to xml_and_html.js?

Yes, or into html_and_xml.js - if your build system allows this - that I don't know.

I was originally "searching" for "HTML", since I thought I was going to need it to make things work, and I found "0 results".
One needs to know to look and search for "XML" specifically, and since "X" is not "A" it is also "listed" at the bottom of the page ...

@Neohiro79
Author

The problem with duplicating them is it is going to increase the bundle size for people building custom packages since XML and HTML are pretty much the same (other than script/style).

I think the solution proposed by @ allejo does not duplicate the content but rather "imports" it from one file into another, or did I misunderstand something?

@joshgoebel
Member

I think the solution proposed by @ allejo does not duplicate the content but rather "imports" it from one file into another, or did I misunderstand something?

Yes, I understand the distinction, but that effectively duplicates the code the way our build system is currently implemented (grammar modules are stand-alone, all dependencies are resolved at build time).
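
To illustrate why importing still costs bytes when every grammar is built as a stand-alone module, here is a toy model. It is not the actual highlight.js build tooling, and the module names and sizes are made up for the example.

```javascript
// Toy model of a build that emits stand-alone grammar modules:
// every dependency is inlined into each module that uses it.
// File names and byte sizes below are invented for illustration.
const sources = {
  'lib/xml_like.js': { size: 900, deps: [] },
  'xml.js': { size: 1500, deps: [] },
  'html.js': { size: 300, deps: ['xml.js', 'lib/xml_like.js'] }
};

// Size of one stand-alone module: its own code plus everything it imports.
function standaloneSize(entry, seen = new Set()) {
  if (seen.has(entry)) return 0; // count each dependency once per module
  seen.add(entry);
  const mod = sources[entry];
  return mod.size + mod.deps.reduce((sum, dep) => sum + standaloneSize(dep, seen), 0);
}

const xmlBundle = standaloneSize('xml.js');   // only its own 1500 bytes
const htmlBundle = standaloneSize('html.js'); // 300 + 1500 + 900 bytes
console.log({ xmlBundle, htmlBundle, both: xmlBundle + htmlBundle });
```

In this model a custom package that ships both modules carries the XML rules twice, once on their own and once inlined into the HTML module, even though the source files only import them.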

Yes, or into html_and_xml.js

I suppose you could argue "HTML is more popular" or something... if we list both it seems one has to come first... so we either make it harder to find HTML or XML... I'm not convinced this is a big problem for many people since if you just throw "html" or "xml" at the library itself it "does the right thing" automatically.

It's only an issue if you go looking for the file and don't find an "html" file... then hopefully you'd go to the docs and find that XML/HTML are the same thing... so far you're the only person to bring this to our attention as an issue.


I think it's possible a simple fix here may be to just alter the official list to include "HTML" in alphabetical order and then refer someone to XML.
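
The "does the right thing" behaviour mentioned in this comment comes down to alias lookup. The snippet below is a toy registry written for illustration, not the library's internals; the function names mirror the public `registerLanguage` and `getLanguage` calls, but the implementation and the shortened alias list are assumptions.

```javascript
// Toy registry (NOT highlight.js internals): shows how aliases let a request
// for "html" find a grammar that is stored under the name "xml".
const languages = new Map();
const aliases = new Map();

function registerLanguage(name, definition) {
  languages.set(name, definition);
  for (const alias of definition.aliases || []) {
    aliases.set(alias.toLowerCase(), name);
  }
}

function getLanguage(name) {
  const key = (name || '').toLowerCase();
  // Try the registered name first, then fall back to the alias table.
  return languages.get(key) || languages.get(aliases.get(key));
}

// Illustrative alias list; the real grammar declares more aliases than this.
registerLanguage('xml', { name: 'HTML, XML', aliases: ['html', 'xhtml', 'rss', 'atom', 'svg'] });

console.log(getLanguage('html').name);                   // prints "HTML, XML"
console.log(getLanguage('xml') === getLanguage('html')); // prints true
```

So the file name only matters to someone browsing the source tree; a caller asking for either name gets the same grammar object.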

@joshgoebel
Member

Anything that involves adding new files/renaming/splitting the grammars is something that can't happen until v11 anyways... so if there is a simpler solution to be found that might solve the problem right now and then we see if it ever comes up again...

@Neohiro79
Author

Neohiro79 commented Dec 17, 2020

@ JOSH

As much as I wish I could help you or assist with this topic - I can't. It might be best to do nothing at all and just keep the changes @ allejo proposed in mind for v11, since it is included in the "basic" build anyway (which I didn't know at that time).

Maybe just don't bother any more - it sounds way too f***ed up to me for any kind of decision, or for doing anything at all right now.

@joshgoebel
Member

joshgoebel commented Dec 17, 2020

Oh I have a long memory for this kind of thing, particularly if it comes up over and over. I will close this for now and we'll see if it comes up again in the future. People have been building languages using the download website for a long time as well where it's listed as "HTML, XML" (and having no difficulties finding it).

In our SUPPORTED_LANGUAGES we already have it listed as "HTML, XML"... (so it's alphabetical by H)... to me, if someone was literally looking through the file list in an editor and couldn't find html, the first thing I'd do is a text search for "html", and then it'd quickly be clear that xml.js is the name you wanted.

So perhaps with v11, html_xml.js makes sense... but if it never comes up again, then it never comes up again. :)
