@ThePuzzlemaker ThePuzzlemaker commented Mar 28, 2021

I would like to start by saying that this is definitely not something that should be merged (yet). I'm mostly posting this PR to document how it could be helpful once it is ready for production (it's definitely not feature-complete yet), and to gauge the community's opinion of having this as an option in mdBook.

Background and Rationale

syntect is a library by Tristan Hume for syntax highlighting in static sites, command-line applications, and more. It has interfaces for both CLI apps (using ANSI escape codes and similar) and websites (using HTML and CSS). syntect uses Sublime Text syntax definitions, which are much more widespread and are easy to convert from the ubiquitous TextMate syntax definition format. That format is itself used in editors such as VSCode and Atom (which recommends tree-sitter but, I believe, still supports TextMate syntax definitions). highlight.js, by contrast, uses its own format, which is not compatible with TextMate or Sublime Text syntax definitions due to major architectural differences in how syntax is parsed. This is my main reason for creating this fork, but there are a few other benefits of using syntect.

Since syntect does all of the syntax highlighting when building a book, it means that page load times for chapters with lots of syntax highlighting would be theoretically significantly reduced. (I haven't tested this, but it would make sense: with build-time highlighting, the only load-time cost comes from HTML/CSS, rather than from JS plus HTML/CSS.)

It also means that adding a new language to the highlighter doesn't mean you have to modify the index.hbs file or highlight.js (which I had to do for my programming language's books). Instead, you can just drop a .sublime-syntax file into a folder such as src/theme/syntaxes and it will just work.
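
As a rough illustration of the drop-in idea above, the following sketch collects every `.sublime-syntax` file from a folder using only the standard library. The function names are hypothetical and are not taken from this PR; the folder layout (for example `src/theme/syntaxes`) is the one mentioned above, and handing the files to syntect is deliberately left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true if `path` ends in the `.sublime-syntax` extension.
fn is_sublime_syntax(path: &Path) -> bool {
    path.extension().and_then(|ext| ext.to_str()) == Some("sublime-syntax")
}

/// Keeps only the syntax definition files from a list of candidate paths,
/// sorted so that the result is deterministic across platforms.
fn select_syntax_files(candidates: &[PathBuf]) -> Vec<PathBuf> {
    let mut found: Vec<PathBuf> = candidates
        .iter()
        .filter(|p| is_sublime_syntax(p.as_path()))
        .cloned()
        .collect();
    found.sort();
    found
}

/// Hypothetical helper: lists a folder (non-recursively) and returns the
/// syntax definition files it contains.
fn discover_syntaxes(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut entries = Vec::new();
    for entry in fs::read_dir(dir)? {
        entries.push(entry?.path());
    }
    Ok(select_syntax_files(&entries))
}
```

With something like this, a book author would only need to add a file to the folder; no template or JavaScript changes would be involved.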

Implementation

Right now this PR completely replaces highlight.js with syntect. I imagine that this is probably not what the majority of people would want, as some people may still have reasons for wanting to use highlight.js. Also, because the CSS generated by syntect from the .sublime-color-scheme files that I used is not exactly the same as the CSS that was used before for highlight.js, there are some differences in styling. There may also be some small bugs.

Line hiding is not very efficient and definitely not bug-free at the moment, which is something I'll have to sort out eventually. Also, since the effort of syntax highlighting now happens at build time rather than run time (and because my code is... not the best, to say the least), this does increase build time by a few seconds. I believe this scales up as the number of highlighted lines/code blocks increases, but I haven't done much testing. I'm sure that the performance could be improved, probably by moving syntax highlighting into a different stage of book building.
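
To make the line-hiding step concrete, here is a minimal sketch that classifies each line of a code block in a single pass. It assumes the rustdoc-style convention in which a line that is just `#`, or that starts with `# `, is hidden; it ignores the `##` escape, and the type and function names are hypothetical rather than the ones used in this PR.

```rust
/// One line of a code block after hidden-line classification.
#[derive(Debug, PartialEq)]
enum Line<'a> {
    /// Rendered normally; holds the original line text.
    Shown(&'a str),
    /// Hidden by default; holds the text after the `# ` marker.
    Hidden(&'a str),
}

/// Classifies a single line (assumed convention: `#` or `# ...` is hidden).
fn classify_line(line: &str) -> Line<'_> {
    let trimmed = line.trim_start();
    if trimmed == "#" {
        Line::Hidden("")
    } else if let Some(rest) = trimmed.strip_prefix("# ") {
        Line::Hidden(rest)
    } else {
        // Attributes such as `#[derive(Debug)]` fall through to here.
        Line::Shown(line)
    }
}

/// Classifies every line of a code block, visiting each line once.
fn classify_lines(code: &str) -> Vec<Line<'_>> {
    code.lines().map(classify_line).collect()
}

/// Returns only the text a reader sees before un-hiding anything.
fn visible_text(code: &str) -> String {
    let mut out = String::new();
    for line in classify_lines(code) {
        if let Line::Shown(text) = line {
            out.push_str(text);
            out.push('\n');
        }
    }
    out
}
```

Classifying lines up front like this would let the highlighter run over the whole block once and wrap the hidden lines afterwards, instead of re-scanning the block for each hidden line.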

This build-time overhead is one reason why it may be helpful to let users choose between syntect and highlight.js: it's a tradeoff between build-time overhead and run-time overhead. Some users may prefer one or the other.
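
One way such a choice could be modelled is sketched below: a string from the build config is resolved to one of the two highlighters. The accepted spellings and the default of highlight.js when nothing is configured are assumptions made for illustration; neither this PR nor mdBook defines such an option here.

```rust
use std::str::FromStr;

/// The two highlighting back ends discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Highlighter {
    /// Highlight in the browser at page load.
    HighlightJs,
    /// Highlight while building the book.
    Syntect,
}

impl Default for Highlighter {
    /// Assumed default: keep the existing run-time behaviour.
    fn default() -> Self {
        Highlighter::HighlightJs
    }
}

impl FromStr for Highlighter {
    type Err = String;

    /// Parses a (hypothetical) config value, ignoring case and padding.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "highlight.js" | "highlightjs" | "hljs" => Ok(Highlighter::HighlightJs),
            "syntect" => Ok(Highlighter::Syntect),
            other => Err(format!("unknown highlighter: {}", other)),
        }
    }
}

/// Resolves the configured value, falling back to the default when unset.
fn resolve_highlighter(configured: Option<&str>) -> Result<Highlighter, String> {
    match configured {
        Some(value) => value.parse(),
        None => Ok(Highlighter::default()),
    }
}
```

Rejecting unknown values, rather than silently falling back, would surface typos in the config at build time.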

I would also like to note that as Sublime Text syntax definitions are a bit more verbose on scoping than highlight.js, the size of generated HTML is somewhat increased. In my opinion, this probably isn't too big of a problem.

If you have any questions for me or find any bugs/things that could be improved, feel free to open an issue or PR on the forked repo. I'd rather not overwhelm the maintainers of mdBook with questions/improvements for my fork.

Checklist

This is a checklist of what minimally has to be done before this can be used well in production:

  • Improve performance so that there's minimal build overhead
  • Make code line hiding less buggy
  • Don't completely replace hljs, just provide an option to use hljs or syntect in the build config
  • General testing and bugfixes
  • Fix CSS styles so that hljs styles and syntect styles are closer to parity, and fix any bugs that may have been introduced by syntect's .sublime-color-scheme to CSS generator

ehuss commented May 4, 2021

This is interesting, thanks for working on it!

Yeah, the performance is quite a bit worse than I would expect (the Rust book goes from 1.6s to 14.4s, or about 9x slower). Is the majority of this time spent in the syntect renderer?

> This is my main reason for creating this fork, but there are a few other benefits of using syntect.

Can you say more about what differences you are interested in? Are there languages missing in highlight.js that you want? Or are you interested in more sophisticated highlighting?

> it means that page load times for chapters with lots of syntax highlighting would be theoretically significantly reduced.

Have you had a chance to try to measure this? Do you happen to know of a good way to measure it? I poked around for a bit with Firefox and Chrome, but couldn't see a noticeable difference in load times.

ThePuzzlemaker commented May 4, 2021

> Is the majority of this time spent in the syntect renderer?

I'd assume so, but I did not implement it in the best way, as I haven't used pulldown-cmark that much, so some of it may just be inefficiency on my part. Hiding lines is really janky, and I think that may be one of the culprits.

> Can you say more about what differences you are interested in? Are there languages missing in highlight.js that you want? Or are you interested in more sophisticated highlighting?

I mostly made this because I'm working on a programming language and I'd like to highlight it within mdBook. To do that with highlight.js, I'd have to write a highlight.js syntax, and highlight.js uses its own format that's not compatible with the format I'm already using (tmLanguage). That means writing a whole new grammar for my programming language that I would use only for its documentation, whereas the tmLanguage grammar works with pretty much every other syntax highlighting system I would need (barring things like vim/nano). Using syntect also makes custom code block styling easier, as you can take Sublime-compatible styles and convert them to CSS using syntect. This way it's slightly easier to customize code block styles without doing some weird CSS class munging.

> Have you had a chance to try to measure this? Do you happen to know of a good way to measure it? I poked around for a bit with Firefox and Chrome, but couldn't see a noticeable difference in load times.

Not yet, but I plan to at some point. Perhaps it wouldn't cause a noticeable client-side performance improvement.

ehuss commented Aug 2, 2021

@ThePuzzlemaker I was wondering if you have had a chance to look at improving performance here. We've run into an issue (#1622) with highlight.js, and I'm feeling the desire to have server-side rendering even more. However, I don't think I can swallow a nearly 9x performance loss.

@ThePuzzlemaker

I'll try to do some profiling so that I have a rough sketch of what needs to be fixed. I'm probably going to reimplement it entirely on top of the master branch, as many things have changed since I first made this fork. That also gives me a good chance to fix the performance mistakes I made.
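
Short of a full profiler, a first pass could be as simple as timing each stage of the render, as in the sketch below. It uses only `std::time`; the `StageTimes` type and the stage names are hypothetical and do not correspond to anything in mdBook or this PR.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

/// Accumulates total wall-clock time per named stage.
#[derive(Default)]
struct StageTimes {
    totals: Vec<(&'static str, Duration)>,
}

impl StageTimes {
    /// Runs `f`, adds its duration to `stage`, and passes the result through.
    fn record<T>(&mut self, stage: &'static str, f: impl FnOnce() -> T) -> T {
        let (value, elapsed) = timed(f);
        let index = self.totals.iter().position(|(name, _)| *name == stage);
        match index {
            Some(i) => self.totals[i].1 += elapsed,
            None => self.totals.push((stage, elapsed)),
        }
        value
    }

    /// Returns the stage with the largest accumulated time, if any.
    fn slowest(&self) -> Option<&'static str> {
        self.totals
            .iter()
            .max_by_key(|(_, total)| *total)
            .map(|(name, _)| *name)
    }
}
```

Wrapping, say, the Markdown parsing, the highlighting, and the line-hiding calls in `record` would show which of them accounts for most of the extra build time.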

@ThePuzzlemaker

See updated version: #1624
