release: runtime/grammars is quite large #6187
Does this do what you want? https://docs.helix-editor.com/master/languages.html#choosing-grammars |
Ideally I would like the releases here to be smaller: https://github.com/helix-editor/helix/releases. Whether that means stripping the DLLs, or omitting them and offering them as a separate download, I am not particular about the solution. For more data, Vim currently uses three files for Verilog: https://github.com/vim/vim/blob/master/runtime/ftplugin/verilog.vim. Their combined size is 14.85 KB, so the Helix Verilog DLL is currently over 1,000 times larger than that. |
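As a rough check of the "over 1,000 times" figure, here is a minimal sketch using the sizes quoted in this thread (14.85 KB for the Vim files, 17.3 MB for the Verilog grammar in the original post). The decimal unit conversion and the helper name `size_ratio` are assumptions made for illustration.

```python
# Sizes quoted in this thread. Decimal units (1 MB = 1000 KB) are an assumption.
VIM_VERILOG_KB = 14.85    # combined size of Vim's Verilog runtime files
HELIX_VERILOG_MB = 17.3   # Verilog grammar in the Helix 22.12 Windows release


def size_ratio(large_mb: float, small_kb: float) -> float:
    """Return how many times larger `large_mb` (in MB) is than `small_kb` (in KB)."""
    return (large_mb * 1000) / small_kb


ratio = size_ratio(HELIX_VERILOG_MB, VIM_VERILOG_KB)
print(f"The Helix Verilog grammar is roughly {ratio:.0f}x the size of Vim's Verilog files")
```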
These grammars are 1. compiled and 2. do quite a bit more than provide some regex patterns for syntax highlighting. Generally speaking, these releases are meant to provide a ready to use package for helix, with batteries included. This is part of helix's vision. Minimalism for the sake of minimalism is not. And providing extra "stripped down" variants of our releases on an ongoing basis for something that isn't a problem for most people probably isn't something the maintainers have an appetite for. If you'd like to reduce the disk footprint, it's quite easy to delete these files after installing. |
If you want a minimal release, handpicking the grammars is what you want. A release is 10MB which is still in the acceptable range for me. In the future we'll probably stop shipping as many grammars and compile some on demand. |
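For reference, handpicking grammars is done with the `use-grammars` key described in the documentation linked earlier in this thread. The fragment below is a sketch only; the grammar names are illustrative and not a recommendation.

```toml
# languages.toml (user configuration). The grammar names are examples only.
# Fetch and build only the listed grammars:
use-grammars = { only = ["rust", "c", "cpp"] }

# Alternatively, keep everything except a few large grammars:
# use-grammars = { except = ["verilog"] }
```

This setting controls which grammars are fetched and built (`hx --grammar fetch`, then `hx --grammar build`); it does not change what a prebuilt release archive already contains.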
As mentioned also: we use tree-sitter grammars, which are compiled libraries that do more than simple highlighting. They're actual parsers that let us understand the syntax much more deeply, determine indentation, and make other decisions (jump to definition, unused references, etc.). |
@dead10ck does maintain various areas of this editor. He's the sole maintainer of auto pairs.
If you have custom requirements you're always welcome to build from source to your specifications. |
I agree that verilog could be an optional grammar that's not included in the default release though |
It's understandable that you feel irked about the previous responses, but let's try to move on from it so other users can benefit from this issue :) |
He is a part of the org. Either way, it shouldn't matter in a discussion.
And as I've previously stated, that's not a fair comparison, because you're comparing vimscript code containing a bunch of highlight regexes with libraries (binary code) that are full-blown parsers. A fairer comparison would be against nvim+nvim-treesitter (after you've compiled the grammars).
Releases are there for convenience (prefer an official package when you can, since it will set up the runtime path for you correctly), so we try to include all possible grammars for now. If we don't, we have to go down the rabbit hole of deciding which popular languages to include, upsetting users when their preferred language doesn't work out of the box. More importantly, if we drop verilog from the build now, verilog users have to compile the grammar manually. That causes a lot more headaches than downloading a slightly larger tarball and excluding some grammars on unpack (grammars compress quite well).
As an example, here's the official Alpine package: https://pkgs.alpinelinux.org/package/edge/community/x86_64/helix It comes with no grammars installed and it's up to the user to install what they need (https://pkgs.alpinelinux.org/packages?name=tree-sitter-*&branch=edge&repo=&arch=x86_64&maintainer=), and the grammars can be shared with other editors. But this depends on upstream packaging decisions.
We do listen to feedback in this area: older versions of helix actually bundled all the grammars into a single binary, which was a lot less efficient and didn't allow excluding grammars or building custom grammars without a full editor recompile.
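Excluding grammars on unpack could look roughly like the sketch below. It assumes a zip archive in which grammar libraries sit directly inside a `grammars` directory and are named after their language (for example `runtime/grammars/verilog.dll`); that layout, the function name `extract_without_grammars`, and the demo archive are assumptions for illustration, not something the release process guarantees.

```python
import os
import tempfile
import zipfile
from pathlib import PurePosixPath


def extract_without_grammars(archive_path, dest_dir, excluded):
    """Extract a zip archive, skipping grammar libraries named in `excluded`.

    A file counts as a grammar library if its parent directory is called
    "grammars" (an assumed layout). Returns the archive paths that were skipped.
    """
    skipped = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            path = PurePosixPath(info.filename)  # zip paths use forward slashes
            is_grammar = not info.is_dir() and path.parent.name == "grammars"
            if is_grammar and path.stem in excluded:
                skipped.append(info.filename)
                continue
            zf.extract(info, dest_dir)
    return skipped


# Demo on a tiny stand-in archive (not a real Helix release).
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "helix-demo.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("helix/hx.exe", b"editor")
    zf.writestr("helix/runtime/grammars/rust.dll", b"rust parser")
    zf.writestr("helix/runtime/grammars/verilog.dll", b"verilog parser")

out_dir = os.path.join(demo_dir, "out")
skipped = extract_without_grammars(demo_zip, out_dir, {"verilog"})
print("skipped:", skipped)
```

Filtering at extraction time keeps the official archive untouched, so the same download still works for users who want every grammar.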
Yeah, I've stuck with the Verilog example because it's an outlier; the rest of the grammars are a lot more reasonably sized. It's partly how the grammars are structured, but very complex grammars also produce a lot of complex parsing states: https://github.com/tree-sitter/tree-sitter-verilog/blob/master/grammar.js |
That said, I think binary size isn't the best metric for judging whether a program is minimal. I'm sure if you compared the source code of the two you'd find helix is a lot slimmer. (Our package is only slightly larger than neovim's, yet it provides support for lots of languages out of the box and has almost no build dependencies.) |
@4cq2 I don't mean to spam this issue, but as mentioned in another comment in that thread, would
satisfy your requirements? (FWIW: I also thought always being in block cursor mode was kind of weird, but now I think it makes more sense given that you always have a selection that is maintained through entering/exiting insert mode. In my case, I was only bothered by it until I set editor.color-modes = true, when I realized that I was having a bit of an XY problem which was mostly just having a really hard time differentiating modes with the default settings.) |
FWIW this issue is also something that irks me about helix, even though it's actually a tree-sitter issue (i.e. tree-sitter/tree-sitter#1799). Reading some other tree-sitter related discussions, I got the impression that sometimes the grammar is just written in a way that disproportionately increases the generated code size, and that this can be mitigated by writing it slightly differently, but that might be language specific. Packaging the editor without the grammars would be nicer if you only work with some specific languages, but that would also make the update process more tedious. I guess it depends on how often the grammars need to be updated, or are they always tied to the release they are packaged with? |
I don't mind the binary being 100M, but having single grammar sources be 100M feels extreme:
du -h . | sort -h | tail
34M ./grammars/sources/scala
41M ./grammars/sources/ponylang
56M ./grammars/sources/verilog
96M ./grammars/sources/lean
ponylang?? I understand that we want batteries included with helix, but where do we draw the line? |
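A listing like the one above can also be produced without `du`, for instance on Windows. The following is a minimal sketch; the function names and the demo directory are hypothetical, and it sums apparent file sizes, whereas `du` reports disk usage, so the numbers can differ slightly.

```python
import os
import tempfile


def dir_size_bytes(root):
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_subdirs(parent, limit=10):
    """Return up to `limit` (size_in_bytes, name) pairs, largest first."""
    sizes = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size_bytes(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:limit]


# Demo on a throwaway directory tree standing in for grammars/sources.
demo_root = tempfile.mkdtemp()
for name, nbytes in [("lean", 300), ("verilog", 200), ("scala", 100)]:
    os.makedirs(os.path.join(demo_root, name))
    with open(os.path.join(demo_root, name, "parser.c"), "wb") as fh:
        fh.write(b"x" * nbytes)

report = largest_subdirs(demo_root, limit=2)
for size, name in report:
    print(f"{size:>8} bytes  {name}")
```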
I say keep it in single digits. Pick the 9 most popular languages, by whatever metric you want. Then offer the rest as a separate download, either one download per language or one bulk "other languages" download. I hope others can agree that the current system is not ideal, and is only going to get worse. |
I agree, the distribution size is frankly just ridiculous. |
This is not really something we're interested in tackling right now. If you'd like a smaller package, you are welcome to make one that fits your needs. |
Grammar sources also aren't meant to be packaged and distributed, and are usually excluded by packagers. They're simply there for development since re-cloning the grammar every single time from scratch would take a ton of time. |
Currently I use Vim, but I have been thinking about switching to Helix. I prefer minimal programs, so the first test I did was extracted size. First Vim:
https://github.com/vim/vim-win32-installer/releases/download/v9.0.1380/gvim_9.0.1380_x64.zip
The extracted size is 51 MB. Then Helix:
https://github.com/helix-editor/helix/releases/download/22.12/helix-22.12-x86_64-windows.zip
The extracted size is 111 MB, over double. I was curious what takes up the space, so I checked. Currently the
runtime/grammars
folder is 95.6 MB, or 86% of the total extracted size. More detail: for example, the Verilog grammar is currently 17.3 MB, or 16% of the total Helix extracted size. I have never used Verilog, and don't even know what it is. Would it be possible to have a Helix download without the grammars, so that users can download whatever grammars they need separately?
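The percentages above follow directly from the quoted sizes. A small sketch of the arithmetic, where the helper name `share_percent` is hypothetical and the inputs are the numbers stated in this post:

```python
# Extracted sizes in MB, as quoted in this post.
VIM_TOTAL_MB = 51.0
HELIX_TOTAL_MB = 111.0
GRAMMARS_MB = 95.6
VERILOG_MB = 17.3


def share_percent(part_mb: float, total_mb: float) -> float:
    """Return `part_mb` as a percentage of `total_mb`."""
    return 100 * part_mb / total_mb


grammars_share = share_percent(GRAMMARS_MB, HELIX_TOTAL_MB)
verilog_share = share_percent(VERILOG_MB, HELIX_TOTAL_MB)
helix_vs_vim = HELIX_TOTAL_MB / VIM_TOTAL_MB

print(f"runtime/grammars: {grammars_share:.0f}% of the Helix download")
print(f"Verilog grammar:  {verilog_share:.0f}% of the Helix download")
print(f"Helix is {helix_vs_vim:.1f}x the extracted size of Vim")
```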