Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add basic modeline support #7788

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

add basic modeline support #7788

wants to merge 2 commits into from

Conversation

doy
Copy link
Contributor

@doy doy commented Jul 30, 2023

fixes #729

currently only supports setting language, indent style, and line endings, but maybe this is enough?

Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good, thanks for taking this on!

const LENGTH_TO_CHECK: usize = 256;

static MODELINE_REGEX: Lazy<Regex> =
Lazy::new(|| Regex::new(r"(?:\S+\s+)?(?:vi|vim|Vim|ex):\s*(?:set? )?(.*)(?::[^:]*)?").unwrap());
Copy link
Member

@pascalkuthe pascalkuthe Jul 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex should be anchored (that makes it both faster and correct since the modeline is specific about how it may be prefixed We also don't need capture groups here. Since modeline support escapes (with \: we can't phrase the option with regec anyway (it's technically possible but not worth the effort. You can simply find the regec match and then use the rest of the line (from the start if the regec). That also removes the need for non-capturing groups (since you will not compute captures anywya which sinplfies the regec a bit further

Suggested change
Lazy::new(|| Regex::new(r"(?:\S+\s+)?(?:vi|vim|Vim|ex):\s*(?:set? )?(.*)(?::[^:]*)?").unwrap());
Lazy::new(|| Regex::new(r"^(\S*\s+)?(vi|[vV]im[<=>]?\d*|ex):\s*(set?\s+)?").unwrap());

}

fn parse_line(line: &str) -> Option<Self> {
if let Some(captures) = MODELINE_REGEX.captures(line) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described above simply use find instead of captures here

let mut indent_style = None;
let mut line_ending = None;

let options = &captures[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then do line[march.end..]

let mut line_ending = None;

let options = &captures[1];
for option in options.split([' ', '\t', ':']) {
Copy link
Member

@pascalkuthe pascalkuthe Jul 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a simple split here doesn't work since mode lines allow escaping with \: you need to handle that here. You can take a look at the query_atoms function to see how you can use split to split on a char that can be escaped (in that case it was spaces).

Right now it won't matter much since you don't support option with arbitrary values yet but it's good uo be correct from the start

None
}

fn merge(&mut self, other: Self) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unnecessary. Instead of Creating a new ModeLine for each line that is parsed simply make the parsing accept a mutable reference to self and write the results directly into self

// regexes can't operate over chunks yet, but we can at least
// limit how much we potentially need to copy because modelines
// are typically quite short.
let line = Cow::<str>::from(line.slice(..(LENGTH_TO_CHECK.min(line.len_chars()))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just ignore lines longer than LENGTH_TOK_CJECK since otherwise we cut off mode lines in the middle and get a wrong result. I don't think this occurs in practice anyway

@pascalkuthe pascalkuthe added E-medium Call for participation: Experience needed to fix: Medium / intermediate S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 30, 2023
@pascalkuthe
Copy link
Member

Also one more thing: I think the modeline.rs file can/should be moved to helix-core

@pascalkuthe pascalkuthe added this to the next milestone Jul 30, 2023
Copy link
Member

@the-mikedavis the-mikedavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this feels like plugin territory rather than something that should be built into the core. I believe only each editor supports their own modeline natively: vim variants are the only ones to support vim modelines, only emacs supports emacs modelines natively, vscode doesn't support any modelines natively AFAIK. The translation between filetypes and line ending values from Vi to Helix feels a little hacky to me here: we're basically parsing and trying to intepret Vi set commands. If we do support this in core, I'm not sure where we draw the lines for which Vi settings should be supported. And do we support interpreting emacs modelines as well?

}

impl Modeline {
pub fn parse(text: &Rope) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn parse(text: &Rope) -> Self {
pub fn parse(text: RopeSlice) -> Self {

Comment on lines 27 to 29
text.lines_at(text.len_lines())
.reversed()
.take(LINES_TO_CHECK),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this iterator be reversed again after the take? The lines at the end of the file would have the opposite merging behavior as the lines at the beginning otherwise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fascinatingly enough no, the vim implementation also reads from the end of the file up

@pascalkuthe
Copy link
Member

pascalkuthe commented Jul 30, 2023

vim modelines are pretty standard even in editors other than vim. Kakoune also parses these by default (https://github.com/mawww/kakoune/blob/master/rc/detection/modeline.kak). I think it would be reasonable to mirror kakoune here. It supports the following options currently:

  • scrolloff
  • sidescrolloff (do we even have a sepreat setting for this?
  • tabstop
  • shiftwidth
  • textwidth
  • fileformat
  • filetype
  • spelllang
  • bom
  • nobom

@doy
Copy link
Contributor Author

doy commented Aug 2, 2023

yeah, i don't think this belongs in a plugin. i'm not tied to vim syntax specifically (we could also totally just do our own thing) but it feels pretty important to be able to have some way to override settings like file type that heuristics aren't always going to correctly identify (i have quite a few files that aren't identifiable by filename or shebang).

@doy
Copy link
Contributor Author

doy commented Aug 2, 2023

this should resolve the issues mentioned so far (other than the extended list of options that kakoune supports - should i implement those now or can it wait?). let me know if there are any other issues!

}

// if vim and helix use different names for a language, put that mapping here
fn vim_ft_to_helix_language(val: &str) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@the-mikedavis was concerned about too much vim emulation here. I tend to agree, kakoune doesn't do this either. We don't need to translate file types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough - i do kind of wonder what the right answer is for files that are intended to be shared between people using vim and helix though. maybe we do want our own format in addition?

@doy
Copy link
Contributor Author

doy commented Sep 17, 2023

i added a second commit which adds support for parsing a helix-specific modeline, since that is probably going to be more useful going forward. i think parsing some amount of vim modelines is useful because they are so widespread, but for configuration that doesn't exist in vim (or is different in vim, such as language names), we are going to want a different method for configuring that.

this currently uses the same toml configuration syntax we use for the configuration file, but not sure if that's what we actually want (given that my understanding is that the toml configuration file is eventually intended to go away in favor of some kind of scripted configuration - telling people to rewrite their local configuration file is different from telling them to fix modelines in every file they want to use). let me know if you think this is fine or if we can come up with something better here.

@the-mikedavis the-mikedavis added S-waiting-on-review Status: Awaiting review from a maintainer. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 20, 2023
doy added 2 commits April 20, 2024 18:42
currently only supports setting language, indent style, and line endings
for example, if the language name differs between vim and helix
.take(LINES_TO_CHECK),
) {
// can't guarantee no extra copies, since we need to regex and
// regexes can't operate over chunks yet, but we can at least
Copy link
Member

@pascalkuthe pascalkuthe Apr 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now use regex cursor so we actually can do regex search without buffering so that should be used here.

The length to check limit id probably unnecessary with that

self.line_ending = vim_ff_to_helix_line_ending(val);
}
}
"noet" | "noexpandtab" => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should habdle expandtab without an explicit shiftwidth. I think that isn't too rare. We could simply set the indentation to the tabwidth currently set for that document. This would match sw=0 (which we may also want to handle) and is generally what a user would expect.

One minor csbeat in both cases: If the document ks already configured to use spaces then expandtab should do nothing. That will need special representation in the ModeLine struct but IMO it's worth it since that is pretty common.


if let Some(pos) = HELIX_MODELINE_REGEX.find(line) {
let config = &line[pos.end()..];
match toml::from_str::<ModelineConfig>(config) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think tonl is they syntax I would go for here. We are removing toml hopefzlly soon and generally it makes more sense to me what vim does and emulate typable commands (everything you allow users to set here has an interactive typable command that allows setting that config per document interactively). The parsing should be pretty simple. So for example it would be // helix: line-ending=lf indent=4 lang=rust

In terms of aliases you should also just support the aliases these commands support.

@@ -668,6 +672,7 @@ impl Document {
focused_at: std::time::Instant::now(),
readonly: false,
jump_labels: HashMap::new(),
modeline,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of adding a field for this. Mofeline detection should only run when reading a file for the first time and set the buffer setting. In the future with the new config system there will be a dedicated buffer config. For now all fields that you allow to overwrite already have fields on document and you should only set these fields automatically after parsing the ModeLine/bypass the existing autodetection

@pascalkuthe pascalkuthe added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from a maintainer. labels Apr 30, 2024
},
),
(
"/* vim: set noai ts=4 sw=4: */",
Copy link

@lucasew lucasew Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny

This : at the end of the directive is not required in Vim

And this commit only works with the : at the end

@lucasew
Copy link

lucasew commented Jul 26, 2024

There is some exoteric examples that I tested if Vim can properly detect the file type. Vim got it right in all but the FSharp example, that is expected because Vim doesn't have by default the file type definition for FSharp. In my tests the only thing that seems to break detection here is that requirement of a : at the end of the directive.

https://github.com/lucasew/nix-flake-shell/tree/c896400199f12a293ec6ee7d54b5f2e19f0f0a99/tests

@markstos
Copy link
Contributor

vim modelines are pretty standard even in editors other than vim.

I don't know common it is parse vim's modelines. Gedit can parse expandtab, tabstop, shiftwidth, wrap and textwidth: https://help.gnome.org/users/gedit/stable/gedit-plugins-modelines.html.en

The Kate editor supports their own Kate-specific modelines:
https://kate-editor.org/2006/02/09/kate-modelines/#:~:text=Kate%20Part's%20modelines%20%E2%80%93%20also%20called,to%20Emacs%20and%20vim%20modelines.

But I imagine a lot of Helix's users will come from vim/neovim, which parse them.

I agree parsing vim's modelines would be a good fit for a plugin. Emacs can parse Vim's modelines using a plugin:
https://www.emacswiki.org/emacs/vim-modeline

Regarding the trailing colon. I'm one of the vim users who waw aware that's not required, so I don't put in in my own files. I'm not sure how common that is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-medium Call for participation: Experience needed to fix: Medium / intermediate S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vim modeline parsing
5 participants