Add experimental Str API #1275

matklad · 2019-05-14T15:51:19Z

cc @BurntSushi, this is a reply to that reddit comment.

I must say that I work at a higher level than text editors, so I am interested in API generality and ergonomics and have only a vague understanding of the performance implications.

I write parsers which take input text and produce trees. I'd love to make parsers general. Specifically, advanced editors rarely use str as their internal storage. Usually text is stored in some kind of chunked representation, like a piece table, a gap buffer, or a tree. While I can write a parser against impl Iterator<Item=char>, that is not really convenient, as some form of random access helps. For example, it's useful to be able to take a substring of the input (to intern identifier text) or to find the next \n.

So, as a user, I want to write code against a trait that is more convenient than an iterator, but still allows the code to work with various text representations.

I am also an implementer of this trait: because syntax trees are lossless, you can always extract the text representation from a syntax node (by concatenating the text of all tokens), and it is useful to ask questions like "how many newlines does this text contain?".
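To make the "lossless tree" point concrete, here is a minimal sketch. It assumes a toy tree type; `SyntaxElement`, `for_each_token_text`, `text` and `count_newlines` are hypothetical names made up for illustration, not the types from this PR. The tree stores the text of every token, so the full text can be rebuilt by concatenation, and a question like "how many newlines?" can be answered without building the string at all.

```rust
/// Hypothetical lossless tree: every piece of source text lives in some token.
enum SyntaxElement {
    Token(String),
    Node(Vec<SyntaxElement>),
}

impl SyntaxElement {
    /// Visits the text of every token in source order.
    fn for_each_token_text(&self, f: &mut dyn FnMut(&str)) {
        match self {
            SyntaxElement::Token(text) => f(text.as_str()),
            SyntaxElement::Node(children) => {
                for child in children {
                    child.for_each_token_text(f);
                }
            }
        }
    }

    /// Rebuilds the text by concatenating all token texts.
    fn text(&self) -> String {
        let mut buf = String::new();
        self.for_each_token_text(&mut |t: &str| buf.push_str(t));
        buf
    }

    /// Counts newlines token by token, without allocating the full text.
    fn count_newlines(&self) -> usize {
        let mut n = 0;
        self.for_each_token_text(&mut |t: &str| n += t.matches('\n').count());
        n
    }
}
```

Every such question currently needs its own hand-written traversal, which is exactly the boilerplate a shared text trait could remove.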

As an implementer, I'd love to provide some fundamental API (slicing and iteration over string chunks), and get convenient string methods like find, contains, and trim for free.

In this PR I've tried to sketch the Str<'a> trait and implement it for my Tree.
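A rough illustration of the "small fundamental API, convenience methods for free" idea follows. This is not the Str<'a> trait from the PR: `TextLike`, `Chunked` and every method name below are assumptions made up for this sketch. The only required method is iteration over contiguous chunks; slicing is left out for brevity, and only single-character searches are shown, because substring search across chunk boundaries needs more care.

```rust
/// Hypothetical trait: implementers only provide chunk iteration.
trait TextLike {
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_>;

    /// Total length in bytes.
    fn text_len(&self) -> usize {
        self.chunks().map(str::len).sum()
    }

    fn contains_char(&self, c: char) -> bool {
        self.chunks().any(|chunk| chunk.contains(c))
    }

    /// Byte offset of the first occurrence of `c`, counted across chunks.
    fn find_char(&self, c: char) -> Option<usize> {
        let mut offset = 0;
        for chunk in self.chunks() {
            if let Some(i) = chunk.find(c) {
                return Some(offset + i);
            }
            offset += chunk.len();
        }
        None
    }

    fn count_char(&self, c: char) -> usize {
        self.chunks().map(|chunk| chunk.matches(c).count()).sum()
    }
}

/// A plain string is a single chunk.
impl TextLike for str {
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_> {
        Box::new(std::iter::once(self))
    }
}

/// Stand-in for a chunked editor buffer (piece table, rope, ...).
struct Chunked {
    pieces: Vec<String>,
}

impl TextLike for Chunked {
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_> {
        Box::new(self.pieces.iter().map(String::as_str))
    }
}

/// Generic user code: works with any text representation.
fn count_newlines<T: TextLike + ?Sized>(text: &T) -> usize {
    text.count_char('\n')
}
```

The boxed iterator keeps the sketch short; a real design would probably want to avoid the allocation and dynamic dispatch, which is one of the performance questions I can't judge well.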

Fun fact:

Despite the fact that Java has CharSequence, jFlex operates on char[]: jflex-de/jflex#153

For this reason, IntelliJ uses a CharSequence-based fork of jFlex:
JetBrains/intellij-deps-jflex@9c9da13

matklad · 2019-05-14T15:55:10Z

Oh, and here's a cool example of a non-trivial CharSequence:

Here, the text inside the doc comment is parsed with a Rust parser, which operates on a CharSequence that "slices" the text out of the doc comment. Note that this sequence removes the leading ///.

Ideally, I want a Str trait that would allow a parser author to write code against S: Str, an IDE author to implement "extract code from fenced block in doc comment", and someone else to just glue the two together seamlessly :D
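A sketch of how that gluing might look in Rust, under loud assumptions: `TextLike` is a minimal hypothetical trait (not the Str trait from this PR), `DocCommentText` is a made-up view type, and `collect_text` stands in for the parser. The view exposes the doc comment's text with indentation and the leading /// removed, without copying the source.

```rust
/// Hypothetical minimal text trait: iteration over contiguous chunks.
trait TextLike {
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_>;
}

/// Hypothetical view over the lines of a doc comment.
struct DocCommentText<'a> {
    source: &'a str,
}

impl<'a> TextLike for DocCommentText<'a> {
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_> {
        // One chunk per line; each chunk borrows from `source`.
        Box::new(self.source.split_inclusive('\n').map(|line| {
            // Drop indentation, the `///` marker, and one following space.
            let line = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
            let line = line.strip_prefix("///").unwrap_or(line);
            line.strip_prefix(' ').unwrap_or(line)
        }))
    }
}

/// Stand-in for a parser written against `S: TextLike`.
fn collect_text<S: TextLike>(text: &S) -> String {
    text.chunks().collect()
}
```

The consumer never learns that the text came from a doc comment, and the view never learns what the consumer does with it.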

matklad · 2019-05-14T15:58:18Z

Also note that in the above picture the underlying buffer is itself a non-trivial CharSequence, so these things tend to nest.

matklad · 2019-05-15T11:25:15Z

I've taken a close look at IntelliJ's text buffer, and it is surprisingly simple, just a little over 500 lines: https://github.com/JetBrains/intellij-community/blob/master/platform/util/src/com/intellij/util/text/ImmutableText.java

Add experimental Str API

ebe0536

matklad closed this May 14, 2019

matklad deleted the str-experiment branch September 2, 2019 10:30
