Add experimental Str API #1275

Closed
wants to merge 1 commit into from

@matklad (Member) commented May 14, 2019

cc @BurntSushi, this is a reply to that Reddit comment.

I must say that I work at a higher level than text editors, so I am mostly interested in API generality and ergonomics, and I have only a vague understanding of the performance implications.

I write parsers which take input text and produce trees, and I'd love to make them general over the text representation. Advanced editors rarely use str as internal storage; usually text is stored in some kind of chunked representation, like a piece table, a gap buffer, or a tree. While I can write a parser against impl Iterator<Item=char>, that is not really convenient, because some form of random access helps. For example, it's useful to be able to take a substring of the input (to intern identifier text) or to find the next \n.

So, as a user, I want to write code against a trait that is more convenient than an iterator, but that still allows the code to work with various text representations.

I am also an implementer of this trait: because syntax trees are lossless, you can always extract the text representation of a syntax node (by concatenating the text of all its tokens), and it is useful to ask questions like "how many newlines does this text contain?".
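To make the "concatenate the text of all tokens" idea concrete, here is a minimal sketch. The Node type and its methods are hypothetical names made up for illustration; they are not the tree type from this PR. The point is only that a question like the newline count can be answered chunk by chunk, without first building one contiguous string.

```rust
/// Hypothetical lossless syntax node (an assumption for illustration,
/// not the real tree type): every byte of source text lives in a token.
enum Node {
    Token(String),
    Inner(Vec<Node>),
}

impl Node {
    /// Collects the text of all tokens under this node, in source order.
    fn chunks(&self) -> Vec<&str> {
        let mut out = Vec::new();
        self.collect_chunks(&mut out);
        out
    }

    fn collect_chunks<'a>(&'a self, out: &mut Vec<&'a str>) {
        match self {
            Node::Token(text) => out.push(text.as_str()),
            Node::Inner(children) => {
                for child in children {
                    child.collect_chunks(out);
                }
            }
        }
    }

    /// Counts newlines chunk by chunk; no joined string is allocated.
    fn newline_count(&self) -> usize {
        self.chunks().iter().map(|c| c.matches('\n').count()).sum()
    }

    /// Reconstructs the full text by concatenating all token chunks.
    fn text(&self) -> String {
        self.chunks().concat()
    }
}
```

Collecting into a Vec keeps the sketch short; a real implementation would more likely expose a lazy iterator over the tokens.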

As an implementer, I'd love to provide some fundamental API (slicing and iteration over string chunks), and get convenient string methods like find, contains, trim for free.
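A rough sketch of that split between a small required API and provided methods could look like the following. This is not the Str<'a> trait from this PR: the trait name TextChunks, its methods, and the Pieces type are all assumptions made up for illustration. Slicing is left out, and only single-character searches are shown, since a substring search has to handle matches that span chunk boundaries.

```rust
/// Hypothetical trait (an assumption, not the PR's Str<'a>):
/// implementers provide chunk iteration, the rest comes for free.
trait TextChunks {
    /// Required: the text as contiguous chunks, in order.
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_>;

    /// Provided: total length in bytes.
    fn byte_len(&self) -> usize {
        self.chunks().map(str::len).sum()
    }

    /// Provided: whether any chunk contains the character.
    fn contains_char(&self, c: char) -> bool {
        self.chunks().any(|chunk| chunk.contains(c))
    }

    /// Provided: byte offset of the first occurrence of the character.
    fn find_char(&self, c: char) -> Option<usize> {
        let mut offset = 0;
        for chunk in self.chunks() {
            if let Some(i) = chunk.find(c) {
                return Some(offset + i);
            }
            offset += chunk.len();
        }
        None
    }
}

/// A plain string is a single chunk.
impl TextChunks for str {
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_> {
        Box::new(std::iter::once(self))
    }
}

/// Hypothetical chunked storage, standing in for a piece table or tree.
struct Pieces(Vec<String>);

impl TextChunks for Pieces {
    fn chunks(&self) -> Box<dyn Iterator<Item = &str> + '_> {
        Box::new(self.0.iter().map(String::as_str))
    }
}
```

Code written against T: TextChunks + ?Sized then works for both str and Pieces. The boxed iterator is chosen here only for brevity; it costs an allocation per call.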

In this PR I've tried to sketch the Str<'a> trait and implement it for my Tree.

Fun fact:

Despite the fact that Java has CharSequence, jFlex operates on char[]: jflex-de/jflex#153

For this reason, IntelliJ uses a CharSequence-based fork of jFlex:
JetBrains/intellij-deps-jflex@9c9da13

@matklad matklad closed this May 14, 2019
@matklad (Member, Author) commented May 14, 2019

Oh, and here's a cool example of a non-trivial CharSequence:

[image]

Here, the text inside the doc comment is parsed with a Rust parser, which operates on a CharSequence that "slices" the text out of the doc comment. Note that this sequence removes the leading ///.

Ideally, I want a Str trait that would allow a parser author to write code against S: Str, an IDE author to implement "extract code from fenced block in doc comment", and someone else to just glue the two together seamlessly :D
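As a rough illustration of the IDE author's half, here is a sketch of a view that strips the /// markers without copying the text. DocCommentText is a hypothetical name, not something from IntelliJ or from this PR, and the sketch only removes the markers; it does not locate the fenced block.

```rust
/// Hypothetical view over a run of `///` doc comment lines
/// (an assumption for illustration only).
struct DocCommentText<'a> {
    source: &'a str,
}

impl<'a> DocCommentText<'a> {
    /// Yields one borrowed chunk per line, with the leading indentation
    /// and the `///` marker removed. The rest of the line, including its
    /// newline and the space after the marker, is kept as-is.
    fn chunks(&self) -> impl Iterator<Item = &'a str> {
        self.source.split_inclusive('\n').map(|line| {
            let line = line.trim_start();
            line.strip_prefix("///").unwrap_or(line)
        })
    }
}
```

A parser that only needs chunk iteration could consume these chunks directly, which is the kind of glue described above.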

@matklad (Member, Author) commented May 14, 2019

Also note that in the above picture the underlying buffer is itself a non-trivial CharSequence, so these things tend to nest.

@matklad (Member, Author) commented May 15, 2019

I've taken a close look at the IntelliJ text buffer, and it is surprisingly simple, just a little over 500 lines: https://github.com/JetBrains/intellij-community/blob/master/platform/util/src/com/intellij/util/text/ImmutableText.java

@matklad matklad deleted the str-experiment branch September 2, 2019 10:30