[PHP] completion and toggle comment fixes #425
After spending a bunch of time thinking about how to handle PHP and HTML being mixed at every possible context, I am thinking there are two major approaches:
I was thinking about this too, and I had the idea that it would be great if it were possible to completely switch contexts in a syntax definition, rather than just To explain what I mean: a PHP file starts in an HTML context. A This idea is based on the principle that, at the point when we are in HTML, it doesn't matter what PHP content has come previously; we only care about the HTML contexts/scopes/nesting. And vice versa: when we are in PHP, it doesn't matter what HTML content has come previously; we only care about the PHP contexts/scopes/nesting. If plugins might want to know about scope info, I'm sure there would be some way to implement it so that both the applicable HTML scopes and the PHP scopes are retained. I believe that it could (albeit with some appropriate tweaks to the HTML syntax) even cope with PHP inside an HTML attribute name declaration like the following, and still highlight it correctly:
and would allow "wrong" nesting to be used without negatively affecting anything (contrived example because HTML isn't currently tracking nesting):
But back to the framework we have at the moment. I personally think your point number 2 isn't worth exploring, due to the possibility of arbitrary-depth nesting: trying to cover all (common) scenarios would require a large amount of duplication in the syntax definition and make it hard to maintain IMO. Number 1 is certainly possible, but I personally don't like that we would lose proper meta scoping, as I believe that it is useful for plugin development. Regarding number 3, I have done some experimentation to see what can be accurately guessed about the context without lookbehinds, based on use cases mentioned in this forum thread, in case that helps decide which approach to take here.
I'm with @keith-hall on this one. The proper solution seems to be adding parallel context stacks into ST's core lexer that syntax definitions can switch between arbitrarily. It's definitely complex, but so is the problem we are trying to solve. This could also be used to better implement preprocessors in C languages. Besides that, I also believe that option 2 is infeasible due to its arbitrary nesting.
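To make the "parallel context stacks" idea above concrete, here is a minimal Python sketch. It is not how Sublime Text's lexer works and not a proposal from the thread's participants: the function name `track_stacks`, the toy token regex, and the choice to track only HTML tag names and PHP braces are all assumptions made for illustration. It keeps one stack per language and switches the active stack at `<?php` and `?>`, so HTML nesting and PHP nesting never interfere with each other.

```python
import re

# Toy tokenizer (an assumption for this sketch): PHP open/close tags,
# simple attribute-less HTML tags, and curly braces.
TOKEN = re.compile(r"<\?php|\?>|</?[a-zA-Z]+>|[{}]")


def track_stacks(source):
    """Scan `source` and return (html_stack, php_stack) at the end of input."""
    stacks = {"html": [], "php": []}
    active = "html"  # a PHP file starts in an HTML context
    for match in TOKEN.finditer(source):
        tok = match.group()
        if tok == "<?php":
            active = "php"   # switch stacks; the HTML stack is left untouched
        elif tok == "?>":
            active = "html"  # switch back; the PHP stack is left untouched
        elif active == "html" and tok.startswith("</"):
            # Closing tag: pop only if it matches the innermost open tag.
            if stacks["html"] and stacks["html"][-1] == tok[2:-1]:
                stacks["html"].pop()
        elif active == "html" and tok.startswith("<"):
            stacks["html"].append(tok[1:-1])
        elif active == "php" and tok == "{":
            stacks["php"].append("block")
        elif active == "php" and tok == "}":
            if stacks["php"]:
                stacks["php"].pop()
    return stacks["html"], stacks["php"]


balanced = "<ul><?php foreach ($items as $item) { ?><li>x</li><?php } ?></ul>"
print(track_stacks(balanced))  # ([], [])
```

In the balanced example, the `<li>` element opens and closes while a PHP block is still open, yet both stacks end up empty because each language only ever sees its own nesting.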
I have spent some time trying to make the lexer more powerful for complex situations; however, the constraints of performance make it somewhat difficult. In short, the lexer is a state machine, to allow resumption of lexing from any point in the file. This means when a user types in new code, the whole file doesn't have to be lexed again. Instead, lexing is resumed from right before the edit and continued until the context matches up with an existing context. In other words, it only has to re-lex for the duration of the changes.

The idea of having multiple stacks means that the state would no longer be a pointer to a stack position, but some sort of amalgamation of N stacks (HTML -> JS -> HTML TEMPLATE STRING -> PHP -> HTML or HTML -> CSS -> PHP -> CSS). Another very possible situation: Markdown -> HTML -> JS -> PHP -> ABORT! PHP could also be written so that a function is defined in a block of HTML, but outputs JS and is only ever called inside of a

C runs into similar problems with the preprocessor. There is no way to properly highlight code that uses certain preprocessor constructs. If someone writes:

    if (foo) {
        common_code();
    #if LINUX
        linux_code();
    }
    #endif
    #if OSX
        osx_code();
    }
    #endif

We can't properly determine when the meta scope for the if block closes. However, most people don't write code that way. If they do, they just have to deal with incorrect highlighting.

I think the best thing to do is only ever highlight literal output in a PHP function as HTML. If someone writes JS in there, we can't really know what they were intending. Same goes for CSS. If they want proper highlighting they can use one of the named heredocs. Most modern developers will use a templating language anyway, and not directly switch to literal output in the middle of a function. For outside of functions/classes, I think we are just going to ditch the meta scopes for

Taking this approach shouldn't affect our tests other than removing the meta scopes on blocks that aren't part of a class or function. I'm going to be working on this soon. I believe it will probably require some of these changes to allow literal output inside of a function to work properly.
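The resumable-lexing behaviour described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the real lexer state is a context-stack position, whereas here the "state" is just a brace nesting depth, the cache is kept per line, and edits replace a line in place. The names `lex_line`, `lex_all`, and `relex_after_edit` are hypothetical.

```python
def lex_line(state, line):
    """Toy lexer step: state is the brace depth at the start of the line."""
    return state + line.count("{") - line.count("}")


def lex_all(lines):
    """Return states where states[i] is the state at the start of line i."""
    states = [0]
    for line in lines:
        states.append(lex_line(states[-1], line))
    return states


def relex_after_edit(lines, states, edited_index):
    """Re-lex from the edited line until the state matches the cached one.

    Updates `states` in place and returns how many lines were re-lexed.
    """
    relexed = 0
    state = states[edited_index]  # resume from right before the edit
    for i in range(edited_index, len(lines)):
        new_end = lex_line(state, lines[i])
        relexed += 1
        if new_end == states[i + 1]:
            break  # converged with the cached state; the rest is unchanged
        states[i + 1] = new_end
        state = new_end
    return relexed


lines = ["a {", "b", "}", "c", "d"]
states = lex_all(lines)

lines[1] = "b2"  # an edit that does not change nesting
print(relex_after_edit(lines, states, 1))  # 1

lines[1] = "b {"  # an edit that leaves a brace open
print(relex_after_edit(lines, states, 1))  # 4
```

The first edit converges immediately, so only one line is re-lexed. The second edit changes the state of every following line, so the lexer has to run to the end of the file. With several parallel stacks, the comparison `new_end == states[i + 1]` would have to compare a combination of stacks rather than a single value, which is the cost the comment above is pointing at.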
Thanks for the info about how the lexer works, @wbond 👍 I thought it probably did something like that, but didn't quite realize exactly how clever it is, to avoid unnecessarily re-lexing the rest of the file when an edit is made 😉 I can see how my proposal would be far too much work, and unnecessary considering that the block meta scopes on the

I also like the suggestion of using the named heredocs for correct highlighting - good thinking sir 😉 I'm not quite sure I follow why the meta scopes on functions need to be kept though?
There is some parsing that is specific to class bodies, so ideally we keep that. To keep that, we need to keep function block parsing. This requires a little bit of nesting of PHP -> HTML -> PHP for certain constructs. The test file will have some examples once I push my changes up.
The contortions that would have been required for all PHP snippets, completions, and color schemes to handle HTML nested inside of PHP was going to be pretty rough. Instead I ended up implementing:
The tests have been updated to check and ensure that we are functioning effectively the same as before the big PHP changes. There will be situations with deep curly brace nesting inside of functions and classes, or nested closures/anonymous classes where the
I'm going to explore adding the ability to reset scope stacks in a future release to make this a little easier to maintain, but I think for now this should be functional for almost all users and package developers. Commit with changes: cd7f8c2
fixes #424