Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PHP] completion and toggle comment fixes #425

Closed
wants to merge 4 commits into from

Conversation

keith-hall
Copy link
Collaborator

fixes #424

@wbond
Copy link
Member

wbond commented May 26, 2016

After spending a bunch of time thinking about how to handle PHP and HTML being mixed at every possible context, I am thinking there are two major approaches:

  1. Don't track meta scopes or push for curly braces. This way we are always at the top level and we don't need to worry about popping out of contexts when a user embeds HTML in the middle of a function. This means we lose proper meta scoping on if/else and function blocks. Currently the class definition only needs to be pushed into the class body context to properly handle use statements, but those almost always happen at the beginning of the class, and most people won't mix traits and embedded HTML anyway.
  2. Get into some gymnastics with switching contexts upon ?> and <? inside of blocks, like we do with the C preprocessor. This would mean that when we were in a class -> method and found a ?> we would switch to something like class-method-embedded-html that would handle HTML and push back into the appropriate context once we hit <? again. This could work OK, except for the complication of two things:
    • The PHP could have begun inside of a <script> tag, in which case we need a copy of every context to push into JS instead of HTML
    • Anonymous functions and classes mean we can have arbitrary-depth nesting of class, method and function scopes.
  3. Leave the contexts and meta scoping as-is and let color scheme authors deal with background color changes and merge in these fixes. There will still need to be a fix for if/else inside of <script> tags, and we'd still need to fix <? happening in the middle of a nested context such as an HTML attribute, etc.

@keith-hall
Copy link
Collaborator Author

I was thinking about this too, and I had the idea that it would be great if it were possible to completely switch contexts in a syntax definition, rather than just push, pop or set.

To explain what I mean: a PHP file starts in a HTML context. A <? is encountered, and it switches to PHP. When the ?> is encountered, whether it be nested inside curly braces or not, it would switch back to the same HTML context it was in previously (including JavaScript if it was in a <script> block at the time). When <? is encountered again, it would go back into the same (potentially deeply nested) PHP context that it was in previously. That way, the HTML and the PHP would be kept completely separate, and don't have to "worry" about each other.

This idea is based on the principle that, at the point when we are in HTML, it doesn't matter what PHP content has come previously, we only care about the HTML contexts/scopes/nesting. And vise versa, when we are in PHP, it doesn't matter what HTML content has come previously, we only care about the PHP contexts/scopes/nesting. If plugins might want to know about scope info, I'm sure there would be some way to implement it to retain the relevant applicable HTML and the PHP scopes.

I believe that it could (albeit with some appropriate tweaks to the HTML syntax) even cope with PHP inside a HTML attribute name declaration like the following, and still highlight it correctly:

<?php if (true) { ?>
<div attr-<?= $someVar ?>-example="hello world">

and would allow "wrong" nesting to be used without negatively affecting anything (contrived example because HTML isn't currently tracking nesting):

<div>
<?php if (true) { ?>
</div><span>some text here</span><div>
<? } else { ?>
more text or HTML
<? } ?>
</div>

But back to the framework we have at the moment.

I personally think your point number 2 isn't worth exploring, due to the possibility of arbitrary-depth nesting - trying to cover all (common) scenarios would require a large amount of duplication in the syntax definition and make it hard to maintain IMO.

Number 1 is certainly possible but I personally don't like that we would lose proper meta scoping as I believe that it is useful for plugin development.

Regarding number 3, I have done some experimentation to see what can be accurately guessed about the context without lookbehinds, based on use cases mentioned in this forum thread, in case that helps decide which approach to take here.

@FichteFoll
Copy link
Collaborator

FichteFoll commented May 26, 2016

I'm with @keith-hall on this one. The proper solution seems to be adding parallel context stacks into ST's core lexer that syntax definitions can switch between arbitrarily. It's definitely complex, but the problem we are trying to solve is as well.
I don't have a particular suggestion on how this would look in the syntax definition, but I believe that Jon will be able to come up with something feasable.

This could also be used to better implement preprocessors in C languages.


Besides that, I also believe that option 2 is infeasable due to its arbitrary nesting.
Option 1 seems to be the easiest and generally most applicable solution while 3 would be the most accurate solution, although I expect this not to work well with php in Javascript. It basically comes down to "meta scopes for javascript/html" vs "meta scopes for php".

@wbond
Copy link
Member

wbond commented May 26, 2016

I have spent some time trying to make the lexer more powerful for complex situations, however the constraints of performance make it somewhat difficult.

In short, the lexer is a state machine, to allow resumption of lexing from any point in the file. This means when a user types in new code, the whole file doesn't have to be lexed again. Instead, lexing is resumed from right before the edit and continued until the context matches up with an existing context. In short, it only has to re-lex for the duration of the changes.

The idea of having multiple stacks means that the state would no longer be a pointer to a stack position, but some sort of amalgamation of N stacks (HTML -> JS -> HTML TEMPLATE STRING -> PHP -> HTML or HTML -> CSS -> PHP -> CSS). Another very possible situation: Markdown -> HTML -> JS -> PHP -> ABORT!

PHP could also be written that a function is defined in a block of HTML, but outputs JS and it only ever called inside of a <script> tag. According to the definition, the output should be highlighted as HTML, but in this case the user may be actually outputting JS.

C runs into similar problems with the preprocessor. There is no way to properly highlight code that uses certain preprocessor constructs. If someone writes:

if (foo) {
    common_code();
#if LINUX
    linux_code();
}
#endif
#if OSX
    osx_code();
}
#endif

We can't properly determine when the meta scope for the if block closes. However, most people don't write code that way. If they do, they just have to deal with incorrect highlighting.

I think the best thing to do is only ever highlight literal output in a PHP function as HTML. If someone writes JS in there, we can't really know what they were intending. Same goes for CSS. If they want proper highlighting they can use one of the named heredocs. Most modern developers will use a templating language anyway, and not directly switch to literal output in the middle of a function.

For outside of functions/classes, I think we are just going to ditch the meta scopes for {} blocks such as if, else, try, etc. That way we can pop out to HTML and then back into the top-level PHP. This should preserve the way that PHP and HTML, CSS and JS all interacted before. It worked fine, and I don't think it is worth sacrificing that for theoretical correctness of block meta scopes.

Taking this approach shouldn't affect our tests other than removing the meta scopes on blocks that aren't part of a class or function.

I'm doing to be working on this soon. I believe it will probably require some of these changes to allow literal output inside of a function to work properly.

@keith-hall
Copy link
Collaborator Author

Thanks for the info about how the lexer works, @wbond 👍

I thought it probably did something like that, but didn't quite realize exactly how clever it is, to avoid unnecessarily re-lexing the rest of the file when an edit is made 😉 I can see how my proposal would be far too much work, and unnecessary considering that the block meta scopes on the ifs and fors could just be removed like you suggest. 😄 I am happy with this solution, as I believe plugins could still make use of the punctuation.definition.block.begin/end scopes to achieve the same things they could with the meta scopes, and, as you say, it will restore the old interaction between PHP/HTML/CSS/JS which worked really well. 👍

I also like the suggestion of using the named heredocs for correct highlighting - good thinking sir 😉

I'm not quite sure I follow why the meta scopes on functions need to be kept though?

@wbond
Copy link
Member

wbond commented May 27, 2016

There is some parsing that is specific to class bodies, so ideally we keep that. To keep that, we need to keep function block parsing. This requires a little bit of nesting of PHP -> HTML -> PHP for certain constructs. The test file will have some examples once I push my changes up.

@wbond
Copy link
Member

wbond commented Jun 1, 2016

The contortions that would have been required for all PHP snippets, completions, and color schemes to handle HTML nested inside of PHP was going to be pretty rough. Instead I ended up implementing:

  1. Contexts no longer are pushed when entering a curly brace block, but set is used so we are only ever one level deep
  2. There are contexts to handle five levels of curly brace nesting combined with literal HTML output while inside of functions and methods
  3. Curly braces that are not for functions or classes are not tracked outside of functions or classes (they have to be inside to know when we've consumed the end of the data structure)

The tests have been updated to test out end ensure that we are functioning effectively the same as before the big PHP changes. There will be situations with deep curly brace nesting inside of functions and classes, or nested closures/anonymous classes where the text.html.basic scope will be contained within a source.php parent scope. This seems like a very unlikely occurrence because of:

  1. My own experience working with PHP and seeing how literal output is usually used. While some instances of literal output in functions and classes exists, it is usually not deeply nested in too many layers of if, else, try, etc.
  2. Most users writing modern PHP and using such constructs (closures/anonymous classes) usually won't switch to literal output inside of them, and will likely be using a template library

I'm going to explore adding the ability to reset scope stacks in a future release to make the maintainability of this a little cleaner, but I think for now this should be functional for almost all users and package developers.

Commit with changes: cd7f8c2

@wbond wbond closed this Jun 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[PHP] incorrect snippets and comments shown for HTML inside PHP control structures
3 participants