Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss: How we handle the PHP templating language #2330

Closed
joshgoebel opened this issue Dec 21, 2019 · 13 comments · Fixed by #2417
Closed

Discuss: How we handle the PHP templating language #2330

joshgoebel opened this issue Dec 21, 2019 · 13 comments · Fixed by #2417
Labels
enhancement An enhancement or new feature language

Comments

@joshgoebel
Copy link
Member

joshgoebel commented Dec 21, 2019

Related:

Technically PHP is a templating language that exists inside HTML, ie a full PHP template snippet would look like:

<?php echo '<p>Hello World</p>'; ?> 

Although echo '<p>Hello World</p>'; is still valid PHP code, per se... and it's possible to run this with php -r. So it's possible to have PHP code without the PHP tags...

So keep these two very different context in mind...


Our current handling of PHP is problematic in several ways. It's not very intuitive, and it gives people the wrong idea about how to handle templating languages in general. PHP currently is handled by xml and php. php is the actual php code itself, while the php templating functionality is buried inside xml.

That means a snippet such as <?php echo '<p>Hello World</p>'; ?> would be auto-detected as XML, not PHP. I believe that is one reason we score so poorly on some of the language detection stats. They are throwing PHP files at us and expecting us to return "php" but instead we return "xml".

This also isn't intuitive if you are manually trying to specify the highlighting language... if you have a FULL PHP template and specify "php" your highlighting will be pretty broken because HTML won't be highlighted at all... you'd instead need to specify "xml" to get proper highlighting.

But if you have only php code, without the <?php ?> start/end tags then php would work and xml would be badly broken.


Just including PHP in xml makes people start thinking about this wrongly (IMHO)... See issue #725, for example. They see PHP handled by XML and get the idea that that's perhaps where we should dump other templating languages.

Should XML also handle Mojolicious? ERB? Handlebars? etc... I think it's obvious this is not the correct path. The correct way to specify a grammar for PHP would look something like:

export default function(hljs) {
  return {
    subLanguage: 'xml',
    contains: [
      {
        begin: '<\?(php)?', end: '\?>',
        subLanguage: 'php-code'
      }
    ]
  };
}

If we just made this change today (moving the existing php to php-code) this should "just work" regarding auto-detect. But now we'd have two different languages that PHP could be detected as, depending on whether it was enclosed in <?php ?> tags or not... php or php-code. This seems suboptimal.

Also, this presents a new problem when someone wants to manually specify the language. Should they use php or php-code?

I'm not 100% sure what the best course of action is here, but I think we need to figure this out and slide it into version 10, as it's likely going to be another breaking change.

@joshgoebel joshgoebel added language enhancement An enhancement or new feature labels Dec 21, 2019
@joshgoebel
Copy link
Member Author

@egor-rogov Any thoughts/ideas?

@egor-rogov
Copy link
Collaborator

I agree that current PHP support which is buried inside XML doesn't look right.

I am not absolutely sure but probably we can combine php and php-code in your example above? Something like define php-code locally in php so that you can never use it by itself. Anyway, the idea is to move PHP support from XML into PHP in order to (1) make it intuitive and (2) rid XML grammar from unknown number of future templating languages.

But what about #725? Won't we have problems with template tags for PHP inside <script> that way? Or we'll have to include all templating languages into sublanguages list?

subLanguage: ['actionscript', 'javascript', 'handlebars', 'xml']

Other idea may be to teach the core the templating/preprocessing languages concept. So that it could make a pass over the source code with, say, handlebars, and then make a second pass over "unrecognized" parts with some other language, say, XML. It looks like the right thing to me, but it evidently requires a lot of rework.

@joshgoebel
Copy link
Member Author

joshgoebel commented Dec 21, 2019

Something like define php-code locally in php so that you can never use it by itself.

Well sure, you can attach them at the hip or bury one within the other but that doesn't solve the problem that one works for tagged php and the other works for naked php. You still have the problem of how do people pick one or the other if they want to manually specify... and even if they let auto-detect do it do we really want to hand out "php" or "php-code" as the language names?

But what about #725? Won't we have problems with template tags for PHP inside <script> that way? Or we'll have to include all templating languages into sublanguages list?

Not sure I follow... it should "just work" unless the script is inside yet another sublanguage and that breaks things... I'm really not sure how sublanguages work completely much less layered sublanguages... It was was any other tag than a script tag I think it would just work... would need to test.

Other idea may be to teach the core the templating/preprocessing languages concept. So that it could make a pass over the source code with, say, handlebars, and then make a second pass over "unrecognized" parts with some other language, say, XML. It looks like the right thing to me, but it evidently requires a lot of rework.

Yeah, that would take some real thought. First we should probably look more closely at sublanguages and what they do and can't do. I think they do a lot.

@joshgoebel
Copy link
Member Author

joshgoebel commented Dec 21, 2019

But what about #725? Won't we have problems with template tags for PHP inside <script> that way? Or we'll have to include all templating languages into sublanguages list?

The example actually shows PHP already "just working" inside script... I presume that was XML parsing, though I don't think they say.

@egor-rogov
Copy link
Collaborator

Something like define php-code locally in php so that you can never use it by itself.

Well sure, you can attach them at the hip or bury one within the other but that doesn't solve the problem that one works for tagged php and the other works for naked php.

Actually I was thinking about what seems an ideal solution: one php language with accommodates both naked and tagged PHP. But what I suggested won't work of course, you're right.
Maybe it could be possible with some sort of before-highlighting plugin that sniffs the source code for the presence of <? and ?> and makes a decision what grammar to use?.. Just a thought.

@egor-rogov
Copy link
Collaborator

The example actually shows PHP already "just working" inside script... I presume that was XML parsing, though I don't think they say.

Yes, and it show that Mojolicious (of which XML knows nothing) doesn't "just work". So what will happen if we cut out PHP from XML?

@joshgoebel
Copy link
Member Author

joshgoebel commented Dec 22, 2019

So what will happen if we cut out PHP from XML?

Then xml will only highlight XML - not php. You'd use php (or something) to highlight PHP and the auto-detect would do the right thing if both HTML and PHP was detected in a single file.

That's the easy part.

@joshgoebel
Copy link
Member Author

One thought I've had but it'd be a bit of a change:

php.js

function(hljs) {
  aliases: ['php', 'php3', 'php4', 'php5', 'php6', 'php7'],
  // runs both and chooses the best match, externally this grammar is always presented as
  // a singular "php" grammar (for class naming, `lang-php`, etc.)
  multiLanguage: ["_php-code", "_php-template"]
}

This would be helpful for things like shexc also, which has multiple modes (or even the css grammar this has come up before).

These might not actually even be full "grammars" in the HLJS sense... they might be completely self-contained inside the single grammar module:

function(hljs) {
  aliases: ['php', 'php3', 'php4', 'php5', 'php6', 'php7'],
  // multiple start modes
  multiMode: [PHP_CODE_MODE, PHP_TEMPLATE_MODE]
}

@joshgoebel
Copy link
Member Author

joshgoebel commented Dec 25, 2019

As a more immediate solution i think I'd be in favor of two PHP grammars:

  • php (template)
  • php_snippet

PHP being a simple template wrapper like vbscript-html that ties XML and php_snippet together... if a language actually gets highlighted as php or php_snippet depending, that doesn't seem like the worst outcome... better than xml and php IMHO.

Now:

  • PHP templates are detected as xml
  • Raw php code (snippets) are detected as php

Recommend:

  • PHP templates are detected as php
  • Raw php code (snippets) are detected as php_snippet

A slightly less breakage approach:

  • PHP templates are detected as php-template
  • Raw php code (snippets) are detected as php

This would preserve the history meaning of php, and only change xml => php-template.

@joshgoebel joshgoebel pinned this issue Feb 1, 2020
@joshgoebel
Copy link
Member Author

@allejo Perhaps you could read thru this and add any thoughts if you have any? If we're going to make some breaking changes here we need to figure this one out before v10.

I'm just not 100% sure on how to best proceed here.

@taufik-nurrohman
Copy link
Member

Just in case you want to keep the old class names, or just because you are too lazy to update the old class names:

let classes = /\b(lang(uage)?-)?(html|php|xml)\b/;
document.querySelectorAll('pre > code:not(.hljs)').forEach(block => {
    if (
        // Check for legacy class names
        classes.test(block.className)
        &&
        // PHP template mostly contains the closing `?>`
        // May get conflicted with `<?xml … ?>` string though
        /\?>/.test(block.textContent)
    ) {
        block.className = (block.className.replace(classes, "")) + ' php-template';
    }
});

hljs.initHighlighting();

@joshgoebel
Copy link
Member Author

Not possible to do the same thing as a plugin? :)

@taufik-nurrohman
Copy link
Member

taufik-nurrohman commented Aug 8, 2020

It will probably be a plugin to detect languages by file extension (because any files with extension .php can be anything).

But to be honest I’m still a bit annoyed by this and so this happened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature language
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants