Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss: (PHP) built-ins are currently purposely not highlighted. #2996

Closed
xCrimin4L opened this issue Feb 10, 2021 · 14 comments
Closed

Discuss: (PHP) built-ins are currently purposely not highlighted. #2996

xCrimin4L opened this issue Feb 10, 2021 · 14 comments
Labels
autoclose Flag things to future autoclose. enhancement An enhancement or new feature help welcome Could use help from community language

Comments

@xCrimin4L
Copy link

Describe the issue

When PHP is put into the pre and code tags, everything renders correctly except certain functions. They are missing and just left as regular text while other functions get the .

Which language seems to have the issue?

Seems to be PHP but maybe other languages also that need the class="hljs-keyword".

Are you using highlight or highlightAuto?

I've tried both and both are giving me the same exact problem.

Sample Code to Reproduce

https://jsfiddle.net/b2dj79qh/

Expected behavior

You should see "json_decode" & "file_get_contents" functions white as the other function "isset" is syntaxed.

Additional context

I'm pretty sure they are all considered the same (isset, file_get_contents, json_decode, etc...), so I feel like they should all be the same color no?

@xCrimin4L xCrimin4L added bug help welcome Could use help from community language labels Feb 10, 2021
@joshgoebel
Copy link
Member

joshgoebel commented Feb 10, 2021

There (isset, etc.) are reserved words. https://www.php.net/manual/en/reserved.keywords.php

PHP does not currently highlight built-ins such as file_get_contents or json_decode.

If there is a full list of built-ins available I'd be curious to see it, but I worry it might be too large (since PHP is also included in our common build). I'd prefer to perhaps see the blahBlah() pattern detected instead and flagged as a generic function call. See #2500

@joshgoebel joshgoebel added the autoclose Flag things to future autoclose. label Feb 10, 2021
@joshgoebel
Copy link
Member

joshgoebel commented Feb 10, 2021

We need to get the details here on how long a full list would be and see if anyone is willing to work on generic function dispatch detection for PHP. If neither of these seems to be viable then this will be auto-closed at some point as "expected behavior" - with #2500 tracking the overall issue (of improved fidelity across all grammars for function dispatch).

@joshgoebel joshgoebel changed the title (PHP) No <span class="hljs-keyword"></span> being put on functions Discuss: (PHP) built-ins are currently purposely not highlighted. Feb 10, 2021
@joshgoebel joshgoebel added enhancement An enhancement or new feature and removed bug labels Feb 10, 2021
@Ayesh
Copy link
Contributor

Ayesh commented Feb 11, 2021

Here is a list: https://www.php.net/manual/en/indexes.functions.php

I don't think highlighting built-in functions at the cost of storing this many functions is worth it. I think even the built-in interfaces list is a far fetch to be honest.

@joshgoebel
Copy link
Member

I don't think highlighting built-in functions at the cost of storing this many functions is worth it.

Yikes. Fully agree.

I think even the built-in interfaces list is a far fetch to be honest.

You mean the built_in list we already have? I might agree with you (with only a quick glance). Are these things not likely to show up in actual PHP code or snippets? These lists don't exist solely for highlighting purposes but also to allow us to "autodetect" if a given snippet of code is PHP or not, etc...

But a list of pure interfaces is no help at all if interfaces aren't commonly named in actual PHP code...

Does PHP have the common convention that CamelCasedThings are always interfaces/classes?

@Ayesh
Copy link
Contributor

Ayesh commented Feb 11, 2021

But a list of pure interfaces is no help at all if interfaces aren't commonly named in actual PHP code...

All those interfaces are somewhat niche interfaces, but I think Throwable and Closure are used fairly often. But you are right, they are great indicators for language identification. I'm working on a couple PRs for PHP 8.0 and PHP 8.1-specific string patterns, hopefully to make it more complete with recent changes.

Does PHP have the common convention that CamelCasedThings are always interfaces/classes?

Internally, functions are declared with snake_case, and interfaces/classes with CamelCase. However, they are all case-insensitive. It's unlikely to not use that pattern, but it's not technically wrong code if someone were to use SnakeCase functions or SCREAMCASE class names for example.

@taufik-nurrohman
Copy link
Member

taufik-nurrohman commented Feb 11, 2021

Does PHP have the common convention that CamelCasedThings are always interfaces/classes?

Unfortunately, PHP classes and functions are case-insensitive.

@joshgoebel
Copy link
Member

joshgoebel commented Feb 11, 2021

Unfortunately, PHP classes and functions are case-insensitive.

OH grrrr... and therefore we have to flag the language as case_insensitive: true which prevents us from doing useful things with those casing conventions, even if we wanted to... though we still could with callbacks (that checked the case using JS), but I'm not sure I want to encourage that until we have some way of not paying the callback penalty during auto-detection.

All those interfaces are somewhat niche interfaces ... you are right, they are great indicators for language identification.

If they are rarely used (niche) then they might be a strong signal but quite expensive (in terms of us maintaining a big list) and in an overall sense not useful... if 95% of PHP code wouldn't ever reference them - we should drop them. The kind of lists we want are lists where 95% of PHP code/snippet MIGHT use those words (which is why keywords are always a good bet)...


The problem with "selective" lists comes when someone says why do you highlight the popular built-in SnapperInterface but not the built-in BoogerInterface... which leads back to perhaps trying to find a way to highlight ALL interfaces... (by case convention) and have a smaller list only for auto-detection purposes. This is also a discussion happening with regards to Javascript.

@Ayesh
Copy link
Contributor

Ayesh commented Feb 11, 2021

I understand that removing the list of built-in interfaces is a breaking change in a semver sense. With PHP getting more and more object-oriented, I don't think its feasible to keep an up-to-date list of interfaces or internal classes. In fact, there was a proposal just today to introduce a new interface.

Language constructs like empty and isset are very strong indicators in language detection, and we often try to keep the keywords list minimal. For example, the recent Enumerations proposal uses case keyword that is already a reserved keyword.

Case sensitivity in PHP is all over the place too. Variables are case-sensitive, but function names, classes, enums, attributes, etc are case-insensitive. We are generally moving into case-sensitive approach though, but it's highly unlikely that existing structs will change. Constants for example, lost their ability to declare a case-insensitive constant, and recently added named parameters are also case-sensitive.

Personally, I'd all in favor to:

  • Remove built-in interfaces and other class names. Except for Throwable and perhaps ArrayAccess, the other ones are quite niche ones, and there are more to come. It will only add maintenance burden with very little gain. This reduces the php.js file by ~1.5KB pre-gzip.
  • Remove magic constants. They always follow __[A-Z_]+__ pattern. We don't include a built-in list in highlight js, but PHP also has magic methods; several magic methods exist in PHP, and they all follow __[A-z] pattern. Unlike the built-in interfaces, we use these magic constants frequently. __DIR__ and __FILE__ being some of the most used; good for language-detection. Reduces php.js by 84 bytes.

@joshgoebel
Copy link
Member

joshgoebel commented Feb 11, 2021

Thanks so much for feedback.

Variables are case-sensitive, but function names, classes, enums, attributes, etc are case-insensitive.

What about keywords? That's the real issue, we don't have a way to do case insensitive keywords apart from making the grammar case insensitive. We could maybe hack it, but I'm not thrilled with the idea.

Remove built-in interfaces and other class names.

Makes sense to me, but we still need to highlight them - or else PHP will look very "dry". Would doing so via CamelCase work? If so we still have to tackle the case sensitivity issue to even be able to detect camelcase...

I understand that removing the list of built-in interfaces is a breaking change in a semver sense.

We do not consider highlighting changes within grammars to be "breaking" changes in a semver sense - the grammars are always improving and in flux. We do try to avoid going the wrong direction with a grammar though - ie they don't get "worse" often.

Unlike the built-in interfaces, we use these magic constants frequently.

That's usually a reason they should stay. One could argue to just include the most popular, but we aren't that tight on space... so if there are only a handful it's easy to just list them all... unless there are really 200 and we only have a few now, in which case revisiting trimming a few more could be done... Just replacing with a __A-Z__ rule doesn't provide near the confidence than it's PHP as the actual named magic constants. [of course many languages share these, but that's a diff problem]

@Ayesh
Copy link
Contributor

Ayesh commented Feb 12, 2021

What about keywords? That's the real issue, we don't have a way to do case insensitive keywords apart from making the grammar case insensitive. We could maybe hack it, but I'm not thrilled with the idea.

Keywords are case-insensitive as well. There are some changes in PHP 8.0 to allow reserved keywords be part of namespaces for example, but they will keep its semantics.

That's usually a reason they should stay...

I think you are right, special with language-detection. PHP language detection is a bit difficult as-is, with its following c-like syntax/comments and __A-Z__ alone isn't a very string signal.


Remove built-in interfaces and other class names.

Makes sense to me, but we still need to highlight them - or else PHP will look very "dry". Would doing so via CamelCase work? If so we still have to tackle the case sensitivity issue to even be able to detect camelcase...

I think we have great title highlighting already. I tested with ArrayAccess (currently listed under built_in) removed, and both Factory (not listed) and ArrayAccess are highlighted as title elements. I don't think removing the build_in interface/class will make visual changes; only in detection.

image

@joshgoebel
Copy link
Member

I'm talking built highlighting them as keywords without context. title is easier because you can realize it's a class, interface etc vs something like: (I don't know PHP at all, so this is just made up!!!)

$arr = ArrayAccess.new()
ArrayAccess.fastSort($arr)

Currently we highlight ArrayAccess there, not as a title, but as a built_in... does PHP have no static methods such that interfaces are never referenced in such a fashion? Did you not think of this, or just don't see it as a big loss?

@taufik-nurrohman
Copy link
Member

taufik-nurrohman commented Feb 12, 2021

Some tests that need to be performed:

class HighlightMe {}
interface HighlightMe {}

class HighlightMe extends HighlightMe {}
class HighlightMe implements HighlightMe {}
$a = new HighlightMe;
$b = new HighlightMe();
public HighlightMe $a;
public array $b = [];
if ($a instanceof HighlightMe) {}
$a = HighlightMe::const;
$b = HighlightMe::method();
use HighlightMe;
use HighlightMe as HighlightMe;
function foo(HighlightMe $a, array $b, $c = null) {}
function foo(): HighlightMe {}

@joshgoebel
Copy link
Member

joshgoebel commented Feb 14, 2021

Remove built-in interfaces and other class names. Except for Throwable and perhaps ArrayAccess, the other ones are quite niche ones, and there are more to come. It will only add maintenance burden with very little gain. This reduces the php.js file by ~1.5KB pre-gzip.

@Ayesh I'd say if you want to do this along with the other work you're doing on PHP lately this would be a small improvement. We may want to move the few remaining to a _ attribute inside keywords (since we don't have a nice way to say "keywords, but do not highlight") vs built_in so that NO interfaces are highlighted (rather than just randomly highlighting the few we include) for now.

@joshgoebel
Copy link
Member

Closing this original issue as #wontfix, expected behavior in agreement with @Ayesh:

I don't think highlighting built-in functions at the cost of storing this many functions is worth it.

We only highlight reserved words - not ALL built-ins (which is a huge list), though there is already separate discussion to highlight all function dispatch as part of the #2500 initiative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autoclose Flag things to future autoclose. enhancement An enhancement or new feature help welcome Could use help from community language
Projects
None yet
Development

No branches or pull requests

4 participants