Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend cpp keyword set #3178

Merged
merged 37 commits into from
May 17, 2021
Merged

Extend cpp keyword set #3178

merged 37 commits into from
May 17, 2021

Conversation

krisvanrens
Copy link
Contributor

@krisvanrens krisvanrens commented May 5, 2021

Added a set of commonly used C++ keywords/identifiers. We can't add them all of course, but things like optional and variant which are common vocabulary types since 2017 should be in there in my opinion.

I would also like to order the lists in the language file lexicographically, making extension easier (without having to search). Is this OK? I will update this PR if desired.

Changes

  • Extend set of common keywords/identifiers for C++.

Checklist / TODO

  • Remove redundant primitive types from keyword list.
  • Cleanup keyword/function list.
  • Update/review tests where needed.
  • Explode dense lists and order lexicographically.
  • Update the changelog at CHANGES.md.

src/languages/cpp.js Outdated Show resolved Hide resolved
src/languages/cpp.js Outdated Show resolved Hide resolved
@joshgoebel
Copy link
Member

I would also like to order the lists in the language file lexicographically,

We can do that when done in a separate commit at the end so it doesn't complicate the review.

We also should go ahead and explode KEYWORDS into a proper array itself (one word per line, etc).

@joshgoebel
Copy link
Member

Would be nice to have this in v11. :) Probably getting close to release.

@krisvanrens
Copy link
Contributor Author

Ha, I'm working on it as we speak ;-) Still figuring out some of the JS stuff going on in the language file. I will push a new version later this evening (evening for me, that is).

@krisvanrens
Copy link
Contributor Author

Hey!

I've cleaned up some parts according to Josh's comments above. However, three tests fail for some reasons not completely clear to me. The first one is an autodetection error from a C++ comment test that is detected as 'sqf' now, and another one where Arduino C++ is detected as C++ instead of Arduino. This latter one probably also triggers the last test to fail.

Furthermore I've got a few remarks/questions:

  • Handling of function highlighting is not bug-free. E.g. when a C++ function is a function template that is called with explicit template arguments it's not being detected as a function (e.g.: func() (OK)m func<int>() (not OK)). I'm not sure how to fix this yet as I don't feel very confident with the HL.js language code yet.
  • I'm not sure exactly what I changed when I moved the COMMON_HINTS array out of the CPP_KEYWORDS _relevant_hints field. The CPP_KEYWORDS object is used in numerous places. It feels kind of random/messy when I look at it -- but then again, I'm not really understanding the language parsing file yet.
  • In the reorganization and cleanup of the types/keywords/function arrays, it was inevitable that I had to sort them.
  • I'm not really sure how to beautify the über-long regex for containers.
  • Handling of common objects that are used through their operators is not working correctly yet. For example: std::cout << "blah"; should highlight cout because it is a very commonly used object. It is an object that is used through operator<<(), this kind-of is a function, but not really, and HL.js doesn't detect it as such because no parentheses are used. This and probably others as well need a specific COMMON_OBJECTS class or something to trigger the highlighting in the right context.
  • How far should I go in the adding of keywords/common functions? Right now there's a set that probably covers about 75% of all common C++ code (this is really wild guess). But I can imagine someone who uses this for some more exotic code (some of the code that tends to be presented at conferences via the likes of reveal.js etc.) there may be inconsistencies because not all hundreds and hundreds of common C++ functions are in there. There are still quite a number missing. I'd like to extend the list some more, but I'm not sure where to stop.

At least I checked all the examples and have added a few TODOs for this language parsing file. I might be able to pick these up in a couple of weeks (I'm too busy now). It feels like this language parsing file may need a proper update/refresh anyway.

@joshgoebel
Copy link
Member

The first one is an autodetection error from a C++ comment test that is detected as 'sqf' now, and another one where Arduino C++ is detected as C++ instead of Arduino. This latter one probably also triggers the last test to fail.

I can look into these, probably Arduino needs to be updated based on other changes you made here.

Handling of function highlighting is not bug-free. E.g. when a C++ function is a function template that is called with explicit template arguments it's not being detected as a function (e.g.: func() (OK)m func<int>() (not OK)). I'm not sure how to fix this yet as I don't feel very confident with the HL.js language code yet.

We'd need to expand the regex to match the <> part as well. Examples of this can be found in other grammars but we may want to use the new multi-match stuff here and try to simplify.

I'm not sure exactly what I changed when I moved the COMMON_HINTS array out of the CPP_KEYWORDS _relevant_hints field.

Before the list was mixed (having more than functions) so it needed to be here so the non-functions could still be found and counted (since FUNCTION_DISPATCH wouldn't cover them). If it now only contains functions then it no longer needs to be in CPP_KEYWORDS, so it was right to remove it.

The CPP_KEYWORDS object is used in numerous places. It feels kind of random/messy when I look at it -- but then again, I'm not really understanding the language parsing file yet.

That's because the grammar has too many nested modes... and keywords don't inherit, so if you want the keywords to be present in every sub-mode then you have to keep specifying them over and over. Flattening the grammar to avoid all these nested modes would help with this.

I'm not really sure how to beautify the über-long regex for containers.

Why does it need to be beautified? I mean you could put the list in an array and then built it with the regex helpers (regex.concat, regex.either, etc)... that would be an improvement.

Handling of common objects that are used through their operators is not working correctly yet. For example: std::cout << "blah"; should highlight cout because it is a very commonly used object. It is an object that is used through operator<<(), this kind-of is a function, but not really, and HL.js doesn't detect it as such because no parentheses are used. This and probably others as well need a specific COMMON_OBJECTS class or something to trigger the highlighting in the right context.

Perhaps, though I thought I remembered @klmr saying something about not highlighting these, or am I thinking of something else?

  • How far should I go in the adding of keywords/common functions? Right now there's a set that probably covers about 75% of all common C++ code

This is sufficient. The list is only for heuristics - all functions are highlighted either way. Someone using the library for serious work should be tagging languages in any case, NOT relying on auto-detect which is not always reliable.


This looks like it's in a fairly good spot if I can fix the test issues. We'll see about getting this merged then we can discuss what's next for another PR.

@joshgoebel
Copy link
Member

It looks like the keywords list still includes some types... int, etc... those should be in TYPES.

@krisvanrens
Copy link
Contributor Author

It looks like the keywords list still includes some types... int, etc... those should be in TYPES.

Mmm. Yes, you're right. Let me fix that. I initially went for "this is the list of reserved keywords in C++" but it contains type keywords as well of course. In the context of a highlighting tool it doesn't matter if a keyword is reserved or not of course.

@krisvanrens
Copy link
Contributor Author

We'd need to expand the regex to match the <> part as well. Examples of this can be found in other grammars but we may want to use the new multi-match stuff here and try to simplify.

This is fixed now.

That's because the grammar has too many nested modes... and keywords don't inherit, so if you want the keywords to be present in every sub-mode then you have to keep specifying them over and over. Flattening the grammar to avoid all these nested modes would help with this.

This can be fixed in later versions I would say.

I'm not really sure how to beautify the über-long regex for containers.

Why does it need to be beautified? I mean you could put the list in an array and then built it with the regex helpers (regex.concat, regex.either, etc)... that would be an improvement.

Yeah, that's exactly what I meant. I left it in for now, this could be picked up later on.

Handling of common objects that are used through their operators is not working correctly yet. For example: std::cout << "blah"; should highlight cout because it is a very commonly used object. It is an object that is used through operator<<(), this kind-of is a function, but not really, and HL.js doesn't detect it as such because no parentheses are used. This and probably others as well need a specific COMMON_OBJECTS class or something to trigger the highlighting in the right context.

Perhaps, though I thought I remembered @klmr saying something about not highlighting these, or am I thinking of something else?

Hmmm. OK, I can look up this conversation for a later version. For now they are in the functions list (which is not entirely correct).

This is sufficient. The list is only for heuristics - all functions are highlighted either way. Someone using the library for serious work should be tagging languages in any case, NOT relying on auto-detect which is not always reliable.

This looks like it's in a fairly good spot if I can fix the test issues. We'll see about getting this merged then we can discuss what's next for another PR.

OK.

I think the highlighting is working quite well now. I've been evaluating the test examples and didn't see any weirdness. It turns out quite a bit of the things didn't work before (many of the keyword functions). Mostly because of the lack of matching for <..> before function parentheses.

In a next PR I would like to dig in a little deeper to simplify the grammar. It really feels like some parts of it are dead, wrong or not up-to-date with the latest C++ standard.

@krisvanrens
Copy link
Contributor Author

Oh, and by the way, the Arduino 'default' markup test fails again. In the developer markup tester it is auto-detected as JavaScript. I think it's not detected as Arduino code in the tests as well.

@joshgoebel
Copy link
Member

joshgoebel commented May 14, 2021

Oh, and by the way, the Arduino 'default' markup test fails again.

Because Arduino is ALSO cpp and uses that same grammar... so you need to update those markup tests as well (keywords are now type, etc)... There is a commented line in markup/index.js that can rewrite all the tests results for you but if you use it you need to read the diff carefully to make sure it's doing what you expect.

Please don't rewrite history here but just add commits so I can more easily review.

In the developer markup tester it is auto-detected as JavaScript.

No matter, markup tests have nothing to do with auto-detection.

I think it's not detected as Arduino code in the tests as well.

I don't see a problem... run ./tools/checkAutodetect.js... all is good.

@joshgoebel joshgoebel added this to the 11.0 milestone May 14, 2021
@krisvanrens
Copy link
Contributor Author

Oh, and by the way, the Arduino 'default' markup test fails again.

Because Arduino is ALSO cpp and uses that same grammar... so you need to update those markup tests as well (keywords are now type, etc)... There is a commented line in markup/index.js that can rewrite all the tests results for you but if you use it you need to read the diff carefully to make sure it's doing what you expect.

I see, yes. I now fixed it using the results from the test output of Node, because the 'Arduino' language highlighting is not in the developer markup checker.

Please don't rewrite history here but just add commits so I can more easily review.

Uhm, I'm not sure what I did wrong then -- this was not my intention. I pulled in your commits with git pull -r then committed my additions.

In the developer markup tester it is auto-detected as JavaScript.

No matter, markup tests have nothing to do with auto-detection.

OK.

@joshgoebel
Copy link
Member

joshgoebel commented May 14, 2021

because the 'Arduino' language highlighting is not in the developer markup checker.

It's whatever you choose to build... build -t browser :common arduino would add it, etc.

@joshgoebel
Copy link
Member

Uhm, I'm not sure what I did wrong then -- this was not my intention. I

No worries could just be me reading it wrong... a lot of the more git-savvy contributors do that and often it makes reviews harder.

}

<span class="hljs-comment">// Disable overload for already valid operands.</span>
<span class="hljs-keyword">template</span>&lt;<span class="hljs-keyword">class</span> <span class="hljs-title class_">T</span>, class = std::<span class="hljs-keyword">enable_if_t</span>&lt;!impl::is_streamable_v&lt;<span class="hljs-keyword">const</span> T &amp;&gt; &amp;&amp; std::is_convertible_v&lt;<span class="hljs-keyword">const</span> T &amp;, std::wstring_view&gt;&gt;&gt;
<span class="hljs-keyword">template</span>&lt;<span class="hljs-keyword">class</span> <span class="hljs-title class_">T</span>, <span class="hljs-type">class</span> = std::<span class="hljs-type">enable_if_t</span>&lt;!impl::is_streamable_v&lt;<span class="hljs-keyword">const</span> T &amp;&gt; &amp;&amp; std::is_convertible_v&lt;<span class="hljs-keyword">const</span> T &amp;, std::<span class="hljs-type">wstring_view</span>&gt;&gt;&gt;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe class is a type here? We should probably remove it from the list of types and (in a follow-up PR) add it instead with a class [IDENT] multi-matcher rule as done in many other languages - allowing us to also flag the class name as a title.class, etc...

Copy link
Member

@joshgoebel joshgoebel May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also pretty sure class is not a type but a keyword... you use class to specify classes, it's not uses (as a type) to declare a variable class a... you'd say NameOfClass a... so in that usage:

class Booger { 
}

It would be a keyword.

For our intents and purposes it's also a keyword here:

template <class T> // template argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right. I'm still trying to get used to the internal conflicts between the language lawyer and the highlight mindset. I'll fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to move auto, class, struct, union and enum, as they're all somewhere in the twilight zone of being a keyword/storage specifier and a type specifier. I don't have a strong opinion about this regarding the highlighting class, but language-strictly spoken they're probably more type than keyword. But at least they're considered 'special' for the highlighter -- this is the most important fact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And indeed in templates class serves a double role as a typename specifier. With some regular expression magic this could be made a special case, but it's fine like it is.

Copy link
Member

@joshgoebel joshgoebel May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to move auto, class, struct, union and enum,

I'd say broadly these are all keywords (from our highlighting POV and historically how we treat them across many grammars). We don't currently differentiate between "type of storage" vs "type of type"... I think the context of their usage is relevant as well... lets take a made up class-based language (very TypeScript like):

(this is me just thinking about this seat of my pants)

// define a class
class Person { }

// a new variable of class Person
var bob : class<Person> = Person.new(...)

The first class is a keyword. You could argue the second class is a bit closer to type (and that could be worth discussing), but for convenience we'd probably just call it a keyword, rather than try to contextually parse the context. Person of course would be title.class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe class is a type here? We should probably remove it from the list of types and (in a follow-up PR) add it instead with a class [IDENT] multi-matcher rule as done in many other languages - allowing us to also flag the class name as a title.class, etc...

Could you point me at a language definition file that uses these multi-matcher rules, or at least a language definition that is a very good example of how a language definition should be done? I'm going to try to study this stuff for the next PR. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at Wren.

@joshgoebel
Copy link
Member

Very very close. :) I think just fix the class issue and this is good to go.

@joshgoebel
Copy link
Member

@klmr Would you mind quickly reviewing the list of keywords/types/hints just as a sanity check incase there are other issues I should flag but I'm not because I don't know C++?

@krisvanrens
Copy link
Contributor Author

Uhm, I'm not sure what I did wrong then -- this was not my intention. I

No worries could just be me reading it wrong... a lot of the more git-savvy contributors do that and often it makes reviews harder.

Ah, yes I see what you mean. I was not aware of the merge strategy that you have. For some reason I assumed you don't like merge commits. Will you squash all the commits from this PR at merge-time?

@joshgoebel
Copy link
Member

joshgoebel commented May 14, 2021

Ah, yes I see what you mean. I was not aware of the merge strategy that you have.

Well, typically the scope of the PRs is small enough that squash works well. It's not set in stone. If someone has a very clean commit history and I see real value in each individual commit then I'll rebase/merge it. 95% of the time we squash.

  • many times the commit history is a mess
  • many other times the individual commits just aren't significant enough to be left stand on their own

For some reason I assumed you don't like merge commits.

I don't like messy commit histories.

Will you squash all the commits from this PR at merge-time?

Most likely.


About rebasing though:

What is hard about rebasing is it's hard to review the changes on GitHub because they keep changing... when I'm working with someone on a PR and they push a tiny change/fix I want to see JUST that change (often it might be 1-2 lines)... that way I can confirm it matches what I was expecting. Often when someone instead rebases you lose track of "what just changed" because they smartly merged it back into their prior commits.

Rebasing can be great (as the very last step)... and along the way add fixup: or squash: style commits... it can just be hard to follow if it's done over and over in the middle of working on a PR. :)

@joshgoebel
Copy link
Member

I would probably have given you "auto" as a type (I flagged it before in my head but didn't deal it worth mentioning), but I don't have a strong argument either way. :-)

@joshgoebel
Copy link
Member

joshgoebel commented May 14, 2021

@krisvanrens You should consider joining our Discord if you want to dig into C++ further in the future. I'd be happy to explain what I mean by a "flatter" syntax. I don't know if C++ is "wrong/outdated" now so much as "overly-complex". Certainly possible it's also both. :)

The c grammar could also use much of the same thinking if you were willing.

@krisvanrens
Copy link
Contributor Author

@krisvanrens You should consider joining our Discord if you want to dig into C++ further in the future. I'd be happy to explain what I mean by a "flatter" syntax. I don't know if C++ is "wrong/outdated" now so much as "overly-complex". Certainly possible it's also both. :)

The c grammar could also use much of the same thinking if you were willing.

I will definitely join the Discord channel for info. I've got a very busy month ahead, but after that there's time for this. Thanks for your patience anyway :-)

My guess is that if I take on the C++ grammar, the C-one should be doable as well.

@joshgoebel
Copy link
Member

Still needs changelog entries. :)

CHANGES.md Show resolved Hide resolved
@klmr
Copy link
Contributor

klmr commented May 16, 2021

things like optional and variant which are common vocabulary types since 2017 should be in there in my opinion.

As Josh alluded to I strongly disagree with this. I’ve explained the reasons for this here: #3093 (comment) — tl;dr: standard library types and functions are not reserved words, and highlighting them in builtins leads to numerous false positives. They also don’t have any special semantics compared to user-defined types and functions, so giving them special highlighting even where correct serves little purpose in making code more readable; on the contrary, it creates a distraction by emphasising an irrelevant distinction.

Compare this with IDEs and source code editors: the good ones don’t highlight these identifiers either (or rather, they highlight them the same as user-defined types and functions).

In fact, previous revisions of the C++ grammar underwent a rigorous cleanup to get rid of arbitrary identifiers that get highlighted incorrectly and I strongly advocate keeping it that way.

@joshgoebel
Copy link
Member

I'm wrapping this up and merging now that we seem at a nice spot... This PR is already too large. Any remaining work can fall into a second PR.

@krisvanrens
Copy link
Contributor Author

I'm wrapping this up and merging now that we seem at a nice spot... This PR is already too large. Any remaining work can fall into a second PR.

OK. I had a change ready for the remark I made to Konrad about the list of additional types. I can put it into a future PR.

@joshgoebel joshgoebel merged commit 3749c1a into highlightjs:main May 17, 2021
@joshgoebel
Copy link
Member

AFAIK, we also have to remove the regular expression CPP_PRIMITIVE_TYPES now. It matches all _t types, which is considered incorrect now (e.g. there's false positives on trait helper alias templates).

What is a "trait helper template", can someone show me an example?

@klmr
Copy link
Contributor

klmr commented May 17, 2021

Traits in C++ are types which describe properties of other types. They’re used for compile-time metaprogramming. Trait helpers are convenience wrappers for trait classes ending in _v or _t, e.g. std::is_integral_v<int> inherits from true_type, whereas std::is_integral_v<float> inherits from false_type.

There are other trait helpers which describe types rather than values (e.g. std::make_unsigned_t<int>, which resolves to int), and those end in _t rather than _v.

@joshgoebel
Copy link
Member

Is the variety such that we couldn't handle this with a smarter regex? IE, exclude is*,*make, etc but allow the other _t types to shine thru?

@krisvanrens
Copy link
Contributor Author

Perhaps we could detect the template arguments and the std:: namespace indication, but then still if someone writes using namespace std; (kind of saying: I don't have to write std:: anymore, it is implied) and then defines a custom type alias that ends in _t it would lead to a false positive. I don't know what your opinion about corner cases like this is? There's similar cases in other languages I would suspect?

@joshgoebel
Copy link
Member

I don't know what your opinion about corner cases like this is?

That we shouldn't overly concern ourselves with them, perhaps they never come up at all.

I worry we're being too pre-emptive here. That can often result in bloated grammars with questionable benefits. I'm not yet convinced we need ANY further work here on _t types. We have had the _t rule for a long time now with no complaints about false positives that I recall. Perhaps they aren't happening, perhaps no one cares, or perhaps the existing behavior is somehow preferred?

I'd prefer to wait until someone raises an actual concern vs adding another list of keywords... there is a cost to all these explicit lists, they increase the file size of our bundle, etc...

@klmr
Copy link
Contributor

klmr commented May 17, 2021

Is the variety such that we couldn't handle this with a smarter regex? IE, exclude is*,*make, etc but allow the other _t types to shine thru?

I’m not sure I understand what that should accomplish: users are free to define their own names ending in _v or _t, and these obviously shouldn’t be highlighted. And the names in the standard library also shouldn’t be highlighted, analogous to my argument against highlighting other standard library names.

The clash between POSIX and C++ naming conventions is unfortunate but, ultimately, not something that can be solved (nor IMHO a real issue).

@joshgoebel
Copy link
Member

joshgoebel commented May 17, 2021

And the names in the standard library also shouldn’t be highlighted, analogous to my argument against highlighting other standard library names.

These situations are different though... in that the problem with standard library is we can't (or don't want to) include a large enough list... so we will always fail to highlight SOME standard library things making the highlighting appear inconsistent... false negatives. Resolving that doesn't add any complexity to the grammar.

This _t problem seems to be a case of false positives (of which no one has even filed an issue)... and it's remaining as a mode prevents us from adding yet another list of types from the cstdint header, (which do look like things I would expect to see highlighted). It's behavior is also 100% predictable and consistent even if not 100% correct, making it different than the highlighting of standard library functions.

@krisvanrens
Copy link
Contributor Author

And the names in the standard library also shouldn’t be highlighted, analogous to my argument against highlighting other standard library names.

These situations are different though... in that the problem with standard library is we can't (or don't want to) include a large enough list... so we will always fail to highlight SOME standard library things making the highlighting appear inconsistent... false negatives. Resolving that doesn't add any complexity to the grammar.

This _t problem seems to be a case of false positives (of which no one has even filed an issue)... and it's remaining as a mode prevents us from adding yet another list of types from the cstdint header, (which do look like things I would expect to see highlighted). It's behavior is also 100% predictable and consistent even if not 100% correct, making it different than the highlighting of standard library functions.

Mmm. Yes. I'm not sure what the signal "no one has filed an issue" in the light of type traits in C++ really means. It's an advanced feature not everyone dabbles with (from what I can tell looking around me), multiplied with the fact that there's only a few people filing issues (again, my feeling) 😏

I would very much agree to keep the grammar as simple as possible. However I'm still divided as to why or why not highlight other conventions if the *_t types are highlighted (e.g. the atomic_* types or *_v values). Choosing only one of them seems inconsistent, to me they feel equally-important. I would very much vouch for a regex-only approach here. Consistent highlighting for all standard library facilities is utopia if the grammar must be short.

Regarding the cstdint list; this must be done with a list..but it's not a long list and the types in it are quite relevant.

Is there a difference in how important the highlighting of 'types' vs. 'functions' vs. 'constants' etc. is considered for Highlight.js?

@klmr
Copy link
Contributor

klmr commented May 17, 2021

the problem with standard library is we can't (or don't want to) include a large enough list

That’s one problem. The other being that highlighting standard library names is not useful and creates many false positives when these names aren’t reserved and are routinely overridden in user code.

Regarding the cstdint names: for the same reason, I don’t particularly think they need to be highlighted, nor would I expect them to be highlighted. But I can see how others might want that (some editors do highlight e.g. size_t and int16_t) so I think it’s defensible. But user-defined type aliases ending in _t are idiomatic and extremely common in C++ code, so highlighting them all would lead to many false positives.

If you don’t want to include all names from cstdint manually, why not use something like this:

/u?int(_fast|_least)?(8|16|32|64)_t|u?int(max|ptr)_t|size_t|ptrdiff_t|nullptr_t/

That captures all types defined in cstdint and three related types (size_t, ptrdiff_t, nullptr_t) defined elsewhere (but often treated identically).

@joshgoebel
Copy link
Member

joshgoebel commented May 17, 2021

But user-defined type aliases ending in _t are idiomatic and extremely common in C++ code

I think we're conflating things a bit here. Our goal is not to only highlight built-ins or reserved types. If we know bob_data_t is a type alias (by common convention) then it can be highlighted with type - this is a positive benefit. type is NOT reserved to only built-in or reserved types. The same way we highlight classes in Ruby regardless of whether they are core library classes or user-defined classes, a class is a class. A type alias is a type alias.

If we wish to draw a distinction between built-in type aliases and user-defined ones we should consider doing that with classification - not be just ignoring the user-defined ones.

The false positive argument of trait helpers does have some merit... could we exclude those with just a negative look-ahead for <?

@klmr
Copy link
Contributor

klmr commented May 17, 2021

If we know bob_data_t is a type alias (by common convention) then it can be highlighted with type - this is a positive benefit.

Oh I agree, that would be awesome. The issue is that this introduces (substantial, not just occasional) inconsistencies since most user-defined types (/type aliases) are would not be highlighted for C++, since there is simply no way, based on syntax alone, to construct a heuristic with (IMHO) acceptable error rate to recognise type names in C++. There is just no sufficiently widespread convention. Even code that uses _t suffixes for some type aliases will commonly use many types and type aliases without that suffix.

@klmr
Copy link
Contributor

klmr commented May 17, 2021

The false positive argument of trait helpers does have some merit... could we exclude those with just a negative look-ahead for <?

Yes, that should be possible. There might be cases where this fails but these would be very rare and wouldn’t be catastrophic.

@joshgoebel
Copy link
Member

There is just no sufficiently widespread convention. Even code that uses _t suffixes for some type aliases will commonly use many types and type aliases without that suffix.

Well, now you've moved me right back to the middle again. LOL. If _t truly isn't a common or widespread convention then getting rid of the *_t rule makes more sense. Does _t signify anything useful at all? In a mixed codebase why would someone use _t for some type aliases but not all of them?

@krisvanrens
Copy link
Contributor Author

There is just no sufficiently widespread convention. Even code that uses _t suffixes for some type aliases will commonly use many types and type aliases without that suffix.

Well, now you've moved me right back to the middle again. LOL. If _t truly isn't a common or widespread convention then getting rid of the *_t rule makes more sense. Does _t signify anything useful at all? In a mixed codebase why would someone use _t for some type aliases but not all of them?

The typename postfix _t is a convention in the standard for many types (but not all, e.g. vector/string/etc.). What Konrad means here (correct me if I'm wrong) is that there's no "general consensus" on this convention in user land. Some code bases do it, most don't.

@joshgoebel
Copy link
Member

joshgoebel commented May 17, 2021

/u?int(_fast|_least)?(8|16|32|64)_t|u?int(max|ptr)_t|size_t|ptrdiff_t|nullptr_t/

I'd be ok with this in this use case, but generally I try and stay away from this for lack of maintainability... but this is clear enough. We'd probably want \b on both ends though with the middler wrapped in ()? We could also use variant or regex.either to split it up just a little if we so desired.

@klmr
Copy link
Contributor

klmr commented May 17, 2021

We'd probably want \b on both ends

Definitely, the regex wasn’t meant to be production ready since I wasn’t sure in which context it would be applied. In general we would want word boundaries, correct.

@krisvanrens
Copy link
Contributor Author

/u?int(_fast|_least)?(8|16|32|64)_t|u?int(max|ptr)_t|size_t|ptrdiff_t|nullptr_t/

I'd be ok with this in this use case, but generally I try and stay away from this for lack of maintainability... but this is clear enough. We'd probably want \b on both ends though with the middler wrapped in ()? We could also use variant or regex.either to split it up just a little if we so desired.

In my opinion this lack of maintainability, or rather: static definition of additional types is also a signal to future maintainers: "there's no room for an extensive list of other types here, we're being conservative about it."

I would have worked for me 😉

@joshgoebel
Copy link
Member

I read it more as:

bugger off unless your a regex expert, we're elite RegEx ninjas here

Though as I said this one ain't so bad. If we want to signal future maintainers though I'd prefer we write a readable note in the comments. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants