Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve highlighting consistency by special-casing common types #3234

Closed
wants to merge 3 commits into from
Closed

Improve highlighting consistency by special-casing common types #3234

wants to merge 3 commits into from

Conversation

krisvanrens
Copy link
Contributor

@klmr @joshgoebel

Follows up #3178

Description

In the current version of the C++ highlighting set is a regular expression rule called CPP_PRIMITIVE_TYPES that highlights POSIX-defined special types. However, this does not hold for C++ and leads to false positives in highlighting. Especially the _t convention for alias template helpers for type traits come to mind (e.g. std::decay_t<...> et al).

There is, however, a set of common language-defined types that for a large part have the _t suffix. These types should be highlighted and could be considered primitive types.

In this PR I propose to change the regular expression to match suffix _t for any type to be changed to a list of selected types for highlighting. It is the intent for this list to remain as minimal as possible, perhaps this should be clarified in a comment.

Changes

  • Change CPP_PRIMITIVE_TYPES to a list of selected types for highlighting.

Checklist

  • Checked build and test results + updated test results
  • Updated the changelog at CHANGES.md

@joshgoebel
Copy link
Member

What does the _t supposed to symbolize anyways? I always thought it was "type", hence this made a lot of sense...

@krisvanrens
Copy link
Contributor Author

What does the _t supposed to symbolize anyways? I always thought it was "type", hence this made a lot of sense...

Yeah it does represent 'type', but it is a POSIX convention that does not hold for C++. It does sort of hold for C (as per cppreference. Also C is often used in conjunction with POSIX system code where this is an explicit rule (see link in PR description).

I guess it's a form of Hungarian notation that does make sense, yes, but also leads to many false positives for C++.

Technically speaking the _t false positives like std::decay_t are types too (they're type aliases), but highlighting them as primitive types is not correct IMHO. There's quite an extensive set of trait helpers (i.e. many potential false positives), and accidentally making a special case out of every alias for these by using CPP_PRIMITIVE_TYPES is inconsistent with the wealth of other types in the language/library that are not treated as special.

@klmr
Copy link
Contributor

klmr commented Jun 10, 2021

Yeah, this looks generally good to me.

For consistency I would consider adding clock_t, time_t, sig_atomic_t, and, potentially, jmp_buf, fpos_t, va_list, FILE, streamoff and streamsize. They’re all in the same category: typedefs to primitive types (or, in the case of va_list, fpos_t, and FILE, to an opaque struct and in the case of jmpbuf to an array).

Also, what about the atomic typedefs from C?

@krisvanrens
Copy link
Contributor Author

For consistency I would consider adding clock_t, time_t, sig_atomic_t, and, potentially, jmp_buf, fpos_t, va_list, FILE, streamoff and streamsize. They’re all in the same category: typedefs to primitive types (or, in the case of va_list, fpos_t, and FILE, to an opaque struct and in the case of jmpbuf to an array).

Yes I guess that makes sense. Then in the meantime the C++ implementation has better C-highlighting than the C implementation 😏

Also, what about the atomic typedefs from C?

Yeah this makes sense as well. I will change the implementation to a regular expression like you suggested here before.

@krisvanrens
Copy link
Contributor Author

Hey!

I'm sorry for leaving this PR open for so long. I haven't had much time to work on it.

Now I created an inventory of all the additional types that are considered special. I also added some extra test content to cover them. In the current implementation the additional types are in an array ADDITIONAL_TYPES that is merged with array RESERVED_TYPES to fill the type field of CPP_KEYWORDS. However, I'd now like to create a regular expression to replace ADDITIONAL_TYPES.

Can anyone help me to have this regular expression ADDITIONAL_TYPES_RE working alongside the array RESERVED_TYPES? I tried a couple of things but my knowledge of HL.js data structures is lacking..

Any help is appreciated! Thanks in advance 😄

@joshgoebel
Copy link
Member

However, I'd now like to create a regular expression to replace ADDITIONAL_TYPES.

This isn't necessary, as compression will clean up much of the "wasted" space... and the readability aids future maintenance.

Technically speaking the _t false positives like std::decay_t are types too (they're type aliases), but highlighting them as primitive types is not correct IMHO.

I'm still hoping to get another pair of eyeballs on this as I still worry about killing the generic _t rules, even if it's not 100% "correct". The question is whether the highlighting overall does more harm than good (for the complexity), not whether it's 100% correct. And we've never defined harmful as "anything less than 100% correct". :-)

Then in the meantime the C++ implementation has better C-highlighting than the C implementation

When we're thru here if you're willing to circle back and fix C that's be great... but now that they have diverged such is the possibility, that one will become better than the other...

@krisvanrens
Copy link
Contributor Author

However, I'd now like to create a regular expression to replace ADDITIONAL_TYPES.

This isn't necessary, as compression will clean up much of the "wasted" space... and the readability aids future maintenance.

True, true, but I'm not sure it has any impact on the speed for instance. With the types added that Konrad suggested, it's quite a list now (92 items with a fair bit of partly overlapped words).

Technically speaking the _t false positives like std::decay_t are types too (they're type aliases), but highlighting them as primitive types is not correct IMHO.

I'm still hoping to get another pair of eyeballs on this as I still worry about killing the generic _t rules, even if it's not 100% "correct". The question is whether the highlighting overall does more harm than good (for the complexity), not whether it's 100% correct. And we've never defined harmful as "anything less than 100% correct". :-)

Well yeah another vote on how to handle this is a good idea. I feel, as a drive-by contributor, I should only provide advice about this matter. Personally I lean towards the treating-additional-types-special approach, but it's a trade-off between simplicity + some false positives vs. complexity + (IMHO) more correct (not 100%) highlighting.

Then in the meantime the C++ implementation has better C-highlighting than the C implementation

When we're thru here if you're willing to circle back and fix C that's be great... but now that they have diverged such is the possibility, that one will become better than the other...

Yes, I would like to do this. But due to summer holidays etc. etc. I won't be able to pick this up until late August probably.

@joshgoebel
Copy link
Member

True, true, but I'm not sure it has any impact on the speed for instance. With the types added that Konrad suggested, it's quite a list now (92 items with a fair bit of partly overlapped words).

92 is not a large list - we have languages with thousands of keywords. There is no run-time performance impact when using the keywords system - the lookups are done via pre-compiled object literal. There might be some impact if a MODE was used (with an either regex), but even then 92 is not a huge list and we typically still prefer readability over maximum theoretical performance.

IF speed became a higher priority I would prefer we somehow do this type of conversion at build or compile time - rather than hard code it into the source, making the source much harder to work with.

@krisvanrens
Copy link
Contributor Author

OK, I pushed the update of the additional type list for now and the extension of the tests to verify all of these.

I hope someone else with a C++ background can chime in to discuss this PR!

Copy link
Contributor

@klmr klmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@krisvanrens krisvanrens closed this by deleting the head repository Jan 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants