Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(cpp) grammar confused by complex Template << >> patterns #2102

Closed
sylveon opened this issue Aug 13, 2019 · 21 comments · Fixed by #2752
Closed

(cpp) grammar confused by complex Template << >> patterns #2102

sylveon opened this issue Aug 13, 2019 · 21 comments · Fixed by #2752
Labels
bug help welcome Could use help from community language

Comments

@sylveon
Copy link
Contributor

sylveon commented Aug 13, 2019

This code results in everything after L':' to be highlighted as if it was a string: (Editor: This is now resolved, the second issue still standing).

template<typename T>
class PropertyChangedBase {
public:
    winrt::event_token PropertyChanged(const winrt::Windows::UI::Xaml::Data::PropertyChangedEventHandler &value)
    {
        return m_propertyChanged.add(value);
    }

    void PropertyChanged(const winrt::event_token &token)
    {
        m_propertyChanged.remove(token);
    }

protected:
    template<typename U>
    void compare_assign(U &value, const U &new_value, std::wstring_view name)
    {
        if (value != new_value)
        {
            value = new_value;
            name.remove_prefix(name.find_last_of(L':') + 1);
            m_propertyChanged(*static_cast<T *>(this), winrt::Windows::UI::Xaml::Data::PropertyChangedEventArgs(name));
        }
    }

private:
    winrt::event<winrt::Windows::UI::Xaml::Data::PropertyChangedEventHandler> m_propertyChanged;
};

On this one, it just completely stops highlighting after the second is_streamable

namespace impl {
    template<typename T, typename = void>
    struct is_streamable : std::false_type { };

    template<typename T>
    struct is_streamable<T, std::void_t<decltype(std::declval<std::wostream &>() << std::declval<T>())>> : std::true_type { };

    template<typename T>
    inline constexpr bool is_streamable_v = is_streamable<T>::value;
}

// Disable overload for already valid operands.
template<class T, class = std::enable_if_t<!impl::is_streamable_v<const T &> && std::is_convertible_v<const T &, std::wstring_view>>>
std::wostream &operator <<(std::wostream &stream, const T &thing)
{
    return stream << static_cast<std::wstring_view>(thing);
}

Both pieces above is completely correct code which compiles, assuming the right headers are included

@joshgoebel
Copy link
Member

The actual issue: #1412

@sylveon
Copy link
Contributor Author

sylveon commented Oct 4, 2019

This is indeed the first issue, but the second piece of code does not feature any wchar_t character literals.

@joshgoebel
Copy link
Member

If i could read that code I might try fixing this, but I've never fully understood the crazier C/C++ syntactic stuff... I'm guessing it gets stuck inside "struct" and doesn't find what it expects so it never escapes, and that breaks the rest of the parsing.

@sylveon
Copy link
Contributor Author

sylveon commented Oct 4, 2019

It's possible that the mess of < and > on that line is what breaks it.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 4, 2019

It's possible that the mess of < and > on that line is what breaks it.

It's highly likely. :-) But since I don't understand C++ enough (at this level) [templates] I don't even know what's possible and what isn't, and hence no way I could hope to alter the parser for this without potentially just making it even worse.

I actually don't see any support for templates in the parser right now.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 4, 2019

Nevermidn it has it:

      {
        className: 'class',
        beginKeywords: 'class struct', end: /[{;:]/,
        contains: [
          {begin: /</, end: />/, contains: ['self']}, // skip generic stuff
          hljs.TITLE_MODE
        ]
      }

It doesn't look like the < and > pairs match to me (in your code sample)... so that would be a problem.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 4, 2019

Is << always guaranteed to be an operator or is there anything in C/C++ where you can have a nested <<something>> expression... if it's ONLY ever an operator this could probably be fixed by adding a check to eat up any extra <<s as such and that would bring the brackets back into balance.

@sylveon
Copy link
Contributor Author

sylveon commented Oct 5, 2019

AFAIK, << is always an operator (if there's cases where it can not be, I have never encountered them)

In this snippet, it is an operator, albeit it's very hidden (basically it evaluates if someiostream << someT is correct, and if it is, gets what type this expression results in)

@sylveon
Copy link
Contributor Author

sylveon commented Oct 5, 2019

Note that >> can not be an operator, mostly in cases where templates are nested, eg std::vector<std::vector<int>>

@joshgoebel
Copy link
Member

AFAIK, << is always an operator

Well, as I said if that is a given then this should be fixable. :) Assuming that's the core issue, which seems likely.

@sylveon
Copy link
Contributor Author

sylveon commented Oct 5, 2019

Just checked, << is indeed always an operator

@joshgoebel
Copy link
Member

I opened a PR for the string issue: #2156

@egor-rogov
Copy link
Collaborator

Fixed

@sylveon
Copy link
Contributor Author

sylveon commented Oct 6, 2019

This hasn't resolved the second snippet

@joshgoebel
Copy link
Member

Yeah, this is still open. Someone else might have to look into second issue.

@egor-rogov egor-rogov reopened this Oct 7, 2019
@joshgoebel joshgoebel added bug help welcome Could use help from community labels Oct 9, 2019
@joshgoebel joshgoebel changed the title Highlight.js dies at some C++ code (cpp) grammar confused by complex Template << >> patterns Oct 9, 2019
@paul-reilly
Copy link

I came across this today, a function with a trailing return type appears to break highlighting too:

auto foo() -> void;

@joshgoebel
Copy link
Member

That is likely a separate issue.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 14, 2020

@klmr Lets step back and take a second look at this... I worry we've made it too complex... all the dealing with nested ( and < really feels like we've crossed the border and are now parsing the text - knowing far more than we should about the context.

What about:

diff --git a/src/languages/c-like.js b/src/languages/c-like.js
index ece09524..47afb47d 100644
--- a/src/languages/c-like.js
+++ b/src/languages/c-like.js
@@ -132,7 +132,7 @@ export default function(hljs) {
     // definition and shouldn't highlight anything that looks like one:
     // `return some()`, `else if()`, `(x*sum(1, 2))`
     variants: [
-      {begin: /=/, end: /;/},
+      {begin: /=/, end: /[;<>]/},
       {begin: /\(/, end: /\)/},
       {beginKeywords: 'new throw return else', end: /;/}
     ],
@@ -225,10 +225,10 @@ export default function(hljs) {
       },
       {
         className: 'class',
-        beginKeywords: 'class struct', end: /[{;:]/,
+        beginKeywords: 'class struct', end: /[{;:<>,]/,
         contains: [
           { beginKeywords: "final" },
-          {begin: /</, end: />/, contains: ['self']}, // skip generic stuff
           hljs.TITLE_MODE
         ]
       }

IE, just terminating most rules when they see a < or > that could be problematic... falling back into the normal highlighting loop. This does mean class T would be title highlighted everywhere, but I don't see that as a serious issue.

I also see a tiny issue with something like class = std but that could be fixed with a guard.

What would be a few other cases this would not handle so well?

@klmr
Copy link
Contributor

klmr commented Oct 14, 2020

@joshgoebel Despite my radio silence I’be been mulling this over and I’ve reached a similar (though not quite the same) conclusion. It’s definitely a good fix in the first instance, and we can add improved heuristics later. One thing I’d probably do is to collect test cases that currently fail in a separate PR or issue.

If you want to apply your diff as a “quick fix” go ahead, I think it shouldn’t work too too badly. I would add union (and probably also enum and) to the beginKeywords: they both initiate type declarations, and union (but not struct) can even be a template in C++.

Something else that I just thought of: how would this handle C++11’s enum class construct?

enum class Boolean : char {
    True, False, FileNotFound
}

@joshgoebel
Copy link
Member

how would this handle C++11’s enum class construct?

Does that require special handling? Is enum just a keyword there and it would just work with the existing rules?

@klmr
Copy link
Contributor

klmr commented Oct 14, 2020

It currently doesn’t require special handling but once added to the beginKeywords, the snippet in my last comment is highlighted as follows:

<class><keyword>enum</keyword> <title>class</title> <title>Boolean</title> :</class> <keyword>unsigned</keyword>…

… but adding class struct to the nested beginKeywords: "final" fixes that (both enum class and enum struct are valid, because of course they are).

joshgoebel added a commit that referenced this issue Oct 15, 2020
* fix(cpp) fix class/template parsing issues. Fixes #2716. Fixes #2102.
* enh(cpp) add support for `enum struct` and `enum class`
* add support for `union`
* minor file ending with CR changes for some `markup` tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help welcome Could use help from community language
Projects
None yet
6 participants
@joshgoebel @klmr @sylveon @paul-reilly @egor-rogov and others