Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cpp): Allow declaring multiple functions and parenthetical initializers #3155

Merged
merged 5 commits into from Apr 22, 2021

Conversation

edemaine
Copy link
Contributor

@edemaine edemaine commented Apr 20, 2021

Fix #3148 by allowing commas within a function declaration, to allow declaring multiple functions, as well as multiple parenthetical initializers.

Unfortunately, syntax like

string a("hello"), b = "world", c("another");

(i.e. a mixture of parenthetical and equals-sign initializers) still isn't parsed correctly, because equals sign currently exits function mode. But this is progress at least.

Checklist

  • Added markup tests
  • Updated the changelog at CHANGES.md (should this go under ## Unreleased or similar section?)

@joshgoebel
Copy link
Member

Showing output from the developer tool visual mode is always nice for stuff like this, but I don't think this is what we want. The first item isn't a function and the second two definitely aren't... I was thinking we'd add , to the end matcher so that as soon as we concluded we were not dealing with a function we'd stop treating it as one.

Is there a reason you went this way instead?

@edemaine
Copy link
Contributor Author

edemaine commented Apr 20, 2021

I tried adding to end as well, but thought this approach was "more correct", getting right the multiple declaration case:

extern void f(int), g(char);

(unless you don't think of that as a function declaration?). It also highlights the parens-as-initializer example better in a visual sense, though you're right that those aren't function declarations so that actual tags are incorrect.

Happy to switch to end if you prefer. That would fix the original "invalid" issue but not improve the output at all.

@joshgoebel
Copy link
Member

joshgoebel commented Apr 21, 2021

extern void f(int), g(char);

OH... all this time I was thinking of variables... can you actually declare function declarations that way? I didn't know that.

Declaration vs definition, correct?

Is this valid syntax in C as well or only C++?

@joshgoebel
Copy link
Member

string a("hello"), b = "world", c("another");

Function isn't really intended to do variable declarations at all, that's just a side-effect of being impossible to distinguish so def a problem for another day.

@edemaine
Copy link
Contributor Author

edemaine commented Apr 21, 2021

Yes, these are function prototypes, which are valid in C and C++. https://en.wikipedia.org/wiki/Function_prototype has some examples. extern makes it more obvious, but they're also valid without. (In C, they might only be valid at the top level, outside a function? I forget.) These are all valid:

int f(int);
int f(int x);
extern int f(int);

It hadn't occurred to me before that C++ has a pretty ambiguous distinction between function prototypes and call-style (parenthetical) initializers:

string f(string);  // function prototype because string is a type
string s(x); // variable declaration because x is a variable

So I don't think highlight.js should aim to distinguish between these; it should probably treat both as variable declarations. (And honestly a function prototype is essentially a variable declaration.) Currently we're calling both of them function definitions.

I think the "right" answer is to only trigger function definition mode when there's a { coming up before the next ;. Is that easy to do? (I'm still new to language definitions.) Then we could add support for parentheses to variable declaration.

@joshgoebel
Copy link
Member

Actually we call it FUNCTION_DECLARATION as the idea is to capture both definitions and declarations with the rule. So I think this is a reasonable fix, but I think we should do it with a rule instead of "hiding" it inside a negative illegal match.

Could you also apply this to the C grammar while we're fixing it for C++?

@joshgoebel
Copy link
Member

string a("hello"), b = "world", c("another");

Another question is does the context really matter... if we slapped everything like blah( with a title.function scope on blah we'd hit everything, without having to parse anything other than local context.

@edemaine
Copy link
Contributor Author

edemaine commented Apr 22, 2021

If I understand your last comment appropriately, you mean the following:

  • When seeing a leading type (like string or vector<int>), enter a "declaration mode".
  • Within declaration mode, if you see blah(, then mark blah as a function title, and the part in parentheses as params.
  • Declaration mode ends with { or ;, and continues through , and =.

A separate declaration mode is needed so that a lone f(arg) (function call) isn't considered a function; if I understand the definition correctly, function is only supposed to be for function declarations, not function calls, which are tagged as built_in. Or is this current behavior incorrect, and f could also be tagged as function?

I agree that this should properly handle cases like string a(int), b = "world", c(float); — only a and c will be marked as function titles. However, it will also mark a and c as function titles in string a("hello"), b = "world", c("another");. But I don't really see how to distinguish that case...

I take it there isn't a goal to annotate variable declarations? Or should b be tagged variable in the above examples?

@joshgoebel
Copy link
Member

joshgoebel commented Apr 22, 2021

A separate declaration mode is needed so that a lone f(arg) (function call) isn't considered a function; if I understand the definition correctly, function is only supposed to be for function declarations, not function calls, which are tagged as built_in. Or is this current behavior incorrect, and f could also be tagged as function?

This is all changing (see the docs in Github). I need to see fix why they aren't getting published to readthedocs any longer.

  • function is now deprecated. We do not care about highlighting the "body" of a declarations and never should have. (plus it was never applied consistently anyways)
  • In many grammars a function mode can actually be avoided entirely (whether scoped or not)
    • I'm not sure C is one of those grammars or not, I haven't given it much thought.
  • we now have title.function and title.function.call
    • A grammar does not have to make the distinction if it's hard/impossible.

not function calls, which are tagged as built_in.

Actually they aren't, they are tagged "function.dispatch" which is aliased back to built_in. This was a stop-gap for v10. With version 11 this should now be just a simple title.function.call. Though most themes don't yet care about the call distinction.

I take it there isn't a goal to annotate variable declarations? Or should b be tagged variable in the above examples?

How would we if they are indistinguishable?


Long term I'd like to see if the entire FUNCTION_DECLARATION rule can be removed, but that's probably out of scope here... so if we just take the comma fix from da7cedd and apply it to C also then I think this is mergeable and we can put off the larger discussion for another day.

@@ -330,7 +330,7 @@ export default function(hljs) {
NUMBERS
]
},
// allow for multiple declarations, ie:
// allow for multiple declarations, e.g.:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL.

@joshgoebel
Copy link
Member

Changelog entry?

@edemaine
Copy link
Contributor Author

edemaine commented Apr 22, 2021

I've ported the changes over to C, and now added to CHANGES, so perhaps this is ready to merge.

I also realized I forgot to git add some "fancy initializers" examples (as in #3148). Let me know if you think those aren't helpful and I/you can remove them.

I can play with a second PR that aims to handle all the cases. I think we can get rid of the FUNCTION_DECLARATION rule if we don't care about distinguishing title.function and title.function.call. If we want to distinguish, I think we need the bit of context (but perhaps it could be simplified).

Thanks for clarifying all my confusion about the new intended labels, which indeed seem much nicer. I was looking at .expect.txt files, which still have built_in classes, I guess from the aliasing you mention. I went to readthedocs because https://github.com/highlightjs/highlight.js/blob/main/docs/css-classes-reference.rst doesn't seem to be formatting correctly (the main table is entirely omitted).

@joshgoebel
Copy link
Member

joshgoebel commented Apr 22, 2021

I can play with a second PR that aims to handle all the cases. I think we can get rid of the FUNCTION_DECLARATION rule if we don't care about distinguishing title.function and title.function.call. If we want to distinguish, I think we need the bit of context (but perhaps it could be simplified).

Well mostly I'm trying to remove huge swaths of complexity when possible - with the idea that grammars should pattern match more than they should parse... If we remove the distinction then I'm pretty sure that EXPRESSION_CONTEXT can go as well... The sticking point is params. params vs arguments is a helpful distinction that historically we've had. C is a harder language to do this with since it does not have a function keyword (which makes things infinitely easier).

I do think that at the least things could be simplified a lot with the new multi-scope tools, but not sure how familiar you are with those.

@joshgoebel joshgoebel merged commit 570bbd5 into highlightjs:main Apr 22, 2021
@joshgoebel
Copy link
Member

    // This mode covers expression context where we can't expect a function
    // definition and shouldn't highlight anything that looks like one:
    // `return some()`, `else if()`, `(x*sum(1, 2))`

I do wonder if this couldn't be built in the function regex detection itself...

@edemaine
Copy link
Contributor Author

I do think that at the least things could be simplified a lot with the new multi-scope tools, but not sure how familiar you are with those.

I'm not sure that I am. Could you recommend something to read? Is there a good language that's been fully modernized and would serve as a good example? (Most still seem to use className over scope...)

@joshgoebel
Copy link
Member

That's just a name change really... all the new stuff is documented in the mode reference docs. beginScope, endScope, etc...

@joshgoebel
Copy link
Member

You could look at the recent commit history as I'm not sure the new key names are used anywhere yet...

@@ -0,0 +1,2 @@
<span class="hljs-function">string <span class="hljs-title">a</span><span class="hljs-params">(<span class="hljs-string">&quot;hello&quot;</span>)</span>, <span class="hljs-title">b</span><span class="hljs-params">(<span class="hljs-string">&quot;world&quot;</span>)</span></span>;
<span class="hljs-function">vector&lt;<span class="hljs-keyword">int</span>&gt; <span class="hljs-title">u</span><span class="hljs-params">(<span class="hljs-number">1</span>)</span>, <span class="hljs-title">v</span><span class="hljs-params">(<span class="hljs-number">2</span>)</span></span>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think it is declaring four functions? To me I think it's initializing four variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have found a potential problem please open a new issue rather than commenting on a 3 year old issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(CPP) Declaring multiple variables with parentheticals treated as invalid
3 participants