Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C#] Rewriting from scratch #93

Merged
merged 57 commits into from Dec 21, 2016
Merged

[C#] Rewriting from scratch #93

merged 57 commits into from Dec 21, 2016

Conversation

gwenzek
Copy link
Contributor

@gwenzek gwenzek commented Aug 26, 2015

Hi, I started from scratch a new syntax for C# on a personal repository.
You'll find there several tests. I tried to use old scopes when they existed.

The principal features are:

  • Detection of variables declarations (that's a lot !)
  • Detection of tag: [TestMethod]
  • Support for new C# 6 syntax: $"{1 + 2}", "x?.y", ...
  • Detection of operators
  • Detection of generic types
  • Detection of labels
  • Detection of function calls

I'll be happy to have some feedback on my work.

@wbond
Copy link
Member

wbond commented Jan 29, 2016

Thanks for the work on this!

In order to reasonably consider these changes, we'll need some fairly extensive tests. These will consist of various C# files that are named a specific way, and include comments that indicate what scopes should be applied to what code. I recently added one in the C# folder for a bug someone reported. You can read more about syntax testing by scrolling to the Testing header on http://www.sublimetext.com/docs/3/syntax.html.

Additionally, with dev build 3098 of Sublime Text, a "Performance" variant of the Syntax Tests build system was introduced. This will allow timing the highlighting of the currently open file using the syntax currently selected. Could you find some large-ish C# files and compare the performance of before and after your tweaks, and post them here? That will help us ensure that the regular expressions don't have any performance issues.

@gwenzek
Copy link
Contributor Author

gwenzek commented Jan 29, 2016

Thanks for the feedback!
As mentionned in my PR, I have made several tests that you can see in a separate repo.
Can you have a look at them in order to tell me if you find them sufficient?
I'll add them to my PR soon.

@wbond
Copy link
Member

wbond commented Jan 29, 2016

Sorry, I did miss that! Those tests look like they cover a good amount of ground. If you could just include the existing test I created to ensure your version supports the same features, then I think we'll be in a good place.

Do note that I changed "annotations" to "attributes" in the scope names. From what I can tell, "attributes" is the correct term for C#. I'm obviously happy to be corrected since C# is not something I have worked very much with.

@GuilhermeHideki
Copy link

Attributes seems the "right" word. MSDN link about: https://msdn.microsoft.com/en-us/library/z0w1kczw.aspx

I think you should include one test with the number suffixes because the sublime's version had the c/c++ LL "long long" but didn't have the "decimal" D

@gwenzek
Copy link
Contributor Author

gwenzek commented Jan 30, 2016

I'm not sure it's a good idea to change the scope names to reflect the
vocabulary of a specific language.

My point is that people creating themes shouldn't have to know all languages and their vocabulary.

For example I think the C# 'attributes' should be named with a more conventional label. I used 'entity.name.tag' but it probably should be something like 'entity.name.annotation'.

It would be really nice to have guidelines on scopes naming. The Texmate
documentation have some but their aren't very detailed and let a lot of
things open to interpretation.

@gwenzek
Copy link
Contributor Author

gwenzek commented Feb 3, 2016

Regarding performances, I tested on a 2600 lines C# files made of a concatenation of several files:

Syntax "Packages/User/csharpSyntax/csharp.sublime-syntax" took an average of 51.3ms over 10 runs
Syntax "Packages/C#/C#.sublime-syntax" took an average of 58.0ms over 10 runs

So my version seems a little bit faster even if the syntax itself is bigger (weird, I would have expected the contrary).

@gwenzek
Copy link
Contributor Author

gwenzek commented Feb 4, 2016

Here is a gif showing the differences between the 'old' syntax and my proposition.
Note how the old syntax was broken by generics. I also tried to be more coherent on naming:
double is now marked as a type not as a keyword, like in most other syntaxes.

Differences between 'old' syntax and mine

@buildersbrewery
Copy link

Any updates regarding the status of this PR?

@wbond
Copy link
Member

wbond commented May 19, 2016

@buildersbrewery If I haven't closed an issue, it means that I am still intending to do something with it, or am waiting on changes from the author.

In the case of this PR, I haven't yet gotten to it, but also there are a few things that would need to be resolved:

  1. The tests are written to test a different syntax file (see the SYNTAX TEST lines)
  2. There are merge conflicts with the master branch - existing tests should be preserved to prevent regressions
  3. Snippets are added in addition to tweaking the syntax (they should be a separate PR, and even then I'm not sure I want to open the floodgates of adding snippets to default packages)

I have not reviewed the actual syntax definition yet to see if there are incompatible regex constructs, or if scope names are consistent with the other languages that have been rewritten recently (C++ is one of the most recent, non-trivial examples).

@gwenzek
Copy link
Contributor Author

gwenzek commented May 19, 2016

Thanks @wbond for the feedback,
I'll have a look at the C++ syntax.
I tried to be coherent with myself, but if you want people to follow conventions you should write them down 😃

What do you mean by incompatible regex? Is it just look-behind? Or are they other patterns I should avoid?

@wbond
Copy link
Member

wbond commented May 19, 2016

@gwenzek My plan is to write them down, I have just been using my experience of rewriting a bunch of the syntaxes to figure out what needs to be written down. I've covered a number of the languages with C-like syntax, however there are a number that I have not yet. I will definitely provide feedback about the scope names once I have time to review this PR in-depth. Part of this whole process of getting all of the syntaxes up-to-date is to make the scopes more consistent so we can document them for syntax and color scheme authors.

The Syntax Tests build system has a number of variants (ctrl+shift+b), one of which allows testing the compatibility with the new regex engine in Sublime Text. Moving forward we are removing all regexes that use incompatible features because the new regex engine is so much faster. This allows the highlighting of files and the indexer to run faster. There may be some edge cases of constructs from Oniguruma we aren't supporting in the new engine yet, so feel free to report those. We won't, however, be supporting lookbehinds, backreferences or atomic subgroups (those are generally unnecessary with the new engine). This has to do with the fact that the new engine is not a backtracking engine.

@gwenzek
Copy link
Contributor Author

gwenzek commented May 19, 2016

@wbond I removed the backreferences, the syntax is now compatible with the new regex engine.

I added back the tests that you added in the meanwhile but I don't have exactly the same convention for scopes.
In particular I feel that the scope for "attribute" should be something else than meta.method.attribute.
Attribute or decorators exists in several language, their should be a real scope for that.

Also I'd like to have a scope for the . I currently use keyword.operator.accessor as others but I think it should rather be a punctuation.separator.something, what do you think ?

My syntax does a great job at handling type parameters, variable declarations and method declarations.
It's also compatible with the last C# 6 syntax so it really improves the readability of C# code.

@buildersbrewery
Copy link

keyword.operator.accessor makes perfectly sense for languages where you access the x part of a vector via float x = v.x;

@buildersbrewery
Copy link

Really depends on the context.

@FichteFoll
Copy link
Collaborator

FichteFoll commented May 19, 2016

The current convention is to scope the dot as punctuation.accessor, but imo it should be punctuation.accessor.dot in this case.

Most languages just use punctuation.accessor but imo adding the dot makes sense since there are other accessors in different languages. Take a::b or a->b for example, which could be punctuation.accessor.double-colon and punctuation.accessor.arrow.

@gwenzek
Copy link
Contributor Author

gwenzek commented May 20, 2016

Thanks for the suggestion @FichteFoll ,
I renamed keyword.operator.accessor to punctuation.accessor.dot in 0e33931,
with a distinction between dot and double-colon.

Also in C# 6 x?.foo means if (x == null) null else x.foo.
I wasn't sure how to scope that, so I just put punctuation.accessor.dot on both.
I didn't wan't to put a keyword.operator on the ? to avoid bringing too much attention on it.
The point of this syntax is to avoid boilerplate, so I didn't want to create visual boilerplate by adding another scope here.

@gwenzek gwenzek changed the title Enhanced C# syntax [C#] Rewriting from scratch May 25, 2016
gwenzek added a commit to gwenzek/csharpSyntax that referenced this pull request May 25, 2016
Improvements from sublimehq/Packages#93 .
Removed backreferences.
Renamed some scopes.
@michaelblyons
Copy link
Collaborator

I took a look at this a few months ago (not the source, but the highlighting) and I wasn't pleased. However, revisiting recently, I am much happier with your syntax than the default.

I did notice something odd with one specific case, though. If you don't use implicit typing on declarations with <T> typing, the first instance of the type is incorrectly marked as a function:

MyType foo = new MyType();
//^^^^ variable.other.type.cs
List<int> bar = new List<int>();
//^^ variable.function.cs

Another thing that I think is a little odd is the handling of [] and ? on types, and some other miscellany:

int? foo = 4;
// ^ keyword.operator.ternary
// It's not really a ternary operator.

string[] bar = {"baz"};
// ^^^^^ support.type.cs
//    ^^ - punctuation

var foo = typeof(int);
//        ^^^^^^ keyword.operator.reflexion
// Is this really an operator? Also, misspelling of "reflection" ?

Overall, though, I think this is an improvement over the default syntax def.

@gwenzek
Copy link
Contributor Author

gwenzek commented Jun 22, 2016

Thanks for testing @michaelblyons.
I fixed the first bug, my look-ahead detecting function-call versus variable declaration was fooled by the =.
I also corrected the misspelling of "reflection".

In the current version of this PR I mark int? and string[] as support.type.
For me the ? and the [] are just a short-name for Option<T> and Array<T> so I don't feel they should receive a special treatment.
But it's quite easy to add a more precise scope, I'm just not sure what they should be.

The typeof receives the same scope as as and is which are more operators, maybe I could just use a support.function.

gwenzek pushed a commit to gwenzek/csharpSyntax that referenced this pull request Jun 22, 2016
@gwenzek gwenzek force-pushed the master branch 3 times, most recently from cb22453 to 35c21bb Compare July 2, 2016 07:34
@keith-hall
Copy link
Collaborator

When creating an anonymous type, like:

new {hello = "world", foo = "bar"}

The { and } characters get the scope punctuation.definition.array.begin.cs and punctuation.definition.array.end.cs respectively. This may be a nitpick, but it's not an array... :)

Maybe it would be possible to check whether [] precedes the { character, so that:

new[] { "hello", "world" }

will still be correctly recognized as an array.

@michaelblyons
Copy link
Collaborator

I am pretty excited about this getting all the way through. I use it as my primary C# syntax now, but that has illuminated a couple more frustrations.

Fragments of code containing >1 function are not highlighted correctly:

string Permute(string input)
{
    char[] charArray = input.ToCharArray();
    Array.Reverse(charArray);
    return new string(charArray);
}

// this function has no highlighting
string DontPermute(string input)
{
    return input;
}

This guy doesn't work as a fragment either:

List<string> Frag
{
    get
    {
        var list = new List<string>();
        list.Add("foo");
        list.Add("bar");
        list.Add("baz");
        return list;
    } // <-- This and the next "}" are marked "invalid"
}

Both work inside a class{...} context.

@keith-hall
Copy link
Collaborator

Can we get it to scope the placeholders in string.Format("{0} {1}", "hello", "world") like Python does? i.e. so the {0} and {1} should be constant.other.placeholder.

@gwenzek
Copy link
Contributor Author

gwenzek commented Oct 1, 2016

@LoneBoco the regression regarding generics has been fixed.
@LoneBoco the #region now works as described in the ECMA C# reference.
@keith-hall #646 "Hello {0}" is now highlighted like in Python.

@michaelblyons Your first example is working with the current version. It has been added recently.
I also fixed the second one

@michaelblyons
Copy link
Collaborator

@gwenzek Thanks! I notice that the first case is still weird with access modifiers.

private string Permute(string input)
{
    char[] charArray = input.ToCharArray();
    Array.Reverse(charArray);
    return new string(charArray);
}

// this function has no highlighting
private string DontPermute(string input)
{
    return input;
}

@gwenzek
Copy link
Contributor Author

gwenzek commented Oct 2, 2016

@michaelblyons thanks, the "visibility" modifiers weren't allowed inside code, so I added them.
I principally tested my syntax with fully compiling code (for browsing), so there might be other hiccup ahead.

@LoneBoco
Copy link
Contributor

LoneBoco commented Oct 3, 2016

@gwenzek Thanks, regions and generics are looking good.

public class B
{
    public void Test()
    {
        int test = 1;
        test = (test) ? 1 : 0;
    }
}

Found an odd issue with the ternary operator. It looks like it thinks the (test) is a cast operator. I'm not sure how difficult that would be to fix. The test for casting probably needs to see if we have a variable name after it, or an opening parenthesis.

@gwenzek
Copy link
Contributor Author

gwenzek commented Oct 3, 2016 via email

@keith-hall
Copy link
Collaborator

keith-hall commented Nov 16, 2016

There is a slight inconsistency for generic types:

private Dictionary<string, object> Data;

...

this.Data = new Dictionary<string, object>();

currently, the < is punctuation.definition.arguments.type.begin.cs but the > is punctuation.definition.generic.end.cs


also, a similar situation with attributes:

[ServiceBehavior(Namespace = "http://test/", InstanceContextMode = InstanceContextMode.PerCall)]

the opening [ is punctuation.definition.tag.begin.cs but the closing ] is punctuation.definition.annotation.end.cs. also, the scope for ( is missing the .cs suffix

@keith-hall
Copy link
Collaborator

the Nullable shortcut ? isn't scoped in property definitions, and is scoped as keyword.operator.ternary.cs for casts (but I know you plan to work on casts anyway):

public DateTime? FirstRun
{
  get
  {
    return (DateTime?) this.GetValue("start");
  }
  set
  {
    this.StoreValue("start", (object) value);
  }
}

@michaelblyons
Copy link
Collaborator

Is this going to get merged sometime soon? I've been using it for months.

From time to time I find bugs (generally minor), but I hesitate adding them to this ticket because I don't want to slow it down any further.

@wbond
Copy link
Member

wbond commented Dec 19, 2016

@michaelblyons I will try to spend some time this week going through this again. I believe I only got part-way through last time.

@wbond
Copy link
Member

wbond commented Dec 19, 2016

I started going through this, but rather than make @gwenzek go through and apply a bunch of things I know need to change, I am going to merge it and make the changes that I know need to be made for consistency.

Once that is done I can get feedback from anyone interested about tweaks where I may have changed something I shouldn't have.

Hopefully this will be more efficient than a bunch more back-and-forth.

BTW, thanks for your consistent effort on this @gwenzek, it is much appreciated! And thank you @LoneBoco and @michaelblyons for using it and providing feedback.

@wbond
Copy link
Member

wbond commented Dec 19, 2016

Quick question: I presume the type int is built-in to the language and not added by a framework?

@michaelblyons
Copy link
Collaborator

RE int: Yes, it's a built-in.
RE merging: Thanks so much!

@gwenzek
Copy link
Contributor Author

gwenzek commented Dec 19, 2016

Thanks @wbond for taking up that work !

I know that a lot of things still have to be done, but I haven't got much time for open source lately,
and as I don't use C# at work anymore, my motivation kinda diminished.
Thanks for all the helpful feedback and sorry for not taking the time to finish it properly.

@wbond wbond merged commit 6b7854d into sublimehq:master Dec 21, 2016
wbond added a commit that referenced this pull request Dec 21, 2016
@wbond
Copy link
Member

wbond commented Dec 21, 2016

Ok, so I just finished updating the rewrite to be consistent with our current scope guidelines. I added a ton of test assertions, and I think I checked the various bug reports in this PR, but I may have missed some.

If you use C#, please put the new syntax through its paces, and provide snippets of any broken highlighting so we can get it fixed up for the next dev build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants