C# flavor #156

Closed
shiralizadeh opened this issue Oct 22, 2014 · 140 comments

Comments

@shiralizadeh

Hi regex101,
Can you add C# to your Flavor section? If you want I can help you for this.

Thanks,
Shiralizadeh

@3F

3F commented Oct 22, 2014

@shiralizadeh :( see #124

You will need to use a PCRE-compatible syntax.
#124 (comment)

+1 also need !

@firasdib
Owner

As with all the other requests for more flavors, there's not much I can do right now.

@MulleDK19

+1
It's sad I can't use patterns like "(?<=(.*?))HELLOWORLD", while .NET supports this.

.NET still has the best RegEx engine.

@MNF

MNF commented May 30, 2016

I needed to use named capturing groups, and .NET has a different syntax for them.
I had to use http://regexstorm.net/tester and it is not comparable: regex101 is significantly more user-friendly. Having a .NET flavor would be very useful.

@JohnLouderback

This would be a very welcome feature! 👍

@AnderssonPeter

AnderssonPeter commented Jan 9, 2017

What is hindering you from adding support for the .NET regex engine?
You have multiple options for running .NET in non-Windows environments, if that's the problem.

@TWiStErRob
Collaborator

TWiStErRob commented Jan 13, 2017

@AnderssonPeter engines are currently all client-side, so only JS would be considered IMO. @firasdib https://github.com/bridgedotnet/Bridge/tree/master/Bridge/Resources/Text/RegularExpressions package looks good. With a bit of work (namely implementing Bridge.define) it may be useful.

@laschon

laschon commented Jun 28, 2017

The variable-width lookbehind support in .NET is very useful and not easy to imitate using PCRE.
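
To make the difference concrete, here is a minimal Python sketch. It uses the standard-library `re` module only as a stand-in for an engine that rejects unbounded lookbehind (it is neither PCRE nor .NET), and the sample pattern, input, and function names are made up for illustration. It shows the pattern being rejected and one common workaround, which consumes the prefix and captures the target instead.

```python
import re
from typing import List

# Hypothetical sample: find "b" only when preceded by one or more "a".
VARIABLE_WIDTH = r"(?<=a+)b"


def supports_pattern(pattern: str) -> bool:
    """Return True if the stdlib `re` engine can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def match_after_prefix(text: str) -> List[int]:
    """Workaround: consume the prefix and capture the target in group 1.

    Unlike a true lookbehind, the prefix becomes part of the overall match,
    so adjacent or overlapping matches can behave differently.
    """
    return [m.start(1) for m in re.finditer(r"a+(b)", text)]
```

The workaround changes what the overall match contains, which is why it is not a drop-in replacement for a real variable-width lookbehind.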

@Doqnach
Collaborator

Doqnach commented Jun 28, 2017

Has the variable width look-behind been made easier to implement maybe, now that python regex also supports it?

@bjorndavis

Would love a C# flavor - this is the best RegEx site I've found, but unfortunately I can't always use it with C#.

@DazEdword

I concur here, this is clearly my favourite Regex online tool and would love to have full C# support. Thanks for your hard work!

@Shane32

Shane32 commented Jul 2, 2018

If it at least could support copy/paste to c# it would be extremely helpful

@mavericksevmont

+1, Best regex site, would be amazing if it had the C# regex flavor!

@Kaon68

Kaon68 commented Sep 12, 2018

I use regexhero for .Net, but an online version would be great.
+1 for a .Net (C#) flavor :)

@OmrSi

OmrSi commented Jan 30, 2019

Ah. +1

@joshuaquiz

+1. Referencing named groups is shown here as \g, but for .NET it is \k. There are other small differences that it would be handy to be able to know/learn/look up.
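
As an illustration of that kind of difference, here is a small Python sketch of a pattern rewriter. The helper name `to_dotnet_named_groups` is hypothetical, and it only covers a few forms: the PCRE-style backreferences `\g{name}`, `\k{name}`, and `(?P=name)` become `\k<name>`, and the group definition `(?P<name>` becomes `(?<name>`. It deliberately ignores character classes and numbered references, so treat it as a sketch rather than a complete converter.

```python
import re

# One pass over the pattern; the last alternative consumes any other escape
# (including an escaped backslash) so it is never misread as a reference.
_TOKEN = re.compile(
    r"\\[gk]\{([A-Za-z_]\w*)\}"   # \g{name} or \k{name}
    r"|\(\?P=([A-Za-z_]\w*)\)"    # (?P=name)
    r"|\(\?P<([A-Za-z_]\w*)>"     # (?P<name>
    r"|\\.",                      # any other escape sequence, left untouched
    re.DOTALL,
)


def to_dotnet_named_groups(pattern: str) -> str:
    """Rewrite a few PCRE-style named-group forms into .NET-style syntax."""

    def _rewrite(match: "re.Match[str]") -> str:
        backref = match.group(1) or match.group(2)
        if backref:
            return f"\\k<{backref}>"
        if match.group(3):
            return f"(?<{match.group(3)}>"
        return match.group(0)

    return _TOKEN.sub(_rewrite, pattern)
```

For example, `to_dotnet_named_groups(r"(?P<w>\w+) \g{w}")` yields `(?<w>\w+) \k<w>`.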

@Doqnach
Collaborator

Doqnach commented Feb 5, 2019

There are other small differences that it would be handy to be able to know/learn/lookup.

This resource can help you with that: https://www.regular-expressions.info/dotnet.html

@chucker

chucker commented Feb 20, 2019

engines are currently all client-side, so only JS would be considered IMO.

With WebAssembly (and Mono-Wasm in particular), this theoretically changes. This is clearly in its early stages and has a lot of overhead, but it might be something to consider for the long run.

@AlbertoMonteiro

AlbertoMonteiro commented Feb 11, 2022

@firasdib I tried with the AOT version: total time was reduced by 20%, 495ms on the first execution, and on subsequent executions it went down to ~300ms, so a 2x time improvement (compared to the non-AOT version).
On the dotnet side, it evaluated in 45ms.

Btw I am not using LINQ anymore @Shane32

@AlbertoMonteiro

Small text times: ~3ms total


@firasdib
Owner

firasdib commented Feb 11, 2022

@AlbertoMonteiro It's still too slow. Native JS (which of course we can never match) is <1ms, and we should ideally be way below 50ms (for the large text tests).

@AlbertoMonteiro

AlbertoMonteiro commented Feb 11, 2022

@firasdib I agree that faster is good, but looking at what regex101 currently supports, we already have engines that process regexes more slowly.


Input

Hi regex101,
Can you add C# to your Flavor section? If you want I can help you for this.

Thanks,
Shiralizadeh

Regex

.

Results in Regex101

| Engine | Result |
| --- | --- |
| PCRE2 (PHP >=7.3) | 106 matches (216 steps, 2.2ms) |
| PCRE (PHP <7.3) | 106 matches (217 steps, 2.1ms) |
| ECMAScript (JavaScript) | 106 matches (0.7ms) |
| Python | 106 matches (217 steps, 2.0ms) |
| Golang | 106 matches (26.2ms) |
| Java 8 | 106 matches (2.4ms) |

My test

| Engine | Result |
| --- | --- |
| .NET 6.0 | 106 matches (5.5ms) |

Also, what do you consider large text?

@Shane32

Shane32 commented Feb 11, 2022

Certainly seems fine to me, especially considering the speed of the other flavors. As you said @firasdib, it certainly cannot hope to compete with native JavaScript. That is going to be true of any other flavor. Even if the .NET source were transpiled into JavaScript, so there was no IL interpretation or anything going on, it would be much slower than native JavaScript. Just because we wish it to be faster does not mean that it can be.

So @firasdib I'm a bit confused as to what you wish for that could possibly be better than the .NET source running as webassembly. I would be happy if the site just interpreted the text boxes in C# style and ran the native javascript regex. (No, the code generator isn't what I want.) But you've expressed that if the site supports a "flavor", it should run the exact actual implementation. This is exactly that -- it runs the exact .NET source -- and it won't ever get much faster than AOT compiling the .NET IL into bytecode and running it as webassembly. And when it runs in the neighborhood of the other flavors - slower than PHP and faster than golang, I'd think you'd be more concerned about download size than speed.

Personally I'm quite impressed both that it runs as fast as it does, and that the download size is down to 1MB, which although a bit large, seems acceptable. (I would wish for the download size to be 100kb, but it isn't.)

Maybe .NET 7 will improve on what we have with .NET 6. Perhaps the IL interpreter or AOT compiler will get better. Perhaps it can strip more unnecessary code out. I think you should decide if it is acceptable today @firasdib regardless of what we wish it to be or what it could be in the future. I would hope you do find it acceptable, but I understand if you do not.

The other aspect you may wish to consider is: are people that find the .NET flavor useful willing to put up with a slower download or slower execution than ideal? I think the answer is yes. I certainly would use C# flavor even if it were notably slower, for the convenience that it offers.

@geekley

geekley commented Feb 11, 2022

<!-- Remove some unused features. Shrinks the published app by ~700KB. -->
<InvariantGlobalization>true</InvariantGlobalization>

Does that mean that the site will force the flag RegexOptions.CultureInvariant to be enabled?

Or should all flags be supported? It seems not all of them are available to be enabled inline (in the regex itself) and some flags are incompatible with others ...

@AlbertoMonteiro You might want to include an option in your page to set flags, so you can test at least the behavior of InvariantGlobalization and CultureInvariant flag.

@geekley

geekley commented Feb 11, 2022

Personally I'm quite impressed both that it runs as fast as it does, and that the download size is down to 1MB, which although a bit large, seems acceptable. (I would wish for the download size to be 100kb, but it isn't.)

I don't know anything about Blazor, but it seems from the README it's generating .br brotli files, but the github pages site is serving them as gzip instead. Maybe brotli can bring the download size down a bit?

Also, (again I know nothing about blazor but) maybe it's possible to remove whatever dll is doing JSON parsing and serializing (Microsoft.JSInterop.JSInvokable attribute?) by handling the data manually in some other way? Though trying that might not be worth the effort. I dunno, just throwing an idea here.

@firasdib
Owner

I think I need to clear up some assumptions that have been made here, @Shane32 and @AlbertoMonteiro

To start, the times you are seeing on the website include warmup of the JS engine, as well as setup of the web worker. These are not comparable. If I run a direct call to the PCRE engine (which also runs under WASM), for the text input you provided, we are within margin of error (<1ms). This is why I expect a direct call to the engine to be faster than the times we are getting, or we will have issues when it finally lands on the site (where additional latencies will stack up). If you want to compare apples to apples, you could have a look at the Java flavor, which is considerably faster.

The number of files and their size will put additional load on the server and come with additional costs. Granted, this isn't my primary concern; it's more the experience of the end user, which might not be as fluid or smooth as with other flavors. A large input is a mix of regex complexity and input string, but I normally use a simple regex that enforces a consistent number of steps, e.g. /./g, and then a long input string, say 20,000 a's. If this can be matched in <100ms, it is a slow but acceptable implementation. If it's below 50ms it's OK, and if it's below 20ms it's good.
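
The test described above can be sketched roughly as follows. This is a Python illustration using the standard-library `re` engine, not the harness regex101 actually uses; the function names are hypothetical, and the "too slow" label for 100ms and above is an assumption, since the comment only names the three faster tiers.

```python
import re
import time
from typing import Tuple


def rate_latency(elapsed_ms: float) -> str:
    """Map a measured time onto the tiers named in the comment above."""
    if elapsed_ms < 20:
        return "good"
    if elapsed_ms < 50:
        return "ok"
    if elapsed_ms < 100:
        return "slow but acceptable"
    return "too slow"  # assumed label; the comment does not name this tier


def run_benchmark(pattern: str = ".", length: int = 20_000) -> Tuple[int, float]:
    """Match `pattern` globally against `length` copies of "a".

    Returns (number of matches, elapsed milliseconds).
    """
    text = "a" * length
    compiled = re.compile(pattern)
    start = time.perf_counter()
    matches = sum(1 for _ in compiled.finditer(text))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return matches, elapsed_ms


if __name__ == "__main__":
    count, ms = run_benchmark()
    print(f"{count} matches in {ms:.1f}ms: {rate_latency(ms)}")
```

A single run like this includes no warmup, so in practice one would repeat the measurement and look at the steady-state numbers.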

I want to stress that I appreciate your help and time, and I'm not trying to work against you, I'm simply trying to create the best experience for everyone.

Note: Golang is an exception, it runs poorly optimized and will be improved by 500% in the upcoming release

Now that that's out of the way, we can focus on what's important: getting this done. If we analyze the numbers provided by @AlbertoMonteiro, it would seem that the .NET code is finishing very fast, but the transfer between WASM and JS is very inefficient. Perhaps there is some sort of serialization going on that we can avoid? Perhaps we can access the memory directly (this is what I do in other flavors)?
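
As a rough analogy for the two transfer styles, the Python sketch below encodes a list of matches either as JSON text or as a flat buffer of 32-bit integers that the receiver reads in place. The layout (a count followed by index/length pairs) and all function names are assumptions made for illustration; this is not how regex101 or Blazor actually exchange data.

```python
import json
import struct
from typing import List, Tuple

Match = Tuple[int, int]  # (start index, length)


def encode_json(matches: List[Match]) -> str:
    """Serialization route: build a text payload the receiver must parse."""
    return json.dumps([{"index": i, "length": n} for i, n in matches])


def decode_json(payload: str) -> List[Match]:
    return [(m["index"], m["length"]) for m in json.loads(payload)]


def encode_flat(matches: List[Match]) -> bytes:
    """Flat route: a little-endian int32 count followed by int32 pairs."""
    buf = bytearray(struct.pack("<i", len(matches)))
    for index, length in matches:
        buf += struct.pack("<ii", index, length)
    return bytes(buf)


def decode_flat(buffer: bytes) -> List[Match]:
    """Read the pairs straight out of the buffer at fixed offsets."""
    view = memoryview(buffer)
    (count,) = struct.unpack_from("<i", view, 0)
    return [struct.unpack_from("<ii", view, 4 + 8 * k) for k in range(count)]
```

The flat layout has a fixed size per match (8 bytes here), so the reader can index into it without parsing any text.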

The web server supports brotli, which will help reduce the file size a bit further.

@AlbertoMonteiro

I don't know anything about Blazor, but it seems from the README it's generating .br brotli files, but the github pages site is serving them as gzip instead. Maybe brotli can bring the download size down a bit?

@geekley the Blazor WebAssembly publish process already generates .gz (gzip) and .br (brotli) files alongside the uncompressed files.
GitHub Pages only serves gzipped files when they have a specific extension, which doesn't happen when it serves the .dll files.

You can use the http-server tool (get it from npm) and run http-server -g or http-server -b. If the request includes the compression header, it will serve the .gz or .br file, and then we can see it working fine.

For now this is a GitHub Pages limitation. I could publish it on AWS S3 and it won't be a problem.
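
The negotiation described above, where a server picks a precompressed file based on the request's compression header, can be sketched in Python like this. The function names and the file names in the usage note are hypothetical, and preferring brotli over gzip whenever both are accepted is a simplifying assumption (real servers may also rank by q-value).

```python
from typing import Iterable, Optional, Set, Tuple


def parse_accept_encoding(header: str) -> Set[str]:
    """Return the encodings a client accepts, dropping entries with q=0."""
    accepted = set()
    for part in header.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            accepted.add(token)
    return accepted


def choose_variant(
    path: str, accept_encoding: str, available: Iterable[str]
) -> Tuple[str, Optional[str]]:
    """Pick the file to serve and the Content-Encoding to announce.

    Falls back to the uncompressed file when no precompressed variant
    matches what the client accepts.
    """
    files = set(available)
    accepted = parse_accept_encoding(accept_encoding)
    for encoding, suffix in (("br", ".br"), ("gzip", ".gz")):
        candidate = path + suffix
        if encoding in accepted and candidate in files:
            return candidate, encoding
    return path, None
```

With the files `app.dll`, `app.dll.br`, and `app.dll.gz` available, a request sending `Accept-Encoding: gzip, deflate, br` would get `app.dll.br`, and a request with no such header would get the uncompressed `app.dll`.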

@geekley

geekley commented Feb 11, 2022

For now this is a GitHub Pages limitation. I could publish it on AWS S3 and it won't be a problem.

Regarding brotli testing if you still want to use github pages, I found this:
https://docs.microsoft.com/en-us/aspnet/core/blazor/host-and-deploy/webassembly?view=aspnetcore-6.0#compression

When hosting on static hosting solutions that don't support statically-compressed file content negotiation, such as GitHub Pages, consider configuring the app to fetch and decode Brotli compressed files: [...]

@AlbertoMonteiro

Thanks for the link @geekley
Since this is just a PoC, for now it's fine, but feel free to send a PR to my repo and we can get it done =D

@AlbertoMonteiro

AlbertoMonteiro commented Feb 12, 2022

@firasdib I've started another PoC using another lib, Uno.Wasm.Bootstrap. With that lib we got some performance improvement.

The repo is here: https://github.com/AlbertoMonteiro/RegexWasmUno

It is hosted here: https://albertomonteiro.github.io/RegexWasmUno/ (AOT version)

For the scenario that I described here: #156 (comment)

| Engine | Result |
| --- | --- |
| Blazor .NET 6.0 | 106 matches (5.5ms) |
| Uno .NET 6.0 | 106 matches (2.5ms) 🆕 |

For 20k text size

| Engine | Result |
| --- | --- |
| PCRE2 (PHP >=7.3) | 20000 matches (40 000 steps, 67.4ms) |
| PCRE (PHP <7.3) | 20000 matches (40 001 steps, 52.8ms) |
| ECMAScript (JavaScript) | 20000 matches (31.3ms) |
| Python | 20000 matches (40 001 steps, 649.3ms) |
| Golang | timeout |
| Java 8 | 20000 matches (181.2ms) |
| Uno .NET 6.0 | 20000 matches (480.3ms) |
| AOT Uno .NET 6.0 | 20000 matches (145.1ms) 🔥 |

Open developer tools to see the console logs that log the measurement of the full process

@firasdib
Owner

@AlbertoMonteiro Thank you, that looks very nice! I tried to contact you over email, did you see it?

@Shane32

Shane32 commented Feb 12, 2022

@AlbertoMonteiro Nice! 🎉 I'm curious - what's the download size, for AOT with the uno library?

@AlbertoMonteiro

@AlbertoMonteiro Thank you, that looks very nice! I tried to contact you over email, did you see it?

@firasdib Thanks for pointing that out, your message was in spam 😂. I just answered you.

@AlbertoMonteiro

@AlbertoMonteiro Nice! 🎉 I'm curious - what's the download size, for AOT with the uno library?

@Shane32 I don't remember the specific value, but when I looked at the size using brotli compression it was about 5 MB.

In the current gh pages deployment it may be higher, since that's not the default behavior of gh pages.

@AlbertoMonteiro

@Shane32 I changed back to using Blazor. I found a way to get better performance, so it now takes the same time as the Uno PoC, and the trimming for Blazor is a lot better than for Uno. The current Blazor build requires 18 requests, and with brotli compression it's ~2.4 MB.

@firasdib
Owner

After more than 7 years since the creation of this issue, and with the help from @AlbertoMonteiro, I can now finally announce that regex101 will receive native support for .NET in the coming days.

We managed to improve the performance to such a degree that it now performs in close proximity to the other flavors, and provides a good user experience.

Where it does fall short is file size, which uncompressed is a whopping ~6.4 MB, which is in stark contrast to PCRE, which is ~270 KB. However, with good compression we are able to get this number down to ~1.7 MB, which to most users with a good internet connection, should not be a big issue. The files are cached in your browser, and only downloaded when the flavor is selected, so there will be no unnecessary transfer of data.

The flavor will be made available in the coming update, which further improves performance of other flavors and generally improves the experience. Full details will be made available in the changelog.

Thank you all for your patience.

@lfr

lfr commented Feb 25, 2022

This is a great day for .NET devs 👏

@ypedroo

ypedroo commented Feb 25, 2022

Congrats @firasdib and @AlbertoMonteiro and thank you!

@AlbertoMonteiro

It was a pleasure to work on that, I had an opportunity to learn a lot of things.
I love this!!!

@L4ZZA

L4ZZA commented Feb 25, 2022

Thank you guys!!!

@Shane32

Shane32 commented Feb 25, 2022

Great work and thank you!

@AlbertoMonteiro Does this repo https://github.com/AlbertoMonteiro/BlazorAppRegex contain your end result with whatever optimizations you came up with? I'd be interested to take a look to learn from your experiences here.

@AlbertoMonteiro

@AlbertoMonteiro Does this repo https://github.com/AlbertoMonteiro/BlazorAppRegex contain your end result with whatever optimizations you came up with? I'd be interested to take a look to learn from your experiences here.

Yeah, @Shane32 at least I don't think that @firasdib changed something else.

We can try to make a call and I can explain the evolution of it, from the first version to the latest one.

@TWiStErRob
Collaborator

TWiStErRob commented Feb 28, 2022

Congratulations! Awesome to see this through!

I'd love to read a technical blog post about the story of this. The problems, the solutions, the setbacks, the discoveries, the ups and downs.

We know it's a happy end, but still would be an interesting read, and it would probably benefit others too who are trying to run .NET in the browser.

@ypedroo

ypedroo commented Feb 28, 2022

Congratulations! Awesome to see this through!

I'd love to read a technical blog post about the story of this. The problems, the solutions, the setbacks, the discoveries, the ups and downs.

We know it's a happy end, but still would be an interesting read, and it would probably benefit others too who are trying to run .NET in the browser.

This is an awesome idea, I would love to hear more about the story.

@chucker

chucker commented Feb 28, 2022

The problems, the solutions, the setbacks, the discoveries, the ups and downs.

The short version is: WebAssembly lets us run non-JavaScript code in the browser. Mono-WASM (and Blazor, which builds on top of that) in particular lets us run .NET/C# in the browser. In order to do so, though, we also need to ship the entire .NET runtime, which takes up space. JavaScript will always have an 'edge' here because browsers already come with the JavaScript runtime out of the box.

Thus, the main challenge is getting the code size small enough to be acceptable. One approach is what .NET calls "linking" (JavaScript calls something similar "tree shaking"), where you analyze the code for members (methods, properties, entire classes) that aren't actually being used anywhere and can be thrown out to save space. Determining this automatically isn't that easy: for example, what if a method is never called directly, but is called through reflection? That is much harder to find. It took a while for .NET's linker toolchain to become easy enough to use.
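
The idea can be sketched as a reachability walk over a call graph. The Python below is a toy model, not the .NET linker: the graph, the member names, and the `preserved` parameter are all made up for illustration. It shows why a member that is only reached through reflection gets removed unless it is explicitly kept.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical call graph: each member maps to the members it calls statically.
# "Plugin.Load" is assumed to be invoked only through reflection, so no
# static edge points to it.
GRAPH: Dict[str, List[str]] = {
    "Main": ["Regex.Match"],
    "Regex.Match": ["RegexParser.Parse"],
    "RegexParser.Parse": [],
    "Json.Serialize": ["Json.WriteString"],
    "Json.WriteString": [],
    "Plugin.Load": [],
}


def reachable_members(graph: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """Breadth-first walk from the roots along static call edges."""
    seen = set(roots)
    queue = deque(seen)
    while queue:
        member = queue.popleft()
        for callee in graph.get(member, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def trim(
    graph: Dict[str, List[str]],
    roots: Iterable[str],
    preserved: Iterable[str] = (),
) -> Tuple[Set[str], Set[str]]:
    """Return (kept, removed); `preserved` lists members to keep explicitly."""
    kept = reachable_members(graph, list(roots) + list(preserved))
    everything = set(graph)
    for callees in graph.values():
        everything.update(callees)
    return kept, everything - kept
```

In this toy graph, trimming from `Main` alone removes `Plugin.Load` along with the unused JSON members, while passing `preserved=["Plugin.Load"]` keeps it.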

Nonetheless, Regex101 now has a mode where you don't just get a regular expression roughly like .NET's, but exactly like .NET's, because it is literally running .NET.

@firasdib moved this from Requested to Completed in Flavor Requests Mar 14, 2023