Impossible to use \"" in the .net 7 regex #2286
Comments
Can you post a regex101 link? I can't replicate it.
Seems like the problem might be related to you using verbatim strings, which don't use
@md-at-slashwhy Can you confirm this is how you input the string? https://regex101.com/r/XL7xtQ/1
Yes, that was my input.
Which I think might be the intended error report. I could imagine the report did contain backslashes which haven't been escaped and thus don't show up in the rendered Markdown.
would fit. It would also work for the updated code snippet that would be created:
EDIT: Just saw the title, so I'm pretty sure that Markdown formatting is the issue here. Also tried with a test string of
Hi, you're right: I used the same input as in https://regex101.com/r/L8u10o/1. The C# code produced by regex101 has no issue. The problem lies in the website's interpretation of the regex, which makes debugging quite hard.
The problem seems to lie in the fact that the page uses
Hi, when you prefix a .NET string with @ (a verbatim string literal), most of the string is taken
as is. Only the quote escape sequence ("") isn't interpreted literally; it
produces one double quotation mark.
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/verbatim
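To make the verbatim rule concrete, here is a small sketch using C# top-level statements (so .NET 5 or later is assumed). It compares the verbatim input literal from the generated code with its regular-literal equivalent; the variable names are illustrative only.

```csharp
using System;

// Verbatim literal (@"..."): "" is the only escape sequence and yields one ".
string verbatimInput = @"this a ""test""";
// The same text as a regular literal, where \" escapes the quote.
string regularInput = "this a \"test\"";

// Backslashes are taken as-is in a verbatim literal, so @"\w+" needs no doubling.
string verbatimToken = @"\w+";
string regularToken = "\\w+";

Console.WriteLine(verbatimInput);                 // this a "test"
Console.WriteLine(verbatimInput == regularInput); // True
Console.WriteLine(verbatimToken == regularToken); // True
```

Both pairs compare equal because the compiler resolves the escapes before the program runs; only the resulting characters reach `Regex`.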
On Sat, 8 June 2024, 07:08, Alan <***@***.***> wrote:
*TL;DR @" ""\w+\"" "; gets passed as "\w+\" to the engine, so maybe treat
"" as a single " in CM's tokenizer when the raw string is @"?*
I think I'm getting the point of confusion here. If .NET doesn't care
about the \ before an escaped " as "", why does the site?
Firas can probably clear this up, since I'd just be making assumptions. I
generally treat .NET regex as an outlier/exception to the rule for
regex101.com. I know it's JavaScript parsing the input based on that
flavor's expected rules and capabilities.
Here are a couple things that might at least offer some insight into the
behavior:
1. regex101.com *does NOT parse* your regex input through that
flavor's string parser and then pass it down to the engine. You're
instructing the regex engine directly. This is usually where the confusion
lies when copy pasting a string version of the regex (instead of its parsed
output) from your favorite programming language and seeing errors on the
site...
2. for @" ""\w+\"" "; specifically, .NET 7 pushes the string below to
the regex engine. The regex engine is correct to not error out for a
superfluous escaping of a non-metacharacter ".
"\w+\"
3. The code generator *does NOT* check your regex's ability to run in
that programming language. It just escapes characters or dresses the
pattern in whatever function is appropriate for the target you pick, all
in JavaScript.
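A minimal sketch of points 1 and 2, assuming .NET 5 or later for top-level statements: the text to paste into regex101 is the value of the C# literal, and the .NET engine accepts `\"` as a redundant escape of a literal quote. Variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "this a \"test\"";

// The engine-level pattern is the *value* of the C# literal, not its source text.
string pattern = "\"\\w+\"";          // engine sees: "\w+"
// Point 2: \" escapes a non-metacharacter; .NET treats it as a literal quote.
string escapedPattern = "\"\\w+\\\""; // engine sees: "\w+\"

Match plain = Regex.Match(input, pattern);
Match escaped = Regex.Match(input, escapedPattern);

Console.WriteLine(plain.Value);   // "test"
Console.WriteLine(escaped.Value); // "test"
```

Pasting `"\w+"` (or `"\w+\"`) into a .NET-flavor session should therefore correspond to what the compiled program runs, modulo however regex101 chooses to flag the quote.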
------------------------------
Off topic-ish: Some stuff .NET 7 does with raw strings that I find
intriguing.
// this confuses the living snot out of .NET 7.
// It just picks @" rather than @""" with a single regex token: \w+
// I don't know how it chooses it.
string pattern = @"""\w+""";
// This one refuses to be considered @""" also. Instead, @" which results in
// a ", followed by a space character, and so on.
string pattern = @""" \w+ """;
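As we read the C# 11 rules, neither literal above is a raw string: raw string literals take no @ prefix, so @" always opens a verbatim string and each following "" collapses to one quote. This sketch (top-level statements, .NET 5 or later assumed; names illustrative) prints what each literal actually evaluates to.

```csharp
using System;

// @"""\w+""" lexes as: @" opens, "" -> ", \w+ is literal, "" -> ", " closes.
string first = @"""\w+""";
// @""" \w+ """ lexes the same way, so the spaces stay inside the quotes.
string second = @""" \w+ """;

Console.WriteLine(first);  // "\w+"
Console.WriteLine(second); // " \w+ "
```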
Your off-topic question is interesting. Had I not checked the Microsoft
documentation, I would have said that only two ways to declare a string
exist (leaving aside interpolated strings, which begin with $):
string pattern = "some text";
string pattern = @"some text";
The first uses standard backslash escaping for special characters such as ",
carriage returns, Unicode characters, and so on.
The second uses the verbatim escaping explained in my previous message.
It seems that since C# 11 (.NET 7) there is a third way: a raw string literal, which starts
and ends with at least three double quotes (""") and interprets no escape
sequences at all. It is not preceded by @. However, I have never used it.
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/strings/
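For comparison, a sketch of the same pattern and input written as raw string literals; it assumes the C# 11 compiler from the .NET 7 SDK. A single-line raw literal cannot begin or end with a quote, so the multi-line form is used, where the whitespace before the closing """ is stripped from each content line.

```csharp
using System;
using System.Text.RegularExpressions;

// Raw string literals: no escape sequences are processed at all.
string pattern = """
    "\w+"
    """;

string input = """
    this a "test"
    """;

Console.WriteLine(pattern);                           // "\w+"
Console.WriteLine(Regex.Match(input, pattern).Value); // "test"
```

Here the source text between the delimiters is exactly what the regex engine receives, which makes it the closest C# form to what you type into regex101.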
Raw literals don't use
I didn't look into how exactly regex101 defines "Delimiter", but that is the source of confusion for me. The UI suggests I'd be working in a verbatim string (indicated by the
Bug Description
I need to capture elements that contain a ". When I add the token " or "", neither works and I get a pattern error:
" This token has no special meaning and has thus been rendered erroneous
" An unescaped delimiter must be escaped; in most languages with a backslash (\)
The generated C# code, however, works like a charm:
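using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"""\w+""";
        string input = @"this a ""test""";
        RegexOptions options = RegexOptions.Multiline;

        // Print every match of the pattern in the input.
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}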
It correctly matches "test" in the target string.
Reproduction steps
Use as a .NET 7 regular expression: ""\w+""
Use as the test string: this a "test"
You get the following pattern errors:
" This token has no special meaning and has thus been rendered erroneous
" An unescaped delimiter must be escaped; in most languages with a backslash (\)
" This token has no special meaning and has thus been rendered erroneous
" An unescaped delimiter must be escaped; in most languages with a backslash (\)
Expected Outcome
No error should be shown, as the pattern is correct.
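Whether ""\w+"" is "correct" depends on the layer. Going by the error text above, regex101's complaint is about its own delimiter handling; the .NET engine itself accepts the pattern, but read as an engine-level pattern it means two quotes, a word, two quotes. This sketch (top-level statements, .NET 5 or later assumed; names illustrative) contrasts it with what the generated code passes.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "this a \"test\"";

// Typed into regex101, ""\w+"" is the engine-level pattern itself.
string typedPattern = "\"\"\\w+\"\""; // engine sees: ""\w+""
// The verbatim literal collapses each "" to a single quote.
string codePattern = @"""\w+""";      // engine sees: "\w+"

Console.WriteLine(Regex.IsMatch(input, typedPattern));                  // False
Console.WriteLine(Regex.IsMatch(input, codePattern));                   // True
Console.WriteLine(Regex.IsMatch("this a \"\"test\"\"", typedPattern)); // True
```

So even with the delimiter warnings suppressed, ""\w+"" in regex101 would not reproduce the C# program's behavior; "\w+" would.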
Browser
Microsoft Edge for Business
Version 125.0.2535.67 (Official build) (64-bit)
OS
Windows 11