Commands for convenient regex testing #952
|
|
||
| @group(name='regexp', aliases=('regex', 're'), invoke_without_command=True) | ||
| async def regexp_group(self, ctx: Context) -> None: | ||
| """Commands for exploring the misterious worldof regular expressions.""" |
Minor typos here.
| """Commands for exploring the misterious worldof regular expressions.""" | |
| """Commands for exploring the mysterious world of regular expressions.""" |
Thanks, I'll fix that. I think format_* docstrings can be improved as well.
|
I'll take a more in-depth look at the code soon, but one question I have now is whether executing arbitrary RegExp code is safe. We used to have an issue on the bot where the RegExp used for code block detection resulted in the bot locking up due to something called catastrophic backtracking. How can we prevent unsafe regular expressions from locking up the bot? |
|
Found the conversation from when we discovered the issue in the bot. The solution was to adjust the regular expression to something safe, but this still leaves the risk of abuse if people use dangerous regular expressions. |
|
@Jos-B Why not set a small time limit? |
|
Right, but then we need to isolate the process, which would require migrating the regex executor to another platform. That is feasible given the new UNIX commands in snekbox and the possibility of more in the future, but it would make this implementation obsolete. |
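To make the time-limit-plus-isolation idea above concrete, here is a minimal sketch that runs the standard library `re` search in a child interpreter and kills it when a deadline passes. It is not the bot's or snekbox's implementation; `search_with_timeout`, `_CHILD_SCRIPT`, and the exit code `3` are hypothetical names and conventions chosen for illustration.

```python
import subprocess
import sys
from typing import Optional

# Script executed in the child interpreter. The pattern and the text arrive
# through sys.argv so nothing user-supplied is interpolated into code.
# Exit code 0 = match, 3 = no match (hypothetical convention); an uncaught
# exception such as re.error makes the interpreter exit with code 1.
_CHILD_SCRIPT = """
import re, sys
match = re.search(sys.argv[1], sys.argv[2])
if match is None:
    sys.exit(3)
print(match.group(0), end="")
"""


def search_with_timeout(pattern: str, text: str, timeout: float = 1.0) -> Optional[str]:
    """Return the matched text, or None when nothing matches.

    Raises TimeoutError when the child exceeds `timeout` seconds and
    ValueError when the pattern fails to compile.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD_SCRIPT, pattern, text],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed once this expires
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError("regex timed out") from None
    if result.returncode == 0:
        return result.stdout
    if result.returncode == 3:
        return None
    raise ValueError("pattern could not be evaluated")
```

The timeout also covers interpreter start-up, so very small limits would reject harmless patterns. `subprocess.run` blocks the calling thread, so an asyncio-based bot would presumably need `asyncio.create_subprocess_exec` or an executor instead.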
|
Could they use |
|
Snekbox could definitely be used, either by adding a new executor or by basically sending all the processing code as a regular eval to snekbox. |
|
However at that point I think it becomes something where we need to discuss things among the core developers before we can think about implementation detail. |
|
The third-party regex module seems to handle such regular expressions without any issues. All examples of catastrophic regexes I could find work instantly. |
|
A possible issue with |
Turns out that bumping the flake8 version up to 3.8 introduces a long list of new linting errors. Since this PR is the one that bumps the version, I suppose we will also fix all the linting errors in this branch. (cherry picked from commit e993566)
ks129 left a comment
Found one bug. One other thing I want to say: when I tested it, all results (errors and matches) looked very plain. I think some language formatting should be added; this will increase readability.
For errors, I used diff formatting and added - before each line, which gave me red text. And maybe the success result should go inside an embed that implements wait_for_deletion?
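The diff-formatting suggestion above could look roughly like the sketch below. `format_error_as_diff` and `FENCE` are hypothetical names, not part of the PR; the function only prefixes every line with `- ` and wraps the result in a `diff` code block, which Discord renders in red.

```python
import re

# Three backticks, built programmatically so this example can itself
# live inside a fenced code block.
FENCE = "`" * 3


def format_error_as_diff(message: str) -> str:
    """Wrap an error message in a diff code block with '- ' on each line."""
    lines = [f"- {line}" for line in message.splitlines()] or ["- (no details)"]
    return "\n".join([FENCE + "diff", *lines, FENCE])


if __name__ == "__main__":
    try:
        re.compile("(unclosed")
    except re.error as error:
        # The exact wording of the message depends on the Python version.
        print(format_error_as_diff(str(error)))
```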
| @group(name='regexp', aliases=('regex', 're'), invoke_without_command=True) | ||
| async def regexp_group(self, ctx: Context) -> None: | ||
| """Commands for exploring the misterious world of regular expressions.""" | ||
| await ctx.invoke(self.bot.get_command("help"), "regexp") |
This will not work. We migrated to new help command lately, so this has to be changed:
| await ctx.invoke(self.bot.get_command("help"), "regexp") | |
| await ctx.send_help(ctx.command) |
MarkKoz left a comment
I do like this feature and want to see it added. If the regex module handles catastrophic expressions well, but re doesn't, then re support needs to be dropped. I am not too concerned about people being misled by the extended features of regex. However, if possible, using re to check for successful compilation is a good idea. On the other hand, if re can also handle catastrophic expressions, then the regex module is redundant.
Regarding execution time, let's also keep in mind that users will only be able to search at most ~2000 characters of text.
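A minimal sketch of the "use re to check for successful compilation" idea, assuming the check happens before the pattern is handed to whichever engine performs the search. `try_compile` is a hypothetical helper, not the PR's converter.

```python
import re
from typing import Optional, Tuple


def try_compile(pattern: str) -> Tuple[Optional["re.Pattern[str]"], Optional[str]]:
    """Compile `pattern` with the standard library `re` module.

    Returns (compiled_pattern, None) on success, or (None, message) when
    `re` rejects the pattern, e.g. because it relies on syntax that only
    an extended engine understands.
    """
    try:
        return re.compile(pattern), None
    except re.error as error:
        return None, f"Invalid pattern: {error}"
```

Compiling is cheap compared with matching, so this check by itself does nothing to guard against catastrophic backtracking; it only tells the user whether the pattern is valid for `re`.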
| """Commands for exploring the misterious world of regular expressions.""" | ||
| await ctx.invoke(self.bot.get_command("help"), "regexp") | ||
|
|
||
| @regexp_group.command(name='search', aliases=('find', 's', '🔍')) |
I don't like the use of emojis to invoke commands. Emojis are general-purpose, so someone could unintentionally invoke this command.
| @regexp_group.command(name='search+', aliases=('find+', 's+', '🔍+')) | ||
| async def match_plus_command(self, ctx: Context, pattern: RegexRegex, test: str) -> None: | ||
| """ | ||
| Like `!re search`, but with an extended regex format. |
It's a bad user experience to make them have to invoke the help command again to look at the docstring for !re search. Just copy the docstring over to this one.
|
|
||
| @group(name='regexp', aliases=('regex', 're'), invoke_without_command=True) | ||
| async def regexp_group(self, ctx: Context) -> None: | ||
| """Commands for exploring the misterious world of regular expressions.""" |
| """Commands for exploring the misterious world of regular expressions.""" | |
| """Commands for exploring the mysterious world of regular expressions.""" |
| ReRegex = ConvertRegex(supports_extended_features=False) | ||
| RegexRegex = ConvertRegex(supports_extended_features=True) |
I think better names would be Regex and ExtendedRegex.
| r""" | ||
| Format a match result to display in a response. | ||
|
|
||
| >>> format_match(re.search("(\d\d)+", "hello123456world")) | ||
| [' hello123456world', ' 0: ^^^^^^', ' 1: ^^'] | ||
|
|
||
| Which will look as: | ||
| | hello123456world | ||
| | 0: ^^^^^^ | ||
| | 1: ^^ | ||
| """ |
What do you think about showing the actual text matched instead of the carets? That may end up being easier to read, especially if there are a lot of groups (eyes won't have to travel far to match the carets to the original string).
I think this would be better:
|     hello123456world
|  0:      123456
|  1:          56
Just showing the text might be ambiguous if a substring appears multiple times in a string.
However, we still have to figure out how to display zero-width matches 🤔
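The aligned-text layout proposed above can be sketched as follows. This is not the PR's `format_match`; the `^` marker for zero-width matches and the "(did not participate)" label are assumptions made here, since the thread leaves the zero-width case open.

```python
import re
from typing import List

PREFIX_WIDTH = 4  # width of the "NN: " label column


def format_match(match: "re.Match[str]") -> List[str]:
    """Show each group's text directly under its position in the string."""
    lines = [" " * PREFIX_WIDTH + match.string]
    for index in range(match.re.groups + 1):
        label = f"{index:>2}: "
        start, end = match.span(index)
        if start == -1:
            # The group exists in the pattern but took no part in the match.
            lines.append(label + "(did not participate)")
        elif start == end:
            # Zero-width match: mark the position (hypothetical choice).
            lines.append(label + " " * start + "^")
        else:
            lines.append(label + " " * start + match.group(index))
    return lines


if __name__ == "__main__":
    print("\n".join(format_match(re.search(r"(\d\d)+", "hello123456world"))))
```

The alignment relies on a monospace font and on the searched string having no newlines, tabs, or wide characters, so it would only line up inside a code block.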
| await ctx.send(match_and_format(pattern, test)) | ||
|
|
||
| @regexp_group.command(name='search+', aliases=('find+', 's+', '🔍+')) | ||
| async def match_plus_command(self, ctx: Context, pattern: RegexRegex, test: str) -> None: |
test is not a clear name for the parameter. Perhaps simply name it string?
This parameter should also be a kw-only arg so it "consumes all" if the string has spaces. This means the pattern needs to be in quotes if it has spaces, but I think this is the best option we have.
I thought about that. Is there any way to put the pattern or the string inside a code block?
/re search `\d+` 123456helloworld
I am not sure. I think discord.py does not consider backticks to be the same as quotes, which means you will need to parse all arguments yourself from a single string to accomplish this. But don't take my word for it, double check this.
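If the arguments do have to be parsed by hand from a single string, a minimal sketch might look like this. It is plain Python with no discord.py involved, and `split_pattern_and_string` is a hypothetical helper: a pattern wrapped in backticks may contain spaces, otherwise the first space-separated token is taken as the pattern.

```python
from typing import Tuple


def split_pattern_and_string(argument: str) -> Tuple[str, str]:
    """Split raw command input into (pattern, string to search)."""
    argument = argument.strip()
    if argument.startswith("`"):
        closing = argument.find("`", 1)
        if closing == -1:
            raise ValueError("unterminated backtick-quoted pattern")
        # Everything between the backticks is the pattern, spaces included.
        return argument[1:closing], argument[closing + 1:].lstrip()
    # No backticks: the pattern ends at the first space.
    pattern, _, rest = argument.partition(" ")
    return pattern, rest.lstrip()
```

A pattern that itself contains a backtick cannot be expressed this way, which is one reason the keyword-only approach with ordinary quotes may still be the simpler option.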
|
I've been investigating. I'll attach an example:
>>> import regex
>>> regex.match("^(([a-z])+.)+[A-Z]([a-z])+$", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!")
... hang ...
>>> regex.match("^(([a-z])+.)+[A-Z]([a-z])+$", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!",
...             timeout=0.05)
TimeoutError: regex timed out
|
Huh! |
|
Good point, though, I'll add that as well. |
|
Alternatively, we can use snekbox for execution, leaving DoS and other similar attacks out of the equation. Would require some effort to move the code there, though. EDIT: I see this has been discussed already, my bad. |
…/decorator-factory/bot into decorator-factory-add-re-command
| await ctx.invoke(self.bot.get_command("help"), "regexp") | ||
|
|
||
| @regexp_group.command(name='search', aliases=('find', 's')) | ||
| async def match_command(self, ctx: Context, pattern: Regex, test: str) -> None: |
| async def match_command(self, ctx: Context, pattern: Regex, test: str) -> None: | |
| async def match_command(self, ctx: Context, pattern: Regex, *, test: str) -> None: |
| await ctx.send(match_and_format(pattern, test)) | ||
|
|
||
| @regexp_group.command(name='search+', aliases=('find+', 's+')) | ||
| async def match_plus_command(self, ctx: Context, pattern: ExtendedRegex, test: str) -> None: |
| async def match_plus_command(self, ctx: Context, pattern: ExtendedRegex, test: str) -> None: | |
| async def match_plus_command(self, ctx: Context, pattern: ExtendedRegex, *, test: str) -> None: |
Thanks, added now



!regexp searchcommand