-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Github] Improve formating of PR diffs in bot notifications #66118
Conversation
I wonder whether |
b74a7f4
to
1ab5ca7
Compare
Yes, done! |
# and the file list to 20kB each. | ||
STAT_LIMIT = 20 * 1024 | ||
DIFF_LIMIT = 20 * 1024 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't in the PR description, can you add it there? (with some context for the "why" of the change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG, but I agree with @MaskRay about #
, it is likely we don't want to escape it alone, but the sequence of #[1-2]+
maybe?
@joker-eph I'll rebase and try to be more selective in what we escape. escaping unconditionally probably does not hurt though |
Unconditionally escaping |
You missed my point: I commented that we should match |
I'd argue that issue/PR links can be achieved with normal comments, not with bot comments. We should not allow |
I'll rework that further, i found a way to improve escaping of html in code blocks |
* This avoid pinging folks on all issue when they got pinged on bugzilla eaons ago * Avoid formating bugs when there is html in the issue description
…oes not get escaped
* Escape PR description * Trunkate the list of files if it's > 20K * Color the diff on github
This way we can render < and & properly. This only work if there is a new line before the 3 ticks.
Spoiler+ You need a new line before the ```
- `[]<typename>(auto&)` |
@
and html in the <details> block
Can you test this out by running the script against this PR? |
@llvm/pr-subscribers-github-workflow Changes
Test: This is #66118 @cor3ntin Full diff: https://github.com/llvm/llvm-project/pull/66118.diff 1 Files Affected:
diff --git a/llvm/utils/git/github-automation.py b/llvm/utils/git/github-automation.py
index eac5816b5499f6a..8578a83262717ab 100755
--- a/llvm/utils/git/github-automation.py
+++ b/llvm/utils/git/github-automation.py
@@ -47,6 +47,21 @@ def _get_curent_team(team_name, teams) -> Optional[github.Team.Team]:
return None
+def escape_description(str):
+ # https://github.com/github/markup/issues/1168#issuecomment-494946168
+ str = html.escape(str, False)
+ # '@' followed by alphanum is a user name
+ str = re.sub("@(?=\w+)", "@<!-- -->", str)
+ # '#' followed by digits is considered an issue number
+ str = re.sub("#(?=\d+\s)", "#<!-- -->", str)
+ return str
+
+
+def sanitize_markdown_code_block(str):
+ # remove codeblocks terminators
+ return re.sub("^\s*```\s*$", r"` ` `", str)
+
+
class IssueSubscriber:
@property
def team_name(self) -> str:
@@ -67,12 +82,15 @@ def run(self) -> bool:
if team.slug == "issue-subscribers-good-first-issue":
comment = "{}\n".format(beginner_comment)
- comment = (
- f"@llvm/{team.slug}"
- + "\n\n<details>\n"
- + f"{self.issue.body}\n"
- + "</details>"
- )
+ body = escape_description(self.issue.body)
+
+ comment = f"""
+@llvm/{team.slug}
+
+<details>
+{body}
+</details>
+"""
self.issue.create_comment(comment)
return True
@@ -113,6 +131,11 @@ def run(self) -> bool:
print(f"couldn't find team named {self.team_name}")
return False
+ # GitHub limits comments to 65,536 characters, let's limit the diff
+ # and the file list to 20kB each.
+ STAT_LIMIT = 20 * 1024
+ DIFF_LIMIT = 20 * 1024
+
# Get statistics for each file
diff_stats = f"{self.pr.changed_files} Files Affected:\n\n"
for file in self.pr.get_files():
@@ -125,35 +148,43 @@ def run(self) -> bool:
if file.status == "renamed":
print(f"(from {file.previous_filename})")
diff_stats += "\n"
- diff_stats += "\n"
+ if len(diff_stats) > STAT_LIMIT:
+ break
# Get the diff
try:
- patch = html.escape(requests.get(self.pr.diff_url).text)
+ patch = requests.get(self.pr.diff_url).text
except:
patch = ""
- diff_stats += "\n<pre>\n" + html.escape(patch)
- # GitHub limits comments to 65,536 characters, let's limit the diff to 20kB.
- DIFF_LIMIT = 20 * 1024
+ patch = sanitize_markdown_code_block(patch)
+
patch_link = f"Full diff: {self.pr.diff_url}\n"
if len(patch) > DIFF_LIMIT:
patch_link = f"\nPatch is {human_readable_size(len(patch))}, truncated to {human_readable_size(DIFF_LIMIT)} below, full version: {self.pr.diff_url}\n"
- diff_stats = diff_stats[0:DIFF_LIMIT] + "...\n<truncated>\n"
- diff_stats += "</pre>"
+ patch = patch[0:DIFF_LIMIT] + "...\n[truncated]\n"
team_mention = "@llvm/{}".format(team.slug)
- body = self.pr.body
+ body = escape_description(self.pr.body)
+ # Note: the comment is in markdown and the code below
+ # is sensible to line break
comment = f"""
{self.COMMENT_TAG}
{team_mention}
-
+
<details>
<summary>Changes</summary>
+
{body}
---
+---
{patch_link}
+
{diff_stats}
+
+```diff
+{patch}
+```
+
</details>
"""
|
llvm/utils/git/github-automation.py
Outdated
|
||
def sanitize_markdown_code_block(str): | ||
# remove codeblocks terminators | ||
return re.sub("^\s*```\s*$", r"` ` `", str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be code blocks that specify syntax in order to enable syntax highlighting, e.g.
```cpp std::vector v; ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only need to escape the closing ```
```cpp does not close a code block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I though you match both opening and closing triple backticks.
But is this going to work correctly for multiple code blocks in a message? Multiple unescaped openings might be interpreted as opening and closing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope - but re-reading the spect and doing more tests, we can use more backticks instead of escaping https://spec.commonmark.org/0.30/#fenced-code-block
That way we do not have to escape ` ``` `
{body} | ||
-- | ||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change (three dashes) causes the last paragraph of the body to be rendered like a title, which looks awful, e.g.: the "After legalization is done" paragraph in this issue: #66567 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add an extra new line before the dashes, I'll do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
* This avoid pinging folks on all issue when they got pinged on bugzilla eons ago * Avoid formatting bugs when there is html in the issue description * Truncate the list of files and the diff independently of each other. This avoids truncating cutting a file line in 2 and to cut in the middle of html markup. This is a fringe case but it does happen when people accidentally push weird branches conflicting on all the files.
Test: This is #66118 @cor3ntin