Skip to content

Conversation

@JoannaaKL
Copy link
Contributor

Implement basic content sanitisation that is (for now) applied only to title and body of issues and pull requests.
It removes hidden Unicode characters that pose a risk to LLM input.

@JoannaaKL JoannaaKL requested a review from a team as a code owner October 31, 2025 10:18
Copilot AI review requested due to automatic review settings October 31, 2025 10:18
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds sanitization functionality to filter invisible Unicode characters from GitHub issue and pull request titles and bodies. This helps prevent potential security issues and display problems caused by invisible control characters.

Key Changes

  • Created a new sanitize package with FilterInvisibleCharacters function that removes various invisible Unicode characters
  • Applied sanitization to PR and issue titles/bodies in the GitHub API response handlers
  • Added comprehensive test coverage for the sanitization logic

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/sanitize/sanitize.go Implements the FilterInvisibleCharacters function to remove invisible Unicode characters
pkg/sanitize/sanitize_test.go Comprehensive test suite covering various invisible character scenarios
pkg/github/pullrequests.go Applies sanitization to PR titles and bodies in GetPullRequest and ListPullRequests
pkg/github/issues.go Applies sanitization to issue titles and bodies in fragmentToIssue and GetIssue
.gitignore Adds .history to ignored files

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +327 to +335
// Sanitize title/body on response
if issue != nil {
if issue.Title != nil {
issue.Title = github.Ptr(sanitize.FilterInvisibleCharacters(*issue.Title))
}
if issue.Body != nil {
issue.Body = github.Ptr(sanitize.FilterInvisibleCharacters(*issue.Body))
}
}
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sanitization logic is duplicated in both GetIssue (lines 327-335) and fragmentToIssue (lines 215, 223). Since GetIssue already calls the GitHub API and returns an issue object, the sanitization in GetIssue may be redundant if the issue comes from a fragment that's already sanitized, or vice versa. Consider whether both locations need sanitization or if one location is sufficient to avoid double-processing.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, GetIssue is using a REST API and fragmentToIssue is used for graphQL.

@JoannaaKL JoannaaKL merged commit 304f29a into main Oct 31, 2025
23 checks passed
@JoannaaKL JoannaaKL deleted the filter_hidden_characters branch October 31, 2025 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants