-
Notifications
You must be signed in to change notification settings - Fork 2
Merge #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge #2
Conversation
|
This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation. |
| // Add text before the link | ||
| if (match.index > lastIndex) { | ||
| const beforeText = html.substring(lastIndex, match.index); | ||
| const cleanBefore = beforeText.replace(/<[^>]+>/g, ''); |
Check failure
Code scanning / CodeQL
Incomplete multi-character sanitization High
<script
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To fix this problem, we need to ensure that all HTML tags (including those that may be split or repeated) are fully removed from the string. The optimal solution is to repeatedly apply the replacement until no further matches are found. This approach is robust against edge cases and multi-character reintroduction.
Concretely, replace all uses of .replace(/<[^>]+>/g, '') (on lines 85, 102, and 110) with a helper function that strips HTML tags by running this pattern until the string does not change anymore. The helper function can be defined inside the <script setup> block and should be used to sanitize the beforeText, remainingText, and in line 110 when no links are found.
No extra dependencies are needed for robust tag removal if looping is sufficient for the application (PDF text).
File/region to change:
components/DownloadPdfButton.vue, specifically the body ofparseHtmlForPdfLinksfunction (lines 85, 102, and 110).
Implementation details:
- Add a
stripHtmlTagshelper function beforeparseHtmlForPdfLinks. - Replace direct
.replace(/<[^>]+>/g, '')calls withstripHtmlTags(text).
-
Copy modified lines R73-R84 -
Copy modified line R97 -
Copy modified line R114 -
Copy modified line R122
| @@ -70,6 +70,18 @@ | ||
| * @param {string} html - HTML string that may contain anchor tags | ||
| * @returns {Array} Array of text segments with link information | ||
| */ | ||
| // Repeatedly remove all HTML tags until none remain | ||
| function stripHtmlTags(input) { | ||
| if (!input || typeof input !== 'string') return input; | ||
| let output = input; | ||
| let prev; | ||
| do { | ||
| prev = output; | ||
| output = output.replace(/<[^>]+>/g, ''); | ||
| } while (output !== prev); | ||
| return output; | ||
| } | ||
|
|
||
| function parseHtmlForPdfLinks(html) { | ||
| if (!html || typeof html !== 'string') return [{ text: html || '', isLink: false }]; | ||
|
|
||
| @@ -82,7 +94,7 @@ | ||
| // Add text before the link | ||
| if (match.index > lastIndex) { | ||
| const beforeText = html.substring(lastIndex, match.index); | ||
| const cleanBefore = beforeText.replace(/<[^>]+>/g, ''); | ||
| const cleanBefore = stripHtmlTags(beforeText); | ||
| if (cleanBefore) { | ||
| segments.push({ text: cleanBefore, isLink: false }); | ||
| } | ||
| @@ -99,7 +111,7 @@ | ||
| // Add remaining text after last link | ||
| if (lastIndex < html.length) { | ||
| const remainingText = html.substring(lastIndex); | ||
| const cleanRemaining = remainingText.replace(/<[^>]+>/g, ''); | ||
| const cleanRemaining = stripHtmlTags(remainingText); | ||
| if (cleanRemaining) { | ||
| segments.push({ text: cleanRemaining, isLink: false }); | ||
| } | ||
| @@ -107,7 +119,7 @@ | ||
|
|
||
| // If no links found, return the cleaned HTML | ||
| if (segments.length === 0) { | ||
| const cleanText = html.replace(/<[^>]+>/g, ''); | ||
| const cleanText = stripHtmlTags(html); | ||
| return [{ text: cleanText, isLink: false }]; | ||
| } | ||
|
|
| // Add remaining text after last link | ||
| if (lastIndex < html.length) { | ||
| const remainingText = html.substring(lastIndex); | ||
| const cleanRemaining = remainingText.replace(/<[^>]+>/g, ''); |
Check failure
Code scanning / CodeQL
Incomplete multi-character sanitization High
<script
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
The best way to fix the problem is to avoid simple one-pass regexes for removing HTML tags, as explained. You have two main choices:
- Use a well-tested sanitizer library. In Vue.js/JavaScript, a highly recommended choice is the
sanitize-htmlnpm package, which reliably and recursively removes all dangerous tags and attributes, handling even fragmented or nested input. - If use of a library is not an option here, repeatedly apply the
replace()operation until no more changes occur, as shown in the background examples.
Given only the shown code is editable, and library installation may not always be possible, a robust and generic solution is to repeatedly apply the regex replacement in parseHtmlForPdfLinks, for each location where we strip tags. Concretely, this means:
- On lines 85, 102, and 111, rather than doing a single
.replace(/<[^>]+>/g, ''), use a helper function which loops until no more changes. - Implement a helper function (locally or above
parseHtmlForPdfLinks) calledstripHtmlTags(input), which repeatedly applies the regular expression replacement as demonstrated in the Background.
If using a library is allowed (project uses npm), you could install sanitize-html and use that instead everywhere tags are stripped for much better results.
-
Copy modified lines R73-R83 -
Copy modified line R96 -
Copy modified line R113 -
Copy modified line R121
| @@ -70,6 +70,17 @@ | ||
| * @param {string} html - HTML string that may contain anchor tags | ||
| * @returns {Array} Array of text segments with link information | ||
| */ | ||
| // Helper function for robust tag stripping | ||
| function stripHtmlTags(input) { | ||
| if (!input || typeof input !== "string") return input; | ||
| let previous; | ||
| do { | ||
| previous = input; | ||
| input = input.replace(/<[^>]+>/g, ''); | ||
| } while (input !== previous); | ||
| return input; | ||
| } | ||
|
|
||
| function parseHtmlForPdfLinks(html) { | ||
| if (!html || typeof html !== 'string') return [{ text: html || '', isLink: false }]; | ||
|
|
||
| @@ -82,7 +93,7 @@ | ||
| // Add text before the link | ||
| if (match.index > lastIndex) { | ||
| const beforeText = html.substring(lastIndex, match.index); | ||
| const cleanBefore = beforeText.replace(/<[^>]+>/g, ''); | ||
| const cleanBefore = stripHtmlTags(beforeText); | ||
| if (cleanBefore) { | ||
| segments.push({ text: cleanBefore, isLink: false }); | ||
| } | ||
| @@ -99,7 +110,7 @@ | ||
| // Add remaining text after last link | ||
| if (lastIndex < html.length) { | ||
| const remainingText = html.substring(lastIndex); | ||
| const cleanRemaining = remainingText.replace(/<[^>]+>/g, ''); | ||
| const cleanRemaining = stripHtmlTags(remainingText); | ||
| if (cleanRemaining) { | ||
| segments.push({ text: cleanRemaining, isLink: false }); | ||
| } | ||
| @@ -107,7 +118,7 @@ | ||
|
|
||
| // If no links found, return the cleaned HTML | ||
| if (segments.length === 0) { | ||
| const cleanText = html.replace(/<[^>]+>/g, ''); | ||
| const cleanText = stripHtmlTags(html); | ||
| return [{ text: cleanText, isLink: false }]; | ||
| } | ||
|
|
| // If no links found, return the cleaned HTML | ||
| if (segments.length === 0) { | ||
| const cleanText = html.replace(/<[^>]+>/g, ''); |
Check failure
Code scanning / CodeQL
Incomplete multi-character sanitization High
<script
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
The best way to fix this issue is to use a well-tested HTML sanitization library (like sanitize-html) to safely remove all HTML tags and ensure no unsafe content remains. However, if dependencies are to be minimized, another robust approach is to repeatedly apply the regex replacement until no matches remain (as shown in the background), since this will ensure that multi-character overlapping matches are removed. This applies specifically to the region where html.replace(/<[^>]+>/g, '') is used (line 110 and elsewhere for tag removal).
To implement the change:
- Replace direct single-pass usages of
.replace(/<[^>]+>/g, '')with a loop that repeatedly strips all HTML tags until none remain. - Optionally, encapsulate this logic in a helper function (for maintainability).
- Insert the helper function above its usage in the file.
- Update usages to call the new helper method.
No imports are needed for this approach. Changes are all in file components/DownloadPdfButton.vue, in the region with the PDF HTML parsing/cleaning logic.
-
Copy modified lines R73-R82 -
Copy modified line R95 -
Copy modified line R112 -
Copy modified line R120
| @@ -70,6 +70,16 @@ | ||
| * @param {string} html - HTML string that may contain anchor tags | ||
| * @returns {Array} Array of text segments with link information | ||
| */ | ||
| // Helper function to robustly remove all HTML tags (multi-pass) | ||
| function stripHtmlTags(input) { | ||
| let previous; | ||
| do { | ||
| previous = input; | ||
| input = input.replace(/<[^>]+>/g, ''); | ||
| } while (input !== previous); | ||
| return input; | ||
| } | ||
|
|
||
| function parseHtmlForPdfLinks(html) { | ||
| if (!html || typeof html !== 'string') return [{ text: html || '', isLink: false }]; | ||
|
|
||
| @@ -82,7 +92,7 @@ | ||
| // Add text before the link | ||
| if (match.index > lastIndex) { | ||
| const beforeText = html.substring(lastIndex, match.index); | ||
| const cleanBefore = beforeText.replace(/<[^>]+>/g, ''); | ||
| const cleanBefore = stripHtmlTags(beforeText); | ||
| if (cleanBefore) { | ||
| segments.push({ text: cleanBefore, isLink: false }); | ||
| } | ||
| @@ -99,7 +109,7 @@ | ||
| // Add remaining text after last link | ||
| if (lastIndex < html.length) { | ||
| const remainingText = html.substring(lastIndex); | ||
| const cleanRemaining = remainingText.replace(/<[^>]+>/g, ''); | ||
| const cleanRemaining = stripHtmlTags(remainingText); | ||
| if (cleanRemaining) { | ||
| segments.push({ text: cleanRemaining, isLink: false }); | ||
| } | ||
| @@ -107,7 +117,7 @@ | ||
|
|
||
| // If no links found, return the cleaned HTML | ||
| if (segments.length === 0) { | ||
| const cleanText = html.replace(/<[^>]+>/g, ''); | ||
| const cleanText = stripHtmlTags(html); | ||
| return [{ text: cleanText, isLink: false }]; | ||
| } | ||
|
|
| }); | ||
| // Strip any remaining HTML tags | ||
| out = out.replace(/<[^>]+>/g, ''); |
Check failure
Code scanning / CodeQL
Incomplete multi-character sanitization High
<script
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 3 months ago
To correctly remove all HTML tags, including potentially dangerous or nested occurrences, we should either (a) use a well-known and robust library such as sanitize-html to sanitize the HTML string, or (b) if that is not available, repeatedly apply the regular expression replacement until no further replacements occur. The second approach can be implemented by running the replacement in a loop until the output no longer changes, which ensures that nested or consecutive tags are also removed. As we are only permitted to modify existing code in components/DownloadPdfButton.vue, we will update the implementation of htmlAnchorsToPlainText to use this approach, with no external dependencies. No code outside this function will change.
- Edit the function
htmlAnchorsToPlainText. - Replace the line
out = out.replace(/<[^>]+>/g, '');with a loop that continues replacing tags until none remain (outremains unchanged by the replacement). - No imports or other project files need to be changed.
-
Copy modified lines R191-R196
| @@ -188,7 +188,12 @@ | ||
| }); | ||
|
|
||
| // Strip any remaining HTML tags | ||
| out = out.replace(/<[^>]+>/g, ''); | ||
| // Repeatedly remove tags to handle nested/multiple cases | ||
| let previous; | ||
| do { | ||
| previous = out; | ||
| out = out.replace(/<[^>]+>/g, ''); | ||
| } while (out !== previous); | ||
|
|
||
| return out; | ||
| } |
No description provided.