New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Desktop: Fix #5626: When web clipper clipping code blocks, keep code in multiline and delete code number lines #10126
Conversation
We would need test units for this, but even then I don't understand what is being fixed at the moment |
From the description, this seems to fix #5626 (closed due to inactivity). Edit: For reference, tests for existing Webclipper HTML->MD logic can be found here. |
Yes it is. this PR try to fix #5626, I well trying to add test units. |
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅ |
c5cfa1a
to
c9c324a
Compare
@laurent22 Add test units packages/app-cli/tests/html_to_md/{code_multiline_1.html,code_multiline_2.html} run ok. |
… delete code number lines
// delete code number lines | ||
code = code.replace(/(^-\s+\d+\s*$\r?\n)+/gm, ''); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that this may also remove information that shouldn't be removed from certain code blocks. In the comment above could clarify exactly what strings are being removed? (as I'm unable to read that regex)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regular expression removes consecutive lines of code that start with a - followed by a number. The regular expression marker /gm
means that each line break is separated by a line, and can be perfectly matched with a line using(^ - \ s+\ d+\ s * $\ r? \ n)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurent22 Match the lines:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Continuous lines of code starting with - followed by numbers are basically useless information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make more sense to ignore code-block children of type ul
with class pre-numbering
. This should also match line numbers in the second example provided.
This would allow the web clipper (and perhaps also the Rich Text Editor) to save blocks that contain lists of numbers, while also removing line numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your patient guidance. I will try to modify the code using the method you mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@personalizedrefrigerator Already edit rules.list as your guidance. Test units run ok. Thanks a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that this may also remove information that shouldn't be removed from certain code blocks.
@laurent22 This regular replacement code has been removed.
f7913ab
to
68b0963
Compare
|
||
content = content.replace(/\r?\n|\r/g, '\n') | ||
// if code is multiline and in codeBlock, just return it, codeBlock well add fence(default is ```) | ||
if (content.indexOf('\n') != -1 && node.parentNode && isCodeBlock(node.parentNode)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please always use strict inequality !==
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed, submitted.
I've noticed that the web clipper has trouble with code blocks on Joplin's website (e.g. the table of contents plugin tutorial page). Do you know if this pull request would also fix processing of those code blocks? (Thank you for taking the time to work on this!) |
18ded33
to
64ece4f
Compare
a52a6ab
to
1e9da84
Compare
Add Copy first code blocks from: toc_plugin html looks:befor
|
// Fix: Web clipper has trouble with code blocks on Joplin's website. | ||
// See https://github.com/laurent22/joplin/pull/10126#issuecomment-2016523281 . | ||
// Web clipper clippering contents is all in oneline. The format features are: <pre ...><code ...><span class="token-line">. Span with class of "token-line" represent one line. | ||
// Test case: packages/app-cli/tests/html_to_md/code_multiline_3.html . | ||
rules.joplinOrgTokenLineSpan = { | ||
filter: function (node) { | ||
const grandfather = node.parentNode.parentNode?? null; | ||
return node.nodeName === 'SPAN' && node.getAttribute('class') === 'token-line' && grandfather && isCodeBlock(grandfather); | ||
}, | ||
|
||
replacement: function (content, node, options) { | ||
content = content.replace(/\r?\n|\r/g, ''); | ||
// If content replaced '\r\n' is empty, it indicates that the line is empty line; if not, keep the leading before and after content. | ||
return content === '' ? '\n\n' : node.flankingWhitespace.leading + content + node.flankingWhitespace.trailing + '\n'; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling to review this, I don't really understand what it does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@personalizedrefrigerator, was this for the bug you mentioned, and does the fix make sense to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@personalizedrefrigerator, was this for the bug you mentioned, and does the fix make sense to you?
This was the bug I mentioned, however I'm not sure that the fix makes sense to me. I think the issue here is that the <br/>
s at the end of Docusaurus code blocks aren't being converted to linebreaks by Turndown.
For context, I've attached simplified HTML for a Docusaurus code block (taken from here):
<code class="codeBlockLines_FJaf">
<span class="token-line" style="color: rgb(212, 212, 212);">
<span class="token front-matter-block punctuation" style="color: rgb(212, 212, 212);">---</span>
<br>
</span>
<span class="token-line" style="color: rgb(212, 212, 212);">
<span class="token front-matter-block"></span>
<span class="token front-matter-block front-matter yaml language-yaml key atrule" style="color: rgb(86, 156, 214);">id</span>
<span class="token front-matter-block front-matter yaml language-yaml punctuation" style="color: rgb(212, 212, 212);">:</span>
[...]
<br>
</span>
[...]
<span class="token-line" style="color: rgb(212, 212, 212);">
<span class="token front-matter-block"></span>
<span class="token front-matter-block punctuation" style="color: rgb(212, 212, 212);">---</span>
<br>
</span>
</code>
Observe that each (inline) token-line
span ends with a <br>
that previously wasn't detected by Turndown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove packages/turndown/src/commonmark-rules.js/rules.joplinOrgTokenLineSpan
. Analyze the current code, find the reasons for incorrect formatting in the code block, modify code for preserve the line break formatting for code block. Web clipper clipping code blocks on Joplin's website run ok. Test case run ok.
One of the main goals of Joplin's 3.0 version development is to add mobile plugin functionality. Can we take this opportunity to add a user script type for |
…ecial handling of code blocks
…ecial handling of code blocks, keep space
…ecial handling of code blocks, keep space, test case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a few comments based on a quick read through the code.
Thank you for working on this!
Coding style fix Co-authored-by: Henry Heino <46334387+personalizedrefrigerator@users.noreply.github.com>
fix typo Co-authored-by: Henry Heino <46334387+personalizedrefrigerator@users.noreply.github.com>
Optimize regular expressions: .replace(/( *\t*\n)+( *\t*)*$/, '') equivalent to .trimEnd() Co-authored-by: Henry Heino <46334387+personalizedrefrigerator@users.noreply.github.com>
Add Feature request: Add content script type of ContentScriptType.TurndownPlugin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested this locally with Joplin's website and it works!!!
I've left a few minor stylistic/commenting suggestions. For the most part, I think this is ready to merge!
fix typo Co-authored-by: Henry Heino <46334387+personalizedrefrigerator@users.noreply.github.com>
better comments Co-authored-by: Henry Heino <46334387+personalizedrefrigerator@users.noreply.github.com>
better comments Co-authored-by: Henry Heino <46334387+personalizedrefrigerator@users.noreply.github.com>
add space Co-authored-by: Henry Heino <46334387+personalizedrefrigerator@users.noreply.github.com>
remove space Co-authored-by: Henry Heino <46334387+personalizedrefrigerator@users.noreply.github.com>
Thank you for the revision. My English proficiency is relatively weak and annotations are not well expressed. Thank you for your guidance. |
Thanks for implementing this @wljince007 and thanks for the review @personalizedrefrigerator! |
Desktop: Fix #5626: when clipper, keep code in multiline and delete code number lines
See an example, clipper from https://blog.csdn.net/qq_42025798/article/details/121727781,
html look:
befor this modify, web clipper clipping code blocks, code blocks are in one line:
after this modify, web clipper clipping code blocks, code blocks are in multiline, readability is good: