Preserve multiline spaces for code blocks #347
Comments
@victory-sokolov thank you for your idea. Could you explain in more detail and suggest some links to test? I think that if this is a specific case, we can simply use a transformation.
I'm not sure it will be straightforward to implement using a custom transformation. Here is an example of the extracted code block
and this is the original code copied from the dev console (site)
In general, code blocks use 4 spaces for nested blocks, but when an article is scraped it has only one. I guess this is because of the
Thanks in advance.
@victory-sokolov yeah, you are right. It's because of
@ndaidong Awesome, thanks a lot! Now code blocks are formatted properly.
@victory-sokolov nice to see it works for you.
Hello, I was wondering if it's possible to make `stripMultispaces` a boolean flag in the `parserOptions` of the `extract` method, so that `stripMultispaces` becomes an optional argument. In specific cases I'm extracting HTML that contains code blocks, and those need their multiple spaces preserved in order to keep the code formatted.
Or maybe you have other ideas on how this can be achieved?
Thanks!
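
To make the request concrete, here is a minimal sketch of the behavior being asked for: collapsing runs of whitespace everywhere except inside `<pre>` blocks. The function name `collapseSpacesOutsidePre` is hypothetical and is not part of the library; the regex-based splitting is an assumption chosen for brevity and does not handle nested or malformed `<pre>` tags.

```javascript
// Hypothetical helper, not the library's implementation: collapse runs of
// whitespace in HTML while leaving <pre>...</pre> blocks untouched.
function collapseSpacesOutsidePre(html) {
  // Splitting with a capture group keeps the matched <pre> blocks in the
  // resulting array, at the odd indexes.
  const parts = html.split(/(<pre\b[^>]*>[\s\S]*?<\/pre>)/i);
  return parts
    .map((part, i) =>
      // Odd index: a <pre> block, returned as-is. Even index: ordinary
      // markup, where two or more whitespace characters become one space.
      i % 2 === 1 ? part : part.replace(/\s{2,}/g, ' ')
    )
    .join('');
}

const sample = '<p>a   b</p><pre>if (x) {\n    y();\n}</pre><p>c    d</p>';
console.log(collapseSpacesOutsidePre(sample));
```

With a flag like the one proposed above, a caller could skip the collapsing step entirely; this sketch shows the narrower alternative of keeping it on while protecting preformatted code.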