I want to copy some code from a PDF and paste it into a text editor, but the code indentation is lost. Is it possible to generate a PDF with code that can be copied and pasted with correct indentation, so we don't need to type the indentation by hand again?
Here is an example that cannot be copied and pasted with its indentation intact:
Hm ... not out of the box.
First of all, you would have to use one of the Page.getText(option, flags=nnn) variants.
Second, it all depends on how the text in the PDF is encoded: if indentation is encoded as spaces, you are fine to just use the output of page.getText().
If tabs are used instead, modify the flags parameter such that white spaces are preserved (TEXT_PRESERVE_WHITESPACE).
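A minimal sketch of these first two cases, assuming a PyMuPDF page object is already at hand. The helper name extract_text is hypothetical, and the fallback from get_text to getText is an assumption made here so the sketch works with both newer and older PyMuPDF releases.

```python
def extract_text(page, flags=None):
    """Return the plain text of a page, optionally with explicit extraction flags.

    `page` is expected to be a PyMuPDF page. Newer releases name the method
    get_text, older ones getText, so the sketch accepts either (assumption).
    """
    method = getattr(page, "get_text", None) or getattr(page, "getText")
    if flags is None:
        return method("text")
    return method("text", flags=flags)


# Possible usage (not executed here; "code.pdf" is a placeholder file name):
#   import fitz  # PyMuPDF
#   doc = fitz.open("code.pdf")
#   print(extract_text(doc[0], flags=fitz.TEXT_PRESERVE_WHITESPACE))
```

If the pasted result still has no indentation, the whitespace is probably not stored as characters in the PDF at all, and the position-based approach is needed.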
If all else fails, use a method which also provides text position information. As program code generally uses a monospaced font, the start position of every text piece can be translated into a unique number of spaces to prefix it with. To make this a bit clearer (hopefully):
page.getText("dict")["blocks"] is a list of dictionaries.
Each item represents a text block (think of it as a paragraph). It contains a list of sub dictionaries, which each represent a line.
Each line again contains a list of sub dicts, called "spans". A span contains text with identical font properties. So in the case of a program, a line would just contain one span.
Step one: determine which text x-coordinate represents column 0. This is the minimum of the x0 coordinate of the line (or span) bboxes.
Step two: determine the (constant!) width of one character. Take any span, divide its bbox width by its character count.
After this, loop through the spans and output each span["text"] prefixed with the correct number of spaces, determined by the x0 coordinate of the span bbox.
Note:
I outputted text lines instead of spans, because program code may be colored (see pygments), which produces (see above) a separate span each time. So I concatenate the spans for each line ...
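The steps above could be sketched roughly as follows. This is not the script the note refers to, only an illustration under stated assumptions: the function name reindent is hypothetical, the input is a dictionary shaped like the output of page.getText("dict"), the font is monospaced, and leading spaces are not already present in the extracted text. The character width is estimated from the longest line rather than an arbitrary one, which is a choice made here to reduce rounding error.

```python
def reindent(page_dict):
    """Rebuild indented code lines from a getText("dict")-style dictionary."""
    lines = []  # one (x0, text, estimated character width) tuple per text line
    for block in page_dict["blocks"]:
        # Image blocks carry no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            # Colored code yields several spans per line: concatenate them.
            text = "".join(span["text"] for span in line["spans"])
            if not text.strip():
                continue
            x0, _, x1, _ = line["bbox"]
            lines.append((x0, text, (x1 - x0) / len(text)))
    if not lines:
        return []

    # Step one: the smallest x0 is taken to be column 0.
    left = min(x0 for x0, _, _ in lines)
    # Step two: constant character width, estimated from the longest line.
    char_w = max(lines, key=lambda item: len(item[1]))[2]
    if char_w <= 0:
        return [text for _, text, _ in lines]

    # Prefix each line with the number of spaces implied by its x0 offset.
    return [" " * round((x0 - left) / char_w) + text for x0, text, _ in lines]


# Possible usage (not executed here; "code.pdf" is a placeholder file name):
#   import fitz  # PyMuPDF
#   page = fitz.open("code.pdf")[0]
#   print("\n".join(reindent(page.getText("dict"))))
```

The sketch treats the whole page as one code listing. If the page mixes code with proportionally spaced prose, the blocks would have to be filtered first, for example by font name.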