Reflow sometimes doesn't break the lines properly #305

Closed
Markismus opened this Issue Oct 13, 2013 · 10 comments

2 participants
@Markismus
Member

Markismus commented Oct 13, 2013

As the images show, the text doesn't reflow. Instead every line is treated as a new paragraph. This problem affects the whole page every time it happens.

I've encountered this with multiple PDF files in build 545 on a Kobo Aura HD. The images show a gothic text (https://copy.com/rNHwENxHUa1uVx0E), but it also happens with roman texts.

imag0076_resize
imag0077_resize
imag0078_resize

@chrox

Member

chrox commented Oct 13, 2013

The small defect in the top-left corner, indicated by the red arrow, confused k2pdfopt. All lines below that defect are treated as indented text, and indented lines are processed separately rather than joined together. (That is reasonable for indented code, but in this case it just makes the whole page messy.) A better algorithm is needed in k2pdfopt to determine the indentation of a text paragraph and avoid this situation.
screenshot from 2013-10-13 19 25 07-

For now you can just crop out the defect, as in the image above. Then it should reflow without problems.
screenshot from 2013-10-13 19 25 28
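
A toy model of the failure described above, not k2pdfopt's actual code. It assumes (hypothetically) that the left text boundary is taken from the leftmost dark pixel on the page and that a row counts as indented when it starts more than `tolerance` pixels to the right of that boundary. The function name and all numbers are made up for illustration.

```python
def indented_rows(left_edges, page_left, tolerance=5):
    """Return the indexes of rows that start more than `tolerance`
    pixels to the right of `page_left` (hypothetical rule)."""
    return [i for i, x in enumerate(left_edges) if x - page_left > tolerance]


# Left edge, in pixels, of each text row on an imaginary scanned page.
rows = [40, 40, 41, 40, 39, 40]

# Clean page: the boundary is the leftmost row edge.
clean_left = min(rows)
clean_result = indented_rows(rows, clean_left)

# Same page with a speck at x=3: the boundary jumps far to the left,
# so every real row now looks indented and none of them get joined.
speck_left = min(rows + [3])
speck_result = indented_rows(rows, speck_left)

print(clean_result)
print(speck_result)
```

Cropping the speck out of the page corresponds to the first case, where no row is flagged as indented.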

@Markismus

Member

Markismus commented Oct 13, 2013

Damn, that is indeed a small spot to have such an impact!!
Thanks for the solution. I'll clean the document up.

It would be great if the docs didn't have to be cleaned by hand anymore, though. :)

@chrox

Member

chrox commented Oct 13, 2013

The defect size option is supposed to eliminate this problem, but the medium defect size (8.0) is so large for this document that normal characters are eaten too. A defect size of 3.0 does the trick perfectly. I'm considering changing the defect sizes in koptoptions.lua.

Reflowing with 3.0 defect size in auto cropping mode:
screenshot from 2013-10-13 21 24 41
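
To illustrate why a threshold that is too large also eats normal characters, here is a minimal despeckling sketch. It is an assumption-laden stand-in, not the k2pdfopt implementation: blobs are 4-connected groups of dark pixels, and the threshold is a pixel count (`max_defect_pixels`, a hypothetical parameter), whereas the units of the real defect size option are not stated in this thread.

```python
from collections import deque


def remove_small_blobs(grid, max_defect_pixels):
    """Erase every 4-connected blob of 1s with at most
    `max_defect_pixels` pixels and return a cleaned copy."""
    height, width = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]
    seen = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one blob.
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) <= max_defect_pixels:
                for y, x in blob:
                    cleaned[y][x] = 0
    return cleaned


def ink(grid):
    """Count the dark pixels left on the page."""
    return sum(sum(row) for row in grid)


# A 1-pixel speck in the corner and a 5-pixel "glyph".
page = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
]

gentle = remove_small_blobs(page, 2)  # removes only the speck
harsh = remove_small_blobs(page, 8)   # removes the glyph as well
print(ink(gentle), ink(harsh))
```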

@Markismus

Member

Markismus commented Oct 13, 2013

Lovely!
I changed line 190 to: "values = {1.0, 3.0, 15.0},", set the Defect size to medium and it works!!

When I change the defect size to large, the page is rendered as a modern painting rather than readable text. Maybe the value 15 is too large, too. Couldn't those values scale with the glyph size of the text? The reflow uses boxes; couldn't the box height serve as a scaling factor?

@chrox

Member

chrox commented Oct 13, 2013

Setting the defect size to half the line height is probably a good guess. We need to dig into the k2pdfopt source to fix this, though.
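
A rough sketch of that guess, assuming the row heights are already known and are in the same units as the defect size. Using the median row height is an assumption added here so that a single heading does not skew the result; the function name is hypothetical and nothing like it is confirmed to exist in k2pdfopt.

```python
from statistics import median


def suggested_defect_size(row_heights):
    """Guess a defect size as half of the typical (median) row height."""
    return median(row_heights) / 2


# Four body-text rows and one tall heading (made-up values).
heights = [12, 13, 12, 40, 12]
suggested = suggested_defect_size(heights)
print(suggested)
```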

@Markismus

Member

Markismus commented Oct 13, 2013

I tried changing "DKOPTREADER_CONFIG_DEFECT_SIZE = 1.0" to "DKOPTREADER_CONFIG_DEFECT_SIZE = 3.0" in defaults.lua, but it had no effect. I still had to set the defect size to medium manually.
Would you know why?

I also saw "DKOPTREADER_CONFIG_DETECT_INDENT = 1". If I disabled this option, wouldn't I get rid of the problem, too?

@chrox

Member

chrox commented Oct 14, 2013

Some parameters in defaults.lua are used only once, as initial values. After the parameters are selected from the config panel, they are stored in the history file associated with the document. So DKOPTREADER_CONFIG_DEFECT_SIZE = 3.0 only affects newly opened documents.
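
The lookup order described here can be sketched roughly as follows. This is only a model of the behaviour, not KOReader's code: `DEFAULTS`, `open_document` and the in-memory `history` dict are hypothetical stand-ins for defaults.lua and the per-document history files.

```python
# Stand-in for the value edited in defaults.lua.
DEFAULTS = {"defect_size": 3.0}


def open_document(path, history):
    """Return the settings for `path`, copying DEFAULTS only the
    first time the document is opened."""
    if path not in history:
        history[path] = dict(DEFAULTS)
    return history[path]


# old.pdf was opened before the default was changed, so it keeps 1.0.
history = {"old.pdf": {"defect_size": 1.0}}

old = open_document("old.pdf", history)
new = open_document("new.pdf", history)
print(old["defect_size"], new["defect_size"])
```

In this model the stored per-document value always wins, which matches the observation that the edited default had no effect on a document that was already in the history.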

@Markismus

Member

Markismus commented Oct 14, 2013

That explains it. Thanks!

@Markismus Markismus closed this Oct 14, 2013

chrox added a commit to chrox/libk2pdfopt that referenced this issue Oct 27, 2013

Calibrate indentation
Median indent is added to calic1/2 array for each row so that
rows indented at median indentation still get wrapped.
This should fix koreader/koreader#305.
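
The general idea of calibrating against the median can be sketched like this. It is not the code from the commit, whose details are not shown in this thread. The sketch assumes each row's raw indent is measured from a page boundary that a speck may have pushed too far left, and it subtracts the median raw indent so that typical rows end up near zero. All names and numbers are hypothetical.

```python
from statistics import median


def calibrated_indents(left_edges, page_left):
    """Return each row's indent relative to the median row indent."""
    raw = [x - page_left for x in left_edges]
    base = median(raw)
    return [indent - base for indent in raw]


# Five ordinary rows and one genuinely indented row (made-up pixels).
rows = [40, 40, 41, 40, 39, 60]
page_left = 3  # boundary dragged left by a speck at x=3

result = calibrated_indents(rows, page_left)

# Only rows well to the right of the median count as indented,
# so the ordinary rows can still be wrapped together.
truly_indented = [i for i, d in enumerate(result) if d > 5]
print(truly_indented)
```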

@chrox chrox referenced this issue in koreader/libk2pdfopt Oct 27, 2013

Merged

Calibrate indentation before reflowing #15

@chrox

Member

chrox commented Oct 27, 2013

screenshot from 2013-10-27 21 40 51

With chrox/libk2pdfopt@2397e67, indentation is preserved even if the defect spot is not removed. This should be helpful for pages that have margin notes or reference marks.

@Markismus

Member

Markismus commented Oct 27, 2013

Great. This will reduce the preprocessing of scanned documents significantly.
