Reflow sometimes doesn't break the lines properly #305

Closed
Markismus opened this Issue Oct 13, 2013 · 10 comments

2 participants

@Markismus
KOReader Community member

As the images show, the text doesn't reflow. Instead every line is treated as a new paragraph. This problem affects the whole page every time it happens.

I've encountered this with multiple PDF files in build 545 on a Kobo Aura HD. The images show a gothic text (https://copy.com/rNHwENxHUa1uVx0E), but it also happens with roman texts.

imag0076_resize
imag0077_resize
imag0078_resize

@chrox
KOReader Community member
chrox commented Oct 13, 2013

The small defect in the top left corner, indicated by the red arrow, confused k2pdfopt. All lines below that defect are treated as indented text, and indented lines are processed separately and not joined together. That is reasonable for indented code, but in this case it just makes the whole page messy. k2pdfopt needs a better algorithm for determining paragraph indentation to avoid this situation.
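To make the failure mode concrete, here is a minimal sketch of how a single speck can make every line look indented. It assumes the left margin is taken as the smallest left edge found on the page; that is an assumption for illustration, not k2pdfopt's actual logic, and `indented_lines`, `tolerance`, and the pixel values are all hypothetical.

```python
def indented_lines(left_edges, tolerance=2):
    """Return indices of rows whose left edge lies right of the detected margin.

    Assumption: the margin is simply the smallest left edge on the page.
    """
    margin = min(left_edges)
    return [i for i, x in enumerate(left_edges) if x - margin > tolerance]


# Left edges (in pixels) of five text rows on a clean page.
clean = [50, 50, 51, 50, 50]

# The same page with a tiny speck at x=12 near the top left corner,
# which is picked up as a "row" of its own.
with_spot = [12] + clean

print(indented_lines(clean))      # [] : no row counts as indented
print(indented_lines(with_spot))  # [1, 2, 3, 4, 5] : every real row counts as indented
```

With the speck present, the margin moves far to the left, so every real row sits "right of the margin" and none of them are joined.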

screenshot from 2013-10-13 19 25 07-

For now you can just crop out the defect, as in the image above. Then it should reflow without problems.
screenshot from 2013-10-13 19 25 28

@Markismus
KOReader Community member

Damn, that is indeed a small spot to have such impact!!
Thanks for the solution. I'll clean the document up.

It would be great if the docs didn't have to be cleaned by hand anymore, though. :)

@chrox
KOReader Community member
chrox commented Oct 13, 2013

The defect size option is supposed to eliminate this problem, but the medium defect size (8.0) is too large for this document: normal characters get eaten too. A defect size of 3.0 does the trick perfectly. I'm considering changing the defect sizes in koptoptions.lua.
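A rough sketch of why the threshold matters, assuming the defect size acts as a cutoff below which ink blobs are discarded. The blob sizes are made up for illustration and `remove_defects` is a hypothetical helper, not a k2pdfopt function.

```python
def remove_defects(blob_sizes, defect_size):
    """Keep only blobs at least as large as the defect size (same units)."""
    return [s for s in blob_sizes if s >= defect_size]


# Hypothetical blob sizes: one speck (2.5) and three glyphs (6.0, 7.0, 9.0).
blobs = [2.5, 6.0, 7.0, 9.0]

print(remove_defects(blobs, 8.0))  # [9.0] : the speck is gone, but so are two glyphs
print(remove_defects(blobs, 3.0))  # [6.0, 7.0, 9.0] : only the speck is removed
```

A cutoff that sits between the speck size and the smallest glyph removes the defect without eating text. Where that gap lies depends on the document.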

Reflowing with 3.0 defect size in auto cropping mode:
screenshot from 2013-10-13 21 24 41

@Markismus
KOReader Community member

Lovely!
I changed line 190 to: "values = {1.0, 3.0, 15.0},", set the Defect size to medium and it works!!

When I change the defect size to large, the page is rendered as a modern painting rather than readable text. Maybe the value 15 is too large, too. Couldn't those values scale with the glyph size of the text? The reflow uses boxes; couldn't the box height be a scaling factor?

@chrox
KOReader Community member
chrox commented Oct 13, 2013

Setting the defect size to half of the line height is probably a good guess. We need to dig into the k2pdfopt source to fix this, though.
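As a sketch of that guess, the threshold could be derived from the row heights measured on the page rather than from a fixed table of values. The function name, the use of the median, and the sample heights are assumptions for illustration only.

```python
from statistics import median


def scaled_defect_size(line_heights, factor=0.5):
    """Derive a defect threshold as a fraction of the typical row height."""
    return factor * median(line_heights)


# Hypothetical row heights measured on one page, in the same units as the defect size.
line_heights = [20, 22, 21, 20]

print(scaled_defect_size(line_heights))  # 10.25
```

Punctuation and diacritics are usually much smaller than half a line, so a real implementation would probably need extra care not to drop them.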

@Markismus
KOReader Community member

I tried changing "DKOPTREADER_CONFIG_DEFECT_SIZE = 1.0" in defaults.lua to "DKOPTREADER_CONFIG_DEFECT_SIZE = 3.0", but it had no effect. I still had to set the defect size to medium manually.
Would you know why?

I also saw "DKOPTREADER_CONFIG_DETECT_INDENT = 1". If I disabled this option, wouldn't I get rid of the problem, too?

@chrox
KOReader Community member
chrox commented Oct 14, 2013

Some parameters in defaults.lua are used only once, when the parameters are first selected from the config panel. After that, they are stored in the history file associated with the document. So DKOPTREADER_CONFIG_DEFECT_SIZE = 3.0 only affects newly opened documents.
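The precedence described here can be pictured as below. This is an illustrative sketch rather than KOReader's actual code; `DEFAULTS`, `effective_settings`, and the `defect_size` key are hypothetical stand-ins for defaults.lua and the per-document history file.

```python
# Hypothetical stand-in for the values in defaults.lua.
DEFAULTS = {"defect_size": 3.0}


def effective_settings(history, defaults=DEFAULTS):
    """Values stored in a document's history win; defaults only fill the gaps."""
    return {**defaults, **history}


# A newly opened document has no stored settings, so the default applies.
print(effective_settings({}))                     # {'defect_size': 3.0}

# A document opened earlier keeps the value saved in its history.
print(effective_settings({"defect_size": 1.0}))   # {'defect_size': 1.0}
```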

@Markismus
KOReader Community member

That explains it. Thanks!

@Markismus Markismus closed this Oct 14, 2013
@chrox chrox added a commit to chrox/libk2pdfopt that referenced this issue Oct 27, 2013
@chrox chrox Calibrate indentation
Median indent is added to calic1/2 array for each row so that
rows indented at median indentation still get wrapped.
This should fix koreader/koreader#305.
2397e67
@chrox chrox referenced this issue in koreader/libk2pdfopt Oct 27, 2013
Merged

Calibrate indentation before reflowing #15

@chrox
KOReader Community member
chrox commented Oct 27, 2013

screenshot from 2013-10-27 21 40 51

With chrox/libk2pdfopt@2397e67, indentation is preserved even if the defect spot is not removed. This should be helpful for pages that have margin notes or reference marks.
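The idea of calibrating against the median indent can be sketched as follows. This is not the code from the commit (the calic1/2 arrays are not reproduced here); it only assumes that rows are compared with the median left edge instead of the leftmost one, and all names and values are hypothetical.

```python
from statistics import median


def indented_lines(left_edges, tolerance=2):
    """Return indices of rows lying right of the median left edge."""
    reference = median(left_edges)
    return [i for i, x in enumerate(left_edges) if x - reference > tolerance]


# Left edges in pixels: a speck at x=12, body text at x=50, one genuinely indented row at x=70.
edges = [12, 50, 50, 70, 50, 50]

print(indented_lines(edges))  # [3] : only the genuinely indented row stands out
```

Because the median ignores a single outlier, a stray mark in the margin no longer shifts the reference point for the whole page.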

@Markismus
KOReader Community member

Great. This will reduce the preprocessing of scanned documents significantly.
