Normalize Content Size #89
One very, very useful improvement would be a new step between steps 4 (Select content) and 5 (Margins) which I would call "4A - Content size normalization".
What you do is select a typical page of text, and the width of that page's content becomes the baseline. You then choose "apply to all pages", "selected pages", etc. to scale the other pages' content so its width matches the baseline page, and then the margins are applied.
This would fix the problem of content size varying from page to page (i.e. text is not a uniform size on every page), which is caused by the camera-to-book distance drifting as you work through a thick book. I and other users of simpler scanner designs have this problem because the distance from the page surface to the camera is not fixed, unlike in more sophisticated V-platen designs. The drift could be eliminated during post-processing, so that the content (text size and line lengths) has a uniform size.
Right now I do all the right pages, then spin the book around and do all the left pages, starting at the back of the book so that page turning is easy and the same as in "normal reading". The last page of the right-hand set is therefore "small", as it is furthest from the camera, but its facing left page is "large", because it is the first page scanned in the left-hand pass. So when flipping through pages in the PDF, the text alternates between large and small.
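As a rough illustration of the baseline idea, the per-page scale factor is just the baseline content width divided by each page's content width. This is only a sketch: the function name and the pixel widths are made up for the example and are not part of Scan Tailor.

```python
def width_scale_factors(content_widths, baseline_index):
    """Return one scale factor per page so that each page's content
    width matches the content width of the chosen baseline page."""
    baseline = content_widths[baseline_index]
    if baseline <= 0:
        raise ValueError("baseline content width must be positive")
    factors = []
    for width in content_widths:
        if width <= 0:
            raise ValueError("content widths must be positive")
        factors.append(baseline / width)
    return factors


# Hypothetical content widths in pixels, shrinking as the page
# surface moves away from the camera.
widths_px = [1800, 1750, 1700, 1650]
factors = width_scale_factors(widths_px, baseline_index=0)
```

Multiplying a page's content width by its factor gives the baseline width, so the baseline page itself gets a factor of exactly 1.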
I'm sure there are other details that need to be thought through -- or even better methods, but that's the basic idea.
Further to my explanation in the initial request on this issue, I have posted some details and a sample of the problem at the following link:
And some rationale behind why it would be a good issue to solve via software here:
As to the solutions, upon thinking further about the 2 methods I suggested for this, I have the following thoughts on them.
Pick the first and last pages (assuming they are the largest and smallest) from one side of the book, calculate the difference and determine the adjustment factor to apply to every page.
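A minimal sketch of this first method, assuming the content width drifts linearly with page position between the first and last page (a simplification, since the real change depends on the camera geometry). The function name and the numbers are hypothetical.

```python
def interpolated_scale_factors(first_width, last_width, page_count):
    """Scale factors that bring every page to the first page's content
    width, assuming the width drifts linearly from first to last page."""
    if page_count < 2:
        raise ValueError("need at least two pages")
    if first_width <= 0 or last_width <= 0:
        raise ValueError("content widths must be positive")
    factors = []
    for i in range(page_count):
        t = i / (page_count - 1)  # 0.0 at the first page, 1.0 at the last
        expected_width = first_width + t * (last_width - first_width)
        factors.append(first_width / expected_width)
    return factors


# Hypothetical measurements: 2000 px on the first page, 1800 px on the last.
side_factors = interpolated_scale_factors(2000, 1800, page_count=5)
```

Only two pages need to be measured here; every page in between gets a factor derived from its position in the sequence.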
Problems that can make this unreliable are:
You select a page that you'd like to be the baseline, on which the 'standard' content to be measured is identified (this will typically be the text width). Then on every page (or selected pages, etc.) this 'standard' content area is identified and its width is scaled to match the width of the 'standard' content on the baseline page (width rather than height, because height may vary as content may not fill the page).
Problems with this method are:
Advantages of this method are:
I was thinking a simpler, very easy way to solve this problem would be to use a calibration sheet (say a 1" checkerboard) that can be easily de-skewed and analysed. It would be scanned first (placed on top of the first page of the book) and again last (placed on top of the last page after scanning the book). The size difference of the squares (in pixels) between the first and last image would give the image scale change, which can then be applied across all pages proportionately.
It might not even need a special calibration pattern, just a sheet with some bold pattern that Scan Tailor can consistently and accurately content-select, scanned as both the first and last page. Then, if there was an option in step 5 (so after the auto content selection of step 4) to say that the first and last images are calibration pages, the scale factor could be determined by comparing the number of pixels (width or height) the auto content selection found for the two calibration images, and applied as part of the margin application process (since there is already scaling being done when margins are applied).
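To illustrate the measurement step, here is a sketch that takes the x-coordinates of checkerboard corners along one row of the de-skewed calibration image and compares the mean square size between the first and last scan. Corner detection itself is assumed to happen elsewhere; the function names and coordinates are invented for the example.

```python
def mean_square_size(corner_xs):
    """Mean spacing in pixels between consecutive checkerboard corners
    along one row of a de-skewed calibration image."""
    if len(corner_xs) < 2:
        raise ValueError("need at least two corners")
    gaps = [b - a for a, b in zip(corner_xs, corner_xs[1:])]
    return sum(gaps) / len(gaps)


def calibration_scale_change(first_corner_xs, last_corner_xs):
    """Ratio of the square size in the last calibration scan to the
    square size in the first calibration scan."""
    return mean_square_size(last_corner_xs) / mean_square_size(first_corner_xs)


# Hypothetical corner positions: 300 px squares at the start of the
# book, 270 px squares at the end.
first_scan = [100, 400, 700, 1000]
last_scan = [120, 390, 660, 930]
scale_change = calibration_scale_change(first_scan, last_scan)
```

A ratio below 1 means the sheet looked smaller in the last scan, i.e. the page surface had moved away from the camera by the end of the book.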
As an update to my last comment about using a calibration page, the scale adjustment factor has to be applied differently depending on whether 1 or 2 cameras were used for the scanning. So there would need to be an additional option to say whether the calibration pages were for 1 or 2 cameras.
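The comment above does not spell out how the two cases differ, so the following is only one possible reading, with loudly stated assumptions: with 1 camera the images form a single pass in scan order with one calibration pair, while with 2 cameras the images alternate between the cameras and each camera has its own calibration pair. All names and the linear drift model are hypothetical.

```python
def drift_factors(count, scale_change):
    """Correction factors for `count` pages from one camera, assuming the
    image scale drifts linearly from 1.0 to `scale_change` over the pass."""
    if count <= 1:
        return [1.0] * count
    return [
        1.0 / (1.0 + (i / (count - 1)) * (scale_change - 1.0))
        for i in range(count)
    ]


def factors_for_project(page_count, scale_changes):
    """One scale change means a single camera; two mean two cameras whose
    images are assumed to alternate (even indices first camera, odd second)."""
    if len(scale_changes) == 1:
        return drift_factors(page_count, scale_changes[0])
    if len(scale_changes) == 2:
        first_cam = drift_factors((page_count + 1) // 2, scale_changes[0])
        second_cam = drift_factors(page_count // 2, scale_changes[1])
        factors = []
        for i in range(page_count):
            source = first_cam if i % 2 == 0 else second_cam
            factors.append(source[i // 2])
        return factors
    raise ValueError("expected calibration data for 1 or 2 cameras")


# Hypothetical scale changes measured from the calibration pages.
single = factors_for_project(3, [0.9])
dual = factors_for_project(4, [1.1, 0.9])
```

In the two-camera case the corrections can run in opposite directions, since one side of the book rises towards its camera while the other sinks away from it.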