
Cropping limited to letter size? #8

Closed
blankman373 opened this issue Jul 14, 2020 · 9 comments

Comments

@blankman373

I'm working on setting up a basic document scanning station for my monthly bills. Some of the bills are odd sizes that closely resemble legal size: maybe 7x14 inches instead of 8.5x14. When I include the --crop option, it crops the sides but also cuts off the bottom 3 inches. If I set the page size to Legal it works, but then that size would apply to every scan. Here's the script I'm using with a Fujitsu ScanSnap S510:

now=$(date +"%Y-%m-%d-%H%M")
/home/pi/sane-scan-pdf/scan -d -r 300 -v -m Gray --crop --deskew --ocr -o /home/pi/scans/scan-$now.pdf

@rocketraman
Owner

rocketraman commented Jul 14, 2020

Hi @blankman373. The --crop option actually tells the Fujitsu SANE driver to do the cropping, so this is either a bug in the SANE driver or a bug in the post-processing (the conversion of PNM to PDF).

A few things to check:

  1. Does it happen if you do not use the --ocr option? When using the --ocr option, it's actually Tesseract that does the conversion to PDF, and it may not be handling the EPS bounding box correctly.

  2. If it still happens when you remove the --ocr option, then the problem is likely the driver. To confirm, you can comment out the rm that deletes the temporary intermediate files, then inspect the PNM output from the scanner, the PS file, and the PDF file individually to see at which step the crop goes wrong.
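To check the raw PNM output from step 2, one option is to read the width and height from the PNM header and divide by the scan resolution. This is a minimal sketch: pnm_size_inches is a hypothetical helper name (not part of sane-scan-pdf), and it assumes the header is laid out one field group per line, as SANE's scanimage typically writes it (magic number, optional "#" comment lines, then "width height").

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a PNM image's pixel size and physical size in inches.
# Assumes a line-oriented header such as:
#   P5
#   # SANE data follows
#   2550 3300
#   255
pnm_size_inches() {
  local file=$1 dpi=$2 magic line width height
  {
    read -r magic
    # Skip comment lines, then take the first "width height" line
    while read -r line; do
      [[ $line == \#* ]] && continue
      read -r width height <<< "$line"
      break
    done
  } < "$file"
  case $magic in
    P1|P2|P3|P4|P5|P6) ;;
    *) echo "not a PNM file: $file" >&2; return 1 ;;
  esac
  # awk does the non-integer division (pixels / dpi = inches)
  awk -v w="$width" -v h="$height" -v d="$dpi" \
    'BEGIN { printf "%dx%d px = %.2f x %.2f in\n", w, h, w / d, h / d }'
}
```

For example, at 300 dpi a header reporting 2550x3300 pixels means 8.5 x 11 in (Letter), while a full Legal page should be 4200 pixels tall. If the raw PNM is already only 3300 pixels tall, the crop is happening before any post-processing.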

@blankman373
Author

Hi @rocketraman, thanks for getting back so quickly. I had some time to test tonight. It looks like it is a driver issue.

  1. It does happen when I don't use --ocr.
  2. It also happens in the PNM output straight from the scanner.

I can't believe I didn't test this earlier, but I also tested with an actual Legal-size page, and with the --crop option it was cropped to Letter. When I used --size Legal it worked fine.

Thanks for your help. I'll try to reach out to the SANE project listserv.

@rocketraman
Owner

@blankman373 If you do end up creating an issue for the SANE project, please link it here for future searchers!

@rocketraman
Owner

@blankman373 Since the cropping is done in the driver, it probably depends on the scanner having scanned enough data in the first place. The driver defaults to scanning a letter-size page, so that's all the image data it gets from the scanner, and there is nothing beyond that to crop.

Using the code from https://github.com/rocketraman/sane-scan-pdf/tree/issue-8, try specifying both a page height and crop i.e. --size Legal --crop and see if that works.
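The arithmetic behind this explanation: cropping can only remove pixels, so the height the crop step can return is capped by the scan area. The sketch below uses hypothetical helper names (paper_height_in, cropped_rows) with the standard paper heights (Letter 11 in, Legal 14 in) and the 300 dpi used in the commands in this thread. For a 14-inch bill, a Letter scan area loses exactly the bottom 3 inches reported in the first comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: standard paper heights in inches
paper_height_in() {
  case $1 in
    Letter) echo 11 ;;
    Legal)  echo 14 ;;
    *) echo "unknown size: $1" >&2; return 1 ;;
  esac
}

# Cropping only removes pixels, so the rows available to the crop step are
# capped by the scan-area height, whatever the real document length is.
cropped_rows() {
  local doc_in=$1 area=$2 dpi=$3 area_in
  area_in=$(paper_height_in "$area") || return 1
  local doc_rows=$(( doc_in * dpi )) area_rows=$(( area_in * dpi ))
  echo $(( doc_rows < area_rows ? doc_rows : area_rows ))
}

dpi=300
for area in Letter Legal; do
  rows=$(cropped_rows 14 "$area" "$dpi")
  echo "$area scan area: $rows rows kept, $(( (14 * dpi - rows) / dpi )) in lost"
done
# prints:
# Letter scan area: 3300 rows kept, 3 in lost
# Legal scan area: 4200 rows kept, 0 in lost
```

This is why a larger scan area plus --crop can handle both short and long pages: the shorter page is simply cropped down from the bigger scan.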

I'm going to re-open this for now, as some changes in the script might be useful here.

@rocketraman rocketraman reopened this Jul 15, 2020
@blankman373
Author

Thank you again for looking at this. It looks like it's working now.

So I deleted the old code and cloned this branch using:
git clone --single-branch --branch issue-8 https://github.com/rocketraman/sane-scan-pdf.git

Then I updated my script to:
/home/pi/sane-scan-pdf/scan -d -r 300 -v -m Gray --size Legal --crop --deskew -o /home/pi/scans/scan-$now.pdf

And now it properly crops the legal-length page. I then added --ocr and that works too, so it's definitely not a Tesseract issue.

The last thing I tested was a regular letter-size page. I thought that since --size Legal was present, it would add 3 inches of empty space (white or black, I wasn't sure). But I'm happy to report that even letter-size pages crop properly now. Thank you again so much.

Now I just need to figure out the Python script to file these away into folders after OCR. :)

@rocketraman
Owner

@blankman373 I just noticed the Fujitsu driver has an option for paper lower edge detection. Can you pull the updated issue-8 branch code, with the command line:

/home/pi/sane-scan-pdf/scan -d -r 300 -v -m Gray --crop --deskew -o /home/pi/scans/scan-$now.pdf

(i.e. without the --size option)
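To compare the two variants (with and without --size) without editing the script each time, a small wrapper could help. This is a hypothetical sketch: SCAN_CMD, SCAN_DIR, build_scan_args and run_scan are made-up names, and it only passes the sane-scan-pdf options already used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical defaults: override SCAN_CMD / SCAN_DIR for your own setup.
SCAN_CMD=${SCAN_CMD:-/home/pi/sane-scan-pdf/scan}
SCAN_DIR=${SCAN_DIR:-/home/pi/scans}

# Fill the global array "args" with the options used in this thread.
# $1 is an optional page size (e.g. Legal); pass "" to omit --size.
build_scan_args() {
  local size=$1 now
  now=$(date +"%Y-%m-%d-%H%M")
  args=(-d -r 300 -v -m Gray)
  if [[ -n $size ]]; then
    args+=(--size "$size")
  fi
  args+=(--crop --deskew -o "$SCAN_DIR/scan-$now.pdf")
}

# Build the options, then invoke the scan script with them.
run_scan() {
  build_scan_args "$1"
  "$SCAN_CMD" "${args[@]}"
}
```

Usage: run_scan Legal reproduces the earlier --size Legal --crop command, while run_scan "" reproduces the command above without --size. Append --ocr to the args array if you want OCR as well.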

@blankman373
Author

blankman373 commented Jul 22, 2020

Hey @rocketraman, sorry it took so long to test this. I pulled the latest code from this branch (after making a backup of the old code) and tested without --size.

Unfortunately, it cropped the legal-size page at letter dimensions. I then put --size Legal back in and it was still cropped to letter size. When I went back to the old issue-8 code, it scanned the full legal page.

@rocketraman
Owner

@blankman373 I believe I fixed the issue. Now, you should be able to specify --crop and it should scan (approximately [1]) the right size, smaller or larger than letter.

[1] I say approximately because the cropping logic doesn't try to match a "standard" size; it just crops to the content. This often produces pages with custom sizes, a bit smaller or larger than the standard size. This is expected.
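To illustrate why crop-to-content produces custom sizes, here is a toy bounding-box crop. It is not the Fujitsu driver's actual algorithm (that runs inside the driver and may work quite differently); it assumes an ASCII (P2) PGM grayscale image and treats any pixel darker than a fixed threshold as content. content_bbox is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Toy crop-to-content: print the bounding box of "dark" pixels as WxH+X+Y.
# Assumes an ASCII P2 PGM; $2 is an optional darkness threshold (default 128).
content_bbox() {
  local file=$1 threshold=${2:-128}
  # Strip "#" comments, then let awk walk the whitespace-separated tokens
  sed 's/#.*//' "$file" | awk -v t="$threshold" '
    { for (i = 1; i <= NF; i++) tok[n++] = $i }
    END {
      if (tok[0] != "P2") { print "expected an ASCII P2 PGM" > "/dev/stderr"; exit 1 }
      w = tok[1] + 0; h = tok[2] + 0
      minx = w; miny = h; maxx = -1; maxy = -1
      # Pixel values start after the magic, width, height and maxval tokens
      for (p = 0; p < w * h; p++) {
        if (tok[4 + p] + 0 < t + 0) {
          x = p % w; y = int(p / w)
          if (x < minx) minx = x
          if (x > maxx) maxx = x
          if (y < miny) miny = y
          if (y > maxy) maxy = y
        }
      }
      if (maxx < 0) { print "blank page"; exit 0 }
      printf "%dx%d+%d+%d\n", maxx - minx + 1, maxy - miny + 1, minx, miny
    }'
}
```

The box is whatever size the content happens to be, so a bill that is "about" Legal comes out at its own dimensions rather than snapping to exactly 8.5 x 14 in.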

@rocketraman
Owner

Closing, feel free to re-open if you still have problems.

Development

No branches or pull requests

2 participants