Option to remove blank pages when importing documents #668
-
I'd like to have the option to automatically remove blank pages when importing a document. This relates to #528, where the possibility to remove pages manually is requested.
Replies: 14 comments 46 replies
-
https://github.com/baltpeter/scanprep
-
Hi mates, I use the Docker setup and would like to give you a brief overview of my solution:
Features
-
In my eyes this could be a great feature. For instance, I always have to split one-sided and double-sided documents before scanning, and even then the mixed docs still contain the blank pages. I wasn't able to install the script solution from @psi-4ward on my QNAP Docker installation. Maybe it could make its way onto the roadmap. Would be great. Paperless is really awesome 👍 thanks.
-
Great feature and great idea :)
-
Unfortunately, I haven't been able to get the script to work. Can anyone help me out? Thanks!
-
Here is my script, which uses ink_cov with a threshold suitable for my scanner and use cases. I also added some output to stderr to show which value was detected for each added or removed page:
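The script itself is not reproduced at this point in the thread, so here is a minimal sketch of the ink_cov idea only. It assumes that Ghostscript's ink_cov device prints one line per page with four coverage values followed by `CMYK OK`. The function name `sum_ink_coverage` and the sample line are illustrative and not taken from the original script.

```shell
#!/bin/bash
# Sum the C, M, Y and K coverage values from ink_cov output read on stdin.
# Assumption: each page line looks like " 0.00000  0.00000  0.00000  0.02230 CMYK OK".
sum_ink_coverage() {
  LC_ALL=C awk '/CMYK/ { sum += $1 + $2 + $3 + $4 } END { printf "%.5f\n", sum }'
}

# Hypothetical sample line standing in for real Ghostscript output.
sample=' 0.00000  0.00000  0.00000  0.02230 CMYK OK'
printf '%s\n' "$sample" | sum_ink_coverage
```

On a real file, the input would come from something like `gs -o - -dFirstPage=1 -dLastPage=1 -sDEVICE=ink_cov file.pdf`, and the resulting sum would then be compared against a scanner-specific threshold.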
-
I have changed the script to delete blank pages only when the document is single-sided. For double-sided documents I want to keep any blank pages so that the original page ordering is preserved.
-
Regarding #668 (reply in thread): Updated version from @felixgonschorek, which checks the file format and skips all non-PDF files.
-
I personally have not tried any of these solutions. However, I'd like to warn that the real solution should actually delete the page contents. I know there are some PDF tools (like this) that can reorder or delete pages, but they only remove the pages from the index/list of pages; they do not delete the real content (it is still in the file, just no longer visible, so it can still be extracted and still takes up space). I hope the
-
For those searching for the final script that checks all pages in a PDF: please find my working script below, which combines the latest comments and improvements.
#!/bin/bash
#set -x -e -o pipefail
set -e -o pipefail
export LC_ALL=C
#IN="$1"
# Paperless passes the file to process in DOCUMENT_WORKING_PATH
IN="$DOCUMENT_WORKING_PATH"
# Check for PDF format
TYPE=$(file -b "$IN")
if [ "${TYPE%%,*}" != "PDF document" ]; then
  >&2 echo "Skipping $IN - non PDF [$TYPE]."
  exit 0
fi
# PDF file - proceed
#PAGES=$(pdfinfo "$IN" | grep ^Pages: | tr -dc '0-9')
PAGES=$(pdfinfo "$IN" | awk '/^Pages:/ {print $2}')
>&2 echo "Total pages $PAGES"
# Threshold for HP scanners
# THRESHOLD=1
# Threshold for Canon MX925
THRESHOLD=1
# Print the numbers of all pages whose ink coverage exceeds THRESHOLD
non_blank() {
  for i in $(seq 1 "$PAGES"); do
    # Sum the C, M, Y and K coverage reported by Ghostscript for page i
    PERCENT=$(gs -o - -dFirstPage="$i" -dLastPage="$i" -sDEVICE=ink_cov "$IN" | grep CMYK | awk 'BEGIN { sum = 0 } { sum += $1 + $2 + $3 + $4 } END { printf "%.5f\n", sum }')
    >&2 echo -n "Color-sum in page $i is $PERCENT: "
    # awk does the floating-point comparison
    if awk "BEGIN { exit !($PERCENT > $THRESHOLD) }"; then
      echo "$i"
      >&2 echo "Page added to document"
    else
      >&2 echo "Page removed from document"
    fi
  done
}
NON_BLANK=$(non_blank)
# Rewrite the PDF in place, keeping only the non-blank pages
if [ -n "$NON_BLANK" ]; then
  # The unquoted echo joins the page numbers with spaces; tr turns them into commas
  NON_BLANK=$(echo $NON_BLANK | tr ' ' ",")
  qpdf "$IN" --replace-input --pages . "$NON_BLANK" --
fi
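The two pure-shell steps of this script, the threshold comparison and the building of the page list for qpdf, can be tried without a scanner or a PDF. The following is only a sketch: the helper names `is_non_blank` and `join_pages` and the sample coverage values are made up for illustration.

```shell
#!/bin/bash
# Return success when the ink coverage sum exceeds the threshold.
# awk handles the floating-point comparison that plain [ ] cannot.
is_non_blank() {
  awk -v p="$1" -v t="$2" 'BEGIN { exit !(p > t) }'
}

# Turn newline-separated page numbers into the comma list qpdf expects.
join_pages() {
  paste -sd, -
}

# Hypothetical per-page coverage sums for a three-page scan.
page=0
for percent in 12.50000 0.31000 4.20000; do
  page=$((page + 1))
  if is_non_blank "$percent" 1; then echo "$page"; fi
done | join_pages
```

With a threshold of 1, the second page counts as blank and the sketch prints `1,3`, the value that would be passed to qpdf as `--pages . 1,3 --`.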
-
It would be great to either edit these scripts in the front end or get an entry in options to exclude blank pages by default.
-
In my experience, deleting blank pages often isn't sufficient as the reverse side of a document frequently contains irrelevant information or graphics. It would be great if users could decide which pages to delete through a user interface.
-
This discussion has been automatically closed because it was marked as answered.
-
This discussion has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion for related concerns.
Hi mates,
I use the Docker setup and would like to give you a brief overview of my solution:
1. Create a directory scripts next to the docker-compose.yaml
2. Add "- ./scripts:/scripts:ro" to the volumes: section of paperless
3. Add PAPERLESS_PRE_CONSUME_SCRIPT=/scripts/pre-consume.sh to the environment (i.e. the env-file or the environment block in docker-compose.yaml)
4. Create scripts/pre-consume.sh along with scripts/remove-blank-pages.sh
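The contents of the two scripts are not shown above. As a rough sketch only, a pre-consume.sh wrapper could simply hand over to the blank-page script. The `run_hook` helper and the choice to skip a missing script quietly are my assumptions, not part of the original setup.

```shell
#!/bin/bash
set -e -o pipefail

# Run a hook script if it exists and is executable; skip it quietly otherwise.
run_hook() {
  local hook="$1"
  if [ -x "$hook" ]; then
    "$hook"
  else
    >&2 echo "Skipping $hook - not found or not executable."
  fi
}

# Path inside the container, matching the ./scripts:/scripts:ro volume mount.
run_hook /scripts/remove-blank-pages.sh
```

Because the directory is mounted read-only, both files need their executable bit set on the host (for example with `chmod +x`) before the container starts.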