[BUG] Polling Consumer adds duplicates to Work Queue #1760
OK, after some more deep-dive analysis, it seems I have a race condition involving my HorizontalPodAutoscaler. I will dig deeper tomorrow and provide an update if I can prove I am the idiot.
OK, I can happily confirm that I managed to find a way. Use a […]
```yaml
- name: paperless-app
  envFrom:
    - configMapRef:
        name: paperless-config
  image: ghcr.io/paperless-ngx/paperless-ngx:latest
  command:
    - bash
    - "-c"
    - |
      set -ex
      printf 'Preparing paperless-app ... '
      #
      # The StatefulSet gives each pod a sticky identity; the ordinal is the
      # numeric suffix of the hostname.
      #
      [[ $(hostname) =~ -([0-9]+)$ ]] || { echo "Strange hostname: $(hostname)"; exit 1; }
      ordinal=${BASH_REMATCH[1]}
      #
      if [[ $ordinal -eq 0 ]]; then
        printf 'First instance ... '
        export PAPERLESS_CONSUMER_POLLING=${PAPERLESS_CONSUMER_POLLING_FIRST}
      else
        printf 'Supporting instance ... '
        export PAPERLESS_CONSUMER_POLLING=${PAPERLESS_CONSUMER_POLLING_OTHER}
      fi
      printf 'PAPERLESS_CONSUMER_POLLING=%s\n\n' "${PAPERLESS_CONSUMER_POLLING}"
      #
      env
      printf '\nNow handing over to the normal entrypoint ... '
      exec /sbin/docker-entrypoint.sh /usr/local/bin/paperless_cmd.sh
```
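As a side note, the ordinal-extraction logic from the snippet above can be exercised on its own. The hostnames below are made-up examples, not from a real cluster:

```shell
#!/usr/bin/env bash
# Extract the StatefulSet ordinal (the trailing "-<number>") from a hostname.
extract_ordinal() {
  [[ $1 =~ -([0-9]+)$ ]] || return 1
  echo "${BASH_REMATCH[1]}"
}

extract_ordinal "paperless-app-0"    # prints 0
extract_ordinal "paperless-app-12"   # prints 12
extract_ordinal "no-digits-here" || echo "no ordinal"   # prints "no ordinal"
```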
[…] This takes two separate environment variables:

```yaml
PAPERLESS_CONSUMER_POLLING_FIRST: "15"
PAPERLESS_CONSUMER_POLLING_OTHER: "99999999"
```

i.e. the other containers get a roughly three-year polling interval, whereas the initial pod polls every 15 seconds. I'm happy to discuss other/better solutions, but this seems to work for me (based on a 30-minute test throwing a load of documents at Paperless-NGX with dynamic scaling of pods, and none of the errors from above showing up).
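As a quick sanity check on that number (assuming the polling interval is specified in seconds):

```shell
# 99999999 seconds, expressed in whole 365-day years, is a bit over 3.
interval=99999999
seconds_per_year=$(( 365 * 24 * 3600 ))   # 31536000
echo $(( interval / seconds_per_year ))   # prints 3
```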
Small correction. This works better in scaling situations: […]
```bash
export PAPERLESS_CONSUMER_POLLING=${PAPERLESS_CONSUMER_POLLING_OTHER}
export PAPERLESS_CONSUMPTION_DIR=/tmp/nothing-here
mkdir -p /tmp/nothing-here
chmod 777 /tmp/nothing-here
```

[…] because otherwise the autoscaled pods will pick up the documents already in the consume folder.
I am kind of curious why this has been closed. This bug has rendered my Paperless install unusable: it has well over 100k jobs for only a few hundred files. I don't see any way to clear the queue, so it's stuck right now.
Well, as you can see, it was closed by the original poster, who left a detailed explanation of how he solved the issue. If you have a different issue, or the solution above doesn't work, then you can decide what to do. My guess is it is something specific to your setup (and thus can potentially be fixed on your end, though of course not necessarily). Many, many people use this software and we haven't seen any other reports of this, so I'm not so sure there is a true "bug" somewhere...
I closed it because I feel it was not a real bug: running the installation on Kubernetes with more than one container does not seem to be officially supported. I have created a feature request for that purpose.
I can understand that. I have a simple setup: only one instance, no Kubernetes, only 1 worker/thread. The only thing is that it's behind a standard reverse proxy and uses an external Redis server. I still get this issue, and have for months. I wonder if it's maybe a timing issue with the external Redis?
Oohh, maybe create a separate ticket for that? My issue was clearly caused by my parallel operation of multiple pods and is reproducibly gone after I implemented my fix. Your issue is hence most likely a different one.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.
Description
I have been struggling for two days to get my containerized Paperless-NGX 1.9.2 working properly in conjunction with my flatbed scanner. Now I am reaching out for your help.
Files get picked up from the consume directory (an NFS share in my case) multiple times and are added as duplicates to the work queue; after the file is removed by the first successful processing run, the duplicates result in loads and loads of errors.
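To make the failure mode concrete, here is a minimal sketch (not Paperless's actual consumer code) of how two uncoordinated polling consumers can queue the same file twice; the second one then errors out once the first has processed and removed it:

```shell
# Simulate two consumers polling the same shared directory.
dir=$(mktemp -d)
touch "${dir}/scan.pdf"

# Both consumers poll before either one processes the file,
# so both add "scan.pdf" to their work queue.
seen_by_a=$(ls "${dir}")
seen_by_b=$(ls "${dir}")

# Consumer A processes and removes the file ...
rm "${dir}/${seen_by_a}"

# ... so consumer B's queued entry now points at a missing file.
if [ ! -e "${dir}/${seen_by_b}" ]; then
  echo "consumer B: ${seen_by_b} vanished -> consumption error"
fi
rmdir "${dir}"
```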
The log below is the result of me putting 4 files into the consume directory, resulting in 11 errors.
Steps to reproduce
Install from docker image.
This is the config I use, but I also tried with PostgreSQL first and had the same result:
I also fiddled with the polling interval, delay and retries, the worker count, and the threads, but never achieved satisfactory behaviour.
Webserver logs