
Airflow Worker and Scheduler is not picking up job #94

Open · kush99993s opened this issue Jun 24, 2017 · 39 comments

@kush99993s

Hello everyone,

I am trying to run docker-compose-CeleryExecutor.yml; however, the worker is not picking up jobs.

I am seeing the following messages:

webserver_1  | [2017-06-24 02:59:43 +0000] [29] [INFO] Handling signal: ttin
webserver_1  | [2017-06-24 02:59:43 +0000] [68] [INFO] Booting worker with pid: 68

After that, I don't see the scheduler or worker picking up and executing any jobs.

@kush99993s changed the title from "job is not starting" to "Airflow Worker and Scheduler is not picking up job" on Jun 24, 2017
@flynnCoolblue

Me too! Any ideas?

@openp2pdesign

Same for me, and it seems it might also be causing #44.
It seems to be a problem with Airflow itself; until 1-2 weeks ago it was working for me. See the issue here: https://issues.apache.org/jira/browse/AIRFLOW-1355

@openp2pdesign

I left several comments in #44 about this, since both might be related. To recap:

I have the same issue with 1.8.1, but in my case it seems like a consequence of #94.
If I use Airflow on my machine without Docker (macOS Sierra 10.12.5) by launching first the scheduler and then the webserver with SequentialExecutor (i.e. the basic default configuration), it works (DAGs run, and I get the logs). But with this docker-compose, DAGs are not launched.
It might also be a problem with Airflow itself, but since it works when I launch it with the basic configuration, this docker-compose setup should be checked as well. It used to work well until 1-2 weeks ago, and I'm not sure what has changed (I use a modified version, but both the modified version and this docker-compose have the same problem).

I have:
Docker: Version 17.06.0-ce-mac18 (18433) Channel: stable
docker-compose: version 1.14.0, build c7bdf9e

I tried with previous versions of Docker, 1.12.6 (14937) and 17.03.1 (16048); the problem is still the same.

DAGs are found and launched in the webserver, but the scheduler and worker don't actually run them:

webserver_1  | [2017-07-06 09:42:20,577] [123] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow/dags
worker_1     | [2017-07-06 09:42:21 +0000] [43] [INFO] Handling signal: ttou
worker_1     | [2017-07-06 09:42:21 +0000] [116] [INFO] Worker exiting (pid: 116)
webserver_1  | [2017-07-06 09:42:21 +0000] [42] [INFO] Handling signal: ttou
webserver_1  | [2017-07-06 09:42:21 +0000] [115] [INFO] Worker exiting (pid: 115)
scheduler_1  | [2017-07-06 09:42:22 +0000] [43] [INFO] Handling signal: ttin

@openp2pdesign

I solved the issue on my end: the problem was a recent Docker update, so I had to delete all Docker files, install a previous version, and reboot. Now it works with Version 17.03.0-ce-mac2 (15654).

@puckel
Owner

puckel commented Jul 27, 2017

@kush99993s Do you use example dags or your own dags ?

@hporter

hporter commented Aug 1, 2017

Just trying out Airflow for the first time this week, I was struggling with this issue of no jobs running on both the default and local executors. Thanks to a post from @puckel, I tried disabling the example DAGs, and now everything seems to be working well. I tested with an example DAG from the Airflow repo, loaded into the container using docker-compose:

Docker version 17.06.0-ce, build 02c1d87 <-- latest stable


docker-compose -f docker-compose-LocalExecutor.yml up -d

Tasks seem to run fine now. Strange that the example DAGs flag led to this behaviour.
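
For reference, a sketch of how the example DAGs can be switched off, assuming the image's entrypoint reads the LOAD_EX environment variable as described in this repo's README (double-check the variable name against your compose file):

# Run the webserver with the example DAGs disabled.
# LOAD_EX is the switch used by this repo's entrypoint; "n" means do not load them.
docker run -d -p 8080:8080 -e LOAD_EX=n puckel/docker-airflow webserver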

@rdtr

rdtr commented Aug 24, 2017

I'm running Airflow on Kubernetes based on this Dockerfile (plus some adjustments) and am facing a similar issue. When I manually run a DAG, some tasks will run, but after that all remaining tasks get stuck in the queued state. I use CeleryExecutor with Redis.

[INFO] Handling signal: ttin
[INFO] Booting worker with pid: xxx

I also see this log on the web container, but I'm not sure if it's related. The web server cannot retrieve a log from a worker directly, but the log can eventually be seen via S3 when a task is complete, so I didn't think it was a critical problem. Is the log retrieval related to this issue?

So far, every time I see this issue I manually "clear" the tasks that get stuck, and then they run. I really have no clue what the root cause of this problem is 🙁
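
For anyone else using this workaround, a sketch of doing the same "clear" from the CLI instead of the UI (Airflow 1.x flags; the DAG id, task id, and date below are placeholders):

# Clear the stuck task instance so the scheduler re-queues it.
# "my_dag" and "my_stuck_task" are hypothetical IDs - substitute your own.
airflow clear my_dag \
    --task_regex my_stuck_task \
    --start_date 2017-08-24 \
    --end_date 2017-08-24 \
    --no_confirm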

@hporter

hporter commented Sep 28, 2017

That's strange. I'm currently only using LocalExecutor, which works well; when I try Celery I'll see if I hit any trouble.

@LanDeQuHuXi

@rdtr I noticed the same issue.
Manually triggered DAG tasks get stuck in the queued state a lot! Did you find the cause of this?

@rafaelnovello

rafaelnovello commented Oct 16, 2017

Hi guys!
I'm trying to run with LocalExecutor, but apparently I've got a zombie or undead process. When I run sudo docker-compose -f docker-compose-LocalExecutor.yml up, I get:

webserver_1  | [2017-10-16 15:58:48,813] {jobs.py:1443} INFO - Heartbeating the executor
webserver_1  | [2017-10-16 15:58:49,816] {jobs.py:1407} INFO - Heartbeating the process manager
webserver_1  | [2017-10-16 15:58:49,817] {dag_processing.py:559} INFO - Processor for /usr/local/airflow/dags/tuto.py finished
webserver_1  | [2017-10-16 15:58:49,825] {dag_processing.py:627} INFO - Started a process (PID: 151) to generate tasks for /usr/local/airflow/dags/tuto.py - logging into /usr/local/airflow/logs/scheduler/2017-10-16/tuto.py.log

... over and over again.

I have tried downgrading Docker, as mentioned earlier, from 17.09.0-ce to 17.03.0-ce, but I got the same problem.

Could someone give me some help?
Thanks!

@h5chauhan

We are running this Docker image in ECS. One or more tasks in the same DAG get queued but don't start, and the workers and scheduler seem to have low CPU utilization. All tasks are Databricks operators, which sleep 60 seconds after checking job status.

@eschwartz

Any update on this from the Airflow team? This seems like a critical issue: tasks are not getting run.

I am seeing this issue intermittently as well, with Airflow running on Amazon ECS.

@Rixamos

Rixamos commented Jul 18, 2018

Hello, I had a similar problem with a dockerized Airflow scheduler: the reason scheduling stopped was that the scheduler logs filled the disk.
I hope this information helps someone.
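
In case it helps someone, a sketch of the kind of check and cleanup that would have caught this, assuming the default log location inside the container:

# How full is the disk, and how much of it is scheduler logs?
df -h
du -sh /usr/local/airflow/logs/scheduler

# Delete scheduler log files older than 7 days (adjust path and retention).
find /usr/local/airflow/logs/scheduler -type f -mtime +7 -delete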

@ayush-chauhan

ayush-chauhan commented Jul 20, 2018

Hi, I am not using Docker to run Airflow, but I'm facing the same problem.

[INFO] Handling signal: ttou
[INFO] Worker exiting (pid: 31418)
[INFO] Handling signal: ttin
[INFO] Booting worker with pid: 32308

DAGs are not running when triggered manually, and are not even being picked up by the scheduler.


I am using a packaged DAG:

unzip alerting.zip   

creating: airflow_utils/
inflating: airflow_utils/enums.py  
inflating: airflow_utils/psql_alerting_dag.py  
extracting: airflow_utils/__init__.py  
inflating: airflow_utils/hive_alerting_dag.py  
inflating: airflow_utils/alerting_utils.py  
inflating: alerting_12hrs.py       
inflating: alerting_15mins.py      
inflating: alerting_3hrs.py 

If I place all these files in the dags folder instead of packaging them, the Airflow scheduler is able to schedule the DAGs.
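
In case it's useful for debugging, a quick way to check whether Airflow's DagBag actually sees the packaged DAGs (Airflow 1.x CLI; the dags-folder path is an assumption based on the default $AIRFLOW_HOME layout):

# Drop the package into the dags folder, then ask Airflow what it can parse.
cp alerting.zip "$AIRFLOW_HOME/dags/"
airflow list_dags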

@mukeshnayak1

Same here. I'm using Airflow as a standalone app (not in Docker or Kubernetes). When I start the DAG, it shows its state as running, but none of the tasks show as queued or started, and it stays that way for a long time. I don't have any other DAGs running, and nobody else is using this Airflow instance.

@OleksiiDuzhyi

Same problem as @mukeshnayak1

[2019-03-01 11:45:05 +0200] [49524] [INFO] Handling signal: ttin
[2019-03-01 11:45:05 +0200] [9318] [INFO] Booting worker with pid: 9318
[2019-03-01 11:45:05,631] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-03-01 11:45:05,832] {models.py:273} INFO - Filling up the DagBag from /Users/alexeyd/airflow/dags
[2019-03-01 11:45:06 +0200] [49524] [INFO] Handling signal: ttou
[2019-03-01 11:45:06 +0200] [9115] [INFO] Worker exiting (pid: 9115)
[2019-03-01 11:45:36 +0200] [49524] [INFO] Handling signal: ttin
[2019-03-01 11:45:36 +0200] [9373] [INFO] Booting worker with pid: 9373
[2019-03-01 11:45:36,937] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-03-01 11:45:37,142] {models.py:273} INFO - Filling up the DagBag from /Users/alexeyd/airflow/dags
[2019-03-01 11:45:38 +0200] [49524] [INFO] Handling signal: ttou
[2019-03-01 11:45:38 +0200] [9166] [INFO] Worker exiting (pid: 9166)

Do you have an idea how to deal with this? Running tasks works, though...

@bhrd

bhrd commented Mar 11, 2019

Same problem as @OleksiiDuzhyi
Airflow version 1.10.1, with no progress on DAG tasks even though the DAG status says running. Same log as #94 (comment).

@ee07dazn

Same issue as reported by @bhrd, @OleksiiDuzhyi, and @mukeshnayak1, while running in standalone mode: Airflow 1.10.2, SequentialExecutor.

@breath103

This has been the case for me too; it's driving me absolutely crazy.
I tried on both Mac + Docker 1.18 and Ubuntu + Docker 1.17, same issue. Tasks get stuck at "running" and don't show any progress or change whatsoever.

@kirkportas

kirkportas commented Apr 7, 2019

I had this problem for several hours today with several different containerized Airflow options.

This one from Bitnami is working for me; I hope it helps someone else.
Beware: do not use the curl download command for the docker-compose.yml file at the top of the link below, as it only has 2 services in it. Instead, copy-paste the docker-compose text from the page itself (with 5-6 services).
https://hub.docker.com/r/bitnami/airflow/

@BigJerBD

BigJerBD commented May 2, 2019

I got this problem too recently while using docker-compose-LocalExecutor.yml.

I did not fix it, but by making sure that I run docker-compose down before restarting my containers with docker-compose up, I was able to run my jobs properly.

So simply stopping the containers might cause something to go wrong when restarting them? I'm just speculating.
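
Concretely, the cycle that worked for me looks like this, rather than just stopping and re-upping the containers:

# Tear everything down (containers plus the default network), then start fresh.
docker-compose -f docker-compose-LocalExecutor.yml down
docker-compose -f docker-compose-LocalExecutor.yml up -d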

@nasmithan

Seeing this issue quite often as well: 50-100 DAGs in the Running state, with only 3-4 "running" tasks. CPU/memory/disk space are all fine.

We're using CeleryExecutor. Restarting the containers sometimes helps, but a lot of the time we need to mark DAGs as Successful to clear things out.

@tmulc18

tmulc18 commented Jun 5, 2019

I'm having the same issue, but only when using CeleryExecutor. LocalExecutor seems to work.

@Amertz08

Amertz08 commented Jun 23, 2019

To the people running into tasks never actually running: it sounds like an issue with the scheduler not being run (I wondered the same thing for half a day). With LocalExecutor, it should be run in the same container as the web server; you can see this in scripts/docker-entrypoint.sh. I've been working on my own POC for Airflow + Spark + Celery; you can see that here. I used this repo as a basis for a lot of that work. A quick way to confirm the scheduler is alive is sketched below.
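
A sanity check along these lines (a sketch; "webserver" is the service name from this repo's LocalExecutor compose file):

# Confirm a scheduler process is actually alive inside the webserver container.
docker-compose -f docker-compose-LocalExecutor.yml exec webserver ps aux | grep "airflow scheduler"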

@anshulrohira

anshulrohira commented Aug 3, 2019

Has anyone been able to solve this issue? I'm running into the exact same error, with the DAG in the running state but not actually picked up by the scheduler.

[INFO] Handling signal: ttou
[INFO] Worker exiting (pid: 31418)
[INFO] Handling signal: ttin
[INFO] Booting worker with pid: 32308

I read in a Stack Overflow post that the ttou/ttin handling is gunicorn behaviour for refreshing workers and is expected, but that doesn't quite explain why the scheduler isn't picking up anything.

@leejiancai

Solution 1: try running

export C_FORCE_ROOT='true'

Solution 2: run the airflow worker as a non-root user.
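
A sketch of what these look like in practice (C_FORCE_ROOT is Celery's switch for allowing a worker to run as root; the "airflow" user in solution 2 is a hypothetical unprivileged account):

# Solution 1: allow the Celery worker to run as root.
export C_FORCE_ROOT='true'
airflow worker

# Solution 2: run the worker as a dedicated non-root user instead.
su - airflow -c "airflow worker"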

@jamiedoornbos

I am having the same issue, running in ECS and using Databricks operators. I have 4 main DAGs: two with 2 tasks and two with 1 task. The scheduling seems to work fine for a while, then it stalls, with the DAG still "running" but its tasks completed. It stays this way indefinitely, and no new tasks trigger. Restarting the service allows it to continue.

As an added complication for debugging, I'm running in Fargate, and it's not possible (or at least not easy) to see the internals of the container.

@jaygcoder

jaygcoder commented Sep 23, 2019

I ran into this one too and may have found a potential solution. In the docs for CeleryExecutor (https://airflow.apache.org/howto/executor/use-celery.html):

Make sure to use a database backed result backend

Following this recommendation, I configured my local setup to have a valid result_backend. I created a 2nd Postgres database and added it here:

[celery]
result_backend = db+postgresql://username:****@localhost:5432/results_backend_db

It's working okay so far, but the slowness of the tasks is pretty noticeable; there seems to be a 15-20 second gap between each task:
[screenshot showing the 15-20 second gaps between tasks]

Will give it a shot with the Docker-Compose one later this week.
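
For anyone who prefers environment variables over editing airflow.cfg, the same setting can be supplied via Airflow's AIRFLOW__SECTION__KEY convention (the credentials below are placeholders, not my real ones):

# Equivalent to setting result_backend under [celery] in airflow.cfg.
export AIRFLOW__CELERY__RESULT_BACKEND='db+postgresql://username:password@localhost:5432/results_backend_db'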

@kadensungbincho

@leejiancai

Solution 2: run the airflow worker as a non-root user.

Oh, this worked for me with Airflow in a container, CeleryExecutor, and RabbitMQ (another container) with a MySQL (localhost) backend.

@chris-aeviator

chris-aeviator commented Oct 9, 2019

I'm having this with Celery as well as Dask; Local is not working for me.

EDIT:

I have an Airflow scheduler, a Dask scheduler (with 3 Dask workers), and an Airflow webserver. Which of these should I run as root or non-root?

@jamiedoornbos

Hi, just thought I'd post back here. We solved our issue by increasing the memory on the machine. My guess is the scheduler was dumping lots of errors, but we're currently unable to see the logs due to AWS Fargate limitations. We have a task queued up to expose the logs in S3, which will also allow us to see historical logs. They currently get wiped on deployment of a new container.

@mikolaje

- Check if the airflow scheduler is running.
- Check if the airflow webserver is running.
- Check if all DAGs are set to On in the web UI.
- Check if the DAGs have a start date in the past.
- Check if the DAGs have a proper schedule, and that the current time is past the schedule date shown in the web UI.
- Check if the DAG has the proper pool and queue.

A few of these checks are sketched as commands below.
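
For example (a sketch using the Airflow 1.x CLI; "my_dag_id" is a hypothetical DAG id):

# Are the scheduler and webserver processes alive?
ps aux | grep -E "airflow (scheduler|webserver)"

# Does Airflow's DagBag see your DAG, and is it switched on?
airflow list_dags
airflow unpause my_dag_id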

@tooptoop4

@chris-aeviator How did you allow Dask workers to run Airflow tasks? I get a "no such file" error. Any setup script you can share?

@chris-aeviator

chris-aeviator commented Oct 22, 2019 via email

@vasinata

I have the same issue. Are there any solutions for this one?

@chris-aeviator

chris-aeviator commented Mar 11, 2020 via email

@vasinata

@chris-aeviator Yes, I did check the steps you recommended. I have a Kubernetes cluster; every time it creates the pod, it pulls the image from the ECR repository (the same one as the scheduler pod) and mounts the EFS dags volume to it. I have even been trying to run test DAGs and I see the same issue: the task gets queued, the pod is created on Kubernetes, the pod runs and successfully finishes, but the task never gets to the running or finished state. Am I missing anything?

@shamoji-tech

Have you checked end_date? The scheduler never picks up jobs after end_date, even if you trigger them in the web UI.

@K-7

K-7 commented Apr 15, 2021

I am having the same issue after I migrated to 2.0 using Docker.
I don't run Celery; I use executor = LocalExecutor. When I run my DAG I get the following:

[INFO] Handling signal: ttin
[INFO] Booting worker with pid: 32308
INFO - DAG test already has 18 active runs, not queuing any tasks for run 2021-04-15
[INFO] Worker exiting (pid: 458)
[INFO] Handling signal: ttou
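
The "already has 18 active runs" line suggests the DAG has hit its max_active_runs cap, so the scheduler won't queue more tasks until existing runs finish or are marked failed/successful. A sketch of inspecting those runs (Airflow 2.0 CLI; "test" is the DAG id from the log above):

# List the runs still counted as active for this DAG.
airflow dags list-runs -d test --state running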
