tasks not scheduling with livy operator #3

Closed
kraj007 opened this issue Aug 19, 2020 · 3 comments

kraj007 commented Aug 19, 2020

Hello Vadim,

We use Docker-based Airflow running in our AWS infra, and the PythonOperator in all of our DAGs.
However, I was trying to use your Livy operator in our DAGs. I listed the dependency correctly in the requirements file and imported the operator in the DAGs. The tasks get created, but they are never scheduled: they remain in no state, or stay queued forever. We have observed this only for the Livy operator (we have used only the PythonOperator until now). Can you please tell me if there is any configuration or setting in Airflow that would prevent tasks with the Livy operator / a custom operator from being scheduled or executed?
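
For reference, a minimal sketch of the kind of DAG wiring being described; the import path, operator name, and file path are placeholders assumed for illustration, not taken from the actual DAGs or the library's documented API.

```python
# Minimal sketch of the DAG wiring described above. The import path,
# operator name, and file path are placeholders -- check the library's
# README for the exact module and arguments.
from datetime import datetime

from airflow import DAG
from airflow_livy.batch import LivyBatchOperator  # placeholder import path

with DAG(
    dag_id="livy_example",
    start_date=datetime(2020, 8, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    LivyBatchOperator(
        task_id="run_spark_job",
        file="s3://my-bucket/jobs/test_job.py",  # placeholder application path
        name="test_job",
    )
```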

Thanks in advance.

panovvv (Owner) commented Aug 19, 2020

Hi!
The way I'd debug it is by narrowing down the scope of the problem:

  • Try following the quick setup guide from the README.md. There's a docker-compose file that brings up a small Spark cluster with Livy on your machine; then you run Airflow, and there are example DAGs you can run with nothing but your laptop. That will tell you whether the library code works.
  • Same setup as above, but bring down the docker-compose cluster and point Airflow at the cluster in your infra. Look at the helper script, specifically the init_airflow() function, to see how to point Airflow at a different cluster (I think it's sufficient to redefine the livy connection for sessions mode, plus the batch_files_path variable for batches); see the sketch after this list. This tells you whether your cluster plays well with the library.
    And go from there... Nothing helps with debugging things like this more than taking out a part of the system and replacing it with something you know works.
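
For illustration, a rough sketch (not the project's actual helper script) of what re-pointing Airflow at an existing cluster could look like via Airflow's Python API. The connection id "livy" and the batch_files_path variable name are assumptions based on the comment above; verify the real names in init_airflow().

```python
# Rough sketch of pointing Airflow at an existing Livy endpoint instead of
# the docker-compose cluster. The connection id "livy" and the
# "batch_files_path" variable name mirror the comment above -- verify them
# against init_airflow() before relying on this.
from airflow import settings
from airflow.models import Connection, Variable

session = settings.Session()

# Replace the Livy connection so the operators talk to your cluster's Livy server.
session.query(Connection).filter(Connection.conn_id == "livy").delete()
session.add(Connection(
    conn_id="livy",
    conn_type="http",
    host="your-emr-master-node",  # placeholder host
    port=8998,                    # Livy's default port
))
session.commit()

# Point batch jobs at wherever your application files live.
Variable.set("batch_files_path", "s3://your-bucket/jobs")  # placeholder path
```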

kraj007 (Author) commented Aug 24, 2020

Thanks, Vadim, for your reply.
I am also getting a lot of "Duplicate session name" errors in the DAG. I am using Livy 0.6 on EMR.
When passing the payload, I kept a static "name" parameter, i.e. "name" = "test_job".
My tasks sometimes run fine, but most of the time they fail with the duplicate session name issue.
Can you suggest a solution for this?

panovvv (Owner) commented Aug 24, 2020

No idea about this one, as I've never encountered it... Try appending something random to the task name?
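
For illustration, a minimal sketch of that suggestion, reusing the same placeholder operator and `name`/`file` arguments as in the earlier sketch; adapt it to however you actually build the Livy payload.

```python
# One way to avoid "Duplicate session name": make the Livy "name" field
# unique per run instead of the static "test_job". The operator class and
# its `name`/`file` arguments are assumptions based on this thread.
import uuid
from datetime import datetime

from airflow import DAG
from airflow_livy.batch import LivyBatchOperator  # placeholder import path

with DAG(
    dag_id="livy_unique_name_example",
    start_date=datetime(2020, 8, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    LivyBatchOperator(
        task_id="run_spark_job",
        file="s3://my-bucket/jobs/test_job.py",   # placeholder application path
        name="test_job_" + uuid.uuid4().hex[:8],  # unique suffix each time the DAG file is parsed
    )
```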

kraj007 closed this as completed Sep 22, 2020