Exception thrown in prometheus exporter when trying to scrape #16

Closed
rmn36 opened this issue Oct 11, 2018 · 7 comments

Comments

@rmn36

rmn36 commented Oct 11, 2018

Stacktrace below.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1988, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1641, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1544, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1639, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1625, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 69, in inner
    return self._run_view(f, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 368, in _run_view
    return fn(self, *args, **kwargs)
  File "/usr/local/airflow/plugins/prometheus_exporter/prometheus_exporter.py", line 136, in index
    return Response(generate_latest(), mimetype='text')
  File "/usr/local/airflow/plugins/prometheus_exporter/prometheus_exporter.py", line 114, in generate_latest
    for name, labels, value in metric.samples:
ValueError: too many values to unpack (expected 3)
@elephantum
Contributor

Can you elaborate on your setup? Which version of Airflow do you use?

@rmn36
Author

rmn36 commented Oct 11, 2018

Using version 1.9.0-4 of this popular Docker image with Python 3: https://github.com/puckel/docker-airflow

The issue appears to be that the Sample object now carries more than just name, labels, and value. This may be a change on the Prometheus client side.

[2018-10-11 21:30:34 +0000] [35] [INFO] Worker exiting (pid: 35)
Sample(name='process_virtual_memory_bytes', labels={}, value=275132416.0, timestamp=None, exemplar=None)

Hotfix is straightforward. Something like this:

for name, labels, value, timestamp, exemplar in metric.samples:

but that does not make use of the additional information in any way.
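
The five-name unpacking in the hotfix above would break again if the tuple shape changes once more, and it would fail against a client that still yields three fields. A minimal sketch of a more tolerant approach is to slice off the first three fields. The `Sample` namedtuple below is a local stand-in that mirrors the fields printed in the log above, and `unpack_sample` is a hypothetical helper name, not part of the plugin or of prometheus_client. It also assumes older client versions yield plain `(name, labels, value)` tuples.

```python
from collections import namedtuple

# Local stand-in mirroring the Sample repr shown in the log above, so the
# sketch runs without prometheus_client installed (an assumption, not the
# library's own class).
Sample = namedtuple("Sample", ["name", "labels", "value", "timestamp", "exemplar"])


def unpack_sample(sample):
    """Return (name, labels, value) from a 3-tuple or a longer Sample tuple.

    Hypothetical helper: slicing keeps only the first three fields, so any
    extra trailing fields (timestamp, exemplar, ...) are ignored.
    """
    name, labels, value = sample[:3]
    return name, labels, value


# Assumed old shape: a plain 3-tuple.
old_style = ("process_virtual_memory_bytes", {}, 275132416.0)
# New shape, as printed in the log above.
new_style = Sample("process_virtual_memory_bytes", {}, 275132416.0, None, None)

for sample in (old_style, new_style):
    name, labels, value = unpack_sample(sample)
    print(name, labels, value)
```

In the exporter loop this would read `for name, labels, value in map(unpack_sample, metric.samples):`. Like the hotfix, it still discards the timestamp and exemplar.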

Edit: this appears to fix the issue in Docker, but when running in Kubernetes it still fails, now with a different exception. Still investigating.

@rmn36
Author

rmn36 commented Oct 12, 2018

It now fails on one of the SQL queries.

This function:

def get_task_state_info():
    '''get task info
    :return task_info
    '''
    task_status_query = Session.query(
        TaskInstance.dag_id, TaskInstance.task_id,
        TaskInstance.state, func.count(TaskInstance.dag_id).label('value')
    ).group_by(TaskInstance.dag_id, TaskInstance.task_id, TaskInstance.state).subquery()
    return Session.query(
        task_status_query.c.dag_id, task_status_query.c.task_id,
        task_status_query.c.state, task_status_query.c.value, DagModel.owners
    ).join(DagModel, DagModel.dag_id == task_status_query.c.dag_id).all()

creates this query:

SELECT anon_1.dag_id AS anon_1_dag_id, 
       anon_1.task_id AS anon_1_task_id, 
       anon_1.state AS anon_1_state, 
       anon_1.value AS anon_1_value, 
       dag.owners AS dag_owners 
FROM (SELECT task_instance.dag_id AS dag_id, 
             task_instance.task_id AS task_id, 
             task_instance.state AS state, 
             count(task_instance.dag_id) AS value 
      FROM task_instance 
      GROUP BY task_instance.dag_id, task_instance.task_id, task_instance.state ) AS anon_1 
JOIN dag ON dag.dag_id = anon_1.dag_id

causes this error:
ERROR: relation "task_instance" does not exist at character 312

This seems to refer to the FROM task_instance in the subquery.

This is the Dockerfile I'm using to create the image with the plugin and client:

FROM puckel/docker-airflow:1.9.0-4

USER root

ADD ./plugins /usr/local/airflow/plugins
RUN pip3 install prometheus_client

USER airflow

@elephantum
Contributor

Thanks, we’ll try to reproduce your conditions and fix the problem.

@rmn36
Author

rmn36 commented Oct 12, 2018

@elephantum Thanks! It appears that when the plugin and client are installed, the PostgreSQL database isn't initialized properly, so the task_instance table is missing. The only difference I can find in the PostgreSQL logs is that, with the plugin and client installed, the PostgreSQL process complains about an "incomplete startup packet":

LOG: incomplete startup packet

EDIT: It might actually be a timing issue: the code tries to access Postgres before the database has been created, which causes this error and crashes the Airflow process.

EDIT 2: Everything I said was wrong. The database gets created fine. I think part of it might be a timing issue. However, the failure seems to happen when loading DAGs.

@rmn36
Author

rmn36 commented Oct 12, 2018

@elephantum Fixed by upgrading psycopg2:

pip3 install psycopg2 -U

elephantum added a commit that referenced this issue Oct 15, 2018
@elephantum
Contributor

@rmn36 can you please check with the latest release if everything works for you?
