Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve pulp_smash.api.poll_task #565

Open
elyezer opened this issue Feb 26, 2017 · 1 comment
Open

Improve pulp_smash.api.poll_task #565

elyezer opened this issue Feb 26, 2017 · 1 comment

Comments

@elyezer
Copy link
Contributor

elyezer commented Feb 26, 2017

We've seen some cases where for some unknown reason the poll_task won't never timeout. In order to avoid that we could improve the poll_task to use a threading.Timer like:

def poll_task(server_config, href):
    """Wait for a task and its children to complete. Yield response bodies.

    Poll the task at ``href``, waiting for the task to complete. When a
    response is received indicating that the task is complete, yield that
    response body and recursively poll each child task.

    :param server_config: A :class:`pulp_smash.config.ServerConfig` object.
    :param href: The path to a task you'd like to monitor recursively.
    :returns: An generator yielding response bodies.
    :raises pulp_smash.exceptions.TaskTimedOutError: If a task takes too
        long to complete.
    """
    timeout = 1800  # 1800s == 30m
    timer = threading.Timer(timeout, thread.interrupt_main)
    try:
        timer.start()
        while True:
            response = requests.get(
                urljoin(server_config.base_url, href),
                **server_config.get_requests_kwargs()
            )
            response.raise_for_status()
            attrs = response.json()
            if attrs['state'] in _TASK_END_STATES:
                # This task has completed. Yield its final state, then iterate
                # through each of its children and yield their final states.
                yield attrs
                for href in (task['_href'] for task in attrs['spawned_tasks']):
                    for final_task_state in poll_task(server_config, href):
                        yield final_task_state
                break
            sleep(5)
    except KeyboardInterrupt:
        raise exceptions.TaskTimedOutError(
            'Task {} is ongoing after {} seconds.'.format(href, timeout)
        )
    finally:
        timer.cancel()

The above approach is fine but now on Python 3 the thread module was renamed to _thread to encourage the usage of the threading instead. The only usage of the thread here is to interrupt the main thread after the timeout.

Another approach is to use a for loop:

def poll_task(server_config, href):
    """Wait for a task and its children to complete. Yield response bodies.

    Poll the task at ``href``, waiting for the task to complete. When a
    response is received indicating that the task is complete, yield that
    response body and recursively poll each child task.

    :param server_config: A :class:`pulp_smash.config.ServerConfig` object.
    :param href: The path to a task you'd like to monitor recursively.
    :returns: An generator yielding response bodies.
    :raises pulp_smash.exceptions.TaskTimedOutError: If a task takes too
        long to complete.
    """
    # 360 * 5s == 1800s == 30m
    # NOTE: The timeout counter is synchronous. We query Pulp, then count down,
    # then query pulp, then count down, etc. This is… dumb.
    poll_limit = 360
    for _ in range(poll_limit):
        response = requests.get(
            urljoin(server_config.base_url, href),
            **server_config.get_requests_kwargs()
        )
        response.raise_for_status()
        attrs = response.json()
        if attrs['state'] in _TASK_END_STATES:
            # This task has completed. Yield its final state, then iterate
            # through each of its children and yield their final states.
            yield attrs
            for href in (task['_href'] for task in attrs['spawned_tasks']):
                for final_task_state in poll_task(server_config, href):
                    yield final_task_state
            return
        sleep(5)
    raise exceptions.TaskTimedOutError(
        'Task {} is ongoing after {} polls.'.format(href, poll_limit)
    )

This have the same effect as the approach with the while loop but avoid managing the end of the iteration.

@Ichimonji10
Copy link
Contributor

I think the first approach is great. The problem with the for loop approach is that we synchronously wait for Pulp to respond to our "get status" request before letting the timer run, and we repeatedly do that, meaning that the actual amount of time spent in the for loop can be much longer than 1800s. We don't know the exact reason that poll_task() is sometimes causing issues, but getting rid of unreliable synchronous code seems like a good start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants