Skip to content

Conversation

@rohanpm
Copy link
Contributor

@rohanpm rohanpm commented Sep 9, 2019

This commit makes every future returned by this client into
a proxy future. A proxy future can be used as a future, but
it will also implicitly proxy attribute and method lookups
through to the underlying result of a future. In many cases,
this allows blocking code to be written as if futures are
not used at all.

This is similar in spirit to commit 15eae36, embracing
implicit over explicit for cleaner code.

Why?

APIs in this library are written to return Futures, to enable
aggressively concurrent code. However, there remain many cases
where concurrency isn't possible or desired, so the APIs are
used in a blocking manner.

Currently, the authors of such code are penalized by having to
add many calls to .result(), making the code noisy. This commit
seeks to reduce that noise and make the blocking code more
readable, while still fully supporting non-blocking code styles.

As an example of why both non-blocking and blocking coding styles
should be supported, consider the search_repository API.

Two cases where a task might need to search for repositories are:

  • when uploading a productid cert into a repo, we need to find
    related repos for the same product and update repo notes there.
  • when adding or removing RPMs in a repo, we need to search for
    corresponding UBI repo(s) in need of repopulation

Both of these cases can occur during a task which needs to
process many other unrelated steps too, which ideally should
be handled concurrently; hence, non-blocking makes sense here.

But another common case is:

  • the user has passed a set of repo IDs into a task and we want
    to perform a certain operation on each repo (e.g. publish;
    clear-repo)

In this case, the desired behavior is generally that we want
to fail the task early and not proceed to next steps if any of
the passed repo IDs are incorrect. So, we do explicitly want
to block on repo search completion, by writing code such as:

repos = client.search_repository(...)
found_ids = [repo.id for repo in repos]
check_ids(requested_ids, found_ids)

Hence both blocking and non-blocking usage of search_repository
is valid in different contexts. It'd be best if we can cleanly
support both.

Risks

It's not a certainty that the benefits of this change outweigh
the disadvantages. Problems which might come up due to this include:

  • Generally enables devs to be sloppy, to misunderstand what's the
    return type of various methods, to not consider when we should
    or shouldn't block.

  • Might be confusing due to the proxy being incomplete.
    For example, if a Future would return an empty list,
    len(f) == 0 is True, but bool(f) is also True since the future
    is non-None; a behavior difference from a plain empty list.

  • Since it's now less common that .result() must be called, it might
    increase the chance that necessary calls are forgotten.

This commit makes every future returned by this client into
a proxy future. A proxy future can be used as a future, but
it will also implicitly proxy attribute and method lookups
through to the underlying result of a future. In many cases,
this allows blocking code to be written as if futures are
not used at all.

This is similar in spirit to commit 15eae36, embracing
implicit over explicit for cleaner code.

Why?
====

APIs in this library are written to return Futures, to enable
aggressively concurrent code. However, there remain many cases
where concurrency isn't possible or desired, so the APIs are
used in a blocking manner.

Currently, the authors of such code are penalized by having to
add many calls to .result(), making the code noisy. This commit
seeks to reduce that noise and make the blocking code more
readable, while still fully supporting non-blocking code styles.

As an example of why both non-blocking and blocking coding styles
should be supported, consider the search_repository API.

Two cases where a task might need to search for repositories are:

- when uploading a productid cert into a repo, we need to find
  related repos for the same product and update repo notes there.
- when adding or removing RPMs in a repo, we need to search for
  corresponding UBI repo(s) in need of repopulation

Both of these cases can occur during a task which needs to
process many other unrelated steps too, which ideally should
be handled concurrently; hence, non-blocking makes sense here.

But another common case is:

- the user has passed a set of repo IDs into a task and we want
  to perform a certain operation on each repo (e.g. publish;
  clear-repo)

In this case, the desired behavior is generally that we want
to fail the task early and not proceed to next steps if any of
the passed repo IDs are incorrect. So, we do explicitly want
to block on repo search completion, by writing code such as:

    repos = client.search_repository(...)
    found_ids = [repo.id for repo in repos]
    check_ids(requested_ids, found_ids)

Hence both blocking and non-blocking usage of search_repository
is valid in different contexts. It'd be best if we can cleanly
support both.

Risks
=====

It's not a certainty that the benefits of this change outweigh
the disadvantages. Problems which might come up due to this include:

- Generally enables devs to be sloppy, to misunderstand what's the
  return type of various methods, to not consider when we should
  or shouldn't block.

- Might be confusing due to the proxy being incomplete.
  For example, if a Future would return an empty list,
  len(f) == 0 is True, but bool(f) is also True since the future
  is non-None; a behavior difference from a plain empty list.

- Since it's now less common that .result() must be called, it might
  increase the chance that necessary calls are forgotten.
@rohanpm
Copy link
Contributor Author

rohanpm commented Sep 10, 2019

Feedback wanted, as I'm having a lot of trouble deciding whether this is a good idea.

How annoying is it really, having to write .result() everywhere when you want to block?

@JayZ12138
Copy link
Contributor

I feel this is a very decent feature, makes the code much cleaner.

Just need to remember we need to use f_proxy for all futures returned.

It would be better if f_proxy could also wrap the call back method similar as f_map and f_flat_map if that doesn't break the concept of 'proxy'

JayZ12138
JayZ12138 previously approved these changes Sep 10, 2019
@rajulkumar
Copy link
Contributor

What I understand from this is we still get the futures as before. Only thing that changes is it won't require calling .results() in every case like while iterating etc. Though that option is still there and there might be cases where you need that explicitly.
So I think it's a nice to have feature that will surely make things easier and more readable.

rajulkumar
rajulkumar previously approved these changes Sep 10, 2019
@rohanpm
Copy link
Contributor Author

rohanpm commented Sep 11, 2019

I'm going to hang onto this until next week at least.

@rohanpm rohanpm merged commit a0ba2c6 into release-engineering:master Sep 16, 2019
@rohanpm rohanpm deleted the proxy branch September 16, 2019 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants