
Object retrier #514

Merged: 4 commits into openstack-charmers:master from object-retrier on Mar 25, 2021

Conversation

ajkavanagh
Contributor

This adds a wrapper class that detects if a callable on the wrapped object,
or on any of its descendent objects, raises an Exception. If so, it retries
that call.

This is to attempt to make the zaza tests a little more robust in the
face of small network failures or strange restarts. This is a test, and
robust logging and reporting should be used to determine whether it is
covering up actual bugs rather than CI system issues.

Related Bug: (zot repo)#348
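
For readers unfamiliar with the pattern, here is a minimal sketch of the wrap-and-retry idea; the class name, parameters and defaults below are illustrative assumptions, not the implementation merged in this PR.

import logging
import time


class RetryingProxy:
    """Wrap an object and retry calls that raise a retryable exception."""

    def __init__(self, obj, num_retries=3, backoff=5, retry_exceptions=None):
        self._obj = obj
        self._num_retries = num_retries
        self._backoff = backoff
        self._retry_exceptions = retry_exceptions

    def __getattr__(self, name):
        attr = getattr(self._obj, name)
        if not callable(attr):
            return attr

        def _retrying_call(*args, **kwargs):
            for attempt in range(self._num_retries + 1):
                try:
                    return attr(*args, **kwargs)
                except Exception as e:
                    # Retry only if the exception type was registered (or if
                    # retry_exceptions is None, meaning retry everything),
                    # and only while attempts remain.
                    retryable = (self._retry_exceptions is None or
                                 isinstance(e, tuple(self._retry_exceptions)))
                    if not retryable or attempt == self._num_retries:
                        raise
                    logging.warning("Retrying %s after: %s", name, e)
                    time.sleep(self._backoff)

        return _retrying_call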

@codecov-io

codecov-io commented Mar 3, 2021

Codecov Report

Merging #514 (000c977) into master (802e0e0) will increase coverage by 0.33%.
The diff coverage is 90.69%.


@@            Coverage Diff             @@
##           master     #514      +/-   ##
==========================================
+ Coverage   18.49%   18.82%   +0.33%     
==========================================
  Files         160      160              
  Lines        9525     9565      +40     
==========================================
+ Hits         1762     1801      +39     
- Misses       7763     7764       +1     
Impacted Files                               | Coverage Δ
---------------------------------------------|---------------------
zaza/openstack/charm_tests/octavia/tests.py  | 0.00% <0.00%> (ø)
zaza/openstack/utilities/__init__.py         | 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 802e0e0...000c977. Read the comment docs.

@ajkavanagh ajkavanagh force-pushed the object-retrier branch 2 times, most recently from bff1a7b to cbce35c on March 8, 2021 14:22
openstack-mirroring pushed a commit to openstack/openstack that referenced this pull request Mar 8, 2021
* Update charm-octavia from branch 'master'
  to 0c07a1f7959dfca82e2e2f12070dd4c8f5168988
  - Merge "Ensure octavia-driver-agents gets installed"
  - Ensure octavia-driver-agents gets installed
    
openstack-mirroring pushed a commit to openstack/charm-octavia that referenced this pull request Mar 8, 2021
Due to a 'quirk' being fixed, the octavia-driver-agents don't get
installed if the ovsdb-subordinate relation is the last to be made.
This is due to the update-status hook gating the reactive handler from
running, which 'fixed' the issue.  Looking deeper, if the handler isn't
the first to run, then the charm's packages won't get initialised
properly.

This patch fixes that by using the dynamic properties 'all_packages' and
'full_service_list' available in charms.openstack core for the Charm
class.  This means that it doesn't matter when the flag is set, as long
as it is before the property is accessed.  Looking at the handler, this
will be in the right place for the install.

func-test-pr: openstack-charmers/zaza-openstack-tests#514

Change-Id: I5fd75c9d371390bca402d6a3a264421a44fd092a
Closes-Bug: #1916764
@ajkavanagh ajkavanagh marked this pull request as ready for review March 9, 2021 15:04
@ajkavanagh
Contributor Author

Tested with https://review.opendev.org/c/openstack/charm-octavia/+/778003 (which has now merged)

This adds a wrapper class that detects if a callable on the wrapped object,
or on any of its descendent objects, raises an Exception.  If so, it retries
that call.

This is to attempt to make the zaza tests a little more robust in the
face of small network failures or strange restarts.  This is a test, and
robust logging and reporting should be used to determine whether it is
covering up actual bugs rather than CI system issues.

Related Bug: (zot repo)openstack-charmers#348
The main failure that seems to occur with clients is ConnectFailure,
which, according to the docs, is retry-able.  Thus provide a function that
adds that exception condition automatically.
Also fix the import problem in the octavia test for the object retrier that
is being used to validate the ObjectRetrier feature.
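
The commit message above mentions a function that adds the ConnectFailure condition automatically; a hedged sketch of what such a helper could look like, reusing the RetryingProxy sketched earlier (the helper name and the ConnectFailure import path are assumptions for illustration):

from keystoneauth1.exceptions.connection import ConnectFailure


def retry_on_connect_failure(client, **kwargs):
    """Wrap a client so that ConnectFailure is always treated as retryable."""
    retry_exceptions = list(kwargs.pop('retry_exceptions', None) or [])
    retry_exceptions.append(ConnectFailure)
    return RetryingProxy(client, retry_exceptions=retry_exceptions, **kwargs)
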
Contributor

@thedac thedac left a comment

Not blocking but a few thoughts. The ObjectRetrier call really feels like a re-implementation of tenacity. Is there a reason you are not using tenacity here?

I was a bit horrified to see the first usage was at the setUpClass level. Are we truly seeing failures at this stage? That would seem to indicate the overcloud is not really ready yet. Does it not?

It feels like we will want this kind of thing everywhere. Should we consider wrapping the openstack clients and importing the wrapped versions rather than the direct clients?

try:
    return obj(*args, **kwargs)
except Exception as e:
    # if retry_exceptions is None, or the type of the exception is
Contributor

Should this be 'is not None'?

Contributor Author

Yup, typo; thanks for picking this up.
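
For context, the corrected condition reads roughly like this (a sketch of the intent, not the exact merged diff):

# Raise immediately unless this exception type was registered as retryable;
# when retry_exceptions is None, every exception is considered retryable.
if retry_exceptions is not None and not isinstance(e, tuple(retry_exceptions)):
    raise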

Comment on lines 68 to 73
cls.keystone_client = ObjectRetrier(
    openstack_utils.get_keystone_session_client(cls.keystone_session))
cls.neutron_client = ObjectRetrier(
    openstack_utils.get_neutron_session_client(cls.keystone_session))
cls.octavia_client = ObjectRetrier(
    openstack_utils.get_octavia_session_client(cls.keystone_session))
Contributor

I am a bit surprised to see these at the setUpClass level. Are we seeing failures here?

Contributor Author

Ah, I may not have explained how it works properly. The __init__(...) method does the wrapping of the object, but it's when the client is used that the retry pattern is activated, e.g.

# wrap a client that is used to make calls.
client = ObjectRetrier(some_object_instance)
...
# later
...
# This is where the retries are done if needed
client.do_call()

Contributor

This makes complete sense. I am not sure why I didn't see it at review time.

@ajkavanagh
Contributor Author

Not blocking but a few thoughts. The ObjectRetrier call really feels like a re-implementation of tenacity. Is there a reason you are not using tenacity here?

Why didn't I use tenacity? It's a fair question. It was mostly about having more obvious (in the code) control over how the exceptions from the wrapped code are handled, determining whether a retry will happen, and then either eating or propagating that exception; I felt that tenacity hid this too much, and it was also quite awkward to set up.
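
For comparison, a per-call retry with tenacity might look roughly like this (illustrative only; the helper function and the client method name are assumptions, not code from this PR):

import tenacity
from keystoneauth1.exceptions.connection import ConnectFailure


@tenacity.retry(
    retry=tenacity.retry_if_exception_type(ConnectFailure),
    wait=tenacity.wait_fixed(5),
    stop=tenacity.stop_after_attempt(3),
    reraise=True)
def _list_load_balancers(octavia_client):
    # Each call site that needs retries gets its own decorated wrapper,
    # which is part of why wrapping the whole client object felt clearer here.
    return octavia_client.load_balancer_list()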

I was a bit horrified to see the first usage was at the setUpClass level. Are we truly seeing failures at this stage? That would seem to indicate the overcloud is not really ready yet. Does it not?

No, that's just the setup code to wrap the object; no retries happen at that point - please see the inline comment. I can see how the naming of the class might make people think that that is when the retries happen. Maybe the class should be renamed to SetupObjectRetrier or something that telegraphs that it's a setup phase, rather than when the retries happen. Alternatively, I could make __init__() a no-op and use a class method (say _setup_) to do the wrapping. However, I was trying to be minimally invasive of the namespace of the wrapped object, as all 'calls' go to it via the retry mechanism. Hence renaming the class may be the better idea.

It feels like we will want this kind of thing everywhere. Should we consider wrapping the openstack clients and importing the wrapped versions rather than the direct clients?

Valid point. My purpose here (in this PR) was to introduce the code for the ObjectRetrier, show how it can be used, and test it as a fix for the Octavia tests. I didn't want to introduce it on everything (yet), as I wanted to see how it might impact our 'worst' offender set of tests first! :) What do you think?

@thedac
Contributor

thedac commented Mar 24, 2021

Not blocking but a few thoughts. The ObjectRetrier call really feels like a re-implementation of tenacity. Is there a reason you are not using tenacity here?

Why didn't I use tenacity? It's a fair question. It was mostly about having more obvious (in the code) control over how the exceptions from the wrapped code are handled, determining whether a retry will happen, and then either eating or propagating that exception; I felt that tenacity hid this too much, and it was also quite awkward to set up.

I am torn on this since we have the pattern of using tenacity everywhere else. But as I said, I am not blocking on this.

I was a bit horrified to see the first usage was at the setUpClass level. Are we truly seeing failures at this stage? That would seem to indicate the overcloud is not really ready yet. Does it not?

No, that's just the setup code to wrap the object; no retries happen at that point - please see the inline comment. I can see how the naming of the class might make people think that that is when the retries happen. Maybe the class should be renamed to SetupObjectRetrier or something that telegraphs that it's a setup phase, rather than when the retries happen. Alternatively, I could make __init__() a no-op and use a class method (say _setup_) to do the wrapping. However, I was trying to be minimally invasive of the namespace of the wrapped object, as all 'calls' go to it via the retry mechanism. Hence renaming the class may be the better idea.

Makes complete sense now.

It feels like we will want this kind of thing everywhere. Should we consider wrapping the openstack clients and importing the wrapped versions rather than the direct clients?

Valid point. My purpose here (in this PR) was to introduce the code for the ObjectRetrier, show how it can be used, and test it as a fix for the Octavia tests. I didn't want to introduce it on everything (yet), as I wanted to see how it might impact our 'worst' offender set of tests first! :) What do you think?

I appreciate this is the first round and is targeted. It makes sense to land a lighter weight change first.

Contributor

@thedac thedac left a comment

To be clear we are just waiting on the typo fix to land this.

Fixes a typo (None -> not None) and renames the class from
ObjectRetrier -> ObjectRetrierWraps to make it clearer that the class
instantiation is to wrap an object with the retrier code rather than do
retries at that moment.
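
After the rename, usage looks roughly like this (a sketch; the import paths and the client method are inferred from the files this PR touches and the quoted octavia test, so treat them as assumptions):

from zaza.openstack.utilities import ObjectRetrierWraps
import zaza.openstack.utilities.openstack as openstack_utils

keystone_session = openstack_utils.get_overcloud_keystone_session()
# Wrapping happens here, but retries only kick in when the wrapped client
# is actually called later in the test.
octavia_client = ObjectRetrierWraps(
    openstack_utils.get_octavia_session_client(keystone_session))
octavia_client.load_balancer_list()  # retries happen here if needed
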
Contributor

@thedac thedac left a comment

Looks good. Landing. Thanks for the work.

@thedac thedac merged commit 0ace737 into openstack-charmers:master Mar 25, 2021
@ajkavanagh ajkavanagh deleted the object-retrier branch March 25, 2021 16:14