Suppress KeyError when applications are missing #367

Merged
merged 4 commits into from Nov 25, 2019

Conversation

SimonRichardson
Member

The following suppresses a KeyError when an application or a remote
application is missing. Instead we should return None and correctly
handle that an application or a remote-application could be missing.

This should be safe to do, since an update from the
all watcher should fill in the gaps eventually. If this is not the
case, we should in the future ensure that a remote-application can't
raise index/key errors when data is missing.

Raising/throwing on missing data cripples the library and stops it being
useful; instead we should handle all the cases where these exist and
correctly insert None or a fallback.
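
A minimal sketch of the idea, not the actual patch: `applications`, `remote_applications` and `find_application` are hypothetical stand-ins for the library's internal state, and `remote-abc` is a made-up name. Looking an entry up with `dict.get` yields `None` for a missing key instead of raising `KeyError`.

```python
# Hypothetical stand-ins for the model's internal state (not libjuju's real attributes).
applications = {"keystone": {"charm": "keystone"}}
remote_applications = {}


def find_application(name):
    """Return the local or remote application entry, or None if neither is known yet."""
    app = applications.get(name)
    if app is None:
        # Fall back to the remote applications before giving up.
        app = remote_applications.get(name)
    return app


print(find_application("keystone"))
print(find_application("remote-abc"))
```

Callers then have to check for `None` explicitly, which is the trade-off discussed later in this conversation.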

@@ -35,7 +35,7 @@ deps =
# default tox env excludes integration and serial tests
commands =
# These need to be installed in a specific order
-pip install urllib3==1.22
+pip install urllib3==1.25.7
Member Author

This is to suppress the 3.8 warnings:

DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
class AuthContext(collections.Mapping):
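
For context, the warning is triggered by code that subclasses `collections.Mapping`; the abstract base classes are meant to be imported from `collections.abc`. The sketch below only illustrates that import. `ReadOnlyContext` is a hypothetical class, not the `AuthContext` from the warning.

```python
from collections.abc import Mapping


class ReadOnlyContext(Mapping):
    """Hypothetical read-only mapping; it exists only to illustrate the import."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


ctx = ReadOnlyContext({"user": "admin"})
print(ctx["user"], len(ctx))
```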

@gnuoy
Contributor

gnuoy commented Nov 21, 2019

I tested with this PR and it fixed issue #366 for me. My only reservation (and this is probably down to my lack of understanding of how juju models relations) is that I don't understand why there would ever legitimately be no application associated with a relation.

@SimonRichardson
Member Author

So this is a remoteApplication that it can't find; it should be shown in pylibjuju. From my own (limited) testing, pylibjuju calls the AllWatcher facade and consumes the delta changes from juju. These delta changes should fill in the internal state of pylibjuju, so that we have a local representation of juju's internal model.
It seems that the AllWatcher call isn't consuming quickly enough to get the changes from juju, so asking for a relation match fails to pick up the remoteApplication.
Throwing a KeyError here is the wrong thing to do. It's expected that the AllWatcher should eventually catch up or, if that's not the case, that disconnecting and reconnecting should populate this internal state correctly.
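
To make the timing issue concrete, here is a rough sketch of a delta-fed cache. `LocalState`, its methods, the entity data and the name `remote-abc` are all hypothetical and only mimic the behaviour described above; this is not how libjuju stores its state.

```python
class LocalState:
    """Hypothetical local cache that is filled in by watcher deltas."""

    def __init__(self):
        self.entities = {}

    def apply_delta(self, entity_type, name, data):
        # Each delta adds or updates one entity in the local view.
        self.entities.setdefault(entity_type, {})[name] = data

    def get(self, entity_type, name):
        # Returns None when the delta for this entity has not been applied yet.
        return self.entities.get(entity_type, {}).get(name)


state = LocalState()
before = state.get("remoteApplication", "remote-abc")
state.apply_delta("remoteApplication", "remote-abc", {"life": "alive"})
after = state.get("remoteApplication", "remote-abc")
print(before, after)
```

A lookup that runs before the matching delta has been applied misses, even though the entity exists on the controller side.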

@gnuoy
Contributor

gnuoy commented Nov 21, 2019

It makes sense not to throw a KeyError if an absent application is a normal part of the deployment settling. But it sounds like there is another bug because the remote application never gets filled in, even rerunning the test 10 mins after the deployment.

@SimonRichardson
Member Author

Indeed. I couldn't recreate that locally, but we have reproduction steps that could help locate the issue.

After a long internal discussion around this, it's evident that
this will cause errors down the line at some point. Instead we've
decided that it's better to throw a custom error, one that can be caught
individually to force the user of the library to disconnect and
reconnect from the model, to pick up any missing deltas.

This can be done in a similar way to examples/model.py, where retries can be
forced to happen when an exception is raised.

The code change is very rudimentary and just exposes a new juju
error - JujuEntityNotFoundError. I've also updated the test to fully
expose the exception at a unit level.
@SimonRichardson
Member Author

@gnuoy having discussed this further with @mitechie, I believe it's better to expose a custom error, JujuEntityNotFoundError, that can be caught and causes the code to attempt a new connection flow (disconnect and connect_current) in order to gain all the deltas on connection. This should hopefully ensure that you do have the remote model. If this isn't the case I'll look into the Juju side to see why the remote delta isn't coming through correctly.
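
A rough, synchronous sketch of that catch-and-reconnect flow follows. The exception here is a local stand-in for the error added by this PR, and `FakeModel`, `relation_apps` and `query_with_reconnect` are hypothetical helpers; real libjuju code is asynchronous and would use the library's own connect and disconnect calls.

```python
class JujuEntityNotFoundError(Exception):
    """Local stand-in for the error named in this PR (assumed shape, not the real class)."""

    def __init__(self, entity_name, entity_types=None):
        self.entity_name = entity_name
        self.entity_types = entity_types
        super().__init__("Entity not found: {}".format(entity_name))


class FakeModel:
    """Hypothetical model whose state is only complete after a reconnect."""

    def __init__(self):
        self.connections = 0

    def connect(self):
        self.connections += 1

    def disconnect(self):
        pass

    def relation_apps(self):
        # Simulate a missing delta on the first connection only.
        if self.connections < 2:
            raise JujuEntityNotFoundError(
                "remote-abc", ["application", "remoteApplication"])
        return ["keystone", "remote-abc"]


def query_with_reconnect(model, attempts=3):
    """Run the query, reconnecting after each JujuEntityNotFoundError."""
    model.connect()
    for attempt in range(attempts):
        try:
            return model.relation_apps()
        except JujuEntityNotFoundError:
            if attempt == attempts - 1:
                raise  # Give up and surface the error to the caller.
            model.disconnect()
            model.connect()


model = FakeModel()
result = query_with_reconnect(model)
print(result, model.connections)
```

Bounding the number of attempts matters: if the missing entity never arrives, the caller still gets the error instead of looping forever.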

@gnuoy
Contributor

gnuoy commented Nov 21, 2019

I don't think this patch will fix the issue for me. I am seeing the issue on a deployed, totally stable system; no matter how many times I connect->query->disconnect, there is still no data associated with the 'remote-XXX' relation.

@SimonRichardson
Member Author

@gnuoy I'll try to find out why juju/juju isn't sending the delta or why libjuju isn't correctly consuming the delta then.

@gnuoy
Contributor

gnuoy commented Nov 21, 2019

I added some debug to libjuju and I think the issue is that the remote applications (the apps in a different model) are unknown to this model, which makes sense. So, using the bundles I mentioned earlier, if the keystone model is in focus then the libjuju representation of the relations is:

    0 <Relation id=0 glance:cluster>
    1 <Relation id=1 keystone:cluster>
    2 <Relation id=2 percona-cluster:cluster>
    3 <Relation id=3 swift-proxy-region1:cluster>
    4 <Relation id=4 swift-proxy-region1:swift-storage swift-storage-region1-zone1:swift-storage>
    5 <Relation id=5 swift-proxy-region1:swift-storage swift-storage-region1-zone2:swift-storage>
    6 <Relation id=6 swift-proxy-region1:swift-storage swift-storage-region1-zone3:swift-storage>
    7 <Relation id=7 keystone:shared-db percona-cluster:shared-db>
    8 <Relation id=8 glance:shared-db percona-cluster:shared-db>
    9 <Relation id=9 glance:identity-service keystone:identity-service>
    10 <Relation id=10 swift-proxy-region1:identity-service keystone:identity-service>
    11 <Relation id=11 glance:object-store swift-proxy-region1:object-store>
    12 <Relation id=12 remote-9b8419fd79db401c87830070c46d0417:identity-service keystone:identity-service>
    13 <Relation id=13 remote-9b8419fd79db401c87830070c46d0417:swift-storage swift-storage-region1-zone1:swift-storage>
    14 <Relation id=14 remote-9b8419fd79db401c87830070c46d0417:swift-storage swift-storage-region1-zone2:swift-storage>
    15 <Relation id=15 remote-9b8419fd79db401c87830070c46d0417:swift-storage swift-storage-region1-zone3:swift-storage>
    16 <Relation id=16 swift-proxy-region1:swift-storage remote-6968c6dfdc7f4e3185950305bf97dabc:swift-storage>
    17 <Relation id=17 swift-proxy-region1:swift-storage remote-20b122c471b04c2681cbcc2532fd33ad:swift-storage>
    18 <Relation id=18 swift-proxy-region1:swift-storage remote-6474322a41de4d87879ba1c118cb9a34:swift-storage>
    19 <Relation id=19 remote-9b8419fd79db401c87830070c46d0417:rings-consumer swift-proxy-region1:rings-distributor>

Those four remotes relate to apps in the other model.
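
As an illustration, the remotes can be picked out of such a listing by their `remote-` prefix. The helper name `find_remote_apps` is hypothetical, and the input is a subset of the relation keys shown above, copied as plain strings.

```python
# A subset of the relation keys from the listing above.
relation_keys = [
    "glance:cluster",
    "keystone:shared-db percona-cluster:shared-db",
    "remote-9b8419fd79db401c87830070c46d0417:identity-service keystone:identity-service",
    "swift-proxy-region1:swift-storage remote-6968c6dfdc7f4e3185950305bf97dabc:swift-storage",
    "swift-proxy-region1:swift-storage remote-20b122c471b04c2681cbcc2532fd33ad:swift-storage",
    "swift-proxy-region1:swift-storage remote-6474322a41de4d87879ba1c118cb9a34:swift-storage",
    "remote-9b8419fd79db401c87830070c46d0417:rings-consumer swift-proxy-region1:rings-distributor",
]


def find_remote_apps(keys):
    """Collect the distinct application names that carry the 'remote-' prefix."""
    remotes = set()
    for key in keys:
        # Each key holds one or two "application:interface" endpoints.
        for endpoint in key.split():
            app_name, _, _interface = endpoint.partition(":")
            if app_name.startswith("remote-"):
                remotes.add(app_name)
    return sorted(remotes)


remotes = find_remote_apps(relation_keys)
print(len(remotes))
```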

The following changes consume application offers, so that they're
correctly picked up.
@SimonRichardson
Member Author

!!build!!

@SimonRichardson
Member Author

SimonRichardson commented Nov 21, 2019

@gnuoy with my latest changes, I can get the examples/model.py to report back all the relations correctly.

➜ tox -e example examples/model.py
example develop-inst-noop: /home/simon/Documents/python-libjuju
example installed: apipkg==1.5,asynctest==0.13.0,attrs==19.3.0,backcall==0.1.0,bcrypt==3.1.7,bleach==3.1.0,certifi==2019.9.11,cffi==1.13.2,chardet==3.0.4,cryptography==2.8,decorator==4.4.1,docutils==0.15.2,entrypoints==0.3,execnet==1.7.1,idna==2.8,importlib-metadata==0.23,ipdb==0.12.2,ipython==7.9.0,ipython-genutils==0.2.0,jedi==0.15.1,jeepney==0.4.1,-e git+git@github.com:SimonRichardson/python-libjuju.git@8f840848b24e9394e5314f448a093b323339ca82#egg=juju,jujubundlelib==0.5.6,keyring==19.2.0,macaroonbakery==1.2.3,mock==3.0.5,more-itertools==7.2.0,packaging==19.2,paramiko==2.6.0,parso==0.5.1,pexpect==4.7.0,pickleshare==0.7.5,pkginfo==1.5.0.1,pluggy==0.13.0,prompt-toolkit==2.0.10,protobuf==3.10.0,ptyprocess==0.6.0,py==1.8.0,pyasn1==0.4.8,pycparser==2.19,Pygments==2.4.2,pymacaroons==0.13.0,PyNaCl==1.3.0,pyparsing==2.4.5,pyRFC3339==1.1,pytest==5.3.0,pytest-asyncio==0.10.0,pytest-forked==1.1.3,pytest-xdist==1.30.0,pytz==2019.3,PyYAML==5.1.2,readme-renderer==24.0,requests==2.22.0,requests-toolbelt==0.9.1,SecretStorage==3.1.1,six==1.13.0,theblues==0.5.2,toposort==1.5,tqdm==4.38.0,traitlets==4.3.3,twine==3.0.0,urllib3==1.25.7,wcwidth==0.1.7,webencodings==0.5.1,websockets==7.0,zipp==0.6.0
example run-test-pre: PYTHONHASHSEED='1135101242'
example run-test: commands[0] | python examples/model.py
remote-e6933eeb944c4ee880ace6fd8980ff5a
remote-e6933eeb944c4ee880ace6fd8980ff5a
remote-e6933eeb944c4ee880ace6fd8980ff5a
__________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________
  example: commands succeeded
  congratulations :)

This should help identify issues in the future.
Contributor

@mitechie mitechie left a comment

shippit

"""Exception indicating that an entity was not found in the state. It was
expected that the entity was found in state and this is a terminal
condition.
To fix this condition, you should disconnect and reconnect to ensure that
Contributor

newline from here

@gnuoy
Contributor

gnuoy commented Nov 22, 2019

The latest patch fixes the issue for me, thanks @SimonRichardson for all your work on this.

@gnuoy
Contributor

gnuoy commented Nov 22, 2019

Although I thought this had fixed my cut-down test case (I cannot reproduce my earlier success), going back to the original case suggests it's not quite fixed. I am now getting:

2019-11-22 13:52:21 [INFO] Traceback (most recent call last):
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/zaza/openstack/charm_tests/swift/tests.py", line 33, in setUpClass
2019-11-22 13:52:21 [INFO]     super(SwiftImageCreateTest, cls).setUpClass()
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/zaza/openstack/charm_tests/test_utils.py", line 101, in setUpClass
2019-11-22 13:52:21 [INFO]     cls.keystone_session = openstack_utils.get_overcloud_keystone_session()
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/zaza/openstack/utilities/openstack.py", line 373, in get_overcloud_keystone_session
2019-11-22 13:52:21 [INFO]     return get_keystone_session(get_overcloud_auth(),
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/zaza/openstack/utilities/openstack.py", line 1596, in get_overcloud_auth
2019-11-22 13:52:21 [INFO]     remote_interface_name='certificates')
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/zaza/__init__.py", line 48, in _wrapper
2019-11-22 13:52:21 [INFO]     return run(_run_it())
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/zaza/__init__.py", line 36, in run
2019-11-22 13:52:21 [INFO]     return task.result()
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/zaza/__init__.py", line 47, in _run_it
2019-11-22 13:52:21 [INFO]     return await f(*args, **kwargs)
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/zaza/model.py", line 1436, in async_get_relation_id
2019-11-22 13:52:21 [INFO]     for rel in model.applications[application_name].relations:
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/juju/application.py", line 58, in relations
2019-11-22 13:52:21 [INFO]     return [rel for rel in self.model.relations if rel.matches(self.name)]
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/juju/application.py", line 58, in <listcomp>
2019-11-22 13:52:21 [INFO]     return [rel for rel in self.model.relations if rel.matches(self.name)]
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/juju/relation.py", line 115, in matches
2019-11-22 13:52:21 [INFO]     if endpoint.application is None:
2019-11-22 13:52:21 [INFO]   File "/home/ubuntu/work/bugs/1815879/.tox/func-noop/lib/python3.6/site-packages/juju/relation.py", line 26, in application
2019-11-22 13:52:21 [INFO]     raise JujuEntityNotFoundError(app_name, ["application", "remoteApplication"])
2019-11-22 13:52:21 [INFO] juju.errors.JujuEntityNotFoundError: Entity not found: remote-9b8419fd79db401c87830070c46d0417

@SimonRichardson
Member Author

@gnuoy any chance you could un-comment this line to see what we're missing https://github.com/juju/python-libjuju/pull/367/files#diff-934c79f928f581e5f1d8903f275de68fR904

@gnuoy
Contributor

gnuoy commented Nov 22, 2019

> @gnuoy any chance you could un-comment this line to see what we're missing https://github.com/juju/python-libjuju/pull/367/files#diff-934c79f928f581e5f1d8903f275de68fR904

Do you mean the "raise..." line? I did uncomment that and it didn't raise anything.

@SimonRichardson
Member Author

$$merge$$

I'm going to merge this fix whilst we're still investigating #366

@jujubot jujubot merged commit 5fd1b7f into juju:master Nov 25, 2019

4 participants