can't deploy hadoop-processing with python-libjuju #67

Closed
kwmonroe opened this Issue Mar 2, 2017 · 9 comments

Member

kwmonroe commented Mar 2, 2017

I recently ran cwr-ci on https://jujucharms.com/hadoop-processing/. CWR calls bundletester which calls matrix, which does libjuju stuff.

I think it adds apps, machines, and then adds units -- it feels like the add-unit piece isn't working. A matrix model was created, apps were added, machines were added, but no units ever showed up:

$ juju status -m matrix-close-dingo    Thu Mar  2 15:26:05 2017

Model               Controller  Cloud/Region         Version
matrix-close-dingo  lxd         localhost/localhost  2.0.3

App                   Version  Status       Scale  Charm                   Store       Rev  OS      Notes
client                         blocked          0  hadoop-client           jujucharms    3  ubuntu
ganglia                        waiting          0  ganglia                 jujucharms    5  ubuntu
ganglia-node                   waiting          0  ganglia-node            jujucharms    6  ubuntu
namenode                       maintenance      0  hadoop-namenode         jujucharms    8  ubuntu
plugin                         maintenance      0  hadoop-plugin           jujucharms    8  ubuntu
resourcemanager                maintenance      0  hadoop-resourcemanager  jujucharms    8  ubuntu
rsyslog                        waiting          0  rsyslog                 jujucharms    7  ubuntu
rsyslog-forwarder-ha           waiting          0  rsyslog-forwarder-ha    jujucharms    7  ubuntu
slave                          maintenance      0  hadoop-slave            jujucharms    8  ubuntu

Unit                       Workload     Agent       Machine  Public address  Ports  Message
### HEY!  THERE'S NOTHING HERE

Machine  State    DNS           Inst id        Series  AZ
0        started  10.38.19.117  juju-d91d66-0  xenial
1        started  10.38.19.56   juju-d91d66-1  xenial
2        started  10.38.19.225  juju-d91d66-2  xenial
3        started  10.38.19.79   juju-d91d66-3  xenial
4        started  10.38.19.119  juju-d91d66-4  xenial
5        started  10.38.19.166  juju-d91d66-5  xenial

Relation         Provides         Consumes              Type
juju-info        client           ganglia-node          subordinate
hadoop-plugin    client           plugin                subordinate
juju-info        client           rsyslog-forwarder-ha  subordinate
node             ganglia          ganglia-node          regular
...

I was able to manually go in and add units like this:

juju add-unit -m matrix-close-dingo client --to 0

But I'd like to not have to do that. @petevg is on the case!

Collaborator

petevg commented Mar 2, 2017

As @kwmonroe mentioned, I am going to pick this one up, as it breaks matrix.

Member

johnsca commented Mar 2, 2017

This basic test case worked fine for me:

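from juju import loop
from juju.model import Model


async def main():
    # Connect to the currently active Juju model.
    model = Model()
    await model.connect_current()

    try:
        # Deploy the bundle straight from the charm store.
        await model.deploy('cs:hadoop-processing')
    finally:
        await model.disconnect()

loop.run(main())

Collaborator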

petevg commented Mar 2, 2017

@johnsca Intriguing. I bet that there's something broken in the local bundle parsing code then.

Member

kwmonroe commented Mar 2, 2017

I cut the original juju status short. Check this out:

...
Relation         Provides         Consumes              Type
juju-info        client           ganglia-node          subordinate
hadoop-plugin    client           plugin                subordinate
juju-info        client           rsyslog-forwarder-ha  subordinate
node             ganglia          ganglia-node          regular
juju-info        ganglia-node     namenode              regular
juju-info        ganglia-node     resourcemanager       regular
juju-info        ganglia-node     slave                 regular
juju-info        namenode         ganglia-node          subordinate
namenode         namenode         plugin                regular
namenode         namenode         resourcemanager       regular
namenode         namenode         slave                 regular
resourcemanager  plugin           resourcemanager       regular
juju-info        resourcemanager  ganglia-node          subordinate
resourcemanager  resourcemanager  slave                 regular
juju-info        slave            ganglia-node          subordinate

It's missing some of the "rsyslog-forwarder-ha" relations. Here's the bundle it used:

http://paste.ubuntu.com/24096487/

It's like it stopped on line 80 and skipped 81-84. So maybe matrix is waiting on all the relations to be added before proceeding to add-unit.

Edit: you can see from matrix.log that it did indeed stop adding relations after rsyslog-forwarder-ha <-> client:

$ cat matrix.log
matrix:124:load_suites: Parsing /usr/local/lib/python3.5/dist-packages/matrix/matrix.yaml
matrix:453:add_model: Creating model matrix-close-dingo
deploy:4:deploy: Deploying /tmp/cwr-tmp-rCyRBc/bundletester-2XQdBG/hadoop-processing
juju.model:1628:deploy: Deploying cs:xenial/hadoop-client-3
juju.model:1628:deploy: Deploying cs:~bigdata-dev/xenial/ganglia-5
juju.model:1628:deploy: Deploying cs:~bigdata-dev/xenial/ganglia-node-6
juju.model:1628:deploy: Deploying cs:xenial/hadoop-namenode-8
juju.model:1628:deploy: Deploying cs:xenial/hadoop-plugin-8
juju.model:1628:deploy: Deploying cs:xenial/hadoop-resourcemanager-8
juju.model:1628:deploy: Deploying cs:~bigdata-dev/xenial/rsyslog-7
juju.model:1628:deploy: Deploying cs:~bigdata-dev/xenial/rsyslog-forwarder-ha-7
juju.model:1628:deploy: Deploying cs:xenial/hadoop-slave-8
juju.model:1579:addRelation: Relating resourcemanager <-> namenode
juju.model:1579:addRelation: Relating namenode <-> slave
juju.model:1579:addRelation: Relating resourcemanager <-> slave
juju.model:1579:addRelation: Relating plugin <-> namenode
juju.model:1579:addRelation: Relating plugin <-> resourcemanager
juju.model:1579:addRelation: Relating client <-> plugin
juju.model:1579:addRelation: Relating ganglia-node:juju-info <-> client:juju-info
juju.model:1579:addRelation: Relating ganglia-node:juju-info <-> namenode:juju-info
juju.model:1579:addRelation: Relating ganglia-node:juju-info <-> resourcemanager:juju-info
juju.model:1579:addRelation: Relating ganglia-node:juju-info <-> slave:juju-info
juju.model:1579:addRelation: Relating ganglia:node <-> ganglia-node:node
juju.model:1579:addRelation: Relating rsyslog-forwarder-ha:juju-info <-> client:juju-info
$
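
To find exactly which relations were skipped without eyeballing the log, the "Relating" lines can be diffed against the bundle's relations list. This is only a minimal sketch that assumes the log format shown above. The names `logged_relations` and `missing_relations` are hypothetical helpers written for illustration, and the `expected` argument is assumed to be the list of endpoint pairs taken from the bundle's own `relations` section.

```python
import re

# Matches libjuju log lines such as
# "juju.model:1579:addRelation: Relating plugin <-> namenode"
RELATING = re.compile(r"addRelation: Relating (\S+) <-> (\S+)")


def app_name(endpoint):
    """Drop an optional ':interface' suffix, e.g. 'client:juju-info' -> 'client'."""
    return endpoint.split(":", 1)[0]


def logged_relations(log_text):
    """Return the set of application pairs that the log reports as related."""
    pairs = set()
    for match in RELATING.finditer(log_text):
        pairs.add(frozenset(app_name(e) for e in match.groups()))
    return pairs


def missing_relations(expected, log_text):
    """List expected [endpoint_a, endpoint_b] pairs that never appear in the log.

    Assumption: pairs are compared by application name only, so two
    different relations between the same two apps are not distinguished.
    """
    seen = logged_relations(log_text)
    return [pair for pair in expected
            if frozenset(app_name(e) for e in pair) not in seen]
```

Calling `missing_relations(bundle["relations"], open("matrix.log").read())` with the parsed bundle would then list the pairs that never reached addRelation.
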
Member

johnsca commented Mar 2, 2017

@petevg I tried again with a local copy of the bundle.yaml which I modified to point to a local build of one of the charms and it still worked as expected.

@kwmonroe The BundleHandler in libjuju would in fact wait for all of the relations to show up in the model before proceeding to the addUnits, so that makes sense. I'm just not sure why I can't replicate it. Can you try this on the machine where it's failing, with your example bundle in /tmp/hp/bundle.yaml:

from juju import loop
from juju.model import Model


async def main():
    model = Model()
    await model.connect_current()

    try:
        await model.deploy('/tmp/hp')
    finally:
        await model.disconnect()

loop.run(main())

Member

kwmonroe commented Mar 2, 2017

@johnsca I tried your test from within my cwr/0 unit, lxd exec'd into the cwr container that previously failed. It worked :/

Not sure why the original stopped processing addRelations, but I'll tear this down and try the original job again to see what happens.
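
For reference, the "wait for every relation before adding units" step described above is the kind of wait that stalls silently if one relation never shows up. The sketch below is not libjuju's BundleHandler code. It is a generic asyncio illustration of the same pattern with a timeout added, so a stall reports which relations are missing. `wait_for_relations` and `get_relations` are hypothetical names; `get_relations` stands in for whatever model state the real handler inspects.

```python
import asyncio


async def wait_for_relations(get_relations, expected, timeout=60.0, poll=1.0):
    """Poll until every expected relation is visible, or fail naming the gap.

    get_relations is a hypothetical callable that returns the relation keys
    currently present in the model.
    """
    expected = set(expected)
    loop = asyncio.get_event_loop()
    deadline = loop.time() + timeout
    while True:
        missing = expected - set(get_relations())
        if not missing:
            return
        if loop.time() >= deadline:
            # Surface the stuck relations instead of waiting forever.
            raise TimeoutError("relations never appeared: %s" % sorted(missing))
        await asyncio.sleep(poll)
```

With a bounded wait like this, a run such as the one in matrix.log would end with an error naming the relations that were never added, rather than leaving apps at scale 0.
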

Member

johnsca commented Mar 4, 2017

I'm seeing intermittent issues similar to this with conjure-up. I think there's some sort of race condition for this.

Collaborator

petevg commented Mar 7, 2017

I'm currently working on this. I have a fix locally, but I'm going to check in a more complete fix that also addresses #65.

Collaborator

petevg commented Mar 7, 2017

PR here: #74

petevg closed this Mar 7, 2017
