Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected failure during module execution: "_is_text_encoding=False" #407

Closed
infernix opened this issue Oct 26, 2018 · 10 comments
Closed

Unexpected failure during module execution: "_is_text_encoding=False" #407

infernix opened this issue Oct 26, 2018 · 10 comments

Comments

@infernix
Copy link

@infernix infernix commented Oct 26, 2018

This is a tricky one. Let me try to summarize.

  • Playbook manages entire OS for Debian desktop devices
  • Playbook in question upgrades target hosts from Debian 8 to 9. Several asyncs are used throughout.
  • Runs about a dozen different roles after the dist-upgrade without problems
  • At a specific point, all modules fail with _is_text_encoding=False error.
  • Rerunning the exact playbook afterwards does not trigger the issue.

This play still works:

- name: Install OS package dependancies
  apt: pkg={{ item }} state=present
  retries: 4
  delay: 15
  with_items:
  - libudev-dev
  - libjson-c-dev
  - libcurl4-gnutls-dev
  - node-colors
  - node-request
  - nodejs
  tags: domotica

And then this right after fails with above error:

- name: Install npm packages block
  block:
    - name: Install npm packages with npm module
      npm: path="/home/{{ domotica_user }}/domotica" ignore_scripts=yes
      become: yes
      become_user: "{{ domotica_user }}"
  rescue:
    - name: Install npm packages with command line
      command: npm install --ignore-scripts
      args:
        chdir: "/home/{{ domotica_user }}/domotica"
      become: yes
      become_user: "{{ domotica_user }}"
  tags: domotica

Both npm and command fail with the same error. I first thought it was an npm module issue, but evidently it's not.

The host executing the play runs Debian sid, and as mentioned the target host runs Debian 8 at the start of the play and then is upgraded during the play.

Unfortunately the only way I can reproduce this, is by instantiating one of those Debian 8 desktops and then executing this entire upgrade play (takes 30mins-2hrs depending on download speed and target host cpu). I haven't been able to reproduce in any other way.

@dw
Copy link
Member

@dw dw commented Oct 26, 2018

Hi :) What version of Ansible / Mitogen are these? And what is the host Python version? That is indeed a strange error, but it sounds like you've provided enough to go on to begin with.

"_is_text_encoding" is a keyword parameter to the codecs.Codec() class, although I'm not sure how your error ends up formatted the way it does -- is there no other diagnostic information provided?

Rerunning the playbook is possibly succeeding because a prior step triggering the problem becomes green in the second run. Is the 'apt' module run yellow in the first run, and green in the second run? That may significantly narrow things down

@dw
Copy link
Member

@dw dw commented Oct 26, 2018

Oh! An upgrade, you say! That might mean the Python version used to start the remote interpreter no longer matches the Python version on disk -- meaning future module imports could possibly end up with a mix-and-match standard library.

Actually 70% confident this hunch might be the case .. is_text_encoding was added fairly recently in the 2.7 series (May 2015), Debian 8 was released April 2015. Presumably the Python in Debian 8 lacks is_text_encoding, the upgrade supplies a Python with is_text_encoding, and the later module attempts to load a new codec which tries to supply the kwarg to the old interpreter.

If it is indeed the case, this should be a relatively simple fix -- fstat() on a few essential pieces of Python, and if they change, restart the interpreter

@infernix
Copy link
Author

@infernix infernix commented Oct 26, 2018

That's exactly what is the issue, yes. Python 2.7 on Debian 8 does not contain _is_text_encoding references in the relevant encodings/*.py codec files, whereas on Debian 9 they do. And the playbook in question executes a full dist-upgrade from Debian 8 to 9. Not sure why a bunch of other plays execute fine after the dist-upgrade though.

It seems the host python version is irrelevant - I have had the same issue happen just now where the host Python is 2.7.3 (Debian 7, e.g. does not have _is_text_encoding code) as well as with 2.7.15 (Debian sid, does have it). Ansible is 2.6.1 in both cases; mitogen is 72da291.

@dw
Copy link
Member

@dw dw commented Oct 26, 2018

The reason some don't break is because the remote interpreter doesn't recompile everything for every task, it reuses old imports, if they exist already. Double edged sword - much faster, but more brittle, as you have discovered :)

This is the second or third bug of its class -- on-disk config/code changes around the remote interpreter, but this is the first bug where hacking around it is either expensive and an incomplete fix (preload everything we could possibly need at startup), or requires a new mechanism.

So I think this ticket will be about teaching the remote interpreter to restart automatically, and that requires some work, as multiple controller workers may be connected to the same interpreter when it's time for restart (due to delegate_to etc)

Thanks for reporting this! Exciting showstopper bug to round up a Friday :P

@infernix
Copy link
Author

@infernix infernix commented Oct 26, 2018

Is there any way to trigger a restart manually from within the playbook (e.g. not by simply re-executing ansible-playbook)? Apparently an async play is not sufficient?

Edit: meta: reset_connection perhaps?

@dw
Copy link
Member

@dw dw commented Oct 26, 2018

If 'meta: reset_connection' was implemented, yes :) That is literally what I'm working on in my branch just now.

There is an action plugin in the test suite, mitogen_shutdown_all.py. It is quite brutal, but so long as you're running with 'linear' strategy (definitely not 'free' strategy), it may be enough to keep you moving along.

It's a noop for non-Mitogen connections so should be safe to try. Just drop it onto your ANSIBLE_ACTION_PLUGINS= path

@dw
Copy link
Member

@dw dw commented Oct 26, 2018

You'd use it by inserting it as the very next task after the dist-upgrade, I guess, and restrict it to a single host using 'run_once: true'. Subsequent tasks should cause reconnection to occur normally

@infernix
Copy link
Author

@infernix infernix commented Oct 29, 2018

Works 👍 Tried on about a dozen hosts.

I did see one AttributeError: 'Connection' object has no attribute '_reset' but that was after the apt: dist-upgrade and before the mitogen_shutdown_all so probably not related (host may have gone away, e.g. someone unplugged the thing - these are desktops).

@dw
Copy link
Member

@dw dw commented Oct 31, 2018

Hi Gerben,

The latest Git master includes support for "meta: reset_connection" on Ansible 2.5.6 and up. It's much less brutal than the previous method, although basically the same code underneath.

Upstream renamed "reset" and "_reset" at some stage -- not sure why you're seeing it during your run, but in any case, the extension implements both names now.

Still keeping this ticket open for a "smarter restart" option

dw added a commit that referenced this issue Jan 20, 2019
Closes #407.
@dw
Copy link
Member

@dw dw commented Jan 20, 2019

Hi Gerben,

I think for now given how constrained I am for time, the "smarter restart" option may have to wait :) Meanwhile "meta: reset_connection" is on master now, and I believe ready for a 0.2.4 release Real Soon Now(tm).

Thanks again for reporting this!

@dw dw closed this Jan 20, 2019
dw added a commit that referenced this issue Jan 20, 2019
* origin/dmw:
  Motivational shame badges back in README
  issue #61: unused import (reported by LGTM)
  issue #61: unused import (reported by LGTM)
  issue #61: unused import (reported by LGTM)
  issue #61: unused import (reported by LGTM)
  issue #61: unused import (reported by LGTM)
  issue #61: unused import (reported by LGTM)
  issue #61: unused import (reported by LGTM)
  issue #61: unused import (reported by LGTM)
  issue #61: unused variable (reported by LGTM)
  issue #61: unused import (reported by LGTM)
  issue #61: unused variable (reported by LGTM)
  issue #61: unused import (reported by LGTM)
  issue #61: unused variable (reported by LGTM)
  issue #61: unused import (reported by LGTM)
  issue #61: fix bare except (reported by LGTM)
  issue #61: unused variable (reported by LGTM)
  issue #61: remove duplicate method (reported by LGTM)
  issue #61: add missing close() implementation (reported by LGTM)
  issue #61: add inverse comparison (reported by LGTM)
  issue #61: remove duplicated method (reported by LGTM)
  issue #424: ansible: make put_file() raise AnsibleFileNotFound
  issue #407: update Changelog.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants