Pillar does not handle Unicode data #3436

Closed
madduck opened this Issue Jan 25, 2013 · 10 comments

@madduck
Contributor

madduck commented Jan 25, 2013

My pillar data is supplied by cmd_yaml, and some of it contains non-ASCII characters (UTF-8). Using pdb and printf-style debugging, I have verified that the ext_pillar function in salt/pillar/cmd_yaml.py returns those data properly as unicode() objects.

I want to use these data in a Jinja2 template. Unfortunately (again verified with pdb and printf-style debugging), the data passed as "Context" to the Jinja2 processor are no longer unicode objects.

This means that somewhere within Salt, between ext_pillar and instantiating a Jinja2 template as part of file.managed, the data are converted from unicode to plain byte strings without any conversion error, so they end up squashed to an ASCII representation. Indeed, the character '…' is stored as "\xe2\x80\xa6", not as "\u2026" as it should be.

morganfainberg Jan 25, 2013

Contributor

This appears to be related to msgpack and how it handles unicode objects. There is an active thread on GitHub: msgpack/msgpack#121

Since the data is msgpacked on the master (dumps) before being sent to the minions and then unpacked (loads), the unicode data type is lost and we get a byte string back. I am not sure what the best approach to fixing this problem is.

I think the wrong approach would be decoding everything as UTF-8: it is sub-optimal and requires running through the data structure looking for any strings to decode.

@thatch45, do you have any insight on the best approach? Maybe build a map that tracks any unicode items in pillar (and possibly elsewhere) and then decode just those on the minion's side. The downside is that this (again) requires running through all the data to find the unicode items and build the mapping.

Confirmed via the Python CLI:

>>> import msgpack
>>> before_msgpack = {'INT': 1, 'str': 'STRING', 'UNICODE': u'\u2026'}
>>> before_msgpack
{'INT': 1, 'str': 'STRING', 'UNICODE': u'\u2026'}
>>> packed = msgpack.dumps(before_msgpack)
>>> after_msgpack = msgpack.loads(packed)
>>> after_msgpack
{'INT': 1, 'str': 'STRING', 'UNICODE': '\xe2\x80\xa6'}
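
For illustration, the "run through the data structure" approach discussed in this comment could look roughly like the sketch below. It is written for Python 3, where bytes plays the role that str played in Python 2. decode_nested is a hypothetical helper name, not something Salt or msgpack provides, and the sketch assumes every byte string in the structure is valid UTF-8.

```python
def decode_nested(data, encoding="utf-8"):
    """Recursively decode every byte string found in nested containers.

    Hypothetical helper for illustration only; not part of Salt or msgpack.
    """
    if isinstance(data, bytes):
        return data.decode(encoding)
    if isinstance(data, dict):
        # Decode both keys and values.
        return {decode_nested(k, encoding): decode_nested(v, encoding)
                for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        # Rebuild the same container type with decoded items.
        return type(data)(decode_nested(item, encoding) for item in data)
    # Integers, floats, None and text strings pass through unchanged.
    return data


# Shape of the data after a round trip that lost the text type.
after_roundtrip = {b"INT": 1, b"str": b"STRING", b"UNICODE": b"\xe2\x80\xa6"}
decoded = decode_nested(after_roundtrip)
```

The full traversal on every load is exactly the cost raised as a downside above.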

thatch45 Jan 26, 2013

Member

I am less than excited about this one; I will contact the message pack guys....

torhve Feb 28, 2013

Contributor

Would not one solution to this problem be to force everything in Salt to be UTF-8?

This is how it works now:

>>> msgpack.loads(msgpack.dumps([1, u'Ø', 'ascii']))
(1, '\xc3\x98', 'ascii')

With the encoding forced, the data would look like this:

>>> msgpack.loads(msgpack.dumps([1, u'Ø', 'ascii']), encoding='UTF-8')
(1, u'\xd8', u'ascii')
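
The two results differ only in whether the bytes get decoded. A minimal sketch using only the standard library shows how the two representations relate. It uses Python 3 syntax, so str here corresponds to Python 2's unicode and bytes to Python 2's str; the variable names are made up for illustration.

```python
# 'Ø' is U+00D8; its UTF-8 encoding is the two bytes 0xC3 0x98.
text = "\u00d8"
raw = text.encode("utf-8")      # the byte form seen without an encoding argument
restored = raw.decode("utf-8")  # decoding as UTF-8 recovers the original text

# The ellipsis from the original report behaves the same way.
ellipsis_raw = "\u2026".encode("utf-8")
```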

torhve Mar 7, 2013

Contributor

Ran into a problem again today with this bug.
@thatch45 is this on your radar?

sebw Jul 1, 2013

Contributor

I'm managing my DNS authoritative servers through Salt.

Since July 11 .be domain names can contain accents such as é à è, etc.

The pillar :

dns-public:
  master:
    ns01
  master-ip:
    "x.x.x.x;"
  slave-ip:
    "x.x.x.x;"
  allow-transfer:
    "x.x.x.x;"
  domain:
    - "\u00e9xample.org"

The state :

{% if pillar['dns-public']['master'] == grains['nodename'] %}
{% for domain in pillar['dns-public']['domain'] %}
/var/named/data/{{ domain }}.hosts:
  file:
    - managed
    - source: salt://dns-public/template.hosts
    - user: named
    - group: named
    - mode: 0664
    - template: jinja
    - backup: minion
    - replace: False
    - context:
        domain: {{ domain }}
        serial: {{ 2010123101 }}
    - require:
      - file: /var/named/data
{% endfor %}
{% endif %}

The source :

$ORIGIN .
$TTL 600
{{ domain }}      IN      SOA     ns01.x. dnsadmin.x. (
                        {{ serial }}
                        10800
                        3600
                        604800
                        86400 )
                        NS ns01.x.
                        NS ns02.x.
                        A x.x.x.x
www.{{ domain }}. IN CNAME {{ domain }}.

salt 'ns*' pillar.data

dns-public:
        ----------
        allow-transfer:
            x.x.x.x;
        domain:
            - éxample.org

salt 'ns*' state.highstate :

   State: - file
    Name:      /etc/named/domain.conf
    Function:  managed
        Result:    False
        Comment:   Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/utils/templates.py", line 55, in render_tmpl
    output = render_str(tmplstr, context, tmplpath)
  File "/usr/lib/python2.6/site-packages/salt/utils/templates.py", line 98, in render_jinja_tmpl
    output = jinja_env.from_string(tmplstr).render(**context)
  File "/usr/lib64/python2.6/site-packages/jinja2/environment.py", line 669, in render
    return self.environment.handle_exception(exc_info, True)
  File "<template>", line 11, in top-level template code
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
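
The message is what an ASCII decode of the UTF-8 bytes for 'é' produces. The sketch below reproduces it explicitly with the standard library only (Python 3 syntax). In the traceback above the decode presumably happens implicitly, when Python 2 mixes the byte string with unicode template text; that is an assumption about the cause, not something the traceback itself states.

```python
# UTF-8 bytes for the pillar value "\u00e9xample.org"; 'é' becomes 0xC3 0xA9.
domain_bytes = "\u00e9xample.org".encode("utf-8")

try:
    # ASCII cannot represent 0xC3, so this fails on the very first byte.
    domain_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
else:
    message = None

# Decoding with the codec the bytes were written in succeeds.
domain_text = domain_bytes.decode("utf-8")
```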

alexmorozov Oct 31, 2014

Maybe I'm posting in the wrong thread, but this issue comes up whenever I google for pillar and unicode problems. Anyway.
The current workaround (as of salt 2014.01.13) is to use the yaml_utf8 setting in the master config and to force unicode conversion of variables, like {{ pillar.unicode.variable.decode('utf-8') }}. At least it works for us.
Hope this helps someone.

basepi Oct 31, 2014

Collaborator

Awesome, thanks for the workaround, @alexmorozov!

MrMarvin added a commit to sinnerschrader/salt-formula that referenced this issue Nov 9, 2014

@MrMarvin MrMarvin referenced this issue in saltstack-formulas/salt-formula Nov 9, 2014

Merged

adds `yaml_utf8` option to master config #63

godymoon Oct 10, 2015

So, does the {{ pillar.unicode.variable.decode('utf-8') }} workaround work, @basepi? How do I use {{ pillar.unicode.variable.decode('utf-8') }} in SLS files? Can you show me an example configuration? Thanks a lot, @alexmorozov.

bernieke added a commit to Awingu/salt that referenced this issue Oct 20, 2015

bernieke Oct 20, 2015

Contributor

I've created a pull request that fixes this without the need for yaml_utf8 or decode (on top of 2015.8.1 and whatever fixes that already carries).

cachedout added a commit that referenced this issue Oct 20, 2015

Merge pull request #28134 from Awingu/2015.8
fix unicode pillar values #3436

basepi Oct 20, 2015

Collaborator

Awesome @bernieke! That fix has been merged, so I'm going to close this. It will be in 2015.8.2.
