New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[2016.11] Salt Cloud: fix loading UTF-8 cached data #45459
[2016.11] Salt Cloud: fix loading UTF-8 cached data #45459
Conversation
@rallytime Please could somebody review this stuff? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you say its the default, however for eases of reading please add encoding='utf-8' to the pack/dump side. The it looks consistent.
However what a bad idea for msgpack not to have the same default for in and out.
salt/utils/cloud.py
Outdated
@@ -88,6 +88,8 @@ | |||
except ImportError: | |||
HAS_GETPASS = False | |||
|
|||
MSGPACK_ENCODING = 'utf-8' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment required to support international characters in tags. for example.
@damon-atkins Thanks for your review! Yeah, that's very strange design in msgpack as for me too. |
Posted to cloud channel for someone to test this out (saltstackcommunity.slack.com) |
@@ -88,6 +88,10 @@ | |||
except ImportError: | |||
HAS_GETPASS = False | |||
|
|||
# This is required to support international characters in AWS EC2 tags or any | |||
# other kind of metadata provided by particular Cloud vendor. | |||
MSGPACK_ENCODING = 'utf-8' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use __salt_system_encoding__
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gtmanfred I'm not sure, really.
In my understanding, the locale of the system that Salt is running on has nothing to do with any weird characters, which could potentially appear in a Cloud instance metadata. They could be filled in manually, or by another automation system. Anyhow, Salt Cloud should be able to write everything on disk and restore without loosing bits.
I think utf-8
is the best choice here and it seems that msgpack
supports it well by default.
But if you think that it should be defined with __salt_system_encoding__
, we probably need to update the docs that host (or user) running salt-cloud
should have proper locale set up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In contrast, if any unicode symbol somehow appears in some instance metadata, and you would not set the correct locale on the Salt host, the salt-cloud
would entirely fail with UnicodeDecodeError
exception.
That's the case when it's really hard to figure out what you should do with that.
@gtmanfred Please just let me know what do think would be the better way to handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the systems locale is not setup to handle utf-8, we shouldn't handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. That makes sense. I will follow up with documentation update for Salt Cloud to cover necessary locale setup.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utf-8 is correct as this is what is coming from AWS. __salt_system_encoding__
is wrong as it could be anything and does not represent the results of an AWS REST API. The data must be stored in a Unicode compatible format so it can be store in variables within Salt as Unicode.
The salt cli is responsible to convert the output of any salt-cloud command to __salt_system_encoding__
from internal Unicode variables. (I suspect PY2 & PY3 already do this in the log library)
@terminalmage will be adding code to convert cli arguments to Unicode (PY2 only)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@damon-atkins Thank you for your support! I am not very familiar with Salt internals regarding of encodings stuff, so wasn't sure how the proper fix should be done.
Indeed, the Cloud API sends node data in unicode, and Salt just works with it pretty good. The only thing being broken is reading data from Salt's own cache if enabled.
@gtmanfred So please consider to accept this PR as is.
Rebased against latest changes in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still do not feel good about not using __salt_system_encoding__
, because you could end up with data coming back into salt that cannot be used because the system encoding does not support it.
If people want to be able to use this data, they need to set their encoding to utf-8 anyway, otherwise it has a chance to break somewhere else, that won't be obvious what is at fault.
If you look at fileservers we use this for s3fs
when pulling down possible unicode data.
It is fine if someone else wants to overrule me, but i think using the system encoding is correct.
hi @terminalmage could you drop by this PR. |
I am in agreement with @gtmanfred on this one. |
So if a Chines web pages is loaded, you wish to store it as US ASCII and expect to get the same contents back. mmm. So if tags are mix of Chines, English, Japanese. And the salt-master (or cli) is LANG=C ( |
A recent update from |
Can you provide me a link to where it actually says that? Thanks, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For anyone that comes by later to see this, here is the actual line in the msgpack page.
https://github.com/msgpack/msgpack-python/blob/master/README.rst#deprecating-encoding-option
Since it is recommended by msgpack to use unicode.
Daniel
Thank you guys! |
What does this PR do?
It makes sure that msgpack loads previously cached cloud event data in UTF-8. According to the API it always dumps the data in the
utf-8
encoding, but reads as raw (read ASCII) by default.https://msgpack-python.readthedocs.io/en/latest/api.html#msgpack.Packer
https://msgpack-python.readthedocs.io/en/latest/api.html#msgpack.Unpacker
What issues does this PR fix or reference?
Should finally fix #20823.
Previous Behavior
Salt Cloud was unable to compare cached data which contains unicode symbols with actual data read from an event.
New Behavior
Salt Cloud successfully loads and works with cached data stored in UTF-8.
Tests written?
No
Commits signed with GPG?
Yes