Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2016.11] Salt Cloud: fix loading UTF-8 cached data #45459

Merged
merged 2 commits into from Jan 29, 2018
Merged

[2016.11] Salt Cloud: fix loading UTF-8 cached data #45459

merged 2 commits into from Jan 29, 2018

Conversation

vutny
Copy link
Contributor

@vutny vutny commented Jan 16, 2018

What does this PR do?

It makes sure that msgpack loads previously cached cloud event data in UTF-8. According to the API it always dumps the data in the utf-8 encoding, but reads as raw (read ASCII) by default.
https://msgpack-python.readthedocs.io/en/latest/api.html#msgpack.Packer
https://msgpack-python.readthedocs.io/en/latest/api.html#msgpack.Unpacker

What issues does this PR fix or reference?

Should finally fix #20823.

Previous Behavior

Salt Cloud was unable to compare cached data which contains unicode symbols with actual data read from an event.

New Behavior

Salt Cloud successfully loads and works with cached data stored in UTF-8.

Tests written?

No

Commits signed with GPG?

Yes

@vutny
Copy link
Contributor Author

vutny commented Jan 18, 2018

@rallytime Please could somebody review this stuff?

Copy link
Contributor

@damon-atkins damon-atkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand you say its the default, however for eases of reading please add encoding='utf-8' to the pack/dump side. The it looks consistent.

However what a bad idea for msgpack not to have the same default for in and out.

@@ -88,6 +88,8 @@
except ImportError:
HAS_GETPASS = False

MSGPACK_ENCODING = 'utf-8'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment required to support international characters in tags. for example.

@vutny
Copy link
Contributor Author

vutny commented Jan 18, 2018

@damon-atkins Thanks for your review! Yeah, that's very strange design in msgpack as for me too.

@damon-atkins
Copy link
Contributor

Posted to cloud channel for someone to test this out (saltstackcommunity.slack.com)

@rallytime rallytime requested a review from a team January 18, 2018 15:23
@@ -88,6 +88,10 @@
except ImportError:
HAS_GETPASS = False

# This is required to support international characters in AWS EC2 tags or any
# other kind of metadata provided by particular Cloud vendor.
MSGPACK_ENCODING = 'utf-8'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use __salt_system_encoding__ instead?

Copy link
Contributor Author

@vutny vutny Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gtmanfred I'm not sure, really.

In my understanding, the locale of the system that Salt is running on has nothing to do with any weird characters, which could potentially appear in a Cloud instance metadata. They could be filled in manually, or by another automation system. Anyhow, Salt Cloud should be able to write everything on disk and restore without loosing bits.
I think utf-8 is the best choice here and it seems that msgpack supports it well by default.

But if you think that it should be defined with __salt_system_encoding__, we probably need to update the docs that host (or user) running salt-cloud should have proper locale set up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In contrast, if any unicode symbol somehow appears in some instance metadata, and you would not set the correct locale on the Salt host, the salt-cloud would entirely fail with UnicodeDecodeError exception.
That's the case when it's really hard to figure out what you should do with that.

@gtmanfred Please just let me know what do think would be the better way to handle this?

Copy link
Contributor

@gtmanfred gtmanfred Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the systems locale is not setup to handle utf-8, we shouldn't handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. That makes sense. I will follow up with documentation update for Salt Cloud to cover necessary locale setup.

Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utf-8 is correct as this is what is coming from AWS. __salt_system_encoding__ is wrong as it could be anything and does not represent the results of an AWS REST API. The data must be stored in a Unicode compatible format so it can be store in variables within Salt as Unicode.

The salt cli is responsible to convert the output of any salt-cloud command to __salt_system_encoding__ from internal Unicode variables. (I suspect PY2 & PY3 already do this in the log library)

@terminalmage will be adding code to convert cli arguments to Unicode (PY2 only)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@damon-atkins Thank you for your support! I am not very familiar with Salt internals regarding of encodings stuff, so wasn't sure how the proper fix should be done.
Indeed, the Cloud API sends node data in unicode, and Salt just works with it pretty good. The only thing being broken is reading data from Salt's own cache if enabled.

@gtmanfred So please consider to accept this PR as is.

@vutny
Copy link
Contributor Author

vutny commented Jan 22, 2018

Rebased against latest changes in the 2016.11 branch.

Copy link
Contributor

@gtmanfred gtmanfred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still do not feel good about not using __salt_system_encoding__, because you could end up with data coming back into salt that cannot be used because the system encoding does not support it.

If people want to be able to use this data, they need to set their encoding to utf-8 anyway, otherwise it has a chance to break somewhere else, that won't be obvious what is at fault.

If you look at fileservers we use this for s3fs when pulling down possible unicode data.

It is fine if someone else wants to overrule me, but i think using the system encoding is correct.

@damon-atkins
Copy link
Contributor

hi @terminalmage could you drop by this PR.

@rallytime
Copy link
Contributor

I am in agreement with @gtmanfred on this one.

@damon-atkins
Copy link
Contributor

damon-atkins commented Jan 26, 2018

So if a Chines web pages is loaded, you wish to store it as US ASCII and expect to get the same contents back. mmm.

So if tags are mix of Chines, English, Japanese. And the salt-master (or cli) is LANG=C (__salt_system_encoding__) and you expect it to stored ok and be retrieved ok.

@damon-atkins
Copy link
Contributor

A recent update from
https://github.com/msgpack/msgpack-python
Deprecating encoding option encoding and unicode_errors options are deprecated.
In case of packer, use UTF-8 always. Storing other than UTF-8 is not recommended.

@gtmanfred
Copy link
Contributor

Can you provide me a link to where it actually says that?

Thanks,
Daniel

@damon-atkins
Copy link
Contributor

Copy link
Contributor

@gtmanfred gtmanfred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone that comes by later to see this, here is the actual line in the msgpack page.

https://github.com/msgpack/msgpack-python/blob/master/README.rst#deprecating-encoding-option

Since it is recommended by msgpack to use unicode.

Daniel

@vutny
Copy link
Contributor Author

vutny commented Jan 29, 2018

Thank you guys!

@rallytime rallytime merged commit cb6ce37 into saltstack:2016.11 Jan 29, 2018
@vutny vutny mentioned this pull request Jan 29, 2018
@vutny vutny deleted the salt-cloud-fix-loading-utf-cache branch January 29, 2018 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants