-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(MODULES-7143) add UTF-8 encoding to ERB templates. #51
Conversation
| installed_path = get_fake_reboot_resource_install_path(usage = :manifest) | ||
|
|
||
| # Manifest | ||
| fake_name = "\u4388\u542B\u3D3C\u7F4D\uF961\u4381\u53F4\u79C0\u3AB2\u8EDE" # 䎈含㴼罍率䎁叴秀㪲軞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe all of these characters are the same byte width... can we instead use characters whose width vary?
The standard string we use in most places is
# different UTF-8 widths
# 1-byte A
# 2-byte ۿ - http://www.fileformat.info/info/unicode/char/06ff/index.htm - 0xDB 0xBF / 219 191
# 3-byte ᚠ - http://www.fileformat.info/info/unicode/char/16A0/index.htm - 0xE1 0x9A 0xA0 / 225 154 160
# 4-byte 𠜎 - http://www.fileformat.info/info/unicode/char/2070E/index.htm - 0xF0 0xA0 0x9C 0x8E / 240 160 156 142
MIXED_UTF8 = "A\u06FF\u16A0\u{2070E}" # Aۿᚠ𠜎See https://github.com/puppetlabs/puppet/blob/7e210c334f0231e961b699c663c3efd4b32373bf/spec/integration/faces/config_spec.rb#L13 for an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, we can use variable bit width here.
|
|
||
| step 'Verify Results' | ||
| # PuppetFakeResource always overwrites file at this path | ||
| on(agent, "cat /cygdrive/c/#{fake_name}", :acceptable_exit_codes => [0]) do |result| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a Powershell call with EncodedCommand here to avoid sending UTF-8 over SSH?
|
|
||
| step 'Verify Results' | ||
| # if this file exists, 'absent' didn't work | ||
| on(agent, "test -f /cygdrive/c/#{fake_name}", :acceptable_exit_codes => [1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a Powershell call with EncodedCommand here to avoid sending UTF-8 over SSH?
f2b9f78
to
e3be3d4
Compare
|
|
||
| step 'Verify Results' | ||
| # if this file exists, 'absent' didn't work | ||
| on(agent, powershell("cat C://#{fake_name}", 'EncodedCommand' => true), :acceptable_exit_codes => [1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure cat in PowerShell doesn't set an exit code since its an alias to a cmdlet ... you'll want to make sure the contents are empty and exit the command yourself I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also do a file count in the other example above, and verify there are 0 files by the given name.
| step 'Verify Results' | ||
| powershell_cmd = 'dir C:\\ | select Basename' | ||
| on(agent, powershell(powershell_cmd, 'EncodedCommand' => true)) do |result| | ||
| assert_match(/#{fake_name}/, result.stdout, 'File with correct UTF-8 characters was not present.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be giving you a (false?) positive result, but I'm not sure it's quite right. You want to pass the UTF8 over the wire to the host using EncodedCommand => true. Something like this (going from memory, verify that I didn't screw anything up):
powershell_cmd = "dir c:\\ | ? { $_.BaseName -eq '#{fake_name}' } | Measure-Object | Select Count"This guarantees that PowerShell is taking Unicode input and using that for a comparison. Because we can't trust what's coming back from the remote host, then the assertion becomes that you got 1 matching result. Something like:
assert_equal('1', result.stdout, 'File with correct UTF-8 characters was not present.')| agents.each do |agent| | ||
| uninstall_fake_reboot_resource(agent) | ||
| end | ||
| on(agents, "rm -rf /cygdrive/c/#{fake_name}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to do this a little differently to avoid the UTF-8 problem.
One solutions is to write these files to a randomized directory (ASCII characters only) and recursively delete it.
e3be3d4
to
758238f
Compare
|
|
||
| # Manifest | ||
| file_name = MIXED_UTF8 | ||
| file_path = (0...8).map { (65 + rand(26)).chr }.join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ this is fine, but SecureRandom.uuid does roughly the same thing (uuids are limited to [0-9A-F]) ;0
758238f
to
006efe1
Compare
| end | ||
|
|
||
| step 'Verify Results' | ||
| powershell_cmd = "Write-Host (Get-ChildItem C:\\#{file_path} | Measure-Object ).Count;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little pedantic, but if we want to make sure we're matching the file by name we intended to create, we'd want:
(Get-ChildItem C:\\#{file_path} -Filter '#{file_name}' | Measure-Object).CountYou may have run into errors doing something like Get-ChildItem "C:\\#{file_path}\\#{file_name}" but this will avoid that. Examples of executing code:
C:\source\> (Get-ChildItem .\locales\ -Filter 'config.foo' | Measure-Object).Count
0
C:\source\> (Get-ChildItem .\locales\ -Filter 'config.yaml' | Measure-Object).Count
1
With a randomized parent directory, there shouldn't be other files in there (unless there was a subtle test bug or a future test maintainer made a subtly incorrect change). I'm pointing this out for the specificity reason, but also because we don't need to use EncodedCommand with how you have the assert setup anymore. I think the -Filter is preferable, but if you feel strongly about the other, we should yank the EncodedCommand bit since file_path is only alpha-numeric.
| agents.each do |agent| | ||
| uninstall_fake_reboot_resource(agent) | ||
| end | ||
| require "pry"; binding.pry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call the zookeeper... we've got a wild pry on the loose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lulz
006efe1
to
584f3aa
Compare
| step 'Verify Results' | ||
| powershell_cmd = "'Write-Host (Get-ChildItem C:\\#{file_path} -Filter '#{file_name}' | Measure-Object ).Count;'" | ||
| on(agent, powershell(powershell_cmd)) do |result| | ||
| assert_match(/1/, result.stdout, 'File with correct UTF-8 characters was not present.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens with multi-line output (in a failing test case)? Do we need to use anchors on the string (or maybe just assert_equal and skip the regex)?
| # if the file count is greater than 0, 'absent' didn't work | ||
| powershell_cmd = "'Write-Host (Get-ChildItem C:\\#{file_path} -Filter '#{file_name}' | Measure-Object ).Count;'" | ||
| on(agent, powershell(powershell_cmd)) do |result| | ||
| assert_match(/0/, result.stdout, 'File with UTF-8 character name was not removed') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here regex vs assert_equal
Adds an additional acceptance test creating a file with UTF-8 name.
584f3aa
to
8fc7a41
Compare
This PR adds UTF-8 support for fields inside of custom dsc templates.
It does not support the template custom name itself being in UTF-8, but parameters passed to fields in the template can now be UTF-8.
Also added an integration test for this functionality.