Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(MODULES-7143) add UTF-8 encoding to ERB templates. #51

Merged
merged 1 commit into from
Jun 7, 2018

Conversation

ThoughtCrhyme
Copy link
Contributor

This PR adds UTF-8 support for fields inside of custom dsc templates.

It does not support the template custom name itself being in UTF-8, but parameters passed to fields in the template can now be UTF-8.

Also added an integration test for this functionality.

installed_path = get_fake_reboot_resource_install_path(usage = :manifest)

# Manifest
fake_name = "\u4388\u542B\u3D3C\u7F4D\uF961\u4381\u53F4\u79C0\u3AB2\u8EDE" # 䎈含㴼罍率䎁叴秀㪲軞
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe all of these characters are the same byte width... can we instead use characters whose width vary?

The standard string we use in most places is

# different UTF-8 widths
  # 1-byte A
  # 2-byte ۿ - http://www.fileformat.info/info/unicode/char/06ff/index.htm - 0xDB 0xBF / 219 191
  # 3-byte ᚠ - http://www.fileformat.info/info/unicode/char/16A0/index.htm - 0xE1 0x9A 0xA0 / 225 154 160
  # 4-byte 𠜎 - http://www.fileformat.info/info/unicode/char/2070E/index.htm - 0xF0 0xA0 0x9C 0x8E / 240 160 156 142
  MIXED_UTF8 = "A\u06FF\u16A0\u{2070E}" # Aۿᚠ𠜎

See https://github.com/puppetlabs/puppet/blob/7e210c334f0231e961b699c663c3efd4b32373bf/spec/integration/faces/config_spec.rb#L13 for an example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, we can use variable bit width here.


step 'Verify Results'
# PuppetFakeResource always overwrites file at this path
on(agent, "cat /cygdrive/c/#{fake_name}", :acceptable_exit_codes => [0]) do |result|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a Powershell call with EncodedCommand here to avoid sending UTF-8 over SSH?


step 'Verify Results'
# if this file exists, 'absent' didn't work
on(agent, "test -f /cygdrive/c/#{fake_name}", :acceptable_exit_codes => [1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a Powershell call with EncodedCommand here to avoid sending UTF-8 over SSH?

@ThoughtCrhyme ThoughtCrhyme force-pushed the MODULES-7143 branch 2 times, most recently from f2b9f78 to e3be3d4 Compare June 4, 2018 22:09

step 'Verify Results'
# if this file exists, 'absent' didn't work
on(agent, powershell("cat C://#{fake_name}", 'EncodedCommand' => true), :acceptable_exit_codes => [1])
Copy link
Contributor

@Iristyle Iristyle Jun 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure cat in PowerShell doesn't set an exit code since its an alias to a cmdlet ... you'll want to make sure the contents are empty and exit the command yourself I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also do a file count in the other example above, and verify there are 0 files by the given name.

step 'Verify Results'
powershell_cmd = 'dir C:\\ | select Basename'
on(agent, powershell(powershell_cmd, 'EncodedCommand' => true)) do |result|
assert_match(/#{fake_name}/, result.stdout, 'File with correct UTF-8 characters was not present.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be giving you a (false?) positive result, but I'm not sure it's quite right. You want to pass the UTF8 over the wire to the host using EncodedCommand => true. Something like this (going from memory, verify that I didn't screw anything up):

powershell_cmd = "dir c:\\ | ? { $_.BaseName -eq '#{fake_name}' } | Measure-Object | Select Count"

This guarantees that PowerShell is taking Unicode input and using that for a comparison. Because we can't trust what's coming back from the remote host, then the assertion becomes that you got 1 matching result. Something like:

assert_equal('1', result.stdout, 'File with correct UTF-8 characters was not present.')

agents.each do |agent|
uninstall_fake_reboot_resource(agent)
end
on(agents, "rm -rf /cygdrive/c/#{fake_name}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to do this a little differently to avoid the UTF-8 problem.

One solutions is to write these files to a randomized directory (ASCII characters only) and recursively delete it.


# Manifest
file_name = MIXED_UTF8
file_path = (0...8).map { (65 + rand(26)).chr }.join
Copy link
Contributor

@Iristyle Iristyle Jun 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ this is fine, but SecureRandom.uuid does roughly the same thing (uuids are limited to [0-9A-F]) ;0

end

step 'Verify Results'
powershell_cmd = "Write-Host (Get-ChildItem C:\\#{file_path} | Measure-Object ).Count;"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little pedantic, but if we want to make sure we're matching the file by name we intended to create, we'd want:

(Get-ChildItem C:\\#{file_path} -Filter '#{file_name}' | Measure-Object).Count

You may have run into errors doing something like Get-ChildItem "C:\\#{file_path}\\#{file_name}" but this will avoid that. Examples of executing code:

C:\source\> (Get-ChildItem .\locales\ -Filter 'config.foo' | Measure-Object).Count
0

C:\source\> (Get-ChildItem .\locales\ -Filter 'config.yaml' | Measure-Object).Count
1

With a randomized parent directory, there shouldn't be other files in there (unless there was a subtle test bug or a future test maintainer made a subtly incorrect change). I'm pointing this out for the specificity reason, but also because we don't need to use EncodedCommand with how you have the assert setup anymore. I think the -Filter is preferable, but if you feel strongly about the other, we should yank the EncodedCommand bit since file_path is only alpha-numeric.

agents.each do |agent|
uninstall_fake_reboot_resource(agent)
end
require "pry"; binding.pry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call the zookeeper... we've got a wild pry on the loose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lulz

step 'Verify Results'
powershell_cmd = "'Write-Host (Get-ChildItem C:\\#{file_path} -Filter '#{file_name}' | Measure-Object ).Count;'"
on(agent, powershell(powershell_cmd)) do |result|
assert_match(/1/, result.stdout, 'File with correct UTF-8 characters was not present.')
Copy link
Contributor

@Iristyle Iristyle Jun 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with multi-line output (in a failing test case)? Do we need to use anchors on the string (or maybe just assert_equal and skip the regex)?

# if the file count is greater than 0, 'absent' didn't work
powershell_cmd = "'Write-Host (Get-ChildItem C:\\#{file_path} -Filter '#{file_name}' | Measure-Object ).Count;'"
on(agent, powershell(powershell_cmd)) do |result|
assert_match(/0/, result.stdout, 'File with UTF-8 character name was not removed')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here regex vs assert_equal

Adds an additional acceptance test creating a file with UTF-8 name.
@ThoughtCrhyme ThoughtCrhyme merged commit efbbe8b into puppetlabs:master Jun 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants