New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get_encoding salt util #47581

Merged
merged 9 commits into from May 25, 2018

Conversation

Projects
None yet
4 participants
@twangboy
Contributor

twangboy commented May 9, 2018

What does this PR do?

Created the get_encoding function salt.util.files to try to detect the encoding of the file.
Uses io.open to open the file with the proper encoding.
Writes the lines to the new file using the proper encoding.

What issues does this PR fix or reference?

#47274

Previous Behavior

Some files were being incorrectly detected as binary files by the is_binary function (utf-16 encoded files for example). Additionally, once they were properly detected as not binary, the read line function would incorreclty parse the lines ignoring a trailing null byte (\x00) causing the file to become corrupted.

New Behavior

Detects the encoding of the file and opens the file with the detected encoding so that the read line function works correctly.

Tests written?

No

Commits signed with GPG?

Yes

@cachedout cachedout requested review from terminalmage and damon-atkins May 10, 2018

@damon-atkins

This comment has been minimized.

Member

damon-atkins commented May 10, 2018

I suggest a function which returns the file information.
e.g.

{
open_type: <can be used by open>
type: <text string>
eol: <end of line type>
size: <int>
}

Or salt/utils/files contains a "open" class which manages files open, backups, read, write, eol char, etc. It just takes care of it all.

@cachedout cachedout changed the title from Add is_encoding salt util to Add is_encoding salt util [WIP] May 10, 2018

@rallytime

This comment has been minimized.

Contributor

rallytime commented May 15, 2018

@twangboy This is causing the following tests to fail:

  • unit.modules.test_file.FileBlockReplaceTestCase.test_backup
  • unit.modules.test_file.FileBlockReplaceTestCase.test_dry_run
  • unit.modules.test_file.FileBlockReplaceTestCase.test_no_modifications
  • unit.modules.test_file.FileBlockReplaceTestCase.test_replace_append
  • unit.modules.test_file.FileBlockReplaceTestCase.test_replace_append_newline_at_eof
  • unit.modules.test_file.FileBlockReplaceTestCase.test_replace_multiline
  • unit.modules.test_file.FileBlockReplaceTestCase.test_replace_partial_marked_lines
  • unit.modules.test_file.FileBlockReplaceTestCase.test_replace_prepend
  • unit.modules.test_file.FileBlockReplaceTestCase.test_show_changes
  • unit.modules.test_file.FileBlockReplaceTestCase.test_unfinished_block_exception
Args:
fp_ (pointer): A pointer to the file to check

This comment has been minimized.

@terminalmage

terminalmage May 16, 2018

Member

Shouldn't the arg name be path here? Also, there is no such type in Python as pointer, this should be str instead.

bool: True if successful, otherwise False
'''
if not os.path.isfile(path):
return False

This comment has been minimized.

@terminalmage

terminalmage May 16, 2018

Member

I feel like we should probably be attempting to read from the file and simply raising the exception here, but it looks like is_binary, et al have already established the pattern that we just return False in these cases, so I guess this is OK.

data = fp_.read(2048)
data.decode(encoding)
except UnicodeDecodeError:
return True

This comment has been minimized.

@terminalmage

terminalmage May 16, 2018

Member

Shouldn't False be returned here? The decode is failing, so that would suggest that the file is not in the specified encoding.

@twangboy twangboy changed the title from Add is_encoding salt util [WIP] to Add get_encoding salt util [WIP] May 16, 2018

@twangboy twangboy changed the title from Add get_encoding salt util [WIP] to Add get_encoding salt util May 16, 2018

twangboy added some commits May 9, 2018

Add is_encoding salt util
Use it to detect encoding in file.blockreplace
Remove to_encoding, create get_encoding
Use io.open so you can pass an encoding
@twangboy

This comment has been minimized.

Contributor

twangboy commented May 16, 2018

Rebased

@twangboy twangboy force-pushed the twangboy:fix_47274 branch from 51ba145 to c0f735d May 16, 2018

@damon-atkins

This comment has been minimized.

Member

damon-atkins commented May 16, 2018

Can not see any issue with the new function.

However the reason I suggest a function to return several bits of information. Is so the file examined once. For example to determine EOL the last part of the file is best place to read (e.g. last two characters of the file). If the file is under 2k then last part of the file has already been read.

{
open_type: <can be used by open>
type: <text string>
eol: <end of line type>
size: <int>
etc
}
@twangboy

This comment has been minimized.

Contributor

twangboy commented May 16, 2018

@damon-atkins Could you explain what you mean by open_type and type?

@twangboy

This comment has been minimized.

Contributor

twangboy commented May 16, 2018

The following tests are failing:
integration.states.test_file.BlockreplaceTest.test_append_auto_line_separator
integration.states.test_file.BlockreplaceTest.test_prepend_auto_line_separator
unit.utils.test_schedule.ScheduleTestCase.test_eval_schedule_cron_splay
unit.utils.test_schedule.ScheduleTestCase.test_eval_schedule_cron
unit.modules.test_file.FileBlockReplaceTestCase.test_backup
unit.modules.test_file.FileBlockReplaceTestCase.test_dry_run
unit.modules.test_file.FileBlockReplaceTestCase.test_no_modifications
unit.modules.test_file.FileBlockReplaceTestCase.test_replace_append
unit.modules.test_file.FileBlockReplaceTestCase.test_replace_append_newline_at_eof
unit.modules.test_file.FileBlockReplaceTestCase.test_replace_multiline
unit.modules.test_file.FileBlockReplaceTestCase.test_replace_partial_marked_lines
unit.modules.test_file.FileBlockReplaceTestCase.test_replace_prepend
unit.modules.test_file.FileBlockReplaceTestCase.test_show_changes
unit.modules.test_file.FileBlockReplaceTestCase.test_unfinished_block_exception

@damon-atkins

This comment has been minimized.

Member

damon-atkins commented May 17, 2018

Could you explain what you mean by open_type and type?

Basically open_type information on how python open should open the file, or params for python open to use. e.g. open in text or binary mode or some thing else. type would be binary or text or text-bom or something along these lines.

The orginal orginal idea I had was to have a utils.filemgt (manage files) class.

  • opens the file and determines determines its characteristics, and cache this info. e.g. EOL etc. So it ensures the file structure is not changes i.e. if it has a BOM head the BOM head is kept even on Unix/Linux
  • manages disk full issues
  • manages not replacing the file if it has not changed
  • keeps a record of all files open for change but not changed
  • keeps a record of all files changed.
  • backup files if they are changed into backup location.
  • checksum function which if called checksum is cached so no need to read the file again if some other part of salt wants the information.
  • determines the best buffer size for the file, based on its size.
  • determines if to use memmap if its small file.
  • when the class close a temp file for update, it replaces target file if its different.
  • support "change" over many calls. Example would be sed like calls, or many append like calls.
  • etc.
    i.e. try to pull most of the logic out of all the modules and states around dealing with different files so users of the class do not care about the structure of the text file as its taken care of it, for them.
@twangboy

This comment has been minimized.

Contributor

twangboy commented May 17, 2018

I think that's a cool idea, but it's a major change and doesn't relate to this issue directly.

@twangboy

This comment has been minimized.

Contributor

twangboy commented May 18, 2018

Now the tests are just failing on Py3... I'll start digging

@rallytime rallytime requested a review from terminalmage May 21, 2018

]
for _encoding, bom in boms:
if _data.startswith(bom):
log.debug('Found BOM for {0}'.format(_encoding))

This comment has been minimized.

@terminalmage

terminalmage May 21, 2018

Member

We should be using the %s string substitution in log messages:

log.debug('Found BOM for %s', _encoding)
'__utils__': {'files.is_text': MagicMock(return_value=True)},
'__utils__': {
'files.is_binary': MagicMock(return_value=False),
'files.get_encoding': MagicMock(return_value=None)

This comment has been minimized.

@terminalmage

terminalmage May 21, 2018

Member

You're mocking thiis to return None, but that function never actually returns None, might that cause problems in terms of the accuracy of our testing?

In addition, I'm not sure it's a great idea to mock these functions at the module level, rather than just patching in individual tests, unless a given function is used in a lot of tests and needs to be consistently mocked with the same result.

@rallytime rallytime merged commit da9eaa1 into saltstack:2018.3 May 25, 2018

5 of 9 checks passed

jenkins/PR/salt-pr-rs-cent7-n Pull Requests » Salt PR - RS CentOS 7 #19301 — ABORTED
Details
default Build finished.
Details
jenkins/PR/salt-pr-linode-cent7-py3 Pull Requests » Salt PR - Linode CentOS 7 - PY3 #5245 — FAILURE
Details
jenkins/PR/salt-pr-linode-ubuntu16-py3 Pull Requests » Salt PR - Linode Ubuntu16.04 - PY3 #10215 — FAILURE
Details
WIP ready for review
Details
jenkins/PR/salt-pr-clone Pull Requests » Salt PR - Clone #25435 — SUCCESS
Details
jenkins/PR/salt-pr-docs-n Pull Requests » Salt PR - Docs #17523 — SUCCESS
Details
jenkins/PR/salt-pr-linode-ubuntu14-n Pull Requests » Salt PR - Linode Ubuntu14.04 #23179 — SUCCESS
Details
jenkins/PR/salt-pr-lint-n Pull Requests » Salt PR - Code Lint #22153 — SUCCESS
Details

@twangboy twangboy deleted the twangboy:fix_47274 branch May 29, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment