Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'clean' filter fails on small files #116

Closed
ljm42 opened this issue Feb 27, 2021 · 9 comments
Closed

'clean' filter fails on small files #116

ljm42 opened this issue Feb 27, 2021 · 9 comments

Comments

@ljm42
Copy link
Contributor

ljm42 commented Feb 27, 2021

If you try to encrypt a small file (less than 8 chars) it fails:

# git add <filename>
fatal: <filename>: clean filter 'crypt' failed

The problem is with the read command in git_clean():

read -rn 8 firstbytes <"$tempfile"

One solution is to add an or statement like this:

  read -rn 8 firstbytes <"$tempfile" || firstbytes=X
jmurty added a commit that referenced this issue Mar 1, 2021
The `read -rn 8` command doesn't seem to cope with small files, and
does all kinds of strange things when I try running it manually on
latest macOS to test, including:
- not reading anything at all from files with short first lines
- reading too much from files with long first lines (more than 8 chars)

Overall the `head -c8` approach used everywhere else to do this work
is probably safer, and should work with small files as well.
@jmurty
Copy link
Collaborator

jmurty commented Mar 1, 2021

Hi @ljm42 can you please test the change in PR #117 to see if it fixes the problem with small files?

@ljm42
Copy link
Contributor Author

ljm42 commented Mar 1, 2021

This is great! I really appreciate you looking into this, and for your work on #114 as well!

My dataset includes txz files, which are essentially zips that have been renamed. And apparently they contain null bytes in the first 8 chars, because they cause this line:
firstbytes=$(head -c8 "$tempfile")
to throw this error:
transcrypt: line 134: warning: command substitution: ignored null byte in input
(ignore the line number, I added some debug code to figure out which files were causing the error)

I solved it by converting null bytes to spaces:
firstbytes=$(head -c8 "$tempfile" | tr '\0' ' ')
and now git_clean() works on both small files and zip files.

It looks like all of the encryption unit tests are based on the string My secret content. Would it make sense to include basic tests for other variations such such as a string that is less than 8 characters, and a super small zip file? Just to ensure that these types of files will work in future versions?

@jmurty
Copy link
Collaborator

jmurty commented Apr 26, 2021

Hi @ljm42 I have updated the PR #117 with a couple of extra tests for small files and files with a null byte near the start.

Can you check that the fix-clean-filter-for-small-files branch works for your problem files? If so, I should be able to merge what is a fairly minor change.

@ljm42
Copy link
Contributor Author

ljm42 commented Apr 26, 2021

Sorry, #117 fix-clean-filter-for-small-files branch still throws warnings with txz files (although I think it does actually encrypt them). Here is a test with a small txz file (rather than the string from the new test sh\0x):

# echo "sample text" > secret.txt
# tar cfJ archive.txz secret.txt
# rm secret.txt
# echo "archive.txz filter=crypt diff=crypt merge=crypt" >> .gitattributes
# git add .gitattributes archive.txz
.git/crypt/transcrypt: line 129: warning: command substitution: ignored null byte in input

Running line 129 directly results in the same:

# firstbytes=$(head -c8 archive.txz)
-bash: warning: command substitution: ignored null byte in input

With this modification there are no more warnings:
# firstbytes=$(head -c8 archive.txz | tr '\0' ' ')

I am not sure why the string sh\0x does not throw the same warning

@jmurty
Copy link
Collaborator

jmurty commented Apr 28, 2021

Thanks for the continued test feedback. This one is tricky.

I haven't been able to reproduce the warning message on my Mac, so I Googled and it looks like that warning is new in Bash version 4.4: https://askubuntu.com/questions/926626/how-do-i-fix-warning-command-substitution-ignored-null-byte-in-input

Unfortunately, while adding the simple tr command might work for Bash 4.4 and later it doesn't work for the version on my up-to-date Mac 3.2.57(1):

$ firstbytes=$(head -c8 archive.txz | tr '\0' ' ')
tr: Illegal byte sequence

So it looks like new bash complains about null read by head but not tr, while older bash is fine reading other (non-null) bytes in head but can fail if you pass them to tr depending on your shell's character encoding settings. Specifically for this test case, the byte with octal character code 375 breaks tr for my older bash in my shell with LANG=en_GB.UTF-8

The only workaround I have found so far is this hack with an explicit LC_ALL=C to override any default character encoding settings. It's ugly, but might be the best we can do:

firstbytes=$(head -c8 archive.txz | LC_ALL=C tr -d '\0')

Notice also that tr deletes instead of substituting nulls.

@jmurty
Copy link
Collaborator

jmurty commented Apr 28, 2021

I have applied the LC_ALL=C tr -d '\0' fix on the fix-clean-filter-for-small-files branch, can you please try it again @ljm42?

@ljm42
Copy link
Contributor Author

ljm42 commented May 5, 2021

So sorry for the delayed response! This fix works great for me. I'm impressed with how many variations of bash/etc that you are able to handle!

There are two other instances of "head -c8" in transcrypt, do they need need this as well?

jmurty added a commit that referenced this issue Jan 15, 2022
The `read -rn 8` command doesn't seem to cope with small files, and
does all kinds of strange things (on macOS at least) including:
- not reading anything at all from files with short first lines
- reading too much from files with long first lines (more than 8 chars)

Switch to us `head -c8` instead, which is safer, plus some fiddly
use of `tr` to ignore null bytes early in a way that works for older
and newer versions of bash.
@jmurty
Copy link
Collaborator

jmurty commented Jan 15, 2022

I'm very late getting to this, sorry.

It's possible the other uses head -c8 are problematic, but since you haven't seen any definite problems for the unusual files in this ticket I'm reluctant to change them just in case.

For now I'll leave the minimal change as it is, and finally get it merged to main via #117

@jmurty jmurty closed this as completed Jan 15, 2022
jmurty added a commit that referenced this issue Jan 15, 2022
# Via GitHub
* main:
  Fix handling of small files and files with null in first 8 bytes (#116)
  Improve command hint to fix secret files not encrypted in index (#120) (#130)

# Conflicts:
#	transcrypt
@ljm42
Copy link
Contributor Author

ljm42 commented Jan 27, 2022

Awesome, thank you very much!

jmurty added a commit that referenced this issue Oct 15, 2022
# By James Murty (18) and others
# Via GitHub (1) and James Murty (1)
* main: (26 commits)
  Centralise load and save of password into functions #141
  Fix date of 2.2.0 release
  Ensure tests use "main" as default branch name #143
  Use OpenSSL for B64 encoding not `base64` which differs between Linux and Mac #140
  Use core attributesFile from worktree (#137)
  Document `xxd` requirement, and make optional with OpenSSL < 3 (#138)
  Prepare for 2.2.0 release
  Fix when using OpenSSL 3 which no longer embeds salt in output (#135)
  Consolidate all git operation scripts into a single transcrypt script
  Fix handling of small files and files with null in first 8 bytes (#116)
  Improve command hint to fix secret files not encrypted in index (#120) (#130)
  Remove Ubuntu 16.04 LTS from test matrix (#123)
  Configure default Git branch name for macOS tests in GitHub
  Handle rename of primary branch from "master" to "main"
  Ensure Git index is up-to-date before dirty repo  check #37 (#109)
  Fix incorrect salt when partially staged files are commited (#119)
  Use shorthand for grep options for broader compatibility (#121)
  Let user set a custom path to openssl #108
  Install entire transcrypt script into repository
  Change version to indicate development "pre-release" status
  ...

# Conflicts:
#	README.md
#	tests/_test_helper.bash
#	tests/test_cleanup.bats
#	tests/test_crypt.bats
#	tests/test_init.bats
#	tests/test_not_inited.bats
#	transcrypt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants