Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ new ] detect changes using sha256 rather than timestamps #1535

Merged
merged 2 commits into from
Jun 16, 2021

Conversation

fabianhjr
Copy link
Contributor

@fabianhjr fabianhjr commented Jun 11, 2021

Closes #1519
Fixes #23 (modification time is not stored but read from the filesystem)

@fabianhjr fabianhjr force-pushed the hash-source-files branch from b1ee446 to 3c76ca1 Compare June 11, 2021 04:29
@fabianhjr
Copy link
Contributor Author

I am blocked by Windows, if someone is a Windows dev and could help with the sha256 function it would be appreciated.

@edwinb
Copy link
Collaborator

edwinb commented Jun 11, 2021

Something I often do while working on Idris is touch a file to force it to rebuild. Or, just resave it. This probably isn't a common thing to want to do, but even so, what should I do instead now? I guess remove the relevant ttc?

@fabianhjr
Copy link
Contributor Author

fabianhjr commented Jun 11, 2021

Hi @edwinb, an easier option than hunting down the idris ttc file to might be to add a blank line or space in the file you are working on. I did some testing on my fork and got this time differences:

image
image
Racket Self-hosting:
image
image

Comparing this runs:

Since its intention is to reduce friction to contributing by speeding up the CI (Particularly time to first error and completion time) if it causes more inconvenience on the coding stage it would be counterproductive to the intention.

@andorp
Copy link
Contributor

andorp commented Jun 11, 2021

What if we hide one behind an arg?

@fabianhjr
Copy link
Contributor Author

fabianhjr commented Jun 11, 2021

Yes, a flag is also a possibility.

I have no preference as a default so to keep backwards compatibility the sha256 based one could be the one that needed a toggle. Any preference for a flag? (Eg, -Xcheck-hash)

@benhormann
Copy link
Contributor

Quoting and escaping is different on windows, thanks to cmd.exe. But if the paths are relative you don't need to.

Isn't SHA2 overkill? Would've thought MD5 or even CRC is good for this job.
Only cksum is standard. Even macOS and BSD have different names for SHA. Could upgrade to a FFI call later.

@fabianhjr
Copy link
Contributor Author

Isn't SHA2 overkill? Would've thought MD5 or even CRC is good for this job.

MD5, SHA-1, and SHA-2 (256 to 512) are in the same magnitude of performance: https://automationrhapsody.com/md5-sha-1-sha-256-sha-512-speed-performance/ (In the comments there is even some pointers to get SHA-256 to be faster than MD5 and SHA-1)

I don't want to risk collisions (either accidental or as an intentional attack) so CRC is out.

Only cksum is standard.

sha256sum is on the same standard (corutils) as cksum ( https://www.gnu.org/software/coreutils/manual/coreutils.html#sha2-utilities ) and comes preinstalled on most UNIX based systems. (BSDs, macOS, Linux, etc)

Quoting and escaping is different on windows, thanks to cmd.exe. But if the paths are relative you don't need to.

Oh ok, then it should be easy to fix and even windows has sha256sum preinstalled so it seems like this could get to a mergeable state :3

Could upgrade to a FFI call later.

Yes that would probably help with the slight performance penalty of launching trough a shell and piping console IO. (And the path escaping or not)

@benhormann
Copy link
Contributor

... comes preinstalled on most UNIX based systems. (BSDs, macOS, Linux, etc)

But it doesn't, only Linux and MSYS. Neither is gmp.
Turns out shasum is implemented in Perl on macOS.
I'll pipe down now.

@fabianhjr fabianhjr force-pushed the hash-source-files branch 5 times, most recently from f883375 to 325ac32 Compare June 13, 2021 21:43
@fabianhjr
Copy link
Contributor Author

Ok, this new behavior is now behind an experimental flag (-Xcheck-hashes)

There is still a need for a windows Idris-dev since I don't have a windows machine and removing the escape didn't solve the issue. (I could leave the windows version unimplemented and leave a constant string in place but that would be less than ideal. UnU)

@fabianhjr fabianhjr force-pushed the hash-source-files branch 2 times, most recently from 37fb169 to b7c63b9 Compare June 13, 2021 22:26
@fabianhjr fabianhjr changed the title Use sha256 instead of modification time to determine if module needs rebuilding Implement flag to use sha256 instead of modification time to determine if a module needs rebuilding; use this to speed up CI Jun 14, 2021
@fabianhjr fabianhjr changed the title Implement flag to use sha256 instead of modification time to determine if a module needs rebuilding; use this to speed up CI Implement flag to use sha256 instead of modification time to determine if a module needs rebuilding; use this to speed up CI with a cache Jun 14, 2021
@fabianhjr fabianhjr force-pushed the hash-source-files branch 2 times, most recently from b714332 to a114d09 Compare June 14, 2021 01:23
@benhormann
Copy link
Contributor

TL;DR popen "certutil -hashfile \"" ++ fileName ++ "\""

Example output:

SHA1 hash of file C:\Windows\System32\cmd.exe:
0f 3c 4f f2 8f 35 4a ed e2 02 d5 4e 9d 1c 55 29 a3 bf 87 d8
CertUtil: -hashfile command completed successfully.

Assumes no % in fileName. Can append " SHA256" to upgrade. certutil -hashfile


Or if not quoting, here are my best guesses (untested):

/bin/sh (ref: https://unix.stackexchange.com/a/270979)

escapeUnquotedSh s = pack $ foldr escape [] $ unpack s
  where
    special : List Char
    special = unpack ";<=>?@[\]^`{|}~"
    escape : Char -> List Char -> List Char
    escape c cs = if c < '+' || isInfixOf [c] special
     then '\\' :: c :: cs 
     else c :: cs

/bin/sh

escapeQuotedSh s = pack $ foldr escape [] $ unpack s
  where
    escape : Char -> List Char -> List Char
    escape '\"' cs = '\\' :: '\"' :: cs
    escape '\\' cs = '\\' :: '\\' :: cs
    escape '\$' cs = '\\' :: '\$' :: cs
    escape '\`' cs = '\\' :: '\`' :: cs
    escape c    cs = c :: cs

C:\Windows\System32\cmd.exe (ref: https://ss64.com/nt/syntax-esc.html)

escapeUnquotedCmd s = pack $ foldr escape [] $ unpack s
  where
    escape : Char -> List Char -> List Char
    escape '^' cs = '^' :: '^' :: cs
    escape '%' cs = '^' :: '%' :: cs
    escape '!' cs = '^' :: '!' :: cs
    escape '*' cs = '^' :: '*' :: cs
    escape '?' cs = '^' :: '?' :: cs
    escape ';' cs = '^' :: ';' :: cs
    escape '&' cs = '^' :: '&' :: cs -- illegal in paths
    escape '|' cs = '^' :: '|' :: cs -- illegal in paths
    escape '<' cs = '^' :: '<' :: cs -- illegal in paths
    escape '>' cs = '^' :: '>' :: cs -- illegal in paths
    escape ' ' cs = '^' :: ' ' :: cs
    escape '\n' cs = '^' :: '\n' :: cs
    escape '\t' cs = '^' :: '\t' :: cs
    escape c cs = c :: cs

For paths, could simply drop the illegal chars.

/usr/local/bin/idris2

escapeString s = pack $ foldr escape [] $ unpack s
  where
    escape : Char -> List Char -> List Char
    escape '\"' cs = '\\' :: '\"' :: cs
    escape '\\' cs = '\\' :: '\\' :: cs
    escape c  cs = c :: cs

Or, for Multi-line / Single-line literals:

escapeString ml s = pack $ foldr escape ml [] $ unpack s
  where
    escape : Bool -> Char -> List Char -> List Char
    escape False '\n' cs = '\\' :: 'n'  :: cs
    escape False '\r' cs = '\\' :: 'r'  :: cs
    escape _     '\"' cs = '\\' :: '\"' :: cs
    escape _     '\\' cs = '\\' :: '\\' :: cs
    escape _     c    cs = c :: cs

Do we actually need escapeString?

@fabianhjr
Copy link
Contributor Author

Thanks @benhormann will use what you documented to rewrite the escapes and select the more complete ones for Windows Unquoted and POSIX/*NIX Unquoted.

@benhormann
Copy link
Contributor

@fabianhjr
C:\> vim ab^ cd ef results in three files to edit...
C:\> vim "ab cd" ef for two files.

Escaping doesn't work for args, but does for > redirection. Quoting make more sense for file paths anyway.
Even for sh, not many people are going to use a path with $\"`. It'd break the makefiles too.

@fabianhjr fabianhjr force-pushed the hash-source-files branch 5 times, most recently from e04afaf to 6bed1bf Compare June 14, 2021 21:39
@fabianhjr fabianhjr force-pushed the hash-source-files branch from 6bed1bf to 607117f Compare June 14, 2021 21:52
@fabianhjr fabianhjr marked this pull request as ready for review June 14, 2021 21:58
@fabianhjr fabianhjr changed the title Implement flag to use sha256 instead of modification time to determine if a module needs rebuilding; use this to speed up CI with a cache Implement flag to use sha256 instead of modification time to determine if a module needs rebuilding Jun 14, 2021
@benhormann
Copy link
Contributor

@fabianhjr Escaping doesn't work inside quotes. Remove escapeStringWindows.

fabianhjr and others added 2 commits June 15, 2021 09:41
Allow checking for content hashes instead of modification time to
determine if a module needs rebuilding. This can be toggled with the
`-Xcheck-hashes` flag.

Windows and escaping help by Ben Hormann

Co-authored-by: Ben Hormann <benhormann@users.noreply.github.com>
@fabianhjr fabianhjr force-pushed the hash-source-files branch 2 times, most recently from da9e50a to 7ad045c Compare June 15, 2021 15:23
@fabianhjr
Copy link
Contributor Author

@fabianhjr Escaping doesn't work inside quotes. Remove escapeStringWindows.

Removed, thanks for the help. osEscape now sends Windows to id and other to escapeUnix (\ and " with \ as escape char)

@fabianhjr
Copy link
Contributor Author

fabianhjr commented Jun 15, 2021

Hi @edwinb, @gallais, I would appreciate comment on the high level design ( usage of -X flag or any change that might be desired)

@benhormann
Copy link
Contributor

Any chance of -Xcheck-hashes optionally taking a command name? E.g.

  • idris2 -Xcheck-hashes --build ...
  • idris2 -Xcheck-hashes gsha256sum --build ...

@fabianhjr
Copy link
Contributor Author

fabianhjr commented Jun 16, 2021

Don't think so but on Linux I would use a symbolic link.

I have no way to validate the windows behaviour other than the CI. :C

On the other hand that would be a great PR/contribution to improve the windows experience.

EDIT: If this gets merged I can help you implement it but I don't feel confident increasing the complexity of this PR and keeping it mergable.

@gallais gallais changed the title Implement flag to use sha256 instead of modification time to determine if a module needs rebuilding [ new ] detect changes using sha256 rather than timestamps Jun 16, 2021
@gallais gallais merged commit e0e8403 into idris-lang:master Jun 16, 2021
@fabianhjr
Copy link
Contributor Author

Oh, :o great to have this merged. @benhormann I can open a new PR for the windows certutil option or if you prefer I can help you with that PR. (Both as an explicit option or maybe as a fallback for windows)

@benhormann
Copy link
Contributor

Don't think so but on Linux I would use a symbolic link.

Same for macOS or BSD. Outside of MSYS, you'd find an exe to put on your PATH.

but I don't feel confident increasing the complexity of this PR and keeping it mergable.

Fair call.


R.I.P. ⚰️ 67 /196: Building IdrisPaths (src/IdrisPaths.idr) as a progress indicator. Good job 👍

@fabianhjr fabianhjr deleted the hash-source-files branch June 16, 2021 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants