cp (and copyFile) doesn't work on M1 macs when it overwrites a binary (upstream mac issue) #17913

Open
jyapayne opened this issue May 1, 2021 · 28 comments

Comments

@jyapayne
Contributor

jyapayne commented May 1, 2021

cc: @timotheecour

I pulled the latest from Nim devel this morning and tried to run build_all.sh like usual. This produced the failure below and left a broken nim binary, so nim can no longer be run at all.

Example

> ./build_all.sh
./build_all.sh: line 2: rem: command not found
bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff exists.

cp bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff bin/nim

bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff -v
Nim Compiler Version 1.0.11 [MacOSX: arm64]
Compiled at 2020-12-21
Copyright (c) 2006-2019 by Andreas Rumpf

git hash: 19440baa807bbda58290ac9d491c9aa8a2bea2fa
active boot switches: -d:release -d:danger

bin/nim c --skipUserCfg --skipParentCfg --hints:off koch
ci/funs.sh: line 6: 10835 Killed: 9               "$@"

Now, when I run nim --version I get:

> nim --version
Killed: 9

And I cannot run nim at all. I have to build nim from csources, then manually run koch to build.

Additional information

$ nim -v
Killed: 9
@timotheecour
Member

timotheecour commented May 1, 2021

that's odd, I had tested it and it seemed to work IIRC, but there is a typo which I'm fixing in #17915 but which doesn't seem related to ci/funs.sh: line 6: 10835 Killed: 9 "$@" failing
(because set -e appears after the typo)

can you please test #17915 (or simply manually change build_all.sh in your clone and re-run build_all.sh) and report whether it now works, and if not what is the output?

please also report relevant context information so I get a better understanding, eg:

from the error reported it seems like it's failing at "$@" right after echo "$@" when evaluating echo_run bin/nim c --skipUserCfg --skipParentCfg --hints:off koch, but I'm not (yet) seeing how it relates to #17899

echo_run () {
  # echo's a command before running it, which helps understanding logs
  echo ""
  echo "$@"
  "$@"
}

if none of the above works, can you try replacing
echo_run bin/nim c --skipUserCfg --skipParentCfg --hints:off koch
by
bin/nim c --skipUserCfg --skipParentCfg --hints:off koch
(ditto in rest of build_all.sh)
and see whether it makes it work?

@jyapayne
Contributor Author

jyapayne commented May 1, 2021

  • MacOS Big Sur 11.3 on M1 arch
  • which nim -> /Users/joey/Nim/bin/nim
  • not using choosenim, building from source
  • sh build_all.sh vs ./build_all.sh makes no difference
  • I'm using bash
  • git checkout 1f1d85b^ -> It also does not work. Guess it's not that commit, my bad.
> bash --version
GNU bash, version 3.2.57(1)-release (arm64-apple-darwin20)
Copyright (C) 2007 Free Software Foundation, Inc.

Hmm, it looks like if I do the following on latest devel, the error can be reproduced, at least on my machine:

> git clone https://github.com/nim-lang/Nim
> cd Nim
> sh build_all.sh
> # works first time, wait to be done
> sh build_all.sh
bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff exists.

cp bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff bin/nim

bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff -v
Nim Compiler Version 1.0.11 [MacOSX: arm64]
Compiled at 2020-12-21
Copyright (c) 2006-2019 by Andreas Rumpf

git hash: 19440baa807bbda58290ac9d491c9aa8a2bea2fa
active boot switches: -d:release -d:danger

bin/nim c --skipUserCfg --skipParentCfg --hints:off koch
ci/funs.sh: line 6: 27277 Killed: 9

Using this information, I will try to do a bisect to confirm which commit introduced the error.

One interesting thing to note:

> ./bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff --version
Nim Compiler Version 1.0.11 [MacOSX: arm64]
Compiled at 2020-12-21
Copyright (c) 2006-2019 by Andreas Rumpf

git hash: 19440baa807bbda58290ac9d491c9aa8a2bea2fa
active boot switches: -d:release -d:danger

> cp ./bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff ./bin/nim
> ./bin/nim --version
Killed: 9

Somehow copying the binary makes it not work? But then if I do this:

> rm ./bin/nim
> cp ./bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff ./bin/nim
> ./bin/nim --version
Nim Compiler Version 1.0.11 [MacOSX: arm64]
Compiled at 2020-12-21
Copyright (c) 2006-2019 by Andreas Rumpf

git hash: 19440baa807bbda58290ac9d491c9aa8a2bea2fa
active boot switches: -d:release -d:danger

It works. Inserting this line in funs.sh makes it work every time for me:

    echo_run rm -f bin/nim # This line inserted
    echo_run cp $nim_csources bin/nim
    echo_run $nim_csources -v
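
The workaround above can be wrapped in a small helper. This is a minimal sketch, not the actual change made in ci/funs.sh: the function name replace_binary is hypothetical, and it assumes only what was observed above, namely that removing the destination first makes cp create a fresh file instead of overwriting the existing one in place.

```shell
# Hypothetical helper (not part of ci/funs.sh): replace dst with a copy of
# src by deleting dst first, so that cp creates a new file instead of
# rewriting the existing one in place.
replace_binary() {
  src=$1
  dst=$2
  rm -f "$dst" || return 1   # -f: no error if dst does not exist yet
  cp "$src" "$dst"
}
```

Usage would be along the lines of: replace_binary "$nim_csources" bin/nim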

@jyapayne
Contributor Author

jyapayne commented May 1, 2021

This is the PR that breaks it for me:

#17815

@timotheecour
Member

timotheecour commented May 1, 2021

@jyapayne thanks, can you please:

  • try #17917 (refs #17913 workaround for build_all.sh on M1 mac with broken(?) cp) that I just sent out (and make sure to run build_all.sh twice as you did above to make sure re-running keeps working)
  • independently of this hotfix, I want to understand why rm -f bin/nim would be needed in the 1st place, to avoid masking a potentially different underlying issue; what's the output you'd get if you replace the end of nimBuildCsourcesIfNeeded in ci/funs.sh with:
    # echo_run rm -f bin/nim # make sure this is commented
    echo_run \ls -al bin/nim
    echo_run cp $nim_csources bin/nim
    echo_run \ls -al $nim_csources
    echo_run \ls -al bin/nim
    echo_run $nim_csources -v
    echo_run bin/nim -v

for the 1st and 2nd run of build_all.sh?

(to see if somehow cp behaves differently on M1 mac especially in terms of permissions, etc)

other things to try: echo_run diff $nim_csources bin/nim

  • instead of echo_run rm -f bin/nim, would this work? (again, with re-running build_all.sh twice for sanity check):
    echo_run cp -f $nim_csources bin/nim
    echo_run $nim_csources -v

@jyapayne
Contributor Author

jyapayne commented May 1, 2021

I can confirm #17917 works for me.

Replacing the lines in ci/funs.sh gets me this output on the second run:

> sh build_all.sh
bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff exists.

ls -al bin/nim
-rwxr-xr-x  1 joey  staff  5047520 May  1 14:01 bin/nim

cp bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff bin/nim

ls -al bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff
-rwxr-xr-x  1 joey  staff  2899472 May  1 11:54 bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff

ls -al bin/nim
-rwxr-xr-x  1 joey  staff  2899472 May  1 14:01 bin/nim

bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff -v
Nim Compiler Version 1.0.11 [MacOSX: arm64]
Compiled at 2020-12-21
Copyright (c) 2006-2019 by Andreas Rumpf

git hash: 19440baa807bbda58290ac9d491c9aa8a2bea2fa
active boot switches: -d:release -d:danger

bin/nim -v
ci/funs.sh: line 6: 43416 Killed: 9               "$@"

Running diff on the two files shows no difference. They are equivalent but somehow not.

instead of echo_run rm -f bin/nim, would this work? (again, with re-running build_all.sh twice for sanity check):

It does not work with cp -f either. For a sanity check, I tried copying the bin/nimble binary to bin/nimble2 and then copying over it again. I can confirm that the same behavior is exhibited. This is the weirdest thing ever.

> cp ./bin/nimble ./bin/nimble2
> ./bin/nimble2 --version
nimble v0.13.1 compiled at 2021-05-01 20:05:12
git hash: d13f3b8ce288b4dc8c34c219a4e050aaeaf43fc9
> cp ./bin/nimble ./bin/nimble2
> ./bin/nimble2
Killed: 9

@jyapayne
Contributor Author

jyapayne commented May 1, 2021

Hmm, tried with a different binary on my system (git), but I don't get the same behavior.

@timotheecour
Member

timotheecour commented May 1, 2021

this sounds like a bug with either M1 macs (unlikely) or maybe your particular configuration, but let's keep digging; what is:

  • which cp ?
  • are any of the weird files involved symbolic links (looks like not based on ls output)
  • can you try replacing echo_run cmd by cmd to take echo_run out of the equation and make sure it isn't related
  • try adding -v to cp for verbose output (unlikely to help much but...)
  • try also cp -p, see docs: -p Cause cp to preserve the following attributes...
  • can you run lldb -- ./bin/nimble2 or use gdb?

this would probably help diagnose:

right before the cmd that fails (eg bin/nim -v)

echo_run file bin/nim
echo_run file $nim_csources

@jyapayne
Contributor Author

jyapayne commented May 1, 2021

Just tried it on my work M1 macbook, with the same results, so I don't think it's local config as my work macbook has never had Nim on it before and is mostly vanilla in terms of command line stuff.

> which cp
/bin/cp

No weird symlinks. Everything in ./bin is a binary.

Replacing echo_run cmd with cmd has no effect. Same result.

Adding -v just specifies what the file was renamed to. Does nothing.

cp -p also does nothing.

lldb on failed binary:

> lldb -- ./bin/nim
(lldb) target create "./bin/nim"
Killed: 9

Running file on both files before the failed command:

file bin/nim
bin/nim: Mach-O 64-bit executable arm64

file bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff
bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff: Mach-O 64-bit executable arm64

@timotheecour
Member

timotheecour commented May 1, 2021

very strange indeed.
how about testing the hypothesis that cp is broken and trying this (please adapt as needed):

nim c fakecp.nim # but use a working version of nim of course for this

# fakecp.nim:
import std/os
copyFile("bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff", "bin/nim")
# EDIT: maybe it should use `copyFileWithPermissions` instead
# instead of `cp $nim_csources bin/nim`:
./fakecp

@jyapayne
Contributor Author

jyapayne commented May 1, 2021

But it can't be cp because the same behavior exists on my work M1 vs my personal M1.

I tried it anyways and it results in the same behavior.

@timotheecour
Member

timotheecour commented May 1, 2021

i mean, since both macs you tried were M1, it could be due to some odd/broken behavior for cp, which i tried to circumvent via copyFile (but maybe copyFileWithPermissions would make more sense)

we can merge #17917 to unblock you, but we should keep this issue open until we understand what's causing it, because the problem may crop up again in different circumstances and at least now we have a state that can be reproduced on (at least) M1 macs.

it's really hard to debug on my end since i can't reproduce this and don't have an M1 mac, but IIRC @Araq does, maybe he has some ideas?

I'm running out of ideas here and can't find similar issues online, but maybe worth trying:

  • stat bin/nim (and comparing to bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff)
  • ditto with mdls bin/nim

@timotheecour timotheecour changed the title Cannot build with build_all.sh, breaks nim binary Cannot build with build_all.sh, breaks nim binary (un-explainable behavior with cp on M1 macs) May 1, 2021
@jyapayne
Contributor Author

jyapayne commented May 1, 2021

i mean, since both macs you tried were M1, it could be due to some odd/broken behavior for cp, which i tried to circumvent via copyFile (but maybe copyFileWithPermissions would make more sense)

Ah, alright, makes sense :)

we can merge #17917 to unblock you, but we should keep this issue open until we understand what's causing it, because the problem may crop up again in different circumstances and at least now we have a state that can be reproduced on (at least) M1 macs.

Cool, sounds good.

it's really hard to debug on my end since i can't reproduce this and don't have an M1 mac, but IIRC @Araq does, maybe he has some ideas?

Yeah, maybe he can reproduce. It's definitely a weird issue, thanks for debugging with me!

Here is finally some different output. From stat:

> stat -s ./bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff
st_dev=16777233 st_ino=2472748 st_mode=0100755 st_nlink=1 st_uid=501 st_gid=20 st_rdev=0 st_size=2899472 st_atime=1619907034 st_mtime=1619907034 st_ctime=1619907034 st_birthtime=1619907034 st_blksize=4096 st_blocks=5664 st_flags=0

> stat -s ./bin/nim
st_dev=16777233 st_ino=2473065 st_mode=0100755 st_nlink=1 st_uid=501 st_gid=20 st_rdev=0 st_size=2899472 st_atime=1619907087 st_mtime=1619907087 st_ctime=1619907087 st_birthtime=1619907071 st_blksize=4096 st_blocks=7168 st_flags=0

You can see the st_blocks is different for some reason. I have no idea why that would differ. From the docs:

struct stat { /* when _DARWIN_FEATURE_64_BIT_INODE is NOT defined */
     dev_t    st_dev;    /* device inode resides on */
     ino_t    st_ino;    /* inode's number */
     mode_t   st_mode;   /* inode protection mode */
     nlink_t  st_nlink;  /* number of hard links to the file */
     uid_t    st_uid;    /* user-id of owner */
     gid_t    st_gid;    /* group-id of owner */
     dev_t    st_rdev;   /* device type, for special file inode */
     struct timespec st_atimespec;  /* time of last access */
     struct timespec st_mtimespec;  /* time of last data modification */
     struct timespec st_ctimespec;  /* time of last file status change */
     off_t    st_size;   /* file size, in bytes */
     quad_t   st_blocks; /* blocks allocated for file */
     u_long   st_blksize;/* optimal file sys I/O ops blocksize */
     u_long   st_flags;  /* user defined flags for file */
     u_long   st_gen;    /* file generation number */
 };

Here's the output from mdls, but it offers no help.

> mdls ./bin/nim
_kMDItemDisplayNameWithExtensions      = "nim"
kMDItemContentCreationDate             = 2021-05-01 22:11:11 +0000
kMDItemContentCreationDate_Ranking     = 2021-05-01 00:00:00 +0000
kMDItemContentModificationDate         = 2021-05-01 22:11:27 +0000
kMDItemContentModificationDate_Ranking = 2021-05-01 00:00:00 +0000
kMDItemContentType                     = "public.unix-executable"
kMDItemContentTypeTree                 = (
    "public.unix-executable",
    "public.data",
    "public.item",
    "public.executable"
)
kMDItemDateAdded                       = 2021-05-01 22:11:11 +0000
kMDItemDateAdded_Ranking               = 2021-05-01 00:00:00 +0000
kMDItemDisplayName                     = "nim"
kMDItemDocumentIdentifier              = 0
kMDItemFSContentChangeDate             = 2021-05-01 22:11:27 +0000
kMDItemFSCreationDate                  = 2021-05-01 22:11:11 +0000
kMDItemFSCreatorCode                   = ""
kMDItemFSFinderFlags                   = 0
kMDItemFSHasCustomIcon                 = (null)
kMDItemFSInvisible                     = 0
kMDItemFSIsExtensionHidden             = 0
kMDItemFSIsStationery                  = (null)
kMDItemFSLabel                         = 0
kMDItemFSName                          = "nim"
kMDItemFSNodeCount                     = (null)
kMDItemFSOwnerGroupID                  = 20
kMDItemFSOwnerUserID                   = 501
kMDItemFSSize                          = 2899472
kMDItemFSTypeCode                      = ""
kMDItemInterestingDate_Ranking         = 2021-05-01 00:00:00 +0000
kMDItemKind                            = "Unix Executable File"
kMDItemLogicalSize                     = 2899472
kMDItemPhysicalSize                    = 3670016

> mdls ./bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff
_kMDItemDisplayNameWithExtensions      = "nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff"
kMDItemContentCreationDate             = 2021-05-01 22:10:34 +0000
kMDItemContentCreationDate_Ranking     = 2021-05-01 00:00:00 +0000
kMDItemContentModificationDate         = 2021-05-01 22:10:34 +0000
kMDItemContentModificationDate_Ranking = 2021-05-01 00:00:00 +0000
kMDItemContentType                     = "public.unix-executable"
kMDItemContentTypeTree                 = (
    "public.unix-executable",
    "public.data",
    "public.item",
    "public.executable"
)
kMDItemDateAdded                       = 2021-05-01 22:10:34 +0000
kMDItemDateAdded_Ranking               = 2021-05-01 00:00:00 +0000
kMDItemDisplayName                     = "nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff"
kMDItemDocumentIdentifier              = 0
kMDItemFSContentChangeDate             = 2021-05-01 22:10:34 +0000
kMDItemFSCreationDate                  = 2021-05-01 22:10:34 +0000
kMDItemFSCreatorCode                   = ""
kMDItemFSFinderFlags                   = 0
kMDItemFSHasCustomIcon                 = (null)
kMDItemFSInvisible                     = 0
kMDItemFSIsExtensionHidden             = 0
kMDItemFSIsStationery                  = (null)
kMDItemFSLabel                         = 0
kMDItemFSName                          = "nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff"
kMDItemFSNodeCount                     = (null)
kMDItemFSOwnerGroupID                  = 20
kMDItemFSOwnerUserID                   = 501
kMDItemFSSize                          = 2899472
kMDItemFSTypeCode                      = ""
kMDItemInterestingDate_Ranking         = 2021-05-01 00:00:00 +0000
kMDItemKind                            = "Unix Executable File"
kMDItemLogicalSize                     = 2899472
kMDItemPhysicalSize                    = 2899968

@timotheecour
Member

Here's the output from mdls, but it offers no help.

actually it does, kMDItemPhysicalSize differs!

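kMDItemPhysicalSize                    = 3670016   (bin/nim)
kMDItemPhysicalSize                    = 2899968   (nim_csources binary)

this matches the difference observed with stat, since st_blocks counts 512-byte blocks (7168 * 512 = 3670016, 5664 * 512 = 2899968):
st_blocks=7168   (bin/nim)
st_blocks=5664   (nim_csources binary)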

we're onto something :)
definitely starts to sound like an apple M1 mac bug

@timotheecour
Member

timotheecour commented May 1, 2021

@jyapayne can you also run and report stat -s ./bin/nim and mdls bin/nim right before cp bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff bin/nim (at whichever 1st or 2nd call to build_all.sh is relevant)? that should help narrow it down

@jyapayne
Contributor Author

jyapayne commented May 1, 2021

actually it does, kMDItemPhysicalSize differs!

Good eye! Missed that one :)

I reset my Nim repo and I modified the build_all.sh script after the first (successful) run. I output stat -s ./bin/nim and mdls bin/nim before and after cp bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff bin/nim, then ran build_all.sh again. What's interesting now is that everything from mdls is null.

BEFORE

stat -s bin/nim
st_dev=16777233 st_ino=2484715 st_mode=0100755 st_nlink=1 st_uid=501 st_gid=20 st_rdev=0 st_size=5047520 st_atime=1619910247 st_mtime=1619910247 st_ctime=1619910247 st_birthtime=1619910247 st_blksize=4096 st_blocks=9864 st_flags=0

mdls bin/nim
kMDItemFSContentChangeDate = (null)
kMDItemFSCreationDate      = (null)
kMDItemFSCreatorCode       = ""
kMDItemFSFinderFlags       = (null)
kMDItemFSHasCustomIcon     = (null)
kMDItemFSInvisible         = 0
kMDItemFSIsExtensionHidden = (null)
kMDItemFSIsStationery      = (null)
kMDItemFSLabel             = (null)
kMDItemFSName              = (null)
kMDItemFSNodeCount         = (null)
kMDItemFSOwnerGroupID      = (null)
kMDItemFSOwnerUserID       = (null)
kMDItemFSSize              = (null)
kMDItemFSTypeCode          = ""

cp bin/nim_csources_a8a5241f9475099c823cfe1a5e0ca4022ac201ff bin/nim

AFTER

stat -s bin/nim
st_dev=16777233 st_ino=2484715 st_mode=0100755 st_nlink=1 st_uid=501 st_gid=20 st_rdev=0 st_size=2899472 st_atime=1619910247 st_mtime=1619910319 st_ctime=1619910319 st_birthtime=1619910247 st_blksize=4096 st_blocks=7168 st_flags=0

mdls bin/nim
kMDItemFSContentChangeDate = (null)
kMDItemFSCreationDate      = (null)
kMDItemFSCreatorCode       = ""
kMDItemFSFinderFlags       = (null)
kMDItemFSHasCustomIcon     = (null)
kMDItemFSInvisible         = 0
kMDItemFSIsExtensionHidden = (null)
kMDItemFSIsStationery      = (null)
kMDItemFSLabel             = (null)
kMDItemFSName              = (null)
kMDItemFSNodeCount         = (null)
kMDItemFSOwnerGroupID      = (null)
kMDItemFSOwnerUserID       = (null)
kMDItemFSSize              = (null)
kMDItemFSTypeCode          = ""

Relevant code changed in build_all.sh:

    echo ""
    echo "BEFORE"
    echo_run stat -s bin/nim
    echo_run mdls bin/nim
    echo_run cp $nim_csources bin/nim
    echo ""
    echo "AFTER"
    echo_run stat -s bin/nim
    echo_run mdls bin/nim
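
One detail in the stat output above: st_ino is 2484715 both before and after the cp, so cp rewrote the existing file in place rather than creating a new one. The sketch below checks for that generically. The names inode_of and same_inode_after_cp are hypothetical, and the approach relies only on ls -i printing the inode number as the first field.

```shell
# Print the inode number of a file (ls -i prints it as the first field).
inode_of() {
  ls -i "$1" | awk '{print $1}'
}

# Hypothetical check: copy src over an existing dst and succeed only if dst
# kept its inode, i.e. it was overwritten in place.
same_inode_after_cp() {
  src=$1
  dst=$2
  before=$(inode_of "$dst")
  cp "$src" "$dst" || return 2
  after=$(inode_of "$dst")
  [ "$before" = "$after" ]
}
```

For example: same_inode_after_cp binaryA binaryB && echo "overwritten in place"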

@Araq
Member

Araq commented May 3, 2021

I can reproduce the problem. I needed to run sh build_all.sh twice, as you said.

@timotheecour
Member

timotheecour commented May 3, 2021

I'm preparing a stackoverflow question to get more eyes on this issue, is there a simpler repro than the preceding? (asking people to execute sh build_all.sh which they don't trust isn't great)
Ideally a git repo people can clone containing just 2 binary files nim and nim2 such that:

./nim2 -v # works
cp ./nim2 ./nim
./nim -v # crashes with Killed: 9

(I can't do it, I don't have an M1 mac)

In the meantime, #17917 can unblock M1 users

@Araq Araq closed this as completed in fff5001 May 3, 2021
@jyapayne
Contributor Author

jyapayne commented May 3, 2021

Ideally a git repo people can clone containing just 2 binary files nim and nim2 such that

Yeah, if those files already exist in the repo and are executable as an arm64 M1 binary. I think people will trust running a binary even less than a shell script though

@timotheecour
Member

I think people will trust running a binary even less than a shell script though

running a binary or shell script seems to have the same security caveats (unless you look into what the shell script does, which itself calls other binaries...), but the 2nd way is simpler (fewer steps involved), and in any case a sandbox can be used in both cases, and I can always suggest the 2 ways to reproduce the problem. Your help is welcome to create such a repo with both binaries, and then I/we can create a stackoverflow question or report it to apple

@timotheecour timotheecour reopened this May 3, 2021
@jyapayne
Contributor Author

jyapayne commented May 4, 2021

I've come up with a reproducible way to hit this issue that doesn't even involve Nim, so we can file the stackoverflow question with just a description of two source files and how to execute the commands. I'll detail it here for documentation's sake, but also post an SO question and post the link here afterwards. Below is a draft of what I might post to the SO question, so let me know your thoughts @timotheecour. Just a note that I couldn't reliably reproduce the differences in mdls or stat -s, so I've left them out.

Mac M1 Binary cp issue

Description

Recently, I've been observing an issue that happens after copying a binary file over another binary file without first deleting it on my M1. After some experimentation, I've come up with a reproducible method of hitting this issue on Apple's new hardware on the latest 11.3 release of Big Sur.

The issue happens when copying a differing binary over another binary after they have been run at least once. Not sure what is causing this issue, but it's very perplexing and could potentially lead to some security issues.

For example, this produces the error:

> ./binaryA
# output A
> ./binaryB
# output B
> cp binaryA binaryB
> ./binaryB
Killed: 9 

Setup

In order to reproduce the above behavior, we can create two simple C files with the following contents:

// binaryA.c
#include<stdio.h>

int main() {
    printf("Hello world!");
}

// binaryB.c
#include<stdio.h>
const char s[] = "Hello world 123!"; // to make sizes differ for clarity

int main() {
    printf("%s", s);
}

Now, you can run the following commands and get the error described (the programs must be run before the issue can be reproduced, so running the programs below is necessary):

> gcc -o binaryA binaryA.c
> gcc -o binaryB binaryB.c
> ./binaryA
Hello world!
> ./binaryB
Hello world 123!
> cp binaryA binaryB
> ./binaryB
Killed: 9

As you can see, the binaryB binary no longer works. For all intents and purposes, the two binaries are equal but one runs and one doesn't. Does anyone have a theory behind this behavior or is it a bug?

@timotheecour
Member

excellent; just 1 note, can you add return 0; in both programs and still maintain the bug? (just to rule out the case that it'd give a non-zero exit code and crash, even if that case would still seem buggy)

please also link to this issue and maybe mention at least diff and stat -s for completeness, other than that, I think it's ready to post; definitely sounds like a bug; after SO we should also report it to apple somehow, maybe asking on SO where's a good channel for that

@jyapayne
Contributor Author

jyapayne commented May 4, 2021

excellent; just 1 note, can you add return 0; in both programs and still maintain the bug?

Yep. Just tried it and the bug still reproduces.

please also link to this issue and maybe mention at least diff and stat -s for completeness, other than that, I think it's ready to post; definitely sounds like a bug; after SO we should also report it to apple somehow, maybe asking on SO where's a good channel for that

Cool, will do!

@jyapayne
Contributor Author

jyapayne commented May 4, 2021

The SO post

@jyapayne
Contributor Author

jyapayne commented May 5, 2021

According to the answer received, it's expected behavior.

This is because Big Sur on ARM M1 processor requires all code to be validly signed (if only ad hoc) or the operating system will not execute it, instead killing it on launch.

@timotheecour
Member

timotheecour commented May 5, 2021

that clarifies things, but it seems like this will cause subtle, hard to diagnose, breakages in lots of existing scripts (as was the case with this issue)

what if there's a hard link? can you try creating a hardlink before rm and see if the problem happens there? smthg like:

> gcc -o binaryA binaryA.c
> gcc -o binaryB binaryB.c
> ./binaryA
Hello world!
> ./binaryB
Hello world 123!
> ln binaryB binaryC # create hard link
> rm binaryB # underlying file not really deleted because of hard link?
> cp binaryA binaryB
> ./binaryB
# does it still give: Killed: 9 ?

if so then looks like there's no reliable way to update a binary, since it's not practical to list where potential hardlinks are on the system?

@jyapayne
Contributor Author

jyapayne commented May 5, 2021

It does not give a Killed: 9, but that's because you completely remove binaryB and are thus creating a "new" file by copying binaryA to binaryB with a new vnode. binaryC will still reference the old binaryB vnode, but the new binaryB will be a new vnode, as far as I understand.
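
The same reasoning suggests another way to update a binary: write the new contents to a temporary file next to the destination and rename it into place, so the destination path ends up on a new inode while any hard link keeps the old one. This is a sketch under that assumption. The name install_via_rename is hypothetical, and this thread does not test whether the approach avoids Killed: 9 on M1.

```shell
# Hypothetical helper: replace dst with a copy of src without ever writing
# into the existing dst. Existing hard links keep pointing at the old file.
install_via_rename() {
  src=$1
  dst=$2
  tmp="$dst.tmp.$$"            # temporary name in the same directory as dst
  cp "$src" "$tmp" || return 1
  # rename the finished copy over dst; clean up the temp file on failure
  mv -f "$tmp" "$dst" || { rm -f "$tmp"; return 1; }
}
```

For example: install_via_rename binaryA binaryB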

@timotheecour
Member

timotheecour commented May 5, 2021

ok.

another question is:

  • should os.copyFile handle this case (detecting this behavior via a macro provided by osx, or detecting at least osx version/arch), so that copyFile for binaries behaves like rm old && cp new old ? otherwise it looks like we'll end up with code that could work in all platforms except for M1 if it involves this situation (with a terrible diagnostic)

  • independently of the above, does the OS provide a corresponding error code (with corresponding strerror) for this case, when attempting to execute binaryB via execCmdEx + friends?

@jyapayne
Contributor Author

jyapayne commented May 5, 2021

should os.copyFile handle this case (detecting this behavior via a macro provided by osx, or detecting at least osx version/arch), so that copyFile for binaries behaves like rm old && cp new old ? otherwise it looks like we'll end up with code that could work in all platforms except for M1 if it involves this situation (with a terrible diagnostic)

You'd probably then have to stat the file to see if it's binary, then check if the resulting file exists. It sounds somewhat expensive, but it might not be a big deal because file ops are slow anyway and it would only be for M1 arch. I think this might require an RFC to hash out. It might be worth it to implement if it could avoid huge amounts of debugging time.
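
For shell scripts, the platform check discussed here could look roughly like the following. It is only a sketch: is_apple_silicon is a hypothetical name, the check assumes uname -s and uname -m report Darwin and arm64 on these machines (a shell running under Rosetta may report x86_64 instead), and it says nothing about how os.copyFile itself should be implemented.

```shell
# Succeeds when the given (or current) platform is macOS on arm64.
# Both arguments are optional and default to uname output; they exist so the
# check can be exercised on other machines.
is_apple_silicon() {
  os=${1:-$(uname -s)}
  arch=${2:-$(uname -m)}
  [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}
```

A script could then remove the destination only where needed, e.g. if is_apple_silicon; then rm -f bin/nim; fi before the cp.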

independently of the above, does the OS provide a corresponding error code (with corresponding strerror) for this case, when attempting to execute binaryB via execCmdEx + friends?

The return code after running the failing binary is 137, if that means anything to you. I looked in the Console app, and the exception info it gives is:

Exception Type:        EXC_BAD_ACCESS (Code Signature Invalid)
Exception Codes:       0x0000000000000032, 0x0000000100e94000
Exception Note:        EXC_CORPSE_NOTIFY
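
On the exit status itself: POSIX-style shells such as bash report a process terminated by a signal as 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL), which matches the Killed: 9 message. A small sketch of decoding it follows; describe_status is a hypothetical helper name.

```shell
# Hypothetical helper: interpret a shell exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
describe_status() {
  status=$1
  if [ "$status" -gt 128 ]; then
    echo "terminated by signal $((status - 128))"
  else
    echo "exited with code $status"
  fi
}
```

For example: ./binaryB; describe_status $?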

@timotheecour timotheecour changed the title Cannot build with build_all.sh, breaks nim binary (un-explainable behavior with cp on M1 macs) cp (and copyFile) doesn't work on M1 macs when it overwrites a binary (upstream mac issue) May 7, 2021
PMunch pushed a commit to PMunch/Nim that referenced this issue Mar 28, 2022
PMunch pushed a commit to PMunch/Nim that referenced this issue Mar 28, 2022