posix.getgroups() failure on Mac OS X #52148

Closed
voidspace opened this issue Feb 10, 2010 · 83 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@voidspace
Contributor

voidspace commented Feb 10, 2010

BPO 7900
Nosy @loewis, @ronaldoussoren, @abalkin, @orsenthil, @pitrou, @ned-deily, @bitdancer, @florentx
Files
  • issue7900.diff
  • tg.py
  • apple-2.5-fix-posixmodule.c.diff
  • apple-2.6-fix-posixmodule.c.diff
  • no-darwin-ext.diff: Undefine _DARWIN_C_SOURCE in posixmodule
  • issue7900-tests.diff: additional tests
  • os-getgroups.patch
  • issue7900-1.diff
  • getsetgroups-bug.tar
  • os-getgroups-v2.patch
  • os-getgroups-v3.patch
  • issue7900-trunk.diff
  • smime.p7s
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/ronaldoussoren'
    closed_at = <Date 2011-03-14.20:06:58.558>
    created_at = <Date 2010-02-10.15:04:56.845>
    labels = ['type-bug', 'library']
    title = 'posix.getgroups() failure on Mac OS X'
    updated_at = <Date 2011-03-14.20:06:58.556>
    user = 'https://github.com/voidspace'

    bugs.python.org fields:

    activity = <Date 2011-03-14.20:06:58.556>
    actor = 'ronaldoussoren'
    assignee = 'ronaldoussoren'
    closed = True
    closed_date = <Date 2011-03-14.20:06:58.558>
    closer = 'ronaldoussoren'
    components = ['Library (Lib)']
    creation = <Date 2010-02-10.15:04:56.845>
    creator = 'michael.foord'
    dependencies = []
    files = ['16270', '16306', '16309', '16310', '16326', '16332', '16333', '16338', '16347', '17221', '17234', '17751', '18336']
    hgrepos = []
    issue_num = 7900
    keywords = ['patch']
    message_count = 83.0
    messages = ['99165', '99274', '99277', '99279', '99304', '99306', '99307', '99310', '99390', '99391', '99556', '99625', '99627', '99628', '99630', '99631', '99675', '99693', '99759', '99766', '99772', '99775', '99862', '99865', '99901', '99903', '99908', '99909', '99910', '99913', '99919', '99926', '99928', '99933', '99935', '99939', '99941', '99944', '99962', '99964', '100055', '100058', '101850', '105042', '105065', '105067', '105074', '105086', '108393', '108425', '108430', '108442', '108456', '108482', '108483', '109593', '109597', '109599', '109607', '109608', '111310', '111353', '111441', '112360', '112365', '112370', '112563', '120970', '120974', '120975', '120977', '121266', '121267', '121269', '121270', '121289', '121291', '121292', '121295', '121312', '121333', '121334', '130887']
    nosy_count = 10.0
    nosy_names = ['loewis', 'ixokai', 'ronaldoussoren', 'belopolsky', 'orsenthil', 'pitrou', 'ned.deily', 'r.david.murray', 'flox', 'l0nwlf']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7900'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2']

    @voidspace
    Contributor Author

    voidspace commented Feb 10, 2010

    test_posix fails on trunk on Mac OS X (Snow Leopard)

    test.test_support.TestFailed: Traceback (most recent call last):
      File "Lib/test/test_posix.py", line 42, in testNoArgFunctions
        posix_func()
    OSError: [Errno 22] Invalid argument
    Python 2.7a3+ (trunk:78129M, Feb 10 2010, 10:40:28) 
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    >>> import posix
    >>> posix.getgroups()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 22] Invalid argument

    @voidspace voidspace added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Feb 10, 2010
    @l0nwlf
    Mannequin

    l0nwlf mannequin commented Feb 12, 2010

    I don't see any issue here, runs perfectly fine on Mac OS X (Snow Leopard)

    Shashwat-Anands-MacBook-Pro:test l0nwlf$ pwd
    /Users/l0nwlf/python-svn/Lib/test
    Shashwat-Anands-MacBook-Pro:test l0nwlf$ python2.7 test_posix.py
    testNoArgFunctions (__main__.PosixTester) ... ok
    test_access (__main__.PosixTester) ... ok
    test_chdir (__main__.PosixTester) ... ok
    test_chflags (__main__.PosixTester) ... ok
    test_chown (__main__.PosixTester) ... ok
    test_confstr (__main__.PosixTester) ... ok
    test_dup (__main__.PosixTester) ... ok
    test_dup2 (__main__.PosixTester) ... ok
    test_fchown (__main__.PosixTester) ... ok
    test_fdopen (__main__.PosixTester) ... ok
    test_fstat (__main__.PosixTester) ... ok
    test_fstatvfs (__main__.PosixTester) ... ok
    test_ftruncate (__main__.PosixTester) ... ok
    test_getcwd_long_pathnames (__main__.PosixTester) ... ok
    test_initgroups (__main__.PosixTester) ... ok
    test_lchflags (__main__.PosixTester) ... ok
    test_lchown (__main__.PosixTester) ... ok
    test_lsdir (__main__.PosixTester) ... ok
    test_osexlock (__main__.PosixTester) ... ok
    test_osshlock (__main__.PosixTester) ... ok
    test_pipe (__main__.PosixTester) ... ok
    test_stat (__main__.PosixTester) ... ok
    test_statvfs (__main__.PosixTester) ... ok
    test_strerror (__main__.PosixTester) ... ok
    test_tempnam (__main__.PosixTester) ... ok
    test_tmpfile (__main__.PosixTester) ... ok
    test_umask (__main__.PosixTester) ... ok
    test_utime (__main__.PosixTester) ... ok

    ----------------------------------------------------------------------
    Ran 28 tests in 0.025s

    OK

    Shashwat-Anands-MacBook-Pro:test l0nwlf$ python2.7 --version
    Python 2.7a3+
    Shashwat-Anands-MacBook-Pro:test l0nwlf$ python2.7 
    Python 2.7a3+ (trunk:78165, Feb 12 2010, 22:36:03) 
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import posix
    >>> posix.getgroups()
    [20, 204, 100, 98, 81, 80, 79, 61, 12, 402, 401]
    >>>

    @orsenthil
    Member

    orsenthil commented Feb 12, 2010

    What is the difference between (Apple Inc. build 5646) (dot 1) and the normal (Apple Inc. build 5646)?

    Also, ronald.oussoren made some changes recently (r78149 to r78152). This fix could be a side effect of one of them, though I could not find a direct correlation.

    @voidspace
    Contributor Author

    voidspace commented Feb 12, 2010

    I still see it on trunk (revision 78165). No idea what the (dot 1) means.

    @l0nwlf
    Mannequin

    l0nwlf mannequin commented Feb 13, 2010

    It seems they are basically the same thing: the version of GCC and the build of OS X (the latest in this case). I was not able to figure out the (dot 1) part, though.

    @orsenthil
    Member

    orsenthil commented Feb 13, 2010

    Please do not remove the nosy list (I guess you did it by accident).
    Let's wait for Ronald's response.

    @l0nwlf
    Mannequin

    l0nwlf mannequin commented Feb 13, 2010

    Thanks for correcting it back. I did not even realize it.

    @l0nwlf
    Mannequin

    l0nwlf mannequin commented Feb 13, 2010

    5646 and 5646.1 are the builds of GCC by Apple. The various builds of gcc are present on http://www.opensource.apple.com/source/gcc/

    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin -> http://www.opensource.apple.com/source/gcc/gcc-5646.1/

    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    -> http://www.opensource.apple.com/source/gcc/gcc-5646/

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Feb 16, 2010

    Michael:

    • which configure options do you use?
    • which xcode version do you use?
      (this shouldn't be relevant, I'm interested in what causes the dot 1
      suffix)
    • If you use --enable-universalsdk: do you have the 10.4 SDK installed
      (should be installed in "$(xcode-select -print-path)/SDKs/")

    I cannot reproduce this with r78205, OSX 10.6.2/10C540, gcc version 4.2.1 (Apple Inc. build 5659), Xcode 3.2.2/10M2135.

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Feb 16, 2010

    A related question: is this issue present in the 3.x trunk?

    (BTW: feel free to assign all OSX related issues to me)

    @ronaldoussoren ronaldoussoren self-assigned this Feb 16, 2010
    @voidspace
    Contributor Author

    voidspace commented Feb 19, 2010

    I'm not seeing the same issue on my Macbook Pro. I can get all this info from my desktop machine (Mac Pro) when I return from PyCon.

    @AlexanderBelopolsky
    Mannequin

    AlexanderBelopolsky mannequin commented Feb 20, 2010

    Michael,

    Can you post the output of "groups" and "id" command from your Mac? It looks like posix_getgroups cannot handle more than NGROUPS_MAX groups and NGROUPS_MAX is 16 on Mac OS.

    @AlexanderBelopolsky
    Mannequin

    AlexanderBelopolsky mannequin commented Feb 20, 2010

    I was able to reproduce the error. First, add your user name to multiple test groups as follows:

    $ sudo dscl . -create /Groups/testN GroupMembership username
    (repeat 16 times with different Ns)
    $ ./python.exe 
    Python 2.7a3+ (trunk:78265M, Feb 20 2010, 13:18:22) 
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import posix
    >>> posix.getgroups()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 22] Invalid argument
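
The "repeat 16 times" step above can be scripted. The following is a minimal sketch that only builds the command lines and does not run them. The helper name `dscl_commands` and the group names `test1` to `test16` are illustrative assumptions; the `dscl` arguments are copied from the command above and have not been checked independently.

```python
def dscl_commands(username, count=16):
    """Build (but do not run) one dscl invocation per test group."""
    return [
        # Arguments mirror the command quoted in the comment above.
        ["sudo", "dscl", ".", "-create", f"/Groups/test{n}",
         "GroupMembership", username]
        for n in range(1, count + 1)
    ]


# Print the commands so they can be reviewed before being run by hand.
for argv in dscl_commands("username"):
    print(" ".join(argv))
```

Printing rather than executing keeps the sketch harmless on a machine where the test groups should not be created.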

    @AlexanderBelopolsky
    Mannequin

    AlexanderBelopolsky mannequin commented Feb 20, 2010

    I am submitting a fix. I am using the following feature documented in getgroups(2):
    """
    If _DARWIN_C_SOURCE is defined, getgroups() can return more than {NGROUPS_MAX} groups.
    """

    It appears that _DARWIN_C_SOURCE is defined in the standard python configuration on Mac OS X. Tested on 10.6 only.

    @AlexanderBelopolsky
    Mannequin

    AlexanderBelopolsky mannequin commented Feb 20, 2010

    It looks like the current implementation is not POSIX compliant because it assumes that NGROUPS_MAX is a compile-time constant. However, according to <http://www.opengroup.org/onlinepubs/000095399/functions/getgroups.html>, "Application writers should note that {NGROUPS_MAX} is not necessarily a constant on all implementations."

    I would suggest using my _DARWIN_C_SOURCE implementation unconditionally and making similar changes to posix_setgroups, but this is probably a subject for a separate issue.

    @loewis
    Mannequin

    loewis mannequin commented Feb 21, 2010

    > I would suggest using my _DARWIN_C_SOURCE implementation
    > unconditionally and make similar changes to posix_setgroups, but this
    > is probably a subject for a separate issue.

    I would propose a different strategy: if _SC_NGROUPS_MAX is defined, use
    that to find out how much memory to allocate, otherwise, fall back to
    the current max array size. Can you find out whether doing so would also
    fix the issue at hand?
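
The sizing strategy proposed here can be sketched at the Python level as follows. This is only an illustration, not the C code in posixmodule.c: the function name `group_buffer_size` and the fallback value of 16 (the NGROUPS_MAX value reported for Mac OS earlier in this thread) are assumptions made for the example.

```python
import os

# Assumed fallback: the NGROUPS_MAX value reported for Mac OS in this thread.
FALLBACK_MAX_GROUPS = 16


def group_buffer_size(fallback=FALLBACK_MAX_GROUPS):
    """Return how many group slots to allocate, preferring the runtime limit."""
    try:
        limit = os.sysconf("SC_NGROUPS_MAX")
    except (AttributeError, ValueError, OSError):
        # sysconf or the name is unavailable: fall back to the fixed size.
        return fallback
    # sysconf may return -1 when the limit is indeterminate.
    return limit if limit > 0 else fallback


print(group_buffer_size())
```

Whether a buffer of this size is actually large enough on Mac OS X is exactly the question asked above.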

    @AlexanderBelopolsky
    Mannequin

    AlexanderBelopolsky mannequin commented Feb 21, 2010

    On Sun, Feb 21, 2010 at 1:58 PM, Martin v. Löwis <report@bugs.python.org> wrote:
    ..

    > I would propose a different strategy: if _SC_NGROUPS_MAX is defined, use
    > that to find out how much memory to allocate, otherwise, fall back to
    > the current max array size. Can you find out whether doing so would also
    > fix the issue at hand?

    I am afraid that the following is the evidence that it won't:

    Python 2.7a3+ (trunk:78265M, Feb 20 2010, 15:20:36)
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> os.sysconf('SC_NGROUPS_MAX')
    16
    >>> len(os.getgroups())  # with the patch
    22

    @AlexanderBelopolsky
    Mannequin

    AlexanderBelopolsky mannequin commented Feb 22, 2010

    Here is another interesting fact: Mac OS 10.6 comes with python 2.5 and 2.6 preinstalled:

    $ python2.5 -V
    Python 2.5.3c1
    $ python2.6 -V
    Python 2.6.1

    Neither of these exhibits the same bug, but both are broken in some way. Given

    $ cat tg.py
    import os
    g = os.getgroups()
    print g
    os.setgroups(g[:5])
    print os.getgroups()
    
    $ sudo python2.5 tg.py
    [0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2]
    [0, 101, 204, 100, 98]
    $ sudo python2.6 tg.py
    [0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2, 1, 401]
    [0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2, 1, 401]

    Note that python2.5 truncates the group list which is
    $ sudo id -G
    0 101 204 100 98 80 61 29 20 12 9 8 5 4 3 2 1 401

    but setgroups works as expected. In contrast, python2.6 reports all groups correctly, but setgroups has no effect.

    @AlexanderBelopolsky
    Mannequin

    AlexanderBelopolsky mannequin commented Feb 22, 2010

    Apparently, Apple patches posix_[gs]etgroups functions as follows:

    for 2.5: http://www.opensource.apple.com/source/python/python-44/2.5/fix/posixmodule.c.ed
    
    for 2.6: http://www.opensource.apple.com/source/python/python-44/2.6/fix/posixmodule.c.ed

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Feb 22, 2010

    And as usual they can't be bothered to describe what the patch does, or even use regular unified diffs.

    @AlexanderBelopolsky
    Mannequin

    AlexanderBelopolsky mannequin commented Feb 22, 2010

    I've converted Apple's patches to unified diffs, but I cannot reproduce the 2.5 behavior.

    @AlexanderBelopolsky
    Mannequin

    AlexanderBelopolsky mannequin commented Feb 22, 2010

    After some head-scratching, I figured out how to reproduce stock python2.5 behavior. It turns out that defining _DARWIN_C_SOURCE not only allows getgroups() output to exceed NGROUPS_MAX (as documented), but also effectively disables setgroups() which is not documented.

    With no-darwin-ext.diff patch and previously attached tg.py, I see

    $ cat tg.py
    import os
    g = os.getgroups()
    print(g)
    os.setgroups(g[:5])
    print(os.getgroups())
    
    $ sudo ./python.exe tg.py
    [0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2]
    [0, 101, 204, 100, 98]

    which is the same as with stock python2.5:

    $ sudo python2.5 tg.py
    [0, 101, 204, 100, 98, 80, 61, 29, 20, 12, 9, 8, 5, 4, 3, 2]
    [0, 101, 204, 100, 98]

    Note that root is a member of 18 groups on my system, but the last two are truncated by os.getgroups().

    It is tempting to adopt no-darwin-ext.diff as a solution to this issue because allowing more than NGROUPS_MAX (or sysconf(_SC_NGROUPS_MAX) which should be the same) groups is really a Mac OS bug.

    In order to have both working os.setgroups() and os.getgroups() supporting more than NGROUPS_MAX results, it appears that the two functions should be compiled in separate compilation units which is probably too big of a price to pay for the functionality.

    Also, my issue7900.diff, while likely to work in most practical situations, is vulnerable to a race condition if group membership is expanded between two calls to getgroups.

    @AlexanderBelopolsky
    Mannequin

    AlexanderBelopolsky mannequin commented Feb 22, 2010

    I am reclassifying this as a crash because os.getgroups() crashes the interpreter when python is running as root on an unmodified system:

    $ sudo ./python.exe  -c "import os; os.getgroups()"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    OSError: [Errno 22] Invalid argument

    This is also a regression apparently introduced in r63955.

    @AlexanderBelopolsky AlexanderBelopolsky mannequin added type-crash A hard crash of the interpreter, possibly with a core dump and removed type-bug An unexpected behavior, bug, or error labels Feb 22, 2010
    @ronaldoussoren
    Contributor

    ronaldoussoren commented Feb 23, 2010

    Alexander: What makes you think r63955 introduced the problem?

    Btw. This does not crash the interpreter: the example you give causes an exception and cleanly shuts down the interpreter. The exception is unwanted, but I wouldn't call it a crash.

    The Apple fix for getgroups in python2.6 is odd, it uses an undocumented API (getgrouplist_2).

    If I read the manpage correctly there is a posixly correct way to implement os.getgroups:

    • call getgroups(MAX_GROUPS,...)
    • if that fails: call getgroups(0,...), the result is groupcount
    • allocate an array of groupcount gid_t's and call getgroups(groupcount)

    I'll work on a patch that implements this.
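
The count-then-allocate part of the steps above can be illustrated from Python with ctypes. This is a hedged sketch, not the patch that was eventually committed: it skips the first fixed-size attempt, the function name `getgroups_two_step` is made up for the example, and it assumes that `gid_t` is a 32-bit unsigned integer and that `ctypes.CDLL(None)` exposes the C library's `getgroups`.

```python
import ctypes
import os


def getgroups_two_step():
    """Query the group count first, then fetch exactly that many IDs."""
    # CDLL(None) exposes symbols already loaded into the process (assumption:
    # this includes libc's getgroups on the platform at hand).
    libc = ctypes.CDLL(None, use_errno=True)
    gid_t = ctypes.c_uint32  # assumption: gid_t is a 32-bit unsigned integer
    libc.getgroups.argtypes = [ctypes.c_int, ctypes.POINTER(gid_t)]
    libc.getgroups.restype = ctypes.c_int

    # A size of 0 asks getgroups(2) for the count without writing any IDs.
    count = libc.getgroups(0, None)
    if count < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    if count == 0:
        return []

    buf = (gid_t * count)()
    # This can still fail with EINVAL if membership grew since the first call.
    filled = libc.getgroups(count, buf)
    if filled < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return list(buf[:filled])


print(getgroups_two_step())
```

The window between the two calls is the same race that was pointed out for issue7900.diff earlier in the thread; a production version would need to retry when the second call fails.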

    @abalkin
    Member

    abalkin commented Jul 8, 2010

    s/2.7/2.7.1/

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Jul 23, 2010

    I've added bpo-9344 for adding os.getgrouplist. I'd prefer to keep adding that function separate from this issue. Btw. I'm +1 on adding such a function.

    I will shortly commit a port of os-getgroups-v3.patch to 3.2, but without the tests in "PosixGroupsTester" because those explicitly exclude OSX.

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Jul 23, 2010

    Committed a port to python3 for os-getgroups-v3.patch in r83088,
    including test cases (contrary to what I wrote before).

    Backports:
    3.1: r83093

    I'll backport to 2.7 and 2.6 tomorrow.

    To complete the documentation for picking this patch: I've spoken with an Apple engineer about this issue. He says the _DARWIN_C_SOURCE behavior is intentional and will not be reverted. Apple's build of python, and other system tools (including perl), also use the _DARWIN_C_SOURCE behavior.

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Jul 24, 2010

    2.7: r83124
    2.6: r83126

    The fix is now in all active branches, and I therefore close the issue.

    @pitrou
    Member

    pitrou commented Aug 1, 2010

    Reopening. This seems to have broken a couple of buildbots (two different issues):
    http://www.python.org/dev/buildbot/builders/x86%20FreeBSD%202.7/builds/44/steps/test/logs/stdio
    http://www.python.org/dev/buildbot/builders/sparc%20solaris10%20gcc%202.6/builds/737/steps/test/logs/stdio

    If you want to have a global look at buildbot status, you can use bbreport:
    http://code.google.com/p/bbreport/

    Please don't commit platform-dependent code without at least watching the buildbots afterwards...

    @pitrou pitrou reopened this Aug 1, 2010
    @ronaldoussoren
    Contributor

    ronaldoussoren commented Aug 1, 2010

    The 2.6 problem (the solaris buildbot you link to) should be fixed in r83420.

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Aug 1, 2010

    The other problem is fixed in r83431 for the py3k trunk.

    I'll check the buildbot status tomorrow morning. If that shows that the issue is truly gone, I'll backport to the other branches and close this issue.

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Aug 3, 2010

    Someone else backported to 3.1 (that is, 3.1 already contained the fix when I tried the svnmerge).

    Backported to 2.7 in r83643

    Backported to 2.6 in r83650

    @ixokai
    Mannequin

    ixokai mannequin commented Nov 11, 2010

    This test is failing again, and IIUC, largely due to the same sort of issues: http://www.python.org/dev/buildbot/all/builders/AMD64%20Leopard%203.1/builds/65

    I was able to track down what exactly caused it to fail in this case on my box, though. Whatever "posix.getgroups()" ends up calling appears to be tied to the current user's login, or at least doesn't get updated when new groups are added to the user.

    This failure happened because at some point after the buildbot was up and running, I added a new user to the machine (totally unconnected to the existing buildbot runner): this caused a new group to be added to the buildbot runner's user.

    "id -G" starts returning that group immediately, but "posix.getgroups()" returns the same list as it had before. I was able to further reproduce it in Terminal, by having a console open, and compiling 3.1 there then adding a user, and running the test. It fails. Opening up a new terminal window, running the test-- and it succeeds. The original console continues to fail.

    @ixokai ixokai mannequin reopened this Nov 11, 2010
    @ronaldoussoren
    Contributor

    ronaldoussoren commented Nov 11, 2010

    This is the expected behavior on OSX. Apple has a pretty odd interpretation of the standards wrt getgroups and setgroups behavior.

    This behavior is not a bug in Python.

    @ixokai
    Mannequin

    ixokai mannequin commented Nov 11, 2010

    Well, yes: the result of posix.getgroups is not a bug in Python, but is it a bug in the test? Should it be skipped on OSX, or some other solution?

    Having buildbots fail because of something that's expected behavior is bad, isn't it?

    @bitdancer
    Member

    bitdancer commented Nov 11, 2010

    Right, regardless of whether or not it is a bug in python, IMO it *is* a bug in the python test suite, since we *expect* buildbots to be long running processes and therefore they are going to get hit by this failure on OSX periodically with a pretty high likelihood. Yes it is easily fixable (restart the builder), but it seems to me the test should be fixed somehow instead of putting that burden on the buildbot owner.

    A skip on OSX would certainly be the simplest solution, and we could thereby indicate that we consider this behavior to be a bug in OSX.

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Nov 16, 2010

    If anything should be done, the test that checks the output of id -G should be removed, if we want the buildbot to keep running without problems when you change the buildbot's account.

    After reading the message about the new failures again I don't think this is the OSX issue I mentioned (and which is explained in painful detail earlier in the message list): it's just that the buildbot account got changed (unintentionally) while buildbot was running.

    BTW. I don't understand why adding a new account to an OSX machine adds existing accounts to a new group, I have never seen that behaviour before (on OSX).

    I'm -1 on changing anything for now and do not consider this to be a bug in Python or its testset.

    @bitdancer
    Member

    bitdancer commented Nov 16, 2010

    Ronald, on a normal unix system if you add a user to a group, any existing process/terminal session that runs 'id -G' will return the *old* group list. Only a new process/terminal session will see the new group.

    On OSX, 'id -G' returns the new group when run in an existing process/terminal session, according to what you wrote.

    You can't just remove the 'id -G' from that test, because the test is using 'id -G' to get an independent verification of the list of group numbers as a check against what getgroups returns. On a normal unix system, these two would match. On OSX, they don't.

    At the moment I don't see any alternative to skipping the test on OSX with a message that 'id -G' and 'getgroups' do not return the same group list on OSX.
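
The cross-check being discussed, comparing os.getgroups() against the output of `id -G`, can be sketched roughly as below. This is not the code from Lib/test/test_posix.py; the helper names are made up for the example, and the tolerance for the effective gid is an addition based on POSIX leaving it unspecified whether getgroups() includes it.

```python
import os
import subprocess


def groups_from_id():
    """Return the group IDs printed by `id -G` as a set of ints."""
    result = subprocess.run(
        ["id", "-G"], capture_output=True, text=True, check=True
    )
    return {int(token) for token in result.stdout.split()}


def getgroups_agrees_with_id():
    """Compare os.getgroups() with `id -G`, ignoring order and duplicates."""
    difference = groups_from_id() ^ set(os.getgroups())
    # Tolerate the effective gid appearing in only one of the two lists.
    return not difference or difference == {os.getegid()}


print(getgroups_agrees_with_id())
```

A test built on this check is only meaningful where both sources are expected to report the same membership, which is the point under dispute for OSX.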

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Nov 16, 2010

    I'm still -1 on changing the test. The test only fails when run from the buildbot and the buildbot account is changed without restarting buildbot. Changing the buildbot account should happen almost never, and IMO you should restart the buildbot daemon when you do so (and that's just good practice)

    Disabling the test on OSX means that os.getgroups will not get tested at all on OSX, even when I run the testsuite from the command-line.

    @ixokai
    Mannequin

    ixokai mannequin commented Nov 16, 2010

    The test is clearly verifying a *wrong* assumption: that id -G will match posix.getgroups() which simply does not hold on OSX.

    I can reproduce this reliably on a completely clean, brand new installation of 10.5: from there the only things that have been done to the box are updating to 10.5.8, and then downloading the latest XCode tools that run on Leopard.

    From here, launch Terminal: leave the console open. Run id -G; then run python and look at posix.getgroups().

    Now, go into System Preferences and add a new user. Don't do anything else. Don't change anything with existing user.

    In the console that was already open, do id -G again. Now run python again, and do posix.getgroups() -- those no longer match.

    Clearly IMHO the assumption that the test is declaring to be an expected result simply is not true in an OSX-Unix environment.

    Yes, if I go and *edit the actual slave user* then surely I can expect failures until I restart the buildslave. But if merely adding a user causes a change to the buildslave's user by no action of my own, and that causes this test to be invalid... the test itself seems to be founded on assumptions which simply are not reliably true.

    I understand disabling the test means os.getgroups() will no longer be tested on OSX: and yet, in the current situation a specific behavior of os.getgroups() is tested which is *not* actually the guaranteed behavior of that operation.

    There is at least one very easy to reproduce situation in which id -G and posix.getgroups() do not match: I don't know if there are more. But for the test to assert that it's only correct when they match seems to be a mistake.

    @bitdancer
    Member

    bitdancer commented Nov 16, 2010

    I agree with Stephen. The test in question is *not a valid test* on OSX. Therefore on OSX it should be skipped.

    If you can think of a way to test the actual behavior of getgroups on OSX, that's even better.

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Nov 16, 2010

    Please explain how the failure can be reproduced.

    I've done some testing on my machine using Apple's copy of python 2.6.1 (on OSX 10.6), which has the same getgroups implementation as the current heads of the active branches.

    >>> os.getgroups()
    [20, 402, 204, 61, 12, 401]
    >>> os.system("id -G")
    20 402 204 61 12 401
    0

    (Now open the Accounts preference pane and add a new user)

    >>> os.getgroups()
    [20, 403, 402, 204, 61, 12, 401]
    >>> os.system("id -G")
    20 403 402 204 61 12 401
    0

    Note how the result of both os.getgroups and id -G changes, which should mean that tests shouldn't fail unless you happened to add a new account in the split-second between the "calls" to os.getgroups and "id -G" in a testrun.

    Was the buildbot started using launchd (the recipe at <http://buildbot.net/trac/wiki/UsingLaunchd> seems correct)? If not, how is it started?

    @bitdancer
    Member

    bitdancer commented Nov 16, 2010

    Having just reread this issue more carefully, my understanding is that Ronald had elected to make the results returned from os.getgroups match that returned by "system tools" (by which I understood him to mean the 'id' command). Since Ronald reports he sees the intended behavior, Stephen's results seem to show that there is a problem with the fix in some circumstances which need to be understood.

    Alexander noted that this should all be documented, and I agree, so I'm opening a new issue for the doc update.

    @bitdancer
    Member

    bitdancer commented Nov 16, 2010

    And it's entirely possible (even likely) that what Stephen is seeing here is a platform bug in OSX's quirky implementation of group management.

    @ixokai
    Mannequin

    ixokai mannequin commented Nov 16, 2010

    On 11/16/10 5:44 AM, Ronald Oussoren wrote:

    > Ronald Oussoren <ronaldoussoren@mac.com> added the comment:
    > Please explain how the failure can be reproduced.

    I have. But to do so more directly:

    1. Launch Terminal.app; leave the window console open.
    2. Run: id -G
    3. Run: python
    4. Type: import posix; posix.getgroups()
    5. Go into System Preferences, add a user.
    6. Type again, posix.getgroups(): notice, the values have not changed.
    7. Either os.system("id -G") or ^D and type id -G: in either case, these
      values *have* changed. Tested both.

    > I've done some testing on my machine using Apple's copy of python 2.6.1 (on OSX 10.6), which has the same getgroups implementation as the current heads of the active branches.

    As I said, the slave is running the latest on 10.5. Perhaps it's a
    platform bug which is fixed in 10.6: either way, the test is declaring
    that behavior is true when it shouldn't, I think.

    Perhaps the test should only be skipped on 10.5? I am happy to provide a
    patch which tests sys.platform == "darwin" and then runs sw_vers to
    skip only on versions < 10.6.
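
A version-gated skip along these lines might look like the sketch below. It is not the patch offered above: instead of shelling out to a system tool it uses platform.mac_ver() from the standard library, and the function name `should_skip_getgroups_test` and its override parameters are hypothetical, added so the logic can be exercised on any platform.

```python
import platform
import sys


def should_skip_getgroups_test(platform_name=None, mac_release=None):
    """Return True only on Mac OS X releases older than 10.6."""
    if platform_name is None:
        platform_name = sys.platform
    if platform_name != "darwin":
        return False
    if mac_release is None:
        mac_release = platform.mac_ver()[0]
    if not mac_release:
        # Version unknown: keep the test enabled rather than guess.
        return False
    version = tuple(int(part) for part in mac_release.split(".")[:2])
    return version < (10, 6)


print(should_skip_getgroups_test())
```

In a test module the result could feed `unittest.skipIf`, so that the comparison against `id -G` still runs on 10.6 and on other platforms.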

    I verified posix.getgroups() on 10.6 does not appear to exhibit this
    behavior on my SL slave. However, that box does a LOT, so I can't vouch
    for its 'purity' like the 10.5 box.

    Was the buildbot started using launchd (the recipe at <http://buildbot.net/trac/wiki/UsingLaunchd> seems correct)? If not, how is it started?

    It was started with launchd, yes: with a variation of that recipe.
    However as I stated, the behavior can be readily reproduced directly in
    Terminal.

    @ned-deily
    Member

    ned-deily commented Nov 17, 2010

    The problem Stephen is seeing with the buildbot machine is ABI-dependent; the behavior of getgroups(2) changed in 10.6. You can demonstrate this all on a 10.6 system. Open a terminal session and verify the process's groups:

    $ id -G
    20 40200 401 204 100 98 80 61 12 403 40100 103
    $ /usr/local/bin/python3.2 -c 'import posix; print(posix.getgroups())'
    [20, 40200, 401, 204, 100, 98, 80, 61, 12, 403, 40100, 103]

    Now create a new user with System Preferences. One of the quirks here is that OS X 10.5 and 10.6 create a new group for that user and assign other existing users to that group. (The new group is one of the somewhat mysteriously named com.apple.sharepoint.group.n groups.)

    Still in the same terminal session after the new user/group was created and the existing user name we are running under was automatically added to the new group:
    $ id -G
    20 40200 401 204 100 98 80 61 12 403 40100 402 103
    $ # note: new group membership 402 = com.apple.sharepoint.group.1
    $ # now test with 3 Pythons built from the same source, py3k tip:
    $ cd ../../sdk10-4/py3k/
    $ ./python -c 'import posix; print(posix.getgroups())'
    [20, 40200, 401, 204, 100, 98, 80, 61, 12, 403, 40100, 103]
    $ cd ../../sdk10-5/py3k/
    $ ./python -c 'import posix; print(posix.getgroups())'
    [20, 40200, 401, 204, 100, 98, 80, 61, 12, 403, 40100, 103]
    $ cd ../../sdk10-6/py3k/
    $ ./python -c 'import posix; print(posix.getgroups())'
    [20, 40200, 401, 204, 100, 98, 80, 61, 12, 403, 40100, 402, 103]

    Only the version built with a deployment target of 10.6 - that is, using the 10.6 SDK and the 10.6 ABI - reflects the updated grouplist. And that difference can be seen, as Alexander noted earlier, in the symbols referenced. An nm ./python | grep getgroups for each shows:
    U _getgroups$DARWIN_EXTSN
    for the 10.6 deployment target version but
    U _getgroups
    for the 10.5 and 10.4 targeted versions.
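
    For anyone who wants to script the nm check, here is a minimal sketch. The function names are invented for this example, and it assumes `nm` prints undefined symbols as "U <name>" with no address column, as in the output quoted above.

```python
import subprocess


def getgroups_symbols(nm_output):
    """Return the undefined getgroups symbols found in `nm` output."""
    symbols = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols have only a type letter and a name.
        if len(fields) == 2 and fields[0] == "U" and fields[1].startswith("_getgroups"):
            symbols.append(fields[1])
    return symbols


def uses_darwin_extension(nm_output):
    """True when the binary references the $DARWIN_EXTSN variant of getgroups."""
    return "_getgroups$DARWIN_EXTSN" in getgroups_symbols(nm_output)


def inspect_binary(path):
    """Run `nm` on `path` (for example ./python) and list its getgroups symbols."""
    result = subprocess.run(["nm", path], capture_output=True, text=True, check=True)
    return getgroups_symbols(result.stdout)
```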

    So unless building for a deployment target of 10.6 (or higher), it is to be expected that the output of /usr/bin/id will not match the results of getgroups(2) if the user's group membership changes during the run (as can happen when another user is created or deleted).
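
    A running interpreter can also be asked which deployment target it was built for. The sketch below is an assumption-laden illustration, not part of any fix: it reads the MACOSX_DEPLOYMENT_TARGET build variable through sysconfig (which is None on non-Mac builds), and the helper names are hypothetical. The 10.6 threshold simply restates the observation above.

```python
import sysconfig


def parse_target(value):
    """Turn a deployment target such as '10.5' (or the int 11) into a tuple of ints."""
    # str() because sysconfig may hand back purely numeric values as ints.
    return tuple(int(part) for part in str(value).split(".") if part.isdigit())


def getgroups_tracks_changes(target=None):
    """Guess whether getgroups() should follow live group changes.

    Returns True for a deployment target of 10.6 or higher, False for an
    older one, and None when no deployment target is recorded.
    """
    if target is None:
        target = sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET")
    if not target:
        return None
    return parse_target(target) >= (10, 6)
```

    With no argument the function inspects the interpreter it runs in; passing a string such as "10.5" checks a hypothetical build instead.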

    This particular problem should only be an issue when running on 10.5 and higher and using a 10.5 or earlier ABI. On 10.4, neither getgroups(2) (as expected) nor /usr/bin/id see updates to group memberships made during the lifetime of the parent terminal session; starting a new login terminal session does see the updates.

    Also note that this issue would be observable with all existing current python.org OS X installers running on 10.5 or 10.6 as most have been built with a 10.3 deployment target while 2.7 also provides an additional 32-/64-bit one with a 10.5 deployment target. (I believe Ronald intends to build future 32-/64-bit installers with a 10.6 deployment target so they would be the first to not be subject to this issue.)

    FTR, here are the configure options I used for each build:

    ./configure --enable-universalsdk=/Developer/SDKs/MacOSX10.4u.sdk --with-universal-archs=32-bit MACOSX_DEPLOYMENT_TARGET=10.4

    ./configure --enable-universalsdk=/Developer/SDKs/MacOSX10.5.sdk --with-universal-archs=intel MACOSX_DEPLOYMENT_TARGET=10.5

    ./configure --enable-universalsdk=/Developer/SDKs/MacOSX10.6.sdk --with-universal-archs=intel MACOSX_DEPLOYMENT_TARGET=10.6

    @ned-deily
    Member

    ned-deily commented Nov 17, 2010

    (Argh! Just to be very clear, those ./configure commands are all one line, including the MACOSX_DEPLOYMENT_TARGET as an argument to the configure script.)

    @ronaldoussoren
    Contributor

    ronaldoussoren commented Mar 14, 2011

    I'm closing this issue again: the current behavior is intended (as it mirrors platform behavior).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022