test_os.test_chown() failure on koobs-freebsd-{current,9} #72025

Closed
vstinner opened this issue Aug 23, 2016 · 21 comments
Labels
3.7 (EOL) end of life tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Member

BPO 27838
Nosy @vstinner, @PCManticore, @berkerpeksag, @serhiy-storchaka, @koobs, @vajrasky
Files
  • groups-test.log
  • test_os_chown.patch
  • koobs-freebsd-current-python-3.5-debug-build-773.txt
  • python-initial.txt
  • python-sudo.txt
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-07-11.21:47:39.751>
    created_at = <Date 2016-08-23.13:40:58.216>
    labels = ['3.7', 'type-bug', 'tests']
    title = 'test_os.test_chown() failure on koobs-freebsd-{current,9}'
    updated_at = <Date 2018-07-11.21:47:39.750>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2018-07-11.21:47:39.750>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-07-11.21:47:39.751>
    closer = 'vstinner'
    components = ['Tests']
    creation = <Date 2016-08-23.13:40:58.216>
    creator = 'vstinner'
    dependencies = []
    files = ['44859', '45227', '45439', '46936', '46937']
    hgrepos = []
    issue_num = 27838
    keywords = ['patch', 'buildbot']
    message_count = 21.0
    messages = ['273444', '273640', '273641', '276272', '276273', '276278', '277498', '277548', '277599', '278857', '278870', '279490', '279594', '279596', '280567', '280570', '280837', '291062', '295266', '295486', '321502']
    nosy_count = 6.0
    nosy_names = ['vstinner', 'Claudiu.Popa', 'berker.peksag', 'serhiy.storchaka', 'koobs', 'vajrasky']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue27838'
    versions = ['Python 3.5', 'Python 3.6', 'Python 3.7']

    @vstinner
    Member Author

    http://buildbot.python.org/all/builders/AMD64%20FreeBSD%20CURRENT%20Debug%203.x/builds/940/steps/test/logs/stdio

    ======================================================================
    ERROR: test_chown (test.test_os.ChownFileTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/test_os.py", line 1138, in test_chown
        os.chown(support.TESTFN, uid, gid_1)
    PermissionError: [Errno 1] Operation not permitted: '@test_12983_tmp'

    @vstinner vstinner added the tests Tests in the Lib/test dir label Aug 23, 2016
    @koobs

    koobs commented Aug 25, 2016

    This appears to have spontaneously resolved itself after build #957 after many failures. For reference, other builders on the same host were failing as well:

    AMD64 FreeBSD CURRENT Debug 3.5 (#567, #568)
    AMD64 FreeBSD CURRENT Non-Debug 3.5 (#8, #9)

    And possibly others (I didn't list any that hadn't failed in the last 5 builds).

    According to my (non-expert) reading of the code, the test skips (or is supposed to skip) unless the uid belongs to more than one group.

    The buildbot user this worker runs as is a member of only the 'buildbot' group, so on that basis wouldn't a skip be expected?

    Open questions are:

    1. Why/how did it suddenly *start* failing? (I can't see any relevant commits at or around the time.)
    2. Why/how did it suddenly stop failing? (I made no worker/buildbot changes.)

    Adding vajrasky (original unit test creator) and Claudiu (who reviewed) to the nosy list, as they might be able to shine a light on what is going on.
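Whether a skip is expected depends on which group list the test consults. A minimal sketch for comparing the two candidates on a worker follows. It assumes, without confirmation from this thread, that the skip guard is based on `os.getgroups()` rather than on the group database; the helper name `group_views` is hypothetical.

```python
import grp
import os
import pwd


def group_views():
    """Return the process's group IDs and the user's group-database memberships."""
    uid = os.getuid()
    try:
        entry = pwd.getpwuid(uid)
        name, primary_gid = entry.pw_name, entry.pw_gid
    except KeyError:
        # The uid has no passwd entry (possible in containers).
        name, primary_gid = None, None

    # Groups attached to the running process (inherited or set via setgroups()).
    process_gids = sorted(set(os.getgroups()))

    # Groups the account database assigns to the user (what `groups USER` reports).
    database_gids = set()
    if name is not None:
        database_gids.add(primary_gid)
        database_gids.update(g.gr_gid for g in grp.getgrall() if name in g.gr_mem)

    return {
        "user": name,
        "process_gids": process_gids,
        "database_gids": sorted(database_gids),
    }


if __name__ == "__main__":
    views = group_views()
    print(views)
    if views["process_gids"] != views["database_gids"]:
        print("process groups differ from the group database")
```

Running this from inside the worker process, rather than from an interactive shell, would show whether the two views disagree in the environment where the test fails.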

    @vstinner
    Member Author

    Hum, the test has a failure ratio somewhere near 1/5. It has been failing for 6 months,
    maybe longer.

    @vajrasky
    Mannequin

    vajrasky mannequin commented Sep 13, 2016

    "According to my (non-expert) reading of the code, the test skips (or is supposed to skip) unless group count of uid is > 1.

    The group membership of the buildbot user this worker runs as is only 'buildbot' and on that basis wouldn't a skip expected?"

    The group here is not the group of buildbot user. The group here refers to all groups in the system.

    I am the creator of this test. I will investigate this issue but since I am not BSD user, it may take a while.

    @vajrasky
    Mannequin

    vajrasky mannequin commented Sep 13, 2016

    "The group here is not the group of buildbot user. The group here refers to all groups in the system." -> I retract this statement.

    @vajrasky
    Mannequin

    vajrasky mannequin commented Sep 13, 2016

    The only way I can reproduce this on Linux (I am still downloading FreeBSD CURRENT) is to remove the user from the group, from a different terminal, just before calling os.chown with that specific group id.

    I am thinking of adding more information to the exception message, but let's wait until I finish downloading FreeBSD CURRENT. Maybe I will have more luck on BSD.
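To see which group triggers the error, one option is to attempt the group change for every group the process reports and record the outcome. This is a minimal sketch, not the test's actual code: the helper name `probe_chown_groups` is hypothetical, and it uses a scratch file from `tempfile` instead of the test suite's TESTFN.

```python
import os
import tempfile


def probe_chown_groups():
    """Try to change a scratch file's group to every group of the current process.

    Returns a dict mapping gid -> None on success, or the errno on failure.
    """
    results = {}
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        for gid in sorted(set(os.getgroups())):
            try:
                # uid=-1 leaves the owner unchanged; only the group is changed.
                os.chown(path, -1, gid)
            except OSError as exc:
                results[gid] = exc.errno
            else:
                results[gid] = None
    finally:
        os.unlink(path)
    return results


if __name__ == "__main__":
    for gid, err in probe_chown_groups().items():
        print(gid, "ok" if err is None else os.strerror(err))
```

Any gid that comes back with an errno is a group that `os.getgroups()` lists but the process cannot assign.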

    @koobs

    koobs commented Sep 27, 2016

    This started failing once again on koobs-freebsd-current:

    ======================================================================
    ERROR: test_chown (test.test_os.ChownFileTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/buildbot/python/3.x.koobs-freebsd-current/build/Lib/test/test_os.py", line 1168, in test_chown
        os.chown(support.TESTFN, uid, gid_1)
    PermissionError: [Errno 1] Operation not permitted: '@test_15221_tmp'

    Looking into this a bit further:

    buildbot defaults (via twisted) to a umask of 077 for the build worker build/ directory, resulting in build directories with permissions as follows:

    drwx------ 3 buildbot buildbot 4 May 17 12:36 2.7.koobs-freebsd-current
    drwx------ 3 buildbot buildbot 4 Aug 21 11:28 2.7.koobs-freebsd-current.nondebug

    Looking inside these directories, where (I think) the TESTFN files are written, we find:

    [root@CURRENT-amd64:/usr/home/buildbot/python/2.7.koobs-freebsd-current/build/build] ls -la
    total 18
    drwx------ 8 buildbot buildbot 8 Sep 27 17:03 .
    drwx------ 19 buildbot buildbot 45 Sep 27 16:59 ..
    drwx------ 2 buildbot buildbot 68 Sep 27 16:59 lib.freebsd-12.0-CURRENT-amd64-2.7-pydebug
    drwx------ 2 buildbot buildbot 6 Sep 27 16:59 scripts-2.7
    drwx------ 4 buildbot buildbot 4 Sep 27 16:59 temp.freebsd-12.0-CURRENT-amd64-2.7-pydebug
    drwx------ 2 buildbot buildbot 2 Sep 27 16:59 test_python_45339
    drwx------ 2 buildbot buildbot 2 Sep 27 17:03 test_python_46278
    drwx------ 2 buildbot buildbot 2 Sep 27 17:03 test_python_47336

    Does this explain the "Operation not permitted" error above?

    If so, the question is, why/how is this test being run if it requires an environment within which two groups can read and/or write files?

    Again looking at group membership for the buildbot user on the worker host in question (koobs-freebsd-current), the user only has a single group:

    # groups buildbot
    buildbot

    @berkerpeksag
    Member

    Perhaps running a script like below on the host would help to identify the problem?

    import getpass
    import grp
    import pprint
    
    pprint.pprint(getpass.getuser())
    pprint.pprint([(g.gr_gid, g.gr_name) for g in grp.getgrall()])
    pprint.pprint([(g.gr_gid, g.gr_name, g.gr_mem) for g in grp.getgrall() if getpass.getuser() in g.gr_mem])

    @berkerpeksag berkerpeksag added 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Sep 27, 2016
    @koobs

    koobs commented Sep 28, 2016

    Attached a file with the test results.

    It's worth mentioning that these results may (or may not) be different than output when running under the process id started by twistd, which is executed by root (startup script), and results in the following command:

    buildbot 9627 0.0 0.3 116200 11112 - I Thu17 4:51.87 /usr/local/bin/python2.7 /usr/local/bin/twistd --uid=buildbot --gid=buildbot --pidfile=/usr/home/buildbot/python/twistd.pid --python=/usr/home/buildbot/python/buildbot.tac

    @koobs

    koobs commented Oct 18, 2016

    Ping. All branches on the koobs-freebsd-current buildbot are still failing due to this issue.

    I could recreate the entire worker environment from scratch, but:

    a) I'm not sure it will resolve the issue
    b) I'd rather fix the root cause

    @koobs

    koobs commented Oct 18, 2016

    All builders (branches) are failing, not just 3.x DEBUG. The failures also no longer appear to be random or intermittent.

    @koobs koobs changed the title from 'test_os.test_chown() random failure on "AMD64 FreeBSD CURRENT Debug 3.x" buildbot' to 'test_os.test_chown() failure on koobs-freebsd-current' on Oct 18, 2016
    @serhiy-storchaka
    Member

    The proposed patch adds more verbose output to the tests. I hope this will help to diagnose the problem.

    It also fixes test_chown_without_permission, which can fail if run as the user that comes next after root (uid=1 or similar).

    @serhiy-storchaka
    Member

    Kubilay, could you please run tests with my patch?

    @koobs

    koobs commented Oct 28, 2016

    Serhiy, ah I thought the patch would be applied to say the 'custom' builder for this buildbot or the branch in general :)

    If not, I can test this locally within about 2 days, though it's worth noting that the issue will quite likely not be reproducible outside of the Python buildbot environment.

    @koobs

    koobs commented Nov 11, 2016

    It appears something has changed in the past few weeks (with no changes made to the buildbot worker).

    Now only the 3.5 branch builders (both debug and non-debug) are failing, but with many failing tests and many different errors, and the test clean step is also failing.

    Among other errors:

    FileExistsError: [Errno 17] File exists: '@test_42517_tmp/TEST1/SUB1/SUB11'
    PermissionError: [Errno 1] Operation not permitted: '@test_42517_tmp'
    IsADirectoryError: [Errno 21] Is a directory: '@test_42517_tmp'
    PermissionError: [Errno 13] Permission denied: 'SUB21'

    Attached is the full log.

    Looking into some of the test_xxxxx_tmp directories referenced, I see a SUB21 directory created with no permissions:

    [root@CURRENT-amd64:/usr/home/buildbot/python/3.5.koobs-freebsd-current/build/build/test_python_85793/@test_85793_tmp/TEST1/SUB2.new] ls -la
    total 5
    drwx------ 3 buildbot buildbot 5 Nov 11 16:44 .
    drwx------ 3 buildbot buildbot 4 Nov 11 16:44 ..
    lrwx------ 1 buildbot buildbot 11 Nov 11 16:44 broken_link2 -> tmp3/broken
    lrwx------ 1 buildbot buildbot 103 Nov 11 16:44 link -> /usr/home/buildbot/python/3.5.koobs-freebsd-current/build/build/test_python_85793/@test_85793_tmp/TEST2
    d--------- 2 buildbot buildbot 3 Nov 11 16:44 SUB21

    This is observed across all test_python_xxxx directories:

    [root@CURRENT-amd64:/usr/home/buildbot/python/3.5.koobs-freebsd-current/build/build] find . -perm 000
    ./test_python_85793/@test_85793_tmp/TEST1/SUB2.new/SUB21
    ./test_python_96334/@test_96334_tmp/TEST1/SUB2.new/SUB21
    ./test_python_53788/@test_53788_tmp/TEST1/SUB2.new/SUB21
    ./test_python_56622/@test_56622_tmp/TEST1/SUB2.new/SUB21
    ./test_python_38482/@test_38482_tmp/TEST1/SUB2.new/SUB21
    ./test_python_58380/@test_58380_tmp/TEST1/SUB2.new/SUB21
    ./test_python_42517/@test_42517_tmp/TEST1/SUB2.new/SUB21
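For workers where running `find` by hand is inconvenient, the same search can be sketched in Python. This is an illustrative equivalent of the `find . -perm 000` search above, not part of the test suite; the names `find_mode_000` and `demo` are hypothetical.

```python
import os
import stat
import tempfile


def find_mode_000(top):
    """Yield paths under `top` whose permission bits are all zero."""
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            path = os.path.join(root, name)
            # lstat() so that symlinks are examined themselves, not their targets.
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            if mode == 0:
                yield path


def demo():
    """Create a scratch tree containing one mode-000 directory and scan it."""
    with tempfile.TemporaryDirectory() as top:
        locked = os.path.join(top, "SUB21")
        os.mkdir(locked)
        os.chmod(locked, 0)
        try:
            return [os.path.relpath(p, top) for p in find_mode_000(top)]
        finally:
            # Restore access so the scratch tree can be removed.
            os.chmod(locked, 0o700)
```

As `demo` shows, a cleanup step has to restore the permission bits on such a directory before it can be removed, which is consistent with the test clean step failing when leftovers like SUB21 remain.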

    @koobs

    koobs commented Nov 11, 2016

    I also note *one* failure on koobs-freebsd-9 on 3.x and 3.6 branches, identical errors:

    Nov 09 14:53 c27269c0d619... failure AMD64 FreeBSD 9.x 3.x bpo-5304 Failed test
    Nov 09 14:53 b671ac7ae620... failure AMD64 FreeBSD 9.x 3.6 #282 Failed test

    ======================================================================
    ERROR: test_chown (test.test_os.ChownFileTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/buildbot/python/3.6.koobs-freebsd9/build/Lib/test/test_os.py", line 1200, in test_chown
        os.chown(support.TESTFN, uid, gid_1)
    PermissionError: [Errno 1] Operation not permitted: '@test_83654_tmp'
    
    ----------------------------------------------------------------------
    
    
    

    ======================================================================
    ERROR: test_chown (test.test_os.ChownFileTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/test_os.py", line 1200, in test_chown
        os.chown(support.TESTFN, uid, gid_1)
    PermissionError: [Errno 1] Operation not permitted: '@test_93884_tmp'

    I cannot explain what would cause persistent failure on one host and intermittent failure on another, except perhaps different host resources (cpu/mem) creating favourable conditions on one, but not the other.

    @koobs koobs changed the title from 'test_os.test_chown() failure on koobs-freebsd-current' to 'test_os.test_chown() failure on koobs-freebsd-{current,9}' on Nov 11, 2016
    @koobs

    koobs commented Nov 15, 2016

    @serhiy

    I have noticed that the failure is reproducible on the buildbot workers only when buildbot is started via sudo, and not when it is started at first boot (where rc runs as root).

    In both situations, twistd then drops privileges to --uid=buildbot --gid=buildbot.

    However, I *cannot* reproduce the failure (python -m test.regrtest test_os) on a clean checkout and build of 3.x, either under a normal user *or* under sudo.

    I think we need some instrumentation added to the test to see what's happening.

    We can add that instrumentation and also test your patch committed to a private hg branch using the 'custom' builder.

    I can also provide SSH access to the buildbot hosts.

    @vstinner
    Member Author

    vstinner commented Apr 3, 2017

    Similar (or same?) failure on "x86 Gentoo Non-Debug with X 3.x":

    ======================================================================
    ERROR: test_chown (test.test_os.ChownFileTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/buildbot/buildarea/3.x.ware-gentoo-x86.nondebug/build/Lib/test/test_os.py", line 1218, in test_chown
        os.chown(support.TESTFN, uid, gid_1)
    PermissionError: [Errno 1] Operation not permitted: '@test_19914_tmp'

    http://buildbot.python.org/all/builders/x86%20Gentoo%20Non-Debug%20with%20X%203.x/builds/542/steps/test/logs/stdio

    @vstinner
    Member Author

    vstinner commented Jun 6, 2017

    New failure on AMD64 FreeBSD CURRENT Debug 3.5.

    http://buildbot.python.org/all/builders/AMD64%20FreeBSD%20CURRENT%20Debug%203.5/builds/112/steps/test/logs/stdio

    ======================================================================
    ERROR: test_chown (test.test_os.ChownFileTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/buildbot/python/3.5.koobs-freebsd-current/build/Lib/test/test_os.py", line 1209, in test_chown
        os.chown(support.TESTFN, uid, gid_1)
    PermissionError: [Errno 1] Operation not permitted: '@test_10547_tmp'

    @koobs

    koobs commented Jun 9, 2017

    As per msg280837, this begins to happen when I restart the buildbot worker via sudo. It does not fail on initial startup, which is invoked using the same script without sudo, so the environment the worker starts with appears to be relevant.

    Attached are the environments for the worker at initial startup (python-initial) and subsequent service restart using sudo (python-sudo, which fails tests).

    The delta between the two is:

    --- python-initial      2017-06-09 14:35:49.557098000 +1000
    +++ python-sudo 2017-06-09 14:35:13.665893000 +1000
    @@ -6,17 +6,29 @@
       BLOCKSIZE=K
       EDITOR=vi
       GROUP=buildbot
    -  HOME=/
    +  HOME=/root
       HOST=CURRENT-amd64
       HOSTTYPE=FreeBSD
    -  LOGNAME=buildbot
    +  LANG=en_US.UTF-8
    +  LC_ALL=en_US.UTF-8
    +  LC_CTYPE=en_US.UTF-8
    +  LOGNAME=root
       MACHTYPE=x86_64
    +  MAIL=/var/mail/root
       OSTYPE=FreeBSD
       PAGER=more
    -  PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin://bin
    +  PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/root/bin
    +  PS1=%B[%{%}%n%{%}%b@%B%{%}%m%b%{%}:%~%B]%b
       PWD=/usr/home/buildbot/python/3.6.koobs-freebsd-current.nondebug/build
    -  RC_PID=23
    +  RC_PID=22356
    +  SHELL=/bin/csh
       SHLVL=1
    -  USER=buildbot
    +  SUDO_COMMAND=/usr/local/etc/rc.d/buildslave restart
    +  SUDO_GID=1001
    +  SUDO_UID=1001
    +  SUDO_USER=koobs
    +  TERM=screen-256color
    +  USER=root
    +  USERNAME=root
       VENDOR=amd
      using PTY: False
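A delta like the one above can also be computed programmatically from two environment dumps. The following is a minimal sketch. The helper names `parse_env` and `env_delta` are hypothetical, and the sample values are a small subset copied from the diff above, not the full attached files.

```python
def parse_env(text):
    """Parse `NAME=value` lines into a dict, ignoring lines without '='."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            name, _, value = line.partition("=")
            env[name] = value
    return env


def env_delta(before, after):
    """Return (added, removed, changed) between two environment dicts."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return added, removed, changed


# Values taken from the delta above (a subset, for illustration only).
initial = parse_env("HOME=/\nLOGNAME=buildbot\nUSER=buildbot\nGROUP=buildbot")
sudo = parse_env("HOME=/root\nLOGNAME=root\nUSER=root\nGROUP=buildbot\nSUDO_USER=koobs")

added, removed, changed = env_delta(initial, sudo)
for name, (old, new) in sorted(changed.items()):
    print(f"{name}: {old!r} -> {new!r}")
```

Separating changed variables (HOME, LOGNAME, USER) from added ones (the SUDO_* entries) makes it easier to see which settings the sudo restart carries over from root.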

    @vstinner
    Member Author

    I haven't seen this failure for a year, so I'm closing the issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022