Fix TestCreateServiceSecretFileMode, TestCreateServiceConfigFileMode #42960

Merged

Commits on Oct 27, 2021

  1. Fix race in TestCreateServiceSecretFileMode, TestCreateServiceConfigFileMode
    
    Looks like this test was broken from the start, and fully relied on a race
    condition. (Test was added in 65ee7ff)
    
    The problem is in the service's command: `ls -l /etc/config || /bin/top`, which
    will either:
    
    - exit immediately if the config is mounted correctly at `/etc/config` (which it should be)
    - keep running with `/bin/top` if the above failed
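    
    The two outcomes follow from the shell's `||` operator, which runs its
    right-hand side only when the left-hand side fails. A minimal local sketch,
    assuming a temporary file as a stand-in for the mounted config and `echo`
    as a stand-in for `/bin/top` (`check_mount` is a hypothetical helper, not
    part of the test):
    
    ```shell
    # Stand-in for the service command `ls -l /etc/config || /bin/top`.
    # `check_mount` is a hypothetical name used only for this illustration.
    check_mount() {
        # When `ls` succeeds, the right-hand side of `||` is skipped and the
        # function returns right away; otherwise the fallback branch runs.
        ls -l "$1" >/dev/null 2>&1 || echo "fallback"
    }
    
    present=$(mktemp)             # plays the role of a correctly mounted file
    missing="${present}.missing"  # a path that does not exist
    
    mounted_result=$(check_mount "$present")
    unmounted_result=$(check_mount "$missing")
    rm -f "$present"
    ```
    
    Here `mounted_result` is empty, because the function returned right after
    `ls`, while `unmounted_result` is `fallback`.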
    
    After the service is created, the test enters a race condition: it checks for 1
    task to be running (which it occasionally is), then proceeds to look up the
    service's task list to get the log output of `ls -l /etc/config`.
    
    This is another race: first of all, the original filter for that task lookup did
    not filter by `running`, so it would pick "any" task of the service (either failed,
    running, or "completed" (successfully exited) tasks).
    
    In the meantime, SwarmKit kept reconciling the service and creating new tasks,
    so even if the test was able to get the ID of the correct task, that task may
    already have exited and been removed (the task history limit is 5 by default).
    The test could only get the logs if it was "lucky"; more likely it was
    "too late", and the task was already gone.
    
    The problem can be easily reproduced when running the steps manually:
    
        echo 'CONFIG' | docker config create myconfig -
    
        docker service create --config source=myconfig,target=/etc/config,mode=0777 --name myservice busybox sh -c 'ls -l /etc/config || /bin/top'
    
    The above creates the service, but it keeps retrying, because each task exits
    immediately (followed by SwarmKit reconciling and starting a new task);
    
        mjntpfkkyuuc1dpay4h00c4oo
        overall progress: 0 out of 1 tasks
        1/1: ready     [======================================>            ]
        verify: Detected task failure
        ^COperation continuing in background.
        Use `docker service ps mjntpfkkyuuc1dpay4h00c4oo` to check progress.
    
    Checking the tasks for the service reveals that tasks exit cleanly (no error),
    but _do exit_, so swarm keeps reconciling and spinning up new tasks;
    
        docker service ps myservice --no-trunc
        ID                          NAME              IMAGE                                                                                    NODE             DESIRED STATE   CURRENT STATE                     ERROR     PORTS
        2wmcuv4vffnet8nybg3he4v9n   myservice.1       busybox:latest@sha256:f7ca5a32c10d51aeda3b4d01c61c6061f497893d7f6628b92f822f7117182a57   docker-desktop   Ready           Ready less than a second ago
        5p8b006uec125iq2892lxay64    \_ myservice.1   busybox:latest@sha256:f7ca5a32c10d51aeda3b4d01c61c6061f497893d7f6628b92f822f7117182a57   docker-desktop   Shutdown        Complete less than a second ago
        k8lpsvlak4b3nil0zfkexw61p    \_ myservice.1   busybox:latest@sha256:f7ca5a32c10d51aeda3b4d01c61c6061f497893d7f6628b92f822f7117182a57   docker-desktop   Shutdown        Complete 6 seconds ago
        vsunl5pi7e2n9ol3p89kvj6pn    \_ myservice.1   busybox:latest@sha256:f7ca5a32c10d51aeda3b4d01c61c6061f497893d7f6628b92f822f7117182a57   docker-desktop   Shutdown        Complete 11 seconds ago
        orxl8b6kt2l6dfznzzd4lij4s    \_ myservice.1   busybox:latest@sha256:f7ca5a32c10d51aeda3b4d01c61c6061f497893d7f6628b92f822f7117182a57   docker-desktop   Shutdown        Complete 17 seconds ago
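    
    The listing above can be tallied by desired state to show what an unfiltered
    task lookup has to choose from. A small sketch that uses only the task IDs and
    desired states copied from that listing (`count_by_state` is a hypothetical
    helper):
    
    ```shell
    # Task IDs and desired states copied from the `docker service ps` listing.
    tasks='2wmcuv4vffnet8nybg3he4v9n Ready
    5p8b006uec125iq2892lxay64 Shutdown
    k8lpsvlak4b3nil0zfkexw61p Shutdown
    vsunl5pi7e2n9ol3p89kvj6pn Shutdown
    orxl8b6kt2l6dfznzzd4lij4s Shutdown'
    
    # Hypothetical helper: count the tasks whose desired state equals $1.
    count_by_state() {
        printf '%s\n' "$tasks" | awk -v state="$1" '$2 == state { n++ } END { print n + 0 }'
    }
    
    total=$(printf '%s\n' "$tasks" | awk 'NF { n++ } END { print n + 0 }')
    running=$(count_by_state Running)
    shutdown=$(count_by_state Shutdown)
    echo "total=$total running=$running shutdown=$shutdown"
    ```
    
    For this snapshot the tally is 5 tasks, 4 of them `Shutdown` and none
    `Running`, so a lookup that does not filter by state mostly sees exited tasks.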
    
    This patch changes the service's command to `sleep`, so that a successful task
    (after successfully performing `ls -l /etc/config`) keeps running until the
    service is deleted. With that change, the service should (usually) reconcile
    immediately, which removes the race condition, and should also make the test
    faster :)
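    
    The difference between the two command shapes can be sketched locally. The
    commit message only names `sleep`, so the `&&` form below is an assumption, as
    are the temporary file standing in for the mount and the short durations;
    `timeout` (GNU coreutils) is used only to observe whether the command is still
    running after one second:
    
    ```shell
    target=$(mktemp)   # stand-in for the mounted file
    
    # Old shape: the check succeeds, so the `||` fallback never runs and the
    # command exits right away (status 0), like a task that completes.
    if timeout 1 sh -c 'ls -l "$1" >/dev/null || sleep 5' sh "$target"; then
        old_status=0
    else
        old_status=$?
    fi
    
    # Assumed new shape: after a successful check, `sleep` keeps the command
    # alive, so `timeout` has to stop it and reports a non-zero status.
    if timeout 1 sh -c 'ls -l "$1" >/dev/null && sleep 5' sh "$target"; then
        new_status=0
    else
        new_status=$?
    fi
    
    rm -f "$target"
    echo "old=$old_status new=$new_status"
    ```
    
    The old shape finishes with status 0 before the timeout, while the assumed new
    shape is still running when `timeout` stops it (GNU `timeout` reports 124 in
    that case).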
    
    This patch changes the tests to use client.ServiceLogs() instead of using the
    service's tasklist to directly access container logs. This should also fix some
    failures that happened if some tasks failed to start before reconciling, in which
    case client.TaskList() (with the current filters), could return more tasks than
    anticipated (as it also contained the exited tasks);
    
        === RUN   TestCreateServiceSecretFileMode
            create_test.go:291: assertion failed: 2 (int) != 1 (int)
        --- FAIL: TestCreateServiceSecretFileMode (7.88s)
        === RUN   TestCreateServiceConfigFileMode
            create_test.go:355: assertion failed: 2 (int) != 1 (int)
        --- FAIL: TestCreateServiceConfigFileMode (7.87s)
    
    Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
    thaJeztah committed Oct 27, 2021
    13cff6d