core: robust Docker diagnostics using direct system checks (Fixes #34) #343

Merged
tfoote merged 2 commits into osrf:main from T-Rajeev30:docker-permission-detection-v2 on Jan 27, 2026

Conversation

@T-Rajeev30
Contributor

Summary

This PR improves Docker error handling by using direct, measurable
system checks instead of parsing Docker error strings.

It addresses the long-standing issue in #34 and incorporates the
feedback and intent from the discussion in #334.

Detection logic

The following failure modes are detected explicitly:

  • Missing Docker binary (PATH check)
  • Permission denied accessing /var/run/docker.sock
  • Docker daemon not responding (API ping failure)
  • Fallback for unexpected Docker API errors

All errors are reported via argparse's parser.error() (that is, ArgumentParser.error()) for clean CLI exits.
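
The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function name `diagnose_docker`, its parameters, and the message wording are all hypothetical. It assumes the default socket path /var/run/docker.sock and uses only standard-library calls (`shutil.which`, `os.path.exists`, `os.access`). The API ping itself is left to the caller.

```python
import os
import shutil

DEFAULT_SOCKET = "/var/run/docker.sock"  # assumed default socket location


def diagnose_docker(socket_path=DEFAULT_SOCKET, which=shutil.which):
    """Return a human-readable diagnosis, or None if no local cause is found.

    Hypothetical helper for illustration only; not rocker's actual API.
    """
    # 1. Is the docker binary on PATH?
    if which("docker") is None:
        return "Docker binary not found on PATH. Is Docker installed?"

    # 2. Does the daemon socket exist at all?
    if not os.path.exists(socket_path):
        return f"Docker socket {socket_path} does not exist. Is the daemon running?"

    # 3. Can the current user read and write the socket?
    if not os.access(socket_path, os.R_OK | os.W_OK):
        return (
            f"Permission denied accessing {socket_path}. "
            "Consider adding your user to the docker group."
        )

    # 4. Nothing obvious locally; the caller falls back to a generic message.
    return None
```

The `which` parameter exists only so the PATH lookup can be swapped out in tests; in normal use both arguments can be left at their defaults.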

Why this approach

  • Avoids fragile error-string parsing
  • Uses filesystem and API-level checks
  • Centralizes Docker diagnostics in core.py
  • Keeps CLI behavior consistent and predictable

Tested scenarios

  • Normal Docker operation
  • Docker socket permission denied
  • Docker daemon stopped (socket + service)
  • Recovery after restoring permissions and daemon

- Detect missing Docker binary via PATH check
- Detect daemon unavailability via direct client connection failure
- Detect permission issues via socket access checks
- Provide clear, actionable error messages
- Avoid fragile error-string parsing entirely
- Use argparse parser.error() for clean CLI exits
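
As a small sketch of the last point, assuming a plain `argparse.ArgumentParser`: `ArgumentParser.error()` prints the usage line and the message to stderr and then exits with status 2, so the user sees a short error instead of a traceback. The function name `report_docker_error` and the program name below are hypothetical and not taken from rocker.

```python
import argparse


def report_docker_error(parser, message):
    """Exit through argparse so the user gets usage plus a short error."""
    # ArgumentParser.error() writes the usage and message to stderr,
    # then raises SystemExit with exit status 2.
    parser.error(message)


def build_parser():
    # Hypothetical program name, for illustration only.
    return argparse.ArgumentParser(prog="rocker-example")
```

Calling `report_docker_error(build_parser(), "Docker daemon is not responding.")` would end the process with exit status 2.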

Tested scenarios:
- Normal operation
- Docker socket permission denied
- Docker daemon stopped
- Recovery after restoring permissions
@T-Rajeev30 T-Rajeev30 requested a review from tfoote as a code owner January 24, 2026 08:32
@T-Rajeev30
Contributor Author

T-Rajeev30 commented Jan 24, 2026

Baseline: normal operation

rocker ubuntu:22.04 echo NORMAL_OK

image builds and container runs successfully

Permission denied: socket inaccessible

sudo chmod 600 /var/run/docker.sock
rocker ubuntu:22.04 echo SHOULD_FAIL
sudo chmod 666 /var/run/docker.sock

clean CLI error, permission issue detected, actionable fix suggested

Docker daemon stopped

sudo systemctl stop docker.socket docker.service
rocker ubuntu:22.04 echo SHOULD_FAIL
sudo systemctl start docker.socket docker.service

daemon unavailability detected, clear diagnostic shown, no traceback

Recovery after restoring system state

sudo chmod 666 /var/run/docker.sock
sudo systemctl start docker.socket docker.service
rocker ubuntu:22.04 echo FINAL_OK

normal operation restored

Collaborator

@tfoote tfoote left a comment

Thanks, this is a nicely reduced change set, which should resolve #34 well. I have two small tweaks to request, but otherwise this looks great.

Comment on lines +254 to +260
client = docker.from_env().api
except AttributeError:
# docker-py pre 2.0
docker_client = docker.Client()
# Validate that the server is available
docker_client.ping()
return docker_client
except (docker.errors.DockerException, docker.errors.APIError, ConnectionError) as ex:
raise DependencyMissing('Docker Client failed to connect to docker daemon.'
' Please verify that docker is installed and running.'
' As well as that you have permission to access the docker daemon.'
' This is usually by being a member of the docker group.'
' The underlying error was:\n"""\n%s\n"""\n' % ex)
client = docker.Client()

client.ping()
return client

Collaborator

These lines appear to be unchanged except for a variable rename and some lost comments. Can you revert them to the original for better visibility?

"Fix:\n"
f" sudo usermod -aG docker {user}\n"
" log out and log back in"
)
Collaborator

It would be better to move these into the error processing instead of running the checks every time.

The effort of things like the deferred import won't be useful if these debug paths are hit on startup every time. These three detectors could be run in the except block below before the raise on line 264 https://github.com/osrf/rocker/pull/343/changes#diff-2fe1df65d4c2adbc967d2d5fd30aff95ab3b70d956f1bc4b6f5ff3f6bf881b4aR264

@T-Rajeev30
Contributor Author

Hi @tfoote, this PR is ready from my side. It replaces Docker error-string parsing with direct system checks and fully resolves #34, as discussed in #334.
Happy to adjust the approach if you'd prefer a different detection strategy or scope.

@tfoote
Collaborator

tfoote commented Jan 26, 2026

Great, please see my inline requests above about small cleanups.

- Move Docker checks into exception block (only run on failure)
- Restore original variable names (docker_client not client)
- Restore original comments (docker-py pre 2.0, etc)
- Remove separate helper functions to reduce diff noise

This addresses both review comments on PR osrf#343:
1. Minimal diff by preserving unchanged code structure
2. Better performance by only running checks when needed

All error messages and user experience remain identical.
@T-Rajeev30
Contributor Author

T-Rajeev30 commented Jan 27, 2026

@tfoote Thank you for the clear feedback! I've made both requested changes:

1. Restored original code structure:

  • Reverted variable rename (docker_client instead of client)
  • Restored all original comments (# docker-py pre 2.0, etc.)
  • Minimized diff to show only actual changes

2. Moved diagnostics to exception handler:

  • Checks now only run when Docker connection fails
  • No filesystem checks during normal startup
  • Removed separate helper functions for cleaner code

The new flow:

  1. Try Docker connection first
  2. Only if it fails run diagnostic checks
  3. Provide specific actionable error

All error messages and user experience remain identical. Ready for re-review!
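
The flow described above (connect first, diagnose only on failure) can be sketched as below. This is an assumption-laden illustration rather than the merged code: `connect` and `diagnose` are hypothetical callables standing in for the Docker client setup and the system checks, and `RuntimeError` stands in for the project's own exception type.

```python
def connect_with_diagnostics(connect, diagnose):
    """Try to connect first; run diagnostics only when the connection fails.

    `connect` returns a client object or raises; `diagnose` returns a
    specific message or None. Both are hypothetical stand-ins.
    """
    try:
        # Normal startup path: no filesystem checks are performed here.
        return connect()
    except Exception as ex:  # a real implementation would catch specific exceptions
        # Failure path: only now pay the cost of the diagnostic checks.
        detail = diagnose() or f"Unexpected Docker error: {ex}"
        raise RuntimeError(detail) from ex
```

Because `diagnose` is only called inside the `except` block, a successful connection never touches the diagnostic code, which matches the reviewer's request to keep the normal startup path cheap.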

Collaborator

@tfoote tfoote left a comment

Thanks for reducing this down. As it's only in the exception processing, I'm not going to insist on test coverage.

@tfoote tfoote merged commit 12ff255 into osrf:main Jan 27, 2026
4 checks passed
@T-Rajeev30 T-Rajeev30 requested a review from tfoote January 27, 2026 17:27
@T-Rajeev30
Contributor Author

T-Rajeev30 commented Jan 27, 2026

@tfoote Thanks again for the review and for merging this — it was a great experience working through the feedback.

If there are any adjacent areas you’d recommend improving (around rocker or elsewhere in the repo) for someone interested in contributing longer-term, I’d be happy to help where it’s most useful.

@tfoote
Collaborator

tfoote commented Jan 27, 2026

I'd suggest looking at the good first issues tag on issues: https://github.com/osrf/rocker/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22

A lower-level one might be #35, which will need to find a way to capture the build output and present it. The underlying capture mechanism isn't clearly defined.

Or, something a little bit more about creating a helpful tool for users would be integrating Dive (#310).

It looks like it could be a new default extension that exposes the capabilities of dive to rocker users.
