Thanks for writing this up. My brief comments:
Re your questions (I have various PRs pending):
Q1: How do you manage configurations across many users?
Q2: How do you handle SSH access and permissions at scale?
Q3: Do you centralize configurations, or keep them client-side?
Q4: What are your main pain points and best practices?
My lab is here:
-
Hi Simon, Thank you very much for your detailed feedback!
As you mentioned, one possible approach would be a relay/agent model where the client communicates only with the coordinator, and the coordinator forwards the requested operation to the relevant exporter-side agent. The exporter would then perform the local actions on the hardware.
Labgrid already follows a partially similar approach, as both exporters and clients connect to the coordinator. In practice, however, the coordinator is mainly responsible for resource discovery and reservation, while the actual execution still often involves direct client-to-exporter communication. This model could be extended further by routing all actions through the coordinator, effectively turning exporters into backend agents. This would make labgrid easier to scale and easier to explain to non-expert users, as they would interact with "boards" or "targets" rather than exporter-specific infrastructure details.
Overall, I agree with your observation: many setups right now seem to rely on custom patches to achieve a stable workflow. Without them, it can be quite difficult to run a robust shared environment. That’s why I think it’s very valuable to discuss these use cases and potential improvements together.
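To make the relay/agent idea above a bit more concrete, here is a minimal sketch of how a coordinator could forward a request to the exporter-side agent that owns a target. This is not labgrid code: `ExporterAgent`, `Coordinator`, the `power_on` operation and the target name `board-a` are all hypothetical names chosen for illustration, and the network transport is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ExporterAgent:
    """Hypothetical exporter-side agent that performs local hardware actions."""

    name: str
    handlers: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def execute(self, operation: str, target: str) -> str:
        # Only operations the agent has a handler for can be executed.
        if operation not in self.handlers:
            raise ValueError(f"{self.name} does not support {operation!r}")
        return self.handlers[operation](target)


class Coordinator:
    """Hypothetical coordinator that relays requests to the owning agent."""

    def __init__(self) -> None:
        self._owner: Dict[str, ExporterAgent] = {}

    def register(self, target: str, agent: ExporterAgent) -> None:
        # Remember which agent is responsible for which target.
        self._owner[target] = agent

    def request(self, target: str, operation: str) -> str:
        # The client only names a target; it never learns the exporter host.
        agent = self._owner.get(target)
        if agent is None:
            raise KeyError(f"unknown target {target!r}")
        return agent.execute(operation, target)


agent = ExporterAgent("exporter-1", {"power_on": lambda t: f"{t}: power on"})
coordinator = Coordinator()
coordinator.register("board-a", agent)
result = coordinator.request("board-a", "power_on")
print(result)
```

The point of the sketch is only that the client addresses a target by name and the coordinator resolves the exporter; streaming use cases would need a different mechanism.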
-
Thanks for writing this up, Ozan!
You're right that for single-exporter labs there's redundancy, and I assume most labs are like that. The split makes more sense when a place aggregates resources from multiple exporters (e.g. serial from one host, power switch from another), or when resources are moved between places without restarting the exporter. But for a typical lab where one exporter owns all the resources for a board, defining the same thing twice does feel awkward.
One option might be to have optional auto-creation of places from exporter groups, with explicit place definitions still supported for the multi-exporter case. That would simplify the common path without losing flexibility.
In my lab I have ~40 boards (although a few are just QEMU), and roles let me use a command like PR: #1411 lab config:
Currently a single shared user ('gitlab-runner') on the 'runner' machine which can ssh to the exporter machine. It isn't great, but OK for my simple setup so far.
Yes, I strongly agree :-) The current model where the client SSHes directly to the exporter is not ideal and is one of the main reasons I haven't made my lab available for interactive access. A coordinator-mediated execution model would be a significant improvement from this POV.
The hard part is probably the streaming use cases (console, video) where a direct connection is currently used for latency. Those would need a forwarding mechanism through the coordinator. But for the request/response style operations (power, IO, file transfer) it should be easy enough to route through the coordinator. This would also make it possible to expose labgrid through a higher-level service without users needing direct SSH credentials at all.
One other thing to mention is that I found it painful to have to fully specify each USB device. When something is broken, it is hard to figure out which USB hub/port to replug, etc. So I created this:
It may be that my USB-heavy lab is unusual, not sure.
-
Isn't this what the
-
I think you are right that
The part I am still trying to separate is whether this also solves the execution model. As far as I understand it, the coordinator still mainly handles discovery, state, release, reservations, etc. Actual resource access or command execution still seems to happen through an SSH or proxy path from the client side. For example, operations such as bootstrap still appear to run through the client-side driver. With
This also leaves some user and permission questions. A shared exporter user such as
There is also a similar issue with the
So I currently see two related but separate layers: network isolation, which
Of course, this would be a larger change, and some use cases may still need a more direct path for latency and throughput. But request-response operations such as power control, file transfer, bootstrap, or some hardware actions might be interesting candidates. I may also be missing part of the intended model here, so I would be interested in your view. Do you think
-
I don't know. I'm a few steps behind you and Simon. I've just been using local setups so far, so I haven't had to deal with that issue yet. At BayLibre, we've been looking at using headscale/tailscale to make a virtual network for all exporters/coordinators/clients. This adds another layer of admin and potential problems though.
In general, I've run into most of the same issues/pain points listed in the OP. Points 1 and 2 I don't see as such a big deal, since just one knowledgeable person has to set those up in the first place; after that, anyone can use them without having to set them up again.
For points 3 and 5, the direction I have been going is putting as much as possible in the
This also includes a Python file for strategy definitions. So for distributing, I've been looking at making a Python project and distributing it with git. This also includes a
Point 4 I haven't got to yet, but I am not looking forward to having to set up a user for everyone I want to share my exporters with. My thought here is to just create a
Another pain point for me has been setting up the exporters. I would ideally like to have one exporter per system under test (the reasoning being to minimize disruptions to other users when tweaking a config), which means lots of exporters. I like the idea of using docker containers for this. One thing I haven't solved there is how best to deal with hot-pluggable devices (maybe something like this), and you still have to deal with udev rules on the host system, so it isn't quite as simple as spinning up a container.
-
Quick update: I've made the lab available for interactive remote access since my earlier comment, and a few things have shaken out that are relevant to this discussion.
Agreeing with @ozan956 -
What I've set up since the previous comment
I now have a working remote-access setup that addresses some (not all) of the pain points without needing protocol changes:
None of this changes the fundamental architecture (it's still SSH/TCP under the hood) but it gives a usable remote workflow today and per-user accountability. It also demonstrates the value of "clients only need one well-known endpoint" - which is essentially what coordinator-mediated execution would give us natively, without needing a VPN.
Lab access for anyone who wants to try
Minimal client setup is now small - no U-Boot tree required for read-only access:
Total client footprint is a few MB; no U-Boot tree, no buildman, no per-machine setup. Access is currently manual rather than self-service; could be opened up with OIDC if there's interest. Documented in
--
Now I'm wondering what it would take to keep the environment in the exporter...
-
Hi David,
I believe the challenge begins when you need to manage or modify the setup over time. For example, whenever a new board is introduced, all clients need to update their environments accordingly if they want to use it. Centralizing this responsibility by moving more of the configuration to the exporter side would be valuable. It allows a smaller group of “expert” users to handle configurations, while others can focus on using the system. This separation can scale well within a team.
This approach certainly improves usability, but it still has some limitations. For instance, if your environment file defines multiple targets and you attempt to acquire a single one using:
labgrid-client -c env.yaml -p PLACE1 acquire
Labgrid initially uses
This behavior suggests that the environment file is fundamentally designed to be client-oriented and tailored to specific use cases, rather than serving as a shared, scalable abstraction. I also think that environment files are designed to exist as separate instances for each use case and each user (multiple envs on every local machine).
I had similar thoughts initially. Using a shared user account (e.g., a
labgrid-client -c env.yaml -p PLACE1 scp images/u-boot-spl :u-boot-spl
the file is copied to a default location such as
To mitigate this, users would need to follow strict conventions, such as using separate directories (e.g.,
That's an interesting idea. Thanks for bringing this up. I will give it more thought.
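On the shared-account file collision mentioned above: one way to picture the "separate directories" convention is a small helper that derives the destination from the user and place, so two users copying a file with the same name do not overwrite each other. This is only a sketch under assumptions; the base directory `/tmp/uploads`, the user name `alice` and the function `upload_destination` are hypothetical and not part of labgrid.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rule: names may only contain letters, digits, '.', '_' and '-'.
_SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def upload_destination(base_dir: str, user: str, place: str, filename: str) -> PurePosixPath:
    """Build a per-user, per-place destination path for an uploaded file."""
    for label, value in (("user", user), ("place", place)):
        # Reject empty names, path separators and '.'/'..' components.
        if not _SAFE_NAME.match(value) or value in (".", ".."):
            raise ValueError(f"invalid {label} name: {value!r}")
    # Keep only the final component of the local path.
    name = PurePosixPath(filename).name
    if not name:
        raise ValueError(f"invalid filename: {filename!r}")
    return PurePosixPath(base_dir) / user / place / name


dest = upload_destination("/tmp/uploads", "alice", "PLACE1", "images/u-boot-spl")
print(dest)
```

A convention like this only helps if every client follows it, which is the weakness of relying on conventions rather than per-user accounts.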
-
Hi Simon,
So we would be moving from:
to:
We would need a significant design change. On the client side, the environment defines the "targets." These targets are bound to "places," which are created by clients and managed by the coordinator.
1 - We would need to move "targets" to the exporter. The client would then need to fetch the target, so the
2 - The tricky part would be the strategies. There will likely be multiple design approaches to discuss.
3 - This change would also remove the ability to use multiple devices from multiple exporters within a single place. However, I am not sure whether there is a real use case where someone needs multiple resources from different exporters in one place, so this may not be a significant issue.
There are likely many other points to consider. In the end, it is doable, but it will be challenging.
-
Yes, this is not easy. I have been prototyping a way for exporters to send their environment file to the coordinator, which then merges them into a single file and provides it to clients. Once I've tested it a bit more I will see if I can create a clean PR.
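For readers trying to picture the merge step, here is a rough sketch of what combining per-exporter environment fragments could look like. It is not the prototype mentioned above. It assumes each fragment is already parsed into a dict with a top-level `targets` mapping, ignores every other key, and treats a target name defined by two exporters as an error; `merge_environments` and the exporter and target names are hypothetical.

```python
from typing import Any, Dict, Mapping


def merge_environments(fragments: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Merge the 'targets' sections of several exporter-provided fragments."""
    merged: Dict[str, Any] = {"targets": {}}
    origin: Dict[str, str] = {}  # target name -> exporter that defined it
    for exporter in sorted(fragments):
        for target, definition in fragments[exporter].get("targets", {}).items():
            if target in origin:
                # Assumption: duplicate target names are rejected, not overridden.
                raise ValueError(
                    f"target {target!r} defined by both "
                    f"{origin[target]!r} and {exporter!r}"
                )
            merged["targets"][target] = definition
            origin[target] = exporter
    return merged


fragments = {
    "exporter-1": {"targets": {"board-a": {"resources": {"RemotePlace": {"name": "board-a"}}}}},
    "exporter-2": {"targets": {"board-b": {"resources": {"RemotePlace": {"name": "board-b"}}}}},
}
merged_env = merge_environments(fragments)
print(sorted(merged_env["targets"]))
```

How conflicts should really be handled (reject, namespace by exporter, or let one side win) is a design question this sketch does not answer.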
-
Disclaimer: I converted this into a discussion; unfortunately the conversion does not take the answer order into account…
Answering the questions:
1. Resource Discovery Confusion
The separation between resources and places was an intentional choice, since Pengutronix operates a lab with removable hardware but fixed resources for places. This means we have to separate the physical connections (resources) from the actual requirement for the hardware (place) and need to be able to assemble them as needed. If you have a 1:1 DUT-to-exporter configuration, you can take advantage of the matches as well by allocating all resources of a group to a place via a wildcard match, i.e.
But mostly your users should not care about resources; they should care about places.
2. Redundancy Between Groups and Places
We envisioned the resources to be aggregated by physical location, i.e. if you have a rackspace with serial connection, ethernet and power, these would all be under the same group. Aggregating all possible USB resources for a USB hub is also possible, even if it just helps to make the hierarchy clearer. As explained above, with wildcard matches the distinction between groups and places is required.
3. Environment on Client Side
An environment should only be required if you want to run a set of scripts or a test suite on the device; in that case it should be kept with the respective code. Most interactive usage via the labgrid client should not require the setup of an environment configuration file.
4. SSH and Access Management Complexity
We need an authenticated connection to the exporter to execute commands, and SSH was a natural choice for that. Rolling out the SSH configuration for users depends on your exporter DNS setup, but can ideally be a three-line configuration, i.e.:
5. Configuration Management Problem
While clients require a working SSH configuration, nothing else should be required for interactive usage of labgrid via labgrid-client.
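As an illustration of the wildcard matching mentioned under points 1 and 2, the following sketch selects all resources of one group with a glob-style pattern. It assumes resources are addressed as `exporter/group/class/name` and that each path component is compared separately with shell-style wildcards; the helper `matches` and the example resource paths are made up for this example and are not labgrid's implementation.

```python
from fnmatch import fnmatchcase
from typing import List


def matches(pattern: str, resource_path: str) -> bool:
    """Compare a pattern to a resource path, one '/'-separated component at a time."""
    pattern_parts = pattern.split("/")
    resource_parts = resource_path.split("/")
    if len(pattern_parts) > len(resource_parts):
        return False
    # A shorter pattern only constrains the leading components.
    return all(fnmatchcase(r, p) for p, r in zip(pattern_parts, resource_parts))


resources: List[str] = [
    "exporter-1/board-a/NetworkSerialPort/serial",
    "exporter-1/board-a/NetworkPowerPort/power",
    "exporter-1/board-b/NetworkSerialPort/serial",
]
selected = [r for r in resources if matches("*/board-a/*", r)]
print(selected)
```

With a 1:1 DUT-to-group layout, a single pattern of this kind is enough to bind a whole group to a place.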
-
Hi everyone,
I would like to share my current experience and some challenges I encountered while designing and scaling a labgrid setup. My intention is to start a discussion and explore how we can improve usability, especially for larger deployments.
Target Setup
The setup I am working on is designed for approximately 40 clients and 5 exporters, distributed across multiple locations.
These users are not only embedded Linux engineers. They include board designers, electrical engineers, and application-level developers. Therefore, it is important that they can use the system without needing to understand labgrid internals in detail.
Current Client Experience
Let’s walk through the typical flow from a client perspective.
1. Resource Discovery Confusion
After installation, the user tries to list available resources and sees something like:
At this point, the concept of “groups” becomes confusing.
This creates an unnecessary learning step and introduces ambiguity.
2. Redundancy Between Groups and Places
There are two common ways to structure groups:
a) By hardware type:
This structure provides no information about which resources belong to which device.
b) By device:
This is much clearer and easier to understand.
However, even when groups are already well-structured, users still need to redefine the same mapping again as “places” on the client side.
This creates redundancy, additional configuration effort and confusion for new users.
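When groups are already organized by device, the place definitions could in principle be derived from them instead of being written a second time. The sketch below shows that idea only. It assumes the group names exported by each exporter are available as plain strings, and it emits one `exporter/group/*` style pattern per place; `places_from_groups` and all names in the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence


def places_from_groups(exported: Mapping[str, Sequence[str]]) -> Dict[str, List[str]]:
    """Derive one place per group name, with one match pattern per exporter."""
    places: Dict[str, List[str]] = defaultdict(list)
    for exporter in sorted(exported):
        for group in exported[exporter]:
            # A group name seen on several exporters yields several patterns
            # for the same place (the multi-exporter case).
            places[group].append(f"{exporter}/{group}/*")
    return dict(places)


exported_groups = {
    "exporter-1": ["board-a", "board-b"],
    "exporter-2": ["board-c"],
}
auto_places = places_from_groups(exported_groups)
for place, patterns in auto_places.items():
    print(place, patterns)
```

Explicitly defined places would still be needed when a place should combine differently named groups.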
3. Environment on Client Side
Another major issue is the need for an environment on the client side.
Each client is expected to:
In practice:
This leads to duplication, inconsistency and maintenance overhead. For a setup with many users, this is not scalable.
4. SSH and Access Management Complexity
Each client also needs:
NetworkService with a username
This introduces two challenges:
5. Configuration Management Problem
Effectively, each client needs a local configuration database, including the labgrid environment configuration and SSH configuration. Managing this becomes especially difficult for users who are not familiar with these concepts.
Summary of Current Workflow
Today, a typical client must:
This is a complex process, especially for non-expert users.
Proposed Improvements
To simplify the experience and improve scalability, I suggest the following:
1. Simplify Resource Abstraction
2. Centralize Configuration
3. Improve SSH Handling
4. Reduce Client Responsibilities
The ideal client workflow should be:
No deep knowledge of labgrid internals should be required.
Conclusion
The goal is to make labgrid accessible and easy to use for all clients, not only those familiar with its internal architecture.
By reducing configuration overhead and centralizing responsibilities, we can improve usability, reduce maintenance effort and scale more effectively across teams and locations.
I would especially like to hear from teams working on similar large-scale setups.
It would be very valuable to discuss these use cases and experiences.
Looking forward to your feedback.