Challenges and Improvements for Large-Scale Labgrid Deployments and Client Usability #1861

ozan956 · 2026-04-27T10:40:49Z

Hi everyone,

I would like to share my current experience and some challenges I encountered while designing and scaling a labgrid setup. My intention is to start a discussion and explore how we can improve usability, especially for larger deployments.

Target Setup

The setup I am working on is designed for approximately 40 clients and 5 exporters, distributed across multiple locations.

These users are not only embedded Linux engineers. They include board designers, electrical engineers, and application-level developers. Therefore, it is important that they can use the system without needing to understand labgrid internals in detail.

Current Client Experience

Let’s walk through the typical flow from a client perspective.

1. Resource Discovery Confusion

After installation, the user tries to list available resources and sees something like:

adi-test/group-1/NetworkSerialPort
adi-test/group-1/NetworkUSBDebugger
adi-test/group-1/NetworkPowerPort
adi-test/group-2/NetworkSerialPort
adi-test/group-2/NetworkUSBDebugger
adi-test/group-2/NetworkPowerPort

At this point, the concept of “groups” becomes confusing.

Users may assume they can directly use these group names.
However, they later learn that resources must be mapped into "places" before use.

This creates an unnecessary learning step and introduces ambiguity.

2. Redundancy Between Groups and Places

There are two common ways to structure groups:

a) By hardware type:

adi-test/consoles/NetworkSerialPort
adi-test/debuggers/NetworkUSBDebugger
adi-test/power-ports/NetworkPowerPort

This structure provides no information about which resources belong to which device.

b) By device:

adi-test/SC598_EZKIT-01/NetworkSerialPort
adi-test/SC598_EZKIT-01/NetworkPowerPort
adi-test/SC598_EZKIT-01/NetworkUSBDebugger

This is much clearer and easier to understand.

However, even when groups are already well-structured, users still need to redefine the same mapping again as “places” on the client side.

This creates redundancy, additional configuration effort and confusion for new users.

3. Environment on Client Side

Another major issue is the need for an environment on the client side.

Each client is expected to:

Create and maintain their own environment
Define tool configurations (e.g., OpenOCD)
Specify coordinator settings
Configure access details

In practice:

These environments are mostly identical for all users
Boards and setups rarely change
Yet every user maintains their own copy

This leads to duplication, inconsistency and maintenance overhead. For a setup with many users, this is not scalable.

4. SSH and Access Management Complexity

Each client also needs:

SSH access to exporters
Proper SSH configuration
A correctly defined NetworkService with a username

This introduces two challenges:

SSH setup must be repeated for each user and exporter
Environment files become user-specific, which reduces reusability

5. Configuration Management Problem

Effectively, each client needs a local configuration database, including the labgrid environment configuration and SSH configuration. Managing this becomes especially difficult for users who are not familiar with these concepts.

Summary of Current Workflow

Today, a typical client must:

Install labgrid
Understand concepts like "groups" and "places"
Identify and acquire a place
Create or obtain an environment configuration
Configure SSH access
Customize settings (e.g., username for NetworkService)
Finally connect and use the board

This is a complex process, especially for non-expert users.

Proposed Improvements

To simplify the experience and improve scalability, I suggest the following:

1. Simplify Resource Abstraction

Make the distinction between groups and places clearer, or unify them
Allow exporters to define ready-to-use “places”
Clients should not need to remap resources manually

2. Centralize Configuration

Move environment configuration to the exporter side
Maintain a shared configuration for all users
Avoid duplication across clients

3. Improve SSH Handling

Simplify or centralize SSH configuration
Reduce the need for per-user customization
Align CLI and pytest workflows

4. Reduce Client Responsibilities

The ideal client workflow should be:

Install labgrid
Acquire a place
Connect to debugger or console
Use the device

No deep knowledge of labgrid internals should be required.

Conclusion

The goal is to make labgrid accessible and easy to use for all clients, not only those familiar with its internal architecture.

By reducing configuration overhead and centralizing responsibilities, we can improve usability, reduce maintenance effort and scale more effectively across teams and locations.

I would especially like to hear from teams working on similar large-scale setups.

How do you manage configurations across many users?
How do you handle SSH access and permissions at scale?
Do you centralize configurations, or keep them client-side?
What are your main pain points and best practices?

It would be very valuable to discuss these use cases and experiences.

Looking forward to your feedback.

sjg20 · 2026-04-28T18:46:48Z

Thanks for writing this up. My brief comments:

1. I tend to use roles, which are kind of a superset of places. I haven't tried to use groups.
2. Yes, the separate configuration doesn't seem well motivated. In my case I have it on an NFS share, but it isn't ideal. I agree it should be unified, e.g. by allowing the 'environment' to go in the export file, with perhaps some overrides in the client. In my case I use a single environment file for the whole lab, which has required patching Labgrid.
3. That would be nice! What are you suggesting? A sort of relay on the exporter?
4. Agreed.

Re your questions (I have various PRs pending):

Q1: How do you manage configurations across many users?
I use an NFS share which all machines can see

Q2: How do you handle SSH access and permissions at scale?
Just copying credentials around...not great

Q3: Do you centralize configurations, or keep them client-side?
The config is centralised - with the export and environment yaml (and a list of places) in the same NFS directory.

Q4: What are your main pain points and best practices?
Getting traction on PRs. For now I am just doing my own thing.

My lab is here:
https://lab.u-boot.org/

ozan956 · 2026-05-01T14:18:57Z

Author

Hi Simon,

Thank you very much for your detailed feedback!

On roles vs. groups/places
What I was trying to highlight is that we currently have groups on the exporter side and places on the client side, and in practice these are usually initialized to match each other. Because of that, having a single naming or classification mechanism (ideally defined on the exporter side) could simplify things significantly. It should be possible to configure everything on the exporter's side. I would be interested to hear more about how you use roles in your setup.
On configuration and access management
Yes, I assume you had to patch certain aspects (e.g., around network service username requirements). Are you effectively configuring all machines in the lab for a single user, or how are you structuring access? Did you propagate your own user across all systems, or do you treat each machine as an independent "client"? As you also mentioned, this becomes quite challenging to manage, especially across multiple offices and labs.
On client–exporter interaction
The specific problem I would like to solve is the direct communication path between the client and the exporter. In my view, the client should not need to reach the exporter directly, nor know exporter hostnames, SSH users, ports, or local permission details.

As you mentioned, one possible approach would be a relay/agent model where the client communicates only with the coordinator, and the coordinator forwards the requested operation to the relevant exporter-side agent. The exporter would then perform the local actions on the hardware.

Labgrid already follows a partially similar approach, as both exporters and clients connect to the coordinator. In practice, however, the coordinator is mainly responsible for resource discovery and reservation, while the actual execution still often involves direct client-to-exporter communication.

This model could be extended further by routing all actions through the coordinator, effectively turning exporters into backend agents. This would make labgrid easier to scale and easier to explain to non-expert users, as they would interact with "boards" or "targets" rather than exporter-specific infrastructure details.

Overall, I agree with your observation: many setups currently seem to rely on custom patches to achieve a stable workflow. Without them, it can be quite difficult to run a robust shared environment. That’s why I think it’s very valuable to discuss these use cases and potential improvements together.

sjg20 · 2026-05-02T12:01:01Z

Hi Simon,

Thank you very much for your detailed feedback!

Thanks for writing this up, Ozan!

On roles vs. groups/places
What I was trying to highlight is that we currently have groups on the exporter side and places on the client side, and in practice these are usually initialized to match each other. Because of that, having a single naming or classification mechanism (ideally defined on the exporter side) could simplify things significantly. It should be possible to configure everything on the exporter's side. I would be interested to hear more about how you use roles in your setup.

You're right that for single-exporter labs there's redundancy, and I assume most labs are like that. The split makes more sense when a place aggregates resources from multiple exporters (e.g. serial from one host, power switch from another), or when resources are moved between places without restarting the exporter. But for a typical lab where one exporter owns all the resources for a board, defining the same thing twice does feel awkward.

One option might be optional auto-creation of places from exporter groups, with explicit place definitions still supported for the multi-exporter case. That would simplify the common path without losing flexibility.
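A minimal sketch of what such auto-creation could look like, assuming resource paths of the form exporter/group/class as in the listings earlier in this thread. The function name `derive_places` and the policy of naming each place after its group are hypothetical and are not part of labgrid.

```python
from collections import defaultdict


def derive_places(resource_paths):
    """Map each exporter group to a place name and a wildcard match pattern."""
    groups = defaultdict(list)
    for path in resource_paths:
        exporter, group, cls = path.split("/")
        groups[(exporter, group)].append(cls)

    places = {}
    for (exporter, group), classes in sorted(groups.items()):
        # Hypothetical policy: the place is named after the group. Two exporters
        # using the same group name would collide and need an explicit place.
        if group in places:
            raise ValueError(f"group name {group!r} is used by more than one exporter")
        places[group] = {
            "match": f"{exporter}/{group}/*",
            "resources": sorted(classes),
        }
    return places


resources = [
    "adi-test/SC598_EZKIT-01/NetworkSerialPort",
    "adi-test/SC598_EZKIT-01/NetworkPowerPort",
    "adi-test/SC598_EZKIT-01/NetworkUSBDebugger",
    "adi-test/group-2/NetworkSerialPort",
]
places = derive_places(resources)
for name, info in places.items():
    print(name, "->", info["match"])
```

Explicit place definitions would still be needed whenever a place combines resources from several exporters, which this sketch deliberately does not attempt.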

In my lab I have ~40 boards (although a few are just QEMU), and roles let me use a command like ub-int bbb that doesn't need to know the underlying place name. They also let several roles share one place (e.g. different U-Boot builds for the same board, such as ff3399 and vbe).

PR: #1411

lab config:

e3b7760

On configuration and access management
Yes, I assume you had to patch certain aspects (e.g., around network service username requirements). Are you effectively configuring all machines in the lab for a single user, or how are you structuring access? Did you propagate your own user across all systems, or do you treat each machine as an independent "client"? As you also mentioned, this becomes quite challenging to manage, especially across multiple offices and labs.

Currently a single shared user ('gitlab-runner') on the 'runner' machine which can ssh to the exporter machine. It isn't great, but OK for my simple setup so far.

On client–exporter interaction
The specific problem I would like to solve is the direct communication path between the client and the exporter. In my view, the client should not need to reach the exporter directly, nor know exporter hostnames, SSH users, ports, or local permission details.

As you mentioned, one possible approach would be a relay/agent model where the client communicates only with the coordinator, and the coordinator forwards the requested operation to the relevant exporter-side agent. The exporter would then perform the local actions on the hardware.

Labgrid already follows a partially similar approach, as both exporters and clients connect to the coordinator. In practice, however, the coordinator is mainly responsible for resource discovery and reservation, while the actual execution still often involves direct client-to-exporter communication.

This model could be extended further by routing all actions through the coordinator, effectively turning exporters into backend agents. This would make labgrid easier to scale and easier to explain to non-expert users, as they would interact with "boards" or "targets" rather than exporter-specific infrastructure details.

Overall, I agree with your observation: many setups currently seem to rely on custom patches to achieve a stable workflow. Without them, it can be quite difficult to run a robust shared environment. That’s why I think it’s very valuable to discuss these use cases and potential improvements together.

Yes, I strongly agree :-) The current model where the client SSHes directly to the exporter is not ideal and is one of the main reasons I haven't made my lab available for interactive access. A coordinator-mediated execution model would be a significant improvement from this POV.

The hard part is probably the streaming use cases (console, video) where direct connection is currently used for latency. Those would need a forwarding mechanism through the coordinator. But for the request/response style operations (power, IO, file transfer) it should be easy enough to route through the coordinator.

This would also make it possible to expose labgrid through a higher-level service without users needing direct SSH credentials at all.

One other thing to mention is that I found it painful to have to fully specify each USB device. When something is broken, it is hard to figure out which USB hub/port to replug, etc. So I created this:

#1854

It may be that my USB-heavy lab is unusual, not sure.

Simon

dlech · 2026-05-02T17:39:37Z

This model could be extended further by routing all actions through the coordinator,

Isn't this what the labgrid-exporter --isolated option does?

1 reply

Emantor May 6, 2026
Maintainer

No, labgrid-exporter --isolated instructs labgrid to open an SSH tunnel to the exporter since it is not directly reachable on the local network (therefore isolated).

ozan956 · 2026-05-02T19:34:55Z

Author

This model could be extended further by routing all actions through the coordinator,

Isn't this what the labgrid-exporter --isolated option does?

I think you are right that labgrid-exporter --isolated is an important part of the solution, and it may already cover many practical cases. My understanding is that --isolated mainly improves the network access model. That is very useful for isolation.

The part I am still trying to separate is whether this also solves the execution model. As far as I understand it, the coordinator still mainly handles discovery, state, release, reservations, etc. Actual resource access or command execution still seems to happen through an SSH or proxy path from the client side.

For example, operations such as bootstrap still appear to run through the client-side driver. With --isolated, the SSH path may go through the proxy or exporter instead of a directly reachable resource host, but it is still not quite the same as the coordinator dispatching the action to an exporter.

This also leaves some user and permission questions. A shared exporter user such as labgrid can make setup simpler, but it can make auditing and per-user isolation weaker. For example, multiple users writing as the same Unix user could create overwrite or cleanup problems.

There is a similar issue with --isolated when user identity is passed through end to end. In that setup, the client connects to the proxy as their own user and then reaches the exporter as the same user. This is good for traceability, but it still means every exporter needs the relevant users and keys. As the number of users and exporters grows, the key distribution and permission management problem remains. So both approaches have trade-offs.

So I currently see two related but separate layers: network isolation, which --isolated addresses, and coordinator-mediated action execution, which would be a different model. In that second model, clients would send action requests to the coordinator, and the coordinator would dispatch them to the relevant exporter. That could reduce the need for client SSH access to exporters and make the user-facing workflow simpler.
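The second layer can be sketched as a toy model. Everything here is hypothetical: the `Coordinator` and `ExporterAgent` classes, their methods and the action strings only illustrate the proposed request flow and do not correspond to labgrid's current coordinator or exporter code.

```python
from dataclasses import dataclass, field


@dataclass
class ExporterAgent:
    """Hypothetical exporter-side agent that performs actions on local hardware."""
    name: str
    log: list = field(default_factory=list)

    def execute(self, place, action):
        # A real agent would drive the power switch, serial port, etc. here.
        self.log.append((place, action))
        return f"{self.name}: {action} done on {place}"


@dataclass
class Coordinator:
    """Hypothetical coordinator that checks ownership before dispatching."""
    exporters: dict  # place name -> ExporterAgent
    owners: dict = field(default_factory=dict)  # place name -> user

    def acquire(self, user, place):
        if self.owners.get(place) not in (None, user):
            raise PermissionError(f"{place} is held by {self.owners[place]}")
        self.owners[place] = user

    def request(self, user, place, action):
        # The client never learns the exporter's hostname or SSH details.
        if self.owners.get(place) != user:
            raise PermissionError(f"{user} has not acquired {place}")
        return self.exporters[place].execute(place, action)


agent = ExporterAgent("exporter-1")
coord = Coordinator(exporters={"PLACE1": agent})
coord.acquire("alice", "PLACE1")
result = coord.request("alice", "PLACE1", "power cycle")
print(result)
```

In this model the coordinator is the only endpoint a client talks to, so per-user checks and auditing would live in one place. Streaming resources such as consoles would still need a separate forwarding path.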

Of course, this would be a larger change, and some use cases may still need a more direct path for latency and throughput. But request-response operations such as power control, file transfer, bootstrap, or some hardware actions might be interesting candidates.

I may also be missing part of the intended model here, so I would be interested in your view. Do you think --isolated is already enough for this kind of multi-user setup, or do you see value in adding some form of coordinator-mediated execution for selected (or all) actions?

dlech · 2026-05-02T22:13:13Z

I may also be missing part of the intended model here, so I would be interested in your view. Do you think --isolated is already enough for this kind of multi-user setup, or do you see value in adding some form of coordinator-mediated execution for selected (or all) actions?

I don't know. I'm a few steps behind you and Simon. I've just been using local setups so far, so I haven't had to deal with that issue yet.

At BayLibre, we've been looking at using headscale/tailscale to make a virtual network for all exporters/coordinators/clients. This adds another layer of admin and potential problems though.

In general, I've run into most of the same issues/pain points listed in the OP.

Points 1 and 2 I don't see as such a big deal since just one knowledgeable person has to set those up in the first place, then anyone can use them without having to set them up again.

For points 3 and 5, the direction I have been going is putting as much as possible in the .yaml config file. For example, you can put the place name and the coordinator hostname in there, so one only has to export LG_ENV=my-config.yaml and then they can use labgrid-client without extra args or other environment variables.

This also includes a Python file for strategy definitions. So for distributing, I've been looking at making a Python project and distributing it with git. This also includes a pyproject.toml file for setting up a venv with labgrid. So steps for letting someone else use it would be:

git clone ...
cd my-project
uv sync (or whatever your Python tool of choice is)
. .venv/bin/activate
export LG_ENV=my-config.yaml

Point 4 I haven't got to yet, but I am not looking forward to having to set up a user for everyone I want to share my exporters with. My thought here is to just create a labgrid user instead and have everyone use that. And it still requires getting a public key from every user and installing it on the exporter.

Another pain point for me has been setting up the exporters. I would ideally like to have one exporter per system under test (reasoning being to minimize disruptions to other users when tweaking a config), which means lots of exporters. I like the idea of using docker containers for this. One thing I haven't solved there is how best to deal with hot-pluggable devices (maybe something like this) and you still have to deal with udev rules on the host system, so it isn't quite as simple as spinning up a container.

1 reply

Emantor May 6, 2026
Maintainer

Point 4 I haven't got to yet, but I am not looking forward to having to set up a user for everyone I want to share my exporters with. My thought here is to just create a labgrid user instead and have everyone use that. And it still requires getting a public key from every user and installing it on the exporter.

SSH access for each user is required since labgrid currently does not provide any authentication for users.

Another pain point for me has been setting up the exporters. I would ideally like to have one exporter per system under test (reasoning being to minimize disruptions to other users when tweaking a config), which means lots of exporters.

While users will be disturbed by an exporter restart since ser2net often needs to be restarted, the coordinator will reacquire the resources as they appear to minimize user disruption.

I like the idea of using docker containers for this. One thing I haven't solved there is how best to deal with hot-pluggable devices (maybe something like this) and you still have to deal with udev rules on the host system, so it isn't quite as simple as spinning up a container.

The docker containers were an external contribution that is especially useful for the coordinator, but for exporters udev access is almost always required which makes containers not really feasible for them.

sjg20 · 2026-05-03T22:31:05Z

Quick update: I've made the lab available for interactive remote access since my earlier comment, and a few things have shaken out that are relevant to this discussion.

Isn't this what the labgrid-exporter --isolated option does?

Agreeing with @ozan956 - --isolated helps with network isolation (proxying through the exporter) but doesn't change the execution model. The client still drives drivers locally and SSHes to the exporter to run individual commands. Per-user identity propagation, audit trails, and exposing labgrid as a service all still need true coordinator-mediated execution.

What I've set up since the previous comment

I now have a working remote-access setup that addresses some (not all) of the pain points without needing protocol changes:

Network: self-hosted headscale (open-source Tailscale control server) on a public host. Lab + clients all join the tailnet, which gives a stable IP that works through NAT. No port forwarding, no per-client config.
Auth: a single shared labgrid UNIX account on the lab host, with each user authenticating via their own SSH key. Each authorized_keys entry forces a small wrapper script (via command="...") that refuses interactive shells and logs every command to an audit file, attributed to the user. Revocation = remove one line.
Serial audit: small labgrid changes I want to discuss separately upstream:
- New optional user field in ExporterSetAcquiredRequest so the exporter knows who acquired a place
- LG_SERIAL_TRACE_DIR env var that, when set, makes the exporter pass trace-both to the per-acquire ser2net. Result: per-board, per-user serial trace files capturing every byte both ways
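The forced-command idea from the Auth item can be sketched as follows. This is not the wrapper used in the lab described above, which is not shown in this thread; the function names, the audit file location and the log format are assumptions made for illustration. It relies on sshd exporting the client's requested command in the SSH_ORIGINAL_COMMAND environment variable when a forced command is configured.

```python
import datetime
import os
import shlex
import subprocess
import sys
import tempfile


def handle(user, original_command, audit_path, run=subprocess.call):
    """Log the command requested by `user`, refuse interactive shells, then run it."""
    if not original_command:
        # No requested command means the client asked for an interactive shell.
        print("interactive shells are not permitted", file=sys.stderr)
        return 1
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    with open(audit_path, "a", encoding="utf-8") as audit:
        audit.write(f"{stamp} {user} {original_command}\n")
    # Run without a shell so the logged text is exactly what gets executed.
    return run(shlex.split(original_command))


def main(argv, environ):
    """Entry point for the forced command; argv[1] is the user named in authorized_keys."""
    command = environ.get("SSH_ORIGINAL_COMMAND", "")
    # Hypothetical audit file location.
    return handle(argv[1], command, os.path.expanduser("~/labgrid-audit.log"))


# Demonstration with a fake runner so that nothing is actually executed.
with tempfile.TemporaryDirectory() as tmp:
    audit_file = os.path.join(tmp, "audit.log")
    calls = []
    fake_run = lambda argv: calls.append(argv) or 0
    refused = handle("alice", "", audit_file, run=fake_run)
    allowed = handle("alice", "echo hello", audit_file, run=fake_run)
    with open(audit_file, encoding="utf-8") as f:
        audit_lines = f.read().splitlines()
```

An installed copy would end with `sys.exit(main(sys.argv, os.environ))` and be referenced from each key's authorized_keys entry in the form `command="/path/to/wrapper alice" ssh-ed25519 ...`, where the path and user name are placeholders.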

None of this changes the fundamental architecture (it's still SSH/TCP under the hood) but it gives a usable remote workflow today and per-user accountability. It also demonstrates the value of "clients only need one well-known endpoint" - which is essentially what coordinator-mediated execution would give us natively, without needing a VPN.

Lab access for anyone who wants to try

Minimal client setup is now small - no U-Boot tree required for read-only access:

Install tailscale and email me your desired username + SSH public key
I send back a tailscale preauth key
sudo tailscale up --login-server=https://headscale.u-boot.org --auth-key=<key>
Install labgrid from U-Boot integration [WIP] #1411 (the U-Boot integration branch - the lab's drivers come from this PR, and it ships an example_env.cfg already configured for the kea coordinator with all ~50 boards as RemotePlace entries):
```
git clone https://github.com/labgrid-project/labgrid.git
cd labgrid && gh pr checkout 1411
pip install --user .
```

Three environment variables:

```
export PATH=$HOME/.local/bin:$HOME/labgrid/contrib/u-boot:$PATH
export LG_COORDINATOR=100.64.0.1:20408
export LG_ENV=$HOME/labgrid/contrib/u-boot/example_env.cfg
```

Then run labgrid-client console -p bbb (or any of the ~50 boards), or ub-int -T bbb for the full strategy without rebuilding U-Boot.

Total client footprint is a few MB; no U-Boot tree, no buildman, no per-machine setup. Access is currently manual rather than self-service; could be opened up with OIDC if there's interest. Documented in contrib/u-boot/index.rst of #1411.

--

Now I'm wondering what it would take to keep the environment in the exporter...

ozan956 · 2026-05-05T12:23:05Z

Author

Hi David,

Points 1 and 2 I don't see as such a big deal since just one knowledgeable person has to set those up in the first place, then anyone can use them without having to set them up again.

I believe the challenge begins when you need to manage or modify the setup over time. For example, whenever a new board is introduced, all clients need to update their environments accordingly if they want to use it.

Centralizing this responsibility by keeping more of the configuration on the exporter side would be valuable. It allows a smaller group of “expert” users to handle configurations, while others can focus on using the system. This separation can scale well within a team.

For points 3 and 5, the direction I have been going is putting as much as possible in the .yaml config file...

This approach certainly improves usability, but it still has some limitations. For instance, if your environment file defines multiple targets and you attempt to acquire a single one using:

labgrid-client -c env.yaml -p PLACE1 acquire

Labgrid initially uses -p PLACE1 to select the matching role from the environment. However, during the acquire phase, it switches to operating on all RemotePlace entries defined in the environment, rather than only PLACE1. As a result, it becomes impossible to acquire a single place in isolation without bypassing the environment file and interacting directly with the coordinator.

This behavior suggests that the environment file is fundamentally designed to be client-oriented and tailored to specific use cases, rather than serving as a shared, scalable abstraction. I also think environment files are meant to exist as separate instances per use case and per user (several environment files on each user's machine).

Point 4 I haven't got to yet...

I had similar thoughts initially: using a shared user account (e.g., a labgrid user) for all clients. However, this introduces another issue. When using commands like:

labgrid-client -c env.yaml -p PLACE1 scp images/u-boot-spl :u-boot-spl

the file is copied to a default location such as ~/u-boot-spl on the remote system. If multiple users share the same account and perform similar operations, they may unintentionally overwrite each other’s files.

To mitigate this, users would need to follow strict conventions, such as using separate directories (e.g., /home/labgrid/user1/). However, this shifts the burden onto users and introduces potential for human error, as there is no built-in isolation mechanism.
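One way to take that convention out of users' hands would be a small helper that derives the destination automatically. This is only a sketch of the idea: `per_user_destination` is a hypothetical name, the base directory is taken from the example above, and nothing like it exists in labgrid-client today.

```python
import posixpath


def per_user_destination(user, filename, base="/home/labgrid"):
    """Return a remote path inside a per-user directory of the shared account."""
    if not user or "/" in user or user in (".", ".."):
        raise ValueError(f"unsafe user name: {user!r}")
    # Keep only the file name so local directory layout does not leak into the path.
    return posixpath.join(base, user, posixpath.basename(filename))


dest = per_user_destination("user1", "images/u-boot-spl")
print(dest)  # /home/labgrid/user1/u-boot-spl
```

Two users copying the same file name would then land in different directories, although cleanup and quota questions for the shared account remain open.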

Another pain point for me has been setting up the exporters...

That's an interesting idea. Thanks for bringing this up. I will give it more thought.

ozan956 · 2026-05-05T12:51:23Z

Author

Hi Simon,

Now I'm wondering what it would take to keep the environment in the exporter...

So we would be moving from:

exporter: knows local hardware and exports resources
coordinator: knows places and matching
client: builds targets, drivers, and strategies

to:

exporter: knows hardware, target configuration, drivers, and strategies
coordinator: still manages place ownership and matching
client: becomes much thinner and mostly consumes what the exporter/coordinator provides

We would need a significant design change.

On the client side, the environment defines the "targets." These targets are bound to "places," which are created by clients and managed by the coordinator.

1 - We would need to move "targets" to the exporter. The client would then need to fetch the target, so the _get_target() logic would require a major redesign.

2 - The tricky part would be the strategies. There will likely be multiple design approaches to discuss.

3 - This change would also remove the ability to use multiple devices from multiple exporters within a single place. However, I am not sure whether there is a real use case where someone needs multiple resources from different exporters in one place, so this may not be a significant issue.

There are likely many other points to consider. In the end, it is doable, but it will be challenging.

sjg20 · 2026-05-05T14:08:00Z

Yes this is not easy. I have been prototyping a way for exporters to send their environment file to the coordinator, which then merges them into a single file, which it then provides to clients. Once I've tested it a bit more I will see if I can create a clean PR.
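The merge step could look roughly like the sketch below. It is not the prototype mentioned above, whose code is not shown here; it assumes each exporter contributes a fragment with a top-level `targets` mapping (as in a labgrid environment file), and `merge_environments` and the exporter names are hypothetical.

```python
def merge_environments(per_exporter):
    """Merge per-exporter fragments of the form {'targets': {...}} into one dict."""
    merged = {"targets": {}}
    origin = {}  # role name -> exporter that defined it
    for exporter, env in sorted(per_exporter.items()):
        for role, target in env.get("targets", {}).items():
            if role in merged["targets"]:
                # Refuse silently overriding a role defined elsewhere.
                raise ValueError(
                    f"role {role!r} defined by both {origin[role]} and {exporter}"
                )
            merged["targets"][role] = target
            origin[role] = exporter
    return merged


fragments = {
    "exporter-a": {"targets": {"bbb": {"resources": {"RemotePlace": {"name": "bbb"}}}}},
    "exporter-b": {"targets": {"ff3399": {"resources": {"RemotePlace": {"name": "ff3399"}}}}},
}
merged = merge_environments(fragments)
print(sorted(merged["targets"]))
```

Rejecting duplicate role names is one possible policy; a real design would also have to decide how client-side overrides and strategy files fit in.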

Emantor · 2026-05-06T13:41:24Z

Maintainer

Disclaimer: I converted this into a discussion, unfortunately the conversion does not take the answer order into account…

Answering the questions:

1. Resource Discovery Confusion

The separation between resources and places was an intentional choice since Pengutronix operates a lab with removable hardware but fixed resources for places. This means we have to separate the physical connections (resources) from the actual requirement for the hardware (place) and need to be able to assemble them as needed.

If you have a 1:1 DUT-to-exporter configuration, you can take advantage of the matches as well by allocating all resources of a group to a place via a wildcard match, i.e.

labgrid-client -p testplace add-match adi-test/group-1/*

But mostly your users should not care about resources, they should care about places.
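To illustrate what the wildcard in the add-match command above selects, here is a small model of the matching. It assumes shell-style wildcards applied to each path component separately; labgrid's actual matching code may differ in details (for example, an optional fourth name component), so treat this as an approximation.

```python
from fnmatch import fnmatchcase


def matches(pattern, resource_path):
    """Compare a match pattern with a resource path, component by component."""
    pattern_parts = pattern.split("/")
    path_parts = resource_path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat) for pat, part in zip(pattern_parts, path_parts))


resources = [
    "adi-test/group-1/NetworkSerialPort",
    "adi-test/group-1/NetworkUSBDebugger",
    "adi-test/group-1/NetworkPowerPort",
    "adi-test/group-2/NetworkSerialPort",
]
selected = [r for r in resources if matches("adi-test/group-1/*", r)]
print(selected)
```

All three group-1 resources are selected and the group-2 serial port is not, which is the behaviour a place per DUT relies on.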

2. Redundancy Between Groups and Places

We envisioned the resources to be aggregated by physical location, i.e. if you have a rackspace with a serial connection, Ethernet and power, these would all be under the same group. Aggregating all possible USB resources for a USB hub is also possible, even if it just helps to make the hierarchy clearer.

As explained above, with wildcard matches the distinction between groups and places is required.

3. Environment on Client Side

An environment should only be required if you want to run a set of scripts or a test suite on the device, in that case it should be kept with the respective code.

Most interactive usage via the labgrid client should not require the setup of an environment configuration file.
The environment configuration files also allow a purely local setup by defining all local resources within them. The idea was to allow a seamless move from a local setup on the developer's desk into the remote infrastructure by employing the RemotePlace resource, where all resource information is fetched from the coordinator.

4. SSH and Access Management Complexity

We need an authenticated connection to the exporter to execute commands and SSH was a natural choice for that.
Our expectation is that multi-user and multi-exporter deployments will be managed by a configuration management solution as well, which can handle the roll-out of the SSH keys for users too.

Rolling out the SSH configuration for users depends on your exporter DNS setup, but can ideally be a three-line configuration, e.g.:

Host *.exporters.mycompanyinternaldomain.com
  User <myusername>
  IdentityFile ~/.ssh/my-company-identity

NetworkService should point to the DUT and not the exporter.

5. Configuration Management Problem

While clients require a working SSH configuration, everything else should not be required for interactive usage of labgrid via labgrid-client.

6 replies

Emantor May 6, 2026
Maintainer

You are also able to bootstrap, fastboot, io, video, write-files, write-image, …

Yes, strategy usage currently requires a local configuration; this is unfortunately necessary since it needs to be loaded from a local file.

dlech May 6, 2026

I guess the board I am using is an odd one then.

It requires a custom strategy for bootstrap because it requires a vendor tool for that.

power doesn't work for me without a config because I don't have a resource with PowerProtocol, but rather a GPIO that needs a driver matched to it to turn it into PowerProtocol.

And using io is quite tedious, e.g. it would take 4 commands to get ready for bootstrap. Of course one could write a script for this, but then it is not much different between sharing the script and sharing the strategy between clients.

Anyway, point is that there is a need for an easy way to share config/strategy between clients because in some cases you can't do much without it.

An environment should only be required if you want to run a set of scripts or a test suite on the device, in that case it should be kept with the respective code.

This is the direction I have been going. And I tend to agree that this makes more sense to do it this way. We already have git that is much better suited to managing and distributing this sort of thing as opposed to the suggested change to put the environment on the exporter.

sjg20 May 6, 2026

This is the direction I have been going. And I tend to agree that this makes more sense to do it this way. We already have git that is much better suited to managing and distributing this sort of thing as opposed to the suggested change to put the environment on the exporter.

I have all the lab files in git, but I think an option to consider the environment to be 'part of the lab' would simplify things for some use cases.

ozan956 May 6, 2026
Author

Thank you for the explanations @Emantor.

I agree that the separation between resources and places makes sense for dynamic setups. My point is mostly about another common deployment model: one exporter (or one exporter group) owns all resources for a specific board. In that case, the group and the place often end up representing almost the same thing. For this common case, it would be helpful if labgrid could provide another path, for example by optionally creating places from exporter groups or by allowing the exporter/coordinator side to provide a ready-to-use target description. The more flexible resource/place model could still remain available for dynamic setups.
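A minimal sketch of what "creating places from exporter groups" could look like today as a client-side script: it derives one place per exporter group and prints the `labgrid-client` commands that would create the place and attach a wildcard match. It only generates command strings and is not an existing labgrid feature; the `place_commands` helper and the choice to reuse the group name as the place name are assumptions.

```python
def place_commands(resources):
    """Build create/add-match commands, one place per exporter group (hypothetical helper)."""
    groups = {}
    for path in resources:
        exporter, group, _cls = path.split("/", 2)
        # A dict keeps the first-seen order and drops duplicate groups.
        groups.setdefault((exporter, group), None)

    commands = []
    for exporter, group in groups:
        commands.append(f"labgrid-client -p {group} create")
        # The pattern is quoted so the shell does not expand the wildcard.
        commands.append(f"labgrid-client -p {group} add-match '{exporter}/{group}/*'")
    return commands


if __name__ == "__main__":
    resources = [
        "adi-test/SC598_EZKIT-01/NetworkSerialPort",
        "adi-test/SC598_EZKIT-01/NetworkPowerPort",
        "adi-test/SC598_EZKIT-01/NetworkUSBDebugger",
    ]
    for command in place_commands(resources):
        print(command)
```

For the device-based grouping from the original post, this yields one place named after the board, which is the 1:1 case described above.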

Regarding the environment file, I understand the intended separation. For simple usage, clients may be able to work without an environment. In practice, however, an environment has been required in many of the cases I encountered. As David said, console may work without an environment, but in most other cases the client needs configuration anyway. For example, if a setup uses OpenOCDDriver, the driver needs information such as the OpenOCD configs, script search path, images, etc. So even if the bootstrap command is available from the CLI, a real OpenOCD-based setup still needs to get this board-specific configuration from somewhere.

Right now, it is possible to pass extra arguments to labgrid-client bootstrap, but this only helps in a very narrow case. The arguments are parsed as key=value pairs only in the fallback path for NetworkAlteraUSBBlaster, where the client auto-creates an OpenOCDDriver. For a NetworkUSBDebugger, even though OpenOCDDriver can bind to it, the bootstrap fallback code does not create an OpenOCDDriver. So with no local environment and only a NetworkUSBDebugger, labgrid-client bootstrap ... ends with "target has no compatible resource available". That means the CLI argument path does not remove the need for an environment/configuration in this kind of setup.
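For readers unfamiliar with the key=value style mentioned here, a minimal sketch of that kind of parsing follows. It is an illustration only, not labgrid's actual implementation; the `parse_key_value_args` helper and the argument names and values in the example are placeholders.

```python
def parse_key_value_args(args):
    """Turn ["key=value", ...] into a dict, rejecting malformed entries."""
    parsed = {}
    for arg in args:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        parsed[key] = value
    return parsed


if __name__ == "__main__":
    # Placeholder arguments; a real board would need its own values.
    print(parse_key_value_args(["config=board.cfg", "search=/opt/openocd/scripts"]))
```

Even with such parsing in place, every user would still have to know and type the board-specific values, which is the gap a shared environment or target description would close.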

That's why I am questioning whether a fully client-side environment may be the best abstraction for shared labs. If the environment file is distributed through git and is not expected to be modified by individual users, then every client is effectively carrying a local copy of the same lab description. In that case, keeping this description closer to the lab infrastructure itself (on the exporter/coordinator side) seems more natural.

I also agree that SSH is a natural choice for authenticated execution, and configuration management can help with rolling out users, keys, and SSH configuration. My concern is that this still becomes an operational burden at scale. In a multi-user and multi-exporter setup, we still need to manage key distribution, rotation, revocation, user permissions, and auditing across all exporters. A shared Unix user simplifies this, but weakens per-user isolation and can lead to practical issues such as users overwriting each other’s files during scp or bootstrap. So I think configuration management can make the current model workable, but it does not fully solve the user-facing complexity. Reducing the need for direct client-to-exporter access would still make the system easier to scale and easier to use.

I am not suggesting that the current model should be removed. It is clearly useful for complex setups where places aggregate resources dynamically or where test code owns its own environment. My suggestion is that labgrid could also support another way to handle configuration and access for larger shared labs.

sjg20 May 12, 2026

I tried creating a way to send the environment from the coordinator to clients:

#1866

Replies: 11 comments · 8 replies
