CNTR-1: fix stale gNOI connection post reboot + Implement PushConfig for static bind. #5367
kjahed wants to merge 1 commit into openconfig:main
Conversation
Summary of Changes
This pull request improves the reliability of container lifecycle tests by replacing arbitrary wait times with active health checks post-reboot. Additionally, it extends the static binding functionality to include configuration pushing, which is required for proper client setup in certain vendor environments.
Pull Request Functional Test Report for #5367 / 603ca74
Virtual Devices
Hardware Devices
Code Review
This pull request replaces a static sleep with a polling loop to detect device reboots in container lifecycle tests and adds a PushConfig method to the static DUT binding. Feedback includes reducing the polling interval for better reliability, improving error messaging and failure handling in tests, implementing shorter timeouts for reachability checks to prevent blocking, and fixing a resource leak in the new PushConfig method by ensuring gRPC connections are properly closed.
```diff
 	// Wait for reboot.
-	time.Sleep(8 * time.Minute)
+	maxRebootTime := 8 * time.Minute
+	ticker := time.NewTicker(30 * time.Second)
```
A 30-second ticker interval might be too long for detecting a reboot, especially on virtual devices or fast-rebooting hardware. If the device reboots and comes back up within the 30-second window, the loop might miss the 'down' state and eventually timeout. Consider reducing the interval to 5 or 10 seconds for better reliability.
```diff
-	ticker := time.NewTicker(30 * time.Second)
+	ticker := time.NewTicker(10 * time.Second)
```
```go
for {
	select {
	case <-timeout:
		t.Fatalf("Timeout exceeded: DUT did not reboot within %v seconds.", maxRebootTime)
```
The error message 'DUT did not reboot' is slightly misleading if the device actually went down but failed to become reachable again. Additionally, using %v seconds with a time.Duration results in a redundant unit (e.g., '8m0s seconds'). Using t.Fatalf is preferred here as the failure makes subsequent test steps meaningless.
```diff
-		t.Fatalf("Timeout exceeded: DUT did not reboot within %v seconds.", maxRebootTime)
+		t.Fatalf("Timeout exceeded: DUT did not become reachable after reboot within %v.", maxRebootTime)
```
References
- In tests, t.Fatalf is preferred over t.Errorf when a failure makes subsequent test steps meaningless, as this fails fast and reduces overall test execution time.
```go
	case <-ticker.C:
		// use GNOI to refresh the stale cached connection post reboot.
		sysClient := dut.RawAPIs().GNOI(t).System()
		_, err := sysClient.Time(ctx, &gspb.TimeRequest{})
```
The Time RPC call uses the test context ctx, which has a long timeout (8 minutes). If the device is in a state where it accepts connections but hangs on RPCs, this call could block the polling loop for a long time. It's safer to use a shorter timeout for each individual reachability check.
```go
tctx, cancel := context.WithTimeout(ctx, 10*time.Second)
_, err := sysClient.Time(tctx, &gspb.TimeRequest{})
cancel()
```

```go
gnmiClient, err := d.DialGNMI(ctx)
if err != nil {
	return err
}
if _, err := gnmiClient.Set(ctx, setRequest); err != nil {
	return err
}
```
The call to d.DialGNMI(ctx) creates a new gRPC connection that is never closed, leading to a resource leak. Since PushConfig is an internal method of staticDUT, you should use dialConn directly to obtain the connection and ensure it is closed after the operation.
```diff
-	gnmiClient, err := d.DialGNMI(ctx)
-	if err != nil {
-		return err
-	}
-	if _, err := gnmiClient.Set(ctx, setRequest); err != nil {
-		return err
-	}
+	conn, err := dialConn(ctx, d, introspect.GNMI, nil)
+	if err != nil {
+		return err
+	}
+	defer conn.Close()
+	if _, err := gpb.NewGNMIClient(conn).Set(ctx, setRequest); err != nil {
+		return err
+	}
```
Currently, the first containerZ call post reboot fails with a "connection reset by peer" error due to the stale connection handle cached by Ondatra. This PR waits for the device to come back up post reboot by polling the device state (instead of the current arbitrary sleep), and polls the device using gNOI (instead of the usual gNMI) to ensure the connection handle is valid post reboot.
The PR also implements PushConfig for static binding, as it is required to create a new containerZ client for certain vendors.