Skip to content

Fix/exporter port#613

Merged
guvenc merged 3 commits intomainfrom
fix/exporter_port
Oct 17, 2024
Merged

Fix/exporter port#613
guvenc merged 3 commits intomainfrom
fix/exporter_port

Conversation

@vlorinc
Copy link
Copy Markdown
Contributor

@vlorinc vlorinc commented Oct 1, 2024

Proposed Changes

  • add flag to change grpc port
  • grpc tcp connection is now tested as first step
  • removed retries of socket connection test, first connection is tried in loop in retryInterval until success
  • retryInterval lowered to 5 seconds
  • update shutdown process:
    -- when connection to dpdk fails, process ends and doesn't retry to connect as before
    -- this way it can be handled by kubernetes and shown in restart counter
    -- added graceful shutdown of http server
  • udpate error handling:
    -- queryTelemetry function now returns error, this error indicates problems with dpdk telemetry socket and shuts down the program
    -- connection testing functions merged together to clean up code
  • check for uid to use correct dpdk telemetry socket

Fixes #610 Dpservice crashes due to early telemetry in multiport-eswitch mode:
Metrics are updated only after successful connection to grpc port.

@vlorinc vlorinc requested a review from a team as a code owner October 1, 2024 14:39
@vlorinc vlorinc linked an issue Oct 1, 2024 that may be closed by this pull request
@github-actions github-actions bot added bug Something isn't working size/L labels Oct 1, 2024
Copy link
Copy Markdown
Contributor

@PlagueCZ PlagueCZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covers everything needed and more.

Tested in OSC lab and based on Grafana output it is working right.

When dpservice pod gets restarted, exporter is now waiting for gRPC which is absolutely critical for multiport-eswitch due to a bug.

@guvenc
Copy link
Copy Markdown
Contributor

guvenc commented Oct 16, 2024

@vlorinc
This one has a conflict with main branch now. Would you please rebase to main ?
I think the conflict comes from the commit "Fix exporter for different NUMA nodes"
which is also part of this PR and that commit is already merged to main.

@vlorinc vlorinc closed this Oct 17, 2024
FlorinPeter and others added 3 commits October 17, 2024 11:03
Implemented a check to verify if TCP port 1337 on localhost is open before attempting to write to the DPDK connection. This ensures more robust error handling and potentially resolves connectivity issues earlier in the process.
- grpc tcp connection is now tested as first step
- removed retries of socket connection test, retry is run until success in retryInterval
- retryInterval lowered to 5 seconds
- update shutdown process:
-- when connection to dpdk fails, process ends and doesn't retry the connection
-- this way it can be handled by kubernetes and shown in restart counter
-- added graceful shutdown of http server
- udpate error handling:
-- queryTelemetry function now returns error, this error indicates problems with dpdk telemetry socket and shuts down the program
-- connection testing functions merged together to clean up code
- check for uid to use correct dpdk telemetry socket
@vlorinc vlorinc reopened this Oct 17, 2024
Copy link
Copy Markdown
Contributor

@guvenc guvenc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@guvenc guvenc merged commit 3ae9d11 into main Oct 17, 2024
@guvenc guvenc deleted the fix/exporter_port branch October 17, 2024 10:02
@hardikdr hardikdr added this to Roadmap Jun 26, 2025
@hardikdr hardikdr moved this to Done in Roadmap Oct 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/networking bug Something isn't working size/L

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Dpservice crashes due to early telemetry in multiport-eswitch mode

5 participants