Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(agw): AGW container restart check their health status #14008

Merged
merged 10 commits into from
Oct 11, 2022

Conversation

mpfirrmann
Copy link
Contributor

@mpfirrmann mpfirrmann commented Sep 23, 2022

Summary

The LTE integration tests use sleep timer to wait for a magma services to become functional again after restarting them as part of several tests. These sleep timers are based on empirical data for the systemd-based AGW and are not necessarily well suited to account for the containerized AGW's docker container to become healty again after a restart.
Instead, we want to check the container health as an indicator to continue with the tests after a restart (and at the same time leave the existing workflow for the systemd-based AGW as it is for now).

Some special thoughts to be considered here:

  • The container corresponding to the mme service is called oai_mme.
  • The sctpd service does not have a magma@ prefix. Nevertheless, it is started via the magmad service.
  • We want to avoid a double waiting time resolving these special cases

The PR is separated into ten commits:

  1. get_service_name_from_init_system() is added to handle the special naming cases.
  2. disable_service uses the correct service name and works with docker and systemd.
  3. enable_service uses the correct service name and works with docker and systemd.
  4. is_service_active() uses the correct name and checks for systemctl is-active {service_name} / docker container health status is "healthy".
  5. A function to check the health of ALL of magma's AGW docker containers is added (excl. magma's health container which is disabled for the tests).
  6. The wait for restart function checks if all containers are healthy (docker) or waits the given wait_time for a systemd-based AGW.
  7. All service restarts are handled by restart_services(["list", "of", "services"], time_wait: int).
  8. After restarting all services we wait for all docker containers to become healthy.
  9. We adapt the pipelined/datapath check for the containerized AGW.
  10. The restarts of AGW containers follows THIS scheme.

Closes #12956.

Test Plan

In its current state, all pre-commit and extended S1AP integration tests pass locally using a containerized and a systemd-based AGW.
Tested with GitHub actions (ongoing) to show the tests are still working:
https://github.com/mpfirrmann/magma/actions/runs/3113678195

Additional Information

  • This change is backwards-breaking

@mpfirrmann mpfirrmann requested review from a team and rsarwad September 23, 2022 15:07
@pull-request-size pull-request-size bot added the size/L Denotes a Pull Request that changes 100-499 lines. label Sep 23, 2022
@github-actions
Copy link
Contributor

Thanks for opening a PR! 💯

A couple initial guidelines

Howto

  • Reviews. The "Reviewers" listed for this PR are the Magma maintainers who will shepherd it.
  • Checks. All required CI checks must pass before merge.
  • Merge. Once approved and passing CI checks, use the ready2merge label to indicate the maintainers can merge your PR.

More info

Please take a moment to read through the Magma project's

If this is your first Magma PR, also consider reading

@github-actions github-actions bot added the component: agw Access gateway-related issue label Sep 23, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Sep 23, 2022

feg-workflow

    2 files  203 suites   40s ⏱️
374 tests 374 ✔️ 0 💤 0
388 runs  388 ✔️ 0 💤 0

Results for commit 2a1653c.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 23, 2022

dp-workflow

14 tests   14 ✔️  2m 20s ⏱️
  1 suites    0 💤
  1 files      0

Results for commit 2a1653c.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 23, 2022

agw-workflow

615 tests   611 ✔️  3m 58s ⏱️
    2 suites      4 💤
    2 files        0

Results for commit 2a1653c.

♻️ This comment has been updated with latest results.

@mpfirrmann mpfirrmann self-assigned this Sep 23, 2022
@mpfirrmann mpfirrmann linked an issue Sep 23, 2022 that may be closed by this pull request
4 tasks
@mpfirrmann mpfirrmann force-pushed the pr/docker_systemd_restart branch 4 times, most recently from af08b95 to 202384d Compare September 26, 2022 16:05
Copy link
Member

@VinashakAnkitAman VinashakAnkitAman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments

lte/gateway/python/integ_tests/s1aptests/s1ap_utils.py Outdated Show resolved Hide resolved
lte/gateway/python/integ_tests/s1aptests/s1ap_utils.py Outdated Show resolved Hide resolved
lte/gateway/python/integ_tests/s1aptests/s1ap_utils.py Outdated Show resolved Hide resolved
lte/gateway/python/integ_tests/s1aptests/s1ap_utils.py Outdated Show resolved Hide resolved
@mpfirrmann mpfirrmann force-pushed the pr/docker_systemd_restart branch 2 times, most recently from 41a610e to eba742f Compare September 28, 2022 09:44
@@ -1002,17 +1002,18 @@ def restart_all_services(self):
"sudo systemctl stop 'magma@*' 'sctpd' ;"
"sudo systemctl start magma@magmad",
)
self._wait_for_pipelined_to_initialize()

EXTRA_WAIT_TIME_FOR_OTHER_SERVICES_SECONDS = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can collect the constants defined in this class and make the class attributes similar to how this is handled in S1ApUtil or SubscriberUtil

Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
…ter `restart_all_services()`

Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
Signed-off-by: Marco Pfirrmann <marco.pfirrmann@tngtech.com>
@magma magma deleted a comment from github-actions bot Oct 10, 2022
Copy link
Member

@VinashakAnkitAman VinashakAnkitAman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: agw Access gateway-related issue size/L Denotes a Pull Request that changes 100-499 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make AGW service restarts aware of container status
5 participants