LTE integ tests fail while execute(_run_integ_tests, gateway_ip) #13631

Closed
LKreutzer opened this issue Aug 17, 2022 · 7 comments
Labels
component: ci All updates on CI (Jenkins/CircleCi/Github Action) LTE-Integration-Test Issues relating to LTE Integration Tests

Comments

@LKreutzer
Contributor

LKreutzer commented Aug 17, 2022

The LTE integ tests and the LTE integ tests bazel are failing on master with:

Warning: Unable to load SSH config file '/Users/runner/.ssh/config'


Fatal error: local() encountered an error (return code 2) while executing 'ssh -i /Users/runner/.vagrant.d/insecure_private_key -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -tt vagrant@127.0.0.1 -p 2201 'cd $MAGMA_ROOT/lte/gateway/python/integ_tests;  sudo ethtool --offload eth1 rx off tx off; sudo ethtool --offload eth2 rx off tx off; source ~/build/python/bin/activate; export GATEWAY_IP=192.168.60.142; make integ_test enable-flaky-retry=true ''

Aborting.

Warning: Unable to load SSH config file '/Users/runner/.ssh/config'

Example runs:

The LTE integ tests fail with this error only occasionally, while the LTE integ tests bazel have failed with it in almost every run over the past 5-6 days.

@LKreutzer LKreutzer added component: ci All updates on CI (Jenkins/CircleCi/Github Action) LTE-Integration-Test Issues relating to LTE Integration Tests labels Aug 17, 2022
@LKreutzer
Contributor Author

Findings:
With the changes from PR #13423, the Make target integ_test runs exit 1 if any test has failed.

.PHONY: integ_test
integ_test: $(PYTHON_BUILD)/setupinteg_env $(BIN)/pytest
	. $(PYTHON_BUILD)/bin/activate
ifdef TESTS
	$(call execute_test,$(TESTS))
else
	echo "pass" > $(MAGMA_ROOT)/test_status.txt
ifndef enable-flaky-retry
	echo "Flaky test retries are disabled"
	$(foreach test,$(EXTENDED_TESTS) $(PRECOMMIT_TESTS),$(call execute_test,$(test));)
else
	echo "Flaky test retries are enabled"
	-$(foreach test,$(EXTENDED_TESTS) $(PRECOMMIT_TESTS),$(call execute_test,$(test));)
	if [ ! -z `grep -s pass $(MAGMA_ROOT)/test_status.txt` ]; then echo "Final integ_test status: Passed"; else echo "Final integ_test status: Failed"; exit 1; fi
endif
endif
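The pass/fail bookkeeping in the enable-flaky-retry branch above can be mirrored in Python to make its control flow explicit. This is a minimal sketch, not code from the repository: the body of execute_test is not part of the excerpt, so the sketch assumes that a failing test overwrites test_status.txt with text that does not contain "pass" (the hypothetical string "fail" below). The function name run_integ_suite and the dictionary of test callables are likewise illustrative.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def run_integ_suite(tests: Dict[str, Callable[[], bool]], status_file: Path) -> int:
    """Sketch of the Makefile's enable-flaky-retry branch; returns the recipe's exit status."""
    # echo "pass" > $(MAGMA_ROOT)/test_status.txt
    status_file.write_text("pass\n")
    for test in tests.values():
        # The ';'-joined commands keep running after a failure, and the leading '-'
        # stops make from aborting on that line, so the final check always runs.
        if not test():
            # ASSUMPTION: execute_test (not shown) records a failure like this.
            status_file.write_text("fail\n")
    # if [ ! -z `grep -s pass .../test_status.txt` ]; then ... else ... exit 1; fi
    if "pass" in status_file.read_text():
        print("Final integ_test status: Passed")
        return 0
    print("Final integ_test status: Failed")
    return 1  # the `exit 1` introduced by the PR


if __name__ == "__main__":
    status = Path(tempfile.mkdtemp()) / "test_status.txt"
    # A single failing test is enough to flip the final status to Failed.
    print(run_integ_suite({"test_a": lambda: True, "test_b": lambda: False}, status))
```

Under this assumption, one failing test anywhere in EXTENDED_TESTS or PRECOMMIT_TESTS is enough for the recipe to end with exit 1.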

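When this make call runs inside the fabric task _run_integ_tests, the failing recipe makes make itself exit with status 2 (GNU make's exit status when a recipe fails), which fabric then logs as the fatal error shown in the issue description before aborting.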
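The "Fatal error: local() encountered an error (return code 2) ... Aborting." wording appears to match how Fabric 1.x's local() reports a non-zero exit status when warn_only is not set. The sketch below reproduces that behavior with subprocess so it runs without Fabric installed. fab_style_local is a hypothetical stand-in, not Fabric's implementation, and a child Python process exiting with 2 stands in for make after a failed recipe.

```python
import subprocess
import sys
from typing import Sequence


def fab_style_local(command: Sequence[str], warn_only: bool = False) -> int:
    """Run `command`; abort like Fabric 1.x's local() on a non-zero exit unless warn_only."""
    result = subprocess.run(list(command))
    if result.returncode != 0 and not warn_only:
        cmd_text = " ".join(command)
        # Mimics the two messages seen in the CI log, then ends the whole task.
        print(
            f"\nFatal error: local() encountered an error "
            f"(return code {result.returncode}) while executing {cmd_text!r}",
            file=sys.stderr,
        )
        print("\nAborting.", file=sys.stderr)
        raise SystemExit(1)
    return result.returncode


if __name__ == "__main__":
    # Stand-in for `make integ_test` after a test failure: make exits with status 2.
    failing_make = [sys.executable, "-c", "import sys; sys.exit(2)"]
    print(fab_style_local(failing_make, warn_only=True))  # caller inspects the status
    try:
        fab_style_local(failing_make)  # aborts, like the CI job
    except SystemExit as exc:
        print("task aborted with status", exc.code)
```

In Fabric 1.x itself, the equivalent switch is the warn_only setting (for example `with settings(warn_only=True):`), after which the result's return_code can be inspected instead of aborting.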

@LKreutzer
Contributor Author

In the run https://github.com/magma/magma/runs/7868848266?check_suite_focus=true, test_paging_after_mme_restart appears to fail three times (one RuntimeError and two timeouts):

===Flaky Test Report===

Traceback (most recent call last):
  File "/home/vagrant/magma/lte/gateway/python/integ_tests/s1aptests/test_paging_after_mme_restart.py", line 127, in test_paging_after_mme_restart
    test.verify()
  File "/home/vagrant/magma/lte/gateway/python/integ_tests/s1aptests/util/traffic_util.py", line 696, in verify
    raise RuntimeError(
RuntimeError: Cached results object is not a tuple : None

test_paging_after_mme_restart (test_paging_after_mme_restart.TestPagingAfterMmeRestart) failed (Execution Count: 2).Traceback (most recent call last):
  File "/home/vagrant/magma/lte/gateway/python/integ_tests/s1aptests/test_paging_after_mme_restart.py", line 103, in test_paging_after_mme_restart
    response = self._s1ap_wrapper.s1_util.get_response()
  File "/home/vagrant/magma/lte/gateway/python/integ_tests/s1aptests/s1ap_utils.py", line 222, in get_response
    raise AssertionError(
AssertionError: Timeout (60 sec) occurred while waiting for response message

test_paging_after_mme_restart (test_paging_after_mme_restart.TestPagingAfterMmeRestart) failed (Execution Count: 3).Traceback (most recent call last):
  File "/home/vagrant/magma/lte/gateway/python/integ_tests/s1aptests/test_paging_after_mme_restart.py", line 103, in test_paging_after_mme_restart
    response = self._s1ap_wrapper.s1_util.get_response()
  File "/home/vagrant/magma/lte/gateway/python/integ_tests/s1aptests/s1ap_utils.py", line 222, in get_response
    raise AssertionError(
AssertionError: Timeout (60 sec) occurred while waiting for response message

===End Flaky Test Report===
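The report above lists three failed attempts for a single test. A minimal sketch of a retry loop shows how one test-level failure can carry several attempt-level tracebacks. It assumes a policy of "rerun up to max_runs times and stop at the first pass"; run_with_retries and max_runs=3 are hypothetical names chosen to match the three executions above, not the project's actual retry mechanism.

```python
import traceback
from typing import Callable, List, Tuple


def run_with_retries(test: Callable[[], None], max_runs: int = 3) -> Tuple[bool, List[str]]:
    """Run `test` up to `max_runs` times and stop at the first passing attempt.

    Returns (passed, failures); `failures` holds one formatted traceback per failed attempt.
    """
    failures: List[str] = []
    for attempt in range(1, max_runs + 1):
        try:
            test()
        except Exception:
            # Keep every attempt's traceback, as the Flaky Test Report does.
            failures.append(f"Execution Count: {attempt}\n{traceback.format_exc()}")
        else:
            return True, failures
    return False, failures


if __name__ == "__main__":
    def always_times_out() -> None:
        raise AssertionError("Timeout (60 sec) occurred while waiting for response message")

    passed, failures = run_with_retries(always_times_out)
    # One failed test, three recorded attempts.
    print(passed, len(failures))
```

Counting failed tests gives one; counting failed attempts gives three.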

But the GitHub overview only reports one failure:
[Screenshot from 2022-08-17 11-09-26]

@VinashakAnkitAman, is this working as intended? See also the comment above.

@VinashakAnkitAman
Member

VinashakAnkitAman commented Aug 24, 2022

Hi @LKreutzer, I think the logs in the issue description are OK. In that run, integ test execution failed for some test cases, such as s1aptests/test_attach_detach_multiple_ip_blocks_mobilityd_restart.py. The warning "Unable to load SSH config file '/Users/runner/.ssh/config'" appears regardless of whether the integ tests succeed or fail (see this successful run, which logs the same warning: https://github.com/magma/magma/runs/7993460940?check_suite_focus=true). As for the second log, the fatal error: if the LTE integ tests fail, the command execution is expected to fail, which is why the command exits with a non-zero status (the recipe's exit 1, which make surfaces as return code 2). Moreover, this is the last command of the fabric target that runs the integ tests, so the final status it produces correctly reflects the result of this command. It should not cause any problems for the subsequent commands in the GitHub workflow file, so it can be considered expected behavior.

@VinashakAnkitAman
Member

VinashakAnkitAman commented Aug 24, 2022

Hi @LKreutzer, the GitHub overview's report of a single failure for test_paging_after_mme_restart is wrong. At the same time, I can't see where this log comes from, because the test result logs do not contain this specific message at all. I assume this is the GitHub overview's own representation, which shows the log only for the first failure and omits the other two. I'm not sure there is anything on our side that we can change to correct this GitHub log display.

@mpfirrmann
Contributor

@MoritzThomasHuebner, could this be related to #14292?

@MoritzThomasHuebner
Contributor

Probably not. I don't believe the missing config has anything to do with this failure.

@nstng
Contributor

nstng commented Dec 27, 2022

I assume this is the (now) normal behavior when at least one test fails, so I will close this for now. If this is still a problem, please open a new issue with updated data.

@nstng nstng closed this as completed Dec 27, 2022

5 participants