Stabilize application-operator and application-connector tests #8019

Closed
franpog859 opened this issue Apr 15, 2020 · 9 comments
Assignees
Labels
area/application-connector Issues or PRs related to application connectivity kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test.

Comments

@franpog859
Contributor

franpog859 commented Apr 15, 2020

Description

Stabilize the application-operator and application-connector tests on the nightly and weekly clusters. The tests fail on about 10% of runs. This is not critical, because it does not happen on PRs and so does not block development. It matters on the CI-force side, where the test owners get pinged a lot.

The issue probably lies in the cluster configuration. I assume so because the failures are infrequent and hardly ever occur on PRs.

Expected result

application-operator and application-connector tests do not fail on nightly and weekly clusters

Logs

@franpog859 franpog859 added area/application-connector Issues or PRs related to application connectivity kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. labels Apr 15, 2020
@franpog859 franpog859 added this to the Backlog_Framefrog milestone Apr 15, 2020
@franpog859
Contributor Author

@adamwalach said:

Just wanted to mention that the VMs used for the weekly/nightly clusters are huge. We changed the configuration some time ago because it was even more unstable. Right now on weekly we have 3 nodes with 12 CPUs and 16 GB of RAM each.

I think it could help with the investigation process, thanks!

@janmedrek janmedrek self-assigned this May 15, 2020
@janmedrek janmedrek removed this from the Backlog_Framefrog milestone May 15, 2020
@franpog859
Contributor Author

franpog859 commented May 26, 2020

Logs from CI after this commit:

Error Trace:    proxy_test.go:421
                   proxy_test.go:286
Error:          Not equal: 
                   expected: string("2.0")
                   actual  : <nil>(<nil>)
Test:           TestProxyService/oauth_spec_url_test
Error Trace:    proxy_test.go:421
                   proxy_test.go:379
Error:          Not equal: 
                   expected: string("2.0")
                   actual  : <nil>(<nil>)
Test:           TestProxyService/additional_query_params_in_spec_test
Error Trace:    gateway_events_test.go:100
Error:          Should be true
Test:           TestGatewayEvents/should_get_all_subscribed_events
TestProxyService/additional_query_parameters_test
util.go:28: 
--------------------------------
POST /operator-app-test-p8rx/v1/metadata/services HTTP/1.1
Host: application-registry-external-api:8081
--------------------------------
HTTP/1.1 500 Internal Server Error
Content-Length: 291
Content-Type: application/json;charset=UTF-8
Date: Tue, 26 May 2020 09:39:24 GMT
Server: envoy
X-Envoy-Upstream-Service-Time: 418

{"code":500,"error":"Creating service in Application failed, Creating service failed, Operation cannot be fulfilled on applications.applicationconnector.kyma-project.io \"operator-app-test-p8rx\": the object has been modified; please apply your changes to the latest version and try again"}
--------------------------------
util.go:12: Invalid response code
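
The 500 response above wraps a Kubernetes optimistic-concurrency conflict: an update was submitted with a stale version of the Application object because another writer had changed it in the meantime. The usual remedy is to re-read the object and reapply the change. The following is a minimal sketch of that pattern, not the code of application-registry. `Store`, `Application`, and `AddServiceWithRetry` are hypothetical stand-ins that only simulate the API server's version check in memory. In code built on client-go, `retry.RetryOnConflict` from `k8s.io/client-go/util/retry` plays this role.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict stands in for the API server's "the object has been modified" error.
var ErrConflict = errors.New("the object has been modified")

// Application is a hypothetical, simplified stand-in for the Application resource.
type Application struct {
	ResourceVersion int
	Services        []string
}

// Store is an in-memory stand-in for the API server with optimistic locking.
type Store struct {
	mu  sync.Mutex
	app Application
}

// Get returns a copy of the current object.
func (s *Store) Get() Application {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := s.app
	out.Services = append([]string(nil), s.app.Services...)
	return out
}

// Update succeeds only if the caller's ResourceVersion matches the stored one.
func (s *Store) Update(a Application) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if a.ResourceVersion != s.app.ResourceVersion {
		return ErrConflict
	}
	a.ResourceVersion++
	a.Services = append([]string(nil), a.Services...)
	s.app = a
	return nil
}

// AddServiceWithRetry re-reads the object and reapplies the change on conflict.
// beforeUpdate is a test hook used here to simulate a concurrent writer.
// It returns the number of attempts made.
func AddServiceWithRetry(s *Store, name string, maxAttempts int, beforeUpdate func(attempt int)) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		app := s.Get() // always start from the latest version
		app.Services = append(app.Services, name)
		if beforeUpdate != nil {
			beforeUpdate(attempt)
		}
		err := s.Update(app)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, ErrConflict) {
			return attempt, err
		}
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, ErrConflict)
}

func main() {
	store := &Store{}
	// A second writer sneaks in during the first attempt and forces a conflict.
	attempts, err := AddServiceWithRetry(store, "svc-a", 3, func(attempt int) {
		if attempt == 1 {
			other := store.Get()
			other.Services = append(other.Services, "svc-b")
			_ = store.Update(other)
		}
	})
	fmt.Println(attempts, err, store.Get().Services)
}
```

In this sketch the first attempt conflicts and the second succeeds, so both writers' services end up in the object instead of the request failing with a 500.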

These may be helpful:


        POST /app-conn-tests-6xdc/v1/events HTTP/1.1
        Host: gateway.gke-upgrade-commit-332d309c9-2cuzjjlxsy.a.build.kyma-project.io
        --------------------------------
        HTTP/1.1 500 Internal Server Error
        Content-Length: 131
        Content-Type: application/json
        Date: Tue, 02 Jun 2020 09:44:25 GMT
        Server: istio-envoy
        X-Envoy-Upstream-Service-Time: 1031
        
        {"status":500,"type":"internal_server","message":"internal_server","moreInfo":"error sending cloudevent: 503 Service Unavailable"}
        --------------------------------
        require.go:794: 
        	Error Trace:	suite.go:242
                                application_access_test.go:31
        	Error:      	Received unexpected error:
        	            	failed to send Event. Status: 0, Error: 
        	Test:       	TestApplicationAccess
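
The log above shows a single transient `503 Service Unavailable` from the eventing backend failing the whole test. One way to make a test helper tolerate this is to retry the request with a growing delay. The following is a minimal sketch under that assumption, not the fix applied in this repository. `SendEventWithRetry` and `newFlakyServer` are hypothetical names, the payload is a placeholder, and the server is a local `httptest` stand-in for the gateway.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// SendEventWithRetry posts payload to url and retries on transport errors
// and 5xx responses, doubling the delay after each failed attempt.
// It returns the last status code, the number of attempts made, and the last error.
func SendEventWithRetry(client *http.Client, url string, payload []byte, maxAttempts int, baseDelay time.Duration) (int, int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var status int
	var lastErr error
	delay := baseDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := client.Post(url, "application/json", bytes.NewReader(payload))
		if err != nil {
			status, lastErr = 0, err
		} else {
			status = resp.StatusCode
			resp.Body.Close()
			if status < 500 {
				return status, attempt, nil // success or a non-retryable client error
			}
			lastErr = fmt.Errorf("server returned %d", status)
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return status, maxAttempts, lastErr
}

// newFlakyServer returns a test server that answers 503 for the first
// `failures` requests and 200 afterwards.
func newFlakyServer(failures int) *httptest.Server {
	var calls int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if int(atomic.AddInt32(&calls, 1)) <= failures {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newFlakyServer(2)
	defer srv.Close()

	status, attempts, err := SendEventWithRetry(srv.Client(), srv.URL, []byte(`{"example":"payload"}`), 5, 10*time.Millisecond)
	fmt.Println(status, attempts, err)
}
```

Retrying a POST can deliver the same event more than once if the first request was processed but its response was lost, so this only suits tests or receivers that tolerate duplicates.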

@franpog859
Contributor Author

After this PR the tests should be more stable. I suggest closing this issue and reopening it if the problem occurs again. If it does, please add the link and the relevant log in a comment! 👋

@Disper Disper closed this as completed Jun 4, 2020
@franpog859
Contributor Author

  • operator failed with timeout - logs

@jakkab jakkab reopened this Jul 8, 2020
@franpog859
Contributor Author

Hey, @jakkab! Feel free to add the logs with the errors in a comment. I'll help with fixing them.

@koala7659
Contributor

The problem persists - logs

@adamwalach
Contributor

I checked the logs in GCP (Stackdriver) and it looks like this problem is quite common:
Screenshot 2020-07-27 at 16 33 43

@adamwalach
Contributor

adamwalach commented Jul 29, 2020

Looks quite promising. There have been no application-connector failures since the patch was merged:
Screenshot 2020-07-29 at 10 36 56
filter: text:'application-connector-1" in status "Failed' in all containers in workload-kyma-prow cluster

6 participants