Bug 1990140: add connection with timeout in TBR accessibility check to expedite 'disconnected' mode #384

Merged
merged 1 commit into from Aug 13, 2021

Conversation

gabemontero
Contributor

@openshift-ci
Contributor

openshift-ci bot commented Aug 9, 2021

@gabemontero: An error was encountered searching for bug 1990140 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. could not unmarshal response body: invalid character '<' looking for beginning of value

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

In response to this:

Bug 1990140: add connection with timeout in TBR accessibility check to expedite 'disconnected' mode

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from coreydaley and dmage August 9, 2021 16:26
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 9, 2021
}
defer connWithTimeout.Close()
// still do the tls form of connect (which does not have the handy timeout form of dial) to confirm
// ssl handshake is OK
tlsConf := &tls.Config{}
conn, err := tls.Dial("tcp", "registry.redhat.io:443", tlsConf)
Member

I think you can use tls.Client(connWithTimeout, tlsConf) (maybe with minor tweaks: https://cs.opensource.google/go/go/+/refs/tags/go1.16.7:src/crypto/tls/tls.go;l=154-164;drc=refs%2Ftags%2Fgo1.16.7)

Contributor Author

yep I'll add that - thanks @dmage

Contributor Author

after an iteration to address the "either ServerName or InsecureSkipVerify must be specified in the tls.Config" error, I have it working

pushing update momentarily

Contributor Author

update pushed @dmage thanks

@gabemontero
Contributor Author

unrelated flake in e2e-aws that is already noted in sippy (fails ~20% of the time)

/test e2e-aws

Contributor

@dperaza4dustbit dperaza4dustbit left a comment

lgtm, just one question for me

// we have seen cases in the field with disconnected clusters where the default connection timeout can be
// very long (15 minutes in one case); so we do an initial non-tls connection where we can specify a quicker
// timeout to filter out that scenario and default to tbr inaccessible / Removed in an expedient fashion
connWithTimeout, err := net.DialTimeout("tcp", "registry.redhat.io:443", 15*time.Second)
Contributor

15 is still a best-guess number that might change in the future. Is this something we can make configurable, so the customer can change it if it does not work for them?

Contributor Author

good question @dperaza4dustbit

config changes where we have to change the config object and add a new field are expensive / costly

admittedly a "gut feeling", but for what we are dealing with here, it would be better to just have a hard coded value that is sufficient

at most, maybe add an environment variable on the deployment that could be read

minimally, I would be agreeable to running, say, the e2e-aws-operator test suite repeatedly (maybe a dozen times) to get a warmer fuzzy

/test e2e-aws-operator

Contributor

OK, understood @gabemontero. Maybe take me through the process of making a config change so I can get a feel for the cost there.

Contributor Author

yep let's add that to Thursday's call, along with the cross referencing CI flakes with sippy that I mentioned in your openshift/origin PR

@gabemontero
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Aug 9, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 9, 2021

@gabemontero: This pull request references Bugzilla bug 1990140, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jitendar-singh

In response to this:

/bugzilla refresh

@gabemontero
Contributor Author

/skip

@gabemontero
Contributor Author

/retest

@gabemontero
Contributor Author

/retest

@gabemontero
Contributor Author

several operators below samples were degraded in e2e-aws-upgrade

/retest

@gabemontero
Contributor Author

/test e2e-aws-operator

@gabemontero
Contributor Author

/retest

/test e2e-aws-operator

@gabemontero
Contributor Author

/retest

2 similar comments

@dperaza4dustbit
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 12, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 12, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dperaza4dustbit, gabemontero

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@gabemontero
Contributor Author

/retest

1 similar comment

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

8 similar comments

@openshift-ci
Contributor

openshift-ci bot commented Aug 13, 2021

@gabemontero: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
ci/prow/okd-e2e-aws-upgrade 36d24c2 link /test okd-e2e-aws-upgrade

Full PR test history. Your PR dashboard.

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment

@openshift-ci openshift-ci bot merged commit ff30d8c into openshift:master Aug 13, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 13, 2021

@gabemontero: All pull requests linked via external trackers have merged:

Bugzilla bug 1990140 has been moved to the MODIFIED state.

In response to this:

Bug 1990140: add connection with timeout in TBR accessibility check to expedite 'disconnected' mode

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
4 participants