Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cluster peer HTTP SRV discovery when no HTTPS records exist #11776

Merged
merged 1 commit into from
Feb 3, 2021

Conversation

brandond
Copy link
Contributor

@brandond brandond commented Apr 10, 2020

embed: Fix cluster peer HTTP SRV discovery

Fixed issue where peer SRV discovery failed if no HTTPS endpoints were discovered. HTTP endpoints were never added to the address list due to a bad error check, and the _etcd-server-ssl._tcp.<domain> failure masked the subsequent success of lookups for _etcd-server._tcp.<domain>

Fixes #11321

@brandond brandond force-pushed the fix_srv_11321 branch 2 times, most recently from 7afcc2e to 908b97c Compare April 10, 2020 22:54
@codecov-io
Copy link

codecov-io commented Apr 10, 2020

Codecov Report

Merging #11776 into master will decrease coverage by 0.28%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11776      +/-   ##
==========================================
- Coverage   66.65%   66.36%   -0.29%     
==========================================
  Files         403      403              
  Lines       36881    36881              
==========================================
- Hits        24582    24477     -105     
- Misses      10811    10906      +95     
- Partials     1488     1498      +10     
Impacted Files Coverage Δ
embed/config.go 54.12% <0.00%> (ø)
auth/store.go 58.84% <0.00%> (-19.53%) ⬇️
pkg/netutil/netutil.go 61.47% <0.00%> (-7.38%) ⬇️
clientv3/namespace/watch.go 87.87% <0.00%> (-6.07%) ⬇️
etcdserver/api/v3rpc/lease.go 77.21% <0.00%> (-5.07%) ⬇️
proxy/grpcproxy/watcher.go 89.79% <0.00%> (-4.09%) ⬇️
clientv3/leasing/cache.go 87.77% <0.00%> (-3.89%) ⬇️
pkg/testutil/recorder.go 77.77% <0.00%> (-3.71%) ⬇️
etcdserver/util.go 95.23% <0.00%> (-3.58%) ⬇️
clientv3/leasing/txn.go 88.09% <0.00%> (-3.18%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 59f5fb2...6f40c1b. Read the comment docs.

@brandond brandond changed the title Fix HTTP SRV discovery when no HTTPS records exist Fix cluster peer HTTP SRV discovery when no HTTPS records exist Apr 13, 2020
@xorjain
Copy link

xorjain commented May 24, 2020

Thanks a lot for this fix @brandond .

Can someone from the team please merge this change? Thanks in advance.

@der-eismann
Copy link

Yeah, we also really need this PR merged so we can finally upgrade our cluster.

@brandond
Copy link
Contributor Author

Is there anything I can do to help move this forward? I realize pulling in multierr might be controversial, I'm open to alternate approaches to wrapping or combining the errors.

@trivigy
Copy link

trivigy commented Jul 15, 2020

@brandond have you been able to find a workaround in the meantime? Or did you endup just using your own fork?

@brandond
Copy link
Contributor Author

@trivigy yeah I'm just running off a local fork.

@der-eismann
Copy link

der-eismann commented Aug 13, 2020

@gyuho could you (or another maintainer) maybe have a look? We finally want to upgrade.

yangxuanjia pushed a commit to yangxuanjia/etcd that referenced this pull request Oct 21, 2020
@stale
Copy link

stale bot commented Nov 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 11, 2020
@der-eismann
Copy link

Not stale

@stale stale bot removed the stale label Nov 11, 2020
Copy link
Contributor

@ptabor ptabor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for not looking at this earlier. Thank you for the fix.

Could you, please:

  • Rebase this
  • Add a regression test for this scenario.

embed/config.go Outdated Show resolved Hide resolved
embed/config.go Outdated Show resolved Hide resolved
@brandond
Copy link
Contributor Author

brandond commented Feb 2, 2021

Rebased and requested changes made. I'll take a look at how to best mock the SRV responses for a proper test.

@ptabor
Copy link
Contributor

ptabor commented Feb 2, 2021

Thank you for the iteration.

  • Please run ./script/fix.sh, as go.mod -> go.sum went out of sync, and that's the reason why the test fails.
  • Is it possible to add a regression test for this scenario ?

@brandond
Copy link
Contributor Author

brandond commented Feb 3, 2021

@ptabor go.sum should be fixed now. I added some tests for both sections of the codebase that I touched; hopefully that looks good? I'm not sure how to get codecov to update.

@brandond brandond requested a review from ptabor February 3, 2021 10:37
@ptabor
Copy link
Contributor

ptabor commented Feb 3, 2021

Thank you for fix.
Don't worry about code cov.
[edit: In ~30min there should be link at the end of this test output: https://travis-ci.com/github/etcd-io/etcd/jobs/479950649]

Hopefully last thing to update: There is inconsistency in go.mod's that break PASSES="fmt" ./test.sh

FAIL: inconsistent versions for depencency: go.uber.org/multierr
  - go.uber.org/multierr@v1.5.0 from: go.etcd.io/etcd/client/v3
  - go.uber.org/multierr@v1.6.0 from: go.etcd.io/etcd/server/v3
FAIL: inconsistent dependencies

@brandond
Copy link
Contributor Author

brandond commented Feb 3, 2021

I have no idea where any of that is coming from, the only go.mod that references multierr is in server. I'm going to see if squashing my changes and then doing a clean set of go.mod changes fixes this.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
@brandond
Copy link
Contributor Author

brandond commented Feb 3, 2021

Well I fixed that but now CodeQL appears to be failing due to some transient github error?

@ptabor ptabor merged commit 4accc34 into etcd-io:master Feb 3, 2021
@ptabor
Copy link
Contributor

ptabor commented Feb 3, 2021

Thank you !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

etcd cluster fails to start when using DNS SRV discovery with non-TLS
6 participants