Improve local hostname resolution by trying different methods #22512

Merged

ldziedziul
Contributor

@ldziedziul ldziedziul commented Oct 17, 2022

Reading the hostname from the HOSTNAME environment variable doesn't work on vanilla, non-containerised Ubuntu systems.

Here are the failures in a non-dockerized environment: https://jenkins.hazelcast.com/job/Hazelcast-master-OracleJDK8-Esxi7/4/#showFailuresLink

Example on a clean AWS Ubuntu instance:

Ubuntu version

root@ip-10-0-189-252:/home/ubuntu# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

Java version

root@ip-10-0-189-252:/home/ubuntu# java -version
openjdk version "11.0.16" 2022-07-19
OpenJDK Runtime Environment (build 11.0.16+8-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)

Getting the hostname with different methods

root@ip-10-0-189-252:/home/ubuntu# jshell
|  Welcome to JShell -- Version 11.0.16
|  For an introduction type: /help intro

jshell> System.getenv("HOSTNAME")
$1 ==> null

jshell> java.net.InetAddress.getLocalHost().getHostName();
$2 ==> "ip-10-0-189-252"

root@ip-10-0-189-252:/home/ubuntu# hostname
ip-10-0-189-252

This PR improves hostname resolution by trying several methods:

  • read the HOSTNAME environment variable
  • read the output of the hostname system command
  • call InetAddress.getLocalHost().getHostName() (a broken DNS setup can break this one)
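
To illustrate the idea, here is a minimal sketch of such a fallback chain. It is not the code merged in this PR: the class name `LocalHostnameSketch`, the helper `firstNonBlank`, and the 5-second timeout value are assumptions made for the example, and it assumes the methods are tried in the order listed above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical class name; a sketch of the fallback idea, not Hazelcast's implementation.
final class LocalHostnameSketch {
    // Assumed value; the PR refers to a PROCESS_TIMEOUT_IN_SECONDS constant without showing it.
    static final long PROCESS_TIMEOUT_IN_SECONDS = 5;

    // Returns the first non-blank (trimmed) value, or null if no supplier yields one.
    // Suppliers are evaluated lazily, so later methods run only when earlier ones fail.
    @SafeVarargs
    static String firstNonBlank(Supplier<String>... candidates) {
        for (Supplier<String> candidate : candidates) {
            String value = candidate.get();
            if (value != null && !value.trim().isEmpty()) {
                return value.trim();
            }
        }
        return null;
    }

    // Method 1: the HOSTNAME environment variable (null if it is not set).
    static String fromEnv() {
        return System.getenv("HOSTNAME");
    }

    // Method 2: the output of the `hostname` system command (null on any failure).
    static String fromCommand() {
        try {
            Process process = new ProcessBuilder("hostname").start();
            // Wait briefly for the command, then read what it printed.
            process.waitFor(PROCESS_TIMEOUT_IN_SECONDS, TimeUnit.SECONDS);
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                return reader.lines().collect(Collectors.joining("\n"));
            }
        } catch (IOException | UncheckedIOException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Method 3: the JDK lookup, which depends on a working name-resolution setup.
    static String fromInetAddress() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    static String resolve() {
        return firstNonBlank(
                LocalHostnameSketch::fromEnv,
                LocalHostnameSketch::fromCommand,
                LocalHostnameSketch::fromInetAddress);
    }

    public static void main(String[] args) {
        System.out.println(resolve());
    }
}
```

On the Ubuntu machine shown above, the first supplier would return null, so the sketch would move on to the `hostname` command.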

Reference:

Collaborator

@vbekiaris vbekiaris left a comment

thanks for the fix

@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   AuthenticationInformationLeakTest.testAuthenticationExceptionDoesNotLeakInfo:99->getKeyValue:170 » EOF
[INFO] 
[ERROR] Tests run: 4346, Failures: 0, Errors: 1, Skipped: 8
[INFO] 

[ERROR] There are test failures.

@ldziedziul
Contributor Author

run-lab-run

@ldziedziul ldziedziul enabled auto-merge (squash) October 17, 2022 12:31
vbekiaris added a commit that referenced this pull request Oct 17, 2022
…22501)

- Adds automated cluster state management for persistence on kubernetes
- Supports cluster-wide shutdown, rolling restart and partial member
recovery from failure on kubernetes [HZ-1190] [HZ-1191] [HZ-1193]
- Fixes behaviour of readiness probe with persistence enabled [HZ-1349]
- Allows tuning either for speedy crash recovery with FROZEN state or
availability of in-memory data structures with NO_MIGRATION state for
missing members [HZ-1311]
- Fixes backup sync after single member crash recovery [HZ-1349]

Design document in EE side:

https://github.com/vbekiaris/hazelcast-enterprise/blob/enhancements/5.2/k8s-persistence/docs/design/persistence/04-persistence-kubernetes-improvements.md

(cherry picked from commit 1ddc16e)
1:1 clean backport of #21844 to 5.2.0 release branch

Also includes backport of #22512 

Co-authored-by: Łukasz Dziedziul <lukasz.dziedziul@hazelcast.com>
@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   ClientQueryCacheRecreationTest.listeners_still_works_after_query_cache_recreation:154->HazelcastTestSupport.assertTrueEventually:1338->HazelcastTestSupport.assertTrueEventually:1236->lambda$listeners_still_works_after_query_cache_recreation$0:152 expected:<90> but was:<12>
[INFO] 
[ERROR] Tests run: 50915, Failures: 1, Errors: 0, Skipped: 238
[INFO] 

[ERROR] There are test failures.

@ldziedziul
Contributor Author

run-lab-run

@hz-devops-test

The job Hazelcast-pr-builder of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

--------------------------
-------TEST FAILURE-------
--------------------------
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   AuthenticationInformationLeakTest.testAuthenticationExceptionDoesNotLeakInfo:99->getKeyValue:170 » EOF
[INFO] 
[ERROR] Tests run: 4346, Failures: 0, Errors: 1, Skipped: 8
[INFO] 

[ERROR] There are test failures.

Process exec = Runtime.getRuntime().exec("hostname");
exec.waitFor(PROCESS_TIMEOUT_IN_SECONDS, TimeUnit.SECONDS);
InputStream stream = exec.getInputStream();
return new BufferedReader(new InputStreamReader(stream)).lines().collect(Collectors.joining("\n"));
Contributor

shouldn't this be in try-with-resources to avoid leaks?

Contributor Author

I think it's not necessary since process streams are closed when the process is finished.
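
For comparison, here is a rough sketch of what the try-with-resources variant raised in this thread could look like. It is not the code that was merged. The class name `HostnameCommandSketch` and the helper `runAndRead` are hypothetical, and the sketch additionally checks the result of `waitFor` and the exit code, which the snippet under review does not do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical helper, written only to illustrate the review suggestion.
final class HostnameCommandSketch {

    // Runs a command and returns its trimmed stdout, or null if it cannot be
    // started, does not finish within the timeout, or exits with a non-zero code.
    static String runAndRead(long timeoutSeconds, String... command) {
        Process process;
        try {
            process = new ProcessBuilder(command).start();
        } catch (IOException e) {
            return null; // e.g. the command does not exist on this system
        }
        // try-with-resources closes the reader (and the underlying process stream)
        // on every path, including the early returns below.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                return null; // timed out; the finally block terminates the process
            }
            if (process.exitValue() != 0) {
                return null;
            }
            return reader.lines().collect(Collectors.joining("\n")).trim();
        } catch (IOException | UncheckedIOException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return null;
        } finally {
            // Requests termination if the process is still alive, so a hung
            // command does not linger after a timeout.
            process.destroyForcibly();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndRead(5, "hostname"));
    }
}
```

Waiting before reading is only reasonable here because `hostname` prints a single short line; a command with large output could fill the pipe buffer and would need its output drained while it runs.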

@ldziedziul ldziedziul requested a review from a team as a code owner October 17, 2022 20:58
Member

@srknzl srknzl left a comment

thx

@ldziedziul ldziedziul merged commit 4110690 into hazelcast:master Oct 18, 2022
@ldziedziul ldziedziul deleted the improve-local-hostname-resolution branch October 18, 2022 06:46
vbekiaris added a commit that referenced this pull request Oct 19, 2022
…22502)

- Adds automated cluster state management for persistence on kubernetes
- Supports cluster-wide shutdown, rolling restart and partial member
recovery from failure on kubernetes [HZ-1190] [HZ-1191] [HZ-1193]
- Fixes behaviour of readiness probe with persistence enabled [HZ-1349]
- Allows tuning either for speedy crash recovery with FROZEN state or
availability of in-memory data structures with NO_MIGRATION state for
missing members [HZ-1311]
- Fixes backup sync after single member crash recovery [HZ-1349]

Design document in EE side:

https://github.com/vbekiaris/hazelcast-enterprise/blob/enhancements/5.2/k8s-persistence/docs/design/persistence/04-persistence-kubernetes-improvements.md

(cherry picked from commit 1ddc16e)
1:1 clean backport from #21844 

Also includes backport of #22512 
Co-authored-by: Łukasz Dziedziul <lukasz.dziedziul@hazelcast.com>