Skip to content

JDK-8334217 : [AIX] Misleading error messages after JDK-8320005 #19887

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

suchismith1993
Copy link
Contributor

@suchismith1993 suchismith1993 commented Jun 25, 2024

dll_load_library() call fails is not analyzed (the ebuf content is ignored)
dll_load_library does not analyze the contents of ebuf which leads to misleading error message when library loading fails.

We now call the second dll_load_library() only if the first returns with a 'No such file or directory' error message.
If the first dll_load_library() found the library is not able to load it by any reason, we do not try again with a .a extension.
JDK-8334217


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8334217: [AIX] Misleading error messages after JDK-8320005 (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19887/head:pull/19887
$ git checkout pull/19887

Update a local copy of the PR:
$ git checkout pull/19887
$ git pull https://git.openjdk.org/jdk.git pull/19887/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19887

View PR using the GUI difftool:
$ git pr show -t 19887

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19887.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 25, 2024

👋 Welcome back sroy! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 25, 2024

@suchismith1993 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8334217: [AIX] Misleading error messages after JDK-8320005

Reviewed-by: jkern, mbaesken

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 230 new commits pushed to the master branch:

  • 419cc46: 8335533: OutOfMemoryError: Metaspace observed again on AIX in test RedefineLeakThrowable.java after JDK-8294960
  • 8feabc8: 8334057: JLinkReproducibleTest.java support receive test.tool.vm.opts
  • bc7cd42: 8314498: [macos] Transferring File objects to Finder fails
  • c8a95a7: 8072701: resume001 failed due to ERROR: timeout for waiting for a BreakpintEvent
  • 388fcf0: 8336349: Fix more simple -Wzero-as-null-pointer-constant warnings in C2 code
  • ab27aca: 8336297: C2: Fix -Wzero-as-null-pointer-constant warnings in derived Node ctors
  • 9dfcd75: 8334121: Anonymous class capturing two enclosing instances fails to compile
  • 000de30: 8335269: [Graal] occasional timeout in java/lang/StringBuffer/TestSynchronization.java with loom
  • 4635531: 8335159: Move method reference to lambda desugaring before Lower
  • a253e0f: 8335642: Hide Transform implementation for Class-File API
  • ... and 220 more: https://git.openjdk.org/jdk/compare/933eababf2b79586a911082af36fdcc41763c7b9...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@JoKern65, @MBaesken) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk
Copy link

openjdk bot commented Jun 25, 2024

@suchismith1993 The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jun 25, 2024
@suchismith1993 suchismith1993 changed the title Misleading error fix JDK-8334217 : [AIX] Misleading error messages after JDK-8320005 Jun 26, 2024
@suchismith1993 suchismith1993 marked this pull request as ready for review June 26, 2024 07:01
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 26, 2024
@mlbridge
Copy link

mlbridge bot commented Jun 26, 2024

Copy link
Contributor

@JoKern65 JoKern65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now

@tstuefe
Copy link
Member

tstuefe commented Jun 28, 2024

Are these strings not locale dependent? How would this work with a different locale?

@JoKern65
Copy link
Contributor

Are these strings not locale dependent? How would this work with a different locale?

In principle this might be the case, but our special code does the existence check beforehand (you know the 'loading a dll twice without getting a new handle') and this check returns the string as is. See Aix_dlopen() in porting_aix.cpp.

@tstuefe
Copy link
Member

tstuefe commented Jun 28, 2024

Are these strings not locale dependent? How would this work with a different locale?

In principle this might be the case, but our special code does the existence check beforehand (you know the 'loading a dll twice without getting a new handle') and this check returns the string as is. See Aix_dlopen() in porting_aix.cpp.

Okay. But grepping for an english error string verbatim seems really iffy. @suchismith1993 Please factor out the string at least, e.g. as a common define.

@suchismith1993
Copy link
Contributor Author

suchismith1993 commented Jun 28, 2024

Are these strings not locale dependent? How would this work with a different locale?

In principle this might be the case, but our special code does the existence check beforehand (you know the 'loading a dll twice without getting a new handle') and this check returns the string as is. See Aix_dlopen() in porting_aix.cpp.

Okay. But grepping for an english error string verbatim seems really iffy. @suchismith1993 Please factor out the string at least, e.g. as a common define.

You mean using a #define ? or declaring a string --> const error_report[] = "No Such File or Directory";

@tstuefe
Copy link
Member

tstuefe commented Jun 29, 2024

Are these strings not locale dependent? How would this work with a different locale?

In principle this might be the case, but our special code does the existence check beforehand (you know the 'loading a dll twice without getting a new handle') and this check returns the string as is. See Aix_dlopen() in porting_aix.cpp.

Okay. But grepping for an english error string verbatim seems really iffy. @suchismith1993 Please factor out the string at least, e.g. as a common define.

You mean using a #define ? or declaring a string --> const error_report[] = "No Such File or Directory";

What I meant is to define the string somewhere - be it a constexpr or a define - and pull that definition into both places where its used.

Tying important implementation decisions to scanning error message literals is super breaky and violates the principle of least surprise. Someone may change the error message in the future, maybe to enrich it. And nobody would expect someone to scan that message literally and base decisions on that.

Better would be introducing clear error numbers. But by using a central common definition for the string you at least minimize the chance of this breaking if someone changes the error messages, and signal too the code reader that the message may be used somewhere else.

@suchismith1993
Copy link
Contributor Author

But by using a central common definition for the string you at least minimize the chance of this breaking if someone changes the error messages, and signal too the code reader that the message may be used somewhere else.

Great point. Thanks, thats good learning.
We had worked on checking error numbers before, but there were limitations to that.
Is the current #define code i pushed ,ok ?

@tstuefe
Copy link
Member

tstuefe commented Jun 29, 2024

But by using a central common definition for the string you at least minimize the chance of this breaking if someone changes the error messages, and signal too the code reader that the message may be used somewhere else.

Great point. Thanks, thats good learning. We had worked on checking error numbers before, but there were limitations to that. Is the current #define code i pushed ,ok ?

No, since you still only use it in one place, and in porting_aix.cpp we still use a literal string.

@JoKern65
Copy link
Contributor

But by using a central common definition for the string you at least minimize the chance of this breaking if someone changes the error messages, and signal too the code reader that the message may be used somewhere else.

Great point. Thanks, thats good learning. We had worked on checking error numbers before, but there were limitations to that. Is the current #define code i pushed ,ok ?

Hi Suchi, what was the problem with error number? I would enrich the dll_load_library() interface by an eno to return the errno
static void* dll_load_library(const char *filename, int* eno, char *ebuf, int ebuflen) {
and also the Aix_dlopen() interface
Aix_dlopen(const char* filename, int Flags, int* eno, const char** error_report) {

Then you can return the errno from Aix_dlopen in two places

  if (false == search_file_in_LIBPATH(filename, &libstat)) {
    // file with filename does not exist
  #ifdef ASSERT
    result = ::dlopen(filename, Flags);
    assert(result == nullptr, "dll_load: Could not stat() file %s, but dlopen() worked; Have to improve stat()", filename);
  #endif
    *error_report = "Could not load module .\nSystem error: No such file or directory";
+   *eno = ENOENT;
    return nullptr;
  }

and

      // Library not yet loaded; load it, then store its handle in handle table
+     errno = 0;
      result = ::dlopen(filename, Flags);
      if (result != nullptr) {
        g_handletable_used++;
        (p_handletable + i)->handle = result;
        (p_handletable + i)->inode = libstat.st_ino;
        (p_handletable + i)->devid = libstat.st_dev;
        (p_handletable + i)->hash = hash;
        (p_handletable + i)->refcount = 1;
      }
      else {
        // error analysis when dlopen fails
+       *eno = errno;                         
        *error_report = ::dlerror();
        if (*error_report == nullptr) {
          *error_report = "dlerror returned no error description";
        }
      }

In os_aix.cpp you have to add and modify the following

  // First try to load the existing file.
+ int eno = 0;
  result = dll_load_library(filename, &eno, ebuf, ebuflen);
  // If the load fails,we try to reload by changing the extension to .a for .so files only.
  // Shared object in .so format dont have braces, hence they get removed for archives with members.
  if (result == nullptr && eno == ENOENT && pointer_to_dot != nullptr && strcmp(pointer_to_dot, old_extension) == 0) {
    snprintf(pointer_to_dot, sizeof(old_extension), "%s", new_extension);
    result = dll_load_library(file_path, &eno, ebuf, ebuflen);

@suchismith1993
Copy link
Contributor Author

Hi Suchi, what was the problem with error number? I would enrich the dll_load_library() interface by an eno to return the errno

#16604 (comment) This was the discussion

@JoKern65
Copy link
Contributor

JoKern65 commented Jul 1, 2024

Hi Suchi, what was the problem with error number? I would enrich the dll_load_library() interface by an eno to return the errno

#16604 (comment) This was the discussion

Yes, that's right. But my new proposal follows the points mentioned in the discussion above. A pointer to an integer is passed down to exactly the two places, where the errno can be examined. And before the dlopen() call I set errno to 0 and directly after I save errno into eno and give it all the way back to os::dll_load(). Evaluating the errno directly in os::dll_load() is to late with the explanation of the mentioned discussion.

@suchismith1993
Copy link
Contributor Author

Hi Suchi, what was the problem with error number? I would enrich the dll_load_library() interface by an eno to return the errno

#16604 (comment) This was the discussion

Yes, that's right. But my new proposal follows the points mentioned in the discussion above. A pointer to an integer is passed down to exactly the two places, where the errno can be examined. And before the dlopen() call I set errno to 0 and directly after I save errno into eno and give it all the way back to os::dll_load(). Evaluating the errno directly in os::dll_load() is to late with the explanation of the mentioned discussion.

Yes. So should I wait till you push it out, need to mark it as dependent on you change then ?

@JoKern65
Copy link
Contributor

JoKern65 commented Jul 1, 2024

Hi Suchi, what was the problem with error number? I would enrich the dll_load_library() interface by an eno to return the errno

#16604 (comment) This was the discussion

Yes, that's right. But my new proposal follows the points mentioned in the discussion above. A pointer to an integer is passed down to exactly the two places, where the errno can be examined. And before the dlopen() call I set errno to 0 and directly after I save errno into eno and give it all the way back to os::dll_load(). Evaluating the errno directly in os::dll_load() is to late with the explanation of the mentioned discussion.

Yes. So should I wait till you push it out, need to mark it as dependent on you change then ?

No. This was meant as a sketch for you how YOU can solve the problem without the string hassle.

@suchismith1993
Copy link
Contributor Author

  • errno = 0;
    result = ::dlopen(filename, Flags);
    if (result != nullptr) {
      g_handletable_used++;
      (p_handletable + i)->handle = result;
      (p_handletable + i)->inode = libstat.st_ino;
      (p_handletable + i)->devid = libstat.st_dev;
      (p_handletable + i)->hash = hash;
      (p_handletable + i)->refcount = 1;
    }
    else {
      // error analysis when dlopen fails
    
  •   *eno = errno;                         
      *error_report = ::dlerror();
      if (*error_report == nullptr) {
    

Where is the errno getting set here ?

@JoKern65
Copy link
Contributor

JoKern65 commented Jul 3, 2024

Where is the errno getting set here ?

errno is an implicit global variable set/modified by almost all runtime APIs in case of an error. In our case errno is set/modified in the ::dlopen() call. So the logic is first reset errno to 0 (no error), then call the API (here ::dlopen) which will set errno in case of an error with one of the values described in the APIs man-page, and directly after the API call but at least before the next API call save the content of errno into a local variable for further evaluation (here we return the value of eno to your function for the eno == ENOENT check).

Copy link
Contributor

@JoKern65 JoKern65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was intentional, that I used the variable name eno or if you want err_no but not errno to transport the content of the global errno variable. Have you compiled this beforehand?
I will mark my changes were needed.

@suchismith1993
Copy link
Contributor Author

It was intentional, that I used the variable name eno or if you want err_no but not errno to transport the content of the global errno variable. Have you compiled this beforehand? I will mark my changes were needed.

Not it was not intentional. I corrected it after i understood your comment about errno.

@@ -113,6 +113,7 @@
#error Hotspot on AIX must be compiled with -D_LARGE_FILES
#endif

#define ERROR_NO_SUCH_FILE "No such file or directory"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed anymore. Remove the #define line.

@@ -988,7 +989,7 @@ bool os::dll_address_to_library_name(address addr, char* buf,
return true;
}

static void* dll_load_library(const char *filename, char *ebuf, int ebuflen) {
static void* dll_load_library(const char *filename, int *errno, char *ebuf, int ebuflen) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use the global errno variable name here, instead use *eno

@@ -1016,7 +1017,7 @@ static void* dll_load_library(const char *filename, char *ebuf, int ebuflen) {
void* result;
const char* error_report = nullptr;
JFR_ONLY(NativeLibraryLoadEvent load_event(filename, &result);)
result = Aix_dlopen(filename, dflags, &error_report);
result = Aix_dlopen(filename, errno, dflags, &error_report);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use the global errno variable name here, instead use *eno

snprintf(pointer_to_dot, sizeof(old_extension), "%s", new_extension);
result = dll_load_library(file_path, ebuf, ebuflen);
result = dll_load_library(file_path, &errno, ebuf, ebuflen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use the global errno variable name here, instead use &eno

@@ -1035,7 +1035,7 @@ static bool search_file_in_LIBPATH(const char* path, struct stat64x* stat) {
// specific AIX versions for ::dlopen() and ::dlclose(), which handles the struct g_handletable
// This way we mimic dl handle equality for a library
// opened a second time, as it is implemented on other platforms.
void* Aix_dlopen(const char* filename, int Flags, const char** error_report) {
void* Aix_dlopen(const char* filename, int *eno, int Flags, const char** error_report) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also have to adapt the Aix_dlopen() declaration in the header file porting_aix.hpp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I understood. The reason i was not getting error was i was compiling on Linux machine. I thought the jdk would throw errors atleast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please switch Flags and eno
old: void* Aix_dlopen(const char* filename, int eno, int Flags, const char* error_report) {
new: void* Aix_dlopen(const char* filename, int Flags, int eno, const char* error_report) {

and I have to emphasize please also add this additional int *eno to the declaration of Aix_dlopen() in the porting_aix.hpp file

@@ -1016,7 +1016,7 @@ static void* dll_load_library(const char *filename, char *ebuf, int ebuflen) {
void* result;
const char* error_report = nullptr;
JFR_ONLY(NativeLibraryLoadEvent load_event(filename, &result);)
result = Aix_dlopen(filename, dflags, &error_report);
result = Aix_dlopen(filename, eno, dflags, &error_report);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please switch dflags and eno
old: result = Aix_dlopen(filename, eno, dflags, &error_report);
new: result = Aix_dlopen(filename, dflags, eno, &error_report);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. why are we switching the order of arguments / parameters ? is it some order of the arguments to be processed ?

@@ -1035,7 +1035,7 @@ static bool search_file_in_LIBPATH(const char* path, struct stat64x* stat) {
// specific AIX versions for ::dlopen() and ::dlclose(), which handles the struct g_handletable
// This way we mimic dl handle equality for a library
// opened a second time, as it is implemented on other platforms.
void* Aix_dlopen(const char* filename, int Flags, const char** error_report) {
void* Aix_dlopen(const char* filename, int *eno, int Flags, const char** error_report) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please switch Flags and eno
old: void* Aix_dlopen(const char* filename, int eno, int Flags, const char* error_report) {
new: void* Aix_dlopen(const char* filename, int Flags, int eno, const char* error_report) {

and I have to emphasize please also add this additional int *eno to the declaration of Aix_dlopen() in the porting_aix.hpp file

@@ -1035,7 +1035,7 @@ static bool search_file_in_LIBPATH(const char* path, struct stat64x* stat) {
// specific AIX versions for ::dlopen() and ::dlclose(), which handles the struct g_handletable
// This way we mimic dl handle equality for a library
// opened a second time, as it is implemented on other platforms.
void* Aix_dlopen(const char* filename, int *eno, int Flags, const char** error_report) {
void* Aix_dlopen(const char* filename,int Flags, int *eno, const char** error_report) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please insert a space between ',' and 'int Flags'.
Otherwise it looks good now and I will check with a testbuild.

@JoKern65
Copy link
Contributor

JoKern65 commented Jul 4, 2024

Thanks Suchi. For me it looks good now. Test build with your changes also succeeded. I will wait for the test runs next night and approve then.

Copy link
Contributor

@JoKern65 JoKern65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now and test runs succeeded.

@suchismith1993
Copy link
Contributor Author

Thanks. @tstuefe Could you kindly provide a review.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 16, 2024
@suchismith1993
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jul 16, 2024
@openjdk
Copy link

openjdk bot commented Jul 16, 2024

@suchismith1993
Your change (at version 48c3a02) is now ready to be sponsored by a Committer.

@suchismith1993
Copy link
Contributor Author

@JoKern65 Could you sponsor this ?

@MBaesken
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Jul 17, 2024

Going to push as commit 8713628.
Since your change was applied there have been 253 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 17, 2024
@openjdk openjdk bot closed this Jul 17, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jul 17, 2024
@openjdk
Copy link

openjdk bot commented Jul 17, 2024

@MBaesken @suchismith1993 Pushed as commit 8713628.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants