Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #3477: javalib Channels.newChannel#read now reports EOF #3478

Merged
merged 2 commits into from Sep 19, 2023

Conversation

LeeTibbert
Copy link
Contributor

Fix #3477

The read() method returned by javalib Channels.newChannel now reports end of file (a.k.a EOF)
as described in the corresponding JDK ReadByteChannel documentation (-1).

@LeeTibbert LeeTibbert added component:javalib backport candidate PR which might be backported into previous major release of SN labels Sep 13, 2023
@LeeTibbert
Copy link
Contributor Author

The four failures seem to have nothing to do with the changes of this PR.

At least two are concurrent pool failures caused by timeouts. The other two are just weird,
some spawn code saying that a particular C RTL call is not implemented. Probably a the call is
implemented but failing, and the error message is misleading/dead_wrong.

When he day cools off and I can get back to my work machine, I will roll this up the hill and drop it again.
Perhaps the cosmic ray flux in Seattle or wherever GitHub CI runs will have faded.

@LeeTibbert
Copy link
Contributor Author

The two failures appear to have nothing to do with the changes of this PR.

  1. The windows multithread failure appears to be an intermittent. It add weight to the
    idea that there is a defect in the, admittedly experimental multithread support, especially
    on Windows

  2. The multiarch pidfd_open failed: Function not implemented failure is also seen in three places in
    PR Javalib housekeeping: add two missing classes #3479. This appears to be the operating system C library telling the scala layer that the pidfd_open
    system call is not implemented. Could be a .h file changed or went missing. Or the constant could be
    getting mangled being passed from Scala to C.

    What is puzzling:

    1. Quite similar Tests are able to find the pidfd_open syscall.
    2) This was working in the last CI run I could find. September 6, 2023.    It also seemed to be working
         at some point on September 7, 2023 at the last CI run of PR #3473.  
    

My time is quite limited for the rest of the week. I will chase the pidfd_open failures when I can.
Not that my changes are proven innocent, but I suspect that the next person to submit a PR will
encounter the same defect.

@ekrich
Copy link
Member

ekrich commented Sep 14, 2023

This appears to be the operating system C library telling the scala layer that the pidfd_open
system call is not implemented.

There are some guards in the system for that call but they must not be sufficient. I am not sure why we are using these Linuxisms unless we have to as it decreases portability and maintainability.

@LeeTibbert
Copy link
Contributor Author

LeeTibbert commented Sep 15, 2023

The OS specific code is because process control tends to be OS specific. Linux has one way,
BSD (macOS, FreeBSD) use another, and Windows uses a third. This is very near the bottom
of the abstraction stack, where the rubber hits the road.

I think pidfd_open is actually the victim here. I think, but have not yet had time to prove,
that something went bump in the night around September 7.

Logically, the presence or absence of an operating system entry point should not depend on whether
or not SN Multithread is enabled.

Of course, this "feature" does not show itself on either my Linux or macOS system (or else
it would have been fixed by now).

My rough plan is to create a Draft/WIP PR which:

  1. Shows which to define the operating system constant is being taken: a .h file or the fallback constant.
    Is that consistent on all the multiarch runs, both passing & failing? What is the actual value being
    passed in C to the OS?

  2. After the above, roll back the changes of September 7, one by one, LIFO. Does the intermittent pidfd_open
    failure still exist at the Sept 6 level?

Unfortunately, as mentioned, my debug time is limited for the next week or so. This kind of work can
not be done, at least by me, on-the-fly. It takes concentration & attention to detail.

In my first few runs through this, I did not see any differences in how the succeeding & failing
multiarch operating system instances were constructed/build. I did not see pidfd_open being
tailored off, but it could be happening. Thus the proposed careful WIP rollback to the apparently
working Sept 6 configuration.

@LeeTibbert
Copy link
Contributor Author

From WIP PR #3480 and a CI run in my fork, it looks like this is becoming more consistently
a linux-arm-64 issue. Consistency if failure will help me track this down.

Now if someone could give me 72 hour energetic, productive hours in a day....

@WojciechMazur WojciechMazur merged commit f50d68d into scala-native:main Sep 19, 2023
77 of 79 checks passed
WojciechMazur pushed a commit to WojciechMazur/scala-native that referenced this pull request Oct 13, 2023
…OF (scala-native#3478)

* Fix scala-native#3477: javalib Channels.newChannel#read now reports EOF

* Restart CI; no semantic changes

(cherry picked from commit f50d68d)
WojciechMazur pushed a commit that referenced this pull request Oct 13, 2023
* Fix #3477: javalib Channels.newChannel#read now reports EOF

* Restart CI; no semantic changes

(cherry picked from commit f50d68d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport candidate PR which might be backported into previous major release of SN component:javalib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

javalib read() returned by Channels.newChannel should report EOF
3 participants