Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: Implement experimental javalib IntStream & LongStream classes #3729

Conversation

LeeTibbert
Copy link
Contributor

@LeeTibbert LeeTibbert commented Feb 1, 2024

The javalib IntStream and LongStream classes now have an experimental implementation.

A number of classes, noticibly the Random and CharSequence classes, were modified
to provide previously missing methods which IntStream and LongStream.

unit-tests are provided for the change of this PR.

Individual morsels of this PR might be backported to SN 4.n, but the overall PR is
probably not a good candidate: too many interconnected pieces.

@WojciechMazur
Copy link
Contributor

Wonderful! Today I've made experiment with foundations for publishing Scala projects as dynamic libraries, and that was one of found blockers - some methods in Scala 2 stdlib were using IntStream.
That's a lot of code, but I'll try to review it as soon as posible

@LeeTibbert
Copy link
Contributor Author

LeeTibbert commented Feb 2, 2024

The prospect of this working getting actual use gladdens my heart!

  1. re: review
    You are on an amazing roll and I do not want to disturb that.
    Nothing in this PR blocks pending work. As always, I appreciate your time when the turn of this
    PR comes around.

  2. re: scala2 stdlib IntStream usage
    Do you happen to remember approximately where they were used? I can rat around and
    try to find the uses. I suspect that it may be worthwhile for me to specialize at least String.

  3. Both sequential and parallel streams have a maximum parallelism of one. That is, parallel
    streams exist but do not offer much difference.

    There is a long discussion about parallelism. Such "auto concurrent" support was supposed
    to be a major feature of Streams. There are probably several (like in the 100s) of Ph. D. dissertations
    worth of work in this area. When/if the dust settles, we could/should chat about what is economic
    in the SN context.

    I have done some private studies about true parallel streams for Scala Native. I hope to be able
    to get back to that work. I suspect that increased parallelism will only payoff for people
    who provide custom "intelligent" spliterators. That is, spliterators which know the amount work
    which should be done for each item and a better guestimate of the number of items to be processed
    (with number of hardware CPU available as background information).

    Current spliterators, as reverse engineered from the JVM, look to be economic only when the
    amount of work per item is large. The JVM (and JSR-166) also document that such parallelism
    only comes into play when considering 10,000 items.

  4. Concerns:

    • Correctness: I have tried to use the compiler to ensure that DoubleStream, IntStream, and LongStream
      all use Scala AnyValue (Java primitives) and not fall into Scala AnyRef (Java Object). I think I have it right
      in almost all cases, but this is a "be on the lookout" (BOLO) concern for me. I will probably be chasing
      individual cases for years.

    • Performance:

      1. Overall: Streams make a number of programming idioms easy but there are a lot of method
        calls embedded in the pipelines implementing them. I have no objective measurements
        of the speed or lack of such for Streams. A work-in-hope. I also have no objective
        measurements of any performance bottlenecks.

      2. AnyVal vs AnyRef: Someday, I would like to measure two runs. One using Stream[Double] and
        the other using DoubleStream. In theory, desire, and intent, the latter should be faster (as AnyVal).
        There may be some boxing/unboxing underneath going on that I have missed. I suspect Scala & Scala Native
        are actually doing pretty well with Stream[Double].

    • A lot of the code is boilerplate. I need to discover the quirks of each type of Stream before considering
      generating the specialized code from a template (.gyb). An at least preliminary understanding of
      parallelism concerns would also help. I certainly do not like making the same correction in 4 places
      and missing one of them.

Copy link
Contributor

@WojciechMazur WojciechMazur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned with hard dependencies on {Int/Long}StreamImpl. I don't fully understand the composition of different Impl classes in Streams, but I worry that a custom Stream implementation might slip into the pipeline leading to runtime class cast exception. If possible we should either ensure that different Stream implementations would not mixed in the pipeline, or only depend on public API.
Otherwise looks good to me

if (n > 0) {
nextInt(n) + origin
} else { // range not representable as int
var r: Integer = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why boxed Integer instead of primitive Int?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. That is bad translation from the Java, not something intentional.
Fixed now.

Thank you for reviewing all this code and doing so on a weekend.

}

val coercedPriorStages = pipeline
.asInstanceOf[ArrayDeque[IntStreamImpl]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that pipeline would contain IntStream element which is not an instance of IntStreamImpl? For example, if somebody implements a custom IntStream, can a custom implementation be added to pipeline? It seems to be possible during flatMap-like operations. In such case it would fail at runtime.

_parallel,
coercedPriorStages
)
.asInstanceOf[LongStream]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast should not be required. It should be automatically widen to LongStream by compiler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in several places in java.util.stream.

The changes do make the code more idiomatic & consistent.

@LeeTibbert
Copy link
Contributor Author

I'm a bit concerned with hard dependencies on {Int/Long}StreamImpl. I don't fully understand the composition of different Impl classes in Streams, but I worry that a custom Stream implementation might slip into the pipeline leading to runtime class cast exception. If possible we should either ensure that different Stream implementations would not mixed in the pipeline, or only depend on public API. Otherwise looks good to me

This is exactly the kind of insightful alternate view I was hoping to get from review. thank you.

I would have to check what Java allows. It does not use the "providers" concept for the Stream classes,
so one would need to sub-class. What happens when mixing custom subclasses of *Stream.
I would have to see what magic its internal ReferencePipeline does or does not do.

I propose that this PR, once reasonably passing CI, flow through. It is pretty large & complex already.
I can create an Issue about the lack_of/difficulty_of extensibility so that the concern is not forgotten.


The entire SN pipeline implementation, across all types of *Streams, is private[stream. To call it brittle would
be kind to it and its author would not disagree.

I believe the answer & intent is that no custom implementation of *Stream can slip into the SN pipeline.
That is because the *StreamImpl classes are all private [stream] and not accessible to the world.
To be compatible with the JVM, there is no public method to add a stage to the SN internal pipeline.

This means that any custom implementation of one *Stream class would need to implement its own private implementation of the pipeline concept and all other *Stream classes & objects intended to be used
in that pipeline.

So, this gives safety but not ease of extensibility.


From studying the quoted comment, I need to do some sandbox work. I have the hint of a way to remove
some of the hideous but effective type coercions involved with the pipeline ArrayDeque.

@LeeTibbert
Copy link
Contributor Author

Three failures are "usual suspects", all others are Green.

@WojciechMazur WojciechMazur merged commit 0225ce8 into scala-native:main Feb 6, 2024
59 of 62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants