
Add benchmarks comparing performance of Table operations 'vectorized' in Java vs performed in Enso #7270

Merged
merged 35 commits into from
Jul 21, 2023

Conversation

radeusgd
Member

@radeusgd radeusgd commented Jul 12, 2023

Pull Request Description

The added benchmark is a basis for a performance investigation.

We compare the performance of the same operation run in Java vs Enso to see what the overhead is and to try to bring the Enso operations closer to pure-Java performance.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd radeusgd force-pushed the wip/radeusgd/benchmark-vectorized-column-oops branch from 4120352 to d93ff75 Compare July 13, 2023 19:19
@radeusgd
Member Author

benchmarks-vectorized-2023-07-13-21.txt

Results of the final run of today's benchmarks. Tomorrow I will analyze them and write up a summary.

@radeusgd
Member Author

radeusgd commented Jul 14, 2023

Comparing the performance of a Java-to-Enso roundtrip vs staying in Enso, on a primitive value (long)

We are comparing the cost of computing x + c across various approaches. The goal of this particular measurement is to see how 'bad' or 'good' the roundtrip is. Java vectorized is used only as a reference - comparing vectorized ops is done later.

image

Date conversions

Firstly, we can observe that skipping the date conversions speeds up our processing by 35%. This suggests that we should remove them wherever possible.

The date conversion is needed when passing Enso objects to a Java Table to ensure that EnsoDate (et al.) get converted to the LocalDate. Without this conversion, the EnsoDate would be a Value or PolyglotMap that is not recognizable in Java and we could not perform date-time operations on it.

However, this only applies to columns of date-time types and the Mixed type (since it can also contain date-times and needs them to be properly prepared for Java usage). Currently we perform the conversions everywhere, but these results show that we should skip them wherever possible.
The ticket #6111 should be able to fix that - once we are able to specify an expected Value_Type in Column.from_vector, we can choose to perform the conversions only when the type may accept date-time objects. In other cases they are unnecessary.
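As a sketch of what #6111 would enable: once the expected value type is known, the conversion pass can be gated on whether that type can hold date-time values at all. This is a hypothetical Java illustration (the enum and helper names are mine, not the actual Table library API):

```java
import java.util.List;
import java.util.stream.Collectors;

public class ConversionSketch {
    // Hypothetical value-type tags; the real Value_Type lives in the Enso Table library.
    enum ValueType { INTEGER, TEXT, DATE, MIXED }

    // Conversion is only needed for types that may actually hold date-time values.
    static boolean needsDateConversion(ValueType expected) {
        return expected == ValueType.DATE || expected == ValueType.MIXED;
    }

    static List<Object> prepareForJava(List<Object> items, ValueType expected) {
        if (!needsDateConversion(expected)) {
            return items; // skip the costly conversion pass entirely
        }
        return items.stream().map(ConversionSketch::convertDates).collect(Collectors.toList());
    }

    // Placeholder for the real EnsoDate -> LocalDate polyglot conversion.
    static Object convertDates(Object item) {
        return item;
    }

    public static void main(String[] args) {
        List<Object> xs = List.of(1L, 2L, 3L);
        System.out.println(needsDateConversion(ValueType.INTEGER)); // false
        System.out.println(needsDateConversion(ValueType.MIXED));   // true
        System.out.println(prepareForJava(xs, ValueType.INTEGER) == xs); // true - no copy made
    }
}
```

For an integer column the list is returned untouched, which is exactly the 35% saving observed above.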

Mapping a vector VS using Storage+Builder through polyglot

In this example, the fastest approach was accessing the storage through the polyglot getStorage.getItemBoxed. Converting the Storage to a Vector, mapping it, and then rebuilding the column from that vector - without the (here unnecessary) date conversions - was only 8% slower. Curiously, using the specialized storage builder was actually 22% slower - probably the benchmark code had some flaw; maybe calling a method accepting an unboxed long is not at all faster than calling one accepting an Object? That remains to be seen.
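The two Enso-side strategies compared here can be illustrated with a simplified Java model - a primitive array standing in for a specialized LongStorage (names and structure are illustrative only, not the real Storage API):

```java
import java.util.ArrayList;
import java.util.List;

public class MapApproaches {
    // Simplified stand-in for a specialized LongStorage.
    static final long[] storage = {1, 2, 3, 4, 5};
    static final long c = 10;

    // Approach 1: iterate via a boxed getter (analogous to Storage.getItemBoxed).
    static long[] mapViaBoxedGetter() {
        long[] out = new long[storage.length];
        for (int i = 0; i < storage.length; i++) {
            Object item = storage[i];   // boxing, as happens at the polyglot boundary
            out[i] = (Long) item + c;
        }
        return out;
    }

    // Approach 2: copy into a vector (List), map it, then rebuild.
    static long[] mapViaVector() {
        List<Long> vector = new ArrayList<>();
        for (long x : storage) vector.add(x); // to_vector
        long[] out = new long[vector.size()];
        for (int i = 0; i < vector.size(); i++) out[i] = vector.get(i) + c; // map + rebuild
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(mapViaBoxedGetter()));
        System.out.println(java.util.Arrays.toString(mapViaVector()));
    }
}
```

Both compute the same result; the benchmark question is purely which boundary crossings and intermediate allocations cost more.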

Calling-back to Enso from Java

The primary focus of this particular benchmark was to measure how calling Enso callbacks from Java compares to running all the time in Enso and just doing polyglot calls.

The roundtrip approach (Java iterating and calling back to Enso) was 22% slower than the fastest Enso implementation. That is a bit slower, but within range of fine-tuning, as other Enso implementations were reaching similar times. It was still faster than the otherwise-fastest approach once that one was slowed down by the unnecessary date conversions.

As a comparative baseline, a vectorized operation (+) was 14x faster than the fastest Enso implementation and about 18x faster than the roundtrip approach.

@radeusgd
Member Author

radeusgd commented Jul 14, 2023

Comparing a Boxed value (String) Java-Enso-roundtrip vs staying in Enso

Now we do the same as before, but on a boxed value (String) and using a custom function from Enso (e.g. coming from a user, as in Column.map) - so we cannot rely on a Java vectorized counterpart. This is to measure how 'bad' the Java-Enso roundtrip is and whether it's better or worse than doing the whole thing directly in Enso.

The function we use is f x = "|" + x + "|".
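To make the 'roundtrip' concrete: Java drives the loop over the column and invokes the Enso function once per element. A minimal Java model, with the Enso callback approximated by a plain host function (in reality each call crosses the polyglot boundary):

```java
import java.util.function.Function;

public class RoundtripSketch {
    // The Enso callback f x = "|" + x + "|" modeled as a host function;
    // in the real benchmark every apply() is a polyglot Enso call.
    static final Function<String, String> callback = x -> "|" + x + "|";

    static String[] mapWithCallback(String[] column) {
        String[] out = new String[column.length]; // Java drives the loop
        for (int i = 0; i < column.length; i++) {
            out[i] = callback.apply(column[i]);   // one Enso call per element
        }
        return out;
    }

    public static void main(String[] args) {
        String[] result = mapWithCallback(new String[] {"a", "bc"});
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```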

image

We can see that even though there is overhead of calling back to Enso, the operation performed in Java and calling back to Enso is still the fastest here.

The second fastest is mapping in Enso using Storage.getItemBoxed and building the result with our StringBuilder - it is only 17% slower.

Converting to_vector and then back is actually 22% slower than the Java roundtrip if we disable date conversions, and much, much worse with them turned on.

Conclusions

We are currently using the to_vector-and-back approach, as we thought it avoids the roundtrip overhead, and we are currently performing the date conversions.

This "optimization" was implemented without good enough measurements and, based on these results, is most likely making things worse. We need to revert it and most likely just rely on the 'good old' roundtrip.

@radeusgd
Member Author

radeusgd commented Jul 14, 2023

Comparing Enso vs Java vectorized operation on a Boxed value (String)

This is the main benchmark that we wanted to do. We are comparing the performance of an operation performed in Enso VS one performed in Java (without calling back to Enso like above, the operation is 'vectorized' i.e. implemented in pure Java).

We want the Enso results to be as close to Java as possible.

Map operation

First we perform a map; in particular, we map a String column with column.ends_with constant, returning a boolean column.

image

Here we can see that the performance is rather comparable.

The best Enso implementation is only 4.3% slower on average than the Java vectorized one. That is pretty good!

Interestingly, in this case, contrary to above, the approach of to_vector and rebuilding the column from it is actually faster than using a builder.

Looking at the plot, we can see quite a lot of variability in all implementations - the Java one achieves the lowest times, but all of them have roughly similar variance - from 700ms to 1100ms (or more). With such variance, comparing averages is not the most robust way to compare the results - looking at the plot, one could say that all of these operations have comparable performance, with Java being slightly better.

I'm worried about the very high variance here. I may have chosen a bad example in _.ends_with. I thought it was a nice choice because it is vectorized on both sides, but it calls into a Java library (ICU), so the Enso operation does not get the advantage it would have if it were a builtin. I may try re-creating this benchmark with some other example.

Bi-map operation

Now we compare an operation that combines two columns and creates a new one. We take two text columns and concatenate them. The texts are kept relatively short to measure mostly the overheads and not the cost of concatenation itself.

image

Here there is much less variance within each operation. Java is the clear winner: it is 3.3x faster than the fastest Enso implementation (using Storage.getItemBoxed and a builder). The other approach - to_vector and back - has very comparable performance to the builder one (if we disable the unnecessary date conversions) but is slightly slower on average.

This sets a first optimization goal - here Enso is over 3x slower, which is quite a lot. Ideally, we'd want to bring this much closer, so that Enso stays within 10-20% of the Java baseline.

Aggregate

The last operation computes an aggregate result over a column. It is interesting because its result is not a new column but a single scalar value, so it can technically be implemented in O(1) memory - possibly reducing GC noise.

image

The results are actually quite surprising - the fastest implementation uses the to_vector approach, although the one using Storage is only marginally slower. The Java vectorized operation is actually 20% slower than Enso here.

I think I may have done this benchmark wrong. I used table.aggregate, which is supposed to compute the longest string, but there may be some more logic happening in aggregate (since it is also prepared to handle things like grouping). Or it could be that computing string length is actually pretty costly (due to ICU grapheme cluster handling). TO BE REVISED - I should try a simple for loop for the Java case.

Revised Aggregate

Again, computing String length in Enso uses the pretty heavy ICU machinery, which may not be the best baseline. Thus I created another benchmark - a custom operation where we iterate over each date (so it's a boxed value) and sum their month parts. It is a relatively simple operation and returns a scalar integer.

Since the Table library does not have a dedicated vectorized operation for this peculiar scenario, I hacked a sumMonths helper onto DateStorage to see what the performance of the operation is in 'pure Java'.
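A hypothetical standalone equivalent of that sumMonths helper (the real one was hacked onto DateStorage; this version just takes a LocalDate array):

```java
import java.time.LocalDate;

public class SumMonthsSketch {
    // Tight Java loop summing the month component of each date -
    // the 'pure Java' baseline, with no polyglot boundary crossings.
    static long sumMonths(LocalDate[] dates) {
        long sum = 0;
        for (LocalDate d : dates) {
            sum += d.getMonthValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        LocalDate[] dates = {
            LocalDate.of(2023, 1, 15),
            LocalDate.of(2023, 7, 13),
            LocalDate.of(2023, 12, 31)
        };
        System.out.println(sumMonths(dates)); // 1 + 7 + 12 = 20
    }
}
```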

image

Here we can see that the optimized Java implementation is 33x faster.

@radeusgd
Member Author

radeusgd commented Jul 14, 2023

Comparing Enso vs Java vectorized operation on a primitive value (long)

Now we will compare the performance on more primitive operations - numbers.

Ideally, we also want Enso to be as close to Java performance here, but that may be much harder to achieve.

Map operation

We map a column by adding a constant to it.

image

We can see that the fastest Enso implementation is the to_vector approach (if we avoid date conversions), but it is still 12.5x slower than Java.

Bi-map operation

We add two columns to each other.

image

Again, the best Enso implementation is to_vector, which is 13.4x slower than Java. The other two (to_vector with date conversions, and Storage.getItem + builder) are comparable to each other but both 20x slower than Java.

Total Aggregate

image

On a primitive aggregation that computes just a scalar result, Java is only about 2-3x faster when using aggregate - but that includes grouping logic.

Once we switch to java_optimized - a simple for loop more closely resembling the fold we do in Enso - Java is ~70x faster than the best Enso implementation!
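For reference, the kind of loop java_optimized stands for is just a plain fold over a primitive array, with no boxing and no allocation - a sketch:

```java
public class OptimizedFold {
    // A simple for-loop fold over a primitive array: no boxing,
    // no index list, no intermediate allocations.
    static long sum(long[] column) {
        long acc = 0;
        for (long x : column) {
            acc += x;
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] column = new long[1_000_000];
        for (int i = 0; i < column.length; i++) column[i] = i % 10;
        System.out.println(sum(column));
    }
}
```

The JIT can compile this into a tight machine loop, which is why the gap against an interpreted fold is so large.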

@radeusgd
Member Author

The Enso workflow I'm using to analyze benchmarks
VectorizedBenchmarks-2023-07-14.zip

@radeusgd
Member Author

Revisiting Boxed Map operation

The previous Boxed map operation was relying on a pretty heavy ends_with check.

I created a new benchmark, using Date.year instead, which is much simpler and should serve as a better baseline.

image

Here we can see that the Java vectorized operation is fast. It is 12.5x faster than the fastest Enso implementation. So here we are getting a gap similar to the one seen in primitive operations, even though Date is boxed. It may be the additional overhead of conversions between EnsoDate and Java's LocalDate at the polyglot boundary that makes Enso slower.

The comparisons also give us a guideline for library development - the approach with a builder is faster than rebuilding from a vector, and curiously the presumably non-boxing appendLong is actually worse than appendNoGrow, which takes an Object - it could be the no-grow aspect, but most likely the unboxing logic complicates the polyglot dispatch. The difference is, however, only 8%. Using a vector is 20% slower, which is significant.

@radeusgd
Member Author

I'm thinking it could be worth getting this PR merged to keep these benchmarks for future reference.

The only blocking issue is some modifications to the Table Java helpers that should not 'leak' into production code just because some benchmarks needed them. Next week I will try to refactor them into separate helpers placed in the Benchmarks project. This may slightly influence the results (as they will need the indirection of calling methods on Storage instead of direct array access). I will post the updated results then; they will reflect a more realistic scenario anyway.

@radeusgd
Member Author

Summary

This data gives us two main insights. I will summarize them shortly here and then link to tickets.

Improving current Enso libraries

It seems that for boxed types Storage.getItemBoxed yields faster results, whereas for primitive types to_vector is better.

We should revisit places where we use non-vectorized operations and amend these to use the best tool for each.

Some further benchmarks may be needed to judge, as we currently measure the combinations Storage.getItemBoxed + Builder and to_vector + from_vector; we should also try cross-combinations of these to find the best results.

Secondly, we modified Column.map to use to_vector + from_vector, but the benchmarks show that it is not actually better. We may want to run a few more tests on other kinds of functions to verify that this is really the case, but most likely we want to revert that change and return to relying on Java calling back to Enso.

Optimizing Enso

We are seeing that Enso is ~3x slower than Java on concatenating two string columns. That is a decent result but possibly could be better.

Moreover, we can see that other boxed operations, like mapping Date.year or summing Date.month, tend to be 12x or more faster in Java. That is a big difference. We should aim to be no worse than 3x slower in these cases, ideally within 10-20% of the Java runtime.

Then there are operations on primitive values. Many of these are also roughly 12x slower than Java. Ideally we should close this gap as well, recognizing it may be harder to achieve than for boxed values, as Java has a bigger advantage in this case. An optimized primitive aggregation that does not allocate (but returns a scalar) is as much as 70x faster in Java!

@GregoryTravis
Contributor

It seems that for boxed types, Storage.getItemBoxed seems to be yielding faster effects, whereas for primitive types apparently to_vector seems to be better.

This is interesting. Does getItemBoxed have overhead because it is doing the boxing? I had assumed this just returns a boxed value unchanged in most cases.

Member

@JaroslavTulach JaroslavTulach left a comment

Having the tests is good, but until there is an objective way to execute the tests reliably and daily, I would avoid drawing conclusions about what needs to be fixed. Also related to #5067. I assume I need some assisted guidance/walk-through to understand the individual tests more.

Member

Having these benchmarks is interesting and important. At the end most of Enso users are going to operate on Table and Column objects and we need to make sure the operations are fast.

On the other hand, I have to admit it is all a bit too fuzzy for me. It is a bunch of code and I don't even know where to start: how to run a single benchmark, how to warm it up, how to compare the results, etc.

Member Author

How to run a single benchmark

To run a single benchmark, we just need to run the main function. I use the launcher - enso run benchmark-file.enso. The Run_All.enso is just a shortcut to run all of these.

How to warm it up

I run the benchmark on a 1 million row table 100 times. We can then look at the graphs to see after how many iterations the performance improves.

When comparing the average run times, I was dropping the first 40 iterations and working with averages over the last 60 iterations, assuming that 40 iterations was enough for these 10^6 loops to sufficiently warm up.
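The averaging step of this methodology can be sketched in a few lines of Java (helper and variable names are mine, not from the benchmark code):

```java
import java.util.Arrays;

public class WarmupAverage {
    // Average over the iterations after `warmup`, dropping the earlier ones -
    // i.e. drop the first 40 of 100 iterations and average the remaining 60.
    static double steadyStateAverage(double[] runtimesMs, int warmup) {
        return Arrays.stream(runtimesMs, warmup, runtimesMs.length)
                     .average()
                     .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double[] runs = new double[100];
        Arrays.fill(runs, 0, 40, 900.0);   // slow, still-compiling iterations
        Arrays.fill(runs, 40, 100, 700.0); // warmed-up iterations
        System.out.println(steadyStateAverage(runs, 40)); // 700.0
    }
}
```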

I'm not sure how we could measure the warm-up in a more precise manner; I'm open to receiving help in that regard.

On the other hand, I think this methodology, while rather simple, may actually be good for us (feel free to prove me wrong).

It may not measure the highest peak performance if I do not warm it up enough. However, I assume that our users will most often use the Table library on data on the order of 10k to 100M rows, so we are probably most interested in performance at such magnitudes after 1-2 runs. If peak performance is achieved too late, the user will not notice it; what we care most about is that Enso runs smoothly when working live. So this seemed like a good enough measurement of the execution times that users will 'feel'.

How to compare the results

I may need more clarification on what you are asking for here. Practically, I was using the workflow I linked below - I can add it to the repository if you think it is worthwhile. In the workflow we load a file and can then choose which benchmark to display; we can view the scatterplot and compute average runtimes and relative differences between the averages.

In terms of 'what' we are comparing:

I've created a separate Enso type for each kind of 'task'. Each Enso type represents one 'thing we want to compute' - like adding two columns, or executing some function over a column returning a new column or a scalar. Then each method inside that type represents one 'approach' - i.e. one particular implementation computing that result. We can compare the performance of the various methods within a single 'task' to see which one is most efficient.

Member

How to run a single benchmark?
The Run_All.enso is just a shortcut to run all of these.

I am not sure what you mean by Run_All. enso$ find . | grep Run_All yields nothing.

To run a single benchmark, we just need to run the main function.
I use the launcher - enso run benchmark-file.enso.

Do you mean I should run:

enso$ ./built-distribution/enso-engine-0.0.0-dev-linux-amd64/enso-0.0.0-dev/bin/enso --run test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso 
/home/devel/NetBeansProjects/enso/enso/test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso:12:1: error: The `project` keyword was used in an import statement, but the module does not belong to a project.
   12 | import project.Table.Performance_Research.Common_Setup.Common_Setup
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/devel/NetBeansProjects/enso/enso/test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso:13:1: error: The `project` keyword was used in an import statement, but the module does not belong to a project.
   13 | import project.Table.Performance_Research.Helpers
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/devel/NetBeansProjects/enso/enso/test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso:32:9: error: The name `Helpers` could not be found.
   32 |         Helpers.column_from_vector "result" mapped convert_polyglot_dates=convert_polyglot_dates
      |         ^~~~~~~
/home/devel/NetBeansProjects/enso/enso/test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso:60:14: error: The name `Common_Setup` could not be found.
   60 | main = spec (Common_Setup.Config)
      |              ^~~~~~~~~~~~
Aborting due to 4 errors and 0 warnings.
Execution finished with an error: Compilation aborted due to errors.

that doesn't seem to work.

Member

@JaroslavTulach JaroslavTulach Jul 18, 2023

Moreover I guess we are facing a problem of different perspective...

To run a single benchmark, we just need to run the main function.

by running enso --run test/Exploratory_Benchmarks/src/Table/Column_Map_2.enso I will not run a single benchmark, but five different Bench.measure!

I need to run just a single Bench.measure and run it in isolation with arguments to dump IGV graphs. If I run five benchmarks (aka Bench.measure) at once, I will not even be able to find out which graph belongs to which Bench.measure.

We really need an ability to run just a single Bench.measure and as such I'd like to advocate immediate switching to #7101 Bench builder pattern.

Member

How to warm it up?
... dropping the first 40 iterations and
working with averages over the last 60 iterations,

The JMH library has the concepts of warmup and measurement. You are basically saying @Warmup(iterations=40) and @Measurement(iterations=60). That's a fair methodology.

The more JMH-like our harness is, the easier the integration with existing benchmarks will be. As such, I'd like to advocate using the same warmup & measurement terminology as, for example, here.

assuming that 40 iterations was enough for these 10^6 loops to sufficiently warm up.

There is a way to detect that compilation kicked in unexpectedly and fail the benchmark. I proposed it when I wanted to speed up the benchmark runs by lowering the number of warmup iterations as much as possible in #6271.

Member Author

that doesn't seem to work.

That's why I use the launcher 😉

If you are using the runner directly, you need to add --in-project with path to project root for this to work.

Member Author

We really need an ability to run just a single Bench.measure and as such I'd like to advocate immediate switching to #7101 Bench builder pattern.

Then I propose to merge this PR and get it converted to the builder pattern as part of #7101 which is still a draft. Does that sound ok?

@@ -142,6 +142,20 @@ public static Column fromItems(String name, List<Value> items) {
return new Column(name, storage);
}

/** Creates a column from an Enso array. No polyglot conversion happens. This is unsafe */
public static Column fromItemsRaw(String name, List<Object> items, StorageType expectedType) throws ClassCastException {
Member

I do remember when fromItems introduced List&lt;Value&gt;... Deciding whether List&lt;Value&gt; is needed or whether List&lt;Object&gt; is enough is a delicate dance balancing GraalVM host interop, the GraalVM version, Enso's implementation of interop, and the needs of the libraries.

In any case, I believe the #1 thing we need is stable, independent benchmarks executed daily by the CI. Without that we will just shift code back and forth without being sure the system has improved.

Member Author

For now I will keep it as is, but I will be adding tickets to revisit this.

@JaroslavTulach
Member

I also want to create a ticket about the performance of the interpreter itself

I don't necessarily need another "I wish the Enso interpreter were lightning fast" ticket, but once there is a way to run an individual benchmark in isolation, feel free to report where a single benchmark lags behind another. There already is

but it was reported the "old way". Such a colloquial report is unhelpful, as it requires extra time and work to turn the benchmark into a JMH benchmark. To avoid such useless work, I'd like us to fix the benchmarking infrastructure first, before we flood the issue tracker with more #5067-like reports which are too fuzzy from an interpreter perspective.

Comment on lines 3 to 5
This workflow is prepared mostly to analyse the output of benchmarks
from `test/Exploratory_Benchmarks`. See `test/Exploratory_Benchmarks/README.md`
for more information.
Member

I assume this works with the output of any of our current text based benchmarks?

Member Author

As long as they name the benchmarks using the pattern &lt;group&gt;.&lt;single-bench-name&gt;, then yes.

The idea of the workflow is that we display plots comparing each &lt;single-bench-name&gt; inside of a single &lt;group&gt;. If the benchmarks are also grouped like that and we want to compare them within groups, then yes. If not, we'd need to adapt it, but we can re-use at least the data-loading part (which is just the first few nodes).

@radeusgd
Member Author

radeusgd commented Jul 18, 2023

Summarizing findings

Column.map should be reverted back to Storage.map

image
image

We can see that in both cases the java_roundtrip approach is fastest (excluding java_vectorized, but that is just for comparison - a generic map is not possible to vectorize).

We can see that reverting will be 2-6x faster than the current implementation, depending on the use case.

The boxed operation has a lot of variance to it on all approaches, but still the averages are much better for java_roundtrip.

How shall we implement Column.round?

From the data we have, it seems that the Java vectorized approach will be fastest. However, this varies a lot from implementation to implementation. To really know, we would need to implement a PoC of the vectorized operation in Java for the benchmark and measure it. Without this it is hard to judge how big the overhead of building it from the more primitive vectorized operations is.

Using our Enso .round builtin as a Java-to-Enso callback will best be implemented with the java_roundtrip approach, not with builders. But whether it would be faster or slower than the current implementation is impossible to predict without doing the measurements.

Based on results from Primitive_Enso_Callback_Test we can expect the callback to be about 15x slower than Java vectorized approach. We do not have measurements comparing the approach relying on primitive vectorized operations - how round is currently implemented.

If we really want to answer this question, we unfortunately need a bit more measurement. I can write a benchmark specifically for round. Just comparing the Java-to-Enso callback vs the current implementation should take about 1h to implement and measure - it is very simple. Getting a Java vectorized implementation could take an additional 1-2h, I imagine. @jdunkerley, how would you like me to proceed with this? I'm happy to leave it for now, do the simple 1h measurement, or do the more complex one.

Optimizing Table.aggregate

The benchmarks have found a bit of a 'bottleneck' when computing simple, non-grouped aggregates.

image

We can see that for tasks like computing the Maximum integer value in a Table, the 'ideal' implementation is as much as 48x faster than the current implementation.

Hypotheses:

  1. Because we re-use the same logic that is used with grouping, we have to create a huge List<Integer> of all indices to run the aggregate. We can do this much more efficiently, without generating the list of indices, or at least by using a specialized List implementation that would take O(1) time and memory (essentially a Range).
  2. We are using a generic comparison logic which is doing a lot of checks. We could implement 'specialized' aggregates for specialized storages (like LongStorage and DoubleStorage) which will avoid this complicated logic in favour of super low-level numeric comparisons.

It is hard to say whether (1) or (2) is the bigger bottleneck here - both are definitely avenues for improvement, and the measurements suggest we can get around a 40x improvement from implementing both.
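The two hypotheses can be contrasted in a small standalone Java sketch - a max aggregate computed via a materialized boxed index list (as in hypothesis 1) versus a specialized primitive loop (as in hypothesis 2); both return the same result and differ only in overhead:

```java
import java.util.ArrayList;
import java.util.List;

public class AggregateSketch {
    // Hypothesis (1): the grouping-oriented path materializes an index list
    // even when there is no grouping - O(n) extra memory plus boxed iteration.
    static long maxViaIndexList(long[] storage) {
        List<Integer> indices = new ArrayList<>(storage.length);
        for (int i = 0; i < storage.length; i++) indices.add(i);
        long max = Long.MIN_VALUE;
        for (Integer ix : indices) {
            max = Math.max(max, storage[ix]);
        }
        return max;
    }

    // Hypothesis (2): a specialized aggregate for a LongStorage-like array
    // needs neither the index list nor generic comparison logic.
    static long maxSpecialized(long[] storage) {
        long max = Long.MIN_VALUE;
        for (long x : storage) {
            if (x > max) max = x;
        }
        return max;
    }

    public static void main(String[] args) {
        long[] storage = {3, 41, 7, 12};
        System.out.println(maxViaIndexList(storage));
        System.out.println(maxSpecialized(storage));
    }
}
```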

image

The Boxed_Total_Aggregate_Test, where the cost is much more dominated by the cost of computing Text length, shows that the java_loop is about 30% faster than the current implementation.

It is hard to translate that onto the case of primitives, but it suggests that hypothesis (2) may give a bigger improvement than (1) - although, as always with such things, the only way to know for sure is to measure.

Member

@JaroslavTulach JaroslavTulach left a comment

I am not sure why my review is needed - maybe because of the build.sbt changes? Anyway (as expressed in the comments), I don't like the (old) way benchmarking is done in Enso and I am not keen on seeing more benchmarks written in the old way. As such, I've just started #7324 to give us a more reusable "builder API". I would personally prefer if #7324 was integrated first, this PR got adjusted to the new "builder API", and only then integrated. However, I am approving anyway, as I believe my approval shouldn't be needed for (almost) pure Enso code changes.

@JaroslavTulach JaroslavTulach mentioned this pull request Jul 18, 2023
2 tasks
mergify bot pushed a commit that referenced this pull request Jul 19, 2023
Designing a new `Bench` API to _collect benchmarks_ first and only execute them afterwards. This is a minimal change to allow implementation of #7323 - e.g. the ability to invoke a _single benchmark_ via the JMH harness.

# Important Notes
This is just the basic API skeleton. It can be enhanced, as long as the basic properties (allowing integration with JMH) are kept. It is not the intent of this PR to make the API 100% perfect and usable. Nor is it the goal of this PR to update existing benchmarks to use it (74ac8d7 changes only one of them to demonstrate _it all works_ somehow). It is however expected that once this PR is integrated, newly written benchmarks (like the ones from #7270) are going to use (or even enhance) the new API.
Member

@jdunkerley jdunkerley left a comment

A thorough exploration of the space.

I'd like us to avoid changing the core type yet, but I agree with the next steps and with what we should take from here as immediate changes.

We should then work to fold this into the new benchmark framework being built up by @Akirathan

@@ -142,6 +142,20 @@ public static Column fromItems(String name, List<Value> items) {
return new Column(name, storage);
}

/** Creates a column from an Enso array. No polyglot conversion happens. This is unsafe */
public static Column fromItemsRaw(String name, List<Object> items, StorageType expectedType) throws ClassCastException {
Member

At this point, I don't think we should add this to the core Column type.
It can be a static method anywhere from my reading, so I suggest it goes into the exploratory helper Java.

Member Author

I wanted to keep it in the std-lib, because I actually plan to use it in #6111 - there, when we know the expected value type, if that type is not Mixed nor Date (etc.), we can guarantee no dates will come in, and so the conversion does not need to happen. This will make the map operation and the construction of columns (like an integer column) much faster.

I can make it static, but since I plan to use it in the main codebase soon, I did not want to move it to the helpers - I'd just have to move it back again when implementing #6111.

Member

Fair - I felt that for an experiment we shouldn't be changing the core, but I'm happy given that the next step will be to change it.

@radeusgd radeusgd added CI: Ready to merge This PR is eligible for automatic merge CI: Keep up to date Automatically update this PR to the latest develop. labels Jul 20, 2023
@radeusgd radeusgd removed the CI: Keep up to date Automatically update this PR to the latest develop. label Jul 21, 2023
@radeusgd radeusgd self-assigned this Jul 21, 2023
@mergify mergify bot merged commit 56635c9 into develop Jul 21, 2023
23 of 25 checks passed
@mergify mergify bot deleted the wip/radeusgd/benchmark-vectorized-column-oops branch July 21, 2023 17:25
mergify bot pushed a commit that referenced this pull request Jul 25, 2023
- Fixes #7231
- Cleans up vectorized operations to distinguish unary and binary operations.
- Introduces MixedStorage which may pretend to be a more specialized storage on demand.
- Ensures that operations request a more specialized storage on right-hand side to ensure compatibility with reported inferred storage type.
- Ensures that a dataflow error returned by an Enso callback in Java is propagated as a polyglot exception and can be caught back in Enso
- Tests for comparison of Mixed storages with each other and other types
- Started using `Set` for `Filter_Condition.Is_In` for better performance.
- ~~Migrated `Column.map` and `Column.zip` to use the Java-to-Enso callbacks.~~
- This does not forward warnings. IMO we should not be losing them. We can switch and add a ticket to fix the warnings, but that would be a regression (current implementation handles them correctly). Instead, we should first gain some ability to work with warnings in polyglot. I created a ticket to get this figured out #7371
- ~~Trying to avoid conversions when calling Enso functions from Java.~~
- Needs extra care as dataflow errors may not be handled right then. So only works for simple functions that should not error.
- Not sure how much it really helps. [Benchmarks](#7270 (comment)) suggested it could improve the performance quite significantly, but the practical solution is not exactly the same as the one measured, so we may have to measure and tune it to get the best results.
- Created #7378 to track this.