Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There is no Java API for adding arrays of primitives to repeated fields without boxing #3316

Closed
oridb opened this issue Jul 5, 2017 · 18 comments
Labels
enhancement inactive Denotes the issue/PR has not seen activity in the last 90 days. java P3 performance

Comments

@oridb
Copy link

oridb commented Jul 5, 2017

I would like to have an API for primitive types of repeated fields, in order to efficiently set them. The reason I need this is that I am transferring a large array of floats (representing the pixels of a thermal image), and would be happier if there was no need to box each float before adding it to the protobuf.

Ideally, the protobuf would just provide a function addAllBuffer(float[] data) in addition to the current iterator based APIs. Currently, all of the APIs for adding an array involve boxing the data at some point or another.

The system setup is a little bit odd, since the grpc server is actually running on an android device, but I think this request applies in general.

Should this be an issue in the gRPC issue tracker?

This is a new feature request. Also, I was asked to file an issue by @ejona86.

What version of gRPC and what language are you using?

I am using Java. The desired API is not present in all versions, as far as I am aware.

What operating system (Linux, Windows, …) and version?

Android (Linux), targeting Android 6.

Yes, I would also like to switch to a newer version.

What runtime / compiler are you using (e.g. python version or version of gcc)

Android Studio, with current versions of the toolchain.

What did you expect to see?

An efficient, boxing-free, method of adding a large array of primitives to a protobuf.

What did you see instead?

http://cdn3.meme.am/cache/instances/folder113/500x/63531113/x-x-everywhere-boxing-boxing-everywhere.jpg

@xfxyjwf
Copy link
Contributor

xfxyjwf commented Jul 5, 2017

There is already an unboxed accessor addBuffer(float value), but I guess the problem is that we are storing repeated fields in List<Float> internally and that requires boxing regardless how the API looks like.

@oridb
Copy link
Author

oridb commented Jul 7, 2017

Yes, I'm already using that API.

As you mentioned, it still boxes values. And it's just a bit clunky compared to just adding everything in one go. It's not a deal breaker, but improving performance on large primitive buffers would be nice. And having a simple API for it would be a pretty good bonus.

@noguespi
Copy link

noguespi commented Oct 3, 2017

I have the same issue.

We are returning a message of "repeated int32". Protobuf isn't using primitive and the boxing of each element generate a lot of gc. It starts to have a noticable impact when we reach a big amount of request per second.

Do we have a hotfix/alternative ?

@bolerio
Copy link

bolerio commented Apr 23, 2018

+1

1 similar comment
@paolieri
Copy link

+1

@rbroggi
Copy link

rbroggi commented May 18, 2018

Hi all,
I have the same problem and end up turning down the adoption due to this issue. I've made all the developments to migrate from an RMI style communication to a GRPC and the memory usage of my service jumped from ~20Gb to ~60Gb. doubles[] are much more concise than List. I'm a big +1 here...
I am a big fan of protobuf hopefully we can achieve this requirement.

@smovva
Copy link

smovva commented Aug 17, 2018

+1

@trams
Copy link

trams commented May 26, 2019

I see you did a big work on integrating collection of unboxed primitives to generated protobuf messages. That is great! But Builder are still using boxed types. So I am +1 for this

@hitsmaxft
Copy link

hitsmaxft commented Jun 18, 2020

we meet performance issue while receiving large response from big data services, such as ranking service, output large amount of scores (storage as List ) with PB protocol.

Boxed primitives with collection cause high memory and cpu overhead.

@elharo elharo self-assigned this Sep 30, 2021
@elharo
Copy link
Contributor

elharo commented Sep 30, 2021

We really should be able to do this with primitive arrays internally whatever the API looks like. Making a note to myself to investigate further. Does this only happen with floating point values or also with integers?

@vitalyli
Copy link

vitalyli commented Nov 12, 2021

This becomes a problem for Java GC especially for Ranker NN that requires float and int vectors. At very minimum it's huge waste to take in primitive array of floats and box each one into Float list. Then serialize that. For ranking this translates into 50k or more Float objects for GC to collect. At high QPS this becomes visible enough for us to consider a custom protocol instead of proto message.
import org.tensorflow.example.Features;
import org.tensorflow.example.FloatList;
Int64List
etc.
Really need FloatList and Int64List wrappers, that take (primitive array, start index, length) and avoid Boxing into objects to be thrown away immediately after proto is built.

@elharo
Copy link
Contributor

elharo commented Mar 4, 2022

This would require a Tier 2, possibly Tier 1, review.

The big question here is, as with all optimizations, does it matter? Can any significant impact be shown that justifies the effort of improving this? Base don the above comments, I think the answer is yes. However this is going to take non-trivial effort and convincing multiple folks, so if anyone can provide a more complete write-up of the measurements they've done and how much this is costing their app, that would be very helpful.

@wnleao
Copy link

wnleao commented Mar 10, 2022

Hey, @elharo! I would like to help. What kind of project do you think would be helpful? Would jmh benchmarks and visualvm profiles help?

For instance:

message ValuesBytes {
    bytes values = 1;
}

message ValuesRepeatedFloats {
    repeated float values = 1;
}

getValuesAsBytes: works with ByteString, converts it to a primitive float array and then accesses each value.

getValuesAsFloats: works with repeated float and uses getValues to access each value.

Benchmark                                                                  Mode  Cnt        Score          Error   Units
FloatBenchmark.getValuesAsBytes                                            avgt    5      107.403 ±        3.705   ms/op
FloatBenchmark.getValuesAsBytes:·gc.alloc.rate                             avgt    5        5.108 ±        0.157  MB/sec
FloatBenchmark.getValuesAsBytes:·gc.alloc.rate.norm                        avgt    5   843776.480 ±     9736.395    B/op
FloatBenchmark.getValuesAsBytes:·gc.count                                  avgt    5          ≈ 0                 counts
FloatBenchmark.getValuesAsRepeatedFloats                                   avgt    5      113.880 ±       23.573   ms/op
FloatBenchmark.getValuesAsRepeatedFloats:·gc.alloc.rate                    avgt    5       10.048 ±        2.066  MB/sec
FloatBenchmark.getValuesAsRepeatedFloats:·gc.alloc.rate.norm               avgt    5  1759073.476 ±     9518.489    B/op
FloatBenchmark.getValuesAsRepeatedFloats:·gc.churn.G1_Eden_Space           avgt    5       11.134 ±       95.865  MB/sec
FloatBenchmark.getValuesAsRepeatedFloats:·gc.churn.G1_Eden_Space.norm      avgt    5  2050548.622 ± 17655842.045    B/op
FloatBenchmark.getValuesAsRepeatedFloats:·gc.churn.G1_Survivor_Space       avgt    5        1.012 ±        8.715  MB/sec
FloatBenchmark.getValuesAsRepeatedFloats:·gc.churn.G1_Survivor_Space.norm  avgt    5   186413.511 ±  1605076.550    B/op
FloatBenchmark.getValuesAsRepeatedFloats:·gc.count                         avgt    5        1.000                 counts
FloatBenchmark.getValuesAsRepeatedFloats:·gc.time                          avgt    5       11.000                     ms

@elharo elharo removed their assignment Apr 15, 2022
@elharo
Copy link
Contributor

elharo commented Apr 15, 2022

Where I'd like to start is with a compelling real world application that shows this as a bottleneck, not simply a microbenchmark. Large arrays of primitives aren't that common outside of scientific and engineering applications so it's possible this just doesn't come up that often. However if we could find an existing large app that is burning significant CPU on this that might be convincing. The ideal scenario would be a large GCP customer that could quantify how much of their monthly spend is going to this. Or an Android app that could show NNN miliiseconds of app startup time is spent here.

@vitalyli
Copy link

vitalyli commented Apr 16, 2022

Real world application is Maps ranking. High traffic, not a small application. I can't disclose actual numbers, however the limitation of boxing /unboxing of features from floating point native arrays and into Lists of objects is a clear overhead and it creates work for GC, the more often that runs the slower overall system.
For one TFRanking call, the input range is on the order of 200 features, by 30 to 60 documents. Give or take the input is 10000 floats on each call.

@noguespi
Copy link

Where I'd like to start is with a compelling real world application that shows this as a bottleneck, not simply a microbenchmark. Large arrays of primitives aren't that common outside of scientific and engineering applications so it's possible this just doesn't come up that often. However if we could find an existing large app that is burning significant CPU on this that might be convincing. The ideal scenario would be a large GCP customer that could quantify how much of their monthly spend is going to this. Or an Android app that could show NNN miliiseconds of app startup time is spent here.

see my previous comment

It has been years and I m no more working on this project, but if we chose protobuf and not REST at this time it was because of speed and performance. Definitely, the garbage collector had a big impact on the performance in our case and it was the bottleneck to greater throughput.

Copy link

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

@github-actions github-actions bot added the inactive Denotes the issue/PR has not seen activity in the last 90 days. label Jun 16, 2024
Copy link

github-actions bot commented Jul 1, 2024

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.

This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement inactive Denotes the issue/PR has not seen activity in the last 90 days. java P3 performance
Projects
None yet
Development

No branches or pull requests