Rounding fix #392

gunnarmorling · 2024-01-14T08:27:12Z

No description provided.

AlexanderYastrebov · 2024-01-14T13:33:57Z

src/main/java/dev/morling/onebrc/CalculateAverage_baseline.java

@@ -79,7 +80,7 @@ public static void main(String[] args) throws IOException {
                    return res;
                },
                agg -> {
-                    return new ResultRow(agg.min, agg.sum / agg.count, agg.max);
+                    return new ResultRow(agg.min, (Math.round(agg.sum * 10.0) / 10.0) / agg.count, agg.max);


This is not correct: it just adds another rounding on top of the two roundings already done by toString and println, and thus hides the problem even more.

You see, as I mentioned in #49, the problem cannot be fixed while double is used for the calculation, because not all numbers can be represented exactly as doubles (e.g. 0.1 or 99.9, see https://math.stackexchange.com/questions/2710986/exact-representation-of-floating-point-numbers). Double.parseDouble and the summation are therefore already imprecise. Adding any kind of rounding during the average calculation or printing won't fix that.
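A minimal sketch of the representation issue described above, assuming nothing beyond the JDK: `new BigDecimal(double)` exposes the exact binary value a double holds, so it can be compared with the decimal text it was parsed from. The class and method names (`DoubleRepresentation`, `isExact`) are hypothetical and only serve this illustration.

```java
import java.math.BigDecimal;

// Hypothetical helper class, not part of the PR.
class DoubleRepresentation {

    // Exact decimal expansion of the binary value stored in the double.
    static BigDecimal exactValueOf(double d) {
        return new BigDecimal(d);
    }

    // True if parsing the text as a double loses nothing, i.e. the double
    // holds exactly the decimal value that was written.
    static boolean isExact(String text) {
        double parsed = Double.parseDouble(text);
        return new BigDecimal(parsed).compareTo(new BigDecimal(text)) == 0;
    }

    public static void main(String[] args) {
        for (String text : new String[]{ "0.5", "0.1", "99.9" }) {
            System.out.println(text + " -> " + exactValueOf(Double.parseDouble(text))
                    + " exact=" + isExact(text));
        }
    }
}
```

Values such as 0.5 are sums of powers of two and survive parsing unchanged; 0.1 and 99.9 do not, so every row parsed with Double.parseDouble may already carry a small error before any summation happens.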

Consider:

package sum;

import java.math.BigDecimal;

class Sum {
    public static void main(String[] args) {
        var sum = 0.0;
        var sumD = BigDecimal.ZERO;
        var rowD = new BigDecimal("99.9");
        var count = 1_000_000_000;
        for (int i = 0; i < count; i++) {
            sum += 99.9;
            sumD = sumD.add(rowD);
        }
        System.out.println(sum);
        System.out.println(sumD);
    }
}

prints

$ java Sum.java
9.989999883589902E10
99900000000.0

As you can see, the sum is not precise even before we do any division.

The proper way is either (slow) to use BigDecimal for the row values, calculate the sum, and apply rounding after the average calculation, or (fast) to use integer summation of row*10, which is possible because the input uses a fixed format, and then again apply rounding at the end.

I think it should be something like this:

/*
 *  Copyright 2023 The original authors
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */
package dev.morling.onebrc;

import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.reducing;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Stream;

public class CalculateAverage_AlexanderYastrebov {

    private static class Measurement {
        final String name;
        private long count;
        // min, max and sum hold actual value scaled by 10
        private long min;
        private long max;
        private long sum;

        static Measurement parse(String line) {
            var parts = line.split(";", 2);
            return new Measurement(parts[0], parseMetric(parts[1]));
        }

        private static long parseMetric(String s) {
            return Long.parseLong(s.replaceFirst("[.]", ""));
        }

        Measurement(String name, long value) {
            this.name = name;
            this.count = 1;
            this.min = this.max = this.sum = value;
        }

        Measurement add(Measurement m) {
            this.min = Math.min(min, m.min);
            this.max = Math.max(max, m.max);
            this.sum += m.sum;
            this.count += m.count;
            return this;
        }

        String getName() {
            return name;
        }

        String format() {
            var smin = BigDecimal.valueOf(min)
                    .divide(BigDecimal.TEN, 1, RoundingMode.UNNECESSARY)
                    .toPlainString();
            var smax = BigDecimal.valueOf(max)
                    .divide(BigDecimal.TEN, 1, RoundingMode.UNNECESSARY)
                    .toPlainString();
            var savg = BigDecimal.valueOf(sum)
                    .divide(BigDecimal.valueOf(count * 10), 1, RoundingMode.CEILING)
                    .toPlainString();

            return String.format("%s=%s/%s/%s", name, smin, savg, smax);
        }
    }

    public static void main(String[] args) throws Exception {
        var input = "./measurements.txt";
        if (args.length == 1) {
            input = args[0];
        }

        try (Stream<String> lines = Files.lines(Paths.get(input))) {
            var result = lines.map(Measurement::parse)
                    .collect(groupingBy(Measurement::getName, TreeMap::new,
                            collectingAndThen(reducing(Measurement::add), Optional::get)));

            var output = result.values().stream()
                    .map(Measurement::format)
                    .collect(joining(", ", "{", "}"));

            System.out.println(output);
        }
    }
}
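The "fast" variant can also be reduced to pure long arithmetic, without BigDecimal or a regex. The following is only a sketch under stated assumptions: every value has exactly one fractional digit, the mean is rounded towards positive infinity (mirroring the RoundingMode.CEILING used above), and the names `FixedPointTenths`, `parseTenths`, `ceilMeanTenths` and `formatTenths` are hypothetical, not taken from the PR.

```java
// Hypothetical helper class, not part of the PR.
class FixedPointTenths {

    // Parses text such as "99.9" or "-0.5" into tenths (999, -5).
    // Assumes exactly one fractional digit and no validation.
    static long parseTenths(String s) {
        boolean negative = s.charAt(0) == '-';
        long value = 0;
        for (int i = negative ? 1 : 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '.') {
                value = value * 10 + (c - '0');
            }
        }
        return negative ? -value : value;
    }

    // Mean in tenths, rounded towards positive infinity:
    // ceil(a / b) == -floor(-a / b) for b > 0.
    static long ceilMeanTenths(long sumTenths, long count) {
        return -Math.floorDiv(-sumTenths, count);
    }

    // Renders tenths back as a decimal string with one fractional digit.
    static String formatTenths(long tenths) {
        long abs = Math.abs(tenths);
        return (tenths < 0 ? "-" : "") + (abs / 10) + "." + (abs % 10);
    }

    public static void main(String[] args) {
        long sum = 0;
        long count = 0;
        for (String row : new String[]{ "99.9", "-0.5", "12.3" }) {
            sum += parseTenths(row);
            count++;
        }
        System.out.println(formatTenths(ceilMeanTenths(sum, count)));
    }
}
```

Because every intermediate value is an integer, the only rounding happens once, in `ceilMeanTenths`; a different rounding rule would change only that method.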

gunnarmorling · 2024-01-14

You are right, the calculation isn't correct, and it's certainly not what I would recommend doing in any real-world application.

But does it matter in any practical sense for the challenge at hand? Specifically, can there be any 1B row dataset with values of one fractional digit where the accumulated error would be so significant that the result with one fractional digit would differ from the result of a correct implementation?
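One way to put a number on this question, using only the two figures printed by the Sum example earlier in this thread, is to compute the absolute error of the double sum and the resulting error of the mean. This is a rough estimate for that single dataset (one billion rows of 99.9), not a worst-case bound; the class name `SumErrorEstimate` is hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical helper class, not part of the PR.
class SumErrorEstimate {

    // Figures copied from the output of the Sum example in this thread.
    static final BigDecimal DOUBLE_SUM = new BigDecimal("9.989999883589902E10");
    static final BigDecimal EXACT_SUM = new BigDecimal("99900000000.0");
    static final BigDecimal COUNT = BigDecimal.valueOf(1_000_000_000L);

    // How far the double-based sum is from the exact sum.
    static BigDecimal absoluteSumError() {
        return EXACT_SUM.subtract(DOUBLE_SUM);
    }

    // The same error spread over all rows, i.e. the error of the mean.
    // Dividing by a power of ten always terminates, so no MathContext is needed.
    static BigDecimal meanError() {
        return absoluteSumError().divide(COUNT);
    }

    public static void main(String[] args) {
        System.out.println("sum error:  " + absoluteSumError().toPlainString());
        System.out.println("mean error: " + meanError().toPlainString());
    }
}
```

For this dataset the mean is off by roughly 1.2e-6, far below one tenth, so the rounded result can only differ when the exact mean lies within that distance of a rounding boundary. Whether such a dataset exists for a given rounding rule is exactly the open question.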

This was referenced Jan 14, 2024

Compare outputs with tolerance #375

Closed

Compare results with tolerance #390

Closed

gunnarmorling force-pushed the rounding-fix branch 2 times, most recently from 6cb1c03 to f0204d9 on January 14, 2024 08:46

Add rounding error test case

d821805

gunnarmorling force-pushed the rounding-fix branch from 70692e7 to 359c0c2 on January 14, 2024 09:49

gunnarmorling added 2 commits January 14, 2024 10:55

#49 Fixing rounding behavior of baseline implementation

3f1fd2e

Making sure default SDK is used when evaluating multiple entries and there are some without prepare script

6ff399d

gunnarmorling force-pushed the rounding-fix branch from 359c0c2 to 6ff399d on January 14, 2024 09:55

gunnarmorling merged commit a8fd067 into main Jan 14, 2024
1 check passed

gunnarmorling mentioned this pull request Jan 14, 2024

Clarify rounding semantics #49

Closed

gunnarmorling deleted the rounding-fix branch January 14, 2024 10:10

AlexanderYastrebov reviewed Jan 14, 2024

edorfaus mentioned this pull request Apr 23, 2024

Make the rounding rule match the baseline #746

Open
