-
Notifications
You must be signed in to change notification settings - Fork 10
[BUG] Rounding errors cause wkz.gis.geo distance calculation error. #239
Comments
Hey! Thanks for filing that issue and providing a proposed fix! 👍 As I'm on a somewhat lengthy family trip, I won't be able to investigate this issue any time soon. However, your proposed fix looks good to me at first glance. If you want and have time a PR would be highly appreciated. I believe adding a little reproducing test for the mentioned function would help in gaining confidence that your proposed change actually fixes the bug. Let me know if I should provide more support/info. |
Hello! Thank you for your reply. Please, enjoy the trip, as this "bug" is far from critical. As it currently stands, I have no experience writing unit tests; that's why I stated I am happy to discuss whether the "bodge" is appropriate (and whether it fits the PR anyhow). I have been, however, doing some research in the meanwhile. Mainly discovering, why does this precise GPX log file trigger this error. As a write-up; it's difficult to logically sum up past 6 days; but here we go - it's a long, and messy one. EDIT: a late-night post is never a good idea. This comment has been updated 2022-05-20 at 9:30AM CEST. Chapter 1: Reproducing the error (with fictional coordinates) First, I've been experimenting with fictional coordinates - As the distance was 15 meters (and we already know this is far enough not to trigger error in WKZ), I decided to close the gap further, starting from
This was the computational output:
There we go. In the theoretical case of two neighboring points, which "happen" to have the same offset in both the latitude as well as in longitude, the "degree delta" between the latitude and longitude, smaller than What we also see is that the smallest possibile distance we can calculate is 9.5 centimeters. Chapter 2: Reproducing the error (with the GPX file) I have later tried to reproduce the error with the original GPX file. I have extracted the functions, used by WKZ, into a single separated script and did some "print debugging" on it. It occured to me, that in my case, the pair of "point, next_point" (to be using get_total_distance_of_trace() nomenclature) that triggered the computational error was Quick search throughout the GPX file contains these records - to a decimal place precisely, indicating no raw data has been rounded while (for example) loading them into Pandas array, so that's OK-good:
Chapter 3: How close is far enough, really? (The distance between two points) I plugged the coordinates into the NHC calculator and got The NHC's calculator page has a link that has brought me to a chapter about distance calculation. I began to dig around and search for the different methods of calculating the distance between two objects on the sphere. Sure enough, I quickly found a Wikipedia Article about Great circle distance and therefore also learned about a spherical law of cosines (method, used by WKZ). The article also quickly confirmed the sentence by Mr. Williams, about the rounding errors for two nearby coordinates: Mr. Williams (about alternative formula for distance calculation compared to Spherical law of cosines):
Wikipedia:
Opening Google maps and using coordinates from my GPX file does confirm, that these are quite "a bit" closer, compared to "a few meters". I'd rather say, they are pretty much few centimeters apart (minus the HDOP and gps accuracy, but that's a story for another day). This lead me to Haversine formula. Further reading on Wikipedia there states:
Thinking about that for a second brought me to a conclusion, that one rarely changes his location while working out, in a manner so rapid, causing his two consecutive track points being separated by half of a Earth's sphere. Chapter 4: How does the alternative stack up to the existing solution? Comparison time, then. In order to compare Haversine result with currently used, I implemented Haversine function in a totally-not-this-repo-friendly manner, also implementing my bodge for existing calculation formula (in order to finish calculations at all)
The output:
Chapter 5: Results are in, but how and why?
But where does the difference come from? Closely examining the legacy code seems to suggest that there are lot of points in my recording, that are (accidentally?) very close together:
Tis many close-proximity points could accumulate for a difference of 10 meters. Heck, it could accumulate for even more. I am not going to assume or dig down this rabbit hole too much. :) Chapter 6: Testing the new function again - with our first, fictional scenario. So, how does Haversine stack up in the rounding errors on ultra-small coordinate differences? I have decided to run the test from the beginning of my comment on Haversines:
Last few lines of output:
This would seem to suggest that the accuracy of Haversine function is far greater and error-prone. Even lowering the "delta step" led me to the results in range of "1e-08". I know this comment is an exhausting giant mess, I won't lie. At least I hope, that it illustrates that the cause of the problem is the rounding error (in cosine function), and that for small proximity, cosines should apparently be avoided. :) As stated above, as our sport activities will never jump across half the globe between two recordings, my personal suggestion is that we replace the existing mathematical formula with Haversines. Take all the time you need and let me know what do you think. Kind regards! P.S. Also, are you sure that this is multiplied with "centimeters"? I believe it should stand for "meters" :) |
@kw4tts wow what an investigation! Thank you so much for this thorough comment. I totally agree on your suggestion of using haversine instead of the custom math formula. By now I actually already have some other tool published which in fact uses haversine. I have to admit though, that I simply adopted the haversine pypi package because of its convenience and not the improved handling of coordinates with small proximity. Thanks to your in depth investigation I now also have much more important argument at hand. I will try to come up with an implementation which replaces the current math formula with the haversine approach soon. |
(Just need to fix the failing ci pipeline, hasn't been touched for a while...) |
Describe the bug
As I myself am not a particularly fast uphill rider, the two coordinates of a few consecutive points of an activity of mine (GPX file attached) were in such a close proximity, causing the rounding errors in function
calculate_distance_between_points
(@wkz.gis.geo:30) to sum the mathematical formula beyond 1 and cause functionacos()
(@wkz.gis.geo:33) to fail (RaisingValueError: math domain error
).workoutizer/wkz/gis/geo.py
Line 33 in 768e676
To Reproduce
Steps to reproduce the behavior:
Environment:
Ubuntu 20.04.4 LTS
99.0
0.25.0
Additional context
Attached sample GPX file as TXT: BMC.txt
Suggested workaround/fix
The following code is my "bodge" / "fix". I highly appreciate somebody having a second look and share their thought.
TLDR: The math formula is wrapped in
max(0,min(FORMULA,1))
Edit
Hats down to the author for providing such a functional platform/portal. I was already making my own variant of this and found this one by accident, browsing for a gpx parsing pip package.
The text was updated successfully, but these errors were encountered: