Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub#140 ⁃ SphinxQL protocol rounds float attributes to 6 decimal places #140

Closed
barryhunter opened this Issue Dec 17, 2018 · 7 comments

Comments

Projects
None yet
4 participants
@barryhunter
Copy link
Contributor

barryhunter commented Dec 17, 2018

Describe the environment

Manticore Search version: 2.6.2 0bbd194@180223 release

OS version: "Ubuntu 14.04.5 LTS

Build version: 0bbd194

Describe the problem

Description of the issue: SphinxQL protocol rounds float attributes to 6 decimal places

Steps to reproduce:

index geopointrt {
        type = rt
        path = /var/lib/manticore/data/geopointrt
        rt_field = title
        rt_attr_float = wgs84_lat
        rt_attr_float = wgs84_long
}
sphinxQL>insert into geopointrt values (134501,'Orchard Terrace, Omagh',0.9528931356546767,-0.12749577825941016);
Query OK, 1 row affected (0.01 sec)

sphinxQL>select * from geopointrt where id=134501\G
*************************** 1. row ***************************
        id: 134501
 wgs84_lat: 0.952893
wgs84_long: -0.127496
1 row in set (0.00 sec)

For comparison from the API

            [134501] => Array
                (
                    [weight] => 1
                    [attrs] => Array
                        (
                            [wgs84_lat] => 0.95289313793182
                            [wgs84_long] => -0.1274957805872
                        )
                )

The same of course also happens when using a plain index, and data is direct from mysql,

  RADIANS(wgs84_lat) AS wgs84_lat,\
  RADIANS(wgs84_long) AS wgs84_long,\

 sql_attr_float           wgs84_lat
sql_attr_float          = wgs84_long

which is how found this.

The attribute is being written ok (as evidenced by the SPhinxAPI query), just rounded in SphinxQL. Getting the 6decimal place value in both mysql CLI client, and PHP code, so dont think its the client doing the rounding. (of and of course can get full more accurate result from mysql directly

mysql> select title,wgs84_lat,RADIANS(wgs84_lat),degrees(0.9528931356546767),degrees(0.952893),degrees(0.95289313793182),wgs84_long,RADIANS(wgs84_long),degrees(-0.127496) from gridimage_search where gridimage_id = 134501\G
*************************** 1. row ***************************
                      title: Orchard Terrace, Omagh
                  wgs84_lat: 54.596755
         RADIANS(wgs84_lat): 0.9528931356546767
degrees(0.9528931356546767): 54.596755
          degrees(0.952893): 54.596747227559554
  degrees(0.95289313793182): 54.5967551304707
                 wgs84_long: -7.304970
        RADIANS(wgs84_long): -0.12749577825941016
         degrees(-0.127496): -7.304982704799944
1 row in set (0.01 sec)

This demonstrates that a decimal latitude recomputed from the truncated radians value is wrong by a small amount (about 50m in real world :)

The API provided value 0.95289313793182 is still not exactly the same as provided as input, but not expected as exact. But looks like if got 10 decimal places back from SPhinxQL, then would be able to reproduce the original decimal lat/long to 6 dp.

@barryhunter

This comment has been minimized.

Copy link
Contributor Author

barryhunter commented Dec 17, 2018

btw, just realized can get slightly better results doing the conversion back to degrees, directly in searchd itself.

sphinxQL>select id,wgs84_lat,wgs84_lat * 57.29577951308232 as lat_degrees from geopointrt where id=134501\G
*************************** 1. row ***************************
             id: 134501
      wgs84_lat: 0.952893
    lat_degrees: 54.596756

But interestingly notice its still not quite right, getting 54.596756 as opposed to 54.596755 what was stored in mySQL originally.

MySQL doing the conversion back to degrees, using the value stored in attribute

mysql> select wgs84_lat,degrees(0.952893),degrees(0.95289313793182),0.95289313793182*57.29577951308232  from gridimage_search where gridimage_id = 134501\G
*************************** 1. row ***************************
                         wgs84_lat: 54.596755
                 degrees(0.952893): 54.596747227559554
         degrees(0.95289313793182): 54.5967551304707
0.95289313793182*57.29577951308232: 54.5967551304706977099182074224
1 row in set (0.00 sec)

would give 54.596755 again.

@klirichek

This comment has been minimized.

Copy link
Contributor

klirichek commented Dec 18, 2018

We don't use any rounding inside. The only thing that we use 'float' internally (not 'double'), and output it with format '%f'. But this is actual for all kinds of API (we just don't have anything aside these floats to provide some 'extra digits').

@manticoresearch

This comment has been minimized.

Copy link
Contributor

manticoresearch commented Dec 18, 2018

Hi. Thanks for pointing this out. Indeed the precision is lost at some point. The same actually happens in mysql too by default:

mysql> show create table t1;
+-------+-----------------------------------------------------------------------------------+
| Table | Create Table                                                                      |
+-------+-----------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `t` float DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> insert into t1 values(0.95289313793182);
Query OK, 1 row affected (0.00 sec)

mysql> select * from t1;
+----------+
| t        |
+----------+
| 0.952893 |
+----------+
1 row in set (0.00 sec)

but there you can get a higher precision by using FORMAT():

mysql> select format(t, 14) from t1;
+------------------+
| format(t, 14)    |
+------------------+
| 0.95289313793182 |
+------------------+
1 row in set (0.00 sec)

We'll need to think what we can do about this.

@manticoresearch

This comment has been minimized.

Copy link
Contributor

manticoresearch commented Dec 18, 2018

MySQL and just printf('%f%') in general round to 6 dp since it provides the precision guarantee. More practical would be to round to 7 dp (as this study https://www.exploringbinary.com/decimal-precision-of-binary-floating-point-numbers/ shows) since it would cover about 90% of cases. But ideally we would have FORMAT(value, dp) instead to let users decide whether they want a higher precision with higher chance of mistake or lower precision with the guarantee of no mistake.

@githubmanticore

This comment has been minimized.

Copy link
Contributor

githubmanticore commented Dec 18, 2018

➤ Aleksey N. Vinogradov commented:

We can also consider this approach:

string < printf ("%f", number)
if sscanf (string) <> number: string < printf ("%1.8f", number)
(m.b. then repeat, but I think 8 digits is precise enough for any case)
@barryhunter

This comment has been minimized.

Copy link
Contributor Author

barryhunter commented Dec 18, 2018

Thanks for investigating. I admit not aware of the default 6dp with sprintf.

Looks like 8dp does allow reproducing the original lat/long from radians

mysql> select wgs84_lat,wgs84_long,radians(wgs84_lat),radians(wgs84_long),degrees(0.952896),degrees(-0.127488) from gridimage_search where gridimage_id = 134501\G
*************************** 1. row ***************************
          wgs84_lat: 54.596946
         wgs84_long: -7.304539
 radians(wgs84_lat): 0.9528964692335481
radians(wgs84_long): -0.12748825589033408
  degrees(0.952896): 54.59691911489809
 degrees(-0.127488): -7.304524338563839
  degrees(0.95289647): 54.596946043914464
degrees( -0.12748826): -7.304539235466512
1 row in set (0.00 sec)

(note in this case teh lat/long database columns are explicit DECIMAL(10,6) columns so hid that mysql is could provide 6 dp too. But this query does demonstrate that mysql expressions can produce more thatn 6dp, maybe expressions are actully done as double precision, so using a longer defauilt)

Note have found this this seems to work mostly ok in the meantime :)

sphinxQL> SELECT id,wgs84_lat*1000 as lat,wgs84_long*1000 as lng FROM sample8 WHERE MATCH(...
*************************** 1. row ***************************
 id: 134501
lat: 952.896484            #compared to radians(wgs84_lat): 0.9528964692335481
lng: -127.488258

because it still using 6dp, get a few extra digits. Seems more reliable than using the radians>degree multiplication in manticore, presumably due to rounding errors. But is still a tiny bit out. degrees(0.95289648): 54.59694661687226 - which would round (rather than truncate) as 54.596947.

@githubmanticore githubmanticore changed the title SphinxQL protocol rounds float attributes to 6 decimal places GitHub#140 ⁃ SphinxQL protocol rounds float attributes to 6 decimal places Jan 24, 2019

@klirichek

This comment has been minimized.

Copy link
Contributor

klirichek commented Feb 1, 2019

Behavior corrected @ 14ff372 with proposed way: output 6 digits, parse back to float and compare with original. If equal, return this. If not - output 8 digits.

Note that it is still talk about 'float' numbers, not doubles. So, it will trigger sometime, but really seldom, since for majority of floats 6 digits is really enough, only few may be printed in 8 digits more precisely (and in the flow they are mostly result of functions like sin, ln, tan, and even rand).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.