Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for per-node floating point format (for numbers) #71

Closed
jamesbond3142 opened this issue Feb 20, 2017 · 10 comments
Closed

Add support for per-node floating point format (for numbers) #71

jamesbond3142 opened this issue Feb 20, 2017 · 10 comments

Comments

@jamesbond3142
Copy link

jamesbond3142 commented Feb 20, 2017

I have the patch here: https://gist.github.com/jamesbond3142/19a857d2cc31e2fd550f8fd129d5802f
It's quick and dirty and the format is supposed to be constant (not malloc-ed or snprintf etc) as no memory management is done for that.
Backward compatible (I created new function name).

@jamesbond3142 jamesbond3142 changed the title Add support for per-node floating format (for numbers) Add support for per-node floating point format (for numbers) Feb 20, 2017
@kgabis
Copy link
Owner

kgabis commented Feb 20, 2017

Well, I don't understand a rationale behind this feature. Care to explain?

Also, please read "Contributing" in readme.

@jamesbond3142
Copy link
Author

jamesbond3142 commented Feb 20, 2017

Many reasons, for me at least. Number object can be anything, including integer larger than "int". In this case, we want to serialise it as integer with no decimals "%.0f".

Also, certain number objects have certain precisions, and we want them to be output as such, instead of the generic six digits decimal. Some may have only one decimal, some twelve.

Consider this use case: You have an object that contains GPS coordinate (latitude, longitude) and a Unix-epoch millisecond timestamp. The coordinates needs at least 8 digits to serialise correctly. The timestamp is bigger than an "int", but you don't want to serialise it with any decimal digits. How would you do it in the existing version?

The patch makes it possible to attach a floating-format to each node; so they are correctly serialised according to their (preset) precision. If you don't attach anything (use the original APIs), the "number_format" is NULL and the default DOUBLE_SERIALIZATION_FORMAT is used, as before.

If you agree the concept is useful, please feel free to ignore the patch and comes with a better API (or API names). As I said, this is just a quick and dirty patch to fix my needs, which I thought is useful for others. If you think this feature isn't generally useful, that's okay too.

@mbelicki
Copy link

The most obvious way to go around this issue is to store numbers as strings. And to be honest I think it's the only way to have any guarantee about precision when transferring numbers as JSON.

Adding additional pointer to each node increases size of the parsed tree (by at least 25%), which with current API you have to always hold in the memory at once. That might not be a welcome change for some of the users.

@jamesbond3142
Copy link
Author

jamesbond3142 commented Feb 21, 2017

Yes, that's the most obvious way and I tried that. But:
a) it's mightily inconvenient
b) you need to convert that back to number on the other end, too (when you parse the data back).
For (a) you can create wrapper functions but (b) is not so easy because the other end isn't necessarily using parson (e.g. in my case the other end is parsed by gson, passing a String when it expects long will generate exception).

I agree that the of cost of additional 25% memory is problematic if you need to have a very large tree.

Another way I'd do it is to pass a callback down the path of json_serialize_* function calls. The callback will get item name, and it should return format; in absence of callback (or if it returns NULL) just proceed with default format. I actually went down this path before but found the change very intrusive (to existing API) and it also caused me to handle more logic on the callback than I'd like. It's always trade-off.

All being said, parson is a very nice little library with enough feature to make it useful and yet small enough without bloat to make it manageable and understandable. Thank you for releasing parson.

@kgabis
Copy link
Owner

kgabis commented Feb 21, 2017

I don't think that having field specific precision is the right answer. Ideally you would be able to specify desired precision when calling json_serialize, but that would require adding another set of functions.
Maybe current number of decimal places used for serialization is too small, I'll consider increasing it and using %g format specifier. I'll keep this open for now.

@jamesbond3142
Copy link
Author

Maybe current number of decimal places used for serialization is too small, I'll consider increasing it and using %g format specifier.

Any number of decimal places may be too small or too big depending on the needs. If you think that per-node specification is too heavy (or too complicated) for now, then perhaps you can provide a function for global specification (instead of hardcoding to DOUBLE_SERIALIZATION_FORMAT), something like you do with parson_malloc. It does not help me this way but may help others.

@mbelicki
Copy link

mbelicki commented Feb 21, 2017

Any number of decimal places may be too small

Well with "%.*g" and DECIMAL_DIG it will never be too small. As for too large precision users can always round their values to lose precision. It's quite clunky but allows to control serialization precision.

@mu578
Copy link

mu578 commented Apr 21, 2017

Hello gents, related to this topic : double parsing locale independent (I have also a patch for addressing this real-floating-point issue ; as well as handling sint64 too, if any interested in, let me know)

#  if defined(_MSC_VER)
#    include <stdlib.h>
#    include <locale.h>
    typedef _locale_t parson_locale_t;
#  else
#    include <stdlib.h>
#    include <xlocale.h>
#    include <locale.h>
    typedef locale_t parson_locale_t;
#  endif

static int g_parson_locale_init = 0;
static parson_locale_t g_parson_locale;
static parson_locale_t parson_locale();
static double parson_strtod(const char * nptr, char ** endptr);

static parson_locale_t parson_locale()
{
  if (!g_parson_locale_init) {
    g_parson_locale_init = 1;
#  if defined(_MSC_VER)
    g_parson_locale = _create_locale(LC_ALL, "C");
#  else defined(ANDROID) // API <= 21
    g_parson_locale = NULL;
#  else
    g_parson_locale = newlocale(LC_ALL_MASK, NULL, NULL);
#  endif
  }
  return g_parson_locale;
}

static double parson_strtod(const char * nptr, char ** endptr)
{
#  if defined(_MSC_VER)
  return _strtod_l(nptr, endptr, parson_locale());
#  else defined(ANDROID) // API <= 21
  return strtod(nptr, endptr);
#  else
  return strtod_l(nptr, endptr, parson_locale());
#  endif
}
// edited: making it portable

#	if defined(_MSC_VER)
#		define _PARSON_GCVT _gcvt
#	else
#		define _PARSON_GCVT gcvt
#	endif

void dtostr(double x, char * buf)
{
// c99
	_PARSON_GCVT(x, DECIMAL_DIG, buf);
	
// c11
	_PARSON_GCVT(x, DBL_DECIMAL_DIG, buf);
}

@kgabis
Copy link
Owner

kgabis commented Apr 27, 2017

Unfortunately this solution is not portable so I cannot add it to parson. I'm open to other suggestions though...

@kgabis
Copy link
Owner

kgabis commented Sep 16, 2017

Low precision problem should be fixed in 4e8a901

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants