Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: How to structure the parsing function families #623

Closed
nlohmann opened this issue Jun 17, 2017 · 10 comments
Closed

Discussion: How to structure the parsing function families #623

nlohmann opened this issue Jun 17, 2017 · 10 comments
Labels
state: help needed the issue needs help to proceed state: please discuss please discuss the issue or vote for your favorite option

Comments

@nlohmann
Copy link
Owner

I want to discuss a restructuring of the way input is parsed.

For this, we need to consider three dimensions:

  1. The C++ type to read input from.
  2. The file format / representation we want to read.
  3. What to do with the result.

1. Inputs

We currently accept inputs with the following types:

  • string literals (const char [])
  • char pointers (const char* or const uint8_t*)
  • input streams (std::ifstream, std::stringstream, ...)
  • iterator ranges
  • contiguous containers (std::vector<char> or std::vector<uint8_t>)

The input handling has been abstracted into a class input_adapter which offers a unified interface to get a single character or return multiple characters as a std::string.

2. File formats

We currently support the following inputs:

3. What to do with the input

  • For JSON, we support the generation of a basic_json object from any inputs using the parse() function family. With operator<< we also allow to sequentially parse multiple JSON values from the same input stream. In addition, we may pass a callback function that allows to react on certain parse events.

  • For CBOR and MessagePack, we also suppor the generation of a basic_json object.

In either case, a parse error yields an exception. There is currently no way to indicate an error without throwing (or calling abort if exceptions are switched off).


Just supporting all input types for all file formats would need 5 x 3 = 15 functions. If we then want to add support for callbacks, make exceptions optional, allow to provide a JSON object reference to the parser as target, or anything else, we may quickly end up in dozens of similar functions. This is madness.

Instead, I would like to discuss the following approach. We implement a parser class that is exposed to the user that can be constructed with any of the input types listed above. Then, the user can decided what to do by configuring this parser object and by calling its member functions. Having a parser object, we may, for instance, decide to create an object that describes a parse error which is available to the user after parsing. So instead of throwing, we can alternativeley allow to query this object afterward, just like errno, but nicer.

I would like to still offer the static parse functions for JSON so the user can write json j = json::parse("[1,2,3]");. Removing these functions would be too confusing, even for a 3.0.0 release. But these functions would then just offer the default behavior (no callback, throwing exceptions). Anything fancy should be moved to the described parser class. Meaning, I would like to offer the simplest use cases with the old interface, but allow for more complex scenarios without an explosion of functions.


What do you think?

@user1095108
Copy link

For the conditional parse consider a std::pair and/or std::optional or your own std::optional lookalike.

@1Hyena
Copy link

1Hyena commented Jun 17, 2017

You could also add a function bool nlohmann::str2json(const char *, nlohmann::json *)

nlohmann::json obj;
if (!nlohmann::str2json("{}", &obj)) printf("parse error");
if (!nlohmann::str2json("{}", nullptr)) printf("parse error"); // same as above but only validates the string

Alternatively you could add a valid/invalid variable to the nlohmann::json class. It would work out like this:

nlohmann::json obj = nlohmann::str2json("{}"));

if (!obj.is_valid()) printf("parse error");

This way you would have something different than yet another parse function.

@1Hyena
Copy link

1Hyena commented Jun 17, 2017

I personally prefer the pointer version the most because it is so obvious that if I pass nullptr as an argument then I don't wish to store the parsed json structure anywhere but I am merely validating the string.

@nlohmann
Copy link
Owner Author

Note that const char* is just one of the many inputs. Therefore, I would like to have a class that manages all possible inputs and offer an interface to do the actual work. Therefore, I think none of these functions should have the actual input as parameter.

With is_discarded() we already have something similar to is_valid(). It means that no JSON value has been created due to restrictions added by a callback function. I think the parser should rather create no value at all than to create a value that needs to be queried whether it is valid.

@user1095108
Copy link

user1095108 commented Jun 17, 2017 via email

@1Hyena
Copy link

1Hyena commented Jun 17, 2017

I am a very practical person and C is a very practical language. I don't think we should have bullshit syntactic sugar crust all over the place just to make the code look more C++ for the sake of it. Less is more. Sure the str2json function could accept more input types than just const char * but depending on a parser object each time you need to parse a json string is kind of unnecessary memory allocation overhead.

@1Hyena
Copy link

1Hyena commented Jun 17, 2017

One more thing is that since your solution is just a single header (which is what I love about nlohmann::json btw) you should avoid bloating the code with too many border use cases. if you want to satisfy any possible potential need of any programmer then you'd better develop a full blown library :P

@nlohmann
Copy link
Owner Author

Some comments:

  • The main idea of this library is to behave the way C++ programmers would expect an STL library to behave. Fortunately, most syntactic improvements have little or no costs, so I think we can achieve both simplicity and efficiency.
  • Right now, we already depend on the creation of a parser, lexer, and a input_adapter object with each parse. All classes are, however, quite light weight. Maybe the described approach allows us to reuse objects which could be an advantage.
  • I am aware of the growing library, and maybe moving border-case aspects into separate headers could be an improvement in the future.

@1Hyena
Copy link

1Hyena commented Jun 17, 2017

@nlohmann
ok these are fair points you brought out. I like the idea of keeping it consistent with the STL library.

maybe nothrow would be of any help? not sure if this can be used here though.

@user1095108
Copy link

user1095108 commented Jun 17, 2017 via email

nlohmann added a commit that referenced this issue Jul 23, 2017
We now rely on implicit conversions to an input_adapter object in the parse/accept functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state: help needed the issue needs help to proceed state: please discuss please discuss the issue or vote for your favorite option
Projects
None yet
Development

No branches or pull requests

3 participants