New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More robust handling of JSON #256
Conversation
To add some context about where the mis-parsing happens, it's here: Before: It's not immediately clear what causes this. On principle, I agree with the proposal to remove JSON comment support, because people can use the JSON5 format if they want it-- I believe simply renaming the files from The change will require a major version bump, since it's not backwards compatible with the current behavior. So in the meantime it's desirable to see if we can solve the comment-stripping bug that broke your config file. Ironically, one source we have for inspiration here is to look at the JSON5 library, since it must be handling this already. |
For what it's worth, the issue appears to be related to an empty string preceding a string that contains a numeral, e.g. {"a":"","b":"1"} will produce
That said, I personally feel a major version bump is appropriate. While allowing comments in JSON is a nice feature provided by node-config it's not part of the JSON specification and there are already several alternatives that this module supports that do allow comments in configuration. Any regular expression approach and indeed any extension of the JSON specification by this module is likely to be fragile and susceptible to corner cases, just as the current regexes are. It appears JSON5's approach is quite a bit more complicated than this. Rather than removing substrings that aren't valid JSON a new JSON string is built up by walking the string and omitting anything that's not valid JSON. A potential solution that avoids re-implementing anything is to add JSON5 as a dependency then either always parse JSON as JSON5 or attempt to parse straight JSON and fall back to JSON5 on an error, e.g. if (format === 'json') {
try {
configObject = JSON.parse(content);
} catch (e) {
if (e.name !== 'SyntaxError') {
throw e;
}
configObject = JSON5.parse(content);
}
} This again though makes an assumption on what is considered valid JSON and could introduce problems for users of the module. |
I considered the idea of using the JSON5 parser directly to parse JSON configuration files, but I don't like that the approach ends up supporting not just comments in JSON files, but all the other JSON5 features. Let's see what the primary maintainer @lorenwest thinks about the proposal. |
Vacationing in New Orleans. Will answer soon ;)
|
We have had comments in JSON files since the beginning, and I'm afraid that removing it will make a lot of people unhappy. Our ample warning commitment would require us to announce this incompatibility now and accept comments over a 1 year period before committing to a major revision. Asking people with existing configurations to change to JSON5 just for purity is a bit heavy handed. I think a better approach is to fix the parsing error. I suggest adding a pull request containing a test that fails due to this bug, and asking the regexp ninjas to have at it. With well over a million downloads, my guess is this bug hasn't shown because configuration files are designed to be hand edited, and hand editing long lines in JSON file is more of something a computer does than a human being. |
This arose becasue of the way comments are stripped from JSON. As comments will remain supported in node-config (node-config#256) this change will correctly handle parsing JSON strings regardless of whether or not a comment is encountered.
Fair enough, I've opened #258 which uses a simpler string matching regex for JSON since valid JSON will begin and end with |
With regard to #259, I'd like to keep this conversation alive and would like to strongly advocate for getting this on the roadmap for a major version bump. As I pointed out in #261, the problem that surfaced this is a combination of inline JSON and empty strings but is not necessarily a readability argument, as is evidenced in the test. Furthermore, while hand edited config is likely the most common case I experienced this in a deployment where config files are generated from (human readable) templates that produce (machine readable) JSON. This allows me to use a script to deploy across the 5 environments and tens of machines my application is using. Are there other ways this could be handled? Most certainly. However, the fact remains my script is generating valid JSON and should therefore be usable by this module. I do understand the ample warning agreement but would argue that semantic versioning creates an agreement that major version bumps may produce breaking changes. This combined with the way npm works should mitigate instances of users accidentally getting bumped up to a version that introduces breaking changes. Perhaps an alternative approach would be to release a major version bump now but provide support for 1.x for the next year and move deprecation warnings into the 1.x branch? This seems to be a more common approach among software projects and less prone to drift between planned improvements and deployed code. |
@elliotttf, I like the idea of going ahead with a major version bump. Any critical security issues or major bugs can continue to be supported in the 1.x branch for a while. Projects like Mongoose and Express use semantic version this way. We have some other items that we've discussed for a major-version bump, but then the conversations often stall around the delay of waiting a year to the work released. |
I was hoping this new regexp would solve the issue so I didn't have to get into the deeper issue. You're also right that correct semver usage should be expected, and that negates the need for the 1 year advance warning. Now for the deeper issue - comments in json files. Philosophically speaking, JSON doesn't allow comments so you could argue that if we support JSON, we shouldn't allow them. But practically speaking we've allowed comments since the beginning, and the work being asked of the community to become major-rev compatible is not trivial. In fact, one could argue it's irresponsible for us to impose that work on the community, even if a major version bump would allow it. So - if we're able to allow valid JSON, or even if we require JSON to be pretty printed to work properly,
that's substantially less work than asking people to make changes to their configurations and enterprise deployments in order to become compatible with the newest node-config. Now you know my opinion. Features can be added, but removing or changing features is heavyweight for the community and shouldn't be considered without equally heavyweight upside. |
The lack of comment support in JSON may have been philosophical but either way it's a decision that was made part of the specification that's been in place since 2013. The decision to support JSON comments in this module was certainly based in common sense and practicality, but now with support for other formats that do natively support comments continuing to support it means dealing with technical debt.
Actually I would disagree, if we add json5 as a dependency for this module becoming compatible with dropped support for JSON comments can be handled in one of two ways:
We could even do some hand holding here and release v2 that does a straight parse of JSON and update the 1.x branch so that it raises a deprecation notice then tries to parse as json5, e.g. if (format === 'json') {
try {
configObject = JSON.parse(content);
} catch (e) {
// All JS Style comments will begin with /, so all JSON parse errors that
// encountered a syntax error will complain about this character.
if (e.name !== 'SyntaxError' || e.message !== 'Unexpected token /') {
throw e;
}
configObject = JSON5.parse(content);
console.error('Deprecation notice: comments are no longer supported in JSON files, consider using JSON5 or removing comments from your file.');
}
} This gives users a heads up about what they need to do to support the major release while not actually ending support for comments in JSON. Any truly invalid JSON (e.g. mismatched brackets) will still raise an exception when the JSON5 parsing fails and the deprecation message will not be displayed. |
Debian-based distros include the
That's documented more here: |
Oy, I knew this would open a pandora's box. If we make a breaking change, the value of that change must be clear and substantially greater than the work required to make the change. I'm not convinced, although I could be, that this is as low impact as you suggest. But before we continue that discussion, let's discuss the value of this change for our customer, and how we can make them love us for making this change. Exactly what are they getting in return for their work? |
I can think of a few of things:
Because the breaking change we're talking about here is comments this really is as simple as renaming files. JSON5 will handle any valid JavaScript objects appropriately; therefore, valid JSON will be parseable by JSON5. Users that need comments in their config files can rename their files to From the JSON5 website:
|
It seems the big "win" would be if we saved the JSON-users from a significant yet-to-be-known bug with JSON comment parsing. However, an additional bug like that might be found. I also don't agree with the line of thinking that any regex-based solution will always be fragile or have corner cases. With over a decade of experience doing Perl development, regex's are frequently used in reliable, production-quality software without bugs being found in the regex. Even when formal grammars are used, they sometimes implement the lexer with regexs, because you have to break the text up into tokens /somehow/. I have not closely examined the regex's in the comment-stripping code, but I expect they can be made to be reliable if they are not already. So, my comments above spoke against making a fuss for the specific case of comments in JSON files, but I'm generally in favor of breaking changes if the new design is better going forward. I have this position because of how I think about "the customer", our module users. In the past year, our downloads have roughly doubled: http://npm-stat.com/charts.html?package=config&author=&from=2014-10-31&to=2015-10-31 It's reasonable to assume that the downloads might double again in the next year in year and keep going up. Considering both the current users as well as the future users for the next two years as "customers", then the majority of our customers are not affected by breaks in backward compatible changes, because they haven't started using the module yet. I once took the stance of strong-backward-compatibility in a popular open source Perl module I maintained (Data::FormValidator). What happened over time is that multiple, simpler competing projects appeared and gained popularity. Meanwhile, there continued to be bugs and support issues related to the "legacy" code that wasn't recommended to be used anymore. I think the project would have been more successful overall if a new release had been made that dropped all the legacy cruft. Perl's infrastructure made in-place breaking changes very unattractive, because "semver" wasn't supported in the package installer the way NPM does. I think both old and new users could be served welly by immediately releasing better designs with a new major version, while supporting the the previous release branch for a year. |
What if we used json5 as our json parser. That ensures correct comment parsing without requiring anyone to change their configurations. The story would be that node-conig never was json compatible. We're closer to json5, so this is a formalization of what we've been all along, using a more reliable technique. This would require no config changes, benefit active users as well as future users, and from a compatibility standpoint, may not even require a major version bump. |
I'm ok with that solution as well, it maybe has some gotchas since JSON5 supports more than just comments, but any problems that I can imagine arising from that would be up to the application to resolve, e.g. if a config file were loaded in some other context directly instead of by the config module. |
There will be people that sneer at relaxing something they see as pure. For them, we can offer a .strict-json extension or something that uses JSON.parse. I like JSON5 because it accepts machine-written JSON as well as more human friendly formats. Some configs can be generated, but the vast majority are hand edited. |
Yeah, I guess that's why I suggested a major version bump. The module past #1 never really supported JSON, it supported a superset. Figuring out if pure JSON support is important could help drive this decision. If we're considering moving to JSON5 for parsing do you think it'd make sense to use one of the Functionally there would be no change to do this. There'd potentially be a little bit of a performance hit if your JSON contained comments since it would effectively be parsed twice but since this only happens the first time the module is required I'd think it's not really an issue. This could also be a way to gauge how many people this is really a problem for. If we add a warning and an issue gets opened then it's a problem that's worth pursuing further. If we add it and it's radio silence though then we can be reasonably confident that the number of people actually affected by this and care about pure JSON support is pretty low. I don't have a strong opinion on the pure/strict JSON parsing but can see that portability would be a valid argument for enforcing it. Bringing existing JSON config files to this module wouldn't be a problem, but if you had JSON files with comments in them and wanted to use those somewhere else you'd be in trouble unless you removed the comments or used JSON5. |
The thing I'm against is anything that adds work for our users without I work in a 7,000(ish) person software company and understand the cost of If we can make it a zero-work install, they'll get the benefit without the -Loren On Wed, Nov 18, 2015 at 12:47 PM, Elliott Foster notifications@github.com
|
The fallback will only use JSON5 if the failure was the result of a comment string. This will provide support for strict JSON as well as maintain support for comments in JSON.
@lorenwest I understand, how now with the changes from a01df75? This will allow us to support strict JSON so portability isn't a concern, but will preserve support for comments in JSON by falling back to JSON5. |
How does that differ from always using json5 as the parser? |
It satisfies these concerns:
If you've provided pure JSON, it will be parsed as such. If you were using the superset of JSON that this module has traditionally supported you'll use JSON5. |
I see, it's about the parser used - that makes sense. This looks like a healthy change. Let's sit on it a day or so to see if there's any opposing discussion, then I'll get it merged in. |
I'm OK with using using JSON5 parser as the primary JSON parser. I think the strongest argument is that the files are often hand-edited, not machine generated. (But machine-generated files should still be parsed the same). The slightly relaxed format will likely fix some other "oops" moments that developers run into when trying to write JSON by hand. |
The fallback-to-JSON5-if-there-are-comments ( elliotttf@a01df75 ) is also OK with me an alternative to regex-parsing for the comments. |
More robust handling of JSON
I like this pull request because it doesn't simply allow .json files to be .json5. It attempts to only use json5 if the parse error was due to a comment string. If people want json5, they'll still have to use the json5 file extension, but now they won't have to include json5 in their project since it's now a dependency of node-config. |
Published to npm in v.1.18.0 |
💥 Thanks for working with me on this @lorenwest and @markstos 🍻 |
@lorenwest I noticed the new 1.18 was released was prepared with this update in it, but has not yet been released. It looks like just "npm publish" is the final step, and possibly updating the date in History.md to reflect the release date. |
Thanks for the bump. Updated History, package.json, the contributor table in README, and published to npm. |
@lorenwest I actually still see 1.17.1 as the published version on npm... ¯_(ツ)_/¯ |
Now I know what happened. NPM was pointed to my private repo, and it got published there. That's probably what happened before as well. Thanks for keeping up with this - I'll re-publish this evening when I get on that computer. |
Published with a better ~/.npmrc. On Mon, Jan 11, 2016 at 11:44 AM, Elliott Foster notifications@github.com
|
This appears to have been added in #1 but it actually causes problems with the module due to the way the data is mutated with regular expressions.
I would like to have this removed because:
The case where this breaks is when a JSON file has all of its whitespace removed, for example, consider this original config file:
Now suppose we remove whitespace, e.g.
The resulting file will look like this:
This is still valid JSON; however, node-config's
util.stripComments
results in the following string:which is no longer valid JSON.