Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More robust handling of JSON #256

Merged
merged 5 commits into from Nov 20, 2015

Conversation

elliotttf
Copy link
Contributor

This appears to have been added in #1 but it actually causes problems with the module due to the way the data is mutated with regular expressions.

I would like to have this removed because:

  1. Comments in straight JSON are not valid according to the spec
  2. node-config already supports JSON5, so if comments are required users can use that file format instead.

The case where this breaks is when a JSON file has all of its whitespace removed, for example, consider this original config file:

{
  "version": "v3.1",
  "supportedVersions": [
    "v3",
    "v3.1"
  ],
  "supportedExt": [
    "bulk"
  ],
  "publicRoot": "http://localhost:3000",
  "port": 3000,
  "processes": 0,
  "killTimeout": 10000,
  "api": {
    "host": "",
    "options": {}
  },
  "redis": {
    "host": "127.0.0.1",
    "options": {}
  }
}

Now suppose we remove whitespace, e.g.

fs.writeFileSync('./config/default.json', JSON.stringify({
  "version": "v3.1",
  "supportedVersions": [
    "v3",
    "v3.1"
  ],
  "supportedExt": [
    "bulk"
  ],
  "publicRoot": "http://localhost:3000",
  "port": 3000,
  "processes": 0,
  "killTimeout": 10000,
  "api": {
    "host": "",
    "options": {}
  },
  "redis": {
    "host": "127.0.0.1",
    "options": {}
  }
}));

The resulting file will look like this:

{"version":"v3.1","supportedVersions":["v3","v3.1"],"supportedExt":["bulk"],"publicRoot":"http://localhost:3000","port":3000,"processes":0,"killTimeout":10000,"api":{"host":"","options":{}},"redis":{"host":"127.0.0.1","options":{}}}

This is still valid JSON; however, node-config's util.stripComments results in the following string:

{"version":"v3.1","supportedVersions":["v3","v3.1"],"supportedExt":["bulk"],"publicRoot":"http://localhost:3000","port":3000,"processes":0,"killTimeout":10000,"api":{"host":"","options":{}},"redis":{"hostundefined.0.0.1","options":{}}}

which is no longer valid JSON.

@markstos
Copy link
Collaborator

To add some context about where the mis-parsing happens, it's here:

Before: "redis":{"host":"127.0.0.1","options":{}}
After: "redis":{"hostundefined.0.0.1","options":{}}

It's not immediately clear what causes this.

On principle, I agree with the proposal to remove JSON comment support, because people can use the JSON5 format if they want it-- I believe simply renaming the files from .json to .json5 to support that.

The change will require a major version bump, since it's not backwards compatible with the current behavior.

So in the meantime it's desirable to see if we can solve the comment-stripping bug that broke your config file.

Ironically, one source we have for inspiration here is to look at the JSON5 library, since it must be handling this already.

@elliotttf
Copy link
Contributor Author

For what it's worth, the issue appears to be related to an empty string preceding a string that contains a numeral, e.g.

{"a":"","b":"1"}

will produce

{"a":"","bundefined"}

That said, I personally feel a major version bump is appropriate. While allowing comments in JSON is a nice feature provided by node-config it's not part of the JSON specification and there are already several alternatives that this module supports that do allow comments in configuration.

Any regular expression approach and indeed any extension of the JSON specification by this module is likely to be fragile and susceptible to corner cases, just as the current regexes are.

It appears JSON5's approach is quite a bit more complicated than this. Rather than removing substrings that aren't valid JSON a new JSON string is built up by walking the string and omitting anything that's not valid JSON.

A potential solution that avoids re-implementing anything is to add JSON5 as a dependency then either always parse JSON as JSON5 or attempt to parse straight JSON and fall back to JSON5 on an error, e.g.

if (format === 'json') {
  try {
    configObject = JSON.parse(content);
  } catch (e) {
    if (e.name !== 'SyntaxError') {
      throw e;
    }
    configObject = JSON5.parse(content);
  }
}

This again though makes an assumption on what is considered valid JSON and could introduce problems for users of the module.

@markstos
Copy link
Collaborator

I considered the idea of using the JSON5 parser directly to parse JSON configuration files, but I don't like that the approach ends up supporting not just comments in JSON files, but all the other JSON5 features.

Let's see what the primary maintainer @lorenwest thinks about the proposal.

@lorenwest
Copy link
Collaborator

Vacationing in New Orleans. Will answer soon ;)
On Nov 10, 2015 3:49 PM, "Mark Stosberg" notifications@github.com wrote:

I considered the idea of using the JSON5 parser directly to parse JSON
configuration files, but I don't like that the approach ends up supporting
not just comments in JSON files, but all the other JSON5 features.

Let's see what the primary maintainer @lorenwest
https://github.com/lorenwest thinks about the proposal.


Reply to this email directly or view it on GitHub
#256 (comment)
.

@lorenwest
Copy link
Collaborator

We have had comments in JSON files since the beginning, and I'm afraid that removing it will make a lot of people unhappy. Our ample warning commitment would require us to announce this incompatibility now and accept comments over a 1 year period before committing to a major revision.

Asking people with existing configurations to change to JSON5 just for purity is a bit heavy handed. I think a better approach is to fix the parsing error.

I suggest adding a pull request containing a test that fails due to this bug, and asking the regexp ninjas to have at it.

With well over a million downloads, my guess is this bug hasn't shown because configuration files are designed to be hand edited, and hand editing long lines in JSON file is more of something a computer does than a human being.

elliotttf added a commit to elliotttf/node-config that referenced this pull request Nov 11, 2015
This arose becasue of the way comments are stripped from JSON. As
comments will remain supported in node-config (node-config#256) this change
will correctly handle parsing JSON strings regardless of whether or
not a comment is encountered.
@elliotttf
Copy link
Contributor Author

Fair enough, I've opened #258 which uses a simpler string matching regex for JSON since valid JSON will begin and end with " characters.

@elliotttf
Copy link
Contributor Author

With regard to #259, I'd like to keep this conversation alive and would like to strongly advocate for getting this on the roadmap for a major version bump.

As I pointed out in #261, the problem that surfaced this is a combination of inline JSON and empty strings but is not necessarily a readability argument, as is evidenced in the test.

Furthermore, while hand edited config is likely the most common case I experienced this in a deployment where config files are generated from (human readable) templates that produce (machine readable) JSON. This allows me to use a script to deploy across the 5 environments and tens of machines my application is using. Are there other ways this could be handled? Most certainly. However, the fact remains my script is generating valid JSON and should therefore be usable by this module.

I do understand the ample warning agreement but would argue that semantic versioning creates an agreement that major version bumps may produce breaking changes. This combined with the way npm works should mitigate instances of users accidentally getting bumped up to a version that introduces breaking changes.

Perhaps an alternative approach would be to release a major version bump now but provide support for 1.x for the next year and move deprecation warnings into the 1.x branch? This seems to be a more common approach among software projects and less prone to drift between planned improvements and deployed code.

@markstos
Copy link
Collaborator

@elliotttf, I like the idea of going ahead with a major version bump. Any critical security issues or major bugs can continue to be supported in the 1.x branch for a while.

Projects like Mongoose and Express use semantic version this way. We have some other items that we've discussed for a major-version bump, but then the conversations often stall around the delay of waiting a year to the work released.

@lorenwest
Copy link
Collaborator

I was hoping this new regexp would solve the issue so I didn't have to get into the deeper issue. You're also right that correct semver usage should be expected, and that negates the need for the 1 year advance warning.

Now for the deeper issue - comments in json files. Philosophically speaking, JSON doesn't allow comments so you could argue that if we support JSON, we shouldn't allow them.

But practically speaking we've allowed comments since the beginning, and the work being asked of the community to become major-rev compatible is not trivial. In fact, one could argue it's irresponsible for us to impose that work on the community, even if a major version bump would allow it.

So - if we're able to allow valid JSON, or even if we require JSON to be pretty printed to work properly,

JSON.stringify(object, null, 2);

that's substantially less work than asking people to make changes to their configurations and enterprise deployments in order to become compatible with the newest node-config.

Now you know my opinion. Features can be added, but removing or changing features is heavyweight for the community and shouldn't be considered without equally heavyweight upside.

@elliotttf
Copy link
Contributor Author

The lack of comment support in JSON may have been philosophical but either way it's a decision that was made part of the specification that's been in place since 2013. The decision to support JSON comments in this module was certainly based in common sense and practicality, but now with support for other formats that do natively support comments continuing to support it means dealing with technical debt.

the work being asked of the community to become major-rev compatible is not trivial.

Actually I would disagree, if we add json5 as a dependency for this module becoming compatible with dropped support for JSON comments can be handled in one of two ways:

  1. Remove comments from your config files – depending on individual deployments this could be rather tedious
  2. Rename any .json files to .json5 – this is some work, but I wouldn't say is unreasonable.

We could even do some hand holding here and release v2 that does a straight parse of JSON and update the 1.x branch so that it raises a deprecation notice then tries to parse as json5, e.g.

if (format === 'json') {
  try {
    configObject = JSON.parse(content);
  } catch (e) {
    // All JS Style comments will begin with /, so all JSON parse errors that
    // encountered a syntax error will complain about this character.
    if (e.name !== 'SyntaxError' || e.message !== 'Unexpected token /') {
      throw e;
    }
    configObject = JSON5.parse(content);
    console.error('Deprecation notice: comments are no longer supported in JSON files, consider using JSON5 or removing comments from your file.');
  }
}

This gives users a heads up about what they need to do to support the major release while not actually ending support for comments in JSON. Any truly invalid JSON (e.g. mismatched brackets) will still raise an exception when the JSON5 parsing fails and the deprecation message will not be displayed.

@markstos
Copy link
Collaborator

Rename any .json files to .json5 – this is some work, but I wouldn't say is unreasonable.

Debian-based distros include the rename perl script to make renames easy:

rename 's/\.json/\.json5/'

That's documented more here:
http://tips.webdesign10.com/how-to-bulk-rename-files-in-linux-in-the-terminal

@lorenwest
Copy link
Collaborator

Oy, I knew this would open a pandora's box.

If we make a breaking change, the value of that change must be clear and substantially greater than the work required to make the change.

I'm not convinced, although I could be, that this is as low impact as you suggest. But before we continue that discussion, let's discuss the value of this change for our customer, and how we can make them love us for making this change.

Exactly what are they getting in return for their work?

@elliotttf
Copy link
Contributor Author

Exactly what are they getting in return for their work?

I can think of a few of things:

  1. A guarantee that valid JSON will be parsed correctly
  2. That this module isn't attempting to extend a well defined standard
  3. That as long as their config files conform to that standard this module will continue to be able to read them
  4. That a viable and standardized format exists that does provide the same functionality as what was removed and is supported by this module. As long as updates to their config files are made to conform to that standard this module will continue to support that too.

I'm not convinced, although I could be, that this is as low impact as you suggest.

Because the breaking change we're talking about here is comments this really is as simple as renaming files. JSON5 will handle any valid JavaScript objects appropriately; therefore, valid JSON will be parseable by JSON5. Users that need comments in their config files can rename their files to .json5 and go on with their business, users that don't have comments in their config files will never notice the change.

From the JSON5 website:

JSON5 remains a strict subset of JavaScript, adds no new data types, and works with all existing JSON content.

@markstos
Copy link
Collaborator

It seems the big "win" would be if we saved the JSON-users from a significant yet-to-be-known bug with JSON comment parsing. However, an additional bug like that might be found.

I also don't agree with the line of thinking that any regex-based solution will always be fragile or have corner cases. With over a decade of experience doing Perl development, regex's are frequently used in reliable, production-quality software without bugs being found in the regex. Even when formal grammars are used, they sometimes implement the lexer with regexs, because you have to break the text up into tokens /somehow/. I have not closely examined the regex's in the comment-stripping code, but I expect they can be made to be reliable if they are not already.

So, my comments above spoke against making a fuss for the specific case of comments in JSON files, but I'm generally in favor of breaking changes if the new design is better going forward. I have this position because of how I think about "the customer", our module users. In the past year, our downloads have roughly doubled:

http://npm-stat.com/charts.html?package=config&author=&from=2014-10-31&to=2015-10-31

It's reasonable to assume that the downloads might double again in the next year in year and keep going up. Considering both the current users as well as the future users for the next two years as "customers", then the majority of our customers are not affected by breaks in backward compatible changes, because they haven't started using the module yet.

I once took the stance of strong-backward-compatibility in a popular open source Perl module I maintained (Data::FormValidator). What happened over time is that multiple, simpler competing projects appeared and gained popularity. Meanwhile, there continued to be bugs and support issues related to the "legacy" code that wasn't recommended to be used anymore. I think the project would have been more successful overall if a new release had been made that dropped all the legacy cruft.

Perl's infrastructure made in-place breaking changes very unattractive, because "semver" wasn't supported in the package installer the way NPM does.

I think both old and new users could be served welly by immediately releasing better designs with a new major version, while supporting the the previous release branch for a year.

@lorenwest
Copy link
Collaborator

What if we used json5 as our json parser. That ensures correct comment parsing without requiring anyone to change their configurations.

The story would be that node-conig never was json compatible. We're closer to json5, so this is a formalization of what we've been all along, using a more reliable technique.

This would require no config changes, benefit active users as well as future users, and from a compatibility standpoint, may not even require a major version bump.

@elliotttf
Copy link
Contributor Author

I'm ok with that solution as well, it maybe has some gotchas since JSON5 supports more than just comments, but any problems that I can imagine arising from that would be up to the application to resolve, e.g. if a config file were loaded in some other context directly instead of by the config module.

@lorenwest
Copy link
Collaborator

There will be people that sneer at relaxing something they see as pure. For them, we can offer a .strict-json extension or something that uses JSON.parse.

I like JSON5 because it accepts machine-written JSON as well as more human friendly formats. Some configs can be generated, but the vast majority are hand edited.

@elliotttf
Copy link
Contributor Author

Yeah, I guess that's why I suggested a major version bump. The module past #1 never really supported JSON, it supported a superset. Figuring out if pure JSON support is important could help drive this decision.

If we're considering moving to JSON5 for parsing do you think it'd make sense to use one of the try/catch methods that I suggested before? Maybe instead of a deprecation notice just add a warning that support for comments in pure JSON files could be removed at a later point and you should consider using JSON5 instead?

Functionally there would be no change to do this. There'd potentially be a little bit of a performance hit if your JSON contained comments since it would effectively be parsed twice but since this only happens the first time the module is required I'd think it's not really an issue.

This could also be a way to gauge how many people this is really a problem for. If we add a warning and an issue gets opened then it's a problem that's worth pursuing further. If we add it and it's radio silence though then we can be reasonably confident that the number of people actually affected by this and care about pure JSON support is pretty low.

I don't have a strong opinion on the pure/strict JSON parsing but can see that portability would be a valid argument for enforcing it. Bringing existing JSON config files to this module wouldn't be a problem, but if you had JSON files with comments in them and wanted to use those somewhere else you'd be in trouble unless you removed the comments or used JSON5.

@lorenwest
Copy link
Collaborator

The thing I'm against is anything that adds work for our users without
return in value that they can feel. Even reading the change blog is work,
and if they have to add a work item to their team to "be compatible with
the new node-config", there'd better be an ROI that they can justify to
their managers, or it won't get done.

I work in a 7,000(ish) person software company and understand the cost of
keeping up with dependency changes. If more package authors understood the
effect of their (seemingly) random decisions, my life would be easier.
Multiply that times the number of non-trivial node-config installs, and
it's a significant amount of work.

If we can make it a zero-work install, they'll get the benefit without the
work. It's our responsibility to exhaust all zero-work options before
considering options that require changes.

-Loren

On Wed, Nov 18, 2015 at 12:47 PM, Elliott Foster notifications@github.com
wrote:

Yeah, I guess that's why I suggested a major version bump. The module past
#1 #1 never really
supported JSON, it supported a superset. Figuring out if pure JSON support
is important could help drive this decision.

If we're considering moving to JSON5 for parsing do you think it'd make
sense to use one of the try/catch methods that I suggested before? Maybe
instead of a deprecation notice just add a warning that support for
comments in pure JSON files could be removed at a later point and you
should consider using JSON5 instead?

Functionally there would be no change to do this. There'd potentially be a
little bit of a performance hit if your JSON contained comments since
it would effectively be parsed twice but since this only happens the first
time the module is required I'd think it's not really an issue.

This could also be a way to gauge how many people this is really a problem
for. If we add a warning and an issue gets opened then it's a problem
that's worth perusing further. If we add it and it's radio silence though
then we can be reasonably confident that the number of people actually
affected by this and care about pure JSON support is pretty low.

I don't have a strong opinion on the pure/strict JSON parsing but can see
that portability would be a valid argument for enforcing it. Bringing
existing JSON config files to this module wouldn't be a problem, but if you
had JSON files with comments in them and wanted to use those somewhere else
you'd be in trouble unless you removed the comments or used JSON5.


Reply to this email directly or view it on GitHub
#256 (comment)
.

The fallback will only use JSON5 if the failure was the result of
a comment string. This will provide support for strict JSON as well
as maintain support for comments in JSON.
@elliotttf
Copy link
Contributor Author

@lorenwest I understand, how now with the changes from a01df75? This will allow us to support strict JSON so portability isn't a concern, but will preserve support for comments in JSON by falling back to JSON5.

@elliotttf elliotttf changed the title Remove ability to have comments in JSON More robust handling of JSON Nov 19, 2015
@lorenwest
Copy link
Collaborator

How does that differ from always using json5 as the parser?

@elliotttf
Copy link
Contributor Author

It satisfies these concerns:

There will be people that sneer at relaxing something they see as pure.

If you've provided pure JSON, it will be parsed as such. If you were using the superset of JSON that this module has traditionally supported you'll use JSON5.

@lorenwest
Copy link
Collaborator

I see, it's about the parser used - that makes sense.

This looks like a healthy change. Let's sit on it a day or so to see if there's any opposing discussion, then I'll get it merged in.

@markstos
Copy link
Collaborator

I'm OK with using using JSON5 parser as the primary JSON parser. I think the strongest argument is that the files are often hand-edited, not machine generated. (But machine-generated files should still be parsed the same). The slightly relaxed format will likely fix some other "oops" moments that developers run into when trying to write JSON by hand.

@markstos
Copy link
Collaborator

The fallback-to-JSON5-if-there-are-comments ( elliotttf@a01df75 ) is also OK with me an alternative to regex-parsing for the comments.

lorenwest added a commit that referenced this pull request Nov 20, 2015
@lorenwest lorenwest merged commit 4b304e5 into node-config:master Nov 20, 2015
@lorenwest
Copy link
Collaborator

I like this pull request because it doesn't simply allow .json files to be .json5. It attempts to only use json5 if the parse error was due to a comment string.

If people want json5, they'll still have to use the json5 file extension, but now they won't have to include json5 in their project since it's now a dependency of node-config.

@lorenwest
Copy link
Collaborator

Published to npm in v.1.18.0

@elliotttf
Copy link
Contributor Author

💥

Thanks for working with me on this @lorenwest and @markstos 🍻

@markstos
Copy link
Collaborator

markstos commented Jan 6, 2016

@lorenwest I noticed the new 1.18 was released was prepared with this update in it, but has not yet been released. It looks like just "npm publish" is the final step, and possibly updating the date in History.md to reflect the release date.

@lorenwest
Copy link
Collaborator

Thanks for the bump. Updated History, package.json, the contributor table in README, and published to npm.

@elliotttf
Copy link
Contributor Author

@lorenwest I actually still see 1.17.1 as the published version on npm... ¯_(ツ)_/¯

@lorenwest
Copy link
Collaborator

Now I know what happened. NPM was pointed to my private repo, and it got published there. That's probably what happened before as well. Thanks for keeping up with this - I'll re-publish this evening when I get on that computer.

@lorenwest
Copy link
Collaborator

Published with a better ~/.npmrc.

On Mon, Jan 11, 2016 at 11:44 AM, Elliott Foster notifications@github.com
wrote:

@lorenwest https://github.com/lorenwest I actually still see 1.17.1 as
the published version on npm... ¯_(ツ)_/¯


Reply to this email directly or view it on GitHub
#256 (comment)
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants