As discussed, I put data-normalization code that I had extracted from read-package-json into a new module, now published as normalize-package-data. I updated my fork of read-package-json to use this new module. It now passes all tests (although README warning had to be adjusted slightly).
I don't think it's ready for pull yet, because I'm not fully convinced that functionality is totally equivalent (or better).
See https://npmjs.org/package/normalize-package-data .
normalize-package-data is now independently tested, although only slightly more thorough than read-package-json is (wrt normalization behavior - other tests did not apply).
I see just one issue with current interplay between my modified read-package-json and normalize-package-data: normalize-package-data currently sets data._id, but (understandably) only after data.name and data.version have been fixed and validated.
At the same time, the warn function that is passed to the normalizeData function relies on data._id to be there. This is no problem as long as no "warn" calls happen before data._id is set, which is not the case now. But I don't think it's proper design, and can cause surprising behavior in the future.
The best solution I see is to fix up data.name and data.version, then set data._id inside the final function, before normalizeData is called. It's no problem if normalizeData function effectively does the same thing again. It's job is to normalize, after all.
To make read-package-json code even more dependable, not reliant on correct behavior of normalizeData, is to change the reference to data._id inside the warn function to a local variable (say packageId). This way even if normalizeData malfunctions and removes data._id, the id passed to readJson.log.warn won't be undefined.
What do you think of this approach?
Aside from this issue, is there anything else standing in the way to go ahead with this change?
Do you have any requests for normalize-package-data, wrt code quality, code style, or (new) functionality.
Would you like to see more tests to increase confidence in the refactored code?
I'm open to suggestions.
I'd also appreciate your comments on questions posted on https://github.com/meryn/read-package-data/issues . None of those are critical though. They more represent the obvious opportunities for improvement that I saw. Going ahead with some could influence the behavior of npm as a whole though (i.e. whether or not all unknown fields are removed). Of course the goal would be to influence npm behavior in a positive manner, and I guess deciding on what's right for npm is your prerogative (first time I used that word).
I realized one other thing:
normalizeData will warn if it hasn't found a readme field inside the package data. In how the function is used in read-package-json, this actually means that not only there's no readme field inside the package.json, but neither is there a README.md file in the package directory.
In read-package-json, the warning(s) emitted could be improved, as could the final value of the readme field. This now reads "Error: No README data found" , a change from the original "Error: No README.md found". I think it's a minor issue.