x/vgo: go.mod format should not have a bespoke syntax #23966
Mentions YAML being confusing and not well understood. I don't particularly understand it either, considering the standard disallows tabs as separators, which is unusual and awkward for a whitespace-agnostic language like Go.
I'm having a very hard time reconciling "yaml" and "perfectly fine format". It's not the description that springs to mind based on my experience.
A benefit of a custom format here is that only what's allowed is legal. Another is that everything can be given a nice expressive syntax. Error messages can be more easily tailored.
The similarity to Go syntax means it shouldn't be hard for anyone to learn it and syntax highlighters and the like should be easy to adapt from their Go counterparts.
The only major downside I see is that, as a new format, its implementation will require a certain amount of fuzzing and additional testing that would (hopefully) already be done otherwise. (And if the parser is put in the stdlib no one else will have to worry about that either).
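As a rough illustration of the "only what's allowed is legal" point, here is a minimal sketch of a line-oriented parser in Go. It is not vgo's parser: the names `ModFile` and `parseModFile` are hypothetical, and the sketch assumes unquoted paths and only single-line `module` and `require` directives.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ModFile is a hypothetical in-memory form of a go.mod-style file.
type ModFile struct {
	Module  string
	Require map[string]string // module path -> version
}

// parseModFile is an illustrative sketch, not vgo's implementation.
// It accepts only single-line `module` and `require` directives and
// rejects everything else with an error that names the line.
func parseModFile(src string) (*ModFile, error) {
	f := &ModFile{Require: map[string]string{}}
	sc := bufio.NewScanner(strings.NewReader(src))
	for n := 1; sc.Scan(); n++ {
		line := sc.Text()
		if i := strings.Index(line, "//"); i >= 0 {
			line = line[:i] // drop a trailing comment
		}
		fields := strings.Fields(line)
		switch {
		case len(fields) == 0:
			// blank or comment-only line: nothing to record
		case fields[0] == "module" && len(fields) == 2:
			f.Module = fields[1]
		case fields[0] == "require" && len(fields) == 3:
			f.Require[fields[1]] = fields[2]
		default:
			return nil, fmt.Errorf("line %d: cannot parse %q", n, strings.TrimSpace(line))
		}
	}
	return f, sc.Err()
}

func main() {
	f, err := parseModFile("module rsc.io/hello\nrequire rsc.io/quote v1.5.2\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(f.Module, f.Require["rsc.io/quote"])
}
```

Because the grammar is closed, anything outside it fails in the `default` case, which is where a tailored error message would go.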
...why not just have it be written Go?
I mean, we all know how to write it. We have well-tested lexers and parsers. We have syntax highlighting. We have formatters and tools that can vet the code. We even have a framework for parsing and running sets of files in
could become something like
It'd end up being similar to how
Yeah, it doesn't feel like a set of directives as much as runnable code, but since when has Go done something just so it feels good as opposed to the practical option?
Theoretically, this could also take care of #23972
Let’s avoid bikeshedding on which existing format is best and wait for a response on why a custom syntax was chosen in the first place. It may have been an arbitrary decision, or it may not have. If it wasn’t, then understanding the decision will help inform future choices.
Given that the go.mod file looks very similar to a go source file, why not add module and require as top level declarations and then we can write module syntax inline with our source code?…
On 21 Feb 2018, at 15:58, david karapetyan wrote: Most folks have settled on TOML. We don't really need another custom format or a format embedded in JSON or YAML.
@davecheney I guess the
The problem with TOML and YAML is that no-one has written code (AFAIK) that can read those formats (including comments) and write them back out again, gofmt style. See BurntSushi/toml#213 for example. Also, YAML is a terrible format. Please no YAML.
I think I quite like the choice of a custom format as long as there is some straightforward way to convert to/from a well known format, because it can be exactly as simple as necessary, and as clean as possible.
I'm very reluctant to jump in here. I'm liking what I see from vgo so far, and I don't want to bikeshed on what might feel like a trivial topic.
However, I feel that part of the friction I'm feeling from my initial vgo experiments comes from the rest of the tools that I use to write Go and work with code in general. I think this is an opportunity to make adoption a little easier.
Here’s why I think we should consider adopting an existing common data format:
I don’t have strong feelings about YAML vs JSON vs whatever else. I’ve used JSON fine with npm and YAML fine with Kubernetes, Helm, and Ansible. They both work, and I’m long past the point in my career where I care about arguments like that. (And for what it’s worth, I’ve never been bugged by the lack of inline comments — READMEs and Issues worked for the rare cases we needed to communicate about dependencies.) From where I’m sitting, the requirements are:
Apologies in advance if I'm off base. I'm fairly new to Go myself, and I confess that I don't yet understand some of the original motivations for a bespoke file format. There may be good reasons to go another direction that I'm overlooking!
@rsc states in his blog post that vgo is meant to be a streamlining/simplification of the general-purpose dep tool. Given that dep made an explicit decision to go with TOML partly because it wasn't hierarchical, it seems unlikely that vgo would reverse that requirement.
@ericlagergren While I like the simplicity of reusing go syntax, using the .go extension for the module file makes it likely that some projects will run into a conflict and have to rename some of their files to switch to vgo, which goes against the goal of making the migration as painless as possible.
The file name is not super central to the idea, IMO.
On Wed, 21 Feb 2018 at 08:05, Hugues wrote:
@ecowden Hierarchical. For instance, .properties files are too restricting to future extension. @rsc states in his blog post that vgo is meant to be a streamlining/simplification of the general-purpose dep tool. Given that dep made an explicit decision to go with TOML partly because it wasn't hierarchical, it seems unlikely that vgo would reverse that requirement.
From golang/dep#119 (comment): The one thing that does stick out with TOML is, being not tree-structured, it's possible for us to append constraints to the manifest without rewriting it. That may turn out to be a very important factor in applying sane defaults that help guard us (that is, the entire public Go ecosystem) against nasty exponential growth in solver running time.
@ericlagergren While I like the simplicity of reusing go syntax, using the .go extension for the module file makes it likely that some projects will run into a conflict and have to rename some of their files to switch to vgo, which goes against the goal of making the migration as painless as possible.
Whatever format is in use, I certainly hope (with Rob) that there are good public manipulation libraries.
Here are some comments from years of working on goimports:
My point, which was arguably phrased too strongly, is that go.mods people write today will be understood by the eventual official tooling. I want to make clear that people will not have to throw them away and start over. Given that vgo already supports reading nine different legacy file formats (GLOCKFILE, Godeps/Godeps.json, Gopkg.lock, dependencies.tsv, glide.lock, vendor.conf, vendor.yml, vendor/manifest, vendor/vendor.json), I am confident it won't be a burden to read this one too, if we move to something new. And the tooling already rewrites go.mod in place when needed, so updating to a new format will be easy if that's what we decide. I was not attempting to lock this in place.
I obviously agree with this in principle. In practice I spent a while looking at all the existing formats and found them not "perfectly fine" for this job. In particular, look at how much shorter and clearer a go.mod is compared to the equivalent Gopkg.toml. I'm happy to return to this question once we're happy with all the other higher-level details.
And to answer @josharian's concern, if we keep the custom format then yes there would be public tooling, probably along the lines of x/vgo/vendor/cmd/go/internal/modfile.
I like the suggestions of @ericlagergren and @davecheney. It leverages the entirety of the
Have inline module information on
Rust uses the
Swift has a
EDIT: Added comparison to suggestions above.
I’m curious: why does “hierarchical” imply “complex?”
Stepping back, I probably misphrased that last requirement. I was looking for an intersection of the familiar and the extensible, and doing so with YAML and JSON on my mind. “Hierarchical” isn’t really the goal here, and I’m happy to scratch it off the list.
I’m surprised to see the reaction about it being complex, though, and I'm wondering if I'm missing something. When I look at the example
...personally, I see a “hierarchical” data structure. By that, I mean a list of key-value pairs, where values can be primitives, lists, or other lists of key-value pairs. Changing nothing but formatting and punctuation, it becomes:
module: rsc.io/hello
require:
  - golang.org/x/text: v0.0.0-20180208041248-4e4a3210bb54
  - rsc.io/quote: v1.5.2
…And it’s even 4 characters shorter! (That’s a joke, if it’s not obvious.)
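To make the "hierarchical" reading concrete, here is a minimal sketch of the same data as Go values. The type names `Module` and `Requirement` are assumptions made up for illustration and are not part of vgo; the paths and versions are the ones from the example.

```go
package main

import "fmt"

// Requirement is a hypothetical type: one module path pinned to one version.
type Requirement struct {
	Path    string
	Version string
}

// Module is a hypothetical model of the file as nested key-value data:
// a primitive value (Path) plus a list of key-value pairs (Require).
type Module struct {
	Path    string
	Require []Requirement
}

// exampleModule returns the rsc.io/hello example from the discussion.
func exampleModule() Module {
	return Module{
		Path: "rsc.io/hello",
		Require: []Requirement{
			{Path: "golang.org/x/text", Version: "v0.0.0-20180208041248-4e4a3210bb54"},
			{Path: "rsc.io/quote", Version: "v1.5.2"},
		},
	}
}

func main() {
	m := exampleModule()
	fmt.Println("module", m.Path)
	for _, r := range m.Require {
		fmt.Println("require", r.Path, r.Version)
	}
}
```

Any serialization that can express a string plus a list of string pairs could carry this structure, which is the sense of "hierarchical" intended here.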
When I jumped in here, I was thinking about extending an existing git repo dependency analyzer written in Node.js to recognize vgo modules. (Well, that, and how I missed the pretty colors my editor makes highlighting files...) Then I realized how much I didn’t want to create and maintain a custom parser, and how much easier it would be with a “standard” data format.
By all means, put this question on the back burner. There are waaay more important things to figure out with vgo, and I like what I’m seeing so far!
Even if YAML is considered to be too complex or confusing, I would still prefer it (or JSON, or TOML, or whatever other standard declarative format) over a bespoke format, and support the subset of it that we are happy with.
In other words, if
@ecowden's example above makes it immediately clear to me which format I would prefer.
Another concern with
I think we should continue to use the very simple go.mod format, after the further simplification of making quotes optional (#24641). Once the dust settles, we should also publish a package like x/vgo/vendor/cmd/go/internal/modfile so that other tools can parse and edit mod files too.
As I wrote originally, I do understand the appeal of a standard file format, but I am still unable to find one that worked well for this task. My main concern is ease of editing, for both people and programs.
The files have to be easy for people to edit. For example, the hacked-up blog post system I built stores a JSON blob at the top of each file, above the post text, because it was very easy to implement that. But I am sick of needing to leave out the comma after the last key-value pair, because it makes adding a new key-value mean editing the previous one too. This is exactly why we allow trailing commas in Go literals. Those annoyances add up.
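The trailing-comma behavior in Go literals can be seen in a small sketch. The function name `postMetadata` and its keys and values are made up for illustration; they are not taken from the blog system mentioned above.

```go
package main

import "fmt"

// postMetadata returns hypothetical blog-post metadata. Every entry ends
// with a comma, including the last one, so appending a new key adds one
// line and leaves the previous entry untouched in a diff.
func postMetadata() map[string]string {
	return map[string]string{
		"title":  "Example post",
		"author": "someone",
		"status": "draft", // Go requires this comma because the closing brace is on the next line
	}
}

func main() {
	fmt.Println(len(postMetadata()))
}
```

In JSON the equivalent last entry must not end with a comma, so adding a key after it means editing two lines instead of one.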
The files also have to be easy for programs to edit, without mangling them. Think about all the benefit we’ve gotten from gofmt and tools being able to collaborate with people to work on Go source files. People and programs working together on go.mod will be similarly beneficial. In fact this is a key part of the design. If you read through the Tour of Versioned Go you’ll see repeated alternation between the developer editing go.mod and vgo itself editing go.mod. That has to run very smoothly.
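A minimal sketch of what such a mechanical edit could look like, assuming unquoted paths and single-line `require <path> <version>` directives. The helper `setRequireVersion` is hypothetical and is not how vgo does it; it only illustrates changing one version while leaving comments and every other line as the developer wrote them.

```go
package main

import (
	"fmt"
	"strings"
)

// setRequireVersion is a hypothetical helper, not vgo's implementation.
// It rewrites the version on a matching single-line require directive,
// keeps any trailing // comment on that line, and leaves all other
// lines unchanged.
func setRequireVersion(src, path, version string) string {
	lines := strings.Split(src, "\n")
	for i, line := range lines {
		// Separate a trailing comment so it can be reattached unchanged.
		code, comment := line, ""
		if j := strings.Index(line, "//"); j >= 0 {
			code, comment = line[:j], line[j:]
		}
		fields := strings.Fields(code)
		if len(fields) == 3 && fields[0] == "require" && fields[1] == path {
			lines[i] = "require " + path + " " + version
			if comment != "" {
				lines[i] += " " + comment
			}
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	src := "module rsc.io/hello\n\n// Keep quote pinned.\nrequire rsc.io/quote v1.5.1 // pinned\n"
	fmt.Print(setRequireVersion(src, "rsc.io/quote", "v1.5.2"))
}
```

The edit is local to one line because each directive is one line; a format that spreads a requirement over several lines would need a real syntax tree to do the same without disturbing hand-written comments.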
All the “generalized key-value pair” formats become awkward when there’s more than a single key-value pair to express. It’s true that we could use a YAML-like notation:
but that nice one-line-at-a-time breaks when we get to
But then what does
The awkwardness here is not much, but it’s still quite annoying: three lines instead of one, with corresponding reduced readability and ability to use line-based tools like grep, sort, diff.
The fundamental problem is that not everything a developer needs to say is best expressed as key-value pairs. We don’t use shells that require us to write:
Yet somehow many developers accept this in config files. Why? Because, as Rob said, existing formats “are well understood and have publicly available parsers.” At least, we think that’s true. The more I look at these formats the less convinced I become. And even assuming it's true, that benefit has to outweigh the disadvantages imposed by the format itself.
JSON is too picky (for example, about commas) and has no support for comments. It’s out.
XML is equally picky about closing tags and is too noisy in general. It’s out.
TOML and YAML are at least easier for people to edit, but they both have the general key-value problem.
Additionally, TOML requires quotes around both module paths as keys (because they have slashes) and all values (
Both TOML and YAML also turn out to be more complex than they first appear, a detail that’s very important if you need not just a parser but a mechanical editor that can parse, edit, and reprint the file. TOML’s complexity starts to show once you move away from key-value pairs: you have to learn the distinction between [x] and [[x]] and then start thinking about regular key-value pair lines versus inline tables. Of course, that’s nothing compared to YAML. Here’s an illuminating exercise: flip through http://yaml.org/spec/1.2/spec.pdf and try to find out what syntactic restrictions are placed on unquoted keys and values in key-value pairs. I’m still not completely sure. YAML embeds JSON as a subset but they didn’t stop there. As far as I can tell from the document, instead of writing:
it appears to be equally valid to write:
and it also appears the two forms can be blended arbitrarily. Something as simple as
appears to be valid YAML yet mean something different from what our “subset” parser would understand. There would be constant pressure to give up the insistence on using a subset of YAML, and yet it becomes more difficult to write a good mechanical editor (parse+edit+reprint) the more complexity is introduced.
If we had to pick some existing format, I’d pick TOML, but even that seems wrong:
The [[ ]] are necessary here because [require] is a single table (of key-value pairs each of which stands alone) while [[replace]] is an array of tables, in which each table is one replacement, with three keys: the path being replaced and the special keys “with” and “at”. If you wanted to reserve any possible future expansion you’d have to use [[require]] too, making it:
All in all, it doesn’t seem like these file formats are actually helping advance our goal of making the file easy for people and programs to edit. We’d probably have to write a custom parser+reprinter anyway, so the only real benefit would be syntax highlighting in editors. I think that benefit is easily outweighed by the awkwardness of shoehorning our semantics into these files in the first place. If your configuration is a few basic key-value pairs, they make a lot of sense. Ours is not just key-value pairs, so those files don’t make sense.
P.S. I wondered for a long time why it was that “dep ensure -add” did not modify existing constraints in Gopkg.toml. The answer is that Dep can’t reliably modify hand-written TOML, preserving comments and the like. Dep sometimes appends to Gopkg.toml but otherwise imposes the rule that Gopkg.toml is owned by people and Gopkg.lock is owned by programs. This seems to be an artifact of the available libraries as much as it is a design choice.