Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: spec: introduce structured tags #23637

Open
urandom opened this issue Jan 31, 2018 · 70 comments
Open

proposal: Go 2: spec: introduce structured tags #23637

urandom opened this issue Jan 31, 2018 · 70 comments

Comments

@urandom
Copy link

@urandom urandom commented Jan 31, 2018

This proposal is for a new syntax for struct tags, one that is formally defined in the grammar and can be validated by the compiler.

Problem

The current struct tag format is defined in the spec as a string literal. It doesn't go into any detail of what the format of that string might look like. If the user somehow stumbles upon the reflect package, a simple space-separated, key:"value" convention is mentioned. It doesn't go into detail about what the value might be, since that format is at the discretion of the package that uses said tag. There will never be a tool that will help the user write the value of a tag, similarly to what gocode does with regular code. The format itself might be poorly documented, or hard to find, leading one to guess what can be put as a value. The reflect package itself is probably not the biggest user-facing package in the standard library as well, leading to a plethora of stackoverflow questions about how multiple tags can be specified. I myself have made the error a few times of using a comma to delimit the different tags.

Proposal

EDIT: the original proposal introduced a new type. After the initial discussion, it was decided that there is no need for a new type, as a struct type or custom types whose underlying types can be constant (string/numeric/bool/...) will do just as well.

A tag value can be either a struct, whose field types can be constant, or custom types, whose underlying types are constant. According to the go spec, that means a field/custom type can be either a string, a boolean, a rune, an integer, a floating-point, or a complex number. Example definition and usage:

package json

type Rules struct {
    Name string
    OmitEmpty bool
    Ignore bool
}

func processTags(f reflect.StructField) {
    // reflect.StructField.Tags []interface{}   
    for _ ,t := range f.Tags {
        if jt, ok := t.(Rules); ok {
              ...
              break
        }
    }
}
package sqlx

type Name string

Users can instantiate values of such types within struct definitions, surrounded by [ and ] and delimited by ,. The type cannot be omitted when the value is instantiated.

package mypackage

import json
import sqlx

type MyStruct struct {
      Value      string [json.Rules{Name: "value"}, sqlx.Name("value")]
      PrivateKey []byte [json.Rules{Ignore: true}]
}

Benefits

Tags are just types, they are clearly defined and are part of a package's types. Tools (such as gocode) may now be made for assisting in using such tags, reducing the cognitive burden on users. Package authors will not need to create "value" parsers for their supported tags. As a type, a tag is now a first-class citizen in godoc. Even if a tag lacks any kind of documentation, a user still has a fighting chance of using it, since they can now easily go to do definition of a tag and just look up its fields, or see the definition in godoc. Finally, if the user has misspelled something, the compiler will now inform them of an error, instead of it occurring either at runtime, or being silently ignored as is the case right now.

Backwards compatibility

To preserve backwards compatibility, string-based tags will not be removed, but merely deprecated. To ensure a unified behavior across libraries, their authors should ignore any string-based tags if any of their recognized structured tags have been included for a field. For example:

type Foo struct {
    Bar int `json:"bar" yaml:"bar,omitempty"` [json.OmitEmpty]
}

A hypothetical json library, upon recognizing the presence of the json.OmitEmpty tag, should not bother looking for any string-based tags. Whereas, the yaml library in this example, will still use the defined string-based tag, since no structured yaml tags it recognizes have been included by the struct author.

Side note

This proposal is strictly for replacing the current stuct tags. While the tag grammar can be extended to be applied to a lot more things that struct tags, this proposal is not suggesting that it should, and such a discussion should be done in a different proposal.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 31, 2018

Related to #20165, which was recently declined. But this version is better, because it proposes an alternative.

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 31, 2018

I don't see any special need for a new tag type. You may as well simply say that a struct field may be followed by a comma separated list of values, and that those values are available via reflection on the struct.

On the other hand, something this proposal doesn't clearly address is that those values must be entirely computable at compile time. That is not a notion that the language currently defines for anything other than constants, and it would have to be carefully spelled out to decide what is permitted and what is not. For example, can a tag, under either the original definition or this new one, have a field of interface type?

Loading

@urandom
Copy link
Author

@urandom urandom commented Jan 31, 2018

@ianlancetaylor
You raise an interesting point. A struct will pretty much have the same benefits as a new tag type would. I imagine it would probably make the implementation a bit simpler. Other types might only be useful if they are the underlying type of a custom one, and as such one would have to use them explicitly, otherwise there might be ambiguity when a constant is provided directly:

package sqlx

type ColumnName string
...

package main
import sqlx

type MyStruct struct {
    Total int64 [sqlx.ColumnName("total")]
}

vs what I would consider an invalid usage:


package main
import sqlx

type MyStruct struct {
    Total int64 ["total"]
}

For your second point, I assumed that it would be clear that any value for any field of a tag has to be a constant. Such a "restriction" makes it clear what can and cannot be a field type, and will rule out having a struct as a field type (or an interface, in your example).

Loading

@dlsniper
Copy link
Contributor

@dlsniper dlsniper commented Feb 5, 2018

I wonder if we could solve this without having to change the language, and even better, in Go 1.X rather than waiting for Go 2. As such, I've tried to understand the problem as well as the proposed solution and came up with a different approach to the problem, please see below.

First, the problem.
I think the description starts from the wrong set of assumptions:

There will never be a tool that will help the user write the value of a tag, similarly to what gocode does with regular code.

There can totally be a tool that understands how these flags work and allow users to define custom tags and have them validated.
One such tool might for example benefit from "magic" comments in the code, for example, the structure proposed could be "annotated" with a comment like // +tag.

This would of course have the advantage of not having to force the change in the toolchain, with the downside that you'd need to have the tool to validate this instead of the compiler. The values should be json, for example:

package mypackage

import "json"
import "sqlx"

type MyStruct struct {
	Value string `json:"{\"Name\":\"value\"}" sqlx:"{\"Name\":\"value\"}"`
	PrivateKey []byte `json:"{\"Ignore\":true}"`
}
package json

// +tag
type Tag struct {
	Name string
	OmitEmpty bool
	Ignore bool
}
package sqlx

// +tag
type SQLXTag struct {
	Name string
}

More details can be put into this on how this should be a single tag per package, the struct must be exportable, and so on (which the current proposal also does not address).

The format itself might be poorly documented, or hard to find, leading one to guess what can be put as a value. The reflect package itself is probably not the biggest user-facing package in the standard library as well, leading to a plethora of stackoverflow questions about how multiple tags can be specified.

This sounds like a problem one could be able to fix with a CL / PR to the documentation of the package which specifically improves it by documenting the tag available or how to use these struct tags.

I myself have made the error a few times of using a comma to delimit the different tags.

Should the above proposal with "annotating" a struct in a package work, this means that the tools could also fix the problem of navigating to the definition of tag.

Furthermore, the original proposal adds the problem that tag now is a keyword and cannot be used as in regular code. Imho, should any new keyword be added in Go 2, there should be a really good reason to do so and it should be kept in mind that it would make the porting of existing Go 1 sources that much harder given how now the code needs to be refactored before being ported over.

The downside of my proposal is that this requires people to use non-compiler tools. But given how govet is now partially integrated in go test, this check could also be added to that list.

Tools that offer completion to users can be adapted to fulfill the requirement of assisting the user in writing the tag, all without having the language changed with an extra keyword added.

And should the compiler ever want to validate these tags, the code would already be there in govet,

Loading

@creker
Copy link

@creker creker commented Feb 5, 2018

@dlsniper

Furthermore, the original proposal adds the problem that tag now is a keyword and cannot be used as in regular code

You can always make compiler a bit smarter to understand context and when a keyword is a keyword. tag keyword would appear in a very specific context which compiler could easily detect and understand. No code could ever be broken with that new keyword. Other languages do that and have no problem adding new contextual keywords without breaking backwards compatibility.

As for your proposal, for me adding another set of magic comments further establishes opinion that there's something wrong with the design of the language. Every time I look at these comments they look out of place. Like someone forgot to add some feature and, in order to not break things, shoved everything into comments. There's plenty of magic comments already. I think we should stop and implement proper Go features and not continue developing another language on top of Go.

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 5, 2018

You can always make compiler a bit smarter to understand context and when a keyword is a keyword.

As I said above, though, I see no advantage at all to using tag rather than struct.

This proposal still needs more clarity on precisely what is permitted in a tag type, whatever we call it. It's not enough to say "it has to be a constant." We need to spell out precisely what that means. Can it be a constant expression? What types are the fields permitted to have?

Loading

@urandom
Copy link
Author

@urandom urandom commented Feb 5, 2018

One such tool might for example benefit from "magic" comments in the code, for example, the structure proposed could be "annotated" with a comment like // +tag.

This seems to make the problem worse, to be honest. Instead of making tags easier to write and use, you are now introducing more magic comments. I'm sure I'm not the only one opposed to such solutions, as such comments are very confusing to users (plenty of questions on stackoverflow). Also, what happens when someone puts // +tag before multiple types?

Value string json:"{\"Name\":\"value\"}" sqlx:"{\"Name\":\"value\"}"

This not only ignores a major part of the problem, illustrated by the proposal (syntax), but also makes it harder to write.

More details can be put into this on how this should be a single tag per package, the struct must be exportable, and so on (which the current proposal also does not address).

Should we address the obvious? Honest question, I skipped some things as I thought they were too obvious to write.

This sounds like a problem one could be able to fix with a CL / PR to the documentation of the package which specifically improves it by documenting the tag available or how to use these struct tags.

It's still in the reflect package. Why would an average user ever go and read the reflect package. It's index alone is larger that the documentation of some packages.

Furthermore, the original proposal adds the problem that tag now is a keyword and cannot be used as in regular code. Imho, should any new keyword be added in Go 2, there should be a really good reason to do so and it should be kept in mind that it would make the porting of existing Go 1 sources that much harder given how now the code needs to be refactored before being ported over.

I edited my original proposal to remove the inclusion of a new type. This was discussed in the initial discussion with @ianlancetaylor, and I was hoping further discussion would include that as well.

Loading

@urandom
Copy link
Author

@urandom urandom commented Feb 5, 2018

This proposal still needs more clarity on precisely what is permitted in a tag type, whatever we call it. It's not enough to say "it has to be a constant." We need to spell out precisely what that means. Can it be a constant expression? What types are the fields permitted to have?

I've edited the proposal to add more information as to what types are permitted as tags.

Loading

@giniedp
Copy link

@giniedp giniedp commented Feb 5, 2018

i like the idea of the proposal. I would love to be able to write Metadata that is then checked at compile time, and that i do not need to parse the metadata at runtime.

The updated proposal makes sense to me. A Metadata is simply a struct.
Also the syntax is ok for the tags. It event lets me write the metadata line by line

1. initial proposal

type MyStruct struct {
    Value string [
        json.Rules{Name: "value"}, 
        sqlx.Name{"value"}
    ]
    PrivateKey []byte [
        json.Rules{Ignore: true}
    ]
}

That appears even to be readable (at least to me). Fields and annotations are clearly distinguishable, even without syntax highlighting.

Now, i know this is about tags only but if we would want to add metadata to other things than struct fields, i would suggest to put the metadata above the thing that you annotate instead of having it behind.

2 examples that arise to me, are C# Attributes and Java Annotations. Lets see how they would look like on go struct fields

2. C# like

type MyStruct struct {
    [json.Rules{Name: "value"}]
    [sqlx.Name{"value"}]
    Value string

    [json.Rules{Ignore: true}]
    PrivateKey []byte 
}

That is less readable than 1.

3. Java like

type MyStruct struct {
    @json.Rules{Name: "value"}
    @sqlx.Name{"value"}
    Value string

    @json.Rules{Ignore: true}
    PrivateKey []byte 
}

That is less readable than 1. but way cleaner than 2.

Now in go we already have a syntax for multiple imports and multiple constants. Lets try that

4. Go like

type MyStruct struct {
    meta (
        json.Rules{Name: "value"}
        sqlx.Name{"value"}
    )
    Value string

    meta (json.Rules{Ignore: true})
    PrivateKey []byte 
}

That is less compact. Removing the meta keyword wouldnt help i think. Neither when using square brackets

5. with square brackets

type MyStruct struct {
    [
        json.Rules{Name: "value"}
        sqlx.Name{"value"}
    ]
    Value string

    [json.Rules{Ignore: true}]
    PrivateKey []byte 
}

The single statement looks like 2. C# like and the multiline statement is still not as compact as i would like it to be.

So far i still like the 3. Java like style best. However, if metadata should not be applied to anything other than struct fields (ever) then i prefer the 1. initial proposal style. Now if there are some legal issues with stealing a syntax from another language (i am not a lawyer) then i could think of following

6. hash

type MyStruct struct {
    # json.Rules{Name: "value"}
    # sqlx.Name{"value"}
    Value string

    # json.Rules{Ignore: true}
    PrivateKey []byte 
}

Loading

@urandom
Copy link
Author

@urandom urandom commented Feb 5, 2018

Having the field tags before the field declaration is not very readable, compared to having them afterwards. For the same reason that it is more readable to have the type of a variable after the variable. When you start reading the struct definition, you come upon some meta information about something you haven't yet read. Currently, you know that a PrimaryKey is a byte array, and it is ignored for json marshaling. With your suggestioun, You know that something will be ignored for json marshaling, and afterwards you learn that what is ignored will be a primary key, which is a byte array.

Loading

@giniedp
Copy link

@giniedp giniedp commented Feb 5, 2018

I understand your point and i partially agree on that. Having metadata on tags only, your suggestion looks best (i might want to omit the comma in multiline though)

My intention is to suggest a syntax that might work on structs and methods. Those elements have their own body. Adding metadata behind the body might push it out of sight if the implementation is lengthy. I think metadata should live somewhere near to the name of the element that it annotates and my natural choice is above that name, since below is the body.

Syntax highlighting helps to spot the parts of the code you are interested in. So if you are interested in reading the struct definition, your eye will skip the metadata syntax.

Loading

@creker
Copy link

@creker creker commented Feb 5, 2018

I don't think tags are that important to care about them being out of sight. For the most part, I only care about actual fields and look at tags only in very specific cases. It's actually a good thing that they're out of the way because most of the time you don't need to look at them.

Your examples with @ and # prefixes look good and readable but I don't think it's that important to pursue and change existing syntax. Even C# syntax is easy to read for me being a C# programmer.

Loading

@urandom
Copy link
Author

@urandom urandom commented Feb 12, 2018

Just wanted to add another quick anecdote. I recently saw this committed by a colleague of mine, a seasoned developer:

ID int `json: "id"`

Obviously this wasn't tested at runtime, but its clear that even the best of us can overlook the syntax, especially since were are used to catching 'syntax' errors at compile time.

Loading

@kprav33n
Copy link
Contributor

@kprav33n kprav33n commented Feb 12, 2018

I like this proposal for property tags. Would it be too much to ask for a similar feature for struct-level tags?

Loading

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 12, 2018

@kprav33n I'm not sure what you mean by struct-level tags, but I'm guessing it's something that the language does not support today. It sounds like an orthogonal idea that should be discussed separately from this issue.

Loading

@kprav33n
Copy link
Contributor

@kprav33n kprav33n commented Feb 13, 2018

@ianlancetaylor Thanks for the comment. Yes, this feature doesn't exist in the language today. Will discuss as a separate issue.

Loading

@Backfighter
Copy link

@Backfighter Backfighter commented Mar 25, 2018

I really like this proposal since I encountered problems with the current implementation of tags

type MyStruct struct {
    field string `tag:"This is an extremely long tag and it's hard to view on smaller screens since it is so incredibly long, but it can't be broken using a line break because that would prevent correct parsing of the tag."`
}

Adding line breaks will prevent the tag from being parsed correctly:

package main

import (
	"fmt"
	"reflect"
)

type MyStruct struct {
	field string `tag:"This is an extremely long tag and it's hard to view on smaller 
screens since it is so incredibly long, but it can't be broken using a 
line break because that would prevent correct parsing of the tag."`
}

func main() {
	v, ok := reflect.ValueOf(MyStruct{}).Type().Field(0).Tag.Lookup("tag")
	fmt.Printf("%q, %t", v, ok)
	// Output: "", false
}

This is kind of documented at https://golang.org/pkg/reflect/#StructTag which clearly forbids line breaks between the "tag-pairs" and says that the quoted string part is in "Go string literal syntax". Which means "interpreted string literal" per specification.
In this case a compile time check could have saved me some debugging time.

Loading

@jimmyfrasche
Copy link
Member

@jimmyfrasche jimmyfrasche commented Mar 25, 2018

I'm not sure if this is worth the increase in complexity to the language, etc., etc. But the potential benefits to tooling and improvement to developer experience do seem quite nice, though. I think it is worth exploring the idea.

I'm not concerned about the particulars of the syntax, so I'll stick with the convention in the first post. (To give the [] syntax a distinct name, I'll call it a vector.)

There's discussion about extending what kinds of types can be constants—which, currently, seems to be mostly taking place in #21130. Let's assume, for the moment that, it's extended to allow a struct whose fields are all types that can be constants.

While I agree that a tag should be a defined type, I don't think that should be enforced by the compiler—that can be left to linters.

With the above, the proposal reduces to: any vector of constants is a valid tag.

This also allows an old-style tag to be easily converted to a new-style tag.

For example, say we have some struct with a field like

Field Type `json:",omitempty" something:"else" another:"thing"`

Given a tool with a complete understanding of the tags defined in the stdlib but no third party libraries, this could be automatically rewritten to

Field Type [
    json.Rules{OmityEmpty: true},
    `something:"else"`,
    `another:"thing"`,
]

Then, the third party tags could be manually rewritten or rewritten further by tools provided by the third party packages.

It would also be possible for the reflect API to work with both old- and new-style tags: Get and Lookup would search for a tag that is an untyped string with the old-style format in the vector of constants while a new API allowed introspection of the new-style tags.

I'd also note that most of the benefits of this proposal are for tooling and increased compile time checking. There's little benefit for the program at runtime, but there are some:

  1. No parser needed in already reflect-heavy code, reducing the bug/security surface while requiring fewer tests
  2. Tags can have methods, potentially allowing a better factoring of the code for handling the tags.

Loading

@jimmyfrasche
Copy link
Member

@jimmyfrasche jimmyfrasche commented Apr 20, 2018

Some points brought up in #24889 by myself, @ianlancetaylor, @balasanjay, and @bcmills

If the tags are any allowed constant they could also be named constants and not just constant literals, for example:

const ComplicatedTag = pkg.Qual{ /* a lot of fields */ }
type S struct {
  A int [ ComplicatedTag ]
  B string [ ComplicatedTag, pkg.Other("tag") ]
  C interface{} [ ComplicatedTag ]
}

which allows both reuse and documentation on ComplicatedTag

Tags, being ordinary types, can use versioning which in turn allows them to be standardized and collected in central locations.

Loading

@leafbebop
Copy link

@leafbebop leafbebop commented Jul 11, 2018

I dislike the idea of binding tags and external packages. While field tags are very widely used by those external packages, binding them is very hindering.

Having an json:... tag does not means the struct (or the package holding the struct) should have a dependency on encoding/json: since that Go currently does not have ways to modify field tags, it makes sense to be there for other packages (usually in the same program) to marshal/unmarshal. Being dependent on encoding/json does not make sense.

I think field tags have problems need to be addressed, like OP said: It needs syntax check and tools helping that. But binding them to dependency feels overdoing.

Loading

@urandom
Copy link
Author

@urandom urandom commented Jul 11, 2018

@leafbebop

Why would importing the json package pose problems? Or in fact, any package that will provide tags? The only problem with dependencies is circular imports, which would never happen here. And so far, I haven't seen that many third-party packages that use other third party packages' tag definitions, which means that more or less all current tags, like json, are pretty much placing a dependency on a package that defines them, since you are also more than likely to be importing said package somewhere else in your code in order to work with the structs that have these tags.

Loading

@leafbebop
Copy link

@leafbebop leafbebop commented Jul 11, 2018

@urandom

Say I am writing a website, and since Go is supporting Wasm, I am going fullstack. Because of that, I isolate my code of data models for re-usabilty.

Now that there is not a way to add field tags to struct, for my server code to be able to co-operate with sql, I add tag fields for sqlx. To do so, I imported sqlx, of course.

And then I decide to re-use the code of data models for my Wasm-based frontend, to trully enjoy the benefit of full-stacking. But here is the problem. I imported sqlx, and sqlx has an init function, which means the whole package cannot be eleminated by DCE - that means binary size increases for no gain at all - and for Wasm, binary size is critical. The worst part is yet to come: sqlx uses cgo, which cannot be compiled to Wasm (yet, but I do not think Go will ever come to a point that compiles C to Wasm, that's pure magic).

Sure, I can just copy my code, delete all tags and build it again. But why should I? The code suddenly becomes non-cross-platform-able (now JS is a GOOS) just because something trivial. It does not make sence.

Alternatively, I think it can remains in a keyword style - instead of package, use a string-typed keyword.

Loading

@urandom
Copy link
Author

@urandom urandom commented Jul 11, 2018

@leafbebop
I'd say that first of all, that is an incredibly specific example with not a lot of real world consequence. That being said, since you are worried about the resulting file size (again, not something a lot of us care about on a daily basis), you can also copy the struct and omit the tags. As you said so yourself, that would work just fine.

Such code would be in the extreme minority. Not only would it target wasm, but would also have to import cgo modules. Not a lot of people will have to deal with that. And why should everyone else have to suffer the current error-prone structureless mess that you have to triple check when you write, because no tool will be able to help you with that, and then pray that down the line you won't get any runtime errors because a field didn't match?

Loading

@urandom
Copy link
Author

@urandom urandom commented May 28, 2020

@devil418
This is an already solved problem, since the current string tag system already produces coupling that may need to be avoided. And we have several solutions. 2 of the top of my head are: creating a local struct with the needed tags, and creating a new type using the desired one as a base, and implementing whatever interface the encoding/decoding library is using. Of course, nothing is stopping anyone from implementing what your wrote as a library, since it doesn't require any changes in the language.

Loading

@tv42
Copy link

@tv42 tv42 commented May 29, 2020

@devil418

type-checked at compile time

A lot of your example is stringly typed. json.Allow and json.Rename refer to struct fields by name; nothing about that is type-checked at compile time.

Loading

@Merovius
Copy link

@Merovius Merovius commented Jun 12, 2020

(Apologies if I repeat what has been said - I've skimmed the thread for the points I made, but it's very possible that I overlooked something)

I agree with the general criticism that how a type is encoded into JSON shouldn't really be a property of the type, but a property of whatever does the encoding (that is, what field-names to use should be something that's somehow passed to json.Marshal, not something that is put on the struct). But I think that's likely a lost battle by now.

I also agree with the criticism that typed struct-tags require an import that shouldn't be necessary. If I annotate a struct with json-tags, that doesn't necessarily mean it ever gets encoded into json. And a program that doesn't do it, shouldn't have to compile the json package in when using the type. Importing isn't really the relevant question here, though. It is very possible for a program to import a package without it actually being compiled in - the linker can sometimes determine that a package isn't actually used and strip it out. So even if we use typed struct fields, a sufficiently clever linker might solve this problem. But as the tags are available via reflection, I think that's really hard (I assume, for example, that passing the type to fmt would then also trigger the "we need to include type-info about the tags" check).

Lastly: I also agree that it would be better, in general, if struct-tags would have more structure. One way to do this, that addresses at least some of the issues brought up by @ianlancetaylor, is to allow them to be a list of constant-expressions instead of a single string-literal. Untyped constants get their default type assigned (to deal with the question of precision and such for runtime-representation). The way this could work, is that json declares something like this

package json

type Name string

type ImaginaryTag int

type boolTag bool

const (
    OmitEmpty = boolTag(true)
    Omit = boolTag(true)
)

which could be used as

type Foo struct {
    Foo int json.Name("bar"),json.OmitEmpty,bson.Name("bar")
    Bar float64 json.Name("baz"),bson.Name("baz")
    Baz string json.ImaginaryTag(42)
    EmbeddedThing json.Omit
}

(there is a parsing-ambiguity with embedded fields, if the constant-expression is an identifier. This could either be resolved in the type-checking phase, or there might be some color of bikeshed that doesn't have this problem)

The json-package could then, via reflection, see the type of the struct-tag and switch behavior based on it. Tags would still be compile-time computable (as they are constants) and type-checked. You can't have composite types as struct-tags, but I believe use-cases that require that are very rare - if they exist at all. Complicated use-cases might even still use some custom grammar inside a stringly-typed tag. But I believe for 90%+ of use-cases the types which can already be constants in Go would suffice.

Loading

@urandom
Copy link
Author

@urandom urandom commented Jun 12, 2020

@Merovius
Is there a technical reason why the tags can't be composite, if they are only made up of constant values? Otherwise, as the proposal says, regular constants are also fine and would be an improvement over the current string-based tags

Loading

@Merovius
Copy link

@Merovius Merovius commented Jun 12, 2020

@urandom The technical reason is "constants can't be composite values". We'd need to introduce either some general way to have composite constants (there are proposals about that elsewhere) or a separate notion of a pseudo-constant literal only used in struct tags to the spec, if we want that. The complexity of doing that hardly seems worth the benefit. Constants are already well-defined for strings, bools and numeric types, so we'd really only need to touch the field-tag part of the spec to do what I suggested :)

Loading

@leafbebop
Copy link

@leafbebop leafbebop commented Jun 20, 2020

I am still not convinced on having to have an extra dependency just because one of my dependency have tags on one of its field. I think during the development of go mod, many great articles are written on the topic why dependency can be problematic.

@Merovius Importing is the issue. I am not confident that yaml or sql world would have a common package for tags, and that means when designing the data model, the library used for the field tag must be determined, or when writing code of implementing details, the model code must be changed accordingly. And since string tags can be read if the keywords are the same, I very much doubt structured (or typed) tags in different package can understand each other, so it is extra bad if there is two different packages (different type of sql usually) using the model. That is a no to me.

On the other hand, I would say a linter, possibly supported by go vet, allowing some kind of package-defined protocol of tag fields is more interesting (and probably with help form gopls, which I still haven't got time to look into to understand more to comment on). But as the discussion shows, it is probably just me.

Another thought just occurred to me before hitting the comment button: If we are asking packages to create a new package for tags, how about we ask them to provide a test function that can be plugged into unit tests, which simply accepts interface{} and validate if the tag field is type safe, well-formed and good?

Loading

@creker
Copy link

@creker creker commented Jun 20, 2020

How about actually not allowing packages with tag definitions to contain anything else? That not only encourages but forces people to define tags in very small separate packages that can be imported everywhere. Compiler doesn't even need to compile those. It only needs to parse them and use the information for syntax checks.

Loading

@leafbebop
Copy link

@leafbebop leafbebop commented Jun 20, 2020

Loading

@creker
Copy link

@creker creker commented Jun 20, 2020

@leafbebop why do they need to write common package? SQL, yaml, json, they all need completely different tags. They should be in separate packages. And if a package is not allowed to contain anything else apart from tag definitions you don't pay the usual price for the dependency. This is eliminating the main problem - transient dependencies. You can treat this tag package as a tag protocol like you described earlier.

if the tag is so complicated to write by hand

Even with simple tags like json you can make mistakes. Gopls recently started giving warnings about json tags and caught a few typos in my code. With libraries like gorm things get even worse.

Loading

@leafbebop
Copy link

@leafbebop leafbebop commented Jun 21, 2020

Loading

@urandom
Copy link
Author

@urandom urandom commented Jun 22, 2020

@leafbebop
This is a bit anecdotal, but I've noticed that we never add tags to a struct, unless the primary user plans to use the package that requires these tags in the first place. In terms of dependencies, if the tags were structured, no extra dependencies would be added because of them, since the dependent package is already used (i.e. the struct is already decoded via yaml, or a db, in the primary user / same package). For us, we would also not be adding extra dependencies due to tags due to secondary users (users of the struct in a completely different package/module), since we don't like adding tags that aren't used by the primary user itself. That is to say, if the primary user of the struct isn't using a package that defines some tags, those tags aren't being added in the first place. We always copy the structs and add different tags, since we don't want to couple structs to such packages unnecessarily, even when the tag is just a string.

So, at least for our shop, we would never see the problem you are describing.

On the subject of linters, such a solution would not be able to provide completion of non-structured tags. One would have to hard-code the tags into gopls, which is unfeasible for non-stdlib tags.

Loading

@leafbebop
Copy link

@leafbebop leafbebop commented Jun 22, 2020

@urandom
I don't know about your code base, as well as many other's so I really can't comment on that. In my experience, however, it is pretty common to see data, and basic domain-specific data behaviour separated from decoding/encoding/storing logic. The data defined often describe a few requirements in tag fields, but not specific to any implementation of them.

I am not certain how much of the user base me or your shop can count for, but that's some concern.

As for linter, I can't see how that is impossible (maybe I should read into gopls soon). I am not saying that this is feasible for now, but saying that if a package defined typed tags they supported, with magic comments or other proposed or conventionalized protocol, a linter can read about them, and provide completion of non-structured tags according to that info.

On the other hand, I am curious to see a use case that type-check [1] would not suffice, and does not deserve its own logic, but auto-completion and other gopls features would help largely.

[1] which can easily be done by unit tests, with the package provides a validate facility, probably helped by some "x/tools/" utility - again this requires change of the package, but so does the proposal.

Loading

@leafbebop
Copy link

@leafbebop leafbebop commented Jun 22, 2020

And another note: while this proposed spec change can be done in a backward compatible way, I would see many tools that built on purely unstructured tags (of other packages) would be broken, for people will rarely keep both unstructured tags and structured tags (and the current proposal does not support keeping them both).

I am not saying there a significant amount of tools reading on unstructured tags out there (or not, I am not sure about that), but that's another concern, if we are doing this.

Loading

@urandom
Copy link
Author

@urandom urandom commented Jun 22, 2020

@leafbebop
The proposal only wants to deprecate the string-based tags, not remove them, as that would be backwards incompatible.

Loading

@leafbebop
Copy link

@leafbebop leafbebop commented Jun 22, 2020

Loading

@Merovius
Copy link

@Merovius Merovius commented Jun 22, 2020

FWIW, my rough design would make it relatively simple to keep both (at least in an intermediary period). StructField.Tag could refer to the first string-typed tag, formatted in the old way. A new field, Tags []Value could be added, that lists all tags. New packages defining tags would only use the latter and select those of the appropriate type (a helper method func (StructField) TypedTag(Type) Value could be added for convenience). Packages like json, that want to stay backwards-compatible, would look in both to decide what to do.

That would enable a transition from

type Foo struct {
    Foo           int     `json:"foo,omitempty" bson:"foo"`
    Bar           float64 `json:"bar"`
    EmbeddedThing         `json:"-"`
}

to

type Foo struct {
    Foo           int     json.Name("foo"), json.OmitEmpty, `bson:"foo"`
    Bar           float64 json.Name("bar")
    EmbeddedThing         json.Omit
}

This would still work as expected with the updated json package and the (presumed pre-structured) bson package for both versions. Of course, the second version doesn't work with a pre-structured json-package - the assumption is, that an appropriate go version would be specified in go.mod, when making this change.

(Not commenting on the import-issue, because as I said, I tend to agree it's a problem to be solved and IMO that's everything I can say about it for now)

Loading

@leafbebop
Copy link

@leafbebop leafbebop commented Jun 22, 2020

@Merovius

By meaning keeping both, I am not referring json.Name and bson:"foo", but rather json.Name("foo") and json:"foo". And yes, it seems like in your pseudo proposal it can be kept but it is awkward (as it should be), and I think most people would not really do it. Even if, bug of inconsistency between the tag field redundancy is easy to make and hard to catch.

This will cause code that reads on string field of json:"foo", but not part of the json library (at least the imported json library) would break.

This might be very rare and impractical case, but I think unofficial implementation of json package would suffer that. Not it is hard to fix, or I'm sure that is going to have an impact, but another concern.

Loading

@bminer
Copy link

@bminer bminer commented Jun 22, 2020

Since tags are essentially a string and structured tags are essentially []interface{}, I feel like it would be relatively easy for a package to support both, especially if string-based tags are already supported and actively being parsed by the package. Deprecating the unstructured tags seems like the right approach since structured tags provide many more benefits, and this is probably the road most packages will follow.

One unanswered question is... what should happen if a struct field tries to use structured and unstructured tags at the same time? Surely this will happen as libraries slowly migrate to structured tags. Should conflicting options in the structured tag override the string-based tags? Should the library make this decision? Should there be some sort of convention? If so, what?

Loading

@urandom
Copy link
Author

@urandom urandom commented Jun 22, 2020

@bminer
Since string tags would be deprecated, the ideal solution would be for libraries to consider the structured tags first, falling back to string tags when the former aren't available. Of course, this isn't enforceable, so its really up to the library itself.

Loading

@bminer
Copy link

@bminer bminer commented Jun 22, 2020

@urandom - I agree, and it's probably not enforceable. Just pointing out the corner case that might warrant a new guideline / convention -- perhaps even something that is checked by a linter. Should a library completely ignore the string / unstructured tags when both structured and unstructured tags are present?

Example:

type Person struct {
  FirstName string `json:"firstName" bson:"first"` [json.Name("fName")]
}

EDIT: Fixed invalid string tag. :)

  • Does the JSON encoder encode the FirstName field as "firstName" or "fName" above?
  • What does the BSON encoder do since there are structured tags present?
  • Suppose a library only supports parsing unstructured / string tags. Can structured tags be "stringified" to address this? Can this be done at compile time?

I think it would help to address these questions in the proposal -- even if it's just a convention that isn't enforceable. Thoughts?

Loading

@Merovius
Copy link

@Merovius Merovius commented Jun 22, 2020

I think building this idea of stringification in would mostly defeat the point of the proposal. If you stringify the typed tags, that mapping has to be injective, so you don't get collisions. But if you have an injective mapping, why not just use that mapping for your string-tags in the first place? Or to put another way: To be well-defined, the stringification would likely need to include the import path of the type defining the tag - and if the package using it knows that it needs to specify the import path, why not move it over to the structured tagging API while you're at it?

Do any of y'all actually know a case where you'd need to support both simultaneously? ISTM that usually whoever adds the struct-tags will have a certain amount of control (and/or expectations) over what packages that type is used with - and at the end of the day, the syntax is still bound to the package defining the tags, so if a third-party library wants to drop-in replace that package (say, a third-party json encoder) it seems reasonable to expect it to conform to that new API in a timely manner (after all, support for structured tags in json could be known at least 3 months in advance to any third-party replacement, who could then start adding that support themselves guarded by build tags). There'll probably still be some percentage of cases where it makes sense to use both - but IMO it's fine to just expect those cases to put both kinds of tags into their structs and keep them in sync.

At the end of the day, I don't really think the goal here should be to prescribe how packages actually end up performing that move in all eventualities. It should really just be about making a reasonable path possible.

Loading

@urandom
Copy link
Author

@urandom urandom commented Jun 22, 2020

@bminer
only the fName structure tag would be used, regardless, since your stringified tag is invalid anyway :D (hence the proposal itself)
but if it were valid, the theoretical json library would ideally discard any stringified tags when structured tags exist. E.g.: if reflect.StructField.Tags contains a value that matches any of its types, it would not bother looking at the old reflect.StructField.Tag

Loading

@bminer
Copy link

@bminer bminer commented Jun 23, 2020

@urandom - Ha! Nice catch. I didn't realize the tag was invalid. :) Anyway, I think that I agree with your convention: ignore string-based tags entirely if structured tags exist. This can lead to strange situations, though...

Suppose the BSON library does not support structured tags yet. So we write:

type Person struct {
  FirstName string `bson:"firstName"` [json.Name("firstName")]
}

Then, suddenly the author adds support for structured tags and pushes a minor version release. The code above might silently break because the string-based tag will be ignored (by convention) by the BSON library.

Anyway, maybe this is worth discussing further? Sadly, since Go does not support structured tags and unstructured tags won't be going anywhere, I think it's really important to understand how we migrate between the two.

@Merovius - you are right that stringification can be weird, but current libraries have their own opinions about the namespace already. For example, for the sake of brevity, one writes json:"fieldName", not encoding/json:"fieldName". So, certainly namespace collisions are possible with unstructured tags. Again, I don't want to dive into the weeds too much either here, but it's worth considering exactly how the Go community would move forward (end users and library creators alike) if structured tags became a thing.

Loading

@bminer
Copy link

@bminer bminer commented Jun 23, 2020

As a follow-up to my last comment, perhaps the convention should be:

If a pertinent structured tag is used, all unstructured tags should be ignored.

In the example above, there are no BSON-specific structured tags, so unstructured tags would get processed. If there were 1 or more BSON-specific structured tags, the BSON encoder library would ignore all unstructured tags. I also like this approach because linters can warn users when both tag types are being used (at least for commonly-used libraries like encoding/json).

Still, using the word "pertinent" makes for rather loose language, and as a computer scientist, I don't like it much.

Loading

@urandom
Copy link
Author

@urandom urandom commented Jun 23, 2020

@bminer, with your bson example, your code would have to contain a single bson.Tag() for the bson library to start ignoring string tags. It's not enough that it contains any tags at all. So you, as a writer, will have to make the change yourself as well.

Loading

@Merovius
Copy link

@Merovius Merovius commented Jun 23, 2020

@bminer my point was exactly that you are diving too far into the weeds. What to do if there are both is exactly relevant, if a) there have to be both and b) if they have differences. IMO a) won't be the case in >99% of use-cases and thus for b) it's fine to rely on "don't do that then".

IMO you are complicating the design enormously for very little benefit - trying to address something that I believe to be entirely uncommon. And not just that, IMO the case you worry about even gets worse. If there are some rules about the compiler sometimes rewriting tags or them sometimes being ignored, then if someone sees that their encoding-process (or whatever) doesn't work as expected, they need to worry about and try to understand those rules too. If there are simply two APIs, one for typed and one for string tags, then thinking through how to move from one to the other is a pretty simple matrix - does the declaration contain typed/string tags yes/no and does the consumer site use the typed tag API yes/no. If a package wants to move to typed packages, it just has to look at this matrix and decide which cases are relevant to its particular use-case and decide if it's happy with what will happen. If, however, there is some magical system in place to translate from one system to the other, that will have to be understood and taken into account, which will make the decision much harder, what to do. And if some user wonders why their encoding (or whatever) is broken, they also have to understand this system - and users already have problems understanding the current string-tag syntax on its own.

IMO, if you just treat them as separate systems, it is pretty clear what happens when you add typed tags to a struct - code will simply continue to work as before, until it explicitly adopts the typed tag API. And it's pretty clear what happens when you remove string tags from a struct - code that is still relying on the string tag API will break. That's clear, it's simple and it gives a good migration path.

Note, that without any magic there is a pretty clear and obvious strategy for tag-consumers to adopt typed tags that makes the problem you identified vanish: Look for typed tags you are interested in, if you can't find any, use the string-tag. When bson in your example above adopts typed tags, it doesn't matter that there are some typed tags and a string-tag - it will ignore the json tags and not find one of the type it's interested in and thus use the string-tag. Even the case mentioned where you might have independent tag-consumers works fine under this strategy - if there is a struct-declaration that cares about interoperability with multiple such packages, it can just leave both the typed and the string-tag there. Any tag-consumer that knows about typed tags will ignore the string-tag altogether and a tag-consumer that doesn't will ignore the typed ones (obviously).

Loading

@bminer
Copy link

@bminer bminer commented Jun 23, 2020

I think we have all reached the same conclusion: a library should ignore all string tags if a relevant structured tag exists.

@urandom Perhaps this should be included in the original proposal for clarity?

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet