Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmp: add Or #60204

Closed
earthboundkid opened this issue May 15, 2023 · 80 comments
Closed

cmp: add Or #60204

earthboundkid opened this issue May 15, 2023 · 80 comments

Comments

@earthboundkid
Copy link
Contributor

An extremely common string operation is testing if a string is blank and if so replacing it with a default value. I propose adding First(...strings) string to package strings (and probably an equivalent to bytes for parity, although it is less useful).

// First returns the first non-blank string from its arguments.
func First(ss ...string) string {
	for _, s := range ss {
		if s != "" {
			return s
		}
	}
	return ""
}

Here are three example simplifications from just archive/tar because it shows up first alphabetically when I searched the standard library:

archive/tar diff
diff --git a/src/archive/tar/reader.go b/src/archive/tar/reader.go
index cfa50446ed..bc3489227f 100644
--- a/src/archive/tar/reader.go
+++ b/src/archive/tar/reader.go
@@ -136,12 +136,8 @@ func (tr *Reader) next() (*Header, error) {
 			if err := mergePAX(hdr, paxHdrs); err != nil {
 				return nil, err
 			}
-			if gnuLongName != "" {
-				hdr.Name = gnuLongName
-			}
-			if gnuLongLink != "" {
-				hdr.Linkname = gnuLongLink
-			}
+			hdr.Name = strings.First(gnuLongName, hdr.Name)
+			hdr.Linkname = strings.First(gnuLongLink, hdr.Linkname)
 			if hdr.Typeflag == TypeRegA {
 				if strings.HasSuffix(hdr.Name, "/") {
 					hdr.Typeflag = TypeDir // Legacy archives use trailing slash for directories
@@ -235,13 +231,8 @@ func (tr *Reader) readGNUSparsePAXHeaders(hdr *Header) (sparseDatas, error) {
 	hdr.Format.mayOnlyBe(FormatPAX)
 
 	// Update hdr from GNU sparse PAX headers.
-	if name := hdr.PAXRecords[paxGNUSparseName]; name != "" {
-		hdr.Name = name
-	}
-	size := hdr.PAXRecords[paxGNUSparseSize]
-	if size == "" {
-		size = hdr.PAXRecords[paxGNUSparseRealSize]
-	}
+	hdr.Name = strings.First(hdr.PAXRecords[paxGNUSparseName], hdr.Name)
+	size := strings.First(hdr.PAXRecords[paxGNUSparseSize], hdr.PAXRecords[paxGNUSparseRealSize])
 	if size != "" {
 		n, err := strconv.ParseInt(size, 10, 64)
 		if err != nil {
diff --git a/src/archive/tar/writer.go b/src/archive/tar/writer.go
index 1c95f0738a..e9c635a02e 100644
--- a/src/archive/tar/writer.go
+++ b/src/archive/tar/writer.go
@@ -188,10 +188,7 @@ func (tw *Writer) writePAXHeader(hdr *Header, paxHdrs map[string]string) error {
 		var name string
 		var flag byte
 		if isGlobal {
-			name = realName
-			if name == "" {
-				name = "GlobalHead.0.0"
-			}
+			name = strings.First(realName, "GlobalHead.0.0")
 			flag = TypeXGlobalHeader
 		} else {
 			dir, file := path.Split(realName)
@gopherbot gopherbot added this to the Proposal milestone May 15, 2023
@seankhliao
Copy link
Member

Duplicate of #14423

@seankhliao seankhliao marked this as a duplicate of #14423 May 15, 2023
@seankhliao seankhliao closed this as not planned Won't fix, can't repro, duplicate, stale May 15, 2023
@earthboundkid
Copy link
Contributor Author

@seankhliao, good find, but that issue was frozen before the modern proposal process existed. Either that issue should be reopened and put through the proposal process or this one should be unclosed, but I don't think it's fair to call it a duplicate when the old one was never actually evaluated.

@seankhliao
Copy link
Member

The idea was clearly evaluated in the previous issue and declined.
The decisions we made before the proposal process are just as valid.

@earthboundkid
Copy link
Contributor Author

I don't think it's fair to call 5 comments "clearly evaluated." The reception was mixed. Abbgrade was for it. Minux was against it. Bradfitz was neutral to positive on the idea if there was more data.

It ends with @griesemer saying,

This is not an issue, this is a feature request. Please discuss this first on one of the popular Go forums (mailing list, etc.).

I don't think he would have said "go discuss it somewhere else" if that discussion was precluded from having an effect because the issue was permanently closed once and for all. I think the idea was "go discuss it more and if it comes up again we can take another look." Now we have a formal process, so it's time to take a look. :-)

@seankhliao
Copy link
Member

It was quite clear it doesn't belong in strings, and the natural place for it be in now, slices, also has the similar idea being declined in #52006

@earthboundkid
Copy link
Contributor Author

It doesn't work in slices because it would need a T comparable, which is confusing, or to be a find func, which as you note was already declined. Just because it could be a generic doesn't mean it should be. :-) I've been using my personal stringutils.First for years and for me it's above the bar to get it into the standard library. Maybe I'm wrong, but I think it's worth having a discussion.

@ianlancetaylor
Copy link
Contributor

I agree that the earlier issue didn't get a full proposal review. We can do it again.

That said, finding some more examples would help justify adding this.

And, in general the strings and bytes package are parallel. What would this look like in bytes, and would anybody use that variant?

@earthboundkid
Copy link
Contributor Author

earthboundkid commented May 16, 2023

That said, finding some more examples would help justify adding this.

I've been using a version of First for at least three years that I can recall, and I'm up to 19 uses in a 13,000 line project. It's pretty routinely useful for me. (It might go back further, and I've just forgotten the history of it.)

Going back to the examples from archive/tar above, I think there's a readability gain in hdr.Name = strings.First(gnuLongName, hdr.Name) and name = strings.First(realName, "GlobalHead.0.0"), because you can tell quickly tell what the preferred value is and what's the fallback default, whereas in the old code the first example was set with if gnuLongName != "" and the second was set with if realName == "".

What would this look like in bytes, and would anybody use that variant?

I suppose it should be First(...[]byte) []byte, but I agree that it is unlikely to be used much, since the main use is to set a default for a string value.

@earthboundkid
Copy link
Contributor Author

earthboundkid commented May 16, 2023

I’m doing some very basic searching on SourceGraph to find versions of this in the wild.

  • A version exists in kubernetes and kubectl although they don't seem to be used very much.
  • Istio has several implementations of it. There is public version used 15 times. Two are identical unexported versions of them in two main packages 1 2. Also there’s a variation hardcoded to return "profile" for blank.
  • Grafana also has more than one implementation of it.
  • SeaweedFS has 19 uses of their version of it.
  • /x/net/http has getAnyEnv, which is similar but works specifically for ENV vars.
  • Moby (Docker) also has a version of getEnvAny
  • Rancher has seven uses of theirs, and also a getEnvAny
  • Go-Resty uses theirs twice
  • A Chinese repo for a config management db has 16 uses of their
  • Goreleaser uses theirs three times
  • Hugo has a default template helper that is an any interface version of this
  • Go-Fiber has a version of this with a more awkward API.
  • Chezmoi uses theirs four times
  • Revel has one in their public API but doesn’t use it
  • Cert Manager a variation where it is hardcoded to turn "" into <none>
  • Pomerium only uses theirs once
  • Terratest has an env var version
  • StackExchange has a dns controller that is hard coded to return "FAIL" for blank
  • A load tester called death by 1000 needles has one
  • Vagrant has an envDefault, but doesn’t seem to use it.

Okay, that’s as much looking at search results as I feel like doing now. If anyone can do a more semantic search over a larger corpus, I would be interested to see the results. One thing that surprised me was how often a repo would have multiple versions of it. Also the env var default thing comes up a lot.

Edit: Couldn't help myself, and I found another one in Istio 😆 Gotta force myself to close the tab before I go crazy.

@ianlancetaylor
Copy link
Contributor

That's great data, thanks.

@jimmyfrasche
Copy link
Member

This can be written for any comparable type using generics today

func First[T comparable](vs ...T) T {
  var zero T
  for _, v := range vs {
    if v != zero {
      return v
    }
  }
  return zero
}

The type constraint could be loosened to any if #26842 gets accepted.

While often for strings, I've written similar for all kinds of types, though I don't think I've ever needed anything other than 2 values at a time.

It's quite common in dealing with configuration where the zero value models an absence to be replaced by a default.

@earthboundkid
Copy link
Contributor Author

I’ve had a toy repo with generic First for several years, but I’ve found that in practice I only ever use strings.

As for varadic vs a pair, most instances are just pairs, but I think the Go optimizer now optimizes the slice away, so you may as well have a variadic version for the occasional times when you need more than two.

@AndrewHarrisSPU
Copy link

It's quite common in dealing with configuration where the zero value models an absence to be replaced by a default.

Neither comparable nor any tightly constrain to types where var zero T is a robust sentinel value for inferring absence. I think this is a problem for a generic First, it's not foolproof enough.

@rsc
Copy link
Contributor

rsc commented May 17, 2023

This is clearly a useful operation, perhaps even useful enough to have in the standard library.

But is First the right name? Is it the name used anywhere else with this meaning?
#53510 proposed slices.First(x) that returns x[0].
If I saw strings.First(x, y, z) I'd probably expect that it returned x
(and wonder what the point was).

In text/template (and also in Lisp and Scheme, where I took it from), the name for this operation is or.

@jimmyfrasche
Copy link
Member

It's also similar to the min/max builtins proposed in #59488 except that the item is selected in a less mathematical and more Go-specific way

@cespare
Copy link
Contributor

cespare commented May 17, 2023

Even if it's mostly used for strings, it really feels not string-related to me and not a good fit for the strings package.

However, I think I would use it quite a bit for not-strings if it existed. There's a certain kind of operation I write regularly in Go which would be written using a ternary in another language. Something like (this is grabbed from some real code):

port := h.GRPCPort
if port == 0 {
	port = 8500
}

With a ternary expression, you might write something like

port := h.GRPCPort == 0 ? 8500 : h.GRPCPort

With a slices-based function, you could do

port := slices.Or(h.GRPCPort, 8500)

I think or works well in Lisp and text/template but slices.Or seems a bit mysterious. But maybe it could work.

A longer, but more self-evident name is slices.FirstNonZero.

@rsc
Copy link
Contributor

rsc commented May 17, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@earthboundkid
Copy link
Contributor Author

However, I think I would use it quite a bit for not-strings if it existed. There's a certain kind of operation I write regularly in Go which would be written using a ternary in another language. Something like (this is grabbed from some real code):

port := h.GRPCPort
if port == 0 {
	port = 8500
}

That's #37165, which also uses a default port as an example. :-)

@earthboundkid
Copy link
Contributor Author

earthboundkid commented May 17, 2023

slices.FirstNonZero would presumably take an actual slice instead of a variadic argument, which is less ergonomic.

I'm fine with the name strings.FirstNonZero though.

@earthboundkid
Copy link
Contributor Author

earthboundkid commented May 17, 2023

I don't think this is necessarily a great idea, but just to consider it, you could have package bools with Or[T comparable](...T) T and Cond[T any](cond bool, ifVal, elseVal T) T. I think that having Cond probably changes the feel of the language too much though because you'd be using a function call in a lot of places that use x = a; if cond { x = b} now.

@AndrewHarrisSPU
Copy link

Even if it's mostly used for strings, it really feels not string-related to me and not a good fit for the strings package.

I guess the follow-up is, what package would it fit in? Looking through #60204 (comment) I think it is pretty surprising that this is so frequently about environment variables (or at least configuration variables with similar usage patterns).

I wonder if there might be a more Glasgow (not New Jersey)-style package for aggregating configuration from program constants, env variables, json/yaml/toml etc., and I think this functionality would probably be natural in that style.

OTOH, applying the New Jersey philosophy, maybe it's not objectionable that people are re-implementing this functionality ad-hoc - it doesn't seem error-prone, and people can be as narrow or as abstract as they want.

@jimmyfrasche
Copy link
Member

One place where comparable is insufficient for this use case is callbacks.

I've written a lot of code like this:

func New(cfg *Cfg) *Thing {
  t := &Thing{
    foo: cfg.Foo
  }
  if t.foo == nil {
    t.foo == fooDefault
  }
  return t
}

If there were an or that handled zero-comparable types (either a builtin or language change that allows it to be written with generics) that would just be:

func New(cfg *Cfg) *Thing {
  return &Thing{
    foo: or(cfg.Foo, fooDefault),
  }
}

@leaxoy
Copy link

leaxoy commented May 18, 2023

A more general scenario. Use Nth-like instead First:

func NthOrZero[T any](elems []T, n int) T {
    if n <= 0 || n >= len(elems) {
        var empty T
        return empty
    }
    return elems[n]
}

and for ok check:

func Nth[T any](elems []T, n int) (T, bool) {
    if n <= 0 || n >= len(elems) {
        var empty T
        return empty, false
    }
    return elems[n], true
}

or if we has Maybe or Optional type:

func Nth[T any](elems []T, n int) Optional[T] {
    if n <= 0 || n >= len(elems) {
        return None()
    }
    return Some(elems[n])
}

@mibk
Copy link
Contributor

mibk commented May 18, 2023

But is First the right name? Is it the name used anywhere else with this meaning? […]

In text/template (and also in Lisp and Scheme, where I took it from), the name for this operation is or.

I'm reminded of the SQL function COALESCE.

Returns the first non-NULL value in the list, or NULL if there are no non-NULL values. At least one parameter must be passed.

@mpx
Copy link
Contributor

mpx commented May 21, 2023

I've found First is quite a common operation beyond strings. In particular, selecting the first error between operation(s) and cleanup (if any). I have used n-ary versions for different types (errors, strings), but 2 is most common. Similar issue for default fallback for ints and other types.

Writing this now, I'd use a generic version similar to @jimmyfrasche 's example above.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/504883 mentions this issue: cmp: add Or

@AndrewHarrisSPU
Copy link

AndrewHarrisSPU commented Jun 22, 2023

@tdakkota, that's very interesting.

A constraint based on underlying types ~string, ~int, etc. for cmp.Or also seems like a tractable implementation.

@ianlancetaylor
Copy link
Contributor

@tdakkota Thanks. I filed #60933.

@hherman1
Copy link

Reading through the comments in #60933, I’m concerned that despite finding a way to prevent implicitly converting to a non zero interface we still won’t have a way to make something like cmp.Or(os.Stdout, io.Discard) work correctly. Even with explicit types, it seems busted.

It makes me wonder if we really want cmp.Or to be generic or if it might be safer to go with something concretely typed.

@earthboundkid
Copy link
Contributor Author

earthboundkid commented Jun 23, 2023

os.Stdout is always going to be non-nil, so it’s hard to judge based on that as an example. For code like this, it should be okay:

type appEnv struct {
  out io.Writer
}

…

if verboseFlag {
  app.out = os.Stdout
}

…

_, err := io.Copy(cmp.Or(app.out, io.Discard), logSrc)

@hherman1
Copy link

Ok, but I don’t think that addresses the issue. If you use a field with an actually nillable *os.File, cmp.Or fails again.

@ianlancetaylor
Copy link
Contributor

@hherman1 There is a long-standing tripping point in Go regarding storing a typed nil in an interface value. There is a FAQ about it: https://go.dev/doc/faq#nil_error. That does mean that instantiating cmp.Or with an interface type can have subtle effects. However, I don't think we should let that fact about the language stand in the way of adopting cmp.Or.

@rsc
Copy link
Contributor

rsc commented Jun 28, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: cmp: add Or cmp: add Or Jun 28, 2023
@rsc rsc modified the milestones: Proposal, Backlog Jun 28, 2023
@mibk
Copy link
Contributor

mibk commented Jul 19, 2023

Recently, I encountered a piece of code where I thought a logical XOR operation could be helpful.

if found && wanted || !found && !wanted {
	// Nothing to do.
	return
}

With the introduction of the package cmp, I imagined that there might be an Xor function:

if cmp.Xor(found, !wanted) {
	// Nothing to do.
	return
}

However, I recalled that cmp.Or will soon be added and realized that the purpose of package cmp doesn't align with this function:

Package cmp provides types and functions related to comparing ordered values.

Since cmp.Xor (which I'm not proposing) would seem to me like a perfect fit for a package named cmp,
I began to question whether "cmp" is then a good name for a package "related to comparing ordered values", and/or whether Or is a good name for a function "returning the first non-zero value of its arguments".

I just wanted to bring it up for discussion.

@earthboundkid
Copy link
Contributor Author

I actually had truthy.Xor in an old version of my truthiness testing package, but I removed it because I wasn’t actually using it.

@jub0bs
Copy link

jub0bs commented Aug 8, 2023

I salute Carl's effort to collect usage data. However, I remain unconvinced that such a functionality warrants an addition to the standard library. Projects that need such an Or/First function can easily (and already do) re-implement it when the need arises.

The semantics are different, but I would keep this function out of the standard library for the same reason I wouldn't want to see the addition of a First (or Last) function to the slices package.

A little copying is better than a little dependency.

@meyermarcel
Copy link
Contributor

os.Stdout is always going to be non-nil, so it’s hard to judge based on that as an example. For code like this, it should be okay:

type appEnv struct {
  out io.Writer
}

…

if verboseFlag { // here is logic (1)
  app.out = os.Stdout
}

…
//                here is also logic inlined (2)
_, err := io.Copy(cmp.Or(app.out, io.Discard), logSrc)

Sorry for commenting very late this issue. I did not react to this proposal because emoji-meter (👍👎) in the description suggested no acceptance.

I understand the use cases found in the collected data. Thanks for finding and summarize them 🙏🏼

My concern is readability and I do not know if cmp.Or changes how are if statements are written.

Can this example also be written like this?

type appEnv struct {
  out io.Writer
}

…

if verboseFlag {
  app.out = os.Stdout // logic is only here (1)
} else {
  app.out = os.Discard // and here (2) and written only with existing if and else
}
…

_, err := io.Copy(app.out, logSrc)

If the previous example can also be written this way, then there is a good example of why this example could fragment the writing of logic and ultimately make it difficult to read because of inconsistency. There are now multiple ways to write this code.

However, if this is very rarely the case, then please ignore my comment.

In the future, should you read cmp.Or in many places where you expect an if or else, that would be a backdoor ternary operator and lead to inconsistent code.

The implications for consistent writeability are unclear to me, and I hope this is considered in this proposal.

@earthboundkid
Copy link
Contributor Author

The app.out example seems to me more like it should be written with hypothetical func Cond(bool, ifVal, elseVal T) T:

type appEnv struct {
  out io.Writer
}

…

app.out = Cond(verboseFlag, os.Stdout, io.Discard)

…

_, err := io.Copy(app.out, logSrc)

Cond would be basically the ?: ternary but as a function and without short circuiting. I agree that having Cond would lead to changes in the Go ecosystem that would affect the experience of reading it. I don't think it's a good fit for Go as it exists. The most clear straightforward to write the code ought to be something like:

type appEnv struct {
  out io.Writer
}

…

app.out = io.Discard
if verboseFlag {
    app.out = os.Stdout
}

…

_, err := io.Copy(app.out, logSrc)

Yes, you could use cmp.Or here, but that seems like more work and less clarity than just using an assignment, so I don't think it will come up too much in practice. We'll see!

@meyermarcel
Copy link
Contributor

Yes, you could use cmp.Or here, but that seems like more work and less clarity than just using an assignment, so I don't think it will come up too much in practice. We'll see!

New code constructions will be introduced slowly and their use will be determined by the developer who does not have so much clarity. If cmp.Or is used incorrectly, we will notice it late and there will be no or only a rocky way back via vet tools. We might be left with worse readability and more inconsistent code. "We'll see!" is too risky for me in the context of long-term changed readability.

But on the other hand I have no concrete examples and if it remains only with the use as with the code examples found thanks to carlmjohnson then cmp.Or is an improvement.

eric pushed a commit to fancybits/go that referenced this issue Dec 23, 2023
Fixes golang#60204

Change-Id: I1234cacf0f25097d034038bcfb33f6630373a057
GitHub-Last-Rev: e9098ed
GitHub-Pull-Request: golang#60931
Reviewed-on: https://go-review.googlesource.com/c/go/+/504883
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

Successfully merging a pull request may close this issue.