bigquery: inconsistency in handling of repeated nested values and repeated values in valueMap #4950

Closed
jameshartig opened this issue Oct 5, 2021 · 5 comments · Fixed by #7315
Labels
api: bigquery (Issues related to the BigQuery API.)
priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: question (Request for information or clarification. Not an issue.)

Comments

jameshartig commented Oct 5, 2021

Client

bigquery

Environment

N/A

Go Environment

$ go version
go version go1.17 windows/amd64
$ go env

set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\james\AppData\Local\go-build
set GOENV=C:\Users\james\AppData\Roaming\go\env
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=C:\Users\james\go\pkg\mod
set GONOPROXY=gerrit.levenlabs.com,gitlab.com/levenlabs
set GONOSUMDB=gerrit.levenlabs.com,gitlab.com/levenlabs
set GOOS=windows
set GOPATH=C:\Users\james\go;
set GOPRIVATE=gerrit.levenlabs.com,gitlab.com/levenlabs
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Program Files\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.17
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=C:\Users\james\Dropbox\aftermath\backend\change-recorder\go.mod
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\james\AppData\Local\Temp\go-build2223974634=/tmp/go-build -gno-record-gcc-switches

Code

e.g.

package main

import (
	"context"
	"errors"
	"fmt"
	"reflect"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

func main() {
	c, err := bigquery.NewClient(context.Background(), "project")
	if err != nil {
		panic(err)
	}

	iter, err := c.Query("SELECT ARRAY<string>[] as a, ARRAY<STRUCT<name string>>[] as b").Read(context.Background())
	if err != nil {
		panic(err)
	}
	for {
		vals := map[string]bigquery.Value{}
		if err := iter.Next(&vals); err != nil {
			if errors.Is(err, iterator.Done) {
				break
			}
			// Any other error would otherwise be ignored and the loop would continue.
			panic(err)
		}
		fmt.Println(reflect.ValueOf(vals["a"]).IsNil())
		fmt.Println(reflect.ValueOf(vals["b"]).IsNil())
	}
}

Expected behavior

I expect the values to be nil in both cases of empty arrays.

Actual behavior

The repeated nested values are empty slices instead of nil.
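The difference can be reproduced without a BigQuery client. The following is a minimal sketch, assuming a local `Value` type standing in for `bigquery.Value` and a hypothetical helper `isNilSlice`; it only illustrates how a nil slice and an empty slice differ once stored in an interface value.

```go
package main

import (
	"fmt"
	"reflect"
)

// Value is a local stand-in for bigquery.Value (assumption: an empty interface).
type Value interface{}

// isNilSlice is a hypothetical helper. It reports whether v holds a slice
// whose underlying value is nil. An untyped nil is not a slice, so it
// yields false instead of panicking as reflect.Value.IsNil would.
func isNilSlice(v Value) bool {
	rv := reflect.ValueOf(v)
	return rv.Kind() == reflect.Slice && rv.IsNil()
}

func main() {
	var a []Value         // nil slice, as reported for the ARRAY<STRING> column
	b := make([]Value, 0) // empty but non-nil slice, as reported for the ARRAY<STRUCT> column

	fmt.Println(isNilSlice(a), len(a)) // true 0
	fmt.Println(isNilSlice(b), len(b)) // false 0
}
```

Both slices have length zero, so code that checks `len(...) == 0` rather than `IsNil()` treats the two columns the same way.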

Screenshots

N/A

Additional context

This can be fixed with:

--- a/bigquery/value.go
+++ b/bigquery/value.go
@@ -75,7 +75,11 @@ func loadMap(m map[string]Value, vals []Value, s Schema) {
                        v = m2
                default: // repeated and nested
                        sval := val.([]Value)
-                       vs := make([]Value, len(sval))
+
+                       var vs []Value
+                       if len(sval) > 0 {
+                               vs = make([]Value, len(sval))
+                       }
                        for j, e := range sval {
                                m2 := map[string]Value{}
                                loadMap(m2, e.([]Value), f.Schema)
@jameshartig added the "triage me" label Oct 5, 2021
@product-auto-label bot added the "api: bigquery" label Oct 5, 2021
@shollyman added the "type: question" label and removed the "triage me" label Oct 6, 2021
@shollyman
Contributor

Your example query projects empty arrays as the result, so the behavior is expected. However, you could just as easily project something like CAST(NULL AS ARRAY<STRING>), where you might expect to get a null back but still get an empty array, which is the more surprising result.

The oddity is noted over here in https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_type

BigQuery translates NULL ARRAY into empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.

I think if you want to alter this behavior, it would be achievable via a custom type that implements its own ValueLoader.

Let me know if I'm misunderstanding your request.
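As a rough illustration of the custom-type suggestion, here is a minimal sketch. It assumes local stand-ins (`Value`, `FieldSchema`, `Schema`) that mirror only the parts of the `bigquery` package it needs, and a hypothetical row type `nilArrayRow`; its `Load` method has the same shape as the one required by `bigquery.ValueLoader`. It handles top-level repeated fields only and leaves nested records as the raw values it receives.

```go
package main

import "fmt"

// Local stand-ins for bigquery.Value, bigquery.FieldSchema and bigquery.Schema
// (assumption: only the fields used below are mirrored).
type Value interface{}

type FieldSchema struct {
	Name     string
	Repeated bool
}

type Schema []*FieldSchema

// nilArrayRow is a hypothetical row type that stores every empty
// repeated field as a nil []Value, whatever its element type.
type nilArrayRow map[string]Value

// Load has the same signature shape as bigquery.ValueLoader's Load method.
func (r *nilArrayRow) Load(vals []Value, s Schema) error {
	if len(vals) != len(s) {
		return fmt.Errorf("got %d values for %d schema fields", len(vals), len(s))
	}
	if *r == nil {
		*r = nilArrayRow{}
	}
	for i, f := range s {
		v := vals[i]
		if f.Repeated {
			// Collapse zero-length slices to a typed nil slice.
			if sl, ok := v.([]Value); ok && len(sl) == 0 {
				v = []Value(nil)
			}
		}
		(*r)[f.Name] = v
	}
	return nil
}

func main() {
	s := Schema{{Name: "a", Repeated: true}, {Name: "b", Repeated: true}}
	var row nilArrayRow
	if err := row.Load([]Value{[]Value(nil), []Value{}}, s); err != nil {
		panic(err)
	}
	fmt.Println(row["a"].([]Value) == nil, row["b"].([]Value) == nil) // true true
}
```

With the real package, a type like this would be passed by pointer to `RowIterator.Next`, which accepts a destination that implements `ValueLoader`.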

Author

jameshartig commented Oct 6, 2021

@shollyman I was mostly concerned with the discrepancy between []string and the []struct where the first one is set as nil but the second is set as an empty slice. This inconsistency requires us to look into what type of array it is so we know what to expect.

They should both be nil or they should both be empty.
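Until the library behaves consistently, one caller-side workaround is to normalize the loaded map after each row. This is only a sketch under assumptions: `Value` is a local stand-in for `bigquery.Value`, `normalizeEmpty` is a hypothetical helper, and nested records are assumed to be loaded as `map[string]Value` and repeated fields as `[]Value`.

```go
package main

import "fmt"

// Value is a local stand-in for bigquery.Value (assumption: an empty interface).
type Value interface{}

// normalizeEmpty is a hypothetical helper. It replaces every zero-length
// []Value in m with a nil []Value and recurses into nested records, so
// callers see nil for every empty array regardless of its element type.
func normalizeEmpty(m map[string]Value) {
	for k, v := range m {
		switch t := v.(type) {
		case []Value:
			if len(t) == 0 {
				m[k] = []Value(nil)
				continue
			}
			// Repeated records: normalize each element that is itself a record.
			for _, e := range t {
				if sub, ok := e.(map[string]Value); ok {
					normalizeEmpty(sub)
				}
			}
		case map[string]Value:
			normalizeEmpty(t)
		}
	}
}

func main() {
	row := map[string]Value{
		"a": []Value(nil),     // what the issue reports for ARRAY<STRING>
		"b": make([]Value, 0), // what the issue reports for ARRAY<STRUCT>
	}
	normalizeEmpty(row)
	fmt.Println(row["a"].([]Value) == nil, row["b"].([]Value) == nil) // true true
}
```

Normalizing to nil rather than to an empty slice matches the expectation stated in the report; the opposite convention would work equally well as long as it is applied to both kinds of array.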

@jameshartig
Author

@shollyman Did what I said above make sense? I can try to clarify further.

@shollyman
Contributor

Thanks for the clarifications; I misread some of the details of the initial report.

Member

codyoss commented Jul 18, 2023

@shollyman is there something actionable to do with this issue?
