bigquery: how to create a BQ schema #5833

Closed
laurentsimon opened this issue Mar 31, 2022 · 4 comments · Fixed by #5877
Labels: "api: bigquery" (Issues related to the BigQuery API.), "type: question" (Request for information or clarification. Not an issue.)

Comments

@laurentsimon

Hi

I've been trying to create a BQ schema from a Go structure, to use with the bq CLI:

func GenerateBQSchema(t interface{}) (string, error) {
}

and use it as input to the bq command to create/update tables:

bq mk --table   ...   myschema.schema

The only way I found to create this schema is to generate it myself. The schema needed seems to be the non-exported structure defined in

type bigQueryJSONField struct {

I was wondering if I missed something and it's already feasible to generate the schema, or whether this is a missing feature which would be nice to have.

Can you advise?

laurentsimon added the "triage me" label Mar 31, 2022
codyoss added the "api: bigquery" label and removed the "triage me" label Mar 31, 2022
codyoss changed the title from "packagename: short description of feature request" to "bigquery: how to create a BQ schema" Mar 31, 2022
codyoss added the "type: question" label Mar 31, 2022
@shollyman
Contributor

Not sure I understand the question. Take a look at InferSchema if you want to generate a schema for a Go datastructure: https://pkg.go.dev/cloud.google.com/go/bigquery#InferSchema

If you're trying to map unexported fields, this likely needs custom code.

@laurentsimon
Author

laurentsimon commented Mar 31, 2022

InferSchema generated a different schema when I tried it, and the bq command did not accept it for creating tables. All the fields in the structure are exported.
Is there a different command for creating a table from the results of InferSchema?

@shollyman
Contributor

Oh, I think I understand now. The "schema file" used by CLI is just the underlying API's representation of the schema. We have support via SchemaFromJSON for parsing the API representation into a Schema type, but don't expose the converse.
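
For reference, a schema file of this kind is a JSON array of field objects. The sketch below is illustrative and is not code from the library: it decodes such a file using only the standard library. The `fieldSchema` type, the `parseSchemaFile` helper, and the sample contents are assumptions made for this example; the key names follow the API's TableFieldSchema representation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fieldSchema is a hypothetical local type mirroring the JSON keys of one
// schema file entry. It is not a type exported by the bigquery package.
type fieldSchema struct {
	Name        string        `json:"name"`
	Type        string        `json:"type"`
	Mode        string        `json:"mode,omitempty"`
	Description string        `json:"description,omitempty"`
	Fields      []fieldSchema `json:"fields,omitempty"`
}

// sampleSchema is made-up schema file content used only for illustration.
const sampleSchema = `[
  {"name": "Name", "type": "STRING", "mode": "NULLABLE", "description": "Item name"},
  {"name": "Dims", "type": "RECORD", "mode": "REPEATED", "fields": [
    {"name": "Size", "type": "FLOAT", "mode": "NULLABLE"}
  ]}
]`

// parseSchemaFile decodes the JSON array found in a schema file.
func parseSchemaFile(data []byte) ([]fieldSchema, error) {
	var fields []fieldSchema
	if err := json.Unmarshal(data, &fields); err != nil {
		return nil, fmt.Errorf("json.Unmarshal: %w", err)
	}
	return fields, nil
}

func main() {
	fields, err := parseSchemaFile([]byte(sampleSchema))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Print the top-level fields and how many nested fields each one has.
	for _, f := range fields {
		fmt.Printf("%s %s %s (nested: %d)\n", f.Name, f.Type, f.Mode, len(f.Fields))
	}
}
```

Nested RECORD fields reuse the same shape through the `fields` key, which is why the type is recursive.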

If you're asking about how to create a table with the library and InferSchema, take a look at the examples:

package main

import (
	"context"
	"time"

	"cloud.google.com/go/bigquery"
)

func main() {
	ctx := context.Background()
	// Infer table schema from a Go type.
	schema, err := bigquery.InferSchema(Item{})
	if err != nil {
		// TODO: Handle error.
	}
	client, err := bigquery.NewClient(ctx, "project-id")
	if err != nil {
		// TODO: Handle error.
	}
	t := client.Dataset("my_dataset").Table("new-table")
	if err := t.Create(ctx,
		&bigquery.TableMetadata{
			Name:           "My New Table",
			Schema:         schema,
			ExpirationTime: time.Now().Add(24 * time.Hour),
		}); err != nil {
		// TODO: Handle error.
	}
}

type Item struct {
	Name  string
	Size  float64
	Count int
}

If you mean using this in other tools, could you elaborate more about the workflow you're trying to build? It seems odd to use the library for all the schema detection and then use the CLI for the actual resource creation, but I may just not be understanding.

@laurentsimon
Author

laurentsimon commented Mar 31, 2022

right, thanks. I actually just use the CLI. I ended up coding it as follows (which works):

func generateSchema(schema bigquery.Schema) []bigQueryJSONField {
	var bqs []bigQueryJSONField
	for _, fs := range schema {
		bq := bigQueryJSONField{
			Description: fs.Description,
			Name:        fs.Name,
			Type:        string(fs.Type),
			Fields:      generateSchema(fs.Schema),
		}
		// https://github.com/googleapis/google-cloud-go/blob/bigquery/v1.30.0/bigquery/schema.go#L125

		switch {
		// Make all fields optional to give us flexibility:
		// discard `fs.Required`.
		// An alternative would be to let the caller
		// use https://pkg.go.dev/cloud.google.com/go/bigquery#Schema.Relax.
		case fs.Repeated:
			bq.Mode = "REPEATED"
		default:
			bq.Mode = "NULLABLE"
		}

		bqs = append(bqs, bq)
	}

	return bqs
}

// GenerateBQSchema generates the BQ schema in JSON format.
// Can be used to generate a BQ table:
// `bq mk --table    ...  the.schema`.
// The structure `t` must be annotated using BQ fields:
// a string `bigquery:"name"`.
func GenerateBQSchema(t interface{}) (string, error) {
	schema, err := bigquery.InferSchema(t)
	if err != nil {
		return "", fmt.Errorf("bigquery.InferSchema: %w", err)
	}
	jsonFields := generateSchema(schema)

	jsonData, err := json.Marshal(jsonFields)
	if err != nil {
		return "", fmt.Errorf("json.Marshal: %w", err)
	}
	return string(jsonData), nil
}

but it would make more sense for your repo to have this, rather than me keeping up with possible changes to the schema.
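
The code above relies on a `bigQueryJSONField` type that is not shown in the comment. A minimal sketch of what a local copy might look like follows, using only the standard library. The struct layout, the `omitempty` tags, and the `marshalSchema` helper are assumptions for illustration, not the library's unexported definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bigQueryJSONField is a local stand-in (assumed layout) for one entry of
// the JSON schema file passed to `bq mk --table`.
type bigQueryJSONField struct {
	Name        string              `json:"name"`
	Type        string              `json:"type"`
	Mode        string              `json:"mode"`
	Description string              `json:"description,omitempty"`
	Fields      []bigQueryJSONField `json:"fields,omitempty"`
}

// marshalSchema serializes the fields as a JSON array.
func marshalSchema(fields []bigQueryJSONField) (string, error) {
	data, err := json.Marshal(fields)
	if err != nil {
		return "", fmt.Errorf("json.Marshal: %w", err)
	}
	return string(data), nil
}

func main() {
	out, err := marshalSchema([]bigQueryJSONField{
		{Name: "Name", Type: "STRING", Mode: "NULLABLE"},
		{Name: "Tags", Type: "STRING", Mode: "REPEATED"},
	})
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(out)
}
```

Keeping a local copy like this means tracking any change to the API representation by hand, which is the maintenance burden described above. The method name in a released library version may differ from the one given in the commit message below (the package documentation lists `Schema.ToJSONFields`), so it is worth checking pkg.go.dev before relying on either name.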

shollyman added a commit to shollyman/google-cloud-go that referenced this issue Apr 8, 2022
This PR does two things: It enhances SchemaFromJSON to
work directly with the underlying TableFieldSchema messages
from the discovery API definition, and adds a FormatJSONFields
method to Schema to export the same format consumed by SchemaFromJSON.

With this, we're able to clear up the existing internal duplicate logic
for this special case, and we manage to address two different feature
requests at the same time.

Fixes: googleapis#5833
Fixes: googleapis#5867
shollyman added a commit that referenced this issue Apr 12, 2022
* feat(bigquery): enhance SchemaFromJSON

This PR does two things: It enhances SchemaFromJSON to
work directly with the underlying TableFieldSchema messages
from the discovery API definition, and adds a FormatJSONFields
method to Schema to export the same format consumed by SchemaFromJSON.

With this, we're able to clear up the existing internal duplicate logic
for this special case, and we manage to address two different feature
requests at the same time.

Fixes: #5833
Fixes: #5867


Co-authored-by: Steffany Brown <30247553+steffnay@users.noreply.github.com>