Big Query data source #638

gruuya · 2022-02-25T14:34:18Z

Add another remote data source plugin, this time for GCP's Big Query.

CU-26udw0h

mildbyte · 2022-02-25T14:42:27Z

splitgraph/ingestion/big_query/__init__.py

+    credentials_schema: Dict[str, Any] = {
+        "type": "object",
+        "properties": {
+            "credentials_path": {


This should support passing in credentials as a JSON string instead of a file path, since we won't be able to use a path when e.g. adding this data source from a Web form (and this schema is used to generate it).

The commandline ergonomics will be awkward but we can add a special from_commandline method that loads and injects the JSON file when invoked from sgr mount, e.g. https://github.com/splitgraph/splitgraph/blob/c37291267ad60d085703b4a3068a8f39a70d2d7d/splitgraph/ingestion/csv/__init__.py#L299-L305

I've now added the JSON string optional credentials parameter, and implemented from_commandline conversion.
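For illustration, the conversion described above might look roughly like the sketch below. It is not the plugin's actual code: the helper name `inject_credentials_from_path` is hypothetical, a plain dict stands in for the parameter object, and the parameter names `credentials_path` and `credentials` are taken from the schema hunks in this review.

```python
import json
from typing import Any, Dict


def inject_credentials_from_path(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params with `credentials_path` replaced by `credentials`.

    Hypothetical helper for illustration only, not part of splitgraph.
    """
    params = dict(params)  # do not mutate the caller's dict
    path = params.pop("credentials_path", None)
    if path is not None:
        with open(path, "r") as credentials_file:
            raw = credentials_file.read()
        json.loads(raw)  # fail early if the file is not valid JSON
        params["credentials"] = raw
    return params
```

Parameters that already carry a raw `credentials` string pass through unchanged, so the same code path serves both the Web form and `sgr mount`.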

mildbyte · 2022-02-25T14:43:13Z

splitgraph/ingestion/big_query/__init__.py

+            },
+            "dataset_name": {
+                "type": "string",
+                "title": "Big Query dataset",


It's branded as BigQuery -- can you change it in the descriptions, as well as change the plugin name / package names to bigquery instead of big_query?

Certainly, I was split about that as well.

Convert json file creds parameter to the raw param when present. Also, align all entity names to bigquery, without underscore.

mildbyte · 2022-02-28T08:52:08Z

splitgraph/ingestion/bigquery/__init__.py

+
+    @classmethod
+    def get_name(cls) -> str:
+        return "Google Big Query"


Suggested change

return "Google Big Query"

return "Google BigQuery"

mildbyte · 2022-02-28T08:52:15Z

splitgraph/ingestion/bigquery/__init__.py

+
+    @classmethod
+    def get_description(cls) -> str:
+        return "Query data in GCP Big Query datasets"


Suggested change

return "Query data in GCP Big Query datasets"

return "Query data in GCP BigQuery datasets"

mildbyte · 2022-02-28T08:57:22Z

splitgraph/ingestion/bigquery/__init__.py

+    credentials_schema: Dict[str, Any] = {
+        "type": "object",
+        "properties": {
+            "credentials": {


There's (currently) no point in letting users of the JSONSchema (which is used for form generation) pass credentials via a path. I think this could be simplified: treat the credential string passed on the command line as a path, and the one passed via __init__ as a JSON-serialized credential.

JSONSchema:

"credentials": {
    "type": "string",
    "title": "GCP credentials",
    "description": "GCP credentials in JSON format",
}

commandline:

$ sgr mount bigquery bq -o@- <<EOF
{
    "credentials": "/path/to/my/creds.json",
    "project": "my-project-name",
    "dataset_name": "my_dataset"
}
EOF

...

credentials = Credentials({})
# On the command line, "credentials" is a path: read the file and pass its contents on.
with open(params.pop("credentials"), "r") as credentials_file:
    credentials["credentials"] = credentials_file.read()
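A self-contained version of the suggestion above can be sketched as follows. This is an assumption-laden illustration: the function name `split_commandline_params` is hypothetical, and plain dicts stand in for splitgraph's `Credentials` and parameter types so that the snippet runs without the library.

```python
from typing import Any, Dict, Tuple


def split_commandline_params(
    params: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split command-line params into (params, credentials).

    Hypothetical helper: on the command line, `credentials` is assumed to be
    a path to a JSON key file; its contents become the credential string.
    """
    remaining = dict(params)  # avoid mutating the caller's dict
    credentials: Dict[str, Any] = {}
    # Pop the path exactly once; a second pop would raise KeyError.
    with open(remaining.pop("credentials"), "r") as credentials_file:
        credentials["credentials"] = credentials_file.read()
    return remaining, credentials
```

With this split, the JSONSchema only ever sees a JSON-serialized credential, while `sgr mount` users keep passing a file path.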

gruuya added 3 commits February 25, 2022 09:36

Add Big Query data source

4885725

Encode the GCP credentials into the connection string

eb88fde

Add basic tests for Big Query data source

0b51635

gruuya requested a review from mildbyte February 25, 2022 14:34

gruuya self-assigned this Feb 25, 2022

Add Big Query lib installation to debug Dockerfile

87a18a8

mildbyte approved these changes Feb 25, 2022

gruuya added 2 commits February 25, 2022 21:20

Add raw credentials passing option

ec4e137

Convert json file creds parameter to the raw param when present. Also, align all entity names to bigquery, without underscore.

Fix comment and docstring

97e8cae

mildbyte approved these changes Feb 28, 2022

Simplify credentials schema for BigQuery plugin

0a6f297

gruuya merged commit b4a7e77 into master Feb 28, 2022

gruuya deleted the add-big-query-data-source-cu-26udw0h branch February 28, 2022 09:42
