[BigQuery] fqtn is not valid if project name contains '-' #49

Closed
amirbtb opened this issue Mar 3, 2022 · 10 comments
Labels
bug Something isn't working

Comments

@amirbtb

amirbtb commented Mar 3, 2022

Hello,

The fqtn of the table I want to get follows this pattern: my-awesome-project.schema.table.
I tried to get it using rql.dataset(fqtn="my-awesome-project.schema.table"), but I get ValueError: my-awesome-project.schema.table is not a well-formed fqtn.
It seems that the validate_fqtn() function applies the regex \w+\.\w+\.\w+, which rejects my GCP project name because \w does not match '-'.
Is there a way to make this work without changing my GCP project name?

Thank you for this awesome package, I can't wait to try it! ❤️ 🚀
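
For anyone reading along, here is a minimal sketch of why the quoted pattern rejects a dashed project name. The looser pattern and the helper name `is_well_formed` are hypothetical illustrations, not the code in rasgoql, and the use of `fullmatch` is an assumption about how the check is applied.

```python
import re

# Pattern quoted in the report: \w covers letters, digits and '_', but not '-'.
STRICT_FQTN = re.compile(r"\w+\.\w+\.\w+")

# Hypothetical looser pattern (NOT the one shipped in any rasgoql release):
# each of the three parts may also contain '-'.
LOOSE_FQTN = re.compile(r"[\w-]+\.[\w-]+\.[\w-]+")


def is_well_formed(fqtn: str, pattern: re.Pattern) -> bool:
    """Return True when the whole string matches the three-part pattern."""
    return pattern.fullmatch(fqtn) is not None
```

With these definitions, `is_well_formed("my-awesome-project.schema.table", STRICT_FQTN)` is False, while the same call with `LOOSE_FQTN` is True.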

@griffatrasgo
Contributor

@amirbtb Thanks for opening this issue and using the package! Nice find on my lazy regex. I'll get a patch out tonight to loosen the logic to unblock you. Will ping you here in a bit when it's ready to go.

@griffatrasgo
Contributor

@amirbtb I raised the PR above to fix this for you. It loosens the logic of what constitutes a "valid" fqtn. If my testing is correct, it should support your project with dashes. Admittedly, I swung the pendulum pretty far in the other direction in this regex change. I'm sure we'll need to ratchet it back down in the future, but getting you unblocked tonight feels like a better north star than writing regex that will conquer the unknown.

We'll shoot to get a new version out tomorrow, but I pushed an alpha that you can access now if you want to test it out: pip install rasgoql==1.0.2a1

Let us know if you run into any other issues 🍻

@amirbtb
Author

amirbtb commented Mar 3, 2022

Thank you for this fix (1.0.2a1). It seems to accept my project name now, but I'm getting another error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/src/test.ipynb Cell 2' in <cell line: 1>()
----> 1 ds = rql.dataset(fqtn="my-awesome-project.schema.table")

File /usr/local/lib/python3.8/dist-packages/rasgoql/main.py:58, in RasgoQL.dataset(self, fqtn)
     51 def dataset(
     52         self,
     53         fqtn: str
     54     ) -> Dataset:
     55     """
     56     Returns a Dataset connected to the Cloud Data Warehouse
     57     """
---> 58     return Dataset(fqtn, self._dw)

File /usr/local/lib/python3.8/dist-packages/rasgoql/primitives/transforms.py:128, in Dataset.__init__(self, fqtn, dw)
    126 self.table_state = TableState.UNKNOWN
    127 self.is_rasgo = False
--> 128 self._dw_sync()

File /usr/local/lib/python3.8/dist-packages/rasgoql/utils/decorators.py:62, in require_dw.<locals>.wrapper(*args, **kwargs)
     60 if not self._dw:
     61     raise NotImplementedError(f'{func.__name__} method is only available for classes instantiated with a DW connection')
---> 62 return func(*args, **kwargs)

File /usr/local/lib/python3.8/dist-packages/rasgoql/primitives/transforms.py:143, in Dataset._dw_sync(self)
    141 if obj_exists:
    142     self.table_state = TableState.IN_DW
--> 143     self.table_type = TableType[obj_type]
    144     self.is_rasgo = is_rasgo_obj
    145 else:

File /usr/lib/python3.8/enum.py:387, in EnumMeta.__getitem__(cls, name)
    386 def __getitem__(cls, name):
--> 387     return cls._member_map_[name]

KeyError: 'EXTERNAL'

I'm not sure of the cause, but the KeyError: 'EXTERNAL' made me think it may be because I'm trying to get an external table.

Do you want me to open another issue, or is it related?

@griffatrasgo
Contributor

aha! Great find @amirbtb. It looks like we'll need to open up our TableType enum to accept temporary and external tables. All of our initial testing was done on base tables and views in SF and BQ, so I'm not sure if this will break something downstream. I'll do some testing on this today with a goal to support external tables ASAP. Will ping you here when we have a patch ready to test (hoping today).
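
As a rough illustration of the failure in the traceback: looking up an enum member by name with `Enum[name]` raises KeyError when no member has that name. The two enum classes and the `lookup` helper below are hypothetical stand-ins, not the real rasgoql TableType; only the 'EXTERNAL' value comes from the traceback.

```python
from enum import Enum


class NarrowTableType(Enum):
    """Hypothetical enum covering only base tables and views."""
    TABLE = "TABLE"
    VIEW = "VIEW"


class WiderTableType(Enum):
    """Hypothetical enum that also knows external and temporary tables."""
    TABLE = "TABLE"
    VIEW = "VIEW"
    EXTERNAL = "EXTERNAL"
    TEMPORARY = "TEMPORARY"


def lookup(enum_cls, obj_type: str):
    """Return the member named obj_type, or None when the enum lacks it."""
    try:
        return enum_cls[obj_type]  # name-based lookup, as in the traceback
    except KeyError:
        return None
```

Here `lookup(NarrowTableType, "EXTERNAL")` returns None because the bare `NarrowTableType["EXTERNAL"]` raises KeyError, while `lookup(WiderTableType, "EXTERNAL")` returns the new member.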

@griffatrasgo
Contributor

Hi @amirbtb thanks for hanging in there. Here's an alpha that should support external tables: pip install rasgoql==1.0.2a2.
This touched a few more parts of the code than I was expecting so there may still be one or two gremlins to discover. Feel free to take it for a spin and let us know if you hit any other hurdles.
I'll target getting this out as an actual release later tonight. Cheers 🍻

@amirbtb
Author

amirbtb commented Mar 3, 2022

Hi @griffatrasgo, thank you for your quick fixes!
ds = rql.dataset(fqtn="my-awesome-project.schema.table") now works fine, and I get my external table as a RasgoQL dataset. I can access some methods and attributes (.get_schema(), .namespace), but when I try ds.preview() or df = ds.to_df() I get the error below. The example is for ds.preview(), but the final Invalid project ID 'None' error is the same in both cases:

---------------------------------------------------------------------------
BadRequest                                Traceback (most recent call last)
/src/test.ipynb Cell [1]
----> 1 ds.preview()

File /usr/local/lib/python3.8/dist-packages/rasgoql/utils/decorators.py:64, in require_dw.<locals>.wrapper(*args, **kwargs)
     62 if not self._dw:
     63     raise NotImplementedError(f'{func.__name__} method is only available for classes instantiated with a DW connection')
---> 64 return func(*args, **kwargs)

File /usr/local/lib/python3.8/dist-packages/rasgoql/utils/decorators.py:76, in require_materialized.<locals>.wrapper(*args, **kwargs)
     74 if self.table_state != TableState.IN_DW.value:
     75     raise NotImplementedError(f'{func.__name__} method is only available for tables that exist in the DataWarehouse')
---> 76 return func(*args, **kwargs)

File /usr/local/lib/python3.8/dist-packages/rasgoql/primitives/transforms.py:163, in Dataset.preview(self)
    157 @require_dw
    158 @require_materialized
    159 def preview(self) -> pd.DataFrame:
    160     """
    161     Return a pandas DataFrame of top 10 rows
    162     """
--> 163     return self._dw.preview(f'SELECT * FROM {self.fqtn}')

File /usr/local/lib/python3.8/dist-packages/rasgoql/data/bigquery.py:425, in BigQueryDataWarehouse.preview(self, sql, limit)
    411 def preview(
    412         self,
    413         sql: str,
    414         limit: int = 10
    415     ) -> pd.DataFrame:
    416     """
    417     Returns 10 records into a pandas DataFrame
    418
   (...)
    423         Records to return
    424     """
--> 425     return self.execute_query(
    426         f'{sql} LIMIT {limit}',
    427         response='df',
    428         acknowledge_risk=True
    429     )

File /usr/local/lib/python3.8/dist-packages/rasgoql/data/bigquery.py:280, in BigQueryDataWarehouse.execute_query(self, sql, response, acknowledge_risk)
    278     return self._query_into_dict(sql)
    279 if response == 'DF':
--> 280     return self._query_into_pandas(sql)
    281 return self._execute_string(sql, ignore_results=(response == 'NONE'))

File /usr/local/lib/python3.8/dist-packages/rasgoql/data/bigquery.py:643, in BigQueryDataWarehouse._query_into_pandas(self, query)
    636     return self.connection.query(
    637         query,
    638         job_config=self._default_job_config
    639         ) \
    640         .result() \
    641         .to_dataframe()
    642 except Exception as e:
--> 643     self._error_handler(e)

File /usr/local/lib/python3.8/dist-packages/rasgoql/data/bigquery.py:552, in BigQueryDataWarehouse._error_handler(self, exception, query)
    544 if isinstance(exception, gcp_exc.ServiceUnavailable):
    545     raise DWConnectionError(
    546         'BigQuery is unavailable. Please check that your are using '
    547         'valid credentials, that you have internet access, and '
   (...)
    550         'for outage status.'
    551     ) from exception
--> 552 raise exception

File /usr/local/lib/python3.8/dist-packages/rasgoql/data/bigquery.py:636, in BigQueryDataWarehouse._query_into_pandas(self, query)
    632 """
    633 Return results of query in a pandas DataFrame
    634 """
    635 try:
--> 636     return self.connection.query(
    637         query,
    638         job_config=self._default_job_config
    639         ) \
    640         .result() \
    641         .to_dataframe()
    642 except Exception as e:
    643     self._error_handler(e)

File /usr/local/lib/python3.8/dist-packages/google/cloud/bigquery/client.py:3391, in Client.query(self, query, job_config, job_id, job_id_prefix, location, project, retry, timeout, job_retry)
   3388     else:
   3389         return query_job
-> 3391 future = do_query()
   3392 # The future might be in a failed state now, but if it's
   3393 # unrecoverable, we'll find out when we ask for it's result, at which
   3394 # point, we may retry.
   3395 if not job_id_given:

File /usr/local/lib/python3.8/dist-packages/google/cloud/bigquery/client.py:3368, in Client.query.<locals>.do_query()
   3365 query_job = job.QueryJob(job_ref, query, client=self, job_config=job_config)
   3367 try:
-> 3368     query_job._begin(retry=retry, timeout=timeout)
   3369 except core_exceptions.Conflict as create_exc:
   3370     # The thought is if someone is providing their own job IDs and they get
   3371     # their job ID generation wrong, this could end up returning results for
   3372     # the wrong query. We thus only try to recover if job ID was not given.
   3373     if job_id_given:

File /usr/local/lib/python3.8/dist-packages/google/cloud/bigquery/job/query.py:1297, in QueryJob._begin(self, client, retry, timeout)
   1277 """API call:  begin the job via a POST request
   1278
   1279 See
   (...)
   1293     ValueError: If the job has already begun.
   1294 """
   1296 try:
-> 1297     super(QueryJob, self)._begin(client=client, retry=retry, timeout=timeout)
   1298 except exceptions.GoogleAPICallError as exc:
   1299     exc.message = _EXCEPTION_FOOTER_TEMPLATE.format(
   1300         message=exc.message, location=self.location, job_id=self.job_id
   1301     )

File /usr/local/lib/python3.8/dist-packages/google/cloud/bigquery/job/base.py:510, in _AsyncJob._begin(self, client, retry, timeout)
    507 # jobs.insert is idempotent because we ensure that every new
    508 # job has an ID.
    509 span_attributes = {"path": path}
--> 510 api_response = client._call_api(
    511     retry,
    512     span_name="BigQuery.job.begin",
    513     span_attributes=span_attributes,
    514     job_ref=self,
    515     method="POST",
    516     path=path,
    517     data=self.to_api_repr(),
    518     timeout=timeout,
    519 )
    520 self._set_properties(api_response)

File /usr/local/lib/python3.8/dist-packages/google/cloud/bigquery/client.py:782, in Client._call_api(self, retry, span_name, span_attributes, job_ref, headers, **kwargs)
    778 if span_name is not None:
    779     with create_span(
    780         name=span_name, attributes=span_attributes, client=self, job_ref=job_ref
    781     ):
--> 782         return call()
    784 return call()

File /usr/local/lib/python3.8/dist-packages/google/api_core/retry.py:283, in Retry.__call__.<locals>.retry_wrapped_func(*args, **kwargs)
    279 target = functools.partial(func, *args, **kwargs)
    280 sleep_generator = exponential_sleep_generator(
    281     self._initial, self._maximum, multiplier=self._multiplier
    282 )
--> 283 return retry_target(
    284     target,
    285     self._predicate,
    286     sleep_generator,
    287     self._deadline,
    288     on_error=on_error,
    289 )

File /usr/local/lib/python3.8/dist-packages/google/api_core/retry.py:190, in retry_target(target, predicate, sleep_generator, deadline, on_error)
    188 for sleep in sleep_generator:
    189     try:
--> 190         return target()
    192     # pylint: disable=broad-except
    193     # This function explicitly must deal with broad exceptions.
    194     except Exception as exc:

File /usr/local/lib/python3.8/dist-packages/google/cloud/_http/__init__.py:480, in JSONConnection.api_request(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout)
    469 response = self._make_request(
    470     method=method,
    471     url=url,
   (...)
    476     timeout=timeout,
    477 )
    479 if not 200 <= response.status_code < 300:
--> 480     raise exceptions.from_http_response(response)
    482 if expect_json and response.content:
    483     return response.json()

BadRequest: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/my-awesome-project/jobs?prettyPrint=false: Invalid project ID 'None'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.

Location: None
Job ID: f16c9225-458e-5846-8683-g8c8g515b2f5

It looks like the values for Location and/or the project ID are lost in the process.
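
To make the rule in the error message concrete, here is a small sketch of a checker built only from the wording quoted above (6-63 characters, lowercase letters, digits or dashes, starting with a letter, not ending with a dash). The pattern and the name `looks_like_project_id` are assumptions for illustration; domain-prefixed IDs are ignored, and this is not how BigQuery itself validates IDs.

```python
import re

# Derived from the error text only: first char is a letter, last char is a
# letter or digit, 4 to 61 letters/digits/dashes in between (6-63 total).
PROJECT_ID = re.compile(r"[a-z][a-z0-9-]{4,61}[a-z0-9]")


def looks_like_project_id(value) -> bool:
    """Check a value against the rule quoted in the BadRequest message."""
    # str() mirrors what happens when None is formatted into a request.
    return PROJECT_ID.fullmatch(str(value)) is not None
```

Under this rule my-awesome-project passes, while a missing value rendered as the string 'None' fails, which matches the message above.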

@griffatrasgo
Contributor

Hi @amirbtb thanks for sharing. I think Rasgo oughta send you a cake with all this super testing you're doing 😉

I am able to reproduce this locally under only one condition: when I don't include a project or dataset in the credentials class. Maybe this is what you're doing as well?

So this does not produce the error:

creds = rasgoql.BigQueryCredentials(
    json_filepath="/Users/griff/.../client_secrets.json",
    project="rasgo",
    dataset="public"
)
rql = rasgoql.connect(creds)

But this does:

creds = rasgoql.BigQueryCredentials(
    json_filepath="/Users/griff/.../client_secrets.json"
)
rql = rasgoql.connect(creds)

If this is what you're doing, the easy fix is to include your project and dataset in the credentials class when connecting. I don't remember why we made those optional, but it looks like we're missing logic to support when they're not passed. I can go down that rabbit hole, but do you want to check whether the above fix unblocks you?
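
One plausible way an unset project turns into the literal string 'None' is plain string formatting. The sketch below assumes that mechanism; it is not confirmed from the rasgoql source. `Creds`, `naive_namespace` and `default_namespace` are hypothetical stand-ins, not rasgoql.BigQueryCredentials or any rasgoql function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Creds:
    """Hypothetical stand-in for a credentials class with optional defaults."""
    json_filepath: str
    project: Optional[str] = None
    dataset: Optional[str] = None


def naive_namespace(creds: Creds) -> str:
    # Formatting None into an f-string silently yields the text 'None'.
    return f"{creds.project}.{creds.dataset}"


def default_namespace(creds: Creds) -> str:
    """Fail early with a readable message instead of sending 'None' onward."""
    missing = [n for n in ("project", "dataset") if getattr(creds, n) is None]
    if missing:
        raise ValueError(f"Missing default namespace value(s): {', '.join(missing)}")
    return f"{creds.project}.{creds.dataset}"
```

With only a key file supplied, `naive_namespace` returns "None.None", whereas `default_namespace` raises a ValueError naming the missing fields.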

griffatrasgo added the bug label Mar 4, 2022

@amirbtb
Author

amirbtb commented Mar 4, 2022

Hello @griffatrasgo, you are totally right: it works fine when I include the project and dataset in the credentials class. Thanks for the quick fix 👌🏽!

Initially I didn't want to define the credentials class with project and schema because we use multiple BigQuery databases (1 GCP project = 1 BigQuery database) and multiple datasets.
It means that every time I need to change BigQuery database/dataset, I need to redefine the credentials class with the right database/dataset. That can be suboptimal when trying to join multiple tables in different databases/datasets.
It may be useful to be able to define the credentials class without project/database so we can provide them later in the process.

@griffatrasgo
Contributor

Great, I'm glad we got you unblocked @amirbtb. Thanks for explaining your use case, this makes sense.

In its current state, rasgoql prefers a default connection so it knows where to write views and tables. Many of our early testers needed to select from schemas they only had read access to and write all SQL transforms to a different schema.

If you find yourself needing to switch between namespaces mid-project, we do support one way to do this. On a SQLChain, you can use the .change_namespace() method to switch to a different namespace before you save. This will effectively change the namespace you run select statements and write the final view to. It should still respect the fact that your original dataset was from a different namespace.

I'll consider the need to run in multiple namespaces and see if we can build more helpful experiences to support that. If you have ideas, please keep the feedback coming.

If it's alright with you, may I consider this issue resolved and close it?

@amirbtb
Author

amirbtb commented Mar 5, 2022

Yes, sorry @griffatrasgo you unblocked me on the exact issue a couple of days ago.

Thank you for taking this use case into account and for the change_namespace() tip!

amirbtb closed this as completed Mar 5, 2022