ENH: Hash function for BigQuery #2310

Closed
tswast opened this issue Aug 4, 2020 · 1 comment · Fixed by #2508
Labels
expressions Issues or PRs related to the expression API

Comments

@tswast
Collaborator

tswast commented Aug 4, 2020

I'd like to implement the hash function for BigQuery (specifically, using BigQuery's SHA256 hash function), but a couple of questions need to be answered first.

  • Currently, hash is defined on ValueExpr (so it accepts any value type), but BigQuery's hash functions only accept BYTES or STRING inputs. Should we implement it in BigQuery as a string expression instead?
  • Currently, hash outputs int64, but BigQuery's SHA256 outputs BYTES (which makes sense, since a 256-bit digest cannot fit in a 64-bit integer). Would it make sense to change the Impala return type to match, or is it okay for the backends to be inconsistent?
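To make the type mismatch concrete, here is a minimal pure-Python sketch of BigQuery-style SHA256 semantics using the standard library's `hashlib`. The name `sha256_bytes` is hypothetical (not an ibis or BigQuery API), and the sketch assumes a STRING input is hashed as its UTF-8 encoding.

```python
import hashlib

INT64_BITS = 64  # width of the int64 that `hash` currently returns


def sha256_bytes(value):
    """Hypothetical sketch of BigQuery-style SHA256: STRING or BYTES in, BYTES out.

    Assumption: a STRING input is hashed as its UTF-8 encoding.
    """
    if isinstance(value, str):
        value = value.encode("utf-8")
    if not isinstance(value, bytes):
        # Mirrors BigQuery rejecting arguments that are not STRING or BYTES.
        raise TypeError(f"SHA256 expects STRING or BYTES, got {type(value).__name__}")
    return hashlib.sha256(value).digest()


digest = sha256_bytes("abc")
print(len(digest) * 8)  # 256
print(len(digest) * 8 > INT64_BITS)  # True: the digest is 4x wider than an int64
```

Because the digest is 32 bytes, any int64-returning `hash` would have to truncate or otherwise reduce it, which is why the output type question matters.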

Note: other databases tend to align with how BigQuery handles hashes. See:

tswast added a commit to GoogleCloudPlatform/professional-services-data-validator that referenced this issue Sep 3, 2020
This will allow the SHA256 function to run and should be flexible
enough to support other hash functions and backends. `hashbytes` was
chosen as the function name because `hash` conflicts with the existing
function that has a different output type. See:
ibis-project/ibis#2310
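The idea of a separate `hashbytes` name that can cover several algorithms could look roughly like the following when compiled to BigQuery SQL. This is a hypothetical sketch, not the code from the referenced commit: `compile_hashbytes_bigquery` and the `how` parameter are illustrative names. The mapped functions (MD5, SHA1, SHA256, SHA512) are BigQuery's BYTES-returning hash functions.

```python
# Hypothetical mapping from a `how` argument to BigQuery's BYTES-returning hash functions.
_BIGQUERY_HASHBYTES = {
    "md5": "MD5",
    "sha1": "SHA1",
    "sha256": "SHA256",
    "sha512": "SHA512",
}


def compile_hashbytes_bigquery(column_sql: str, how: str = "sha256") -> str:
    """Render a hypothetical `hashbytes` call as a BigQuery SQL expression."""
    try:
        func = _BIGQUERY_HASHBYTES[how.lower()]
    except KeyError:
        raise ValueError(f"unsupported hash for BigQuery: {how!r}") from None
    return f"{func}({column_sql})"


print(compile_hashbytes_bigquery("`name`"))  # SHA256(`name`)
print(compile_hashbytes_bigquery("`name`", how="sha512"))  # SHA512(`name`)
```

Keeping this under its own name leaves the existing int64-returning `hash` untouched, and another backend could supply its own mapping table.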
@srinathh

srinathh commented Sep 5, 2020

The BigQuery API has hash functions that return bytes and others that return ints (e.g. farm_fingerprint returns INT64). It would be great to have both forms supported.

https://cloud.google.com/bigquery/docs/reference/standard-sql/hash_functions
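Supporting both forms mostly comes down to tracking each function's return type. A minimal sketch, assuming the split discussed above (int64 results stay on `hash`, bytes results go to `hashbytes`); `HashOutput` and `ibis_method_for` are hypothetical names, while the return types are those of BigQuery's hash functions.

```python
from enum import Enum


class HashOutput(Enum):
    INT64 = "int64"
    BYTES = "bytes"


# Return types of BigQuery's hash functions.
BIGQUERY_HASH_OUTPUT = {
    "FARM_FINGERPRINT": HashOutput.INT64,
    "MD5": HashOutput.BYTES,
    "SHA1": HashOutput.BYTES,
    "SHA256": HashOutput.BYTES,
    "SHA512": HashOutput.BYTES,
}


def ibis_method_for(bigquery_func: str) -> str:
    """Hypothetical routing: int64 results use `hash`, bytes results use `hashbytes`."""
    output = BIGQUERY_HASH_OUTPUT[bigquery_func.upper()]  # KeyError if unknown
    return "hash" if output is HashOutput.INT64 else "hashbytes"


print(ibis_method_for("farm_fingerprint"))  # hash
print(ibis_method_for("SHA256"))  # hashbytes
```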

jreback added the backends - bigquery and expressions labels on Oct 30, 2020
jreback added this to the Next Feature Release milestone on Oct 30, 2020
3 participants