This repository has been archived by the owner on Apr 17, 2024. It is now read-only.

How to get keyset representation that is compatible with BigQuery? #373

Closed
thnee opened this issue Jun 18, 2020 · 5 comments
Comments

@thnee

thnee commented Jun 18, 2020

The BigQuery documentation says this:

The returned keyset is a serialized BYTES representation of google.crypto.tink.Keyset that contains a primary cryptographic key and no additional keys. You can use the keyset with the AEAD.ENCRYPT, AEAD.DECRYPT_BYTES, and AEAD.DECRYPT_STRING functions for encryption and decryption, as well as with the KEYS group of key- and keyset-related functions.

How can I, in my Python application, create a keyset that is compatible?

I have tried the following:

import io
import base64
import json

import tink
from tink import aead

aead.register()

keyset_handle = tink.new_keyset_handle(aead.aead_key_templates.AES256_GCM)
aead_primitive = keyset_handle.primitive(aead.Aead)

binary_out = io.BytesIO()
writer = tink.BinaryKeysetWriter(binary_out)
keyset_handle.write(writer, aead_primitive)
binary_out.seek(0)
print(base64.b64encode(binary_out.read()))

json_out = io.StringIO()
writer = tink.JsonKeysetWriter(json_out)
keyset_handle.write(writer, aead_primitive)
json_out.seek(0)
print(json.loads(json_out.read())["encryptedKeyset"])

This produces the following output:

b'Eo0BAUOoGAhu4F/2fK1pF0k9CPcuWyncFuRSDblIQzSrjhJbPlAEwKsMhkeRmJkzJWcK8pRHBgpHwb1Y2DPdKfLYOTIfKkoWJ2AckDAnljYx3Vfwh589VQ74lb+d35wvTVeiA3OBTk6hBu7V/1hiEpuOOlUiK3ivgHN4Qfdk6y0PGcMVmCWXhNrYqCE2VDGgGkQIiLCgnQQSPAowdHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHRvLnRpbmsuQWVzR2NtS2V5EAEYiLCgnQQgAQ=='
AUOoGAhzRqjrxMvz2ACwKDpZ1rMIgYJxLBGIlESsOtETTRBOkixU0Pr09kIb6WWnH7R4ntxIU3d1/ER4ZBVN4p2JdveVk+GlRkAUhZEtOtXQ7CHGtHoiG6L6sXJQFt4uj9379xhZJKbN5KVI1gMXsDpCQ7eBVEb0xKF6FuoKWe2i3krNDRfmPXhZYcKg

Neither of these seems to be compatible with BigQuery.

For reference, the BigQuery KEYS.NEW_KEYSET('AEAD_AES_GCM_256') function returns something that looks like this:

CInj7qENEmQKWAowdHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHRvLnRpbmsuQWVzR2NtS2V5EiIaIJscUWVL43EmlceRRWMcCkkFgRXW/fBxsm7NHFJWZaipGAEQARiJ4+6hDSAB

This value has a different length, and if I try to use the values from my Python code in BigQuery, it just gives an error like this: AEAD.DECRYPT_STRING failed: Keyset deserialization failed: Error reading keyset data: Could not parse the input stream as a Keyset-proto.

I understand that my current code is wrong. But what would be the correct way to do this?

@thnee
Author

thnee commented Jun 18, 2020

There is also a KEYS.ADD_KEY_FROM_RAW_BYTES() function in BigQuery, which takes raw key bytes of length 16 or 32.

So I thought maybe keyset_handle._keyset.key[0].key_data.value could be used for that. It's close, but not quite right: it's 34 bytes.
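
A possible explanation for the 34 bytes: key_data.value appears to be a serialized AesGcmKey message rather than the raw key, so it carries a 2-byte protobuf header (field tag plus length) in front of the 32 key bytes. The sketch below checks this against the sample KEYS.NEW_KEYSET output quoted earlier in this thread, using only the standard library. The field numbers are assumptions based on Tink's tink.proto and aes_gcm.proto definitions, and read_varint and parse_fields are illustrative helpers, not part of Tink.

```python
import base64

# Sample keyset copied from the KEYS.NEW_KEYSET output quoted in this thread,
# split at protobuf boundaries for readability.
BIGQUERY_KEYSET_B64 = (
    "CInj7qENEmQKWAow"                                   # keyset/key headers
    "dHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHRvLnRpbmsuQWVzR2NtS2V5"  # type URL
    "EiIaIJscUWVL43EmlceRRWMcCkkFgRXW/fBxsm7NHFJWZaip"   # KeyData.value
    "GAEQARiJ4+6hDSAB"                                   # remaining key fields
)


def read_varint(data, pos):
    """Decode one protobuf varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_fields(data):
    """Return {field_number: value} for a message without repeated fields.

    Only wire types 0 (varint) and 2 (length-delimited) are handled, which is
    all this sample keyset uses.
    """
    fields, pos = {}, 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:
            fields[number], pos = read_varint(data, pos)
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            fields[number] = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields


# Field numbers below are assumed from Tink's proto definitions.
keyset = parse_fields(base64.b64decode(BIGQUERY_KEYSET_B64))
key = parse_fields(keyset[2])        # Keyset.key (this keyset has one key)
key_data = parse_fields(key[1])      # Key.key_data
value = key_data[2]                  # KeyData.value: a serialized AesGcmKey
raw_key = parse_fields(value)[3]     # AesGcmKey.key_value

print(key_data[1].decode())          # the key's type URL
print(len(value), value[:2].hex())   # 34 1a20
print(len(raw_key))                  # 32
```

If this reading is right, the 32 bytes in raw_key are presumably what KEYS.ADD_KEY_FROM_RAW_BYTES() expects, although that was not tested in this thread.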

@tholenst
Contributor

I have never tried this, but most likely you should use BinaryKeysetWriter without base64 encoding (just the binary encoding).

@thnee
Author

thnee commented Jun 18, 2020

I am only base64 encoding it to be able to copy it and paste it into this issue and into BigQuery. When loading the value in BigQuery I do parse it with FROM_BASE64().

@tholenst
Contributor

Ah ok, I see now.

The code you have creates a new Aead and encrypts the keyset with it. The result is an encrypted keyset, which BigQuery will not be able to read.

Instead, you should write it using CleartextKeysetHandle.
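
The difference is visible in the first bytes of the two values quoted earlier in this thread. As a rough sketch, assuming the field numbers in Tink's tink.proto (Keyset starts with primary_key_id, field 1, a varint; EncryptedKeyset starts with encrypted_keyset, field 2, bytes), the first protobuf tag tells the two apart. first_field is an illustrative helper, and the check is only a heuristic, since serializers usually but not necessarily emit fields in number order.

```python
import base64


def first_field(data):
    """Return (field_number, wire_type) of the first protobuf field in data."""
    tag, shift = 0, 0
    for byte in data:
        tag |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return tag >> 3, tag & 7
        shift += 7
    raise ValueError("truncated varint")


# First four base64 characters of each value quoted in this thread.
python_output = base64.b64decode("Eo0B")    # from the BinaryKeysetWriter output
bigquery_output = base64.b64decode("CInj")  # from KEYS.NEW_KEYSET

print(first_field(python_output))    # (2, 2): length-delimited field 2
print(first_field(bigquery_output))  # (1, 0): varint field 1
```

The Python output begins with a length-delimited field 2, which matches an EncryptedKeyset, while the BigQuery value begins with a varint field 1, which matches a plain Keyset.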

@thnee
Author

thnee commented Jun 18, 2020

Yeah I just tried doing this, and it actually seems to work. I can't say I understand it, but after many hours of attempts I will gladly accept it as a solution :)

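# Reuses io, base64, tink, and keyset_handle from the snippet in the first comment.
from tink import cleartext_keyset_handle

out = io.BytesIO()
writer = tink.BinaryKeysetWriter(out)
# Write the keyset unencrypted, as a serialized Keyset proto.
cleartext_keyset_handle.write(writer, keyset_handle)
out.seek(0)
print(base64.b64encode(out.read()))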
