Serialise, deserialise; for pickle, deepcopy #72

Closed
DreamingRaven opened this issue Jun 1, 2020 · 22 comments · Fixed by #109
Assignees
Labels
Priority: 2 - High 😰 Should be fixed as quickly as possible, ideally within the current or following sprint Type: New Feature ➕ Introduction of a completely new addition to the codebase

Comments

@DreamingRaven
Contributor

DreamingRaven commented Jun 1, 2020

Feature Description

I feel it is vital to be able to deep-copy objects such as the context, private key, and ciphertext.
Similarly, for real-world use it is vital to be able to pickle and unpickle the context, private key, and ciphertext, or at least to save them to a file-like object rather than necessarily to a file itself.

Is your feature request related to a problem?

I would like to save the objects needed to encrypt, evaluate, and decrypt ciphertext directly to a non-local system such as a database, without writing to the local filesystem, so I need a file-like object that can be stored elsewhere. This requires serialisation and deserialisation, probably via pickle, which currently does not work with the pybind11 bindings here. The same limitation prevents Python from deep-copying any of the TenSEAL objects, which is necessary in certain use cases, such as several workers copying the same object to evaluate or compute some function.
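To make the goal concrete, here is a minimal sketch of the storage side only, assuming the objects can eventually be turned into bytes. It uses an in-memory SQLite database from the standard library as a stand-in for the remote database; the table name, the helper functions `store_blob` and `load_blob`, and the fake payload are all hypothetical and not part of TenSEAL.

```python
import sqlite3


def store_blob(conn, name, payload):
    """Insert or replace a named binary payload in the blobs table."""
    conn.execute(
        "INSERT OR REPLACE INTO blobs (name, payload) VALUES (?, ?)",
        (name, payload),
    )
    conn.commit()


def load_blob(conn, name):
    """Return the payload stored under name, or raise KeyError."""
    row = conn.execute(
        "SELECT payload FROM blobs WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return bytes(row[0])


# In-memory database: nothing touches the local filesystem.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (name TEXT PRIMARY KEY, payload BLOB)")

# Stand-in bytes; a real payload would be a serialised context or ciphertext.
fake_ciphertext = bytes(range(16))
store_blob(conn, "ciphertext-1", fake_ciphertext)
restored = load_blob(conn, "ciphertext-1")
```

The only requirement this places on the library is a bytes round trip, which is exactly what is missing at the moment.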

What alternatives have you considered?

Saving to a file and then loading that file into a database: time-consuming, I/O-intensive, bottlenecked, and not easily scalable.

Additional Context

Here is a unit test showcasing the current inability to pickle and deep-copy, unless I am misunderstanding how it is meant to be done:
test.py

import copy
import unittest


class seal_tests(unittest.TestCase):
    """Unit test class aggregating all tests for the seal class."""

    def get_context_bfv(self):
        import tenseal as ts
        context = ts.context(ts.SCHEME_TYPE.BFV, poly_modulus_degree=4096,
                             plain_modulus=1032193)
        return context

    def test_copy_context_bfv(self):
        context = self.get_context_bfv()
        copy.deepcopy(context)

    def get_keys_bfv(self):
        context = self.get_context_bfv()
        sk = context.secret_key()
        return (context, sk)

    def test_copy_keys_bfv(self):
        context, sk = self.get_keys_bfv()
        copy.deepcopy(sk)

    def get_ciphertext(self):
        import tenseal as ts
        context = self.get_context_bfv()
        plain_vector = [60, 66, 73, 81, 90]
        encrypted_vector = ts.bfv_vector(context, plain_vector)
        return encrypted_vector

    def test_copy_ciphertext(self):
        ciphertext = self.get_ciphertext()
        copy.deepcopy(ciphertext)


if __name__ == "__main__":
    # run all the unit-tests
    unittest.main()

When run with python3 ./test.py:

EEE
======================================================================
ERROR: test_copy_ciphertext (__main__.seal_tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test.py", line 36, in test_copy_ciphertext
    copy.deepcopy(ciphertext)
  File "/usr/lib/python3.8/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_tenseal_cpp.BFVVector' object

======================================================================
ERROR: test_copy_context_bfv (__main__.seal_tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test.py", line 16, in test_copy_context_bfv
    copy.deepcopy(context)
  File "/usr/lib/python3.8/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_tenseal_cpp.TenSEALContext' object

======================================================================
ERROR: test_copy_keys_bfv (__main__.seal_tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test.py", line 25, in test_copy_keys_bfv
    copy.deepcopy(sk)
  File "/usr/lib/python3.8/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_tenseal_cpp.SecretKey' object

----------------------------------------------------------------------
Ran 3 tests in 0.048s

FAILED (errors=3)
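The `reductor(4)` frame in each traceback is `copy.deepcopy` falling back to the pickle protocol via `__reduce_ex__(4)`, which is why a deep-copy failure surfaces as "cannot pickle". The sketch below reproduces that behaviour with pure-Python stand-ins and shows that a bytes round trip plus `__reduce__` is enough to enable both pickle and deepcopy. `OpaqueVector`, `SerialisableVector`, and their `serialize`/`deserialize` methods are hypothetical names for illustration, not the TenSEAL API.

```python
import copy
import json
import pickle


class OpaqueVector:
    """Hypothetical stand-in for a binding object with no pickle support."""

    def __init__(self, values):
        self.values = list(values)

    def __reduce_ex__(self, protocol):
        # Mimics the failure seen in the tracebacks above.
        raise TypeError(f"cannot pickle '{type(self).__name__}' object")


class SerialisableVector:
    """Same data, but able to round-trip through bytes."""

    def __init__(self, values):
        self.values = list(values)

    def serialize(self):
        return json.dumps(self.values).encode("utf-8")

    @classmethod
    def deserialize(cls, data):
        return cls(json.loads(data.decode("utf-8")))

    def __reduce__(self):
        # Tell pickle (and therefore deepcopy) to rebuild from bytes.
        return (type(self).deserialize, (self.serialize(),))


try:
    copy.deepcopy(OpaqueVector([60, 66, 73]))
    opaque_copy_failed = False
except TypeError:
    opaque_copy_failed = True

original = SerialisableVector([60, 66, 73, 81, 90])
deep = copy.deepcopy(original)
unpickled = pickle.loads(pickle.dumps(original))
```

In a pybind11 binding the equivalent hook would live on the C++ side, so this only illustrates the Python-level contract that the tests above exercise.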

@DreamingRaven DreamingRaven added the Type: New Feature ➕ Introduction of a completely new addition to the codebase label Jun 1, 2020
@youben11
Member

youben11 commented Jun 1, 2020

We also believe this is a vital feature, and we aim to provide serialization ASAP. However, it requires planning a serialization strategy and discussing the different possibilities, such as the ones you mentioned above. We can't start on that in the next few days, but it is definitely an upcoming feature.

@DreamingRaven
Contributor Author

DreamingRaven commented Jun 1, 2020

Cheers, thanks for the lightning-quick response.
Glad to hear it was already on your minds. I'm trying to move FHE forward, even if only slightly, in whatever way I can with my limited pybind know-how. Let me know if there is a good way to further this, as I have seen multiple projects battling this very issue, and I'm keen to have this ability for some research projects in particular.

@youben11
Member

youben11 commented Jun 1, 2020

Our serialization would extend SEAL's serialization, but we should also set up a strategy for our TenSEALContext, as it can contain different objects at different times; simpler objects like CKKSVector are pretty straightforward. I haven't dug into the details, but I guess there is also an issue with handling file-like objects across the Python and C++ worlds. We can discuss those challenges in more detail if you are up for looking into them, but you can expect us to have things done during July at the latest.

@DreamingRaven
Contributor Author

Thanks @youben11 for letting me know. I will leave this issue open for now for anything further relating to this. Cheers.

@youben11
Member

youben11 commented Jun 1, 2020

Thanks @DreamingRaven for your input.

@youben11 youben11 added Priority: 2 - High 😰 Should be fixed as quickly as possible, ideally within the current or following sprint Type: New Feature ➕ Introduction of a completely new addition to the codebase and removed Type: New Feature ➕ Introduction of a completely new addition to the codebase labels Jul 5, 2020
@bcebere
Member

bcebere commented Jul 14, 2020

I will work on this one

@bcebere bcebere self-assigned this Jul 14, 2020
@DreamingRaven
Contributor Author

DreamingRaven commented Jul 14, 2020

While hacky, I have created SEAL-to-Python serialisation and deserialisation here, primarily for CKKS, using dictionaries:

https://github.com/DreamingRaven/python-reseal/blob/39f6b250d18d62cbff1b185a10d52f12eac316e9/fhe/reseal.py#L24-L77

and I use them in a meta object here: https://github.com/DreamingRaven/python-reseal/blob/39f6b250d18d62cbff1b185a10d52f12eac316e9/fhe/reseal.py#L173-L232

This allows me to do everything I need, including saving the serialised intermediates to databases, deep-copying, etc. However, since it uses file intermediaries from SEAL, it's a complete hack. I hope it may still be of some help.

NOTE: my pybind11 bindings are likely different from yours, as mine come from huelse's seal-python repository, but if they are consistently named I don't think that should add too much confusion. This repository is still at a very early stage, so not much documentation has been added, sorry.

@DreamingRaven
Contributor Author

DreamingRaven commented Jul 14, 2020

The key thing I have found from doing this is that we don't need to make everything serialisable, only the primitives, parameters, ciphertext, and keys. The rest (workers like the keygen, encoder, etc.) can be regenerated on the fly, as you can see in the above-mentioned meta object ReSeal.

E.g. the context can always be rebuilt consistently from the parameters, so even if the context object is rebuilt every time between generating keys, ciphertext, etc., the output is still properly decryptable. I combine this with a caching scheme to speed things up so we don't have to rebuild it every time, and just discard the cache when it's time to serialise. That way we get the speed of keeping the objects around, plus the ability to store only the minimal subset of objects required to rebuild the rest.
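A minimal sketch of that idea, assuming a pure-Python wrapper: the parameters are the only persistent state, derived helpers are cached lazily, and the cache is dropped in `__getstate__` so pickle and deepcopy only ever see the primitives. The class `ReSealSketch` and its `_build_context` stand-in are hypothetical and much simpler than the linked ReSeal code.

```python
import copy
import pickle


class ReSealSketch:
    """Hypothetical meta object: stores primitives, rebuilds helpers lazily."""

    def __init__(self, poly_modulus_degree, plain_modulus):
        # Primitives: the only state that needs to survive serialisation.
        self.params = {
            "poly_modulus_degree": poly_modulus_degree,
            "plain_modulus": plain_modulus,
        }
        # Derived objects that are cheap to keep but unsafe to pickle.
        self._cache = {}

    @property
    def context(self):
        if "context" not in self._cache:
            self._cache["context"] = self._build_context()
        return self._cache["context"]

    def _build_context(self):
        # Stand-in: a real version would construct the library's context.
        return ("context", tuple(sorted(self.params.items())))

    def __getstate__(self):
        # Discard the cache so only primitives are serialised.
        state = self.__dict__.copy()
        state["_cache"] = {}
        return state


obj = ReSealSketch(4096, 1032193)
first = obj.context  # populates the cache
clone = pickle.loads(pickle.dumps(obj))
deep = copy.deepcopy(obj)
```

The clone starts with an empty cache and rebuilds an equivalent context on first access, which is the trade-off described above: speed while the object is live, minimal state when it is stored.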

EDIT: I realise now that this might be at a higher level than you guys will end up tackling it, but hey, maybe it's something to help.

@youben11
Member

@DreamingRaven Bogdan just made the TenSEAL context serializable using pickle. Once we add serialization for CKKSVector, we will be able to implement client/server apps!

Please let us know how you feel about it :)

@DreamingRaven
Contributor Author

DreamingRaven commented Jul 18, 2020

@youben11 thanks for letting me know. I will give context serialization a spin when I get back to my desk and let you know if there are any issues.

Hopefully this can replace my existing workaround in my research, which would be amazing.

I don't think it's going to be very long before we see FHE used more and more in the field to tackle very sensitive, mostly untouched data. It's just a shame I can't find a way to add bootstrapping to MS-SEAL for arbitrary-depth computation, albeit with added noise, as I believe that is the big barrier to many libraries being used in encrypted deep-learning-as-a-service applications.

@youben11
Member

Indeed, bootstrapping lets us compute circuits of arbitrary depth; however, its computational cost is still impractical as far as I know. Much research has focused on using leveled HE with optimized circuits and batching to implement practical use cases.

Looking forward to your feedback on serialization!

@DreamingRaven
Contributor Author

Hey, apologies for the delay. I tried the same test file from my initial post to see whether the context is now serialisable and deserialisable in Python. However, I have been having some issues with this:

I still get:

ERROR: test_copy_context_bfv (__main__.seal_tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 16, in test_copy_context_bfv
    copy.deepcopy(context)
  File "/usr/local/lib/python3.8/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_tenseal_cpp.TenSEALContext' object

I'm not sure if I have missed something here; I will need to take a closer look to find out why. I am running inside the latest openmined/tenseal Docker container, which I presume is kept up to date with master.
I will try building the container next to double-check.

@youben11
Member

youben11 commented Jul 22, 2020

The latest tag seems to be attached to the last release, which doesn't have the serialization feature; we will look into why that is the case. In the meantime, you can use the image with the latest-py38 tag, which is up to date with master, or build it locally as you suggested.

cc @philomath213

@DreamingRaven
Contributor Author

DreamingRaven commented Jul 22, 2020

Hey @youben11, the serialisation test (for the context) passed after building the Docker container from master. For my own purposes I will later confirm that the serialised objects can be easily attached to HTTP requests and inserted into MongoDB (I don't foresee the latter being an issue). So all good there!

Minor Dockerfile comment:

On a related note, I suspect that the documentation and Dockerfiles are from different times, as the documentation says:

Use Docker
You can use our Docker image for a ready to use environment with TenSEAL installed

$ docker container run --interactive --tty openmined/tenseal
You can also build your custom image, this might be handy for developers working on the project

$ docker image build --tag tenseal .

However, that last command does not use the -f flag to specify the Dockerfile name, which is needed because the files are not named exactly "Dockerfile" but rather Dockerfile-py3x, such as:

docker build -t archer/tenseal . -f Dockerfile-py38

I also noticed that the Dockerfiles attempt to copy the current directory into the container. However, since they are in a subdirectory, if one changes into that directory to run them, only the Dockerfiles themselves are copied and the build fails. I presume they were moved into a subdirectory after the documentation was written. To my understanding, the Docker build context only reaches into subdirectories and not parent directories, meaning the Dockerfiles either need to be moved to the parent directory or need to be invoked with the context set to the top-level directory of the project (I am not too certain of the specific command for that).

So, as a quick fix so I could test, I just copied one to the parent directory and ran the following from that directory:

docker build -t archer/tenseal . -f Dockerfile-py38 && docker run -it archer/tenseal

If I can find the correct invocation to put the Docker context in the top-level directory, I will submit a minor documentation edit so the command works as expected without any tweaks.

@DreamingRaven
Contributor Author

Actually, I may draft a pull request to use a more conventional top-level Dockerfile, .dockerignore, and docker-compose, to see if you prefer that to the current setup. It's not a huge issue either way; it just might throw off people who don't use Docker much.

@philomath213
Member

Thank you @DreamingRaven for your notes. As @youben11 said, the latest tag is attached to the latest release (v0.1.0) with Python 3.8.
If you want the image that tracks the master branch, use the latest-py38 tag.
The documentation on building the Docker image is somewhat outdated; the instruction should be:
docker image build --tag tenseal -f docker-images/Dockerfile-py38 .
Make sure to run the command from the repository root directory; otherwise, you must specify the build context path.
For example, if the working directory is docker-images, the build context path should be .. :
docker image build --tag tenseal -f Dockerfile-py38 ..

@DreamingRaven
Contributor Author

DreamingRaven commented Jul 22, 2020

Hey @philomath213, thanks for getting back to me. I created a minor PR a few minutes ago, #111, just to patch the documentation to use a command similar to the one you listed. I also took the liberty of adding a .dockerignore for good practice.

EDIT: I think I prefer your order of options; I might swap it around in the PR.

@DreamingRaven
Contributor Author

DreamingRaven commented Jul 22, 2020

Let me know when there is any progress on serialization of keys and ciphertext. In the meantime I will create a unit-test PR for verifying this functionality, since I don't feel confident I'd be able to implement it at the C++ level. Thanks for putting up with my ramblings, and for progressing this, guys.

@youben11
Member

Thank you @DreamingRaven! PR #109 is under review and should be merged in the next few days.

@lunan0320

I ran into the same problem when sharing a TenSEAL CKKS context between processes in federated learning. I guess the deepcopy problem is similar.
I solved it by writing the context to a file; other processes then just read the context from that file.
Note, however, that context.serialize() saves only the public context by default. If you want to save the secret key, set the parameter save_secret_key=True in serialize().
Hope this is useful to you.
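A rough sketch of that file-based workaround follows. To keep it runnable without TenSEAL installed, it uses a hypothetical `FakeContext` that only imitates the `serialize(save_secret_key=...)` call mentioned above; the helper names and file names are also made up for illustration. With the real library, the bytes read back would be passed to its context loader (I believe this is `ts.context_from`, but treat that as an assumption to check against the installed version).

```python
import os
import tempfile


class FakeContext:
    """Hypothetical stand-in exposing the serialize() call described above."""

    def __init__(self):
        self.public = b"public-params"
        self.secret = b"secret-key"

    def serialize(self, save_secret_key=False):
        # Public material is always included; the secret key only on request.
        if save_secret_key:
            return self.public + b"|" + self.secret
        return self.public


def write_context(context, path, save_secret_key=False):
    """Serialise the context to bytes and write them to path."""
    data = context.serialize(save_secret_key=save_secret_key)
    with open(path, "wb") as handle:
        handle.write(data)


def read_context_bytes(path):
    """Read serialised context bytes back, e.g. from another process."""
    with open(path, "rb") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as workdir:
    public_path = os.path.join(workdir, "context_public.bin")
    private_path = os.path.join(workdir, "context_private.bin")
    write_context(FakeContext(), public_path)
    write_context(FakeContext(), private_path, save_secret_key=True)
    public_bytes = read_context_bytes(public_path)
    private_bytes = read_context_bytes(private_path)
```

Keeping the two files separate makes it harder to hand a worker the secret key by accident: workers that only evaluate need the public file, and only the party that decrypts needs the private one.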

@Ksieber26

@lunan0320 Hey, I'm currently struggling with a similar problem. Do you also use the FL framework Flower?

@lunan0320

> @lunan0320 Hey, currently struggling with a similar problem. Do you also use the FL framework Flower?

Not yet. I use the torch.multiprocessing package and implemented my FL system with multiple processes. If you have a similar problem, you can try saving the CKKS context to a file, which doesn't affect the Flower framework you are using.


6 participants