Finished release 0.1.4.
erikvw committed May 20, 2015
2 parents f3e16a6 + 26f33be commit 3896cf4
Showing 14 changed files with 143 additions and 77 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -72,16 +72,17 @@ Features
Advantages
----------

- encryption keys are automatically created
- unique constraint on encrypted fields: because the hash is stored in the model's db_table and not the secret, the unique=True parameter works as well as the django.form validation messages.
- de-identified dataset: the data analysis team should never need to see PII. They just want a de-identified dataset. A de-identified dataset is one where PII fields are encrypted and others not. With the RSA key removed, the dataset is effectively deidentified.
- datasets from other systems with shared values, such as identity numbers, can be prepared for meta-analysis using the same keys and algorithms;
- to completely obscure the encrypted data, the secret reference table may be dropped before releasing the database.
- by default field classes exist for two sets of keys. You can customize KEY_FILENAMES to create as many sets as needed. With multiple sets of keys you have more control on who gets to see what.
- Automatically creates encryption key sets (RSA, AES and salt) and stores them in the KEY_PATH folder;
- Supports unique constraints and compound constraints that include encrypted fields. The hash, not the secret, is stored in the model's db_table. The __unique=True__ and __unique_together__ attributes work as expected;
- The dataset is de-identified at rest. This has many advantages and, in particular, helps us work well with our analysis team. The data analysis team does not need to see PII; they just want a de-identified dataset. A de-identified dataset is one where PII fields are encrypted and the others are not. With the RSA keys removed, the dataset is effectively de-identified;
- Datasets from other systems with shared PII values, such as identity numbers, can be prepared for meta-analysis using the same keys and algorithms;
- The dataset can be permanently obscured by dropping the Crypt table from the DB (it has all the secrets);
- By default field classes exist for two sets of keys. You can customize KEY_FILENAMES to create as many sets as needed. With multiple sets of keys you have more control over who gets to see what.

Disadvantages
-------------

- Limited support for lookup types. The "query value" is the hash not the decrypted secret, so Django lookups like ['startswith', 'istartswith', 'endswith', 'iendswith', 'contains', 'icontains', 'iexact'] are not supported.
- Hashing with a secret may be considered less secure than just a "secret". You decide what your requirements are. For systems that collect PII in field classes from _django-crypto-fields_, we take all the basic security precautions: OS and application-level password protection, full-drive encryption, physical security and so on.

Other encrypted field modules are available if you just want to use encrypted field classes in Django models and do not need unique constraints nor plan to join tables on encrypted fields for analysis.
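
The trade-off listed above (unique constraints and exact matches work, partial matches do not) follows from storing a hash in the column instead of the secret. The snippet below is a minimal sketch of that idea, assuming HMAC-SHA256 and a made-up salt; `SALT` and `query_hash` are hypothetical names, and the package's actual hashing scheme and key handling are not shown in this diff.

```python
import hashlib
import hmac

# Hypothetical salt for illustration only; the real package loads its salt from key files.
SALT = b"example-salt"


def query_hash(value):
    """Return a deterministic keyed hash of a plaintext value."""
    return hmac.new(SALT, value.encode("utf-8"), hashlib.sha256).hexdigest()


# What the table column would hold: hashes, never plaintext.
stored_column = [query_hash("Erik1"), query_hash("Erik2")]

# Exact match: hashing the search term reproduces the stored hash.
exact_match = query_hash("Erik1") in stored_column

# Uniqueness: the same plaintext always maps to the same hash,
# so a UNIQUE index on the column can reject duplicates.
is_duplicate = query_hash("Erik2") in stored_column

# Partial match: the hash of a prefix is unrelated to the hash of the full value.
prefix_match = any(h.startswith(query_hash("Erik")) for h in stored_column)
```

Because the search term has to be hashed before it is compared, only lookups that compare whole values can be answered from the column.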
2 changes: 2 additions & 0 deletions django_crypto_fields/classes/__init__.py
@@ -1,2 +1,4 @@
from .cryptor import Cryptor
from .field_cryptor import FieldCryptor

__all__ = ['Cryptor', 'FieldCryptor']
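
Python expects `__all__` to be a list of name strings, and a list of objects breaks `from package import *`. The sketch below demonstrates this with a throwaway in-memory module; `demo_pkg` and the stand-in `Cryptor` class are hypothetical and exist only for this illustration.

```python
import sys
import types

# Build a throwaway module so the example needs no files on disk.
demo = types.ModuleType("demo_pkg")


class Cryptor:
    """Stand-in class; not the package's real Cryptor."""


demo.Cryptor = Cryptor
sys.modules["demo_pkg"] = demo

# A list of objects: the star-import fails with TypeError.
demo.__all__ = [Cryptor]
try:
    exec("from demo_pkg import *", {})
    objects_ok = True
except TypeError:
    objects_ok = False

# A list of name strings: the star-import works.
demo.__all__ = ["Cryptor"]
namespace = {}
exec("from demo_pkg import *", namespace)
```

Direct imports such as `from demo_pkg import Cryptor` are unaffected either way, which is why this kind of mistake can go unnoticed.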
17 changes: 9 additions & 8 deletions django_crypto_fields/classes/field_cryptor.py
@@ -143,20 +143,21 @@ def verify_ciphertext(self, ciphertext):
ValueError('Malformed ciphertext. Expected prefixes {}, {}'.format(HASH_PREFIX, CIPHER_PREFIX))
try:
if ciphertext[:len(HASH_PREFIX)] != HASH_PREFIX.encode(ENCODING):
raise MalformedCiphertextError('Malformed ciphertext. Expected hash prefix {}'.format(HASH_PREFIX))
raise MalformedCiphertextError(
'Malformed ciphertext. Expected hash prefix {}'.format(HASH_PREFIX))
if (len(ciphertext.split(HASH_PREFIX.encode(ENCODING))[1].split(
CIPHER_PREFIX.encode(ENCODING))[0]) != self.hash_size):
raise MalformedCiphertextError('Malformed ciphertext. Expected hash size of {}.'.format(self.hash_size))
raise MalformedCiphertextError(
'Malformed ciphertext. Expected hash size of {}.'.format(self.hash_size))
except IndexError:
raise MalformedCiphertextError('Malformed ciphertext.')
return ciphertext

def get_prep_value(self, ciphertext, value):
""" Gets the hash from encrypted value for the DB """
if ciphertext != value:
self.update_cipher_model(ciphertext)
hashed_value = self.get_hash(ciphertext)
return HASH_PREFIX.encode(ENCODING) + hashed_value
def get_query_value(self, ciphertext):
""" Returns the prefix + hash as stored in the DB's table column.
Used by get_prep_value()"""
return ciphertext.split(CIPHER_PREFIX.encode(ENCODING))[0]

def get_hash(self, ciphertext):
"""Returns the hashed_value given a ciphertext or None."""
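
The hunk above treats a ciphertext as hash prefix + hash + cipher prefix + secret, and `get_query_value` returns everything before the cipher prefix. Here is a standalone sketch of that layout. The prefix values `enc1:::` and `enc2:::` are assumptions for illustration (the real ones come from the package's constants module), and the functions are simplified free functions, not the `FieldCryptor` methods themselves.

```python
ENCODING = "utf-8"
# Assumed prefix values; the real ones are defined in the package's constants.
HASH_PREFIX = "enc1:::"
CIPHER_PREFIX = "enc2:::"


class MalformedCiphertextError(Exception):
    pass


def verify_ciphertext(ciphertext, hash_size):
    """Check the layout: hash prefix + hash + cipher prefix + secret."""
    hash_prefix = HASH_PREFIX.encode(ENCODING)
    cipher_prefix = CIPHER_PREFIX.encode(ENCODING)
    if not ciphertext.startswith(hash_prefix):
        raise MalformedCiphertextError(
            "Malformed ciphertext. Expected hash prefix {}".format(HASH_PREFIX))
    if cipher_prefix not in ciphertext:
        raise MalformedCiphertextError(
            "Malformed ciphertext. Expected cipher prefix {}".format(CIPHER_PREFIX))
    hashed = ciphertext[len(hash_prefix):].split(cipher_prefix)[0]
    if len(hashed) != hash_size:
        raise MalformedCiphertextError(
            "Malformed ciphertext. Expected hash size of {}.".format(hash_size))
    return ciphertext


def get_query_value(ciphertext):
    """Return prefix + hash, i.e. everything before the cipher prefix."""
    return ciphertext.split(CIPHER_PREFIX.encode(ENCODING))[0]


ciphertext = b"enc1:::" + b"a" * 8 + b"enc2:::" + b"secret-bytes"
query_value = get_query_value(verify_ciphertext(ciphertext, hash_size=8))
```

Every failure path here raises explicitly; an exception that is constructed but not raised would let a malformed value pass silently.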
2 changes: 2 additions & 0 deletions django_crypto_fields/classes/keys.py
@@ -1,6 +1,8 @@
import copy
import sys

from builtins import FileNotFoundError

from Crypto.Cipher import PKCS1_OAEP
from Crypto.PublicKey import RSA
from Crypto.Util import number
4 changes: 4 additions & 0 deletions django_crypto_fields/exceptions/__init__.py
@@ -20,5 +20,9 @@ class EncryptionKeyError(Exception):
pass


class EncryptionLookupError(Exception):
pass


class MalformedCiphertextError(Exception):
pass
5 changes: 5 additions & 0 deletions django_crypto_fields/fields/__init__.py
@@ -7,3 +7,8 @@
from .firstname_field import FirstnameField
from .identity_field import IdentityField
from .lastname_field import LastnameField

__all__ = [
'BaseField', 'EncryptedCharField', 'EncryptedDateField', 'EncryptedDecimalField',
'EncryptedIntegerField', 'EncryptedTextField', 'FirstnameField', 'LastnameField',
'IdentityField']
66 changes: 24 additions & 42 deletions django_crypto_fields/fields/base_field.py
@@ -1,10 +1,13 @@
import types

from django.core.exceptions import ValidationError
from django.db import models

from ..classes import FieldCryptor
from ..classes.keys import keys
from ..constants import HASH_PREFIX, ENCODING
from ..exceptions import CipherError, EncryptionError, MalformedCiphertextError
from ..exceptions import CipherError, EncryptionError, MalformedCiphertextError, EncryptionLookupError
from django_crypto_fields.constants import CIPHER_PREFIX


class BaseField(models.Field):
@@ -55,58 +58,37 @@ def decrypt(self, value):
self.readonly = True # did not decrypt
return decrypted_value

def from_db_value(self, value, expression, connection, context):
def from_db_value(self, value, *args):
if value is None:
return value
return self.decrypt(value)

def to_python(self, value):
if value is None:
if value is None or not isinstance(value, (str, bytes)):
return value
return self.decrypt(value)
value = self.decrypt(value)
return super(BaseField, self).to_python(value)

def get_prep_value(self, value, encrypt=None):
""" Returns the hashed_value with prefix (or None) and, if needed, updates the cipher_model.
Keyword arguments:
encrypt -- if False, the value is returned as is (default True)
"""
if value is None:
def get_prep_value(self, value):
"""Returns the query value."""
value = super(BaseField, self).get_prep_value(value)
if value is None or not isinstance(value, (str, bytes)):
return value
encrypt = True if encrypt is None else encrypt
if encrypt:
ciphertext = self.field_cryptor.encrypt(value)
if ciphertext != value:
self.field_cryptor.update_cipher_model(ciphertext)
value = HASH_PREFIX.encode(ENCODING) + self.field_cryptor.get_hash(ciphertext)
return value
ciphertext = self.field_cryptor.encrypt(value)
return self.field_cryptor.get_query_value(ciphertext)

def get_prep_lookup(self, lookup_type, value):
""" Only decrypts the stored value to handle 'exact' and 'in'
but accepts 'icontains' as if it is 'exact' so that the admin
search fields work.
"""Raises an exception for unsupported lookups.
Also, 'startswith' does not decrypt and may only be used to check for the hash_prefix.
All others are errors.
"""
if lookup_type == 'exact' or lookup_type == 'icontains':
return self.get_prep_value(value)
elif lookup_type == 'isnull':
if type(value) != bool:
raise TypeError(('Value for lookup type \'{0}\' must be a boolean '
'for fields using encryption. Got {1}').format(lookup_type, value))
return self.get_prep_value(value, encrypt=False)
elif lookup_type == 'startswith':
# allow to test field value for the hash_prefix only, NO searching on the hash
if value != HASH_PREFIX:
raise TypeError(('Value for lookup type {0} may only be \'{1}\' for '
'fields using encryption.').format(lookup_type,
HASH_PREFIX))
return self.get_prep_value(value, encrypt=False)
elif lookup_type == 'in':
return [self.get_prep_value(v) for v in value]
else:
raise TypeError('Lookup type %r not supported.' % lookup_type)
Since the available value is the hash, only exact match lookup types are supported."""
if lookup_type in {
'startswith', 'istartswith', 'endswith', 'iendswith',
'contains', 'icontains', 'iexact'
}:
raise EncryptionLookupError(
'Unsupported lookup type for field class {}. Got \'{}\'.'.format(
self.__class__.__name__, lookup_type))
return super(BaseField, self).get_prep_lookup(lookup_type, value)

def get_internal_type(self):
"""This is a Charfield as we only ever store the hash, which is a \
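
The new `get_prep_lookup` rejects partial-match lookups up front and leaves everything else to the parent class. The same guard can be sketched without Django as follows; `check_lookup` and `UNSUPPORTED_LOOKUPS` are hypothetical names for this illustration, and the real method delegates to `super().get_prep_lookup()` instead of returning the lookup type.

```python
# Lookups that need the plaintext and therefore cannot work against a stored hash.
UNSUPPORTED_LOOKUPS = {
    "startswith", "istartswith", "endswith", "iendswith",
    "contains", "icontains", "iexact",
}


class EncryptionLookupError(Exception):
    pass


def check_lookup(field_class_name, lookup_type):
    """Raise for unsupported lookups; otherwise return the lookup type unchanged."""
    if lookup_type in UNSUPPORTED_LOOKUPS:
        raise EncryptionLookupError(
            "Unsupported lookup type for field class {}. Got '{}'.".format(
                field_class_name, lookup_type))
    return lookup_type


allowed = [check_lookup("FirstnameField", name) for name in ("exact", "in", "isnull")]

try:
    check_lookup("FirstnameField", "icontains")
    rejected = False
except EncryptionLookupError as exc:
    rejected = True
    message = str(exc)
```

Failing loudly is a deliberate change from the removed code, which treated `icontains` as `exact`: a partial-match query now raises an error where it previously behaved as an exact match.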
2 changes: 2 additions & 0 deletions django_crypto_fields/mixins/__init__.py
@@ -1 +1,3 @@
from .crypto_mixin import CryptoMixin

__all__ = ['CryptoMixin']
2 changes: 2 additions & 0 deletions django_crypto_fields/models/__init__.py
@@ -1 +1,3 @@
from .crypt import Crypt

__all__ = ['Crypt']
1 change: 1 addition & 0 deletions django_crypto_fields/tests/__init__.py
@@ -1 +1,2 @@
from .test_cryptors import TestCryptors
from .test_models import TestModels
5 changes: 3 additions & 2 deletions django_crypto_fields/tests/models.py
@@ -5,13 +5,14 @@
from ..edc.base.models import BaseModel

from ..fields import EncryptedTextField, FirstnameField, IdentityField
from ..mixins import CryptoMixin
from ..mixins.crypto_mixin import CryptoMixin


class TestModel (CryptoMixin, BaseModel):

first_name = FirstnameField(
verbose_name="First Name")
verbose_name="First Name",
null=True)

identity = IdentityField(
verbose_name="Identity",
19 changes: 0 additions & 19 deletions django_crypto_fields/tests/test_fields.py

This file was deleted.

80 changes: 80 additions & 0 deletions django_crypto_fields/tests/test_models.py
@@ -0,0 +1,80 @@
from django.db.utils import IntegrityError
from django.test import TestCase

from ..fields.base_field import BaseField
from ..exceptions import EncryptionLookupError

from .models import TestModel


class TestModels(TestCase):

def test_encrypt_rsa(self):
"""Assert deconstruct."""
test_model = TestModel()
fld_instance = test_model._meta.fields[-1:][0]
name, path, args, kwargs = fld_instance.deconstruct()
new_instance = BaseField(*args, **kwargs)
# self.assertEqual(fld_instance.max_length, new_instance.max_length)

def test_list_encrypted_fields(self):
self.assertEqual(len(TestModel.encrypted_fields()), 3)

def test_blank(self):
TestModel.objects.create(first_name='Erik1', identity='11111111', comment='')
self.assertEqual(1, TestModel.objects.filter(comment='').count())

def test_equals(self):
TestModel.objects.create(first_name='Erik1', identity='11111111', comment='')
self.assertEqual(1, TestModel.objects.filter(first_name='Erik1').count())

def test_null(self):
TestModel.objects.create(identity='11111111', comment='no comment')
self.assertEqual(1, TestModel.objects.filter(first_name__isnull=True).count())

def test_exact(self):
TestModel.objects.create(first_name='Erik1', identity='11111111', comment='')
self.assertEqual(1, TestModel.objects.filter(first_name__exact='Erik1').count())

def test_iexact(self):
TestModel.objects.create(first_name='Erik1', identity='11111111', comment='')
# self.assertEqual(1, TestModel.objects.filter(first_name__iexact='Erik1').count())
self.assertRaises(EncryptionLookupError, TestModel.objects.filter, first_name__iexact='Erik1')

def test_contains(self):
TestModel.objects.create(first_name='Erik1', identity='11111111', comment='')
# self.assertEqual(1, TestModel.objects.filter(first_name__contains='k1').count())
self.assertRaises(EncryptionLookupError, TestModel.objects.filter, first_name__contains='k1')

def test_icontains(self):
TestModel.objects.create(first_name='Erik1', identity='11111111', comment='')
# self.assertEqual(1, TestModel.objects.filter(first_name__icontains='k1').count())
self.assertRaises(EncryptionLookupError, TestModel.objects.filter, first_name__icontains='k1')

def test_in(self):
TestModel.objects.create(first_name='Erik1', identity='11111111', comment='no comment')
TestModel.objects.create(first_name='Erik2', identity='11111112', comment='no comment')
TestModel.objects.create(first_name='Erik3', identity='11111113', comment='no comment')
TestModel.objects.create(first_name='Erik4', identity='11111114', comment='no comment')
self.assertEqual(2, TestModel.objects.filter(first_name__in=['Erik1', 'Erik2']).count())

def test_unique(self):
TestModel.objects.create(first_name='Erik1', identity='11111111', comment='no comment')
TestModel.objects.create(first_name='Erik2', identity='11111112', comment='no comment')
self.assertRaises(IntegrityError, TestModel.objects.create, first_name='Erik1', identity='11111111', comment='no comment')

def test_startswith(self):
TestModel.objects.create(first_name='Eriak1', identity='11111111', comment='no comment')
TestModel.objects.create(first_name='Eriak2', identity='11111112', comment='no comment')
TestModel.objects.create(first_name='Eriek3', identity='11111113', comment='no comment')
TestModel.objects.create(first_name='Eriek4', identity='11111114', comment='no comment')
# self.assertEqual(2, TestModel.objects.filter(first_name__startswith='Eria').count())
self.assertRaises(EncryptionLookupError, TestModel.objects.filter, first_name__startswith='Eria')

def test_endswith(self):
TestModel.objects.create(first_name='Eriak1', identity='11111111', comment='no comment')
TestModel.objects.create(first_name='Eriak2', identity='11111112', comment='no comment')
TestModel.objects.create(first_name='Eriek3', identity='11111113', comment='no comment')
TestModel.objects.create(first_name='Eriek4', identity='11111114', comment='no comment')
# self.assertEqual(1, TestModel.objects.filter(first_name__endswith='ak2').count())
self.assertRaises(EncryptionLookupError, TestModel.objects.filter, first_name__endswith='ak2')
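
The tests above use the callable form of `assertRaises`, passing the function and its keyword arguments separately. The same check can be written with the context-manager form, which some readers find easier to scan. The sketch below runs without Django; `fake_filter` is a hypothetical stand-in for `TestModel.objects.filter` and only mimics the rejection of `startswith`.

```python
import unittest


class EncryptionLookupError(Exception):
    pass


def fake_filter(**kwargs):
    """Hypothetical stand-in for TestModel.objects.filter."""
    for key in kwargs:
        if key.endswith("__startswith"):
            raise EncryptionLookupError(key)
    return []


class LookupTests(unittest.TestCase):

    def test_callable_form(self):
        # Same shape as the tests in the diff.
        self.assertRaises(
            EncryptionLookupError, fake_filter, first_name__startswith="Eria")

    def test_context_manager_form(self):
        # Equivalent check, with the call written as normal code.
        with self.assertRaises(EncryptionLookupError):
            fake_filter(first_name__startswith="Eria")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(LookupTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```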
2 changes: 2 additions & 0 deletions django_crypto_fields/utils/__init__.py
@@ -1 +1,3 @@
from .key_generator import KeyGenerator

__all__ = ['KeyGenerator']
