# openp

this notebook calls the open-pseudonymiser java crypto library from jupyter, using pyjnius

## dependencies

### java

as installed by [scoop](https://scoop.sh)

```sh
scoop bucket add java
scoop install java/zulu-jdk
```

### ant

if you want to build the jar (hint: you should be able to base your tests on the current codebase, so it's worth building the library jar)

```sh
scoop install main/ant
```

... then simply run 

```sh
ant
```

### pyjnius

a python library that uses JNI (java native interface) to use java code

```sh
pip install pyjnius
```

## code

### python imports

In [1]:
import os
from pathlib import Path
# pip install pyjnius
import jnius_config


### getting OpenPseudonymiserCryptoLib.jar

you can either build OpenPseudonymiserCryptoLib.jar from the codebase by running

```sh
ant
```

... or download it

In [2]:
def set_java_lib(jar_path):
  """set location of OpenPseudonymiserCryptoLib.jar 
  note - you will need to have built (or downloaded) it first

  Args:
      jar_path (string): path to lib jar

  Returns:
      string: location
  """  
  jnius_config.set_classpath(".", jar_path)
  return jar_path
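
`set_java_lib` does not check that the jar is actually there, so a missing build only shows up later when the java class fails to load. A minimal sketch of a pre-flight check, assuming the jar ends up in a `dist` folder (the folder the test cell puts on the classpath). `find_jars` is a hypothetical helper name, not part of pyjnius or OpenPseudonymiser:

```python
from pathlib import Path


def find_jars(dist_dir):
    """Return the .jar files in dist_dir as strings, or raise if there are none.

    Hypothetical helper for illustration only.
    """
    jars = sorted(Path(dist_dir).glob("*.jar"))
    if not jars:
        raise FileNotFoundError(
            f"no .jar files in {dist_dir}: run ant or download the jar first"
        )
    return [str(jar) for jar in jars]
```

one of the returned paths could then be handed to `set_java_lib`, so a missing jar fails early with a readable message.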

### import the `crypto` lib from the jar

In [3]:
def import_java_crypto():
  """import the crypto lib from java
  """
  from jnius import autoclass
  Crypto = autoclass("OpenPseudonymiser.Crypto")
  crypto = Crypto()
  return crypto

### use encrypted salt

give the crypto lib an encryption salt that has itself been encrypted

In [4]:
def set_encrypted_salt(salt_path, crypto):
  """give the crypto object an encrypted salt, usually ./mackerel.EncryptedSalt

  Args:
      salt_path (string): path to encrypted salt
      crypto (object): the java crypto class

  Returns:
      object: the java crypto class
  """
  from jnius import autoclass
  File = autoclass("java.io.File")
  file = File(salt_path)
  crypto.SetEncryptedSalt(file)
  return crypto

### test

this test builds a small known record and checks that digesting it, with a salt that was encrypted by someone else (a trusted keyholder), produces the expected digest

In [5]:
def test_crypto(crypto):
  """check that a known record produces the expected digest

  Args:
      crypto (object): the java crypto class, with its salt already set

  Returns:
      tuple: "test passed" or "test failed", plus the digest produced
  """
  from jnius import autoclass
  TreeMap = autoclass("java.util.TreeMap")
  treeMap = TreeMap()
  treeMap.put("DOB", "29.11.1973")
  treeMap.put("NHSNumber", "943 476 5919")

  # test encryption matches expected outcome
  expected = "ED72F814B7905F3D3958749FA90FE657C101EC657402783DB68CBE3513E76087"
  digest = crypto.GetDigest(treeMap)
  if digest == expected:
      return ("test passed", digest)
  return ("test failed", digest)
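
the idea behind this test is that a salted digest is deterministic: the same record and the same salt always give the same output, so a digest computed once by the keyholder can be checked anywhere. A minimal sketch of that property using only `hashlib`. `toy_digest` and both salts are made up for illustration, and the way it combines fields and salt is an assumption, not OpenPseudonymiser's actual algorithm:

```python
import hashlib


def toy_digest(fields, salt):
    """Illustrative salted SHA-256 digest (NOT the OpenPseudonymiser algorithm)."""
    # sort the keys so the order the fields were added in does not matter,
    # much as a java TreeMap keeps its keys sorted
    joined = "".join(fields[key] for key in sorted(fields))
    return hashlib.sha256((joined + salt).encode("utf-8")).hexdigest().upper()


record = {"DOB": "29.11.1973", "NHSNumber": "943 476 5919"}
reordered = dict(reversed(list(record.items())))

same_salt_a = toy_digest(record, "made-up salt")
same_salt_b = toy_digest(reordered, "made-up salt")
other_salt = toy_digest(record, "a different salt")

print(same_salt_a == same_salt_b)  # True: same record, same salt
print(same_salt_a == other_salt)   # False: same record, different salt
```

this is why the expected digest can be hard-coded in the test: it only matches when both the record and the salt are the ones the keyholder used.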


### run the test

In [6]:
# set crypto lib
jar_path = os.path.join(
    Path(os.getcwd()).absolute()
    , "dist"
    , "*"
)
try:
    # only needed to run this once
    set_java_lib(jar_path)
except ValueError:
    # jnius_config refuses to change the classpath once the JVM is running,
    # which is what happens when this cell is re-run
    pass

# get crypto class
crypto = import_java_crypto()  

# set salt
salt_path = "./mackerel.EncryptedSalt"
set_encrypted_salt(salt_path, crypto)

# some test data
print(test_crypto(crypto))


('test passed', 'ED72F814B7905F3D3958749FA90FE657C101EC657402783DB68CBE3513E76087')


### encrypt columns with PII

In [7]:
# create a sample DataFrame
import pandas as pd
from jnius import autoclass

data: dict[str, list[str]] = {
    'upn': ['A012345678901','A012345678902'],
    'school': ['school x', 'school y'],
    'forename': ['forname x', 'forname y'],
    'surname': ['surname x', 'surnme y'],
    'door_number': ['1a', '2b'],
    'postcode': ['FO0 8AR', 'FO0 8AZ']
}
df = pd.DataFrame(data)

def encrypt_dataframe(df, salt, columns_to_encrypt):
    """replace the values in the given columns with their digests
    note - this modifies df in place and returns it

    Args:
        df (DataFrame): the data to pseudonymise
        salt (string): path to encrypted salt
        columns_to_encrypt (list): names of the columns holding PII

    Returns:
        DataFrame: the same df, with the listed columns digested
    """
    try:
        # only needed to run this once
        set_java_lib(jar_path)
    except ValueError:
        # the classpath can't be changed once the JVM is running
        pass

    # instantiate a crypto
    crypto = import_java_crypto()
    # give it a salt ...an encrypted salt
    set_encrypted_salt(salt, crypto)
    TreeMap = autoclass("java.util.TreeMap")

    # convert a key value pair to a TreeMap and encrypt it
    def encrypt_field(key, val):
        field = TreeMap()
        field.put(key, val)
        return crypto.GetDigest(field)

    # cycle through the df and encrypt required fields
    for col in columns_to_encrypt:
        df[col] = df[col].apply(lambda x: encrypt_field(col, x))
    return df

ed = encrypt_dataframe(df, salt="./mackerel.EncryptedSalt", columns_to_encrypt = ['upn','forename','surname', 'door_number'])
ed.head()


| | upn | school | forename | surname | door_number | postcode |
|---|---|---|---|---|---|---|
| 0 | C28751DE095AED7F05DF04D09FF596827D6D8B5E8F8715... | school x | 4D228D5303398735BF3887FA107C2A39D6456166053D82... | A2B570BD5BA407D6D3100CA7637E84D0CEC3163322413D... | 9BC76C53885BB2515773FC29B7BEB19BFCCDA241F17F79... | FO0 8AR |
| 1 | 6E55FDB4F05886727797BFC5B97A4EC0F3048881815DFD... | school y | 9F5EEB0D7D9B4D19AB1418D2CBEABE08EE3F4013FE56AC... | CC1E5AE140716C0CC1C67DBAF3DC2588888C5800537B8C... | 342E4977E7102ED61AA227BF38F593875A5A68F3D8E7B0... | FO0 8AZ |
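
`encrypt_field` crosses into the JVM once per cell, which can add up on columns with many repeated values (a school name, say). A minimal sketch of memoising those calls with `functools.lru_cache`. `slow_digest` is a hypothetical stand-in for `encrypt_field` so the example runs without java, and its hashing is illustrative only:

```python
import hashlib
from functools import lru_cache

calls = 0  # counts how often the expensive function really runs


def slow_digest(key, val):
    """Hypothetical stand-in for encrypt_field (NOT the real digest)."""
    global calls
    calls += 1
    return hashlib.sha256(f"{key}={val}".encode("utf-8")).hexdigest().upper()


@lru_cache(maxsize=None)
def cached_digest(key, val):
    # identical (key, val) pairs are answered from the cache
    return slow_digest(key, val)


values = ["school x", "school y", "school x", "school x"]
digests = [cached_digest("school", v) for v in values]

print(calls)  # 2: one real call per distinct value
```

the cache relies on the arguments being hashable, which holds for the string values used in this notebook. Whether it is worth it depends on how many duplicates a column has.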
