Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
Java
Latest commit b0eece5 Feb 9, 2017 @ellisjoe ellisjoe committed on GitHub manual wire serialization (#44)

Hadoop Crypto

Hadoop Crypto is a library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3. It provides wrappers for the Hadoop FileSystem API that transparently encrypt and decrypt the underlying streams. The encryption algorithm uses Key Encapsulation: each file is encrypted with a unique symmetric key, which is itself secured with a public/private key pair and stored alongside the file.
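The key encapsulation idea described above can be sketched with the JDK's own `javax.crypto` classes. This is a minimal illustration, not the library's code: the class name `KeyEncapsulationSketch`, its helper methods, the 128-bit AES key size, and the OAEP padding used to wrap the key are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical sketch of key encapsulation; not part of hadoop-crypto. */
final class KeyEncapsulationSketch {
    // Assumed wrapping transformation; the library may use a different one.
    private static final String WRAP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    /** Long-lived asymmetric key pair that protects every per-file key. */
    static KeyPair newKeyPair() throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    /** Fresh symmetric key, generated once per file. */
    static SecretKey newFileKey() throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    /** Encrypts or decrypts with AES/CTR, depending on the Cipher mode constant. */
    static byte[] aesCtr(int mode, SecretKey key, byte[] iv, byte[] input) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, key, new IvParameterSpec(iv));
        return cipher.doFinal(input);
    }

    /** Secures the per-file key with the public key. */
    static byte[] wrap(SecretKey fileKey, PublicKey publicKey) throws Exception {
        Cipher rsa = Cipher.getInstance(WRAP);
        rsa.init(Cipher.WRAP_MODE, publicKey);
        return rsa.wrap(fileKey);
    }

    /** Recovers the per-file key with the private key. */
    static SecretKey unwrap(byte[] wrapped, PrivateKey privateKey) throws Exception {
        Cipher rsa = Cipher.getInstance(WRAP);
        rsa.init(Cipher.UNWRAP_MODE, privateKey);
        return (SecretKey) rsa.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    public static void main(String[] args) throws Exception {
        KeyPair pair = newKeyPair();
        SecretKey fileKey = newFileKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        byte[] plain = "test".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = aesCtr(Cipher.ENCRYPT_MODE, fileKey, iv, plain);
        byte[] wrappedKey = wrap(fileKey, pair.getPublic()); // stored alongside the file

        SecretKey recovered = unwrap(wrappedKey, pair.getPrivate());
        byte[] decrypted = aesCtr(Cipher.DECRYPT_MODE, recovered, iv, encrypted);
        System.out.println("round trip ok: " + Arrays.equals(plain, decrypted));
    }
}
```

Only the small wrapped key needs the slow asymmetric operation; the file contents themselves go through the symmetric cipher.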

Architecture

The EncryptedFileSystem wraps any FileSystem implementation and transparently encrypts the streams returned by create and decrypts the streams returned by open. Each file is encrypted and decrypted with its own unique symmetric key, which is handed to the KeyStorageStrategy so that it can be stored for future access. The provided storage strategy implementation encrypts the symmetric key using a public/private key pair and then stores the encrypted key on the FileSystem alongside the encrypted file.
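The wrapping pattern described above can be illustrated without Hadoop on the classpath. The sketch below is an assumption-laden stand-in, not the library's implementation: `EncryptedStreamSketch`, `KeyMaterial`, and the two in-memory maps (one playing the backing FileSystem, one playing the key storage) are hypothetical names, and the per-file key is kept unwrapped here for brevity where a real strategy would first encrypt it with the public key.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical in-memory model of a stream-wrapping file system. */
final class EncryptedStreamSketch {
    /** Per-file key plus IV; a stand-in for whatever the real library persists. */
    static final class KeyMaterial {
        final SecretKey key;
        final byte[] iv;
        KeyMaterial(SecretKey key, byte[] iv) { this.key = key; this.iv = iv; }
    }

    private final Map<String, ByteArrayOutputStream> files = new HashMap<>(); // backing "FileSystem"
    private final Map<String, KeyMaterial> keys = new HashMap<>();            // key "storage strategy"

    private static Cipher cipher(int mode, KeyMaterial km) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, km.key, new IvParameterSpec(km.iv));
        return c;
    }

    /** Generates a fresh key for the path, stores it, and returns an encrypting stream. */
    OutputStream create(String path) throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        KeyMaterial km = new KeyMaterial(gen.generateKey(), iv);
        keys.put(path, km);
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        files.put(path, raw);
        return new CipherOutputStream(raw, cipher(Cipher.ENCRYPT_MODE, km));
    }

    /** Looks up the stored key for the path and returns a decrypting stream. */
    InputStream open(String path) throws GeneralSecurityException {
        InputStream raw = new ByteArrayInputStream(rawBytes(path));
        return new CipherInputStream(raw, cipher(Cipher.DECRYPT_MODE, keys.get(path)));
    }

    /** Bytes as they sit on the backing store, i.e. the ciphertext. */
    byte[] rawBytes(String path) {
        return files.get(path).toByteArray();
    }

    /** Drains a stream fully into a byte array and closes it. */
    static byte[] readAll(InputStream in) throws IOException {
        try (InputStream src = in; ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            for (int n = src.read(buf); n != -1; n = src.read(buf)) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Callers only ever see ordinary `InputStream` and `OutputStream` objects, which is what lets the wrapper sit in front of any backing store.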

Standalone Example

The hadoop-crypto-all.jar can be added to the classpath of any client and used to wrap any concrete backing FileSystem. The scheme of the EncryptedFileSystem is e[FS-scheme], where [FS-scheme] is the scheme of any FileSystem that can be instantiated statically using FileSystem#get (e.g. efile for the local file scheme). The FileSystem implementation, public key, and private key must also be configured in core-site.xml.
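To make the e[FS-scheme] naming concrete, the following sketch maps an encrypted URI back to the URI of its backing FileSystem by dropping the leading `e`. The class `SchemeSketch` and method `backingUri` are hypothetical, and whether the library performs the mapping exactly this way is an assumption.

```java
import java.net.URI;

/** Hypothetical helper illustrating the e[FS-scheme] convention. */
final class SchemeSketch {
    /** Turns efile:/tmp/x into file:/tmp/x, es3a://b/k into s3a://b/k, and so on. */
    static URI backingUri(URI encrypted) {
        String scheme = encrypted.getScheme();
        if (scheme == null || scheme.length() < 2 || scheme.charAt(0) != 'e') {
            throw new IllegalArgumentException("Not an e[FS-scheme] URI: " + encrypted);
        }
        // Keep everything after the scheme untouched; only the scheme changes.
        String rest = encrypted.toString().substring(scheme.length());
        return URI.create(scheme.substring(1) + rest);
    }

    public static void main(String[] args) {
        System.out.println(backingUri(URI.create("efile:/tmp/file.txt")));
    }
}
```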

Hadoop CLI

Add hadoop-crypto-all.jar to the classpath of the CLI (e.g. share/hadoop/common).

Generate public/private keys
openssl genrsa -out rsa.key 2048
# Public Key
openssl rsa -in rsa.key -outform PEM -pubout 2>/dev/null | grep -v PUBLIC | tr -d '\r\n'
# Private Key
openssl pkcs8 -topk8 -inform pem -in rsa.key -outform pem -nocrypt | grep -v PRIVATE | tr -d '\r\n'
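If openssl is not available, the same kind of single-line Base64 values can be produced with the JDK alone. This is a sketch under the assumption that a 2048-bit RSA pair is wanted, as in the openssl commands above; the class name `GenerateKeysSketch` is made up for the example. For RSA keys from the default provider, `getEncoded()` yields X.509 bytes for the public key and PKCS#8 bytes for the private key.

```java
import java.security.Key;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical alternative to the openssl commands, using only the JDK. */
final class GenerateKeysSketch {
    static KeyPair generate() throws NoSuchAlgorithmException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    /** Base64 of the key's standard encoding, on one line with no PEM header or footer. */
    static String encode(Key key) {
        return Base64.getEncoder().encodeToString(key.getEncoded());
    }

    public static void main(String[] args) throws Exception {
        KeyPair pair = generate();
        System.out.println("# Public Key (" + pair.getPublic().getFormat() + ")");
        System.out.println(encode(pair.getPublic()));
        System.out.println("# Private Key (" + pair.getPrivate().getFormat() + ")");
        System.out.println(encode(pair.getPrivate()));
    }
}
```

The private key printed here is unencrypted, just like the output of the `-nocrypt` openssl command, so it should be handled accordingly.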
core-site.xml
<configuration>
    <property>
        <name>fs.efile.impl</name> <!-- others: fs.es3a.impl or fs.ehdfs.impl -->
        <value>com.palantir.hadoop.StandaloneEncryptedFileSystem</value>
    </property>

    <property>
        <name>fs.efs.key.public</name>
        <value>MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAqXkSOcB2UpLrlG3scAHDavPnSucxOwRWG12woY5JerYlqyIm7xcNuyLQ/rLPxdlCGgOZOoPzKVXc/3pAeOdPM1LcXLNW8d7Uht3vo7a6SR/mXMiCTMn+9wOx40Bq0ofvx9K4RSpW2lKrlJNUJG+RP5lO7OhB5pveEBMn/8OR2yMLgS58rHQ0nrXXUHqbWiMI8k+eYK7aimexkQDhIXtbqmQ5tAXKyoSMDAyeuDNY8WsYaW15OCwGSIRClNAiwPEGLQCYJQi41IxwQxwN42jQm7fwoVSrN4lAfi5B8EHxFglAZcE8nUTdTnXCbUk9SPz8XXmK4hmK9X4L+2Av4ucNLwIDAQAB</value>
    </property>

    <property>
        <name>fs.efs.key.private</name>
        <value>MIIEvAIBADANBgkqhkiG9w0BAQEFAASCBKYwggSiAgEAAoIBAQCpeRI5wHZSkuuUbexwAcNq8+dK5zE7BFYbXbChjkl6tiWrIibvFw27ItD+ss/F2UIaA5k6g/MpVdz/ekB4508zUtxcs1bx3tSG3e+jtrpJH+ZcyIJMyf73A7HjQGrSh+/H0rhFKlbaUquUk1Qkb5E/mU7s6EHmm94QEyf/w5HbIwuBLnysdDSetddQeptaIwjyT55grtqKZ7GRAOEhe1uqZDm0BcrKhIwMDJ64M1jxaxhpbXk4LAZIhEKU0CLA8QYtAJglCLjUjHBDHA3jaNCbt/ChVKs3iUB+LkHwQfEWCUBlwTydRN1OdcJtST1I/PxdeYriGYr1fgv7YC/i5w0vAgMBAAECggEASvSLhROEwbzNiRadLmT5Q4Kg19YtRgcC9pOXnbzK7wVE3835HmI55nzdpuj7UGxo+gyBZwoZMD0Tw8MUZOUZeH+7ixye5ddCdGwQo34cIl+DiaH9T20/4Yy2zuYc2QTanqyqZ5z0URejX9FRs9PMkC6EY+/NxetGaiGu3UZoalz7F/5wS8bCaKPkm3AjLvqXHL5KiSbPDPBQj4m+iFWLoWZL9FB1zyif+yBatU4cBCLHaTTgXroItEKcxTwFfyi2l059ItoP5E10djKHpMuPiPrTMS0FHAom3GZAYEFnjRgInR0sIotEwuSDObqcio1PdXRsi5Ul8MxfpXxLSuL+UQKBgQDcvmehBARNDksQJGzIyegKg10eLYdfXFCR+QDZeqJod/pCQ6gtW0aFYAoL0uXiMwQzSb6m7offmXH0JLLqOnjgcZlejHUDSTTWtNOYlGaO7OVgFcnG6/UnCE54eJcaw68auvPB9XW3gm5cfWSNpUI+6aJDBb6BKx8uNMoRreq9wwKBgQDEilhsCgUOIRkJfM5MYUzMT0gR8qt671q+lgTjBDwYvdoQ7BijG6Lbqbp9Xd4nODiw1t7e1Rexw+cuIeRs8NITU4f4Nfe25rRhZ+0n7g9OoCiRUoEsmd7cqDk6pubpw9hW1TKKLzTqExisGFy+bnUA8FFs2TbU9Xeb9kdm1GXgJQKBgAsN9f6YRubc+mFakaAUjGxKW9VxDkB2TQqiX6qEe7GjoILFBJ0Q3x06zAX/j8eeKm2vGb8eXuuRsaU6WUNlnjwPNFEJ06pQdjbyY05W0DQEJRCExtARbPuBbPyXfWm3twMtrZtfAYApJgG3vdtiFUk1Rgz5MqshT7RurFfqT8ElAoGAE2BEOVp/hxYSPtI0EGmjRZ0nUMWozDTesF1f2/Wl6xaEchikkSf/VUKVZRik9x7ez+hPDo7ZiCf1GaIzv926CDe69uhzJG/4JoY1ZjNdBPZbKYCFxZzh0MUw5yxfJXquUFkyY1cmE1GQpB6+vfNry4zlqiJ7+mC8yv5rqaKU7JUCgYBXPYpuQppR1EFj66LSrZ8ebXmt5TtwR839UkgEhLOBkO0cFP2BXVAMx9p0/MYLNIPk7vVpVtRCKYr6tBVdUWCin0obC5O+JzuhilQ0aH3xl5mbiasOvCNPjniaTViRt6zNlaq6RMS4x1LqYUyqc4LUrBbGMWJsdjYqVAi1Rq1FTw==</value>
    </property>
</configuration>
Commands
./bin/hadoop fs -put file.txt efile:/tmp/file.txt
./bin/hadoop fs -ls efile:/tmp
./bin/hadoop fs -cat efile:/tmp/file.txt

Programmatic Example

Source for examples can be found here

Initialization

KeyPair pair = KeyPairs.generateKeyPair(); // Long lived KeyPair that must be saved
FileSystem fs = FileSystem.get(new URI("file:///"), new Configuration());
KeyStorageStrategy keyStore = new FileKeyStorageStrategy(fs, pair);
FileSystem efs = new EncryptedFileSystem(fs, keyStore);

Writing and reading data using EFS

// Init data and local path to write to
byte[] data = "test".getBytes(StandardCharsets.UTF_8);
byte[] readData = new byte[data.length];
Path path = new Path(folder.newFile().getAbsolutePath());

// Write data out to the encrypted stream
OutputStream eos = efs.create(path);
eos.write(data);
eos.close();

// Reading through the decrypted stream produces the original bytes
try (InputStream ein = efs.open(path)) {
    IOUtils.readFully(ein, readData);
}
assertThat(data, is(readData));

// Reading through the raw stream produces the encrypted bytes
try (InputStream in = fs.open(path)) {
    IOUtils.readFully(in, readData);
}
assertThat(data, is(not(readData)));

// Wrapped symmetric key is stored next to the encrypted file
assertTrue(fs.exists(new Path(path + FileKeyStorageStrategy.EXTENSION)));

Hadoop Configuration Properties

| Key | Value | Default |
|-----|-------|---------|
| fs.cipher | The cipher used to wrap the underlying streams. | AES/CTR/NoPadding |
| fs.e[FS-scheme].impl | Must be set to com.palantir.hadoop.StandaloneEncryptedFileSystem | |
| fs.efs.key.public | Base64 encoded X509 public key | |
| fs.efs.key.private | Base64 encoded PKCS8 private key | |
| fs.efs.key.algorithm | Public/private key pair algorithm | RSA |
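For readers who want to check that their configured key strings are well formed, the sketch below decodes values in the formats listed above using standard JDK classes. `KeyDecodingSketch` and its method names are hypothetical, and this is not necessarily how the library parses the properties.

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;

/** Hypothetical validator for fs.efs.key.public and fs.efs.key.private values. */
final class KeyDecodingSketch {
    /** Decodes a Base64 encoded X509 public key; algorithm is e.g. "RSA". */
    static PublicKey publicKey(String base64, String algorithm) throws GeneralSecurityException {
        byte[] der = Base64.getDecoder().decode(base64);
        return KeyFactory.getInstance(algorithm).generatePublic(new X509EncodedKeySpec(der));
    }

    /** Decodes a Base64 encoded PKCS8 private key; algorithm is e.g. "RSA". */
    static PrivateKey privateKey(String base64, String algorithm) throws GeneralSecurityException {
        byte[] der = Base64.getDecoder().decode(base64);
        return KeyFactory.getInstance(algorithm).generatePrivate(new PKCS8EncodedKeySpec(der));
    }
}
```

A malformed value surfaces as an `IllegalArgumentException` from the Base64 decoder or an `InvalidKeySpecException` from the `KeyFactory`, which makes copy-and-paste mistakes easier to spot before the file system is first used.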

License

This repository is made available under the Apache 2.0 License.