Protocols for Private STR Queries

This library contains a proof-of-concept implementation of our protocol for performing private STR queries. This implementation is a research prototype and is not intended for use in production environments.

Paper Reference

This is an implementation of the protocols from the following paper:

Blindenbach, J.A., Jagadeesh, K.A., Bejerano, G., Wu, D.J. Avoiding genetic racial profiling in criminal DNA profile databases. Nat Comput Sci 1, 272–279 (2021). https://doi.org/10.1038/s43588-021-00058-3

The full text is available at https://rdcu.be/cjq70.

Please contact Jacob Blindenbach (jab7dq@virginia.edu) if you have questions about this code.

System Requirements

The provided code was tested on Ubuntu 18.04 LTS and was tested with the following compiler and interpreters:

python 3.6.9
clang 6.0.0 (x86-64)

There are no external dependencies other than the standard Python/C++ libraries. The implementation relies on the Intel AES-NI instruction set, which is supported by most modern Intel processors (since the Westmere architecture).

Compiling the Code

To compile the code, simply run make in the project root directory.

Local Benchmark

After compiling the code, to run a local benchmark, use the following command:

python3 run.py -N DATABASE_SIZE -config CONFIG_FILE -in IN_DATABASE -p PORT -local LOCAL

DATABASE_SIZE: Specifies the size of the database the secure query will be performed on.
CONFIG_FILE: Specifies the STR configuration. For a list of configurations supported by this sample implementation, see the configs directory.
IN_DATABASE (optional parameter): Specifies if the simulated query will be in the database or not. 0 denotes the record is not in the database, while 1 denotes that the record is in the database (defaults to 1).
PORT (optional parameter): Specifies port used for client/server communication (defaults to 8000).
LOCAL (optional parameter): Specifies whether to run a local benchmark (client/server on the same machine) or a networked benchmark (defaults to 1). If a networked benchmark is requested, the script will generate the data files and the OT correlations for the client and the server. Refer to the section on Client/Server Benchmarking for further details on running the protocol over the network.

Example

python3 run.py -N 100000 -config ./configs/US-Current.config -in 1

This script will generate a synthetic database with 100000 STR records according to the current US standard. It will also generate a sample query that matches one of the records within the database. The script will then start a local client and server instance to execute the protocol (using the default port 8000 for all communication).

Example Output

Sample output for an example run of the above script:

python3 run.py -N 100000 -config ./configs/US-Current.config -in 1

Collecting protocol parameters
Generating test case
Number of OT correlations: 11600000
Query is in database at location: 24520
Running local test
[client] successfully connected to server
[client] starting client protocol
[client] found a match at index 24520
[client] protocol execution completed
protocol execution time: 0.848771 seconds
total bytes sent: 4475000
total bytes received: 13600000
total bytes communicated (received + sent): 18075000

Note that the data generation script generates a random database and query, so the exact parameters will vary from run to run.

Client/Server Benchmarking

To run a benchmark in a networked environment with separate client/server instances, proceed as follows:

Preprocessing: First, run the run.py script with the local flag turned off:

python3 run.py -N DATABASE_SIZE -config CONFIG_FILE -in IN_DATABASE -local 0

Copy the data/client directory to the client instance and the data/server directory to the server instance. The data/client directory contain the precomputed client OT correlations as well as the query while the data/server directory contain the precomputed server OT correlations and the database.
Copy the data/info.txt file to the client and server instance. This file contains the protocol arguments (database size and STR configuration information) that will be used during execution
Server: Start the server by running on the server
```
./tests/ProtocolServer INFO_FILE SERVER_DATA_DIR PORT
```
In this case server will start listening on the specified PORT. For example,
```
./tests/ProtocolServer ./data/info.txt ./data/server/ 8000
```
Client: Start the client using
```
./tests/ProtocolClient INFO_FILE CLIENT_DATA_DIR SERVER_IP_ADDDRESS PORT
```
The client will connect to the server at address SERVER_IP_ADDDRESS using PORT. Once the server accepts the connection, the client and server will initiate the protocol using the data in the data/client and data/server directories, respectively. For example,
```
./tests/ProtocolClient ./data/info.txt ./data/client/ 127.0.0.1 8000
```

Example Output

Sample output for an example run of the preprocessing, server, and the client:

Preprocessing

python3 run.py -N 100000 -config ./configs/US-Current.config -in 1 -local 0

Collecting protocol parameters
Generating test case
Number of OT correlations: 11600000
Query is in database at location: 82837

Server

./tests/ProtocolServer ./data/info.txt ./data/server/ 8000

[server] finished loading data from disk
[server] accepted connection from client
[server] starting server protocol
[server] protocol execution completed

bytes sent: 13600000
bytes received: 4475000

Client

./tests/ProtocolClient ./data/info.txt ./data/client/ 127.0.0.1 8000

[client] successfully connected to server
[client] starting client protocol
[client] found a match at index 82837
[client] protocol execution completed
protocol execution time: 0.873334 seconds
total bytes sent: 4475000
total bytes received: 13600000
total bytes communicated (received + sent): 18075000

Note that the data generation script generates a random database and query, so the exact parameters will vary from run to run.

Sample Benchmarks

We provide sample benchmarks of our protocol below (using the provided implementation). These measurements were collected by running the protocol on two Amazon EC2 instance (m4.2xlarge), with the client instance located in the Northern California region (us-west-1) and the server instance located in the Northern Virginia region (us-east-1). The network latency between the two instances was roughly 60ms and the bandwith was roughly 25 MB/s.

Protocol Execution Time and Communication Cost

The following table summarizes the average protocol execution time and network communication for running the protocol over a database of 1,000,000 entries and different STR configurations (described in the configs directory). Each measurement was taken using the exact procedure described in described in Client/Server Benchmarking with the respective configuration. We refer to the paper for more details on each configuration.

CODIS System	Configuration	Protocol Execution Time (s)	Communication (MB)
US System, current	`configs/US-Current.config`	38.35	172
US System, pre-2017	`configs/US-Old.config`	27.74	115
UK-like System	`configs/UK.config`	24.96	101
EU-like System	`configs/Europe.config`	32.66	143
Chinese-like System	`configs/China.config`	39.95	180
NIST 40 Loci System	`configs/US-New.config`	81.62	389

Precomputation Size and Number of OT Correlations

The following table summarizes the size of the client and server OT correlations as well as the total number of OT correlations for running the protocol over a database of 1,000,000 entries and different STR configurations (same as above). As before, each measurement was taken using the exact procedure described in described in Client/Server Benchmarking with the respective configuration.

CODIS System	Client Precomputation (MB)	Server Precomputation (MB)	Number of OT Correlations
US System, current	58	130	116 million
US System, pre-2017	39	87	78 million
UK-like System	35	76	69 million
EU-like System	48	108	97 million
Chinese-like System	61	136	122 million
NIST 40 Loci System	132	292	266 million

Full Benchmarks

We include additional benchmarks for CODIS databases of different sizes for each of the configurations in the benchmarks directory. We refer to the paper for a full discussion of the provided values.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
benchmarks		benchmarks
configs		configs
GenerateOTs.cpp		GenerateOTs.cpp
LICENSE		LICENSE
Makefile		Makefile
ProtocolClient.cpp		ProtocolClient.cpp
ProtocolServer.cpp		ProtocolServer.cpp
README.md		README.md
README.pdf		README.pdf
aes.h		aes.h
protocol.cpp		protocol.cpp
protocol.h		protocol.h
run.py		run.py
socket.h		socket.h
util.cpp		util.cpp
util.h		util.h
util.py		util.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Protocols for Private STR Queries

Paper Reference

System Requirements

Compiling the Code

Local Benchmark

Example

Example Output

Client/Server Benchmarking

Example Output

Preprocessing

Server

Client

Sample Benchmarks

Protocol Execution Time and Communication Cost

Precomputation Size and Number of OT Correlations

Full Benchmarks

About

Releases

Packages

Languages

License

jBlinden/private-codis

Folders and files

Latest commit

History

Repository files navigation

Protocols for Private STR Queries

Paper Reference

System Requirements

Compiling the Code

Local Benchmark

Example

Example Output

Client/Server Benchmarking

Example Output

Preprocessing

Server

Client

Sample Benchmarks

Protocol Execution Time and Communication Cost

Precomputation Size and Number of OT Correlations

Full Benchmarks

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages