Secure Semantic Search over Encrypted Big data in the Cloud (S3BD)

Introduction

This is an open-source program that enables semantic enterprise search for unstructured encrypted big datasets stored in the cloud. If you are using this tool in your research projects, please cite the following paper:

@article{woodworth2019s3bd,
  title={S3BD: Secure semantic search over encrypted big data in the cloud},
  author={Woodworth, Jason W and Amini Salehi, Mohsen},
  journal={Concurrency and Computation: Practice and Experience},
  volume={31},
  number={11},
  pages={e5050},
  year={2019},
  publisher={Wiley Online Library}
}

The paper is also available at the following address: http://hpcclab.org/paperPdf/ccpe18/ccpe18.pdf

Set up

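Clone S3BD: git clone https://github.com/hpcclab/S3BD.git
Go to the S3BD folder and grant execute permission to all users: chmod a+x * (this adds execute permission only; use chmod a+rwx * if read and write permission are also needed).
Run ./execute.sh to build the required folders and load the upload folder with the dataset. The repository root lists this script as exec.sh, so use that name if execute.sh is not found.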

S3BD Running Instructions

S3BD is composed of two projects named ClientEncryptedSearch and EncryptedSearchServer that act as client and cloud respectively.
The two projects can run on the same machine or on two machines connected by a network. There are three primary actions you can perform when running the system: uploading documents, partitioning the dataset into clusters, and searching over the dataset.

Uploading Documents

To start an upload, launch the cloud server and select upload as its action (the -u flag) in the IDE (e.g., Eclipse) console.
Choose how the files will be uploaded, based on your project configuration. Default: enter -n and press Enter to upload files over the network.
Once the server is running, launch the client and select upload as its action (again, the -u flag).
To perform a batch upload, enter the path of the directory containing the files. For example, enter data/tmp to upload the demo dataset. Single-file uploads are not currently supported.
After the uploads finish, go to the S3BD->cloud->cloudserver->utilities location and check the index.txt and docsizes.txt files. Both should have been updated by the upload.
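
The check in the last step can also be scripted. The following is a minimal sketch, assuming the two files sit in a directory you pass in (for example S3BD/cloud/cloudserver/utilities, adjusted to your checkout). The function name check_upload_outputs and the 10-minute freshness window are illustrative choices, not part of S3BD.

```python
import time
from pathlib import Path

# File names come from the README; the directory and freshness window are assumptions.
EXPECTED_FILES = ("index.txt", "docsizes.txt")


def check_upload_outputs(utilities_dir, max_age_seconds=600, now=None):
    """Return a dict mapping each expected file name to a status string.

    Status is "ok", "missing", "empty", or "stale" (not modified within
    max_age_seconds of `now`).
    """
    now = time.time() if now is None else now
    report = {}
    for name in EXPECTED_FILES:
        path = Path(utilities_dir) / name
        if not path.is_file():
            report[name] = "missing"
        elif path.stat().st_size == 0:
            report[name] = "empty"
        elif now - path.stat().st_mtime > max_age_seconds:
            report[name] = "stale"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    # Hypothetical location; adjust to where your server writes its utilities files.
    for name, status in check_upload_outputs("S3BD/cloud/cloudserver/utilities").items():
        print(f"{name}: {status}")
```

Any status other than "ok" suggests the upload did not complete or wrote its output somewhere else.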

Partitioning Dataset

To start partitioning the dataset, launch the cloud server and select partition as its action (the -p flag).
The process will start automatically, and the console will display the progress of the partitioning. Once the IDE console reports that it is ready to connect to the client, launch the client and select partition as its action (the -p flag); the files will then be transferred.

Searching

To start a search, launch the cloud server and select search as its action (the -s flag). The cloud will begin loading relevant data.
Once the server is ready to accept a search, launch the client and select search as its action (the -s flag), then enter the desired search query. The client's console should soon show the results of the search. Example queries: host network, network protocol, internet engineering, routing protocol
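
The upload, partition, and search workflows described above share one pattern: start the cloud server with the action flag, wait until it reports it is ready, then start the client with the same flag. The sketch below only records that ordering as data. The function name launch_order is hypothetical, and how the flag is actually supplied (console input or program argument) depends on your setup.

```python
# Action flags as given in the running instructions.
ACTION_FLAGS = {"upload": "-u", "partition": "-p", "search": "-s"}


def launch_order(action):
    """Return (project, flag) pairs in the order they should be started."""
    flag = ACTION_FLAGS[action]
    return [
        ("EncryptedSearchServer", flag),  # the cloud side starts first
        ("ClientEncryptedSearch", flag),  # the client starts once the server is ready
    ]


if __name__ == "__main__":
    for action in ACTION_FLAGS:
        steps = ", then ".join(f"{project} {flag}" for project, flag in launch_order(action))
        print(f"{action}: {steps}")
```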

Configuration Details

The config files for each project can be found in the utilities subdirectory of the src directory. The most important settings are:

Client Project

cloudIP - the IP address used to connect to the server. Use localhost if both projects run on the same machine.
socketPort - the port used for data transfers. Ensure this is the same on both client and cloud.
calcMetrics - set this to true only when you are measuring speed.
numSearchedAbstracts - determines the number of abstracts and clusters that should be searched over. Has a heavy impact on speed and accuracy.

Cloud Project

socketPort - the port used for data transfers. Ensure this is the same on both client and cloud.
abstractIndexCound - the number of terms that will be included in an abstract index.
k - the number of clusters created during partitioning.
calcMetrics - set this to true only when you are measuring speed.
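
Because socketPort must match on both sides, a quick consistency check before launching can save a failed run. The following is a minimal sketch, assuming the config files use plain "key = value" lines; the README does not state the actual format, so the parser is an assumption. The function names and sample values are hypothetical, and the numSearchedAbstracts check assumes one abstract per cluster.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def find_config_problems(client, cloud):
    """Return a list of human-readable problems found across the two configs."""
    problems = []
    client_port = client.get("socketPort")
    cloud_port = cloud.get("socketPort")
    if client_port is None or client_port != cloud_port:
        problems.append(
            f"socketPort missing or different: client={client_port}, cloud={cloud_port}"
        )
    # Assumption: one abstract per cluster, so searching more abstracts than
    # there are clusters (k) cannot narrow the search any further.
    try:
        if int(client["numSearchedAbstracts"]) > int(cloud["k"]):
            problems.append("numSearchedAbstracts exceeds k")
    except (KeyError, ValueError):
        problems.append("numSearchedAbstracts or k is missing or not an integer")
    return problems


if __name__ == "__main__":
    # Made-up values for illustration only.
    client_cfg = parse_config("cloudIP = localhost\nsocketPort = 1350\nnumSearchedAbstracts = 3\n")
    cloud_cfg = parse_config("socketPort = 1350\nk = 10\n")
    print(find_config_problems(client_cfg, cloud_cfg) or "no problems found")
```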

How to Contribute

We welcome new features, extensions, and enhancements to S3BD.

We are particularly interested in new collaborations that take this framework further. S3BD was developed as an extension of S3C: it is similar to S3C but can search over encrypted big data. We are also researching ways to extend the capabilities of S3BD, such as search query expansion, intelligent pruning, and clustering. Please send us an email if you are interested in collaborating.

Contacts

Acknowledgements

This project is derived from Jason Woodworth's Master's thesis. In addition to the code base, the repository includes his thesis LaTeX files.
