MSSD: Multidimensional Skylines Over Streaming Data

Repository organization

This repository includes the MSSD sources as well as state-of-the-art methods for answering subspace skyline queries over streaming data.
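For readers new to the terminology, the following is a minimal sketch of what a subspace skyline query returns. It is not taken from the MSSD sources: the function names `dominates` and `subspace_skyline`, the sample points, and the convention that smaller values are better are all assumptions made for illustration.

```python
def dominates(p, q, dims):
    """Return True if p dominates q on the dimensions in dims.

    Assumed convention: smaller is better. p dominates q when it is
    no worse on every selected dimension and strictly better on one.
    """
    return (all(p[d] <= q[d] for d in dims)
            and any(p[d] < q[d] for d in dims))


def subspace_skyline(points, dims):
    """Naive quadratic skyline restricted to the dimensions in dims."""
    return [p for p in points
            if not any(dominates(q, p, dims) for q in points)]


# Hypothetical 3-dimensional points.
points = [(1, 4, 3), (2, 2, 5), (3, 1, 1), (4, 4, 4)]

full = subspace_skyline(points, (0, 1, 2))  # skyline over all dimensions
sub = subspace_skyline(points, (0, 2))      # skyline over dimensions 0 and 2 only

print(full)  # [(1, 4, 3), (2, 2, 5), (3, 1, 1)]
print(sub)   # [(1, 4, 3), (3, 1, 1)]
```

The point (2, 2, 5) belongs to the full-space skyline but not to the skyline of the subspace (0, 2), which is why a subspace query cannot simply filter the full-space result.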

If you use MSSD in your work, please cite:

@article{alami2020framework,
  title={A framework for multidimensional skyline queries over streaming data},
  author={Alami, Karim and Maabout, Sofian},
  journal={Data \& Knowledge Engineering},
  pages={101792},
  year={2020},
  publisher={Elsevier}
}

Compile and run C++ implementation

To begin, clone this repository. The software requires a C++ compiler.

git clone https://github.com/karimalami7/MSSD.git

Next, change the current directory to ./src and compile the source code:

cd ./src
./compile.sh

Then, define the experiment to perform in the file run.sh and execute it:

./run.sh

The arguments to define in run.sh are:

Data type:

INDE: Independent data.
ANTI: Anti-correlated data.
CORR: Correlated data.
other: when using a real dataset.

For a real dataset, input the path of the file. For a synthetic dataset, define the number of distinct values of each dimension.
Omega: size of the sliding window.
Experiment lifetime.
Number of dimensions.
Number of parallel threads to run.
Batch interval.
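To give an intuition for how the window size and the batch interval interact, here is a small sketch. It does not reproduce the MSSD implementation: it assumes a count-based window that keeps the last `omega` points and a batch interval expressed as a number of arriving points, and it recomputes the skyline from scratch after each batch. The names `stream_skylines`, `omega` and `batch` are hypothetical, and the exact semantics of the run.sh arguments may differ.

```python
from collections import deque


def dominates(p, q):
    """Assumed convention: smaller is better on every dimension."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Naive quadratic skyline over all dimensions."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def stream_skylines(stream, omega, batch):
    """Report the skyline of the last `omega` points after every `batch` arrivals."""
    window = deque(maxlen=omega)  # the oldest point is evicted automatically
    results = []
    for i, point in enumerate(stream, start=1):
        window.append(point)
        if i % batch == 0:
            results.append(skyline(list(window)))
    return results


# Hypothetical 2-dimensional stream.
stream = [(5, 5), (1, 6), (4, 2), (3, 3), (2, 1), (6, 6)]
results = stream_skylines(stream, omega=3, batch=2)
for r in results:
    print(r)
```

With `omega=3` and `batch=2`, the last reported skyline contains only (2, 1), because the earlier points have already left the window by then.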

Run Spark implementation

Start services for standalone cluster

{spark_home}/sbin/start-all.sh
{kafka_home}/bin/zookeeper-server-start.sh config/zookeeper.properties
{kafka_home}/bin/kafka-server-start.sh config/server.properties

Run with the Twitter API through TCP

cd twitter_api_pipeline/
python3 twitter_api_pipeline.py

cd mssd_spark/
{spark_home}/bin/spark-submit TcpSocket_receiver.py localhost 9010

Run with data from Kafka

cd mssd_spark/
python3 kafka_producer.py
{spark_home}/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.4.3 Kafka_receiver.py localhost:9092 twitter_topic
