Streaming news world wide

This project collects a large volume of streaming news data from the NewsAPI. A Java Spring Boot application requests the data and stores it in a Kafka cluster. The data is then processed with Spark and stored in HBase and Hive.
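As a rough illustration of the first step, the sketch below builds a request URL for NewsAPI's v2 "everything" endpoint using only the Java standard library. The class name NewsApiRequest, the method name build, and the placeholder key are assumptions made for this example; the collector's actual request code may look different.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the collector's source.
final class NewsApiRequest {
    // NewsAPI v2 endpoint that searches articles by keyword.
    static final String ENDPOINT = "https://newsapi.org/v2/everything";

    // Builds the request URI; query and key are form-encoded so that
    // spaces and special characters are safe in the query string.
    static URI build(String query, String apiKey) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        String key = URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
        return URI.create(ENDPOINT + "?q=" + q + "&apiKey=" + key);
    }

    public static void main(String[] args) {
        // YOUR_API_KEY is a placeholder, not a real key.
        System.out.println(build("world news", "YOUR_API_KEY"));
    }
}
```

The resulting URI could then be passed to an HTTP client, with the response body published to a Kafka topic.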

Structure

collector: Gets information from the APIs and publishes it to the Kafka cluster

consumer: Receives the data, processes it with Spark Streaming, and saves it to Hive and HBase

start.sh: Script to start the project

test.sh: Script to run the tests in both projects

scripts.sh: Scripts to manage Kafka and stop the servers

config.txt: Configuration applied to the project (query to run, time intervals...)
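The format of config.txt is not described here. As a minimal sketch, assuming it holds key=value lines, the settings could be read with java.util.Properties. The key names query and interval.seconds, the default values, and the class CollectorConfig are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.Properties;

// Hypothetical holder for the two settings mentioned above.
final class CollectorConfig {
    final String query;       // search query to run against the API
    final Duration interval;  // time between two requests

    CollectorConfig(String query, Duration interval) {
        this.query = query;
        this.interval = interval;
    }

    // Parses key=value text; the key names and defaults are assumptions.
    static CollectorConfig parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String query = props.getProperty("query", "news");
        long seconds = Long.parseLong(props.getProperty("interval.seconds", "60").trim());
        return new CollectorConfig(query, Duration.ofSeconds(seconds));
    }

    public static void main(String[] args) {
        CollectorConfig c = parse("query=bitcoin\ninterval.seconds=30\n");
        System.out.println(c.query + " every " + c.interval.getSeconds() + "s");
    }
}
```

In practice the text would come from the file itself, for example via Files.readString(Path.of("config.txt")).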

Run project

From the project folder, run:

./test.sh
./run.sh

Documentation v1 (Updated): Google Slides

Documentation v2: Google Doc

javsanbel2/streaming-news-worldwide

Folders and files

Latest commit

History

Repository files navigation

Streaming news world wide

Structure

Run project

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages