πŸŒπŸ“ˆ Analyze real-time COVID-19 tweets with ease! The microservices ETL architecture leverages Spring Cloud Stream and Apache Kafka for data ingestion, processing, and visualization. Stay informed and empowered. πŸ¦ πŸ”

sergio11/covid_tweets_etl_architecture

Real-time COVID-19 Tweet Analysis

Unleash the power of real-time COVID-19 Tweet analysis with this microservices ETL architecture. Built on Spring Cloud Stream and Apache Kafka, this project is your gateway to ingesting, processing, and visualizing tweets about the pandemic.

I developed this project to practice what I learned in the Udemy course Apache Kafka Series - Learn Apache Kafka for Beginners v2.

The tech stack includes Spring Boot 2.3.2, Apache Maven 3.6.3, Spring Cloud Stream, Elasticsearch, Kibana, and more, all running as Docker containers.

Explore the project, visualize COVID-19 tweet data, and analyze sentiment and trending terms with ease. For more detailed information, check out our Medium article.

Thank you for visiting the Covid Tweets ETL Architecture GitHub repository! Stay informed and empowered with real-time Tweet analysis.

Architecture Overview

Applications

  • covid-tweets-api Spring Boot Web Java application that lets you retrieve and view the processed tweets through a REST API or STOMP over WebSocket.

  • covid-tweets-collector Spring Boot Web Java application that listens for new messages on the processed-tweets topic in Kafka and saves them in Elasticsearch.

  • covid-tweets-ingest Spring Boot Web Java application that implements a Twitter client. It receives the latest tweets about COVID-19, creates the data model associated with each tweet, and posts it to the tweets-ingest topic in Kafka.

  • covid-tweets-processor Spring Boot Web Java application that listens for new messages on the tweets-ingest topic in Kafka and analyzes the tweet text through an analysis service built on Stanford CoreNLP.
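
The flow between the ingest, processor, and collector applications can be sketched in plain Java. This is only an illustration of the data flow, not the repository's code: every class, field, and method name below is hypothetical, the two Kafka topics are replaced by direct function calls, and a naive keyword check stands in for the Stanford CoreNLP analysis. Spring Cloud Stream's functional programming model binds `java.util.function` beans of this shape to topics, although the project may use annotation-based bindings instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;
import java.util.function.Function;

// Illustrative sketch only: all names in this class are hypothetical.
class PipelineSketch {

    // Minimal stand-in for the tweet data model built by covid-tweets-ingest.
    static final class Tweet {
        final long id;
        final String text;

        Tweet(long id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Stand-in for the enriched model posted to the processed-tweets topic.
    static final class ProcessedTweet {
        final Tweet tweet;
        final String sentiment;

        ProcessedTweet(Tweet tweet, String sentiment) {
            this.tweet = tweet;
            this.sentiment = sentiment;
        }
    }

    // Placeholder for the CoreNLP-based analysis: a keyword check, NOT real NLP.
    static String naiveSentiment(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        if (lower.contains("recover") || lower.contains("good")) {
            return "POSITIVE";
        }
        if (lower.contains("death") || lower.contains("worse")) {
            return "NEGATIVE";
        }
        return "NEUTRAL";
    }

    // Role of covid-tweets-processor: one raw tweet in, one analyzed tweet out.
    static Function<Tweet, ProcessedTweet> processor() {
        return tweet -> new ProcessedTweet(tweet, naiveSentiment(tweet.text));
    }

    // Role of covid-tweets-collector: the real service saves to Elasticsearch;
    // this sketch appends to an in-memory list instead.
    static Consumer<ProcessedTweet> collector(List<ProcessedTweet> store) {
        return store::add;
    }

    public static void main(String[] args) {
        List<ProcessedTweet> store = new ArrayList<>();
        List<Tweet> incoming = List.of(
                new Tweet(1L, "Good news: more patients recover every day"),
                new Tweet(2L, "Case numbers are getting worse"));

        // Direct calls replace the tweets-ingest and processed-tweets topics here.
        for (Tweet tweet : incoming) {
            collector(store).accept(processor().apply(tweet));
        }
        for (ProcessedTweet processed : store) {
            System.out.println(processed.tweet.id + " -> " + processed.sentiment);
        }
    }
}
```

Running `main` prints `1 -> POSITIVE` followed by `2 -> NEGATIVE`. Keeping each stage a single-purpose function mirrors the split into separate services, where each application only knows the topic it reads from and the topic or store it writes to.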

Technologies used

  • Spring Boot 2.3.2 / Apache Maven 3.6.3
  • Spring Cloud Stream (to build highly scalable event-driven applications connected with shared messaging systems)
  • Spring Cloud Starter Stream Kafka
  • Lombok
  • Twitter4J Stream
  • MapStruct
  • Elasticsearch OSS 7.6.2
  • Spring Boot Starter Data Elasticsearch
  • Kibana OSS 7.6.2
  • Spring Boot Starter Web
  • springdoc-openapi UI
  • Spring Boot Starter WebSocket
  • Stanford CoreNLP

Running Applications as Docker containers.

Rake Tasks

The available tasks are listed below (rake --tasks):

Task                          Description
check_deployment_file_task    Check Deployment File
check_docker_task             Check Docker and Docker Compose Task
cleaning_environment_task     Cleaning Environment Task
deploy                        Deploys the Covid Tweets Architecture and laun...
login                         Authenticating with existing credentials
start                         Start Containers
status                        Status Containers
stop                          Stop Containers
undeploy                      UnDeploy Covid Tweets Architecture

To start the platform, make sure you have Ruby installed, go to the root directory of the project, and run the rake deploy task. This task carries out a series of preliminary checks, discards images and volumes that are no longer needed, downloads all the images, and initializes the containers.

Also make sure to define your own credentials in the twitter4j.properties file:

oauth.consumerKey=YOUR_CONSUMER_KEY
oauth.consumerSecret=YOUR_CONSUMER_SECRET
oauth.accessToken=YOUR_ACCESS_TOKEN
oauth.accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET
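
Before deploying, it can help to confirm that none of the four keys is still set to its placeholder. The following is a minimal sketch of such a check. The `TwitterCredentialsCheck` class and its methods are hypothetical helpers that are not part of the repository, and the sample values are made up. The sketch parses the properties from a string so it runs on its own; in practice you would load the real twitter4j.properties file instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of the repository.
class TwitterCredentialsCheck {

    // The four OAuth keys shown in the twitter4j.properties snippet above.
    static final String[] REQUIRED_KEYS = {
        "oauth.consumerKey",
        "oauth.consumerSecret",
        "oauth.accessToken",
        "oauth.accessTokenSecret"
    };

    // Returns the keys that are absent, blank, or still set to a YOUR_... placeholder.
    static List<String> unsetKeys(Properties props) {
        List<String> unset = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key, "").trim();
            if (value.isEmpty() || value.startsWith("YOUR_")) {
                unset.add(key);
            }
        }
        return unset;
    }

    // Parses properties-file content held in a string.
    static Properties parse(String content) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new StringReader(content)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: two keys are filled in, one is a placeholder, one is empty.
        String sample = String.join("\n",
                "oauth.consumerKey=abc123",
                "oauth.consumerSecret=YOUR_CONSUMER_SECRET",
                "oauth.accessToken=",
                "oauth.accessTokenSecret=def456");
        System.out.println(unsetKeys(parse(sample)));
    }
}
```

With the sample above, `main` prints `[oauth.consumerSecret, oauth.accessToken]`. An empty list means all four keys have been given real values.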

Some screenshots

Deploy with Docker Compose.

Using the AKHQ dashboard to manage topics and the Kafka broker.

Processed tweets are stored in an Elasticsearch index and visualized with Kibana.

Processed tweets can be retrieved through the REST API / WebSockets offered by the Covid Tweets Microservice API.

Visualization of the general sentiment and the most frequent terms.

Please Share & Star the repository to keep me motivated.
