Multi-container, microservice-based analytics for /r/technology subreddit comments

phoebe20200523/tech_reddit_comment_analytics

💻💬 /r/technology reddit comment analytics microservices💬📱

A multi-container application for analyzing comments from technology-related subreddits

  • Designed a microservice architecture to perform real-time analytics on comments from technology-related subreddits (e.g., r/technology), with a cumulative analysis of over 220k comments from 76k users.
  • Used Kafka as the message broker to decouple comment ingestion from keyword extraction, which relies on named entity recognition (NER) with spaCy.
  • Streamed data from Kafka to Elasticsearch using the Kafka Connect Elasticsearch Sink and ksqlDB, and built a Kibana dashboard to identify the most active Redditors and hot topics within a user-specified time range.
  • Orchestrated the 10-container application with docker-compose.
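
The keyword-extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the topic names (`reddit_comments`, `comment_keywords`), the `body` field, the set of entity labels, and the choice of the `kafka-python` client and the `en_core_web_sm` model are all assumptions made for the example.

```python
import json
from typing import Any, Callable, List

# Entity labels treated as "keywords"; this selection is an assumption.
KEYWORD_LABELS = {"ORG", "PRODUCT", "PERSON", "GPE"}


def extract_keywords(nlp: Callable[[str], Any], text: str) -> List[str]:
    """Return distinct named entities in `text`, in order of first appearance."""
    doc = nlp(text)
    seen = {}
    for ent in doc.ents:
        if ent.label_ in KEYWORD_LABELS:
            seen.setdefault(ent.text, None)
    return list(seen)


def enrich_comment(nlp: Callable[[str], Any], raw_message: bytes) -> bytes:
    """Decode one ingested comment, attach its keywords, and re-encode it."""
    comment = json.loads(raw_message)
    # "body" is assumed to hold the comment text.
    comment["keywords"] = extract_keywords(nlp, comment["body"])
    return json.dumps(comment).encode("utf-8")


def run_worker(bootstrap_servers: str = "localhost:9092") -> None:
    """Hypothetical wiring: consume raw comments, publish enriched ones."""
    import spacy  # requires: python -m spacy download en_core_web_sm
    from kafka import KafkaConsumer, KafkaProducer  # kafka-python package

    nlp = spacy.load("en_core_web_sm")
    consumer = KafkaConsumer("reddit_comments", bootstrap_servers=bootstrap_servers)
    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    for record in consumer:
        producer.send("comment_keywords", enrich_comment(nlp, record.value))
```

Because the ingestion service only writes to one topic and this worker only reads from it, either side can be restarted or scaled without the other noticing, which is the decoupling the broker provides. Passing `nlp` in as a parameter also lets the extraction logic be exercised without a running broker or a downloaded model.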

Change in the share of the most discussed topics between March 26 and April 5:

    "AI": increased from 15.23% to 18.65% 📈
    "Google": increased from 9.35% to 10.24% 📈
    "TikTok": increased from 4.5% to 6.45% 📈
    "SVB": decreased from 4.7% to 3.24% 📉
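
Since the figures above are shares, the movement can be expressed either in percentage points or as a relative change. The short sketch below works through that arithmetic using only the numbers reported here; the function name `share_change` is introduced purely for illustration.

```python
# Topic shares (in percent) as reported for March 26 and April 5.
shares = {
    "AI": (15.23, 18.65),
    "Google": (9.35, 10.24),
    "TikTok": (4.5, 6.45),
    "SVB": (4.7, 3.24),
}


def share_change(before: float, after: float):
    """Return (percentage-point change, relative change in percent)."""
    points = after - before
    relative = points / before * 100
    return round(points, 2), round(relative, 2)


changes = {topic: share_change(*pair) for topic, pair in shares.items()}
for topic, (points, relative) in changes.items():
    print(f"{topic}: {points:+.2f} pp ({relative:+.2f}% relative)")
```

For example, "AI" gains 3.42 percentage points, which is roughly a 22% relative increase over its March 26 share.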

Dashboard on March 26

Dashboard on April 5
