Website Article Aggregator App

Introduction

This project discovers and aggregates RSS feeds from diverse websites and makes their articles searchable. It has a simple (incomplete) front-end web UI and a fairly robust backend. Given a set of seed URLs, the backend crawler finds feeds on those pages and extracts all links, which are handed back to the crawler so that the process can continue indefinitely.

Installation

This app can be installed with Docker Compose. Make sure Docker is installed on your system before installing the app; you can view instructions here. To install, go to the root directory, where docker-compose.yml is present, and run the following command.

docker compose up --build

This builds and starts 5 containers. web is the GUI container, a simple React app that talks to the backend to fetch articles matching the user's query. article_service provides APIs relating to articles and domains. crawler_service takes care of crawling websites, finding RSS feeds, and retrieving feed details. summarizer_service uses NLP libraries to find relevant keywords in an article. tasks_service is responsible for managing long-running tasks by adding them to a Redis queue.

You can run just web and article_service if articles are already populated. To do so, run the following command.

docker compose up --build web article_service

Usage

The following command starts the services necessary for accessing the app's web UI.

docker compose up web article_service

You can access the web ui by visiting http://localhost:3000.

The above command starts all containers required to run the frontend of the app, namely the article_service, web, and article_db containers.

Finding feeds and crawling them requires 4 services: article_service, crawler_service, tasks_service, and summarizer_service. Together they take care of all crawling and database-saving tasks. You can edit the above command with the names of the containers needed for the task you want to run.

Architecture

Frontend

Upon receiving a user's keyword input, the web service communicates with the article_service to identify all articles containing the specified keyword. The article_service then selects relevant articles and organizes them based on factors such as website popularity, article read time, categories, and keyword rank. The web service subsequently presents the ranked articles to the user. You can view the pipeline code in this file.
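The ranking step described above could look roughly like the following. This is only a minimal sketch, not the code used by article_service: the Article fields, the weights, the target read time, and the function names are all hypothetical, chosen to show how the four listed factors might be combined into one score.

```python
from dataclasses import dataclass

# Hypothetical weights and target read time; the real pipeline may differ.
W_POPULARITY, W_KEYWORD, W_READ_TIME, W_CATEGORY = 0.4, 0.3, 0.2, 0.1
TARGET_READ_MINUTES = 6.0


@dataclass
class Article:
    title: str
    site_popularity: float      # assumed to be normalised to 0..1
    read_time_minutes: float
    keyword_rank: int           # assumed: 1 means the query is the top keyword
    categories: frozenset = frozenset()


def score(article, preferred_categories=frozenset()):
    """Combine the four ranking factors into a single number."""
    keyword = 1.0 / article.keyword_rank
    # Articles close to the target read time score higher.
    read_time = 1.0 / (1.0 + abs(article.read_time_minutes - TARGET_READ_MINUTES))
    category = 1.0 if article.categories & preferred_categories else 0.0
    return (W_POPULARITY * article.site_popularity
            + W_KEYWORD * keyword
            + W_READ_TIME * read_time
            + W_CATEGORY * category)


def rank_articles(articles, preferred_categories=frozenset()):
    """Return the articles sorted from most to least relevant."""
    return sorted(articles,
                  key=lambda a: score(a, preferred_categories),
                  reverse=True)
```

A weighted sum keeps each factor's influence easy to tune; any of the terms could be swapped for a different formula without changing the sorting step.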

Backend

Initially, a set of seed URLs is given to the crawler_service. Using libraries like Beautiful Soup, the crawler_service crawls the URLs provided and finds RSS feeds in them. Along with the feeds, all links on each page are parsed and given to the tasks_service, which queues them for further processing to find RSS feeds in those pages as well. The feeds are then parsed to get articles. Each article is sent to the summarizer_service to determine its quality and find its keywords. The articles, along with their keywords, are then added to a queue to be saved later. The tasks_service takes the articles from the queue and sends them to the article_service to be saved.
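The crawl loop described above can be sketched as follows. This is an illustration under several assumptions, not the project's code: it uses the standard library's html.parser in place of Beautiful Soup to stay dependency-free, an in-memory deque in place of the Redis queue managed by tasks_service, and a hypothetical fetch callable and max_pages cap. Feeds are detected through link tags whose type is application/rss+xml or application/atom+xml.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class PageParser(HTMLParser):
    """Collects feed URLs and outgoing links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative URLs
        if urlparse(url).scheme not in ("http", "https"):
            return  # skip mailto:, javascript:, and similar
        if tag == "link" and (attrs.get("type") or "").lower() in FEED_TYPES:
            self.feeds.append(url)
        elif tag == "a":
            self.links.append(url)


def parse_page(base_url, html):
    """Return (feed_urls, link_urls) found in the given HTML."""
    parser = PageParser(base_url)
    parser.feed(html)
    return parser.feeds, parser.links


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    frontier = deque(seed_urls)   # stands in for the Redis queue
    seen = set(seed_urls)
    feeds = []
    pages = 0
    while frontier and pages < max_pages:
        url = frontier.popleft()
        pages += 1
        html = fetch(url)
        if html is None:
            continue
        page_feeds, links = parse_page(url, html)
        for feed_url in page_feeds:
            if feed_url not in feeds:
                feeds.append(feed_url)
        for link in links:
            if link not in seen:  # feed new links back to the crawler
                seen.add(link)
                frontier.append(link)
    return feeds
```

Passing fetch in as a parameter keeps the loop testable without network access; in practice it would wrap an HTTP client, and the discovered feeds would then be parsed into articles and passed on to the summarizer_service.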
