LiBook: Book Search Engine 🔍

This repository contains the source code for a book search engine built on an inverted index. Books are obtained both from Project Gutenberg and directly from registered users' accounts. We also implemented relational and non-relational datamarts for querying the available books. The application is microservice-oriented and consists of the following modules:

Crawler: Obtains books directly from the Project Gutenberg platform and stores them in our datalake.
Cleaner: Processes the books and prepares them to be indexed.
Indexer: Indexes the books into our inverted index structure in Hazelcast.
MetadataDatamartBuilder: Creates a metadata datamart for queries.
QueryEngine: Offers an API for users to be able to query our inverted index.
UserService: Handles users' accounts in MongoDB, and session tokens through a distributed Hazelcast datamart.
UserBooksProcessor: Processes the books uploaded by users and sends them to the Cleaner.
ApiGateway: Serves a single API that merges all the public APIs of the final application, improving the security of requests.
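
To make the roles of the Indexer and the QueryEngine more concrete, here is a minimal in-memory sketch of an inverted index in Python. It is not the project's implementation: the real index lives in Hazelcast, and the function names (`tokenize`, `build_index`, `search`) and the sample books are assumptions made purely for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(books):
    """Map each word to the set of book ids whose text contains it."""
    index = defaultdict(set)
    for book_id, text in books.items():
        for word in tokenize(text):
            index[word].add(book_id)
    return index


def search(index, query):
    """Return the ids of the books that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data, not taken from the project's datalake.
books = {
    1: "Call me Ishmael.",
    2: "It is a truth universally acknowledged",
    3: "Call it a truth",
}
index = build_index(books)
print(sorted(search(index, "call")))     # [1, 3]
print(sorted(search(index, "a truth")))  # [2, 3]
```

Looking up a word costs one dictionary access regardless of how many books are stored, which is the reason for indexing by word rather than scanning every book at query time.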

This project employs three distinct datamart technologies: Hazelcast, MongoDB, and Rqlite. Hazelcast holds the inverted index and the session tokens, and MongoDB stores the users' accounts. Rqlite, which is based on SQLite and adapted for clustered usage, provides distributed relational database management within the application. Together, these datamarts let the search engine accommodate both centralized and distributed data processing needs.

1) How to run (Docker and Docker Compose)

For each module, generate the corresponding Docker image. Taking the Indexer as a reference, run a command like the following:

docker build -t ricardocardn/indexer path_to_repo/Indexer/.

Alternatively, pull and run our prebuilt image directly:

docker run -p 8081:8081 --network host ricardocardn/indexer

(*) The --network host option is crucial; Hazelcast-related problems may arise if it is omitted. The query-engine image can be run in the same way:

docker run -p 8080:8080 --network host susanasrez/queryengine
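
Since the build step has to be repeated for every module, a small helper can generate the commands. The following Python sketch only composes the `docker build` arguments. It assumes that each module lives in a directory named after it (as with `Indexer`) and that images are tagged `<namespace>/<module in lowercase>`; that tagging pattern is extrapolated from the indexer example and may not hold for every image. The names `build_command` and `build_commands` are hypothetical.

```python
import shlex


def build_command(module, repo_path, namespace):
    """Return the `docker build` argument list for a single module."""
    # Assumption: the tag is the namespace plus the lowercased module name.
    tag = f"{namespace}/{module.lower()}"
    return ["docker", "build", "-t", tag, f"{repo_path}/{module}/."]


def build_commands(modules, repo_path, namespace):
    """Return one `docker build` argument list per module."""
    return [build_command(m, repo_path, namespace) for m in modules]


# Reproduces the indexer command used as a reference in this section.
for cmd in build_commands(["Indexer"], "path_to_repo", "ricardocardn"):
    print(shlex.join(cmd))
```

To actually run a generated command, pass the argument list to `subprocess.run(cmd, check=True)`.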

The other modules, Crawler and Cleaner, are already running on the server whose IP is specified in the project's Dockerfiles, but they can be reconfigured to run locally. If you do so, take a look at the Docker Compose file and make sure that both modules run on the same machine. Also make sure that ActiveMQ is running before starting the app.
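
One way to confirm that the broker is up before launching the modules is to test whether its port accepts TCP connections. The sketch below assumes that ActiveMQ runs on `localhost` and listens on port 61616, its default OpenWire port; this README does not state the host or port the project uses, so adjust both to your setup. The helper name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumptions: broker on localhost, default OpenWire port 61616.
if is_port_open("localhost", 61616):
    print("ActiveMQ port is reachable")
else:
    print("ActiveMQ does not seem to be running on localhost:61616")
```

A successful connection only shows that something is listening on that port, not that the broker is fully initialized, so treat it as a quick sanity check.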
