PFE SEARCH ENGINE

A search engine for final-year internship (PFE) offers.

Note: You can find the infrastructure repository here.

Components

  • Frontend: Implemented with ReactJS. It provides a search bar and a search filter; when you run a search, it queries the Data Reader and displays the list of matching offers.
  • Data Reader: A Flask API that exposes an endpoint for filtered searches.
  • Scraper: Implemented in Go (it used to be in Python) with the Selenium APIs. It scrapes job offers and loads them into the database through the Data Loader.
  • RabbitMQ: Transfers the scraped data asynchronously to the Data Loader.
  • Data Loader: Consumes the scraped data and loads it into the database.
  • Database: We use Neo4j. A graph database lets us model the relationships between the data precisely.
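To make the Data Reader's role concrete, here is a minimal sketch of how a filtered search could be translated into a parameterized Cypher query. The node label `Offer`, the relationship types, and the filter fields are assumptions for illustration, not taken from the actual code:

```python
# Hypothetical sketch of the Data Reader's filtered search.
# Labels, relationships, and filter names are illustrative assumptions.

def build_offer_query(keyword=None, company=None, technology=None):
    """Build a parameterized Cypher query for a filtered offer search."""
    clauses, params = [], {}
    if keyword:
        clauses.append("toLower(o.title) CONTAINS toLower($keyword)")
        params["keyword"] = keyword
    if company:
        clauses.append("(o)-[:POSTED_BY]->(:Company {name: $company})")
        params["company"] = company
    if technology:
        clauses.append("(o)-[:REQUIRES]->(:Technology {name: $technology})")
        params["technology"] = technology
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    query = f"MATCH (o:Offer){where} RETURN o"
    return query, params

query, params = build_offer_query(keyword="devops", technology="Go")
# The Flask endpoint would hand `query` and `params` to the Neo4j driver.
```

Building the query with `$`-parameters rather than string interpolation lets the Neo4j driver handle escaping, which matters for user-supplied search terms.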

Workflow

The current workflow is the following:

[workflow diagram]
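The scrape, publish, and load steps of the workflow can be sketched end to end. This stand-in uses a stdlib queue in place of a real RabbitMQ broker, and the offer fields are invented for illustration; in the project, the Go/Selenium scraper publishes each offer to RabbitMQ and the Data Loader consumes it asynchronously:

```python
# Illustrative stand-in for the scrape -> RabbitMQ -> load pipeline.
# A stdlib queue replaces the broker; offer fields are assumptions.
import json
import queue

broker = queue.Queue()  # stands in for a RabbitMQ queue


def scrape():
    """Pretend to scrape; yields offers as the scraper would."""
    yield {"title": "DevOps intern", "company": "ACME"}
    yield {"title": "Data intern", "company": "Globex"}


def publish(offer):
    broker.put(json.dumps(offer).encode())  # messages travel as bytes


def consume():
    loaded = []
    while not broker.empty():
        loaded.append(json.loads(broker.get()))  # Data Loader -> Neo4j
    return loaded


for offer in scrape():
    publish(offer)
offers = consume()
```

The broker decouples the two sides: the scraper can publish at its own pace and crash or restart without losing the loader, which is the point of putting RabbitMQ between them.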

Node-Relationship model

The Node-Relationship model is defined as follows:

[node-relationship model diagram]

Thus, the database looks like the following:

[graph visualization]
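A loader insert that follows this kind of model might look like the sketch below: one `MERGE` per node so repeated scrapes don't create duplicates, plus the connecting relationships. The labels and relationship types here are illustrative assumptions; the actual model is the one in the diagram:

```python
# Hypothetical Cypher statement a Data Loader could run per offer.
# Labels (Offer, Company, Technology) and relationship types are
# assumptions for illustration only.

LOAD_OFFER = """
MERGE (c:Company {name: $company})
MERGE (o:Offer {title: $title})
MERGE (o)-[:POSTED_BY]->(c)
WITH o
UNWIND $technologies AS tech
MERGE (t:Technology {name: tech})
MERGE (o)-[:REQUIRES]->(t)
"""


def load_params(offer):
    """Shape a scraped offer into parameters for the statement above."""
    return {
        "company": offer["company"],
        "title": offer["title"],
        "technologies": offer.get("technologies", []),
    }


params = load_params({"title": "DevOps intern", "company": "ACME",
                      "technologies": ["Go", "Docker"]})
```

Using `MERGE` instead of `CREATE` makes the load idempotent, so re-scraping the same offer updates the graph rather than duplicating nodes.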

Next Steps

  • Convert the scraping scripts to Go and dockerize them.
  • Use RabbitMQ between the scraper and the loader.
  • Separate the backend into two containers, one for the reader and one for the loader.
  • Add scripts to the Makefile.
  • Send the queued messages in Protocol Buffers, Google's data interchange format.
  • Add a preliminary GitHub Actions pipeline to push new Docker images on every push.
    • Find a way to cache Docker layers.
  • Add a pre-commit hook.
  • Add logging in different components.
  • Add application metrics using Prometheus API.
  • Use K8S for deployment.
  • Use ArgoCD for GitOps.
  • Add Linkerd and Flagger for Canary deployment strategy.
  • Use Terraform to provision the infrastructure and set up the first Helm charts.
  • Add application metrics and visualize them using Prometheus and Grafana.
    • Grafana is crashing. Check why.
  • Add retention policy for logs both for dev and prod.
  • Add an alerting system with Discord Webhooks.
  • Make an ingress with a domain name.
  • Improve the frontend's UI.
  • Add unit tests.
  • Add a CI pipeline.
  • Learn and apply security best practices.
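The CI items above (pushing new Docker images on every push, with cached layers) could start from a minimal GitHub Actions workflow along these lines. This is a sketch, not the project's pipeline: the image name, secret names, and file path are placeholders:

```yaml
# .github/workflows/build.yml -- illustrative sketch only; the image
# tag and secret names are placeholders, not taken from the repository.
name: build-images
on: push
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: user/pfe-search-engine:latest
          cache-from: type=gha      # reuse cached layers between runs
          cache-to: type=gha,mode=max
```

The `type=gha` cache backend addresses the "find a way to cache Docker layers" item by storing build layers in the GitHub Actions cache between runs.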