Skip to content

Building a Search Engine from scratch

Notifications You must be signed in to change notification settings

helios2003/Andromeda

 
 

Repository files navigation

pylint

Andromeda

Building a search engine from scratch. We plan on implementing the 3 major components in a search engine - Crawler, Parser and Indexing. We will begin by developing command line tools for these components and then wrapping these with an API service to be used by a frontend. This project is being done under IEEE-NITK.

Dev Server

VPN Connection

To establish a VPN connection to NITK-NET:

  1. Login at the Sophos portal - link.
  2. Download SSL-VPN config file for the necessary OS.
  3. Execute sudo openvpn <path-to-config-file> to initiate the connection sequence. Keep this terminal open.

SSH

  1. Execute ssh <user>@<container-ip> and then enter necessary details on being prompted.

Getting Started

Setup

Docker

  1. Install Docker Engine by following this link.

Chromedriver

# Install Chrome
RUN curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add \
    && echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
    && apt-get -y update \
    && apt-get -y install google-chrome-stable

# Install chromedriver
RUN wget -N https://chromedriver.storage.googleapis.com/111.0.5563.64/chromedriver_linux64.zip -P ~/ \
    && unzip ~/chromedriver_linux64.zip -d ~/ \
    && rm ~/chromedriver_linux64.zip \
    && mv -f ~/chromedriver /usr/local/bin/chromedriver

Warning
Take care to use compatible versions for google-chrome and chromedriver. Refer this answer on StackOverflow.

Env

  1. Touch a file - .env.
    MONGO_USER=admin
    MONGO_PASSWORD=adminpw
    MONGO_DATABASE=test
  2. Create a virtual environment and then install all the dependencies in andromeda/requirements.txt after activating the environment.

Running

  1. Activate the virtual environment.
  2. Execute docker-compose up -d to bring up the MongoDB server.
  3. Execute python3 andromeda/crawler.py start to start the process of crawling.

Note
In the Docker network, the MongoDB server will be running at port - 27017 and a service known as Mongo-Express will be running at port - 8081 which provides a GUI to access the database.

Linting

  • Execute pylint andromeda/ before making a PR and get rid of any lint errors.

Guidelines

  • Please follow the pep8 python style guide
  • Refer to this link to write proper commit messages

Intended Tech Stack

References

About

Building a Search Engine from scratch

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 63.6%
  • JavaScript 26.5%
  • CSS 6.6%
  • Dockerfile 2.5%
  • Shell 0.8%