Building a search engine from scratch. We plan to implement the three major components of a search engine: a crawler, a parser, and an indexer. We will begin by developing command-line tools for these components and then wrap them in an API service to be used by a frontend. This project is being done under IEEE-NITK.
To establish a VPN connection to NITK-NET:
- Log in at the Sophos portal - link.
- Download the SSL-VPN config file for your OS.
- Execute `sudo openvpn <path-to-config-file>` to initiate the connection sequence. Keep this terminal open.
- Execute `ssh <user>@<container-ip>` and enter the necessary details when prompted.
- Install Docker Engine by following this link.
# Install Chrome
RUN curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
&& apt-get -y update \
&& apt-get -y install google-chrome-stable
# Install chromedriver
RUN wget -N https://chromedriver.storage.googleapis.com/111.0.5563.64/chromedriver_linux64.zip -P ~/ \
&& unzip ~/chromedriver_linux64.zip -d ~/ \
&& rm ~/chromedriver_linux64.zip \
&& mv -f ~/chromedriver /usr/local/bin/chromedriver
Warning
Take care to use compatible versions of `google-chrome` and `chromedriver`. Refer to this answer on StackOverflow.
- Create a file named `.env` with the following contents:
  MONGO_USER=admin
  MONGO_PASSWORD=adminpw
  MONGO_DATABASE=test
- Create a virtual environment, activate it, and then install all the dependencies in `andromeda/requirements.txt`.
- Activate the virtual environment.
- Execute `docker-compose up -d` to bring up the MongoDB server.
- Execute `python3 andromeda/crawler.py start` to start the process of crawling.
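The `.env` values from the setup steps can be turned into a MongoDB connection string. The following is a minimal sketch, not the project's actual loader: the helper names `parse_env` and `mongo_uri` are hypothetical, and the host `localhost` and port `27017` are assumptions that depend on how the Docker services are exposed.

```python
from urllib.parse import quote_plus


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def mongo_uri(env, host="localhost", port=27017):
    """Build a MongoDB connection URI from the parsed settings.

    host and port are assumed defaults, not values taken from the project.
    """
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    user = quote_plus(env["MONGO_USER"])
    password = quote_plus(env["MONGO_PASSWORD"])
    return f"mongodb://{user}:{password}@{host}:{port}/{env['MONGO_DATABASE']}"
```

Calling `mongo_uri(parse_env(open(".env").read()))` with the sample values yields a URI for the `test` database. Depending on how the database user is created, an `authSource` query parameter may also be needed.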
Note
In the Docker network, the MongoDB server will be running at port 27017, and a service known as Mongo-Express, which provides a GUI to access the database, will be running at port 8081.
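To confirm that both services came up, a quick TCP check can help. This is a rough sketch using only the standard library; the function names are hypothetical, and it assumes the two ports are published on `localhost`, which depends on the Compose configuration.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host="localhost", services=None):
    """Map each service name to whether its port accepts connections."""
    # Default ports are the ones named in the note above.
    services = services or {"MongoDB": 27017, "Mongo-Express": 8081}
    return {name: port_open(host, port) for name, port in services.items()}
```

Running `print(check_services())` after `docker-compose up -d` shows which of the two ports accept connections. An open port only shows that something is listening there, not that authentication will succeed.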
- Execute `pylint andromeda/` before making a PR and get rid of any lint errors.
- Python
- NextJS
- click
- Flask-RESTful