WebCrawler and Indexer

This project builds a local index engine with Whoosh and SpaCy as a proof-of-concept AI application that indexes the content of websites. It supports search queries and questions in plain English, and it either displays the relevant matching content or answers the question.

Build and activate a virtual environment for the project (optional):

python3 -m venv venv

source venv/bin/activate

On Windows, run venv\Scripts\activate instead.

Install requirements:

pip install -r requirements.txt

Install the SpaCy English model:

python3 -m spacy download en_core_web_sm

Make sure PostgreSQL is installed and running, and set the connection details in the DATABASES section of indexengine/indexengine/settings.py.
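As a rough guide, a DATABASES entry for PostgreSQL might look like the sketch below. The engine path is Django's standard PostgreSQL backend. The environment variable names and the default values (database name, user, host, port) are placeholders chosen for illustration and are not taken from this project.

```python
import os

# Sketch of a DATABASES setting for PostgreSQL in settings.py.
# The environment variable names and defaults are hypothetical placeholders;
# replace them with the details of your own PostgreSQL instance.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "indexengine"),
        "USER": os.environ.get("DB_USER", "postgres"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}
```

Reading the credentials from the environment keeps passwords out of version control. Hard-coded values work as well for a local test setup.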

The file websites_to_crawl.txt contains a list of URLs. At application startup, these pages are scraped and their content is extracted for indexing.
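The startup step can be pictured roughly as follows. This is a minimal sketch using only the Python standard library, not the project's actual crawler. It assumes one URL per line in the file. The helper names (load_urls, fetch, extract_text) and the convention of skipping lines that start with # are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def load_urls(lines):
    """Return the non-empty, non-comment entries from an iterable of lines."""
    urls = []
    for line in lines:
        line = line.strip()
        # Skipping "#" lines is an assumed convention, not a project rule.
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def fetch(url, timeout=10):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Inline sample data, so the sketch runs without network or file access.
urls = load_urls(["https://example.com\n", "\n", "# skipped\n"])
text = extract_text(
    "<html><body><h1>Title</h1><script>var x;</script>"
    "<p>Body text.</p></body></html>"
)
```

With the real file, the same helpers would be combined by opening websites_to_crawl.txt, passing the file object to load_urls, and calling extract_text(fetch(url)) for each entry.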

Run the application:

python3 manage.py runserver

There are three main URL paths:

/search: Search the indexed documents.
/index_documents: Index new documents manually.
/question: Ask the system a question.

Repository: MarioCSilva/WebCrawler_and_Indexer

Folders and files

Latest commit

History

Repository files navigation

WebCrawler and Indexer

About

Topics

Resources

License

Stars

Watchers

Forks

Languages