I created this project for LabLab.ai's AI21 Labs Hackathon.
The search bar on most websites performs keyword search, a slow and arduous process in which the user has to read through a myriad of information before finding the tidbit they wanted in the first place.
My goal was to create a question-answering tool that can be easily integrated into any website. It lets users find specific information, presents answers in a clear, understandable way, and includes sources and further information should the user need them.
Presenting Web Indexer, developed to significantly improve the user experience by providing a service that ChatGPT, Google and standard search bars cannot.
- Navigate to the spiders directory inside the scraper directory
- Change the URLs and domains in the Spider class
- Run this command in the terminal:
scrapy crawl text -O ../data/{filename}.csv
- Limit the number of pages to scrape by setting
CLOSESPIDER_PAGECOUNT = 10
in settings.py
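The `{filename}` placeholder in the crawl command has to be replaced with a real name before running it. As a minimal sketch, the helper below assembles that command in Python. The function `build_crawl_command` and the name `example_site` are hypothetical and not part of this repository; only the spider name `text` and the `../data/` output path come from the steps above.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def build_crawl_command(filename: str, spider: str = "text") -> List[str]:
    """Return the argument list for the crawl command with {filename} filled in.

    Hypothetical helper for illustration; it only builds the command and
    does not run Scrapy itself.
    """
    if not filename or "/" in filename:
        raise ValueError("filename must be a bare name without path separators")
    output = PurePosixPath("..") / "data" / f"{filename}.csv"
    # -O overwrites an existing output file; lowercase -o would append to it.
    return ["scrapy", "crawl", spider, "-O", str(output)]


command = build_crawl_command("example_site")
print(shlex.join(command))  # prints: scrapy crawl text -O ../data/example_site.csv
```

The returned list can be passed to `subprocess.run` from the spiders directory if you prefer to launch the crawl from a script rather than typing it by hand.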
- Navigate to the main directory
- Run
scrapy shell <url>
- Choose the website you wish to scrape on the Streamlit server
- Enter a question that you would like answered
- Adjust the threshold and number of paragraphs to control the context
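To give a feel for how the threshold and the number of paragraphs could interact, here is a minimal sketch. It assumes the threshold is a minimum cosine similarity between the question embedding and each paragraph embedding, and that the paragraph count caps how many of the best matches are passed on as context. This README does not confirm either assumption, and `select_context` and the toy vectors are hypothetical, not the project's actual code.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_context(
    question_vec: Sequence[float],
    paragraphs: List[Tuple[str, Sequence[float]]],
    threshold: float,
    max_paragraphs: int,
) -> List[str]:
    """Keep paragraphs scoring at least `threshold`, best first, up to `max_paragraphs`.

    Assumed behaviour for illustration only.
    """
    scored = [(cosine_similarity(question_vec, vec), text) for text, vec in paragraphs]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept[:max_paragraphs]]


# Toy two-dimensional "embeddings"; real embeddings would come from a model.
question = [1.0, 0.0]
paragraphs = [
    ("Opening hours", [1.0, 0.0]),
    ("Pricing", [0.6, 0.8]),
    ("Careers", [0.0, 1.0]),
]

print(select_context(question, paragraphs, threshold=0.5, max_paragraphs=2))
print(select_context(question, paragraphs, threshold=0.7, max_paragraphs=2))
```

Under these assumptions, raising the threshold drops weakly related paragraphs, while lowering the paragraph count keeps the context short even when many paragraphs pass the threshold.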
- Create benchmarks
- Summarize the context, which may improve accuracy
- Support a conversational style, with prior questions as context
- Fine-tune both the embedding and generation models
- Access the attention layer to improve the relevance of the returned links
This repository is intended as an archive. No changes will be made to it in the future.
You may fork the project and work in your own repository.
Distributed under the MIT License. See LICENSE.txt for more information.
Rahel Gunaratne: