ResearchLens is an AI-powered research assistant that allows users to interact with research papers using natural language.
Upload a paper or choose from default datasets, ask questions, and get context-aware answers powered by Retrieval-Augmented Generation (RAG).
👉 https://researchlenss.streamlit.app/
- 📄 Ask questions about research papers
- 📤 Upload your own PDF and query it instantly
- 📚 Switch between multiple research papers
- 🔍 Semantic search using FAISS
- 🧠 Cross-encoder reranking for better relevance
- 🤖 LLM-based answer generation
- 💬 Chat-style interface (like ChatGPT)
- ⚙️ Adjustable answer length
PDF → Text Extraction → Chunking → Embeddings → FAISS Index
→ Retrieval → Reranking → LLM → Answer
- Text Extraction: Extracts raw text from PDFs
- Chunking: Splits text into meaningful segments
- Embeddings: Converts text into vector representations
- FAISS Index: Enables fast similarity search
- Retriever: Finds relevant chunks for a query
- Reranker: Re-scores the retrieved chunks with a cross-encoder to improve relevance
- LLM: Generates final answer from retrieved context
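
To make the flow above concrete, here is a minimal, dependency-free sketch of chunking followed by two-stage retrieval. It is not the project's code: `chunk_text`, `embed`, `retrieve`, and `rerank` are hypothetical names, the bag-of-words `embed` is a toy stand-in for Sentence Transformers embeddings, the brute-force cosine search stands in for the FAISS index, and the term-overlap scorer stands in for the cross-encoder. Only the control flow (a cheap search over every chunk, then a costlier score on a shortlist) is meant to carry over.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Stage 1: score every chunk by vector similarity (the role FAISS plays)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(query: str, candidates: list[str], top_n: int = 1) -> list[str]:
    """Stage 2: re-score only the shortlist (the role the cross-encoder plays).

    Toy scorer: the fraction of query terms that appear in the chunk.
    """
    terms = set(embed(query))

    def score(chunk: str) -> float:
        return len(terms & set(embed(chunk))) / len(terms) if terms else 0.0

    return sorted(candidates, key=score, reverse=True)[:top_n]


# Usage: build the context that would be passed to the LLM.
chunks = [
    "The model uses a transformer encoder with self-attention.",
    "We evaluate on the SQuAD dataset.",
    "Training ran for ten epochs on a single GPU.",
]
query = "What dataset was used?"
context = "\n".join(rerank(query, retrieve(query, chunks, k=2)))
print(context)
```

Reranking only the shortlist keeps the expensive scorer off the full index, which matters on CPU-only hosts.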
Try asking:
- What is the main contribution of the paper?
- What architecture does the paper propose?
- What dataset was used?
- How does the model work?
- Frontend: Streamlit
- Backend: Python
- Vector DB: FAISS
- Embeddings: Sentence Transformers
- LLM: FLAN-T5 / TinyLlama (local)
- Reranking: Cross-encoder
git clone https://github.com/your-username/researchlens.git
cd researchlens
pip install -r requirements.txt
python -m src.indexing.build_index
streamlit run app/streamlit_app.py

app/
└── streamlit_app.py
src/
├── data/
├── embeddings/
├── indexing/
├── models/
├── retrieval/
data/
└── papers/
- Lightweight model used for cloud deployment
- Larger models (TinyLlama) used locally for better performance
- Handles CPU-only environments
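
One way to switch between the lightweight cloud model and the larger local one is a small profile lookup, sketched below. Everything here is an assumption for illustration: the `RESEARCHLENS_PROFILE` environment variable, the `ModelConfig` class, the token limits, and the specific model IDs (the project names FLAN-T5 and TinyLlama but not which checkpoints it uses).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    name: str              # Hugging Face model ID (assumed, not confirmed by the project)
    max_new_tokens: int    # upper bound on generated answer length
    device: str = "cpu"    # both profiles target CPU-only hosts


# Hypothetical profiles: a small model for the hosted app, a larger one for local runs.
PROFILES = {
    "cloud": ModelConfig("google/flan-t5-small", max_new_tokens=128),
    "local": ModelConfig("TinyLlama/TinyLlama-1.1B-Chat-v1.0", max_new_tokens=256),
}


def select_model(env=None) -> ModelConfig:
    """Pick a profile from RESEARCHLENS_PROFILE, defaulting to the lightweight one."""
    env = os.environ if env is None else env
    profile = env.get("RESEARCHLENS_PROFILE", "cloud").lower()
    return PROFILES.get(profile, PROFILES["cloud"])


config = select_model()
print(config.name, config.device)
```

Defaulting to the small profile means a misconfigured deployment fails toward lower memory use.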
- Inline citations in answers
- Hybrid search (BM25 + embeddings)
- Multi-document memory
- Improved UI/UX
- Better LLM integration
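
For the planned hybrid search, one common way to merge a BM25 ranking with an embedding ranking is reciprocal rank fusion (RRF). The sketch below is only an illustration of that technique, not a committed design: the function name and chunk IDs are hypothetical, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one.

    Each list contributes 1 / (k + rank) per item, with rank starting at 1.
    Items ranked well by several retrievers accumulate the highest totals.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Usage with two hypothetical rankings over the same chunks.
bm25_ranking = ["c1", "c2", "c3"]
dense_ranking = ["c2", "c3", "c1"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

RRF uses only rank positions, so BM25 scores and cosine similarities do not need to be normalized onto a common scale before fusing.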
- Built a full RAG pipeline from scratch
- Solved real-world deployment issues (CPU, memory limits)
- Designed scalable multi-document architecture
- Balanced model performance vs resource constraints
Ronak Das
Give it a star on GitHub ⭐