Custom Retriever Tutorial

LangChain custom retriever tutorial with text preprocessing and benchmark

Preparation

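Install Ollama and pull the model. Tested with gemma3:27b-it-qat, qwen3:14b, and gemma4:31b on a GPU with 16 GB of memory. The first command below uses Windows syntax (on Linux or macOS, use export instead of set).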

set OLLAMA_CONTEXT_LENGTH=16000
ollama pull gemma3:27b-it-qat
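
Before opening the notebooks, it can help to confirm that the Ollama server is running and that the model was actually pulled. The following is a minimal sketch, not part of the repository. It assumes Ollama listens on its default address (http://localhost:11434) and uses the server's /api/tags endpoint, which lists locally available models. The helper names (model_names, is_model_pulled, fetch_tags) are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload: dict) -> list:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def is_model_pulled(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears among the locally available models."""
    return model in model_names(tags_payload)


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> Optional[dict]:
    """Return the /api/tags payload, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, wrong address, or an unexpected response body.
        return None
```

Calling fetch_tags() and passing the result to is_model_pulled(payload, "gemma3:27b-it-qat") should tell you whether the pull succeeded. A None result suggests the server is not running or is listening on a different address.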

Create a Python environment (Python 3.11 is recommended)

conda create --name custom_retriever python=3.11
conda activate custom_retriever
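
To check that the activated environment really uses the recommended interpreter, a quick sanity check along these lines may be useful. This is a sketch under the assumption that "recommended" means any 3.11.x release. The function name is hypothetical.

```python
import sys

# Version recommended by this README (assumed to mean any 3.11.x release).
RECOMMENDED = (3, 11)


def uses_recommended_python(version_info=sys.version_info) -> bool:
    """Compare the major and minor version against the recommended one."""
    return tuple(version_info[:2]) == RECOMMENDED


print("Python", sys.version.split()[0], "recommended:", uses_recommended_python())
```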

Install the dependencies, along with Jupyter, from requirements.txt

pip install -r requirements.txt

Start Jupyter Notebook

jupyter notebook
