# Simple RAG system

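RAG stands for Retrieval-Augmented Generation. It is one method to reduce hallucinations while focusing an LLM on a specific set of documents: passages relevant to the question are retrieved from those documents and added to the prompt before the LLM generates its answer.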

In this notebook, you will build a very basic RAG, without a proper database or user interface.
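
To make the retrieve-then-generate idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the implementation built later in this notebook. The toy corpus, the word-overlap scoring, the prompt template and the function names (`tokenize`, `retrieve`, `build_prompt`) are all assumptions chosen for illustration. A real RAG would typically rank passages with embeddings and send the final prompt to an LLM.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to the LLM."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


# Toy corpus: hand-written sentences for illustration, not taken from any file.
documents = [
    "The radius of gyration measures how compact a structure is.",
    "Periodic boundary conditions minimize edge effects in a finite system.",
    "Normal mode analysis requires an energy-minimized structure.",
]

question = "What does the radius of gyration measure?"
context = retrieve(question, documents, top_k=1)  # retrieval step
prompt = build_prompt(question, context)          # augmentation step
print(prompt)  # the generation step would pass this prompt to an LLM
```

Word overlap is a deliberately crude relevance score. It keeps the sketch dependency-free, but it cannot match synonyms or paraphrases, which is why practical systems usually rely on vector embeddings instead.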

## Setup

In [20]:
from IPython.display import Markdown, display; display(Markdown("env_instructions.md"))

From university computers, use the Conda environment `ppoulain-llm-24`:

```bash
$ conda activate ppoulain-llm-24
```

You can also try to create this environment on your own computer.

Either with [Miniconda](https://docs.anaconda.com/miniconda/):

```bash
$ mkdir -p llm-practicals
$ cd llm-practicals
$ curl https://raw.githubusercontent.com/pierrepo/llm-practicals/main/content/practical-env.yml --output practical-env.yml
# or wget https://raw.githubusercontent.com/pierrepo/llm-practicals/main/content/practical-env.yml
$ conda env create -f practical-env.yml
$ conda activate ppoulain-llm-24
$ jupyter lab
```

or with [Pixi](https://pixi.sh):

```bash
$ mkdir -p llm-practicals
$ cd llm-practicals
$ curl https://raw.githubusercontent.com/pierrepo/llm-practicals/main/content/practical-env.yml --output practical-env.yml
# or wget https://raw.githubusercontent.com/pierrepo/llm-practicals/main/content/practical-env.yml
$ pixi init --import practical-env.yml
$ pixi run jupyter lab
```

## Data preparation

In this example, you will use the [Gromacs Reference Manual](https://manual.gromacs.org/2024.3/reference-manual/index.html) as the source document for your RAG. Gromacs is a molecular dynamics package mainly designed for simulations of biomolecules.

The Gromacs Reference Manual is available as a set of HTML files. Here is the command I used to download these files and convert them to Markdown. Do not run it yourself:

In [24]:
!python prepare_rag_docs.py

Found 76 urls
Downloading https://manual.gromacs.org/2024.3/reference-manual/algorithms/normal-mode-analysis.html
Saving to normal-mode-analysis.md
Downloading https://manual.gromacs.org/2024.3/reference-manual/analysis/radius-of-gyration.html
Saving to radius-of-gyration.md
Downloading https://manual.gromacs.org/2024.3/reference-manual/analysis/looking-at-trajectory.html
Saving to looking-at-trajectory.md
Downloading https://manual.gromacs.org/2024.3/reference-manual/topologies/force-field-organization.html
Saving to force-field-organization.md
Downloading https://manual.gromacs.org/2024.3/reference-manual/algorithms/algorithms.html
Saving to algorithms.md
Downloading https://manual.gromacs.org/2024.3/reference-manual/algorithms/group-concept.html
Saving to group-concept.md
Downloading https://manual.gromacs.org/2024.3/reference-manual/analysis/correlation-function.html
Saving to correlation-function.md
Downloading https://manual.gromacs.org/2024.3/reference-manual/algorithms/periodic