from [A beginner’s guide to building a Retrieval Augmented Generation (RAG) application from scratch](https://medium.com/@wachambers/a-beginners-guide-to-building-a-retrieval-augmented-generation-rag-application-from-scratch-e52921953a5d)

**The High Level Components of our RAG System**
1. A collection of documents (formally called a corpus)
2. An input from the user
3. A similarity measure between the collection of documents and the user input

In [1]:
import requests
import json
    
corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

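# Jaccard similarity, one of the easiest similarity metrics: size(intersection) / size(union)
# tokens are lowercased and split on single spaces, so punctuation stays attached to words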
def jaccard_similarity(query, document):
    query = query.lower().split(" ")
    document = document.lower().split(" ")
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection)/len(union)

# given a query, return the most similar document
# no AI, just jaccard similarity
def return_response(query, corpus):
    similarities = []
    for doc in corpus:
        similarity = jaccard_similarity(query, doc)
        similarities.append(similarity)
    # index into the corpus that was passed in, not the global list
    return corpus[similarities.index(max(similarities))]

# return every document paired with its score, best match first
def ranked_documents(query, corpus):
    ranked = []
    for doc in corpus:
        similarity = jaccard_similarity(query, doc)
        ranked.append((similarity, doc))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked

In [2]:
return_response("What is a leisure activity that you like?", corpus_of_documents)

'Visit a local museum and discover something new.'

In [3]:
return_response("I like to hike", corpus_of_documents)

'Go for a hike and admire the natural scenery.'

In [4]:
ranked_documents("What is a leisure activity that you like?", corpus_of_documents)

[(0.06666666666666667, 'Visit a local museum and discover something new.'),
 (0.0625, 'Attend a live music concert and feel the rhythm.'),
 (0.0625, 'Go for a hike and admire the natural scenery.'),
 (0.0625, 'Have a picnic with friends and share some laughs.'),
 (0.0625, 'Take a yoga class and stretch your body and mind.'),
 (0.058823529411764705,
  'Explore a new cuisine by dining at an ethnic restaurant.'),
 (0.058823529411764705,
  'Join a local sports league and enjoy some friendly competition.'),
 (0.058823529411764705,
  "Attend a workshop or lecture on a topic you're interested in."),
 (0.05555555555555555,
  'Take a leisurely walk in the park and enjoy the fresh air.'),
 (0.0, 'Visit an amusement park and ride the roller coasters.')]
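
The scores above sit so close together because the query shares at most one token with any document. A minimal sketch for inspecting that overlap, assuming the same lowercase-and-split tokenization as `jaccard_similarity` (the helper name `shared_tokens` is hypothetical and not part of the original notebook):

```python
def shared_tokens(query, document):
    """Return the tokens found in both strings, using the same
    lowercase + split-on-space tokenization as jaccard_similarity."""
    query_tokens = set(query.lower().split(" "))
    document_tokens = set(document.lower().split(" "))
    return sorted(query_tokens & document_tokens)

query = "What is a leisure activity that you like?"
top_doc = "Visit a local museum and discover something new."
last_doc = "Visit an amusement park and ride the roller coasters."

print(shared_tokens(query, top_doc))   # ['a']
print(shared_tokens(query, last_doc))  # []
```

The only shared token is the article "a", so the ranking mostly reflects document length: a shorter document gives a smaller union and therefore a slightly higher score. Note also that "like?" keeps its question mark, so it would never match a plain "like".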

In [5]:
# No semantic analysis so far, just word-overlap statistics, so the negation is ignored
return_response("I don't like to hike", corpus_of_documents)

'Go for a hike and admire the natural scenery.'
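
To see why the negated query still retrieves the hiking document, the two scores can be compared directly. A small sketch that re-declares the notebook's similarity function in compact form so it runs on its own:

```python
def jaccard_similarity(query, document):
    # same tokenization as the notebook: lowercase, split on single spaces
    query = set(query.lower().split(" "))
    document = set(document.lower().split(" "))
    return len(query & document) / len(query | document)

hike_doc = "Go for a hike and admire the natural scenery."

positive = jaccard_similarity("I like to hike", hike_doc)
negative = jaccard_similarity("I don't like to hike", hike_doc)

print(positive)  # 1 shared token / 12 distinct tokens
print(negative)  # 1 shared token / 13 distinct tokens
```

The word "don't" is just one more unmatched token: it enlarges the union and lowers the score slightly, but the hiking document stays on top because no other document shares any token with the query.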

To add semantics we'll need an LLM. We'll use [Ollama](https://ollama.com/). To install it, run the following (you'll need sudo):
```bash
# install ollama
curl -fsSL https://ollama.com/install.sh | sh
# start the server (it keeps running in this terminal)
ollama serve
# in a second terminal, download the llama3 model
ollama pull llama3
```

Once everything is installed, you only need to start the server:

```bash
ollama serve
```
Ollama will then be available at [http://localhost:11434](http://localhost:11434).
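
Before calling the model from Python, it can help to confirm that something is actually listening on that address. A minimal sketch using only the standard library; the helper name `ollama_is_up` is hypothetical, and the check only tells you that an HTTP server answers on the port, not that the `llama3` model has been pulled:

```python
import urllib.request

def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # covers refused connections, timeouts and HTTP error statuses
        # (urllib's URLError and HTTPError are subclasses of OSError)
        return False

print(ollama_is_up())
```

If this prints `False`, check that `ollama serve` is still running before moving on.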

In [12]:
def callllama(user_input, document):
    # prompt template; the retrieved document and the user's text are filled in below
    prompt = """
You are a bot that makes recommendations for activities. You answer in very short sentences and do not include extra information.
This is the recommended activity: {document}
The user input is: {input}
Compile a recommendation to the user based on the recommended activity and the user input.
"""
    url = 'http://localhost:11434/api/generate'
    headers = {'Content-Type': 'application/json'}
    data = { "model": "llama3", "prompt": prompt.format(input=user_input, document=document) }
    response = requests.post(url, data=json.dumps(data), headers=headers, stream=True)
    full_response = []
    try:
        # the answer is streamed as one JSON object per line, each carrying a piece of text
        for line in response.iter_lines():
            # filter out keep-alive new lines
            if line:
                decoded_line = json.loads(line.decode('utf-8'))
                full_response.append(decoded_line['response'])
    finally:
        response.close()
    return ''.join(full_response)

def simpleRag(user_input, corpus):
    # retrieve the best-matching document, then let the model phrase the answer
    document = return_response(user_input, corpus)
    return callllama(user_input, document)

simpleRag("I don't like to hike", corpus_of_documents)

"Sorry to hear that you don't enjoy hiking. Here's a new recommendation for you:\n\nExplore a local museum or art gallery instead."
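
`callllama` relies on the server streaming its answer as newline-delimited JSON, one object per line with a `response` field. The sketch below isolates that decoding step so it can be tried without a running server. The helper name `join_stream` is hypothetical, and the sample bytes are made up for illustration in the shape the loop in `callllama` expects; the `done` flag is an assumption about the final chunk.

```python
import json

def join_stream(lines):
    """Concatenate the 'response' fields of newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        if not line:  # skip keep-alive blank lines
            continue
        chunk = json.loads(line.decode("utf-8"))
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # assumed marker for the last chunk
            break
    return "".join(parts)

# made-up sample chunks, not real model output
sample = [
    b'{"model": "llama3", "response": "Try ", "done": false}',
    b'',
    b'{"model": "llama3", "response": "a museum.", "done": false}',
    b'{"model": "llama3", "response": "", "done": true}',
]

print(join_stream(sample))  # Try a museum.
```

Using `chunk.get("response", "")` instead of indexing means a chunk without that field is skipped rather than raising a `KeyError`.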