# RAG from scratch

Tutorial: https://learnbybuilding.ai/tutorials/rag-from-scratch.

RAG paper: https://arxiv.org/abs/2005.11401.

Jaccard index: https://en.wikipedia.org/wiki/Jaccard_index.

ollama: https://ollama.com/.
- Linux setup guide: https://itsfoss.com/ollama-setup-linux/

In [11]:
corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

In [12]:
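def preprocess_query(query):
    # Lowercase and split on single spaces; punctuation stays attached to words.
    return query.lower().split(" ")

def jaccard_similarity(query, document):
    # Jaccard index: size of the token intersection over size of the token union.
    query = preprocess_query(query)
    document = document.lower().split(" ")
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection) / len(union)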

def return_response(user_input, corpus):
    similarities = []
    for doc in corpus:
        similarity = jaccard_similarity(user_input, doc)
        similarities.append(similarity)
    # Index into the corpus that was passed in, not the global list.
    return corpus[similarities.index(max(similarities))]


In [13]:
user_input = "I like to hike"

In [14]:
return_response(user_input, corpus_of_documents)

'Go for a hike and admire the natural scenery.'

In [15]:
user_input = "I don't like to hike"

In [16]:
return_response(user_input, corpus_of_documents)

'Go for a hike and admire the natural scenery.'
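
Both queries return the same document because Jaccard similarity only counts shared tokens, so negation is invisible to it. The sketch below recomputes the two scores with the same lowercase, split-on-space tokenization used above; `jaccard` and `hike_doc` are names introduced here for illustration only.

```python
def jaccard(query, document):
    # Same tokenization as the cells above: lowercase, split on single spaces.
    q = set(query.lower().split(" "))
    d = set(document.lower().split(" "))
    return len(q & d) / len(q | d)

hike_doc = "Go for a hike and admire the natural scenery."

score_like = jaccard("I like to hike", hike_doc)
score_dislike = jaccard("I don't like to hike", hike_doc)

print(round(score_like, 4))     # 1 shared token out of 12 distinct tokens
print(round(score_dislike, 4))  # 1 shared token out of 13 distinct tokens
```

The only shared token in both cases is `hike`. Adding `don't` enlarges the union slightly, but no other document in the corpus shares any token with either query, so the hike document still ranks first.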

## Using `ollama`

Installing `ollama`:

```
curl -fsSL https://ollama.com/install.sh | sh
```

Running `ollama`:

```
ollama serve
```

Pulling `tinyllama`:

```
ollama pull tinyllama
```
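
Before sending prompts, it can help to confirm that the server is running and that the model was pulled. This is a minimal sketch, assuming Ollama's default port 11434 and that its `/api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` field. The helper names `list_local_models` and `has_model` are hypothetical, and the sketch uses only the standard library.

```python
import json
import urllib.request

def list_local_models(base_url="http://localhost:11434"):
    # Ask the local server which models it has (assumed endpoint: /api/tags).
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]

def has_model(model_names, wanted):
    # Names usually carry a tag such as "tinyllama:latest", so compare the part before ":".
    return any(name.split(":")[0] == wanted for name in model_names)
```

Calling `has_model(list_local_models(), "tinyllama")` should return `True` once `ollama pull tinyllama` has finished. If the server is not running, `urlopen` raises `urllib.error.URLError`.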


In [17]:
import requests
import json

In [18]:
user_input = "I like to hike"
relevant_document = return_response(user_input, corpus_of_documents)
full_response = []

In [19]:
prompt = """
You are a bot that makes recommendations for activities. You answer in very short sentences and do not include extra information.
This is the recommended activity: {relevant_document}
The user input is: {user_input}
Compile a recommendation to the user based on the recommended activity and the user input.
"""

In [None]:
full_response = []
url = 'http://localhost:11434/api/generate'
model = 'tinyllama'
data = {
    "model": model,
    "prompt": prompt.format(user_input=user_input, relevant_document=relevant_document)
}

headers = {'Content-Type': 'application/json'}
response = requests.post(url, data=json.dumps(data), headers=headers, stream=True)
try:
    for line in response.iter_lines():
        # Skip keep-alive blank lines; every other line is one JSON object.
        if line:
            decoded_line = json.loads(line.decode('utf-8'))
            # print(decoded_line['response'])  # uncomment to print tokens as they arrive
            full_response.append(decoded_line['response'])
finally:
    response.close()

full_response = ''.join(full_response)

In [22]:
print(full_response)

As per your input, here's a potential response:

Welcome to our AI-powered personal assistant! We have heard that you enjoy hiking, so we suggest taking a day trip to one of our recommended natural scene locations in the nearby area. Our suggestion for today is for you to embark on a 3-hour hike along the scenic path with stunning views of the rolling hills, streams and wildflowers. The trail is approximately 2 miles long, so it's perfect for a leisurely walk at your own pace. We hope this suggestion helps you to appreciate nature in all its glory!

In case you prefer to take a more guided route, we can provide some tips on how to access the natural park, where the hike will start from. There are several parking lots nearby that you can choose from, but our recommendation is for you to use the one with the best views and facilities. It's located just 15 minutes away from downtown. You'll find it on your left as you approach the town square. If you prefer to walk there yourself, it will
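
The streaming loop above relies on the server sending one JSON object per line, each carrying a fragment of the answer in its `response` field. The sketch below isolates that parsing step so it can be tried without a running server. `join_stream` and the sample lines are made up for illustration, and the `done` field is assumed to mark the final chunk.

```python
import json

def join_stream(lines):
    # Concatenate the "response" fragments from newline-delimited JSON chunks.
    parts = []
    for line in lines:
        if not line:  # skip keep-alive blank lines
            continue
        chunk = json.loads(line.decode("utf-8"))
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # assumed end-of-stream marker
            break
    return "".join(parts)

# Hand-written sample chunks, not real model output.
sample_lines = [
    b'{"response": "Go ", "done": false}',
    b"",
    b'{"response": "hiking.", "done": false}',
    b'{"response": "", "done": true}',
]
print(join_stream(sample_lines))
```

In the notebook, the same function could be fed `response.iter_lines()` in place of `sample_lines`.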

In [32]:
def generate_response(user_input, relevant_document):
    full_response = []
    url = 'http://localhost:11434/api/generate'
    data = {
        "model": "tinyllama",
        "prompt": prompt.format(user_input=user_input, relevant_document=relevant_document)
    }
    headers = {'Content-Type': 'application/json'}
    response = requests.post(url, data=json.dumps(data), headers=headers, stream=True)
    try:
        for line in response.iter_lines():
            # Skip keep-alive blank lines.
            if line:
                decoded_line = json.loads(line.decode('utf-8'))
                # print(decoded_line['response'])  # uncomment to print results token by token
                full_response.append(decoded_line['response'])
    finally:
        response.close()

    return ''.join(full_response)


In [33]:
user_input = "I don't like hiking"
full_response = generate_response(
    user_input=user_input,
    relevant_document=return_response(user_input, corpus_of_documents)
)
print(full_response)

Based on the user input "I don't like hiking," the recommended activity for the bot is not to hike. Here's the output:

Activity suggestion: Take a leisurely walk in the park instead! Enjoy the fresh air and beautiful scenery at your own pace.
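
Retrieval and generation can be wrapped into a single call. This is a minimal sketch, assuming a `generate` callable with the same `(user_input, relevant_document)` signature as `generate_response` above. The names `retrieve`, `rag_answer`, and `echo_generate` are hypothetical, and the stand-in generator lets the pipeline run without a model server.

```python
def retrieve(user_input, corpus):
    # Return the document with the highest Jaccard similarity to the input.
    query = set(user_input.lower().split(" "))

    def score(doc):
        tokens = set(doc.lower().split(" "))
        return len(query & tokens) / len(query | tokens)

    return max(corpus, key=score)

def rag_answer(user_input, corpus, generate):
    # Retrieve first, then hand both the input and the document to the generator.
    relevant_document = retrieve(user_input, corpus)
    return generate(user_input=user_input, relevant_document=relevant_document)

def echo_generate(user_input, relevant_document):
    # Stand-in for a model call: just report what the model would be given.
    return f"input={user_input!r} document={relevant_document!r}"

small_corpus = [
    "Visit a local museum and discover something new.",
    "Go for a hike and admire the natural scenery.",
]
print(rag_answer("I like to hike", small_corpus, echo_generate))
```

Passing `generate_response` as `generate` would route the same call through the local model.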
