### About
### Build an LLM-powered RAG using Elasticsearch with vectors.

### Import necessary libraries and packages

In [1]:
%pip install openai -qq

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install elasticsearch -qq 



In [3]:
%pip install --upgrade ipywidgets -qq




In [4]:
%pip install sentence_transformers -qq



In [22]:
import pandas as pd
import json
import elasticsearch
from elasticsearch import Elasticsearch
from tqdm.notebook import tqdm, tqdm_notebook 

import os

# Disable tokenizers parallelism
os.environ["TOKENIZERS_PARALLELISM"] = "false"

### Get data

In [2]:
data = 'https://raw.githubusercontent.com/hariprasath-v/Nnet101_Assistant/refs/heads/main/data/Stackoverflow_data(neural_networks_stats)_pre_processed_Gemini_LLM.csv'
data = pd.read_csv(data)
data.head()

Unnamed: 0,question,q_link,tags,q_question_id,q_is_answered,q_accepted_answer_id,q_view_count,q_answer_count,q_score,q_last_activity_date,q_creation_date,a_score,a_creation_date,a_answer,answer
0,How to choose the number of hidden layers and ...,https://stats.stackexchange.com/questions/181/...,model-selection|neural-networks,181,True,1097,1145801,10,820,1661947755,1279584902,671,1280715630,"I realize this question has been answered, but...",**Network Configuration in Neural Networks**\n...
1,What should I do when my neural network doesn&...,https://stats.stackexchange.com/questions/3520...,neural-networks|faq,352036,True,352037,365434,9,368,1701358003,1529367960,455,1529367960,1. Verify that your code is bug free\nThere's...,**Key Considerations for Neural Network Develo...
2,"What exactly are keys, queries, and values in ...",https://stats.stackexchange.com/questions/4219...,neural-networks|natural-language|attention|mac...,421935,True,424127,261109,11,309,1708928023,1565686855,281,1567068576,The key/value/query formulation of attention i...,In the key/value/query formulation of attentio...
3,What is batch size in neural network?,https://stats.stackexchange.com/questions/1535...,neural-networks|python|terminology|keras,153531,True,153535,731148,6,305,1650529048,1432286121,421,1432288067,The batch size defines the number of samples t...,**Summary**\n\n**Batch Size**\n\nBatch size de...
4,What are the advantages of ReLU over sigmoid f...,https://stats.stackexchange.com/questions/1262...,machine-learning|neural-networks|sigmoid-curve...,126238,True,126362,290897,9,234,1723495231,1417486429,205,1417567286,Two additional major benefits of ReLUs are spa...,**Summary:**\n\nRectified Linear Units (ReLUs)...


In [3]:
data_dict = data[['question','tags','answer']].to_dict(orient='records')


### Elasticsearch setup

In [26]:
es_client = Elasticsearch('http://localhost:9200/', request_timeout=60) 

In [27]:
!curl localhost:9200

{
  "name" : "193089df32da",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "sspWQrg0T3yNpltGPAskWA",
  "version" : {
    "number" : "8.4.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "42f05b9372a9a4a470db3b52817899b99a76ee73",
    "build_date" : "2022-10-04T07:17:24.662462378Z",
    "build_snapshot" : false,
    "lucene_version" : "9.3.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}


### Creating embeddings using Sentence Transformers

In [10]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")



### Let's embed a sentence and see the dimension

In [12]:
len(model.encode("This is a simple sentence",show_progress_bar=False))

384

In [13]:
data_dict[0]

{'question': 'How to choose the number of hidden layers and nodes in a feedforward neural network?',
 'tags': 'model-selection|neural-networks',
 'answer': "**Network Configuration in Neural Networks**\n\n**Standardization**\nThere is no single standardized method for configuring networks. However, guidelines exist for setting the number and type of network layers, as well as the number of neurons in each layer.\n\n**Initial Architecture Setup**\nBy following specific rules, one can establish a competent network architecture. This involves determining the number and type of neuronal layers and the number of neurons within each layer. This approach provides a foundational architecture but may not be optimal.\n\n**Iterative Tuning**\nOnce the network is initialized, its configuration can be iteratively tuned during training. Ancillary algorithms, such as pruning, can be used to eliminate unnecessary nodes, optimizing the network's size and performance.\n\n**Network Layer Types and Sizing

### Let's create embeddings for the answers

In [14]:
len(data_dict)

500

In [15]:
# Create a dense vector for each answer using the pre-trained model
operations = []
for doc in tqdm(data_dict):
    # Transforming the answer into an embedding using the model
    doc["answer_vector"] = model.encode(doc["answer"],show_progress_bar=False,normalize_embeddings=True).tolist()
    operations.append(doc)


In [16]:
for k,v in operations[1].items():
    print(f"{k}: {len(v)}")

question: 58
tags: 19
answer: 1699
answer_vector: 384
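
Because the answers were encoded with `normalize_embeddings=True`, each stored vector has unit length, so cosine similarity between two such vectors reduces to a plain dot product. The following is a minimal sketch of that relationship using toy 3-dimensional vectors; the values and helper names are illustrative and do not come from the model.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (the effect of normalize_embeddings=True)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors (made up for illustration)
a = [1.0, 2.0, 2.0]
b = [2.0, 0.0, 1.0]

a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# Both lines print the same value: cosine of the raw vectors
# equals the dot product of their normalized versions
print(round(cosine(a, b), 6))
print(round(dot(a_unit, b_unit), 6))
```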


### Create the index and add the documents to Elasticsearch

In [28]:
model_dimension=384

index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "question": {"type": "text"},
            "answer": {"type": "text"},
            "tags": {"type": "keyword"},
            "answer_vector": {"type": "dense_vector", "dims": model_dimension, "index": True, "similarity": "cosine"},
        }
    }
}

index_name = "nnet101"

es_client.indices.delete(index=index_name, ignore_unavailable=True)
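# Pass settings and mappings as keyword arguments; `body` is deprecated in the 8.x client
es_client.indices.create(index=index_name, settings=index_settings["settings"], mappings=index_settings["mappings"])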

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'nnet101'})

In [30]:
for doc in tqdm_notebook(data_dict):
    es_client.index(index=index_name, document=doc)
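
Indexing one document per request is fine for 500 rows. For larger datasets, the Python client's `elasticsearch.helpers.bulk` helper can batch the writes. The sketch below assumes each document is a plain dict shaped like the entries in `data_dict`; `build_bulk_actions` and `index_in_bulk` are hypothetical helper names, and the toy documents are made up.

```python
def build_bulk_actions(docs, index_name, vector_field="answer_vector", dims=384):
    """Yield one bulk action per document, checking the vector length first."""
    for doc in docs:
        if len(doc[vector_field]) != dims:
            raise ValueError(f"expected {dims} dims, got {len(doc[vector_field])}")
        yield {"_index": index_name, "_source": doc}

def index_in_bulk(client, docs, index_name):
    """Send all documents in batches; needs a running Elasticsearch instance."""
    # Imported lazily so the sketch loads even without the client installed
    from elasticsearch import helpers
    return helpers.bulk(client, build_bulk_actions(docs, index_name))

# Toy documents with 3-dimensional vectors (illustrative only)
toy_docs = [
    {"question": "q1", "tags": "t1", "answer": "a1", "answer_vector": [0.1, 0.2, 0.3]},
    {"question": "q2", "tags": "t2", "answer": "a2", "answer_vector": [0.4, 0.5, 0.6]},
]

actions = list(build_bulk_actions(toy_docs, "nnet101", dims=3))
print(len(actions), actions[0]["_index"])
```

Checking the vector length before sending catches a mismatch with the mapping's `dims` early, instead of failing partway through the indexing run.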


### Create query and embed it

In [31]:
search_term = "What is pooling layer?"
vector_search_term = model.encode(search_term,show_progress_bar=False)

In [32]:
vector_search_term.shape

(384,)

### Search

In [33]:
query = {
    "field": "answer_vector",
    "query_vector": vector_search_term,
    "k": 5,
    "num_candidates": 10000, 
}

In [34]:
res = es_client.search(index=index_name, knn=query, source=["answer", "tags", "question"])
res["hits"]["hits"]

[{'_index': 'nnet101',
  '_id': 'HzieZZIBNZ0du4-0j3JE',
  '_score': 0.76976824,
  '_source': {'question': 'How to calculate output shape in 3D convolution',
   'answer': '**Convolution Layer Summary:**\n\nThe convolution formula determines the output size of a convolution layer. It considers the input size, receptive field (kernel) size, stride, and zero padding. In the example, with $W=40$, $F=3$, $S=1$, and $P=0$, the output size is $(38, 62, 62, 8)$.\n\n**Pooling Layer Summary:**\n\nPooling layers reduce spatial dimensions. By default, they halve each dimension with a receptive field of $(2, 2, 2)$ and a stride of $(2, 2, 2)$. However, if the stride is set to $(1, 1, 1)$, each dimension is reduced by 1 instead. For instance, the tensor $(38, 62, 62, 8)$ would become $(19, 31, 31, 8)$ with a stride of $(2, 2, 2)$ and $(37, 61, 61, 8)$ with a stride of $(1, 1, 1)$.',
   'tags': 'machine-learning|neural-networks|convolutional-neural-network'}},
 {'_index': 'nnet101',
  '_id': '8jieZZIB

In [35]:
data_dict[0]['tags']

'model-selection|neural-networks'

### Elasticsearch knn similarity

In [36]:
def knn_search(search_query):
    # Embed the query with the same model used for the documents
    vector_search_term = model.encode(search_query, show_progress_bar=False)

    knn_query = {
        "field": "answer_vector",
        "query_vector": vector_search_term,
        "k": 5,
        "num_candidates": 10000
    }

    response = es_client.search(
        index=index_name,
        knn=knn_query,
        size=5
    )

    # Keep only the top hit: merge its metadata (_index, _id, _score)
    # with its source fields, dropping the bulky embedding
    top_hit = response["hits"]["hits"][0]
    result1 = {k: v for k, v in top_hit.items() if k != "_source"}
    result2 = {k: v for k, v in top_hit["_source"].items() if k != "answer_vector"}
    final_result = {**result1, **result2}
    return final_result
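
`knn_search` returns only the best-ranked hit even though the query asks for `k = 5`. If all retrieved hits are wanted as context, the response can be flattened hit by hit. This is a sketch under the assumption that the response has the usual `hits.hits` shape; `format_hits` is a hypothetical helper, and the response below is hand-written rather than real Elasticsearch output.

```python
def format_hits(response, drop_fields=("answer_vector",)):
    """Flatten every hit into one dict of metadata plus source fields."""
    results = []
    for hit in response["hits"]["hits"]:
        # Metadata such as _index, _id and _score
        flat = {k: v for k, v in hit.items() if k != "_source"}
        # Source fields, minus the bulky embedding
        flat.update({k: v for k, v in hit["_source"].items() if k not in drop_fields})
        results.append(flat)
    return results

# Hand-written response in the same shape as es_client.search output (values are made up)
fake_response = {"hits": {"hits": [
    {"_index": "nnet101", "_id": "a", "_score": 0.9,
     "_source": {"question": "q1", "tags": "t1", "answer": "a1", "answer_vector": [0.1]}},
    {"_index": "nnet101", "_id": "b", "_score": 0.8,
     "_source": {"question": "q2", "tags": "t2", "answer": "a2", "answer_vector": [0.2]}},
]}}

for hit in format_hits(fake_response):
    print(hit["_score"], hit["question"])
```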

### Sample Elasticsearch search

In [37]:
knn_search(search_query='what is pooling layer?')

{'_index': 'nnet101',
 '_id': 'HzieZZIBNZ0du4-0j3JE',
 '_score': 0.76976824,
 'question': 'How to calculate output shape in 3D convolution',
 'tags': 'machine-learning|neural-networks|convolutional-neural-network',
 'answer': '**Convolution Layer Summary:**\n\nThe convolution formula determines the output size of a convolution layer. It considers the input size, receptive field (kernel) size, stride, and zero padding. In the example, with $W=40$, $F=3$, $S=1$, and $P=0$, the output size is $(38, 62, 62, 8)$.\n\n**Pooling Layer Summary:**\n\nPooling layers reduce spatial dimensions. By default, they halve each dimension with a receptive field of $(2, 2, 2)$ and a stride of $(2, 2, 2)$. However, if the stride is set to $(1, 1, 1)$, each dimension is reduced by 1 instead. For instance, the tensor $(38, 62, 62, 8)$ would become $(19, 31, 31, 8)$ with a stride of $(2, 2, 2)$ and $(37, 61, 61, 8)$ with a stride of $(1, 1, 1)$.'}
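
With `"similarity": "cosine"` in the mapping, Elasticsearch's `dense_vector` documentation describes the kNN `_score` as `(1 + cosine) / 2`, which keeps scores non-negative. The raw cosine similarity can be recovered by inverting that formula; `score_to_cosine` is a hypothetical helper name used only for this sketch.

```python
def score_to_cosine(score):
    """Invert Elasticsearch's cosine scoring: _score = (1 + cosine) / 2."""
    return 2.0 * score - 1.0

top_score = 0.76976824  # _score of the top hit shown above
print(round(score_to_cosine(top_score), 4))  # 0.5395
```

A cosine of about 0.54 suggests the top answer is related to the query but not a close match, which fits a retrieved document that covers pooling only as part of a convolution question.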

### Ollama's OpenAI-compatible API endpoint

In [38]:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',
)


### Functions to create a prompt from the retrieved results and the user query

In [39]:
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

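    # knn_search returns a single hit as a dict; wrap it so the loop
    # below iterates over documents rather than over the dict's keys
    if isinstance(search_results, dict):
        search_results = [search_results]

    context = ""

    for doc in search_results:
        context = context + f"tags: {doc['tags']}\nquestion: {doc['question']}\nanswer: {doc['answer']}\n\n"

    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt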

def llm(prompt):
    response = client.chat.completions.create(
        model="gemma:2b",
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content
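
Small local models have limited context windows, so it can help to cap how much retrieved text goes into the prompt. The sketch below builds the context string in the same `tags / question / answer` layout and stops at a character budget. `format_context` is a hypothetical helper, the 2000-character default is an arbitrary assumption, and the documents are made up.

```python
def format_context(docs, max_chars=2000):
    """Join retrieved documents into one context string, stopping at a size budget."""
    parts = []
    used = 0
    for doc in docs:
        entry = f"tags: {doc['tags']}\nquestion: {doc['question']}\nanswer: {doc['answer']}"
        # Always keep the first (best-ranked) document, even if it is over budget
        if parts and used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)

# Toy retrieved documents (illustrative only)
docs = [
    {"tags": "cnn", "question": "What is pooling?", "answer": "It downsamples feature maps."},
    {"tags": "cnn", "question": "What is stride?", "answer": "The step size of the filter."},
]

print(format_context(docs))
```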

### RAG
#### The function retrieves the Elasticsearch kNN search results for the user's query, builds a prompt from them, and passes it to the LLM to generate the final response.


In [40]:
def rag(query):
    search_results = knn_search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer
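
The retrieve, build-prompt, and generate steps can be exercised without Elasticsearch or Ollama running by passing each stage in as an argument. This is a sketch only: `rag_pipeline` and the three stand-in functions are hypothetical and take the place of `knn_search`, `build_prompt`, and `llm`.

```python
def rag_pipeline(query, retrieve, build, generate):
    """Run retrieve -> build prompt -> generate, with each step passed in."""
    search_results = retrieve(query)
    prompt = build(query, search_results)
    return generate(prompt)

# Stand-ins for knn_search, build_prompt and llm (all made up for this sketch)
def fake_retrieve(query):
    return [{"tags": "cnn", "question": "What is pooling?", "answer": "It downsamples feature maps."}]

def fake_build(query, docs):
    context = "\n".join(doc["answer"] for doc in docs)
    return f"QUESTION: {query}\nCONTEXT: {context}"

def fake_generate(prompt):
    # Echo the prompt so the wiring between the stages is visible
    return f"echo -> {prompt}"

print(rag_pipeline("what is pooling layer?", fake_retrieve, fake_build, fake_generate))
```

Swapping the stand-ins for the real functions gives the same flow as `rag`, while the stubbed version makes it easy to check that the query and retrieved context actually reach the model.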

In [41]:
%%time
print(llm('what is pooling layer?'))

Sure. Here's a summary  of the term POOKING LAYER:

**Pooling layer** is a layer in a neural network that reduces the dimensionality of the input data by taking a subset of the input features and using them to represent the entire input. 

 **Key characteristics of a pooling layer:**

* Uses a specific method to extract features from the input.
* Is typically followed by a fully connected layer.
* Reduces the computational complexity of the model and speeds up training.

**Some common pooling layers include:**

* **Max pooling:** Takes the maximum value from each input feature and uses it as the output.
* **Average pooling:** Takes the average of each input feature and uses it as the output.
* **Average pooling with weight factor:** The average value of each feature is weighted by a coefficient before being used as the output. 

**Benefits of using a pooling layer:**

* Reduce overfitting by sharing information from multiple features across all neurons in the layer.
* Improve the perfo