### About
### Build an LLM-powered RAG using Elasticsearch with vectors.

### Import necessary libraries and packages

In [1]:
%pip install openai -qq

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install elasticsearch -qq 

Note: you may need to restart the kernel to use updated packages.


In [3]:
%pip install --upgrade ipywidgets -qq


Note: you may need to restart the kernel to use updated packages.


In [4]:
%pip install sentence_transformers -qq

Note: you may need to restart the kernel to use updated packages.


In [1]:
import pandas as pd
import ast
import json
import elasticsearch
from elasticsearch import Elasticsearch
from tqdm.notebook import tqdm, tqdm_notebook 

### Get data

In [2]:
data = 'https://raw.githubusercontent.com/hariprasath-v/Nnet101_Assistant/refs/heads/main/data/Stackoverflow_data(neural_networks_stats)_pre_processed_Gemini_LLM.csv'
data = pd.read_csv(data)
data.head()

Unnamed: 0,q_title,q_link,q_tags,q_question_id,q_is_answered,q_accepted_answer_id,q_view_count,q_answer_count,q_score,q_last_activity_date,q_creation_date,a_score,a_creation_date,a_answer,llm_answer_summary
0,How to choose the number of hidden layers and ...,https://stats.stackexchange.com/questions/181/...,"['model-selection', 'neural-networks']",181,True,1097,1145532,10,820,1661947755,1279584902,671,1280715630,"I realize this question has been answered, but...",**Network Configuration in Neural Networks**\n...
1,What should I do when my neural network doesn&...,https://stats.stackexchange.com/questions/3520...,"['neural-networks', 'faq']",352036,True,352037,365347,9,368,1701358003,1529367960,455,1529367960,1. Verify that your code is bug free\nThere's...,**Summary:**\n\nBuilding neural networks requi...
2,"What exactly are keys, queries, and values in ...",https://stats.stackexchange.com/questions/4219...,"['neural-networks', 'natural-language', 'atten...",421935,True,424127,260772,11,309,1708928023,1565686855,281,1567068576,The key/value/query formulation of attention i...,Attention is a retrieval process that involves...
3,What is batch size in neural network?,https://stats.stackexchange.com/questions/1535...,"['neural-networks', 'python', 'terminology', '...",153531,True,153535,730947,6,305,1650529048,1432286121,421,1432288067,The batch size defines the number of samples t...,**Batch Size: Optimization in Deep Learning**\...
4,What are the advantages of ReLU over sigmoid f...,https://stats.stackexchange.com/questions/1262...,"['machine-learning', 'neural-networks', 'sigmo...",126238,True,126362,290838,9,234,1723495231,1417486429,205,1417567286,Two additional major benefits of ReLUs are spa...,"ReLU (Rectified Linear Unit) functions, define..."


### Data pre-processing
#### Combine tags with a pipe separator

In [7]:
ast.literal_eval(data['q_tags'][0])

['model-selection', 'neural-networks']

In [3]:
data['q_tags'] = data['q_tags'].apply(lambda x: "|".join(i.strip() for i in ast.literal_eval(x)))
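
The tag-joining step can be checked on a single value. This is a minimal sketch, assuming the column holds string representations of Python lists as in the output above; `raw_tags` is a made-up sample value.

```python
import ast

# Hypothetical sample value in the same format as the q_tags column
raw_tags = "['model-selection', 'neural-networks']"

# Parse the string into a real list, then join the tags with a pipe
tag_list = ast.literal_eval(raw_tags)
joined = "|".join(tag.strip() for tag in tag_list)
print(joined)

# Splitting on the pipe recovers the original list
recovered = joined.split("|")
print(recovered)
```

Keeping the tags in one pipe-separated string lets the column be stored as a single `keyword` value, and the list can be recovered later with `split("|")`.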

#### Column rename

In [4]:
data_1 = data[['q_title','q_tags','llm_answer_summary']].rename(columns={'q_title':'question','q_tags':'tags','llm_answer_summary':'answer'}).to_dict(orient='records')

### Elasticsearch setup

In [10]:
es_client = Elasticsearch('http://localhost:9200/', request_timeout=60) 

In [12]:
!curl localhost:9200

{
  "name" : "b1e0dee937f4",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "VR6OBcGxS8yGbVNXfQ3enQ",
  "version" : {
    "number" : "8.4.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "42f05b9372a9a4a470db3b52817899b99a76ee73",
    "build_date" : "2022-10-04T07:17:24.662462378Z",
    "build_snapshot" : false,
    "lucene_version" : "9.3.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}


### Create embeddings using Sentence Transformers

In [13]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")


### Let's embed a sentence and see the dimension

In [14]:
len(model.encode("This is a simple sentence",show_progress_bar=False))

384

In [15]:
data_1[0]

{'question': 'How to choose the number of hidden layers and nodes in a feedforward neural network?',
 'tags': 'model-selection|neural-networks',
 'answer': '**Network Configuration in Neural Networks**\n\nNeural networks require network configuration, which involves determining the number and types of layers and the number of neurons within each layer.\n\n**Standard Method:**\n\n* Initialize a competent network architecture using a set of rules that determine the number and size of input, hidden, and output layers.\n\n**Optimization:**\n\n* Once initialized, the network configuration can be iteratively tuned during training using pruning techniques.\n* Pruning eliminates unnecessary nodes based on their low weight values.\n\n**Layer Configuration:**\n\n* Input layer: Number of neurons determined by the number of features in the training data.\n* Output layer: Number of neurons determined by the model configuration (classifier vs. regressor).\n* Hidden layers: Typically one hidden layer

### Let's create embeddings for the answers

In [16]:
len(data_1)

100

In [17]:
# Create a dense vector for each answer using the pre-trained model
operations = []
for doc in tqdm(data_1):
    # Encode the answer text into a unit-normalized embedding
    doc["answer_vector"] = model.encode(doc["answer"], show_progress_bar=False, normalize_embeddings=True).tolist()
    operations.append(doc)

  0%|          | 0/100 [00:00<?, ?it/s]

In [18]:
for k,v in operations[1].items():
    print(f"{k}: {len(v)}")

question: 58
tags: 19
answer: 1790
answer_vector: 384


### Create the index and add documents to Elasticsearch

In [None]:
model_dimension=384

index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "question": {"type": "text"},
            "answer": {"type": "text"},
            "tags": {"type": "keyword"},
            "answer_vector": {"type": "dense_vector", "dims": model_dimension, "index": True, "similarity": "cosine"},
        }
    }
}

index_name = "nnet101"

es_client.indices.delete(index=index_name, ignore_unavailable=True)
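# Pass settings and mappings as keyword arguments; the 8.x Python client deprecates the catch-all `body` parameter in some releases
es_client.indices.create(index=index_name, settings=index_settings["settings"], mappings=index_settings["mappings"])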

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'nnet101'})

In [20]:
for doc in tqdm_notebook(data_1):
    es_client.index(index=index_name, document=doc)

  0%|          | 0/100 [00:00<?, ?it/s]
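
Indexing one document per request is fine for 100 documents. For larger collections, the Python client's `elasticsearch.helpers.bulk` helper batches the requests. The following is a hedged sketch: `make_actions` and `bulk_index` are hypothetical helper names, and the client import is deferred so the action generator can be tried without a running cluster.

```python
def make_actions(docs, index_name):
    """Yield one bulk action per document, targeting index_name."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}

def bulk_index(es_client, docs, index_name):
    """Send all documents through the bulk helper (requires a reachable cluster)."""
    from elasticsearch import helpers  # imported lazily; only needed when indexing
    return helpers.bulk(es_client, make_actions(docs, index_name))

# Toy documents shaped like the notebook's records (vectors shortened for display)
sample_docs = [
    {"question": "q1", "tags": "a|b", "answer": "text 1", "answer_vector": [0.1, 0.2]},
    {"question": "q2", "tags": "c", "answer": "text 2", "answer_vector": [0.3, 0.4]},
]
actions = list(make_actions(sample_docs, "nnet101"))
print(len(actions), actions[0]["_index"])
```

With a live client, a call such as `bulk_index(es_client, data_1, index_name)` could stand in for the per-document loop.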

### Create query and embed it

In [21]:
search_term = "What is pooling layer?"
vector_search_term = model.encode(search_term,show_progress_bar=False)

In [22]:
vector_search_term.shape

(384,)

### Search

In [23]:
query = {
    "field": "answer_vector",
    "query_vector": vector_search_term,
    "k": 5,
    "num_candidates": 10000, 
}

In [24]:
res = es_client.search(index=index_name, knn=query, source=["answer", "tags", "question"])
res["hits"]["hits"]

[{'_index': 'nnet101',
  '_id': 'vA5fYZIBY9iTjmQApslC',
  '_score': 0.79374504,
  '_source': {'question': 'What is global max pooling layer and what is its advantage over maxpooling layer?',
   'answer': "Global max pooling is a max pooling operation where the pool size equals the input size. It outputs the maximum value for each feature across the input's temporal dimension. Ordinary max pooling, in contrast, takes a specified pool size and outputs maximum values within that window.\n\nIn Keras, the `GlobalMaxPooling1D` layer performs global max pooling on 1D temporal data. It converts a 3D tensor (samples, steps, features) to a 2D tensor (samples, features).\n\nGlobal max pooling is commonly used in domains like natural language processing, while ordinary max pooling is more prevalent in domains like computer vision.",
   'tags': 'neural-networks|conv-neural-network|pooling'}},
 {'_index': 'nnet101',
  '_id': 'AA5fYZIBY9iTjmQAqMqQ',
  '_score': 0.73397934,
  '_source': {'question': '

In [None]:
data_1[0]['tags']

'model-selection|neural-networks'

### Elasticsearch kNN similarity search

In [40]:
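def knn_search(search_query):
    """Return the top kNN hit for search_query as a flat dict (hit metadata plus source fields)."""
    # Embed the query with the same model used for the documents
    vector_search_term = model.encode(search_query, show_progress_bar=False)

    knn_query = {
        "field": "answer_vector",
        "query_vector": vector_search_term,
        "k": 5,
        "num_candidates": 10000
    }

    response = es_client.search(
        index=index_name,
        knn=knn_query,
        size=5
    )
    # Keep only the best-scoring hit; the remaining hits are discarded
    top_hit = response["hits"]["hits"][0]
    # Hit metadata (_index, _id, _score) without the nested _source
    result1 = {k: v for k, v in top_hit.items() if k != "_source"}
    # Source fields without the 384-dimensional embedding
    result2 = {k: v for k, v in top_hit["_source"].items() if k != "answer_vector"}
    final_result = {**result1, **result2}
    return final_result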
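
`knn_search` keeps only the first hit, although the query asks for five. If all hits are wanted, for example to pass several documents into a prompt, the same flattening can be applied to each hit. Below is a sketch with a hypothetical helper `flatten_hits`, run on a hand-made response dict shaped like the search output shown earlier.

```python
def flatten_hits(response, drop_fields=("answer_vector",)):
    """Flatten every hit into one dict: metadata plus source fields minus drop_fields."""
    results = []
    for hit in response["hits"]["hits"]:
        meta = {k: v for k, v in hit.items() if k != "_source"}
        source = {k: v for k, v in hit["_source"].items() if k not in drop_fields}
        results.append({**meta, **source})
    return results

# Hand-made stand-in for an Elasticsearch response (values are illustrative)
fake_response = {
    "hits": {
        "hits": [
            {"_index": "nnet101", "_id": "1", "_score": 0.79,
             "_source": {"question": "q1", "tags": "pooling", "answer": "a1",
                         "answer_vector": [0.1, 0.2]}},
            {"_index": "nnet101", "_id": "2", "_score": 0.73,
             "_source": {"question": "q2", "tags": "cnn", "answer": "a2",
                         "answer_vector": [0.3, 0.4]}},
        ]
    }
}

docs = flatten_hits(fake_response)
print([d["question"] for d in docs])
```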

### Sample Elasticsearch search

In [41]:
knn_search(search_query='what is pooling layer?')

{'_index': 'nnet101',
 '_id': 'vA5fYZIBY9iTjmQApslC',
 '_score': 0.79374504,
 'question': 'What is global max pooling layer and what is its advantage over maxpooling layer?',
 'tags': 'neural-networks|conv-neural-network|pooling',
 'answer': "Global max pooling is a max pooling operation where the pool size equals the input size. It outputs the maximum value for each feature across the input's temporal dimension. Ordinary max pooling, in contrast, takes a specified pool size and outputs maximum values within that window.\n\nIn Keras, the `GlobalMaxPooling1D` layer performs global max pooling on 1D temporal data. It converts a 3D tensor (samples, steps, features) to a 2D tensor (samples, features).\n\nGlobal max pooling is commonly used in domains like natural language processing, while ordinary max pooling is more prevalent in domains like computer vision."}
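
The `_score` above is not the raw cosine similarity. For a `dense_vector` field with `"similarity": "cosine"`, Elasticsearch documents the kNN score as `(1 + cosine) / 2`, which maps the cosine range [-1, 1] onto [0, 1]. Here is a small sketch of the conversion in both directions; the function names and toy vectors are illustrative, not part of the notebook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_to_score(cosine):
    """Map a cosine in [-1, 1] to the [0, 1] score range."""
    return (1 + cosine) / 2

def score_to_cosine(score):
    """Invert the mapping to recover the cosine from a score."""
    return 2 * score - 1

# Toy 3-dimensional vectors (the notebook's embeddings have 384 dimensions)
query_vec = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 1.0, 0.0]

score_same = cosine_to_score(cosine_similarity(query_vec, same_direction))
score_orth = cosine_to_score(cosine_similarity(query_vec, orthogonal))
print(score_same, score_orth)

# Recover the cosine behind the top hit's _score shown above
top_cosine = score_to_cosine(0.79374504)
print(round(top_cosine, 4))
```

Under this formula, the top hit's score of 0.79374504 corresponds to a cosine similarity of roughly 0.59 between the query and the stored answer embedding.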

### Ollama’s OpenAI-compatible API endpoint

In [42]:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',
)


### Functions to create a prompt from the retrieved results and user query, and to call the LLM

In [43]:
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    context = ""
    
    for doc in search_results:
        context = context + f"tags: {doc['tags']}\nquestion: {doc['question']}\nanswer: {doc['answer']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

def llm(prompt):
    response = client.chat.completions.create(
        model="gemma:2b",
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content
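
To see what the LLM actually receives, `build_prompt` can be called on a hand-made result list without touching Elasticsearch or Ollama. The function below is a condensed copy of the one above so that the snippet runs on its own; the sample document is made up.

```python
def build_prompt(query, search_results):
    """Condensed copy of the notebook's build_prompt, for a standalone demo."""
    prompt_template = (
        "You're a course teaching assistant. Answer the QUESTION based on the "
        "CONTEXT from the FAQ database.\n"
        "Use only the facts from the CONTEXT when answering the QUESTION.\n\n"
        "QUESTION: {question}\n\n"
        "CONTEXT: \n{context}"
    )
    context = ""
    for doc in search_results:
        context += f"tags: {doc['tags']}\nquestion: {doc['question']}\nanswer: {doc['answer']}\n\n"
    return prompt_template.format(question=query, context=context).strip()

# Made-up document; search_results must be a list of dicts, not a single dict
sample_results = [
    {"tags": "pooling",
     "question": "What is max pooling?",
     "answer": "It keeps the maximum of each window."}
]
prompt = build_prompt("what is pooling layer?", sample_results)
print(prompt)
```

Note that the loop iterates over `search_results`, so the function expects a list of documents. Passing a single dict would iterate over its keys and fail on `doc['tags']`.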

### RAG
#### The function retrieves Elasticsearch kNN search results for the user's query, builds a prompt from those results, and feeds it to the LLM to generate the final response.


In [46]:
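def rag(query):
    # knn_search returns only the top hit as a single dict, while build_prompt
    # iterates over a list of documents, so wrap the hit in a list
    search_results = [knn_search(query)]
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer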

In [47]:
%%time
print(llm('what is pooling layer?'))

Sure. Here is the definition of a pooling layer:

**Pooling layer** is a type of neural network layer that reduces the dimensionality of a feature map by taking a subset of the input features and using them to represent the whole feature map. The output of the pooling layer is a single value or vector, which represents the feature map as a whole.

**Here are some of the key characteristics of pooling layers:**

* They operate on feature maps, which are 2D tensors of feature maps.
* They take a subset of the input features and use them to represent the whole feature map.
* The size of the subset can be specified by the pool size parameter.
* Pooling layers can be used as part of a convolutional neural network (CNN).

**Types of Pooling Layers:**

* **Average pooling layer:** The average value of the input features is taken to represent the output value.
* **Max pooling layer:** The maximum value of the input features is taken to represent the output value.
* **Max-pooling layer:** This 