### RAG architecture implementation with a large language model (LLM)
### We will see:

- How to log in to the Hugging Face platform to access an LLM
- LLM without RAG architecture:
  - the response generated by the GPT-2 LLM when asked a question
- LLM with RAG architecture:
  - Loading custom documents
  - Indexing
  - Retrieval
  - Response generation


In [4]:
#importing all relevant libraries
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from transformers import pipeline
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

In [5]:
#accessing the HuggingFace platform using the account token
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) y
Token is valid (permission: read).
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 'store

In [6]:
#loading OpenAI's GPT-2 model for text generation from the Hugging Face Hub
generator_gpt2 = pipeline('text-generation', model='openai-community/gpt2')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [7]:
#loading a SentenceTransformer model (all-MiniLM-L6-v2) from the Hugging Face Hub
#it generates semantically meaningful embeddings for sentences

embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')


### LLM Without RAG:

- We will see the response generated by the GPT-2 LLM when it is asked a question

In [8]:
#our question
question = "What is the Kolmogorov–Arnold theorem?"
#generating a response with GPT-2
response = generator_gpt2(question, max_length=150, num_return_sequences=1)
#printing the generated response
print(response[0]['generated_text'])

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


What is the Kolmogorov–Arnold theorem?

The Kolmogorov–Arnold theorem has two components: the constant-dimensional parameter $\gamma$, and the discrete-dimensional parameter $\log$ from the top down. The $\gamma$ is the Kolmogorov constant-dimensional parameter $\beta$ from the bottom down.

The $log$ is the Kolmogorov discrete-dimensional parameter $\cos$ from the bottom down. If $C<$2$ for $C<$2$, then each "cos" is a $n$ $log$ from the bottom up of the Kolmogorov constant-dimensional parameter $\Gamma$.



### LLM With RAG architecture:

### Loading custom documents

In [9]:
#fetching a webpage to use as the custom document
url = 'https://kindxiaoming.github.io/pykan/intro.html'
response = requests.get(url)
response.raise_for_status()  #stop early if the page could not be fetched
soup = BeautifulSoup(response.content, 'html.parser')

In [10]:
soup

<!DOCTYPE html>

<html class="writer-html5" data-content_root="./" lang="en">
<head>
<meta charset="utf-8"/><meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="width=device-width, initial-scale=1.0" name="viewport"/>
<title>Hello, KAN! — Kolmogorov Arnold Network  documentation</title>
<link href="_static/pygments.css?v=80d5e7a1" rel="stylesheet" type="text/css"/>
<link href="_static/css/theme.css?v=19f00094" rel="stylesheet" type="text/css"/>
<!--[if lt IE 9]>
    <script src="_static/js/html5shiv.min.js"></script>
  <![endif]-->
<script src="_static/jquery.js?v=5d32c60e"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js?v=2cd50e6c"></script>
<script src="_static/documentation_options.js?v=5929fcd5"></script>
<script src="_static/doctools.js?v=9a2dae69"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<script sr

### RAG: Indexing

In [11]:
#extracting text chunks from the webpage:
#collect the stripped text of every paragraph and heading tag (p, h1, h2, h3), skipping empty ones
text_chunks = [tag.get_text().strip() for tag in soup.find_all(['p', 'h1', 'h2', 'h3']) if tag.get_text().strip()]
print(text_chunks)

['Contents:', 'Hello, KAN!\uf0c1', 'Kolmogorov-Arnold representation theorem\uf0c1', 'Kolmogorov-Arnold representation theorem states that if \\(f\\) is a\nmultivariate continuous function on a bounded domain, then it can be\nwritten as a finite composition of continuous functions of a single\nvariable and the binary operation of addition. More specifically, for a\nsmooth \\(f : [0,1]^n \\to \\mathbb{R}\\),', 'where \\(\\phi_{q,p}:[0,1]\\to\\mathbb{R}\\) and\n\\(\\Phi_q:\\mathbb{R}\\to\\mathbb{R}\\). In a sense, they showed that the\nonly true multivariate function is addition, since every other function\ncan be written using univariate functions and sum. However, this 2-Layer\nwidth-\\((2n+1)\\) Kolmogorov-Arnold representation may not be smooth\ndue to its limited expressive power. We augment its expressive power by\ngeneralizing it to arbitrary depths and widths.', 'Kolmogorov-Arnold Network (KAN)\uf0c1', 'The Kolmogorov-Arnold representation can be written in matrix form', 'where',
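The extracted chunks still end each heading with the private-use glyph `\uf0c1` (it appears to be the theme's header-link icon), and some chunks, such as `'where'`, carry little meaning on their own. A minimal sketch of a cleanup pass is shown here; the helper name `clean_chunks` and the `min_words` threshold are assumptions for illustration and are not part of the original notebook.

```python
import re

def clean_chunks(chunks, min_words=3):
    """Strip the header-link glyph, collapse whitespace, and drop very short chunks."""
    cleaned = []
    for chunk in chunks:
        text = chunk.replace("\uf0c1", "")        # remove the private-use glyph after headings
        text = re.sub(r"\s+", " ", text).strip()  # newlines and repeated spaces -> one space
        if len(text.split()) >= min_words:        # hypothetical threshold, tune as needed
            cleaned.append(text)
    return cleaned

# A few chunks copied from the output above.
sample = [
    "Contents:",
    "Hello, KAN!\uf0c1",
    "Kolmogorov-Arnold representation theorem\uf0c1",
    "The Kolmogorov-Arnold representation can be written in matrix form",
    "where",
]
cleaned_sample = clean_chunks(sample)
print(cleaned_sample)
```

Whether short headings should be dropped depends on the use case: here the heading chunk is the best match for the question, so a threshold that is too strict could remove useful context.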

In [12]:
#converting to a dataframe
df_text_chunks = pd.DataFrame(text_chunks)
df_text_chunks

Unnamed: 0,0
0,Contents:
1,"Hello, KAN!"
2,Kolmogorov-Arnold representation theorem
3,Kolmogorov-Arnold representation theorem state...
4,"where \(\phi_{q,p}:[0,1]\to\mathbb{R}\) and\n\..."
5,Kolmogorov-Arnold Network (KAN)
6,The Kolmogorov-Arnold representation can be wr...
7,where
8,We notice that both \({\bf \Phi}_{\rm in}\) an...
9,\({\bf \Phi}_{\rm in}\) corresponds to\n\(n_{\...


In [13]:
#embedding the documents
document_embeddings = embedding_model.encode(text_chunks)
print(document_embeddings)


[[-0.00275739  0.06304079 -0.05931183 ...  0.06667764  0.06027625
   0.01256512]
 [-0.04109296  0.03857326  0.0541905  ...  0.01046866 -0.01462469
   0.0334672 ]
 [-0.03940616  0.0350132   0.01208134 ...  0.07816786  0.04163244
   0.02606278]
 ...
 [ 0.01765317 -0.02745955  0.04844334 ... -0.02118649 -0.07220367
  -0.00615193]
 [ 0.03043775  0.06828739  0.02344436 ...  0.10779697  0.03543585
  -0.02865962]
 [-0.02735136  0.03393628  0.01200789 ... -0.06201553  0.01029439
   0.03309628]]


In [14]:
#Converting to a dataframe
df_document_embeddings = pd.DataFrame(document_embeddings)
df_document_embeddings

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,374,375,376,377,378,379,380,381,382,383
0,-0.002757,0.063041,-0.059312,0.019385,0.027277,0.007352,0.137099,0.016234,-0.058252,0.023517,...,0.001203,0.034491,0.071825,0.019836,-0.010318,-0.028939,0.110656,0.066678,0.060276,0.012565
1,-0.041093,0.038573,0.05419,0.063639,-0.050733,-0.036939,0.078938,0.01859,-0.01866,-0.020631,...,0.061411,0.093497,0.048654,0.009329,-0.111613,0.050245,0.083622,0.010469,-0.014625,0.033467
2,-0.039406,0.035013,0.012081,-0.033871,-0.059951,0.170208,0.133329,-0.032945,0.059443,-0.034391,...,0.017635,0.0307,-0.032536,-0.072904,-0.002501,0.030838,-0.038679,0.078168,0.041632,0.026063
3,-0.023126,-0.031781,-0.033747,-0.026306,-0.120683,0.099896,0.094608,-0.091566,0.037781,0.007668,...,-0.007131,0.046137,-0.053904,-0.107084,0.029521,0.063568,-0.076472,0.014851,0.046806,-0.014047
4,-0.055007,-0.025899,0.041755,-0.024739,-0.069876,0.084725,0.055429,-0.087491,0.047632,-0.01354,...,0.019459,0.02611,-0.065981,-0.078322,-0.015437,0.074297,-0.069304,0.0311,0.022661,-0.017973
5,-0.044464,-0.063702,0.021071,-0.009007,-0.055712,0.064134,0.051541,-0.05016,0.059065,-0.004199,...,0.019671,0.001577,-0.065103,-0.082703,-0.0718,0.084635,-0.043236,0.053172,-0.027561,-0.014514
6,-0.066461,0.026791,-0.084382,-0.017757,-0.096869,0.11252,0.027629,-0.058471,0.074945,0.008914,...,0.017612,0.032998,-0.072126,-0.092195,0.044279,0.063615,0.004497,0.091512,0.054702,0.020347
7,0.007764,0.033559,0.00502,0.049062,0.058596,0.034693,0.062655,-0.011064,-0.032648,-0.000773,...,0.020834,0.01939,-0.017426,-0.077387,-0.08231,0.07878,0.077304,0.053457,-0.024161,0.036373
8,-0.05777,0.042692,-0.014942,-0.013696,-0.026977,0.069489,0.097451,-0.105804,0.107878,-0.021651,...,0.012162,-0.053829,-0.038191,-0.100319,-0.039807,0.066937,-0.033092,0.090231,0.062198,0.005329
9,-0.021719,0.094247,0.131454,0.008899,0.028798,0.029945,0.083097,0.014344,0.053999,-0.064442,...,-0.065408,-0.110118,0.038175,0.110043,-0.092153,0.002138,-0.029129,0.076846,0.036313,0.002641


In [15]:
#joining two dataframes (chunks and associated embeddings)
df_chunks_embeddings = pd.concat([df_text_chunks, df_document_embeddings], axis=1)
df_chunks_embeddings

Unnamed: 0,0,0.1,1,2,3,4,5,6,7,8,...,374,375,376,377,378,379,380,381,382,383
0,Contents:,-0.002757,0.063041,-0.059312,0.019385,0.027277,0.007352,0.137099,0.016234,-0.058252,...,0.001203,0.034491,0.071825,0.019836,-0.010318,-0.028939,0.110656,0.066678,0.060276,0.012565
1,"Hello, KAN!",-0.041093,0.038573,0.05419,0.063639,-0.050733,-0.036939,0.078938,0.01859,-0.01866,...,0.061411,0.093497,0.048654,0.009329,-0.111613,0.050245,0.083622,0.010469,-0.014625,0.033467
2,Kolmogorov-Arnold representation theorem,-0.039406,0.035013,0.012081,-0.033871,-0.059951,0.170208,0.133329,-0.032945,0.059443,...,0.017635,0.0307,-0.032536,-0.072904,-0.002501,0.030838,-0.038679,0.078168,0.041632,0.026063
3,Kolmogorov-Arnold representation theorem state...,-0.023126,-0.031781,-0.033747,-0.026306,-0.120683,0.099896,0.094608,-0.091566,0.037781,...,-0.007131,0.046137,-0.053904,-0.107084,0.029521,0.063568,-0.076472,0.014851,0.046806,-0.014047
4,"where \(\phi_{q,p}:[0,1]\to\mathbb{R}\) and\n\...",-0.055007,-0.025899,0.041755,-0.024739,-0.069876,0.084725,0.055429,-0.087491,0.047632,...,0.019459,0.02611,-0.065981,-0.078322,-0.015437,0.074297,-0.069304,0.0311,0.022661,-0.017973
5,Kolmogorov-Arnold Network (KAN),-0.044464,-0.063702,0.021071,-0.009007,-0.055712,0.064134,0.051541,-0.05016,0.059065,...,0.019671,0.001577,-0.065103,-0.082703,-0.0718,0.084635,-0.043236,0.053172,-0.027561,-0.014514
6,The Kolmogorov-Arnold representation can be wr...,-0.066461,0.026791,-0.084382,-0.017757,-0.096869,0.11252,0.027629,-0.058471,0.074945,...,0.017612,0.032998,-0.072126,-0.092195,0.044279,0.063615,0.004497,0.091512,0.054702,0.020347
7,where,0.007764,0.033559,0.00502,0.049062,0.058596,0.034693,0.062655,-0.011064,-0.032648,...,0.020834,0.01939,-0.017426,-0.077387,-0.08231,0.07878,0.077304,0.053457,-0.024161,0.036373
8,We notice that both \({\bf \Phi}_{\rm in}\) an...,-0.05777,0.042692,-0.014942,-0.013696,-0.026977,0.069489,0.097451,-0.105804,0.107878,...,0.012162,-0.053829,-0.038191,-0.100319,-0.039807,0.066937,-0.033092,0.090231,0.062198,0.005329
9,\({\bf \Phi}_{\rm in}\) corresponds to\n\(n_{\...,-0.021719,0.094247,0.131454,0.008899,0.028798,0.029945,0.083097,0.014344,0.053999,...,-0.065408,-0.110118,0.038175,0.110043,-0.092153,0.002138,-0.029129,0.076846,0.036313,0.002641
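In the joined table, the text column and the first embedding dimension end up with the same label `0` (shown as `0` and `0.1` in the header above), which makes column selection ambiguous. The sketch below names the columns before concatenating; the labels `chunk` and `dim_i` and the small stand-in data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Stand-in data: three chunks with 4-dimensional embeddings
# (the notebook's embeddings are 384-dimensional).
chunks = ["Contents:", "Hello, KAN!", "Kolmogorov-Arnold representation theorem"]
embeddings = np.arange(12, dtype=np.float32).reshape(3, 4)

df_chunks = pd.DataFrame({"chunk": chunks})
df_embeddings = pd.DataFrame(
    embeddings,
    columns=[f"dim_{i}" for i in range(embeddings.shape[1])],
)

# Every column now has a unique, descriptive label.
df_named = pd.concat([df_chunks, df_embeddings], axis=1)
print(df_named.columns.tolist())
```

With unique labels, `df_named["chunk"]` returns the text column unambiguously.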


In [16]:
#reloading our question
question = "What is the Kolmogorov–Arnold theorem?"

In [17]:
#embedding the question
question_embedding = embedding_model.encode([question])[0]
print(question_embedding)

[-4.64987382e-02  2.78894287e-02  1.56010995e-02  8.33944068e-04
 -4.66041751e-02  1.31328046e-01  1.01821005e-01 -2.81248875e-02
  6.48026764e-02  3.30686532e-02 -9.57197417e-03  1.95531789e-02
  3.47561426e-02 -8.57438147e-02 -4.16314714e-02 -6.48677200e-02
 -7.23136365e-02 -3.36058512e-02 -3.01638003e-02  3.32537107e-02
 -4.85202344e-03  3.69902551e-02  4.23169509e-02  3.80324088e-02
 -3.07877380e-02 -7.71738738e-02 -1.40839461e-02  7.46843740e-02
  3.06142773e-02  3.68980765e-02  5.07076569e-02 -1.89930499e-02
  4.97409068e-02 -4.33039702e-02  2.42207535e-02  6.82117492e-02
  6.45791814e-02  1.01521051e-04  2.87257619e-02  8.16831887e-02
  5.42482175e-02  1.02469847e-01 -6.09245524e-02  6.02870621e-02
 -2.56759897e-02  2.54671481e-02 -4.03635092e-02 -8.08733404e-02
 -4.66010422e-02 -2.13084351e-02 -7.68177211e-02  7.43763968e-02
  5.40623777e-02 -5.80513142e-02  2.91304500e-03 -3.40030044e-02
 -6.00115471e-02  4.91062962e-02 -2.35180557e-03 -4.64101620e-02
  8.56888741e-02 -9.71273

### RAG: Retrieval

In [18]:
#retrieve the most relevant documents based on cosine similarity
#Reference for cosine similarity: Lahitani, A. R., Permanasari, A. E., & Setiawan, N. A. (2016, April). Cosine similarity to determine similarity measure: Study case in online essay assessment. In 2016 4th International Conference on Cyber and IT Service Management (pp. 1-6). IEEE.
similarities = cosine_similarity([question_embedding], document_embeddings)[0]
similarities

array([ 0.09782393, -0.00110108,  0.84872293,  0.7125924 ,  0.58014786,
        0.61846733,  0.68963206,  0.07410505,  0.5948779 ,  0.09056571,
        0.58991677,  0.05367207,  0.00092283, -0.04342143, -0.00750676,
        0.05777499,  0.02113   ,  0.05630594,  0.03632673, -0.01431532,
       -0.00376751,  0.04196355,  0.01266492,  0.12433145, -0.02925698,
        0.01019145], dtype=float32)
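To make the scores above easier to interpret: the cosine similarity between a query vector q and a document vector d is q·d / (‖q‖ ‖d‖), so it is 1 for vectors pointing in the same direction, 0 for orthogonal vectors, and -1 for opposite ones. The following is a minimal NumPy sketch of that formula on toy vectors; the function name `cosine_scores` is an assumption, and the sketch does not guard against zero-length vectors.

```python
import numpy as np

def cosine_scores(query, documents):
    """Cosine similarity between one query vector and each row of `documents`."""
    query = np.asarray(query, dtype=np.float64)
    documents = np.asarray(documents, dtype=np.float64)
    dots = documents @ query  # one dot product per document
    norms = np.linalg.norm(documents, axis=1) * np.linalg.norm(query)
    return dots / norms

# Toy 2-D vectors: same direction, orthogonal, and opposite to the query.
toy_query = [1.0, 0.0]
toy_documents = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]
toy_scores = cosine_scores(toy_query, toy_documents)
print(toy_scores)
```

Note that the score depends only on direction, not on vector length: the first toy document is twice as long as the query but still scores 1.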

In [19]:
#getting the indices of the four most similar chunks
top_indices = np.argsort(similarities)[::-1][:4]
top_indices

array([2, 3, 6, 5])
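Taking a fixed top four always returns four chunks, even when some of them are only weakly related to the question. One possible refinement, sketched below, adds a minimum-score cutoff; the function name `top_k_indices` and the `min_score` parameter are hypothetical, and the scores are the first eleven similarity values printed earlier, rounded to three decimals.

```python
import numpy as np

def top_k_indices(scores, k=4, min_score=0.0):
    """Indices of the k highest scores, best first, skipping scores below min_score."""
    scores = np.asarray(scores)
    order = np.argsort(scores)[::-1][:k]  # sort ascending, reverse, keep the first k
    return [int(i) for i in order if scores[i] >= min_score]

scores = [0.098, -0.001, 0.849, 0.713, 0.580, 0.618, 0.690, 0.074, 0.595, 0.091, 0.590]
print(top_k_indices(scores, k=4))                 # [2, 3, 6, 5]
print(top_k_indices(scores, k=4, min_score=0.7))  # [2, 3]
```

A suitable cutoff depends on the embedding model and the documents, so the value 0.7 here is only an illustration.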

In [20]:
#displaying the dataframe to check the chunks at the top four index numbers
df_chunks_embeddings

Unnamed: 0,0,0.1,1,2,3,4,5,6,7,8,...,374,375,376,377,378,379,380,381,382,383
0,Contents:,-0.002757,0.063041,-0.059312,0.019385,0.027277,0.007352,0.137099,0.016234,-0.058252,...,0.001203,0.034491,0.071825,0.019836,-0.010318,-0.028939,0.110656,0.066678,0.060276,0.012565
1,"Hello, KAN!",-0.041093,0.038573,0.05419,0.063639,-0.050733,-0.036939,0.078938,0.01859,-0.01866,...,0.061411,0.093497,0.048654,0.009329,-0.111613,0.050245,0.083622,0.010469,-0.014625,0.033467
2,Kolmogorov-Arnold representation theorem,-0.039406,0.035013,0.012081,-0.033871,-0.059951,0.170208,0.133329,-0.032945,0.059443,...,0.017635,0.0307,-0.032536,-0.072904,-0.002501,0.030838,-0.038679,0.078168,0.041632,0.026063
3,Kolmogorov-Arnold representation theorem state...,-0.023126,-0.031781,-0.033747,-0.026306,-0.120683,0.099896,0.094608,-0.091566,0.037781,...,-0.007131,0.046137,-0.053904,-0.107084,0.029521,0.063568,-0.076472,0.014851,0.046806,-0.014047
4,"where \(\phi_{q,p}:[0,1]\to\mathbb{R}\) and\n\...",-0.055007,-0.025899,0.041755,-0.024739,-0.069876,0.084725,0.055429,-0.087491,0.047632,...,0.019459,0.02611,-0.065981,-0.078322,-0.015437,0.074297,-0.069304,0.0311,0.022661,-0.017973
5,Kolmogorov-Arnold Network (KAN),-0.044464,-0.063702,0.021071,-0.009007,-0.055712,0.064134,0.051541,-0.05016,0.059065,...,0.019671,0.001577,-0.065103,-0.082703,-0.0718,0.084635,-0.043236,0.053172,-0.027561,-0.014514
6,The Kolmogorov-Arnold representation can be wr...,-0.066461,0.026791,-0.084382,-0.017757,-0.096869,0.11252,0.027629,-0.058471,0.074945,...,0.017612,0.032998,-0.072126,-0.092195,0.044279,0.063615,0.004497,0.091512,0.054702,0.020347
7,where,0.007764,0.033559,0.00502,0.049062,0.058596,0.034693,0.062655,-0.011064,-0.032648,...,0.020834,0.01939,-0.017426,-0.077387,-0.08231,0.07878,0.077304,0.053457,-0.024161,0.036373
8,We notice that both \({\bf \Phi}_{\rm in}\) an...,-0.05777,0.042692,-0.014942,-0.013696,-0.026977,0.069489,0.097451,-0.105804,0.107878,...,0.012162,-0.053829,-0.038191,-0.100319,-0.039807,0.066937,-0.033092,0.090231,0.062198,0.005329
9,\({\bf \Phi}_{\rm in}\) corresponds to\n\(n_{\...,-0.021719,0.094247,0.131454,0.008899,0.028798,0.029945,0.083097,0.014344,0.053999,...,-0.065408,-0.110118,0.038175,0.110043,-0.092153,0.002138,-0.029129,0.076846,0.036313,0.002641


In [21]:
#pairing each top-k chunk with its cosine similarity to the question
top_k_doc = [(text_chunks[i], similarities[i]) for i in top_indices]
top_k_doc

[('Kolmogorov-Arnold representation theorem\uf0c1', 0.84872293),
 ('Kolmogorov-Arnold representation theorem states that if \\(f\\) is a\nmultivariate continuous function on a bounded domain, then it can be\nwritten as a finite composition of continuous functions of a single\nvariable and the binary operation of addition. More specifically, for a\nsmooth \\(f : [0,1]^n \\to \\mathbb{R}\\),',
  0.7125924),
 ('The Kolmogorov-Arnold representation can be written in matrix form',
  0.68963206),
 ('Kolmogorov-Arnold Network (KAN)\uf0c1', 0.61846733)]

### RAG: Response Generation

In [22]:
#combining the retrieved documents to form a context
context = " ".join([doc for doc, _ in top_k_doc])
print(context)


Kolmogorov-Arnold representation theorem Kolmogorov-Arnold representation theorem states that if \(f\) is a
multivariate continuous function on a bounded domain, then it can be
written as a finite composition of continuous functions of a single
variable and the binary operation of addition. More specifically, for a
smooth \(f : [0,1]^n \to \mathbb{R}\), The Kolmogorov-Arnold representation can be written in matrix form Kolmogorov-Arnold Network (KAN)


In [23]:
#defining the system prompt
system_prompt = "Please provide a concise and accurate response."
system_prompt

'Please provide a concise and accurate response.'

In [24]:
#combining system prompt with question and context
prompt = f"system: {system_prompt} Question: {question} Context: {context}"
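GPT-2 is a plain text-completion model rather than an instruction-tuned one, so it simply continues whatever text the prompt ends with. A layout that places the context first and ends with an explicit answer cue may steer the continuation toward an answer, although this is not guaranteed. The sketch below shows one such layout; the function name `build_prompt` and the `Context:` / `Question:` / `Answer:` labels are assumptions, not the notebook's method.

```python
def build_prompt(question, context_chunks, instruction):
    """Assemble a completion-style prompt: instruction, context, question, answer cue."""
    context = "\n".join(chunk.strip() for chunk in context_chunks)
    return (
        f"{instruction}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        f"Answer:"
    )

prompt_example = build_prompt(
    question="What is the Kolmogorov–Arnold theorem?",
    context_chunks=[
        "Kolmogorov-Arnold representation theorem",
        "The Kolmogorov-Arnold representation can be written in matrix form",
    ],
    instruction="Please provide a concise and accurate response.",
)
print(prompt_example)
```

Ending the prompt with `Answer:` leaves the model at the point where an answer is the most natural continuation.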


### LLM response after the RAG architecture integration

In [25]:
#generating the response using GPT-2 model
response = generator_gpt2(prompt, max_new_tokens=50, num_return_sequences=1)[0]['generated_text']


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [26]:
response

'system: Please provide a concise and accurate response. Question: What is the Kolmogorov–Arnold theorem? Context: Kolmogorov-Arnold representation theorem\uf0c1 Kolmogorov-Arnold representation theorem states that if \\(f\\) is a\nmultivariate continuous function on a bounded domain, then it can be\nwritten as a finite composition of continuous functions of a single\nvariable and the binary operation of addition. More specifically, for a\nsmooth \\(f : [0,1]^n \\to \\mathbb{R}\\), The Kolmogorov-Arnold representation can be written in matrix form Kolmogorov-Arnold Network (KAN)\uf0c1, representing a finite composition of k+\nvariable \\(\\pi \\), the input variable \\(\\pi \\) is the complex sum-of-sum of\nsigmoid functions and \\(\\int \\pi = 0\\), with the same'
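The generated text above echoes the whole prompt before the new continuation, which makes the model's own contribution hard to spot. A minimal sketch for keeping only the continuation follows; the helper name `strip_prompt` and the demo strings are assumptions for illustration and are not real GPT-2 output. The `transformers` text-generation pipeline also accepts `return_full_text=False`, which should have a similar effect.

```python
def strip_prompt(generated_text, prompt):
    """Return only the newly generated continuation, without the echoed prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()  # fall back to the full text if the prompt is not a prefix

demo_prompt = "Question: What is 2 + 2? Answer:"
demo_output = demo_prompt + " 4"  # stand-in for a model output
print(strip_prompt(demo_output, demo_prompt))  # 4
```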