<a href="https://colab.research.google.com/github/mrudulagavas/All-About-AI-ML/blob/main/RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Setup

In [1]:
!pip install -q sentence-transformers wikipedia-api numpy scipy

Load the Embedding Model

In [2]:
from sentence_transformers import SentenceTransformer
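# gte-base-en-v1.5 ships its own model code on the Hub, so trust_remote_code=True is required to load it
model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)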


Fetch Text Content from Wikipedia

In [3]:
from wikipediaapi import Wikipedia
wiki = Wikipedia('RAGBot/0.0', 'en')
doc = wiki.page('Hayao_Miyazaki').text
paragraphs = doc.split('\n\n')  # naive chunking: one chunk per blank-line-separated block
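Splitting on `'\n\n'` treats every blank-line-separated block as a chunk, so it can produce empty strings or very short fragments, and it places no upper bound on chunk length. A minimal sketch of one common refinement, assuming that a character count is an acceptable proxy for token count; `chunk_paragraphs` and its `min_chars`/`max_chars` thresholds are hypothetical and not part of the original notebook:

```python
import re


def chunk_paragraphs(text: str, min_chars: int = 200, max_chars: int = 1500) -> list[str]:
    """Hypothetical chunker: split on blank lines, drop empty pieces, and merge
    short paragraphs forward until a chunk reaches at least min_chars."""
    # Split on one or more blank lines (tolerates stray whitespace between them)
    pieces = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    buffer = ""
    for piece in pieces:
        candidate = f"{buffer}\n\n{piece}" if buffer else piece
        if len(candidate) <= max_chars or not buffer:
            buffer = candidate
        else:
            # Adding this piece would exceed max_chars: flush what we have first
            chunks.append(buffer)
            buffer = piece
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        chunks.append(buffer)
    return chunks


sample = "Intro\n\n" + "a" * 250 + "\n\n\n\n" + "b" * 50 + "\n\n" + "c" * 300
for chunk in chunk_paragraphs(sample):
    print(len(chunk), repr(chunk[:12]))
```

Calling `paragraphs = chunk_paragraphs(doc)` in place of the split would change the number of chunks, so the 20-row shapes shown below would no longer apply. A single paragraph longer than `max_chars` is kept whole rather than cut.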

In [4]:
import textwrap

In [5]:
for p in paragraphs:
  wrapped_text = textwrap.fill(p, width=100)

  print("-----------------------------------------------------------------")
  print(wrapped_text)
  print("-----------------------------------------------------------------")

-----------------------------------------------------------------
Hayao Miyazaki (宮崎 駿 or 宮﨑 駿, Miyazaki Hayao; [mijaꜜzaki hajao]; born January 5, 1941) is a Japanese
animator, filmmaker, and manga artist. He co-founded Studio Ghibli and serves as honorary chairman.
Throughout his career, Miyazaki has attained international acclaim as a masterful storyteller and
creator of Japanese animated feature films, and is widely regarded as one of the most accomplished
filmmakers in the history of animation. Born in Tokyo City, Miyazaki expressed interest in manga and
animation from an early age. He joined Toei Animation in 1963, working as an inbetween artist and
key animator on films like Gulliver's Travels Beyond the Moon (1965), Puss in Boots (1969), and
Animal Treasure Island (1971), before moving to A-Pro in 1971, where he co-directed Lupin the Third
Part I (1971–1972) alongside Isao Takahata. After moving to Zuiyō Eizō (later Nippon Animation) in
1973, Miyazaki worked as an animator on Wo

In [6]:
docs_embed = model.encode(paragraphs, normalize_embeddings=True)

In [7]:
docs_embed.shape

(20, 768)

In [8]:
docs_embed[0]

array([ 9.43500269e-03, -4.05938923e-03,  4.70433533e-02, -1.91588923e-02,
       -1.05207856e-03, -2.19987910e-02,  4.46006432e-02,  4.33994383e-02,
        5.02612330e-02, -7.20755849e-03, -4.79073171e-03,  8.74279067e-03,
        3.30359787e-02,  1.80243589e-02, -2.41563004e-02,  1.47018805e-02,
        1.42382411e-02, -3.51168886e-02, -2.57179420e-02, -2.99220961e-02,
        6.12406526e-03,  8.12836736e-03, -2.69539114e-02,  3.50551456e-02,
       -7.06918025e-03, -4.24130037e-02, -3.33318263e-02, -1.78539176e-02,
       -2.74858195e-02, -3.14296247e-03,  2.51866225e-02, -1.22505119e-02,
        9.56701860e-03,  1.21276639e-02, -2.77877133e-02, -5.69833741e-02,
       -4.16358821e-02,  5.35411164e-02, -2.30257604e-02, -3.22550759e-02,
       -6.75189719e-02,  1.19094606e-02, -2.85630897e-02, -1.64049193e-02,
       -5.26259169e-02,  1.68623161e-02,  1.85449831e-02, -3.72159220e-02,
        3.73746920e-03, -1.39373504e-02, -4.20705527e-02,  3.79179232e-03,
       -1.04365102e-03, -

Embed the Query

In [9]:
query = "What was Studio Ghibli's first film?"
query_embed = model.encode(query, normalize_embeddings=True)

In [10]:
query_embed.shape

(768,)

Find the Closest Paragraphs to the Query

In [11]:
import numpy as np
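# Both sets of embeddings are unit-length (normalize_embeddings=True), so each dot product is a cosine similarity
similarities = docs_embed @ query_embed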

In [12]:
similarities.shape

(20,)

In [13]:
similarities

array([0.55277526, 0.4559155 , 0.51093805, 0.5068357 , 0.48231435,
       0.63854665, 0.5517907 , 0.629382  , 0.5332569 , 0.5397611 ,
       0.49232978, 0.46270692, 0.41976058, 0.45126027, 0.40898627,
       0.45950577, 0.43716413, 0.43962175, 0.19419606, 0.46501303],
      dtype=float32)
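The similarity cell computes a plain dot product, which equals cosine similarity only because both `docs_embed` and `query_embed` were encoded with `normalize_embeddings=True`. A small NumPy check of that equivalence on random stand-in vectors (the shapes are arbitrary, not the model's 768 dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 8))  # stand-in for 5 document embeddings
query = rng.normal(size=8)      # stand-in for one query embedding

# Cosine similarity computed directly from the raw vectors
cosine = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

# What normalize_embeddings=True does: scale every vector to unit L2 norm
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

# For unit vectors, the plain dot product already is the cosine similarity
dot = docs_unit @ query_unit
print(np.allclose(cosine, dot))  # True
```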

In [14]:
# Indices of the 3 highest scores, best match first
top_3_idx = np.argsort(similarities)[-3:][::-1].tolist()

In [15]:
top_3_idx

[5, 7, 0]
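`np.argsort` fully sorts all 20 scores just to pick three, which is fine here but costs O(n log n) on a large corpus. A common alternative is `np.argpartition`, which isolates the k largest values in linear time and then sorts only those k. A minimal sketch; `top_k` is a hypothetical helper name, and the scores are copied from the output above:

```python
import numpy as np


def top_k(scores: np.ndarray, k: int) -> list[int]:
    """Indices of the k largest scores, highest first."""
    k = min(k, scores.shape[0])
    if k <= 0:
        return []  # guard: a slice of [-0:] would return the whole array
    # argpartition moves the k largest values into the last k slots without a full sort
    candidates = np.argpartition(scores, -k)[-k:]
    # Sort only those k candidates, descending by score
    return candidates[np.argsort(scores[candidates])[::-1]].tolist()


similarities = np.array([
    0.55277526, 0.4559155, 0.51093805, 0.5068357, 0.48231435,
    0.63854665, 0.5517907, 0.629382, 0.5332569, 0.5397611,
    0.49232978, 0.46270692, 0.41976058, 0.45126027, 0.40898627,
    0.45950577, 0.43716413, 0.43962175, 0.19419606, 0.46501303,
], dtype=np.float32)

print(top_k(similarities, 3))  # [5, 7, 0]
```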

In [16]:
most_similar_documents = [paragraphs[idx] for idx in top_3_idx]

In [17]:
CONTEXT = ""
for p in most_similar_documents:
  wrapped_text = textwrap.fill(p, width=100)

  print("-----------------------------------------------------------------")
  print(wrapped_text)
  print("-----------------------------------------------------------------")
  CONTEXT += wrapped_text + "\n\n"

-----------------------------------------------------------------
Studio Ghibli Early films (1985–1995) Following the success of Nausicaä of the Valley of the Wind,
Miyazaki and Takahata founded the animation production company Studio Ghibli on June 15, 1985, as a
subsidiary of Tokuma Shoten, with offices in Kichijōji designed by Miyazaki. The studio's name had
been registered a year earlier; Miyazaki named it after the nickname of the Caproni Ca.309 aircraft,
meaning "a hot wind that blows in the desert" in Italian. Suzuki worked for Studio Ghibli as
producer, joining full-time in 1989, while Topcraft's Tōru Hara became production manager; Suzuki's
role in the creation of the studio and its films has led him to being occasionally named a co-
founder, and Hara is often viewed as influential to the company's success. Yasuyoshi Tokuma, the
founder of Tokuma Shoten, was also closely related to the company's creation, having provided
financial backing. Topcraft had been considered as a par
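The loop above stores the `textwrap.fill` output in `CONTEXT`, so the prompt carries line breaks that were inserted only for display, and nothing limits its length. One hedged alternative keeps the raw passages, separates them with a visible delimiter, and stops at a character budget. `build_context`, `max_chars`, and the delimiter are assumptions chosen for illustration; a real limit depends on the generator's context window, which is measured in tokens:

```python
def build_context(passages: list[str], max_chars: int = 4000, sep: str = "\n\n---\n\n") -> str:
    """Join ranked passages (best first) until the next one would push the
    total past max_chars. Passages are kept unwrapped; if the first passage
    alone exceeds the budget, the result is an empty string."""
    parts: list[str] = []
    used = 0
    for passage in passages:
        # Cost of adding this passage, including the separator before it
        extra = len(passage) + (len(sep) if parts else 0)
        if used + extra > max_chars:
            break
        parts.append(passage)
        used += extra
    return sep.join(parts)


docs = ["a" * 10, "b" * 10, "c" * 10]
print(repr(build_context(docs, max_chars=27)))
```

In the notebook this would be `CONTEXT = build_context(most_similar_documents)`, while the loop can still print wrapped text for reading.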

In [18]:
query = "What was Studio Ghibli's first film?"

In [19]:
prompt = f"""
Use the following CONTEXT to answer the QUESTION at the end.
If you don't know the answer, just say that you don't know; don't try to make up an answer.

CONTEXT: {CONTEXT}
QUESTION: {query}

"""

In [20]:
# Legacy 0.x OpenAI SDK; note that no later cell in this notebook calls the OpenAI API
!pip install -q openai==0.28




In [27]:
!pip install -q transformers




In [28]:
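from transformers import pipeline
# GPT-2 is a small base (not instruction-tuned) language model: it continues the
# input text rather than following it as an instruction, which explains the output below
chat = pipeline("text-generation", model="gpt2")
print(chat("Write a haiku about the ocean")[0]["generated_text"])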


Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Write a haiku about the ocean and the atmosphere.

The Hiku of the Sea

The Hawaiian Hiku

The Hawaiian Hiku by J.S. Ho.

The Hawaiian Hiku is the most famous of the modern Hawaiian Hiku. It is a poem written by Captain John D. Halsey of the Pacific, in 1855. In the poem, Halsey tells his Hawaiian friends his story of how he and his people got on a ship to the ocean and had a bad dream. This story was later re-written in the Hawaiian Hiku by the Hawaiian writer, John D. Halsey.

The Hawaiian Hiku is a poem written by Captain John D. Halsey of the Pacific, in 1855. In the poem, Halsey tells his Hawaiian friends his story of how he and his people got on a ship to the ocean and had a bad dream. This story was later re-written in the Hawaiian Hiku by the Hawaiian writer, John D. Halsey. In the Hawaiian Hiku, the people are the ocean. The sea is the place where things will happen.

The Hiku is a poem written by Captain D. Halsey of the Pacific, in 1855. In the poem
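The generation cell above tests the pipeline with an unrelated haiku request; the RAG `prompt` built earlier is never passed to a model. The sketch below wires retrieval, prompt assembly, and generation together end to end. To keep it runnable without downloads, `toy_embed` is a hypothetical hashed bag-of-words stand-in for `model.encode(..., normalize_embeddings=True)`, and `generate` is any callable that maps a prompt string to text; here an echo function is used so the retrieved context can be inspected:

```python
import re
import zlib
from typing import Callable

import numpy as np


def toy_embed(texts: list[str], dim: int = 1024) -> np.ndarray:
    """Hypothetical stand-in for model.encode(texts, normalize_embeddings=True):
    hashed bag-of-words counts, scaled to unit L2 norm."""
    vecs = np.zeros((len(texts), dim), dtype=np.float32)
    for row, text in enumerate(texts):
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # crc32 is deterministic across runs, unlike Python's salted hash()
            vecs[row, zlib.crc32(token.encode()) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12)  # guard against empty texts


def rag_answer(
    question: str,
    paragraphs: list[str],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    # 1. Embed the chunks and the question (unit vectors, so dot = cosine)
    doc_vecs = toy_embed(paragraphs)
    query_vec = toy_embed([question])[0]
    scores = doc_vecs @ query_vec
    # 2. Keep the k best-scoring chunks, best first
    top = np.argsort(scores)[::-1][:k]
    context = "\n\n".join(paragraphs[i] for i in top)
    # 3. Fill a prompt in the same style as the notebook's template
    prompt = (
        "Use the following CONTEXT to answer the QUESTION at the end.\n"
        "If you don't know the answer, just say that you don't know; "
        "don't try to make up an answer.\n\n"
        f"CONTEXT: {context}\nQUESTION: {question}\n"
    )
    # 4. Hand the prompt to whatever text generator is supplied
    return generate(prompt)


paras = [
    "Studio Ghibli was founded in 1985 by Miyazaki and Takahata.",
    "The Caproni Ca.309 aircraft was nicknamed Ghibli.",
    "Bananas are a yellow fruit.",
]
print(rag_answer("When was Studio Ghibli founded?", paras, generate=lambda p: p, k=1))
```

With the notebook's real components, `toy_embed` would be replaced by `model.encode`, and `generate` could be something like `lambda p: chat(p, max_new_tokens=64, return_full_text=False)[0]["generated_text"]`, using the standard text-generation pipeline arguments. Judging by the GPT-2 output above, an instruction-tuned model is likely needed for the answer to actually follow the prompt.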


In [29]:
!pip install -q nbformat


