### Setup:



In [1]:
!pip install -q sentence-transformers
!pip install -q wikipedia-api
!pip install -q numpy
!pip install -q scipy
!pip install -q python-dotenv

### Load the Embedding Model:

In [2]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)

A new version of the following files was downloaded from https://huggingface.co/Alibaba-NLP/new-impl:
- configuration.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
A new version of the following files was downloaded from https://huggingface.co/Alibaba-NLP/new-impl:
- modeling.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


### Fetch Text Content from Wikipedia:



In [3]:
from wikipediaapi import Wikipedia
wiki = Wikipedia('RAGBot/0.0', 'en')
doc = wiki.page('Hayao_Miyazaki').text
paragraphs = doc.split('\n\n') # chunking
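Splitting on blank lines can leave empty strings or bare section titles among the chunks. The sketch below shows one possible clean-up step; `clean_chunks` and its `min_chars` threshold are hypothetical names that are not part of the original notebook, and applying it would change the paragraph count reported in later cells.

```python
def clean_chunks(text: str, min_chars: int = 40) -> list[str]:
    """Split on blank lines and drop empty or very short chunks.

    Hypothetical helper: the 40-character threshold is an assumption,
    not a value taken from the notebook.
    """
    chunks = [chunk.strip() for chunk in text.split("\n\n")]
    return [chunk for chunk in chunks if len(chunk) >= min_chars]


# Small sample: two bare headings, one real paragraph, one empty chunk.
sample = (
    "Early life\n\n"
    "Hayao Miyazaki was born on January 5, 1941, in Tokyo City.\n\n\n\n"
    "Career"
)
print(clean_chunks(sample))
```

Filtering by length is a blunt heuristic; it keeps the example short, but a heading may carry useful context that is lost when it is dropped.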


In [4]:
import textwrap


In [5]:
for p in paragraphs:
  wrapped_text = textwrap.fill(p, width=100)

  print("-----------------------------------------------------------------")
  print(wrapped_text)
  print("-----------------------------------------------------------------")


-----------------------------------------------------------------
Hayao Miyazaki (宮崎 駿 or 宮﨑 駿, Miyazaki Hayao, Japanese: [mijaꜜzaki hajao]; born January 5, 1941) is
a Japanese animator, filmmaker, and manga artist. A founder of Studio Ghibli, he has attained
international acclaim as a masterful storyteller and creator of Japanese animated feature films, and
is widely regarded as one of the most accomplished filmmakers in the history of animation. Born in
Tokyo City in the Empire of Japan, Miyazaki expressed interest in manga and animation from an early
age, and he joined Toei Animation in 1963. During his early years at Toei Animation he worked as an
in-between artist and later collaborated with director Isao Takahata. Notable films to which
Miyazaki contributed at Toei include Doggie March and Gulliver's Travels Beyond the Moon. He
provided key animation to other films at Toei, such as Puss in Boots and Animal Treasure Island,
before moving to A-Pro in 1971, where he co-directed Lupi

### Embed the Document:

In [6]:
docs_embed = model.encode(paragraphs, normalize_embeddings=True)

In [7]:
docs_embed.shape

(19, 768)

In [8]:
docs_embed[0]

array([ 1.20693753e-02, -9.31198243e-03,  3.86870727e-02, -1.79369226e-02,
        1.90601754e-03, -2.30415985e-02,  4.13035154e-02,  4.27563749e-02,
        4.51744162e-02, -1.10854506e-02, -1.45723822e-03,  7.79729662e-03,
        2.79529486e-02,  2.05775481e-02, -1.38202664e-02,  1.35757383e-02,
        1.48335882e-02, -3.48629802e-02, -3.57357971e-02, -3.17235515e-02,
        1.70311006e-03,  7.33009167e-03, -2.75423173e-02,  4.06559482e-02,
       -7.05511263e-03, -3.68454158e-02, -4.07886282e-02, -1.76706687e-02,
       -2.05303561e-02, -7.42685702e-03,  2.92662214e-02, -1.18500376e-02,
        5.10034058e-03,  1.64352916e-02, -3.09135430e-02, -5.56215979e-02,
       -4.58039753e-02,  5.18231951e-02, -2.54022125e-02, -3.31458561e-02,
       -7.28017166e-02,  1.32486904e-02, -3.62598747e-02, -1.61632542e-02,
       -5.15350364e-02,  1.28652947e-02,  2.11367328e-02, -4.02927734e-02,
        3.80877010e-03, -2.21524220e-02, -3.45152915e-02,  1.30858039e-03,
       -9.98819130e-04, -

### Embed the Query:

In [9]:
query = "What was Studio Ghibli's first film?"
query_embed = model.encode(query, normalize_embeddings=True)


In [10]:
query_embed.shape

(768,)

### Find the Closest Paragraphs to the Query:



In [11]:
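import numpy as np
# Embeddings are L2-normalized, so the dot product equals cosine similarity.
# query_embed is 1-D, so no transpose is needed.
similarities = np.dot(docs_embed, query_embed)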

In [12]:
similarities.shape

(19,)

In [13]:
similarities

array([0.5579108 , 0.46477067, 0.50707126, 0.5064436 , 0.43503454,
       0.56337035, 0.68840396, 0.4985274 , 0.59929216, 0.48889726,
       0.45300704, 0.42018938, 0.45964792, 0.4028834 , 0.49752194,
       0.43716425, 0.42018422, 0.13663514, 0.46501303], dtype=float32)
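Because both sets of embeddings were produced with `normalize_embeddings=True`, the dot product above acts as a cosine similarity. The following sketch checks that equivalence on random vectors rather than on the real embeddings; the array sizes and the seed are arbitrary assumptions.

```python
import numpy as np

# Random stand-ins for document and query embeddings (not real model output).
rng = np.random.default_rng(0)
docs = rng.normal(size=(4, 8)).astype(np.float32)
query = rng.normal(size=8).astype(np.float32)

# L2-normalize each vector, which is what normalize_embeddings=True is assumed to do.
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

# Dot product of unit vectors ...
dot_scores = docs_unit @ query_unit
# ... versus the explicit cosine similarity formula on the raw vectors.
cosine_scores = (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

print(np.allclose(dot_scores, cosine_scores, atol=1e-5))
```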

In [14]:
top_3_idx = np.argsort(similarities, axis=0)[-3:][::-1].tolist()


In [15]:
top_3_idx

[6, 8, 5]
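The `argsort` one-liner can be wrapped in a small function so that the number of retrieved paragraphs becomes a parameter. This is a sketch only; `top_k_indices` is a hypothetical helper name, and the scores are copied from the output shown above.

```python
import numpy as np


def top_k_indices(scores: np.ndarray, k: int = 3) -> list[int]:
    """Return the indices of the k highest scores, best first (hypothetical helper)."""
    k = min(k, len(scores))  # guard against asking for more items than exist
    return np.argsort(scores)[::-1][:k].tolist()


# Similarity scores copied from the notebook output above.
similarities = np.array(
    [0.5579108, 0.46477067, 0.50707126, 0.5064436, 0.43503454,
     0.56337035, 0.68840396, 0.4985274, 0.59929216, 0.48889726,
     0.45300704, 0.42018938, 0.45964792, 0.4028834, 0.49752194,
     0.43716425, 0.42018422, 0.13663514, 0.46501303],
    dtype=np.float32,
)
print(top_k_indices(similarities, k=3))  # prints [6, 8, 5]
```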

In [16]:
most_similar_documents = [paragraphs[idx] for idx in top_3_idx]

In [17]:
CONTEXT = ""  # accumulates the retrieved paragraphs for the prompt
for p in most_similar_documents:
  wrapped_text = textwrap.fill(p, width=100)

  print("-----------------------------------------------------------------")
  print(wrapped_text)
  print("-----------------------------------------------------------------")
  CONTEXT += wrapped_text + "\n\n"

-----------------------------------------------------------------
Studio Ghibli Early films (1985–1996) On June 15, 1985, Miyazaki and Takahata founded the animation
production company Studio Ghibli as a subsidiary of Tokuma Shoten. Studio Ghibli's first film was
Laputa: Castle in the Sky (1986), directed by Miyazaki. Some of the architecture in the film was
also inspired by a Welsh mining town; Miyazaki witnessed the mining strike upon his first visit to
Wales in 1984 and admired the miners' dedication to their work and community. Laputa was released on
August 2, 1986, by the Toei Company. It sold around 775,000 tickets; Miyazaki and Suzuki expressed
their disappointment with the film's box office figures. Miyazaki's following film, My Neighbor
Totoro, was released alongside Takahata's Grave of the Fireflies in April 1988 to ensure Studio
Ghibli's financial status. My Neighbor Totoro features the theme of the relationship between the
environment and humanity, showing that harmony is t

In [18]:
query = "What was Studio Ghibli's first film?"

In [19]:
prompt = f"""
Use the following CONTEXT to answer the QUESTION at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

CONTEXT: {CONTEXT}
QUESTION: {query}

"""

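If the same template is reused for several questions, it may be convenient to build the prompt in a function instead of relying on the `CONTEXT` and `query` globals. The sketch below assumes a hypothetical helper named `build_prompt`; its wording mirrors the template above.

```python
def build_prompt(context_chunks: list[str], question: str) -> str:
    """Assemble a RAG prompt from retrieved chunks and a question (hypothetical helper)."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return (
        "Use the following CONTEXT to answer the QUESTION at the end.\n"
        "If you don't know the answer, just say that you don't know, "
        "don't try to make up an answer.\n\n"
        f"CONTEXT: {context}\n"
        f"QUESTION: {question}\n"
    )


# Example with one sentence taken from the retrieved paragraph shown earlier.
example_prompt = build_prompt(
    ["Studio Ghibli's first film was Laputa: Castle in the Sky (1986), directed by Miyazaki."],
    "What was Studio Ghibli's first film?",
)
print(example_prompt)
```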
In [20]:
!pip install -q openai

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
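The warning above names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming the assignment is placed in the first cell of the notebook, before the embedding model is loaded and before any `!pip` command forks the process:

```python
import os

# Disable tokenizer parallelism up front so that later forks
# (for example shell commands run with "!") do not trigger the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```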


In [32]:
import os
from dotenv import load_dotenv

# Load variables from a local .env file into the process environment.
# The OpenAI client created below reads OPENAI_API_KEY from there.
load_dotenv()

In [33]:

from openai import OpenAI

client = OpenAI()
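`OpenAI()` picks up the API key from the `OPENAI_API_KEY` environment variable, so a missing or empty `.env` entry only surfaces when the client is used. The sketch below shows one way to fail early with a clearer message; `require_env` is a hypothetical helper, and the variable name `RAG_DEMO_VAR` exists only to make the example runnable without a real key.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a readable error (hypothetical helper)."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment.")
    return value


# Demo with a throwaway variable so the example runs without a real API key.
os.environ["RAG_DEMO_VAR"] = "example-value"
print(require_env("RAG_DEMO_VAR"))
```

In the notebook itself, the call would presumably be `require_env("OPENAI_API_KEY")`, placed after `load_dotenv()` and before the client is created.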

In [34]:
response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "user", "content": prompt},
  ]
)

In [35]:
print(response.choices[0].message.content)


Studio Ghibli's first film was **Laputa: Castle in the Sky** (1986), directed by Hayao Miyazaki.
