In [112]:
import openai
import os, sys
import textwrap

# Environment Setup and Connection to OpenAI
api_key = os.getenv('OPENAI_API_KEY')

if not api_key:
    raise ValueError("OpenAI API key not found. Please set OPENAI_API_KEY in your environment variables.")

# Initialize OpenAI client
client = openai.OpenAI(api_key=api_key)

print(client.models.list())

# Select GPT model
gptmodel = "gpt-4"

SyncPage[Model](data=[Model(id='omni-moderation-2024-09-26', created=1732734466, object='model', owned_by='system'), Model(id='gpt-4o-mini-audio-preview-2024-12-17', created=1734115920, object='model', owned_by='system'), Model(id='dall-e-3', created=1698785189, object='model', owned_by='system'), Model(id='dall-e-2', created=1698798177, object='model', owned_by='system'), Model(id='gpt-4o-audio-preview-2024-10-01', created=1727389042, object='model', owned_by='system'), Model(id='gpt-4o-audio-preview', created=1727460443, object='model', owned_by='system'), Model(id='gpt-4o-mini-realtime-preview-2024-12-17', created=1734112601, object='model', owned_by='system'), Model(id='gpt-4o-mini-realtime-preview', created=1734387380, object='model', owned_by='system'), Model(id='o1-mini-2024-09-12', created=1725648979, object='model', owned_by='system'), Model(id='o1-preview-2024-09-12', created=1725648865, object='model', owned_by='system'), Model(id='o1-mini', created=1725649008, object='model

In [113]:
# Test connection to OpenAI API
response = client.chat.completions.create(
    model=gptmodel,
    messages=[{"role": "user", "content": "Test connection successful?"}],
)
print(response.choices[0].message.content.strip())

Yes, the connection is successful. How can I assist you today?


In [114]:
def call_llm_with_full_text(input_text):
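    # Accept either a single string or a list of lines; joining a bare
    # string would put a newline between every character.
    text_input = input_text if isinstance(input_text, str) else '\n'.join(input_text)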
    prompt = f"Please elaborate on the following subject: {text_input}"
    try:
        response = client.chat.completions.create(
            model=gptmodel,
            messages=[
                {"role": "system", "content": "You are an expert on Natural Language Processing."},
                {"role": "system", "content": "You can read the input and answer in detail."},
                # The request itself is sent as a user message, not a system message.
                {"role": "user", "content": prompt}
            ],
            temperature=0.1,
            max_completion_tokens=200
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return str(e)

In [115]:
def print_formatted_response(response):
    wrapper = textwrap.TextWrapper(width=80)
    wrapped_text = wrapper.fill(text=response)
    print("Response: ")
    print("-----------------") 
    print(wrapped_text)
    print("-----------------\n")

In [137]:
db_records = [
    # ✅ Best definition (should be picked)
    "Retrieval-Augmented Generation (RAG) is an advanced framework that enhances Large Language Models (LLMs) by integrating a retrieval mechanism. Instead of relying solely on pre-trained knowledge, a RAG system retrieves relevant information from an external knowledge base (e.g., a document store, database, or vector index) and uses it to generate more accurate and contextually relevant responses.",

    # 🔸 Somewhat related, but vague
    "RAG is a technique used in AI to improve performance. It involves retrieval of external information, which is then used in a response generation process.",

    # ❌ Misleading (describes plain document retrieval, not RAG)
    "RAG is a system that retrieves documents from databases and displays them to users. It is commonly used in search engines.",

    # ❌ Totally unrelated
    "Rags are often used in cleaning, polishing, and household tasks. They come in various materials like cotton and microfiber.",

    # ❌ Off-topic but looks similar
    "AI-powered search engines retrieve relevant documents based on queries. They use ranking algorithms and keyword matching to provide accurate results.",

    # ❌ Another vague response
    "Retrieval systems are important for fetching stored information, whether it’s in AI models or traditional database queries."
]


In [138]:
paragraph = ' '.join(db_records)
wrapped_text = textwrap.fill(paragraph, width=80)
print(wrapped_text)

Retrieval-Augmented Generation (RAG) is an advanced framework that enhances
Large Language Models (LLMs) by integrating a retrieval mechanism. Instead of
relying solely on pre-trained knowledge, a RAG system retrieves relevant
information from an external knowledge base (e.g., a document store, database,
or vector index) and uses it to generate more accurate and contextually relevant
responses. RAG is a technique used in AI to improve performance. It involves
retrieval of external information, which is then used in a response generation
process. RAG is a system that retrieves documents from databases and displays
them to users. It is commonly used in search engines. Rags are often used in
cleaning, polishing, and household tasks. They come in various materials like
cotton and microfiber. AI-powered search engines retrieve relevant documents
based on queries. They use ranking algorithms and keyword matching to provide
accurate results. Retrieval systems are important for fetching stored

In [139]:
query = "Define a rag store"
llm_response = call_llm_with_full_text(query)
print_formatted_response(llm_response)

Response: 
-----------------
A rag store is a retail or wholesale shop that sells various types of used or
second-hand clothes, also known as "rags". These stores are often found in
lower-income neighborhoods where people are looking for affordable clothing
options. Some rag stores also sell other used items like furniture, household
goods, and toys.   In a broader context, the term "rag store" can also refer to
a business that buys and sells scrap materials, including textiles. These
businesses play a crucial role in the recycling industry, as they help to reduce
waste and promote the reuse of materials.   It's important to note that the term
"rag store" is not commonly used in all regions or countries. In some places,
these types of stores might be referred to as thrift stores, second-hand stores,
or charity shops.
-----------------



In [140]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [141]:
def calculate_cosine_similarity(text1, text2):
    """Convert both texts into TF-IDF vectors and compare them using cosine similarity."""
    vectorizer = TfidfVectorizer(
        stop_words='english',
        use_idf=True,
        norm='l2',
        # ngram_range=(1, 2),
        sublinear_tf=True,
        analyzer='word'
    )
    tfidf = vectorizer.fit_transform([text1, text2])
    similarity = cosine_similarity(tfidf[0:1], tfidf[1:2])
    return similarity[0][0]

In [142]:
text1 = "FPGA constraints define timing and placement."
text2 = "Constraints in FPGA set placement and timing rules."

text3 = "FPGA constraints define timing and placement rules."
text4 = "Timing and placement rules are defined by FPGA constraints."

score = calculate_cosine_similarity(text1, text2)
print("Similarity Score:", score)

score2 = calculate_cosine_similarity(text3, text4)
print("Similarity Score:", score2)

Similarity Score: 0.5803329846765686
Similarity Score: 0.716811741443062
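The second pair scores higher because its two sentences share more content words. A minimal sketch of that intuition follows, using plain word counts rather than TF-IDF and a tiny hand-picked stop list; both are assumptions for illustration, and `content_tokens` and `count_cosine` are hypothetical helpers, not part of scikit-learn. The numbers it prints will differ from the TF-IDF scores above, but the ordering should be the same.

```python
import math
import re
from collections import Counter

# Tiny hand-picked stop list for illustration only; it is NOT scikit-learn's list.
STOP_WORDS = {"and", "in", "are", "by", "a", "the", "is", "of", "to"}

def content_tokens(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def count_cosine(text_a, text_b):
    """Cosine similarity between raw term-count vectors (no IDF weighting)."""
    a = Counter(content_tokens(text_a))
    b = Counter(content_tokens(text_b))
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

pairs = [
    ("FPGA constraints define timing and placement.",
     "Constraints in FPGA set placement and timing rules."),
    ("FPGA constraints define timing and placement rules.",
     "Timing and placement rules are defined by FPGA constraints."),
]

for left, right in pairs:
    shared = sorted(set(content_tokens(left)) & set(content_tokens(right)))
    print(len(shared), "shared words:", shared, "| cosine:", round(count_cosine(left, right), 3))
```

The first pair shares four content words and the second shares five, which is why the second similarity is larger under either weighting.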


In [143]:
def find_best_match_keyword_similarity(query, db_records):
    best_score = 0
    best_record = None

    query_keywords = set(query.lower().split())
    print(query_keywords)

    for record in db_records:
        record_keywords = set(record.lower().split())
        common_keywords = query_keywords.intersection(record_keywords)
        current_score = len(common_keywords)
        similarity = calculate_cosine_similarity(query, record)
        print(similarity)
        if current_score > best_score:
            best_score = current_score
            best_record = record
    
    return best_score, best_record

best_keyword_score, best_matching_record = find_best_match_keyword_similarity(query, db_records)
print("Best Keyword Score: ", best_keyword_score)
print("Best Matching Record: ", best_matching_record)
print_formatted_response(best_matching_record)

{'a', 'define', 'rag', 'store'}
0.1528049645732037
0.08434796492247011
0.10371551133313005
0.0
0.0
0.0
Best Keyword Score:  2
Best Matching Record:  Retrieval-Augmented Generation (RAG) is an advanced framework that enhances Large Language Models (LLMs) by integrating a retrieval mechanism. Instead of relying solely on pre-trained knowledge, a RAG system retrieves relevant information from an external knowledge base (e.g., a document store, database, or vector index) and uses it to generate more accurate and contextually relevant responses.
Response: 
-----------------
Retrieval-Augmented Generation (RAG) is an advanced framework that enhances
Large Language Models (LLMs) by integrating a retrieval mechanism. Instead of
relying solely on pre-trained knowledge, a RAG system retrieves relevant
information from an external knowledge base (e.g., a document store, database,
or vector index) and uses it to generate more accurate and contextually relevant
responses.
-----------------
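The keyword score of 2 comes from the matches 'a' and 'rag'. The query word 'store' is missed because whitespace splitting leaves the record's token as 'store,' with its trailing comma. A minimal sketch of a punctuation-aware alternative follows; `regex_keywords`, `sample_query`, and `sample_record` are hypothetical names introduced here, and the sample record is a shortened stand-in for the first entry in db_records.

```python
import re

def split_keywords(text):
    """Whitespace tokenisation, as used by the keyword matcher above."""
    return set(text.lower().split())

def regex_keywords(text):
    """Hypothetical alternative: keep only alphanumeric runs, so punctuation is dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

sample_query = "Define a rag store"
sample_record = "A RAG system retrieves information from a document store, database, or vector index."

# Whitespace splitting keeps "store," with its comma, so "store" does not match.
print(sorted(split_keywords(sample_query) & split_keywords(sample_record)))
# → ['a', 'rag']

# Stripping punctuation recovers the third match.
print(sorted(regex_keywords(sample_query) & regex_keywords(sample_record)))
# → ['a', 'rag', 'store']
```

Whether the extra match helps depends on the data: 'a' is a stop word and still counts toward the score in both versions, so filtering stop words may matter more than punctuation.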



In [144]:
augmented_input = query + ":" + best_matching_record
llm_response = call_llm_with_full_text(augmented_input)
print_formatted_response(llm_response)

Response: 
-----------------
Retrieval-Augmented Generation (RAG) is a sophisticated framework that enhances
Large Language Models (LLMs) by incorporating a retrieval mechanism. This
mechanism is a significant departure from traditional LLMs, which rely solely on
pre-trained knowledge.  In a RAG system, instead of generating responses based
solely on pre-existing knowledge, the model retrieves relevant information from
an external knowledge base. This knowledge base could be a document store, a
database, or a vector index. The retrieved information is then used to generate
responses.  The primary advantage of this approach is that it allows the model
to generate more accurate and contextually relevant responses. This is because
the model has access to a broader range of information than what is available in
its pre-trained knowledge.   For example, if a user asks a question about a
recent event, a traditional LLM might struggle to provide an accurate answer if
it was trained on data th