<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/embeddings/OpenAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OpenAI Embeddings

In [None]:
# %pip install llama-index-embeddings-azure-openai==0.3.0 
# %pip install llama-index-llms-azure-openai==0.3.0
# %pip install llama-index==0.12.2

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
# get API key and create embeddings
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding


embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="text-embedding-ada-002",
    api_key=os.environ['AZURE_OPENAI_API_KEY'],
    azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT'],
    api_version=os.environ['AZURE_OPENAI_API_VERSION'],
)

In [4]:
embeddings1 = embed_model.get_text_embedding(
    "Azure Open AI uses latest LLM models for text generation"
)

embeddings2 = embed_model.get_text_embedding(
    "Azure Open AI combines OpenAI with model governance"
)

embeddings3 = embed_model.get_text_embedding(
    "The government is looking to provide better services to the people."
)


In [5]:
from numpy import dot
from numpy.linalg import norm

cos_sim = dot(embeddings1, embeddings2)/(norm(embeddings1)*norm(embeddings2))
print(cos_sim)

0.8790993374795034


In [6]:
cos_sim = dot(embeddings2, embeddings3)/(norm(embeddings2)*norm(embeddings3))
print(cos_sim)

0.7568739825381245
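
The two scores above come from the cosine-similarity formula, dot(a, b) / (|a| * |b|). As a minimal sketch that runs without an API key, the helper below applies the same formula to small hand-made vectors. The function name `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the notebook or of any library.

```python
import math


def cosine_similarity(a, b):
    """Return dot(a, b) / (|a| * |b|) for two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical, not real embeddings)
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # perpendicular vectors
print(same_direction, orthogonal)
```

Parallel vectors score close to 1 and perpendicular vectors score 0, so a higher value for the first sentence pair than for the second indicates that the first pair is semantically closer.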


In [7]:
import os
from dotenv import load_dotenv
load_dotenv()

from llama_index.core import Settings
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

# You need to deploy your own embedding model as well as your own chat completion model
llm = AzureOpenAI(
    deployment_name='gpt-4o',
    model='gpt-4o',
    api_key=os.environ['AZURE_OPENAI_API_KEY'],
    azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT'],
    api_version=os.environ['AZURE_OPENAI_API_VERSION'],
)

embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="text-embedding-ada-002",
    api_key=os.environ['AZURE_OPENAI_API_KEY'],
    azure_endpoint=os.environ['AZURE_OPENAI_ENDPOINT'],
    api_version=os.environ['AZURE_OPENAI_API_VERSION'],
)

# global settings
Settings.llm = llm
Settings.embed_model = embed_model

In [8]:
from datasets import load_dataset
import pandas as pd
from llama_index.core import Document


ds = load_dataset("rag-datasets/rag-mini-bioasq", "text-corpus")
ds = ds['passages'].to_pandas().set_index('id', drop=True)
query_set = load_dataset("rag-datasets/rag-mini-bioasq", "question-answer-passages")
queries = query_set['test'].take(5)

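# create a subset of the documents for faster testing:
# collect the passage ids referenced by the first 15 test queries
passages_required = set()
for ids in query_set['test'].take(15)['relevant_passage_ids']:
    # each entry is a bracketed, comma-separated string; strip the brackets and split
    passages_required.update(int(pid) for pid in ids[1:-1].split(", "))

docs = [
    Document(text=ds.loc[pid].passage, metadata={'id': pid})
    for pid in passages_required
]
for doc in docs:
    doc.doc_id = str(doc.metadata['id'])
    # keep the id out of the text sent to the LLM; it is only used for evaluation
    doc.excluded_llm_metadata_keys = ['id']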
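
The cell above relies on each `relevant_passage_ids` entry being a bracketed, comma-separated string and splits it by hand. As a minimal sketch of an alternative, the string can be parsed with `ast.literal_eval` from the standard library, which tolerates irregular spacing and an empty list. The helper name `parse_passage_ids` and the sample strings are hypothetical.

```python
import ast


def parse_passage_ids(raw):
    """Parse a string such as "[12, 7, 30]" into a list of ints."""
    parsed = ast.literal_eval(raw)
    if not isinstance(parsed, (list, tuple)):
        raise ValueError(f"expected a list literal, got {raw!r}")
    return [int(pid) for pid in parsed]


# Hypothetical sample strings in the same bracketed format
ids_a = parse_passage_ids("[12, 7, 30]")
ids_b = parse_passage_ids("[12,7,  30]")  # irregular spacing still parses
empty = parse_passage_ids("[]")  # an empty list yields no ids
print(ids_a, ids_b, empty)
```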

In [None]:
from llama_index.core import VectorStoreIndex
 
index = VectorStoreIndex.from_documents(docs)

# converting vector store to query engine
query_engine = index.as_query_engine(similarity_top_k=3)

In [10]:
# generating query response
query = queries[0]['question']
print("Query:", query)
response = query_engine.query(query)
print(response, response.get_formatted_sources(length = 500))

Query: Is Hirschsprung disease a mendelian or a multifactorial disorder?
Hirschsprung disease is a multifactorial disorder. It exhibits complex inheritance patterns, including non-Mendelian inheritance for the more common short-segment form, and involves multiple genetic loci and modifier genes. While some forms of the disease follow Mendelian inheritance patterns, the overall genetic basis of Hirschsprung disease is highly complex and involves interactions between various genes. > Source (Doc id: 1616ecb1-6e9d-492b-a2c0-45f2ad66a9d6): Hirschsprung's disease (HSCR) is a fairly frequent cause of intestinal 
obstruction in children. It is characterized as a sex-linked heterogonous 
disorder with variable severity and incomplete penetrance giving rise to a 
variable pattern of inheritance. Although Hirschsprung's disease occurs as an 
isolated phenotype in at least 70% of cases, it is not infrequently associated 
with a number of congenital abnormalities and associated syndromes, 
demonst

In [15]:
from llama_index.core.evaluation import RetrieverEvaluator
from llama_index.core.evaluation.retrieval.metrics import resolve_metrics, HitRate, MRR
from llama_index.core.node_parser import SentenceSplitter, TokenTextSplitter

metric_dict = {}
metrics = ["precision", "recall", "ap", "ndcg"]
metrics = [x() for x in resolve_metrics(metrics)] + [HitRate(use_granular_hit_rate=True), MRR(use_granular_mrr=True)]

results_data = []
# Both splitters use chunk_size=512 and chunk_overlap=100; the dictionary keys
# are only labels for the results table.
splitters = {'sentance_512_0': SentenceSplitter(chunk_size=512, chunk_overlap=100),
             'token_512': TokenTextSplitter(chunk_size=512, chunk_overlap=100)}

for splitter_name, splitter in splitters.items():
    # build a fresh index for each chunking strategy
    index = VectorStoreIndex.from_documents(docs, transformations=[splitter])

    for k in [5, 10]:
        query_engine = index.as_query_engine(similarity_top_k=k)
        for row in queries:
            # relevant ids are stored as a bracketed, comma-separated string
            relevant_ids = row['relevant_passage_ids'][1:-1].split(', ')
            query = row['question']
            retrieved_nodes = query_engine.retrieve(query)
            retrieved_passage_ids = [str(node.metadata['id']) for node in retrieved_nodes]

            for metric in metrics:
                eval_result = metric.compute(
                    query, relevant_ids, retrieved_passage_ids,
                )
                metric_dict[metric.metric_name] = eval_result.score

            results_data.append({
                'splitter': splitter_name,
                'k': k,
                'query': query,
                'retrieved_ids': retrieved_passage_ids,
                'relevant_ids': relevant_ids,
                **metric_dict
            })

results_df = pd.DataFrame(results_data)
results_df.drop(['query', 'retrieved_ids', 'relevant_ids'], axis=1).groupby(['k','splitter']).mean()

Metadata length (6) is close to chunk size (50). Resulting chunks are less than 50 tokens. Consider increasing the chunk size or decreasing the size of your metadata to avoid this.

k,splitter,precision,recall,ap,ndcg,hit_rate,mrr
5,sentance_512_0,0.76,0.404722,0.374944,0.778461,0.404722,0.454667
5,sentence_50,0.85,0.304167,0.453278,0.855455,0.480278,0.410333
5,token_512,0.8,0.404722,0.393278,0.8,0.424722,0.4335
10,sentance_512_0,0.553333,0.534722,0.489488,0.633507,0.554722,0.379544
10,sentence_50,0.816667,0.415,0.913904,0.87959,0.978333,0.271236
10,token_512,0.56,0.522222,0.503694,0.646062,0.574722,0.374548
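
To make the columns above easier to read, here is a minimal sketch of the textbook definitions of precision, recall, hit rate and reciprocal rank, applied to toy id lists. These are the standard, non-granular definitions written from scratch; the LlamaIndex metric classes used above (in particular `HitRate` and `MRR` with their granular options enabled) may compute their scores differently. The function name `retrieval_scores` and the ids are hypothetical.

```python
def retrieval_scores(expected_ids, retrieved_ids):
    """Compute simple retrieval metrics for one query."""
    expected = set(expected_ids)
    # retrieved items that are relevant, in retrieval order
    hits = [doc_id for doc_id in retrieved_ids if doc_id in expected]

    # share of retrieved items that are relevant
    precision = len(hits) / len(retrieved_ids) if retrieved_ids else 0.0
    # share of relevant items that were retrieved
    recall = len(set(hits)) / len(expected) if expected else 0.0
    # 1 if at least one relevant item was retrieved, else 0
    hit_rate = 1.0 if hits else 0.0

    # reciprocal of the rank of the first relevant item (0 if none)
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in expected:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "precision": precision,
        "recall": recall,
        "hit_rate": hit_rate,
        "reciprocal_rank": reciprocal_rank,
    }


# Toy example: two of four relevant ids are retrieved, the first hit is at rank 2
scores = retrieval_scores(
    expected_ids=["a", "b", "c", "d"],
    retrieved_ids=["x", "b", "a", "y"],
)
print(scores)
```

Averaging the reciprocal rank over all queries gives the mean reciprocal rank (MRR).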


In [16]:
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import (
    LLMContextRecall, LLMContextPrecisionWithReference, Faithfulness,
    SemanticSimilarity, NonLLMContextRecall, answer_correctness, FactualCorrectness,
)
from ragas import evaluate, EvaluationDataset
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

evaluator_llm = LangchainLLMWrapper(AzureChatOpenAI(
                openai_api_version=os.environ['AZURE_OPENAI_API_VERSION'],
                azure_deployment='gpt-4o',
                model='gpt-4o',
            ))

# evaluator_llm = LangchainLLMWrapper(AzureChatOpenAI(
#                 openai_api_version=os.environ['AZURE_OPENAI_API_VERSION'],
#                 azure_deployment='gpt-35-turbo16k',
#                 model='gpt-35-turbo',
#             ))

evaluator_embeddings = LangchainEmbeddingsWrapper( AzureOpenAIEmbeddings(
                openai_api_version=os.environ['AZURE_OPENAI_API_VERSION'],
                azure_deployment='text-embedding-ada-002',
                model='text-embedding-ada-002',
))


metrics = [
    LLMContextRecall(), # share of claims in the reference answer that the retrieved contexts support, judged by the LLM
    LLMContextPrecisionWithReference(), # whether retrieved contexts relevant to the reference are ranked near the top, judged by the LLM
    FactualCorrectness(), # F1 score of claims made in the response vs those in the reference
    SemanticSimilarity(), # embedding-based similarity between the generated answer and the reference
    answer_correctness, # combines factual overlap and semantic similarity with the reference
    Faithfulness() # share of claims in the response that the retrieved contexts support
]

results_data = []
splitters = {'sentance_512_0': SentenceSplitter(chunk_size=512, chunk_overlap=100),
             'token_512': TokenTextSplitter(chunk_size=512, chunk_overlap=100)}
             
for splitter_name, splitter in splitters.items():
    index = VectorStoreIndex.from_documents(docs, transformations=[splitter])

    for k in [5, 10]:
        query_engine = index.as_query_engine(similarity_top_k=k)
        samples = []
        for row in queries:
            query = row['question']
            response = query_engine.query(query)
            retrieved_nodes = response.source_nodes
            retrieved_passage_ids = [node.metadata['id'] for node in retrieved_nodes]
            # look up the full passage text for the retrieved and the ground-truth ids
            retrieved_passages = [ds.loc[int(pid)].passage for pid in retrieved_passage_ids]
            relevant_passages = [ds.loc[int(pid)].passage for pid in row["relevant_passage_ids"][1:-1].split(', ')]
            
            sample = SingleTurnSample(
                user_input=query,
                reference=row["answer"],
                response=response.response,
                retrieved_contexts=retrieved_passages,
                reference_contexts=relevant_passages,
            )
            samples.append(sample)

        eval_dataset = EvaluationDataset(samples = samples)
        results = evaluate(dataset=eval_dataset, metrics=metrics, llm = evaluator_llm, embeddings = evaluator_embeddings)
        df = results.to_pandas()
        df['k'] = k
        df['splitter'] = splitter_name
        results_data.append(df)

results_df = pd.concat(results_data).reset_index(drop=True)
results_df.drop(['user_input','retrieved_contexts','reference_contexts','response','reference'], axis=1).groupby(['k','splitter']).mean()

k,splitter,context_recall,llm_context_precision_with_reference,factual_correctness,semantic_similarity,answer_correctness,faithfulness
5,sentance_512_0,0.8,0.731111,0.658,0.926734,0.590017,0.771429
5,token_512,0.8,0.757778,0.69,0.922672,0.739,0.8
10,sentance_512_0,0.8,0.699524,0.684,0.925472,0.536956,0.8
10,token_512,0.8,0.689286,0.646,0.923493,0.729206,0.8
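
The `factual_correctness` column is described in the metric list as an F1 score over claims. As a minimal sketch of that arithmetic only, the helper below computes F1 from counts of matched and unmatched claims. It assumes the usual convention that a true positive is a response claim supported by the reference, a false positive is a response claim that is not, and a false negative is a reference claim missing from the response. The function name `claim_f1` and the counts are hypothetical; this is not Ragas code, and the claim extraction and matching that Ragas does with an LLM are not reproduced here.

```python
def claim_f1(true_positives, false_positives, false_negatives):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no claims at all."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


# Hypothetical counts: 3 supported response claims, 1 unsupported response
# claim, and 2 reference claims that the response missed
f1 = claim_f1(true_positives=3, false_positives=1, false_negatives=2)
print(round(f1, 3))
```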


In [None]:
print(results_df.iloc[-1]['user_input'])
print(results_df.iloc[-1]['response'])
print(results_df.iloc[-1]['reference'])

Is RANKL secreted from the cells?
Yes, RANKL is secreted from the cells.
Receptor activator of nuclear factor κB ligand (RANKL) is a cytokine predominantly secreted by osteoblasts.
