<a href="https://colab.research.google.com/github/kmk4444/Retrieval-augmented-generation/blob/main/Part4_show_and_compare_embeddingd.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We will compare embedding models from OpenAI, Cohere, and Hugging Face.

**Requirements.txt**

In [1]:
!touch requirements.txt
!echo langchain >> requirements.txt
!echo langchain-openai >> requirements.txt
!echo openai >> requirements.txt
!echo langchain-google-genai >> requirements.txt
!echo cohere >> requirements.txt
!echo faiss-cpu >> requirements.txt
!echo streamlit >> requirements.txt
!echo python-dotenv >> requirements.txt
!echo llama-index >> requirements.txt
!echo pypdf >> requirements.txt
!echo chromadb >> requirements.txt
!echo beautifulsoup4 >> requirements.txt
!echo matplotlib >> requirements.txt
!echo rank_bm25 >> requirements.txt
!echo replicate >> requirements.txt

**Bash/command**

In [2]:
pip install -r requirements.txt

Successfully installed boto3-1.34.102 botocore-1.34.102 cohere-5.4.0 dataclasses-json-0.6.6 deprecated-1.2.14 dirtyjson-1.0.8 faiss-cpu-1.8.0 fastavro-1.9.4 gitdb-4.0.11 gitpython-3.1.43 h11-0.14.0 httpcore-1.0.5 httpx-0.27.0 httpx-sse-0.4.0 jmespath-1.0.1 jsonpatch-1.33 jsonpointer-2.4 langchain-0.1.19 langchain-community-0.0.38 langchain-core-0.1.52 langchain-google-genai-1.0.3 langchain-openai-0.1.6 langchain-text-splitters-0.0.1 langsmith-0.1.56 llama-index-0.10.36 llama-index-agent-openai-0.2.4 llama-index-cli-0.1.12 llama-index-core-0.10.36 llama-index-embeddings-openai-0.1.9 llama-index-indices-managed-llama-cloud-0.1.6 llama-index-legacy-0.9.48 llama-index-llms-openai-0.1.18 llama-index-multi-modal-llms-openai-0.1.5 llama-index-program-openai-0.1.6 llama-index-question-gen-openai-0.1.3 llama-index-readers-file-0.1.22 llama-index-readers-llama-parse-0.1.4 llama-parse-0.4.2 llamaindex-py-client-0.1.19 marshmallow-3.21.2 mypy-extensions-1.0.0 openai-1.28.0 orjson-3.10.3 packaging-

In [6]:
%%writefile app.py

from openai import OpenAI
import cohere
import streamlit as st
import requests
import os
from dotenv import load_dotenv

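# Optional: load the keys from a .env file instead of hard-coding them below.
#load_dotenv()
#my_key_openai = os.getenv("openai_apikey")
#my_key_cohere = os.getenv("cohere_apikey")
#my_key_hf = os.getenv("huggingface_access_token")

# Placeholder values; replace them with your own API keys.
my_key_openai="---"
my_key_cohere="----"
my_key_hf="-----"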
OpenAI_client = OpenAI(api_key=my_key_openai)
Cohere_client = cohere.Client(api_key=my_key_cohere)

# Turkish sample text: "Why do seasons occur? Is it because the Earth rotates around itself?"
sample_text = "Mevsimler neden oluşur? Dünya kendi etrafında döndüğü için mi?"

def get_openai_embeddings(text):
  response = OpenAI_client.embeddings.create(
      input=text,
      model="text-embedding-3-small"
  )
  embeddings = response.data[0].embedding
  return embeddings

"""
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ... (omitted for spacing)
        -4.547132266452536e-05,
        -0.024047505110502243
      ],
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
"""


def get_cohere_embeddings(text):
  response = Cohere_client.embed(
      texts=[text],
      input_type="classification",
      model="embed-multilingual-v3.0"
  )
  return response.embeddings[0]

"""
{
  "text": "The following notable deaths occurred in 2022. Names are reported under the date of death, in alphabetical order......",
  "embeddings": {
    "float":[0.006572723388671875, 0.0090484619140625, -0.02142333984375,....],
    "int8":null,
    "uint8":null,
    "binary":null,
    "ubinary":null
  }
}
"""

def get_hf_embeddings(text):

    model_id = "sentence-transformers/all-MiniLM-L6-v2"

    api_url = f"https://api-inference.huggingface.co/pipeline/feature-extraction/{model_id}"
    headers = {"Authorization": f"Bearer {my_key_hf}"}

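    response = requests.post(api_url, headers=headers, json={"inputs": text, "options":{"wait_for_model":True}})
    # Surface HTTP errors instead of returning an error payload as if it were an embedding.
    response.raise_for_status()
    return response.json()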

"""
[[-0.02388945  0.05525852 -0.01165488 ...  0.00577787  0.03409787  -0.0068891 ]
 [-0.0126876   0.04687412 -0.01050217 ... -0.02310316 -0.00278466   0.01047371]
 [ 0.00049438  0.11941205  0.00522949 ...  0.01687654 -0.02386115   0.00526433]
 ...
 [-0.03900796 -0.01060951 -0.00738271 ... -0.08390449  0.03768405   0.00231361]
 [-0.09598278 -0.06301168 -0.11690582 ...  0.00549841  0.1528919   0.02472013]
 [-0.01162949  0.05961934  0.01650903 ... -0.02821241 -0.00116556   0.0010672 ]]
"""

st.set_page_config("Embedding Model Comparison", layout="wide")
st.title("Vectorization with Different Embedding Models")
st.divider()

col_input, col_openai, col_cohere, col_hf = st.columns([2,1,1,1])

with col_input:
    text_input = st.text_area(label="Text Input", value=sample_text)
    submit_btn = st.button(label="Submit")

    if submit_btn:

        # Embed the text typed into the text area, not the hard-coded sample.
        with col_openai:
            st.header("OpenAI")
            openai_embeddings = get_openai_embeddings(text=text_input)
            st.success(f"Number of dimensions in the vector: {len(openai_embeddings)}")
            for i, embedding in enumerate(openai_embeddings):
                col_openai.code(f"{i+1}: {embedding}")

        with col_cohere:
            st.header("Cohere")
            cohere_embeddings = get_cohere_embeddings(text=text_input)
            st.info(f"Number of dimensions in the vector: {len(cohere_embeddings)}")
            for i, embedding in enumerate(cohere_embeddings):
                col_cohere.code(f"{i+1}: {embedding}")

        with col_hf:
            st.header("Hugging Face")
            hf_embeddings = get_hf_embeddings(text=text_input)
            st.warning(f"Number of dimensions in the vector: {len(hf_embeddings)}")
            for i, embedding in enumerate(hf_embeddings):
                col_hf.code(f"{i+1}: {embedding}")


Overwriting app.py
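
The app above prints each model's vector length and raw components side by side. Component values from different models are not directly comparable, because each model defines its own vector space. One common way to compare models is to embed the same pair of texts with each model and look at the cosine similarity per model. The following is a minimal sketch under that assumption; `cosine_similarity` and `compare_models` are hypothetical helpers that are not part of the original notebook, and the toy embedders only stand in for the real API calls.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must come from the same model (equal length)")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def compare_models(embedders, text_a, text_b):
    """Score one text pair with every embedder.

    embedders maps a display name to a function: text -> list of floats.
    """
    return {
        name: cosine_similarity(embed(text_a), embed(text_b))
        for name, embed in embedders.items()
    }


# Toy stand-ins for the real API calls (assumption: illustration only).
def toy_length_embedder(text):
    return [float(len(text)), 1.0]


def toy_vowel_embedder(text):
    vowels = sum(text.lower().count(v) for v in "aeiou")
    return [float(vowels), float(len(text) - vowels), 1.0]


scores = compare_models(
    {"toy-length": toy_length_embedder, "toy-vowel": toy_vowel_embedder},
    "Why do seasons occur?",
    "The Earth rotates.",
)
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

In the app, the same call could take `{"OpenAI": get_openai_embeddings, "Cohere": get_cohere_embeddings, "Hugging Face": get_hf_embeddings}` as `embedders`, provided each function returns a flat list of floats.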


In [7]:
!npm install localtunnel
!streamlit run /content/app.py &>/content/logs.txt &
!npx localtunnel --port 8501

npm WARN saveError ENOENT: no such file or directory, open '/content/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/content/package.json'
npm WARN content No description
npm WARN content No repository field.
npm WARN content No README data
npm WARN content No license field.

+ localtunnel@2.0.2
updated 1 package and audited 36 packages in 0.63s

3 packages are looking for funding
  run `npm fund` for details

found 2 moderate severity vulnerabilities
  run `npm audit fix` to fix them, or `npm audit` for details
npx: installed 22 in 1.956s
your url is: https://itchy-worms-chew.loca.lt
^C
