# Packages

In [4]:
#!pip install langchain-chroma

In [5]:
# !pip install -U -q google.generativeai
# !pip install -q chromadb

In [6]:
import os
import yaml

import textwrap
import chromadb

import google.generativeai as genai
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from chromadb import Documents, EmbeddingFunction, Embeddings

import numpy as np
import pandas as pd

import google.ai.generativelanguage as glm
from IPython.display import Markdown


### API_KEY

In [7]:
with open('chatgpt_api_credentials.yml', 'r') as file:
    api_creds = yaml.safe_load(file)

Once you have the API key, pass it to the SDK. You can do this in two ways:

- Put the key in the GOOGLE_API_KEY environment variable (the SDK will automatically pick it up from there).
- Pass the key to genai.configure(api_key=...).
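
The two options above can be combined. The following is a minimal sketch, assuming the credentials file stores the key under `gemini_api_key` as it does in this notebook. The helper name `resolve_api_key` is hypothetical and not part of the SDK.

```python
import os


def resolve_api_key(creds, env=os.environ,
                    env_var="GOOGLE_API_KEY", creds_key="gemini_api_key"):
    """Return the key from the environment if set, else from the credentials dict.

    Hypothetical helper for illustration; it is not part of google.generativeai.
    """
    key = env.get(env_var) or creds.get(creds_key)
    if not key:
        raise KeyError(f"No API key found in {env_var} or under '{creds_key}'")
    return key
```

With this helper, the configuration call could be written as `genai.configure(api_key=resolve_api_key(api_creds))`.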

In [8]:
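# Option 1: set the environment variable; the SDK reads GOOGLE_API_KEY automatically.
# os.environ['GOOGLE_API_KEY'] = api_creds['gemini_api_key']

# Option 2 (Google Colab): read the key from the Colab user secrets.
# from google.colab import userdata
# API_KEY = userdata.get('API_KEY')

# Option 3 (used here): pass the key to the SDK explicitly.
genai.configure(api_key=api_creds['gemini_api_key'])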


Choose an embedding model. The outputs of different models are not compatible with each other, so use the same model to embed the documents and the queries.

In [9]:
for m in genai.list_models():
    if 'embedContent' in m.supported_generation_methods:
        print(m.name)

models/embedding-001
models/text-embedding-004


# Load the DATA

In [10]:
loader = TextLoader("Projects.txt", encoding="utf-8")

In [11]:
raw_documents = loader.load()

In [12]:
len(raw_documents)

1

In [13]:
raw_documents[0].page_content[0:100]

'Elon Reeve Musk (/ˈiːlɒn/ EE-lon; born June 28, 1971) is a businessman and investor. He is the found'

In [14]:
# raw_documents= raw_documents[0].page_content.split(". ")

In [15]:
(raw_documents)

[Document(page_content='Elon Reeve Musk (/ˈiːlɒn/ EE-lon; born June 28, 1971) is a businessman and investor. He is the founder, chairman, CEO, and CTO of SpaceX; angel investor, CEO, product architect, and former chairman of Tesla, Inc.; owner, executive chairman, and CTO of X Corp.; founder of the Boring Company and xAI; co-founder of Neuralink and OpenAI; and president of the Musk Foundation. He is one of the wealthiest people in the world; as of April 2024, Forbes estimates his net worth to be $193 billion.[4]\n\nA member of the wealthy South African Musk family, Musk was born in Pretoria and briefly attended the University of Pretoria before immigrating to Canada at age 18, acquiring citizenship through his Canadian-born mother. Two years later, he matriculated at Queen\'s University at Kingston in Canada. Musk later transferred to the University of Pennsylvania and received bachelor\'s degrees in economics and physics. He moved to California in 1995 to attend Stanford University, 

# Chunk

In [16]:
text_splitter = CharacterTextSplitter(chunk_size = 100, chunk_overlap = 0)

In [17]:
documents = text_splitter.split_documents(raw_documents)

Created a chunk of size 494, which is longer than the specified 100
Created a chunk of size 938, which is longer than the specified 100
Created a chunk of size 1142, which is longer than the specified 100
Created a chunk of size 461, which is longer than the specified 100
Created a chunk of size 671, which is longer than the specified 100
Created a chunk of size 467, which is longer than the specified 100
Created a chunk of size 204, which is longer than the specified 100
Created a chunk of size 447, which is longer than the specified 100
Created a chunk of size 445, which is longer than the specified 100
Created a chunk of size 618, which is longer than the specified 100
Created a chunk of size 220, which is longer than the specified 100
Created a chunk of size 572, which is longer than the specified 100
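
These warnings appear because `CharacterTextSplitter` splits on a single separator (a blank line, `"\n\n"`, by default), so a paragraph longer than `chunk_size` cannot be cut any further. The following is a simplified sketch of that behaviour in plain Python. It is not LangChain's implementation, and the function name `split_on_separator` is hypothetical.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    A single piece that is already longer than chunk_size is emitted whole,
    which is the situation the warnings above report.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 150 + "\n\n" + "b" * 30 + "\n\n" + "c" * 40
sizes = [len(chunk) for chunk in split_on_separator(sample, chunk_size=100)]
print(sizes)  # [150, 72]
```

If smaller chunks are needed, one option is a splitter that falls back to finer separators, such as `RecursiveCharacterTextSplitter` from the same package.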


In [18]:
len(documents)

13

In [19]:
documents[0:2]

[Document(page_content='Elon Reeve Musk (/ˈiːlɒn/ EE-lon; born June 28, 1971) is a businessman and investor. He is the founder, chairman, CEO, and CTO of SpaceX; angel investor, CEO, product architect, and former chairman of Tesla, Inc.; owner, executive chairman, and CTO of X Corp.; founder of the Boring Company and xAI; co-founder of Neuralink and OpenAI; and president of the Musk Foundation. He is one of the wealthiest people in the world; as of April 2024, Forbes estimates his net worth to be $193 billion.[4]', metadata={'source': 'Projects.txt'}),
 Document(page_content="A member of the wealthy South African Musk family, Musk was born in Pretoria and briefly attended the University of Pretoria before immigrating to Canada at age 18, acquiring citizenship through his Canadian-born mother. Two years later, he matriculated at Queen's University at Kingston in Canada. Musk later transferred to the University of Pennsylvania and received bachelor's degrees in economics and physics. He 

In [20]:
type(documents)

list

In [21]:
# Keep only the text of each chunk
final_docs = [doc.page_content for doc in documents]

# Embeddings

This section embeds the chunks and stores them in Chroma DB. Passing a set of documents to the custom function below returns their vectors, or embeddings.

In [22]:
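class GeminiEmbeddingFunction(EmbeddingFunction):
    """Chroma embedding function backed by the Gemini embedding API."""

    def __call__(self, input: Documents) -> Embeddings:
        model = 'models/embedding-001'
        title = "Custom app query"
        # task_type="retrieval_document" marks the input as text to be indexed
        return genai.embed_content(model=model,
                                   content=input,
                                   task_type="retrieval_document",
                                   title=title)["embedding"]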


Now create the vector database. In the create_chroma_db function, you instantiate a Chroma client. From there, you create a collection, which is where you store your embeddings, documents, and any metadata. Note that the embedding function from above is passed as an argument to create_collection.

Next, you use the add method to add the documents to the collection.

In [23]:
def create_chroma_db(documents, name):
    chroma_client = chromadb.Client()
    db = chroma_client.create_collection(name=name, embedding_function=GeminiEmbeddingFunction())

    # Add each chunk, using its list position as a string ID
    for i, d in enumerate(documents):
        db.add(documents=d, ids=str(i))
    return db
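
The loop above makes one `add` call, and therefore one embedding request, per chunk. Since `add` also accepts lists of documents and IDs, the chunks could be grouped instead. The following is a minimal sketch under that assumption. The helper `build_add_batches` and the batch size of 50 are hypothetical choices, and whether the embedding API accepts that many items per request should be checked against its documentation.

```python
def build_add_batches(documents, batch_size=50):
    """Yield keyword arguments for collection.add, one dict per batch.

    IDs are the string form of each document's position in the full list,
    matching the IDs used by the per-document loop.
    """
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        ids = [str(start + offset) for offset in range(len(batch))]
        yield {"documents": batch, "ids": ids}
```

Inside `create_chroma_db`, this could be used as `for batch in build_add_batches(documents): db.add(**batch)`.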

In [24]:
# Set up the DB
db = create_chroma_db(final_docs, "googlecarsdatabase10")

In [25]:
# Confirm by looking at the DB
db_df = pd.DataFrame(db.peek(50))

In [26]:
len(db_df['embeddings'][0])  # embedding-001 returns 768-dimensional vectors

768

# Getting the relevant document

db is a Chroma collection object. You can call query on it to run a nearest-neighbor search, which returns the stored documents whose embeddings are most similar to the query.

In [27]:
def get_relevant_passage(query, db):
    passage = db.query(query_texts=[query], n_results=1)['documents'][0][0]
    return passage
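
To make the idea of a nearest-neighbor search concrete, here is a brute-force sketch in plain Python that ranks stored vectors by cosine similarity to a query vector. It is only an illustration: Chroma uses its own index, and the distance metric configured for a collection may not be cosine. The function names and the toy vectors are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, doc_vecs, n_results=1):
    """Return the indices of the n_results vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:n_results]


doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(nearest([0.9, 0.1], doc_vecs, n_results=2))  # [0, 2]
```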

In [28]:
def make_prompt(query, relevant_passage):
    escaped = relevant_passage.replace("'", "").replace('"', "").replace("\n", " ")
    
    prompt = ("""You are a helpful and informative bot that answers questions using text from the reference passage included below. \
      Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. \
      However, you are talking to a non-technical audience, so be sure to break down complicated concepts and \
      strike a friendly and conversational tone. \
      If the passage is irrelevant to the answer, you may ignore it.
      QUESTION: '{query}'
      PASSAGE: '{relevant_passage}'

        ANSWER:      """).format(query=query, relevant_passage=escaped)

    return prompt

### 1. Query

In [29]:
query = "Elon Musk is really owning SpaceX? if yes what is the revenue?"
# "Is there a temperature control on the climate control knob? If so, how do you use it?"
# "How do you use the touchscreen in the Google car?"

### 2. Relevant context search

In [30]:
# Perform embedding search
passage = get_relevant_passage(query, db)
Markdown(passage)

Elon Reeve Musk (/ˈiːlɒn/ EE-lon; born June 28, 1971) is a businessman and investor. He is the founder, chairman, CEO, and CTO of SpaceX; angel investor, CEO, product architect, and former chairman of Tesla, Inc.; owner, executive chairman, and CTO of X Corp.; founder of the Boring Company and xAI; co-founder of Neuralink and OpenAI; and president of the Musk Foundation. He is one of the wealthiest people in the world; as of April 2024, Forbes estimates his net worth to be $193 billion.[4]

### 3. Structured Prompt

In [31]:
# Pass a query to the prompt:
prompt = make_prompt(query, passage)
Markdown(prompt)

You are a helpful and informative bot that answers questions using text from the reference passage included below.       Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.       However, you are talking to a non-technical audience, so be sure to break down complicated concepts and       strike a friendly and conversational tone.       If the passage is irrelevant to the answer, you may ignore it.
      QUESTION: 'Elon Musk is really owning SpaceX? if yes what is the revenue?'
      PASSAGE: 'Elon Reeve Musk (/ˈiːlɒn/ EE-lon; born June 28, 1971) is a businessman and investor. He is the founder, chairman, CEO, and CTO of SpaceX; angel investor, CEO, product architect, and former chairman of Tesla, Inc.; owner, executive chairman, and CTO of X Corp.; founder of the Boring Company and xAI; co-founder of Neuralink and OpenAI; and president of the Musk Foundation. He is one of the wealthiest people in the world; as of April 2024, Forbes estimates his net worth to be $193 billion.[4]'

        ANSWER:      

### 4. generate_content

In [32]:
model = genai.GenerativeModel('gemini-pro')
answer = model.generate_content(prompt)
Markdown(answer.text)

Elon Musk is, indeed, the owner of SpaceX. As a private company, SpaceX doesn't release its financial data, so I don't have any information about its revenue.
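
The answer above could draw on only one chunk, because get_relevant_passage requests `n_results=1`. If other chunks contain relevant details, retrieving several and joining them gives the model more context. The following is a minimal sketch, assuming the result dictionary has the same shape that get_relevant_passage indexes into (`['documents'][0]` is the list of passages for the first query). The helper name `passages_to_context` is hypothetical.

```python
def passages_to_context(query_result, separator="\n\n"):
    """Join all passages returned for the first query into one context string."""
    passages = query_result["documents"][0]
    return separator.join(passages)


# Stand-in for a real query result, used only to show the expected shape
fake_result = {"documents": [["first passage", "second passage"]]}
print(passages_to_context(fake_result))
```

With the collection from this notebook, the call could look like `passages_to_context(db.query(query_texts=[query], n_results=3))`, and the result can be passed to make_prompt in place of the single passage.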

In [33]:
db_df.head(7)

| | ids | embeddings | metadatas | documents | uris | data |
|---|---|---|---|---|---|---|
| 0 | 0 | [0.02941473200917244, -0.04040157049894333, -0... | | Elon Reeve Musk (/ˈiːlɒn/ EE-lon; born June 28... | | |
| 1 | 1 | [0.06620374321937561, -0.08512534201145172, -0... | | A member of the wealthy South African Musk fam... | | |
| 2 | 10 | [0.03718456253409386, -0.08949185907840729, -0... | | Musk arrived in Canada in June 1989, connected... | | |
| 3 | 11 | [0.019050482660531998, -0.07613840699195862, -... | | Two years later, he transferred to the Univers... | | |
| 4 | 12 | [0.045048512518405914, -0.07221843302249908, -... | | In 1994, Musk held two internships in Silicon ... | | |
| 5 | 2 | [0.04144306853413582, -0.049283891916275024, -... | | In 2004, Musk became an early investor in elec... | | |
| 6 | 3 | [0.036542847752571106, -0.12151738256216049, -... | | Musk has expressed views that have made him a ... | | |
