In [1]:
!pip uninstall -qqy jupyterlab kfp  # Remove unused conflicting packages
!pip install -qU "google-genai==1.7.0" "chromadb==0.6.3"


In [2]:
from google import genai
from google.genai import types

from IPython.display import Markdown

genai.__version__

'1.7.0'

In [3]:
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")

#### Explore available models for document embeddings

In [4]:
client = genai.Client(api_key=GOOGLE_API_KEY)

for m in client.models.list():
    if 'embedContent' in m.supported_actions:
        print(m.name)

models/embedding-001
models/text-embedding-004
models/gemini-embedding-exp-03-07
models/gemini-embedding-exp
models/gemini-embedding-001


#### Data

Let's create a small dataset: three passages on the life cycle of massive stars.

In [5]:
DOCUMENT1 = """
Massive stars are born in the densest regions of giant molecular clouds—vast, cold reservoirs of gas and dust scattered throughout galaxies. Under the pull of gravity, small fluctuations in density cause certain pockets within the cloud to collapse inward. As these clumps condense, gravitational potential energy is converted into heat, forming a dense, hot core surrounded by an accretion disk. When the core temperature rises above about ten million Kelvin, hydrogen nuclei begin to fuse into helium, releasing enormous amounts of energy. This marks the star’s ignition—a delicate balance is achieved between the outward pressure from nuclear fusion and the inward force of gravity. Massive protostars, often more than eight times the mass of the Sun, form quickly—within a few hundred thousand years—and their intense radiation and stellar winds blow away surrounding gas, halting further accretion. These young giants illuminate nearby gas clouds, creating the glowing stellar nurseries we observe as nebulae.
"""
DOCUMENT2 = """
Once stable, a massive star spends millions of years in its main sequence phase, burning hydrogen into helium through the CNO (carbon–nitrogen–oxygen) cycle, a fusion process much faster than the proton–proton chain that powers smaller stars like the Sun. Because of their enormous mass and gravitational pressure, core temperatures in massive stars are extremely high, causing them to burn fuel at an astonishing rate—millions of times faster than smaller stars. As hydrogen in the core becomes depleted, fusion continues in concentric shells around the core while the core itself contracts and heats further. This triggers successive stages of fusion: helium into carbon and oxygen, carbon into neon, neon into magnesium and silicon, and eventually silicon into iron. Each stage happens faster than the last—helium burning may last hundreds of thousands of years, but silicon burning can last only a few days. The star’s structure becomes layered like an onion, with each shell fusing a heavier element than the one above it, sustaining equilibrium until the fuel is finally exhausted.
"""
DOCUMENT3 = """
When a massive star’s core becomes dominated by iron, nuclear fusion can no longer produce energy—iron fusion consumes rather than releases energy. Without the outward pressure from fusion to counteract gravity, the core collapses catastrophically in less than a second. Temperatures soar to billions of degrees, crushing protons and electrons into neutrons and releasing a flood of neutrinos. The outer layers of the star rebound violently off the stiffening neutron core, creating a titanic shock wave that blasts the outer envelope into space—a supernova explosion that can outshine entire galaxies for weeks. Depending on the remnant core’s mass, two fates are possible: if the remaining mass is between roughly 1.4 and 3 times that of the Sun, it stabilizes as a neutron star, an object so dense that a teaspoon of its material would weigh billions of tons. If it is heavier still, gravity overwhelms all forces and the core collapses into a black hole, where not even light can escape. The ejected elements—carbon, oxygen, iron, and heavier nuclei—enrich the interstellar medium, seeding the birth of new stars and planets, and ensuring the cosmic cycle of creation continues.
"""

documents = [DOCUMENT1, DOCUMENT2, DOCUMENT3]

In [6]:
documents

['\nMassive stars are born in the densest regions of giant molecular clouds—vast, cold reservoirs of gas and dust scattered throughout galaxies. Under the pull of gravity, small fluctuations in density cause certain pockets within the cloud to collapse inward. As these clumps condense, gravitational potential energy is converted into heat, forming a dense, hot core surrounded by an accretion disk. When the core temperature rises above about ten million Kelvin, hydrogen nuclei begin to fuse into helium, releasing enormous amounts of energy. This marks the star’s ignition—a delicate balance is achieved between the outward pressure from nuclear fusion and the inward force of gravity. Massive protostars, often more than eight times the mass of the Sun, form quickly—within a few hundred thousand years—and their intense radiation and stellar winds blow away surrounding gas, halting further accretion. These young giants illuminate nearby gas clouds, creating the glowing stellar nurseries we o

#### Create an embedding database with ChromaDB

In [7]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry

# Retry predicate: retry when the per-minute quota is exhausted (429) or the service is unavailable (503).
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents, or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input):
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"

        response = client.models.embed_content(
            model='models/text-embedding-004',
            config=types.EmbedContentConfig(
                task_type=embedding_task
            ),
            contents=input
        )

        return [e.values for e in response.embeddings]
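
To make the retry behaviour above more concrete, here is a minimal, self-contained sketch of retry with exponential backoff. It does not use `google.api_core.retry`, and `FakeAPIError`, `is_retriable_error`, `call_with_retry`, and `flaky_embed` are hypothetical names invented for illustration. The real decorator's timing and defaults may differ.

```python
import random
import time


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, code):
        super().__init__(f"HTTP {code}")
        self.code = code


def is_retriable_error(exc):
    # Same idea as the predicate above: retry only on 429 (quota) and 503 (unavailable).
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}


def call_with_retry(fn, predicate, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call fn(); on a retriable error, wait with exponential backoff and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on non-retriable errors or once the attempts are exhausted.
            if not predicate(exc) or attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # add jitter to spread out retries


# Usage: a fake embedding call that fails twice with 429, then succeeds.
calls = {"n": 0}


def flaky_embed():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeAPIError(429)
    return [0.1, 0.2, 0.3]


embedding = call_with_retry(flaky_embed, is_retriable_error)
print(embedding, "after", calls["n"], "attempts")
```

Errors that the predicate rejects (for example a 400) are re-raised immediately, so only transient failures cost extra time.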

Now we'll create a Chroma database client that uses the GeminiEmbeddingFunction and populate the database with the documents we defined above.

In [8]:
#pip install -qU posthog

In [9]:
import chromadb

DB_NAME = 'stellarevolution'

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

db.add(documents=documents, ids=[str(i) for i in range(len(documents))])

Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event CollectionAddEvent: capture() takes 1 positional argument but 3 were given


In [10]:
db.count()

3
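
With only three documents, a single `db.add` call is enough. For a larger corpus it may be necessary to add documents in batches, since embedding APIs commonly cap the number of inputs per request. The sketch below shows one way to split documents and ids into aligned batches. The `batched` helper, the `docs` list, and `BATCH_SIZE = 3` are assumptions for illustration, not documented limits.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical larger corpus; the notebook itself only has three documents.
docs = [f"document {i}" for i in range(7)]
ids = [str(i) for i in range(len(docs))]

BATCH_SIZE = 3  # assumed value for illustration, not a documented API limit

batches = list(zip(batched(docs, BATCH_SIZE), batched(ids, BATCH_SIZE)))
for doc_batch, id_batch in batches:
    # In the notebook this would be: db.add(documents=doc_batch, ids=id_batch)
    print(id_batch, len(doc_batch))
```

Slicing both lists with the same batch size keeps each document paired with its id.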

In [11]:
#db.peek(1)

#### Retrieval: Find relevant documents

In [12]:
# Switch to query mode when generating embeddings.
embed_fn.document_mode = False

query = "How does a heavy star burn its fuel?"

result = db.query(query_texts=[query], n_results=1)
[all_passages] = result["documents"]

Markdown(all_passages[0])

Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given



Once stable, a massive star spends millions of years in its main sequence phase, burning hydrogen into helium through the CNO (carbon–nitrogen–oxygen) cycle, a fusion process much faster than the proton–proton chain that powers smaller stars like the Sun. Because of their enormous mass and gravitational pressure, core temperatures in massive stars are extremely high, causing them to burn fuel at an astonishing rate—millions of times faster than smaller stars. As hydrogen in the core becomes depleted, fusion continues in concentric shells around the core while the core itself contracts and heats further. This triggers successive stages of fusion: helium into carbon and oxygen, carbon into neon, neon into magnesium and silicon, and eventually silicon into iron. Each stage happens faster than the last—helium burning may last hundreds of thousands of years, but silicon burning can last only a few days. The star’s structure becomes layered like an onion, with each shell fusing a heavier element than the one above it, sustaining equilibrium until the fuel is finally exhausted.
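
Conceptually, the query step embeds the question and returns the stored passage whose vector is closest to it. The sketch below illustrates that idea with cosine similarity over made-up three-dimensional vectors. The vectors and the `cosine_similarity` and `top_k` helpers are assumptions for illustration. The distance metric Chroma actually uses depends on how the collection is configured, and it relies on an index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional vectors; real embeddings have hundreds of dimensions.
toy_doc_vecs = [
    [0.9, 0.1, 0.0],  # stands in for the star formation passage
    [0.1, 0.9, 0.1],  # stands in for the fuel burning passage
    [0.0, 0.2, 0.9],  # stands in for the supernova passage
]
toy_query_vec = [0.2, 0.8, 0.1]  # stands in for the embedded question

best = top_k(toy_query_vec, toy_doc_vecs, k=1)
print(best[0][0])  # index of the closest toy document
```

Here the second toy vector wins because it points in nearly the same direction as the query vector, which mirrors how the fuel-burning passage was returned above.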
