# Gemini API: Embeddings Quickstart

The Gemini API generates state-of-the-art text embeddings. This notebook provides quick code examples to help you get started. Links to more examples and guides are at the end.


## Install the Python SDK

In [1]:
!pip install -U -q google-generativeai

In [2]:
import google.generativeai as genai

## Configure your API key

Create an API key in [Google AI Studio](https://aistudio.google.com), then pass it to the Python SDK using Colab Secrets or an environment variable. You can learn more in the [Authentication quickstart](https://github.com/google-gemini/gemini-api-cookbook/blob/main/quickstarts/Authentication.ipynb).

In [3]:
from google.colab import userdata
# Read the key from Colab Secrets (available only inside Colab)
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
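The cell above reads the key from Colab Secrets, which only works inside Colab. Outside Colab, a common alternative is an environment variable. The sketch below assumes the variable is named `GOOGLE_API_KEY` (the same name as the secret above); `get_api_key` is a hypothetical helper, not part of the SDK. It checks the environment first and falls back to Colab Secrets.

```python
import os


def get_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Hypothetical helper: read the API key from the environment, else from Colab Secrets."""
    key = os.environ.get(var_name)
    if key:
        return key
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        raise RuntimeError(
            f"Set the {var_name} environment variable or add it to Colab Secrets."
        ) from None
    return userdata.get(var_name)
```

You would then configure the SDK with `genai.configure(api_key=get_api_key())`.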

## Select a model

At the time of writing, only one model in the Gemini API generates embeddings. If more become available, choose a specific model and use it consistently: embeddings from different models are not compatible with each other, so any vectors you compare must come from the same model.

In [4]:
for m in genai.list_models():
  if 'embedContent' in m.supported_generation_methods:
    print(m.name)
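Because embeddings from different models can't be mixed, it can help to fail fast if the model you've standardized on is no longer listed. The sketch below wraps the filter from the cell above in reusable functions. `embedding_models`, `require_model`, and `PINNED_MODEL` are hypothetical names; the functions only assume each model object has the `name` and `supported_generation_methods` attributes used above.

```python
# The model this notebook uses; pinning it keeps all stored vectors comparable.
PINNED_MODEL = "models/embedding-001"


def embedding_models(models):
    """Return the names of models that support the `embedContent` method."""
    return [m.name for m in models if "embedContent" in m.supported_generation_methods]


def require_model(models, wanted=PINNED_MODEL):
    """Return `wanted` if it supports embeddings, otherwise raise LookupError."""
    available = embedding_models(models)
    if wanted not in available:
        raise LookupError(f"{wanted!r} does not support embedContent; found {available}")
    return wanted
```

For example, `model = require_model(genai.list_models())` returns the pinned name, or stops the notebook early with a clear error.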

models/embedding-001


## Embed content

In [5]:
text = "Hello world"
result = genai.embed_content(model="models/embedding-001", content=text)
print(result['embedding'])

[0.04703258, -0.040190056, -0.029026963, -0.026809642, 0.018920582, -8.3654784e-05, 0.031116402, -0.019520544, 0.0114913415, 0.009625779, 0.04571186, 0.05170951, -0.007854084, -0.07627559, -0.00073652336, -0.02259244, 0.01149677, -0.00761096, 0.006400746, -0.0036826304, -8.6395165e-05, 0.007910556, -0.031401973, -0.027668774, 0.0131483, 0.005762955, -0.0022430476, -0.07029421, 0.007011013, 0.07013052, -0.047634568, 0.008311825, -0.060211696, 0.016431302, 0.042709153, -0.047674265, 0.03426082, 0.021967327, -0.0070651034, 0.00032590108, 0.013825696, -0.08921293, -0.03404069, -0.03793646, 0.059349738, -0.0044174152, 0.015472682, -0.0061533544, 0.022183485, -0.08739371, 0.049185753, 0.025158774, 0.044854913, -0.022910612, 0.02060697, -0.016286727, 0.07367813, 0.013565082, -0.06963922, -0.002877564, 0.02369202, 0.0143784685, -0.012660949, 0.06607742, -0.00069232617, -0.017637717, -0.06946077, 0.042905096, 0.03502765, -0.029362002, 0.0069921436, -0.03341513, 0.036520302, -0.039816536, -0.025

In [6]:
print(len(result['embedding'])) # The embeddings have 768 dimensions

768


## Batch embed content

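You can embed several pieces of text in one API call by passing a list of strings as `content`, which is more efficient than making one call per string. The result's `'embedding'` field is then a list of vectors, one per input, in the same order as the inputs.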

In [7]:
result = genai.embed_content(
    model="models/embedding-001",
    content=[
      'What is the meaning of life?',
      'How much wood would a woodchuck chuck?',
      'How does the brain work?'])

for embedding in result['embedding']:
  print(embedding)

[-0.0002620658, -0.05592018, -0.012463195, -0.020672262, 0.0076786764, 0.0024069757, 0.030989334, 0.01582611, -0.015852105, 0.01166464, -0.009116531, 0.0010538619, 0.0364724, -0.03241015, 0.008458401, -0.0033494907, 0.021979257, 0.015039317, 0.0119965095, -0.030202571, -0.03397108, 0.010469226, -0.0036774648, 0.013236868, 0.007085994, -0.03823258, 0.018632848, -0.07036459, -0.022376657, 0.053557377, -0.058401376, 0.03585069, -0.087846205, -0.017133184, 0.046018917, -0.06547599, 0.063615195, 0.0038366304, -0.05621926, 0.015354922, 0.027993223, -0.008997667, -0.07307446, -0.029412353, -0.007533675, 0.0054107136, -0.0084472345, 0.031681348, 0.004164861, -0.10778427, 0.016246736, 0.034010183, 0.08023602, -0.007318592, -6.4897664e-05, -0.007139854, 0.048286464, 0.015510914, -0.029625932, -0.0059952363, 0.008322641, 0.0065014833, -0.025500137, 0.063725494, -0.019506633, -0.0068314, 0.026189992, 0.022515895, 0.058747195, -0.018565234, 0.012418801, -0.015309863, 0.027896155, -0.013933084, -0.0
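The batch API limits how many inputs a single request can contain. For larger lists, one option is to split the inputs into chunks and embed each chunk in its own call. The sketch below assumes a limit of 100 inputs per call; treat that number as an assumption and check the `batchEmbedContents` API reference for the current value. `chunked`, `embed_in_batches`, and `MAX_BATCH_SIZE` are hypothetical names, and the embedding call is passed in as a function so the chunking logic doesn't depend on the SDK.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Assumed per-call limit; confirm against the batchEmbedContents API reference.
MAX_BATCH_SIZE = 100


def chunked(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    size: int = MAX_BATCH_SIZE,
) -> List[List[float]]:
    """Embed `texts` chunk by chunk with `embed_fn`, keeping the input order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, size):
        vectors.extend(embed_fn(batch))
    return vectors
```

With the SDK, `embed_fn` could be `lambda batch: genai.embed_content(model="models/embedding-001", content=batch)['embedding']`.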

## Use `task_type` to tell the model how you'll use the embeddings

Let's look at all the parameters the `embed_content` method takes. There are four:

* `model`: Required. The model's resource name. This serves as an ID for the Model to use. Use `models/embedding-001`.
* `content`: Required. The content that you would like to embed.
* `task_type`: Optional. A field for setting the task type for which the embeddings will be used. This can only be set for `models/embedding-001`.
* `title`: Optional. A title for the text you are embedding. You should only set the title if your task type is `retrieval_document` (or `document`).

`task_type` is an optional parameter that provides a hint to the API about how you intend to use the embeddings in your application.

The following `task_type` values are accepted:

* `unspecified`: Unset value, which will default to one of the other enum values.
* `retrieval_query` (or `query`): Specifies the given text is a query in a search/retrieval setting.
* `retrieval_document` (or `document`): Specifies the given text is a document from the corpus being searched.
* `semantic_similarity` (or `similarity`): Specifies the given text will be used for Semantic Textual Similarity (STS).
* `classification`: Specifies that the given text will be classified.
* `clustering`: Specifies that the embeddings will be used for clustering.
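The rules above (which `task_type` values exist, and that `title` only applies with `retrieval_document` or `document`) can be checked before you call the API. This is a minimal sketch; `build_embed_kwargs`, `VALID_TASK_TYPES`, and `DOCUMENT_TASK_TYPES` are hypothetical names, and the accepted values are copied from the list above, so they may fall behind the API.

```python
# Accepted task_type values, copied from the list above (may change over time).
VALID_TASK_TYPES = {
    "unspecified",
    "retrieval_query", "query",
    "retrieval_document", "document",
    "semantic_similarity", "similarity",
    "classification",
    "clustering",
}
DOCUMENT_TASK_TYPES = {"retrieval_document", "document"}


def build_embed_kwargs(content, task_type=None, title=None, model="models/embedding-001"):
    """Build keyword arguments for genai.embed_content, validating task_type and title."""
    kwargs = {"model": model, "content": content}
    if task_type is not None:
        if task_type not in VALID_TASK_TYPES:
            raise ValueError(f"Unknown task_type: {task_type!r}")
        kwargs["task_type"] = task_type
    if title is not None:
        # A title is only meaningful for documents in a retrieval corpus.
        if task_type not in DOCUMENT_TASK_TYPES:
            raise ValueError("title requires task_type 'retrieval_document' or 'document'")
        kwargs["title"] = title
    return kwargs
```

Usage: `genai.embed_content(**build_embed_kwargs('Hello world', task_type='document', title='Greeting'))`.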

In [8]:
# Notice the API returns different embeddings depending on `task_type`
result1 = genai.embed_content(
    model="models/embedding-001",
    content="Hello world")

result2 = genai.embed_content(
    model="models/embedding-001",
    content="Hello world",
    task_type="document")

print(result1['embedding'])
print(result2['embedding'])

[0.04703258, -0.040190056, -0.029026963, -0.026809642, 0.018920582, -8.3654784e-05, 0.031116402, -0.019520544, 0.0114913415, 0.009625779, 0.04571186, 0.05170951, -0.007854084, -0.07627559, -0.00073652336, -0.02259244, 0.01149677, -0.00761096, 0.006400746, -0.0036826304, -8.6395165e-05, 0.007910556, -0.031401973, -0.027668774, 0.0131483, 0.005762955, -0.0022430476, -0.07029421, 0.007011013, 0.07013052, -0.047634568, 0.008311825, -0.060211696, 0.016431302, 0.042709153, -0.047674265, 0.03426082, 0.021967327, -0.0070651034, 0.00032590108, 0.013825696, -0.08921293, -0.03404069, -0.03793646, 0.059349738, -0.0044174152, 0.015472682, -0.0061533544, 0.022183485, -0.08739371, 0.049185753, 0.025158774, 0.044854913, -0.022910612, 0.02060697, -0.016286727, 0.07367813, 0.013565082, -0.06963922, -0.002877564, 0.02369202, 0.0143784685, -0.012660949, 0.06607742, -0.00069232617, -0.017637717, -0.06946077, 0.042905096, 0.03502765, -0.029362002, 0.0069921436, -0.03341513, 0.036520302, -0.039816536, -0.025
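The printed vectors are hard to compare by eye. One common way to measure how close two embeddings are is cosine similarity: 1.0 means they point in the same direction, and lower values mean they differ more. This is a minimal pure-Python sketch; `cosine_similarity` is a hypothetical helper, not an SDK function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors `a` and `b` (range -1.0 to 1.0)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

After running the cell above, `cosine_similarity(result1['embedding'], result2['embedding'])` gives a single number summarizing how much setting `task_type` changed the embedding of the same text. The same function works for comparing the batch embeddings from the earlier section.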

## Learning more

Check out these examples in the Cookbook to learn more about what you can do with embeddings:

* [Search Reranking](https://github.com/google-gemini/gemini-api-cookbook/blob/main/examples/Search_reranking_using_embeddings.ipynb): Use embeddings from the Gemini API to rerank search results from Wikipedia.

* [Anomaly detection with embeddings](https://github.com/google-gemini/gemini-api-cookbook/blob/main/examples/Anomaly_detection_with_embeddings.ipynb): Use embeddings from the Gemini API to detect potential outliers in your dataset.

* [Train a text classifier](https://github.com/google-gemini/gemini-api-cookbook/blob/main/examples/Classify_text_with_embeddings.ipynb): Use embeddings from the Gemini API to train a model that can classify different types of newsgroup posts based on the topic.

* Embeddings have many applications in vector databases, too. Check out this [example with Chroma DB](https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/vectordb_with_chroma/vectordb_with_chroma.ipynb).

You can learn more about embeddings in general in the [embeddings guide](https://ai.google.dev/docs/embeddings_guide) on ai.google.dev.

* You can find additional code examples with the Python SDK in the [Python quickstart](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).

* You can also find more details in the API reference for [embedContent](https://ai.google.dev/api/rest/v1/models/embedContent) and [batchEmbedContents](https://ai.google.dev/api/rest/v1/models/batchEmbedContents).