# Gemini API: Embeddings Quickstart

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Embeddings.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

The Gemini API generates state-of-the-art text embeddings. An embedding is a list of floating-point numbers that represents the meaning of a word, sentence, or paragraph. You can use embeddings in many downstream applications, such as document search.

This notebook provides quick code examples that show you how to get started generating embeddings.

In [1]:
!pip install -q -U google-generativeai

In [2]:
import google.generativeai as genai

## Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [3]:
from google.colab import userdata
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
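
The cell above only works inside Colab, because `google.colab` is not importable elsewhere. A minimal sketch of a fallback, assuming that outside Colab you have exported the key as an environment variable with the same name. The helper name `load_api_key` is hypothetical and not part of the SDK.

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the key from a Colab Secret, or from the environment outside Colab."""
    try:
        # This import only succeeds inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        # Assumption: outside Colab the key lives in an environment variable.
        key = os.environ.get(name)
        if not key:
            raise RuntimeError(f"Set the {name} environment variable first.")
        return key
    return userdata.get(name)
```

You could then call `genai.configure(api_key=load_api_key())` in either environment.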

## Embed content

Call the `embed_content` method with the `models/text-embedding-004` model to generate text embeddings.

In [4]:
text = "Hello world"
result = genai.embed_content(model="models/text-embedding-004", content=text)

# Print just a part of the embedding to keep the output manageable
print(str(result['embedding'])[:50], '... TRIMMED]')

[0.013168523, -0.008711934, -0.046782676, 0.000699 ... TRIMMED]


In [5]:
print(len(result['embedding'])) # The embeddings have 768 dimensions

768


## Batch embed content

You can embed a list of multiple prompts with one API call for efficiency.

In [6]:
result = genai.embed_content(
    model="models/text-embedding-004",
    content=[
      'What is the meaning of life?',
      'How much wood would a woodchuck chuck?',
      'How does the brain work?'])

for embedding in result['embedding']:
  print(str(embedding)[:50], '... TRIMMED]')

[-0.010632277, 0.019375855, 0.0209652, 0.000770642 ... TRIMMED]
[0.018467998, 0.0054281196, -0.017658804, 0.013859 ... TRIMMED]
[0.05808907, 0.020941721, -0.108728774, -0.0403925 ... TRIMMED]
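
A common next step with a batch of embeddings is to compare them, for example to rank documents against a query. The sketch below uses cosine similarity in plain Python. The helper names `cosine_similarity` and `rank_by_similarity` are illustrative, and the toy 3-dimensional vectors stand in for real 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors: the second document points almost the same way as the query.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
print(rank_by_similarity(query, docs))
```

With real data, the list in `result['embedding']` from a batch call can be passed as `doc_vecs`.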


## Truncating embeddings

The `text-embedding-004` model also supports lower embedding dimensions. Specify `output_dimensionality` to truncate the output.

In [7]:
# Not truncated
result1 = genai.embed_content(
    model="models/text-embedding-004",
    content="Hello world")


# Truncated
result2 = genai.embed_content(
    model="models/text-embedding-004",
    content="Hello world",
    output_dimensionality=10)


(len(result1['embedding']), len(result2['embedding']))

(768, 10)
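
To make truncation concrete, here is a local sketch that keeps the first `k` values of a vector and rescales the result to unit length. The helper name `truncate_and_normalize` is hypothetical. This notebook does not say whether the API re-normalizes truncated embeddings, so treat the rescaling step as an assumption that may help when you compare shortened vectors by dot product.

```python
import math


def truncate_and_normalize(vec, k):
    """Keep the first k values of vec and rescale them to unit length."""
    if not 0 < k <= len(vec):
        raise ValueError("k must be between 1 and len(vec)")
    head = vec[:k]  # drop values from the end
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be rescaled
    return [x / norm for x in head]


# Toy example: a 3-dimensional vector shortened to 2 dimensions.
full = [3.0, 4.0, 12.0]
short = truncate_and_normalize(full, 2)
print(short)
```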

## Specify `task_type`

Let's look at all the parameters the `embed_content` method takes. There are five:

* `model`: Required. Must be `models/text-embedding-004` or `models/embedding-001`.
* `content`: Required. The content that you would like to embed.
* `task_type`: Optional. The task type for which the embeddings will be used.
* `title`: Optional. You should only set this parameter if your task type is `retrieval_document` (or `document`).
* `output_dimensionality`: Optional. Reduced dimension for the output embedding. If set, excess values in the output embedding are truncated from the end. This is supported by `models/text-embedding-004`, but cannot be specified for `models/embedding-001`.

`task_type` is an optional parameter that provides a hint to the API about how you intend to use the embeddings in your application.

The following `task_type` values are accepted:

* `unspecified`: If you do not set the value, it will default to `retrieval_query`.
* `retrieval_query` (or `query`): The given text is a query in a search/retrieval setting.
* `retrieval_document` (or `document`): The given text is a document from a corpus being searched. Optionally, also set the `title` parameter with the title of the document.
* `semantic_similarity` (or `similarity`): The given text will be used for Semantic Textual Similarity (STS).
* `classification`: The given text will be classified.
* `clustering`: The embeddings will be used for clustering.
* `question_answering`: The given text will be used for question answering.
* `fact_verification`: The given text will be used for fact verification.
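
The rules above can be sketched as a small helper that assembles keyword arguments before a call. `build_embed_kwargs` is a hypothetical name, not part of the SDK, and the accepted values simply mirror the list above. The SDK may accept other spellings that this sketch would reject.

```python
# Task types named in this notebook, including the short aliases.
TASK_TYPES = {
    "unspecified", "retrieval_query", "query", "retrieval_document",
    "document", "semantic_similarity", "similarity", "classification",
    "clustering", "question_answering", "fact_verification",
}
# Task types for which setting `title` makes sense.
TITLE_TASK_TYPES = {"retrieval_document", "document"}


def build_embed_kwargs(content, task_type=None, title=None,
                       model="models/text-embedding-004"):
    """Assemble and validate keyword arguments for an embedding call."""
    kwargs = {"model": model, "content": content}
    if task_type is not None:
        if task_type not in TASK_TYPES:
            raise ValueError(f"unknown task_type: {task_type!r}")
        kwargs["task_type"] = task_type
    if title is not None:
        if task_type not in TITLE_TASK_TYPES:
            raise ValueError("title requires task_type retrieval_document")
        kwargs["title"] = title
    return kwargs
```

The result can be unpacked into the call, as in `genai.embed_content(**build_embed_kwargs("Hello world", task_type="document", title="Greeting"))`.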

In [8]:
# Notice the API returns different embeddings depending on `task_type`
result1 = genai.embed_content(
    model="models/text-embedding-004",
    content="Hello world")

result2 = genai.embed_content(
    model="models/text-embedding-004",
    content="Hello world",
    task_type="document")

print(str(result1['embedding']))
print(str(result2['embedding']))

[0.013168523, -0.008711934, -0.046782676, 0.00069968984, -0.009518873, -0.008720178, 0.060103577, 0.024755744, 0.026053512, 0.054356422, -0.037933834, -0.0014235444, 0.030605134, -0.015512644, -0.012904961, -0.028807389, -0.007819577, 0.012152762, -0.11399522, 0.010654234, 0.0053652385, -0.0011788871, -0.029781109, -0.060107403, -0.015272878, -0.0036046256, 0.0061476836, 0.031175775, 0.021421988, 0.037104346, -0.03720273, 0.04614693, 0.002196373, -0.031793043, 0.0096602505, 0.012500472, -0.0509635, 0.0211728, 0.01433289, -0.057802223, -0.027034512, 0.03680537, 0.0016361827, 0.008520898, 0.043315884, -0.032519083, 0.018076206, -0.0031592483, 0.0045996457, -0.006337254, 0.04721373, 0.0019672965, -0.096703835, 0.03913275, -0.009261215, 0.00052188913, -0.034771822, -0.061101012, 0.11129944, -0.026392855, -0.033570185, -0.046336856, 0.048343632, 0.0137551045, -0.04588907, -0.032731514, -0.00030687434, -0.001844734, -0.040224575, 0.015202555, -0.062102083, 0.016816532, -0.004103314, 0.006833

## Learning more

Check out these examples in the Cookbook to learn more about what you can do with embeddings:

* [Search Reranking](https://github.com/google-gemini/cookbook/blob/main/examples/Search_reranking_using_embeddings.ipynb): Use embeddings from the Gemini API to rerank search results from Wikipedia.

* [Anomaly detection with embeddings](https://github.com/google-gemini/cookbook/blob/main/examples/Anomaly_detection_with_embeddings.ipynb): Use embeddings from the Gemini API to detect potential outliers in your dataset.

* [Train a text classifier](https://github.com/google-gemini/cookbook/blob/main/examples/Classify_text_with_embeddings.ipynb): Use embeddings from the Gemini API to train a model that can classify different types of newsgroup posts based on the topic.

* Embeddings have many applications in Vector Databases, too. Check out this [example with Chroma DB](https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/vectordb_with_chroma/vectordb_with_chroma.ipynb).

You can learn more about embeddings in general on ai.google.dev in the [embeddings guide](https://ai.google.dev/docs/embeddings_guide).

* You can find additional code examples with the Python SDK [here](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).

* You can also find more details in the API Reference for [embedContent](https://ai.google.dev/api/rest/v1/models/embedContent) and [batchEmbedContents](https://ai.google.dev/api/rest/v1/models/batchEmbedContents).