## <b><font color='darkblue'>Gemini API: Embeddings Quickstart</font></b>
([source](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Embeddings.ipynb)) <b><font size='3ptx'>The Gemini API generates state-of-the-art text embeddings.</font></b>

<b>An embedding is a list of floating-point numbers that represents the meaning of a word, sentence, or paragraph.</b> You can use embeddings in many downstream applications, such as document search.

<b>This notebook provides quick code examples that show you how to get started generating embeddings.</b>

In [1]:
!pip freeze | grep 'google-generativeai'

google-generativeai==0.7.2


In [2]:
# config_genai is a local helper module, not part of the SDK; presumably it sets the API key.
import config_genai
import google.generativeai as genai

### <b><font color='darkgreen'>Embed content</font></b>
Call the `embed_content` method with the `models/text-embedding-004` model to generate text embeddings.

In [3]:
text = "Hello world"
result = genai.embed_content(model="models/text-embedding-004", content=text)

In [4]:
# Print just a part of the embedding to keep the output manageable
print(str(result['embedding'])[:50], '... TRIMMED]')

[0.013168523, -0.008711934, -0.046782676, 0.000699 ... TRIMMED]


In [5]:
print(len(result['embedding'])) # The embeddings have 768 dimensions

768


### <b><font color='darkgreen'>Batch embed content</font></b>
For efficiency, you can embed a list of prompts with a single API call.

In [6]:
result = genai.embed_content(
    model="models/text-embedding-004",
    content=[
      'What is the meaning of life?',
      'How much wood would a woodchuck chuck?',
      'How does the brain work?'])

In [7]:
for embedding in result['embedding']:
  print(str(embedding)[:50], '... TRIMMED]')

[-0.010632277, 0.019375855, 0.0209652, 0.000770642 ... TRIMMED]
[0.018467998, 0.0054281196, -0.017658804, 0.013859 ... TRIMMED]
[0.05808907, 0.020941721, -0.108728774, -0.0403925 ... TRIMMED]
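Once a batch is embedded, a common next step for the document search use case mentioned earlier is to rank the texts against a query vector by cosine similarity. The following is a minimal sketch rather than part of the SDK: `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, the 3-dimensional vectors are made-up stand-ins for real 768-dimensional output, and it assumes the embeddings come back in the same order as the input list.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, texts, vectors):
    """Return (text, score) pairs sorted from most to least similar."""
    scored = [(t, cosine_similarity(query_vec, v)) for t, v in zip(texts, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins (NOT real API output); real vectors would come from
# result['embedding'] and have 768 dimensions each.
texts = ['What is the meaning of life?',
         'How much wood would a woodchuck chuck?',
         'How does the brain work?']
toy_vectors = [[0.9, 0.1, 0.0],
               [0.0, 1.0, 0.0],
               [0.1, 0.0, 0.9]]
toy_query = [0.2, 0.0, 1.0]

ranking = rank_by_similarity(toy_query, texts, toy_vectors)
for text, score in ranking:
    print(f'{score:.3f}  {text}')
```

With real data, `toy_vectors` would be replaced by `result['embedding']` and `toy_query` by the embedding of the search query.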


### <b><font color='darkgreen'>Truncating embeddings</font></b>
The `text-embedding-004` model also supports lower embedding dimensions. Specify `output_dimensionality` to truncate the output.

In [8]:
# Not truncated
result1 = genai.embed_content(
    model="models/text-embedding-004",
    content="Hello world")


# Truncated
result2 = genai.embed_content(
    model="models/text-embedding-004",
    content="Hello world",
    output_dimensionality=10)

In [9]:
(len(result1['embedding']), len(result2['embedding']))

(768, 10)
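Because this notebook describes truncation as dropping values from the end, the local equivalent on an already-stored vector would be a plain slice. The sketch below illustrates that on a toy vector. `truncate_embedding` and `l2_normalize` are hypothetical helper names, and the re-normalization step is an assumption on my part: a shortened vector is generally no longer unit length, so scaling it back is a common precaution before comparing vectors, but the notebook does not say whether the API does this. Whether a locally sliced vector matches what the API returns for the same `output_dimensionality` is also not verified here.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` values, dropping the rest from the end."""
    if not 0 < dims <= len(vec):
        raise ValueError('dims must be between 1 and len(vec)')
    return vec[:dims]

def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy vector (NOT real API output).
full = [3.0, 4.0, 12.0, 84.0]
short = truncate_embedding(full, 2)   # keeps [3.0, 4.0]
unit = l2_normalize(short)            # rescaled to length 1

print(short)
print(unit)
```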

### <b><font color='darkgreen'>Specify `task_type`</font></b>
Let's look at all the parameters the `embed_content` method takes. There are five:
* **model**: Required. Must be `models/text-embedding-004` or `models/embedding-001`.
* **content**: Required. The content that you would like to embed.
* **task_type**: Optional. The task type for which the embeddings will be used.
* **title**: Optional. You should only set this parameter if your task type is `retrieval_document` (or `document`).
* **output_dimensionality**: Optional. Reduced dimension for the output embedding. If set, excess values in the output embedding are truncated from the end. This is supported by `models/text-embedding-004`, but cannot be specified for `models/embedding-001`.

`task_type` is an optional parameter that gives the API a hint about how you intend to use the embeddings in your application. The following `task_type` values are accepted:
* **unspecified**: If you do not set the value, it will default to `retrieval_query`.
* **retrieval_query** (or **query**): The given text is a query in a search/retrieval setting.
* **retrieval_document** (or **document**): The given text is a document from a corpus being searched. Optionally, also set the title parameter with the title of the document.
* **semantic_similarity** (or **similarity**): The given text will be used for Semantic Textual Similarity (STS).
* **classification**: The given text will be classified.
* **clustering**: The embeddings will be used for clustering.
* **question_answering**: The given text will be used for question answering.
* **fact_verification**: The given text will be used for fact verification.
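The rules in the two lists above can be checked before a request is sent. The following is a minimal sketch, not part of the SDK: `build_embed_kwargs`, `TASK_TYPE_ALIASES`, and `TASK_TYPES` are hypothetical names, and the checks encode only what this notebook states (the short aliases, `title` being limited to `retrieval_document`, and `output_dimensionality` not being available for `models/embedding-001`).

```python
# Short aliases from the list above, mapped to their full task type names.
TASK_TYPE_ALIASES = {
    'query': 'retrieval_query',
    'document': 'retrieval_document',
    'similarity': 'semantic_similarity',
}
TASK_TYPES = {
    'unspecified', 'retrieval_query', 'retrieval_document',
    'semantic_similarity', 'classification', 'clustering',
    'question_answering', 'fact_verification',
}

def build_embed_kwargs(model, content, task_type=None, title=None,
                       output_dimensionality=None):
    """Hypothetical helper: validate arguments and return a kwargs dict."""
    kwargs = {'model': model, 'content': content}
    if task_type is not None:
        name = TASK_TYPE_ALIASES.get(task_type.lower(), task_type.lower())
        if name not in TASK_TYPES:
            raise ValueError(f'unknown task_type: {task_type!r}')
        kwargs['task_type'] = name
    if title is not None:
        # The notebook says title should only accompany retrieval_document.
        if kwargs.get('task_type') != 'retrieval_document':
            raise ValueError('title requires task_type retrieval_document')
        kwargs['title'] = title
    if output_dimensionality is not None:
        if model == 'models/embedding-001':
            raise ValueError(
                'models/embedding-001 does not support output_dimensionality')
        kwargs['output_dimensionality'] = output_dimensionality
    return kwargs

kwargs = build_embed_kwargs(
    model='models/text-embedding-004',
    content='Hello world',
    task_type='document',
    title='Greeting')
print(kwargs)
```

The resulting dictionary uses the same parameter names as `embed_content`, so it could be passed on as `genai.embed_content(**kwargs)`.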

In [10]:
# Notice the API returns different embeddings depending on `task_type`
result1 = genai.embed_content(
    model="models/text-embedding-004",
    content="Hello world")

result2 = genai.embed_content(
    model="models/text-embedding-004",
    content="Hello world",
    task_type="document")

In [11]:
print(str(result1['embedding'])[:50], '... TRIMMED]')
print(str(result2['embedding'])[:50], '... TRIMMED]')

[0.013168523, -0.008711934, -0.046782676, 0.000699 ... TRIMMED]
[0.023399517, -0.00854715, -0.052534223, -0.012143 ... TRIMMED]
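To make the difference concrete, the sketch below subtracts the leading values printed above, element by element. It uses only the first three of 768 dimensions (the fourth printed value is cut off by the trimming), so it illustrates that the vectors differ and is not a meaningful similarity measure. The variable names are illustrative.

```python
# First three values of each embedding, copied from the printed output above.
default_head = [0.013168523, -0.008711934, -0.046782676]   # no task_type set
document_head = [0.023399517, -0.00854715, -0.052534223]   # task_type="document"

# Element-wise difference between the two leading segments.
diffs = [d - u for u, d in zip(default_head, document_head)]
largest = max(abs(x) for x in diffs)

print([round(x, 6) for x in diffs])
print(round(largest, 6))
```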


## <b><font color='darkblue'>Supplement</font></b>
Check out these examples in the Cookbook to learn more about what you can do with embeddings:
* [**Search Reranking**: Use embeddings from the Gemini API to rerank search results from Wikipedia.](https://github.com/google-gemini/cookbook/blob/main/examples/Search_reranking_using_embeddings.ipynb)
* [**Anomaly detection with embeddings**: Use embeddings from the Gemini API to detect potential outliers in your dataset.](https://github.com/google-gemini/cookbook/blob/main/examples/Anomaly_detection_with_embeddings.ipynb)
* [**Train a text classifier**: Use embeddings from the Gemini API to train a model that can classify different types of newsgroup posts based on the topic.](https://github.com/google-gemini/cookbook/blob/main/examples/Classify_text_with_embeddings.ipynb)
* [Embeddings have many applications in Vector Databases, too. Check out this example with Chroma DB.](https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/vectordb_with_chroma/vectordb_with_chroma.ipynb)
* [You can learn more about embeddings in general on ai.google.dev in the embeddings guide.](https://ai.google.dev/docs/embeddings_guide)
* You can find additional code examples with the Python SDK [here](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).
* You can also find more details in the API Reference for [embedContent](https://ai.google.dev/api/rest/v1/models/embedContent) and [batchEmbedContents](https://ai.google.dev/api/rest/v1/models/batchEmbedContents).