# Google Gemini Notes

## Models

* __gemini-pro__: Optimized for high intelligence tasks, the most powerful Gemini model
* __gemini-flash__: Optimized for multi-modal use-cases, where speed and cost are important
* __text-embedding__: Generates text embeddings.
* __aqa__: Performs Attributed Question-Answering (AQA) tasks over a document, corpus, or set of passages. The AQA model returns answers grounded in the provided sources, along with an estimate of the probability that the question is answerable.

[Ref](https://ai.google.dev/gemini-api/docs/models/gemini)

## Generating Text

Use a Gemini model such as `gemini-1.5-flash`:

```python
import os

import google.generativeai as genai

# Assumes the API key is stored in the GOOGLE_API_KEY environment variable
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel('gemini-1.5-flash')

# generate_content handles various use cases, including multimodal input
response = model.generate_content("What is the meaning of life?")

# Responses are given in response.text
# to_markdown is a notebook helper (not part of the SDK) that renders the output as Markdown
to_markdown(response.text)

# You can use response.prompt_feedback to understand why there was no response (e.g. there may be safety concerns)
response.prompt_feedback

# You can view multiple possible responses with response.candidates
response.candidates

# Responses can also be streamed, instead of waiting for the whole response to be generated at once.
```
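The streaming mentioned in the last comment could look roughly like the sketch below. `stream_text` is a hypothetical helper, not part of the SDK. It assumes that `generate_content(..., stream=True)` returns an iterable of chunks, each exposing a `.text` attribute.

```python
def stream_text(model, prompt):
    """Print chunks as they arrive and return the full text.

    `model` is assumed to be a genai.GenerativeModel, or any object whose
    generate_content(prompt, stream=True) yields chunks with a .text attribute.
    """
    pieces = []
    for chunk in model.generate_content(prompt, stream=True):
        print(chunk.text, end="", flush=True)  # show partial output immediately
        pieces.append(chunk.text)
    print()  # finish the line once the stream ends
    return "".join(pieces)
```

With a configured model, this would be called as `stream_text(model, "What is the meaning of life?")`.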

## Generating Text From Images and Text Inputs

```python

import PIL.Image

img = PIL.Image.open('image.jpg')


model = genai.GenerativeModel('gemini-1.5-flash')

response = model.generate_content(img)

to_markdown(response.text)

```


> This image shows two glass containers filled with prepared food...

You can also pass in a list of strings and images:

```python
response = model.generate_content(["Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping.", img], stream=True)

response.resolve()

to_markdown(response.text)
```

> Meal prepping is a great way to save time and money, and it can also help you to eat healthier. 

## Chat Conversations

You can use the `ChatSession` class to manage conversation state.

```python
model = genai.GenerativeModel('gemini-1.5-flash')
chat = model.start_chat(history=[])
chat
```

```
ChatSession(
    model=genai.GenerativeModel(
        model_name='models/gemini-1.5-flash',
        generation_config={},
        safety_settings={},
        tools=None,
        system_instruction=None,
        cached_content=None
    ),
    history=[]
)
```

History can then be stored and retrieved.

```python
response = chat.send_message("In one sentence, explain how a computer works to a young child.")
to_markdown(response.text)

chat.history
```

```
[parts {
   text: "In one sentence, explain how a computer works to a young child."
 }
 role: "user",
 parts {
   text: "A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!"
 }
 role: "model"]
```

You can iterate over the history like this:

```python
for message in chat.history:
  display(to_markdown(f'**{message.role}**: {message.parts[0].text}'))
```

```


    user: In one sentence, explain how a computer works to a young child.

    model: A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!

...
```

## Counting Tokens

Large language models have a context window, and the context length is often measured in terms of the number of tokens.

```python
model.count_tokens("What is the meaning of life?")
```

```
> total_tokens: 7
```

A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words.
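The 4-characters-per-token rule of thumb can be turned into a quick offline estimate. This is only a sketch: `estimate_tokens` is a hypothetical helper, and the real tokenizer can differ, so use `model.count_tokens` when the exact number matters.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the notes, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "What is the meaning of life?"
print(estimate_tokens(prompt))  # 28 characters / 4 -> 7
```

For this prompt the estimate happens to match the `total_tokens: 7` reported by the API, but that agreement is not guaranteed for other inputs.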

## Using Embeddings

An embedding represents text as a vector (a list of floats) so that texts can be compared numerically. Texts with similar subject matter or sentiment should have similar embeddings, as measured by, for example, cosine similarity.

```python
result = genai.embed_content(
    model="models/text-embedding-004",
    content=[
      'What is the meaning of life?',
      'How much wood would a woodchuck chuck?',
      'How does the brain work?'],
    task_type="retrieval_document",
    title="Embedding of list of strings")

# A list of inputs produces a list of vectors
for v in result['embedding']:
  print(str(v)[:50], '... TRIMMED ...')
```

```
> [0.0040260437, 0.004124458, -0.014209415, -0.00183 ... TRIMMED ...
```
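To compare two embeddings, cosine similarity can be computed directly from the vectors. The sketch below uses made-up 3-dimensional vectors in place of real embeddings (which have many more dimensions), so the numbers are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (values are invented)
life = [0.9, 0.1, 0.0]
brain = [0.8, 0.2, 0.1]
woodchuck = [0.0, 0.1, 0.9]

# Vectors pointing in similar directions score closer to 1
print(cosine_similarity(life, brain) > cosine_similarity(life, woodchuck))  # True
```

With the real output, the same function would be applied to entries of `result['embedding']`, for example `cosine_similarity(result['embedding'][0], result['embedding'][2])`.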

Depending on what the text is being used for, you can set different task types:

Task Type | Description
---       | ---
RETRIEVAL_QUERY	| Specifies the given text is a query in a search/retrieval setting.
RETRIEVAL_DOCUMENT | Specifies the given text is a document in a search/retrieval setting. This task type accepts an optional `title`.
SEMANTIC_SIMILARITY	| Specifies the given text will be used for Semantic Textual Similarity (STS).
CLASSIFICATION	| Specifies that the embeddings will be used for classification.
CLUSTERING	| Specifies that the embeddings will be used for clustering.

## Safety Settings

You can adjust safety settings to control how aggressively potentially harmful prompts and responses are blocked.
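A minimal sketch of what this might look like follows. The category and threshold strings are assumed to match the Gemini API's harm category and block threshold names, and `build_safety_settings` and `generate_with_safety` are hypothetical helpers. Check the current documentation before relying on any of these names.

```python
# Category names are assumed to match the Gemini API's HarmCategory enum;
# verify them against the current docs.
HARM_CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]


def build_safety_settings(threshold="BLOCK_MEDIUM_AND_ABOVE"):
    """Apply one blocking threshold to every category (hypothetical helper)."""
    return {category: threshold for category in HARM_CATEGORIES}


def generate_with_safety(prompt, threshold="BLOCK_LOW_AND_ABOVE"):
    """Sketch of passing the settings to the SDK; needs a configured API key."""
    # Imported lazily so build_safety_settings can run without the SDK installed
    import google.generativeai as genai

    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(
        prompt, safety_settings=build_safety_settings(threshold)
    )
```

If a prompt is blocked, `response.prompt_feedback` is the place to look for the reason.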

## Customizable Parameters


Parameter | Description
---       | ---
Top p (probability) | Nucleus sampling threshold. The model samples the next token only from the smallest set of most likely tokens whose cumulative probability reaches p. Higher Top p (closer to 1): More tokens are eligible, leading to more creative and surprising but potentially less relevant outputs. Lower Top p (closer to 0): Only the most likely continuations are eligible, resulting in more predictable and relevant but potentially less creative outputs.
Top k (number) | Limits the number of possible continuations considered by the model when generating the next word. It acts as a filter, reducing the search space for the most likely next word. Higher Top k: The model considers a wider range of possibilities, potentially leading to more diverse and interesting outputs. Lower Top k: The model focuses on a smaller set of highly likely continuations, resulting in more consistent and focused outputs.
Temperature | Like Top p, controls the randomness of the generated text, but works by dividing the logits (the unnormalized scores of the candidate tokens) by the temperature before they are converted to probabilities. Higher Temperature (greater than 1): Flattens the distribution, making the model more likely to choose less probable but potentially more creative continuations. Lower Temperature (between 0 and 1): Sharpens the distribution, favoring the most likely continuations and leading to more predictable outputs. Temperature of 1: Leaves the original probability distribution unchanged.
Stop Sequence (string) | This parameter specifies a string or sequence of characters that signals the end of the text generation. Once the model encounters this sequence, it will stop generating further text. This is useful for controlling the length and focus of the generated content. For example, you might set the stop sequence to a specific punctuation mark (".", "?", "!") to indicate the end of a sentence or paragraph.
Max Output Length (number) | Sets a hard limit on the number of tokens (words or subwords) the model can generate. This helps prevent overly long or rambling outputs and is useful when the generated text must be concise. Because the limit is counted in tokens, it only approximates a word count.
Number of Response Candidates (number) | Determines how many alternative responses (candidates) the model returns for a single prompt. The candidates are available through `response.candidates`.
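To make the interaction between Temperature, Top k and Top p concrete, here is a toy sampler over a hand-written set of logits. It is a sketch of the general technique only: the function name, the example logits, and the order in which the filters are applied are assumptions, not a description of Gemini's actual decoder.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick one token from a {token: logit} dict using temperature, top-k and top-p."""
    # Temperature: divide logits before the softmax (must be > 0 in this sketch)
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda item: item[1],
        reverse=True,
    )

    # Top k: keep only the k most likely tokens
    if top_k is not None:
        ranked = ranked[:top_k]

    # Top p: keep the smallest prefix whose cumulative probability reaches p
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    tokens = [tok for tok, _ in ranked]
    weights = [prob for _, prob in ranked]  # choices() renormalizes the weights
    return rng.choices(tokens, weights=weights, k=1)[0]


logits = {"the": 2.0, "a": 1.0, "banana": -1.0}  # invented scores
print(sample_next_token(logits, temperature=0.7, top_k=2, top_p=0.9))
```

In the SDK, these parameters are typically supplied through `generation_config`, with fields such as `temperature`, `top_p`, `top_k`, `max_output_tokens`, `stop_sequences` and `candidate_count`. Treat the exact field names as something to confirm against the current API reference.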