<center><a href="https://www.pieriantraining.com/" target="_blank"><img src="../PTCenteredPurple.png" alt="Pierian Training Logo" /></a></center>


# Parameters

### Customizing the Behavior of Text Models in Google Vertex AI's PaLM API

When you work with the text-bison model (and other models) through the PaLM API in Google Vertex AI, you can adjust its behavior to suit your needs by setting parameters that control how the model generates text. Let's break down what each of these parameters means and how you can use them:

#### Parameters:

1. **Temperature**: 
    - **What it Does**: Controls the "creativity" of the model's responses.
    - **How to Use It**: A higher value (closer to 1) will make the model's output more random and creative. A lower value (closer to 0) will make the output more focused and deterministic.
    - **Example**: If you set the temperature to 0.2, the model is more likely to generate safe and predictable text. On the other hand, a temperature of 0.8 would result in more diverse and unexpected outputs.

2. **Max Output Tokens**: 
    - **What it Does**: Limits the length of the generated text.
    - **How to Use It**: Set this parameter to the maximum number of tokens you want in the output. A token can be as short as one character or as long as one word.
    - **Example**: If you set `max_output_tokens` to 50, the model will generate text that is no longer than 50 tokens.

3. **Top_p**: 
    - **What it Does**: Controls the diversity of the next token based on cumulative probability.
    - **How to Use It**: A higher value (closer to 1) means the model will consider a broader range of possible next tokens. A lower value (closer to 0) narrows down the options.
    - **Example**: If you set `top_p` to 0.9, the model will consider a wider array of tokens for the next position in the sequence, making the output more varied.

4. **Top_k**: 
    - **What it Does**: Limits the number of top candidates for the next token.
    - **How to Use It**: A higher value will allow the model to sample from more possible next tokens, making the output more diverse. A lower value will restrict the model to a smaller set of highly probable next tokens.
    - **Example**: If you set `top_k` to 40, the model will choose the next token from the top 40 most likely candidates.

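By understanding and adjusting these parameters, you can tailor the behavior of the text-bison@001 model to generate text that aligns with your specific requirements. Note that this is configuration at request time, not model fine-tuning: the model's weights do not change.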
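As a rough sketch of how the four parameters travel together, the helper below bundles them into one dictionary and forwards them to `predict`. The names `DEFAULT_PARAMS` and `generate` are hypothetical, and the values are illustrative rather than recommended. `model` is assumed to be any object with a `predict` method that accepts these keyword arguments, such as the `TextGenerationModel` loaded in the next section.

```python
# Illustrative values only (an assumption, not official recommendations).
DEFAULT_PARAMS = {
    "temperature": 0.2,        # lower = more predictable
    "max_output_tokens": 256,  # upper bound on response length
    "top_p": 0.8,              # cumulative-probability cutoff
    "top_k": 40,               # number of candidate tokens
}


def generate(model, prompt, **overrides):
    """Call model.predict with the defaults above; keyword overrides win."""
    params = {**DEFAULT_PARAMS, **overrides}
    return model.predict(prompt=prompt, **params)
```

Once a model is loaded, a call such as `generate(model, "Write me a funny poem about dogs", temperature=0.8)` would change only the temperature and keep the other settings.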

---
---
## Setting Up Our Model

In [14]:
from vertexai.language_models import TextGenerationModel

In [15]:
# Choose this string based on Console Lecture
model_name = 'text-bison'
model = TextGenerationModel.from_pretrained(model_name)

----
----

## Max Output Tokens
### Understanding the max_output_tokens Parameter in Google Vertex AI's PaLM API

#### What Are Tokens?

In the context of language models like text-bison@001, a "token" is a unit of text that the model reads. A token can be as small as a single character or as long as a word. It's crucial to understand the concept of tokens because both the input and output of the model are constrained by token limits.

- **Token Size**: On average, a token is approximately four characters long.
- **Token-to-Word Conversion**: About 100 tokens usually equate to roughly 60-80 words.
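
The four-characters-per-token rule of thumb can be turned into a quick estimate. This is only a heuristic sketch: `estimate_tokens` and `CHARS_PER_TOKEN` are hypothetical names, and the real tokenizer may count differently.

```python
import math

CHARS_PER_TOKEN = 4  # rough average quoted above, not an exact tokenizer rule


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character count alone."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "Write me a funny poem about dogs"  # 32 characters
print(estimate_tokens(prompt))  # -> 8
```

An estimate like this is mainly useful for picking a sensible `max_output_tokens` value before making a request.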

#### What is the max_output_tokens Parameter?

The `max_output_tokens` parameter sets the upper limit on the number of tokens that can be generated in the model's response. This parameter is especially useful when you want to control the length of the generated text.

#### How Does max_output_tokens Affect the Generated Response?

1. **Shorter Responses**: 
    - **Lower Values**: If you set `max_output_tokens` to a lower number (e.g., 10 or 20), the model will generate shorter responses.
    - **Use-Cases**: This is useful for tasks like generating headlines, titles, or brief summaries.

2. **Longer Responses**: 
    - **Higher Values**: Setting `max_output_tokens` to a higher number (e.g., 500 or 1024) allows the model to generate longer, more detailed responses.
    - **Use-Cases**: This is beneficial for tasks like article writing, storytelling, or generating detailed explanations.

3. **Default Setting**: 
    - The default value for `max_output_tokens` is 128, which usually results in a moderately sized output suitable for a variety of tasks.

#### Additional Resources:

For a more in-depth understanding of the `max_output_tokens` parameter and other settings, you can refer to the official Google Cloud Vertex AI documentation: [Model Parameters in Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#text_model_parameters).

By understanding how the `max_output_tokens` parameter works, you can better control the length and detail of the text generated by the model, making it more suited to your specific needs.


In [10]:
max_output_tokens_val = 5

response = model.predict(
    prompt="Write me a funny poem about dogs",
    max_output_tokens=max_output_tokens_val,
)
 
print(response.text)

 There once was a dog


In [11]:
max_output_tokens_val = 10

response = model.predict(
    prompt="Write me a funny poem about dogs",
    max_output_tokens=max_output_tokens_val,
)
 
print(response.text)

 There once was a dog named Sparky,



In [12]:
max_output_tokens_val = 2000

response = model.predict(
    prompt="Write me a funny poem about dogs",
    max_output_tokens=max_output_tokens_val,
)
 
print(response.text)

 There once was a dog named Sparky,
Who was always quite playful and barky.
He loved to chase balls,
And run down the halls,
And he never failed to make me happy.

One day, Sparky was playing in the park,
When he saw a squirrel and started to bark.
He chased the squirrel up a tree,
But the squirrel was too quick, you see,
And Sparky came tumbling down with a spark.

He landed in a big pile of mud,
And he looked like a giant chocolate bud.
I couldn't help but laugh,
At my silly dog, Sparky,
And his muddy, chocolate-y face.

I helped Sparky clean up,
And we went home for a big pup-cup.
We cuddled up on the couch,
And I read Sparky a story,
And we both had a very happy day.

So if you're ever feeling down,
Just remember the story of Sparky the clown.
He's sure to put a smile on your face,
And make you feel happy in your space.


In [13]:
max_output_tokens_val = 1000

response = model.predict(
    prompt="Write me a funny poem about dogs",
    max_output_tokens=max_output_tokens_val,
)
 
print(response.text)

 There once was a dog named Sparky,
Who was always quite playful and barky.
He loved to chase balls,
And run down the halls,
And he never failed to make me happy.

One day, Sparky was playing in the park,
When he saw a squirrel and started to bark.
He chased the squirrel up a tree,
But the squirrel was too quick, you see,
And Sparky came tumbling down with a spark.

He landed in a big pile of mud,
And he looked like a giant chocolate bud.
I couldn't help but laugh,
At my silly dog, Sparky,
And his muddy, chocolate-y face.

I helped Sparky clean up,
And we went home for a big pup-cup.
We cuddled up on the couch,
And I read Sparky a story,
And we both had a very happy day.

So if you're ever feeling down,
Just remember the story of Sparky the clown.
He's sure to put a smile on your face,
And make you feel happy in your space.


---

---

## Temperature

### Understanding the Temperature Parameter in Google Vertex AI's PaLM API

#### What is the Temperature Parameter?

The temperature parameter is a crucial setting in the text-bison@001 model available through Google Vertex AI's PaLM API. It plays a role in the sampling process during text generation, particularly when the `top_p` and `top_k` parameters are also in play. Essentially, the temperature parameter controls how "random" or "creative" the model's generated text will be.

#### How Does Temperature Affect the Generated Response?

1. **Low Temperature Values (e.g., 0.0 - 0.2)**:
    - **Deterministic Output**: If you set the temperature to 0, the model will always choose the token with the highest probability, resulting in a deterministic output.
    - **Less Open-Ended**: Lower temperatures are ideal for scenarios where you need a more straightforward, predictable response.
    - **Common Phrases**: With a low temperature, the model is more likely to generate commonly used words or phrases.

2. **High Temperature Values (e.g., 0.8 - 1.0)**:
    - **Creative Output**: Higher temperatures make the model more "adventurous" in its choice of tokens, leading to more diverse and creative text.
    - **Exploratory**: You're more likely to see rare or unusual words and phrases in the output.
  
3. **Moderate Temperature Values (e.g., 0.2 - 0.7)**:
    - **Balanced Output**: These settings offer a balance between creativity and determinism, making them a good starting point for many use cases.

#### Important Considerations:

- **Risk of Nonsensical Output**: While a high temperature can make the text more interesting, it also increases the risk of generating text that is incoherent, off-topic, or inappropriate. Higher randomness can also make fabricated details (often called "hallucinations") more likely.
  
- **Starting Point**: For most applications, a good starting point is a temperature of 0.2. This offers a balance between generating text that is both coherent and slightly creative.

By understanding how the temperature parameter works and its impact on text generation, you can better tailor the model's output to meet your specific needs. Always remember to use this parameter thoughtfully, considering the context and purpose of your text generation task.
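
To make the effect concrete, here is a minimal sketch of the common way temperature is applied: scores are divided by the temperature before being converted to probabilities. The function name `apply_temperature` and the scores are made up for illustration, and this is an assumption about the general technique, not a description of the service's internals.

```python
import math


def apply_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Treat 0 as greedy: all probability goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.2, 1.0):
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])
```

As the temperature drops, more of the probability concentrates on the top-scoring token, which is why low settings produce more predictable text.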

In [32]:
## RE-RUN THIS CELL MULTIPLE TIMES, NOTICE THE OUTPUT!
temp_val = 0.0
prompt = "Continue this story: 'I was walking and then a most peculiar thing happened.'"

response = model.predict(
    prompt=prompt,
    temperature=temp_val,
    max_output_tokens=1000,
)
 
print(response.text)

 I was walking and then a most peculiar thing happened. I saw a man walking towards me, but as he got closer, I realized that he was not a man at all. He was a giant spider, with eight long legs and a hairy body. I was so shocked that I couldn't move. I just stood there, frozen in fear, as the spider got closer and closer.

Finally, the spider stopped in front of me. It raised its front legs and stared at me with its eight red eyes. I couldn't believe what I was seeing. I was about to be eaten by a giant spider!

But then, the spider did something unexpected. It turned and walked away. I was so relieved that I almost cried. I didn't know what had just happened, but I wasn't going to question it. I just turned and ran away as fast as I could.

I didn't stop running until I was safe at home. I told my family what had happened, and they were just as shocked as I was. We all agreed that it was a very strange experience, and we're still not sure what to make of it.

But one thing is for sur

In [33]:
## RE-RUN THIS CELL MULTIPLE TIMES, NOTICE THE OUTPUT!
temp_val = 1.0
prompt = "Continue this story: 'I was walking and then a most peculiar thing happened.'"

response = model.predict(
    prompt=prompt,
    temperature=temp_val,
    max_output_tokens=1000,
)
 
print(response.text)

 I was walking home from school one day when I saw a most peculiar thing. A large, white rabbit wearing a waistcoat and a pocket watch was running very fast. The rabbit looked like it was in a hurry, so I decided to follow it. 
The rabbit led me down a hole, and I fell and fell and fell. When I landed, I found myself in a strange and wonderful world. There were talking animals, magical creatures, and all sorts of other strange and wonderful things. I had many adventures in that world, and I learned a lot about myself along the way. 
One day, I decided it was time to go home. I said goodbye to my new friends, and I stepped back through the hole. When I landed, I was back in my own backyard. I looked up and saw the white rabbit running away. I smiled, knowing that I would never forget my adventures in Wonderland.


## Top-K

### Understanding the top_k Parameter in Google Vertex AI's PaLM API

#### What is the top_k Parameter?

The `top_k` parameter controls the number of most probable tokens that the model considers when generating the next token in a sequence. This is known as "top-k sampling."

- **Greedy Decoding**: If `top_k` is set to 1, the model will always choose the most probable token, a method also known as greedy decoding.
- **Top-k Sampling**: If `top_k` is set to a number greater than 1 (e.g., 3, 10, or 40), the model will consider that many of the most probable tokens for the next position in the generated text.

#### How Does It Work in Conjunction with Other Parameters?

The `top_k` parameter often works in tandem with other parameters like `top_p` and `temperature`:

1. **Step 1**: The model first identifies the `top_k` tokens with the highest probabilities.
2. **Step 2**: These tokens are then filtered based on the `top_p` value, which sets a cumulative probability cutoff.
3. **Step 3**: Finally, one of these filtered tokens is selected based on the `temperature` setting, which controls the randomness of the choice.
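
The three steps above can be sketched in plain Python. This is a toy model of the described order (top-k, then top-p, then temperature), not the service's actual implementation. The function name `sample_next_token` and the example probabilities are hypothetical, and `top_k` is assumed to be at least 1.

```python
import random


def sample_next_token(probs, top_k, top_p, temperature, rng=random):
    """probs maps token -> probability. Returns one sampled token."""
    # Step 1: keep the top_k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Step 2: keep tokens until their cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p - 1e-9:  # small tolerance for float rounding
            break

    # Step 3: choose among the survivors, with temperature controlling randomness.
    if temperature == 0:
        return kept[0][0]  # greedy: most probable surviving token
    # p ** (1 / T) is equivalent to dividing the log-probabilities by T.
    weights = [p ** (1.0 / temperature) for _, p in kept]
    return rng.choices([t for t, _ in kept], weights=weights, k=1)[0]


probs = {"the": 0.5, "a": 0.3, "dog": 0.15, "zebra": 0.05}  # made-up values
print(sample_next_token(probs, top_k=1, top_p=1.0, temperature=1.0))  # always "the"
```

Note that with `top_k=1` only one candidate survives step 1, so the result is the same regardless of `top_p` or `temperature`.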

#### How Does top_k Affect the Generated Response?

1. **Lower top_k Values (e.g., 1 - 10)**:
    - **Less Random**: The model will generate more predictable and focused text because it's limited to a smaller set of highly probable tokens.
    - **Use-Cases**: This is useful for tasks that require specific and accurate responses, such as FAQ generation or summarization.

2. **Higher top_k Values (e.g., 20 - 40)**:
    - **More Random**: The model has a broader range of tokens to choose from, making the output more diverse and creative.
    - **Use-Cases**: This is beneficial for creative writing, brainstorming, or any task where a diverse range of ideas is desired.

3. **Default Setting**:
    - The default value for `top_k` is 40, which provides a good balance between predictability and creativity for most general-purpose tasks.

By understanding how the `top_k` parameter works, you can better control the randomness and focus of the text generated by the model, tailoring it to your specific needs.

In [None]:
prompt = "Plan a trip to Rome"
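# With top_k=1 only the single most probable token is a candidate,
# so decoding is greedy even though temperature is 1.0.
model.predict(prompt=prompt, max_output_tokens=1024, top_k=1, temperature=1.0)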

## Top-P
### Understanding the top_p Parameter in Google Vertex AI's PaLM API

#### What is the top_p Parameter?

The `top_p` parameter, also known as "nucleus sampling," is a setting that controls the diversity of the tokens selected by the model during text generation. It adjusts the probability distribution of the next token based on a cumulative probability cutoff.

- **Cumulative Probability**: This is the sum of probabilities of individual tokens, sorted in descending order.
- **Cutoff Probability**: The `top_p` value acts as a cutoff. Tokens are added from most probable to least probable until their cumulative probability reaches this value; the remaining, less probable tokens are excluded from consideration for the next position in the generated text.

#### How Does It Work? An Example:

Let's say the three most probable next tokens are A, B, and C, with probabilities of 0.3, 0.2, and 0.1, respectively (the remaining probability is spread across less likely tokens). If `top_p` is set to 0.5, then:

- The cumulative probability for A and B is 0.3 + 0.2 = 0.5, which reaches the `top_p` value.
- Token C, with a probability of 0.1, is excluded because the cutoff has already been reached and adding it would exceed the `top_p` value.
- The model will then choose between A and B for the next token, based on other parameters like `temperature`.
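
The same worked example can be checked with a few lines of Python. This is a sketch of the cutoff rule as described here; `nucleus_filter` is a hypothetical helper name, not part of the Vertex AI SDK.

```python
def nucleus_filter(probs, top_p):
    """Return the tokens kept by a top_p cutoff, most probable first."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p - 1e-9:  # small tolerance for float rounding
            break
    return kept


example = {"A": 0.3, "B": 0.2, "C": 0.1}
print(nucleus_filter(example, top_p=0.5))  # -> ['A', 'B']
print(nucleus_filter(example, top_p=0.3))  # -> ['A']
print(nucleus_filter(example, top_p=0.6))  # -> ['A', 'B', 'C']
```

Raising `top_p` lets more low-probability tokens into the candidate pool, while lowering it shrinks the pool toward the single most probable token.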

#### How Does top_p Affect the Generated Response?

1. **Higher top_p Values (e.g., 0.8 - 1.0)**:
    - **Diverse Outputs**: The model can sample from a larger set of tokens, making the output more varied and interesting.
    - **Use-Cases**: This is useful when you want creative or exploratory text, like brainstorming sessions or storytelling.

2. **Lower top_p Values (e.g., 0.2 - 0.5)**:
    - **Predictable Outputs**: The model is constrained to a smaller set of highly probable tokens, resulting in more focused and predictable text.
    - **Use-Cases**: This is beneficial for tasks that require precise and accurate information, like summarizing a technical document.

3. **Default Setting**:
    - The default value for `top_p` is 0.95, which offers a good balance between diversity and predictability for most general-purpose tasks.

By understanding the `top_p` parameter, you can fine-tune the diversity and predictability of the text generated by the model, making it more aligned with your specific requirements.

In [34]:
prompt = "Tell me a crazy story: " 
model.predict(prompt=prompt, max_output_tokens=1024, top_k=40, top_p=1.0, temperature=1.0)

[top_p = 0.0]


## Stop Sequences

In [None]:
model.predict(prompt="Give me a numbered list of vegetables",
             stop_sequences=['4.'])
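
The expected effect of the cell above can be imitated locally. The sketch below assumes that generation is cut at the first occurrence of any stop sequence and that the stop string itself is dropped from the response. `truncate_at_stop` is a hypothetical helper and the vegetable list is made up, not real model output.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Made-up text standing in for what a model might generate.
full = "1. Carrot\n2. Broccoli\n3. Spinach\n4. Kale\n5. Pea"
print(truncate_at_stop(full, ["4."]))
```

Under these assumptions, a stop sequence of `'4.'` leaves only the first three list items in the response.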