# Setup

In [1]:
! pip install -qqU transformers

This line installs or updates the "transformers" Python package using pip, Python's package manager. The command has several special flags that modify how it works:

The exclamation mark (`!`) tells the Jupyter notebook to run the rest of the line as a shell command rather than as Python code. This is needed because `pip` is a command-line program, not a Python statement.

The `-qq` flag makes pip run in "super quiet" mode, showing minimal output during installation. This keeps the notebook largely free of installation messages.

The `-U` flag (short for `--upgrade`) tells pip to upgrade the package to the latest available version if it is already installed. In the command, the two flags are combined into the single argument `-qqU`.

The transformers package, created by Hugging Face, provides tools for working with machine learning models, especially those used in natural language processing. It includes pre-trained models and utilities for tasks like text classification, translation, and question answering.
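After an install or upgrade it can be useful to confirm which version actually ended up in the environment. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name introduced for illustration, not part of pip or transformers.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if transformers is installed, otherwise None.
print("transformers:", installed_version("transformers"))
```

Returning `None` instead of raising keeps the check safe to run in a fresh environment where the package has not been installed yet.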

In [2]:
from transformers import pipeline, set_seed
import torch
import warnings
import os
warnings.simplefilter(action='ignore', category=FutureWarning)

os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

These lines set up a Python environment for working with machine learning models. Let's break down each part:

First, we import two key elements from the transformers library: `pipeline` and `set_seed`. The pipeline function creates ready-to-use machine learning pipelines that can perform various tasks like text generation or translation. `set_seed` controls randomness in the model's output, making results reproducible.
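Note that the notebook imports `set_seed` but never calls it, so the sampled outputs shown later will differ from run to run. Calling `set_seed(42)` before generating would seed Python's, NumPy's, and PyTorch's random number generators. The toy sketch below illustrates the same idea with the standard library only; the vocabulary and the `sample_tokens` helper are made up for illustration.

```python
import random
from typing import List


def sample_tokens(seed: int, k: int = 5) -> List[str]:
    """Draw k pseudo-random 'tokens' from a toy vocabulary using a fixed seed."""
    rng = random.Random(seed)  # independent generator seeded explicitly
    vocab = ["tree", "robot", "river", "drone", "forest", "circuit"]
    return [rng.choice(vocab) for _ in range(k)]


first = sample_tokens(seed=42)
second = sample_tokens(seed=42)

# The same seed reproduces the same sequence of random choices.
print(first == second)  # True
```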

We also import `torch`, which is PyTorch - a deep learning framework that powers many of the transformers models behind the scenes. Think of PyTorch as the engine that runs the neural networks.

The next lines deal with warnings and environment settings. `warnings.simplefilter` tells Python to ignore "FutureWarning" messages. These warnings typically alert developers about upcoming changes in libraries, but they can clutter output when you're focused on getting work done. It's like turning off non-critical alerts so you can focus on the important messages.

The final line sets the `HF_HUB_ENABLE_HF_TRANSFER` environment variable to "1", which asks the Hugging Face Hub client to download files with `hf_transfer`, an optional Rust-based download backend built for higher throughput. Its purpose is speed rather than reliability, so it mainly helps when pulling large model files over a fast connection. Two caveats apply. First, the `hf_transfer` package must be installed separately, and the `pip` cell above does not install it. Second, the variable is normally read when `huggingface_hub` is first imported, so setting it after `from transformers import ...`, as this cell does, may have no effect.
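A more defensive way to set this variable is sketched below. It assumes that the flag is read when `huggingface_hub` is imported, and that enabling it without the optional `hf_transfer` package installed leads to an error at download time; both are assumptions about library behavior that may vary by version. `configure_hf_transfer` is a hypothetical helper, not a library function.

```python
import importlib.util
import os


def configure_hf_transfer() -> bool:
    """Enable the hf_transfer backend only if the optional package is importable."""
    available = importlib.util.find_spec("hf_transfer") is not None
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1" if available else "0"
    return available


# Run this BEFORE importing transformers or huggingface_hub,
# so the setting is already in place when those libraries read it.
use_fast_transfer = configure_hf_transfer()
print("hf_transfer enabled:", use_fast_transfer)
```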

This setup code creates a foundation for working with sophisticated machine learning models while managing technical details that could otherwise get in the way of the main task.

In [3]:
class CFG:
    model = "microsoft/phi-2"
    device = "cuda"

This code creates a configuration class called `CFG` that acts as a central place to store important settings for a machine learning model. It's a common pattern in machine learning projects to keep configurations organized this way rather than scattered throughout the code.

Let's examine the two settings defined here:

1. `model = "microsoft/phi-2"` specifies which model we'll be using. The string "microsoft/phi-2" is an identifier that points to Microsoft's Phi-2 language model in the Hugging Face model hub. Think of this like an address telling our program exactly which AI model to download and use. Phi-2 is a relatively compact but capable language model designed to be more efficient than larger models while still maintaining good performance.

2. `device = "cuda"` tells the program to run the model on a CUDA-enabled GPU (Graphics Processing Unit) rather than the CPU. CUDA is NVIDIA's platform for running computations on their GPUs. Using a GPU dramatically speeds up machine learning tasks because GPUs are designed to perform many calculations in parallel, much like how they process millions of pixels simultaneously when rendering graphics in video games.

By placing these settings in a class, we make it easy to:
- Keep all configuration in one place
- Modify settings by changing just one line of code
- Share settings across different parts of the program
- Create different configurations for different scenarios (like training vs testing)

This configuration setup is particularly important because language models like Phi-2 require careful management of computational resources and consistent settings across different parts of the application.
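One limitation of hard-coding `device = "cuda"` is that the notebook fails on a machine without an NVIDIA GPU. A possible variation, shown here as a sketch rather than as the notebook's own approach, picks the device at runtime. `Config` and `pick_device` are hypothetical names, and the fallback to `"cpu"` is an assumption about what is acceptable for your use case (Phi-2 on CPU will be slow).

```python
from dataclasses import dataclass


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so this sketch runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


@dataclass(frozen=True)
class Config:
    """Central, read-only settings (hypothetical alternative to CFG)."""
    model: str = "microsoft/phi-2"
    device: str = "cpu"


cfg = Config(device=pick_device())
print(cfg)
```

A frozen dataclass keeps the same "one place for settings" benefit while preventing accidental modification later in the notebook.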

# Functions

In [4]:
def generate_text(prompt, temperature, top_p, model_name, max_length=100, num_return_sequences=1):
    # Set up the text generation pipeline (this reloads the model on every call)
    generator = pipeline('text-generation', model=model_name, torch_dtype=torch.float16, device_map=CFG.device)

    # Generate text
    generated = generator(
        prompt,
        max_length=max_length,  # total length in tokens, prompt included
        num_return_sequences=num_return_sequences,
        temperature=temperature,
        top_p=top_p,
        do_sample=True,  # sample instead of always taking the most likely token
        truncation=True,
        pad_token_id=None,  # left unset, so generation falls back to eos_token_id
    )

    # Return only the first generated sequence
    return generated[0]['generated_text']

This code defines a function called `generate_text` that sets up and runs a text generation model. Let's break down how it works, starting with the function's parameters:

The function takes several key parameters that control how the text is generated:
- `prompt` is the input text that the model will use as a starting point
- `temperature` controls how random or focused the generation is (higher values make output more creative but less predictable)
- `top_p` controls the cumulative probability threshold for selecting words (another way to balance creativity and focus)
- `model_name` specifies which language model to use
- `max_length` sets the maximum total length in tokens, counting the prompt as well as the generated continuation (defaulting to 100)
- `num_return_sequences` determines how many different text versions to generate (defaulting to 1)

Inside the function, the first major step creates a text generation pipeline. The line:
```python
generator = pipeline('text-generation', model=model_name, torch_dtype=torch.float16, device_map=CFG.device)
```
does several important things:
1. Creates a pipeline specifically for text generation
2. Uses the specified model
3. Sets the numerical precision to float16 (half precision) to save memory while maintaining good performance
4. Maps the model to the device specified in our CFG class (in this case, a GPU)

The second major step generates the text using the pipeline:
```python
generated = generator(
    prompt,
    max_length=max_length,
    num_return_sequences=num_return_sequences,
    temperature=temperature,
    top_p=top_p,
    do_sample=True,
    truncation=True,
    pad_token_id=None
)
```
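This configuration tells the model:
- To use sampling (`do_sample=True`) rather than always picking the most likely next word
- To truncate the input prompt if it is longer than the model can accept (`truncation=True`)
- To leave the padding token unset (`pad_token_id=None`), in which case generation falls back to the end-of-sequence token and logs the "Setting `pad_token_id` to `eos_token_id`" message that appears in the outputs below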

Finally, the function returns just the generated text from the first sequence. The `generated[0]['generated_text']` access pattern shows that the pipeline returns a list of dictionaries, one per generated sequence, each holding its text under the `generated_text` key. Because only index 0 is returned, any additional sequences requested through `num_return_sequences` are discarded.
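To make the return shape concrete, here is a small sketch that collects every sequence instead of only the first. The `mock_generated` list is made-up data that mimics the list-of-dictionaries structure described above; it is not real model output, and `extract_texts` is a hypothetical helper.

```python
from typing import Dict, List

# Made-up stand-in for what the pipeline returns with num_return_sequences=2.
mock_generated: List[Dict[str, str]] = [
    {"generated_text": "In a world where technology and nature coexist, (sample A)"},
    {"generated_text": "In a world where technology and nature coexist, (sample B)"},
]


def extract_texts(generated: List[Dict[str, str]]) -> List[str]:
    """Return the text of every generated sequence, not just the first."""
    return [item["generated_text"] for item in generated]


for i, text in enumerate(extract_texts(mock_generated), start=1):
    print(f"Sequence {i}: {text}")
```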

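This function hides the details of text generation behind a simple interface: you provide a prompt and some generation parameters, and it runs the model and returns the text. One cost of this simplicity is that the pipeline is rebuilt on every call, so the model weights are reloaded each time. That is why "Loading checkpoint shards" appears before every generation in the outputs below.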
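Since `generate_text` builds a new pipeline on each call, one possible refinement is to cache the pipeline and reuse it. The sketch below is a variation, not the notebook's own code: `get_generator` and `generate_text_cached` are hypothetical names, it swaps `max_length` for `max_new_tokens` (which counts only newly generated tokens), and it passes the tokenizer's end-of-sequence id as the pad token to avoid the fallback message. It assumes `torch` and `transformers` are installed when the functions are actually called.

```python
from functools import lru_cache


@lru_cache(maxsize=2)
def get_generator(model_name: str, device: str = "cuda"):
    """Build the text-generation pipeline once per (model, device) pair."""
    import torch
    from transformers import pipeline

    return pipeline(
        "text-generation",
        model=model_name,
        torch_dtype=torch.float16,
        device_map=device,
    )


def generate_text_cached(prompt, temperature, top_p, model_name, max_new_tokens=100):
    """Generate one continuation, reusing a cached pipeline between calls."""
    generator = get_generator(model_name)
    generated = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # counts generated tokens only
        temperature=temperature,
        top_p=top_p,
        do_sample=True,
        pad_token_id=generator.tokenizer.eos_token_id,
    )
    return generated[0]["generated_text"]
```

With this arrangement the model is loaded on the first call only; later calls with the same `model_name` reuse the cached pipeline, which should make parameter sweeps considerably faster.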

# Test

In [5]:

prompt = "In a world where technology and nature coexist,"

print("Demonstrating the impact of temperature and top_p on text generation:")
print("\nPrompt:", prompt)



Demonstrating the impact of temperature and top_p on text generation:

Prompt: In a world where technology and nature coexist,


The prompt variable contains the text "In a world where technology and nature coexist," - a creative writing prompt that sets up an intriguing scenario. By starting with this open-ended phrase, the code prepares to showcase how the language model can complete this story in different ways.

The first print statement introduces the purpose of the demonstration, explicitly stating that we'll be exploring how temperature and top_p affect text generation. These two parameters are crucial for controlling how the AI model generates text:

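Temperature works like a creativity dial. The model's raw scores (logits) are divided by the temperature before being converted to probabilities. Values below 1 sharpen the distribution, and as the temperature approaches 0 the model almost always chooses the most likely next token, making output focused and predictable. At 1.0 the distribution is used unchanged. Values above 1 flatten it, so less likely tokens are picked more often and the text becomes more surprising.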
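The effect of temperature can be seen on a toy example. The sketch below applies temperature scaling to four made-up logits (they do not come from Phi-2) and prints the resulting probabilities; `softmax_with_temperature` is a hypothetical helper written for illustration.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5, 0.0]  # made-up scores for four candidate tokens

for t in (0.2, 0.5, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

At the lowest temperature nearly all probability lands on the first token, while at 1.5 the four options are much closer together.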

Top_p (also called nucleus sampling) restricts sampling to the smallest set of most likely next words whose combined probability reaches `top_p`. A low top_p means the model only picks from a small set of very likely words, while a higher value lets it choose from a broader range of possibilities, and 1.0 keeps the entire vocabulary.

The second print statement displays the prompt that will be used, making it clear to the user what starting point the model will use for all its generations. This helps users compare how different parameter settings affect the continuation of the same initial scenario.


In [6]:

# Test different temperature values
temperatures = [0.2, 0.5, 1.0, 1.5]
for temp in temperatures:
    print('------')
    print(f"\nTemperature: {temp}, Top_p: 1.0")
    generated_text = generate_text(prompt, model_name = CFG.model, temperature=temp, top_p=1.0)
    print(generated_text)

------

Temperature: 0.2, Top_p: 1.0


Device set to use cuda
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In a world where technology and nature coexist, the story of the "Tech-Natura" company serves as a reminder of the importance of finding a balance between innovation and sustainability. By embracing the principles of engineering and respecting the natural world, we can create a future where technology and nature thrive together.


------

Temperature: 0.5, Top_p: 1.0


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In a world where technology and nature coexist, it's important to appreciate the beauty and diversity of both. From the intricate patterns of a spider's web to the vibrant colors of a butterfly's wings, nature never fails to amaze us with its wonders. And with the help of technology, we can capture and share these moments with the world.

One example of this is the growing trend of nature photography. With the rise of smartphones and social media, it's easier than ever to snap
------

Temperature: 1.0, Top_p: 1.0


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In a world where technology and nature coexist, it is important to ensure that the health of our planet does not come at the cost of our personal information. A new initiative has been launched to encourage people to be more mindful of their personal information on mobile devices while enjoying the great outdoors.

The initiative, called "Green Apps," aims to promote the use of environmentally-friendly mobile apps that also prioritize user privacy and security. By downloading Green Apps, users can enjoy a range of benefits,
------

Temperature: 1.5, Top_p: 1.0


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In a world where technology and nature coexist, there was always a need for more creativity and imagination. People longed for originality and inspiration to guide their lives. Among those seeking such breakthroughs came two individuals - Amelia and Leo - united by their love for movies, shows, and language arts. Despite their different backgrounds, the duo hoped to challenge convention and explore unassuming connections.

Amelia, a brilliant psychologist well known for her groundbreaking research in the fields of creativity and psychology,


This code runs an experiment to demonstrate how the temperature parameter affects text generation by running the model multiple times with different temperature values. Let's break down how it works and understand why this is significant.

The code starts by creating a list called `temperatures` that contains four different temperature values: 0.2, 0.5, 1.0, and 1.5. These values were chosen strategically to show a range of effects:
- 0.2 represents a very "conservative" setting
- 0.5 is a moderate setting
- 1.0 is the default "standard" temperature
- 1.5 represents a highly "creative" setting

The code then uses a for loop to iterate through each temperature value. For each iteration, it:
1. Prints a line of dashes to visually separate each test
2. Displays the current temperature value (keeping top_p constant at 1.0 to isolate the effect of temperature)
3. Calls our `generate_text` function with the current temperature
4. Prints the generated text

This experimental setup holds all other parameters constant while varying only the temperature. This is a fundamental principle in scientific experimentation - changing one variable at a time to understand its specific effect. In this case, we're keeping the same prompt and top_p value (1.0) while only changing the temperature.

Temperature fundamentally changes how the model makes choices about what words to use next. Think of it like this: if you're telling a story and you come to a fork in the road, a low temperature (like 0.2) means you'll almost always take the most obvious path. A high temperature (like 1.5) means you might venture down some unexpected routes.

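When this code runs, the typical pattern is:
- At temperature 0.2: Very consistent, predictable completions that stick closely to common patterns
- At temperature 0.5: A balance between predictability and creativity
- At temperature 1.0: More varied and creative completions
- At temperature 1.5: Highly creative and potentially surprising completions that might take unexpected directions

Keep in mind that each setting is sampled only once here, so any single output is anecdotal rather than conclusive. Several outputs also stop mid-sentence because `max_length=100` caps the prompt plus continuation at 100 tokens.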

In [7]:
# Test different top_p values
top_ps = [0.5, 0.7, 0.9, 1.0]
for top_p in top_ps:
    print('------')
    print(f"\nTemperature: 1.0, Top_p: {top_p}")
    generated_text = generate_text(prompt, model_name = CFG.model, temperature=1.0, top_p=top_p)
    print(generated_text)

------

Temperature: 1.0, Top_p: 0.5


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In a world where technology and nature coexist, the use of drones has become increasingly popular for various purposes. One such purpose is the creation of a drone flower, a small and delicate drone that can be controlled and maneuvered through the air. This project not only requires technical skills but also an understanding of the environment and its impact on living organisms.

To begin, the materials needed for this project are easily accessible and can be found in most hardware stores. This makes it a cost-effective
------

Temperature: 1.0, Top_p: 0.7


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In a world where technology and nature coexist, it is crucial to find a balance between the two. This is where the concept of sustainable living comes into play. Sustainable living refers to a lifestyle that aims to minimize the negative impact on the environment while meeting the needs of the present generation without compromising the ability of future generations to meet their own needs. One aspect of sustainable living is utilizing natural resources, such as solar energy, to power our daily lives.

The knowledge presented in the paragraph above
------

Temperature: 1.0, Top_p: 0.9


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In a world where technology and nature coexist, the intricate interplay between plants, humans, and the environment can offer valuable lessons. The resilience of plants in surviving adverse conditions teaches us about the power of adaptation. The benefits of technology, such as reading books online, remind us of the importance of staying informed. And the consequences of excessive computer use prompt us to seek balance in our digital lives. By acknowledging and embracing these connections, we can strive to create a harmonious and fulfilling existence.
------

Temperature: 1.0, Top_p: 1.0


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In a world where technology and nature coexist, one device has revolutionized how people interact with their environment and with each other. The Cell Phone is a small electronic device that people use for communication, information access, entertainment, and many other purposes. But what's the science behind our dependence on the Cell Phone? And how can we use engineering concepts to improve it? Let's find out with the help of science, engineering, and design!

Chapter 1: The Science of Cell Phones


This code runs a second experiment that mirrors the first one, but now explores how the top_p parameter affects text generation. Let's understand what this code does and why top_p is such an important concept in text generation.

The code creates a list called `top_ps` with four carefully chosen values: 0.5, 0.7, 0.9, and 1.0. To understand why these values matter, we need to grasp what top_p does. While temperature controls how "random" the model's choices are, top_p (also called nucleus sampling) controls what fraction of the probability distribution the model considers when choosing the next word.

Think of it this way: imagine the model has 100 possible words it could use next, each with a different probability. If top_p is 0.7, the model ranks the words from most to least likely and keeps the smallest group whose cumulative probability reaches 70% of the total probability mass. Everything outside that group is discarded, and the next word is sampled from the remaining candidates.
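The nucleus cut described above can be sketched in a few lines. The probabilities below are made up for illustration, and `top_p_filter` is a hypothetical helper rather than the library's implementation.

```python
from typing import Dict


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy_probs = {"nature": 0.40, "technology": 0.30, "robots": 0.15, "rivers": 0.10, "zebra": 0.05}

for top_p in (0.5, 0.9, 1.0):
    survivors = top_p_filter(toy_probs, top_p)
    print(f"top_p={top_p}: {list(survivors)}")
```

With these toy numbers, `top_p=0.5` keeps two tokens, `top_p=0.9` keeps four, and `top_p=1.0` keeps all five.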

The code uses a for loop to try each top_p value while keeping the temperature constant at 1.0. For each iteration, it:
1. Creates a visual separator with dashes
2. Shows the current settings (temperature=1.0 and the current top_p value)
3. Generates text using these parameters
4. Displays the result

When this code runs, we should see different patterns in the generated text:
- At top_p = 0.5: The model only considers the most likely words, leading to very focused and conventional text
- At top_p = 0.7: The model has a bit more flexibility but still stays relatively conservative
- At top_p = 0.9: The model can access a broader range of word choices while still filtering out unlikely options
- At top_p = 1.0: The model considers all possible next words (weighted by their probabilities)

The real power of this experiment comes from seeing it alongside the temperature experiment. Together, they show two different ways to control text generation:
- Temperature controls how "sharp" or "flat" the probability distribution becomes
- Top_p controls how much of that distribution we consider
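To show how the two controls combine, here is a minimal sketch of a single sampling step. It assumes temperature scaling is applied before the nucleus cut, and it uses made-up logits; `sample_next_token` is a hypothetical function written for illustration, not the code that runs inside the pipeline.

```python
import math
import random


def sample_next_token(logits, temperature, top_p, rng):
    """Sample one token: temperature scaling first, then the top_p cut."""
    # 1. Temperature: divide the logits, then apply a softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # 2. Top_p: keep the most likely tokens until their mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # 3. Sample among the survivors; random.choices accepts relative weights.
    tokens = [tok for tok, _ in kept]
    weights = [p for _, p in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_logits = {"nature": 2.0, "technology": 1.0, "robots": 0.5, "zebra": 0.0}
rng = random.Random(0)

# Low temperature plus low top_p leaves only the top token as a candidate.
focused = [sample_next_token(toy_logits, 0.2, 0.5, rng) for _ in range(5)]
# High temperature with top_p=1.0 samples from the full, flattened distribution.
varied = [sample_next_token(toy_logits, 1.5, 1.0, rng) for _ in range(5)]

print(focused)  # every entry is "nature"
print(varied)
```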

By setting these parameters thoughtfully, we can fine-tune the balance between creativity and coherence in the generated text. For creative writing, we might want higher values of both parameters to allow for more imaginative outputs. For technical writing or fact-based responses, we might prefer lower values to keep the text more focused and predictable.

This systematic exploration of parameters is crucial for understanding how to get the best results from AI language models for different types of tasks and desired outcomes.