# Text Generation with Large Language Models in Google Colab: A Step-by-Step Guide

Definition of variables, constants and functions:



1.   model_name: the Hugging Face model identifier passed to `from_pretrained` ("gpt2" in this guide).




Cell 1: Install Required Libraries

In [None]:
# Install the Hugging Face transformers library, which provides easy access to pre-trained language models
# Check the Hugging Face documentation for more information
!pip install transformers




Cell 2: Import Libraries and Load Model

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Define the model name. You can replace 'gpt2' with the name of another causal language model from the Hugging Face Hub.
model_name = "gpt2"

# Load the tokenizer associated with the model. The tokenizer converts text into the format the model understands.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the language model. AutoModelForCausalLM is used for models that generate text in a causal, left-to-right manner.
model = AutoModelForCausalLM.from_pretrained(model_name)

# Move the model to the GPU if available to speed up computations, otherwise use the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

Cell 3: Define the Prompt and Tokenize


In [None]:
# Define the prompt for text generation. Since we plan to try several prompts, label this one prompt_number_01.
prompt_number_01 = "This patient was admitted to hospital with a heart attack and"

# Tokenize the input prompt. Tokenization is the process of converting text into numerical tokens that the model can process.
# The `return_tensors="pt"` option specifies that the output should be in PyTorch format (pt).
inputs = tokenizer(prompt_number_01, return_tensors="pt").to(device)
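To make the tokenization step concrete, here is a minimal sketch of what a tokenizer returns: a list of integer IDs plus an attention mask that marks real tokens with 1 and padding with 0. The vocabulary and the `toy_encode` helper are hypothetical and purely illustrative. GPT-2's real tokenizer uses byte-level BPE subwords rather than whole words, so its IDs will look different.

```python
# Toy word-level vocabulary (hypothetical; GPT-2 actually uses byte-level BPE subwords).
vocab = {"<pad>": 0, "this": 1, "patient": 2, "was": 3, "admitted": 4, "<unk>": 5}


def toy_encode(text, max_len):
    """Map lowercase, whitespace-separated words to IDs, then pad to max_len."""
    ids = [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]
    ids = ids[:max_len]                      # truncate if the text is too long
    padding = max_len - len(ids)             # number of pad positions to add
    return {
        "input_ids": ids + [vocab["<pad>"]] * padding,
        "attention_mask": [1] * len(ids) + [0] * padding,
    }


encoded = toy_encode("This patient was admitted today", max_len=6)
print(encoded["input_ids"])       # "today" is not in the toy vocabulary, so it maps to <unk>
print(encoded["attention_mask"])  # the final position is padding, so its mask value is 0
```

The real `inputs` object from Cell 3 has the same two keys, `input_ids` and `attention_mask`, but holds PyTorch tensors instead of Python lists.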


Cell 4: Generate Text


In [None]:
# Generate text based on the input prompt with specified settings.
# - max_length=50 caps the total sequence length (prompt tokens plus generated tokens) at 50 tokens.
# - num_return_sequences=1 specifies that only one sequence of text should be generated.
# - temperature=0.7 rescales the token probabilities; a value less than 1 makes sampling less random.
#   It only takes effect when do_sample=True. This call does not set do_sample, so decoding is greedy.
# - pad_token_id=tokenizer.eos_token_id reuses the end-of-sequence token for padding, because GPT-2 defines no pad token.

outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # Adding attention mask for reliable results
    max_length=50,
    num_return_sequences=1,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id  # Setting pad token ID to eos token ID
)
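Unless `do_sample=True` is passed, `generate()` uses greedy decoding: at every step it picks the single most likely next token. The sketch below illustrates why that can loop. It uses a hypothetical toy model (`NEXT_TOKEN_PROBS`) that looks only at the previous token, whereas GPT-2 conditions on the whole context, so treat it as an illustration of the mechanism, not of GPT-2 itself.

```python
# Hypothetical next-token probabilities for a toy model that only sees the previous token.
NEXT_TOKEN_PROBS = {
    "the": {"patient": 0.6, "hospital": 0.4},
    "patient": {"was": 0.9, "died": 0.1},
    "was": {"taken": 0.7, "admitted": 0.3},
    "taken": {"to": 1.0},
    "to": {"the": 0.8, "hospital": 0.2},
}


def greedy_decode(start, steps):
    """Repeatedly append the most probable next token (no randomness involved)."""
    tokens = [start]
    for _ in range(steps):
        options = NEXT_TOKEN_PROBS[tokens[-1]]
        tokens.append(max(options, key=options.get))
    return tokens


print(" ".join(greedy_decode("the", 10)))
```

Because the most likely path leads back to "the", the toy model cycles through the same five words indefinitely, and running it again produces exactly the same text.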



Cell 5: Decode and Print Output


In [None]:
# Decode the generated output tokens back into readable text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Print the generated text
print("Generated text:", generated_text)


Generated text: This patient was admitted to hospital with a heart attack and died on the way to the hospital.

The patient was taken to the hospital with a serious heart attack.

The patient was taken to the hospital with a serious heart attack.



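This repetitive output is typical of deterministic decoding, where the model gets "stuck" in a loop, a common issue with smaller models like gpt2. Because the call above does not set do_sample=True, generate() picks the single most likely token at every step and ignores temperature. Here are some adjustments you can make to reduce repetition and improve the diversity of the generated text:


*   Enable sampling: Pass do_sample=True so that temperature, top_k and top_p take effect.
*   Increase the temperature: Raising it slightly can introduce more randomness and reduce repetition.
*   Adjust top_k and top_p: Setting these values restricts sampling to the most plausible tokens, which keeps the added randomness from producing very unlikely words.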
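The effect of these settings can be sketched in plain Python. The example below is a simplified illustration, not the library's implementation: the logits are made up, and the helper names (`softmax_with_temperature`, `keep_top_k`, `keep_top_p`) are hypothetical. It applies temperature first, then top-k, then top-p, renormalizing after each filter.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def keep_top_k(probs, k):
    """Keep the k most likely tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}


def keep_top_p(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, prob in ranked:
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: q / total for tok, q in kept.items()}


# Made-up scores for four candidate next tokens.
logits = {"died": 2.0, "was": 1.0, "recovered": 0.5, "banana": -3.0}

sharp = softmax_with_temperature(logits, temperature=0.7)
flat = softmax_with_temperature(logits, temperature=1.0)
print(round(sharp["died"], 3), round(flat["died"], 3))  # the top token dominates more at 0.7

filtered = keep_top_p(keep_top_k(flat, k=3), p=0.9)
print(sorted(filtered))  # the very unlikely "banana" has been filtered out

# Draw one token from the filtered distribution (seeded for repeatability).
rng = random.Random(0)
choice = rng.choices(list(filtered), weights=list(filtered.values()))[0]
print(choice)
```

Sampling from the filtered distribution sometimes picks "was" or "recovered" instead of always picking "died", which is what breaks the loops seen with greedy decoding.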


In [None]:
# Generate text with additional parameters to improve output diversity
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=50,
    num_return_sequences=1,
    do_sample=True,   # Sample from the distribution; without this, temperature, top_k and top_p are ignored
    temperature=1.0,  # Higher than the earlier 0.7, for more randomness
    top_k=50,         # Limits sampling to the 50 most likely tokens at each step
    top_p=0.9,        # Nucleus sampling: keeps the smallest set of tokens whose cumulative probability reaches 0.9
    pad_token_id=tokenizer.eos_token_id
)




In [None]:
# Decode the generated output tokens back into readable text
generated_text_attempt_02 = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Print the generated text
print("Generated text:", generated_text_attempt_02)

Generated text: As a student of health data science, I'd like to see a more comprehensive approach to the problem of obesity.

I'm not sure how to answer this question. I'm not sure how to answer this question.

I'm not
