<a href="https://colab.research.google.com/github/ihabiba/NLP-Labs/blob/main/LLM_in_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# **LARGE LANGUAGE MODELS AND THEIR USAGE IN NLP APPLICATIONS**

**What is a Large Language Model (LLM)?**

A Large Language Model (LLM) is an advanced deep learning model trained on massive amounts of text data to understand, generate, and process human language. LLMs, such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), use transformer architectures to learn language patterns, relationships, and context at an unprecedented scale.

These models leverage self-attention mechanisms and deep neural networks to perform various Natural Language Processing (NLP) tasks, making them highly effective in understanding, summarizing, translating, and generating human-like text. Common applications include:

* Conversational AI & Chatbots – LLMs power virtual assistants and chatbots capable of engaging in human-like conversations, answering queries, and providing customer support.
* Machine Translation – Advanced translation models use LLMs to improve the accuracy and fluency of multilingual translations.
* Text Summarization – LLMs generate concise summaries from lengthy documents, helping users quickly grasp essential information.
* Sentiment Analysis – Businesses use LLM-powered sentiment analysis to assess customer opinions, brand reputation, and market trends.
* Text Generation – LLMs generate high-quality content for blogs, reports, emails, and creative writing, automating content creation.
* Code Generation & Debugging – AI-assisted coding tools leverage LLMs to write, refactor, and debug code efficiently.
* Information Retrieval & Question Answering – LLMs enhance search engines and knowledge bases by providing accurate and context-aware answers.
* Medical & Legal Document Analysis – LLMs process complex medical or legal texts for summarization, compliance checking, and decision support.
* Text-to-Speech & Speech-to-Text – LLMs support applications that convert text into natural speech and vice versa, improving accessibility.
* Personalized Recommendation Systems – LLMs enhance product and content recommendations by understanding user preferences from textual interactions.

These applications demonstrate the transformative impact of LLMs on NLP, driving innovation across industries.
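
To make the self-attention mechanism mentioned above more concrete, here is a minimal pure-Python sketch of scaled dot-product attention. It is a simplification under stated assumptions, not how any particular model is implemented: real transformers first project each token into separate query, key, and value vectors using learned weight matrices, whereas this sketch reuses the same made-up vectors for all three. The function names `softmax` and `self_attention` are illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this token's query to every token's key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy example: three "tokens", each a 2-dimensional vector (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
for row in attended:
    print([round(x, 3) for x in row])
```

Each output row mixes information from all three tokens, with larger weights given to the tokens whose keys are most similar to the current query.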

# **Sample Code of a Large Language Model (LLM) using Google FLAN-T5-Small**

Google FLAN-T5-Small is a free, lightweight, instruction-tuned Large Language Model (LLM). No API key is required, so it is fully free to use for teaching and learning purposes.

✅ The demo provides 3 NLP functionalities:
* Text Generation 📝 – Generates text based on a given topic.
* Text Summarization ✂️ – Summarizes a given input text.
* Question Answering ❓ – Answers a given question with or without context.

The following code demonstrates Text Generation, Summarization, and Question Answering using the free, open-source Google FLAN-T5-Small model.<br>


1. First install the required libraries:



In [None]:
pip install transformers torch




* transformers – This is the Hugging Face Transformers library, which provides pre-trained models like BERT, GPT, T5, BART, and more.
* torch – This is PyTorch, a deep learning framework used for training and running neural networks.
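
Before running the demo, it can help to confirm that both packages are importable in the current environment. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(["transformers", "torch"])
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required packages are available.")
```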

2. Run the Python code below:

In [None]:
from transformers import pipeline

# Load a free LLM model from Hugging Face
model_name = "google/flan-t5-small"  # A lightweight yet powerful model
llm_pipeline = pipeline("text2text-generation", model=model_name)

def run_llm_demo():
    while True:
        # Display available NLP tasks
        print("\n🔹 Select an NLP Task:")
        print("1️⃣ Text Generation")
        print("2️⃣ Text Summarization")
        print("3️⃣ Question Answering")
        print("4️⃣ Exit")

        choice = input("Enter choice (1/2/3/4): ")

        if choice == "1":
            user_prompt = input("Enter a topic for text generation: ")
            prompt = f"Write a short paragraph about {user_prompt}."
            output = llm_pipeline(prompt, max_length=100)

        elif choice == "2":
            text_to_summarize = input("Enter text to summarize: ")
            prompt = f"Summarize this text: {text_to_summarize}"
            output = llm_pipeline(prompt, max_length=50)

        elif choice == "3":
            question = input("Ask a question: ")
            context = input("Provide context (or leave blank for general knowledge): ")
            if context.strip():
                prompt = f"Based on this context, answer the question: {context} \nQuestion: {question}"
            else:
                prompt = f"Answer this question: {question}"
            output = llm_pipeline(prompt, max_length=50)

        elif choice == "4":
            print("Exiting the program. Goodbye! 👋")
            break  # Exit the loop

        else:
            print("❌ Invalid choice. Please select 1, 2, 3, or 4.")
            continue  # Skip to the next iteration

        # Display LLM-generated output
        print("\n🔹 LLM Output:\n", output[0]['generated_text'])

# Run the LLM demo
run_llm_demo()


Device set to use cuda:0



🔹 Select an NLP Task:
1️⃣ Text Generation
2️⃣ Text Summarization
3️⃣ Question Answering
4️⃣ Exit
Enter choice (1/2/3/4): 2
Enter text to summarize: Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and human language. It enables machines to understand, interpret, and generate human language in a meaningful way.


Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



🔹 LLM Output:
 Artificial intelligence (NLP) is a field of artificial intelligence that focuses on the interaction between computers and human language.

🔹 Select an NLP Task:
1️⃣ Text Generation
2️⃣ Text Summarization
3️⃣ Question Answering
4️⃣ Exit
Enter choice (1/2/3/4): 4
Exiting the program. Goodbye! 👋
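
The warning in the output above indicates that, in the library version used for this run, a default `max_new_tokens` of 256 was already set, so the `max_length=50` argument was overridden. One way to control the output length explicitly is to pass `max_new_tokens` instead. The sketch below assumes the same model and prompt wording as the demo; the names `NEW_TOKEN_BUDGET`, `generation_kwargs`, and `summarize` are hypothetical, and the budget values simply mirror the `max_length` settings used earlier.

```python
# Output budgets per task, counted in newly generated tokens.
# (Assumed values that mirror the max_length settings in the demo.)
NEW_TOKEN_BUDGET = {"generate": 100, "summarize": 50, "answer": 50}

def generation_kwargs(task):
    """Return the keyword arguments to pass to the pipeline for a task."""
    return {"max_new_tokens": NEW_TOKEN_BUDGET[task]}

def summarize(text, model_name="google/flan-t5-small"):
    """Summarize text with an explicit limit on new tokens."""
    # Imported lazily so the helper above can be used without loading a model.
    from transformers import pipeline

    llm = pipeline("text2text-generation", model=model_name)
    result = llm(f"Summarize this text: {text}", **generation_kwargs("summarize"))
    return result[0]["generated_text"]
```

Calling `summarize("...")` downloads the model on first use, exactly like the demo above.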


# **Sample Code of a Large Language Model (LLM) using Google FLAN-T5-Large**

This Python program demonstrates Text Generation, Summarization, and Question Answering using a free open-source LLM, Google FLAN-T5-Large.<br>

1. First install the required libraries:



In [None]:
pip install transformers torch



* transformers – This is the Hugging Face Transformers library, which provides pre-trained models like BERT, GPT, T5, BART, and more.
* torch – This is PyTorch, a deep learning framework used for training and running neural networks.

2. Run the Python code below:

In [None]:
from transformers import pipeline

# Load a more powerful LLM model from Hugging Face
model_name = "google/flan-t5-large"  # Try using a larger model for better outputs
llm_pipeline = pipeline("text2text-generation", model=model_name)

def run_llm_demo():
    while True:
        # Display available NLP tasks
        print("\n🔹 Select an NLP Task:")
        print("1️⃣ Text Generation (More Detailed)")
        print("2️⃣ Text Summarization")
        print("3️⃣ Question Answering")
        print("4️⃣ Exit")

        choice = input("Enter choice (1/2/3/4): ")

        if choice == "1":
            user_prompt = input("Enter a topic for text generation: ")
            prompt = f"Write a detailed and informative paragraph about {user_prompt}."
            output = llm_pipeline(
                prompt,
                max_length=300,  # Generate longer text
                do_sample=True,  # Enable sampling so temperature, top_p, and top_k take effect
                temperature=0.9,  # Increase creativity
                repetition_penalty=1.2,  # Reduce repetitive text
                top_p=0.9,  # Sample from the smallest token set covering 90% of the probability
                top_k=50  # Consider only the 50 most likely tokens at each step
            )

        elif choice == "2":
            text_to_summarize = input("Enter text to summarize: ")
            prompt = f"Summarize this text in a detailed yet concise way: {text_to_summarize}"
            output = llm_pipeline(prompt, max_length=100)

        elif choice == "3":
            question = input("Ask a question: ")
            context = input("Provide context (or leave blank for general knowledge): ")
            if context.strip():
                prompt = f"Based on this context, answer the question: {context} \nQuestion: {question}"
            else:
                prompt = f"Answer this question in a detailed way: {question}"
            output = llm_pipeline(prompt, max_length=150)

        elif choice == "4":
            print("Exiting the program. Goodbye! 👋")
            break  # Exit the loop

        else:
            print("❌ Invalid choice. Please select 1, 2, 3, or 4.")
            continue  # Skip to the next iteration

        # Display LLM-generated output
        print("\n🔹 LLM Output:\n", output[0]['generated_text'])

# Run the LLM demo
run_llm_demo()


Device set to use cuda:0



🔹 Select an NLP Task:
1️⃣ Text Generation (More Detailed)
2️⃣ Text Summarization
3️⃣ Question Answering
4️⃣ Exit
Enter choice (1/2/3/4): 2
Enter text to summarize: Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and human language. It enables machines to understand, interpret, and generate human language in a meaningful way.


Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



🔹 LLM Output:
 Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and human language. It enables machines to understand, interpret, and generate human language in a meaningful way.

🔹 Select an NLP Task:
1️⃣ Text Generation (More Detailed)
2️⃣ Text Summarization
3️⃣ Question Answering
4️⃣ Exit
Enter choice (1/2/3/4): 4
Exiting the program. Goodbye! 👋


This code demonstrates the use of Google FLAN-T5-Large for richer, longer responses.<br>
Modifications made:
* ✅ Increased max_length to 300 to allow more words in the generated text.
* ✅ Set temperature=0.9 to make the text more creative and diverse (this takes effect only when sampling is enabled with do_sample=True).
* ✅ Added repetition_penalty=1.2 to avoid redundancy in responses.
* ✅ Set top_p=0.9 and top_k=50 to balance randomness and coherence.
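
To illustrate what the temperature and repetition-penalty settings do to the model's next-token scores, here is a small pure-Python sketch. It is an approximation for teaching purposes, not the library's implementation: the logits are made-up numbers, the function names are hypothetical, and the repetition-penalty rule shown (divide positive scores, multiply negative scores) is a simplified version of the commonly used formulation.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Make tokens that were already generated less likely."""
    adjusted = list(logits)
    for i in set(seen_token_ids):
        # Positive scores shrink; negative scores move further from zero.
        adjusted[i] = adjusted[i] / penalty if adjusted[i] > 0 else adjusted[i] * penalty
    return adjusted

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.3, 0.9, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

# Pretend token 0 has already been generated once.
print(apply_repetition_penalty(logits, seen_token_ids=[0], penalty=1.2))
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures spread it across more candidates.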

# **Sample Code of a Large Language Model (LLM) using GPT-2 Model**

* This code demonstrates local text generation using the GPT-2 language model from the Hugging Face Transformers library.
* It loads a pre-trained model without requiring an API key, accepts a text prompt, and generates one or more text outputs.
* The code allows control over output length, creativity, and randomness through parameters such as max_length, temperature, top_k, and top_p, and showcases both creative and factual text generation modes.



In [None]:
"""
Text Generation using Hugging Face Transformers (No API Key Required)
Uses GPT-2 model running locally
"""

from transformers import pipeline, set_seed
import warnings
warnings.filterwarnings('ignore')

def text_generation(prompt, max_length=100, temperature=0.7, num_return=1):
    """
    Generate text based on a prompt using local GPT-2 model

    Args:
        prompt (str): The input prompt for text generation
        max_length (int): Maximum length of generated text
        temperature (float): Controls randomness (0.1-2.0)
        num_return (int): Number of sequences to generate

    Returns:
        list: Generated text sequences
    """
    print("\n=== TEXT GENERATION (GPT-2) ===")
    print(f"Prompt: {prompt}")
    print("Loading model... (first run may take a moment)")

    # Initialize the text generation pipeline with GPT-2
    generator = pipeline('text-generation', model='gpt2')
    set_seed(42)  # For reproducibility

    # Generate text
    results = generator(
        prompt,                           # Input text to continue from
        max_length=max_length,            # Maximum total length in tokens (prompt + generated)
        num_return_sequences=num_return,  # How many different outputs to generate
        temperature=temperature,          # Controls randomness: low = more deterministic, high = more creative
        do_sample=True,                   # Enable random sampling instead of greedy decoding;
                                          # required for temperature, top_p, and top_k to take effect
        top_p=0.95,                       # Nucleus sampling threshold
        top_k=50                          # Top-k sampling limit
    )

    print("\nGenerated Text:")
    for i, result in enumerate(results, 1):
        print(f"\n--- Output {i} ---")
        print(result['generated_text'])

    return results

def creative_writing(prompt):
    """
    Generate creative writing with higher temperature
    """
    print("\n=== CREATIVE WRITING MODE ===")
    return text_generation(prompt, max_length=150, temperature=0.9)

def factual_generation(prompt):
    """
    Generate more factual text with lower temperature
    """
    print("\n=== FACTUAL GENERATION MODE ===")
    return text_generation(prompt, max_length=100, temperature=0.3)

def main():
    """Run text generation examples"""
    print("=" * 60)
    print("TEXT GENERATION DEMO - NO API KEY REQUIRED")
    print("Using Hugging Face GPT-2 Model (Running Locally)")
    print("=" * 60)

    # Example 1: Story beginning
    text_generation(
        "Once upon a time in a digital world,",
        max_length=80
    )

    # Example 2: Technical writing
    text_generation(
        "Artificial intelligence is",
        max_length=80
    )

    # Example 3: Creative writing
    creative_writing(
        "The robot looked at the stars and wondered",
    )

    print("\n" + "=" * 60)
    print("Note: GPT-2 is a smaller model, so outputs may be less")
    print("sophisticated than commercial APIs like GPT-3/4")
    print("=" * 60)

if __name__ == "__main__":
    main()

TEXT GENERATION DEMO - NO API KEY REQUIRED
Using Hugging Face GPT-2 Model (Running Locally)

=== TEXT GENERATION (GPT-2) ===
Prompt: Once upon a time in a digital world,
Loading model... (first run may take a moment)


Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



Generated Text:

--- Output 1 ---
Once upon a time in a digital world, we have no idea how much time, effort, and resources we have to spend on this task.

What if the entire task was to be created in only a few months? What if we could do it from scratch and then have a single piece of hardware that was used to create the whole project? What if we could do it from the ground up, with the only data we have for the project from the start? What if we could build a whole system of tools, software, and tools to help us create this project?

What if we could build a whole set of tools, software, and tools that could be used to build a new version of the project? What if we could build a whole set of tools, software, and tools that could be used to test and verify the new version of the project?

It is important to remember that each step of the project is a part of our life cycle, and we will not ever be able to do anything without our own hands.

So, what do we do?

What are the tools we 

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



Generated Text:

--- Output 1 ---
Artificial intelligence is a new field that has been attracting a lot of attention in recent years, but it's still only a few years old. A lot of the work that is done is done in a few dozen different domains, so it's not clear if it's feasible to get all that out at once. So, I'm not going to go into that and say it's impossible, but it's not a very important point. I'm going to say that there are ways that we can make it a lot easier to get these things done.

A: Yes, this is a little bit of a mixed bag. I think it's quite a bit more complicated than I thought. I think that it's a little bit more complicated than I thought, but I think there's an even more important point here. I think that AI is a very important, if not the most important, field of robotics right now, and we have some very interesting applications that we could potentially pursue in the future. And I think that's a big part of what's happening here.

Q: I think that there's a lot o

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



Generated Text:

--- Output 1 ---
The robot looked at the stars and wondered for a moment. Then it let go and began to move slowly, making it hard to see.

What it saw was bright blue.

She could not remember what it saw, but it was almost a blackness of light.

She could not remember what it had seen.

The robot's arms had a red and yellow color.

It could be in many different colors.

It had a blue and purple color.

It was black.

All its hands, it could not change its color.

It looked at the stars and thought that it was a little lost.

She knew that the universe was not perfect.

But she still wanted to see it.

It had been so long.

It was starting to show signs of life.

It was starting to move.

It was starting to see its own light.

She knew that she needed to go find it.

If she could find it, she would not have to go with her.

The day was going well.

She didn't know if she had enough time, or if she knew the answers.

But she could not stop dreaming.

The day

Note: GPT-

**Overview:**
*  A pre-trained GPT-2 language model is loaded locally
*  Given a text prompt, the model predicts and generates the next words
*  Output depends on parameters such as length and randomness
<br>

**Main Functions Used**
*  pipeline → Simplifies model loading and inference
*  set_seed → Fixes the random seed so that sampled outputs are reproducible
*  warnings.filterwarnings('ignore') → Suppresses Python warning messages
<br>
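
The effect of fixing a seed can be sketched with the standard library alone. This is an analogy rather than the library's code: `set_seed` in Transformers seeds the random number generators that generation relies on, while the hypothetical `sample_tokens` helper below seeds a plain Python generator and draws from a made-up token distribution.

```python
import random

def sample_tokens(probs, n, seed):
    """Draw n token indices from a probability distribution with a fixed seed."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n)

probs = [0.5, 0.3, 0.2]  # made-up probabilities for three tokens

first_run = sample_tokens(probs, 5, seed=42)
second_run = sample_tokens(probs, 5, seed=42)
print(first_run == second_run)  # the same seed reproduces the same draws
```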

**Input parameters:**
* prompt → Starting text
* max_length → Total length of the output in tokens (prompt + generated text)
* temperature → Controls creativity
   * Low → More deterministic
   * High → More creative
* num_return → Number of outputs to generate
<br>

**Generation Settings Explained**
* do_sample=True : Enables random sampling instead of greedy decoding; required for temperature, top_k, and top_p to take effect
* temperature : Controls randomness of token selection
* top_k=50 : Limits choices to the 50 most probable tokens at each step
* top_p=0.95 : Samples from the smallest set of tokens whose cumulative probability reaches 95% (nucleus sampling)
<br>
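
The two filtering strategies above can be sketched in a few lines of pure Python. This is an illustrative approximation operating on a made-up probability list, not the library's implementation; the names `top_k_filter` and `top_p_filter` are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = order[:k]
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token probabilities
print(top_k_filter(probs, 2))    # only tokens 0 and 1 survive
print(top_p_filter(probs, 0.9))  # tokens 0, 1, and 2 are needed to reach 0.9
```

When both settings are given, each one removes candidates, and sampling happens over whatever remains.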

**Creative vs Factual Modes**
* Creative Writing
   * Higher temperature (0.9)
   * Produces more imaginative and diverse text
* Factual Generation
   * Lower temperature (0.3)
   * Produces more predictable, conservative text (a low temperature reduces randomness but does not guarantee factual accuracy)
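
In the code above, the GPT-2 pipeline is created inside `text_generation`, so the model is reloaded on every call. One possible refinement is to cache the pipeline and keep the two modes in a small settings table. This is a sketch under assumptions: the names `MODES`, `mode_settings`, `get_generator`, and `generate` are hypothetical, and `max_new_tokens` is used in place of `max_length` so that the limit counts only generated tokens.

```python
from functools import lru_cache

# Temperature and length per mode, taken from the two helper functions above.
MODES = {
    "creative": {"temperature": 0.9, "max_new_tokens": 150},
    "factual": {"temperature": 0.3, "max_new_tokens": 100},
}

def mode_settings(mode):
    """Combine the shared sampling settings with the mode-specific ones."""
    return {"do_sample": True, "top_k": 50, "top_p": 0.95, **MODES[mode]}

@lru_cache(maxsize=None)
def get_generator(model_name="gpt2"):
    """Load the pipeline once and reuse it on later calls."""
    # Imported lazily so the settings helpers work without loading a model.
    from transformers import pipeline

    return pipeline("text-generation", model=model_name)

def generate(prompt, mode="creative"):
    """Generate a continuation of the prompt in the requested mode."""
    result = get_generator()(prompt, **mode_settings(mode))
    return result[0]["generated_text"]
```

For example, `generate("Artificial intelligence is", mode="factual")` would reuse the cached pipeline after the first call.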

# **LABORATORY TASK**
1. Write a simple Python program to create your own NLP application that uses an LLM. Use more than one model and compare the models' performance:
   * Flan T5 (Small or Large)
   * BART (facebook/bart-large-cnn)
   * GPT-2 (gpt2)
   * LLaMA 2 (meta-llama/Llama-2-7b-hf)



# 1️⃣ Install Dependencies

In [None]:
pip install transformers torch accelerate sentencepiece

# 2️⃣ Common Input Text (Same for All Models for fair comparison)

In [None]:
text = """
Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction
between computers and human language. It enables machines to understand, interpret, and generate
human language in a meaningful way.
"""
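
Because the triple-quoted string contains line breaks, some models may join the words on either side of a break (the BART output further below shows "interactionbetween", which appears to be caused by this). A simple precaution, sketched here with the hypothetical variable name `clean_text`, is to collapse all whitespace into single spaces before passing the text to a model.

```python
text = """
Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction
between computers and human language. It enables machines to understand, interpret, and generate
human language in a meaningful way.
"""

# split() breaks on any whitespace (including newlines); join() restores single spaces.
clean_text = " ".join(text.split())
print(clean_text)
```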


# 3️⃣ FLAN-T5 (Instruction-Tuned Model)

In [None]:
from transformers import pipeline

flan_t5 = pipeline(
    "text2text-generation",
    model="google/flan-t5-large"
)

flan_output = flan_t5(
    "Summarize the following text:\n" + text,
    max_length=50
)

print("FLAN-T5 Output:")
print(flan_output[0]["generated_text"])


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cuda:0
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


FLAN-T5 Output:
Understand what NLP is.


# 4️⃣ BART (Summarization-Specialized Model)

In [None]:
bart = pipeline(
    "summarization",
    model="facebook/bart-large-cnn"
)

bart_output = bart(
    text,
    max_length=50,
    min_length=20,
    do_sample=False
)

print("\nBART Output:")
print(bart_output[0]["summary_text"])


Device set to use cuda:0
Your max_length is set to 50, but your input_length is only 48. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=24)



BART Output:
Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interactionbetween computers and human language. It enables machines to understand, interpret, and generatehuman language in a meaningful way.


# 5️⃣ GPT-2 (Text Generation Model)

In [None]:
gpt2 = pipeline(
    "text-generation",
    model="gpt2"
)

gpt2_output = gpt2(
    "NLP is important because",
    max_length=50,
    num_return_sequences=1
)

print("\nGPT-2 Output:")
print(gpt2_output[0]["generated_text"])


Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



GPT-2 Output:
NLP is important because it means that the government can no longer force people to pay any extra taxes.

"However, there was a clear decision not to try to force people to pay extra taxes over the last decade," Mr O'Grady said.

"In the past the government has tried to force people to pay more taxes than they would otherwise have to.

"This is a huge change and the government could soon face some serious backlash if they do not change their thinking and act to change the tax code.

"We urge the government to act to help people pay their fair share and to act fast to change the tax code so it does not become a tax avoidance scheme."

Mr O'Grady said the government would be "extremely pleased" to hear from people about the changes to the tax code.

Topics: tax, tax-and-spend, tax-alliance, australia

First posted


# 6️⃣ LLaMA 2 (Large General-Purpose LLM)

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

inputs = tokenizer(
    "Summarize the following text:\n" + text,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=50
)

print("\nLLaMA 2 Output:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Llama-2-7b-hf.
401 Client Error. (Request ID: Root=1-697b33d0-7dc90f6d05f8640940ab0dd7;310eeffe-7138-4073-b84c-82d7f775c5d0)

Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/config.json.
Access to model meta-llama/Llama-2-7b-hf is restricted. You must have access to it and be authenticated to access it. Please log in.
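
The error above means the request was not authenticated. Assuming access has already been requested and granted on the model page, one way to authenticate is to log in with a personal access token before loading the model. The sketch below is illustrative: it assumes the token is stored in an environment variable named `HF_TOKEN`, and the helper names `get_hf_token` and `load_gated_model` are hypothetical.

```python
import os

def get_hf_token(env_var="HF_TOKEN"):
    """Read the access token from an environment variable, failing clearly if absent."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"No token found in ${env_var}. Create one at "
            "https://huggingface.co/settings/tokens and set it before loading gated models."
        )
    return token

def load_gated_model(model_name="meta-llama/Llama-2-7b-hf"):
    """Authenticate with the Hub, then load the tokenizer and model."""
    # Imported lazily so the token helper can be used on its own.
    from huggingface_hub import login
    from transformers import AutoModelForCausalLM, AutoTokenizer

    login(token=get_hf_token())
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
    return tokenizer, model
```

Even with a valid token, loading still fails until the account has been granted access to the gated repository.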

## LLM Comparison Results and Explanation

### FLAN-T5 (Instruction-Tuned Model)
**Output:**  
“Understand what NLP is.”

**Explanation:**  
FLAN-T5 correctly followed the instruction to summarize, but produced an overly short and abstract output. This shows strong instruction-following ability, but also a tendency to over-compress information.

---

### BART (Summarization Model)
**Output:**  
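The input text reproduced almost verbatim (with the line breaks dropped, so that some words run together).

**Explanation:**  
BART preserved every key detail and introduced no hallucination, but it did not actually shorten the text. As the warning in the output notes, the input was only 48 tokens long while max_length was set to 50, so the model could copy the input in full. BART (facebook/bart-large-cnn) is fine-tuned specifically for summarization, but a longer input or a smaller max_length is needed to judge its summarization ability fairly.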

---

### GPT-2 (Text Generation Model)
**Output:**  
A long, unrelated text about government tax policies.

**Explanation:**  
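GPT-2 is a base language model that has not been instruction-tuned: it generates text simply by predicting the next token. Here it was asked to continue the open-ended prompt "NLP is important because" rather than to summarize the common input text, so its output is not directly comparable with the other models. The continuation is fluent but drifts to an unrelated topic, which is expected behavior and highlights how little control the prompt alone provides.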
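
For a closer comparison, GPT-2 can be given the same input text followed by a "TL;DR:" cue, which the GPT-2 authors reported as a way to elicit summary-like continuations from a plain language model. The sketch below is only a starting point, and output quality with the small `gpt2` checkpoint is likely to remain poor. The helper names `build_tldr_prompt`, `strip_prompt`, and `gpt2_summary` are hypothetical.

```python
def build_tldr_prompt(text):
    """Normalize whitespace and append a 'TL;DR:' cue to the text."""
    return " ".join(text.split()) + "\nTL;DR:"

def strip_prompt(generated_text, prompt):
    """Return only the continuation, since the pipeline echoes the prompt by default."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()

def gpt2_summary(text, max_new_tokens=50):
    """Ask GPT-2 to continue the text after a 'TL;DR:' cue."""
    # Imported lazily so the prompt helpers work without loading a model.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    prompt = build_tldr_prompt(text)
    # Greedy decoding keeps the comparison repeatable between runs.
    result = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return strip_prompt(result[0]["generated_text"], prompt)
```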

---

### LLaMA 2 (Large Language Model)
**Output:**  
Model could not be loaded due to restricted access (401 Unauthorized).

**Explanation:**  
LLaMA 2 is a gated model that requires special access and authentication. This demonstrates real-world limitations of large LLMs, including access restrictions and hardware requirements.

---

### Summary
Each model behaves differently based on its training objective. Instruction-tuned and task-specific models perform better for controlled NLP tasks, while pure text generators lack reliability. Large state-of-the-art models introduce practical deployment challenges.
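
A comparison like the one above can be made more systematic by running every model on the same input and recording a few simple statistics. The following sketch uses stand-in functions instead of real models so that it runs without any downloads; the names `compare_models` and `stub_runners` are hypothetical, and the length ratio (output words divided by input words) is just one rough indicator of how much a model compressed the text.

```python
import time

def compare_models(runners, text):
    """Run every model on the same text and collect simple statistics."""
    rows = []
    input_words = len(text.split())
    for name, run in runners.items():
        start = time.perf_counter()
        output = run(text)
        elapsed = time.perf_counter() - start
        rows.append({
            "model": name,
            "output": output,
            "seconds": elapsed,
            "length_ratio": len(output.split()) / input_words,
        })
    return rows

# Stand-in runners so the sketch runs without downloading any model.
stub_runners = {
    "copy-everything": lambda t: " ".join(t.split()),
    "first-five-words": lambda t: " ".join(t.split()[:5]),
}

sample = "Natural Language Processing (NLP) is a field of artificial intelligence."
for row in compare_models(stub_runners, sample):
    print(f"{row['model']:<18} ratio={row['length_ratio']:.2f}  {row['output']}")
```

In a real comparison, each runner would wrap one of the pipelines defined earlier, for example `lambda t: flan_t5("Summarize the following text:\n" + t)[0]["generated_text"]`. A length ratio close to 1.0 signals that a model copied its input rather than summarizing it.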
