# Course: Introduction To GenAI

*Notebook: Experimenting_with_LLMs.ipynb*

<a href="https://colab.research.google.com/github/gassaf2/IntroductionToGenAI/blob/main/week3/Experimenting_with_LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Week 3 Hands-on Lab: Experimenting With Large Language Models**

**Introduction:**

In this hands-on Python notebook, we will experiment with LLMs.
This will help you:
1. Use a pre-trained LLM from the Hugging Face library for text summarization.
2. Implement a question-answering task using a pre-trained LLM.
3. Understand how LLMs perform NLP tasks in real-world scenarios.


# **Part 1: Text Summarization**

**1. Import Necessary Libraries**

We will be using the [Transformers](https://huggingface.co/docs/transformers/en/index) library from [Hugging Face](https://huggingface.co/). The Transformers library provides APIs and tools to easily download and train state-of-the-art pretrained models.

In [2]:
from transformers import pipeline

**2. Set up the Summarization Pipeline**

In [3]:
# Load the summarization pipeline
summarizer = pipeline("summarization")

# Input a long text for summarization
long_text = """
Artificial intelligence (AI) is rapidly transforming industries, from healthcare and education to finance and entertainment.
Generative AI models, such as Large Language Models (LLMs), are at the forefront of this transformation. These models are
trained on vast datasets and can generate human-like text, enabling applications like automated customer support,
personalized education tools, and content creation. Despite their potential, challenges such as bias, ethical concerns,
and the environmental impact of large-scale training persist. Addressing these challenges is crucial for the responsible
deployment of AI technologies in the future.
"""

# Generate a summary
summary = summarizer(long_text, max_length=50, min_length=25, do_sample=False)
print("Summary:")
print(summary[0]['summary_text'])


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.



Summary:
 Artificial intelligence (AI) is rapidly transforming industries, from healthcare and education to finance and entertainment . Large Language Models (LLMs) are at the forefront of this transformation . These models are trained on vast datasets and can generate human-like
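
The warning printed above recommends naming the model and revision explicitly instead of relying on the default. A minimal sketch of that, using the model name and revision hash taken from the warning itself; the constant names and the `load_pinned_summarizer` helper are hypothetical and only serve the illustration:

```python
# Model name and revision copied from the warning in the cell output above.
SUMMARIZER_MODEL = "sshleifer/distilbart-cnn-12-6"
SUMMARIZER_REVISION = "a4f8f3e"


def load_pinned_summarizer():
    """Build the summarization pipeline with an explicit model and revision."""
    # Imported inside the function so the constants can be reused
    # without loading the library.
    from transformers import pipeline

    return pipeline(
        "summarization",
        model=SUMMARIZER_MODEL,
        revision=SUMMARIZER_REVISION,
    )


# Usage (downloads the model on first call):
# summarizer = load_pinned_summarizer()
```

Pinning both values should make the notebook behave the same way even if the library's default summarization model changes in a later release.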


In [6]:
# Load the summarization pipeline
summarizer = pipeline("summarization")

# Input a long text for summarization
long_text = """
The Great Fire of London in 1666 was a catastrophic event that destroyed much of the city, including over 13,000 homes and numerous 
landmarks such as St. Paul’s Cathedral. The fire started in a bakery on Pudding Lane and quickly spread due to strong winds and the
abundance of wooden structures. Although the official death toll was recorded as low, recent studies suggest that many undocumented
casualties occurred, particularly among the city’s poorer residents. In the aftermath, King Charles II mandated a complete ban on
wooden buildings, replacing them with fire-resistant stone structures. Interestingly, some historians argue that the fire was 
deliberately set to cover up a failed political coup, a theory that remains a topic of debate today.
"""

# Generate a summary
summary = summarizer(long_text, max_length=50, min_length=25, do_sample=False)
print("Summary:")
print(summary[0]['summary_text'])



Summary:
 The Great Fire of London destroyed 13,000 homes and numerous landmarks such as St. Paul’s Cathedral . The fire started in a bakery on Pudding Lane and quickly spread due to strong winds and the abundance of wooden structures .


**3. Experiment**

* Replace long_text with any article or paragraph of your choice.
* Try different max_length and min_length values to see how they affect the summary.
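
One way to run the second experiment is to loop over several length settings and print the summaries side by side. The helper below is a sketch, not part of the original lab; `compare_lengths` and the specific length pairs are assumptions chosen for illustration. Note that `max_length` and `min_length` count tokens, not characters or words, which is why the first summary above stops mid-sentence at `max_length=50`.

```python
def compare_lengths(summarizer, text, settings):
    """Summarize `text` once per (min_length, max_length) pair and collect the results."""
    results = []
    for min_len, max_len in settings:
        output = summarizer(
            text,
            max_length=max_len,
            min_length=min_len,
            do_sample=False,
        )
        results.append((min_len, max_len, output[0]["summary_text"]))
    return results


# Usage with the objects defined earlier in this notebook:
# for min_len, max_len, text in compare_lengths(
#     summarizer, long_text, [(10, 30), (25, 50), (40, 100)]
# ):
#     print(f"min={min_len} max={max_len}: {text}")
```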


# **Part 2: Question Answering**

**1. Set Up the Question-Answering Pipeline**

In [4]:
# Load the question-answering pipeline
qa_pipeline = pipeline("question-answering")

# Input context and questions
context = """
The Large Language Model (LLM) GPT-3, developed by OpenAI, is known for its exceptional ability to generate human-like text.
It uses the Transformer architecture and has 175 billion parameters, making it one of the largest AI models in the world.
LLMs like GPT-3 are widely used in applications such as content creation, summarization, and question answering.
"""

question = "What architecture does GPT-3 use?"

# Get the answer
answer = qa_pipeline(question=question, context=context)
print("Answer:")
print(answer['answer'])


No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Answer:
Transformer
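
The cell above prints only `answer['answer']`, but the question-answering pipeline returns a dictionary that also carries a confidence `score` and the `start` and `end` character offsets of the answer within the context. A small sketch for inspecting those fields; `describe_answer` is a hypothetical helper name, not something the library provides:

```python
def describe_answer(result, context):
    """Format a question-answering result together with its score and source span."""
    # `start` and `end` index into the context string, so slicing the context
    # should reproduce the extracted answer.
    span = context[result["start"]:result["end"]]
    return f"{result['answer']!r} (score={result['score']:.2f}, span={span!r})"


# Usage with the objects defined in the cell above:
# print(describe_answer(answer, context))
```

A low score is a useful hint that the model is guessing, since this kind of extractive model returns some span of the context even when the context does not contain the answer.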


**2. Experiment**

* Modify the context and question variables with your own text and queries.
* Observe how the model adjusts its answers based on the provided input.
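
To try several questions against the same context in one go, a loop along these lines may help. This is a sketch under the assumption that `qa_pipeline` and `context` are defined as in the cell above; `ask_all` and the sample questions are hypothetical:

```python
def ask_all(qa, context, questions):
    """Ask each question against the same context and return (question, answer, score) rows."""
    rows = []
    for question in questions:
        result = qa(question=question, context=context)
        rows.append((question, result["answer"], round(result["score"], 3)))
    return rows


# Usage with the notebook's objects:
# questions = [
#     "Who developed GPT-3?",
#     "How many parameters does GPT-3 have?",
#     "What are LLMs used for?",
# ]
# for question, answer_text, score in ask_all(qa_pipeline, context, questions):
#     print(f"{question} -> {answer_text} (score={score})")
```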


# **Part 3: Combine Summarization and Question Answering**

**1. Pipeline Integration**

Combine the two pipelines to first summarize a long text and then extract answers to specific questions from the summary.


In [5]:
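# Summarize the text.
# Note: this cell was run (In [5]) before the Great Fire cell (In [6]), so long_text
# here still holds the AI passage from the first summarization cell.
summary = summarizer(long_text, max_length=50, min_length=25, do_sample=False)[0]['summary_text']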

# Define a question based on the summary
question = "What are the challenges mentioned in the summary?"

# Use the QA pipeline to extract the answer
answer = qa_pipeline(question=question, context=summary)
print("Question:", question)
print("Answer:", answer['answer'])


Question: What are the challenges mentioned in the summary?
Answer: These models are trained on vast datasets
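
In the run above, the 50-token summary of the AI passage ends before the sentence about challenges, so the extractive QA model can only return a span that actually appears in the summary. To make that information loss visible, the sketch below answers the same question from both the summary and the original text. The function name and return layout are assumptions made for illustration, not part of the original lab:

```python
def answer_from_summary_and_source(summarizer, qa, text, question,
                                   max_length=50, min_length=25):
    """Answer a question twice: once from the summary and once from the full text."""
    summary = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,
    )[0]["summary_text"]
    from_summary = qa(question=question, context=summary)
    from_source = qa(question=question, context=text)
    return {
        "summary": summary,
        "from_summary": (from_summary["answer"], from_summary["score"]),
        "from_source": (from_source["answer"], from_source["score"]),
    }


# Usage with the notebook's objects:
# result = answer_from_summary_and_source(
#     summarizer, qa_pipeline, long_text,
#     "What are the challenges mentioned in the text?",
# )
# print(result["from_summary"])
# print(result["from_source"])
```

If the two answers differ, or the summary-based score is much lower, the summary has probably dropped the detail the question asks about; raising `max_length` is one thing to try.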


**2. Experiment**

* Change the input text and questions to test the robustness of the combined approach.


# **Summary:**

By completing this activity, you have:

* Gained hands-on experience using LLMs for real-world NLP tasks.
* Understood the capabilities and limitations of pre-trained LLMs.
* Appreciated the practical applications of LLMs in summarization and question answering.

This activity builds a practical understanding of LLMs and shows how they apply to real-world tasks.
