# **Week 3 Hands-on Lab: Experimenting With Large Language Models**

**Introduction:**

In this hands-on python notebook, we willl be experimenting with LLMs.
This will help you:
1.	Use a pre-trained LLM from the Hugging Face library for text summarization.
2.	Implement a question-answering task using a pre-trained LLM.
3.	Understand how LLMs perform NLP tasks in real-world scenarios.


# **Part 1: Text Summarization**

**1. Import Necessary Libraries**

We will be using the [Transformers](https://huggingface.co/docs/transformers/en/index) library from [Hugging Face](https://huggingface.co/). The Transformers library provides APIs and tools to easily download and train state-of-the-art pretrained models.

In [1]:
from transformers import pipeline

**2. Set up the Summarization Pipeline**

In [20]:
# Load the summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Input a long text for summarization
long_text = """
Artificial intelligence (AI) is rapidly transforming industries, from healthcare and education to finance and entertainment.
Generative AI models, such as Large Language Models (LLMs), are at the forefront of this transformation. These models are
trained on vast datasets and can generate human-like text, enabling applications like automated customer support,
personalized education tools, and content creation. Despite their potential, challenges such as bias, ethical concerns,
and the environmental impact of large-scale training persist. Addressing these challenges is crucial for the responsible
deployment of AI technologies in the future.
"""

# Generate a summary
summary = summarizer(long_text, max_length=50, min_length=25, do_sample=False)
print("Summary:")
print(summary[0]['summary_text'])


config.json:   0%|          | 0.00/1.58k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cuda:0


Summary:
Artificial intelligence (AI) is rapidly transforming industries, from healthcare and education to finance and entertainment. Generative AI models, such as Large Language Models (LLMs), are at the forefront of this transformation. Despite their potential, challenges


**3. Experiment**

* Replace long_text with any article or paragraph of your choice.
* Try different max_length and min_length values to see how they affect the summary.


# Added A TExt from Week1 on Alpha Fold that [P1_FE_Practical Exercise (Case Study) AlphaFold](https://raw.githubusercontent.com/falawar7/AAI_633O/refs/heads/main/Week3/P1_LLM.txt)

In [28]:
from transformers import pipeline
import requests

# Load the summarization pipeline
summarizer = pipeline("summarization")

# Input a long text for summarization (URL)
FE_text_url = "https://raw.githubusercontent.com/falawar7/AAI_633O/refs/heads/main/Week3/P1_LLM.txt"

# Fetch the text content from the URL
response = requests.get(FE_text_url)
FE_text = response.text

# Generate a summary
summary1 = summarizer(FE_text, max_length=500, min_length=25, do_sample=False)
print("Summary1:")
print(summary1[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


Summary1:


KeyError: 'summary_text1'

**Added Diffent Model : facebook/bart-large-cnn**

In [25]:
from transformers import pipeline
import requests

# Load the summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Input a long text for summarization (URL)
FE_text_url = "https://raw.githubusercontent.com/falawar7/AAI_633O/refs/heads/main/Week3/P1_LLM.txt"

# Fetch the text content from the URL
response = requests.get(FE_text_url)
FE_text = response.text

# Generate a summary
summary2 = summarizer(FE_text, max_length=4200, min_length=150, do_sample=False)
print("Summary1:")
print(summary[0]['summary_text2'])

Device set to use cuda:0


Summary:
Alpha fold AI was developed by Google Deep mind, the first non-experimental method that can rapidly accomplish accuracy with comparable experiments. It can predict the 3D structures of proteins based on their amino acid sequences. Researchers have used AlphaFold to understand vitellogenin, a protein fundamental to the immune system of honeybees.


**Sumamry on first one:** Alpha fold AI was developed by Google Deep mind, the first non-experimental method that can rapidly accomplish accuracy with comparable experiments . It can predict the 3D structures of proteins based on their amino acid sequences . AlphaFold could help us face up to the challenge of cleaning up our world .

Summary on the second model

# **Part 2: Question Answering**

**1.	Set Up the Question-Answering Pipeline**

In [None]:
# Load the question-answering pipeline
qa_pipeline = pipeline("question-answering")

# Input context and questions
context = """
The Large Language Model (LLM) GPT-3, developed by OpenAI, is known for its exceptional ability to generate human-like text.
It uses the Transformer architecture and has 175 billion parameters, making it one of the largest AI models in the world.
LLMs like GPT-3 are widely used in applications such as content creation, summarization, and question answering.
"""

question = "What architecture does GPT-3 use?"

# Get the answer
answer = qa_pipeline(question=question, context=context)
print("Answer:")
print(answer['answer'])


**2.	Experiment**

* Modify the context and question variables with your own text and queries.
* Observe how the model adjusts its answers based on the provided input


# **Part 3: Combine Summarization and Question Answering**

**1. Pipeline Integration**

Combine the two pipelines to first summarize a long text and then extract answers to specific questions from the summary.


In [None]:
# Summarize the text
summary = summarizer(long_text, max_length=50, min_length=25, do_sample=False)[0]['summary_text']

# Define a question based on the summary
question = "What are the challenges mentioned in the summary?"

# Use the QA pipeline to extract the answer
answer = qa_pipeline(question=question, context=summary)
print("Question:", question)
print("Answer:", answer['answer'])


**2. Experiment**

•	Change the input text and questions to test the robustness of the combined approach.


# **Summary:**

By completing this activity, you have:

* Gained hands-on experience using LLMs for real-world NLP tasks.
* Understood the capabilities and limitations of pre-trained LLMs.
* Appreciated the practical applications of LLMs in summarization and question answering.

This activity ensures practical understanding of LLMs while showcasing their real-world relevance. Let me know if you’d like additional extensions!
