<center> <h1> Prompt Engineering with Open-Source Large Language Models (LLMs) using HuggingFace InferenceClient</h1> </center>

<p style="margin-bottom:1cm;"></p>

_____


In this notebook, we will learn how to run open-source LLMs through the HuggingFace InferenceClient. The notebook was written for Google Colab, but it also runs on a local machine. In either case you don't need hardware capable of hosting these models: they run remotely and are accessed over the API.

The HuggingFace [__InferenceClient__](https://huggingface.co/docs/huggingface_hub/guides/inference) provides easy access to thousands of models without worrying about your infrastructure.

The models we will be trying here include:

- __[meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)__ - Meta's powerful 8B parameter instruction-tuned model from the Llama 3.1 family, designed for conversational AI and instruction-following tasks.

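- __[deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324)__ - The March 2025 checkpoint of DeepSeek-V3, a Mixture-of-Experts model with 671B total parameters (about 37B activated per token), used here for general language understanding and generation tasks.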


__All you need is an internet connection, a HuggingFace account, and an API key (access token) to use these models.__


In [None]:
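# Install or upgrade the HuggingFace Hub client; the OpenAI-style
# chat.completions API used below requires a recent release
!pip install -q --upgrade huggingface_hub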

In [1]:
import os
from huggingface_hub import InferenceClient
from IPython.display import display, Markdown



## Get your API Key

Remember to go to your [HuggingFace Account Settings](https://huggingface.co/settings/account) and generate an API key by creating a new token from the [Access Tokens](https://huggingface.co/settings/tokens) section.


## Load HuggingFace API Credentials

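Paste the token you created on the [Access Tokens](https://huggingface.co/settings/tokens) page when prompted. `getpass` hides the input, so the key is not echoed in the notebook output.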

In [2]:
from getpass import getpass

API_KEY = getpass("Enter HuggingFace API Key: ")
os.environ['HF_TOKEN'] = API_KEY
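`getpass` accepts whatever is pasted, including stray spaces or a trailing newline. The sketch below is a hypothetical helper (`clean_hf_token` is not part of `huggingface_hub`) that trims the input and fails early if it does not look like a user access token, assuming the `hf_` prefix that HuggingFace user access tokens currently carry. It only checks the format; whether the token is actually valid is decided by the server.

```python
def clean_hf_token(raw: str) -> str:
    """Trim a pasted token and sanity-check its format (hypothetical helper).

    Assumes HuggingFace user access tokens start with "hf_". Nothing is sent
    to the Hub, so a well-formed but revoked token still passes this check.
    """
    token = raw.strip()  # drop spaces or newlines picked up when pasting
    if not token:
        raise ValueError("No token was entered.")
    if not token.startswith("hf_"):
        raise ValueError("Expected a HuggingFace user access token starting with 'hf_'.")
    return token
```

In the cell above, `os.environ['HF_TOKEN'] = clean_hf_token(API_KEY)` would replace the direct assignment. To confirm the server accepts the token, `huggingface_hub.whoami()` returns your account details and raises an error for an invalid token.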

### Initialize InferenceClient

Here we create InferenceClient instances for our models. The InferenceClient provides a simple interface to access HuggingFace models.

In [3]:
# Client with the token passed explicitly
client_with_auth = InferenceClient(api_key=os.environ["HF_TOKEN"])

# Client without an explicit token: InferenceClient falls back to the
# HF_TOKEN environment variable set above (or a locally saved login token),
# so this client is authenticated as well
client = InferenceClient()

## Define Query Functions

Here we create helper functions to query our models easily.

In [4]:
def query_llama(prompt, system_message=None):
    """
    Query Llama-3.1-8B-Instruct model
    """
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": prompt})
    
    completion = client_with_auth.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=messages,
        max_tokens=1000
    )
    return completion.choices[0].message.content

def query_deepseek(prompt, system_message=None):
    """
    Query DeepSeek-V3-0324 model
    """
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": prompt})
    
    completion = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3-0324",
        messages=messages,
        max_tokens=1000
    )
    return completion.choices[0].message.content
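The two helpers above differ only in the client object and the model ID. A minimal sketch of a single parameterized helper follows; the names `build_messages` and `query_model` are illustrative, not part of `huggingface_hub`. It accepts any client exposing `chat.completions.create(...)`, such as the `InferenceClient` instances created earlier, and forwards extra keyword arguments (for example `temperature`) unchanged.

```python
from typing import Any, Optional


def build_messages(prompt: str, system_message: Optional[str] = None) -> list:
    """Build an OpenAI-style chat message list, with the system message first."""
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": prompt})
    return messages


def query_model(client: Any, model: str, prompt: str,
                system_message: Optional[str] = None,
                max_tokens: int = 1000, **kwargs: Any) -> str:
    """Send one chat request through `client` and return the reply text.

    `client` is anything exposing chat.completions.create(...); extra keyword
    arguments are passed straight through to that call.
    """
    completion = client.chat.completions.create(
        model=model,
        messages=build_messages(prompt, system_message),
        max_tokens=max_tokens,
        **kwargs,
    )
    return completion.choices[0].message.content
```

For example, `query_model(client_with_auth, "meta-llama/Llama-3.1-8B-Instruct", prompt, temperature=0.2)` behaves like `query_llama(prompt)` with a lower sampling temperature, and adding another model only needs a new model ID.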

## Prompting with Open-Source LLM APIs

Now we will use the HuggingFace InferenceClient to try a few prompting tasks.

### 1. Basic Q & A

In [5]:
prompt = """Can you explain what is quantum computing to a 5th grader?"""
print(prompt)

Can you explain what is quantum computing to a 5th grader?


In [6]:
# Query Llama model
print("=" * 50)
print("LLAMA-3.1-8B-INSTRUCT RESPONSE:")
print("=" * 50)
response_llama = query_llama(prompt)
display(Markdown(response_llama))

LLAMA-3.1-8B-INSTRUCT RESPONSE:


Imagine you have a big box of different colored socks. You want to find a specific pair of socks, but they're all mixed up inside the box. A regular computer would look at each sock one by one, and it would take a long time to find the pair you're looking for.

A quantum computer is like a super-smart, magic box that can look at all the socks at the same time. It can even look at all the possible combinations of socks, like "sock 1 with sock 2" or "sock 3 with sock 4." This means it can find the pair of socks you're looking for much, much faster than a regular computer.

But how does it do that? Well, quantum computers use something called "qubits" (say "cue-bits"). Qubits are like special kinds of socks that can be in many different states at the same time. It's like a sock that's both red and blue at the same time!

This means that a quantum computer can process many different possibilities all at once, which makes it really good at solving certain kinds of problems. It's like having a super-powerful, magic calculator that can help us solve really hard puzzles.

So, quantum computing is a way of using special computers to solve problems that are too hard for regular computers. It's like having a magic tool that can help us find the answers to really tricky questions!

In [7]:
# Query DeepSeek model
print("=" * 50)
print("DEEPSEEK-V3 RESPONSE:")
print("=" * 50)
response_deepseek = query_deepseek(prompt)
display(Markdown(response_deepseek))

DEEPSEEK-V3 RESPONSE:


Sure! Here's a simple way to explain **quantum computing** to a 5th grader:

---

### **Imagine a Super-Powerful Computer!**
A **quantum computer** is like a super-smart computer that uses tiny, tiny things called **qubits** (like the "bits" in regular computers, but way cooler!). 

### **Regular Computers vs. Quantum Computers**
- **Regular computers** (like your tablet or laptop) use **bits** that can be either **0 or 1** (like a light switch – ON or OFF).  
- **Quantum computers** use **qubits**, which can be **0, 1, or BOTH at the same time** (like a spinning coin that’s both heads AND tails until you catch it!).  

### **Why Is That Cool?**
Because qubits can be in many states at once, quantum computers can solve **really hard problems** super fast – like:  
- Figuring out the best way to **cure diseases** by testing millions of medicine combinations.  
- Helping scientists invent **new materials** (like unbreakable stuff for spaceships!).  
- Making **super-secret codes** that hackers can’t break.  

### **But… It’s Still Learning!**
Right now, quantum computers are like baby geniuses – they’re **super powerful** but still make mistakes. Scientists are teaching them to get better!  

---

Would you like me to explain any part in a different way? 😊
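Both query helpers cap replies at `max_tokens=1000`. If a model reaches that cap, its answer simply stops mid-sentence, which is easy to miss in a long explanation like the ones above. In the OpenAI-style response returned by `chat.completions.create`, each choice carries a `finish_reason`, and `"length"` is the value typically reported when generation stopped at the token limit. The hypothetical helper below assumes that field and warns when it is set to `"length"`.

```python
import warnings
from typing import Any


def reply_text(completion: Any) -> str:
    """Return the first choice's text, warning if it hit the token limit.

    Assumes an OpenAI-style chat completion whose choices carry
    `finish_reason`; "length" usually means generation stopped at max_tokens.
    """
    choice = completion.choices[0]
    if getattr(choice, "finish_reason", None) == "length":
        warnings.warn("Reply was cut off at max_tokens; consider raising the limit.")
    return choice.message.content
```

Inside `query_llama` and `query_deepseek`, `return reply_text(completion)` could stand in for `return completion.choices[0].message.content`.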

### 2. Report Summarization

In [8]:
report = """
Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.
The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.
On the one hand, this newfound capability has opened up opportunities that include better movie dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally forged images or videos -- and harmful cybersecurity attacks on businesses, including nefarious requests that realistically mimic an employee's boss.
Two additional recent advances that will be discussed in more detail below have played a critical part in generative AI going mainstream: transformers and the breakthrough language models they enabled. Transformers are a type of machine learning that made it possible for researchers to train ever-larger models without having to label all of the data in advance. New models could thus be trained on billions of pages of text, resulting in answers with more depth. In addition, transformers unlocked a new notion called attention that enabled models to track the connections between words across pages, chapters and books rather than just in individual sentences. And not just words: Transformers could also use their ability to track connections to analyze code, proteins, chemicals and DNA.
The rapid advances in so-called large language models (LLMs) -- i.e., models with billions or even trillions of parameters -- have opened a new era in which generative AI models can write engaging text, paint photorealistic images and even create somewhat entertaining sitcoms on the fly. Moreover, innovations in multimodal AI enable teams to generate content across multiple types of media, including text, graphics and video. This is the basis for tools like Dall-E that automatically create images from a text description or generate text captions from images.
These breakthroughs notwithstanding, we are still in the early days of using generative AI to create readable text and photorealistic stylized graphics. Early implementations have had issues with accuracy and bias, as well as being prone to hallucinations and spitting back weird answers. Still, progress thus far indicates that the inherent capabilities of this generative AI could fundamentally change enterprise technology how businesses operate. Going forward, this technology could help write code, design new drugs, develop products, redesign business processes and transform supply chains.
"""

prompt = f"""
Summarize the following report delimited by triple backticks on Generative AI in max 5 lines

Report:
```{report}```
"""

print(prompt)


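Wrapping the report in triple backticks tells the model where the instructions end and the data begins. If the data itself contains triple backticks, though, that boundary becomes ambiguous and part of the input could be read as instructions. A minimal sketch of a reusable template (the helper name `delimited_prompt` is illustrative) that neutralizes the delimiter inside the input before wrapping it; replacing it with triple single quotes is one simple choice among several.

```python
def delimited_prompt(instruction: str, text: str, label: str = "Report",
                     delimiter: str = "```", replacement: str = "'''") -> str:
    """Build "instruction, label, delimited text" (hypothetical helper).

    Occurrences of `delimiter` inside `text` are replaced with `replacement`
    so the input cannot close the delimited block early.
    """
    safe_text = text.replace(delimiter, replacement).strip()
    return f"{instruction}\n\n{label}:\n{delimiter}\n{safe_text}\n{delimiter}\n"
```

For example, `delimited_prompt("Summarize the following report delimited by triple backticks on Generative AI in max 5 lines", report)` builds a prompt equivalent to the one above while guarding the delimiter.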

In [9]:
# Query Llama model
print("=" * 50)
print("LLAMA-3.1-8B-INSTRUCT RESPONSE:")
print("=" * 50)
response_llama = query_llama(prompt)
display(Markdown(response_llama))

LLAMA-3.1-8B-INSTRUCT RESPONSE:


Here is a summary of the report on Generative AI in 5 lines:

Generative AI is a technology that produces various types of content, including text, imagery, and audio. It has become more accessible with new user interfaces and advancements in machine learning algorithms. Generative AI has opened up opportunities in areas like movie dubbing and education, but also raises concerns about deepfakes and cybersecurity attacks. Recent breakthroughs in transformers and large language models have enabled generative AI to create engaging text, photorealistic images, and even videos. This technology has the potential to fundamentally change how businesses operate and could be used for tasks like code writing, product development, and supply chain transformation.

In [10]:
# Query DeepSeek model
print("=" * 50)
print("DEEPSEEK-V3 RESPONSE:")
print("=" * 50)
response_deepseek = query_deepseek(prompt)
display(Markdown(response_deepseek))

DEEPSEEK-V3 RESPONSE:


1. **Generative AI** creates content like text, images, and audio, gaining traction due to user-friendly tools.  
2. Evolved since the 1960s, it advanced with **GANs (2014)** for realistic media, raising concerns like deepfakes.  
3. **Transformers and large language models (LLMs)** enabled deeper analysis, multitasking across text, code, and more.  
4. Innovations like **multimodal AI** power tools (e.g., Dall-E) for cross-media generation but face accuracy and bias issues.  
5. Despite challenges, generative AI promises transformative impacts on industries, from coding to supply chains.
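The two summaries read "max 5 lines" differently: Llama wrote one paragraph of five sentences, while DeepSeek returned five numbered lines. Stating the format explicitly in the prompt (for example, "exactly 5 bullet points, one sentence each") removes the ambiguity, and a quick check makes the constraint testable. The sketch below counts list items. Its regular expression assumes items start with `-`, `*`, `•`, or a number followed by `.` or `)`, which covers common model formatting but not every style.

```python
import re

# A list item at the start of a line: "-", "*", "•", or "1." / "1)" then text.
# "**bold**" at the start of a line does not match, since "*" must be followed by whitespace.
_ITEM = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+\S", re.MULTILINE)


def count_list_items(text: str) -> int:
    """Count bullet or numbered list items in a model reply."""
    return len(_ITEM.findall(text))


def check_item_limit(text: str, limit: int = 5) -> bool:
    """True if the reply is a list with between 1 and `limit` items."""
    n = count_list_items(text)
    return 1 <= n <= limit
```

A reply written as a single paragraph yields zero items and fails the check, which flags that the requested format was not followed.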

### 3. Sentiment Analysis

In [11]:
review = """I recently worked with this real estate company to purchase my first home,
    and the experience was outstanding. The agent was knowledgeable, patient, and incredibly responsive.
    They guided me through every step of the process, making what could have been a stressful
    experience very smooth and enjoyable.
    """

prompt = f"""
Act as a customer review analyst, given the following customer review text,
do the following tasks:
- Find the sentiment (positive, negative or neutral)
- Extract max 5 key topics or phrases of the good or bad in the review
Review Text:
{review}
"""

In [12]:
# Query Llama model
print("=" * 50)
print("LLAMA-3.1-8B-INSTRUCT RESPONSE:")
print("=" * 50)
response_llama = query_llama(prompt)
display(Markdown(response_llama))

LLAMA-3.1-8B-INSTRUCT RESPONSE:


**Sentiment Analysis:**
The sentiment of this review is **Positive**. The customer expresses their satisfaction with the real estate company and their agent, using words such as "outstanding", "knowledgeable", "patient", and "incredibly responsive" to describe their experience.

**Key Topics or Phrases:**

1. **Knowledgeable agent**: The customer praises the agent's knowledge, indicating that they were well-informed and able to provide valuable guidance throughout the process.
2. **Patient and responsive**: The customer appreciates the agent's patience and responsiveness, suggesting that they were easy to communicate with and willing to address any concerns or questions.
3. **Smooth and enjoyable experience**: The customer notes that the agent made the home-buying process "smooth and enjoyable", implying that they were able to mitigate any stress or anxiety associated with the experience.
4. **Guided through every step**: The customer mentions that the agent guided them through every step of the process, indicating that they were proactive and supportive throughout the transaction.
5. **Outstanding experience**: The customer describes their overall experience as "outstanding", emphasizing their high level of satisfaction with the real estate company and their agent.

In [13]:
# Query DeepSeek model
print("=" * 50)
print("DEEPSEEK-V3 RESPONSE:")
print("=" * 50)
response_deepseek = query_deepseek(prompt)
display(Markdown(response_deepseek))

DEEPSEEK-V3 RESPONSE:


### Sentiment Analysis:  
**Positive** – The review expresses strong satisfaction with the real estate company, highlighting an "outstanding" experience and praising the agent's qualities.  

### Key Topics/Phrases (Good Aspects):  
1. **"Outstanding experience"** – Overall high satisfaction with the service.  
2. **"Knowledgeable, patient, and incredibly responsive agent"** – Positive traits of the agent.  
3. **"Guided me through every step"** – Emphasis on thorough support.  
4. **"Made the process smooth and enjoyable"** – Stress-free and positive outcome.  
5. **"First home purchase"** – Context of a milestone transaction handled well.  
5. **"First home purchase"** â€“ Context of a milestone transaction handled well.  

No negative aspects were mentioned in the review.
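The two models answered the same prompt in different layouts (a numbered list with explanations versus headed sections), which is fine for reading but awkward to process in code. A common remedy is to ask for JSON with fixed keys and parse the reply. The sketch below pairs such a prompt with a tolerant parser. The key names `sentiment` and `key_topics` are illustrative choices, and the parser assumes the model may wrap its JSON in a Markdown code fence or surrounding prose, which it skips by taking the span from the first `{` to the last `}`.

```python
import json

# Illustrative prompt: asks for JSON only, with two fixed keys.
SENTIMENT_PROMPT = (
    "Act as a customer review analyst. Read the review between the ### markers "
    "and reply with only a JSON object with two keys: "
    '"sentiment" (one of "positive", "negative", "neutral") and '
    '"key_topics" (a list of at most 5 short phrases).\n'
    "###\n{review}\n###"
)

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_json_reply(reply: str) -> dict:
    """Extract and validate the JSON object in a model reply.

    Parses the span from the first "{" to the last "}", skipping fences or
    prose around it; assumes the reply holds a single JSON object. Raises
    ValueError on bad output (json.JSONDecodeError is a ValueError subclass).
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the reply.")
    data = json.loads(reply[start:end + 1])
    sentiment = str(data.get("sentiment", "")).strip().lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"Unexpected sentiment: {data.get('sentiment')!r}")
    topics = data.get("key_topics")
    if not isinstance(topics, list):
        raise ValueError("'key_topics' must be a list.")
    # Normalize the label and enforce the 5-topic cap requested in the prompt.
    return {"sentiment": sentiment, "key_topics": [str(t) for t in topics[:5]]}
```

Usage would look like `parse_json_reply(query_llama(SENTIMENT_PROMPT.format(review=review)))`; catching `ValueError` gives one place to retry or fall back when a model ignores the format.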

## Bonus: Comparing Both Models Side-by-Side

In [14]:
def compare_models(prompt, system_message=None):
    """
    Print the prompt, then each model's response in turn, for comparison
    """
    print("=" * 70)
    print("PROMPT:")
    print("=" * 70)
    print(prompt)
    print("\n")
    
    print("=" * 70)
    print("LLAMA-3.1-8B-INSTRUCT RESPONSE:")
    print("=" * 70)
    response_llama = query_llama(prompt, system_message)
    display(Markdown(response_llama))
    print("\n")
    
    print("=" * 70)
    print("DEEPSEEK-V3 RESPONSE:")
    print("=" * 70)
    response_deepseek = query_deepseek(prompt, system_message)
    display(Markdown(response_deepseek))

# Example usage
test_prompt = "Write a haiku about artificial intelligence."
compare_models(test_prompt)

PROMPT:
Write a haiku about artificial intelligence.


LLAMA-3.1-8B-INSTRUCT RESPONSE:


Metal minds awake
Learning, growing, thinking fast
Future's digital



DEEPSEEK-V3 RESPONSE:


**Silent circuits hum,**  
**learning fast like morning light –**  
**mind without a mind.**
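`compare_models` prints the replies one after the other and then discards them. When comparing more models or saving results, it can help to collect the replies in a dictionary together with wall-clock latency. The sketch below takes a `query_fn(model, prompt)` callable (a hypothetical parameter) so it can wrap any of the helpers in this notebook, and it records an error for a model that fails instead of stopping the whole comparison.

```python
import time
from typing import Callable


def collect_responses(query_fn: Callable[[str, str], str],
                      models: list, prompt: str) -> dict:
    """Query each model in turn; return {model: {"text", "seconds", "error"}}."""
    results = {}
    for model in models:
        start = time.perf_counter()
        try:
            text, error = query_fn(model, prompt), None
        except Exception as exc:  # record the failure (e.g. model unavailable) and continue
            text, error = None, repr(exc)
        results[model] = {
            "text": text,
            "seconds": round(time.perf_counter() - start, 2),
            "error": error,
        }
    return results
```

With the existing helpers, a dispatch such as `collect_responses(lambda m, p: {"llama": query_llama, "deepseek": query_deepseek}[m](p), ["llama", "deepseek"], test_prompt)` returns both haikus along with how long each request took.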