<center> <h1> Prompt Engineering with Open-Source Large Language Models (LLMs) using HuggingFace InferenceClient</h1> </center>

<p style="margin-bottom:1cm;"></p>

_____


In this notebook, we will learn how to run open-source LLMs through the HuggingFace InferenceClient. You can run it in Colab or on your own machine: the models execute on remote inference infrastructure, so you do not need local hardware capable of hosting them.

The HuggingFace [__InferenceClient__](https://huggingface.co/docs/huggingface_hub/guides/inference) gives you access to thousands of hosted models through a single Python interface.

The models we will be trying here include:

- __[meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)__ - Meta's 8B-parameter instruction-tuned model from the Llama 3.1 family, designed for conversational AI and instruction-following tasks.

- __[deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324)__ - The 0324 checkpoint of DeepSeek's V3 model, a large general-purpose chat model for language understanding and generation tasks.


__You just need an internet connection, a HuggingFace account, and an API key (access token) to use these models.__


In [None]:
# Install required package
!pip install huggingface_hub -q

In [5]:
import os
from huggingface_hub import InferenceClient
from IPython.display import display, Markdown

## Get your API Key

Remember to go to your [HuggingFace Account Settings](https://huggingface.co/settings/account) and generate an API key by creating a new token from the [Access Tokens](https://huggingface.co/settings/tokens) section.


## Load HuggingFace API Credentials

When prompted, paste the token you created on the [Access Tokens](https://huggingface.co/settings/tokens) page.

In [2]:
from getpass import getpass

API_KEY = getpass("Enter HuggingFace API Key: ")
os.environ['HF_TOKEN'] = API_KEY

Enter HuggingFace API Key:  ········
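
If you rerun the notebook often, typing the key every time gets tedious. The sketch below checks the `HF_TOKEN` environment variable first and only prompts when it is missing. The helper name `load_hf_token` and its `env` and `prompt_fn` parameters are illustrative assumptions, not part of `huggingface_hub`.

```python
import os
from getpass import getpass


def load_hf_token(env=None, prompt_fn=getpass):
    """Return the HF token from the environment, prompting only if it is missing.

    `env` and `prompt_fn` are hypothetical hooks that make the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # Fall back to an interactive prompt; getpass hides the typed characters.
        token = prompt_fn("Enter HuggingFace API Key: ").strip()
    if not token:
        raise ValueError("No HuggingFace token provided")
    return token


# Typical use in this notebook:
# os.environ["HF_TOKEN"] = load_hf_token()
```

Stripping whitespace guards against a stray newline or space picked up when pasting the token.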


### Initialize InferenceClient

Here we create InferenceClient instances for our models. The InferenceClient provides a simple interface to access HuggingFace models.

In [3]:
# Client with the token passed explicitly
client_with_auth = InferenceClient(api_key=os.environ["HF_TOKEN"])

# Client without an explicit key; huggingface_hub falls back to the HF_TOKEN
# environment variable (set above) or a locally saved token, if one exists
client = InferenceClient()

## Define Query Functions

Here we create helper functions to query our models easily.

In [4]:
def query_llama(prompt, system_message=None):
    """
    Query Llama-3.1-8B-Instruct model
    """
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": prompt})
    
    completion = client_with_auth.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=messages,
        max_tokens=1000
    )
    return completion.choices[0].message.content

def query_deepseek(prompt, system_message=None):
    """
    Query DeepSeek-V3-0324 model
    """
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": prompt})
    
    completion = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3-0324",
        messages=messages,
        max_tokens=1000
    )
    return completion.choices[0].message.content
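
The two helpers above differ only in the client and the model name, so they can be folded into one function. The following is a minimal sketch of that refactor; `build_messages` and `query_model` are hypothetical names introduced here, and the client is passed in as an argument so that any object exposing `chat.completions.create` will work.

```python
def build_messages(prompt, system_message=None):
    """Build the chat message list: optional system message, then the user prompt."""
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": prompt})
    return messages


def query_model(client, model, prompt, system_message=None, max_tokens=1000):
    """Send one chat request to `model` through `client` and return the reply text."""
    completion = client.chat.completions.create(
        model=model,
        messages=build_messages(prompt, system_message),
        max_tokens=max_tokens,
    )
    return completion.choices[0].message.content


# Example with the clients defined earlier in this notebook:
# query_model(client_with_auth, "meta-llama/Llama-3.1-8B-Instruct",
#             "Explain quantum computing.", system_message="Answer in two sentences.")
```

The commented call also shows the `system_message` argument, which the original helpers accept but the examples in this notebook never use.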

## Prompting with Open-Source LLM APIs

Now we will use the HuggingFace InferenceClient to try a few common prompting tasks.

### 1. Basic Q & A

In [6]:
prompt = """Can you explain what is quantum computing to a 5th grader?"""
print(prompt)

Can you explain what is quantum computing to a 5th grader?


In [7]:
# Query Llama model
print("=" * 50)
print("LLAMA-3.1-8B-INSTRUCT RESPONSE:")
print("=" * 50)
response_llama = query_llama(prompt)
display(Markdown(response_llama))

LLAMA-3.1-8B-INSTRUCT RESPONSE:


Imagine you have a big box of LEGOs, and inside the box, there are lots of different colored blocks. Each block represents a piece of information, like a number or a letter.

**Classical Computers**
Now, imagine you want to use a special machine to build a really cool castle with those LEGO blocks. A classical computer is like a machine that follows a set of instructions to build the castle, one block at a time. It's like following a recipe to make a cake. You need to put the blocks together in a specific order, and the machine will do it step by step.

**Quantum Computers**
But, what if the machine could build the castle really, really fast? Like, in a flash of a second? That's kind of like a quantum computer. It's a machine that can use special powers called "quantum" to build the castle in a super-fast way.

The thing about quantum computers is that they can try many different ways to build the castle at the same time! It's like the machine has a million different LEGO builders inside it, and each one is building a different castle. And then, the machine can look at all the castles and pick the best one.

This is because quantum computers use something called "quantum bits" or "qubits." Qubits are like special LEGO blocks that can be many different colors at the same time! It's like a block that's both red and blue and green all at once.

So, when we use a quantum computer, it can try many different possibilities at the same time, and that makes it super powerful. It's like having a million LEGO builders working together to build the most amazing castle ever!

But, it's not just LEGO castles. Quantum computers can help us solve really hard problems in science and mathematics, like understanding the universe and creating new medicines.

That's what quantum computing is in a nutshell!

In [8]:
# Query DeepSeek model
print("=" * 50)
print("DEEPSEEK-V3 RESPONSE:")
print("=" * 50)
response_deepseek = query_deepseek(prompt)
display(Markdown(response_deepseek))

DEEPSEEK-V3 RESPONSE:


Of course! Imagine you have a super special kind of computer -- a **quantum computer** -- that doesn't just use regular "bits" (like tiny switches that are either ON or OFF). Instead, it uses **quantum bits**, or **qubits** (say: "kyoo-bits"), which are like magical switches that can be ON, OFF, **or both at the same time**!  

### Here's how it's different:  
1. **Regular Computer**:  
   - Thinks in 0s (OFF) and 1s (ON), like flipping a coin to heads or tails.  
   - Solves problems one step at a time (like reading a book page by page).  

2. **Quantum Computer**:  
   - Uses qubits that can be 0, 1, **or a mix of both** (like a spinning coin that's both heads *and* tails until you catch it).  
   - Can test **many answers at once** (like reading every page in a book at the same time).  

### Why is this cool?  
Quantum computers can solve **super hard puzzles** (like cracking secret codes, helping design new medicines, or making AI smarter) much faster than regular computers. But they're still being built -- scientists are teaching them how to work without making too many mistakes!  

Think of it like a superhero computer that uses **quantum magic** (really, just physics!) to do amazing things. 🚀✨  

Would you like a fun example or analogy to explain further? 😊

### 2. Report Summarization

In [9]:
report = """
Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.
The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.
On the one hand, this newfound capability has opened up opportunities that include better movie dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally forged images or videos -- and harmful cybersecurity attacks on businesses, including nefarious requests that realistically mimic an employee's boss.
Two additional recent advances that will be discussed in more detail below have played a critical part in generative AI going mainstream: transformers and the breakthrough language models they enabled. Transformers are a type of machine learning that made it possible for researchers to train ever-larger models without having to label all of the data in advance. New models could thus be trained on billions of pages of text, resulting in answers with more depth. In addition, transformers unlocked a new notion called attention that enabled models to track the connections between words across pages, chapters and books rather than just in individual sentences. And not just words: Transformers could also use their ability to track connections to analyze code, proteins, chemicals and DNA.
The rapid advances in so-called large language models (LLMs) -- i.e., models with billions or even trillions of parameters -- have opened a new era in which generative AI models can write engaging text, paint photorealistic images and even create somewhat entertaining sitcoms on the fly. Moreover, innovations in multimodal AI enable teams to generate content across multiple types of media, including text, graphics and video. This is the basis for tools like Dall-E that automatically create images from a text description or generate text captions from images.
These breakthroughs notwithstanding, we are still in the early days of using generative AI to create readable text and photorealistic stylized graphics. Early implementations have had issues with accuracy and bias, as well as being prone to hallucinations and spitting back weird answers. Still, progress thus far indicates that the inherent capabilities of this generative AI could fundamentally change enterprise technology how businesses operate. Going forward, this technology could help write code, design new drugs, develop products, redesign business processes and transform supply chains.
"""

prompt = f"""
Summarize the following report delimited by triple backticks on Generative AI in max 5 lines

Report:
```{report}```
"""

print(prompt)


Summarize the following report delimited by triple backticks on Generative AI in max 5 lines

Report:
```
Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.
The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.
On the one hand, this newfound capability has opened up opportunities that include better movie dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally forged images or videos -- and harmfu

In [10]:
# Query Llama model
print("=" * 50)
print("LLAMA-3.1-8B-INSTRUCT RESPONSE:")
print("=" * 50)
response_llama = query_llama(prompt)
display(Markdown(response_llama))

LLAMA-3.1-8B-INSTRUCT RESPONSE:


Here is a summary of the report in 5 lines, delimited by triple backticks:

```Generative AI is a type of artificial intelligence technology that can produce various content types, including text, imagery, and audio. 
The technology has improved significantly with the introduction of generative adversarial networks (GANs) and breakthrough language models. 
Generative AI has opened up opportunities for better movie dubbing, educational content, and product design, but also raises concerns about deepfakes and cybersecurity attacks. 
Recent advances in transformers and large language models (LLMs) have enabled generative AI models to write engaging text, paint photorealistic images, and create entertaining content. 
The technology has the potential to fundamentally change enterprise technology and business operations, but also poses challenges related to accuracy, bias, and hallucinations.```

In [11]:
# Query DeepSeek model
print("=" * 50)
print("DEEPSEEK-V3 RESPONSE:")
print("=" * 50)
response_deepseek = query_deepseek(prompt)
display(Markdown(response_deepseek))

DEEPSEEK-V3 RESPONSE:


1. Generative AI creates content like text, images, and audio, gaining traction due to user-friendly tools.  
2. Originating in the 1960s, it advanced in 2014 with GANs, enabling realistic synthetic media.  
3. While offering benefits like better dubbing, it raises concerns like deepfakes and cyber threats.  
4. Key breakthroughs -- transformers and large language models (LLMs) -- enhanced scalability and multimodal content generation.  
5. Despite early challenges (bias, hallucinations), generative AI has transformative potential for industries.
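
The prompt asks for at most 5 lines, but nothing in the notebook checks whether a model respected that limit. Below is a rough sketch of a prompt builder and a line-count check. The names `build_summary_prompt`, `count_summary_lines`, and `within_line_limit` are hypothetical, and counting non-empty lines is only one possible reading of "lines".

```python
FENCE = "`" * 3  # triple backticks, built programmatically so this example nests cleanly


def build_summary_prompt(report, max_lines=5):
    """Wrap the report in triple-backtick delimiters and state the length limit."""
    return (
        "Summarize the report delimited by triple backticks "
        f"in at most {max_lines} lines.\n\n"
        f"Report:\n{FENCE}\n{report.strip()}\n{FENCE}\n"
    )


def count_summary_lines(summary):
    """Count non-empty lines, ignoring lines that contain only backticks."""
    return sum(1 for line in summary.splitlines() if line.strip().strip("`"))


def within_line_limit(summary, max_lines=5):
    """Return True when the summary has no more than `max_lines` counted lines."""
    return count_summary_lines(summary) <= max_lines
```

Note that an introductory sentence such as "Here is a summary..." counts as a line under this check, so a reply that adds one may fail the limit even when the summary itself fits.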

### 3. Sentiment Analysis

In [12]:
review = """I recently worked with this real estate company to purchase my first home,
    and the experience was outstanding. The agent was knowledgeable, patient, and incredibly responsive.
    They guided me through every step of the process, making what could have been a stressful
    experience very smooth and enjoyable.
    """

prompt = f"""
Act as a customer review analyst, given the following customer review text,
do the following tasks:
- Find the sentiment (positive, negative or neutral)
- Extract max 5 key topics or phrases of the good or bad in the review
Review Text:
{review}
"""

In [13]:
# Query Llama model
print("=" * 50)
print("LLAMA-3.1-8B-INSTRUCT RESPONSE:")
print("=" * 50)
response_llama = query_llama(prompt)
display(Markdown(response_llama))

LLAMA-3.1-8B-INSTRUCT RESPONSE:


**Sentiment Analysis:**
The sentiment of this customer review is **Positive**. The customer expresses their outstanding experience with the real estate company, highlighting the agent's knowledge, patience, and responsiveness. The review also mentions that the agent made the process smooth and enjoyable, indicating a positive outcome.

**Key Topics or Phrases:**

1. **Knowledgeable agent**: The customer praises the agent's knowledge, suggesting that they are well-informed and capable of handling the purchase process.
2. **Patient and responsive**: The customer appreciates the agent's patience and responsiveness, indicating that they were easy to communicate with and willing to address their concerns.
3. **Guided through every step**: The customer mentions that the agent guided them through every step of the process, suggesting a high level of support and expertise.
4. **Stress-free experience**: The customer states that the experience was "smooth and enjoyable," implying that the agent helped to minimize stress and make the process more enjoyable.
5. **Outstanding experience**: The customer uses the phrase "outstanding experience" to summarize their positive experience with the real estate company and the agent.

In [14]:
# Query DeepSeek model
print("=" * 50)
print("DEEPSEEK-V3 RESPONSE:")
print("=" * 50)
response_deepseek = query_deepseek(prompt)
display(Markdown(response_deepseek))

DEEPSEEK-V3 RESPONSE:


### Sentiment Analysis:  
**Positive**  

### Key Topics/Phrases (Good Aspects):  
1. **Outstanding experience**  
2. **Knowledgeable agent**  
3. **Patient and responsive service**  
4. **Guided through every step**  
5. **Stress-free and enjoyable process**
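
The two replies above carry the same information in different layouts, which makes them awkward to process in code. One common approach is to ask for JSON and parse the reply. The sketch below assumes the model follows that instruction; `build_sentiment_prompt` and `parse_sentiment_reply` are hypothetical helpers, and the key names `sentiment` and `topics` are choices made here for illustration.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def build_sentiment_prompt(review, max_topics=5):
    """Ask for the same analysis as above, but as a JSON object."""
    return (
        "Act as a customer review analyst. Analyze the review below and reply "
        'with only a JSON object that has two keys: "sentiment" '
        '(positive, negative or neutral) and "topics" '
        f"(a list of at most {max_topics} short phrases).\n\n"
        f"Review Text:\n{review.strip()}\n"
    )


def parse_sentiment_reply(reply):
    """Extract and validate the JSON object from a model reply.

    Models often wrap JSON in code fences or add extra text, so we take the
    span from the first '{' to the last '}' before parsing.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    sentiment = str(data.get("sentiment", "")).strip().lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"Unexpected sentiment: {sentiment!r}")
    topics = [str(topic).strip() for topic in data.get("topics", [])]
    return {"sentiment": sentiment, "topics": topics}


# Example with the helpers defined earlier in this notebook:
# result = parse_sentiment_reply(query_llama(build_sentiment_prompt(review)))
```

Parsing can still fail if a model ignores the format, so callers should be ready to catch `ValueError` and retry or fall back to the free-text reply.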

## Bonus: Comparing Both Models Side-by-Side

In [15]:
def compare_models(prompt, system_message=None):
    """
    Compare responses from both models side by side
    """
    print("=" * 70)
    print("PROMPT:")
    print("=" * 70)
    print(prompt)
    print("\n")
    
    print("=" * 70)
    print("LLAMA-3.1-8B-INSTRUCT RESPONSE:")
    print("=" * 70)
    response_llama = query_llama(prompt, system_message)
    display(Markdown(response_llama))
    print("\n")
    
    print("=" * 70)
    print("DEEPSEEK-V3 RESPONSE:")
    print("=" * 70)
    response_deepseek = query_deepseek(prompt, system_message)
    display(Markdown(response_deepseek))

# Example usage
test_prompt = "Write a haiku about artificial intelligence."
compare_models(test_prompt)

PROMPT:
Write a haiku about artificial intelligence.


LLAMA-3.1-8B-INSTRUCT RESPONSE:


Metal minds awaken
Intelligence in each code
Future's digital



DEEPSEEK-V3 RESPONSE:


**Silent circuits hum --**  
**learning, thinking, growing fast,**  
**mind without a soul.**
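
`compare_models` prints its results and is hard-wired to two models. If you want to keep the responses for later inspection, or add a third model, a variant that returns a dictionary may be more convenient. This is a sketch only: `collect_responses` is a hypothetical helper, and it assumes each query function takes `(prompt, system_message)` like the ones defined earlier.

```python
def collect_responses(prompt, query_fns, system_message=None):
    """Run the same prompt through several query functions.

    `query_fns` maps a display name to a callable taking (prompt, system_message).
    A failure in one model is recorded as text so the others still run.
    """
    results = {}
    for name, query_fn in query_fns.items():
        try:
            results[name] = query_fn(prompt, system_message)
        except Exception as exc:  # broad on purpose: network, auth, or quota errors
            results[name] = f"[error: {exc}]"
    return results


# Example with the helpers defined earlier in this notebook:
# responses = collect_responses(
#     "Write a haiku about artificial intelligence.",
#     {"Llama-3.1-8B-Instruct": query_llama, "DeepSeek-V3": query_deepseek},
# )
# for name, text in responses.items():
#     print(name, "->", text)
```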