<h1>Chapter 7 - Easy Tasks</h1>
<i>Going beyond prompt engineering.</i>


In [1]:
%pip install "langchain>=0.1.17" "openai>=1.13.3" "langchain_openai>=0.1.6" "transformers>=4.40.1" "datasets>=2.18.0" "accelerate>=0.27.2" "sentence-transformers>=2.5.1" "duckduckgo-search>=5.2.2" langchain_community
%pip install llama-cpp-python==0.2.69

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


# Loading the LLMs

In [None]:
# Download the 16-bit variant model from Hugging Face
!wget https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp16.gguf

# If this command does not work for you, you can use the link directly to download the model
# https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp16.gguf

In [3]:
# Keep the same pinned version as the first install cell
%pip install llama-cpp-python==0.2.69

Note: you may need to restart the kernel to use updated packages.


In [None]:
# Note: no later cell loads this fp8 file; only the fp16 and q4 files are used below
!wget https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp8.gguf

In [16]:
from langchain_community.llms import LlamaCpp

# Make sure the model path is correct for your system!
model_16_bit = LlamaCpp(
    model_path="Phi-3-mini-4k-instruct-fp16.gguf",
    n_gpu_layers=-1,
    max_tokens=500,
    n_ctx=2048,
    seed=42,
    verbose=False
)

### Load the 4-bit Version

In [None]:
!wget https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf

In [17]:
# Load the 4-bit quantized model downloaded in the previous cell
# Make sure the model path is correct for your system!
model_4_bit = LlamaCpp(
    model_path="Phi-3-mini-4k-instruct-q4.gguf",
    n_gpu_layers=-1,
    max_tokens=500,
    n_ctx=2048,
    seed=42,
    verbose=False
)
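
To get a feel for what the two precisions mean for memory, the following is a rough back-of-the-envelope sketch, not a measurement. It assumes Phi-3-mini has about 3.8 billion parameters and counts only the raw weights; the function name `estimate_weight_size_gb` is hypothetical. Real GGUF files will differ, since quantized formats also store scaling data and may keep some tensors at higher precision.

```python
def estimate_weight_size_gb(n_params, bits_per_weight):
    """Approximate size of the raw weights in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: roughly 3.8 billion parameters for Phi-3-mini
N_PARAMS = 3.8e9

size_16bit_gb = estimate_weight_size_gb(N_PARAMS, 16)
size_4bit_gb = estimate_weight_size_gb(N_PARAMS, 4)

print(f"16-bit weights: ~{size_16bit_gb:.1f} GB")
print(f"4-bit weights:  ~{size_4bit_gb:.1f} GB")
```

Under these assumptions the 4-bit weights need about a quarter of the memory of the 16-bit weights, which is the main reason a quantized model can be faster to load and run.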

### Test the Performance of Each Model

In [19]:
import time

In [14]:
test_prompts = [
    "What is 2 + 2?",
    "Write a haiku about winter.",
    "Explain what AI is in one sentence.",
    "Name three colors.",
    "What's the capital of France?"
]

### Test on the First Prompt

In [43]:
# Test 16-bit model on first prompt
prompt = test_prompts[0]  # "What is 2 + 2?"
print(f"Testing: {prompt}")

start_time = time.time()
response_16bit_1 = model_16_bit.invoke(prompt)
end_time = time.time()
time_16bit_1 = end_time - start_time

print(f"16-bit Response: {response_16bit_1}")
print(f"Time taken: {time_16bit_1:.2f} seconds")

Testing: What is 2 + 2?
16-bit Response: 
<|assistant|> The sum of 2 and 2 is 4.

This question falls into the category of basic arithmetic, which has a clear-cut answer that doesn't require opinion or subjective interpretation. It's essential in mathematics to understand such fundamental operations because they form the building blocks for more complex calculations.
Time taken: 13.55 seconds


In [44]:
# Test 4-bit model on first prompt
prompt = test_prompts[0]  # "What is 2 + 2?"
print(f"Testing: {prompt}")

start_time = time.time()
response_4bit_1 = model_4_bit.invoke(prompt)
end_time = time.time()
time_4bit_1 = end_time - start_time

print(f"4-bit Response: {response_4bit_1}")
print(f"Time taken: {time_4bit_1:.2f} seconds")

Testing: What is 2 + 2?
4-bit Response: 
<|assistant|> The sum of 2 and 2 is 4.

Here's a brief explanation: When you add the number 2 to another number 2, you get 4 as the result because addition combines quantities together. In mathematical terms:

2 + 2 = 4

This operation follows the basic principles of arithmetic and applies universally across mathematics.
Time taken: 6.99 seconds
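
The timing cells repeat the same start/invoke/stop pattern for every model and prompt. As a minimal sketch, that pattern could be wrapped in one helper. The names `timed_invoke` and `EchoModel` are hypothetical; `EchoModel` is only a stand-in so the example runs without a GGUF file, and any object with an `invoke` method (such as `model_16_bit` or `model_4_bit`) could be passed instead. The sketch uses `time.perf_counter()`, which is better suited to measuring intervals than `time.time()`.

```python
import time


def timed_invoke(model, prompt):
    """Call model.invoke(prompt) and return (response, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    response = model.invoke(prompt)
    elapsed = time.perf_counter() - start
    return response, elapsed


class EchoModel:
    """Hypothetical stand-in model so the sketch runs without a GGUF file."""

    def invoke(self, prompt):
        return f"echo: {prompt}"


response, seconds = timed_invoke(EchoModel(), "What is 2 + 2?")
print(f"Response: {response}")
print(f"Time taken: {seconds:.2f} seconds")
```

With the real models, a call such as `timed_invoke(model_4_bit, test_prompts[0])` would replace the five timing lines in each test cell.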


### Test on the Second Prompt

In [45]:
# Test 16-bit model on second prompt
prompt = test_prompts[1]  # "Write a haiku about winter."
print(f"Testing: {prompt}")

start_time = time.time()
response_16bit_2 = model_16_bit.invoke(prompt)
end_time = time.time()
time_16bit_2 = end_time - start_time

print(f"16-bit Response: {response_16bit_2}")
print(f"Time taken: {time_16bit_2:.2f} seconds")

Testing: Write a haiku about winter.
16-bit Response: 
<|assistant|> Snowflakes gently fall,  
White blanket covers the earth,  
Winter's peace enwraps.
Time taken: 6.29 seconds


In [46]:
# Test 4-bit model on second prompt
prompt = test_prompts[1]  # "Write a haiku about winter."
print(f"Testing: {prompt}")

start_time = time.time()
response_4bit_2 = model_4_bit.invoke(prompt)
end_time = time.time()
time_4bit_2 = end_time - start_time

print(f"4-bit Response: {response_4bit_2}")
print(f"Time taken: {time_4bit_2:.2f} seconds")

Testing: Write a haiku about winter.
4-bit Response: 
<|assistant|> Snowflakes gently fall,  

Winter's silent whispers call,  

Blanketing the world in white hush.
Time taken: 3.26 seconds


### Test on the Third Prompt

In [47]:
# Test 16-bit model on third prompt
prompt = test_prompts[2]  # "Explain what AI is in one sentence."
print(f"Testing: {prompt}")

start_time = time.time()
response_16bit_3 = model_16_bit.invoke(prompt)
end_time = time.time()
time_16bit_3 = end_time - start_time

print(f"16-bit Response: {response_16bit_3}")
print(f"Time taken: {time_16bit_3:.2f} seconds")

Testing: Explain what AI is in one sentence.
16-bit Response:  
Time taken: 1.36 seconds


In [48]:
# Test 4-bit model on third prompt
prompt = test_prompts[2]  # "Explain what AI is in one sentence."
print(f"Testing: {prompt}")

start_time = time.time()
response_4bit_3 = model_4_bit.invoke(prompt)
end_time = time.time()
time_4bit_3 = end_time - start_time

print(f"4-bit Response: {response_4bit_3}")
print(f"Time taken: {time_4bit_3:.2f} seconds")

Testing: Explain what AI is in one sentence.
4-bit Response:  <|assistant|> AI, or artificial intelligence, refers to the development of computer systems that can perform tasks and make decisions autonomously, mimicking human cognition and problem-solving abilities.

A more detailed explanation: Artificial Intelligence (AI) is a branch of computer science focused on creating machines and software programs capable of performing tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and natural language understanding, while exhibiting learning, reasoning, and adaptive behavior. AI encompasses various subfields like machine learning, deep learning, computer vision, natural language processing, robotics, and more, which are designed to enable machines to improve their performance over time through data analysis and experience.
Time taken: 13.58 seconds


### Test on the Fourth Prompt

In [49]:
# Test 16-bit model on fourth prompt

prompt = test_prompts[3]  # "Name three colors."
print(f"Testing: {prompt}")

start_time = time.time()
response_16bit_4 = model_16_bit.invoke(prompt)
end_time = time.time()
time_16bit_4 = end_time - start_time

print(f"16-bit Response: {response_16bit_4}")
print(f"Time taken: {time_16bit_4:.2f} seconds")

Testing: Name three colors.
16-bit Response: 
<|assistant|> 1. Blue 2. Red 3. Yellow
<|assistant|> Certainly! Here are three colors:
1. Azure - a bright, clear blue similar to the color of a cloudless sky on a sunny day.
2. Ruby red - a deep, rich shade of red that is reminiscent of the precious gemstone.
3. Lemon yellow - a vivid and energetic shade of yellow, resembling the color of ripe lemons.
Time taken: 20.99 seconds


In [50]:
# Test 4-bit model on fourth prompt

prompt = test_prompts[3]  # "Name three colors."
print(f"Testing: {prompt}")

start_time = time.time()
response_4bit_4 = model_4_bit.invoke(prompt)
end_time = time.time()
time_4bit_4 = end_time - start_time

print(f"4-bit Response: {response_4bit_4}")
print(f"Time taken: {time_4bit_4:.2f} seconds")

Testing: Name three colors.
4-bit Response: 
<|assistant|> 1. Red: A vibrant and intense color often associated with energy, passion, and love. It is the primary color on many digital devices' screens.

2. Blue: Often symbolizing calmness, trust, and stability. It is a popular choice for corporate branding due to its association with reliability.

3. Yellow: A cheerful and optimistic hue that can represent happiness, creativity, or caution depending on the shade used (e.g., bright yellow vs. darker tones).
Time taken: 10.02 seconds
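
Several responses above begin with `<|assistant|>`, and one contains it twice. A plausible cause, though this is an assumption and not confirmed by the runs above, is that the prompts are passed as raw text without the chat markers that the Phi-3 instruct models expect. The sketch below wraps a user message in those markers; the helper name `format_phi3_prompt` is hypothetical.

```python
def format_phi3_prompt(user_message):
    """Wrap a user message in the Phi-3 instruct chat markers."""
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>"


formatted = format_phi3_prompt("Name three colors.")
print(formatted)
```

The formatted string could then be passed to `model_16_bit.invoke(...)` or `model_4_bit.invoke(...)` in place of the raw prompt to check whether the stray markers and the empty response go away.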


### Calculate the Average Time

In [51]:
total_time_16bit = time_16bit_1 + time_16bit_2 + time_16bit_3 + time_16bit_4
total_time_4bit = time_4bit_1 + time_4bit_2 + time_4bit_3 + time_4bit_4

average_time_16bit = total_time_16bit / 4
average_time_4bit = total_time_4bit / 4

print(f"Average time taken by 16-bit model: {average_time_16bit:.2f} seconds")
print(f"Average time taken by 4-bit model: {average_time_4bit:.2f} seconds")

Average time taken by 16-bit model: 10.55 seconds
Average time taken by 4-bit model: 8.46 seconds
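
The same averages can be reproduced from lists, which also makes it easy to check how much a single run affects the result. The sketch below copies the timings from the cell outputs above; the variable names are hypothetical. It also recomputes the averages without the third prompt, where the 16-bit model returned an empty response, since that short run lowers the 16-bit average.

```python
from statistics import mean

# Timings in seconds, copied from the outputs above (prompts 1 to 4)
times_16bit = [13.55, 6.29, 1.36, 20.99]
times_4bit = [6.99, 3.26, 13.58, 10.02]

avg_16bit = mean(times_16bit)
avg_4bit = mean(times_4bit)
print(f"16-bit average: {avg_16bit:.2f} s")
print(f"4-bit average:  {avg_4bit:.2f} s")
print(f"16-bit / 4-bit: {avg_16bit / avg_4bit:.2f}x")

# Exclude prompt 3 (index 2), where the 16-bit response was empty
keep = [0, 1, 3]
avg_16bit_excl = mean(times_16bit[i] for i in keep)
avg_4bit_excl = mean(times_4bit[i] for i in keep)
print(f"16-bit average without prompt 3: {avg_16bit_excl:.2f} s")
print(f"4-bit average without prompt 3:  {avg_4bit_excl:.2f} s")
print(f"16-bit / 4-bit without prompt 3: {avg_16bit_excl / avg_4bit_excl:.2f}x")
```

Because each run generates a different number of tokens, these wall-clock averages are only a rough comparison; dividing each time by the number of generated tokens would give a fairer measure.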


### Summary

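On these four prompts, the 16-bit model averaged 10.55 seconds per response and the 4-bit model 8.46 seconds, so the higher-precision model was slower overall. Its responses are generally better quality, but they take longer to generate. Note that these timings also depend on how long each response is, and that the 16-bit model returned an empty response to the third prompt in 1.36 seconds, which lowers its average. When choosing a version, the tradeoff is between speed and quality.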