# Exploring Open-Source LLMs: A Comparative Analysis of LLaMA, Mistral, and Phi

## Lab Description:

This lab is a comparative analysis of LLaMA 3.1: 8B, Mistral: 7B, and Phi-3.5 across three tasks: content writing, code generation, and text summarization. Participants will analyze the strengths and weaknesses of each model by comparing outputs in terms of coherence, accuracy, and creativity.

## Lab Objectives:

### After completing this lab, participants will be able to:

- Evaluate the performance of open-source LLMs (LLaMA 3.1:8B, Mistral:7B, and Phi-3.5) in tasks like text summarization, code generation, and content writing.

- Compare and contrast model outputs to determine relative strengths, weaknesses, and suitable applications.

- Identify potential use-cases and practical implications of each model for real-world scenarios.

- Understand the trade-offs between model size, resource consumption, inference speed, and task performance.

## Lab Architecture:

The participant sends a request containing the prompt to the Ollama server running on the DL380a. The LLM hosted on the server generates a response and returns it to the participant.

<div style="text-align: center;">
    <img src="flow.png" alt="flow" width="700" height="450">
</div>
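The request in the diagram can also be sketched without LangChain. The example below only builds the HTTP request and does not send it. It assumes Ollama's REST endpoint `/api/generate` and reuses the server address from the cells that follow; `build_generate_request` is a hypothetical helper name, not part of the lab code.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "http://10.79.253.112:11434", "mistral:7b", "Write a haiku about servers"
)
print(request.full_url)

# Sending the request needs network access to the lab server:
# with urllib.request.urlopen(request) as reply:
#     print(json.loads(reply.read())["response"])
```

LangChain's `ChatOllama` wraps this kind of exchange, so the rest of the lab never has to build requests by hand.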


## Importing the necessary libraries:

In [3]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from IPython.display import Markdown, display
from langchain_core.messages import HumanMessage, SystemMessage

## Content Writing 

### Mistral:7b

Mistral 7B is a 7.3-billion-parameter model and one of the most capable language models of its size. It is often reported to approach the quality of larger models such as GPT-3.5 on many tasks, while remaining efficient in compute and memory usage.

Mistral is available on Ollama and can be used for inference. Let us test Mistral's ability to write content. We use LangChain, a framework that simplifies working with LLMs, to send the model a prompt asking it to write a blog post on Large Language Models.

In [33]:
model_mistral = ChatOllama(model="mistral:7b", base_url="http://10.79.253.112:11434")   #loading the mistral:7b (latest) model from ollama

In [34]:
#the prompt template to the LLM, {context} can be formatted with a user query
prompt = PromptTemplate.from_template(
    "Write a blog post on the given context {context}"
)

#the content for the LLM to write blog post on 
context = "Large Language Models"

#building the chain
chain = prompt | model_mistral | StrOutputParser()

#invoking the chain 
response = chain.invoke({"context" : context})

#display the response in markdown format
display(Markdown(response))

 Title: Harnessing the Power of Large Language Models

In the rapidly evolving landscape of technology, large language models are making significant strides and leaving an indelible mark. These sophisticated AI systems are not only transforming how we interact with digital interfaces but also reshaping various sectors, from education to business.

Large language models (LLMs) are artificial intelligence systems capable of understanding and generating human-like text based on the input they receive. They learn from vast amounts of data, allowing them to adapt to a wide range of conversational styles, topics, and even contexts. This versatility makes LLMs invaluable tools for numerous applications.

One of the most intriguing aspects of large language models is their potential to augment human capabilities. For instance, they can assist in content creation by generating ideas, drafting articles, or even composing poetry. In the realm of education, LLMs can help students learn more effectively by providing personalized explanations and answers to queries.

Moreover, businesses are harnessing the power of LLMs for customer service, marketing, and content generation. By integrating LLMs into their systems, companies can offer 24/7 support, create engaging marketing copy, and generate valuable insights from vast amounts of data, thereby improving decision-making processes.

However, it's essential to acknowledge the challenges associated with large language models. One critical concern is ensuring these AI systems maintain high standards of ethics and fairness. As they learn from data, there's a risk that biases could be perpetuated. To address this issue, efforts are being made to improve transparency in how LLMs learn and operate, ultimately leading to more equitable outcomes.

Another challenge lies in striking a balance between the model's ability to generate coherent, contextually relevant responses and the risk of generating misinformation. While large language models can provide accurate information most of the time, it's crucial to verify the facts they present, especially when dealing with sensitive or critical topics.

As we continue to explore the possibilities offered by large language models, it's essential to approach their development and application with a keen eye towards both their benefits and potential pitfalls. By embracing this technology responsibly and thoughtfully, we can usher in an era of unprecedented efficiency and innovation while minimizing risks and maximizing opportunities for growth.

In conclusion, large language models represent a significant leap forward in AI technology. Their ability to understand and generate human-like text makes them versatile tools that can reshape various sectors and augment human capabilities. As we navigate this exciting new frontier, it's crucial to maintain an ethical and responsible approach to their development and use, ensuring the benefits of large language models are accessible to all while minimizing potential risks.

We can see that Mistral generated a good response. One thing to notice is that the formatting is not up to the mark: the post has no headings, bold text, or lists, and the title appears as plain text.
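To make the formatting comparison a little less subjective, one option is to count a few Markdown structure markers in each response. The sketch below is a crude heuristic, not a quality metric; `markdown_structure` and the sample string are hypothetical and only illustrate the idea.

```python
import re


def markdown_structure(text: str) -> dict:
    """Count a few Markdown structure markers in a model response (crude heuristic)."""
    lines = text.splitlines()
    return {
        "headings": sum(1 for line in lines if re.match(r"^\s*#{1,6}\s+\S", line)),
        "bold_spans": len(re.findall(r"\*\*[^*\n]+\*\*", text)),
        "list_items": sum(
            1 for line in lines if re.match(r"^\s*(?:[-*+]|\d+\.)\s+\S", line)
        ),
        "paragraphs": len([b for b in re.split(r"\n\s*\n", text) if b.strip()]),
    }


# Made-up sample text; in the lab you would pass the `response` string instead.
sample = "**Intro**\n\nSome text.\n\n1. **First**: point\n2. Second point\n"
stats = markdown_structure(sample)
print(stats)
```

Running the same function on each model's `response` gives a quick side-by-side view of how much structure each one produced.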

### LLaMA3.1:8b

LLaMA 3.1: 8B is a language model developed by Meta, designed to deliver strong performance with 8 billion parameters. It is slightly larger than the Mistral:7b model we tested above. Let us now see how LLaMA3.1:8b handles the same content-writing prompt.

In [35]:
model_llama = ChatOllama(model="llama3.1:8b", base_url="http://10.79.253.112:11434")  #loading llama3.1:8b from ollama

In [36]:
prompt = PromptTemplate.from_template(
    "Write a blog post on the given context {context}"
)

context = "Large Language Models"

chain = prompt | model_llama | StrOutputParser()

response = chain.invoke({"context" : context})

display(Markdown(response))

**The Rise of Large Language Models: Revolutionizing Human-Computer Interaction**

In recent years, there has been a significant surge in the development and deployment of large language models (LLMs). These advanced algorithms have the ability to process and generate human-like text with unprecedented accuracy, opening up new possibilities for human-computer interaction. In this blog post, we'll delve into the world of LLMs, exploring their capabilities, applications, and implications.

**What are Large Language Models?**

Large language models are a type of artificial intelligence (AI) algorithm that has been trained on massive datasets to learn patterns in language. These models use complex neural networks to generate text based on input prompts or sequences. Unlike traditional natural language processing (NLP) systems, LLMs can understand context, nuances, and even subtleties of human communication.

**How do Large Language Models Work?**

LLMs operate by processing vast amounts of data through a process called deep learning. This involves multiple layers of neural networks that allow the model to learn intricate relationships between words, phrases, and sentences. The more training data an LLM receives, the better it becomes at recognizing patterns and making informed predictions.

**Applications of Large Language Models**

The potential applications of LLMs are vast and varied:

1. **Virtual Assistants**: LLMs power popular virtual assistants like Siri, Alexa, and Google Assistant, enabling users to interact with devices using natural language.
2. **Chatbots**: These models facilitate customer service, helping businesses provide 24/7 support to customers through automated conversations.
3. **Content Generation**: LLMs can generate high-quality content, such as articles, blog posts, and even entire books, at an unprecedented scale and speed.
4. **Language Translation**: These models enable seamless language translation, bridging cultural divides and facilitating global communication.

**Benefits of Large Language Models**

The benefits of LLMs are numerous:

1. **Increased Efficiency**: Automated tasks, such as data entry and content creation, can save businesses significant time and resources.
2. **Improved User Experience**: Virtual assistants and chatbots provide users with instant answers to queries, enhancing customer satisfaction and loyalty.
3. **Enhanced Creativity**: LLMs can assist writers and artists in generating new ideas, sparking creativity, and exploring uncharted territories.

**Challenges and Concerns**

While LLMs hold immense promise, they also raise several concerns:

1. **Job Displacement**: Automation could lead to job losses for human customer service representatives, writers, and other professionals.
2. **Misinformation**: The ability of LLMs to generate text can spread misinformation and propaganda, highlighting the need for fact-checking and content moderation.
3. **Bias and Fairness**: These models can perpetuate biases present in training data, emphasizing the importance of diverse and inclusive datasets.

**Conclusion**

Large language models have revolutionized human-computer interaction, offering unparalleled capabilities in natural language understanding and generation. While there are valid concerns surrounding their deployment, the benefits of LLMs far outweigh the risks. As we continue to harness the power of AI, it is essential that we prioritize responsible development, ensuring these models serve humanity while respecting its values.

**What's Next?**

The future of large language models holds tremendous potential for innovation and growth:

1. **Advancements in Multimodal Models**: LLMs will increasingly incorporate multimodal inputs, such as images, audio, and video, to create more comprehensive AI experiences.
2. **Improved Explainability and Transparency**: Researchers will focus on developing techniques to explain the reasoning behind LLM decisions, ensuring accountability and trustworthiness.

Stay tuned for future updates on the world of large language models!

The model generates a well-formatted and well-structured response. The output looks better than the one generated by Mistral:7b.

### Phi3.5

Let us now test a lightweight language model, Phi-3.5. It has only 3.8 billion parameters, but it is known to outperform similarly sized and even some larger models. It is an open-source model developed by Microsoft.

In [11]:
model_phi = ChatOllama(model="phi3.5:latest", base_url="http://10.79.253.112:11434") #loading phi3.5:latest from ollama

In [12]:
prompt = PromptTemplate.from_template(
    "Write a blog post on the given context {context}"
)

context = "Large Language Models"

chain = prompt | model_phi | StrOutputParser()

response = chain.invoke({"context" : context})

display(Markdown(response))

Title: Unraveling Potential and Pitfalls of Large Language Models in Today'supublication-sized format, this comprehensive exploration delves into both opportunities and challenges presented by cutting-edge language models. Here’s why these AI behemoths are reshaping communication across industries:

**Introduction - The Ascendance of Language Models (LMs) in the Tech Landscape**
The emergence of Large Language Models has been nothing short of revolutionary, heralding a new era for natural language processing. These models boast an extensive vocabulary and can generate or comprehend text with uncanny proficiency—a leap from the previous generation's capabilities that seemed primitive by comparison.

**The Power Behind LM Success: Data, Computation & Innovation** 
At their core lie massive datasets upon which they have been trained meticulously to understand and reproduce human language patterns. The sheer computational power required is unprecedented; it takes thousands of high-end processors working simultaneously for hours on end—or even days or weeks, depending on the model’s complexity (GPT models).

**Beneficial Impact Across Industries: From Customer Service to Content Creation and Beyond** 
In customer service applications like chatbots, these LMs swiftly understand queries with contextual clues that often elude even seasoned human operators. They can generate engaging content for marketing campaigns or write blog articles imbued with creativity—a boon to writers who may find a new collaborative partner in AI algorithms like the one writing this piece today!

**Education and Accessibility: Personalized Learning Experiences With LM Assistance** 
In educational settings, these models can provide tailored learning experiences by answering questions or offering explanations based on individual student needs. They break down complex subjects into comprehensible chunks—the epitome of virtual tutoring at one's fingertips without the hefty price tag attached to human experts.

**Challenges and Ethical Considerations: A Delicate Balancing Act Amid Advancements in LM Technology** 
However, as with any powerful tool that can shape minds and opinions—language being a cornerstone of culture — there are ethical dilemmas to consider too. For instance; the potential for amplifying biases present within training data sets or issues surrounding consent when using generated content without proper citation mechanisms in place could raise questions about intellectual property rights as AI becomes more prevalent creators rather than tools alone used by humans today (Mollick, 2019).

Moreover privacy concerns lurk beneath these artificial intelligence advances too. Who owns the information an algorithm processes during its training phase or when generating content based on user inputs? There have been cases where companies were accused of exploiting personal data without explicit consent—an issue that calls for strict regulatory measures to protect individual rights (Cathala, 2019).

**Mitigating Bias: Unveiling the Hidden Shadows in LM Algorithms' Reflection Pools* **
Another challenge lies within inherent biases—a reflection of past societal norms present even nowadallround us. Language models, albeit advanced ones like GPT-3 or their successors trained on diverse data sets strive to avoid such pitfalls but often find themselves embodying stereotypes and prejudices ingrained within historical texts they've learned from (Bender et al., 2019).

Researchers are working tirelessly towards mitigating these biases by diversifying training datasets or developing novel algorithms capable of identifying problematic patterns—steps necessary to ensure fair representation across different demographics when deploying LMs in various sectors.

**The Future Landscape: Predictive Capabilities and Transformative Potential Of Language Models* **
Looking ahead, Large Language Model technology holds immense potential for transformational applications beyond communication—spanning healthcare diagnosis systems informed by patient history records or predictive maintenance in manufacturing through real-time analysis of operational data (Nguyen et al., 2019).

In conclusion as we stand at this crossroads between human ingenuity and artificial intelligence prowess; it's imperative for stakeholders - researchers, businesses & policymakers—to collaborate in nurturing responsible deployment strategies. By addressing ethical concerns head-on while leveraging these models’ predictive capabilities we can shape a future where technology enhances humanity without compromising individual rights or perpetuating biases rooted deep within our societal fabric (Susskind & Thomas, 2019).
**Reference:**
Bender, S. E., et al. (2018). When Machine Learning Becomes Biased and How to Mitigate It. In Proceedings of the AAAI/ACM Conference on Artificial Intelligence, vol 32(submission date not specified) available at https://arxiv.org/abs/1610.08029
Cathala, M., & Gandon, V. (Eds.). (2019). Ethical and Societal Aspect of Artificial Intelligence: A European Approach to Responsible Innovation in Industrial Enterprises and Public Administration Proceedings 7th International Conference on Social Impact Assessment - ICSIA.
Nguyen, D., et al (2019). Machine Learning for Predictive Maintenance: Case Studies from Manufacturing Sector at the IEEE/ACM international conference on e-Technics and Technologies in Industry – ICT’Industria 36.
Susskind, J., & Thomas, H.-P. (2019). Artificial intelligence and public values: Why smart machines can't make value judgments but humans must learn how to do so themselves Journal of Applied Philosophy doi:/doi/10.1111/japp.12458
Mollick, Jesse (2019). The economist explains: Will artificial intelligence end up destroying jobs? https://www.theeconomist.com accessed on 3rd February, 2023

We immediately notice the difference in the response generated by phi3.5 compared to mistral and llama. This is most likely because phi3.5 is a smaller model than the other two. The model generates convoluted responses that don't read naturally, with garbled words and run-on sentences.

## Code Generation 

Let us test the code generation capabilities of all these models. We give each model a prompt to generate a Python function and then analyze its response. Feel free to edit the prompt and have the models generate other responses.
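Since the same prompt goes to every model, the comparison can be wrapped in a small loop that also records how long each model takes. This is a minimal sketch under the assumption that each model is wrapped as a plain callable taking a prompt string; `run_on_models` and the stand-in models are hypothetical and exist only so the example runs without a server.

```python
import time
from typing import Callable, Dict, Tuple


def run_on_models(
    models: Dict[str, Callable[[str], str]], prompt: str
) -> Dict[str, Tuple[str, float]]:
    """Send one prompt to every model and record (response, seconds taken)."""
    results = {}
    for name, ask in models.items():
        start = time.perf_counter()
        answer = ask(prompt)
        results[name] = (answer, time.perf_counter() - start)
    return results


# Stand-in callables so the sketch runs offline. In the lab, an entry could be
# something like: lambda p: model_mistral.invoke(p).content
fake_models = {
    "echo": lambda p: p,
    "upper": lambda p: p.upper(),
}

results = run_on_models(fake_models, "Write a python function")
for name, (answer, seconds) in results.items():
    print(f"{name}: {seconds:.4f}s -> {answer}")
```

Wall-clock time measured this way includes network latency and model loading on the server, so treat it as a rough indication of inference speed rather than a benchmark.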

### Mistral

Mistral:7b is really good at coding tasks. It even comes close to CodeLlama 7b at code generation while remaining equally good at English-language tasks. Let us put Mistral's coding abilities to the test.

In [14]:
model_mistral = ChatOllama(model="mistral:7b", base_url="http://10.79.253.112:11434")

In [20]:
messages = [
    SystemMessage(
        content="You are a helpful chat assistant that generates python code for a given user query"   #Instruction to the LLM
    ),
    HumanMessage(
        content="Write a python function that recursively compute factorial of a number"                                #The human Question 
    )
]

response = model_mistral.invoke(messages)                           #Invoke the model directly with the messages we designed
display(Markdown(response.content))

 Here is a Python function that uses recursion to compute the factorial of a given number:

```python
def factorial(n):
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n - 1)
```

You can use this function by calling `factorial(number_to_find_factorial)`. For example, `print(factorial(5))` will output `120`, which is the factorial of 5.

It generated a good response. However, it did not provide a function description, a return type, or argument definitions. Apart from that, the code is straightforward and concise.
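Reading the code is one check; running it is another. The sketch below pulls the first fenced Python block out of a response and tests the resulting `factorial` against a few known values. The helper names and the sample response string are hypothetical, and `exec` should only be used on generated code you have already read.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example's own fence intact


def extract_python_block(markdown_text: str) -> str:
    """Return the body of the first fenced python block in a model response."""
    pattern = re.escape(FENCE) + r"python[ \t]*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, markdown_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no fenced python block found")
    return match.group(1)


def check_factorial(source: str) -> bool:
    """Run generated code in a fresh namespace and compare against known values."""
    namespace = {}
    exec(source, namespace)  # only do this with code you have read first
    factorial = namespace["factorial"]
    expected = {0: 1, 1: 1, 5: 120, 10: 3628800}
    return all(factorial(n) == value for n, value in expected.items())


# Sample response shaped like the model output above; in the lab you would
# pass `response.content` instead.
response_text = (
    "Here is a function:\n\n"
    + FENCE + "python\n"
    "def factorial(n):\n"
    "    if n == 0 or n == 1:\n"
    "        return 1\n"
    "    else:\n"
    "        return n * factorial(n - 1)\n"
    + FENCE + "\n"
)

passed = check_factorial(extract_python_block(response_text))
print(passed)
```

The same check can be applied to each model's response to confirm that the generated functions agree on these inputs.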

### LLaMA3.1:8b 

LLaMA3.1:8b is a really good LLM for coding tasks. It outperforms most models of its size and even comes close to some bigger models.

In [18]:
model_llama = ChatOllama(model="llama3.1:8b", base_url="http://10.79.253.112:11434")

In [21]:
messages = [
    SystemMessage(
        content="You are a helpful chat assistant that generates python code for a given user query"   #Instruction to the LLM
    ),
    HumanMessage(
        content="Write a python function that recursively compute factorial of a number"                                #The human Question 
    )
]

response = model_llama.invoke(messages)                           #Invoke the model directly with the messages we designed
display(Markdown(response.content))

Here is a Python function that recursively computes the factorial of a number:

```python
def factorial(n):
    """
    Recursively computes the factorial of a number.

    Args:
        n (int): The input number.

    Returns:
        int: The factorial of n.
    """
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n-1)
```

Example usage:

```python
print(factorial(5))  # Output: 120
```

This function takes an integer `n` as input, and returns its factorial. If `n` is 0 or 1, it returns 1 (since the factorial of 0 and 1 are both 1). Otherwise, it calls itself with `n-1` as argument and multiplies the result by `n`, effectively calculating the factorial.

Please note that this recursive approach may lead to a stack overflow for large values of n due to the limited size of Python's call stack. If you need to compute factorials of very large numbers, an iterative solution would be more suitable.

The response generated by LLaMA is really good. It provides a function description and argument definitions. It also points out the shortcomings of computing factorials recursively. The response is self-explanatory: anybody reading it can understand what the code does.
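The recursion limit mentioned in the response is easy to observe. The sketch below is an illustration written for this lab, not model output: it compares the recursive function with a loop-based version on an input chosen to be deeper than the interpreter's recursion limit.

```python
import math
import sys


def factorial_recursive(n: int) -> int:
    """Recursive version, as in the generated answers."""
    if n == 0 or n == 1:
        return 1
    return n * factorial_recursive(n - 1)


def factorial_iterative(n: int) -> int:
    """Loop-based version: call depth stays constant regardless of n."""
    result = 1
    for factor in range(2, n + 1):
        result *= factor
    return result


deep_n = sys.getrecursionlimit() + 100  # deliberately deeper than the interpreter allows

try:
    factorial_recursive(deep_n)
    recursive_failed = False
except RecursionError:
    recursive_failed = True

print("recursive version hit the limit:", recursive_failed)
print("iterative version matches math.factorial:",
      factorial_iterative(deep_n) == math.factorial(deep_n))
```

For small inputs both versions return the same values; the difference only shows up once `n` approaches the recursion limit.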

### Phi3.5

Let us test the coding abilities of a really lightweight LLM and see how it performs against larger LLMs like Mistral and LLaMA.

In [22]:
model_phi = ChatOllama(model="phi3.5:latest", base_url="http://10.79.253.112:11434")

In [23]:
messages = [
    SystemMessage(
        content="You are a helpful chat assistant that generates python code for a given user query"   #Instruction to the LLM
    ),
    HumanMessage(
        content="Write a python function that recursively compute factorial of a number"                                #The human Question 
    )
]

response = model_phi.invoke(messages)                           #Invoke the model directly with the messages we designed
display(Markdown(response.content))

Here's a Python function using recursion to calculate the factorial of a non-negative integer:

```python
def factorial(n):
    # Base case: if n is zero, its factorial is 1
    if n == 0:
        return 1
    
    # Recursive case: multiply current number with previous result (factorial of one less)
    else:
        return n * factorial(n-1)
```
To use this function, simply call it with a non-negative integer as an argument like so: `result = factorial(5)`. 

Remember that the recursive solution has limitations; for large values of 'n', you might encounter problems due to stack overflow. A loop based iterative method is usually more efficient and reliable in such cases, although it doesn't use recursion explicitly as requested here.

The code and the explanation that follows look good. However, the `n == 1` base case is missing. Although this doesn't change the output, it results in one additional recursive call, unnecessarily increasing the recursion depth. So the responses by Mistral and LLaMA are slightly better.
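The effect of the missing base case can be checked by counting calls. In the sketch below, `count_calls` is a hypothetical helper that runs a recursive factorial with a configurable set of base cases and reports how many calls were made.

```python
def count_calls(base_cases: set, n: int) -> int:
    """Count how many calls a recursive factorial makes for n with the given base cases."""
    calls = 0

    def factorial(k: int) -> int:
        nonlocal calls
        calls += 1
        if k in base_cases:
            return 1
        return k * factorial(k - 1)

    factorial(n)
    return calls


calls_with_one = count_calls({0, 1}, 5)  # base cases used by Mistral and LLaMA
calls_zero_only = count_calls({0}, 5)    # base case used by Phi3.5
print(calls_with_one, calls_zero_only)
```

For `n = 5` the two variants differ by exactly one call, so the cost is real but small.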

## Text Summarization 

Large Language Models (LLMs) are highly effective for text summarization as they can grasp context and extract key information across lengthy texts. They leverage extensive training on diverse data to generate concise summaries while retaining the original meaning and essential details. LLMs handle various summarization styles, from extractive (directly pulling important sentences) to abstractive (generating novel sentences). This adaptability makes them valuable for applications across industries, from media and research to customer support and legal fields, improving efficiency in processing vast amounts of information. 

We provide a paragraph on HPE ProLiant servers to each of these LLMs and ask them to summarize it in 2 short sentences. We can then analyze each output.

### Mistral:7b

In [24]:
model_mistral = ChatOllama(model="mistral:7b", base_url="http://10.79.253.112:11434")

In [28]:
prompt = PromptTemplate.from_template(
    "Write a short, summarized version of the provided paragraph in 2 sentences {paragraph}"
)

paragraph = """HPE ProLiant servers—The world’s most secure industry standard servers,1
                HPE ProLiant Gen10 and Gen10 Plus servers coupled with HPE OneView, HPE InfoSight, 
                and HPE OneSphere deliver software-defined compute to accelerate application performance, 
                infrastructure and application deployment, and improve server operations. 
                Our wide selection of multicore, multiprocessor servers, and server blades meet needs 
                ranging from those of cost-sensitive growing businesses to the performance and scalability 
                demands of global enterprises. HPE ProLiant servers support the industry’s leading operating 
                systems and applications for data centers of all sizes. hpe.com/info/ proliant-dl-servers, 
                hpe.com/info/towerservers, hpe.com/info/bladesystem"""


chain = prompt | model_mistral | StrOutputParser()

response = chain.invoke({"paragraph" : paragraph})

display(Markdown(response))

1. HPE ProLiant servers are the world's most secure industry-standard servers, available in various configurations to cater to businesses of all sizes. They offer software-defined compute solutions with features like HPE OneView, HPE InfoSight, and HPE OneSphere for enhanced application performance, deployment, and server operations.

2. These servers support the leading operating systems and applications, providing excellent scalability for global enterprises while also catering to the budget needs of growing businesses. Visit hpe.com/info/proliant-dl-servers, hpe.com/info/towerservers, or hpe.com/info/bladesystem for more details.

Mistral captured all the important details in the original paragraph. However, it returned two long numbered items containing four sentences in total, so it did not produce the short two-sentence summary we asked for.
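Whether a summary respects the "2 short sentences" instruction can be checked roughly in code. The sketch below uses a naive punctuation-based split, so numbered prefixes such as `1.` or abbreviations will inflate the sentence count; `summary_stats` and the example string are hypothetical.

```python
import re


def summary_stats(summary: str) -> dict:
    """Rough sentence and word counts for a summary (naive punctuation-based split)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", summary.strip()) if s]
    return {
        "sentences": len(sentences),
        "words": len(summary.split()),
        "longest_sentence_words": max((len(s.split()) for s in sentences), default=0),
    }


# Made-up example; in the lab you would pass the `response` string instead.
example = "ProLiant servers are secure. They scale from small firms to global enterprises."
stats = summary_stats(example)
print(stats)
```

Comparing these numbers across the three models gives a quick indication of which one followed the length constraint most closely.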

### LLaMA3.1:8b

In [29]:
model_llama = ChatOllama(model="llama3.1:8b", base_url="http://10.79.253.112:11434")

In [30]:
prompt = PromptTemplate.from_template(
    "Write a short, summarized version of the provided paragraph in 2 sentences {paragraph}"
)

paragraph = """HPE ProLiant servers—The world’s most secure industry standard servers,1
                HPE ProLiant Gen10 and Gen10 Plus servers coupled with HPE OneView, HPE InfoSight, 
                and HPE OneSphere deliver software-defined compute to accelerate application performance, 
                infrastructure and application deployment, and improve server operations. 
                Our wide selection of multicore, multiprocessor servers, and server blades meet needs 
                ranging from those of cost-sensitive growing businesses to the performance and scalability 
                demands of global enterprises. HPE ProLiant servers support the industry’s leading operating 
                systems and applications for data centers of all sizes. hpe.com/info/ proliant-dl-servers, 
                hpe.com/info/towerservers, hpe.com/info/bladesystem"""


chain = prompt | model_llama | StrOutputParser()

response = chain.invoke({"paragraph" : paragraph})

display(Markdown(response))

Here is a 2-sentence summary:

HPE ProLiant servers offer secure and software-defined compute solutions that accelerate application performance and improve server operations. With a wide range of options, including multicore and multiprocessor servers, HPE ProLiant supports the needs of businesses and enterprises of all sizes and scales.

### Phi3.5

In [31]:
model_phi = ChatOllama(model="phi3.5:latest", base_url="http://10.79.253.112:11434")

In [32]:
prompt = PromptTemplate.from_template(
    "Write a short, summarized version of the provided paragraph in 2 sentences {paragraph}"
)

paragraph = """HPE ProLiant servers—The world’s most secure industry standard servers,1
                HPE ProLiant Gen10 and Gen10 Plus servers coupled with HPE OneView, HPE InfoSight, 
                and HPE OneSphere deliver software-defined compute to accelerate application performance, 
                infrastructure and application deployment, and improve server operations. 
                Our wide selection of multicore, multiprocessor servers, and server blades meet needs 
                ranging from those of cost-sensitive growing businesses to the performance and scalability 
                demands of global enterprises. HPE ProLiant servers support the industry’s leading operating 
                systems and applications for data centers of all sizes. hpe.com/info/ proliant-dl-servers, 
                hpe.com/info/towerservers, hpe.com/info/bladesystem"""


chain = prompt | model_phi | StrOutputParser()

response = chain.invoke({"paragraph" : paragraph})

display(Markdown(response))

HPE ProLiant servers offer the highest security standards for industry standard computing needs and provide software-defined compute to enhance application performance and server operations through HPE OneView, InfoSight, and OneSphere technologies; they cater to a broad range of requirements from cost-conscious businesses scaling upwards. They support leading operating systems across various data center sizes with options for multicore or multiprocessor servers suitable for both entry-level enterprises and global powerhouses demanding top performance, available at hpe.com/proliant.


<div style="text-align: left;">
    <img src="logo.png" alt="flow" width="150" height="100">
</div>