```
  /$$$$$$  /$$   /$$ /$$$$$$$   /$$$$$$ 
 /$$__  $$| $$  | $$| $$__  $$ /$$__  $$
| $$  \__/| $$  | $$| $$  \ $$| $$  \ $$
|  $$$$$$ | $$  | $$| $$  | $$| $$  | $$
 \____  $$| $$  | $$| $$  | $$| $$  | $$
 /$$  \ $$| $$  | $$| $$  | $$| $$  | $$
|  $$$$$$/|  $$$$$$/| $$$$$$$/|  $$$$$$/
 \______/  \______/ |_______/  \______/ 


 @Author : Pierre Lague

 @Email : p.lague@sudogroup.fr

 @Date : 04/10/2024

```

---

# LLM - Function Calling and Retrieval Augmented Generation

This notebook demonstrates the basics of function calling (FC) and RAG for running LLMs in a local environment.

> N.B. Ollama must be installed and running locally. The code below queries a local Ollama server and defaults to the `tinyllama` model.

**Retrieval-Augmented Generation (RAG)** is an AI technique that enhances the capabilities of large language models by combining them with external knowledge retrieval. In RAG, when a query is received, relevant information is first retrieved from a knowledge base. This retrieved context is then provided to the language model along with the original query, allowing the model to generate a response that's informed by both its pre-trained knowledge and the specific, relevant information from the external source. This approach helps to ground the model's responses in factual information, reduce hallucinations, and provide more up-to-date or domain-specific answers.

![RAG schema](./assets/schema_RAG.webp)

**Function calling**, in the context of large language models, refers to the ability of these models to recognize when a specific task or query requires the execution of an external function, and to call or recommend calling that function appropriately. Instead of trying to generate the answer itself, the model identifies that the query maps to a particular function and outputs a structured request to call that function. This capability allows language models to integrate more seamlessly with external tools, APIs, or data sources, extending their functionality beyond mere text generation. For instance, if asked about the current time, a model with function calling capabilities might recognize that it needs to call a time-retrieval function rather than trying to guess the time based on its training data.

![LLM function calls](./assets/llm-function-calls.webp)
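To make the idea of a "structured request" concrete, here is a minimal sketch of the dispatch side of function calling. It does not call a model: the model output is simulated as a JSON string, and the JSON shape (`function` and `arguments` keys), the `TOOLS` registry, and the helpers `get_current_time`, `add_numbers`, and `dispatch` are all assumptions made for illustration, not part of Ollama's API.

```python
import json
from datetime import datetime, timezone


def get_current_time():
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def add_numbers(a, b):
    """Return the sum of two numbers."""
    return a + b


# Hypothetical registry mapping function names to Python callables.
TOOLS = {"get_current_time": get_current_time, "add_numbers": add_numbers}


def dispatch(model_output):
    """Run the function named in a structured request, or return plain text unchanged."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # not JSON: treat it as a normal text answer
    if not isinstance(call, dict) or call.get("function") not in TOOLS:
        return model_output  # JSON, but not a call to a known function
    return TOOLS[call["function"]](**call.get("arguments", {}))


# Simulated model output (assumed format); a real model would have to be
# prompted to answer in this shape.
simulated = '{"function": "add_numbers", "arguments": {"a": 2, "b": 3}}'
print(dispatch(simulated))
```

In a full pipeline, the result returned by `dispatch` would typically be sent back to the model so it can phrase the final answer.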



In [3]:
import requests
import json
import os
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [4]:
OLLAMA_API = "http://localhost:11434/api/generate"

In [7]:
def load_knowledge_base(file_path):
    """Loads the knowledge base for the model to retrieve information and context.

    Args:
        file_path (string): filepath to the knowledge base

    Returns:
        the content of the knowledge base.
    """
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read().split('\n\n')  # Assuming paragraphs are separated by blank lines

def query_model(prompt, model="tinyllama"):
    """Queries the model given in parameter with a prompt.

    Args:
        prompt (string): prompt giving the model its instructions, including how to use the retrieved context.
        model (str, optional): Name of the Ollama model. Defaults to "tinyllama".

    Returns:
        string : the response from the model
    """
    response = requests.post(OLLAMA_API, json={
        "model": model,
        "prompt": prompt,
        "stream": False
    })
    response.raise_for_status()  # surface HTTP errors instead of failing on a missing key
    return response.json()['response']

def retrieve_context(query, knowledge_base, top_n=3):
    """Retrieves context from the knowledge base.
    Vectorizes the KB paragraphs and the query with TF-IDF, then computes the cosine
    similarity between the query and each paragraph. The top_n most similar paragraphs form the context.

    Args:
        query (str): the query for the model
        knowledge_base (list of str): the KB paragraphs to search for context
        top_n (int, optional): number of most similar paragraphs to return. Defaults to 3.

    Returns:
        str: the top_n KB paragraphs most similar to the query (by cosine similarity), joined by newlines.
    """
    vectorizer = TfidfVectorizer()
    kb_vectors = vectorizer.fit_transform(knowledge_base)
    query_vector = vectorizer.transform([query])
    
    similarities = cosine_similarity(query_vector, kb_vectors)[0]
    top_indices = similarities.argsort()[-top_n:][::-1]
    
    return "\n".join([knowledge_base[i] for i in top_indices])

def rag_query(query, knowledge_base, top_n):
    """Answers a query with the model, using context retrieved from the knowledge base."""
    context = retrieve_context(query, knowledge_base, top_n=top_n)  # retrieve the context
    
    # Prompt engineering: guide the LLM with clear instructions on how to use the context.
    prompt = f"""Context: {context}

Query: {query}

Based on the context provided, please answer the query. If the context doesn't contain relevant information, say so and provide a general answer based on your knowledge of FinOps and cloud computing."""
    
    return query_model(prompt)

In [6]:
kb_file_path = os.path.join("data", "finops_cloud_kb.txt")
knowledge_base = load_knowledge_base(kb_file_path)

print("FinOps and Cloud Computing Q&A System")
print("Enter 'quit' to exit")

while True:
    query = input("\nEnter your query: ")
    if query.lower() == 'quit':
        break
    
    response = rag_query(query, knowledge_base, top_n=10)
    print(f"\nResponse: {response}")

FinOps and Cloud Computing Q&A System
Enter 'quit' to exit

Response: In addition to providing cloud services and optimizing infrastructure, FinOp strategies can help businesses achieve cost optimization by automating tasktags, integrating machine learning, and adopting more sophisticated solutions like IA. While continuously evolving in a constant pace, FinOps provides opportunities for adapting quickly to changes and implementing culture of optimizing team-wide. The combination of proactive methodologies, a well-strategized approach, and cost transformation as the underlying motivator can lead to significant cost savings for businesses.



RAG Implementation in the Script:

1. Knowledge Base: The knowledge base is a text file about cloud computing and FinOps, loaded into memory as `knowledge_base`, a list of paragraphs. In a real-world scenario, this could be a much larger database or document store.

2. Retrieval Function: The `retrieve_context` function implements a basic retrieval mechanism. It vectorizes the knowledge base paragraphs and the query with TF-IDF, ranks the paragraphs by cosine similarity to the query, and joins the `top_n` best matches with newlines.

3. RAG Process: The `rag_query` function ties it all together:
   - It first calls `retrieve_context` to get relevant information.
   - It then constructs a prompt that includes both the retrieved context and the original query.
   - Finally, it sends this enriched prompt to the language model via the `query_model` function.

This implementation allows the model to generate responses based on both its pre-trained knowledge and the specific information retrieved from the knowledge base.
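To show what the retrieval step computes, here is a standard-library sketch of TF-IDF ranking on a toy knowledge base. It is a simplified re-implementation written for illustration, not the notebook's code: it only roughly mirrors scikit-learn's default settings (lowercased tokens of two or more characters, smoothed IDF, L2 normalization), and the names `tokenize`, `tfidf_vectors`, `top_paragraphs`, and `toy_kb` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep words of two or more characters."""
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_vectors(docs, query):
    """Return L2-normalized TF-IDF vectors (as dicts) for the docs and the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of docs in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

    def vectorize(tokens):
        counts = Counter(t for t in tokens if t in idf)  # drop unseen terms
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else {}

    return [vectorize(t) for t in tokenized], vectorize(tokenize(query))


def top_paragraphs(query, docs, top_n=2):
    """Return the top_n docs ranked by cosine similarity to the query."""
    doc_vecs, q_vec = tfidf_vectors(docs, query)
    # For unit-length vectors, cosine similarity is just the dot product.
    scores = [sum(w * d.get(t, 0.0) for t, w in q_vec.items()) for d in doc_vecs]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:top_n]]


toy_kb = [
    "FinOps brings financial accountability to cloud spending.",
    "Kubernetes schedules containers across a cluster of nodes.",
    "Rightsizing instances is a common way to reduce cloud costs.",
]
print(top_paragraphs("How does FinOps handle cloud spending?", toy_kb, top_n=1))
```

The first paragraph ranks highest because it shares three terms with the query (`finops`, `cloud`, `spending`), while the Kubernetes paragraph shares none and scores zero. TF-IDF only matches exact words, so a paraphrased query can miss relevant paragraphs; embedding-based retrieval is a common alternative for that reason.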