### **What is LangChain?**

LangChain is a framework designed to build applications using Large Language Models (LLMs) by integrating components like memory, chaining, retrieval, and agentic workflows. It simplifies tasks such as text generation, summarization, and question-answering by connecting LLMs with external data sources, APIs, and tools.

**Key features**:
- **Chains**: Sequences of LLM calls or other functions for structured execution.
- **Memory**: Stores conversation history to maintain context.
- **Retrieval**: Integrates with vector databases like FAISS or Pinecone for knowledge augmentation.
- **Agents**: Uses tools dynamically to enhance decision-making.
- **Integrations**: Supports OpenAI, Hugging Face, Cohere, Gemini, and more.

LangChain is widely used for chatbots, RAG (Retrieval-Augmented Generation), and workflow automation. It streamlines LLM usage by handling API calls, formatting, and orchestration, making it a key tool for AI-driven applications.



In this blog, we will look at the basics of working with models, prompts, and output parsing. Let's learn a bit about each of these first -

**1. Models**  
LangChain supports multiple types of language models:  
- **LLMs (Large Language Models):** OpenAI, Hugging Face, Cohere, LLaMA, GPT4All, Gemini, etc.  
- **Chat Models:** Optimized for multi-turn conversations (e.g., OpenAI ChatGPT, Claude, Gemini).  
- **Embedding Models:** Convert text into vector representations for similarity search (e.g., OpenAI Embeddings, SentenceTransformers).  
- **Custom Models:** Allows fine-tuned or self-hosted models via API integration or local execution.  

**2. Prompting**  
- **PromptTemplates:** Structures prompts dynamically using placeholders that are filled in later (e.g., `"Summarize: {text}"`).  
- **Few-shot Prompting:** Provides examples within the prompt to improve response quality.  
- **Retrieval-Augmented Prompting:** Fetches external knowledge before generating responses.  
- **Parameterized Inputs:** Customizes prompts based on user input for personalization.  
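
As a rough illustration of placeholders and few-shot prompting, the sketch below assembles a few-shot prompt with plain Python strings. It does not use LangChain; the function name `build_few_shot_prompt` and the example pairs are invented for illustration.

```python
def build_few_shot_prompt(examples, query, instruction="Summarize the text."):
    """Build a few-shot prompt: instruction, worked examples, then the new input."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Text: {source}")
        lines.append(f"Summary: {target}")
        lines.append("")
    # The final block leaves the answer open for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Summary:")
    return "\n".join(lines)


# Hypothetical example pairs, written only for this sketch
examples = [
    ("The server rebooted twice overnight due to a failing fan.", "Server rebooted; fan failing."),
    ("Port 2 drops half of all packets since the last update.", "Port 2 has 50% packet loss."),
]
prompt = build_few_shot_prompt(examples, "DIMM errors appear on slot 4 after boot.")
print(prompt)
```

The examples show the model the expected input/output shape, and the trailing `Summary:` line invites it to continue in the same pattern.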

**3. Parsing**  
- **Output Parsers:** Extract structured data (e.g., JSON, lists) from model responses.  
- **Regex/Text Parsers:** Cleans and formats LLM output for specific applications.  
- **Function Calling (Tool Use):** Directs LLMs to produce API calls, structured responses, or execute commands.
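
To make the regex/text parsing idea concrete, here is a minimal sketch that turns a numbered list in a model's reply into a Python list. It uses only the standard library; `parse_numbered_list` and the sample reply are assumptions made for this example, not part of LangChain.

```python
import re


def parse_numbered_list(text):
    """Return the items of a numbered list such as '1. foo' or '2) bar'."""
    pattern = re.compile(r"^\s*\d+[.)]\s+(.*\S)\s*$", re.MULTILINE)
    return pattern.findall(text)


# Hand-written sample reply, standing in for real model output
raw_output = """Here are the reported problems:
1. Memory DIMM error on slot 4
2. Network connection failure on the iLO
3) 50% packet loss on port 2
"""
items = parse_numbered_list(raw_output)
print(items)
```

Lines that do not start with a number, such as the introductory sentence, are ignored.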

Models refer to the language models that power many applications. A significant part of working with them involves designing inputs, known as prompting, to effectively communicate with the model. On the other end, parsing focuses on structuring the model's output into a more organized format for downstream tasks.  

When building an application with large language models (LLMs), you will often repeat the same steps: prompt the model, process its output, and handle related operations. LangChain provides a set of abstractions for these reusable components, which makes the workflow more efficient and structured.
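
The prompt, model, and parser pattern described above can be pictured with a small, dependency-free sketch. Everything here is hypothetical: `fake_model` returns a canned string instead of calling an LLM, and the function names are not LangChain APIs.

```python
import json


def format_prompt(template, **values):
    """Fill the placeholders of a prompt template."""
    return template.format(**values)


def fake_model(prompt):
    """Stand-in for an LLM call; always returns the same canned JSON string."""
    return '{"model": "GX-780", "issue_count": 2}'


def parse_output(raw):
    """Turn the model's raw string into a Python dict."""
    return json.loads(raw)


def run_pipeline(template, **values):
    """Prompt -> model -> parser, chained as three reusable steps."""
    prompt = format_prompt(template, **values)
    raw = fake_model(prompt)
    return parse_output(raw)


result = run_pipeline(
    "Extract the model name from: {text}",
    text="Our server, model GX-780, is down.",
)
print(result)
```

Each step can be swapped independently, for example by replacing `fake_model` with a real API call, which is the kind of reuse these abstractions aim for.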

### **Using Gemini**

**Gemini Model (by Google DeepMind)**  
Gemini is a family of multimodal AI models developed by Google DeepMind, designed for text, image, audio, and code processing. Built on a transformer-based architecture, it is designed for reasoning, retrieval-augmented generation (RAG), and advanced problem-solving.  

Key features:  
- **Multimodal Capabilities:** Understands and generates text, images, and videos.  
- **Integration with Google Tools:** Works with Search, Docs, and Workspace apps.  
- **Variants:** Gemini 1, 1.5, and future iterations optimized for efficiency.  
- **Fine-tuning & API Access:** Available via Google Cloud for enterprise use.  

To access the Gemini API, visit https://aistudio.google.com/apikey and create an API key to get started.

Gemini competes with OpenAI’s GPT-4 and is optimized for real-world applications like research, education, and business automation.

#### **Model Setup and Configuration**

Explanation of the code block below -
- Imports the required libraries (os for environment variables and google.generativeai for the Gemini API).
- Configures the Gemini API by retrieving the API key from the GEMINI_API_KEY environment variable.

In [2]:
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
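
Indexing `os.environ` directly raises a bare `KeyError` when the variable is missing. A small helper can give a clearer message; this is a sketch only, and `require_env` and `DEMO_API_KEY` are hypothetical names used for the demonstration.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a readable error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; export it before running the notebook."
        )
    return value


# Demonstration with a placeholder value; a real key should be set outside the notebook.
os.environ.setdefault("DEMO_API_KEY", "placeholder-value")
print(require_env("DEMO_API_KEY"))
```

With this in place, the key could be fetched as `require_env("GEMINI_API_KEY")` before it is passed to `genai.configure`.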

- Defines a dictionary generation_config to configure the model’s behavior, setting parameters like:
  - temperature (0.2) → Controls randomness (lower = more deterministic).
  - top_p (0.95) & top_k (40) → Control sampling diversity.
  - max_output_tokens (8192) → Limits response length.
  - response_mime_type → Ensures text output format.

- Initializes the llm_model as gemini-1.5-flash with the defined configuration.

In [3]:
# Loading the model
generation_config = {
  "temperature":0.2,
  "top_p": 0.95,
  "top_k": 40,
  "max_output_tokens": 8192,
  "response_mime_type": "text/plain",
}


llm_model = genai.GenerativeModel(
  model_name="gemini-1.5-flash",
  generation_config=generation_config,
)


The above code sets up the Gemini model with the specified generation settings, so that its outputs are more controlled and predictable.
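
To build some intuition for temperature, top_k, and top_p, here is a toy sketch of how such settings are commonly applied to a token distribution. It is a simplified illustration with made-up logits, not Gemini's actual sampling code, and the function names are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the shortest prefix whose mass reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}  # renormalize the survivors


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
for t in (1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

filtered = top_k_top_p_filter(softmax_with_temperature(logits, 1.0), top_k=3, top_p=0.8)
print(filtered)
```

A lower temperature concentrates probability on the top token, while top_k and top_p discard unlikely candidates before sampling. This sketch divides by the temperature, so it does not handle a temperature of exactly 0.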

- Defines get_completion(prompt, model=llm_model) to send a prompt to Gemini and return the response.

- Calls get_completion("What is 1+1?") and prints the response and token usage.

#### **Model Usage in simplest format**

In [4]:
def get_completion(prompt, model=llm_model):
    # Use the model passed in, so callers can swap in a different one
    response = model.generate_content(prompt)

    return response


response = get_completion("What is 1+1?")

In [5]:
print('Response is : ', response.candidates[0].content.parts[0])
print('Token Usage : ', response.usage_metadata.total_token_count)

Response is :  text: "1 + 1 = 2\n"

Token Usage :  15


Now, let's say we have a customer email and we want to rewrite it in a predefined style. The steps are -
- Define a customer_email containing a complaint.
- Specify a style variable (American English in an angry, rude tone is used here).
- Construct a formatted prompt instructing Gemini to rewrite customer_email in the defined style.
- Call get_completion(prompt) and print the transformed response with token usage.

In [6]:
customer_email = """
Our server, model GX-780, is experiencing critical downtime. We're seeing memory DIMM errors on slot 4, specifically part number 16GB-DDR4-2933. The iLO is reporting a network connection failure, with a 50% packet loss rate on port 2. We need immediate on-site support to replace the faulty DIMM and troubleshoot the network issue affecting the iLO."""


style = """American English in a very angry and rude tone"""

In [7]:
prompt = f"""Translate the text that is delimited by triple backticks into a style that is {style}.
text: ```{customer_email}```
"""

print(prompt)  # This is how the prompt looks with your defined style and customer mail

Translate the text that is delimited by triple backticks into a style that is American English in a very angry and rude tone.
text: ```
Our server, model GX-780, is experiencing critical downtime. We're seeing memory DIMM errors on slot 4, specifically part number 16GB-DDR4-2933. The iLO is reporting a network connection failure, with a 50% packet loss rate on port 2. We need immediate on-site support to replace the faulty DIMM and troubleshoot the network issue affecting the iLO.```



In [8]:
response_1 = get_completion(prompt)

In [9]:
print('Response is : ', response_1.candidates[0].content.parts[0])
print('\n')
print('Token Usage : ', response_1.usage_metadata.total_token_count)

Response is :  text: "ARE YOU KIDDING ME?!  That WORTHLESS GX-780 SERVER is DOWN AGAIN!  DIMM slot 4 is FRIED – a freakin\' 16GB-DDR4-2933 piece of junk! And the iLO?  It\'s decided to take a nap, with a pathetic 50% packet loss on port 2!  Get your sorry butts over here NOW and fix this garbage before I personally replace every single one of you with a toaster oven that\'s probably more reliable!  We need someone ON-SITE, and we need them YESTERDAY!\n"



Token Usage :  249


The customer email has now been rewritten in a very rude style, as you can see above. Can the same prompt handle text written in another language? Let's see -

In [10]:
# Email is stated in Bengali language now
customer_email_in_bengali = """আমাদের সার্ভার, মডেল GX-780, গুরুতর ডাউনটাইমের সম্মুখীন হচ্ছে। আমরা স্লট 4-এ, বিশেষ করে পার্ট নম্বর 16GB-DDR4-2933-এ মেমরি DIMM ত্রুটি দেখতে পাচ্ছি। iLO একটি নেটওয়ার্ক সংযোগ ব্যর্থতার রিপোর্ট করছে, পোর্ট 2-এ 50% প্যাকেট ক্ষতির হার রয়েছে। ত্রুটিপূর্ণ DIMM প্রতিস্থাপন এবং iLO-কে প্রভাবিত করে এমন নেটওয়ার্ক সমস্যা সমাধানের জন্য আমাদের তাৎক্ষণিকভাবে সাইট সহায়তা প্রয়োজন।"""

style = """American English in a very soft and polite tone"""  # Now we are using the style of soft and polite

In [11]:
prompt = f"""Translate the text that is delimited by triple backticks into a style that is {style}.
text: ```{customer_email_in_bengali}```
"""

response_2 = get_completion(prompt)


print('Response is : ', response_2.candidates[0].content.parts[0])
print('\n')
print('Token Usage : ', response_2.usage_metadata.total_token_count)

Response is :  text: "Our server, model GX-780, is experiencing a significant outage.  We\'re seeing a memory DIMM error in slot 4, specifically part number 16GB-DDR4-2933.  The iLO is reporting a network connectivity failure, with a 50% packet loss rate on port 2. We would be very grateful for immediate on-site assistance to replace the faulty DIMM and resolve the network issue affecting the iLO.\n"



Token Usage :  341


Gemini can translate between languages using this method, but check the documentation for supported languages. So far, we have defined a function for generating responses, tested it with a simple query, and then rewritten a customer complaint in a specific style.

### **Using Gemini with LangChain**

Using LangChain for the above tasks simplifies API calls, prompt management, and response parsing. It enables structured workflows, caching, and memory for conversation history, improving efficiency. LangChain can enhance this use case by integrating **retrieval-augmented generation (RAG)** for knowledge-based responses, **output parsers** for structured results, and **agents/tools** for dynamic prompt adjustments. It also supports **multi-model orchestration**, allowing seamless switching between Gemini and other LLMs.

In [12]:
# You will need to install the google genai langchain library
#!pip install -qU langchain-google-genai

Setup: Imports GoogleGenerativeAI from langchain_google_genai, sets the GOOGLE_API_KEY environment variable, and initializes the Gemini gemini-1.5-flash model with temperature 0 for more deterministic responses.

In [13]:
# Langchain with Gemini

from langchain_google_genai import GoogleGenerativeAI
import os

In [15]:
# langchain-google-genai looks for the key in the GOOGLE_API_KEY environment variable,
# so copy it over from GEMINI_API_KEY (the variable used earlier)
os.environ["GOOGLE_API_KEY"] = os.environ["GEMINI_API_KEY"]

# If this variable is not set, invoking the model in the next block raises an error.
# Once it is set, the llm object shows google_api_key=SecretStr('**********'),
# confirming that the model has picked up the key.

In [16]:
# Let's initialize the model first

llm = GoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0   # Setting temperature to 0 for more deterministic output
)

llm

GoogleGenerativeAI(model='gemini-1.5-flash', google_api_key=SecretStr('**********'), temperature=0.0, client=ChatGoogleGenerativeAI(model='models/gemini-1.5-flash', google_api_key=SecretStr('**********'), temperature=0.0, client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7eb6c4fdde90>, default_metadata=()))

The template string below defines a prompt structure for an LLM. It instructs the model to translate text into a specific style. {style} and {text} are placeholders that will be replaced with actual values later. The text to be translated is enclosed within triple backticks (```) to ensure clarity and prevent misinterpretation by the model.

It makes the prompt reusable and dynamic for different inputs. It helps in maintaining structure while customizing the style and text for various use cases.

In [17]:
template_string = """Translate the text that is delimited by triple backticks into a style that is {style}.
text: ```{text}```
"""

ChatPromptTemplate is a LangChain utility that formats prompts dynamically. It allows for parameterized prompt creation where different values can be inserted into predefined templates. It helps standardize prompt structures for LLM calls, and makes it easy to apply the same logic across multiple text transformations.

.from_template(template_string) converts the template_string into a ChatPromptTemplate instance. This enables easy formatting by replacing {style} and {text} dynamically. The resulting object (prompt_template) can now be used to generate formatted prompts.

This avoids manually constructing prompts for every input, and makes it scalable for various use cases like translation, summarization, or sentiment analysis.

In [18]:
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(template_string)

In [19]:
prompt_template  # This is how the prompt template instance for your template string looks

ChatPromptTemplate(input_variables=['style', 'text'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['style', 'text'], input_types={}, partial_variables={}, template='Translate the text that is delimited by triple backticks into a style that is {style}. \ntext: ```{text}```\n'), additional_kwargs={})])

In [20]:
prompt_template.messages[0].prompt

PromptTemplate(input_variables=['style', 'text'], input_types={}, partial_variables={}, template='Translate the text that is delimited by triple backticks into a style that is {style}. \ntext: ```{text}```\n')
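
The `input_variables=['style', 'text']` list in the output above can be reproduced with the standard library alone. The sketch below is only an analogy for how placeholders can be discovered and filled; LangChain's own implementation may differ, and `placeholder_names` is a made-up helper working on a simplified template.

```python
from string import Formatter


def placeholder_names(template):
    """Return the sorted, de-duplicated placeholder names found in a format string."""
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)


# Simplified version of the template used in this post
template = "Rewrite the following text in a style that is {style}.\ntext: {text}\n"
print(placeholder_names(template))

filled = template.format(style="British English in a very angry tone", text="The server is down.")
print(filled)
```

Keeping the template separate from its values is what lets one prompt be reused with many styles and texts.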

Customer Complaint Processing: Converts the customer complaint into British English in an angry tone using format_messages from the ChatPromptTemplate.

.format_messages() replaces {style} and {text} in prompt_template with actual values:
- customer_style: Defines the desired tone/style for the translation (e.g., "British English in a very angry tone").
- customer_email: The text that needs to be transformed.

The formatted output (customer_messages) is now ready to be sent to an LLM for processing. This automates text transformation in a structured manner, and ensures consistent, reproducible outputs for different inputs.

In [21]:
customer_style = """British English in a very angry tone"""

customer_email = """
Our server, model GX-780, is experiencing critical downtime. We're seeing memory DIMM errors on slot 4, specifically part number 16GB-DDR4-2933. The iLO is reporting a network connection failure, with a 50% packet loss rate on port 2. We need immediate on-site support to replace the faulty DIMM and troubleshoot the network issue affecting the iLO.
"""

customer_messages = prompt_template.format_messages(style = customer_style, text = customer_email)

In [22]:
customer_messages

[HumanMessage(content="Translate the text that is delimited by triple backticks into a style that is British English in a very angry tone. \ntext: ```\nOur server, model GX-780, is experiencing critical downtime. We're seeing memory DIMM errors on slot 4, specifically part number 16GB-DDR4-2933. The iLO is reporting a network connection failure, with a 50% packet loss rate on port 2. We need immediate on-site support to replace the faulty DIMM and troubleshoot the network issue affecting the iLO.\n```\n", additional_kwargs={}, response_metadata={})]

In [23]:
print(type(customer_messages))
print(type(customer_messages[0]))

<class 'list'>
<class 'langchain_core.messages.human.HumanMessage'>


Sending it to the LLM for processing via llm.invoke() -

In [24]:
customer_response = llm.invoke(customer_messages)

In [25]:
customer_response

"Right, listen here, you lot!  Our blasted GX-780 server's gone belly up!  We've got catastrophic downtime, thanks to a flipping memory DIMM error on slot four – the 16GB-DDR4-2933 blighter, specifically!  And the iLO?  It's decided to have a complete hissy fit, reporting a network connection failure with a scandalous 50% packet loss on port two!  We need someone, *anyone*, on-site this instant to replace that infernal DIMM and sort out this bloody network mess before I lose my utterly livid mind!  Get a move on!"

Similarly, we will perform the Service Reply Transformation: A customer service response is rewritten in a teasing, joking tone using the same prompt template.

In [26]:
service_reply = """Thank you for reporting the server issues on your GX-780. We understand the severity of the downtime. Our technicians are currently investigating the DIMM error on slot 4, part 16GB-DDR4-2933, and the network issues affecting the iLO port 2. We've dispatched a technician to your location with replacement DIMMs. They'll also run diagnostics on the network connectivity to isolate and resolve the packet loss. We'll provide updates every 30 minutes until the server is fully operational.
"""

service_style = """a teasing and joking tone that speaks in English"""


service_messages = prompt_template.format_messages(style = service_style, text = service_reply)


service_response = llm.invoke(service_messages)

In [28]:
print(service_response)

Oh, honey, your GX-780 decided to throw a little hissy fit, huh?  We heard the screams – the *digital* screams, that is.  Turns out, it's got a case of the "DIMM-s" (get it?  DIMM-s?  Because it's a DIMM error?!  Okay, I'll stop).  Slot 4 is feeling a little left out, apparently, and that 16GB-DDR4-2933 stick is staging a full-blown rebellion.  And the iLO port 2?  Drama queen.  Total packet loss.  The whole thing's a soap opera.

But don't you worry your pretty little head! We've sent in the cavalry – or, you know, a technician.  They're bringing backup DIMMs (because apparently, one rebellious stick wasn't enough drama).  They'll also give that network connection a good talking-to.  We'll keep you updated every 30 minutes, so you can live-tweet this whole tech-support saga.  Stay tuned for more exciting developments!


#### **Using LangChain's Output Parser**

Now, let's say we want the output as JSON or a dictionary, where each key is a piece of information we want to find and each value is what the email reports for it. We can create a broad template for this as below -

In [41]:
customer_review = """\
Our server, model GX-780, is experiencing critical downtime. We're seeing memory DIMM errors on slot 4, specifically part number 16GB-DDR4-2933. \
The iLO is reporting a network connection failure, with a 50% packet loss rate on port 2. We need immediate on-site support to replace the faulty DIMM \
and troubleshoot the network issue affecting the iLO.
"""

review_template = """\
For the following text, extract the following information:

model: Is any model name specified? Answer the model name directly if yes, False if not or unknown.

issue_count: How many issues are reported for the product? If this information is found then give the number, else -1.

parts_reported: Which are all the part names that got reported? Give all the names. If no part names found then reply with NA.

Format the output as JSON with the following keys:
model
issue_count
parts_reported

text: {text}
"""


# Passing the review_template in the ChatPromptTemplate to create an instance of it
prompt_template = ChatPromptTemplate.from_template(review_template)


# Passing the customer review in the prompt template instance with the text as the placeholder
messages = prompt_template.format_messages(text=customer_review)


# Invoking model
response_3 = llm.invoke(messages)

In [42]:
print(response_3)

```json
{
  "model": "GX-780",
  "issue_count": 2,
  "parts_reported": ["16GB-DDR4-2933", "iLO"]
}
```


In [43]:
type(response_3)

str

The above output is a string, not a JSON object or a dictionary, so we need a workaround.
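
One manual workaround is to strip the Markdown fence and hand the rest to `json.loads`. The sketch below uses only the standard library; `parse_fenced_json` is a hypothetical helper, and this is not necessarily how LangChain's parsers work internally.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed


def parse_fenced_json(text):
    """Extract the first JSON object from a string, with or without a Markdown code fence."""
    pattern = re.compile(
        re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
    )
    match = pattern.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


# Same shape as the model reply printed above
raw = (
    FENCE
    + 'json\n{\n  "model": "GX-780",\n  "issue_count": 2,\n'
    + '  "parts_reported": ["16GB-DDR4-2933", "iLO"]\n}\n'
    + FENCE
)
parsed = parse_fenced_json(raw)
print(type(parsed), parsed["parts_reported"])
```

This works when the model returns valid JSON, but it gives the model no guidance on the expected format, which is the gap the schema-based approach addresses.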

- ResponseSchema: Defines the expected output structure by specifying what fields to extract and their descriptions.

- StructuredOutputParser: Ensures that the output follows the defined schema and is formatted correctly.

In [49]:
from langchain.output_parsers import ResponseSchema, StructuredOutputParser


# Defines a schema named "model". The description instructs the LLM to extract the model name if mentioned in the input text. If no model name is found, it should return "False".
model_schema = ResponseSchema(name="model",
                             description="Is any model name specified? Answer the model name directly if yes, False if not or unknown.")

issue_count_schema = ResponseSchema(name="issue_count",
                                      description="How many issues are reported for the product? If this information is found then give the number, else -1.")

parts_reported_schema = ResponseSchema(name="parts_reported",
                                    description="This tells about the issues reported in the text? Give all the names. If no part names found then reply with NA.")


# Stores the defined schemas in a list to be used for structured parsing
response_schemas = [model_schema,
                    issue_count_schema,
                    parts_reported_schema]


# Creates an instance of StructuredOutputParser, ensuring that the extracted information follows the defined schema
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)


# Generates formatting instructions for the LLM to structure its output properly
format_instructions = output_parser.get_format_instructions()


print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"model": string  // Is any model name specified? Answer the model name directly if yes, False if not or unknown.
	"issue_count": string  // How many issues are reported for the product? If this information is found then give the number, else -1.
	"parts_reported": string  // This tells about the issues reported in the text? Give all the names. If no part names found then reply with NA.
}
```


- Defines a prompt template that instructs the LLM on how to extract the required details.

- Uses {text} as a placeholder for the input text (product review).

- Uses {format_instructions} to enforce structured output formatting.

In [50]:
review_template_2 = """\
For the following text, extract the following information:

model: Is any model name specified? Answer the model name directly if yes, False if not or unknown.

issue_count: How many issues are reported for the product? If this information is found then give the number, else -1.

parts_reported: Tell the issues reported? Give all the names. If no part names found then reply with NA.

text: {text}

{format_instructions}
"""


# Converts the review_template_2 string into a ChatPromptTemplate, making it reusable for different inputs
prompt = ChatPromptTemplate.from_template(template=review_template_2)



# Fills in {text} with the actual customer review (customer_review)
# Inserts the format instructions into {format_instructions}
# Generates a properly formatted LLM input message
messages = prompt.format_messages(text=customer_review,
                                format_instructions=format_instructions)


response_4 = llm.invoke(messages)

In [51]:
print(response_4)

```json
{
  "model": "GX-780",
  "issue_count": "2",
  "parts_reported": "memory DIMM, iLO"
}
```


In [52]:
output_dict = output_parser.parse(response_4)

print(output_dict)

{'model': 'GX-780', 'issue_count': '2', 'parts_reported': 'memory DIMM, iLO'}


In [53]:
type(output_dict)

dict

Thus we got the dictionary type as output!
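
Note that every value in the parsed dictionary is still a string (for example `'2'`). If downstream code needs real types, a small post-processing step can convert them. The sketch below assumes the field names and sentinel values (`False`, `-1`, `NA`) from the prompt above; `normalize_review` is a made-up helper.

```python
def normalize_review(parsed):
    """Convert the string fields returned by the parser into richer Python types."""
    model = parsed.get("model")
    # The prompt asks for the literal False when no model name is found.
    model = None if str(model).strip().lower() == "false" else model

    try:
        issue_count = int(parsed.get("issue_count", -1))
    except (TypeError, ValueError):
        issue_count = -1  # same sentinel the prompt uses for "not found"

    parts_raw = str(parsed.get("parts_reported", "NA")).strip()
    if parts_raw.upper() == "NA":
        parts = []
    else:
        parts = [p.strip() for p in parts_raw.split(",") if p.strip()]

    return {"model": model, "issue_count": issue_count, "parts_reported": parts}


output_dict = {"model": "GX-780", "issue_count": "2", "parts_reported": "memory DIMM, iLO"}
normalized = normalize_review(output_dict)
print(normalized)
```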

### **Using Open Source Model**

You might not have an API key to use Gemini, but no worries! There are plenty of open-source models available. While their output may not be as refined as Gemini or GPT, they can still deliver accurate results. With proper model tuning, you can enhance their performance and achieve more precise, targeted outputs.

All the concepts remain the same; we are just using the Llama instruct model from GPT4All. So what is GPT4All?

GPT4All is an open-source ecosystem for running large language models (LLMs) locally, designed for accessibility and efficiency. It lets users run models on personal devices without requiring cloud-based APIs, and it supports multiple model families, including Meta’s LLaMA, MPT, Falcon, and GPT-J, optimized for different use cases.

Key features of GPT4All:
- Local Execution: Runs entirely on a personal machine without internet dependency.
- Privacy-Focused: No external data transmission, ensuring data security.
- Customizable & Extendable: Supports fine-tuning and model customization.
- Lightweight & Efficient: Optimized for consumer-grade hardware.

So, GPT4All is ideal for users who need local, private, and free AI models, whereas OpenAI's GPT models offer superior performance, cloud accessibility, and commercial-grade capabilities.

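We need to load the model first. I am picking the Llama instruct model. Install the Python bindings with !pip install llama-cpp-python (Llama.from_pretrained also needs the huggingface_hub package to download the weights).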

#### **Using Llama from GPT4All Directly**

In [55]:
#!pip install llama-cpp-python

In [56]:
from llama_cpp import Llama

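llm_model = Llama.from_pretrained(
    repo_id="GPT4All-Community/Meta-Llama-3.1-8B-Instruct-128k-GGUF",
    filename="Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf",
    # Note: the log below reports a 512-token context, so this keyword does not seem to take effect;
    # llama-cpp-python's context-size parameter is n_ctx (e.g. n_ctx=4096)
    n_ctx_per_seq=128000,
    verbose=False,
)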


llama_init_from_model: n_ctx_per_seq (512) < n_ctx_train (131072) -- the full capacity of the model will not be utilized


How you request a completion and read the response varies slightly from model to model, so refer to the documentation for specifics. For the LLaMA models from GPT4All loaded this way, the structure is quite similar to OpenAI's chat completion format.
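
The OpenAI-style response shape can be pictured with a hand-written dictionary. The sample below is illustrative only: the nesting mirrors the `['choices'][0]['message']['content']` lookup used in this post, the token counts are invented, and `extract_content` is a hypothetical helper.

```python
def extract_content(response):
    """Pull the assistant text out of an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample with the same nesting; real responses carry more fields.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "1 + 1 = 2"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 8, "completion_tokens": 7, "total_tokens": 15},  # invented numbers
}
content = extract_content(sample_response)
print(content)
```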

In [57]:
def get_completion(prompt, model=llm_model):
    messages = [{"role": "user", "content": prompt}]
    response = model.create_chat_completion(
        messages=messages,
        temperature=0,
    )
    return response['choices'][0]['message']['content']

In [58]:
response = get_completion("What is 1+1?")
response

'1 + 1 = 2'

In [59]:
response = get_completion("Can you work for me? Tell me within 10 words")
response

'I can provide information, answer questions, and assist with tasks.'

In [64]:
customer_email = """
Our server, model GX-780, is experiencing critical downtime. We're seeing memory DIMM errors on slot 4, specifically part number 16GB-DDR4-2933. The iLO is reporting a network connection failure, with a 50% packet loss rate on port 2. We need immediate on-site support to replace the faulty DIMM and troubleshoot the network issue affecting the iLO.
"""

style = """American English in a very soft tone"""

In [65]:
prompt = f"""Translate the text that is delimited by triple backticks into a style that is {style}.
text: ```{customer_email}```
"""

response = get_completion(prompt)


response

"Here is the translated text in a soft tone and American English style:\n\n```\nOur server, the model GX-780, is having a bit of a rough day. We're noticing some memory issues on slot 4, specifically with a 16GB DDR4 memory module that's just not cooperating. The iLO (that's our remote monitoring system, for those who don't know) is also giving us some trouble, with a network connection that's just not quite working right - we're seeing a 50% packet loss rate on port 2. We could really use some help getting this sorted out, so we're hoping someone can come on-site to take a look and see what's going on. Maybe we just need to swap out that one memory module and get everything up and running smoothly again."

In [69]:
# Email is stated in Bengali language now
customer_email_in_bengali = """আমাদের সার্ভার, মডেল GX-780, গুরুতর ডাউনটাইমের সম্মুখীন হচ্ছে। আমরা স্লট 4-এ, বিশেষ করে পার্ট নম্বর 16GB-DDR4-2933-এ মেমরি DIMM ত্রুটি দেখতে পাচ্ছি। iLO একটি নেটওয়ার্ক সংযোগ ব্যর্থতার রিপোর্ট করছে, পোর্ট 2-এ 50% প্যাকেট ক্ষতির হার রয়েছে। ত্রুটিপূর্ণ DIMM প্রতিস্থাপন এবং iLO-কে প্রভাবিত করে এমন নেটওয়ার্ক সমস্যা সমাধানের জন্য আমাদের তাৎক্ষণিকভাবে সাইট সহায়তা প্রয়োজন।"""

style = """American English in a very funny tone"""

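# Note: as written, this prompt interpolates customer_email (the English text defined earlier),
# not customer_email_in_bengali; swap the variable in the f-string to send the Bengali email.
prompt = f"""Translate the text that is delimited by triple backticks into a style that is {style}.
text: ```{customer_email}```
"""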

response = get_completion(prompt)


response

'```\nOH NOEZ, our fancy server, the GX-780, is DOWN FOR THE COUNT! We\'re talkin\' critical downtime, folks! It\'s like, the ultimate bummer. And the reason? Well, it\'s not because we ate all the donuts in the break room (although, let\'s be real, that\'s a pretty good reason). Nope, it\'s because of a pesky memory DIMM error on slot 4. Specifically, it\'s that fancy 16GB-DDR4-2933 thingy that\'s just not cooperating. And if that wasn\'t enough, our iLO (that\'s "intelligent" for you non-techies) is all like, "Uh, hello? I\'m trying to connect to the network over here, but it\'s not working out." And the cherry on top? A 50% packet loss rate on port 2. Yeah, that\'s just peachy. So, what\'s the plan? We need some superhero on-site support to come and save the day (or at least our server). Time to swap out that faulty DIMM and get the iLO back in the game!'

#### **Using LangChain with Llama**

In [68]:
#!pip install --upgrade langchain
#!pip install langchain_community

Download the model first

In [67]:
from huggingface_hub import hf_hub_download

model_name = "GPT4All-Community/Meta-Llama-3.1-8B-Instruct-128k-GGUF"
model_file = "Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf"

# Download the model from Hugging Face
model_path = hf_hub_download(repo_id=model_name, filename=model_file)

print(f"Model downloaded to: {model_path}")

Model downloaded to: /root/.cache/huggingface/hub/models--GPT4All-Community--Meta-Llama-3.1-8B-Instruct-128k-GGUF/snapshots/350b6d7f3a2224c98b6dc8ebdce0e290b71cae22/Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf


In [70]:
from langchain_community.llms import LlamaCpp

# Load the GGUF model
llm = LlamaCpp(
    model_path=model_path,
    temperature=0.2,        # Adjust creativity
    max_tokens=128,         # Output length
    top_p=0.5,              # Sampling parameter
    n_ctx=4096,             # Context window (adjust based on available RAM)
    n_threads=4,            # Set based on CPU cores
    verbose=False
)

llama_init_from_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_init_from_model: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
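
With `max_tokens=128`, a completion can be cut off mid-sentence, and a local model may also keep generating after it has closed the triple backticks. One possible clean-up, sketched here with the standard library, is to keep only the text inside the first pair of backticks. `extract_first_fenced_block` and the sample string are assumptions made for this example.

```python
FENCE = "`" * 3  # three backticks


def extract_first_fenced_block(text):
    """Return the text inside the first pair of triple backticks, or the stripped text if none."""
    start = text.find(FENCE)
    if start == -1:
        return text.strip()
    start += len(FENCE)
    end = text.find(FENCE, start)
    # If the closing fence is missing (output cut off), keep everything after the opener.
    body = text[start:] if end == -1 else text[start:end]
    return body.strip()


# Made-up reply that continues past the closing backticks
raw = (
    "translation: " + FENCE + "\nOur server is currently down.\n" + FENCE
    + " markdown\n# Server Downtime\n\nOur server, model"
)
trimmed = extract_first_fenced_block(raw)
print(trimmed)
```

Raising `max_tokens` addresses truncation itself; this helper only discards text generated after the answer.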


In [71]:
from langchain.prompts import ChatPromptTemplate


template_string = """Translate the text that is delimited by triple backticks into a style that is {style}. \
text: ```{text}```
"""

prompt_template = ChatPromptTemplate.from_template(template_string)


customer_style = """English in a soft tone"""


customer_email = """
Our server, model GX-780, is experiencing critical downtime. We're seeing memory DIMM errors on slot 4, specifically part number 16GB-DDR4-2933. The iLO is reporting a network connection failure, with a 50% packet loss rate on port 2. We need immediate on-site support to replace the faulty DIMM and troubleshoot the network issue affecting the iLO.
"""

customer_messages = prompt_template.format_messages(
                    style=customer_style,
                    text=customer_email)


customer_response = llm.invoke(customer_messages)

In [72]:
print(customer_response)

translation: ```
Our server, model GX-780, is currently experiencing a critical shutdown. We're seeing an error with one of the memory modules on slot 4. The part number for this module is 16GB-DDR4-2933. Additionally, our iLO (intelligent management system) is reporting that it's lost connection to the network and is experiencing a high packet loss rate on port 2. We need immediate assistance from an on-site technician to replace the faulty memory module and troubleshoot the network issue affecting the iLO.
``` markdown
# Server Downtime

## Critical Shutdown

Our server, model


In [73]:
service_reply = """Thank you for reporting the server issues on your GX-780. We understand the severity of the downtime. Our technicians are currently investigating the DIMM error on slot 4, part 16GB-DDR4-2933, and the network issues affecting the iLO port 2. We've dispatched a technician to your location with replacement DIMMs. They'll also run diagnostics on the network connectivity to isolate and resolve the packet loss. We'll provide updates every 30 minutes until the server is fully operational.
"""

service_style = """a teasing and joking tone that speaks in English"""


service_messages = prompt_template.format_messages(
    style=service_style,
    text=service_reply)

In [74]:
print(service_messages[0].content)

Translate the text that is delimited by triple backticks into a style that is a teasing and joking tone that speaks in English. text: ```Thank you for reporting the server issues on your GX-780. We understand the severity of the downtime. Our technicians are currently investigating the DIMM error on slot 4, part 16GB-DDR4-2933, and the network issues affecting the iLO port 2. We've dispatched a technician to your location with replacement DIMMs. They'll also run diagnostics on the network connectivity to isolate and resolve the packet loss. We'll provide updates every 30 minutes until the server is fully operational.
```
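The printed message shows that `format_messages` substitutes `style` and `text` into the template placeholders. As a rough mental model, the same substitution with Python's built-in `str.format` looks like the sketch below. `ChatPromptTemplate` additionally wraps the result in a message object, which this example leaves out, and `fill_template` is a hypothetical helper.

```python
FENCE = "`" * 3  # triple backticks, built this way so they do not end this code block

template_string = (
    "Translate the text that is delimited by triple backticks "
    "into a style that is {style}. "
    "text: " + FENCE + "{text}" + FENCE
)


def fill_template(style, text):
    """Substitute the two placeholders, as the prompt template does."""
    return template_string.format(style=style, text=text)


prompt_text = fill_template(
    style="a teasing and joking tone that speaks in English",
    text="Thank you for reporting the server issues on your GX-780.",
)
print(prompt_text)
```

Keeping the template separate from the values is what lets the same `prompt_template` serve both the customer email and the service reply.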



In [75]:
service_response = llm.invoke(service_messages)

In [76]:
print(service_response)

Here is the rewritten text in a teasing and joking tone:
```
Hey there, server superstar! We heard you were having some issues with your trusty GX-780. Don't worry, we've got this!

Our tech wizards are on it, investigating the mysterious DIMM error on slot 4 (part 16GB-DDR4-2933 - yeah, that's a mouthful!). And don't even get us started on the network issues affecting the iLO port 2. It's like your server is trying to play a game of "I'm not listening"!

But fear not, dear server owner! We
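Both responses wrap the rewritten text in triple backticks, and both stop mid-sentence, most likely because `max_tokens=128` caps the generation. The first one also runs on with unrelated markdown after the closing backticks. A small post-processing step can keep just the fenced part. The helper below is a sketch; `extract_fenced_text` is a hypothetical name, and it does not handle a language tag placed right after the opening backticks.

```python
FENCE = "`" * 3  # triple backticks, built this way so they do not end this code block


def extract_fenced_text(response):
    """Return the text inside the first triple-backtick block.

    If the closing fence is missing (for example because generation was cut
    off), return everything after the opening fence. If there is no fence at
    all, return the stripped response unchanged.
    """
    start = response.find(FENCE)
    if start == -1:
        return response.strip()
    body_start = start + len(FENCE)
    end = response.find(FENCE, body_start)
    body = response[body_start:] if end == -1 else response[body_start:end]
    return body.strip()
```

Applied to `customer_response`, this would drop the `translation:` prefix and the trailing markdown. Raising `max_tokens` is still needed to get the complete rewritten text.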


Now let's try to get the output as a dictionary with this open-source model:

In [78]:
from langchain.output_parsers import ResponseSchema, StructuredOutputParser


# Defines a schema named "model". The description instructs the LLM to extract the model name if mentioned in the input text. If no model name is found, it should return "False".
model_schema = ResponseSchema(name="model",
                             description="Is any model name specified? Answer the model name directly if yes, False if not or unknown.")

issue_count_schema = ResponseSchema(name="issue_count",
                                      description="How many issues are reported for the product? If this information is found then give the number, else -1.")

parts_reported_schema = ResponseSchema(name="parts_reported",
                                    description="This tells about the issues reported in the text? Give all the names. If no part names found then reply with NA.")


# Stores the defined schemas in a list to be used for structured parsing
response_schemas = [model_schema,
                    issue_count_schema,
                    parts_reported_schema]


# Creates an instance of StructuredOutputParser, ensuring that the extracted information follows the defined schema
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)


# Generates formatting instructions for the LLM to structure its output properly
format_instructions = output_parser.get_format_instructions()

In [84]:
review_template_2 = """\
For the following text, extract the following information:

model: Is any model name specified? Answer the model name directly if yes, False if not or unknown.

issue_count: How many issues are reported for the product? If this information is found then give the number, else -1.

parts_reported: Tell the issues or the server parts reported? Give all the names of every issues or parts. If no part names found then reply with NA.

text: {text}

{format_instructions}
"""


prompt = ChatPromptTemplate.from_template(template=review_template_2)



messages = prompt.format_messages(text=customer_review,
                                format_instructions=format_instructions)


In [85]:
response_4 = llm.invoke(messages)

In [86]:
output_dict_2 = output_parser.parse(response_4)

In [87]:
output_dict_2

{'model': 'GX-780',
 'issue_count': 2,
 'parts_reported': 'memory DIMM, part number 16GB-DDR4-2933'}
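The parser can return a dictionary because the format instructions ask the model to answer with a JSON object inside a `json`-tagged code fence. To illustrate roughly what happens to the raw string, here is a standard-library sketch that pulls out such a block, decodes it, and checks for the expected keys. It is a simplified stand-in, not LangChain's actual implementation, and `parse_structured_reply` is a hypothetical name.

```python
import json
import re

FENCE = "`" * 3  # triple backticks, built this way so they do not end this code block
# Match an optional "json" tag after the opening fence, then capture the body.
JSON_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_structured_reply(reply, expected_keys):
    """Decode the JSON block in `reply` and verify that all keys are present."""
    match = JSON_BLOCK.search(reply)
    payload = match.group(1) if match else reply  # fall back to raw JSON
    data = json.loads(payload)
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


sample_reply = (
    FENCE + "json\n"
    '{"model": "GX-780", "issue_count": 2, '
    '"parts_reported": "memory DIMM, part number 16GB-DDR4-2933"}\n' + FENCE
)
result = parse_structured_reply(
    sample_reply, ["model", "issue_count", "parts_reported"]
)
print(result["issue_count"])  # prints 2
```

Because the reply is decoded as JSON, a bare number such as `2` comes back as a Python `int`, which matches the `issue_count` value shown above.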

In [88]:
output_dict_2.get('issue_count')

2

Note that iLO is missing from `parts_reported`, even though the text clearly reports a problem with it. As an exercise, try changing the prompt in the review template and the descriptions in the response schemas, and see whether you can get iLO included in `parts_reported` as well.
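When experimenting with different prompts, a small check makes it easy to see at a glance which expected terms are still missing from the parsed output. The sketch below uses a hypothetical helper, `missing_terms`, and copies the dictionary from the output above. The list of expected terms is an assumption about what a good answer should contain.

```python
def missing_terms(output_dict, expected_terms, field="parts_reported"):
    """Return the expected terms that do not appear in the field (case-insensitive)."""
    value = str(output_dict.get(field, "")).lower()
    return [term for term in expected_terms if term.lower() not in value]


output_dict_2 = {
    "model": "GX-780",
    "issue_count": 2,
    "parts_reported": "memory DIMM, part number 16GB-DDR4-2933",
}

print(missing_terms(output_dict_2, ["DIMM", "iLO"]))  # prints ['iLO']
```

An empty list means every expected term was found, so you can rerun the check after each prompt change until it comes back empty.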