LangChain is an open-source framework for building applications on top of large language models (LLMs). It provides tools and abstractions that help you customize model behavior and improve the accuracy and relevance of the generated output.

At its core, LangChain offers a generic interface compatible with nearly any LLM. This facilitates a centralized development environment where data scientists can seamlessly integrate LLM applications with various external data sources and software workflows. This integration is crucial for those looking to harness the full potential of AI in their processes.

One of the most powerful features of LangChain is its module-based approach. This approach allows flexibility in performing experiments and optimizations of interactions with LLMs. Data scientists can dynamically compare prompts and switch between foundation models without significant code modifications. This saves valuable development time and enhances the ability to fine-tune applications to meet specific needs.


This lab shows how LangChain simplifies integrating LLM capabilities into practical applications. You will learn the core concepts of LangChain and how to use LangChain's features to build more intelligent, responsive, and efficient applications. Whether you are a developer, a data scientist, or an AI enthusiast, this lab will give you a working understanding of how to build AI applications with LangChain.


For this lab, you will be using the following libraries:

*   [`ibm-watsonx-ai`, `ibm-watson-machine-learning`](https://ibm.github.io/watson-machine-learning-sdk/index.html) for using LLMs from IBM's watsonx.ai.
*   [`langchain`, `langchain-ibm`, `langchain-community`, `langchain-experimental`](https://www.langchain.com/) for using relevant features from LangChain.
*   [`pypdf`](https://pypi.org/project/pypdf/) is an open-source, pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.
*   [`chromadb`](https://www.trychroma.com/) is an open-source vector database used to store embeddings.


### Installing required libraries

The following required libraries are __not__ pre-installed in the Skills Network Labs environment. __You must run the following cell__ to install them:

**Note:** The version has been specified here to pin it. It's recommended that you do the same. Even if the library is updated in the future, the installed version will still support this lab work.

The installation might take approximately 2-3 minutes.

Since `%%capture` is being used to capture the installation process, you won't see the output. However, once the installation is complete, you will see a number beside the cell.


In [1]:
%%capture
!pip install --force-reinstall --no-cache-dir tenacity --user
!pip install "ibm-watsonx-ai==1.0.4" --user
!pip install "ibm-watson-machine-learning==1.0.357" --user
!pip install "langchain-ibm==0.1.7" --user
!pip install "langchain-community==0.2.1" --user
!pip install "langchain-experimental==0.0.59" --user
!pip install "langchainhub==0.1.17" --user
!pip install "langchain==0.2.1" --user
!pip install "pypdf==4.2.0" --user
!pip install "chromadb==0.4.24" --user

In [2]:
pip install -U langchain langchain-community openai


Collecting langchain
  Downloading langchain-0.3.25-py3-none-any.whl.metadata (7.8 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.24-py3-none-any.whl.metadata (2.5 kB)
Collecting openai
  Downloading openai-1.78.1-py3-none-any.whl.metadata (25 kB)
Collecting langchain-core<1.0.0,>=0.3.58 (from langchain)
  Downloading langchain_core-0.3.59-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain-text-splitters<1.0.0,>=0.3.8 (from langchain)
  Downloading langchain_text_splitters-0.3.8-py3-none-any.whl.metadata (1.9 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.9.1-py3-none-any.whl.metadata (3.8 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Downloading langchain-0.3.25-py3-none-any.whl (1.0 MB)

## LangChain concepts

### Model

A large language model (LLM) serves as the interface for the AI's capabilities. It processes plain text input and generates text output, forming the core functionality needed to complete various tasks. When integrated with LangChain, it becomes a powerful tool, providing the foundational structure necessary for building and deploying sophisticated AI applications.


In [3]:
from transformers import pipeline

pipe = pipeline("text-generation", model="tiiuae/falcon-rw-1b")
response = pipe("Explain what IoT is in simple words.", max_new_tokens=100)
print(response[0]['generated_text'])


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Explain what IoT is in simple words.
IoT is a network of physical objects that are connected to each other and to a network. These objects can be anything from a car to a refrigerator.
What is IoT?
IoT is a network of physical objects that are connected to each other and to a network. These objects can be anything from a car to a refrigerator.
What is IoT?
IoT is a network of physical objects that are connected to each other and to a network. These objects can


### Chat model

Chat models support the assignment of distinct roles to conversation messages, helping to distinguish messages from the AI, users, and instructions such as system messages.

To use an LLM from watsonx.ai with LangChain, wrap it with `WatsonxLLM()` from `langchain-ibm`, which exposes the model through LangChain's standard interface. The cells below use OpenAI's `ChatOpenAI` chat model instead, so they require an OpenAI API key.


In [4]:
import os

from langchain_community.chat_models import ChatOpenAI
from langchain_core.messages import HumanMessage

# Set your API key here (or export OPENAI_API_KEY in your environment instead)
os.environ["OPENAI_API_KEY"] = ""

# Create the chat model
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

# Run a simple chat: each message carries a role (here, the human user)
messages = [
    HumanMessage(content="Hi, who are you?"),
    HumanMessage(content="Can you explain IoT in simple terms?"),
]

# invoke() replaces the deprecated chat(messages) call style
response = chat.invoke(messages)
print(response.content)




Sure! IoT stands for Internet of Things, and it refers to a network of physical objects or devices that are connected to the internet. These devices can communicate with each other and with other systems to gather and exchange data. For example, smart thermostats, wearable fitness trackers, and connected home appliances are all examples of IoT devices. IoT technology allows for automation, monitoring, and control of these devices remotely, making our lives more convenient and efficient.


In [5]:




import os

from langchain_community.chat_models import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = ""

# Initialize the chat model
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.6)

# Initial conversation history: a system message followed by earlier turns
chat_history = [
    SystemMessage(content="""
You are a virtual health assistant named Dr. Care.
You only answer questions related to general health, wellness, or lifestyle.
If a user asks about anything else (e.g., travel, finance, or technology), politely decline.
"""),
    HumanMessage(content="Hi Doctor, I often feel tired lately."),
    AIMessage(content="Hi! Fatigue can have many causes like stress, low iron, or sleep issues. Let's explore it."),
    HumanMessage(content="I sleep well but still feel exhausted."),
    AIMessage(content="It could be nutritional. Do you eat enough iron-rich foods like spinach, beans, or red meat?"),
]

# New user input
user_input = "Can you recommend meals to improve my iron intake?"
chat_history.append(HumanMessage(content=user_input))

# Get the AI response (invoke() replaces the deprecated chat(...) call style)
response = chat.invoke(chat_history)
chat_history.append(AIMessage(content=response.content))

# Print the reply
print("Dr. Care:", response.content)



Dr. Care: Sure! You can try meals like spinach and chickpea curry, lentil soup, or a beef stir-fry with broccoli.


We manually inserted `AIMessage` entries into the message history to simulate previous AI responses in a multi-turn conversation for demonstration purposes.

### Prompt templates

Prompt templates help translate user input and parameters into instructions for a language model. They can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output.

There are several different types of prompt templates.

#### String prompt templates

These prompt templates are used to format a single string, and are generally used for simpler inputs.


In [6]:
from langchain_core.prompts import PromptTemplate

In [7]:
import os

from langchain_community.chat_models import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = ""

# Initialize the chat model
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

# Define a prompt template: a fixed system message plus a placeholder
# that is filled with the conversation history at run time
prompt = ChatPromptTemplate.from_messages([
    SystemMessage(content="You are Dr. Care, a friendly and qualified virtual health assistant. You answer only health-related questions in simple terms."),
    MessagesPlaceholder(variable_name="chat_history"),
])

# Simulated conversation: human and AI messages for the placeholder
input_ = {
    "chat_history": [
        HumanMessage(content="Hi Doctor, I have a headache."),
        AIMessage(content="I'm sorry to hear that! Can you tell me if it's sharp, dull, or throbbing?"),
        HumanMessage(content="It's a dull pain, mostly in the morning."),
    ]
}

# Chain: prompt piped into the model
chain = prompt | llm

# Run the simulation
response = chain.invoke(input_)
print("Dr. Care:", response.content)


Dr. Care: A dull headache in the morning could be due to various reasons like dehydration, lack of sleep, or stress. Make sure you drink enough water, try to get sufficient rest, and practice relaxation techniques. If the headache persists or gets worse, it's best to consult a healthcare professional for further evaluation.


### Example selectors

If you have a large number of examples, you may need to select which ones to include in the prompt. The Example Selector is the class responsible for doing so.


Example selectors can be based on:
- `Similarity`: Uses semantic similarity between inputs and examples to decide which examples to choose.
- `MMR`: Uses Max Marginal Relevance between inputs and examples to decide which examples to choose.
- `Length`: Selects examples based on how many can fit within a certain length.
- `Ngram`: Uses ngram overlap between inputs and examples to decide which examples to choose.

This lab uses the length-based example selector as an example. For more details on other types, please refer to [https://python.langchain.com/v0.1/docs/modules/model_io/prompts/example_selectors/](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/example_selectors/).


The code below builds a few-shot prompt that teaches a language model to generate antonyms. It defines a list of example word pairs (like "happy → sad") and uses a `LengthBasedExampleSelector` to decide how many of them to include: examples are added in order until the formatted prompt would exceed `max_length`, which is counted in words by default rather than in tokens. The selected examples are inserted into a prompt template together with the user’s input word to guide the model's output.

When you define a list of input-output examples in LangChain, an example selector chooses which of them to include in the prompt based on the user’s current input. The examples themselves are static, but the selection is not. For instance, with `SemanticSimilarityExampleSelector`:
- When the user inputs a new word such as "powerful", the selector embeds this input and compares it with the embeddings of the example inputs (such as "happy", "strong", and "sunny").
- Using a semantic similarity measure (e.g., cosine similarity), it ranks the examples by how closely they relate to the user’s input and selects the most relevant ones.
- For "powerful", it might choose the example "strong → weak" because it is semantically similar, while ignoring unrelated examples like "happy → sad".

This dynamic selection tailors the prompt to the current input while keeping it within length limits. The process depends on the selector type: `SemanticSimilarityExampleSelector` prioritizes relevance, `MaxMarginalRelevanceExampleSelector` balances relevance against diversity, and `LengthBasedExampleSelector` includes examples in order until the length limit is reached.
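
The similarity-based ranking described above can be sketched in plain Python. This is only an illustration of the idea, not LangChain's implementation: the three-dimensional vectors in `toy_embeddings` are hand-made stand-ins for real embeddings, and `select_examples` is a hypothetical helper name.

```python
import math

# Hand-made 3-dimensional "embeddings", purely illustrative.
# A real selector would obtain these from an embedding model.
toy_embeddings = {
    "happy": [0.9, 0.1, 0.0],
    "strong": [0.1, 0.9, 0.1],
    "sunny": [0.0, 0.2, 0.9],
    "powerful": [0.2, 0.95, 0.05],
}

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "strong", "output": "weak"},
    {"input": "sunny", "output": "gloomy"},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_examples(query, k=1):
    """Return the k examples whose inputs are most similar to the query."""
    query_vec = toy_embeddings[query]
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(query_vec, toy_embeddings[ex["input"]]),
        reverse=True,
    )
    return ranked[:k]


selected = select_examples("powerful", k=1)
print(selected)
```

With these toy vectors, "strong" ranks highest for "powerful", so only the "strong → weak" example would be placed in the prompt.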



In [8]:
from langchain_core.example_selectors import LengthBasedExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=25,  # Maximum combined length of the input and formatted examples, counted in words by default.
)
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

In [9]:
print(dynamic_prompt.format(adjective="big"))

Give the antonym of every input

Input: happy
Output: sad

Input: tall
Output: short

Input: energetic
Output: lethargic

Input: sunny
Output: gloomy

Input: windy
Output: calm

Input: big
Output:


In [10]:
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))

Give the antonym of every input

Input: happy
Output: sad

Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
Output:
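
The two outputs above can be reproduced with a simplified sketch of length-based selection. It assumes the selector's default behavior of counting words (chunks separated by spaces or newlines) and adding examples in order until the budget runs out. The helpers `word_count` and `select_by_length` are hypothetical names written for this illustration, not LangChain functions.

```python
import re


def word_count(text):
    # Count chunks separated by spaces or newlines.
    return len(re.split(r"\n| ", text))


def select_by_length(examples, user_input, max_length=25):
    """Add examples in order while the remaining word budget allows it."""
    remaining = max_length - word_count(user_input)
    selected = []
    for ex in examples:
        formatted = f"Input: {ex['input']}\nOutput: {ex['output']}"
        remaining -= word_count(formatted)
        if remaining < 0:
            break  # this example would exceed the budget
        selected.append(ex)
    return selected


examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

long_string = (
    "big and huge and massive and large and gigantic and tall "
    "and much much much much much bigger than everything else"
)

short_selection = select_by_length(examples, "big")
long_selection = select_by_length(examples, long_string)
print(len(short_selection), len(long_selection))
```

Each formatted example counts as four words, so a one-word input leaves room for all five examples, while the 21-word input leaves room for only the first.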


### Output parsers

Output parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very useful when you are using LLMs to generate any form of structured data, or to normalize output from chat models and LLMs.

LangChain has lots of different types of output parsers. This is a [list](https://python.langchain.com/v0.2/docs/concepts/#output-parsers) of output parsers LangChain supports. In this lab, you will use the following two output parsers as examples:

- `JSON`: Returns a JSON object as specified. You can specify a Pydantic model and it will return JSON for that model. It is probably the most reliable output parser for getting structured data that does NOT use function calling.
- `CSV`: Returns a list of comma-separated values.

