# LLM Foundations and LangChain

## Set up Ollama

To experiment with LLMs, you can use open-source models through Ollama, which lets you download models and run them locally on your laptop.

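```shell
brew install ollama
ollama serve &           # start the local Ollama server if it is not already running
ollama pull gemma3:1b
pip install langchain-ollama
```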

## Setting up a connection

To interact with a model through LangChain, follow three steps:
1. Import the relevant class from the provider's LangChain integration package. For example, to use Ollama models, install the `langchain-ollama` package and import `OllamaLLM` or `ChatOllama`.
2. Create an instance of the model you want to work with.
3. Invoke the model with a prompt using the `invoke()` method.


```python
from langchain_ollama.llms import OllamaLLM

# Higher temperature makes output more creative, but also more prone to hallucination.
# To cap output length with Ollama, use `num_predict`; OllamaLLM has no `max_tokens` parameter.
model = OllamaLLM(model="gemma3:1b", temperature=0.1)

model.invoke("The sky is")
```

```
'The sky is **blue**. \n\nBut it can be many other colors too! 😊 \n\nDo you want to tell me why it’s blue? Or would you like to talk about something else related to the sky?'
```

LangChain model integrations commonly accept these parameters:
- `model`: specifies which model to use. This is the most commonly configured parameter, since most providers offer several models.
- `temperature`: controls the sampling algorithm used to generate output. Lower values produce more predictable output; higher values suit tasks such as creative writing.
- `max_tokens`: limits the length of the output, so a very low value may truncate the response. Integrations such as `ChatOpenAI` and `ChatAnthropic` call this `max_tokens`; in the Ollama integration the equivalent parameter is `num_predict`.
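To make the effect of `temperature` concrete, here is a minimal pure-Python sketch of temperature-scaled softmax, the usual way a sampling temperature is applied to a model's next-token scores. The tokens and logits are made up for illustration, and real runtimes such as Ollama combine temperature with other sampling settings (for example `top_k` and `top_p`).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) into probabilities, scaled by temperature."""
    # Dividing by a small temperature exaggerates differences between scores;
    # a large temperature flattens them toward a uniform distribution.
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores after the prompt "The sky is"
tokens = ["blue", "clear", "falling"]
logits = [4.0, 2.0, 0.5]

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    summary = ", ".join(f"{tok}={p:.3f}" for tok, p in zip(tokens, probs))
    print(f"temperature={t}: {summary}")
```

At low temperature almost all probability lands on "blue"; at higher temperatures unlikely tokens get sampled more often. A temperature of exactly 0 is usually treated as greedy decoding (always pick the top token), which this sketch does not handle.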

Chat models enable back-and-forth conversations. A conversation is a list of messages, and each message has one of three roles: system, user, or assistant.
- System role: instructions to the model about how to answer the user's question.
- User role: the user's query.
- Assistant role: content generated by the model.

```python
from langchain_ollama.chat_models import ChatOllama
from langchain_core.messages import HumanMessage

llm = ChatOllama(
    model="gemma3:1b",
    temperature=0,
    # other params...
)
prompt = [HumanMessage("What is the capital of France?")]

# ChatOllama returns an AIMessage; .content holds the generated text.
llm.invoke(prompt).content
```

```
'The capital of France is Paris. 😊 \n\nWould you like to know more about Paris?'
```

LangChain provides a message class for each role:

- `SystemMessage`: sets the instructions the AI should follow (system role).
- `HumanMessage`: a message sent by the human (user role).
- `AIMessage`: a message sent by the LLM (assistant role).
- `ChatMessage`: a message with an arbitrary role.

Next, include a `SystemMessage` to tell the model how to respond:

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama.chat_models import ChatOllama

model = ChatOllama(
    model="gemma3:1b",
    temperature=0
)
system_message = SystemMessage(
    '''You are a very helpful assistant that responds to questions with three exclamation marks.'''
)
human_message = HumanMessage('What is the capital of France?')

model.invoke([system_message, human_message])
```

```
AIMessage(content='The capital of France is Paris! 🎉🇫🇷', additional_kwargs={}, response_metadata={'model': 'gemma3:1b', 'created_at': '2025-12-12T18:27:14.952783Z', 'done': True, 'done_reason': 'stop', 'total_duration': 880471125, 'load_duration': 685428083, 'prompt_eval_count': 36, 'prompt_eval_duration': 76550083, 'eval_count': 11, 'eval_duration': 90621669, 'logprobs': None, 'model_name': 'gemma3:1b'}, id='run--5aafb45a-661d-4037-af38-e6121641fd72-0', usage_metadata={'input_tokens': 36, 'output_tokens': 11, 'total_tokens': 47})
```

To build prompts programmatically from a reusable template, use LangChain's `PromptTemplate`. The template contains placeholders in curly braces, and you supply their values when you invoke it.

```python
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template("""Answer the question based on the
    context below. If the question cannot be answered using the information 
    provided, answer with "I don't know".

Context: {context}

Question: {question}

Answer: """)

prompt = template.invoke({
    "context": """The most recent advancements in NLP are being driven by Large 
        Language Models (LLMs). These models outperform their smaller 
        counterparts and have become invaluable for developers who are creating 
        applications with NLP capabilities. Developers can tap into these 
        models through Hugging Face's `transformers` library, or by utilizing 
        OpenAI and Cohere's offerings through the `openai` and `cohere` 
        libraries, respectively.""",
    "question": "Which model providers offer LLMs?"
})

prompt
```

```
StringPromptValue(text='Answer the question based on the\n    context below. If the question cannot be answered using the information \n    provided, answer with "I don\'t know".\n\nContext: The most recent advancements in NLP are being driven by Large \n        Language Models (LLMs). These models outperform their smaller \n        counterparts and have become invaluable for developers who are creating \n        applications with NLP capabilities. Developers can tap into these \n        models through Hugging Face\'s `transformers` library, or by utilizing \n        OpenAI and Cohere\'s offerings through the `openai` and `cohere` \n        libraries, respectively.\n\nQuestion: Which model providers offer LLMs?\n\nAnswer: ')
```

```python
completion = model.invoke(prompt)
print(completion.content)
```

```
Hugging Face, OpenAI, and Cohere.
```


If you're building a chat application, use `ChatPromptTemplate` instead: it fills dynamic inputs into messages that each have a specific role.

```python
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate.from_messages([
    ('system', '''Answer the question based on the context below. If the 
        question cannot be answered using the information provided, answer with 
        "I don\'t know".'''),
    ('human', 'Context: {context}'),
    ('human', 'Question: {question}'),
])

prompt = template.invoke({
    "context": """The most recent advancements in NLP are being driven by 
        Large Language Models (LLMs). These models outperform their smaller 
        counterparts and have become invaluable for developers who are creating 
        applications with NLP capabilities. Developers can tap into these 
        models through Hugging Face's `transformers` library, or by utilizing 
        OpenAI and Cohere's offerings through the `openai` and `cohere` 
        libraries, respectively.""",
    "question": "Which model providers offer LLMs?"
})

completion = model.invoke(prompt)
completion
```

```
AIMessage(content='Hugging Face, OpenAI, and Cohere.', additional_kwargs={}, response_metadata={'model': 'gemma3:1b', 'created_at': '2025-12-12T18:40:32.453948Z', 'done': True, 'done_reason': 'stop', 'total_duration': 357840333, 'load_duration': 148530667, 'prompt_eval_count': 159, 'prompt_eval_duration': 96736584, 'eval_count': 11, 'eval_duration': 95001166, 'logprobs': None, 'model_name': 'gemma3:1b'}, id='run--f3c6479a-6201-4335-ac16-c86e48a9b40f-0', usage_metadata={'input_tokens': 159, 'output_tokens': 11, 'total_tokens': 170})
```

## Getting output in a specific format

Plain-text output is useful, but it is hard to consume reliably in automated systems.
If you want answers in JSON, specify the schema of the expected output. Pydantic models are a convenient way to define that schema.


```python
from langchain_ollama import ChatOllama
# langchain_core.pydantic_v1 is deprecated; import BaseModel from pydantic directly.
from pydantic import BaseModel

class AnswerWithJustification(BaseModel):
    '''An answer to the user's question along with justification for the 
        answer.'''
    answer: str
    '''The answer to the user's question'''
    justification: str
    '''Justification for the answer'''

llm = ChatOllama(model='gemma3:1b', temperature=0)
structured_llm = llm.with_structured_output(AnswerWithJustification)

structured_llm.invoke("""What weighs more, a pound of bricks or a pound of feathers""")
```

```
AnswerWithJustification(answer='A pound of feathers weighs more.', justification='A pound is a unit of weight. Therefore, a pound of feathers will always be heavier than a pound of bricks.')
```

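With the code above, LangChain validates the LLM's output against the schema before returning it. Before calling the LLM, LangChain converts the schema to JSON Schema and passes it to the model using the best method the provider supports, usually function (tool) calling or prompting. Note that the schema constrains the shape of the answer, not its correctness: here the 1B-parameter model claims the feathers weigh more, although a pound of bricks and a pound of feathers weigh the same.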
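To see the JSON Schema a Pydantic model turns into, and how validation rejects malformed output, you can call Pydantic v2's `model_json_schema()` and `model_validate_json()` directly. This version uses `Field(description=...)`, which reliably puts per-field descriptions into the schema. The sketch does not reproduce the exact payload LangChain sends to Ollama, which depends on the structured-output method in use.

```python
from pydantic import BaseModel, Field, ValidationError

class AnswerWithJustification(BaseModel):
    """An answer to the user's question along with justification for the answer."""
    answer: str = Field(description="The answer to the user's question")
    justification: str = Field(description="Justification for the answer")

schema = AnswerWithJustification.model_json_schema()
print(schema["required"])  # fields the LLM must return

# A well-formed response parses into a typed object.
good = AnswerWithJustification.model_validate_json(
    '{"answer": "They weigh the same.", "justification": "Both are one pound."}'
)
print(good.answer)

# A response missing a required field is rejected.
try:
    AnswerWithJustification.model_validate_json('{"answer": "Feathers."}')
except ValidationError as err:
    print("invalid output:", err.error_count(), "error(s)")
```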

### Getting output in other formats

If you want a chat model to produce output in other formats, such as CSV or XML, you can use output parsers. **Output parsers** are classes that help structure LLM responses, and they serve two purposes. First, they can provide format instructions to include in the prompt, guiding the LLM to produce text in a format the parser knows how to read. Second, they parse the text output of an LLM or chat model into a more structured form.

```python
from langchain_core.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()
items = parser.invoke('apple, banana, cherry')
items
```

```
['apple', 'banana', 'cherry']
```

## Combine Components of LangChain

LangChain components can be combined to build LLM applications.

### 1. Using the Runnable Interface

LangChain components share a common interface, called `Runnable`. Its core methods are:

1. `invoke`: transforms a single input into an output.
2. `batch`: transforms multiple inputs into multiple outputs.
3. `stream`: streams output from a single input as it is produced.

The interface also provides built-in retries, fallbacks, schemas, and runtime configurability. In Python, each method has an `asyncio` equivalent: `ainvoke`, `abatch`, and `astream`.

```python
from langchain_ollama import ChatOllama

model = ChatOllama(model='gemma3:1b')

# invoke: one input -> one output
completion = model.invoke('Hi there')

# batch: several inputs -> a list of outputs
completions = model.batch(['Hi there!', 'Bye!'])

# stream: print each chunk as it arrives
for token in model.stream('Bye!'):
    print(token)
```

```
content='Bye' additional_kwargs={} response_metadata={} id='run--0959e316-b8ad-4f27-9276-4bbacd9d2244'
content='!' additional_kwargs={} response_metadata={} id='run--0959e316-b8ad-4f27-9276-4bbacd9d2244'
content=' 👋' additional_kwargs={} response_metadata={} id='run--0959e316-b8ad-4f27-9276-4bbacd9d2244'
content=' You' additional_kwargs={} response_metadata={} id='run--0959e316-b8ad-4f27-9276-4bbacd9d2244'
content=' too' additional_kwargs={} response_metadata={} id='run--0959e316-b8ad-4f27-9276-4bbacd9d2244'
content='!' additional_kwargs={} response_metadata={} id='run--0959e316-b8ad-4f27-9276-4bbacd9d2244'
content=' Have' additional_kwargs={} response_metadata={} id='run--0959e316-b8ad-4f27-9276-4bbacd9d2244'
content=' a' additional_kwargs={} response_metadata={} id='run--0959e316-b8ad-4f27-9276-4bbacd9d2244'
content=' great' additional_kwargs={} response_metadata={} id='run--0959e316-b8ad-4f27-9276-4bbacd9d2244'
content=' day' additional_kwargs={} response_metadata={} id='run--095
```

You can combine these components in two ways:
1. Imperative: call each component directly, for example with `model.invoke()`.
2. Declarative: use the LangChain Expression Language (LCEL).

| Feature | Imperative | Declarative |
|:-----------------|:-------------------|:------------------------|
| Syntax | Plain Python or JavaScript | LCEL |
| Parallel execution | Python: threads or coroutines; JavaScript: Promises | Automatic |
| Streaming | With the `yield` keyword | Automatic |
| Async execution | With `async` functions | Automatic |

### 2. Imperative Composition

Imperative composition means writing ordinary Python code that calls each component explicitly.

```python
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import chain

# the building blocks

template = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful assistant.'),
    ('human', '{question}'),
])

model = ChatOllama(model='gemma3:1b')

# combine them in a function
# the @chain decorator gives any function you write the same Runnable interface

@chain
def chatbot(values):
    prompt = template.invoke(values)
    return model.invoke(prompt)

# use it

chatbot.invoke({"question": "Which model providers offer LLMs?"})
```

```
AIMessage(content="Okay, let's break down which model providers offer Large Language Models (LLMs). It's a rapidly evolving landscape, but here's a breakdown of the major players as of late 2024, categorized by their primary focus and offering:\n\n**1. Leading Giants - Offering Broad Capabilities & Infrastructure**\n\n* **OpenAI:** (GPT Series - GPT-4, GPT-4o) - This is *the* dominant player.  They provide access to GPT models through their API, ChatGPT, and various tools.\n    * **Strengths:** Extremely versatile, strong in text generation, reasoning, coding, and understanding complex prompts.  GPT-4o is significantly improved in audio and video understanding.\n    * **Cost:** Paid API access, various tiers based on usage.\n    * **Accessibility:** Very widely accessible through their API and ChatGPT.\n* **Google AI (Gemini):** (Gemini Pro, Gemini Ultra, Gemini Nano) - Google is investing heavily in LLMs. Gemini is their flagship offering, boasting impressive multimodal capabilities.\
```

```python
chatbot.invoke({"question": "Ok, Can you provide more information on GPT series?"})
```

```
AIMessage(content='Okay, let\'s dive into the GPT series! It\'s a really fascinating and rapidly evolving area of AI, and it’s important to understand where it’s coming from. Here’s a breakdown, broken down into key aspects:\n\n**1. What is GPT? (The Foundation)**\n\n* **GPT stands for "Generative Pre-trained Transformer."** It\'s a type of Large Language Model (LLM). Let\'s unpack that:\n    * **Generative:** It *creates* new text – it doesn’t just retrieve information.\n    * **Pre-trained:** It’s been trained on a massive dataset of text and code from the internet. This means it learns patterns, relationships, and a vast amount of knowledge about language.\n    * **Transformer:** This is the underlying neural network architecture. Transformers are particularly good at understanding context and relationships within text, which is crucial for generating coherent and relevant responses. \n\n**2. The GPT Series – Key Versions**\n\n* **GPT-1 (2018):** The original. It demon
```

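With imperative composition, streaming and async support are not automatic: you have to modify the function yourself. For streaming, turn the function into a generator that yields each chunk: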

```python
@chain
def chatbot(values):
    prompt = template.invoke(values)
    # Yield each chunk as soon as the model produces it.
    for token in model.stream(prompt):
        yield token

for part in chatbot.stream({
    "question": "Which model providers offer LLMs?"
}):
    print(part.content, end="", flush=True)
```

For async execution, make the function a coroutine and await the async methods:

```python
@chain
async def chatbot(values):
    prompt = await template.ainvoke(values)
    return await model.ainvoke(prompt)

await chatbot.ainvoke({"question": "Which model providers offer LLMs?"})
```

### 3. Declarative Composition

LCEL is a declarative language for composing LangChain components. LangChain compiles LCEL compositions into an optimized execution plan, with automatic parallelization, streaming, tracing, and async support.

```python
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful assistant.'),
    ('human', '{question}'),
])

model = ChatOllama(model='gemma3:1b', temperature=0.1)

# combine them with the | operator

chatbot = template | model

# use it

chatbot.invoke({"question": "Which model providers offer LLMs?"})
```

```
AIMessage(content="Okay, let's break down which model providers are currently offering Large Language Models (LLMs). It’s a rapidly evolving field, but here’s a breakdown of the major players, categorized by their approach and strengths:\n\n**1. The Big Players - Leading the Charge**\n\n* **OpenAI:** (GPT series - GPT-4, GPT-3.5, etc.) - Arguably the most well-known. They’ve been at the forefront of LLM development for a long time.\n    * **Strengths:**  Excellent general-purpose models, strong reasoning, creative writing, coding assistance, and a vast ecosystem of tools. GPT-4 is significantly more advanced than previous versions.\n    * **Pricing:**  Offers a free tier (GPT-3.5), paid subscriptions (ChatGPT Plus, API access), and custom pricing for enterprise use.\n* **Google (Gemini):** (Gemini models - Ultra, Pro, Nano) - Google is heavily investing in LLMs. Gemini is their flagship model, and it’s designed to be multimodal (understanding text, images, audio, and video).\n 
```

For async invocation, call the chain's `ainvoke` method; unlike the imperative version, the chain itself needs no changes:

```python
await chatbot.ainvoke({
    "question": "Which model providers offer LLMs?"
})
```