<a href="https://colab.research.google.com/github/nov05/Google-Colaboratory/blob/master/generative_ai_with_langchain/20250602_02_04_local_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* Notebook modified by Nov05 on 2025-06-02    
* Chapter 2, Notebook 4  

## **Getting started with Ollama**  
* Download the tool from https://ollama.com/
* Model size: `deepseek-r1:1.5b-qwen-distill-q4_K_M` is about 1.1 GB
* Install the LangChain Ollama integration:

	    $ pip install langchain-ollama    

  Then pull a model. From a terminal such as bash or Windows PowerShell, run:    

	    $ ollama pull deepseek-r1:1.5b  

  Start the Ollama server:  

        $ ollama serve
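
Once `ollama serve` is running, it can be useful to confirm from Python that the server is reachable before calling it through LangChain. The sketch below is only an illustration: the helper name `ollama_is_up` is hypothetical, and it assumes the server listens on `127.0.0.1:11434` (the address reported by the installer log in this notebook) and answers a plain HTTP GET on its root URL.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://127.0.0.1:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on the server root answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, unreachable host, and similar failures.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_up())
```

If this prints `False`, the most likely cause is that `ollama serve` has not been started yet in a terminal.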


In [4]:
## Install Ollama in Colab. The script installs it to /usr/local.
!curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [14]:
## ⚠️ Open the Colab terminal and run $ ollama serve, then check the version.
## ollama version is 0.9.3
!ollama --version

ollama version is 0.9.3


In [7]:
## Pull the model. Model is saved at /root/.ollama/models
!ollama pull deepseek-r1:1.5b

[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h

In [1]:
!pip install langchain-ollama langchain-huggingface
## Successfully installed langchain-huggingface-0.3.0 langchain-ollama-0.3.3 ollama-0.5.1

Collecting langchain-ollama
  Downloading langchain_ollama-0.3.3-py3-none-any.whl.metadata (1.5 kB)
Collecting langchain-huggingface
  Downloading langchain_huggingface-0.3.0-py3-none-any.whl.metadata (996 bytes)
Collecting ollama<1.0.0,>=0.4.8 (from langchain-ollama)
  Downloading ollama-0.5.1-py3-none-any.whl.metadata (4.3 kB)
Downloading langchain_ollama-0.3.3-py3-none-any.whl (21 kB)
Downloading langchain_huggingface-0.3.0-py3-none-any.whl (27 kB)
Downloading ollama-0.5.1-py3-none-any.whl (13 kB)
Installing collected packages: ollama, langchain-ollama, langchain-huggingface
Successfully installed langchain-huggingface-0.3.0 langchain-ollama-0.3.3 ollama-0.5.1
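
To record which versions a later run actually uses, the installed packages can be queried from Python rather than read off the pip log. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the package names are the ones shown in the install output above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names taken from the pip output in this notebook.
for name in ("langchain-ollama", "langchain-huggingface", "ollama"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```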


# 🟢 **Local models**  

## 👉 **Ollama**

**Make sure you have installed Ollama, pulled the model, and started `ollama serve`, as explained in the book.**

In [16]:
%%time
## Ollama, a platform for running LLMs locally
## Download its installer from https://ollama.com
## If you are using a CPU instance, it will take a few minutes.
from langchain_ollama import ChatOllama
chat = ChatOllama(
    model="deepseek-r1:1.5b",
    temperature=0,
)
messages = [
    ("system","You are a helpful assistant.",),
    ("human", "What makes LangChain great for working with LLMs?",),
]
ai_msg = chat.invoke(messages)
print(ai_msg.content)
## CPU times: user 1.51 s, sys: 120 ms, total: 1.63 s
## Wall time: 3min 44s


In [17]:
from IPython.display import display, Markdown
display(Markdown(ai_msg.content))

<think>
Okay, so I'm trying to understand why LangChain is considered great for working with Large Language Models (LLMs). From what I know, LangChain is an open-source project that allows developers to build and deploy models using Python. It's built on top of Hugging Face's libraries like Llama and BERT.

I remember that when I first started working with LLMs, I found it a bit overwhelming because there were so many components and frameworks out there. But LangChain seemed to simplify things by providing a structured way to build models. They had a clear architecture, which made me feel more confident about my projects.

One thing I noticed was that they provided a lot of documentation and tutorials. That must have been really helpful because developers often face challenges like data preparation, model training, and deployment. Having step-by-step guides and examples could make the process less daunting.

I also liked how LangChain allowed for modular components. You could build different parts of your model separately and then combine them. This modularity made it easier to test each part without worrying about the whole system at once. It's like having puzzle pieces that fit together, which is a relief when you're just starting out.

Another point I remember is how LangChain supported both text-based and code-based tasks. That was useful because different projects might require different types of outputs. Being able to handle both text and code made me more versatile in my work.

I also found the integration with Hugging Face's tools really convenient. They provided a wide range of pre-trained models, which is great for getting started without having to train everything from scratch. This reduces the learning curve because you can leverage existing solutions.

However, I'm curious about how it handles model management and deployment. I know that LangChain has some components for deploying models, but I'm not sure if they're as robust or user-friendly as other platforms like Hugging Face's Model Hub. Maybe there are better tools out there for managing models after deployment.

I also wonder about the community support. Since it's open-source, there must be a lot of active developers and contributors. This could mean more resources, tutorials, and bug fixes available to help users along the way. But I'm not sure how much that translates into actual user experience compared to commercial platforms.

Another thing is the flexibility in using different frameworks. LangChain works with Llama, BERT, etc., but maybe there are other libraries or tools that offer more features or better performance for specific tasks. It's possible that some users might prefer a different framework if it offers better support or additional functionalities.

I'm also thinking about how easy it is to customize and extend the components. LangChain provides a lot of flexibility, allowing developers to add their own modules or tweak existing ones. This customization can be really powerful because you can tailor your model to specific needs, which might make it more effective than others out there.

But I'm not sure if all these features are equally accessible or how they compare to other platforms like Hugging Face's own Model Hub. Maybe LangChain is more focused on the basics and doesn't offer as much advanced features as some other tools. That could be a limitation for someone looking to push their models further.

Overall, I think LangChain is great because it provides a clear structure, modular components, good documentation, and integration with existing tools. These factors make it easier for developers to build, train, and deploy LLMs without getting bogged down by the complexities of other platforms. However, I'm not sure how it stacks up against more advanced or commercial alternatives in terms of features and community support.
</think>

LangChain is a powerful tool for working with Large Language Models (LLMs) due to several key factors that make it user-friendly and effective:

1. **Modular Architecture**: LangChain offers a clear, modular structure that allows developers to build components independently and then combine them. This modularity simplifies the development process by breaking down complex tasks into manageable parts.

2. **Comprehensive Documentation and Tutorials**: The project provides extensive documentation and tutorials, which are invaluable for new users. These resources guide developers through various aspects of model building, training, and deployment, reducing the learning curve significantly.

3. **Support for Text-Based and Code-Based Tasks**: LangChain's ability to handle both text-based and code-based outputs makes it versatile. This feature is particularly useful as different projects may require varied output formats.

4. **Integration with Hugging Face Tools**: The seamless integration of Hugging Face's libraries provides a wide range of pre-trained models, reducing the effort required for initial model development.

5. **Flexibility in Frameworks**: LangChain supports multiple frameworks like Llama and BERT, offering flexibility to users who may prefer different tools or features based on their specific needs.

6. **Community Support and Resources**: As an open-source project with active communities, LangChain offers a wealth of resources, tutorials, and bug fixes, enhancing user experience and providing additional support for developers.

While LangChain excels in its structured approach, modular components, and integration capabilities, it may lack advanced features compared to commercial platforms. Its focus on simplicity makes it an excellent starting point for those new to LLMs, but users might need more resources or advanced tools for deeper model development.
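
The output above contains the model's reasoning inside a `<think>...</think>` block, followed by the final answer. If only the answer is wanted, the two parts can be separated with a small post-processing step. The sketch below is one possible approach, not something the notebook or LangChain prescribes; the helper name `split_reasoning` and the sample string are made up for illustration.

```python
import re

# Non-greedy match so each <think>...</think> block is handled separately.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str):
    """Split model output into (reasoning, answer).

    reasoning: content of the first <think>...</think> block, or "" if none.
    answer: the text with all such blocks removed.
    """
    match = THINK_BLOCK.search(text)
    reasoning = match.group(1).strip() if match else ""
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


# Hypothetical sample in the same shape as the output above.
sample = "<think>\nLet me reason.\n</think>\n\nLangChain is a framework."
reasoning, answer = split_reasoning(sample)
print(answer)
```

In the notebook this could be applied as `split_reasoning(ai_msg.content)` before calling `display(Markdown(...))`. Text with an opening `<think>` but no closing tag (for example, a truncated response) is left unchanged by this pattern.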

## 👉 **Hugging Face**  

In [18]:
%%time
## ⚠️ Set up the HF_TOKEN secret in Colab and grant access.
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
import os
os.environ["TRANSFORMERS_VERBOSITY"] = "info"
# Create a pipeline with a small model:
llm = HuggingFacePipeline.from_model_id(
    model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
    ),
)
chat_model = ChatHuggingFace(llm=llm)
# Use it like any other LangChain LLM
messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(
        content="Explain the concept of machine learning in simple terms"
    ),
]
ai_msg = chat_model.invoke(messages)
print(ai_msg.content)
## CPU times: user 2min 25s, sys: 12.3 s, total: 2min 37s
## Wall time: 3min 14s

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/551 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/608 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.20G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Device set to use cpu
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
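
The download log shows `model.safetensors` at 2.20G. A rough back-of-the-envelope check relates that size to the parameter count: weight size is approximately parameters times bits per weight, divided by 8. The sketch below assumes 1.1 billion parameters (taken from the model name) stored at 16 bits each; both numbers are assumptions for illustration, and the helper `estimate_weights_gb` is hypothetical.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: TinyLlama-1.1B stored as 16-bit floats.
print(f"{estimate_weights_gb(1.1e9, 16):.2f} GB")  # prints "2.20 GB"
```

The same formula gives a first estimate for quantized models when an average bits-per-weight figure is known, although real files also carry metadata and may mix precisions.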



In [19]:
display(Markdown(ai_msg.content))

<|system|>
You're a helpful assistant</s>
<|user|>
Explain the concept of machine learning in simple terms</s>
<|assistant|>
Machine learning is a field that involves the use of algorithms and data to learn from and improve upon the behavior of machines or systems. It is a powerful tool for automating tasks and improving the performance of systems by learning from data and making predictions based on that data.

In simple terms, machine learning is the process of using algorithms to analyze and learn from data. This process involves collecting large amounts of data, training the algorithm to identify patterns and relationships in that data, and then using that knowledge to make predictions or make decisions.

For example, a machine learning algorithm might be used to predict the likelihood of a customer purchasing a particular product based on their previous purchase history and other factors. Or it might be used to recommend products to a user based on their browsing and purchase history.

Machine learning can be applied to a wide range of applications, from financial services to healthcare to transportation. It has the potential to revolutionize many industries by improving efficiency, reducing costs, and enabling new levels of personalization and customization.
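
The response above echoes the full chat prompt (`<|system|>`, `<|user|>`, `<|assistant|>` markers) before the generated text. One simple way to keep only the generated part is to cut everything up to the last `<|assistant|>` marker. This is a sketch based on the marker format visible in this output; the helper name `strip_prompt_echo` is hypothetical, and other models may use different markers.

```python
ASSISTANT_MARKER = "<|assistant|>"


def strip_prompt_echo(text: str, marker: str = ASSISTANT_MARKER) -> str:
    """Return the text after the last assistant marker, or the text unchanged."""
    _, found, tail = text.rpartition(marker)
    return tail.strip() if found else text


# Shortened sample in the same shape as the output above.
raw = (
    "<|system|>\nYou're a helpful assistant</s>\n"
    "<|user|>\nExplain the concept of machine learning in simple terms</s>\n"
    "<|assistant|>\nMachine learning is a field that ..."
)
print(strip_prompt_echo(raw))
```

In the notebook, the same function could be applied to `ai_msg.content` before rendering it with `display(Markdown(...))`.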