### https://python.langchain.com/docs/tutorials/local_rag/

These instructions are for Python 3.10.
### Install Ollama
* `cd /tmp`
* `curl -fsSL https://ollama.com/install.sh | sh`
* Optional test (2 GB download): `ollama run llama3.2`, then type `/bye` when done
### Install Langchain
* `python3.10 -m pip install langchain langchain_community langchain_chroma langchain_ollama beautifulsoup4 --user`
### Install SQLite (>= 3.35.0 required; this will install 3.46)
* `sudo apt install libreadline-dev python3.10-dev`
* `wget https://sqlite.org/2024/sqlite-autoconf-3460100.tar.gz`
* `tar -xvf sqlite-autoconf-3460100.tar.gz && cd sqlite-autoconf-3460100`
* `./configure`
* `make`
* `sudo make install`
* `python3.10 -m pip uninstall pysqlite3`
* `python3.10 -m pip install pysqlite3-binary --user`
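Once these steps finish, it may be worth confirming which SQLite library Python actually loads. The following is a minimal sketch using only the standard-library `sqlite3` module. The helper name `sqlite_is_new_enough` is hypothetical, and the 3.35.0 threshold is taken from the heading of this section.

```python
import sqlite3

REQUIRED = (3, 35, 0)  # minimum version stated in this section's heading


def sqlite_is_new_enough(version_info=None, required=REQUIRED):
    """Return True when the SQLite library version meets the requirement."""
    if version_info is None:
        # Version of the SQLite library that the sqlite3 module is linked against
        version_info = sqlite3.sqlite_version_info
    return tuple(version_info) >= tuple(required)


print("SQLite library version:", sqlite3.sqlite_version)
print("Meets requirement:", sqlite_is_new_enough())
```

If the notebook swaps `pysqlite3` in for `sqlite3`, run the check after the swap so that it reports the library that is actually in use.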

In [1]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader( "https://lilianweng.github.io/posts/2023-06-23-agent/" )
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

USER_AGENT environment variable not set, consider setting it to identify your requests.
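The warning above is harmless, but it can likely be avoided by setting the `USER_AGENT` environment variable before `langchain_community` is imported, since the variable appears to be read at import time. A minimal sketch follows; the identifier string is a hypothetical placeholder, not a required value.

```python
import os

# Hypothetical identifier; replace it with something that describes your project.
# setdefault() leaves the variable alone if it is already set in the environment.
os.environ.setdefault("USER_AGENT", "local-rag-notebook/0.1")

print(os.environ["USER_AGENT"])
```

Run this in a cell that executes before the `WebBaseLoader` import; in an already running kernel, a restart may be needed for it to take effect.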


In [13]:
import os

def pull_ollama_model( modelStr ):
    """ Pull a named model with `ollama pull`; Ollama keeps it in its local model store """
    print( f"About to save '{modelStr}'.\nThis will spew a lot of text on the first run..." )
    os.system( f"ollama pull {modelStr}" )
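The helper above ignores whether the pull succeeded. As a hedged alternative, the sketch below uses `subprocess.run` with an argument list, so the model name is never interpreted by a shell, and returns whether `ollama pull` exited cleanly. The function name `pull_ollama_model_checked` and its `runner` parameter are hypothetical; `runner` exists only so the function can be exercised without Ollama installed.

```python
import shutil
import subprocess


def pull_ollama_model_checked(model_name, runner=subprocess.run):
    """Run `ollama pull <model_name>` and return True if it exited with code 0."""
    # Only look for the real executable when the real runner is in use.
    if runner is subprocess.run and shutil.which("ollama") is None:
        raise FileNotFoundError("The `ollama` executable was not found on PATH")
    # A list of arguments avoids shell interpolation of the model name.
    result = runner(["ollama", "pull", model_name], check=False)
    return result.returncode == 0


# Example call (downloads the model on the first run):
# pull_ollama_model_checked("nomic-embed-text")
```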

In [8]:
__import__('pysqlite3')
import sys, os
sys.modules['sqlite3'] = sys.modules.pop( 'pysqlite3' )

from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
# from langchain_community import embeddings

pull_ollama_model( "nomic-embed-text" )


local_embeddings = OllamaEmbeddings( model = "nomic-embed-text" )
vectorstore      = Chroma.from_documents( documents = all_splits, embedding = local_embeddings )

## https://stackoverflow.com/a/78164483 ##
# persist_directory = "/tmp/chromadb"
# vectorstore = Chroma.from_documents(
#     documents=all_splits,
#     collection_name="test",
#     # embedding=embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text')
#     embedding=embeddings.OllamaEmbeddings(model='nomic-embed-text')
# )

pulling manifest
pulling 970aa74c0a90...   0%    0 B/274 MB
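The first three lines of the cell above replace the standard `sqlite3` module with `pysqlite3`, so that Chroma sees a new enough SQLite. That swap fails with an `ImportError` on machines where `pysqlite3-binary` is not installed. A minimal sketch of a tolerant version follows; the helper name `prefer_pysqlite3` is hypothetical.

```python
import importlib
import sys


def prefer_pysqlite3():
    """Swap pysqlite3 in for sqlite3 if it is installed; return True on success."""
    try:
        importlib.import_module("pysqlite3")
    except ImportError:
        # Fall back to the standard-library sqlite3 module.
        return False
    # Same swap as in the notebook cell: later `import sqlite3` gets pysqlite3.
    sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
    return True


swapped = prefer_pysqlite3()

import sqlite3  # imported after the swap on purpose

print("Using pysqlite3:", swapped)
print("SQLite library version:", sqlite3.sqlite_version)
```

Call the helper before anything imports `chromadb` or `langchain_chroma`, since modules that have already imported `sqlite3` keep their reference to the original.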

In [9]:
question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
len(docs)

4
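`similarity_search` embeds the question and returns the nearest stored chunks, four by default, which matches the output above. The toy sketch below illustrates the idea with cosine similarity over hand-written two-dimensional vectors. The vectors, the `top_k` helper, and the choice of cosine similarity are illustrative assumptions; Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=4):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


# Toy "embeddings" standing in for the stored document chunks.
toy_vectors = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
]

print(top_k([1.0, 0.0], toy_vectors, k=2))  # prints [0, 1]
```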

In [10]:
docs[0]

Document(metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log"}, page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.')

In [14]:
from langchain_ollama import ChatOllama

pull_ollama_model( "llama3.1:8b" )

model = ChatOllama(
    model="llama3.1:8b",
)

About to save 'llama3.1:8b'.
This will spew a lot of text on the first run...


pulling manifest
pulling 8eeb52dfb3bb...   0%  1.7 MB/4.7 GB

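The graphics card is being used: in the `nvidia-smi` output below, GPU 1 (GTX 1660 Ti) shows 95% utilization and 4770MiB of memory in use.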
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 960         Off | 00000000:01:00.0  On |                  N/A |
|  0%   60C    P5              19W / 128W |    433MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce GTX 1660 Ti     Off | 00000000:02:00.0 Off |                  N/A |
| 46%   52C    P2              79W / 120W |   4770MiB /  6144MiB |     95%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
```
Response took 47.18 seconds to generate!

In [16]:
import time
now = time.time

bgn = now()
response_message = model.invoke(
    "Simulate a rap battle between Stephen Colbert and John Oliver"
)

print( response_message.content )
print( f"Response took {now()-bgn:.2f} seconds to generate!" )

**The scene is set in a dark, crowded nightclub. The crowd is rowdy and ready for the main event: a rap battle between two of television's most beloved satirists, Stephen Colbert and John Oliver. The judges are none other than Trevor Noah, Hasan Minhaj, and Wanda Sykes. Let's get this battle started!**

**Round 1: Stephen Colbert (aka "The O-Show")**

(Sipping on a whiskey-soaked martini, Colbert takes the stage)

Yo, it's your boy Colby, aka The Truth,
Comin' at ya with bars that'll make you swoon.
My show was first, don't you forget,
I wore the tie, and the hair to regret.

(Picking up his mic, he delivers a smooth flow)
I'm not just funny, I'm fact-checked too,
My reports are sharp, like my tongue's got juice in 'em brew.
From Bush to Trump, I've taken them down,
While you were busy making TV for your BBC crown.

**Round 1: John Oliver (aka "The Last Week Tonight")**

(Taking a deep drag on an e-cigarette, Oliver responds)

Hold up, Colby, let me put the facts straight,
My show's be
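
The timing pattern used in the cell above (record a start time, run the call, print the difference) can be wrapped in a small context manager for reuse. This is a sketch using `time.perf_counter`; the `timed` helper is hypothetical, and `time.sleep` stands in for `model.invoke` so the example runs without a model.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Measure the wall-clock time of the enclosed block and print it."""
    start = time.perf_counter()
    result = {}
    try:
        yield result
    finally:
        # Stored on the dict so the caller can read the duration afterwards.
        result["seconds"] = time.perf_counter() - start
        print(f"{label} took {result['seconds']:.2f} seconds")


with timed("Sleep") as t:
    time.sleep(0.05)  # stand-in for model.invoke(...)
```

`perf_counter` is a monotonic clock, so the measurement is not affected if the system clock is adjusted during a long generation.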

### https://python.langchain.com/docs/tutorials/local_rag/#using-in-a-chain