### https://python.langchain.com/docs/tutorials/local_rag/

These instructions are for Python 3.10.
### Install Ollama
* `cd /tmp`
* `curl -fsSL https://ollama.com/install.sh | sh`
* Optional test (about a 2 GB download): `ollama run llama3.2`; type `/bye` when done
### Install Langchain
* `python3.10 -m pip install langchain langchain_community langchain_chroma langchain_ollama beautifulsoup4 --user`
### Install SQLite (>= 3.35.0 required; this installs 3.46.1)
* `sudo apt install libreadline-dev python3.10-dev`
* `wget https://sqlite.org/2024/sqlite-autoconf-3460100.tar.gz`
* `tar -xvf sqlite-autoconf-3460100.tar.gz && cd sqlite-autoconf-3460100`
* `./configure`
* `make`
* `sudo make install`
* `python3.10 -m pip uninstall pysqlite3`
* `python3.10 -m pip install pysqlite3-binary --user`
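
To confirm which SQLite version Python actually sees after these steps, a quick check against the 3.35.0 minimum can help. A minimal sketch using only the standard library; the helper name `sqlite_is_new_enough` is made up for this example:

```python
import sqlite3

MIN_SQLITE = (3, 35, 0)  # minimum version stated in the heading above


def sqlite_is_new_enough(version_str, minimum=MIN_SQLITE):
    """Return True if a dotted version string such as '3.46.1' meets the minimum."""
    parts = tuple(int(p) for p in version_str.split("."))
    return parts >= minimum


# sqlite3.sqlite_version is the version of the SQLite library this interpreter is linked against
print(sqlite3.sqlite_version, sqlite_is_new_enough(sqlite3.sqlite_version))
```

If this reports `False` for the standard-library module, the `pysqlite3` swap in cell 3 is what supplies a newer SQLite to Chroma.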

In [1]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader( "https://lilianweng.github.io/posts/2023-06-23-agent/" )
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

USER_AGENT environment variable not set, consider setting it to identify your requests.
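
The warning above is harmless, but it can be silenced by setting `USER_AGENT` in the environment. A minimal sketch; the value `local-rag-notebook/0.1` is an arbitrary placeholder, and running this before the `langchain_community` import is the safe ordering on the assumption that the check may happen at import time:

```python
import os

# setdefault keeps any USER_AGENT already exported in the shell;
# the string below is a placeholder, not a required format.
os.environ.setdefault("USER_AGENT", "local-rag-notebook/0.1")

print(os.environ["USER_AGENT"])
```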


In [2]:
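import os  # needed here: this cell runs before the `import sys, os` in the next cell

def pull_ollama_model( modelStr ):
    """ Pull a named model with the Ollama CLI; Ollama keeps it in its local model store """
    print( f"About to save '{modelStr}'.\nThis will spew a lot of text on the first run..." )
    os.system( f"ollama pull {modelStr}" )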
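
`os.system` does not raise if the pull fails, so a bad model name goes unnoticed until later. A hedged alternative sketch using `subprocess.run` with `check=True`, which raises `CalledProcessError` on a non-zero exit. The function name and the `runner` parameter are hypothetical; `runner` exists only so the function can be exercised without Ollama installed:

```python
import subprocess


def pull_ollama_model_checked(model_str, runner=subprocess.run):
    """Run `ollama pull <model>` and raise if the command exits with an error."""
    cmd = ["ollama", "pull", model_str]  # list form avoids shell quoting problems
    runner(cmd, check=True)
    return cmd


# Usage (requires the Ollama CLI on PATH):
# pull_ollama_model_checked("nomic-embed-text")
```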

In [3]:
__import__('pysqlite3')
import sys, os
sys.modules['sqlite3'] = sys.modules.pop( 'pysqlite3' )

from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
# from langchain_community import embeddings

pull_ollama_model( "nomic-embed-text" )


local_embeddings = OllamaEmbeddings( model = "nomic-embed-text" )
vectorstore      = Chroma.from_documents( documents = all_splits, embedding = local_embeddings )

## https://stackoverflow.com/a/78164483 ##
# persist_directory = "/tmp/chromadb"
# vectorstore = Chroma.from_documents(
#     documents=all_splits,
#     collection_name="test",
#     # embedding=embeddings.ollama.OllamaEmbeddings(model='nomic-embed-text')
#     embedding=embeddings.OllamaEmbeddings(model='nomic-embed-text')
# )

About to save 'nomic-embed-text'.
This will spew a lot of text on the first run...


pulling manifest 
pulling 970aa74c0a90... 100% ▕████████████████▏ 274 MB                         
pulling c71d239df917... 100% ▕████████████████▏  11 KB                         
pulling ce4a164fc046... 100% ▕████████████████▏   17 B                         
pulling 31df23ea7daa... 100% ▕████████████████▏  420 B                   

In [4]:
question = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(question)
len(docs)

4
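
The search returns 4 documents because that is the default `k` of `similarity_search`. Conceptually, the store embeds the query and returns the `k` closest chunks. A toy sketch of that idea with hand-made 2-D vectors and cosine similarity; this is an illustration only, not Chroma's implementation, and the vectors and helper names are made up:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=4):
    """indexed is a list of (text, vector) pairs; return the k most similar texts."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up embeddings standing in for real ones
chunks = [
    ("task decomposition", [1.0, 0.1]),
    ("memory types", [0.1, 1.0]),
    ("subgoal planning", [0.9, 0.3]),
]
print(top_k([1.0, 0.0], chunks, k=2))
```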

In [5]:
docs[0]

Document(metadata={'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log"}, page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.')

In [6]:
from langchain_ollama import ChatOllama

pull_ollama_model( "llama3.1:8b" )

model = ChatOllama(
    model="llama3.1:8b",
)

About to save 'llama3.1:8b'.
This will spew a lot of text on the first run...


pulling manifest 
pulling 8eeb52dfb3bb... 100% ▕████████████████▏ 4.7 GB                         
pulling 948af2743fc7... 100% ▕████████████████▏ 1.5 KB                         
pulling 0ba8f0e314b4... 100% ▕████████████████▏  12 KB                         
pulling 56bb8bd477a5... 100% ▕████████████████▏   96 B                         
pulling 1a4c3c319823... 100% ▕████████████████▏  485 B                         
verifying sha256 digest 
writing manifest 
success


The graphics card is in use during generation, as `nvidia-smi` shows:
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 960         Off | 00000000:01:00.0  On |                  N/A |
|  0%   60C    P5              19W / 128W |    433MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce GTX 1660 Ti     Off | 00000000:02:00.0 Off |                  N/A |
| 46%   52C    P2              79W / 120W |   4770MiB /  6144MiB |     95%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
```
Response took 47.18 seconds to generate!

In [7]:
import time
now = time.time

bgn = now()
response_message = model.invoke(
    "Simulate a rap battle between Stephen Colbert and John Oliver"
)

print( response_message.content )
print( f"Response took {now()-bgn} seconds to generate!" )

**The scene is set in a dark, crowded nightclub. The crowd is cheering as the host, Snoop Dogg, takes the stage to introduce the main event: a rap battle for the ages, featuring two of the most formidable opponents in comedy - Stephen Colbert and John Oliver!**

Snoop Dogg: "Yo, what's good everybody? Welcome to the ultimate showdown in comedic spitting. In the blue corner, we got the one and only... **Stephen Colbert**!"
(The crowd goes wild as Stephen Colbert steps up to the mic, dressed in his signature red, white, and blue attire.)

Colbert:
Listen up, y'all, I'm here to say
My raps are tighter than a News Network stay
I'm the truth-teller, the one you can't ignore
Got my facts straight, leave no room for more

**Now, in the red corner, we got the man from... England? **John Oliver**, let's get this!**

Oliver:
Hold up, Steve, I got some news to share
Your raps are stale, like your 'Late Night Caring'
You may have fooled the folks with your Colbert charm
But when it comes to truth,
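
For timing a call like the one in cell 7, `time.perf_counter` is a monotonic clock meant for measuring intervals, whereas `time.time` can jump if the system clock is adjusted. A small sketch; `timed` is a hypothetical helper name and the lambda stands in for `model.invoke(...)`:

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; in the notebook this would wrap model.invoke
result, seconds = timed(lambda: sum(range(1000)))
print(f"Response took {seconds:.2f} seconds to generate!")
```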

### https://python.langchain.com/docs/tutorials/local_rag/#using-in-a-chain

In [8]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Summarize the main themes in these retrieved docs: {docs}"
)


# Convert loaded documents into strings by concatenating their content
# and ignoring metadata
def format_docs( docs ):
    return "\n\n".join( doc.page_content for doc in docs )


chain = {"docs": format_docs} | prompt | model | StrOutputParser()

question = "What are the approaches to Task Decomposition?"

docs = vectorstore.similarity_search(question)

chain.invoke(docs)

'The main themes in these documents are:\n\n1. **Task Decomposition**: Breaking down complex tasks into smaller, manageable subgoals using various methods such as:\n\t* Using Large Language Models (LLM) with simple prompts or task-specific instructions.\n\t* Receiving human inputs to guide the process.\n2. **Autonomous Agent System**: An overview of a system that utilizes LLMs for autonomous decision-making and execution.\n3. **Planning and Execution**:\n\t* Task planning: Breaking down tasks into subgoals, enabling efficient handling of complex tasks.\n\t* Task execution: Expert models execute specific tasks and log results.\n4. **Reflection and Improvement**: The ability to reflect on past actions, learn from mistakes, and refine future steps for improved quality of final results.\n\nThese themes are centered around the capabilities and applications of Large Language Models (LLMs) in decision-making and task execution, with a focus on efficiency, flexibility, and continuous improveme
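
`format_docs` only needs objects with a `page_content` attribute, so its behavior can be checked without a vector store. A minimal sketch in which `FakeDoc` is a stand-in for LangChain's `Document`, invented for this example:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document: text plus metadata that gets ignored."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    # Same logic as in the cell above: join page contents with blank lines
    return "\n\n".join(doc.page_content for doc in docs)


joined = format_docs([FakeDoc("first chunk", {"source": "a"}), FakeDoc("second chunk")])
print(joined)
```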

In [9]:
from langchain_core.runnables import RunnablePassthrough

RAG_TEMPLATE = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

<context>
{context}
</context>

Answer the following question:

{question}"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

chain = (
    RunnablePassthrough.assign(context=lambda input: format_docs(input["context"]))
    | rag_prompt
    | model
    | StrOutputParser()
)

question = "What are the approaches to Task Decomposition?"

docs = vectorstore.similarity_search(question)

# Run
chain.invoke({"context": docs, "question": question})

'There are three approaches to Task Decomposition: (1) using Large Language Models (LLM) with simple prompting, (2) applying task-specific instructions, and (3) incorporating human inputs. This process involves breaking down large tasks into smaller, manageable subgoals to enable efficient handling of complex tasks.'
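
Before the model sees anything, the `{context}` and `{question}` placeholders in the template are filled in. A plain-Python sketch of that substitution using `str.format`; `ChatPromptTemplate` also wraps the result in a chat message, which this sketch skips, and the shortened template and sample context are made up:

```python
TEMPLATE = """Use the following context to answer the question.

<context>
{context}
</context>

Answer the following question:

{question}"""

filled = TEMPLATE.format(
    context="Task decomposition can be done by prompting, instructions, or human input.",
    question="What are the approaches to Task Decomposition?",
)
print(filled)
```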

In [10]:
retriever = vectorstore.as_retriever()

qa_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)

question = "What are the approaches to Task Decomposition?"

qa_chain.invoke( question )

'According to the context, there are three approaches to Task Decomposition: (1) using Large Language Models (LLM) with simple prompting, (2) utilizing task-specific instructions, and (3) incorporating human inputs.'
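
The `|` operator in `qa_chain` chains steps so that each step's output feeds the next, and the leading dict runs each of its values on the same input. A toy sketch of that idea in plain Python; the `Step` and `parallel` names are hypothetical and this is not how LangChain implements its runnables:

```python
class Step:
    """Wrap a function so that steps can be composed with `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Accept either another Step or a plain callable on the right-hand side
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda x: nxt.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


def parallel(mapping):
    """Run every step in the mapping on the same input and collect a dict of results."""
    return Step(lambda x: {key: step.invoke(x) for key, step in mapping.items()})


# Made-up retriever that returns one "chunk" for any query
retriever = Step(lambda q: [f"chunk about {q}"])
chain = (
    parallel({"context": retriever | "\n\n".join, "question": Step(lambda q: q)})
    | (lambda d: f"{d['context']} -> {d['question']}")
)
print(chain.invoke("task decomposition"))
```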