# RWKV Chains

LLMs are trained on different datasets and with different configurations, so they react differently to the same input. RWKV in particular is not transformer-based, and most of the prompts built into LangChain do not work well with it. We therefore collected and built a set of prompts in `RWKV_chains` to make LangChain work with RWKV.

## Load a model first

**Please change `weight_path` in the cell below to your own model path.** The examples work better with the 7B model; the 14B model is somewhat harder to prompt.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# suppress the warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
import os
os.environ["RWKV_JIT_ON"] = '1'
os.environ["RWKV_CUDA_ON"] = '1' # if '1' then use CUDA kernel for seq mode (much faster)

from langchain.llms import RWKV

weight_path = "D:/weights/rwkv/RWKV-4-Raven-7B-v11-Eng99%-Other1%-20230427-ctx8192.pth"
# weight_path = "/home/ubuntu/Documents/Github/raven-weight-7b.pth"
tokenizer_json = "./20B_tokenizer.json"

model = RWKV(model=weight_path, strategy="cuda fp16i8 *20 -> cuda fp16", tokens_path=tokenizer_json)


## A do-all prompt

The Raven versions of RWKV are instruction fine-tuned with Alpaca datasets, so it is much easier to get useful output if we wrap every request in the Alpaca prompt. The Alpaca prompt template is shown below, and we provide a `PromptTemplate` called `rwkv_prompt` for ease of use.

```text
Below is an instruction that describes a task. Write a response that appropriately completes the request.
# Instruction:
{Your instruction or question}

# Response:
```
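
As a rough illustration of what the template does, the sketch below fills it in with plain string formatting. `RAVEN_TEMPLATE` and `build_raven_prompt` are hypothetical names used only here; the actual `rwkv_prompt` in `rwkv_chains` is a LangChain `PromptTemplate` and may differ in whitespace or wording.

```python
# Hypothetical stand-in for rwkv_prompt: fills the Alpaca-style template
# shown above with a single instruction string.
RAVEN_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "# Instruction:\n"
    "{instruction}\n"
    "\n"
    "# Response:\n"
)


def build_raven_prompt(instruction: str) -> str:
    """Return the full prompt text that would be sent to the model."""
    return RAVEN_TEMPLATE.format(instruction=instruction.strip())


prompt = build_raven_prompt("List three uses of a telescope.")
print(prompt)
```

The prompt ends with the `# Response:` marker so that the model continues from there with its answer.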



In [None]:
from rwkv_chains import rwkv_prompt
from langchain.chains import LLMChain
chain = LLMChain(llm=model, prompt=rwkv_prompt, verbose=True)

In [None]:
chain.run("""The following context is an introduction. Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

"Assessment of visual acuity depends on the optotypes used for measurement. The ability to recognize different optotypes differs even if their critical details appear under the same visual angle. Since optotypes are evaluated on individuals with good visual acuity and without eye disorders, differences in the lower visual acuity range cannot be excluded. In this study, visual acuity measured with the Snellen E was compared to the Landolt C acuity.",
"100 patients (age 8 - 90 years, median 60.5 years) with various eye disorders, among them 39 with amblyopia due to strabismus, and 13 healthy volunteers were tested. Charts with the Snellen E and the Landolt C (Precision Vision) which mimic the ETDRS charts were used to assess visual acuity. Three out of 5 optotypes per line had to be correctly identified, while wrong answers were monitored. In the group of patients, the eyes with the lower visual acuity, and the right eyes of the healthy subjects, were evaluated.",
"Differences between Landolt C acuity (LR) and Snellen E acuity (SE) were small. The mean decimal values for LR and SE were 0.25 and 0.29 in the entire group and 0.14 and 0.16 for the eyes with strabismus amblyopia. The mean difference between LR and SE was 0.55 lines in the entire group and 0.55 lines for the eyes with strabismus amblyopia, with higher values of SE in both groups. The results of the other groups were similar with only small differences between LR and SE."
_________________
question:
Landolt C and snellen e acuity: differences in strabismus amblyopia?

Start with Yes/No/Maybe, then provide a short explanation.""")

## Summarization with RWKV

Inside RWKV, the computation graph resembles an RNN in that a hidden state is updated with each token. When summarizing long texts, the results can therefore be much better if the instruction to summarize comes **after** the text, so that it is the most recent thing the state has seen when generation starts. The optimized prompt templates can be used via `load_rwkv_summarize_chain`. An example usage is shown below.

In [None]:
from rwkv_chains import load_rwkv_summarize_chain
from langchain.docstore.document import Document

summary_chain = load_rwkv_summarize_chain(llm=model, chain_type="stuff", output_key="summary")
# map_reduce and refine are also available

with open("james_webb_news.txt", "r", encoding="utf-8") as f:
    text = f.read().strip()
docs = [Document(page_content=text)]
result = summary_chain.run(docs)
print(result)
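
The ordering idea can be sketched with plain strings. The two helpers below, and the exact instruction wording, are assumptions made for illustration; they are not the templates shipped in `rwkv_chains`.

```python
# Illustrative only: two ways to lay out a summarization request.
INSTRUCTION = "Write a concise summary of the text above."


def instruction_first(text: str) -> str:
    """Common layout: the request comes first, then the article."""
    return f"Write a concise summary of the following text.\n\n{text}\n\n# Response:\n"


def text_first(text: str) -> str:
    """Layout suggested for RWKV: the article first, then the request."""
    return f"{text}\n\n# Instruction:\n{INSTRUCTION}\n\n# Response:\n"


article = "A long article would go here."
print(text_first(article))
```

With the text-first layout, the instruction sits directly before the response marker instead of being separated from it by the whole article.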

In [None]:
# map_reduce is also possible, and it can work over a very long article.

mr_chain = load_rwkv_summarize_chain(llm=model, chain_type="map_reduce")

# This helps when an article is too long to be fed to the LLM in a single forward pass.
docs2 = [Document(page_content=i) for i in text.split("\n\n")]
result4 = mr_chain.run(docs2)
print(result4)
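
Splitting on blank lines can produce many very short documents, each of which costs one model call in the map step. One possible refinement, sketched here with a hypothetical helper `pack_paragraphs` and an arbitrary character budget, is to merge neighbouring paragraphs up to a size limit before wrapping them in `Document` objects.

```python
from typing import List


def pack_paragraphs(text: str, max_chars: int = 2000) -> List[str]:
    """Greedily merge blank-line-separated paragraphs into chunks.

    A chunk only exceeds max_chars when a single paragraph is itself longer.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc\n\ndddd"
print(pack_paragraphs(sample, max_chars=10))
```

The packed chunks could then replace the plain split, for example `docs2 = [Document(page_content=c) for c in pack_paragraphs(text)]`.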

## Math with RWKV

RWKV can also do math with the help of external tools. Here we showcase how RWKV can call the Python library `numexpr` to evaluate an expression. The optimized prompt template is `LLM_MATH_PROMPT`. An example usage is shown below.

In [None]:
from rwkv_chains import LLM_MATH_PROMPT
from langchain.chains import LLMMathChain

math_chain = LLMMathChain(llm=model, prompt=LLM_MATH_PROMPT, verbose=True)

math_chain.run("What is 592138 * 4242?")
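
Conceptually, the chain asks the model to translate the question into an arithmetic expression and then evaluates that expression outside the model. The sketch below imitates the second half with a small `ast`-based evaluator standing in for `numexpr`. The reply format and the helper names are assumptions, and the parsing inside the real `LLMMathChain` may differ.

```python
import ast
import operator
import re

# Operators this toy evaluator accepts.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate_expression(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression.strip(), mode="eval").body)


def extract_expression(llm_reply: str) -> str:
    """Pull the expression out of a reply shaped like 'Expression: 1 + 2'."""
    match = re.search(r"Expression:\s*(.+)", llm_reply)
    if match is None:
        raise ValueError("no expression found in model reply")
    return match.group(1).strip()


# Hypothetical model reply for the question used above.
reply = "Expression: 592138 * 4242"
print(evaluate_expression(extract_expression(reply)))
```

Keeping the arithmetic outside the model means the model only has to produce the right expression, not the right digits.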

## Bash with RWKV

In [None]:
from rwkv_chains import LLMBashChain

bash_chain = LLMBashChain.from_llm(llm=model, verbose=True)

bash_chain.run("list all files under current directory.")

## Document based QA with RWKV

In [None]:
from rwkv_chains import load_rwkv_qa_chain

qa_chain = load_rwkv_qa_chain(llm=model, verbose=True, chain_type="stuff")

print(qa_chain.run(input_documents = docs, question="Who is Nicola Fox?"))

## FAISS powered Document QA with RWKV

In this example we download a 90 MB embedding model from `SentenceTransformers` called `all-MiniLM-L6-v2` and let it work with FAISS to retrieve context from which the LLM generates its answers.

In [21]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from rwkv_chains.retrieval_qa import RetrievalQA


embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
loader = TextLoader("james_webb_news.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

db = FAISS.from_documents(docs, embeddings)

qa = RetrievalQA.from_chain_type(llm=model, chain_type="stuff", retriever=db.as_retriever(), verbose=True)

Created a chunk of size 311, which is longer than the specified 200
Created a chunk of size 303, which is longer than the specified 200
Created a chunk of size 246, which is longer than the specified 200
Created a chunk of size 386, which is longer than the specified 200
Created a chunk of size 604, which is longer than the specified 200
Created a chunk of size 1066, which is longer than the specified 200
Created a chunk of size 248, which is longer than the specified 200
Created a chunk of size 411, which is longer than the specified 200
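
The warnings above are expected with these settings. `CharacterTextSplitter` splits on a separator (a blank line by default) and merges the pieces up to `chunk_size`, but it does not cut a single piece that is already longer than the limit. The pure-Python sketch below mimics that check to show where oversized chunks come from; `find_oversized` is a hypothetical helper, not part of LangChain.

```python
from typing import List


def find_oversized(text: str, chunk_size: int = 200, separator: str = "\n\n") -> List[int]:
    """Return the lengths of separator-delimited pieces longer than chunk_size."""
    pieces = [piece.strip() for piece in text.split(separator)]
    return [len(piece) for piece in pieces if len(piece) > chunk_size]


# Made-up article: one paragraph is longer than the 200-character limit.
article = "short paragraph\n\n" + "x" * 311 + "\n\nanother short one"
print(find_oversized(article))
```

If uniformly small chunks matter, a splitter that falls back to finer separators, such as LangChain's `RecursiveCharacterTextSplitter`, is one option to consider.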


In [23]:
print(qa.run("Where was the ceremony held?"))



> Entering new RetrievalQA chain...

> Finished chain.
The ceremony was held at the Steven F. Udvar-Hazy Center in Chantilly, Virginia.
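
Behind `RetrievalQA`, the retriever embeds the question, looks up the most similar chunks in the FAISS index, and the chain places those chunks in the prompt as context. The toy sketch below reproduces the idea with word-count vectors and cosine similarity in place of `all-MiniLM-L6-v2` and FAISS. Every name and every sample sentence in it is made up for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)
    return ranked[:k]


# Made-up chunks, not taken from james_webb_news.txt.
chunks = [
    "The ceremony was held at a museum near the airport.",
    "The telescope observes the universe in infrared light.",
    "Engineers tested the mirror segments for months.",
]
context = retrieve("Where was the ceremony held?", chunks, k=1)
print(context)
```

A real embedding model captures meaning rather than exact word overlap, which is why the notebook uses `HuggingFaceEmbeddings` instead of counts like these.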
