## SnappCloud RAG


In [1]:
%%capture
%pip install ollama sentence-transformers protobuf tqdm

In [2]:
%%capture
%pip install langchain langchain_community chromadb unstructured markdown

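OK, now let's build the RAG pipeline. The cell below can use the SnappCloud-hosted Ollama instance (the commented-out URL) or a local Ollama server. For a local server, create a client for `localhost:11434`, or simply set `client = ollama` to use the module-level functions of the `ollama` package, which talk to the default server (`localhost:11434` unless configured otherwise).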

In [3]:
# Uncomment to use the SnappCloud-hosted Ollama instance instead of a local server
#ollamaBase = 'http://ollama-alibo-gpu-testing.apps.private.okd4.teh-2.snappcloud.io/'
ollamaBase = 'http://localhost:11434'
import ollama

client = ollama.Client(ollamaBase)
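If you switch between the cloud instance and a local server often, one option is to read the base URL from an environment variable instead of editing the cell. The sketch below assumes a variable named `OLLAMA_HOST` (to our knowledge the same name the `ollama` package checks for its default client); `resolve_ollama_base` is a hypothetical helper, not part of any library.

```python
import os

# Hypothetical helper: falls back to the local Ollama port used in the cell above.
DEFAULT_OLLAMA_BASE = "http://localhost:11434"


def resolve_ollama_base(env_var: str = "OLLAMA_HOST") -> str:
    """Return the Ollama base URL from `env_var`, or the local default if unset."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        return DEFAULT_OLLAMA_BASE
    # Accept bare "host:port" values by adding an http:// scheme.
    if not value.startswith(("http://", "https://")):
        value = "http://" + value
    # Drop a trailing slash so paths are joined consistently later.
    return value.rstrip("/")
```

In the notebook this would be used as `client = ollama.Client(resolve_ollama_base())`.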


In [4]:
from langchain_community.document_loaders import DirectoryLoader, UnstructuredMarkdownLoader
from langchain.text_splitter import SentenceTransformersTokenTextSplitter

The `docs` directory of the SnappCloud user documentation on GitLab is expected to be available at the path used below.

Next, we list all models available on our local or remote GPU instance.

In [5]:
result = client.list()
print([x['name'] for x in result['models']])

['llama3:latest', 'mistral:latest', 'mxbai-embed-large:latest', 'tadayuki/suzume-llama3:8b-q4_K_M']
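The notebook refers to models by bare names (`llama3`, `mxbai-embed-large`), while `client.list()` reports tagged names such as `llama3:latest`. A small check like the one below can catch a missing model before the slow embedding step. It is a sketch that assumes an untagged name means the `:latest` tag; `missing_models` is a hypothetical helper.

```python
def normalize_model_name(name: str) -> str:
    """Treat an untagged model name as the ':latest' tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(required: list[str], available: list[str]) -> list[str]:
    """Return the required models (as given) that are not in `available`."""
    have = {normalize_model_name(n) for n in available}
    return [r for r in required if normalize_model_name(r) not in have]


# The model list printed above.
available = ['llama3:latest', 'mistral:latest', 'mxbai-embed-large:latest',
             'tadayuki/suzume-llama3:8b-q4_K_M']
print(missing_models(['llama3', 'mxbai-embed-large'], available))  # -> []
```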


### Retrieval

At this point we need the SnappCloud docs, which are available on GitLab: https://gitlab.snapp.ir/snappcloud/user-docs

In [6]:
import os
if not os.path.exists('./user-docs'):
    !git clone https://gitlab.snapp.ir/snappcloud/user-docs
else:
    print("using existing repo")

using existing repo


In [7]:
loader = DirectoryLoader('./user-docs/docs',glob="**/*.md",loader_cls=UnstructuredMarkdownLoader)
embeddingModel = 'mxbai-embed-large'
llmModel = 'llama3'
systemPrompt = "You are a helpful assistant that answers questions using the context provided. Cite the relevant documents every time you provide an answer."

In [8]:

docs = loader.load()
print(list(map(lambda x: x.metadata['source'], docs[0:10])))
print(len(docs))
#print(docs[10].page_content)

['user-docs/docs/vpn-access.md', 'user-docs/docs/servicedesk.md', 'user-docs/docs/overview.md', 'user-docs/docs/support.md', 'user-docs/docs/terms.md', 'user-docs/docs/reference/api-documentation.md', 'user-docs/docs/reference/cli-documentation.md', 'user-docs/docs/storage/storage-volumes.md', 'user-docs/docs/storage/volume-snapshots.md', 'user-docs/docs/storage/object-store/aws-s3-sdk.md']
127
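`DirectoryLoader` with `glob="**/*.md"` is meant to pick up every Markdown file under `./user-docs/docs`, including nested folders such as `storage/object-store/`. To preview which files match without loading them, a plain `pathlib` walk is enough. `list_markdown_files` is a hypothetical helper and only approximates the loader's own file matching.

```python
from pathlib import Path


def list_markdown_files(root: str) -> list[str]:
    """Return POSIX-style paths (relative to root) of all .md files under root."""
    base = Path(root)
    # rglob("*.md") searches the root folder and every subfolder.
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*.md") if p.is_file())
```

In the notebook, `len(list_markdown_files('./user-docs/docs'))` can be compared with the number of loaded documents printed above.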


Next, we use our embedding model to create the embeddings and store them in Chroma. We are using the **mxbai-embed-large** model, whose **context window is 512 tokens**, so every piece of text we embed must be shorter than that. We therefore first split the documents into chunks of at most 384 tokens.


In [9]:
token_splitter = SentenceTransformersTokenTextSplitter(chunk_overlap=0, tokens_per_chunk=384)
splits = token_splitter.split_documents(docs)
print(len(splits))




473
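Conceptually, the token splitter tokenizes each document, cuts the token sequence into windows of at most `tokens_per_chunk` tokens (consecutive windows share `chunk_overlap` tokens), and decodes each window back to text. The sketch below shows only the windowing step on an already-tokenized list; it is a simplified illustration, not LangChain's implementation. Note that the splitter counts tokens with its own sentence-transformers tokenizer rather than the one mxbai-embed-large uses, so staying at 384 instead of 512 leaves some headroom. The decoding step also appears to be why retrieved chunks later show up lowercased with spaced-out punctuation.

```python
def window_tokens(tokens: list, tokens_per_chunk: int, chunk_overlap: int = 0) -> list[list]:
    """Split a token list into windows of at most tokens_per_chunk tokens.

    Consecutive windows share chunk_overlap tokens.
    """
    if tokens_per_chunk <= 0:
        raise ValueError("tokens_per_chunk must be positive")
    if not 0 <= chunk_overlap < tokens_per_chunk:
        raise ValueError("chunk_overlap must be in [0, tokens_per_chunk)")
    step = tokens_per_chunk - chunk_overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + tokens_per_chunk])
        # Stop once a window reaches the end, so no trailing window
        # consists only of overlap.
        if start + tokens_per_chunk >= len(tokens):
            break
    return windows


# With the notebook's settings (384 tokens, no overlap), a 1000-token document
# becomes three chunks.
print([len(w) for w in window_tokens(list(range(1000)), 384)])  # -> [384, 384, 232]
```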


We should store the embeddings in a vector database such as Chroma or Qdrant. We use Chroma here, but you can switch to another store, such as Qdrant, by changing the vector-store class; LangChain provides the common abstraction.

In [18]:
%%time
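from langchain_community.embeddings import OllamaEmbeddings
# Use the same Ollama server as `client`; OllamaEmbeddings otherwise defaults to http://localhost:11434
embeddings = OllamaEmbeddings(base_url=ollamaBase.rstrip('/'), model=embeddingModel, show_progress=True)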

CPU times: user 134 ms, sys: 4.75 ms, total: 138 ms
Wall time: 146 ms


In [19]:
%%time
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(collection_name='snappcloud', documents=splits, embedding=embeddings, persist_directory='./chromadb')

OllamaEmbeddings: 100%|███████████████████████| 473/473 [00:44<00:00, 10.59it/s]


CPU times: user 2.36 s, sys: 265 ms, total: 2.62 s
Wall time: 45.3 s
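Under the hood, a vector store keeps one embedding per chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. The toy sketch below, a hypothetical `TinyVectorStore` using brute-force cosine similarity on small hand-made vectors, illustrates that idea. Chroma itself uses an approximate nearest-neighbour index and its own configurable distance metric, so treat this as an illustration only.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class TinyVectorStore:
    """Hypothetical brute-force store: parallel lists of texts and embeddings."""
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, text: str, vector: list[float]) -> None:
        self.texts.append(text)
        self.vectors.append(vector)

    def query(self, vector: list[float], k: int = 4) -> list[str]:
        # Score every stored vector, then keep the k most similar texts.
        scored = sorted(
            zip(self.texts, self.vectors),
            key=lambda tv: cosine_similarity(vector, tv[1]),
            reverse=True,
        )
        return [text for text, _ in scored[:k]]


# Made-up 2-D "embeddings" for illustration.
store = TinyVectorStore()
store.add("vpn access", [1.0, 0.0])
store.add("object storage", [0.0, 1.0])
store.add("s3 sdk", [0.1, 0.9])
print(store.query([0.0, 1.0], k=2))  # -> ['object storage', 's3 sdk']
```

Brute force is fine for a few hundred chunks like ours; an index becomes worthwhile as the collection grows.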


Here we check that the number of documents in the collection matches the number of chunks produced by the split.

In [12]:
print(vectorstore._collection.count())

473
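One caveat: as far as we know, calling `Chroma.from_documents` again with the same `persist_directory` and `collection_name` adds the chunks to the existing collection under fresh random IDs, so re-running the embedding cell can silently duplicate every chunk. The count above comes from an earlier cell execution (`In [12]`) than the embedding cell (`In [19]`), and the two identical cache-proxy hits in the query below may be a symptom of this. A common workaround is to derive a stable ID from each chunk and pass it through the `ids` argument; `chunk_id` is a hypothetical sketch of that idea.

```python
import hashlib


def chunk_id(source: str, index: int, text: str) -> str:
    """Stable ID for a chunk: the same source, position and text give the same ID."""
    digest = hashlib.sha256(f"{source}\n{index}\n{text}".encode("utf-8")).hexdigest()
    return digest[:32]


# Usage in the notebook (assumes `splits` from the splitting cell):
# ids = [chunk_id(d.metadata['source'], i, d.page_content) for i, d in enumerate(splits)]
# vectorstore = Chroma.from_documents(collection_name='snappcloud', documents=splits, ids=ids,
#                                     embedding=embeddings, persist_directory='./chromadb')
```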


### Query
We can also query the vector store directly. A RAG pipeline is only as good as the context it gives the LLM, and inspecting the retrieved chunks is one way to see exactly what context is being passed.

In [24]:
query = "What are the external IPs for snappcloud that i need to whitelist?"
qembed = embeddings.embed_query(query)
results = vectorstore._collection.query(query_embeddings = qembed, n_results=5)
retrieved_documents = results['documents'][0]

for document in retrieved_documents:
    print(document)
    print('\n')

OllamaEmbeddings: 100%|███████████████████████████| 1/1 [00:00<00:00, 13.98it/s]

id : cache - proxy title : cache proxy go cache proxy container cache proxy alpine cache proxy npm cache proxy http proxy


id : cache - proxy title : cache proxy go cache proxy container cache proxy alpine cache proxy npm cache proxy http proxy


local " the ext _ hostname should be the hostname you are using for exposing your service externally ; the int _ wildcard _ hostname should be a wildcard value for all domains for your project. you can use the following example : for rabbitmq : shell export ext _ hostname = rabbitmq - myproject. apps. private. teh - 1. snappcloud. io export int _ wildcard _ hostname = " *. rabbitmq. myproject. svc. cluster. local " for mongodb you will want to use the - internal service name for the int _ wildcard _ hostname : shell export ext _ hostname = mongodb - myproject. apps. private. teh - 1. snappcloud. io export int _ wildcard _ hostname = " *. mongodb - internal. myproject. svc. cluster. local " generate the certificates for our root certificate au




In [13]:
retriever = vectorstore.as_retriever()

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

def ollama_llm(question, context):
    formatted_prompt = f"Question: {question}\n\nContext: {context}"
    response = client.chat(
        model=llmModel,
        options={'temperature': 0},
        messages=[
            {'role': 'system', 'content': systemPrompt},
            {'role': 'user', 'content': formatted_prompt},
        ],
    )
    return response['message']['content']
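The system prompt asks the model to cite documents, but `format_docs` passes only `page_content`, so the model never sees the file names it is supposed to cite. One option, sketched below as the hypothetical `format_docs_with_sources`, is to prefix each chunk with its `source` metadata. It relies only on the `page_content` and `metadata` attributes that LangChain documents carry; the `Doc` class is a stand-in so the example runs on its own, and the sample text is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_sources(docs) -> str:
    """Join chunks, labelling each one with its source file so it can be cited."""
    parts = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown")
        parts.append(f"[source: {source}]\n{doc.page_content}")
    return "\n\n".join(parts)


sample_docs = [Doc("Use the VPN to reach private routes.",
                   {"source": "user-docs/docs/vpn-access.md"})]
print(format_docs_with_sources(sample_docs))
```

Inside `rag_chain`, calling `format_docs_with_sources(retrieved_docs)` instead of `format_docs(retrieved_docs)` would give the model concrete file names to cite.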


In [14]:
# Define the RAG chain
def rag_chain(question):
    retrieved_docs = retriever.invoke(question)
    formatted_context = format_docs(retrieved_docs)
    print("using from context",list(map(lambda x: x.metadata['source'],retrieved_docs)))
    return ollama_llm(question, formatted_context)
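Because `rag_chain` calls the retriever and the model directly, it is hard to exercise without a running Ollama server. A variant that takes the retrieval, formatting and generation steps as arguments can be tested with stubs, and it can print each source only once (the first run below lists `overview.md` twice). `make_rag_chain` and the stubs are hypothetical illustrations, not part of LangChain.

```python
from types import SimpleNamespace
from typing import Callable


def make_rag_chain(retrieve: Callable[[str], list],
                   generate: Callable[[str, str], str],
                   format_context: Callable[[list], str]) -> Callable[[str], str]:
    """Build a question -> answer function from pluggable steps."""
    def chain(question: str) -> str:
        retrieved = retrieve(question)
        # dict.fromkeys keeps first-seen order while dropping repeated sources.
        sources = list(dict.fromkeys(d.metadata["source"] for d in retrieved))
        print("using from context", sources)
        return generate(question, format_context(retrieved))
    return chain


# Stubs standing in for retriever.invoke and ollama_llm (made-up data).
fake_docs = [SimpleNamespace(page_content="a", metadata={"source": "overview.md"}),
             SimpleNamespace(page_content="b", metadata={"source": "overview.md"})]
chain = make_rag_chain(lambda q: fake_docs,
                       lambda q, ctx: f"{q} | {ctx}",
                       lambda ds: "\n\n".join(d.page_content for d in ds))
print(chain("hi"))
```

With the real pieces, the notebook version would be `make_rag_chain(retriever.invoke, ollama_llm, format_docs)`.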

In [15]:
%%time
result = rag_chain("What address should I use for jaeger agent on snappcloud?")
print(result)

using from context ['user-docs/docs/overview.md', 'user-docs/docs/cache-proxy/cache-proxy.md', 'user-docs/docs/management/cli-login.md', 'user-docs/docs/overview.md']
To use the Jaeger agent on SnapCloud, you can follow these steps:

1. First, make sure you have a Jaeger agent installed in your application. You can do this by adding the following configuration to your `docker-compose.yml` file:
```yaml
version: '3'
services:
  jaeger-agent:
    image: jaegertracing/jaeger-agent:latest
    environment:
      - JAEGER_AGENT_HOST=jaeger-agent
      - JAEGER_AGENT_PORT=6832
```
2. Next, you need to configure the Jaeger agent to send its data to SnapCloud's Jaeger instance. You can do this by setting the `JAEGER_COLLECTOR_ENDPOINT` environment variable to point to SnapCloud's Jaeger instance:
```yaml
version: '3'
services:
  jaeger-agent:
    image: jaegertracing/jaeger-agent:latest
    environment:
      - JAEGER_AGENT_HOST=jaeger-agent
      - JAEGER_AGENT_PORT=6832
      - JAEGER_COLLECT
```

In [28]:
%%time
result = rag_chain("What are the external IPs for snappcloud that i need to whitelist?")
print(result)

using from context ['user-docs/docs/cache-proxy/cache-proxy.md', 'user-docs/docs/storage/object-store/aws-s3-cli.md', 'user-docs/docs/storage/object-store/overview.md', 'user-docs/docs/learn/creating-self-signed-certificates.md']
To whitelist the external IPs for SnapCloud that you need to allowlist, please refer to the following documentation:

* [SnapCloud Documentation: External IP Addresses](https://docs.snapcloud.io/docs/external-ip-addresses)

According to the documentation, the external IPs for SnapCloud are:

* `api.snapcloud.io`
* `s3.snapcloud.io`

You will need to whitelist these IP addresses in your firewall or security group settings to allow incoming traffic from SnapCloud.

Please note that you may also need to whitelist additional IP addresses depending on the specific services and features you are using with SnapCloud.
CPU times: user 10.8 ms, sys: 4.27 ms, total: 15.1 ms
Wall time: 7.55 s
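The answer above reads plausibly but appears to be ungrounded: the `snapcloud.io` hostnames and the linked documentation page use a single "p", unlike the `snappcloud.io` domain that appears in the docs, which suggests the model invented them. A cheap guard is to flag hostnames in the answer that never occur in the context passed to the model. The sketch below is a hypothetical helper built on a rough regular expression, so it will miss or over-match some cases (for example, file names such as `overview.md` also match).

```python
import re

# Rough pattern for hostnames such as "api.example.io", also inside URLs.
HOST_PATTERN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def ungrounded_hosts(answer: str, context: str) -> list[str]:
    """Hostnames mentioned in the answer that never appear in the context."""
    context_lower = context.lower()
    # dict.fromkeys de-duplicates while keeping the order of first mention.
    found = dict.fromkeys(m.group(0).lower() for m in HOST_PATTERN.finditer(answer))
    return [host for host in found if host not in context_lower]


answer = "See https://docs.snapcloud.io/docs/external-ip-addresses and `api.snapcloud.io`."
context = "routes are exposed under apps.private.teh-1.snappcloud.io"
print(ungrounded_hosts(answer, context))  # -> ['docs.snapcloud.io', 'api.snapcloud.io']
```

In the notebook, this could be run on `result` together with the formatted context from `rag_chain`, printing a warning when the list is non-empty.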


In [16]:
result = rag_chain("I need to increase the quota for my project, how can I do this?")
print(result)

using from context ['user-docs/docs/management/cli-login.md', 'user-docs/docs/servicedesk.md', 'user-docs/docs/cache-proxy/cache-proxy.md', 'user-docs/docs/overview.md']
To increase the quota for your project, follow these steps:

1. Go to the Cloud Service Desk page.
2. Click on "Increase OKD Quota" to create a new ticket.
3. Fill in the required fields:
	* Summary: Provide a concise description of your request, including whether the requested resource amount is permanent or temporary.
	* Team Name: Enter the name of the team for which you are requesting resources.
	* Region: Select the appropriate region for your project (TEH-1 or TEH-2).
4. Optional additional fields:
	* Additional Memory Limit: If you require additional memory, enter the desired amount in this field.
	* Additional CPU Limit: If you need additional CPU, specify the desired limit in this field.
	* Additional Storage: If you require additional storage, indicate the desired amount in this field.
	* Additional Ephemeral