## SnappCloud RAG


In [64]:
%%capture
%pip install ollama sentence-transformers protobuf

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
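The warning above asks for `TOKENIZERS_PARALLELISM` to be set explicitly. A minimal sketch, assuming you prefer to disable tokenizer parallelism rather than keep it, is to set the variable in the first cell, before anything imports `tokenizers`:

```python
import os

# Disable huggingface/tokenizers parallelism explicitly so the fork warning
# is not printed when later cells (e.g. %pip) fork the process.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```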


In [62]:
%%capture
%pip install langchain langchain_community chromadb unstructured markdown



OK, now let's build our RAG pipeline. Here we use the cloud GPU resources. If you want to use a local Ollama server instead, create a client pointing at `http://localhost:11434`, or simply set `client = ollama` to use the package's module-level default client.

In [25]:
ollamaBase = 'http://ollama-alibo-gpu-testing.apps.private.okd4.teh-2.snappcloud.io/'

import ollama

client = ollama.Client(ollamaBase)
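To switch between the cloud endpoint and a local server without editing the cell, one option is a small helper that picks the base URL. This is a sketch only: `pick_ollama_base` and the `USE_LOCAL_OLLAMA` environment variable are hypothetical names introduced here, and `http://localhost:11434` is Ollama's default local address.

```python
import os
from typing import Optional

# The cloud URL is the one used above; the local URL is Ollama's default port.
CLOUD_OLLAMA = 'http://ollama-alibo-gpu-testing.apps.private.okd4.teh-2.snappcloud.io/'
LOCAL_OLLAMA = 'http://localhost:11434'

def pick_ollama_base(use_local: Optional[bool] = None) -> str:
    """Return the Ollama endpoint to connect to.

    USE_LOCAL_OLLAMA is a hypothetical environment variable used only in this sketch.
    """
    if use_local is None:
        use_local = os.environ.get('USE_LOCAL_OLLAMA', '0') == '1'
    return LOCAL_OLLAMA if use_local else CLOUD_OLLAMA
```

In the notebook this would be used as `client = ollama.Client(host=pick_ollama_base())`.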


In [26]:
from langchain_community.document_loaders import DirectoryLoader, UnstructuredMarkdownLoader
from langchain.text_splitter import SentenceTransformersTokenTextSplitter

We expect the `docs` directory of the SnappCloud user documentation (from GitLab) to be available locally at `./user-docs/docs`; the Retrieval section below clones the repository if it is missing.

Next, we list all models available on our local or remote GPU instance.

In [27]:
result = client.list()
print([x['name'] for x in result['models']])

['security-compliance:latest', 'llama3:70b', 'llama3:latest', 'llama3:instruct']


### Retrieval

At this point we need the SnappCloud user docs, which are available on GitLab: https://gitlab.snapp.ir/snappcloud/user-docs

In [34]:
import os
if not os.path.exists('./user-docs'):
    !git clone https://gitlab.snapp.ir/snappcloud/user-docs
else:
    print("using existing repo")

using existing repo


In [32]:
loader = DirectoryLoader('./user-docs/docs', glob="**/*.md", loader_cls=UnstructuredMarkdownLoader)
embeddingModel = 'mxbai-embed-large'
llmModel = 'llama3'
systemPrompt = "You are a helpful assistant that answers questions using the context provided. Cite the relevant documents every time you provide an answer."
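The model list printed earlier does not include `mxbai-embed-large`, so the embedding step would fail until that model is pulled. A minimal sketch for spotting missing models, assuming Ollama's convention that an untagged name such as `llama3` resolves to `llama3:latest`; `with_tag` and `missing_models` are hypothetical helpers:

```python
def with_tag(name: str) -> str:
    # Ollama resolves an untagged model name to the ':latest' tag.
    # Only the last path segment is checked so a registry host:port is not mistaken for a tag.
    last = name.rsplit('/', 1)[-1]
    return name if ':' in last else f'{name}:latest'

def missing_models(available, required):
    """Return the required model names that are not in `available`."""
    have = {with_tag(n) for n in available}
    return [r for r in required if with_tag(r) not in have]

# Names as printed by client.list() earlier in this notebook
available = ['security-compliance:latest', 'llama3:70b', 'llama3:latest', 'llama3:instruct']
print(missing_models(available, ['mxbai-embed-large', 'llama3']))
```

Each missing name could then be fetched with `client.pull(name)`. Depending on the version of the `ollama` Python package, the entries returned by `client.list()` may expose the name under `model` rather than `name`.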

In [49]:

docs = loader.load()
print([d.metadata['source'] for d in docs[:10]])
print(len(docs))
#print(docs[10].page_content)

['user-docs/docs/overview.md', 'user-docs/docs/support.md', 'user-docs/docs/terms.md', 'user-docs/docs/vpn-access.md', 'user-docs/docs/servicedesk.md', 'user-docs/docs/openstack/images.md', 'user-docs/docs/openstack/overview.md', 'user-docs/docs/openstack/networking.md', 'user-docs/docs/openstack/migrate-1g-to-10g.md', 'user-docs/docs/openstack/IaC.md']
127


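Now we use our embedding model to create the embeddings and store them in Chroma. We use the **mxbai-embed-large** model, whose **context window is 512 tokens**, so every chunk we embed must be shorter than that. We therefore split the documents into chunks of at most 384 tokens with no overlap. Note that `SentenceTransformersTokenTextSplitter` counts tokens with its own sentence-transformers tokenizer rather than mxbai's, so 384 is an approximation that leaves headroom below 512.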


In [55]:
token_splitter = SentenceTransformersTokenTextSplitter(chunk_overlap=0, tokens_per_chunk=384)
splits = token_splitter.split_documents(docs)
print(len(splits))


473
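To make `tokens_per_chunk` and `chunk_overlap` concrete, here is a simplified sliding-window sketch over an already-tokenized sequence. It illustrates the idea only and is not LangChain's implementation; `token_windows` is a hypothetical helper and the integer list stands in for real token ids.

```python
def token_windows(tokens, tokens_per_chunk, chunk_overlap=0):
    """Split a token sequence into fixed-size windows.

    Each window has at most `tokens_per_chunk` tokens and consecutive
    windows share `chunk_overlap` tokens.
    """
    if not 0 <= chunk_overlap < tokens_per_chunk:
        raise ValueError("chunk_overlap must be in [0, tokens_per_chunk)")
    step = tokens_per_chunk - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + tokens_per_chunk])
        if start + tokens_per_chunk >= len(tokens):
            break  # the last window already reaches the end
    return chunks

tokens = list(range(1000))  # stand-in for a 1000-token document
print([len(c) for c in token_windows(tokens, 384)])  # [384, 384, 232]
```

With `chunk_overlap=0`, as in the cell above, every chunk is independent; a non-zero overlap repeats some context across neighbouring chunks at the cost of more chunks to embed.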


We need to store the embeddings in a vector database such as Chroma or Qdrant. The cells below use Chroma; to try Qdrant instead, switch the Chroma cell to Markdown and add an equivalent Qdrant cell in its place.
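Conceptually, the vector store keeps (embedding, document) pairs and, at query time, returns the documents whose embeddings are closest to the query embedding. A toy in-memory sketch of that lookup, assuming cosine similarity (Chroma's own default distance metric may differ); `cosine` and `top_k` are hypothetical helpers, and `k=4` mirrors the four sources the retriever returns in the outputs below:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, store, k=4):
    """store: list of (embedding, document) pairs; returns the k most similar documents."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]

store = [([1.0, 0.0], 'a.md'), ([0.0, 1.0], 'b.md'), ([0.9, 0.1], 'c.md')]
print(top_k([1.0, 0.0], store, k=2))  # ['a.md', 'c.md']
```

A real vector database adds persistence and an approximate nearest-neighbour index so the search does not have to score every stored vector.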


In [58]:
%%time
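from langchain_community.embeddings import OllamaEmbeddings
# Point the embeddings at the same Ollama server as `client`;
# without base_url, OllamaEmbeddings defaults to http://localhost:11434
embeddings = OllamaEmbeddings(model=embeddingModel, base_url=ollamaBase.rstrip('/'))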

CPU times: user 197 µs, sys: 1.68 ms, total: 1.88 ms
Wall time: 4.76 ms


In [63]:
%%time
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    collection_name='snappcloud',
    persist_directory='./chromadb',
)

TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
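The error message itself lists two workarounds: downgrade protobuf (for example `%pip install "protobuf<=3.20.3"`, then restart the kernel), or force the pure-Python protobuf implementation. A sketch of the second option, which only takes effect if it runs before protobuf is first imported, so it belongs at the very top of a fresh kernel:

```python
import os

# Workaround 2 from the error message: use the pure-Python protobuf
# implementation (slower parsing). Must be set before protobuf is imported.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
```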

In [None]:
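# Count the chunks stored in the 'snappcloud' collection.
# vectorstore.get() with no arguments returns all stored ids (the first positional argument would be a list of ids, not a collection name)
print(len(vectorstore.get()['ids']))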

In [18]:
retriever = vectorstore.as_retriever()

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

def ollama_llm(question, context):
    formatted_prompt = f"Question: {question}\n\nContext: {context}"
    response = client.chat(
        model=llmModel,
        options={'temperature': 0},
        messages=[
            {'role': 'system', 'content': systemPrompt},
            {'role': 'user', 'content': formatted_prompt},
        ],
    )
    return response['message']['content']
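The system prompt asks the model to cite documents, but `format_docs` passes only `page_content`, so the model never sees the file names it is supposed to cite. A sketch of a variant that prefixes each chunk with its `source` metadata; `format_docs_with_sources` is a hypothetical helper and `Doc` is a stand-in for LangChain's `Document`, using only the two attributes the notebook relies on:

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Stand-in for a LangChain Document: only page_content and metadata are used."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs_with_sources(docs):
    """Join chunks, labelling each one with its source file so the LLM can cite it."""
    return "\n\n".join(
        f"[Source: {d.metadata.get('source', 'unknown')}]\n{d.page_content}"
        for d in docs
    )

print(format_docs_with_sources([Doc('Open a ticket.', {'source': 'docs/servicedesk.md'})]))
```

Inside `rag_chain`, this would replace the call to `format_docs` when building `formatted_context`.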


In [19]:
# Define the RAG chain
def rag_chain(question):
    retrieved_docs = retriever.invoke(question)
    formatted_context = format_docs(retrieved_docs)
    print("using from context",list(map(lambda x: x.metadata['source'],retrieved_docs)))
    return ollama_llm(question, formatted_context)
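Because `rag_chain` reads the global `retriever` and `client`, it can only run against live services. One way to check the wiring offline is to pass the three stages in as functions. This is a sketch; `make_rag_chain` is a hypothetical helper, and the lambdas stand in for the retriever, formatter, and LLM:

```python
from typing import Callable, Sequence

def make_rag_chain(retrieve: Callable[[str], Sequence],
                   fmt: Callable[[Sequence], str],
                   generate: Callable[[str, str], str]) -> Callable[[str], str]:
    """Compose retrieve -> format -> generate into one question-answering function."""
    def chain(question: str) -> str:
        docs = retrieve(question)
        context = fmt(docs)
        return generate(question, context)
    return chain

# Offline check with stub components (no Ollama or Chroma needed)
fake_chain = make_rag_chain(
    retrieve=lambda q: ['chunk about ' + q],
    fmt=lambda docs: '\n\n'.join(docs),
    generate=lambda q, ctx: f'Q={q} | CTX={ctx}',
)
print(fake_chain('quota'))  # Q=quota | CTX=chunk about quota
```

With the real components this would be `make_rag_chain(retriever.invoke, format_docs, ollama_llm)`.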

In [20]:
%%time
result = rag_chain("What address should I use for jaeger agent on snappcloud?")
print(result)

using from context ['docs/cache-proxy/cache-proxy.md', 'docs/message-queue/mq.md', 'docs/overview.md', 'docs/openstack/images.md']
To use the Jaeger agent on SnappCloud, you should use the following address:

`jaeger-agent:6832`

This is according to the documentation provided in the `images` section of the SnappCloud documentation. The relevant document is not explicitly cited, but it appears to be a part of the overall SnappCloud documentation.

Please note that this answer assumes that you are referring to the Jaeger agent for distributed tracing and monitoring. If you meant something else by "Jaeger agent", please clarify and I'll do my best to provide a more accurate response.
CPU times: user 34.6 ms, sys: 6.1 ms, total: 40.7 ms
Wall time: 14.8 s


In [21]:
%%time
result = rag_chain("What are the external IPs for snappcloud that i need to whitelist?")
print(result)

using from context ['docs/cache-proxy/cache-proxy.md', 'docs/ai-bigdata/kubeflow.md', 'docs/openstack/images.md', 'docs/networking/loadbalancer-as-a-service.md']
To whitelist the external IPs for SnappCloud, you need to specify the following:

* `172.16.10.10`
* `172.16.10.11`
* `172.16.10.12`

These are the IP addresses that will be used as the endpoints for the ExternalService.

You can also use the available images provided by SnappCloud, such as `centos7-public` or `ubuntu20-public`, to provision your instances and avoid preparing images yourself.

Please note that LBaaS is in Alpha state, so it may not be fully supported yet.
CPU times: user 16.4 ms, sys: 939 µs, total: 17.3 ms
Wall time: 4.09 s
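`docs/cache-proxy/cache-proxy.md` appears among the retrieved sources for both questions above, which suggests some chunks are only loosely related to the query. One way to inspect this is `vectorstore.similarity_search_with_relevance_scores(question, k=4)`, which returns `(Document, score)` pairs with higher scores meaning more relevant, and then drop low-scoring chunks; LangChain also offers `vectorstore.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5})`. A sketch of the filtering step, where `keep_relevant` is a hypothetical helper and the 0.5 threshold is an arbitrary assumption to tune:

```python
def keep_relevant(scored_docs, threshold=0.5):
    """Keep documents whose relevance score is at least `threshold`.

    scored_docs: iterable of (document, score) pairs, higher score = more relevant.
    """
    return [doc for doc, score in scored_docs if score >= threshold]

# Made-up scores, for illustration only
print(keep_relevant([('doc-a', 0.82), ('doc-b', 0.31)]))  # ['doc-a']
```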


In [23]:
result = rag_chain("I need to increase the quota for my project, how can I do this?")
print(result)

using from context ['docs/cache-proxy/cache-proxy.md', 'docs/ai-bigdata/kubeflow.md', 'docs/servicedesk.md', 'docs/message-queue/mq.md']
To increase the quota for your project, you can follow these steps:

1. Go to the Cloud Service Desk page.
2. Click on "Increase [S3/OpenStack/OKD] Quota" to create a new ticket.
3. Fill in the required fields:
	* Summary: Provide a concise description of your request and indicate whether the requested amount of resources is permanent or temporary.
	* [S3/OpenStack/OKD] User ID/Project Name/Team Name: Specify the relevant information for the service you are requesting resources for.
	* Cloud Region: Select the appropriate region for your project (teh-1 or teh-2).
4. Optional additional fields:
	* Additional [resource type]: If you require an increase in a specific resource type (e.g., storage, memory, VCPUs), enter the desired amount in this field.

Please note that filling out the additional fields is optional, and you do not need to complete all of 