# VectorShift Chatbot

## Installing the Library

To use the VectorShift Python library, you need Python 3.10 or newer.

The SDK is built on our API. To access much of its functionality, such as saving and downloading pipelines, you need an API key.

Our Python SDK is available as the vectorshift package on PyPI. Make sure pip is installed, then install the package by running the following command in your terminal of choice:

In [1]:
! pip install vectorshift --upgrade
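
As a quick sanity check after installing, the sketch below reports whether the interpreter meets the Python 3.10 requirement and which version of the package is installed. The helper name `check_environment` is our own and is not part of the SDK; it uses only the standard library.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # minimum Python version required by the library


def check_environment(package: str = "vectorshift") -> dict:
    """Report whether Python is new enough and which package version is installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None  # the package is not installed in this environment
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "package_version": installed,
    }


print(check_environment())
```

If `package_version` is `None`, rerun the pip command above in the same environment as your notebook kernel.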



## VectorShift Chatbot: Add Your Company Knowledge Base to Chat

The pipeline takes a user question about VectorShift as its input (input node). The question is used to query a vector store, a database that supports semantic queries and returns the most relevant pieces of information, which here contains the VectorShift documentation. The results from the vector store are fed into an LLM prompt, along with the user question and the chat memory.

The overall pipeline should look like the figure below:
![alt text](images/vectorshift_chatbot/1-overview.png "Overall Pipeline")
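
To make the retrieval step concrete, here is a toy sketch of "return the most relevant pieces of information" in plain Python. It is not how VectorShift's vector store works: real vector stores compare learned embeddings, whereas this stand-in compares word counts with cosine similarity. The function names and sample chunks are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in 'embedding': a bag of lowercase words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Use Config to save a pipeline to the platform",
    "An InputNode needs a name and a data type",
    "Chat memory stores earlier messages",
]
print(top_matches("How do I save a pipeline", chunks, k=1))
```

In the real pipeline, the chunks returned by this step become the context that is passed to the LLM together with the question and the chat history.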

First, let's import the SDK and set the API key.

In [11]:
import vectorshift as vs
from vectorshift.node import InputNode, TextNode, OpenAILLMNode, OutputNode, ChatMemoryNode, KnowledgeBaseNode
from vectorshift.pipeline import Pipeline
from vectorshift.knowledge_base import KnowledgeBase

In [12]:
vs_api_key="YOUR_API_KEY"
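
If you would rather not paste the key into the notebook, one option is to read it from an environment variable. This is a minimal sketch; the variable name `VECTORSHIFT_API_KEY` and the helper `load_api_key` are our own choices, not something the SDK defines.

```python
import os


def load_api_key(env_var: str = "VECTORSHIFT_API_KEY") -> str:
    """Read the API key from an environment variable (the name is our own convention)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

You could then write `vs_api_key = load_api_key()` in place of the hard-coded string.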

### Input
Our pipeline takes one input of type text: the user's question. Correspondingly, there's an InputNode class that we can use to represent this input, which requires a name and a data type.

![alt text](images/vectorshift_chatbot/2-input_node.png "Input Node")

The data type is more than a constructor argument here. Behind the scenes, node outputs are tagged with different types (e.g. LLMs produce textual output), which can help catch issues with pipelines before they're saved to the VectorShift platform. We list the expected types of different nodes' inputs and outputs in node-specific documentation.

In [4]:
input_node = InputNode(name="User_question", input_type="text")
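
The idea of tagging outputs with types can be illustrated without the SDK. The sketch below is not VectorShift's implementation; `TaggedOutput` and `connect` are hypothetical names that show how a type mismatch between two nodes could be caught before anything is saved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedOutput:
    """A placeholder value that remembers which node produced it and its type."""
    source: str
    data_type: str


def connect(output: TaggedOutput, target: str, expected_type: str) -> str:
    """Describe the edge, or raise if the output type does not match the input."""
    if output.data_type != expected_type:
        raise TypeError(
            f"{target} expects {expected_type!r} but {output.source} "
            f"produces {output.data_type!r}"
        )
    return f"{output.source} -> {target}"


question = TaggedOutput(source="User_question", data_type="text")
print(connect(question, "llm.prompt", "text"))
```

Passing an output tagged with any other type to `connect(..., "text")` raises a `TypeError` immediately, which is the kind of early feedback the type tags are meant to provide.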

### Chat Memory
Chat memory lets the chatbot remember earlier messages from the conversation. Here we use the 'Full - Formatted' memory type.

![alt text](images/vectorshift_chatbot/3-memory_node.png "Overall Pipeline")

In [5]:
chat_history = ChatMemoryNode(memory_type='Full - Formatted')
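
For intuition, here is a small plain-Python sketch of a windowed chat memory that keeps only the most recent messages and formats them as text. It is only an illustration, not the ChatMemoryNode implementation; the class name `MessageBuffer` and the `role: text` format are assumptions.

```python
from collections import deque


class MessageBuffer:
    """Keep the last `window` messages and render them as plain text."""

    def __init__(self, window: int = 10):
        # deque with maxlen silently drops the oldest message when full
        self.messages = deque(maxlen=window)

    def add(self, role: str, text: str) -> None:
        self.messages.append((role, text))

    def formatted(self) -> str:
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


memory = MessageBuffer(window=2)
memory.add("user", "What is VectorShift?")
memory.add("assistant", "A platform for building AI pipelines.")
memory.add("user", "How do I install the SDK?")
print(memory.formatted())  # only the two most recent messages remain
```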

### Knowledge Base Node
A knowledge base lets VectorShift store information about your product. In this demo, we use docs.vectorshift.ai as the source of documentation.

![alt text](images/vectorshift_chatbot/4-knowledge_node.png "Knowledge Node")

First, create the knowledge base:

In [6]:
knowledge_base = KnowledgeBase(name="Vectorshift Doc", description="Knowledge Base for SDK related questions")
knowledge_base.save()

{'id': {'id': '669927c3ae2b4fcc584ea016',
  'name': 'Vectorshift Doc',
  'description': 'Knowledge Base for SDK related questions',
  'chunkSize': 400,
  'chunkOverlap': 0,
  'isHybrid': False,
  'userID': 'auth0|64cbd237b160e37c8c3510d4',
  'orgID': 'Personal',
  'vectorCount': 0,
  'createdDate': '2024-07-18T14:33:39.370864',
  'lastSynced': None,
  'selectedIntegrations': [],
  'vector_db_details': {'vector_db_provider': 'qdrant',
   'vectorstore_id': None,
   'collection_name': 'text-embedding-3-small',
   'embedding_model': 'text-embedding-3-small',
   'embedding_provider': 'openai',
   'dimension': None,
   'is_hybrid': False,
   'sparse_embedding_model': None,
   'query_embedding_model': None,
   'sparse_query_embedding_model': None},
  'documents': None,
  'integration_metadata': None,
  'folderId': None,
  'integrationNode': False,
  'fileProcessingImplementation': None,
  'apifyKey': None,
  'conversation_id': None,
  'hide_from_owner': False}}
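
The `chunkSize` and `chunkOverlap` fields in the response above describe how documents are split before they are embedded. As a rough illustration only, the sketch below splits text into fixed-size pieces with an optional overlap. It assumes sizes are counted in characters, which may not match the unit the platform uses, and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 400, chunk_overlap: int = 0) -> list[str]:
    """Split text into pieces of chunk_size characters, overlapping by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


pieces = chunk_text("x" * 1000)  # defaults mirror the values shown above
print([len(p) for p in pieces])
```

A non-zero overlap repeats the tail of one chunk at the start of the next, which can help when a relevant sentence straddles a chunk boundary.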

In [13]:
knowledge_base.load_documents(document="presentation.pdf", document_type="File")

{'documents': [{'vectorstoreID': '669927c3ae2b4fcc584ea016',
   'documentID': 'a85ead30-d810-4308-a1ab-ee4815b375de',
   'userID': 'auth0|64cbd237b160e37c8c3510d4',
   'orgID': 'Personal',
   'fileID': 'presentation.pdf',
   'status': 'loading',
   'value': 'presentation.pdf',
   'vectorCount': 0,
   'type': 'File',
   'source': 'sources/Personal/auth0|64cbd237b160e37c8c3510d4/669927c3ae2b4fcc584ea016/99fb9f12-3eae-4448-bd07-a5f26b585680',
   'metadata': None,
   'integrationType': None,
   'integrationID': None,
   'itemID': None,
   'itemName': None,
   'errorMessage': None,
   'rescrapeFrequency': 'Never',
   's3FileID': '0e3761d7-61ba-4b6b-b30e-47e8dfb3027b',
   'fileName': None,
   'sharedWith': []}]}

In [8]:
kb_node = KnowledgeBaseNode(
    base_id=knowledge_base.id,
    query_input=input_node.output(),
    max_docs_per_query=7,
    rerank_documents=True,
    alpha=0.9
)

### LLM Node
The LLM node generates the chatbot's answer. It receives a system prompt (defined in a TextNode) and a prompt template that combines the chat history, the user question, and the context returned by the knowledge base.

![alt text](images/vectorshift_chatbot/5-llm_node.png "Knowledge Node")

In [9]:
system_text_raw = """You are a helpful assistant that answers User Question based on Context and Conversational History.

If you are unable to answer the question or if the user requests, direct them to these support resources:
1. Documentation: https://docs.vectorshift.ai/vectorshift/
2. Book a meeting:
https://calendly.com/albert_mao/vectorshift-intro-chat
3. Discord:
https://discord.gg/3bpkv4AX"""
system_text = TextNode(text=system_text_raw)

In [13]:
llm = OpenAILLMNode(
    model="gpt-4", 
    system_input=system_text.output(), 
    prompt_input='History:"""\n{{History}}\n"""\n\nUser Question\n{{User_Question}}\n\n\nContext\n{{Context}}',
    max_tokens=4000,
    text_inputs={'History': chat_history.output(), 'User_Question': input_node.output(), 'Context': kb_node.output()}
)
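
The `{{History}}`, `{{User_Question}}`, and `{{Context}}` placeholders in `prompt_input` are filled from the matching keys in `text_inputs`. The platform does this substitution for you; the sketch below only mimics the idea in plain Python, and `fill_template` is a hypothetical helper rather than an SDK function.

```python
import re


def fill_template(template: str, values: dict) -> str:
    """Replace each {{Name}} placeholder with values['Name']."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value supplied for template variable {name!r}")
        return str(values[name])

    return re.sub(r"\{\{(\w+)\}\}", replace, template)


template = 'History:"""\n{{History}}\n"""\n\nUser Question\n{{User_Question}}\n\n\nContext\n{{Context}}'
print(fill_template(template, {
    "History": "user: Hi",
    "User_Question": "How do I save a pipeline?",
    "Context": "Use Config to save a pipeline.",
}))
```

Raising on a missing variable, rather than leaving the placeholder in the text, makes a mismatch between the template and the supplied inputs easy to spot.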

### Output

The output of the entire pipeline is the chatbot's answer, which is produced by the llm node. We can just take that node's output() and package it in an OutputNode, which determines the overall returned value of the pipeline.

Remember that an OutputNode is a node that represents, in the pipeline's computation graph, the final value produced. We pass in the output() of llm, which is a NodeOutput, as the input to that node. OutputNodes are a kind of node; NodeOutputs define what a node returns.

![alt text](images/vectorshift_chatbot/6-output.png "Overall Pipeline")

In [15]:
output = OutputNode(
    name="Output", 
    output_type="text", 
    input=llm.output()
)

These are all the nodes we need! The overall structure of the nodes closely follows that of the no-code example. Each node block in the no-code editor became its own object in Python, and each edge between nodes has been represented by the output of one node being passed into the constructor of another.

### Creating and Deploying the Pipeline

Once nodes have been defined, creating a pipeline object is fairly simple, since the node objects themselves already encode the edges between them.

A Pipeline object can be initialized by passing in a list of all nodes, a name, and a description. The list of nodes can be passed in any order.

In [33]:
vectorshift_chatbot = [
    input_node, chat_history, llm, output, kb_node, system_text
]

In [34]:
vectorshift_chatbot_pipeline = Pipeline(
    name="Vectorshift Chatbot",
    description="Generate personalized emails for outreach",
    nodes=vectorshift_chatbot
)
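
The node list can be given in any order because the dependencies are already recorded in the nodes themselves, so a valid execution order can always be derived from them. The sketch below shows that idea with the standard library's `graphlib`; the dependency map is written by hand to mirror this tutorial's nodes and is not produced by the SDK.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set (hand-written to mirror the pipeline above).
dependencies = {
    "input_node": set(),
    "chat_history": set(),
    "system_text": set(),
    "kb_node": {"input_node"},
    "llm": {"system_text", "chat_history", "input_node", "kb_node"},
    "output": {"llm"},
}

# static_order() yields every node after all of the nodes it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Whatever order the dictionary is written in, `kb_node` always comes after `input_node`, and `output` always comes last.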

A Pipeline object has a few handy methods. Printing it gives a representation of its constituent nodes. If you want to generate code that shows how you could construct the object, call construction_str(), which assigns generated IDs as variable names for each node.

In [35]:
print(vectorshift_chatbot_pipeline)

(pipeline id <no pipeline id>)=Pipeline(
    id=None,
    name='Vectorshift Chatbot',
    description='Generate personalized emails for outreach',
    nodes=[
	(node id customInput-1)=InputNode(
		name='User_question',
		input_type='text',
		process_files=True
	),
	(node id chatMemory-1)=ChatMemoryNode(
		memory_type='Full - Formatted'
	),
	(node id llmOpenAI-1)=OpenAILLMNode(
		model='gpt-4',
		system_input=text_1.outputs()['output'],
		prompt_input='History:"""
	{{History}}
	"""
	
	User Question
	{{User_Question}}
	
	
	Context
	{{Context}}',
		max_tokens=4000,
		text_inputs={'History': chat_memory_1.outputs()['value'], 'User_Question': custom_input_1.outputs()['value'], 'Context': knowledge_base_1.outputs()['results']}
	),
	(node id customOutput-1)=OutputNode(
		name='Output',
		output_type='text',
		input=llm_open_ai_1.outputs()['response']
	),
	(node id vectorStore-1)=KnowledgeBaseNode(
		query_input=custom_input_1.outputs()['value'],
		base_id='668ea87cb665cebec9cf1aa5',
		max_doc

In [None]:
print(vectorshift_chatbot_pipeline.construction_str())

To save the pipeline to the VectorShift platform, we create a Config object with our API key and then pass the pipeline object to its save_new_pipeline method.

In [37]:
config = vs.deploy.Config(
    api_key=vs_api_key,
)

In [38]:
config.save_new_pipeline(vectorshift_chatbot_pipeline)

Successfully saved pipeline with ID 668eadc892ca9c05a272bc13.


{'pipeline': {'name': 'Vectorshift Chatbot',
  'description': 'Generate personalized emails for outreach',
  'nodes': [{'id': 'customInput-1',
    'type': 'customInput',
    'data': {'id': 'customInput-1',
     'nodeType': 'customInput',
     'category': 'input',
     'task_name': 'input',
     'inputName': 'User_question',
     'inputType': 'Text'},
    'position': {'x': 0, 'y': -450},
    'positionAbsolute': {'x': 0, 'y': -450},
    'selected': False,
    'dragging': False},
   {'id': 'chatMemory-1',
    'type': 'chatMemory',
    'data': {'id': 'chatMemory-1',
     'nodeType': 'chatMemory',
     'category': 'memory',
     'task_name': 'load_memory',
     'memoryType': 'Full - Formatted',
     'memoryWindow': None,
     'memoryWindowValues': {'Full - Formatted': 0,
      'Full - Raw': 0,
      'Vector Database': 0,
      'Message Buffer': 10,
      'Token Buffer': 2048}},
    'position': {'x': 0, 'y': 0},
    'positionAbsolute': {'x': 0, 'y': 0},
    'selected': False,
    'dragging':

The constructed pipeline should look like the figure below. You can check it in the VectorShift dashboard under Pipeline.
![alt text](images/7-combined.png "Overall Pipeline")

### Running a Pipeline

To run a pipeline, fetch it by name with Pipeline.fetch, and then execute it with pipeline.run.

In [51]:
pipeline = Pipeline.fetch(pipeline_name='Vectorshift Chatbot', api_key=vs_api_key)

In [None]:
response = pipeline.run(
    inputs={"User_question": "How do I save a pipeline with the VectorShift SDK?"},
    api_key=vs_api_key
)

print(response)