# CSV Document Search

This pipeline lets you ask questions about the data in a CSV file, such as sales, performance, or analytics data.

## Installing Library

To use the VectorShift Python library, you should be using Python 3.10 or newer.
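A quick way to confirm that the interpreter meets this requirement is a version check like the sketch below. The helper name `meets_minimum_python` is hypothetical and not part of the VectorShift SDK; it relies only on the standard library.

```python
import sys

# Hypothetical helper (not part of the VectorShift SDK): compares an
# interpreter version against the minimum version stated above.
MIN_PYTHON = (3, 10)


def meets_minimum_python(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if version_info is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= tuple(minimum)


if meets_minimum_python():
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python 3.10 or newer is required; found", sys.version.split()[0])
```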

The SDK is built upon our API. To access much of the functionality, such as saving and downloading pipelines, you should already have an API key ready.

Our Python SDK is available as the vectorshift package on PyPI. Make sure pip is installed, then install the package by running the following command in your terminal of choice:

In [1]:
! pip install vectorshift --upgrade






## Pipeline Overview

The pipeline takes two inputs, a text query and a CSV file, passes both to a CSV query loader node, and returns the loader's result through an output node.

![alt text](images/ask_csv/1-overview.png "Overall Pipeline")

In [3]:
import vectorshift
from vectorshift.node import InputNode, URLLoaderNode, TextNode, SemanticSearchNode, OpenAILLMNode, OutputNode, ChatMemoryNode, CSVQueryLoaderNode
from vectorshift.pipeline import Pipeline
from vectorshift.knowledge_base import *

## Define the VectorShift API Key

Put your VectorShift API key below.
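Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. One common alternative, sketched below, is to read the key from an environment variable. The variable name `VECTORSHIFT_API_KEY` and the helper `load_api_key` are assumptions made for illustration; the SDK does not define either.

```python
import os

# Assumed environment variable name; the SDK does not prescribe one.
API_KEY_ENV_VAR = "VECTORSHIFT_API_KEY"


def load_api_key(env=os.environ, var_name=API_KEY_ENV_VAR):
    """Return the API key stored in `var_name`, or raise if it is missing or blank."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your VectorShift API key."
        )
    return key
```

The returned value can then be assigned in place of a literal string, for example `vectorshift.api_key = load_api_key()`.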

In [4]:
vectorshift.api_key = "YOUR_API_KEY"  # replace with your own key; do not share or commit real keys

## Create Input Nodes

This pipeline needs two inputs: one for the query and one for the CSV file. The input_csv node sets input_type to "file" so that it can receive the CSV file.

![alt text](images/ask_csv/2-inputs.png "Overall Pipeline")
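Since the pipeline answers questions about the contents of the uploaded file, it can help to inspect the CSV locally before passing it in. The sketch below uses only the standard library; `summarize_csv` is a hypothetical helper, the sample data is made up, and nothing here calls the VectorShift API.

```python
import csv
import io


def summarize_csv(text):
    """Return (column_names, data_row_count) for CSV content given as a string."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header:
        raise ValueError("CSV content is empty or has no header row.")
    # Count only non-empty rows after the header.
    row_count = sum(1 for row in reader if row)
    return header, row_count


# Made-up sample data, for illustration only.
sample = "region,units,revenue\nnorth,10,250.0\nsouth,4,90.5\n"
columns, rows = summarize_csv(sample)
print(columns, rows)
```

Knowing the column names up front makes it easier to phrase a query that refers to fields the file actually contains.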

In [8]:
input_query = InputNode(name="Query", input_type="text")

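In [9]:
# process_files=False passes the file through unprocessed, which CSVQueryLoaderNode requires
input_csv = InputNode(name="CSV", input_type="file", process_files=False)

In [18]:
csv_loader = CSVQueryLoaderNode(query_input=input_query.output(), csv_input=input_csv.output())

Note: if the CSV input node is not set up as an unprocessed file input, CSVQueryLoaderNode raises the error below. Make sure input_type is "file" and process_files is False.

ValueError: Invalid input type to CSV Query DataLoader node input csv: expected Union[csv_file, file, List[csv_file], List[file]], got text. If your input comes from an InputNode, make sure the input type is 'file' and the 'process_files' flag is set to False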

In [None]:
# Assumes the CSV query loader returns text; adjust output_type if your loader output differs
output_node = OutputNode(name="Output", input_node=csv_loader.output(), output_type="text")

In [None]:
csv_search_nodes = [input_query, input_csv, csv_loader, output_node]
csv_search_pipeline = Pipeline(
    name="CSV Search Pipeline",
    description="This pipeline searches a CSV file for the given query and returns the result.",
    nodes=csv_search_nodes
)

In [None]:
config = vectorshift.deploy.Config(
    api_key="YOUR_API_KEY",  # replace with your own key
)

config.save_new_pipeline(csv_search_pipeline)

## Running a Pipeline

In [None]:
# Fetch the pipeline saved above by its name
pipeline = Pipeline.fetch(pipeline_name='CSV Search Pipeline')

response = pipeline.run(
    # Keys match the InputNode names defined earlier; both values are placeholders
    inputs = {"Query": "What were the total sales last month?", "CSV": "/files/data.csv"},
    api_key = "YOUR_API_KEY"
)

print(response)