# CSV Document Search

A pipeline for asking questions about the data in a CSV file, such as sales, performance, or analytics data.

## Installing Library

To use the VectorShift Python library, you should be using Python 3.10 or newer.

The SDK is built upon our API. To access much of the functionality, such as saving and downloading pipelines, you should already have an API key ready.

Our Python SDK is available as the `vectorshift` package on PyPI. Make sure you have pip installed, then install the package by running the following command (the leading `!` runs it from a notebook cell; omit it in a terminal):

In [1]:
! pip install vectorshift --upgrade
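
To confirm that the interpreter meets the Python 3.10 requirement mentioned above, a minimal check along these lines can be run in a cell. The helper name `python_is_supported` is illustrative and not part of the SDK.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this guide


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    found = sys.version.split()[0]
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required; found {found}")
```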



## Pipeline Overview

The pipeline takes two inputs, a text query and a CSV file, passes both to a CSV query loader node, and returns the loader's result as a text output.

![alt text](images/ask_csv/1-overview.png "Overall Pipeline")

In [2]:
import vectorshift
# Only the node types used in this pipeline are imported.
from vectorshift.node import InputNode, CSVQueryLoaderNode, OutputNode
from vectorshift.pipeline import Pipeline

## Define the VectorShift API Key

Put your VectorShift API key below.

In [3]:
vs_api_key = "YOUR_API_KEY"
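
If you would rather not hard-code the key in the notebook, one option is to read it from an environment variable. The sketch below is an assumption about how you might organize this: the variable name `VECTORSHIFT_API_KEY` and the helper `load_api_key` are hypothetical and are not defined by the SDK.

```python
import os


def load_api_key(env_var="VECTORSHIFT_API_KEY", environ=None):
    """Read the API key from an environment variable.

    The default variable name is an assumption made for this example.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


# Usage (uncomment once the variable is set in your shell):
# vs_api_key = load_api_key()
```

Keeping the key out of the notebook makes it less likely to be shared by accident along with the file.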

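## Create Input Nodes

This pipeline needs two inputs: one for the query and one for the CSV file. The `input_csv` node sets `input_type` to `"file"` so that it can receive the CSV file. Both inputs feed a `CSVQueryLoaderNode`, whose output goes to an `OutputNode`; the nodes are then collected into a `Pipeline` and saved.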

![alt text](images/ask_csv/2-inputs.png "Overall Pipeline")

In [4]:
input_query = InputNode(name="Query", input_type="text")

In [5]:
input_csv = InputNode(name="CSV", input_type="file", process_files=False)
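
To try the pipeline you need a CSV file to supply to the `CSV` input. If you do not have one at hand, a small sample can be written with the standard library. This is only a sketch: the rows, column names, and the helper `write_sample_csv` are made up for illustration.

```python
import csv
from pathlib import Path

# Hypothetical sample data; replace with your own CSV contents.
SAMPLE_ROWS = [
    {"region": "North", "month": "2024-01", "sales": 1200},
    {"region": "South", "month": "2024-01", "sales": 950},
    {"region": "North", "month": "2024-02", "sales": 1340},
]


def write_sample_csv(path, rows=SAMPLE_ROWS):
    """Write rows (a list of dicts with identical keys) to a CSV file."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


# Usage:
# sample_path = write_sample_csv("sample_sales.csv")
```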

In [6]:
csv_loader = CSVQueryLoaderNode(
    query_input=input_query.output(),
    csv_input=input_csv.output(),
)

In [8]:
output_node = OutputNode(name="Output", input=csv_loader.output(), output_type="text")

In [9]:
csv_search_nodes = [input_query, input_csv, csv_loader, output_node]
csv_search_pipeline = Pipeline(
    name="CSV Search Pipeline",
    description="This pipeline searches a CSV file for the given query and returns the result.",
    nodes=csv_search_nodes
)

In [11]:
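# Import the module explicitly so that the name `vectorshift` is bound in this cell.
import vectorshift.deploy

config = vectorshift.deploy.Config(
    api_key=vs_api_key,
)

config.save_new_pipeline(csv_search_pipeline)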

Successfully saved pipeline with ID 668eb476b665cebec9cf1c58.


{'pipeline': {'name': 'CSV Search Pipeline',
  'description': 'This pipeline searches a CSV file for the given query and returns the result.',
  'nodes': [{'id': 'customInput-1',
    'type': 'customInput',
    'data': {'id': 'customInput-1',
     'nodeType': 'customInput',
     'category': 'input',
     'task_name': 'input',
     'inputName': 'input_1',
     'inputType': 'Text'},
    'position': {'x': 0, 'y': -400},
    'positionAbsolute': {'x': 0, 'y': -400},
    'selected': False,
    'dragging': False},
   {'id': 'customInput-2',
    'type': 'customInput',
    'data': {'id': 'customInput-2',
     'nodeType': 'customInput',
     'category': 'input',
     'task_name': 'input',
     'inputName': 'CSV',
     'inputType': 'File',
     'processFiles': False},
    'position': {'x': 0, 'y': 0},
    'positionAbsolute': {'x': 0, 'y': 0},
    'selected': False,
    'dragging': False},
   {'id': 'dataLoader-1',
    'type': 'dataLoader',
    'data': {'id': 'dataLoader-1',
     'nodeType': 'dataL

## Running a Pipeline

In [12]:
pipeline = Pipeline.fetch(pipeline_name='CSV Search Pipeline')

# Note: the CSV input is left as an empty string in this example.
response = pipeline.run(
    inputs={"Query": "Who is oppenheimer?", "CSV": ""},
    api_key=vs_api_key,
)

print(response)

{}
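
The run above passes an empty string for `CSV` and prints an empty result. Whether the two are related is not shown here, but checking for blank inputs before calling `run` is a cheap safeguard. The helper `find_empty_inputs` below is hypothetical and not part of the SDK.

```python
def find_empty_inputs(inputs):
    """Return the names of inputs whose value is None or a blank string."""
    return [
        name
        for name, value in inputs.items()
        if value is None or (isinstance(value, str) and not value.strip())
    ]


inputs = {"Query": "Who is oppenheimer?", "CSV": ""}
missing = find_empty_inputs(inputs)
if missing:
    print(f"Empty pipeline inputs: {missing}")
```

With the inputs used in this notebook, the check reports `CSV` as empty.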
