# Demo: Answering questions from a document

Referencing the following resources:
* [https://python.langchain.com/en/latest/modules/indexes/getting_started.html](https://python.langchain.com/en/latest/modules/indexes/getting_started.html)
* [https://python.langchain.com/en/latest/use_cases/question_answering.html](https://python.langchain.com/en/latest/use_cases/question_answering.html)
* [https://python.langchain.com/en/latest/use_cases/question_answering/semantic-search-over-chat.html](https://python.langchain.com/en/latest/use_cases/question_answering/semantic-search-over-chat.html)
* [https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/embeddings](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/embeddings)
* [https://techcommunity.microsoft.com/t5/startups-at-microsoft/use-openai-gpt-with-your-enterprise-data/ba-p/3817141](https://techcommunity.microsoft.com/t5/startups-at-microsoft/use-openai-gpt-with-your-enterprise-data/ba-p/3817141)

### Supported Azure OpenAI API versions as of 5/18/2023
* 2023-03-15-preview [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2023-03-15-preview/inference.json)
* 2022-12-01 [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2022-12-01/inference.json)
* ~~2023-05-15 [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2023-05-15/inference.json)~~ *I keep getting 'Resource not found' errors with this version*
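Each of these version strings is sent as the `api-version` query parameter on every REST call. The sketch below assembles an embeddings request URL using the pattern from the Azure embeddings how-to linked above. The resource endpoint is a hypothetical placeholder, and `ZSuiteEmbeddings` is the deployment name used later in this notebook.

```python
from urllib.parse import urlencode

# Hypothetical resource endpoint, for illustration only.
ENDPOINT = "https://my-resource.openai.azure.com"
DEPLOYMENT = "ZSuiteEmbeddings"


def embeddings_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Build the Azure OpenAI embeddings URL for a given API version."""
    base = endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment}/embeddings?{query}"


print(embeddings_url(ENDPOINT, DEPLOYMENT, "2023-03-15-preview"))
# https://my-resource.openai.azure.com/openai/deployments/ZSuiteEmbeddings/embeddings?api-version=2023-03-15-preview
```

If a version is not available for your resource, the request URL is still well-formed. The failure only shows up in the service's response.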

### Installing ChromaDB

If `pip install chromadb` fails with
> error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

Try the following:
* Install the build tools by downloading the installer from [here](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
* In the installer, go to "Individual Components" and select the latest versions of:
  * MSVC v143 - VS 2022 C++ x64/x86 build tools
  * Windows 10 SDK or Windows 11 SDK
* Once the build tools are installed, run `pip install chromadb` again

#### Set environment variables

In [None]:
import os

# Load environment variables from the .env file
from dotenv import load_dotenv
load_dotenv(os.path.join(os.getcwd(), '.env'))
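`load_dotenv` does not report a problem when a variable is missing, so a quick check can save a confusing error later. This is a minimal sketch that assumes the `.env` file defines the variables read by the pre-1.0 `openai` package (which LangChain wrapped at the time) for Azure. These names are an assumption, so adjust them to match your own `.env`.

```python
import os

# Assumed variable names for Azure OpenAI with the pre-1.0 openai package.
REQUIRED_VARS = ("OPENAI_API_TYPE", "OPENAI_API_BASE", "OPENAI_API_KEY", "OPENAI_API_VERSION")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```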

#### Load the document

Mostly taken from [here](https://python.langchain.com/en/latest/use_cases/question_answering.html)

In [None]:
from langchain.document_loaders import CSVLoader  # TextLoader works for plain-text files instead
loader = CSVLoader(os.path.join(os.getcwd(), 'bill_sum_data.csv'))
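`CSVLoader` produces one `Document` per CSV row, so `test_data[0]` is the first row of `bill_sum_data.csv`. As far as I can tell from LangChain's source at the time, each row's `page_content` is the row's `column: value` pairs joined by newlines. The stdlib sketch below approximates that behaviour (check your LangChain version). The two-row CSV and its column names are hypothetical stand-ins for the real file.

```python
import csv
import io

# Hypothetical CSV standing in for bill_sum_data.csv.
SAMPLE_CSV = """bill_id,title,summary
103,An Example Act,Does something.
104,Another Act,Does something else.
"""


def rows_as_page_content(csv_text: str) -> list[str]:
    """Render each CSV row as 'column: value' lines, one string per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())
        for row in reader
    ]


pages = rows_as_page_content(SAMPLE_CSV)
print(pages[0])
# bill_id: 103
# title: An Example Act
# summary: Does something.
```

These embedded newlines are why the next cell replaces `'\n'` with spaces before splitting.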

In [None]:
test_data = loader.load()
test_data[0].page_content

In [None]:
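from langchain.embeddings import OpenAIEmbeddings

# deployment is the name of the Azure deployment serving text-embedding-ada-002.
# chunk_size is the number of texts sent per embeddings request (here: one at a time).
embeddingsllm = OpenAIEmbeddings(deployment="ZSuiteEmbeddings", model="text-embedding-ada-002", chunk_size=1)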
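Note that `chunk_size` here counts texts per request, not characters, unlike the text splitter's `chunk_size` below. Setting it to 1 was presumably needed because Azure OpenAI embedding deployments at the time accepted only one input per request. The batching it controls amounts to something like this hypothetical helper:

```python
from typing import Iterator, Sequence


def batched(texts: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


# With batch_size=1, each text becomes its own embeddings request.
print(list(batched(["a", "b", "c"], 1)))  # [['a'], ['b'], ['c']]
print(list(batched(["a", "b", "c"], 2)))  # [['a', 'b'], ['c']]
```

The cost of `chunk_size=1` is one network round trip per chunk, which makes embedding many chunks noticeably slower.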

In [None]:
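from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.indexes import VectorstoreIndexCreator

# Collapse the newlines in the first CSV row into spaces
text = test_data[0].page_content.replace('\n', ' ')

# from langchain.text_splitter import CharacterTextSplitter
# text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
# split_text = text_splitter.split_text(text)

# Split into chunks of at most 2048 characters.
# create_documents expects a list of strings; passing the bare string would
# iterate over it and create one Document per character.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048)
texts = text_splitter.create_documents([text])

# Optional: builds a separate Chroma-backed index; the query below uses docsearch
index = VectorstoreIndexCreator(embedding=embeddingsllm).from_documents(texts)

# Create the vector store from the chunks of the first document entry
# (Chroma.from_texts would also need a list of strings, not a single string)
docsearch = Chroma.from_documents(texts, embeddingsllm)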
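The idea behind `RecursiveCharacterTextSplitter` is to split on the coarsest separator first (paragraphs, then lines, then words) and fall back to finer separators only for pieces that are still too long. The following is a simplified sketch of that technique, not LangChain's implementation. It greedily merges neighbouring pieces up to `chunk_size` but has no chunk overlap, and the separator list is an assumption.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    if sep not in text:
        # This separator does not occur; try the next, finer one.
        return recursive_split(text, chunk_size, finer)
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging neighbouring pieces
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb ccc", 7))  # ['aaa bbb', 'ccc']
```

Because the splitter works on one string at a time, LangChain's `create_documents` takes a list of strings (one per source document) and returns the chunks of all of them.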

In [None]:
query = "What is the title of bill 103?"
docs = docsearch.similarity_search(query)  # returns the 4 most similar chunks by default
docs[0].page_content
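`similarity_search` embeds the query with the same embeddings model and returns the stored chunks whose vectors lie closest to it. The sketch below ranks toy vectors by cosine similarity to show the idea. The 3-dimensional vectors are made up and stand in for real 1536-dimensional ada-002 embeddings. Chroma's default distance is L2 rather than cosine. Because OpenAI's ada-002 embeddings are normalized to unit length, the two give the same ranking.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return indices of the k document vectors most similar to the query."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]


# Toy vectors standing in for real embeddings.
docs_vecs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.9, 0.1, 0.0]
print(top_k(query_vec, docs_vecs, k=2))  # [0, 1]
```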