<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/managed/vectaraDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectara Managed Index
In this notebook we are going to show how to use [Vectara](https://vectara.com) with LlamaIndex.
Vectara is the first example of a "Managed" Index, a new type of index in LlamaIndex which is managed via an API.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [None]:
!pip install llama-index

In [None]:
from llama_index import SimpleDirectoryReader
from llama_index.indices import VectaraIndex

import textwrap

### Download Data

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

### Loading documents
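Load the documents using the SimpleDirectoryReader. Note that the cell below reads from the `../data/10q` directory, and the sample outputs in this notebook come from the documents in that directory rather than from the Paul Graham essay downloaded above.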

In [None]:
documents = SimpleDirectoryReader("../data/10q").load_data()
print("Document ID:", documents[0].doc_id)

Document ID: fe81af94-f315-4e58-a7c6-2625292dc283


### Add the content of the documents into a pre-created Vectara corpus
Here we assume an empty corpus is created and the details are available as environment variables:
* VECTARA_CORPUS_ID
* VECTARA_CUSTOMER_ID
* VECTARA_API_KEY
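
Before building the index, it can help to confirm that these variables are actually set. The snippet below is a minimal sketch using only the standard library; the helper name `missing_vectara_vars` is hypothetical and not part of LlamaIndex or Vectara.

```python
import os

# Environment variable names listed above.
REQUIRED_VARS = ("VECTARA_CORPUS_ID", "VECTARA_CUSTOMER_ID", "VECTARA_API_KEY")


def missing_vectara_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vectara_vars()
if missing:
    print("Missing Vectara settings:", ", ".join(missing))
else:
    print("All Vectara settings are present.")
```

Running this check first tends to give a clearer message than a failed API call later in the notebook.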

In [None]:
index = VectaraIndex.from_documents(documents)

### Query the Vectara Index
We can now ask questions using the VectaraIndex retriever.

In [None]:
query = "what are the main risks?"

In [None]:
query_engine = index.as_query_engine(
    similarity_top_k=5, n_sentences_before=0, n_sentences_after=0
)
response = query_engine.retrieve(query)
texts = [t.node.text for t in response]
print("\n--\n".join(texts))

Our ownership in these entities involves significant risks that are outside our control.
--
Our ownership in these entities involves significant risks that are outside our control.
--
Our ownership in these entities involves significant risks that are outside our control.
--
Autonomous vehicle technologies involve significant risks and liabilities.
--
We are unable to predict what global or U.S. tax reforms may be proposed or enacted in the future or what effects such future changes would have on our
business.


In [None]:
response = query_engine.query(query)
print(response)

The main risks mentioned in the context are significant risks associated with ownership in certain entities that are outside the company's control, as well as significant risks and liabilities related to autonomous vehicle technologies. Additionally, there is a mention of uncertainty regarding future global or U.S. tax reforms and their potential effects on the company's business.


Vectara supports max-marginal-relevance (MMR) natively in the backend, and this is available as a query mode.
Let's see an example of how to use MMR: we will run the same query, "what are the main risks?", but this time with MMR and `mmr_diversity_bias=1.0`, which puts the maximum weight on diversity:

In [None]:
query_engine = index.as_query_engine(
    similarity_top_k=5,
    n_sentences_before=0,
    n_sentences_after=0,
    vectara_query_mode="mmr",
    vectara_kwargs={"mmr_k": 100, "mmr_diversity_bias": 1.0},
)
response = query_engine.retrieve(query)

texts = [t.node.text for t in response]
print("\n--\n".join(texts))

Our ownership in these entities involves significant risks that are outside our control.
--
We are unable to predict what global or U.S. tax reforms may be proposed or enacted in the future or what effects such future changes would have on our
business.
--
We are subject to climate change risks, including physical and transitional risks, and if we are unable to manage such risks, our business may be adversely
impacted.
--
Autonomous vehicle technologies involve significant risks and liabilities.
--
QUANTITATIVE AND QUALITATIVE DISCLOSURES ABOUT MARKET RISK
We are exposed to market risks in the ordinary course of our business.


As you can see, the results in this case are much more diverse; for example, they do not contain the same text more than once.
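
To build intuition for why a diversity bias removes repeated passages, here is a small, self-contained sketch of greedy MMR reranking. It is an illustration only and not Vectara's implementation: the scoring formula `(1 - bias) * relevance - bias * redundancy`, the function names, and the toy 2-D vectors are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(query_vec, cand_vecs, top_k, diversity_bias):
    """Greedily pick up to top_k candidate indices, trading relevance against redundancy.

    diversity_bias=0.0 ranks purely by relevance to the query;
    diversity_bias=1.0 only penalizes similarity to already selected items.
    """
    relevance = [cosine(query_vec, c) for c in cand_vecs]
    selected = []
    remaining = list(range(len(cand_vecs)))

    def score(i):
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max(
            (cosine(cand_vecs[i], cand_vecs[j]) for j in selected), default=0.0
        )
        return (1 - diversity_bias) * relevance[i] - diversity_bias * redundancy

    while remaining and len(selected) < top_k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: candidates 0 and 1 are exact duplicates of each other.
query_vec = [1.0, 0.0]
candidates = [[1.0, 0.0], [1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]

print(mmr_rerank(query_vec, candidates, top_k=2, diversity_bias=0.0))
print(mmr_rerank(query_vec, candidates, top_k=2, diversity_bias=0.7))
```

With no diversity bias, the two duplicate candidates fill both slots; with a high bias, the second slot goes to a candidate that differs from the first pick. This mirrors the difference between the two retrieval outputs shown in this notebook.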

In [None]:
response = query_engine.query(query)
print(response)

The main risks mentioned in the given context are:
1. Risks associated with ownership in certain entities.
2. Uncertainty regarding future global or U.S. tax reforms and their potential impact on the business.
3. Risks related to climate change, including both physical and transitional risks.
4. Risks and liabilities associated with autonomous vehicle technologies.
5. Market risks that the company is exposed to in the ordinary course of its business.


The response is also better, as it includes more of the risk factors mentioned in the original document.