# Vectara demo for XYZ Chemical

Build a quick knowledge chatbot and search engine for XYZ Chemical using Vectara.

In [2]:
# !pip install vectara

In [1]:
import vectara

import importlib
importlib.reload(vectara)

# Initialize the client
client = vectara.vectara(customer_id="123", api_key="456")

Vectara SDK initialized. 


# 1. Create a corpus
A corpus is a collection of documents. 

In [81]:
corpus_id = client.create_corpus("Chemistry and Politics")

New corpus created, corpus ID is: 26
Please write down this corpus ID. You will need it to upload files to it and to query against it.


26

In [2]:
corpus_id = 26
# client.reset_corpus(corpus_id) # delete all documents in the corpus

# 2. Add documents to the corpus

## 2.1 Add documents from a local file, a local folder or a list of file paths. 

In [87]:
client.upload(corpus_id, './test_data/Methane.pdf', verbose=True) # upload one file
# client.upload(corpus_id, './test_data') # upload all files under a folder/directory

True
I am here
Uploading..../test_data/Methane.pdf Success. 


{'response': {'status': {},
  'quotaConsumed': {'numChars': '77400', 'numMetadataChars': '64653'}}}

## 2.2 Create a document by manually uploading parts of it 

A document is a sequence of parts, and each part is called a "chunk". A chunk can be as small as a single sentence.
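As a minimal sketch of how such a list of chunks might be prepared, the snippet below splits a passage into sentence-sized chunks and builds one metadata dictionary per chunk. The helper `split_into_chunks` and the `position` metadata key are illustrative assumptions and are not part of the Vectara SDK; the naive regular-expression split is only adequate for simple English prose.

```python
import re


def split_into_chunks(text):
    """Naively split text into sentence-sized chunks (hypothetical helper)."""
    # Split on whitespace that follows '.', '!' or '?', then drop empty pieces.
    pieces = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in pieces if p]


text = "Methane is a chemical compound. Its formula is CH4. It is a colourless gas."
chunks = split_into_chunks(text)

# One metadata dict per chunk, in the same order as the chunks.
# 'position' is an illustrative key, chosen only for this sketch.
chunk_metadata = [{'position': str(i)} for i, _ in enumerate(chunks)]

# The two lists are parallel, so their lengths should match.
assert len(chunks) == len(chunk_metadata)
```

The resulting `chunks` and `chunk_metadata` lists have the same shape as the `chunks` and `chunk_metadata` arguments used in the cells of this section.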

In [83]:
text_list = [
    'We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.',
    'Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble',
    ] 

client.create_document_from_chunks(
    corpus_id, 
    chunks=text_list, 
    chunk_metadata=[
        {'note': 'preamble'},
        {'note': '1st amendment'}], 
    doc_id="Constitution of the United States", 
    doc_metadata={"country": "United States"}, 
    verbose=True)

Uploading the chunks...
{'status': {'code': 'OK', 'statusDetail': '', 'cause': None}, 'quotaConsumed': {'numChars': '538', 'numMetadataChars': '73'}}


{'status': {'code': 'OK', 'statusDetail': '', 'cause': None},
 'quotaConsumed': {'numChars': '538', 'numMetadataChars': '73'}}

You can mix documents of different languages in the same corpus.

In [84]:
text_list = [ # Text in Korean, the Constitution of South Korea
    '悠久한 歷史와 傳統에 빛나는 우리 大韓國民은 3·1運動으로 建立된 大韓民國臨時政府의 法統과 不義에 抗拒한 4·19民主理念을 繼承하고, 祖國의 民主改革과 平和的統一의 使命에 立脚하여 正義·人道와 同胞愛로써 民族의 團結을 鞏固히 하고, 모든 社會的弊習과 不義를 打破하며, 自律과 調和를 바탕으로 自由民主的基本秩序를 더욱 確固히', 
    '大韓民國의 領土는 韓半島와 그 附屬島嶼로 한다.', 
    '모든 國民은 身體의 自由를 가진다. 누구든지 法律에 의하지 아니하고는 逮捕·拘束·押收·搜索 또는 審問을 받지 아니하며, 法律과 適法한 節次에 의하지 아니하고는 處罰·保安處分 또는 强制勞役을 받지 아니한다'
]

client.create_document_from_chunks(
    corpus_id, 
    chunks=text_list, 
    chunk_metadata=[
        {'note': 'preamble'}, 
        {'note': 'Chapter 1, Section 1, Article 1'},
        {'note': 'Chapter 2, Section 12, Article 1'}],
    doc_id="Constitution of South Korea", 
    verbose=True)

Uploading the chunks...
{'status': {'code': 'OK', 'statusDetail': '', 'cause': None}, 'quotaConsumed': {'numChars': '319', 'numMetadataChars': '109'}}


{'status': {'code': 'OK', 'statusDetail': '', 'cause': None},
 'quotaConsumed': {'numChars': '319', 'numMetadataChars': '109'}}

# 3. Now you can search or chat with the documents

In [9]:
r = client.query(corpus_id, "What are the rights of the people?", print_format='markdown',
                 # metadata_filter="doc.country='United States'",
                 jupyter_display=True,
                 verbose=False)  # query the corpus

Query successful. 


### Here is the answer
The rights of the people encompass various fundamental freedoms and protections. These include the
right to personal liberty without arbitrary arrests or punishment without lawful procedures [2].
Additionally, people have the right to freedom of religion, speech, press, and assembly as outlined
in the constitution [3]. The Constitution also establishes the principles of justice, domestic
tranquility, common defense, general welfare, and securing liberty for all citizens [4]. These
rights are inherited from historical movements and are upheld to promote national unity and
democratic values [5].

Factual Consistency Score: `0.54751414`

### References:
    
1. From document **Constitution of South Korea** (matchness=0.74057484):
  _...모든 國民은 身體의 自由를 가진다. 누구든지 法律에 의하지 아니하고는 逮捕·拘束·押收·搜索 또는 審問을 받지 아니하며, 法律과 適法한 節次에 의하지 아니하고는 處罰·保安處分 또는 强制勞役을 받지 아니한다..._

2. From document **Constitution of the United States** (matchness=0.72459704):
  _...Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble..._

3. From document **Constitution of the United States** (matchness=0.7137908):
  _...We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America...._

4. From document **Constitution of South Korea** (matchness=0.67185295):
  _...悠久한 歷史와 傳統에 빛나는 우리 大韓國民은 3·1運動으로 建立된 大韓民國臨時政府의 法統과 不義에 抗拒한 4·19民主理念을 繼承하고, 祖國의 民主改革과 平和的統一의 使命에 立脚하여 正義·人道와 同胞愛로써 民族의 團結을 鞏固히 하고, 모든 社會的弊習과 不義를 打破하며, 自律과 調和를 바탕으로 自由民主的基本秩序를 더욱 確固히..._

5. From document **Methane.pdf** (matchness=0.6627644):
  _...People.hofstra.edu. Retrieved on March 30, 2014...._


In [107]:
# The query "何謂甲烷" is Chinese for "What is methane?"
r = client.query(corpus_id, "何謂甲烷", print_format='markdown',
                 # metadata_filter="doc.country='United States'",
                 jupyter_display=True,
                 verbose=False)  # query the corpus

Query successful. 


### Here is the answer
甲烷是一种化学物质，其化学式为CH4，由一个碳原子与四个氢原子结合而成[zho2]。甲烷是一种无色、无味、透明的气体[zho3]。该词源自化学后缀“-
ane”，表示属于烷烃家族的物质；以及甲基一词，源自法语“méthylène”，后者又来源于法语“méthyle”[zho1]。甲烷与甲基有关，甲基是与甲烷相关的一个官能团[zho4]。

Factual Consistency Score: `0`

### References:
    
1. From document **Methane.pdf** (matchness=0.80459774):
  _...Etymologically, the word methane is coined from the chemical
suffix "-ane", which denotes substances belonging to the alkane
family; and the word methyl, which is derived from the German
Methyl (1840) or directly from the French méthyle, which is a back-
formation from the French méthylène (corresponding to English
"methylene"),  the  root of which  was  coined  by  Jean-Baptiste..._

2. From document **Methane.pdf** (matchness=0.7917943):
  _...Methane (US: /ˈmɛθeɪn/ METH-ayn, UK: /ˈmiːθeɪn/ MEE-thayn) is
Methane
a  chemical  compound  with  the  chemical  formula  CH4  (one
carbon atom bonded to four hydrogen atoms)...._

3. From document **Methane.pdf** (matchness=0.78040886):
  _...Methane  is  an  odorless,
colourless and transparent
gas...._

4. From document **Methane.pdf** (matchness=0.7750094):
  _...Methyl group, a functional group related to methane...._

5. From document **Methane.pdf** (matchness=0.7702186):
  _...In  general,  methane..._


# What makes Vectara different from other solutions? 

TL;DR: Speed, accuracy, security, scalability. 

1. Search filters, e.g. you can filter by language, by document, by chunk, or by any metadata you choose to add. For example, restricting the query to the US Constitution and asking what methane is yields no answer.

In [11]:
# Ask "What is methane?" (in Chinese), restricted to the US Constitution document
r = client.query(corpus_id, "何謂甲烷", print_format='markdown',
                 metadata_filter="doc.id='Constitution of the United States'",
                 jupyter_display=True,
                 verbose=False)  # query the corpus

Query successful. 


### Here is the answer
The returned results did not contain sufficient information to be summarized into a useful answer
for your query. Please try a different search or restate your query differently.

Factual Consistency Score: `0`

### References:
    
1. From document **Constitution of the United States** (matchness=0.5970528):
  _...We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America...._

2. From document **Constitution of the United States** (matchness=0.58943474):
  _...Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble..._


2. Language-agnostic. Documents can be in different languages, and a query does not have to be in the same language as the documents it retrieves.
3. Access control, e.g. a user only gets answers from documents they are authorized to access.
4. Extremely fast, i.e., low-latency. As fast as Google search. 
5. State-of-the-art models for best accuracy, e.g., Boomerang, Slingshot, and HHEM.
6. Security, e.g., SOC2, HIPAA, etc.
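The metadata filter from item 1 is an ordinary string, so it can be assembled programmatically. The sketch below shows one way to do that for the two document-level fields used in this notebook (`doc.id` and `doc.country`). The helper `doc_equals` is hypothetical and not part of the Vectara SDK. It refuses values containing a single quote because the escaping rules are not covered in this notebook.

```python
def doc_equals(field, value):
    """Build a filter string such as "doc.id='Methane.pdf'" (hypothetical helper)."""
    value = str(value)
    # Escaping rules for quotes are not covered here, so reject such values
    # instead of guessing how they should be encoded.
    if "'" in value:
        raise ValueError("values containing a single quote are not supported in this sketch")
    return f"doc.{field}='{value}'"


us_only = doc_equals('id', 'Constitution of the United States')
by_country = doc_equals('country', 'United States')

print(us_only)     # doc.id='Constitution of the United States'
print(by_country)  # doc.country='United States'
```

Either string could then be passed as the `metadata_filter` argument of `client.query`, as in the filtered query above.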

**We do not train our model using your data.**

In [12]:
client.list_documents(corpus_id)  # corpus_id is 26, as set above

{'document': [{'id': 'Constitution of South Korea', 'metadata': []},
  {'id': 'Constitution of the United States',
   'metadata': [{'name': 'country', 'value': 'United States'}]},
  {'id': 'Methane.pdf',
   'metadata': [{'name': 'CreationDate', 'value': '1714593506'},
    {'name': 'Producer', 'value': 'Skia/PDF m121'},
    {'name': 'Creator', 'value': 'Chromium'},
    {'name': 'ModDate', 'value': '1714593506'},
    {'name': 'title', 'value': 'Methane'}]}],
 'nextPageKey': ''}
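The listing above stores each document's metadata as a list of `name`/`value` pairs. As a small sketch, assuming the response keeps the shape printed above, the pairs can be flattened into a plain dictionary per document for easier lookup. The helper `metadata_as_dict` is illustrative and not part of the Vectara SDK, and the `listing` literal is an abbreviated copy of the printed output.

```python
def metadata_as_dict(listing):
    """Map each document ID to a plain {name: value} metadata dictionary."""
    return {
        doc['id']: {m['name']: m['value'] for m in doc['metadata']}
        for doc in listing['document']
    }


# Abbreviated copy of the listing printed above.
listing = {
    'document': [
        {'id': 'Constitution of South Korea', 'metadata': []},
        {'id': 'Constitution of the United States',
         'metadata': [{'name': 'country', 'value': 'United States'}]},
    ],
    'nextPageKey': '',
}

by_id = metadata_as_dict(listing)
print(by_id['Constitution of the United States']['country'])  # United States
```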