# Demo of the unofficial Python SDK for [Vectara](https://vectara.com)'s RAG platform

For questions, contact forrest@vectara.com.

In [1]:
import vectara

In [2]:
# Get some test data 
!mkdir testdoc 
!wget https://www.cs.jhu.edu/~jason/papers/mei+al.icml20.pdf -O testdoc/neural_datalog_through_time.pdf -nv 
!wget https://docs.vectara.com/assets/files/vectara_employee_handbook-4524365135dc70a59977373c37601ad1.pdf -O testdoc/vectara.pdf -nv 
!wget https://raw.githubusercontent.com/TexteaInc/funix-doc/main/Reference.md -O testdoc/funix.md -nv 
!wget https://raw.githubusercontent.com/codepod-io/codepod.io/main/docs/3-manual/README.md -O codepod.md -nv
# !wget https://raw.githubusercontent.com/tangxyw/RecSysPapers/main/Calibration/Posterior%20Probability%20Matters%20-%20Doubly-Adaptive%20Calibration%20for%20Neural%20Predictions%20in%20Online%20Advertising.pdf -O testdoc/Calibration.pdf


mkdir: cannot create directory ‘testdoc’: File exists
2023-11-21 19:26:55 URL:https://www.cs.jhu.edu/~jason/papers/mei+al.icml20.pdf [2087657/2087657] -> "testdoc/neural_datalog_through_time.pdf" [1]
2023-11-21 19:26:55 URL:https://docs.vectara.com/assets/files/vectara_employee_handbook-4524365135dc70a59977373c37601ad1.pdf [53575/53575] -> "testdoc/vectara.pdf" [1]
2023-11-21 19:26:56 URL:https://raw.githubusercontent.com/TexteaInc/funix-doc/main/Reference.md [35949/35949] -> "testdoc/funix.md" [1]


# Create a client object 

By default, the constructor will look for the following environment variables:
* VECTARA_CUSTOMER_ID
* VECTARA_CLIENT_ID
* VECTARA_CLIENT_SECRET
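
Before constructing the client without arguments, it can help to confirm that all three variables are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_vectara_vars` is hypothetical and not part of the SDK.

```python
import os

# The three variable names listed above.
REQUIRED_VARS = (
    "VECTARA_CUSTOMER_ID",
    "VECTARA_CLIENT_ID",
    "VECTARA_CLIENT_SECRET",
)


def missing_vectara_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; not part of the SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vectara_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Vectara credentials found in the environment.")
```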

In [4]:
# client = vectara.vectara() # default to credentials in environment variables

## OR import the credentials from a Python file called keys.py
from keys import VECTARA_CUSTOMER_ID, VECTARA_CLIENT_ID, VECTARA_CLIENT_SECRET
client = vectara.vectara(VECTARA_CUSTOMER_ID, VECTARA_CLIENT_ID, VECTARA_CLIENT_SECRET)

Bearer/JWT token generated. It will expire in 30 minutes. To-regenerate, please call acquire_jwt_token(). 
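
Because the token expires after 30 minutes, a long-running notebook may need to regenerate it. The sketch below tracks the token's age and invokes a refresh callback once the lifetime is nearly used up. The class `TokenRefresher`, its parameters, and the 60-second safety margin are illustrative assumptions; how `acquire_jwt_token()` is exposed by the SDK should be checked against its source before using it as the callback.

```python
import time


class TokenRefresher:
    """Call `refresh_fn` when the current token is about to expire.

    Hypothetical helper, not part of the SDK. `refresh_fn` is whatever
    callable regenerates the token (for example, a wrapper around
    acquire_jwt_token(); the exact call is an assumption).
    """

    def __init__(self, refresh_fn, lifetime_s=30 * 60, margin_s=60,
                 clock=time.monotonic):
        self.refresh_fn = refresh_fn
        self.lifetime_s = lifetime_s  # 30 minutes, per the message above
        self.margin_s = margin_s      # refresh slightly before expiry
        self.clock = clock
        self.issued_at = clock()      # assume a token was just generated

    def ensure_fresh(self):
        """Refresh if within `margin_s` of expiry; return True if refreshed."""
        age = self.clock() - self.issued_at
        if age >= self.lifetime_s - self.margin_s:
            self.refresh_fn()
            self.issued_at = self.clock()
            return True
        return False
```

One way to use this would be to create a single `TokenRefresher` right after the client is constructed and call `ensure_fresh()` before each upload or query.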


# Create a corpus

In [4]:
corpus_id = client.create_corpus("test_corpus") 

# Reset Corpus (when needed)

In [5]:
corpus_id = 9 # set manually here
client.reset_corpus(corpus_id)

Resetting corpus 9 successful. 


# Add files to a corpus

You can use the `upload()` method to upload a file, a list of files, or a folder to a corpus. It detects which kind of source it was given and delegates to one of the three methods below:
* `upload_file()`: upload a single file
* `upload_files()`: upload a list of files
* `upload_folder()`: upload all files in a folder

If you already know which kind of source you have, you can also call any of these three methods directly.
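
The choice among the three methods can be pictured roughly as follows. This is a sketch of the idea, not the SDK's actual implementation; the function `classify_source` and the way it reports its decision are hypothetical.

```python
import os


def classify_source(source):
    """Guess which upload method fits `source`.

    Hypothetical helper illustrating the dispatch idea; the SDK's own
    detection logic may differ.
    """
    if isinstance(source, (list, tuple)):
        return "upload_files"       # a list of file paths
    if isinstance(source, (str, os.PathLike)):
        if os.path.isdir(source):
            return "upload_folder"  # a directory of files
        if os.path.isfile(source):
            return "upload_file"    # a single file
        raise FileNotFoundError(f"No such file or folder: {source!r}")
    raise TypeError(f"Unsupported source type: {type(source).__name__}")
```

Returning the method name rather than calling it keeps the sketch free of network access; a real dispatcher would invoke the chosen method on the client.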

In [6]:
corpus_id = 9 # manually set corpus_id if needed. 
client.upload(corpus_id, './testdoc', verbose=True)

Uploading files from folder: ./testdoc


Uploading...:   0%|          | 0/3 [00:00<?, ?it/s, ./testdoc/neural_datalog_through_time.pdf]

Uploading..../testdoc/neural_datalog_through_time.pdf 

Uploading...:  33%|███▎      | 1/3 [00:11<00:22, 11.32s/it, ./testdoc/funix.md]                       

Success. 
Uploading..../testdoc/funix.md 

Uploading...:  67%|██████▋   | 2/3 [00:13<00:05,  5.98s/it, ./testdoc/vectara.pdf]

Success. 
Uploading..../testdoc/vectara.pdf 

Uploading...: 100%|██████████| 3/3 [00:16<00:00,  5.34s/it, ./testdoc/vectara.pdf]

Success. 





# Query a corpus and display the results in a readable format

## Example query 1

In [9]:
answer = client.query(corpus_id, "What should I do to rearrange objects?")
_ = vectara.post_process_query_result(answer, jupyter_display=True)

Query successful. 


### Here is the answer
To rearrange objects, you can follow these steps: [1] Recompute the embeddings of the objects in parallel for a certain number of iterations. [2] Within each strongly connected component, initialize the embeddings to 0 and then recompute them in parallel for that component. [3] Change the order and orientation of the objects using the "direction" attribute in a Funix decorator. [4] Consider using embeddings of entities and relations that reflect selected past events. [5] Visit the components in topologically sorted order to ensure that you work on each component after its upstream nodes have converged.

### References:
    
1. From document **neural_datalog_through_time.pdf** (matchness=0.65684634):
  _...This method recomputes all embeddings in parallel,
and repeats this for some number of iterations...._

2. From document **neural_datalog_through_time.pdf** (matchness=0.6553048):
  _...Within each strongly connected component C, ini-
tialize the embeddings to 0 and then recompute them in
parallel for |C| iterations...._

3. From document **funix.md** (matchness=0.65107906):
  _...You can change their order and orientation using the "direction" attribute in a Funix decorator...._

4. From document **neural_datalog_through_time.pdf** (matchness=0.6380951):
  _...® Embeddings of entities and relations
that reﬂect selected past events (§2.4 and §2.6)...._

5. From document **neural_datalog_through_time.pdf** (matchness=0.6360733):
  _...In the general case, visiting the com-
ponents in topologically sorted order means that we wait to
work on component C until its strictly upstream nodes have
“converged,” so that the limited iterations on C make use of
the best available embeddings of the upstream nodes...._


## Example query 2

In [25]:
answer = client.query(corpus_id, "Can I bring friends to the office?")
_ = vectara.post_process_query_result(answer, jupyter_display=True, format='Markdown')

Query successful. 


## Example query 3

In [24]:
answer = client.query(corpus_id, "How to set the frequency?")
_ = vectara.post_process_query_result(answer, jupyter_display=True, format='Markdown')

Query successful. 
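
The three example queries follow the same pattern, so they can also be run in a loop. The following is a minimal sketch that assumes only the `client.query(corpus_id, question)` call used in the cells above; the helper `run_queries` is hypothetical, and the stub client exists only so that the example runs without credentials.

```python
def run_queries(client, corpus_id, questions):
    """Run each question against the corpus and collect the raw answers.

    Hypothetical helper; it relies only on client.query(corpus_id, text)
    as used in the cells above.
    """
    return {question: client.query(corpus_id, question) for question in questions}


class StubClient:
    """Stand-in for the real client so the sketch runs offline."""

    def query(self, corpus_id, question):
        return f"stub answer from corpus {corpus_id} for: {question}"


questions = [
    "What should I do to rearrange objects?",
    "Can I bring friends to the office?",
    "How to set the frequency?",
]
answers = run_queries(StubClient(), 9, questions)
for question, answer in answers.items():
    print(question, "->", answer)
```

With a real client, pass `client` in place of `StubClient()` and hand each collected answer to `vectara.post_process_query_result` as in the cells above.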
