### Load Data and Construct VectorDatabase

In [1]:
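import os

# Configure the PyTorch CUDA caching allocator (only relevant when a GPU is used).
# The setting is read when CUDA is first initialized, so set it before anything
# that may import torch or allocate GPU memory.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:4000"

from Utils import *

# create the vector database
db = VectorDatabase()

# split the documents into chunks and load each collection into ChromaDB
db.initialize_process(chunk_size=256, chunk_overlap=200)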

resource module not available on Windows
< VectorDatabase initialized > 
  - loading data into ChromaDB 
     - loading [faq] ...
        ... collection [faq] deleted.
     - loading [insurance] ...
        ... collection [insurance] deleted.
     - loading [finance] ...
        ... collection [finance] deleted.
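`initialize_process` is called with `chunk_size=256` and `chunk_overlap=200`, but the splitting logic lives in `Utils` and is not shown. Assuming a plain character-based sliding window (an assumption; the real splitter may work on tokens or sentences), consecutive chunks start `256 - 200 = 56` characters apart, so most text is indexed several times. A minimal sketch with a hypothetical helper `sliding_window_chunks`:

```python
def sliding_window_chunks(text: str, chunk_size: int = 256, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Hypothetical helper for illustration only; not the VectorDatabase implementation.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # distance between the starts of consecutive chunks
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Small example: 4-character windows overlapping by 2 characters.
print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']

# With the notebook's settings, a 400-character text yields windows starting at 0, 56, 112, 168.
print([len(c) for c in sliding_window_chunks("x" * 400)])
# [256, 256, 256, 232]
```

Under this assumption a small step relative to `chunk_size` raises the chance that an answer span sits wholly inside some chunk, at the cost of roughly `chunk_size / step` (about 4.6) times as many chunks to embed as non-overlapping splitting would produce.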


### Vector Retriever

In [2]:
from Utils import *

# initialize the retriever
retriever = Retriever()
# answer every question with vector retrieval
retriever.process_questions(method='Vector')

# evaluate the accuracy
evaluator = Evaluation()
evaluator.output_evaluation()
evaluator.output_incorrect_answers()

< VectorDatabase initialized > 
< Retriever initialized > 
  - Answers saved to output.json 
< Evaluation by Ground Truths > 
  - Retrieval accuracy: 87.33%
     - Category: [insurance], Accuracy: 90.00%
     - Category: [finance], Accuracy: 74.00%
     - Category: [faq], Accuracy: 98.00%

Category: insurance
  - QID: 2, Expected: 428, Actual: 578, Source: [475, 325, 578, 428, 606, 258, 275, 565], Query: 本公司應在效力停止日前多少天以書面通知要保人？
  - QID: 4, Expected: 186, Actual: 179, Source: [186, 627, 536, 179, 174, 178], Query: 本契約內容的變更應經由誰同意並批註？
  - QID: 11, Expected: 7, Actual: 325, Source: [241, 258, 141, 298, 357, 325, 7], Query: 保單借款的可借金額上限如何決定？
  - QID: 29, Expected: 578, Actual: 524, Source: [527, 256, 102, 578, 524, 236, 431], Query: 被保險人於本契約有效期間內身故，本公司是否會依本契約約定給付保險金？
  - QID: 47, Expected: 620, Actual: 243, Source: [620, 476, 243, 337, 182, 536, 179, 596], Query: 保單年度數是如何計算的?

Category: finance
  - QID: 53, Expected: 351, Actual: 145, Source: [178, 145, 344, 883, 694, 351, 736, 611], Query: 
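The reported numbers appear consistent with 50 questions per category: the insurance list shows five misses, and 45/50 = 90%; the overall 87.33% is then (45 + 37 + 49) / 150, which with equal category sizes equals the mean of the three category accuracies. Assuming a question counts as correct exactly when the retrieved ID (`Actual`) equals the ground truth (`Expected`), the computation can be sketched as follows. The `(category, expected, actual)` record layout is hypothetical; `Evaluation` may store results differently.

```python
from collections import defaultdict


def retrieval_accuracy(records):
    """Compute overall and per-category accuracy from (category, expected, actual) records.

    Hypothetical record format for illustration; Evaluation's internals may differ.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, expected, actual in records:
        totals[category] += 1
        hits[category] += int(expected == actual)  # correct only on an exact ID match
    per_category = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / sum(totals.values())  # micro-average over all questions
    return overall, per_category


# Synthetic records whose counts are chosen to match the vector run's percentages
# (assuming 50 questions per category); the IDs are placeholders.
records = (
    [("insurance", 1, 1)] * 45 + [("insurance", 1, 2)] * 5
    + [("finance", 1, 1)] * 37 + [("finance", 1, 2)] * 13
    + [("faq", 1, 1)] * 49 + [("faq", 1, 2)] * 1
)
overall, per_category = retrieval_accuracy(records)
print(f"Retrieval accuracy: {overall:.2%}")  # Retrieval accuracy: 87.33%
for category, acc in per_category.items():
    print(f"Category: [{category}], Accuracy: {acc:.2%}")
```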

### BM25 Retriever

In [1]:
from Utils import *

# initialize the retriever
retriever = Retriever()
# answer every question with BM25 retrieval
retriever.process_questions(method='original')

# evaluate the accuracy
evaluator = Evaluation()
evaluator.output_evaluation()

resource module not available on Windows
< VectorDatabase initialized > 
< Retriever initialized > 


Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\marks\AppData\Local\Temp\jieba.cache
Loading model cost 0.636 seconds.
Prefix dict has been built successfully.


  - Answers saved to output.json 
< Evaluation by Ground Truths > 
  - Retrieval accuracy: 72.67%
     - Category: [insurance], Accuracy: 82.00%
     - Category: [finance], Accuracy: 44.00%
     - Category: [faq], Accuracy: 92.00%
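The jieba log lines indicate that this run loads jieba, presumably to segment the Chinese text into words before lexical matching. The exact scorer behind `method='original'` is not shown; assuming standard Okapi BM25 (here with the common parameter choices `k1=1.5`, `b=0.75` and the non-negative IDF variant), the scoring step looks roughly like this. The toy documents are segmented by hand so the sketch needs no tokenizer.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document for one tokenized query.

    Uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs  # average document length
    # document frequency: number of documents containing each term
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # term-frequency saturation (k1) and length normalization (b)
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores


# Hand-segmented toy corpus (roughly what a word segmenter such as jieba might produce).
docs = [
    ["保單", "借款", "可借", "金額", "上限"],  # policy loan amount limit
    ["契約", "內容", "變更", "同意", "批註"],  # contract change, consent, endorsement
    ["身故", "保險金", "給付"],                # death benefit payment
]
query = ["保單", "借款", "上限"]  # "policy loan limit"
scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(best)  # 0
```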


### BM25 + Vector Fusion Retriever

In [1]:
from Utils import *

# initialize the retriever
retriever = Retriever()
# answer every question by fusing BM25 and vector retrieval
retriever.process_questions(method='BM25_Vector')

# evaluate the accuracy
evaluator = Evaluation()
evaluator.output_evaluation()

resource module not available on Windows
< VectorDatabase initialized > 
< Retriever initialized > 
  - Answers saved to output.json 
< Evaluation by Ground Truths > 
  - Retrieval accuracy: 88.00%
     - Category: [insurance], Accuracy: 94.00%
     - Category: [finance], Accuracy: 74.00%
     - Category: [faq], Accuracy: 96.00%
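Fusion lifts insurance to 94% (vector alone: 90%, BM25 alone: 82%) while finance stays at 74%. How `BM25_Vector` merges the two result lists is not shown here. One common, score-free option is Reciprocal Rank Fusion (RRF), which gives each document the sum of `1 / (k + rank)` over the rankings it appears in; the sketch below assumes RRF with `k=60` and is not necessarily what `Utils` implements.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in (rank starts at 1).
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # highest fused score first; ties keep first-seen order because sorted() is stable
    return sorted(fused, key=fused.get, reverse=True)


# Toy rankings from the two retrievers (hypothetical IDs).
bm25_ranking = ["doc_b", "doc_a", "doc_c"]
vector_ranking = ["doc_a", "doc_c", "doc_b"]
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
# ['doc_a', 'doc_b', 'doc_c']
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and cosine similarities, which live on different scales.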


### LlamaIndex BM25 Retriever

In [2]:
from Utils import *

# initialize the retriever
retriever = Retriever()
# answer every question with LlamaIndex BM25 retrieval
retriever.process_questions(method='BM25')

# evaluate the accuracy
evaluator = Evaluation()
evaluator.output_evaluation()

< VectorDatabase initialized > 
< Retriever initialized > 
  - Answers saved to output.json 
< Evaluation by Ground Truths > 
  - Retrieval accuracy: 82.67%
     - Category: [insurance], Accuracy: 98.00%
     - Category: [finance], Accuracy: 60.00%
     - Category: [faq], Accuracy: 90.00%


### Weighted Retriever

In [None]:
from Utils import *

# initialize the retriever
retriever = Retriever()
# answer every question with the weighted retriever
retriever.process_questions(method='weight')

# evaluate the accuracy
evaluator = Evaluation()
evaluator.output_evaluation()

In [2]:
from Utils import *

# initialize the retriever
retriever = Retriever()
# answer every question, combining scores with combine_method='weighted'
retriever.process_questions(method='weight', combine_method='weighted')

# evaluate the accuracy
evaluator = Evaluation()
evaluator.output_evaluation()

< VectorDatabase initialized > 
< Retriever initialized > 
  - Answers saved to output.json 
< Evaluation by Ground Truths > 
  - Retrieval accuracy: 87.33%
     - Category: [insurance], Accuracy: 92.00%
     - Category: [finance], Accuracy: 72.00%
     - Category: [faq], Accuracy: 98.00%
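With `combine_method='weighted'` the overall accuracy (87.33%) equals the vector-only run, with a slightly different per-category split (insurance 92% vs 90%, finance 72% vs 74%). How the two retrievers' scores are weighted is not shown. A common scheme, assumed here, is to min-max normalize each retriever's scores to [0, 1] and take a convex combination with a hypothetical weight `alpha`; the sketch shows how `alpha` can flip the top document.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; if all scores are equal, map them to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(vector_scores, bm25_scores, alpha=0.5):
    """Rank doc IDs by alpha * norm(vector) + (1 - alpha) * norm(BM25).

    alpha is a hypothetical weight; a document missing from one retriever gets 0 for it.
    """
    v_norm = min_max_normalize(vector_scores)
    b_norm = min_max_normalize(bm25_scores)
    combined = {
        doc: alpha * v_norm.get(doc, 0.0) + (1 - alpha) * b_norm.get(doc, 0.0)
        for doc in dict.fromkeys([*v_norm, *b_norm])  # union of IDs, first-seen order
    }
    return sorted(combined, key=combined.get, reverse=True)


# Toy scores: cosine similarities and raw BM25 scores live on different scales.
vector_sim = {"doc_a": 0.82, "doc_b": 0.80, "doc_c": 0.40}
bm25_raw = {"doc_a": 3.0, "doc_b": 9.0, "doc_c": 1.0}
print(weighted_fusion(vector_sim, bm25_raw, alpha=0.5))   # ['doc_b', 'doc_a', 'doc_c']
print(weighted_fusion(vector_sim, bm25_raw, alpha=0.95))  # ['doc_a', 'doc_b', 'doc_c']
```

Unlike rank-based fusion, this keeps score magnitudes, so a document that one retriever ranks far ahead can dominate; the weight then has to be tuned per corpus.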
