<a href="https://colab.research.google.com/github/dimitrod/ehu_nlp_dimathina/blob/clean_branch/test_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Model Test Environment
Use this notebook to run individual tests on a model of your choice.





---



## Setup
To import a model into the evaluation environment, enter its parameters in the script below. Use the following table to find the correct entries for each model:


|Model|Context|MODEL_NAME|DATABASE|PARAMS|Note|
|----------|----------|----------|----------|----------|----------|
|TinyLlama|-|tiny_llama_no_retriever|-|-|**Very slow**|
|TinyLlama|Whole documents|tiny_llama_dense|**external**|-|**Very slow**|
|Mistral Instruct|-|mistral_instruct_no_retriever|-|-|**Hugging Face and Pinecone tokens and a GPU required**|
|Mistral Instruct|Whole documents|mistral_instruct_dense|**external**|-|**Hugging Face and Pinecone tokens and a GPU required**|
|Mistral Instruct|Text fragments|mistral_instruct_hybrid|sparse-dense|k, c, o|**Hugging Face and Pinecone tokens and a GPU required**|
|BERT Base|QA pairs|bert_base_qa_embeddings|**directly imported**|k|-|
|BERT Base|Whole documents|bert_base_dense|**external**|-|-|
|BERT Base|Text fragments|bert_base_sparse|sparse|k|-|
|BERT Fine-tuned|Whole documents|bert_finetuned_dense|**external**|-|-|
|ChatGPT-4o|-|chat_gpt_no_retriever|-|t|**Not free to use; OpenAI token required**|
|ChatGPT-4o|Whole documents|chat_gpt_hybrid|sparse-dense|k, c, o, t|**Not free to use; OpenAI token required**|

Each parameter is described in the following table:

|Parameter Name|Description|
|----------|----------|
|k|Number of contexts the retriever sends to the reader|
|c|Chunk size of each context|
|o|Overlap between consecutive contexts|
|t|Temperature of the reader model|
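To give a rough feel for what k, c, o and t control, here is a minimal, self-contained sketch. It is not the repository's retriever or reader: it assumes chunk size and overlap are measured in characters (the actual models may count tokens or words), and `chunk_text`, `top_k_contexts` and `softmax_with_temperature` are hypothetical helpers written only for illustration.

```python
import math


def chunk_text(text: str, c: int, o: int) -> list[str]:
    """Split text into windows of c characters; consecutive windows share o characters."""
    if c <= 0 or not 0 <= o < c:
        raise ValueError("need c > 0 and 0 <= o < c")
    step = c - o  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + c])
        if start + c >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def top_k_contexts(chunks: list[str], scores: list[float], k: int) -> list[str]:
    """Return the k chunks with the highest retrieval scores, best first."""
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]]


def softmax_with_temperature(logits: list[float], t: float) -> list[float]:
    """Turn raw scores into probabilities; lower t sharpens, higher t flattens."""
    if t <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


print(chunk_text("abcdefghij", c=4, o=2))
print(top_k_contexts(["a", "b", "c"], [0.1, 0.9, 0.5], k=2))
print(softmax_with_temperature([2.0, 1.0], t=0.5))
```

A larger overlap o means more chunks and more repeated text, but reduces the chance that an answer is cut in half at a chunk boundary.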

If the model uses an external database, a directly imported database, or no database, enter an empty string (`""`) for the `DATABASE` variable in the script.

In [None]:
import os
import importlib

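# Fill in MODEL_NAME and DATABASE from the table above
os.environ["MODEL_NAME"] = "tiny_llama_no_retriever"
os.environ["DATABASE"] = ""  # "" for an external, directly imported, or missing database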
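To avoid typos in these two values, the setup table can be encoded as a dictionary and both variables set in one call. A minimal sketch: `DATABASE_BY_MODEL` and `configure` are hypothetical names, not part of the repository, and the mapping is copied from the table above, with external, directly imported and absent databases mapped to `""` as the instructions require.

```python
import os

# Hypothetical helper: the MODEL_NAME -> DATABASE columns of the setup table.
# External, directly imported, or absent databases map to "".
DATABASE_BY_MODEL = {
    "tiny_llama_no_retriever": "",
    "tiny_llama_dense": "",
    "mistral_instruct_no_retriever": "",
    "mistral_instruct_dense": "",
    "mistral_instruct_hybrid": "sparse-dense",
    "bert_base_qa_embeddings": "",
    "bert_base_dense": "",
    "bert_base_sparse": "sparse",
    "bert_finetuned_dense": "",
    "chat_gpt_no_retriever": "",
    "chat_gpt_hybrid": "sparse-dense",
}


def configure(model_name: str) -> None:
    """Set MODEL_NAME and the matching DATABASE value in the environment."""
    if model_name not in DATABASE_BY_MODEL:
        raise ValueError(f"unknown model {model_name!r}; see the setup table")
    os.environ["MODEL_NAME"] = model_name
    os.environ["DATABASE"] = DATABASE_BY_MODEL[model_name]


configure("bert_base_sparse")
print(os.environ["MODEL_NAME"], repr(os.environ["DATABASE"]))
```

Calling `configure(...)` in place of the two assignments keeps the pair consistent; an unknown name fails immediately instead of during the clone.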

Run the following script to set up the test environment:

In [None]:
import os
import shutil

# Read the configuration set in the previous cell
directory = os.environ["MODEL_NAME"]
database = os.environ["DATABASE"]

# Install Git LFS
!sudo apt-get install git-lfs -y
!git lfs install

# Clone the repository
!git clone --branch clean_branch https://github.com/dimitrod/ehu_nlp_dimathina.git
%cd ehu_nlp_dimathina

# Fetch and checkout files for the model
dir_path = f"models/{directory}/*"
!git lfs fetch --include="{dir_path}"
!git lfs checkout
%cd ..

# Move the model to the current directory
shutil.move(f"ehu_nlp_dimathina/models/{directory}", ".")

# Install model-specific requirements
!pip install -r {directory}/requirements.txt

# Handle the optional database
if database:
    %cd ehu_nlp_dimathina
    db_path = f"databases/{database}/*"
    !git lfs fetch --include="{db_path}"
    !git lfs checkout
    %cd ..
    shutil.move(f"ehu_nlp_dimathina/databases/{database}", f"{directory}/database")

# Cleanup
shutil.rmtree("ehu_nlp_dimathina")
#shutil.rmtree("sample_data")


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git-lfs is already the newest version (3.0.2-1ubuntu0.3).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.
Git LFS initialized.
Cloning into 'ehu_nlp_dimathina'...
remote: Enumerating objects: 1902, done.
remote: Counting objects: 100% (493/493), done.
remote: Compressing objects: 100% (402/402), done.
remote: Total 1902 (delta 265), reused 125 (delta 91), pack-reused 1409 (from 1)
Receiving objects: 100% (1902/1902), 30.09 MiB | 48.99 MiB/s, done.
Resolving deltas: 100% (1159/1159), done.
/content/ehu_nlp_dimathina
fetch: Fetching reference refs/heads/clean_branch
Skipped checkout for "databases/sparse-dense/document_library.pkl", content not local. Use fetch to download.
Skipped checkout for "databases/sparse-dense/documents.txt", content not local. Use fetch to download.
Skipped checkout for "databases/sparse-dense/tfidf_vocabulary.pkl", content not local. Use 
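Outside Colab, where the `!` and `%cd` magics are not available, the same git and pip steps can be driven from plain Python. A minimal sketch, assuming git and git-lfs are already installed (the `apt-get` step is omitted); `setup_commands` and `run_setup` are hypothetical names, `git -C <dir>` stands in for the `%cd` calls, and the `shutil.move`/`shutil.rmtree` steps of the cell are left out. By default it only prints the commands.

```python
import shlex
import subprocess
import sys

REPO_URL = "https://github.com/dimitrod/ehu_nlp_dimathina.git"
REPO_DIR = "ehu_nlp_dimathina"


def setup_commands(directory: str, database: str) -> list[list[str]]:
    """Return the clone, fetch and install steps of the setup cell as argument lists."""
    cmds = [
        ["git", "lfs", "install"],
        ["git", "clone", "--branch", "clean_branch", REPO_URL],
        # Download only the LFS files of the chosen model, then materialize them
        ["git", "-C", REPO_DIR, "lfs", "fetch", f"--include=models/{directory}/*"],
        ["git", "-C", REPO_DIR, "lfs", "checkout"],
        [sys.executable, "-m", "pip", "install", "-r",
         f"{REPO_DIR}/models/{directory}/requirements.txt"],
    ]
    if database:  # optional database: same selective fetch
        cmds += [
            ["git", "-C", REPO_DIR, "lfs", "fetch", f"--include=databases/{database}/*"],
            ["git", "-C", REPO_DIR, "lfs", "checkout"],
        ]
    return cmds


def run_setup(directory: str, database: str, dry_run: bool = True) -> None:
    """Print every command; execute them only when dry_run is False."""
    for cmd in setup_commands(directory, database):
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run_setup("bert_base_sparse", "sparse")
```

Passing argument lists to `subprocess.run` avoids shell quoting, which is why the `--include` patterns need no quotes here.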

Enter the model's hyperparameters. If the model doesn't require any, leave the list empty **but still execute the script**.

In [None]:
params = []
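The `params` list is positional, but the notebook does not state the order of its entries. A hedged sketch, assuming the order matches the PARAMS column of the setup table (k, c, o, t); `PARAM_NAMES` and `build_params` are hypothetical helpers derived from that column, so check the model's own code before relying on the order.

```python
# Hypothetical helper: the PARAMS column of the setup table.
# Assumption: each model reads params positionally in the listed order.
PARAM_NAMES = {
    "mistral_instruct_hybrid": ["k", "c", "o"],
    "bert_base_qa_embeddings": ["k"],
    "bert_base_sparse": ["k"],
    "chat_gpt_no_retriever": ["t"],
    "chat_gpt_hybrid": ["k", "c", "o", "t"],
}


def build_params(model_name: str, **values) -> list:
    """Return params ordered as in the table; models without parameters get []."""
    names = PARAM_NAMES.get(model_name, [])
    missing = [n for n in names if n not in values]
    extra = [n for n in values if n not in names]
    if missing or extra:
        raise ValueError(f"{model_name}: missing {missing}, unexpected {extra}")
    return [values[n] for n in names]


print(build_params("chat_gpt_hybrid", k=5, c=512, o=50, t=0.7))
```

The values above are placeholders, not recommended settings. In the notebook, something like `params = build_params(os.environ["MODEL_NAME"], k=5)` would replace the empty list for `bert_base_sparse`.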

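Create an instance of the model. The cell imports `<MODEL_NAME>/<MODEL_NAME>.py` and instantiates the class of the same name with `params`: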

In [None]:
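model_name = os.environ["MODEL_NAME"]

module_path = f"{model_name}.{model_name}"  # Module <MODEL_NAME>.py inside the <MODEL_NAME> folder
module_obj = importlib.import_module(module_path)  # Import the module dynamically
cls_obj = getattr(module_obj, model_name)  # The model class has the same name as its module
model = cls_obj(params)  # Instantiate with the hyperparameters from the previous cell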
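The loading cell relies on a naming convention: a folder `<name>` containing `<name>.py`, which defines a class `<name>` that takes `params` and exposes `invoke`. The sketch below reproduces that convention with a throwaway model in a temporary directory, so the dynamic import can be tried without downloading anything. `dummy_model`, `load_model` and the echo behaviour are hypothetical; it uses `demo_model` rather than `model` so it does not overwrite the real instance if run in this notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a model folder: <name>/<name>.py defining class <name>.
MODEL_SOURCE = '''
class dummy_model:
    def __init__(self, params):
        self.params = params

    def invoke(self, question):
        return f"echo: {question}"
'''


def load_model(model_name, params, search_dir):
    """Import <model_name>.<model_name> from search_dir and instantiate its class."""
    if str(search_dir) not in sys.path:
        sys.path.insert(0, str(search_dir))
    importlib.invalidate_caches()  # pick up modules created after interpreter start
    module = importlib.import_module(f"{model_name}.{model_name}")
    cls = getattr(module, model_name)
    return cls(params)


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "dummy_model"
    pkg.mkdir()
    (pkg / "dummy_model.py").write_text(MODEL_SOURCE)
    demo_model = load_model("dummy_model", [], tmp)
    print(demo_model.invoke("What is the capital of France?"))
```

The folder needs no `__init__.py`: since Python 3.3 a directory on `sys.path` is importable as a namespace package, which is also why the moved model folder in the current directory can be imported here.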



---



## Tests
Run the tests you want to conduct, for example:

In [None]:
question = "What is the capital of France?"

answer = model.invoke(question)

print(answer)

The capital of France is Paris.
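For more than one question, the calls can be batched. A minimal sketch: `run_tests`, `FixedAnswerModel` and the example cases are hypothetical, the stub only mimics the `invoke` interface used above, and the pass criterion (expected text appears in the answer, case-insensitive) is a deliberately lenient choice for illustration rather than the project's evaluation metric.

```python
def run_tests(model, cases):
    """Ask each question and check that the expected text appears in the answer."""
    results = []
    for question, expected in cases:
        answer = str(model.invoke(question))
        passed = expected.lower() in answer.lower()
        results.append((question, passed))
        print(f"[{'PASS' if passed else 'FAIL'}] {question} -> {answer}")
    return results


class FixedAnswerModel:
    """Stand-in with the same invoke() interface, so the sketch runs without a real model."""

    def invoke(self, question):
        return "The capital of France is Paris."


cases = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Spain?", "Madrid"),
]
results = run_tests(FixedAnswerModel(), cases)
```

In the notebook, pass `model` instead of `FixedAnswerModel()`. Substring matching tolerates phrasing differences such as "Paris." versus "Paris", but it also accepts answers that merely mention the expected word.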
