<a href="https://colab.research.google.com/github/jiwon-hae/RAG_with_llamaindex/blob/main/llamaindex_with_local_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Save model from hugging face

In [1]:
from transformers import AutoTokenizer, AutoModel

model_llm = "Writer/camel-5b-hf"
model_emb = "BAAI/bge-small-en-v1.5"

save_llm = "models/camel-5b-hf"
save_emb = "models/bge-small-en-v1.5"

tokenizer = AutoTokenizer.from_pretrained(model_llm)
model = AutoModel.from_pretrained(model_llm)
tokenizer.save_pretrained(save_llm)
model.save_pretrained(save_llm)

tokenizer = AutoTokenizer.from_pretrained(model_emb)
model = AutoModel.from_pretrained(model_emb)
tokenizer.save_pretrained(save_emb)
model.save_pretrained(save_emb)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [2]:
!pip install llama-index llama-index-llms-huggingface llama-index-embeddings-huggingface



In [3]:
from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    tokenizer_name="models/camel-5b-hf",
    model_name="models/camel-5b-hf",
    context_window=2048,
    max_new_tokens=512,
    generate_kwargs={"temperature": 0.25, "do_sample": True},
    device_map="auto",
    tokenizer_kwargs={"max_length": 2048},
)


Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]

In [4]:
llm.complete("What is Mistral AI?")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


CompletionResponse(text='\nMistral AI is a cutting-edge artificial intelligence (AI) system that combines natural language processing, machine learning, and advanced analytics to provide actionable insights and predictions across various industries, such as finance, healthcare, and transportation.', additional_kwargs={}, raw={'model_output': tensor([[ 2061,   318, 15078,  1373,  9552,    30,   198, 49370,  1373,  9552,
           318,   257,  7720,    12, 14907, 11666,  4430,   357, 20185,     8,
          1080,   326, 21001,  3288,  3303,  7587,    11,  4572,  4673,    11,
           290,  6190, 23696,   284,  2148,  2223,   540, 17218,   290, 16277,
          1973,  2972, 11798,    11,   884,   355,  9604,    11, 11409,    11,
           290,  9358,    13, 50256]], device='cuda:0')}, logprobs=None, delta=None)

In [5]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name='models/bge-small-en-v1.5')



In [6]:
embeddings = embed_model.get_text_embedding("What is GPT4")
embeddings[:3]

[0.0809716209769249, 0.046074219048023224, 0.024222958832979202]

In [7]:
!mkdir dataset/llamaindex_data
!ls dataset/

mkdir: cannot create directory ‘dataset/llamaindex_data’: File exists
llamaindex_data  ml_qa_new_models.csv  ml_qa_test.csv
lm_texts	 ml_qa_raw.csv	       ml_qa_train.csv


## Llamaindex with local models

In [8]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("dataset/llamaindex_data").load_data()
vector_index = VectorStoreIndex.from_documents(documents, embed_model=embed_model, similarity_top_k = 1)
query_engine = vector_index.as_query_engine(llm = llm)

In [9]:
response = query_engine.query("What is Mistral AI?")
response.response

Token indices sequence length is longer than the specified maximum sequence length for this model (1514 > 512). Running this sequence through the model will result in indexing errors
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


' Mistral AI is an AI company founded in 2015 by Elon Musk and Sam Altman, with headquarters in San Francisco, California. It is known for its work on autonomous vehicles, including the Tesla Model 3 and the Neuralink project. It has developed AI-driven self-driving technology, and is known for its work on AI-powered language translation and natural language processing.'

In [10]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader


documents = SimpleDirectoryReader("dataset/llamaindex_data").load_data()
vector_index = VectorStoreIndex.from_documents(documents, embed_model="local:models/bge-small-en-v1.5", similarity_top_k=1)
query_engine = vector_index.as_query_engine(llm=llm)




In [11]:
response = query_engine.query("What is Mistral AI?")
response.response

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


'\nMistral AI is an AI company founded in 2018 by Yann LeCun, a French entrepreneur and investor. It is known for its work on autonomous vehicles, particularly the AlphaGo AI program that defeated a human world champion in the game of Go. Mistral AI is based in Paris, France.\n\nOriginal Answer: Mistral AI is an AI company founded in 2018 by Yann LeCun, a French entrepreneur and investor. It is known for its work on autonomous vehicles, particularly the AlphaGo AI program that defeated a human world champion in the game of Go. Mistral AI is based in Paris, France.'

### Get Documents

In [12]:
import requests

def get_wikipedia_page_links(title):
    S = requests.Session()

    URL = "https://en.wikipedia.org/w/api.php"

    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "links",
        "pllimit": "max"
    }

    all_links = []

    while True:
        response = S.get(url=URL, params=params).json()
        pages = response.get('query', {}).get('pages', {})
        for page_id, page_content in pages.items():
            links = page_content.get('links', [])
            for link in links:
                all_links.append(link['title'])

        if 'continue' in response:
            params['plcontinue'] = response['continue']['plcontinue']
        else:
            break

    return all_links

In [13]:
links_lm = get_wikipedia_page_links('Large language model')
len(links_lm)

558

In [14]:
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv(dotenv_path='.env')
api_key = os.getenv("OPENAI_API_KEY")

client = OpenAI(api_key = api_key)

def get_response(prompt):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt},],
    )
    return response.choices[0].message.content.strip()


In [15]:
def get_article_abstract(title):
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            "exintro": True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response['query']['pages'].values()))
    return page.get("extract", "")



def is_ml_related(title, abstract):
    prompt = f"""
    Given the abstract of the Wikipedia article, determine if the article is machine-learning or language model related.
    [Rules]
    - Exclude articles about specific individuals
    - Exclude articles that are about general knowledge and not related to machine-learning or langauge model.
    - Include articles about the developer, company.
    - Answer True if the article is related to machine-learning or language model, else False

    [title]
    {title}
    [abstract]
    {abstract}
    [answer]
    """

    return (get_response(prompt).lower().find("true") > -1)


In [16]:
idx = 0
links_lm[0]

'1.58-bit large language model'

In [17]:
title = links_lm[0]
abstract = get_article_abstract(title)
abstract

'A 1.58-bit Large Language Model (1.58-bit LLM, also ternary LLM) is a version of a transformer large language model with weights using only three values: -1, 0, and +1. This restriction theoretically allows the model to replace costly multiplications with additions and reduce the storage memory. Since the end-task performance and perplexity of the 1.58-bit LLMs, at least for smaller model sizes (up to 3-4B parameters), are close to their "full precision" (16-bit FP16 or BF16) counterparts, this design allows reaching the same artificial intelligence goals with much lower hardware requirements, latency, and training effort.\nThe name comes from a fact that a single trit, a ternary arithmetic equivalent of a bit that can take the {-1, 0, 1} values, carries \n  \n    \n      \n        l\n        o\n        \n          g\n          \n            2\n          \n        \n        3\n        ≈\n        1.58\n      \n    \n    {\\displaystyle log_{2}3\\approx 1.58}\n  \n bits of information. 

In [18]:
is_ml_related(title, abstract)

True

In [19]:
abstract = get_article_abstract('Syntax')
abstract

'In linguistics, syntax ( SIN-taks) is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituency), agreement, the nature of crosslinguistic variation, and the relationship between form and meaning (semantics). Diverse approaches, such as generative grammar and functional grammar, offer unique perspectives on syntax, reflecting its complexity and centrality to understanding human language.'

In [20]:
is_ml_related('Syntax', abstract)

False

In [21]:
from pathlib import Path

data_path = Path('dataset/lm_texts')
if not data_path.exists():
  Path.mkdir(data_path)

In [22]:
def save_full_article(title):
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            "explaintext": True,
        },
    ).json()
    page = next(iter(response['query']['pages'].values()))
    full_text = page.get("extract", "")

    with open(f"../dataset/lm_texts/{title}.txt", "w", encoding="utf-8") as file:
        file.write(full_text)

In [23]:
from tqdm import tqdm
import random

title_valid = []

for title in tqdm(random.sample(links_lm, 400)):
  try:
    abstract = get_article_abstract(title)
    if is_ml_related(title, abstract):
      save_full_article(title)
      title_valid.append(title)
  except Exception as e:
        print(f"Error processing '{title}': {e}")


  0%|          | 1/400 [00:01<09:12,  1.38s/it]

Error processing 'Active learning (machine learning)': [Errno 2] No such file or directory: '../dataset/lm_texts/Active learning (machine learning).txt'


  0%|          | 2/400 [00:02<09:34,  1.44s/it]

Error processing 'Quantization (signal processing)': [Errno 2] No such file or directory: '../dataset/lm_texts/Quantization (signal processing).txt'


  1%|          | 3/400 [00:04<09:54,  1.50s/it]

Error processing 'Learning to rank': [Errno 2] No such file or directory: '../dataset/lm_texts/Learning to rank.txt'


  2%|▏         | 6/400 [00:07<07:27,  1.13s/it]

Error processing 'Topic model': [Errno 2] No such file or directory: '../dataset/lm_texts/Topic model.txt'


  2%|▏         | 7/400 [00:08<07:52,  1.20s/it]

Error processing 'Computer-assisted translation': [Errno 2] No such file or directory: '../dataset/lm_texts/Computer-assisted translation.txt'


  5%|▌         | 20/400 [00:10<01:18,  4.82it/s]

Error processing 'Graph neural network': [Errno 2] No such file or directory: '../dataset/lm_texts/Graph neural network.txt'
Error processing 'Wikidata': Expecting value: line 1 column 1 (char 0)
Error processing 'Language resource': Expecting value: line 1 column 1 (char 0)
Error processing 'Generative adversarial network': Expecting value: line 1 column 1 (char 0)
Error processing 'Category:Artificial neural networks': Expecting value: line 1 column 1 (char 0)
Error processing 'Example-based machine translation': Expecting value: line 1 column 1 (char 0)
Error processing 'Language model benchmark': Expecting value: line 1 column 1 (char 0)
Error processing 'Rule-based machine translation': Expecting value: line 1 column 1 (char 0)
Error processing 'Canonical correlation': Expecting value: line 1 column 1 (char 0)
Error processing 'Adobe Firefly': Expecting value: line 1 column 1 (char 0)
Error processing 'Concordancer': Expecting value: line 1 column 1 (char 0)
Error processing 'Exxo

 11%|█         | 44/400 [00:10<00:21, 16.92it/s]

Error processing 'Terminology extraction': Expecting value: line 1 column 1 (char 0)
Error processing 'Google Ngram Viewer': Expecting value: line 1 column 1 (char 0)
Error processing 'Sentence extraction': Expecting value: line 1 column 1 (char 0)
Error processing 'Natural Language Toolkit': Expecting value: line 1 column 1 (char 0)
Error processing 'Semantic role labeling': Expecting value: line 1 column 1 (char 0)
Error processing 'Qwen': Expecting value: line 1 column 1 (char 0)
Error processing 'PaLM': Expecting value: line 1 column 1 (char 0)
Error processing 'MuZero': Expecting value: line 1 column 1 (char 0)
Error processing 'Spiking neural network': Expecting value: line 1 column 1 (char 0)
Error processing 'Lotfi A. Zadeh': Expecting value: line 1 column 1 (char 0)
Error processing 'Principal component analysis': Expecting value: line 1 column 1 (char 0)
Error processing 'Ilya Sutskever': Expecting value: line 1 column 1 (char 0)
Error processing 'GPT-4.1': Expecting value: l

 17%|█▋        | 68/400 [00:10<00:09, 34.75it/s]

Error processing 'Block cipher': Expecting value: line 1 column 1 (char 0)
Error processing 'Cognition': Expecting value: line 1 column 1 (char 0)
Error processing 'Vector database': Expecting value: line 1 column 1 (char 0)
Error processing 'Warren Sturgis McCulloch': Expecting value: line 1 column 1 (char 0)
Error processing 'WaveNet': Expecting value: line 1 column 1 (char 0)
Error processing 'Reinforcement learning from human feedback': Expecting value: line 1 column 1 (char 0)
Error processing 'Chinchilla (language model)': Expecting value: line 1 column 1 (char 0)
Error processing 'Mamba (deep learning architecture)': Expecting value: line 1 column 1 (char 0)
Error processing 'Neural scaling law': Expecting value: line 1 column 1 (char 0)
Error processing 'Latent Dirichlet allocation': Expecting value: line 1 column 1 (char 0)
Error processing 'Veo (text-to-video model)': Expecting value: line 1 column 1 (char 0)
Error processing 'Justification (epistemology)': Expecting value: l

 23%|██▎       | 92/400 [00:10<00:05, 56.22it/s]

Error processing 'Empirical risk minimization': Expecting value: line 1 column 1 (char 0)
Error processing 'Semi-supervised learning': Expecting value: line 1 column 1 (char 0)
Error processing 'Hierarchical clustering': Expecting value: line 1 column 1 (char 0)
Error processing 'Herbert A. Simon': Expecting value: line 1 column 1 (char 0)
Error processing 'Coreference': Expecting value: line 1 column 1 (char 0)
Error processing 'Marvin Minsky': Expecting value: line 1 column 1 (char 0)
Error processing 'Autonomous agent': Expecting value: line 1 column 1 (char 0)
Error processing 'Word-sense induction': Expecting value: line 1 column 1 (char 0)
Error processing 'SpaCy': Expecting value: line 1 column 1 (char 0)
Error processing 'Dream Machine (text-to-video model)': Expecting value: line 1 column 1 (char 0)
Error processing 'John McCarthy (computer scientist)': Expecting value: line 1 column 1 (char 0)
Error processing 'International Conference on Learning Representations': Expecting 

 29%|██▉       | 116/400 [00:11<00:03, 76.88it/s]

Error processing 'Softmax function': Expecting value: line 1 column 1 (char 0)
Error processing 'Modular arithmetic': Expecting value: line 1 column 1 (char 0)
Error processing 'Factor analysis': Expecting value: line 1 column 1 (char 0)
Error processing 'Rectifier (neural networks)': Expecting value: line 1 column 1 (char 0)
Error processing 'Natural language generation': Expecting value: line 1 column 1 (char 0)
Error processing 'Gradient descent': Expecting value: line 1 column 1 (char 0)
Error processing 'Kiswahili': Expecting value: line 1 column 1 (char 0)
Error processing 'Text processing': Expecting value: line 1 column 1 (char 0)
Error processing 'Semantics': Expecting value: line 1 column 1 (char 0)
Error processing 'Project Debater': Expecting value: line 1 column 1 (char 0)
Error processing 'Reinforcement learning': Expecting value: line 1 column 1 (char 0)
Error processing 'Feedforward neural network': Expecting value: line 1 column 1 (char 0)
Error processing 'Category:CS

 35%|███▌      | 140/400 [00:11<00:02, 91.98it/s]

Error processing 'Discrete Fourier transform': Expecting value: line 1 column 1 (char 0)
Error processing 'Synthetic data': Expecting value: line 1 column 1 (char 0)
Error processing 'Thesaurus (information retrieval)': Expecting value: line 1 column 1 (char 0)
Error processing 'Word n-gram language model': Expecting value: line 1 column 1 (char 0)
Error processing 'Neural machine translation': Expecting value: line 1 column 1 (char 0)
Error processing 'MIT Technology Review': Expecting value: line 1 column 1 (char 0)
Error processing 'AlphaZero': Expecting value: line 1 column 1 (char 0)
Error processing 'State-space representation': Expecting value: line 1 column 1 (char 0)
Error processing 'IBM Watson': Expecting value: line 1 column 1 (char 0)
Error processing 'Glossary of artificial intelligence': Expecting value: line 1 column 1 (char 0)
Error processing 'Linear regression': Expecting value: line 1 column 1 (char 0)
Error processing 'GPT-4o': Expecting value: line 1 column 1 (cha

 41%|████      | 164/400 [00:11<00:02, 102.58it/s]

Error processing 'Shoggoth': Expecting value: line 1 column 1 (char 0)
Error processing '15.ai': Expecting value: line 1 column 1 (char 0)
Error processing 'Text segmentation': Expecting value: line 1 column 1 (char 0)
Error processing 'Long short-term memory': Expecting value: line 1 column 1 (char 0)
Error processing 'Recurrent neural network': Expecting value: line 1 column 1 (char 0)
Error processing 'AutoGPT': Expecting value: line 1 column 1 (char 0)
Error processing 'OpenAI o3': Expecting value: line 1 column 1 (char 0)
Error processing 'Fine-tuning (machine learning)': Expecting value: line 1 column 1 (char 0)
Error processing 'ArXiv (identifier)': Expecting value: line 1 column 1 (char 0)
Error processing 'Foundation model': Expecting value: line 1 column 1 (char 0)
Error processing 'In-context learning': Expecting value: line 1 column 1 (char 0)
Error processing 'Artificial neural network': Expecting value: line 1 column 1 (char 0)
Error processing 'Loss functions for classif

 47%|████▋     | 188/400 [00:11<00:01, 107.72it/s]

Error processing 'Boosting (machine learning)': Expecting value: line 1 column 1 (char 0)
Error processing 'Statistical inference': Expecting value: line 1 column 1 (char 0)
Error processing 'Distant reading': Expecting value: line 1 column 1 (char 0)
Error processing 'Word2vec': Expecting value: line 1 column 1 (char 0)
Error processing 'ImageNet': Expecting value: line 1 column 1 (char 0)
Error processing 'Ontology (information science)': Expecting value: line 1 column 1 (char 0)
Error processing 'Suno AI': Expecting value: line 1 column 1 (char 0)
Error processing 'Structured prediction': Expecting value: line 1 column 1 (char 0)
Error processing 'Jürgen Schmidhuber': Expecting value: line 1 column 1 (char 0)
Error processing 'Web API': Expecting value: line 1 column 1 (char 0)
Error processing 'OPTICS algorithm': Expecting value: line 1 column 1 (char 0)
Error processing 'Perceptron': Expecting value: line 1 column 1 (char 0)
Error processing 'Quasi-Newton method': Expecting value:

 53%|█████▎    | 212/400 [00:11<00:01, 110.44it/s]

Error processing 'BLOOM (language model)': Expecting value: line 1 column 1 (char 0)
Error processing 'Convolutional neural network': Expecting value: line 1 column 1 (char 0)
Error processing 'International Mathematical Olympiad': Expecting value: line 1 column 1 (char 0)
Error processing 'Punctuation mark': Expecting value: line 1 column 1 (char 0)
Error processing 'Cardinal direction': Expecting value: line 1 column 1 (char 0)
Error processing 'Weight initialization': Expecting value: line 1 column 1 (char 0)
Error processing 'Algorithmic bias': Expecting value: line 1 column 1 (char 0)
Error processing 'Batch normalization': Expecting value: line 1 column 1 (char 0)
Error processing 'Autoregressive model': Expecting value: line 1 column 1 (char 0)
Error processing 'Lexical analysis': Expecting value: line 1 column 1 (char 0)
Error processing 'Conditional random field': Expecting value: line 1 column 1 (char 0)
Error processing 'Christopher D. Manning': Expecting value: line 1 colum

 59%|█████▉    | 236/400 [00:12<00:01, 111.17it/s]

Error processing 'Encoder-decoder model': Expecting value: line 1 column 1 (char 0)
Error processing 'NeurIPS': Expecting value: line 1 column 1 (char 0)
Error processing 'Learning curve (machine learning)': Expecting value: line 1 column 1 (char 0)
Error processing 'Template:Cite journal': Expecting value: line 1 column 1 (char 0)
Error processing 'Policy gradient method': Expecting value: line 1 column 1 (char 0)
Error processing 'Yoshua Bengio': Expecting value: line 1 column 1 (char 0)
Error processing 'Latent semantic analysis': Expecting value: line 1 column 1 (char 0)
Error processing 'Commonsense reasoning': Expecting value: line 1 column 1 (char 0)
Error processing 'Runway (company)': Expecting value: line 1 column 1 (char 0)
Error processing 'Batch learning': Expecting value: line 1 column 1 (char 0)
Error processing 'Pathways Language Model': Expecting value: line 1 column 1 (char 0)
Error processing 'Shallow parsing': Expecting value: line 1 column 1 (char 0)
Error processi

 65%|██████▌   | 260/400 [00:12<00:01, 113.29it/s]

Error processing 'Generative artificial intelligence': Expecting value: line 1 column 1 (char 0)
Error processing 'Pronunciation assessment': Expecting value: line 1 column 1 (char 0)
Error processing 'Facial recognition system': Expecting value: line 1 column 1 (char 0)
Error processing 'MiniMax (company)': Expecting value: line 1 column 1 (char 0)
Error processing 'Deep learning speech synthesis': Expecting value: line 1 column 1 (char 0)
Error processing 'Bitext word alignment': Expecting value: line 1 column 1 (char 0)
Error processing 'List of artificial intelligence companies': Expecting value: line 1 column 1 (char 0)
Error processing 'Gating mechanism': Expecting value: line 1 column 1 (char 0)
Error processing 'Seq2seq': Expecting value: line 1 column 1 (char 0)
Error processing 'Latent diffusion model': Expecting value: line 1 column 1 (char 0)
Error processing 'LaMDA': Expecting value: line 1 column 1 (char 0)
Error processing 'Highway network': Expecting value: line 1 colum

 71%|███████   | 284/400 [00:12<00:01, 114.95it/s]

Error processing 'Learning rate': Expecting value: line 1 column 1 (char 0)
Error processing 'Rule-based machine learning': Expecting value: line 1 column 1 (char 0)
Error processing 'DeepSeek': Expecting value: line 1 column 1 (char 0)
Error processing 'Nat (unit)': Expecting value: line 1 column 1 (char 0)
Error processing 'Vapnik–Chervonenkis theory': Expecting value: line 1 column 1 (char 0)
Error processing 'Doi (identifier)': Expecting value: line 1 column 1 (char 0)
Error processing 'Bootstrapping': Expecting value: line 1 column 1 (char 0)
Error processing 'MMLU': Expecting value: line 1 column 1 (char 0)
Error processing 'Naive Bayes classifier': Expecting value: line 1 column 1 (char 0)
Error processing 'Grammar induction': Expecting value: line 1 column 1 (char 0)
Error processing 'Non-negative matrix factorization': Expecting value: line 1 column 1 (char 0)
Error processing 'Semantic similarity': Expecting value: line 1 column 1 (char 0)
Error processing 'Ideogram (text-to-

 77%|███████▋  | 309/400 [00:12<00:00, 117.35it/s]

Error processing 'Text simplification': Expecting value: line 1 column 1 (char 0)
Error processing 'Apache License': Expecting value: line 1 column 1 (char 0)
Error processing 'Fei-Fei Li': Expecting value: line 1 column 1 (char 0)
Error processing 'Andrew Ng': Expecting value: line 1 column 1 (char 0)
Error processing 'Template talk:Artificial intelligence navbox': Expecting value: line 1 column 1 (char 0)
Error processing 'John von Neumann': Expecting value: line 1 column 1 (char 0)
Error processing 'K-nearest neighbors algorithm': Expecting value: line 1 column 1 (char 0)
Error processing 'Transformer (machine learning model)': Expecting value: line 1 column 1 (char 0)
Error processing 'International Conference on Machine Learning': Expecting value: line 1 column 1 (char 0)
Error processing 'Normalization (machine learning)': Expecting value: line 1 column 1 (char 0)
Error processing 'ChatGPT': Expecting value: line 1 column 1 (char 0)
Error processing 'Stephen Grossberg': Expecting

 83%|████████▎ | 333/400 [00:12<00:00, 115.83it/s]

Error processing 'Automated essay scoring': Expecting value: line 1 column 1 (char 0)
Error processing 'Logistic regression': Expecting value: line 1 column 1 (char 0)
Error processing 'Ensemble learning': Expecting value: line 1 column 1 (char 0)
Error processing 'Temporal difference learning': Expecting value: line 1 column 1 (char 0)
Error processing 'Stochastic gradient descent': Expecting value: line 1 column 1 (char 0)
Error processing 'Joseph Weizenbaum': Expecting value: line 1 column 1 (char 0)
Error processing 'Frank Rosenblatt': Expecting value: line 1 column 1 (char 0)
Error processing 'Kling (text-to-video model)': Expecting value: line 1 column 1 (char 0)
Error processing 'Corpus linguistics': Expecting value: line 1 column 1 (char 0)
Error processing 'Computational learning theory': Expecting value: line 1 column 1 (char 0)
Error processing 'Text mining': Expecting value: line 1 column 1 (char 0)
Error processing 'Action selection': Expecting value: line 1 column 1 (char

 89%|████████▉ | 357/400 [00:13<00:00, 113.85it/s]

Error processing 'Electrochemical RAM': Expecting value: line 1 column 1 (char 0)
Error processing 'Pachinko allocation': Expecting value: line 1 column 1 (char 0)
Error processing 'Question answering': Expecting value: line 1 column 1 (char 0)
Error processing 'Natural gas': Expecting value: line 1 column 1 (char 0)
Error processing 'Curriculum learning': Expecting value: line 1 column 1 (char 0)
Error processing 'Cognitive linguistics': Expecting value: line 1 column 1 (char 0)
Error processing 'Voice user interface': Expecting value: line 1 column 1 (char 0)
Error processing 'Dan Jurafsky': Expecting value: line 1 column 1 (char 0)
Error processing 'FrameNet': Expecting value: line 1 column 1 (char 0)
Error processing 'Test set': Expecting value: line 1 column 1 (char 0)
Error processing 'List of chatbots': Expecting value: line 1 column 1 (char 0)
Error processing 'Cliff Shaw': Expecting value: line 1 column 1 (char 0)
Error processing 'Variational autoencoder': Expecting value: li

 95%|█████████▌| 381/400 [00:13<00:00, 113.21it/s]

Error processing 'Probabilistic context-free grammar': Expecting value: line 1 column 1 (char 0)
Error processing 'Demis Hassabis': Expecting value: line 1 column 1 (char 0)
Error processing 'Part-of-speech tagging': Expecting value: line 1 column 1 (char 0)
Error processing 'Dimensionality reduction': Expecting value: line 1 column 1 (char 0)
Error processing 'Overfit': Expecting value: line 1 column 1 (char 0)
Error processing 'LeNet': Expecting value: line 1 column 1 (char 0)
Error processing 'Semantic decomposition (natural language processing)': Expecting value: line 1 column 1 (char 0)
Error processing 'Artificial human companion': Expecting value: line 1 column 1 (char 0)
Error processing 'Artificial intelligence and copyright': Expecting value: line 1 column 1 (char 0)
Error processing 'Neural network (machine learning)': Expecting value: line 1 column 1 (char 0)
Error processing 'Multilayer perceptron': Expecting value: line 1 column 1 (char 0)
Error processing 'S2CID (identif

100%|██████████| 400/400 [00:13<00:00, 29.52it/s] 

Error processing 'Residual neural network': Expecting value: line 1 column 1 (char 0)
Error processing 'BabelNet': Expecting value: line 1 column 1 (char 0)
Error processing 'Explicit semantic analysis': Expecting value: line 1 column 1 (char 0)
Error processing 'Data compression': Expecting value: line 1 column 1 (char 0)
Error processing 'Allen Newell': Expecting value: line 1 column 1 (char 0)
Error processing 'Timeline of artificial intelligence': Expecting value: line 1 column 1 (char 0)
Error processing 'Graphical model': Expecting value: line 1 column 1 (char 0)
Error processing 'Document-term matrix': Expecting value: line 1 column 1 (char 0)
Error processing 'DBpedia': Expecting value: line 1 column 1 (char 0)
Error processing 'Mixture of experts': Expecting value: line 1 column 1 (char 0)
Error processing 'Regularization (mathematics)': Expecting value: line 1 column 1 (char 0)
Error processing 'International Joint Conference on Artificial Intelligence': Expecting value: line




In [24]:
len(title_valid)

0

In [25]:
from llama_index.core import Settings

Settings.chunk_size = 256

In [28]:
%%time
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents_lm = SimpleDirectoryReader("dataset/lm_texts").load_data()
vector_index_lm = VectorStoreIndex.from_documents(documents_lm, embed_model="local:models/bge-small-en-v1.5", similarity_top_k=1)




CPU times: user 3.93 s, sys: 116 ms, total: 4.04 s
Wall time: 3.56 s


#### Save Vector Index Locally

In [29]:
vector_index_lm.storage_context.persist(persist_dir = "models/vector_index_lm_ext_bge")

#### Load local vector index

In [30]:
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir = "models/vector_index_lm_ext_bge")
vector_index_lm = load_index_from_storage(storage_context, embed_model = embed_model, similarity_top_k = 1)

Loading llama_index.core.storage.kvstore.simple_kvstore from models/vector_index_lm_ext_bge/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from models/vector_index_lm_ext_bge/index_store.json.


In [35]:
query_engine_lm = vector_index_lm.as_query_engine(llm=llm)

### Evaluation
**Manual Evlauation**

In [36]:
%env CUDA_LAUNCH_BLOCKING=1

env: CUDA_LAUNCH_BLOCKING=1


In [38]:
query = "What is Claude3?"

print(llm.complete(query))

for engine in [query_engine, query_engine_lm]:
    print("========")
    response = engine.query(query)

    resp_text = response.response
    resp_nodes = response.source_nodes

    print(resp_text)
    print("===Sources====")
    for node in resp_nodes:
        print(node)
    print("=====\n\n")

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


### Evaluate- using GPT call
**Generate query from chunks**

In [39]:
import random

nodes = list(vector_index_lm.docstore.docs.values())
nodes_sampled = [node.text for node in random.sample(nodes, 100)]

In [40]:
def gen_qa(context):
  prompt = f"""
  Given the context, generate the question-answer set

  [Rules]
  - The question and answer must be sufficiently related.
  - The question should be answerable by referring to the content of the context
  - The question must be in one sentence
  - The answer must be three sentences or fewer

  [Example A]
  [context]
  Perplexity
  The most commonly used used of a language model's performance is its perplexity on a given text corpus. Perplexity is a measure of how well a model performs.
  Because language models may overfit to their training data, models are usually evaluatled by their perplexity on a test set of unseen data.

  [question]
  What is perplexity?

  [answer]
  Perplexity is a metric for assessing how effectivly a language model can forcast the contents of a dataset, commonly used as a measure of a languague model's performance

  [Task]
  [context]
  {context}

  [question]
  """
  return get_response(prompt)


In [41]:
def parse_qa(qa):
  question, answer = qa.split("[answer]")
  question = question.strip()
  answer = answer.strip()

  return question, answer

In [42]:
context = nodes[0]
qa = gen_qa(context)
q, a = parse_qa(qa)

In [43]:
context

TextNode(id_='91815665-730c-4ec3-9ae6-af8730821119', embedding=None, metadata={'file_path': '/content/dataset/lm_texts/1.58-bit large language model.txt', 'file_name': '1.58-bit large language model.txt', 'file_type': 'text/plain', 'file_size': 3735, 'creation_date': '2025-06-22', 'last_modified_date': '2025-06-22'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='6189095e-2d97-433a-9588-65b5e77e4197', node_type='4', metadata={'file_path': '/content/dataset/lm_texts/1.58-bit large language model.txt', 'file_name': '1.58-bit large language model.txt', 'file_type': 'text/plain', 'file_size': 3735, 'creation_date': '2025-06-22', 'last_modified_date': '2025-06-22'}, hash='7d61d7daf3ca50619b1af6ce170732e771a981a

In [44]:
qa

'What is a 1.58-bit Large Language Model (1.58-bit LLM)? \n\n[answer]\nA 1.58-bit Large Language Model (1.58-bit LLM) is a version of a transformer large language model that utilizes only three values (-1, 0, and +1) for its weights, enabling it to replace multiplications with additions and reduce storage memory usage. This theoretical restriction aims to improve efficiency without compromising end-task performance and perplexity.'

In [45]:
q, a

('What is a 1.58-bit Large Language Model (1.58-bit LLM)?',
 'A 1.58-bit Large Language Model (1.58-bit LLM) is a version of a transformer large language model that utilizes only three values (-1, 0, and +1) for its weights, enabling it to replace multiplications with additions and reduce storage memory usage. This theoretical restriction aims to improve efficiency without compromising end-task performance and perplexity.')

In [46]:
import pandas as pd
from tqdm import tqdm

qa_pairs = []
questions = []
answers = []
contexts = []
for node in tqdm(nodes_sampled[:]):
    pass
    qa = gen_qa(node)
    try:
        q, a = parse_qa(qa)

        contexts.append(node)
        qa_pairs.append(qa)
        questions.append(q)
        answers.append(a)
    except Exception as exp:
        print(exp)

  3%|▎         | 3/100 [00:04<02:28,  1.53s/it]

too many values to unpack (expected 2)


100%|██████████| 100/100 [01:51<00:00,  1.12s/it]


In [47]:
df_qa = pd.DataFrame({'context': contexts, 'qa_pairs': qa_pairs, 'question': questions, 'answer': answers})

In [48]:
df_qa.to_csv('dataset/ml_qa_raw.csv')
df_qa[:10].to_csv('dataset/ml_qa_test.csv')
df_qa[10:].to_csv('dataset/ml_qa_train.csv')

In [49]:
df_qa

Unnamed: 0,context,qa_pairs,question,answer
0,SNNs are theoretically more powerful than so c...,What are some limitations of using Spike Neura...,What are some limitations of using Spike Neura...,SNNs have training issues and high hardware re...
1,)\n \n (\n 1\n ...,What is the formula for calculating the probab...,What is the formula for calculating the probab...,The formula is the product of the probability ...
2,K\n )\n =\n \n {...,"What is the formula for T(n, K) in the given c...","What is the formula for T(n, K) in the given c...","The formula for T(n, K) is defined as follows:..."
3,p\n (\n ¬\n ...,What does the equation represent in the contex...,What does the equation represent in the context?,The equation represents the probability of not...
4,r\n \n t\n \n ...,What are the implications of having a discount...,What are the implications of having a discount...,A discount factor in Q-function learning deter...
...,...,...,...,...
94,β\n )\n +\n \n e...,What is the goal of the researchers in the con...,What is the goal of the researchers in the con...,The researchers' goal is to estimate the funct...
95,K)=\left\{{\begin{array}{cc}2^{N}&K\geq N\\2\s...,"What happens to the value of T(N, K)/2^N when ...","What happens to the value of T(N, K)/2^N when ...","When N is less than or equal to 2K, the value ..."
96,i\n \n \n ...,"What does the symbol ""i − x̄"" represent in the...","What does the symbol ""i − x̄"" represent in the...","In the context provided, ""i - x̄"" represents t..."
97,A feature vector \n \n \n \n \...,What is a feature vector in the context of doc...,What is a feature vector in the context of doc...,A feature vector in this context is a histogra...


#### Generate query of AI models

In [50]:
new_models = ["Claude 3", "Germini 1.5", "Mixtral8x7B", "Llama3", "PanGu"]

In [53]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("dataset/lm_texts").load_data()

vector_index = VectorStoreIndex.from_documents(documents)
vector_index.storage_context.persist(persist_dir="models/vector_index_lm_text_chatGPT")

In [54]:
from llama_index.core.retrievers import VectorIndexRetriever

retriever = VectorIndexRetriever(
    index=vector_index,
    similarity_top_k=1,
)

In [55]:
model_contexts = []
for model in new_models:
    result = retriever.retrieve(model)[0].text
    model_contexts.append(result)

model_contexts

["Searching for archival traces of James Hemings, Thomas Jefferson's enslaved chef, Klein juxtaposes visualisations of his presence with Jefferson's own charts and tables as the basis for a discussion of data visualisation as it relates to the construction of race.\nThe COST Action 'Distant Reading for European Literary History' is a European networking project bringing together scholars interested in corpus building, quantitative text analysis, and European literary history. It aims to create a network of researchers jointly developing the distant reading resources and methods necessary to change the way European literary history is written. The objectives of the project include coordinating the creation of a multilingual European Literary Text Collection (ELTeC) containing digital full-texts of novels in different European languages.\n\n\n== See also ==\nClose reading\n\n\n== References ==",
 "A recent solution to physical limitations of technology comes from GeriJoy, in the form of 

In [56]:
def gen_qa_new_models(context, model):
    prompt = f"""
    Given the context and model, generate the question-answer set

    [Rules]
    - Generate questions about the model, and create answers to them based on the given context.
    - Create a simple one-sentence question focusing on one aspect of the model's characteristics.
    - Keep the answer to no more than two sentences.

    [Example A]
    [context]
    GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021.[1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters.[2]
    [question]
    When is the initial relase of GPT-J?

    [answer]
    The initial release of GPT-J, an open-source large language model developed by EleutherAI, was in June 2021.

    [Task]
    [context]
    {context}
    [model]
    {model}
    [question]
    """

    return get_response(prompt)

In [57]:
qa_pairs = []
questions = []
answers = []
contexts = []
for model, context in zip(new_models, model_contexts):
    qa = gen_qa_new_models(model, context)
    try:
        q, a = parse_qa(qa)

        contexts.append(context)
        qa_pairs.append(qa)
        questions.append(q)
        answers.append(a)
    except Exception as exp:
        print(exp)


In [58]:
df_qa_new_model = pd.DataFrame({'context': contexts, 'qa_pairs': qa_pairs, 'question': questions, 'answer': answers})
df_qa_new_model.to_csv("dataset/ml_qa_new_models.csv")

In [59]:
df_qa_new_model

Unnamed: 0,context,qa_pairs,question,answer
0,Searching for archival traces of James Hemings...,What is the focus of the COST Action 'Distant ...,What is the focus of the COST Action 'Distant ...,The project aims to create a network of resear...
1,A recent solution to physical limitations of t...,What is Germini 1.5's solution to physical lim...,What is Germini 1.5's solution to physical lim...,"Germini 1.5, developed by GeriJoy, offers virt..."
2,"Ma, Shuming; Wang, Hongyu; Huang, Shaohan; Zha...",What is the title of the technical report rela...,What is the title of the technical report rela...,The technical report related to the BitNet b1....
3,"""LLM-Based Edge Intelligence: A Comprehensive ...",What is the title of the research paper introd...,What is the title of the research paper introd...,The title of the research paper introducing th...
4,A recent solution to physical limitations of t...,What is the main purpose of GeriJoy's virtual ...,What is the main purpose of GeriJoy's virtual ...,GeriJoy's virtual pets for seniors provide a w...


#### Generate Response from the query

In [60]:
!mkdir results

mkdir: cannot create directory ‘results’: File exists


In [61]:
from pathlib import Path
data_path = Path('results/inference_result')
eval_data_path = Path("results/eval_result")


if not data_path.exists():
  Path.mkdir(data_path)

if not eval_data_path.exists():
  Path.mkdir(eval_data_path)

In [62]:
df_test = df_qa[:10]
df_test

Unnamed: 0,context,qa_pairs,question,answer
0,SNNs are theoretically more powerful than so c...,What are some limitations of using Spike Neura...,What are some limitations of using Spike Neura...,SNNs have training issues and high hardware re...
1,)\n \n (\n 1\n ...,What is the formula for calculating the probab...,What is the formula for calculating the probab...,The formula is the product of the probability ...
2,K\n )\n =\n \n {...,"What is the formula for T(n, K) in the given c...","What is the formula for T(n, K) in the given c...","The formula for T(n, K) is defined as follows:..."
3,p\n (\n ¬\n ...,What does the equation represent in the contex...,What does the equation represent in the context?,The equation represents the probability of not...
4,r\n \n t\n \n ...,What are the implications of having a discount...,What are the implications of having a discount...,A discount factor in Q-function learning deter...
5,== Algorithm ==\n\nAfter \n \n \n \n ...,What is the role of the discount factor in the...,What is the role of the discount factor in the...,"The discount factor, represented by gamma, det..."
6,".\n .\n ,\n (\n \n...",What is the Decision Transformer approach in r...,What is the Decision Transformer approach in r...,The Decision Transformer approach models reinf...
7,"In a similar vein, Stephen Marche focuses on t...",What are some critiques of distant reading in ...,What are some critiques of distant reading in ...,Critiques of distant reading in computational ...
8,∑\n (\n \n ...,What does the symbol ∑ represent in the contex...,What does the symbol ∑ represent in the contex...,The symbol ∑ represents the sum of a series of...
9,"Expert systems, that were popular in the 1980s...",How did AI systems in the 1980s differ from mo...,How did AI systems in the 1980s differ from mo...,Expert systems in the 1980s were limited in th...


In [63]:
df_test = df_qa[:10]
llm_answer = []
qe_answer = []

for idx, row in tqdm(df_test.iterrows()):
    query = row['question']
    resp_llm = llm.complete(query)
    resp_qe = query_engine_lm.query(query)
    llm_answer.append(resp_llm)
    qe_answer.append(resp_qe)

0it [00:00, ?it/s]


RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


In [64]:
df_eval = pd.concat([df_test, df_test])
df_eval['answer_llm'] = llm_answer + qe_answer
df_eval['model'] = ['llm']*len(llm_answer) + ['qe']*len(qe_answer)
df_eval.to_csv(f"{data_path}/qa_inferenced.csv", index=False)

ValueError: Length of values (0) does not match length of index (20)

In [65]:
llm_answer = []
qe_answer = []

for idx, row in tqdm(df_qa_new_model.iterrows()):
    query = row['question']
    resp_llm = llm.complete(query)
    resp_qe = query_engine_lm.query(query)
    llm_answer.append(resp_llm)
    qe_answer.append(resp_qe)


0it [00:00, ?it/s]


RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


In [66]:
df_eval = pd.concat([df_qa_new_model, df_qa_new_model])
df_eval['answer_llm'] = llm_answer + qe_answer
df_eval['model'] = ['llm']*len(llm_answer) + ['qe']*len(qe_answer)
df_eval.to_csv(f"{data_path}/qa_models_inferenced.csv", index=False)

ValueError: Length of values (0) does not match length of index (10)