# ArChat Evaluation

## QA generation

We keep only the 10 smallest articles, ranked by page count.

In [1]:
import os
from PyPDF2 import PdfReader

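def keep_smallest_articles(directory, top_n=10):
    """Keep only the top_n PDFs with the fewest pages in directory.

    Warning: every other PDF in the directory is permanently deleted,
    including any PDF that could not be read.
    """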
    if not os.path.exists(directory):
        raise FileNotFoundError(f"The directory '{directory}' does not exist.")

    pdf_page_counts = {}

    for file_name in os.listdir(directory):
        if file_name.endswith('.pdf'):
            file_path = os.path.join(directory, file_name)
            try:
                reader = PdfReader(file_path)
                pdf_page_counts[file_name] = len(reader.pages)
            except Exception as e:
                print(f"Error reading {file_name}: {e}")

    if not pdf_page_counts:
        raise ValueError("No PDF files found in the directory.")

    # Sort articles by the number of pages (ascending) and select the top N
    sorted_articles = sorted(pdf_page_counts.items(), key=lambda x: x[1])
    smallest_articles = [file for file, pages in sorted_articles[:top_n]]

    # Remove files not in the smallest_articles list
    for file_name in os.listdir(directory):
        if file_name.endswith('.pdf') and file_name not in smallest_articles:
            file_path = os.path.join(directory, file_name)
            try:
                os.remove(file_path)
                print(f"Removed {file_name}")
            except Exception as e:
                print(f"Error removing {file_name}: {e}")

In [2]:
directory = "data"
keep_smallest_articles(directory)
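Since `keep_smallest_articles` deletes files in place, it can help to preview the selection first. The sketch below is a hedged illustration, not part of the original notebook: `select_smallest` and `example_counts` are hypothetical names, and the page counts are made up. It picks the `top_n` entries with the fewest pages and breaks ties alphabetically so the result is deterministic.

```python
import heapq

def select_smallest(page_counts, top_n=10):
    """Return the names of the top_n entries with the fewest pages.

    Ties are broken alphabetically by file name so the result is deterministic.
    """
    ranked = heapq.nsmallest(
        top_n, page_counts.items(), key=lambda item: (item[1], item[0])
    )
    return [name for name, _ in ranked]

# Hypothetical page counts, for illustration only
example_counts = {"a.pdf": 12, "b.pdf": 4, "c.pdf": 9, "d.pdf": 4}

selected = select_smallest(example_counts, top_n=2)
to_delete = sorted(set(example_counts) - set(selected))

print("keep:", selected)
print("would delete:", to_delete)
```

Printing the two lists before calling `os.remove` makes it easy to confirm that the right files survive.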

Next, we generate responses from our model.

In [1]:
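import csv

# Assumption: OllamaLLM and ChatPromptTemplate were previously pulled in through
# the wildcard imports below; they are imported explicitly here for clarity.
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import OllamaLLM

from prompts import *
from helpers import *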

def generate_responses(input_csv, output_csv):
    llm = OllamaLLM(model="deepseek-r1:1.5b")

    rows = []
    with open(input_csv, 'r', newline='', encoding='utf-8') as infile:
        reader = csv.reader(infile, delimiter=';')
        for row in reader:
            directoryfile = row[0]
            question = row[1]

            # Load PDF to vector store and retrieve relevant documents
            try:
                vectorstore = load_pdf_to_vectorstore(directoryfile)
                relevant_docs = retrieve_from_vectorstore(vectorstore, query=question)
                prompt = ChatPromptTemplate.from_template(question_answering_prompt)
                prompt_with_context = prompt.format(context=relevant_docs, question=question)

                # Get response from the model
                response = llm.invoke(prompt_with_context)
                print("Question: ", question, "response: ", response)
                row.append(response.strip())
            except Exception as e:
                row.append(f"Error processing file: {e}")
            
            rows.append(row)

    # Write results back to the CSV
    with open(output_csv, 'w', newline='', encoding='utf-8') as outfile:
        writer = csv.writer(outfile, delimiter=';')
        writer.writerows(rows)

    print(f"Responses have been appended to {output_csv}")
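The model's responses begin with a `<think>` reasoning block, which `generate_responses` stores in the CSV unchanged. If only the final answer is wanted for evaluation, the response could be post-processed before `row.append(...)`. The sketch below is one possible approach, not part of the original notebook: `strip_think` is a hypothetical helper, and it assumes the reasoning block is closed with a `</think>` tag.

```python
import re

# Assumption: reasoning is wrapped as <think> ... </think>
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def strip_think(text):
    """Remove <think>...</think> blocks and return the remaining answer."""
    cleaned = _THINK_BLOCK.sub("", text)
    # An unclosed <think> means the reasoning was cut off before an answer
    # appeared, so everything from that tag onward is dropped.
    cleaned = cleaned.split("<think>", 1)[0]
    return cleaned.strip()

example = "<think>\nLet me read the abstract.\n</think>\nThe paper proposes X."
print(strip_think(example))
```

Inside `generate_responses`, this would amount to replacing `response.strip()` with `strip_think(response)`. A response that ends inside an unclosed reasoning block yields an empty string, which is easy to flag during evaluation.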

In [2]:
input_csv = "ours_QA.csv"
output_csv = "ours_QA_responses.csv"
generate_responses(input_csv, output_csv)

Question:  What is the main contribution of the paper? response:  <think>
Okay, so I'm trying to figure out how to answer this question about the main contributions of the paper. The user provided a detailed abstract from the paper "A Multilingual Deep Learning Approach to Sequence-to-Sequence Translation." Let me read through it again carefully.

The paper introduces a multilingual deep neural network for translation. They mention using a multilayered Long Short-Term Memory (LSTM) network. I remember that LSTMs are good at handling sequential data, which is why they're used here for both input and output sequences in translation tasks.

Let me break down the key points from the abstract:

1. **Multilingual Approach**: The authors trained a single model across multiple languages by using two LSTM networks—one for the input sentence and another for the output sentence. This helps the model generalize better since it's learning common patterns that can be applied across different languag


Question:  What are the performance outcomes of the proposed method on CPU-based inference? response:  <think>
Okay, so I'm trying to figure out the performance outcomes of the proposed method on CPU-based inference. From what I understand, they developed a method called "LlamaX," which is an optimization for training large language models with quantization techniques, specifically in the context of AI models.

I remember that quantization refers to reducing the precision of weights and activations in neural networks to save memory and computation. This makes models faster and more efficient. They mention something about Fp8 quantization, which I think stands for floating-point precision 8, and they're using this with LlamaX to compress both parameters and activations.

The key findings from their experiments show that when they used LlamaX with Fp8 quantization on CPU hardware, the models performed better than those trained without compression. Specifically, they achieved higher accur


Question:  What problem does the paper address, and what is the proposed solution? response:  <think>
Alright, I need to figure out what the paper addresses and how it proposes solving that problem. Let me start by reading through the provided information carefully.

The user has shared a detailed conversation history where they asked about their research paper's topic and their own thoughts. Now, in response, they provided an overview of the paper's content, including its introduction, methodology, experiments, results, and conclusion. They've also included some background on weighted low-rank approximations.

Okay, so from this, it seems like the paper is about solving a specific type of optimization problem where we have to find a low-rank approximation under certain constraints, particularly when there are weights involved. The authors propose an EM algorithm for handling these cases and compare it with other methods like sum-squared error or logistic regression.

Wait, actually, i


Question:  How does the weighted low-rank approximation differ from the unweighted case? response:  <think>
Okay, so I'm trying to figure out how the weighted low-rank approximation differs from the unweighted one. Let me start by recalling what each method is about.

In the unweighted case, we have a matrix X that's supposed to be a rank-q approximation of A. The goal is to minimize the squared error ||X - A||²_F over all rank-q matrices X. The key equations here involve solving linear systems and pseudo-inverses because the problem is convex but doesn't have a unique solution, leading to critical points which include both global minima and saddle points.

Now, when weights are introduced, we're looking at a weighted version where each entry of A or X gets multiplied by a weight matrix W. So, instead of ||X - A||²_F, it's ||W ⊗ (X - A)||²_F or something like that? Wait, actually, let me get this straight.

The error term is ||W ⊗ X - A||²_F, where ⊗ denotes the Kronecker product. So e