In [2]:
from tqdm import tqdm

In [None]:
# Initialize model
from mini.inference import MINI
mini = MINI(model_name='meta-llama/Llama-3.2-3B-Instruct')
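Loading a 3B-parameter model is slow, so for a dry run of the dataset-building loop it can help to have a lightweight stand-in with the same interface shape. The sketch below is hypothetical: `FakeMINI` is not part of the `mini` package, and it only mirrors what this notebook relies on, namely that `query()` returns an `(answer, sources)` pair where each source is a dict with an `'excerpt'` key. Its "retrieval" is naive word overlap and its "answer" just echoes the first hit.

```python
from typing import Dict, List, Tuple


class FakeMINI:
    """Hypothetical offline stand-in for MINI, for dry-running the evaluation loop.

    Mimics only the interface used in this notebook: query() returns
    (answer, sources), where each source is a dict with an 'excerpt' key.
    """

    def __init__(self, corpus: List[str]):
        self.corpus = corpus

    def query(self, question: str) -> Tuple[str, List[Dict[str, str]]]:
        # Naive "retrieval": keep passages sharing at least one lowercase token with the question
        words = set(question.lower().split())
        hits = [p for p in self.corpus if words & set(p.lower().split())]
        sources = [{'excerpt': p} for p in hits]
        # Placeholder "generation": echo the first hit, or a fixed fallback
        answer = hits[0] if hits else "No relevant context found."
        return answer, sources


fake_mini = FakeMINI(["AWK processes text line by line.", "chmod changes file permissions."])
demo_answer, demo_sources = fake_mini.query("What is AWK used for?")
print(demo_answer)
print([s['excerpt'] for s in demo_sources])
```

Passing `fake_mini` wherever `mini` is used lets the loop, the JSON export, and any downstream scoring be exercised without loading the model.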

In [4]:
sample_queries = [
    # Unix, lecture 1
    "What does FLOSS stand for and what are its key freedoms?",
    "What is the difference between GNU GPL and BSD licenses?",
    "What is the historical significance of AT&T in the development of Unix?",
    "What is the principle of modularity in Unix systems?",
    "How does the Unix principle of privilege separation enhance security?",
    # Unix, lecture 5
    "What are the main files used for storing user account information in Unix systems?",
    "How are users identified during login compared to file and process ownership in Unix?",
    "What are the three types of accounts that can be found in Unix systems?",
    "What is the purpose of the /etc/shadow file in Unix?",
    "How does the chmod command modify file permissions in numeric mode?",
    # Unix, lecture 10
    "What is AWK used for?",
    "How does AWK identify records and fields in a text file?",
    "What is the purpose of the BEGIN and END sections in an AWK script?",
    "Which special variable in AWK represents the current line being processed?",
    "What is the default behavior of an AWK script when no condition is specified?",
    # AML: Bayes classification
    "What is the Bayes classifier, and why is it optimal?",
    "How is risk defined for a classification rule in Bayes theory?",
    "What is the 0-1 loss function in classification?",
    "How does the kernel density estimator work in empirical Bayes methods?",
    "What are the main practical remarks about the k-nearest neighbors (kNN) method?",
    # AML: logistic regression
    "What is logistic regression based on?",
    "What are some applications of logistic classification?",
    "What estimation methods are used for logistic regression?",
    "Why is the logistic function used to model probabilities in logistic regression?",
    "What does the IRLS algorithm aim to compute in logistic regression?",
    # AML: classifier evaluation
    "What are the key measures for evaluating classifiers in a one-class approach?",
    "How is precision calculated in the evaluation of classifiers?",
    "What does FDR represent in classification evaluation?",
    "What is the purpose of k-fold cross-validation in classification?",
    "Why should resubstitution estimators not be used for comparing classifiers?"
]

expected_responses = [
    # Unix, lecture 1
    "FLOSS stands for Free Libre Open Source Software, emphasizing freedoms to use, analyze, modify, and redistribute software along with access to source code.",
    "GNU GPL ensures users' freedoms are protected via copyleft, requiring modifications to be distributed under the same license, while BSD licenses allow proprietary use by imposing fewer restrictions.",
    "AT&T developed the first Unix system in 1969 and distributed it freely to universities, fostering significant advancements such as the TCP/IP stack at Berkeley.",
    "The principle of modularity in Unix emphasizes creating simple, standalone components that can be combined into complex workflows, enabling flexibility and maintainability.",
    "Privilege separation in Unix enhances security by assigning minimal required privileges to processes, using pseudo-users for services, and separating different file types into distinct directories.",
    # Unix, lecture 5
    "/etc/passwd provides user account information, /etc/shadow stores encrypted passwords and aging limits, and /etc/group specifies system groups.",
    "During login, users are identified by their user names, but file and process ownership is recorded using numerical user IDs (UIDs) and group IDs (GIDs).",
    "The three types of accounts in Unix systems are root (UID 0, GID 0) with unlimited privileges, regular users, and system accounts.",
    "/etc/shadow stores encrypted passwords and aging limits, is not readable by regular users, and enhances security by moving encrypted passwords out of the publicly readable /etc/passwd file.",
    "In numeric mode, chmod encodes permissions as one octal digit each for the user, group, and others, where each digit is the sum of 4 (read), 2 (write), and 1 (execute); for example, chmod 754 gives rwxr-xr--.",
    # Unix, lecture 10
    "AWK is used for analyzing text files or text streams, treating them as databases where records are identified with lines and fields with blank-separated words.",
    "AWK identifies records with lines and fields with blank-separated words by default, but the record separator (RS) and field separator (FS) variables can be redefined.",
    "The BEGIN section in an AWK script is used for instructions executed before processing any input, and the END section is for instructions executed after processing all input.",
    "The special variable $0 in AWK represents the current line being processed.",
    "If no condition is specified in an AWK script, the default behavior is to print the current line to the standard output.",
    # AML: Bayes classification
    "The Bayes classifier d*(x) = argmax_i π_i p(x | i), where π_i is the prior probability of class i and p(x | i) is its class-conditional density, minimizes the risk R(d) under the 0-1 loss. It is optimal because no other classification rule achieves a lower risk (the Bayes risk) for the given distribution.",
    "The risk of a classification rule d is defined as R(d) = E_{X,Y}[l(Y, d(X))], where (X, Y) follows the joint distribution P_{X,Y} and l is a loss function. For the 0-1 loss function, it equals the probability of misclassification, P(d(X) ≠ Y).",
    "The 0-1 loss function is defined as l_{0-1}(i, j) = I(i ≠ j), where I is the indicator function. It incurs a loss of 1 for an incorrect decision and 0 for a correct one.",
    "The kernel density estimator approximates a density p(x) using a kernel function K and a smoothing parameter (bandwidth) h_n: p̂_n(x) = (1 / (n h_n)) Σ_{i=1}^{n} K((x − X_i) / h_n), where the sum runs over the n data points X_i. In empirical Bayes classification, such estimates of the class-conditional densities p(x | i) are plugged into the Bayes rule in place of the unknown true densities.",
    "The kNN method assigns an observation to the most frequent class among its k nearest neighbors. Larger values of k reduce variance, but small values such as k = 1 or k = 3 are often used in practice. A naive search compares each new observation with all n training points, costing O(n) time per classification; spatial indexing structures can reduce this cost.",
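    # AML: logistic regression
    "Logistic regression is based on the logistic regression model, which describes how the probability π(x) = P(Y = 1 | X = x) of a binary response depends on the predictors through the logistic function π(x) = 1 / (1 + exp(−(β_0 + β^T x))); the model is fitted to data by estimating the coefficients β_0 and β.",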
    "Logistic classification is used in applications like displaying adverts on web pages with keywords as predictors and reliability scoring of bank clients based on their profile.",
    "Estimation methods for logistic regression include Maximum Likelihood (ML), Blyth estimator, Regularized estimators (Lasso and Ridge), and Iteratively Reweighted Least Squares (IRLS).",
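    "The logistic function is used because it maps any real value into the interval (0, 1), so its output can be read as a probability, and because the log-odds log(π(x) / (1 − π(x))) is linear in the predictors, which lets the coefficients be interpreted through odds and odds ratios.",
    "The IRLS algorithm computes the maximum likelihood estimates of the logistic regression parameters by repeatedly solving a Weighted Least Squares problem for modified (working) responses, updating the weights and the responses at each iteration.",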
    # AML: classifier evaluation
    "In a one-class approach, key measures include False Discovery Rate (FDR), True Positive Rate (TPR), precision, and recall, focusing on the performance for the distinguished class.",
    "Precision is calculated as |t ∩ t̂| / |t̂|, where t is the set of actual positives and t̂ is the set of predicted positives; that is, the number of true positives divided by the number of predicted positives.",
    "FDR, or False Discovery Rate, represents the proportion of false positives among all predicted positives and is equal to 1 minus precision.",
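    "K-fold cross-validation estimates a classifier's error by dividing the data into K parts, training on K − 1 parts and testing on the remaining part, repeating this so that each part serves once as the test set, and averaging the K error estimates; this reduces the optimistic bias of evaluating on the training data.",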
    "Resubstitution estimators should not be used for comparing classifiers as they are overly optimistic, especially when classifiers differ in the number of parameters, leading to misleading conclusions."
]
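Because these reference answers act as ground truth for the evaluation, it can be worth checking the formulas they quote. A minimal sketch for two of them, on made-up toy inputs: precision = |t ∩ t̂| / |t̂| together with FDR = 1 − precision, and the chmod numeric mode in which each octal digit is the sum of 4 (read), 2 (write), and 1 (execute). The helper names are illustrative only.

```python
from fractions import Fraction


def precision_and_fdr(actual: set, predicted: set):
    """Precision = |t ∩ t̂| / |t̂| and FDR = |t̂ - t| / |t̂| for sets of item ids."""
    true_pos = len(actual & predicted)
    false_pos = len(predicted - actual)
    # Fractions keep the results exact, so precision + FDR is exactly 1
    precision = Fraction(true_pos, len(predicted))
    fdr = Fraction(false_pos, len(predicted))
    return precision, fdr


def symbolic_to_octal(perms: str) -> str:
    """Convert e.g. 'rwxr-xr--' to '754' (only r, w, x and '-' are supported)."""
    weights = {'r': 4, 'w': 2, 'x': 1}
    digits = []
    for start in range(0, 9, 3):  # user, group, others
        triple = perms[start:start + 3]
        digits.append(str(sum(weights[c] for c in triple if c != '-')))
    return ''.join(digits)


t = {1, 2, 3, 4}   # actual positives (toy data)
t_hat = {3, 4, 5}  # predicted positives (toy data)
prec, fdr = precision_and_fdr(t, t_hat)
print(prec, fdr, prec + fdr)           # 2/3 1/3 1
print(symbolic_to_octal('rwxr-xr--'))  # 754
```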

In [None]:
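dataset = []

# zip() silently truncates on a length mismatch, so check the pairing first
assert len(sample_queries) == len(expected_responses), "each query needs exactly one reference answer"

# Pass total= so tqdm can show a proper progress bar (zip objects have no len())
for query, reference in tqdm(zip(sample_queries, expected_responses), total=len(sample_queries)):
    # mini.query returns the generated answer and the retrieved source chunks
    answer, sources = mini.query(query)
    # Keep only the text excerpt of each retrieved chunk
    contexts = [source['excerpt'] for source in sources]
    dataset.append({
        'question': query,
        'retrieved_contexts': contexts,
        'response': answer,
        'reference': reference
    })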
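Before writing the file, a quick structural check can catch records where retrieval returned nothing or a field is empty. A sketch, assuming the four keys used above are the intended schema; `validate_record` is a hypothetical helper, not part of `mini`.

```python
from typing import Any, Dict, List

EXPECTED_KEYS = {'question', 'retrieved_contexts', 'response', 'reference'}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one evaluation record (empty if it looks fine)."""
    problems = []
    missing = EXPECTED_KEYS - set(record)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # Text fields must be non-empty strings
    for key in ('question', 'response', 'reference'):
        if key in record and not (isinstance(record[key], str) and record[key].strip()):
            problems.append(f"{key} should be a non-empty string")
    contexts = record.get('retrieved_contexts')
    if contexts is not None:
        if not isinstance(contexts, list) or not all(isinstance(c, str) for c in contexts):
            problems.append("retrieved_contexts should be a list of strings")
        elif not contexts:
            problems.append("retrieved_contexts is empty (nothing was retrieved)")
    return problems


good = {'question': 'What is AWK used for?', 'retrieved_contexts': ['AWK scans text.'],
        'response': 'Text processing.', 'reference': 'Analyzing text files.'}
bad = {'question': 'What is AWK used for?', 'retrieved_contexts': [], 'response': ''}
print(validate_record(good))  # []
print(validate_record(bad))
```

In the notebook it could be run over the collected records, for example `[(i, p) for i, r in enumerate(dataset) if (p := validate_record(r))]`.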

In [6]:
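import json

# Write UTF-8 so symbols such as π and ≠ in the reference answers stay readable
with open('mini_dataset.json', 'w', encoding='utf-8') as fout:
    json.dump(dataset, fout, ensure_ascii=False, indent=2)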
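To confirm the output is valid JSON and that non-ASCII symbols survive, the dump can be read back and compared with the in-memory records. A self-contained sketch on a toy record, written to a temporary file rather than `mini_dataset.json`, assuming the file is written with `encoding='utf-8'` and `ensure_ascii=False`. With the `json.dump` defaults, non-ASCII characters are instead escaped as `\uXXXX` sequences, which also round-trip correctly but are harder to read.

```python
import json
import os
import tempfile

records = [{
    'question': 'What is the 0-1 loss function in classification?',
    'retrieved_contexts': ['l(i, j) = I(i ≠ j)'],
    'response': 'It is 1 for a wrong decision and 0 otherwise.',
    'reference': 'The 0-1 loss function is defined as l(i, j) = I(i ≠ j).',
}]

# Write to a temporary path so the real dataset file is not touched
fd, path = tempfile.mkstemp(suffix='.json')
os.close(fd)
try:
    with open(path, 'w', encoding='utf-8') as fout:
        json.dump(records, fout, ensure_ascii=False, indent=2)
    with open(path, encoding='utf-8') as fin:
        reloaded = json.load(fin)
    with open(path, encoding='utf-8') as fin:
        raw_text = fin.read()
finally:
    os.remove(path)

print(reloaded == records)  # True
print('≠' in raw_text)      # True: stored as the literal character, not as \u2260
```

In the notebook, the same check would compare `json.load(...)` on `mini_dataset.json` against `dataset`.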