In [8]:

from datasets import load_dataset

dataset = load_dataset("lumolabs-ai/Lumo-Fart-DS-Instruct")

print("Dataset structure:")
print(dataset)

Dataset structure:
DatasetDict({
    train: Dataset({
        features: ['question', 'answer', 'chunk'],
        num_rows: 475152
    })
    test: Dataset({
        features: ['question', 'answer', 'chunk'],
        num_rows: 25008
    })
})


In [9]:
import pandas as pd

df = dataset['train'].to_pandas()  # Arrow-based conversion instead of building the frame row by row
pd.set_option('display.max_colwidth', None)  # Print the whole question and answer


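def printQuestions(df, n):
    """Print the first n question/answer pairs of df."""
    shown = min(n, len(df))  # df may hold fewer than n rows
    print(f"Printing {shown} questions from dataframe of size {len(df)}")
    for idx, row in enumerate(df.head(n).itertuples(index=False), start=1):
        print(f"Question {idx}:\n{row.question}\n")
        print(f"Answer {idx}:\n{row.answer}\n")
        print("-" * 80)

def filter_questions_by_all_keywords(df, keywords):
    """Keep rows whose question contains every keyword (case-insensitive)."""
    filtered_df = df.copy()
    for kw in keywords:
        # regex=False treats each keyword as literal text, so '.' or '(' are not regex syntax
        mask = filtered_df['question'].str.contains(kw, case=False, na=False, regex=False)
        filtered_df = filtered_df[mask]
    return filtered_df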

# printQuestions(df,3)




In [10]:
filtered_df = filter_questions_by_all_keywords(df, ['anchor'])
printQuestions(filtered_df, 10)

Printing 10 questions from dataframe of size 7636
Question 1:
What does 'anchor.web3.sendAndConfirmTransaction' do?

Answer 1:
The `anchor.web3.sendAndConfirmTransaction` function is used to send a transaction to the Solana blockchain and wait for its confirmation. This function takes three parameters:

1. **connection**: This is the connection object to the Solana cluster, which allows interaction with the blockchain.
2. **transaction**: This is the transaction object that contains the instructions to be executed on the blockchain.
3. **signers**: This is an array of signers (usually the accounts that need to sign the transaction) that are required to authorize the transaction.

Once the transaction is sent, the function will wait for the transaction to be confirmed, meaning it will check that the transaction has been processed and included in a block on the Solana blockchain. This ensures that the operation has been successfully completed before proceeding with further actions in the

In [11]:
qa = [
{
"question": "You are building a Solana program using the Pinocchio library. If you want to define a Solana program entrypoint that does not parse the full input right away (to save compute units until you actually need to parse), which macro should you use and why?",
"answer": "You should use the `lazy_program_entrypoint!` macro, because it only wraps the raw program input into an `InstructionContext` and defers parsing of accounts and instruction data until those values are explicitly requested."
},

{
"question": "You are building a Solana program using the Pinocchio library. How does enabling the `std` feature in `pinocchio` affect the behavior of the `msg!` macro compared to a `no_std` setup?",
"answer": "Enabling the `std` feature allows `msg!` to perform Rust’s built-in string formatting (e.g., `msg!(\"Hello {}\", var)`). In a `no_std` environment, formatting is not supported by `msg!`, so it can only output literal messages or fixed data."
},

{
"question": "You are building a Solana program using the Pinocchio library. How does `pinocchio` convert the raw byte array from the runtime into the separate `program_id`, `accounts`, and `instruction_data` slices when using `entrypoint!`?",
"answer": "It uses zero-copy deserialization. The SVM loader provides a serialized byte array, and `pinocchio` reads that directly into `program_id` (as a `Pubkey`), `accounts` (as slices of `AccountInfo`), and the remainder as `instruction_data`, without extra allocations."
},

{
"question": "You are building a Solana program using the Pinocchio library and want to display a base integer (lamports) as if it has nine decimal places (one SOL). Which formatting option should you use in `pinocchio-log` to reflect this accurately?",
"answer": "Use the `Precision` attribute (e.g. `\"{:.9}\"`) for the integer value so that it is formatted as if it has nine decimal digits. For example, `1_000_000_000` would appear as `1.000000000` if you choose `precision=9`."
},

{
"question": "You are building a Solana program using Pinocchio. In what way does `pinocchio::entrypoint!` typically consume fewer compute units than the standard `solana-program` entrypoint macro?",
"answer": "`pinocchio::entrypoint!` avoids additional copies or allocations by processing the input in a zero-copy manner, leading to lower compute unit usage compared to the generic approach in `solana-program`."
},


{
"question": "You are building a Solana program using Pinocchio and you need advanced string formatting but want to keep compute usage as low as possible. What is the recommended solution?",
"answer": "Use the `pinocchio-log` crate’s `log!` macro. It provides a custom, lightweight formatting system that supports basic string and numeric formatting at a lower compute cost than Rust’s built-in `format_args!`."
},


{
"question": "In `@solana/kit`, how are addresses handled differently compared to web3.js 1.x, and why is there not a `PublicKey` class?",
"answer": "Addresses are plain strings typed as `Address`, and the `PublicKey` class is removed. This improves tree-shakability and reduces bundle size in Kit."
},
{
"question": "When using `@solana/kit` in environments without native Web Crypto Ed25519 support, what should a developer do to ensure signing still works?",
"answer": "They should install and call `install()` from `@solana/webcrypto-ed25519-polyfill`, which mimics the Ed25519 Web Crypto API functionality in unsupported environments."
},
{
"question": "How can you estimate and limit compute unit usage for a transaction in `@solana/kit`?",
"answer": "Estimate compute units using `getComputeUnitEstimateForTransactionMessage` and prepend a `SetComputeUnitLimitInstruction` using `prependTransactionMessageInstruction`."
},


{
"question": "When migrating to `@solana/kit`, how can developers convert legacy web3.js `VersionedTransaction` objects?",
"answer": "Use `fromVersionedTransaction()` from `@solana/compat` to transform the legacy transaction object into a Kit-compatible message for signing or sending."
},

{
"question": "When using the Solana Gill library, how can you pass a custom AbortSignal to a Solana RPC request in gill?",
"answer": "You can pass a JavaScript AbortController signal to the .send() method of any RPC call. For example, by creating an AbortController and then calling rpc.getSlot().send({ abortSignal: controller.signal })."
},

{
"question": "When using the Solana Gill library, if you want to generate a random keypair that can be later saved to a file, which function should you use?",
"answer": "Use the generateExtractableKeyPairSigner() function so you can extract and save the secret key."
},

{
"question": "When using the Solana Gill library: How can you manually override the default computeUnitLimit set by a token builder like buildCreateTokenTransaction?",
"answer": "You can pass your own computeUnitLimit parameter to the builder function. For example: buildCreateTokenTransaction({ computeUnitLimit: 1_400_000 })"
},




{
"question": "What is the recommended Solana version for Anchor v0.31.0?",
"answer": "Anchor v0.31.0 recommends Solana v2.1.0, which involves the Agave transition, renaming some of the Solana binaries to Agave.",
},

{
"question": "If a developer doesn't want the default TypeScript test files upon workspace initialization, which template can they specify?",
"answer": "They can use the Rust test template by running `anchor init --test-template rust` to avoid generating a TypeScript test scaffold.",
},
{
"question": "When using the Solana Gill library: How do you specify a custom freeze authority when creating a token?",
"answer": "Pass freezeAuthority as a field to getCreateTokenInstructions. It will be encoded into the InitializeMint instruction."
},
{
"question": "When using the Solana Gill library: What builder function can you use to generate all instructions for creating a token with metadata?",
"answer": "You can use buildCreateTokenTransaction to generate a transaction with all required instructions for mint creation, metadata setup, and optional configuration like compute unit limit."
},
{
"question": "When using the Solana Gill library: How does gill support sending transactions without manually signing first?",
"answer": "You can pass a CompilableTransactionMessage to sendAndConfirmTransactionWithSignersFactory, and it will automatically sign and send it."
},
{
"question": "Why is ProgramError::Custom(0) treated specially in the Pinocchio error representation?",
"answer": "ProgramError::Custom(0) is treated specially and mapped to a unique builtin error code (CUSTOM_ZERO) to prevent ambiguity and overlap with other error values."
},

#######################################################



#######################################################

  {
    "question": "In `@solana/kit`: When using the `getOptionCodec` helper with a fixed-size inner codec, how can you ensure the outer `Option` codec is also fixed-size?",
    "answer": "You must set the `noneValue` option to `\"zeroes\"`, which pads the `None` case with zero bytes to match the fixed size of the inner codec.",
    "sources": [
      "File: /Users/caiser/Desktop/kit/docs/content/docs/concepts/codecs.mdx, snippet: \"Additionally, if the item is a `FixedSizeCodec`, you may set the `noneValue` option to `\"zeroes\"` to also make the returned Option codec a `FixedSizeCodec`.\""
    ]
  },

  {
    "question": "In `@solana/kit`: How can a transaction in `@solana/kit` be signed partially by multiple signers without scanning each instruction manually to find needed signers?",
    "answer": "Attach signers (e.g. KeyPairSigners) directly to instructions and the fee payer, then call `signTransactionMessageWithSigners` which automatically aggregates all required signers and signs in one pass.",
    "sources": [
      "File: /Users/caiser/Desktop/kit/docs/content/docs/getting-started/build-transaction.mdx, snippet: \"All that's left to do is call `signTransactionMessageWithSigners`. This helper function will extract and deduplicate all the signers from the transaction message.\""
    ]
  },
  {
    "question": "What happens when you attempt to decode an account that does not exist on-chain using `fetchEncodedAccount` from `@solana/kit`?",
    "answer": "The function returns a `MaybeEncodedAccount` with its `exists` field set to `false`, and only the address is known. You can then check `exists` or assert existence with a helper like `assertAccountExists`.",
    "sources": [
      "File: /Users/caiser/Desktop/kit/docs/content/docs/getting-started/fetch-account.mdx, snippet: \"If it does, we have access to all its fields. Otherwise, we just know its address — since it was provided as an input.\""
    ]
  },



  {
    "question": "How does `signTransactionMessageWithSigners` differ from a low-level manual signing approach in `@solana/kit`?",
    "answer": "It collects all signers attached to instructions or the fee payer within the `CompilableTransactionMessage`, de-duplicates them, and signs the transaction automatically, whereas manual signing requires discovering and passing each signer explicitly.",
    "sources": [
      "File: /Users/caiser/Desktop/kit/docs/content/docs/getting-started/build-transaction.mdx, snippet: \"As the message is signed, it is compiled… This helper function will extract and deduplicate all the signers…\""
    ]
  },
  {
    "question": "In `@solana/kit`: Why does `@solana/kit` not return a transaction signature directly from a `sendAndConfirmTransaction` call?",
    "answer": "Because the signature is deterministically known beforehand (the fee payer’s signature), so `@solana/kit` provides `getSignatureFromTransaction` to retrieve it. The library separates the concept of sending from signature retrieval.",
    "sources": [
      "File: /Users/caiser/Desktop/kit/docs/content/docs/getting-started/send-transaction.mdx, snippet: \"Notice that the `sendTransaction` or `sendAndConfirmTransaction` functions do not return the transaction signature… The transaction signature is accessible as soon as the transaction is signed…\""
    ]
  },


  {
    "question": "In `@solana/kit`, how can you add partial signers at the instruction level rather than only specifying them at transaction creation?",
    "answer": "Pass a `TransactionSigner` object to fields like `newAccount` or `payer` when building instructions. The transaction message keeps track of all signers so you don’t have to re-specify them at final signing.",
    "sources": [
      "File: /Users/caiser/Desktop/kit/docs/content/docs/getting-started/instructions.mdx, snippet: \"We can build this instruction using… For example: { payer: client.wallet, newAccount: mint }… both are signers…\""
    ]
  },

]



len(qa)

25
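
Before spending API calls on the list above, it can help to sanity-check it. The sketch below is not part of the original notebook: `validate_qa` is a hypothetical helper that flags entries with a missing or empty `question` or `answer`, and questions that appear more than once after normalizing case and whitespace. The sample list is illustrative only, not benchmark data.

```python
def validate_qa(qa_pairs):
    """Return a list of human-readable problems found in qa_pairs (empty if none)."""
    problems = []
    seen = {}  # normalized question text -> index of first occurrence
    for i, item in enumerate(qa_pairs):
        # Both fields must be non-empty strings.
        for key in ("question", "answer"):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: missing or empty '{key}'")
        # Detect repeated questions, ignoring case and whitespace differences.
        question = item.get("question")
        if isinstance(question, str):
            normalized = " ".join(question.split()).lower()
            if normalized in seen:
                problems.append(f"entry {i}: duplicate of entry {seen[normalized]}")
            else:
                seen[normalized] = i
    return problems


# Tiny illustrative list (hypothetical entries).
sample = [
    {"question": "What is a PDA?", "answer": "A program derived address."},
    {"question": "what is a  PDA?", "answer": "Same question, different spacing."},
    {"question": "Unanswered question", "answer": ""},
]
for problem in validate_qa(sample):
    print(problem)
```

In the notebook this would be called as `validate_qa(qa)`, with an empty list meaning no problems were detected.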

In [12]:
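import json
import os
import re

import requests

# Prefer the OPENROUTER_API_KEY environment variable; otherwise paste your key below.
OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY", "")  # PUT YOUR OPENROUTER KEY HERE
OPENROUTER_API_URL = "https://openrouter.ai/api/v1/chat/completions"


def ask_llm(question, model="google/gemini-2.5-pro-preview-03-25"):
    """Send one question to a model via OpenRouter and return the reply text."""
    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "Content-Type": "application/json"
    }

    data = {
        "model": model,
        "temperature": 0.0,
        "messages": [
            {"role": "user", "content": question}
        ]
    }

    try:
        # A timeout keeps a stalled connection from blocking a worker thread forever.
        response = requests.post(OPENROUTER_API_URL, headers=headers, json=data, timeout=120)
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
    except (requests.exceptions.RequestException, KeyError, IndexError, ValueError) as e:
        # Also covers responses that are not JSON or lack the expected fields.
        print(f"Error making request to OpenRouter: {e}")
        # Return text rather than a dict so callers always receive a string.
        return f"[request failed: {e}]"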
    
def evaluate_answer(question, llm_answer, reference_answer, model="google/gemini-2.5-pro-preview-03-25"):

    prompt = f"""
    You will evaluate an LLM's answers and determine whether they are correct.

    I'll give you:
    1. A question relating to Solana development
    2. A reference answer that is correct
    3. An LLM's answer to the same question

    Evaluate if the LLM's answer conveys the same information as the reference answer. Ignore differences in phrasing.
    Ignore extra information in either the reference or LLM answer that is not strictly needed to answer the question.


    Score:
    - 0: Incorrect or unrelated
    - 1: Fully correct, containing all key information from reference
    Be strict.


    Question: "{question}"

    Reference answer: "{reference_answer}"

    LLM's answer: "{llm_answer}"

    Provide your evaluation in this JSON format:
    {{
      "score": <0 or 1>,
      "reasoning": "Your explanation for the score"
    }}

    Respond with ONLY the JSON object.
    """

    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "Content-Type": "application/json"
    }

    data = {
        "model": model,
        "temperature": 0.0,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }

    try:
        response = requests.post(OPENROUTER_API_URL, headers=headers, json=data, timeout=120)
        response.raise_for_status()
        result = response.json()

        answer_content = result["choices"][0]["message"]["content"]
        try:
            json_match = re.search(r'({[\s\S]*})', answer_content)
            if json_match:
                json_str = json_match.group(1)
                evaluation = json.loads(json_str)
            else:
                evaluation = json.loads(answer_content)
        except json.JSONDecodeError:
            print(f"Could not parse JSON from response: {answer_content}")
            evaluation = {
                "error": "Failed to parse evaluation",
                "raw_response": answer_content
            }

        return evaluation
    except Exception as e:
        print(f"Error during evaluation: {e}")
        return {"error": str(e)}
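
The judge is told to respond with only a JSON object, but model replies often arrive wrapped in a Markdown fence or with a trailing comma, both of which make `json.loads` fail. The following is a minimal sketch of a more tolerant parser, not the notebook's method: `extract_json_object` is a hypothetical helper that strips trailing commas and then uses the standard library's `json.JSONDecoder.raw_decode` to parse the first object it can find.

```python
import json
import re


def extract_json_object(text):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    # Drop a comma that directly precedes a closing brace or bracket.
    # Caveat: this would also alter such a sequence inside a string value.
    cleaned = re.sub(r",\s*([}\]])", r"\1", text)
    # Try to decode an object starting at each opening brace in turn.
    for match in re.finditer(r"\{", cleaned):
        try:
            obj, _ = decoder.raw_decode(cleaned, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None


# Illustrative reply: fenced, with a trailing comma after the last field.
reply = '```json\n{"score": 1, "reasoning": "Matches the reference",\n}\n```'
print(extract_json_object(reply))
```

A caller could fall back to the existing error dict whenever this returns `None`.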


In [13]:
import concurrent.futures
from tqdm import tqdm  #progress bar
import time

def process_single_question(qa_pair, answer_model, eval_model):
    """Ask answer_model one question, then have eval_model grade the answer."""
    question = qa_pair["question"]
    reference_answer = qa_pair["answer"]
    llm_answer = ask_llm(question, model=answer_model)

    evaluation = evaluate_answer(question, llm_answer, reference_answer, model=eval_model)

    return {
        "question": question,
        "reference_answer": reference_answer,
        "llm_answer": llm_answer,
        # A failed or unparseable evaluation has no "score" key and is counted as 0.
        "score": evaluation.get("score", 0),
        "reasoning": evaluation.get("reasoning", ""),
    }

def evaluate_in_parallel(qa_pairs, answer_model="google/gemini-2.5-pro-preview-03-25", eval_model="google/gemini-2.5-pro-preview-03-25", workers=5):
    """Answer and grade every Q&A pair concurrently, then print a summary."""
    results = []
    start_time = time.time()

    print(f"Processing {len(qa_pairs)} questions with {workers} parallel workers")

    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [
            executor.submit(process_single_question, qa_pair, answer_model, eval_model)
            for qa_pair in qa_pairs
        ]

        # as_completed yields futures as they finish, so results (and the printed
        # scores) are in completion order, not in the order of qa_pairs.
        for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures)):
            try:
                results.append(future.result())
            except Exception as e:
                print(f"Error during evaluation: {e}")

    total_time = time.time() - start_time
    print("\nSummary:")
    print(f"Total questions: {len(qa_pairs)}")
    # max(..., 1) avoids a ZeroDivisionError on an empty input list.
    print(f"Total time: {total_time:.2f} seconds, Average time per question: {total_time / max(len(qa_pairs), 1):.2f} seconds")
    if results:
        print(f"Average score: {sum(result['score'] for result in results) / len(results):.2f}")
    print("Scores: ", [result['score'] for result in results])

    return results
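
Because `evaluate_answer` returns a dict with an `error` key when the request or the JSON parsing fails, and `process_single_question` then records a score of 0, the printed average cannot distinguish a wrong answer from a failed evaluation. The sketch below shows one way to keep the two apart. `summarize_evaluations` is a hypothetical helper that works on the raw dicts returned by `evaluate_answer`, and the sample list is made up for illustration.

```python
def summarize_evaluations(evaluations):
    """Count scored vs. failed judge outputs and average only the scored ones."""
    # A usable evaluation carries a score of exactly 0 or 1.
    scored = [e["score"] for e in evaluations if e.get("score") in (0, 1)]
    failed = len(evaluations) - len(scored)
    accuracy = sum(scored) / len(scored) if scored else None
    return {"scored": len(scored), "failed": failed, "accuracy": accuracy}


# Illustrative judge outputs: two scored, one parse failure.
sample_evaluations = [
    {"score": 1, "reasoning": "Same information as the reference."},
    {"score": 0, "reasoning": "Names a different function."},
    {"error": "Failed to parse evaluation", "raw_response": "..."},
]
print(summarize_evaluations(sample_evaluations))
```

A high `failed` count would suggest looking at the judge's raw responses before comparing models.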


In [15]:
evaluate_in_parallel(qa, workers=10, answer_model="google/gemini-2.5-pro-preview-03-25", eval_model="google/gemini-2.5-pro-preview-03-25")
evaluate_in_parallel(qa, workers=10, answer_model="anthropic/claude-3.7-sonnet", eval_model="google/gemini-2.5-pro-preview-03-25")
evaluate_in_parallel(qa, workers=10, answer_model="openai/gpt-4.1", eval_model="google/gemini-2.5-pro-preview-03-25")
evaluate_in_parallel(qa, workers=10, answer_model="openai/gpt-4o", eval_model="google/gemini-2.5-pro-preview-03-25")
evaluate_in_parallel(qa, workers=10, answer_model="x-ai/grok-3-beta", eval_model="google/gemini-2.5-pro-preview-03-25")
# print("----------------------------------------")

Processing 25 questions with 10 parallel workers


100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [01:36<00:00,  3.85s/it]



Summary:
Total questions: 25
Total time: 96.34 seconds, Average time per question: 3.85 seconds
Average score: 0.04
Scores:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
Processing 25 questions with 10 parallel workers


100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:41<00:00,  1.67s/it]



Summary:
Total questions: 25
Total time: 41.70 seconds, Average time per question: 1.67 seconds
Average score: 0.08
Scores:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
Processing 25 questions with 10 parallel workers


 40%|██████████████████████████████████████████████████████████████▊                                                                                              | 10/25 [00:26<00:21,  1.43s/it]

Error making request to OpenRouter: Response ended prematurely


100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:55<00:00,  2.20s/it]



Summary:
Total questions: 25
Total time: 55.06 seconds, Average time per question: 2.20 seconds
Average score: 0.04
Scores:  [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Processing 25 questions with 10 parallel workers


100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:39<00:00,  1.59s/it]



Summary:
Total questions: 25
Total time: 39.63 seconds, Average time per question: 1.59 seconds
Average score: 0.00
Scores:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Processing 25 questions with 10 parallel workers


100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [02:05<00:00,  5.01s/it]


Summary:
Total questions: 25
Total time: 125.17 seconds, Average time per question: 5.01 seconds
Average score: 0.04
Scores:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]





[{'question': 'You are building a Solana program using the Pinocchio library and want to display a base integer (lamports) as if it has nine decimal places (one SOL). Which formatting option should you use in `pinocchio-log` to reflect this accurately?',
  'reference_answer': 'Use the `Precision` attribute (e.g. `"{:.9}"`) for the integer value so that it is formatted as if it has nine decimal digits. For example, `1_000_000_000` would appear as `1.000000000` if you choose `precision=9`.',
  'llm_answer': 'In Solana, 1 SOL is equivalent to 1,000,000,000 lamports (9 decimal places). When using the Pinocchio library for building Solana programs, and specifically the `pinocchio-log` crate for logging or displaying values, you need to format the integer value (lamports) to appear as if it has 9 decimal places to represent SOL accurately.\n\nTo achieve this, you should use the `{:09}` formatting option in Rust, which is supported by `pinocchio-log`. This ensures that the integer is padded w

Processing 31 questions with 10 parallel workers
100%|████████████████████████████████████████| 31/31 [02:03<00:00,  3.99s/it]

Summary:
Total questions: 31
Total time: 123.70 seconds, Average time per question: 3.99 seconds
Average score: 0.39
Scores:  [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
Processing 31 questions with 10 parallel workers
100%|████████████████████████████████████████| 31/31 [00:40<00:00,  1.32s/it]

Summary:
Total questions: 31
Total time: 40.78 seconds, Average time per question: 1.32 seconds
Average score: 0.42
Scores:  [1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0]
Processing 31 questions with 10 parallel workers
100%|████████████████████████████████████████| 31/31 [00:48<00:00,  1.58s/it]

Summary:
Total questions: 31
Total time: 48.85 seconds, Average time per question: 1.58 seconds
Average score: 0.45
Scores:  [0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0]
Processing 31 questions with 10 parallel workers
100%|████████████████████████████████████████| 31/31 [00:35<00:00,  1.13s/it]

Summary:
Total questions: 31
Total time: 35.06 seconds, Average time per question: 1.13 seconds
Average score: 0.29
Scores:  [1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0]
Processing 31 questions with 10 parallel workers
100%|████████████████████████████████████████| 31/31 [01:21<00:00,  2.64s/it]

Summary:
Total questions: 31
Total time: 81.98 seconds, Average time per question: 2.64 seconds
Average score: 0.29
Scores:  [1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
----------------------------------------
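
With only 25 or 31 binary scores per run, small gaps between averages may be within sampling noise. As a rough aid for reading the summaries above, the sketch below computes a Wilson score interval for each 25-question run. `wilson_interval` is a hypothetical helper implementing the standard formula, and the counts are simply the sums of the printed `Scores` lists in the order the runs appear.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Correct-answer counts read off the five 25-question summaries, in printed order.
runs_25 = {"run 1": 1, "run 2": 2, "run 3": 1, "run 4": 0, "run 5": 1}
for name, correct in runs_25.items():
    low, high = wilson_interval(correct, 25)
    print(f"{name}: {correct}/25 correct, interval {low:.2f} to {high:.2f}")
```

Heavily overlapping intervals would mean the runs cannot be ranked reliably on this many questions.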