# In The Name Of GOD
# Turing Apex Challenge Submission

## Challenge Overview

**Scenario:** The year is 2050, and AI models now compete in the Turing Apex Challenge—a global tournament where only the most advanced AI survives. The world's top research teams train their models to tackle ultra-complex scientific problems with minimal human intervention.

**Task:** You will be given a training dataset consisting of 50 high-stakes scientific questions, each presented as a multiple-choice question with 4 options, along with their correct answers. Your goal is to develop and optimize an AI model capable of answering similar scientific questions accurately.


**Evaluation:** Your model’s performance will be judged solely on Accuracy on the unseen test questions.
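
As a small illustration of the metric, accuracy is the fraction of questions whose predicted letter matches the reference letter. The sketch below uses made-up labels; the function name `accuracy` and the demo lists are hypothetical. It treats an unparseable prediction (the `"W"` sentinel this notebook uses later) as an ordinary wrong answer.

```python
def accuracy(predictions, answers):
    """Fraction of positions where the predicted letter equals the reference letter."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical labels, for illustration only (not taken from the challenge data).
demo_predictions = ["A", "C", "W", "D"]
demo_answers = ["A", "B", "C", "D"]
demo_accuracy = accuracy(demo_predictions, demo_answers)
print(demo_accuracy)  # 0.5
```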

**Core Requirements:**
1. Use an open-source LLM.
2. Implement RAG using external data.
3. Solution must be a runnable Google Colab notebook.

## This Notebook's Approach

1.  **LLM:** `mistralai/Mistral-7B-Instruct-v0.3` (Quantized for Colab)
2.  **RAG:** Wikipedia Retriever with LLM-based Query Expansion.
3.  **Technique:** Few-Shot Prompting + Chain-of-Thought (CoT).

## 1. Setup: Install Dependencies

In [None]:
!pip install -qU transformers accelerate bitsandbytes
!pip install -qU langchain langchain_community langchain_core
!pip install -qU wikipedia pandas scikit-learn


## 2. Imports and Configuration

In [None]:
import pandas as pd
import torch
import re
import numpy as np

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
from huggingface_hub import login

from langchain_community.llms import HuggingFacePipeline
from langchain_community.retrievers import WikipediaRetriever
from langchain_core.prompts import PromptTemplate, FewShotPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda, RunnableParallel

# Paste your Hugging Face access token below (you can also store it as a Colab secret named HF_TOKEN)
hf_token = "HF_TOKEN"
if hf_token:
    print("Logging into Hugging Face Hub...")
    login(token=hf_token)
else:
    print("HF_TOKEN secret not found!!")

Logging into Hugging Face Hub...


## 3. Load Open-Source LLM (Mistral-7B-Instruct)

Loads the `mistralai/Mistral-7B-Instruct-v0.3` model using 4-bit quantization for memory efficiency on Colab.

In [None]:
model_id = "mistralai/Mistral-7B-Instruct-v0.3"


bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)


print(f"Loading tokenizer for {model_id}...")
tokenizer = AutoTokenizer.from_pretrained(model_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token


print(f"Loading model {model_id}...")

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto"
)
print("Model loaded successfully.")


text_generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=300, # Sufficient length for CoT reasoning + answer
    temperature=0.1,   # Low temperature for more deterministic output
    do_sample=True     # Required for temperature to work!!!
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

Loading tokenizer for mistralai/Mistral-7B-Instruct-v0.3...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Loading model mistralai/Mistral-7B-Instruct-v0.3...



Device set to use cuda:0


Model loaded successfully.
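
To make the memory argument for 4-bit loading concrete, here is a back-of-the-envelope sketch. It assumes roughly 7.25 billion parameters, which is an estimate rather than a figure stated in this notebook, and it counts weight storage only. Quantization constants, activations, and the KV cache add to the real footprint.

```python
# Assumption: ~7.25e9 parameters (an estimate, not an official figure).
PARAMS = 7.25e9


def weight_gb(bits_per_param: float, params: float = PARAMS) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return params * bits_per_param / 8 / 1e9


bf16_gb = weight_gb(16)  # 16-bit weights
nf4_gb = weight_gb(4)    # 4-bit weights, ignoring quantization overhead

print(f"16-bit weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the 16-bit size, which is what lets the model fit on a typical Colab GPU.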




## 4. Load Data

Loads the training data (`train_data.csv`) required for few-shot examples and the test data (`test_data.csv`) for generating final predictions.

In [None]:
train_file = "train_data.csv"
test_file = "test_data.csv"

train_df = pd.read_csv(train_file)
test_df = pd.read_csv(test_file)
print(f"Loaded {len(train_df)} training examples from '{train_file}'.")
print(f"Loaded {len(test_df)} test examples from '{test_file}'.")

Loaded 50 training examples from 'train_data.csv'.
Loaded 150 test examples from 'test_data.csv'.
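
Before building prompts it can help to confirm that a file has the columns the later cells read (`prompt`, `A` to `E`, and `answer`). The following is a minimal sketch using only the standard library; the helper name `missing_columns` and the sample CSV text are hypothetical, and the test file may legitimately lack the `answer` column.

```python
import csv
import io

# Column names used by the later cells of this notebook.
REQUIRED_COLUMNS = {"prompt", "A", "B", "C", "D", "E", "answer"}


def missing_columns(csv_text: str, required=REQUIRED_COLUMNS) -> set:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return set(required) - set(reader.fieldnames or [])


# Made-up CSV content, for illustration only.
sample_csv = "prompt,A,B,C,D,E,answer\nWhat is H2O?,Water,Salt,Gold,Air,Iron,A\n"
print(missing_columns(sample_csv))
print(sorted(missing_columns("prompt,A,B\nq,1,2\n")))
```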


## 5. Setup RAG Retriever (Wikipedia)

Configures the base `WikipediaRetriever` from LangChain.

In [None]:
# Configure the retriever to fetch 2 documents per query
wiki_retriever = WikipediaRetriever(
    lang="en",
    load_max_docs=2,
    top_k_results=2,
    doc_content_chars_max=2000 # Max characters per document
    )
print("Wikipedia retriever configured.")

Wikipedia retriever configured.
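
These settings bound how much retrieved text can reach the prompt. The sketch below works out the worst case, assuming the LLM returns exactly the three expanded queries requested later in the notebook and that every query hits distinct pages. The four-characters-per-token ratio is a rough heuristic, not a measured value.

```python
# Values taken from the retriever configuration and the query-expansion prompt.
DOCS_PER_QUERY = 2      # top_k_results
CHARS_PER_DOC = 2000    # doc_content_chars_max
EXPANDED_QUERIES = 3    # queries requested from the LLM
TOTAL_QUERIES = 1 + EXPANDED_QUERIES  # original question + expansions

# Worst case: no two queries return the same page.
max_docs = TOTAL_QUERIES * DOCS_PER_QUERY
max_context_chars = max_docs * CHARS_PER_DOC

# Assumption: ~4 characters per token for English text (rough heuristic).
CHARS_PER_TOKEN = 4
approx_context_tokens = max_context_chars // CHARS_PER_TOKEN

print(max_docs, max_context_chars, approx_context_tokens)  # 8 16000 4000
```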


## 6. Setup Query Expansion

Defines the logic to use the LLM for generating multiple search queries based on the original question, aiming to improve the relevance of retrieved context.

In [None]:
# Prompt template asking the LLM to generate search queries
query_expansion_template = """You are an AI assistant helping retrieve relevant information.
Based on the following user question, generate 3 diverse and effective search queries suitable for finding accurate information on Wikipedia.
Output *only* the queries, each on a new line. Do not include numbering or bullet points.

User Question: {original_question}

Search Queries:
"""
query_expansion_prompt = PromptTemplate(
    input_variables=["original_question"],
    template=query_expansion_template
)

query_expansion_chain = query_expansion_prompt | llm | StrOutputParser()

# Function to parse the LLM query output into a list
def parse_expanded_queries(query_string: str) -> list[str]:
    queries = query_string.strip().split('\n')
    return [q.strip() for q in queries if q.strip()]

# Function to generate and parse expanded queries for a question
def get_expanded_queries(original_question: str) -> list[str]:
    expanded_query_output = query_expansion_chain.invoke({"original_question": original_question})
    return parse_expanded_queries(expanded_query_output)
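
LLMs do not always follow the "no numbering or bullet points" instruction, so the parsed queries may carry list markers. The sketch below is one possible cleanup step, not part of the original pipeline; the name `clean_queries` and the sample output are hypothetical. It strips leading markers and surrounding quotes, drops case-insensitive duplicates, and caps the number of queries.

```python
import re

# Leading bullet ("-", "*", "•") or numbering ("1.", "2)") followed by whitespace.
LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def clean_queries(raw_output: str, max_queries: int = 3) -> list[str]:
    """Split LLM output into queries, dropping list markers and duplicates."""
    cleaned = []
    seen = set()
    for line in raw_output.splitlines():
        query = LIST_MARKER.sub("", line).strip().strip('"')
        if query and query.lower() not in seen:
            seen.add(query.lower())
            cleaned.append(query)
    return cleaned[:max_queries]


# Made-up LLM output, for illustration only.
sample_output = '1. Photosynthesis light reactions\n- "Calvin cycle"\n\n2) calvin cycle\nATP synthase'
print(clean_queries(sample_output))
# ['Photosynthesis light reactions', 'Calvin cycle', 'ATP synthase']
```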


## 7. Prepare Few-Shot Examples

Selects a few examples from the training data to provide in-context learning for the LLM.

In [None]:
few_shot_examples = []
num_few_shot = 5

# Sample distinct row positions so that no example is repeated in the prompt
example_indices = np.random.choice(len(train_df), size=num_few_shot, replace=False)
for i in example_indices:
    row = train_df.iloc[i]
    example = {
        "question": row['prompt'],
        "A": row['A'], "B": row['B'], "C": row['C'], "D": row['D'], "E": row['E'],
        "answer": row['answer']
    }
    few_shot_examples.append(example)
print(f"Prepared {len(few_shot_examples)} few-shot examples.")

Prepared 5 few-shot examples.


## 8. Create Prompt Template With Chain-of-Thought (CoT) Technique

Defines the main prompt structure using `FewShotPromptTemplate`. It includes instructions, the few-shot examples, the retrieved context, the target question and options, and instructions for CoT reasoning and the final answer format.

In [None]:
example_prompt_template = """
Question: {question}
A) {A}
B) {B}
C) {C}
D) {D}
E) {E}
LLM Answer: {answer}
"""

example_prompt = PromptTemplate(
    input_variables=["question", "A", "B", "C", "D", "E", "answer"],
    template=example_prompt_template
)

prefix = """
You are an expert scientific assistant. Answer the following multiple-choice question accurately based on the provided context and examples.
Your final answer MUST be exactly one of the uppercase letters A, B, C, D, or E, given in the specific format 'LLM Answer: [Letter]'.

Here are some examples showing the required output format:
"""

suffix = """
---Relevant Context Start---
{context}
---Relevant Context End---

Now, answer the following question using the context provided above together with your general scientific knowledge:

Question: {question}
A) {A}
B) {B}
C) {C}
D) {D}
E) {E}

Think step-by-step to determine the best answer from the provided context and the question. Explain your reasoning clearly.
If the context is incomplete or conflicting, you must still choose the single most likely answer from options A, B, C, D, or E.
After your reasoning, give the final answer on a new line in the specified format 'LLM Answer: [Letter]'.
Do not put reasoning or any other text on the final answer line.

Step-by-step thinking:

"""

# Create the FewShotPromptTemplate
few_shot_prompt = FewShotPromptTemplate(
    examples=few_shot_examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["context", "question", "A", "B", "C", "D", "E"],
    example_separator="\n\n"
)
print("Few-shot CoT prompt template created.")

Few-shot CoT prompt template created.
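
To show roughly what the assembled prompt looks like, here is a plain-Python approximation of the few-shot assembly: prefix, rendered examples, and filled-in suffix joined by a separator. It is a sketch and not LangChain's implementation. The shortened two-option templates, the helper name `build_few_shot_prompt`, and the demo questions are all hypothetical.

```python
# Simplified, hypothetical templates with two options instead of five.
EXAMPLE_TEMPLATE = "Question: {question}\nA) {A}\nB) {B}\nLLM Answer: {answer}"
SUFFIX_TEMPLATE = (
    "Context: {context}\n\n"
    "Question: {question}\nA) {A}\nB) {B}\n\n"
    "Step-by-step thinking:"
)


def build_few_shot_prompt(prefix, examples, separator="\n\n", **inputs):
    """Join the prefix, rendered examples, and filled-in suffix with the separator."""
    rendered_examples = [EXAMPLE_TEMPLATE.format(**example) for example in examples]
    return separator.join([prefix, *rendered_examples, SUFFIX_TEMPLATE.format(**inputs)])


# Made-up examples and target question, for illustration only.
demo_examples = [
    {"question": "Which gas do plants absorb?", "A": "Oxygen", "B": "Carbon dioxide", "answer": "B"},
    {"question": "What is the SI unit of force?", "A": "Newton", "B": "Joule", "answer": "A"},
]
demo_prompt = build_few_shot_prompt(
    "Answer the multiple-choice question.",
    demo_examples,
    context="Water boils at 100 degrees Celsius at sea level.",
    question="At sea level, water boils at what temperature?",
    A="90 degrees Celsius",
    B="100 degrees Celsius",
)
print(demo_prompt)
```

Printing the rendered prompt for one question is a quick way to check that every placeholder was filled with the intended value before running the full loop.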


## 9. Build the Enhanced RAG Chain

Constructs the final LangChain Expression Language (LCEL) pipeline.

In [None]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Function to retrieve context using expanded queries
def retrieve_expanded_context(input_dict):
    original_question = input_dict['question']
    # Generate expanded queries using the previously defined chain
    queries = get_expanded_queries(original_question)
    # Combine original question with expanded ones for complex search
    all_queries = [original_question] + queries

    all_retrieved_docs = []
    # Retrieve documents for each query
    for q in all_queries:
        retrieved = wiki_retriever.invoke(q)
        all_retrieved_docs.extend(retrieved)

    # De-duplicate retrieved documents based on source URL to avoid redundancy
    unique_docs = {}
    for doc in all_retrieved_docs:
        source_url = doc.metadata.get('source', doc.page_content[:100])
        if source_url not in unique_docs:
            unique_docs[source_url] = doc

    # Format the unique documents into a single context string
    formatted_context = format_docs(list(unique_docs.values()))
    return formatted_context

# Define the final RAG chain using LCEL
rag_chain = (
    # Keep the input keys (question, A-E) unchanged and add a 'context' key
    # produced by the expanded retrieval function. Mapping every key to
    # RunnablePassthrough() would pass the whole input dict as each value.
    RunnablePassthrough.assign(
        context=RunnableLambda(retrieve_expanded_context)
    )
    # Pipe the combined dictionary (with context) into the prompt template
    | few_shot_prompt
    # Pipe the formatted prompt into the LLM
    | llm
    # Parse the LLM string output
    | StrOutputParser()
)
print("RAG chain with query expansion defined.")

RAG chain with query expansion defined.
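
The de-duplication step can be tried in isolation without calling Wikipedia. The sketch below mirrors its logic on stand-in documents: the `Doc` dataclass is a hypothetical substitute for a LangChain document, and the URLs and texts are made up. Documents are keyed by their `source` metadata, falling back to the first 100 characters of content, and the first document seen for each key is kept.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_by_source(docs):
    """Keep the first document for each source URL (or content prefix)."""
    unique = {}
    for doc in docs:
        key = doc.metadata.get("source", doc.page_content[:100])
        unique.setdefault(key, doc)  # later duplicates are ignored
    return list(unique.values())


# Made-up documents, for illustration only.
demo_docs = [
    Doc("Alpha text", {"source": "https://en.wikipedia.org/wiki/Alpha"}),
    Doc("Alpha text again", {"source": "https://en.wikipedia.org/wiki/Alpha"}),
    Doc("Text without a source"),
]
unique_demo_docs = dedupe_by_source(demo_docs)
print([d.page_content for d in unique_demo_docs])
# ['Alpha text', 'Text without a source']
```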


## 10. Define Final Output Parser Function

Defines the function to extract the single-letter answer (A-E) from the LLM output.

In [None]:
def parse_llm_output(llm_output_text):
    """Extract the answer letter (A-E) from the raw LLM output."""
    pattern = r"LLM Answer: ([A-E])"

    # Search for the first occurrence of the pattern in the output text
    match = re.search(pattern, llm_output_text)

    if match:
        return match.group(1)
    # If the output cannot be parsed, return the sentinel "W" (counted as wrong)
    return "W"
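
The parser above takes the first exact match. A more tolerant variant is sketched below as one possible alternative, not as the method used for the results in this notebook; the name `parse_last_answer` is hypothetical. It accepts optional brackets and flexible spacing, requires the letter to stand alone, and returns the last match so that an answer revised during the reasoning wins.

```python
import re

# Optional brackets, flexible whitespace, and a word boundary after the letter
# so that words such as "Every" are not read as the answer "E".
ANSWER_PATTERN = re.compile(r"LLM Answer:\s*\[?([A-E])\b\]?")


def parse_last_answer(text: str, fallback: str = "W") -> str:
    """Return the last answer letter found in the text, or the fallback."""
    matches = ANSWER_PATTERN.findall(text)
    return matches[-1] if matches else fallback


# Made-up model outputs, for illustration only.
print(parse_last_answer("Some reasoning...\nLLM Answer: C"))               # C
print(parse_last_answer("LLM Answer: [B]"))                                # B
print(parse_last_answer("LLM Answer: A, but on reflection\nLLM Answer: D"))  # D
print(parse_last_answer("The answer is probably B."))                      # W
```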

## 11. Generate Predictions on Test Data

Iterates through the test dataset, runs the complete RAG chain for each question, and parses the output.

In [None]:
test_predictions = []

print(f"Starting prediction loop for {len(test_df)} TEST questions...")

# Iterate over every row of the test dataframe
for index, row in test_df.iterrows():
    # Build the input dictionary for the RAG chain
    chain_input = {
        'question': row['prompt'], 'A': row['A'], 'B': row['B'],
        'C': row['C'], 'D': row['D'], 'E': row['E']
    }

    try:
        # Invoke the full RAG chain
        raw_llm_output = rag_chain.invoke(chain_input)
        # Parse the final letter answer!
        final_answer = parse_llm_output(raw_llm_output)
        test_predictions.append(final_answer)

        print(f" Question: {index+1} Parsed: '{final_answer}' \n")

    except Exception as e:
        print(f"Error processing test question {index + 1}: {e} \n")
        test_predictions.append('W')

Starting prediction loop for 150 TEST questions...
 Question: 1 Parsed: 'D' 

 Question: 2 Parsed: 'W' 

 Question: 3 Parsed: 'B' 

 Question: 4 Parsed: 'B' 

 Question: 5 Parsed: 'C' 

 Question: 6 Parsed: 'B' 

 Question: 7 Parsed: 'W' 

 Question: 8 Parsed: 'D' 

 Question: 9 Parsed: 'B' 

 Question: 10 Parsed: 'W' 

 Question: 11 Parsed: 'A' 

 Question: 12 Parsed: 'D' 

 Question: 13 Parsed: 'E' 

 Question: 14 Parsed: 'C' 

 Question: 15 Parsed: 'B' 

 Question: 16 Parsed: 'A' 

 Question: 17 Parsed: 'B' 

 Question: 18 Parsed: 'B' 

 Question: 19 Parsed: 'B' 

 Question: 20 Parsed: 'E' 

 Question: 21 Parsed: 'E' 

 Question: 22 Parsed: 'C' 

 Question: 23 Parsed: 'W' 

 Question: 24 Parsed: 'C' 

 Question: 25 Parsed: 'D' 

 Question: 26 Parsed: 'B' 

 Question: 27 Parsed: 'A' 

 Question: 28 Parsed: 'A' 

 Question: 29 Parsed: 'D' 

 Question: 30 Parsed: 'B' 

 Question: 31 Parsed: 'B' 

 Question: 32 Parsed: 'B' 

 Question: 33 Parsed: 'B' 

 Question: 34 Parsed: 'A' 


In [None]:
print(test_predictions)

['D', 'W', 'B', 'B', 'C', 'B', 'W', 'D', 'B', 'W', 'A', 'D', 'E', 'C', 'B', 'A', 'B', 'B', 'B', 'E', 'E', 'C', 'W', 'C', 'D', 'B', 'A', 'A', 'D', 'B', 'B', 'B', 'B', 'A', 'C', 'D', 'E', 'E', 'W', 'B', 'C', 'E', 'B', 'D', 'A', 'D', 'A', 'W', 'D', 'A', 'D', 'E', 'W', 'B', 'D', 'C', 'E', 'C', 'A', 'B', 'B', 'D', 'D', 'A', 'C', 'A', 'A', 'A', 'B', 'E', 'E', 'A', 'W', 'A', 'W', 'C', 'A', 'C', 'A', 'A', 'E', 'C', 'E', 'B', 'C', 'A', 'A', 'W', 'A', 'B', 'B', 'A', 'A', 'B', 'C', 'C', 'D', 'C', 'W', 'A', 'D', 'A', 'D', 'C', 'E', 'A', 'E', 'C', 'E', 'D', 'W', 'C', 'D', 'C', 'C', 'B', 'C', 'B', 'C', 'D', 'D', 'E', 'E', 'C', 'A', 'B', 'E', 'A', 'C', 'B', 'A', 'W', 'B', 'A', 'D', 'B', 'D', 'B', 'W', 'D', 'W', 'E', 'D', 'D', 'D', 'A', 'B', 'B', 'W', 'W']
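
Since `'W'` marks an output that could not be parsed, the share of `'W'` entries is a useful sanity check before submitting. The sketch below counts letters for the first ten predictions from the list above; the same call works on the full `test_predictions` list.

```python
from collections import Counter

# First ten predictions copied from the output above.
first_ten = ['D', 'W', 'B', 'B', 'C', 'B', 'W', 'D', 'B', 'W']

counts = Counter(first_ten)
unparsed_rate = counts["W"] / len(first_ten)

print(counts.most_common())
print(f"Unparsed share: {unparsed_rate:.0%}")  # Unparsed share: 30%
```

A high unparsed share usually points at the prompt's output-format instructions or at the parsing pattern rather than at retrieval.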


## Developed By Eiliya Mohebi For DataCoLab Challenge