# Example of how to read questions in chemiq.jsonl

In [1]:
import json

# Load question data
all_questions = []
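# The prompts contain non-ASCII characters (e.g. δ), so set the encoding explicitly
with open("questions/chemiq.jsonl", "r", encoding="utf-8") as f: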
    for line in f:
        all_questions.append(json.loads(line))

print(f"Loaded {len(all_questions)} questions")

Loaded 816 questions


# Question JSON Format

Each question is a Python dict with the following keys:

- **uuid**: unique identifier
- **ChemIQ**: Boolean indicating whether the question is part of the main ChemIQ benchmark
- **question_category**, **sub_category**: category and sub-category of the question, e.g. `nmr_elucidation` and `zinc_2d`  
- **meta_data**: question-specific metadata, e.g. `smiles`, `smiles_random`, `carbon_count` (in the record printed below, `smiles` appears as a top-level key)  
- **prompt**: the question text shown to users  
- **answer**: the expected answer  
- **answer_format**, **answer_range**, **verification_method**: how the answer is expressed and checked, e.g. `smiles`, `None`, `canonical_smi_match`  

To pose a question, send only its `prompt`, and keep track of the `uuid` so the response can be matched back to the expected `answer`.  
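The submit-and-track step can be sketched as follows. This is a minimal illustration, not part of the dataset tooling: `ask_model` and `collect_responses` are hypothetical names, `ask_model` is a stub that returns a fixed string in place of a real model call, and the sample record keeps only the keys needed here, with its prompt abbreviated.

```python
# Hypothetical stand-in for a real model call; replace with your own client.
def ask_model(prompt: str) -> str:
    return "O=Nc1ccc(Br)cc1"


def collect_responses(questions, ask):
    """Send each prompt and return a dict mapping uuid -> response text."""
    responses = {}
    for q in questions:
        responses[q["uuid"]] = ask(q["prompt"])
    return responses


# Tiny inline sample with the same keys as a chemiq.jsonl record
# (prompt abbreviated); in the notebook you would pass all_questions.
sample_questions = [
    {
        "uuid": "ee76b5bc-6a08-4f92-b4c7-337d955f8838",
        "prompt": "Write the SMILES string of the molecule consistent with this data.",
        "answer": "O=Nc1ccc(Br)cc1",
    },
]

responses = collect_responses(sample_questions, ask_model)

# Index the questions by uuid so each response can be paired with its answer.
by_uuid = {q["uuid"]: q for q in sample_questions}
for uuid, response in responses.items():
    print(uuid, "| expected:", by_uuid[uuid]["answer"], "| got:", response)
```

Keying the responses by `uuid` rather than by list position keeps the pairing correct even if questions are filtered, shuffled, or answered out of order.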


In [2]:
all_questions[0]

{'uuid': 'ee76b5bc-6a08-4f92-b4c7-337d955f8838',
 'question_category': 'nmr_elucidation',
 'sub_category': 'zinc_2d',
 'smiles': 'O=Nc1ccc(Br)cc1',
 'prompt': 'Write the SMILES string of the molecule consistent with this data.\n\nFormula: C6H4BrNO\n\n1H NMR: δ 8.02 (ddd, J = 8.53, 2.05, 0.46 Hz, 2H), 7.5 (ddd, J = 8.53, 1.26, 0.46 Hz, 2H).\n\n13C NMR: δ 163.95 (1C, s), 132.71 (2C, s), 122.19 (2C, s), 120 (1C, s).\n\nCOSY (δH, δH): (7.5, 8.02).\n\nHSQC (δH, δC): (7.5, 132.71), (8.02, 122.19).\n\nHMBC (δH, δC): (8.02, 163.95), (7.5, 163.95), (8.02, 132.71), (7.5, 132.71), (8.02, 122.19), (7.5, 122.19), (8.02, 120), (7.5, 120).\n\nOnly write the SMILES string. Do not write stereochemistry. Do not write any comments.',
 'answer': 'O=Nc1ccc(Br)cc1',
 'answer_format': 'smiles',
 'answer_range': None,
 'verification_method': 'canonical_smi_match',
 'ChemIQ': True}
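Because every record carries `question_category` and the `ChemIQ` flag, the loaded list can be summarized or filtered with plain Python. The sketch below assumes only those two keys; `summarize` is a hypothetical helper, and the three sample records (including the category name `hypothetical_category`) are made up for illustration.

```python
from collections import Counter


def summarize(questions):
    """Count questions per category and select the main-benchmark subset."""
    per_category = Counter(q["question_category"] for q in questions)
    main_subset = [q for q in questions if q.get("ChemIQ")]
    return per_category, main_subset


# Made-up records for illustration; in the notebook you would pass all_questions.
sample = [
    {"uuid": "a", "question_category": "nmr_elucidation", "ChemIQ": True},
    {"uuid": "b", "question_category": "nmr_elucidation", "ChemIQ": False},
    {"uuid": "c", "question_category": "hypothetical_category", "ChemIQ": True},
]

per_category, main_subset = summarize(sample)
print(per_category)
print([q["uuid"] for q in main_subset])
```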

# Example question

In [3]:
print(f"{'='*20} PROMPT {'='*20}")
print(all_questions[0]["prompt"])
print(f"{'='*20} ANSWER {'='*20}")
print(all_questions[0]["answer"])

==================== PROMPT ====================
Write the SMILES string of the molecule consistent with this data.

Formula: C6H4BrNO

1H NMR: δ 8.02 (ddd, J = 8.53, 2.05, 0.46 Hz, 2H), 7.5 (ddd, J = 8.53, 1.26, 0.46 Hz, 2H).

13C NMR: δ 163.95 (1C, s), 132.71 (2C, s), 122.19 (2C, s), 120 (1C, s).

COSY (δH, δH): (7.5, 8.02).

HSQC (δH, δC): (7.5, 132.71), (8.02, 122.19).

HMBC (δH, δC): (8.02, 163.95), (7.5, 163.95), (8.02, 132.71), (7.5, 132.71), (8.02, 122.19), (7.5, 122.19), (8.02, 120), (7.5, 120).

Only write the SMILES string. Do not write stereochemistry. Do not write any comments.
==================== ANSWER ====================
O=Nc1ccc(Br)cc1
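Once responses are collected, each one can be compared against the stored `answer`. The sketch below is only a placeholder under a stated assumption: `exact_match` compares whitespace-stripped strings and is NOT the `canonical_smi_match` method named in the record, which would presumably canonicalize both SMILES strings with a cheminformatics toolkit before comparing. Both `exact_match` and `score` are hypothetical names.

```python
def exact_match(response: str, answer: str) -> bool:
    """Placeholder check: compare after stripping surrounding whitespace.

    This is not a canonical SMILES comparison; two different strings that
    describe the same molecule would be reported as a mismatch.
    """
    return response.strip() == answer.strip()


def score(responses, questions, check=exact_match):
    """Return a dict mapping uuid -> bool for every collected response."""
    by_uuid = {q["uuid"]: q for q in questions}
    return {
        uuid: check(text, by_uuid[uuid]["answer"])
        for uuid, text in responses.items()
    }


# Made-up uuids and responses for illustration only.
questions = [
    {"uuid": "q1", "answer": "O=Nc1ccc(Br)cc1"},
    {"uuid": "q2", "answer": "O=Nc1ccc(Br)cc1"},
]
responses = {"q1": " O=Nc1ccc(Br)cc1\n", "q2": "c1ccccc1"}

results = score(responses, questions)
print(results)
print(f"{sum(results.values())}/{len(results)} correct")
```

Passing the comparison in as `check` leaves room to swap in a proper canonical comparison without changing the scoring loop.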
