In [None]:
import re
import pandas as pd
import torch
from neurovlm.data import get_data_dir
from neurovlm.models import Specter

# SmolLM3-3b

## Load data and models

Load the quantized (Q4_K_M GGUF) version of the model with llama_cpp.

In [None]:
from llama_cpp import Llama

llm = Llama(
    model_path="/Users/anon/Downloads/SmolLM3-Q4_K_M.gguf",
    n_ctx=8192,
    verbose=False
)

llama_context: n_ctx_per_seq (8192) < n_ctx_train (65536) -- the full capacity of the model will not be utilized

In [3]:
# Load data and Specter
data_dir = get_data_dir()
df = pd.read_parquet(data_dir / "publications_less.parquet")
specter = Specter()
aligner = torch.load(data_dir / "aligner_specter_adhoc_query.pt", weights_only=False).to("cpu")
latent_text = torch.load(data_dir / "latent_text_specter2_adhoc_query.pt", weights_only=True).to("cpu")
latent_text /= latent_text.norm(dim=1)[:, None] # unit norm

## Example query

The titles and abstracts of the publications most related to the query will be passed to the LM.
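As a rough illustration of what "most related" means here, the sketch below ranks documents by cosine similarity to a query vector. It is dependency-free so the arithmetic is visible; the helper names (`unit`, `rank_by_cosine`) and the toy 2-D vectors are made up for this example and are not part of neurovlm.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def rank_by_cosine(query_vec, doc_vecs, k=5):
    """Return the indices of the k most similar documents and all similarities."""
    q = unit(query_vec)
    # The dot product of two unit vectors is their cosine similarity
    sims = [sum(a * b for a, b in zip(unit(d), q)) for d in doc_vecs]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return order[:k], sims

# Toy 2-D "embeddings" (hypothetical values, for illustration only)
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
top, sims = rank_by_cosine([2.0, 0.1], docs, k=2)
```

With real embeddings the same computation is a single matrix-vector product over unit-normalized rows, followed by a descending sort.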

In [4]:
# Encode the query with Specter, then rank publications by cosine similarity
query = "what is the role of the hippocampus in memory formation"
encoded_text = specter(query)[0].detach()
encoded_text_norm = encoded_text / encoded_text.norm()
cos_sim = latent_text @ encoded_text_norm
inds = torch.argsort(cos_sim, descending=True)

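# Aggregate the top 5 publications (title + abstract) to pass to the LM
top_k = 5
entries = []
for rank, i in enumerate(inds[:top_k], start=1):
    row = df.iloc[int(i)]
    # Collapse newlines and repeated whitespace into single spaces
    abstract = re.sub(r"\s+", " ", row["description"]).strip()
    entries.append(f"[{rank}] {row['name']}\n{abstract}\n")
papers = "\n".join(entries)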

In [5]:
# Top 10 related publications - only the first 5 are passed to the LM
df.iloc[inds[:10]]["name"].values.tolist()

['Probing the relevance of the hippocampus for conflict-induced memory improvement',
 'Specifying the role of the ventromedial prefrontal cortex in memory formation.',
 'Does the hippocampus mediate objective binding or subjective remembering?',
 'Role of hippocampal CA1 atrophy in memory encoding deficits in amnestic Mild Cognitive Impairment.',
 'Dose-dependent effect of the Val66Met polymorphism of the brain-derived neurotrophic factor gene on memory-related hippocampal activity.',
 'The role of hippocampus dysfunction in deficient memory encoding and positive symptoms in schizophrenia.',
 'Investigating the effect of hippocampal sclerosis on parietal memory network',
 'Differences between memory\xa0encoding and retrieval failure in mild cognitive impairment: results from quantitative electroencephalography and magnetic resonance volumetry',
 'Long-term retrograde amnesia...the crucial role of the hippocampus.',
 'Additive genetic effect of APOE and BDNF on hippocampus activity.']

## SmolLM3-3B + Reasoning

The LM is passed two things and asked to reason:

1. The user query
2. Publications related to the user query


Note:

The text between the \<think\> tags in the first half of the output is the LM's reasoning. The cell below hides it and prints only what follows \</think\>.
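One way to separate the reasoning from the answer is to buffer the streamed text until the closing tag appears. The sketch below assumes the tag could be split across chunks, which depends on the tokenizer, so it matches against the accumulated text rather than a single chunk. `strip_thinking` is a hypothetical helper, not part of llama_cpp.

```python
def strip_thinking(chunks, close_tag="</think>"):
    """Yield only the text that follows the first close_tag in a stream of chunks."""
    buffer = ""
    answering = False
    for chunk in chunks:
        if answering:
            yield chunk
            continue
        # Accumulate text so a tag split across chunks is still detected
        buffer += chunk
        _, sep, tail = buffer.partition(close_tag)
        if sep:
            answering = True
            if tail:
                yield tail

# Simulated stream in which the closing tag arrives in two pieces
demo = ["<think>", "plan", "</th", "ink>", "\n", "Answer", " text"]
answer = "".join(strip_thinking(demo))
```

With a llama_cpp stream, the chunks could be supplied as `(c["choices"][0]["delta"].get("content", "") for c in stream)`. If the tag never appears, nothing is yielded.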

In [6]:
# Context to give the LM
context  = "Summarize the high level ideas of the topic in."
# context += "2. Select a subset of related publications. "
# context += "3. Use in text citations, [1], [2], etc, at the end of related statements. "

messages = [
    {"role": "system", "content": context},
    {"role": "user", "content": f"Topic: {query}.\n\nSupplemental publications: " + "\n" + papers}
]

stream = llm.create_chat_completion(messages=messages, stream=True, temperature=0.6)
start_stream = False  # set to True once the closing </think> tag has been seen
for chunk in stream:
    delta = chunk['choices'][0]['delta']
    if 'content' in delta and start_stream:
        # Reasoning is finished: print the answer as it arrives
        print(delta['content'], end="", flush=True)
    elif 'content' in delta:
        # Still inside the reasoning: print nothing, watch for the closing tag
        if delta['content'] == "</think>":
            start_stream = True


The role of the hippocampus in memory formation is multifaceted and involves both encoding and retrieval processes. Research indicates that the hippocampus plays a critical role in episodic memory, particularly in the encoding phase, where it supports the integration of new information into existing memory structures. Studies have shown that the hippocampus is involved in conflict-induced memory improvement, enhancing the retention of conflict stimuli. However, the relationship between the hippocampus and conflict processing remains less understood, especially in patients with hippocampal lesions.

The hippocampus seems to facilitate the interaction between conflict processing and memory formation, enabling better memory retention for conflict stimuli. This function is likely disrupted in individuals with hippocampal damage, as evidenced by impaired conflict-induced memory improvement in patients with mesial temporal lobe epilepsy (MTLE).

Other studies suggest that the ventromedial p

## Thinking

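The model can "think" before answering, which can improve reasoning but makes generation slower. The cell below prints the full output, including the \<think\> block.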
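The SmolLM3 model card describes `/think` and `/no_think` flags in the system prompt for switching extended thinking on and off. The sketch below builds the message list with such a flag. Whether the flag takes effect here is an assumption: it depends on the chat template stored in the GGUF file being applied by llama_cpp. `build_messages` and its default `instruction` string are hypothetical.

```python
def build_messages(query, papers, thinking=True,
                   instruction="Summarize the high level ideas of the topic."):
    """Build chat messages with a thinking flag appended to the system prompt."""
    # Assumption: the model's chat template honors /think and /no_think
    flag = "/think" if thinking else "/no_think"
    return [
        {"role": "system", "content": f"{instruction} {flag}"},
        {"role": "user",
         "content": f"Topic: {query}.\n\nSupplemental publications: \n{papers}"},
    ]

# Placeholder inputs for illustration
msgs = build_messages("example topic", "[1] Title\nAbstract\n", thinking=False)
```

The returned list can be passed as `messages=` to `llm.create_chat_completion`, and comparing outputs with `thinking=True` and `thinking=False` would show whether the flag is respected in this setup.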

In [7]:
# Context to give the LM
context  = "Summarize the high level ideas of the topic in. "
# context += "2. Select a subset of related publications. "
# context += "3. Use in text citations, [1], [2], etc, at the end of related statements. "
context += "<think></think>"

messages = [
    {"role": "system", "content": context},
    {"role": "user", "content": f"Topic: {query}.\n\nSupplemental publications: " + "\n" + papers}
]

stream = llm.create_chat_completion(messages=messages, stream=True, temperature=0.6)
# No filtering here: print the whole stream, including the <think> block
for chunk in stream:
    delta = chunk['choices'][0]['delta']
    if 'content' in delta:
        print(delta['content'], end="", flush=True)

<think>
Okay, let's tackle this query. The user is asking about the role of the hippocampus in memory formation, with a focus on the provided supplementary publications. They want a concise summary of the key ideas from each study, highlighting the different aspects of hippocampal function related to memory.

First, I need to go through each of the five studies one by one. Let me start with the first one. Study [1] talks about the hippocampus in conflict resolution and memory formation in epilepsy patients. The main points here are that the hippocampus contributes to resolving conflicts during memory encoding, but in patients with hippocampal sclerosis, there's no memory improvement. They found that in controls, the hippocampus is involved in both conflict and memory, but in patients, it's more about conflict resolution affecting retrieval rather than memory. The conclusion is that the hippocampus might reorganize its role when damaged, but more research is needed.

Next, study [2] foc

**Role of the Hippocampus in Memory Formation: A Summary of Key Findings**

The hippocampus plays a central role in memory formation, influencing both episodic memory and conflict resolution. Here’s a concise summary of key insights from the provided studies:

1. **Conflict Resolution and Memory in Epilepsy**:  
   - The hippocampus contributes to resolving conflict during memory encoding, enhancing memory for conflicting stimuli in healthy controls.  
   - In patients with hippocampal sclerosis (e.g., MTLE), there is no memory benefit; instead, conflict resolution affects retrieval, suggesting a reorganization of hippocampal function.  
   - This highlights the hippocampus’s dual role in conflict processing and memory formation, with potential reorganization in damaged states.

2. **Ventromedial Prefrontal Cortex (vmPFC) in Memory Formation**:  
   - The vmPFC is critical for memory formation when new information aligns with existing knowledge (knowledge congruency).  
   - In studies where information is unrelated to prior knowledge (low congruency), vmPFC involvement in memory formation is diminished, clarifying its specialized role in context-specific memory integration.

3. **Binding vs. Remembering in the Hippocampus**:  
   - The hippocampus primarily mediates **context binding** (objective memory of contextual details) rather than **subjective remembering** (detailed recall).  
   - The parietal cortex, particularly the left inferior parietal cortex, is linked to subjective remembering, suggesting a functional distinction between the hippocampus and parietal regions in memory processing.

4. **Hippocampal Atrophy and Memory Deficits in aMCI**:  
   - **Encoding** deficits are strongly associated with **CA1** hippocampal atrophy, indicating a specialized role for the CA1 subfield in memory encoding.  
   - **Retrieval** deficits are linked to **white matter** loss in parietal and frontal areas, suggesting a broader network disruption in retrieval.

5. **BDNF Val66Met Polymorphism and Hippocampal Activity**:  
   - The Met-BDNF allele is associated with reduced hippocampal activity during encoding, with effects dependent on the number of Met alleles (dose-dependent).  
   - This suggests that BDNF signaling in the hippocampus is crucial for memory encoding, and the polymorphism may influence neuroplasticity and memory formation.

**Implications**:  
- The hippocampus’s role in memory is multifaceted, involving conflict resolution, context binding, and BDNF-dependent encoding.  
- Distinct brain regions (e.g., parietal cortex) contribute to subjective memory, while the hippocampus focuses on objective memory structures.  
- Understanding these interactions can inform therapeutic strategies for memory impairments, particularly in conditions like aMCI and epilepsy.  
- Genetic factors, such as the Val66Met polymorphism, further highlight variability in hippocampal function, emphasizing the importance of personalized approaches in neuroimaging and treatment.