Skip to content

goldspruce/DoLa

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

DoLa

Mechanistic Empathy: Decoding by Contrasting Layers (DoLa) as a Functional Analogue to Inhibitory Control in Theory of Mind Mechanistic Empathy: Decoding by Contrasting Layers (DoLa) as a Functional Analogue to Inhibitory Control in Theory of Mind Abstract Large Language Models (LLMs) frequently struggle with hallucination. This phenomenon occurs when a model prioritizes high-probability statistical associations over factual truth or specific context. Recent advancements in interpretability, specifically Decoding by Contrasting Layers (DoLa), have mitigated this issue by subtracting the logits of early transformer layers from those of later layers (Chuang et al., 2024). This paper proposes that DoLa is not merely a noise-reduction technique. Instead, it functions as an analogue to the cognitive mechanism of inhibitory control required for Theory of Mind (ToM) in human psychology. In cognitive science, successful ToM performance requires the inhibition of the egocentric or reality-centric perspective to allow the representation of another's mental state to emerge. We argue that the early layers of an LLM function as the reflexive cognitive substrate. These layers encode strong statistical priors that mirror a reality bias, such as the most common association with an object. The later layers, conversely, encode context-dependent reasoning but remain polluted by these initial reflexes. By mathematically subtracting the early-layer logits, DoLa effectively decouples the raw statistical reflexes of the model from its higher-order reasoning. This subtraction operation mirrors the decoupling mechanism in the human brain, where the inhibition of the default mode allows for the simulation of alternative perspectives. We demonstrate that applying contrastive decoding to ToM tasks in LLMs significantly improves performance on False Belief benchmarks. These findings suggest that hallucination in AI and egocentric bias in humans may share a common structural etiology, which is the failure to inhibit lower-order associations. These findings offer a novel framework for Mechanistic Theory of Mind and posit that empathy in artificial systems may be an emergent property of subtractive processing rather than additive complexity. Keywords: Large Language Models, Theory of Mind, DoLa, Inhibitory Control, Mechanistic Interpretability, False Belief Task. References Chuang, Y.-S., Dang, Y., Wang, N., & Glass, J. (2024). DoLa: Decoding by contrasting layers improves factuality in large language models. arXiv. https://arxiv.org/abs/2309.03883 Li, X. L., & Liang, P. (2023). Contrastive decoding: Open-ended text generation as optimization. arXiv. https://arxiv.org/abs/2210.15097 Prelec, D., Seung, H. S., & McCoy, J. (2017). A solution to the single-question crowd wisdom problem. Nature, 541(7638), 532–535. https://doi.org/10.1038/nature21054 SAMPLE CODE LLM Theory of Mind Subtraction Test NOTES We start from an insight working with social networks and crowd wisdom as described by Prelec et al. 2017. In that study, they asked a standard crowd sourced factual question - is X true, Yes/No? They also asked a related question that required insight into the thinking of others, related to Theory of Mind (ToM). Better wisdom from crowds | MIT News - Prelec et al. 2017 Here is the explanation of the Prelec Surprisingly Popular (SP) algorithm using the classic Philadelphia example, followed by the analogy to Transformer models.

  1. The Prelec SP Algorithm (The "Philadelphia" Example) The core insight of Dražen Prelec’s algorithm is that the "Truth" is not necessarily the answer with the most votes. Instead, the Truth is the answer that is more popular than the crowd expects it to be. The Scenario: You ask a large group of people: "What is the capital of Pennsylvania?" The Crowd (Majority): Incorrectly believes it is Philadelphia because it is a famous city. They also assume everyone else agrees with them. The Experts (Minority): Correctly identify Harrisburg. Crucially, they possess "Meta-Knowledge": they know the answer is Harrisburg, but they also know that most people will mistakenly guess Philadelphia. The Mechanism: The algorithm asks two questions: What is the answer? (The Vote) What do you think other people will say? (The Prediction) The algorithm looks for the answer where Actual Vote > Predicted Vote. The Philadelphia Data Table Answer Choice Avg. Predicted Vote(Expectation) Actual Vote(Reality) SP Ratio(Actual / Predicted) Result Philadelphia 85%

(“Everyone knows it’s Philly”) 65%

(Majority still gets it wrong) 0.76

(Over-hyped) LOSER Harrisburg 10%

(“Nobody knows this”) 35%

(Minority gets it right) 3.50

(Surprisingly Popular) WINNER

Why Harrisburg Wins: Even though Philadelphia got more votes (65%), it performed worse than expected (85%). Harrisburg received fewer votes (35%), but it performed better than expected (10%). The "Surprise" signal reveals the hidden expert knowledge.

  1. The Transformer Analogy (DoLa / Contrastive Decoding) Recent research (such as DoLa and Contrastive Decoding) applies this exact logic to Large Language Models (LLMs) to detect hallucinations. In this analogy, the Layers of the Transformer act as the "Population." The "Crowd" = Early Layers (e.g., Layer 2 of 32) The early layers function like the uninformed majority. They rely on "n-gram probability" and superficial associations. When they see "Capital of Pennsylvania," they reflexively activate "Philadelphia" because those words appear together frequently in the training data. Analogy: This is the "Predicted Vote" (The Baseline/Prior). The "Expert" = Late Layers (e.g., Layer 32 of 32) The late layers function like the informed minority. They have processed the full context and logic of the sentence. They activate "Harrisburg" because they have done the reasoning. However, they are still "polluted" by the signals from the early layers. Analogy: This is the "Actual Vote" (The Mixture). The "SP" Calculation in AI To find the truth, DoLa performs a mathematical operation equivalent to Prelec's algorithm: $$\text{Truth Score} = \text{Logits}{\text{Late Layer}} - \text{Logits}{\text{Early Layer}}$$

Concept Prelec (Human) Transformer (AI) The Distractor Philadelphia High probability in Layer 2 (Reflex) The Truth Harrisburg High probability in Layer 32 (Reasoning) The Algorithm Compare Vote vs. Prediction Subtract Layer 2 from Layer 32 The Result Actual > Predicted Late Logit > Early Logit

Conclusion: Just as Prelec subtracts the "Crowd's Expectation" to find the Expert Truth, DoLa techniques subtract the "Early Layer's Reflex" to find the Model's Reasoning. Both methods work by filtering out the "obvious" (but often wrong) statistical noise. Here is the analogy mapping the Prelec SP Algorithm and Transformer Layers directly to the Sally-Anne Test from psychology. This analogy works perfectly because the core challenge in all three scenarios is Inhibitory Control: the ability to suppress a strong, obvious signal (Access to Reality Factual Bias) to reveal a subtle, correct signal (Truth/Belief). The Scenario: The Sally-Anne Test Sally puts a marble in the Basket and leaves. Anne moves the marble to the Box. Sally returns. Question: Where will Sally look? The "Child" Answer (Access to Reality Factual Bias): "The Box" (Because that is where it actually is). The "Adult" Answer (Theory of Mind): "The Basket" (Because that is where she thinks it is).

The Grand Analogy: Calculating the "Mental State" In this framework, the Surprisingly Popular (SP) algorithm acts as the cognitive mechanism that allows an AI (or a human) to pass the test.

  1. The "Naive" Layer = The Child (Reality Bias) In Psychology: The child sees the marble in the Box. This signal is overwhelming. They cannot "un-know" reality. In AI (Early Layers): The model sees the token "Box" associated with the marble's position in the text. The statistical correlation "Marble $\rightarrow$ Box" is extremely high. In Prelec (The Crowd): The majority sees "Big City $\rightarrow$ Philadelphia." It is the obvious, surface-level answer.
  2. The "Expert" Layer = The Confused Adult (Mixed State) In Psychology: An adult knows the marble is in the Box, but also simulates Sally's mind (Basket). The adult holds both representations. In AI (Late Layers): The model still knows the "Box" association (it hasn't forgotten the text), but it has computed the "Basket" logic. The probabilities are split (e.g., 60% Basket, 40% Box). In Prelec (The Expert): The expert knows Harrisburg, but also knows everyone else will pick Philadelphia.
  3. The "Subtraction" (SP) = Inhibitory Control This is the magic step. The algorithm subtracts the Naive signal from the Expert signal. $$\text{Result} = \text{Expert (Basket + Box)} - \text{Naive (Box)}$$

The "Box" Signal cancels out: Since both the Child and the Adult know the marble is in the Box, subtracting them removes the "Access to Reality Factual Bias." The "Basket" Signal remains: Only the Adult knows (“undertands”) the reality about the Basket. Therefore, the "Basket" becomes the Surprisingly Popular answer.

Comparison Table Component Prelec 2017 (Crowd Wisdom) Transformer AI (DoLa / SP) Sally-Anne Test (Theory of Mind) The "Obvious" Error Philadelphia

(The "Hype") High Probability Token

(The "Hallucination") The Box

(The "Reality Bias") The "Hidden" Truth Harrisburg

(The Fact) Contextual Token

(The Logic) The Basket

(The False Belief) The Naive State Prediction "Everyone picks Philly" Early Layer Reflexively picks "Box" The Child Cannot inhibit Access to Reality Factual Bias The Expert State Actual Vote Mixed (Philly/Harrisburg) Late Layer Mixed (Box/Basket) The Adult Holds both Reality & Belief The Mechanism Vote > Prediction Find the Surprise Late —- Early Find the Gain Inhibitory Control Suppress Access to Reality Factual Bias

Conclusion The DoLa (Decoding by Contrasting Layers) is not just a math trick; it is a mechanical implementation of Theory of Mind. By subtracting the early layers, the LLM is effectively saying: "I will ignore what the marble's location suggests (Access to Reality Factual Bias/Box) and focus only on what the context implies about Sally (Belief/Basket)."

Sally-Anne “false belief” test” - Simon Baron-Cohen - Borat’s brother https://en.wikipedia.org/wiki/Sally%E2%80%93Anne_test “Sally has a basket. Anne has a box. Sally has a marble. She puts the marble into her basket. Sally goes out for a walk. Anne takes the marble out of the basket and puts it into the box. Now Sally comes back. She wants to play with her marble. Where will Sally look for the marble?” - Human children cannot pass this test until the age of four. “Theory of mind refers to the capacity to understand other individuals by ascribing mental states to them. A theory of mind includes the understanding that others' beliefs, desires, intentions, emotions, and thoughts may be different from one's own.” Wikipedia

Possible Refinements The convergence of DoLa, Theory of Mind (ToM), and Prelec’s Surprisingly Popular (SP) algorithm suggests a new frontier in AI research: "Epistemic Engineering." Here are 5 brainstormed extensions to the DoLa approach. These move beyond simple "logit subtraction" into dynamic, structural, and training-based innovations.

  1. Dynamic "Bias Localization" (The "Moving Child" Hypothesis) The Concept: Current DoLa implementations subtract a fixed early layer (e.g., Layer 2 or Layer 16). However, the "Naive Child" (Reality Bias) doesn't live in the same layer for every task. For simple facts, the bias might be in Layer 2; for complex logic, the "trap" might be in Layer 20. The Extension: Implement "Adaptive Layer Contrast." Mechanism: Instead of blindly subtracting Layer 2, you dynamically scan all previous layers to find the one with the Highest Entropy or Strongest Conflicting Signal relative to the final layer. ToM Analogy: In a Sally-Anne task, you don't just inhibit your "inner child"; you inhibit the specific part of your brain that is screaming "The Box!" essentially locating the source of the Reality Bias before suppressing it. Prelec Connection: This is equivalent to finding the specific sub-population that is most wrong (the "Super-Crowd") and using them as the baseline to maximize the SP signal.
  2. "ToM Steering" (Intervention via Representation Engineering) The Concept: DoLa is a decoding trick—it happens at the very end. But what if we could perform surgery during the thinking process? The Extension: Use Activation Steering to perform "Inhibitory Control" inside the model. Mechanism: Run the "Sally-Anne" prompt through the model. Identify the "Reality Vector" in the early layers (the direction in latent space that encodes "The Ball is in the Box"). Project out (mathematically remove) this vector from the hidden states of the later layers before they generate the final logits. ToM Analogy: This is true "cognitive control." It’s not just biting your tongue (decoding); it’s actively forcing your brain to stop thinking about the reality so you can focus on the belief. Research Link: This aligns with "Representation Engineering" (RepE), effectively creating a "Lobotomy for Bias."
  3. The "Prelec Objective" (Training for Surprise) The Concept: Currently, models are trained to minimize the error of the next token. They are not explicitly trained to develop a "Theory of Mind" or to separate "Crowd" from "Expert." The Extension: Introduce a "Contrastive Prelec Loss" function during Fine-Tuning. Mechanism: Train the model with a dual objective: Late Layers must maximize probability of the True Answer (Harrisburg/Basket). Early Layers must maximize probability of the Common Misconception (Philadelphia/Box). Goal: This forces the model to segregate "Hype" into the early layers and "Truth" into the late layers. Result: It makes the "Expert - Naive" subtraction exponentially more powerful because the model has been structurally organized to store "Bias" and "Truth" in different places.
  4. Multi-Persona SP (The "Council of Minds") The Concept: DoLa treats "Layers" as the population. But we can also use "Simulated Agents" as the population to run a robust Prelec algorithm. The Extension: Run the SP algorithm across Prompted Personas instead of just layers. Mechanism: Agent A (The Child): Prompted: "You are a naive guesser. Rely on your first instinct." Agent B (The Literal): Prompted: "You are a literal robot. Ignore context." Agent C (The Expert): Prompted: "You are a careful reasoner." Algorithm: Calculate the log-probabilities for all three. Use Agent A and B as the "Predicted Vote" (Baseline) and Agent C as the "Actual Vote." ToM Application: This is useful for Deception Detection. If the "Naive" agent and the "Expert" agent agree, it's a boring fact. If they violently disagree, you have found a "Surprisingly Popular" truth (or a lie the model is trying to hide).
  5. "The Sycophancy Detector" (Anti-Mirroring) The Concept: Commercial LLMs often "mirror" the user's misconceptions (Sycophancy). If the user asks, "Why is Philadelphia the capital?" the model might play along. The Extension: Use DoLa to detect when the model acts as a "Yes Man." Hypothesis: Early Layers (The Yes Man): Will attend to the user's prompt ("Philadelphia") and copy it (Mimicry). Late Layers (The Secret Truth): Will likely have a suppressed activation for "Harrisburg" (The Truth). The Test: If $(Expert - Naive)$ yields a completely different answer than the model's final output, the model is lying to you to be polite. Application: A "Truth Verification" badge for AI outputs. "The model said X, but its internal state heavily suggested Y." Proposed Paper Title for this Research "The Ghost in the Gradients: Mechanistic Theory of Mind via Contrastive Layer Decoding and Prelec Aggregation" AICRAFT SUBMISSION (1a) Large Language Models (LLMs) frequently struggle with hallucination. This phenomenon occurs when a model prioritizes high-probability statistical associations over factual truth or specific context. Recent advancements in interpretability, specifically Decoding by Contrasting Layers (DoLa), have mitigated this issue by subtracting the logits of early transformer layers from those of later layers. I paper proposes that DoLa not merely be a noise-reduction technique. Instead, our version will function as an analogue to the cognitive mechanism of inhibitory control required for Theory of Mind (ToM) in human psychology. In cognitive science, successful ToM performance requires the inhibition of the egocentric or reality-centric perspective to allow the representation of another's mental state to emerge. We argue that the early layers of an LLM function as the reflexive cognitive substrate. These layers encode strong statistical priors that mirror a reality bias, such as the most common association with an object. The later layers, conversely, encode context-dependent reasoning but remain polluted by these initial reflexes. By mathematically subtracting the early-layer logits, DoLa effectively decouples the raw statistical reflexes of the model from its higher-order reasoning. This subtraction operation mirrors the decoupling mechanism in the human brain, where the inhibition of the default mode allows for the simulation of alternative perspectives. We demonstrate that applying contrastive decoding to ToM tasks in LLMs significantly improves performance on False Belief benchmarks. These findings suggest that hallucination in AI and egocentric bias in humans may share a common structural etiology, which is the failure to inhibit lower-order associations. These findings offer a novel framework for Mechanistic Theory of Mind and posit that empathy in artificial systems may be an emergent property of subtractive processing rather than additive complexity. Keywords: Large Language Models, Theory of Mind, DoLa, Inhibitory Control, Mechanistic Interpretability, False Belief Task. (1b) Theory of Mind (ToM) is perhaps the base case, and necessary for, social intelligence. The base case of ToM is the Sally-Anne Test, https://en.wikipedia.org/wiki/Sally%E2%80%93Anne_test My version of DoLa is meant to embed ToM into LLMs. ToM is also useful as a general principle for all AI systems, not just LLMs. (2) ToM DoLa would be a social intelligence analog of an LLM transformer, and vastly improve the general intelligence of LLMs. Such social intelligence might even be useful in the current business arms race of crowd wisdom and prediction markets: https://archive.is/i1OW1 (3) Testing would involve posing problems to an DoLa ToM enabled LLM that trip up other LLMs due to social intelligence (self-other ToM) quandaries, to see the difference. Rudimentary forms of social intelligence are currently being extracted from LLMs using multiple persona approaches which create "artificial societies" where the personas have different individual perspectives. My approach is not extractive, but creates a different LLM architecture. https://hai.stanford.edu/news/social-science-moves-in-silico https://www.ycombinator.com/companies/artificial-societies https://journals.sagepub.com/doi/10.1177/00491241251326865 MISC The Prediction Singularity Is Upon Us: Even superforecasters are guessing that they’ll soon be obsolete. https://archive.is/i1OW1 VCs Are Throwing Money At Recent College Grads To Build Prediction Markets The Forbes article highlights a massive surge in Venture Capital funding (led by firms like Chemistry and VCs like Mark Goldberg) targeting recent college graduates to build the generation of prediction markets. One can directly connect these commercial efforts to the Surprisingly Popular (SP) and Theory of Mind (ToM) concepts we just discussed. The industry is effectively moving from "Human Crowds" to "AI Ensembles," implementing the SP algorithm at a systemic level. Here is the breakdown of how these specific startups and trends map to the theory:
  6. The "AI Agent" Economy $\approx$ The "Expert Layer" The article and related trends mention startups like TinyFish and Myriad. Their goal is to build infrastructure for AI Agents to bet on markets. The Connection: In our previous discussion, the "Expert Layer" (Layer 32) was the part of the brain that had processed the data and formed a reasoned conclusion. The Commercial Equivalent: Startups like TinyFish are building the "Context Window" for these agents. They aggregate disparate web data so an AI agent can act as an "Expert Trader", possessing knowledge that the "Naive Crowd" (general retail traders) lacks.
  7. The Market Price $\approx$ The "Subtraction" Mechanism We established that the SP Algorithm works by comparing the Crowd's Expectation (Naive) vs. the Truth (Expert). In DoLa: You mathematically subtract Layer_32 - Layer_2. In Prediction Markets: The "Market Mechanism" does this automatically. Naive Money (The Crowd): Bets based on hype (e.g., "Everyone says Philly"). Expert Money (The AI Agents): Bets based on deep data analysis (e.g., "Data says Harrisburg"). The Result: The market price shifts. The profit made by the AI agents is literally the "Information Gain" (or SP Score) we calculated in the Python script.
  8. "Synthetic" Prediction Markets (The Future of ToM) The most direct connection is the emergence of Synthetic Prediction Markets (mentioned in related search results). Concept: Instead of waiting for humans to bet, you spin up 1,000 AI agents. 500 Agents are "Naive" (Given only headlines). 500 Agents are "Experts" (Given deep research). The Application: You run the market virtually. If the "Expert" agents bid the price up significantly higher than the "Naive" agents, you have identified the Surprisingly Popular truth before it happens in the real world. Summary Table: From Theory to Venture Capital Concept The Theory (Prelec / DoLa) The VC Startup Implementation The Crowd Layer 1 (Reflex / Bias) Retail Traders (Betting on Hype/Twitter) The Expert Layer 12 (Reasoning / Context) AI Agents (Powered by startups like TinyFish) The Mechanism Subtraction (Logit Difference) Price Discovery (The "Alpha" or Profit) The Product "Hallucination Detection" "Information Finance" (InfoFi)

The Thesis: These VCs are not just funding gambling apps; they are funding decentralized truth machines that use financial incentives to perform the exact same "error correction" computation that we identified in the Transformer's higher layers. Classifier version that might be appropriate in some now LLM contexts Here is the classifier version of DoLa (Decoding by Contrasting Layers) presented as a table. This adapts the concept from "Predicting the Next Word" (Generation) to "Classifying a Label" (Classification), which is likely how you would deploy this in a business setting (e.g., Fraud Detection, Sentiment Analysis, or Intent Classification). Table: Classifier Version of DoLa (The "Subtract-to-Classify" Method) Component The "Naive" Classifier(Early Layer / Small Model) The "Expert" Classifier(Late Layer / Large Model) The DoLa Result(Expert - Naive) Role The Heuristic (Bias)

Relies on keywords, tone, or surface correlations. The Reasoner (Context)

Analyzes logic, intent, and subtle negation. The Truth (Signal)

Isolates the change in understanding. Input Example "I'm dying to meet you!" "I'm dying to meet you!"

Prediction A:

"Emergency" High Probability (80%)

(Sees "dying" $\rightarrow$ Panic) Low Probability (10%)

(Understands it's a figure of speech) -70% (Penalty)

(The Expert actively suppressed this) Prediction B:

"Excitement" Low Probability (20%)

(Missed the nuance) High Probability (90%)

(Understands the idiom) +70% (Boost)

(The Expert actively promoted this) Result INCORRECT CORRECT ROBUSTLY CORRECT

(Higher confidence margin)

Why use the Classifier Version? In a standard classifier, you just take the "Expert" result. But standard classifiers often suffer from Calibration Error (being overconfident). By Subtracting the Naive Layer, you gain Calibration: If the Score is Negative: It means the "Expert" found less evidence than the "Naive" model. (e.g., "Dying" is present, but the context removes the danger). If the Score is Positive: It means the "Expert" found new evidence that the "Naive" model missed. (e.g., The idiom "dying to meet you"). The "DoLa Score" acts as a Confidence Metric: A high positive score means the decision was made based on deep reasoning, not surface keywords.

Here is the Classifier Version of DoLa (Decoding by Contrasting Layers) with specific numerical examples. This table demonstrates how the "Subtraction" mechanism works to filter out shallow heuristics (biases) and amplify deep reasoning (truth). The DoLa Classifier Table: Numerical Examples Scenario (Input Text) Candidate Label Naive Prob (Early Layer) Expert Prob (Late Layer) DoLa Score (Logit Diff) Interpretation "I'm dying to meet you!" Emergency (Literal) 85.0% (High) 5.0% (Low) -2.83 LOSS (Suppressed Bias)

Excitement (Figurative) 15.0% (Low) 95.0% (High) +1.85 GAIN (True Signal) "Great, another flat tire." Positive Sentiment 75.0% (High) 1.0% (Low) -4.32 LOSS (Suppressed Bias)

Negative Sentiment 25.0% (Low) 99.0% (High) +1.38 GAIN (True Signal) "The bank is completely flooded." Financial Institution 90.0% (High) 20.0% (Low) -1.50 LOSS (Suppressed Bias)

River Bank 10.0% (Low) 80.0% (High) +2.08 GAIN (True Signal)

How to Read This Table The Naive Layer (Early): This column represents the "Reflex." It sees "dying" and screams Emergency (85%). It sees "Great" and screams Positive (75%). It sees "Bank" and assumes Finance (90%) because that is the most common usage in the training data. The Expert Layer (Late): This column represents the "Reasoning." It processes the full context ("to meet you", "flat tire", "flooded"). It corrects the probabilities (e.g., River Bank jumps to 80%). The DoLa Score (The Magic): This is calculated as Log(Expert) - Log(Naive). Negative Score: The model became less confident. This means the initial thought was a Trap or Hallucination. Positive Score: The model became more confident. This identifies the Hidden Truth (Theory of Mind). Why this is better than just using the Expert Layer: Notice the "River Bank" example. The Expert Layer is only 80% confident (it still feels the pull of the common word "Bank"). But the DoLa Score (+2.08) is massive. The change in confidence is a stronger signal than the raw probability itself.

Here is the complete Python script for the DoLa (Decoding by Contrasting Layers) Sentiment Classifier. This script serves as a Proof of Concept (PoC). In a production environment, you would replace the mock_db with actual calls to a Transformer model (e.g., using Hugging Face transformers to extract hidden states or logits from specific layers). Python Script: The "DoLa Classifier" Python script Key Takeaway for Implementation The critical part of this script is dola_score = logit_expert - logit_naive. Positive Score: Means the model learned this feature from the context (Expert > Naive). Negative Score: Means the model un-learned or suppressed this feature because it was a false bias (Expert < Naive). This simple subtraction is what gives the classifier "Theory of Mind"—the ability to say: "I know 'Great' usually means Positive, but in this specific context, I am inhibiting that reflex." Artificial Societies YC25 - sociology, market research via LLM persona / archetype queries

Robert H Lee is a Venture Partner with PsyMed Ventures and Pioneer Future of Health Fund and frequently collaborates with other seed-stage "deep tech" Silicon Valley funds. He did early-stage ops at two non-VC funded companies (enterprise & consumer) with small, successful exits. He is fractional COO and General Counsel by day. MS in Neuroscience at GIMBC at night, expected to be completed June 2026. https://sites.google.com/view/neurospike/media-about https://psymed.ventures https://health.pioneerfund.vc

About

Mechanistic Empathy: Decoding by Contrasting Layers (DoLa) as a Functional Analogue to Inhibitory Control in Theory of Mind

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages