# InFact: Building Trust in Science through Collaborative Evaluation
*A Gaia Lab project*

<br>

> What we should do is create an institution that collects and evaluates scientific evidence and gives out confidence values based on evidence. -- [Sabine Hossenfelder](https://www.youtube.com/watch?v=zucXnn64qtk&t=314s)

The InFact Project is our attempt to realize this vision. We're building a prototype for a decentralized system that evaluates scientific claims and provides a clear measure of confidence based on available evidence.  Imagine a collaborative platform where scientists and the public can work together to assess the reliability of scientific findings, supported by AI and rigorous automated statistics. This is the core idea behind InFact.

## How does InFact work?

At its heart, InFact uses a network of interconnected nodes. Each node focuses on a specific scientific question, like "Do human-generated greenhouse gas emissions significantly increase global temperatures?"

Within each node, a sophisticated "inference engine" analyzes data related to the question. This engine combines the power of artificial intelligence (specifically, large language models or LLMs) with Bayesian statistics, a mathematical framework for updating beliefs based on evidence.

Breaking down the process:

* Data collection:  The node gathers data from various sources (research papers, datasets, etc.) related to the scientific question.

* AI-powered analysis:  LLMs are used to automatically extract key information from the data, identify relevant studies, and even assess the quality of the evidence.

* Bayesian updating:  The system uses Bayesian methods to weigh the evidence and update a "confidence score" for the scientific claim. This score reflects the strength of the evidence supporting the claim.

* Transparency and traceability:  All data, analyses, and confidence scores are recorded and made available for scrutiny. This ensures transparency and allows for continuous improvement of the system.

## Addressing the Challenges of Data Analysis

One of the biggest challenges in evaluating scientific claims is the sheer diversity and complexity of scientific data. InFact tackles this challenge by using LLMs to generate custom data analysis pipelines for each new piece of evidence. We use frontier off-the-shelf LLMs (currently, Claude 3.5 Sonnet). These AI models are pre-trained on vast amounts of scientific literature, allowing them to adapt to different types of studies and data formats.

## Beyond the Prototype

While our current prototype relies heavily on LLMs, we recognize the need for even greater rigor. Our team is developing a framework for "automatic progressive data analysis." This framework will combine the flexibility of LLMs with the reliability of established statistical models, creating a more robust and trustworthy system for evaluating scientific claims.

## InFact in Action

We envision InFact as a user-friendly platform that presents complex scientific information in a clear and accessible way. Imagine interactive visualizations that show how confidence scores evolve as new evidence emerges, along with explanations that help users understand the reasoning behind the scores.

## The Future of Scientific Confidence

InFact is more than just a technology; it's a vision for a future where scientific knowledge is more accessible, transparent, and trustworthy. By empowering scientists and the public to collaboratively evaluate evidence, we can foster a deeper understanding of science and its role in shaping our world.

## The Gaia Network

InFact is also envisioned as a demonstration of the capabilities of the [Gaia Network Protocol](https://gaia-lab.de), the Gaia Lab's main project. Visit our website to learn more.






# Technical description (assumes math/stats background, feel free to skip otherwise)

Here we propose **InFact**: a prototype implementation for such a scientific institution, specifically as a decentralized model-based inference engine combining generative AI and Bayesian statistics. InFact is also an implementation of a subset of the [Gaia Network Protocol](https://engineeringideas.substack.com/p/gaia-network-an-illustrated-primer).

An InFact node consists of:
* A natural-language description of a hypothesis $H$, a binary statement that represents the scope of this node -- i.e., the scientific question the node is supposed to provide confidence values for. Our standing example here will be: "Human-generated GHG emissions significantly increase global temperatures."
* A *model* composed of:
  * A *likelihood function* that receives a data point $D_i$ and calculates the positive and negative log-likelihoods $l^+_i = \log P(D_i\ |\ H, D_{<i}), l^-_i = \log P(D_i\ |\ \neg H, D_{<i})$. These are simply the (log) probabilities of such data being observed assuming that $H$ is true and false, respectively.
  * A *posterior log-odds ratio* or belief state $\pi_i = \log \frac{P(H\ |\ D_1 \dots D_i)}{P(\neg H\ |\ D_1 \dots D_i)}$, which is updated by the likelihood function. Before observing any data points, this starts as a *prior log-odds ratio* $\pi_0$; then after the likelihood function runs, the posterior gets updated as $\pi_i = \pi_{i-1} + l^+_i - l^-_i$.
  * A set of *hyperparameters* to calculate confidence intervals around the posterior. These can simply be Beta distribution parameters $\alpha^{+}_i , \alpha^{-}_i$, updated as $\alpha^{+}_i = \alpha^{+}_{i-1} + P(H\ |\ D_1 \dots D_i), \alpha^{-}_i = \alpha^{-}_{i-1} + P(\neg H\ |\ D_1 \dots D_i)$. Assuming that $H$ has an underlying truth value $p^*$, this second-order distribution converges to a Dirac delta peaked at $p^*$.
* A simple database that stores the sequence of tuples $\{(D_i, l^+_i, l^-_i, \pi_i)\}_i$.
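
As a minimal sketch of the two update rules above, the following plain-Python snippet applies them to two made-up log-likelihood pairs. The function names `to_probability` and `update_node`, the flat starting values $\alpha^{+}_0 = \alpha^{-}_0 = 1$, and the numbers themselves are illustrative assumptions, not taken from the InFact codebase.

```python
import math

def to_probability(log_odds: float) -> float:
    """Map a log-odds ratio pi to P(H | data) via the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def update_node(pi_prev, alpha_plus, alpha_minus, l_plus, l_minus):
    """One update step: pi_i = pi_{i-1} + l+_i - l-_i, then the Beta parameters."""
    pi = pi_prev + l_plus - l_minus
    p = to_probability(pi)
    return pi, alpha_plus + p, alpha_minus + (1.0 - p)

# Assumed starting point: even prior odds (pi_0 = 0) and a flat Beta(1, 1).
pi, alpha_plus, alpha_minus = 0.0, 1.0, 1.0

# Made-up (l_plus, l_minus) pairs, each favouring H by one nat.
for l_plus, l_minus in [(-1.0, -2.0), (-0.5, -1.5)]:
    pi, alpha_plus, alpha_minus = update_node(pi, alpha_plus, alpha_minus, l_plus, l_minus)
    print(f"pi = {pi:.2f}, P(H | data) = {to_probability(pi):.3f}")
```

Each step splits exactly one unit of pseudo-count between the two Beta parameters, so $\alpha^{+}_i + \alpha^{-}_i$ grows by one per data point.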

At the end we also show a UX for displaying the evolution of $\pi_i$ and its interpretation.

The key challenge here is the model, specifically the likelihood function. Data comes in an immense variety of shapes and sizes; it is often imprecise and noisy. We also often don't have first-hand access to data, only to second-hand reports and summaries. It is impossible to define a likelihood upfront that adequately extracts and incorporates all this information into the $l^{+,-}_i$ updates -- doing so would be equivalent to having a universal data analysis workflow. Instead, we use pre-trained LLMs to perform a plausible approximation of this workflow, effectively generating a custom data analysis pipeline that is updated whenever new data is added. Contemporary LLMs have sufficient built-in knowledge of science and statistics to perform decently at this task. Furthermore, the LLM prompt and response are recorded, producing a complete rationale which can be inspected and improved upon. (This is sufficient for a prototype, but we can and should aspire to greater rigor. Our team is developing a framework for automatic progressive data analysis -- which uses LLMs to propose workflow components, but then crystallizes known-good components into a database of reusable and vetted models, rather than naively relying on LLMs for ab initio analysis every time.)

Here is the algorithm we implement:
* Receive a data file (e.g., a PDF, HTML, CSV, or PNG).
* [LLM] Parse the file into a set of data points.
* [LLM] Identify which data points are redundant (have been previously observed) and can be safely discarded.
* [LLM] Generate a likelihood model for the remaining data points, incorporating all metadata available from the file, as well as background knowledge that the modeler has. Relevant questions: Is this a meta-analysis or an observational study? Are uncertainties reported, or can they be calculated? How trustworthy is the source?
* Run the likelihood model and compute the log-likelihood updates.
* Compute the new posterior and Beta hyperparameters.
* Store updates in the database.
* Render the UX.
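
The control flow of the steps above can be sketched without any LLM calls. In this hedged example the three `[LLM]` steps are replaced by simple stand-ins: data points arrive already parsed, redundancy is an exact match on a hypothetical `id` field, and the likelihood model is a toy Gaussian. The names `run_update_cycle` and `toy_likelihood` are assumptions made for illustration; parsing, the Beta hyperparameters, and rendering are left out.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

DataPoint = Dict[str, Any]
Likelihood = Callable[[DataPoint], Tuple[float, float]]

def run_update_cycle(
    points: List[DataPoint],
    seen_ids: Set[str],
    likelihood: Likelihood,
    pi: float,
    history: List[Dict[str, Any]],
) -> float:
    """Discard redundant points, score the rest, update pi, and record each step."""
    for point in points:
        if point["id"] in seen_ids:          # stand-in for the LLM redundancy check
            continue
        seen_ids.add(point["id"])
        l_plus, l_minus = likelihood(point)  # stand-in for the generated model
        pi += l_plus - l_minus
        history.append({"data": point, "l_plus": l_plus, "l_minus": l_minus, "pi": pi})
    return pi

def toy_likelihood(point: DataPoint) -> Tuple[float, float]:
    """Unit-variance Gaussian log-densities (shared constant dropped):
    mean 1 under H, mean 0 under not-H. Purely illustrative."""
    x = point["x"]
    return -0.5 * (x - 1.0) ** 2, -0.5 * x ** 2

history: List[Dict[str, Any]] = []
points = [{"id": "a", "x": 1.5}, {"id": "b", "x": 0.5}, {"id": "a", "x": 1.5}]
final_pi = run_update_cycle(points, set(), toy_likelihood, 0.0, history)
print(final_pi, len(history))
```

The repeated point `"a"` is skipped on its second appearance, so only two rows reach the history, which plays the role of the database of $(D_i, l^+_i, l^-_i, \pi_i)$ tuples.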

# TODOs:
* Ability to upload data files to a watched directory, triggering an update cycle for the node.
* Ability to submit new human-curated/edited models for a data file, triggering an update cycle.
* Ability to generate multiple models per data file, ascribe weights to each model, and update the posterior according to a weighted average of the likelihood ratios.
* Ability to automatically scrape Web corpora for literature, NotebookLM-style, and upload relevant material as data files.
* APIs for submitting data files and models, and for querying the database.
* (Advanced) API for counterfactual queries: "how would the posterior change if we observed/did this"? This has two possible versions:
  * Natural-language queries, turned into statistical queries by the LLM. This is relatively expensive and brittle.
  * Queries expressed in a statistical format. These would need to assume a specific model version to make sense, and hence the API also needs to expose a stable data model/random variable ontology, which gets auto-updated whenever the underlying model changes.
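
For the multi-model item in the list above, one possible reading of "a weighted average of the likelihood ratios" is sketched here: normalize the model weights, average the per-model likelihood ratios, and add the log of that average to the posterior log-odds. The function name `averaged_log_ratio` and the `(weight, l_plus, l_minus)` tuple layout are hypothetical, and how weights would actually be ascribed is not specified in this document.

```python
import math
from typing import List, Tuple

def averaged_log_ratio(models: List[Tuple[float, float, float]]) -> float:
    """Return log( sum_k w_k * exp(l_plus_k - l_minus_k) ) with weights normalized to 1.

    Each entry is (weight, l_plus, l_minus) for one candidate model.
    """
    total = sum(w for w, _, _ in models)
    if total <= 0:
        raise ValueError("at least one model needs a positive weight")
    # Work in log space (log-sum-exp) so large ratios do not overflow.
    terms = [math.log(w / total) + (lp - lm) for w, lp, lm in models if w > 0]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Two made-up models: likelihood ratios 2 and 4, weights 3 and 1.
models = [(3.0, math.log(2.0), 0.0), (1.0, math.log(4.0), 0.0)]
delta = averaged_log_ratio(models)   # log(0.75 * 2 + 0.25 * 4) = log(2.5)
pi_new = 0.0 + delta                 # same additive update as for a single model
```

Averaging the two likelihoods separately before taking their ratio is another option and generally gives a different number; which of the two is intended is left open here.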

In [None]:
# @title
!pip install anthropic



In [None]:
# @title
!pip install autogen



In [None]:
# @title
import anthropic
from typing import Any, Dict, Union, List, Tuple
import json
import logging
import math  # needed by _execute_code_with_debug and _calculate_uncertainty
from pathlib import Path
import os
import base64
import pandas as pd
import numpy as np
from datetime import datetime
from autogen.code_utils import extract_code




In [None]:
# @title
class AnthropicInFactNode:
    def __init__(self,
                hypothesis: str,
                api_key: str,
                prior_log_odds: float = 0.0,
                log_level: int = logging.INFO):
        """Initialize node with logging configuration."""
        # Setup logging
        self.logger = logging.getLogger(__name__)
        self.logger.setLevel(log_level)

        # Create a unique log file for this instance
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        log_dir = Path("logs")
        log_dir.mkdir(exist_ok=True)

        file_handler = logging.FileHandler(
            log_dir / f"infact_{timestamp}.log",
            encoding='utf-8'
        )
        formatter = logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        )
        file_handler.setFormatter(formatter)
        self.logger.addHandler(file_handler)

        # Initialize API client
        self.api_key = api_key
        self.logger.info(f"Initializing AnthropicInFactNode with hypothesis: {hypothesis}")
        self.client = anthropic.Anthropic(api_key=api_key)

        # Store parameters
        self.hypothesis = hypothesis
        self.prior_log_odds = prior_log_odds
        self.current_posterior = prior_log_odds
        self.data_points = []

        self.logger.info("Initialization complete")

    def save(self, filename: str):
        """Save node's data to a JSON file."""
        data = {
            'hypothesis': self.hypothesis,
            'prior_log_odds': self.prior_log_odds,
            'current_posterior': self.current_posterior,
            'data_points': [
                {
                    'metadata': dp['metadata'],
                    'raw_data': dp['raw_data'],
                    'l_plus': dp['l_plus'],
                    'l_minus': dp['l_minus'],
                    'posterior': dp['posterior'],
                    'confidence_assessment': dp.get('confidence_assessment', {}),
                    'analysis_rationale': dp.get('analysis_rationale', '')
                }
                for dp in self.data_points
            ]
        }

        self.logger.info(f"Saving node data to {filename}")
        with open(filename, 'w') as f:
            json.dump(data, f, indent=2)

    @classmethod
    def load(cls, filename: str, api_key: str = None):
        """Load node from a JSON file."""
        with open(filename, 'r') as f:
            data = json.load(f)

        # Create new node
        node = cls(
            hypothesis=data['hypothesis'],
            api_key=api_key,
            prior_log_odds=data['prior_log_odds']
        )

        # Restore state
        node.current_posterior = data['current_posterior']
        node.data_points = data['data_points']

        node.logger.info(f"Loaded node data from {filename}")
        return node

    def process_data(self, data_file: str) -> Tuple[float, Tuple[float, float]]:
        """Process a new data file and update beliefs."""
        self.logger.info(f"Processing data file: {data_file}")

        try:
            # Parse data
            parsed_data = self._parse_data(data_file)
            self.logger.debug(f"Parsed data: {json.dumps(parsed_data, indent=2)}")

            # Check redundancy
            if self._is_redundant(parsed_data):
                self.logger.info("Data determined to be redundant, skipping")
                return self.current_posterior, self._calculate_uncertainty()

            # Analyze data
            l_plus, l_minus, code = self._analyze_data(parsed_data)
            self.logger.info(f"Analysis results - l_plus: {l_plus}, l_minus: {l_minus}")

            # Update posterior
            new_posterior = self.current_posterior + l_plus - l_minus
            self.logger.info(f"Updated posterior from {self.current_posterior} to {new_posterior}")

            # Store data point
            self.data_points.append({
                'raw_data': parsed_data,
                'metadata': self._extract_metadata(data_file),
                'l_plus': l_plus,
                'l_minus': l_minus,
                'posterior': new_posterior,
                'confidence_assessment': parsed_data.get('confidence_assessment', {
                    'confidence_score': 0,
                    'explanation': 'No confidence assessment available',
                    'key_strengths': [],
                    'key_limitations': []
                }),
                'analysis_rationale': code  # Store the analysis code used
            })

            self.current_posterior = new_posterior
            lower, upper = self._calculate_uncertainty()

            self.logger.info(f"Processing complete. Current probability: {self._to_probability(new_posterior):.2%} ({lower:.2%}, {upper:.2%})")
            return new_posterior, (lower, upper)

        except Exception as e:
            self.logger.error(f"Error processing {data_file}: {str(e)}", exc_info=True)
            raise

    def _parse_data(self, data_file: str) -> Dict:
        """Parse different file types using LLM assistance."""
        self.logger.info(f"Parsing data file: {data_file}")
        file_type = Path(data_file).suffix.lower()

        try:
            if file_type == '.csv':
                self.logger.debug("Processing CSV file")
                df = pd.read_csv(data_file)
                content = df.to_string()
                message_content = [{"type": "text", "text": content}]

            elif file_type == '.pdf':  # file_type is already lowercased above
                self.logger.debug("Processing PDF file")
                with open(data_file, 'rb') as f:
                    pdf_data = base64.b64encode(f.read()).decode('utf-8')
                message_content = [
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            "data": pdf_data
                        }
                    }
                ]

            elif file_type in ['.png', '.jpg', '.jpeg', '.gif', '.webp']:
                self.logger.debug(f"Processing image file of type {file_type}")
                with open(data_file, 'rb') as f:
                    img_data = base64.b64encode(f.read()).decode('utf-8')
                media_type = {
                    '.png': 'image/png',
                    '.jpg': 'image/jpeg',
                    '.jpeg': 'image/jpeg',
                    '.gif': 'image/gif',
                    '.webp': 'image/webp'
                }[file_type]
                message_content = [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": img_data
                        }
                    }
                ]

            else:
                self.logger.debug(f"Processing text file of type {file_type}")
                with open(data_file, 'r') as f:
                    content = f.read()
                message_content = [{"type": "text", "text": content}]

            # Add analysis prompt
            prompt = f"""
            Extract relevant data points for evaluating the hypothesis:
            "{self.hypothesis}"

            Provide your response as a JSON code block, like this:
            ```json
            {{
                "numerical_values": [],
                "metadata": {{}},
                "issues": [],
                "confidence_assessment": {{
                    "confidence_score": 0.75,
                    "explanation": "Detailed explanation of confidence level",
                    "key_strengths": [
                        "Strength 1",
                        "Strength 2"
                    ],
                    "key_limitations": [
                        "Limitation 1",
                        "Limitation 2"
                    ]
                }}
            }}
            ```

            The confidence_assessment should:
            1. Include a confidence_score between 0 and 1
            2. Provide a detailed explanation of the confidence level
            3. List key strengths of the evidence
            4. List key limitations or potential issues

            The overall JSON should include:
            1. Extracted numerical values and their uncertainties
            2. Relevant metadata (source quality, methodology, etc.)
            3. Any potential issues or biases in the data
            """

            message_content.append({"type": "text", "text": prompt})
            self.logger.debug(f"Prepared prompt: {prompt}")

            # Send to Anthropic API
            self.logger.info("Sending request to Anthropic API")
            message = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=8192,
                temperature=0.1,
                messages=[{
                    "role": "user",
                    "content": message_content
                }]
            )

            # Extract and parse response
            response_text = self._get_message_text(message)
            self.logger.debug(f"Received API response: {response_text}")

            # Try to extract JSON using autogen
            extracted_blocks = extract_code(response_text)

            # Look for JSON blocks
            json_str = None
            for lang, block in extracted_blocks:
                if lang.lower() in ['json', '']:
                    try:
                        # Try to parse as JSON to validate
                        parsed = json.loads(block)
                        json_str = block
                        break
                    except json.JSONDecodeError:
                        continue

            # If no valid JSON block found, try parsing the whole response
            if not json_str:
                self.logger.warning("No JSON code block found, trying to parse entire response")
                try:
                    parsed = json.loads(response_text)
                    json_str = response_text
                except json.JSONDecodeError:
                    self.logger.error("Failed to parse response as JSON")
                    return {
                        "extraction_error": "Failed to parse LLM response",
                        "raw_response": response_text
                    }

            parsed_data = json.loads(json_str)
            self.logger.debug(f"Successfully parsed JSON data: {json.dumps(parsed_data, indent=2)}")
            return parsed_data

        except Exception as e:
            self.logger.error(f"Error in _parse_data: {str(e)}", exc_info=True)
            raise

    def _extract_metadata(self, data_file: str) -> Dict[str, Any]:
        """Extract metadata from the data file."""
        self.logger.info(f"Extracting metadata from {data_file}")

        try:
            file_path = Path(data_file)
            basic_metadata = {
                "filename": file_path.name,
                "file_type": file_path.suffix.lower(),
                "file_size": file_path.stat().st_size,
                "last_modified": datetime.fromtimestamp(file_path.stat().st_mtime).isoformat(),
                "source_path": str(file_path.absolute())
            }

            if file_path.suffix.lower() == '.pdf':
                try:
                    import PyPDF2
                    with open(file_path, 'rb') as f:
                        pdf = PyPDF2.PdfReader(f)
                        if pdf.metadata:
                            basic_metadata.update({
                                "title": pdf.metadata.get('/Title', ''),
                                "author": pdf.metadata.get('/Author', ''),
                                "creator": pdf.metadata.get('/Creator', ''),
                                "producer": pdf.metadata.get('/Producer', ''),
                                "creation_date": pdf.metadata.get('/CreationDate', ''),
                                "modification_date": pdf.metadata.get('/ModDate', ''),
                                "page_count": len(pdf.pages)
                            })
                except ImportError:
                    self.logger.warning("PyPDF2 not installed, skipping PDF metadata extraction")

            elif file_path.suffix.lower() in ['.png', '.jpg', '.jpeg', '.gif', '.webp']:
                try:
                    from PIL import Image
                    with Image.open(file_path) as img:
                        basic_metadata.update({
                            "image_format": img.format,
                            "image_size": img.size,
                            "image_mode": img.mode,
                            "image_info": dict(img.info)
                        })
                except ImportError:
                    self.logger.warning("Pillow not installed, skipping image metadata extraction")

            elif file_path.suffix.lower() == '.html':
                try:
                    from bs4 import BeautifulSoup
                    with open(file_path, 'r', encoding='utf-8') as f:
                        soup = BeautifulSoup(f.read(), 'html.parser')
                        meta_tags = {}
                        for meta in soup.find_all('meta'):
                            name = meta.get('name', meta.get('property', ''))
                            content = meta.get('content', '')
                            if name and content:
                                meta_tags[name] = content

                        basic_metadata.update({
                            "title": soup.title.string if soup.title else '',
                            "meta_tags": meta_tags,
                            "has_article": bool(soup.find('article')),
                            "has_main": bool(soup.find('main')),
                            "num_headers": len(soup.find_all(['h1', 'h2', 'h3'])),
                            "has_tables": bool(soup.find_all('table'))
                        })
                except ImportError:
                    self.logger.warning("BeautifulSoup4 not installed, skipping HTML metadata extraction")

            elif file_path.suffix.lower() == '.csv':
                try:
                    df = pd.read_csv(file_path)
                    basic_metadata.update({
                        "num_rows": len(df),
                        "num_columns": len(df.columns),
                        "column_names": list(df.columns),
                        "data_types": {col: str(dtype) for col, dtype in df.dtypes.items()},
                        "has_nulls": df.isnull().any().any()
                    })
                except Exception as e:
                    self.logger.warning(f"Error extracting CSV metadata: {str(e)}")

            self.logger.debug(f"Extracted metadata: {json.dumps(basic_metadata, indent=2)}")
            return basic_metadata

        except Exception as e:
            self.logger.error(f"Error in _extract_metadata: {str(e)}", exc_info=True)
            return {
                "filename": data_file,
                "error": str(e)
            }

    def _analyze_data(self, data: Dict) -> Tuple[float, float, str]:
        """Generate and execute analysis code using LLM."""
        self.logger.info("Analyzing parsed data")

        MAX_LOG_LIKELIHOOD_RATIO = 5.

        try:
            # Generate analysis code
            prompt = f"""
            Given this data:
            {json.dumps(data, indent=2)}

            Generate Python code to calculate log likelihoods for the hypothesis:
            "{self.hypothesis}"

            This should be a single function named `calculate_log_likelihoods`.
            It should take a single argument, a dict with the format given above,
            and output only the tuple of log-likelihoods
              l_plus = log P(data | hypothesis),
              l_minus = log P(data | not hypothesis).

            The code should:
            1. Calculate l_plus and l_minus (log likelihoods)
            2. Handle uncertainties properly
            3. Account for data quality and potential biases
            4. Limit overconfidence by capping the absolute difference between l_plus and l_minus to {MAX_LOG_LIKELIHOOD_RATIO}.
            5. Use the usual libraries such as numpy and scipy for calculations
            6. Use print() to output intermediate results, as well as the final result before returning.

            Return only executable Python code with the function definition.
            Do not include the function call itself.
            """

            self.logger.debug(f"Analysis prompt: {prompt}")

            message = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=8192,
                temperature=0.1,
                messages=[{
                    "role": "user",
                    "content": prompt
                }]
            )

            response_text = self._get_message_text(message)
            self.logger.debug(f"API response: {response_text}")

            # Extract code using autogen (extract_code is imported at module level)
            extracted_code = extract_code(response_text)

            if not extracted_code:
                self.logger.error("No code block found in API response")
                raise ValueError("No code block found in API response")

            # Get the first Python code block
            code = None
            for lang, code_block in extracted_code:
                if lang.lower() in ['python', 'py', '']:
                    code = code_block
                    break

            if not code:
                self.logger.error("No Python code block found in API response")
                raise ValueError("No Python code block found in API response")

            self.logger.debug(f"Extracted Python code: {code}")

            # Execute the code
            l_plus, l_minus, code = self._execute_code_with_debug(code, data)
            return l_plus, l_minus, code

        except Exception as e:
            self.logger.error(f"Error in _analyze_data: {str(e)}", exc_info=True)
            raise

    def _execute_code_with_debug(self, code: str, data: Dict, max_attempts: int = 5) -> Tuple[float, float, str]:
        """Execute code with debug loop for error correction."""
        globals_dict = {
            "np": np,
            "math": math,
            "data": data
        }

        attempt = 1
        while attempt <= max_attempts:
            self.logger.info(f"Code execution attempt {attempt}/{max_attempts}")
            self.logger.debug(f"Executing code:\n{code}")

            try:
                exec(code, globals_dict)
                l_plus, l_minus = globals_dict['calculate_log_likelihoods'](data)

                # Validate outputs
                if l_plus is None or l_minus is None:
                    raise ValueError("Code did not define l_plus and l_minus")

                if not (isinstance(l_plus, (int, float)) and isinstance(l_minus, (int, float))):
                    raise ValueError("l_plus and l_minus must be numeric values")

                self.logger.info(f"Code execution successful - l_plus: {l_plus}, l_minus: {l_minus}")
                return float(l_plus), float(l_minus), code

            except Exception as e:
                self.logger.warning(f"Code execution failed on attempt {attempt}: {str(e)}")

                if attempt == max_attempts:
                    self.logger.error("Max attempts reached, raising error")
                    raise RuntimeError(f"Failed to generate working code after {max_attempts} attempts. Final error: {str(e)}")

                # Ask LLM to fix the code
                debug_prompt = f"""
                The following code failed with error: {str(e)}

                Code:
                ```python
                {code}
                ```

                Input data:
                ```json
                {json.dumps(data, indent=2)}
                ```

                Please fix the code to:
                1. Handle the error properly
                2. Return numeric values for l_plus and l_minus
                3. Include proper error checking
                4. Handle edge cases in the input data

                Return only the corrected Python code.
                """

                self.logger.debug(f"Sending debug prompt to LLM:\n{debug_prompt}")

                message = self.client.messages.create(
                    model="claude-3-5-sonnet-20241022",
                    max_tokens=8192,
                    temperature=0.1,
                    messages=[{
                        "role": "user",
                        "content": debug_prompt
                    }]
                )

                # Extract corrected code
                response_text = self._get_message_text(message)
                extracted_code = extract_code(response_text)

                if not extracted_code:
                    self.logger.error("No code block found in debug response")
                    attempt += 1
                    continue

                # Get the first Python code block
                for lang, code_block in extracted_code:
                    if lang.lower() in ['python', 'py', '']:
                        code = code_block
                        break
                else:
                    self.logger.error("No Python code block found in debug response")
                    attempt += 1
                    continue

            attempt += 1

        # Should never reach here due to raise in loop
        raise RuntimeError("Unexpected error in debug loop")

    def _is_redundant(self, new_data: Dict) -> bool:
        """Check if new data is redundant with existing data."""
        self.logger.info("Checking for data redundancy")

        if not self.data_points:
            self.logger.debug("No existing data points, not redundant")
            return False

        try:
            prompt = f"""
            Compare the following new data:
            {json.dumps(new_data, indent=2)}

            With these existing data points:
            {json.dumps([dp['raw_data'] for dp in self.data_points], indent=2)}

            Is the new data redundant with any existing data points?
            Consider:
            1. Same source or study being cited
            2. Same measurements within uncertainty
            3. Derived results from already incorporated primary data

            Return "true" if redundant, "false" if novel information.
            """

            self.logger.debug(f"Redundancy check prompt: {prompt}")

            message = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=4096,
                temperature=0.1,
                messages=[{
                    "role": "user",
                    "content": prompt
                }]
            )

            response = self._get_message_text(message)
            self.logger.debug(f"Redundancy check response: {response}")

            is_redundant = response.strip().lower() == "true"
            self.logger.info(f"Redundancy check result: {is_redundant}")
            return is_redundant

        except Exception as e:
            self.logger.error(f"Error in _is_redundant: {str(e)}", exc_info=True)
            raise

    def _get_message_text(self, message) -> str:
        """Extract text content from message response."""
        if message.content and len(message.content) > 0:
            content_block = message.content[0]
            if hasattr(content_block, 'text'):
                return content_block.text
        return ""

    def _calculate_uncertainty(self) -> tuple[float, float]:
        """Calculate 95% confidence interval for the posterior probability.

        Returns:
            tuple[float, float]: Lower and upper bounds of the 95% CI
        """
        # Convert current posterior log-odds to probability
        p = self._to_probability(self.current_posterior)

        # Calculate total weight of evidence from Bayes factors
        total_evidence = sum(
            abs(math.exp(dp['l_plus'] - dp['l_minus']) - 1)
            for dp in self.data_points
        )

        if total_evidence < 1e-6:
            return (0.0, 1.0)  # Default CI for effectively no data

        # Each Bayes factor represents the weight of evidence
        # The concentration parameter of our Beta should reflect this
        concentration = total_evidence

        # Calculate Beta parameters to maintain the mean at p
        alpha = concentration * p
        beta = concentration * (1 - p)

        # Calculate 95% confidence interval
        from scipy import stats
        ci_low, ci_high = stats.beta.interval(0.95, alpha, beta)

        # Clip to [0, 1]
        ci_low = max(0.0, min(1.0, ci_low))
        ci_high = max(0.0, min(1.0, ci_high))

        return ci_low, ci_high

    @staticmethod
    def _to_probability(log_odds: float) -> float:
        """Convert log odds to probability."""
        return 1 / (1 + np.exp(-log_odds))
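
The exact string comparison in `_is_redundant` treats every reply other than a bare `true` as novel data, including replies such as `"true"` or `True.`. A minimal sketch of a more tolerant parser follows. `parse_boolean_reply` is a hypothetical helper, not part of the prototype, and raising on ambiguous replies is an assumed design choice.

```python
import re

def parse_boolean_reply(reply: str) -> bool:
    """Interpret an LLM reply that should contain exactly one of 'true' or 'false'.

    Hypothetical helper: ignores case, quotes, and punctuation around the keyword.
    """
    tokens = set(re.findall(r"\b(true|false)\b", reply.lower()))
    if tokens == {"true"}:
        return True
    if tokens == {"false"}:
        return False
    # Neither keyword, or both: refuse to guess
    raise ValueError(f"Ambiguous redundancy reply: {reply!r}")

print(parse_boolean_reply('"true"'))
print(parse_boolean_reply(" False.\n"))
```

Raising instead of defaulting to `False` keeps a malformed reply from silently admitting duplicate evidence.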

In [None]:
# @title
import numpy as np
from dataclasses import dataclass
import math
from typing import List, Tuple, Dict, Optional
import json

@dataclass
class DataPoint:
    raw_data: str  # Original data representation
    metadata: dict  # Source information, timestamps, etc.
    likelihood_plus: float  # l^+_i
    likelihood_minus: float  # l^-_i
    posterior: float  # π_i
    analysis_rationale: str  # LLM explanation for likelihood calculation

@dataclass
class HyperParameters:
    alpha_plus: float  # α^+
    alpha_minus: float  # α^-

    def update(self, p_h: float):
        """Update hyperparameters based on new posterior probability"""
        self.alpha_plus += p_h
        self.alpha_minus += (1 - p_h)

    def confidence_interval(self, confidence: float = 0.95) -> Tuple[float, float]:
        """Calculate confidence interval for the belief state"""
        from scipy import stats
        a = self.alpha_plus
        b = self.alpha_minus
        interval = stats.beta.interval(confidence, a, b)
        return interval

class InFactNode:
    def __init__(self, hypothesis: str, prior_log_odds: float = 0.0):
        self.hypothesis = hypothesis
        self.prior_log_odds = prior_log_odds
        self.current_posterior = prior_log_odds
        self.hyperparameters = HyperParameters(1.0, 1.0)  # Start with uniform Beta
        self.data_points: List[DataPoint] = []

    def process_data(self, data_file: str) -> Tuple[float, float]:
        """
        Process a new data file and update beliefs.
        Returns the updated posterior and its uncertainty.
        """
        # Parse data file (implemented by derived classes for specific file types)
        data = self._parse_data(data_file)

        # Check for redundancy with existing data points
        if self._is_redundant(data):
            return self.current_posterior, self._calculate_uncertainty()

        # Generate and run analysis code (implemented by derived classes)
        l_plus, l_minus = self._analyze_data(data)

        # Update posterior
        new_posterior = self.current_posterior + l_plus - l_minus

        # Update hyperparameters
        p_h = self._to_probability(new_posterior)  # Convert log-odds to probability without math.exp overflow
        self.hyperparameters.update(p_h)

        # Store data point
        data_point = DataPoint(
            raw_data=str(data),
            metadata=self._extract_metadata(data_file),
            likelihood_plus=l_plus,
            likelihood_minus=l_minus,
            posterior=new_posterior,
            analysis_rationale=""  # Filled by derived class
        )
        self.data_points.append(data_point)

        self.current_posterior = new_posterior
        return new_posterior, self._calculate_uncertainty()

    def _calculate_uncertainty(self) -> float:
        """Calculate confidence interval width for the posterior probability."""
        # Convert current posterior log-odds to probability
        p = self._to_probability(self.current_posterior)

        # Use number of data points to determine effective sample size
        n = len(self.data_points)
        if n < 1:
            return 0.1  # Default uncertainty for no data

        # Calculate Beta distribution parameters
        # Start with uniform prior (α=β=1)
        alpha = 1 + n * p
        beta = 1 + n * (1 - p)

        # Calculate 95% confidence interval
        from scipy import stats
        ci_low, ci_high = stats.beta.interval(0.95, alpha, beta)

        # Return half the interval width as the ± uncertainty
        return (ci_high - ci_low) / 2

    @staticmethod
    def _to_probability(log_odds: float) -> float:
        """Convert log odds to probability."""
        return 1 / (1 + np.exp(-log_odds))

    def _is_redundant(self, new_data) -> bool:
        """Check if new data has already been incorporated"""
        # Basic implementation - should be overridden with more sophisticated redundancy detection
        return str(new_data) in [dp.raw_data for dp in self.data_points]

    def _extract_metadata(self, data_file: str) -> dict:
        """Extract metadata from data file"""
        # Basic implementation - should be overridden for specific file types
        return {
            "filename": data_file,
            "timestamp": "",  # Fill with actual timestamp
            "format": data_file.split(".")[-1]
        }

    def get_belief_history(self) -> List[Dict]:
        """Get history of belief updates for visualization"""
        history = []
        current = self.prior_log_odds
        for dp in self.data_points:
            current += dp.likelihood_plus - dp.likelihood_minus
            history.append({
                "posterior": current,
                "probability": float(self._to_probability(current)),
                "data_point": dp.raw_data,
                "rationale": dp.analysis_rationale
            })
        return history

    def save_state(self, filename: str):
        """Save current state to file"""
        state = {
            "hypothesis": self.hypothesis,
            "prior_log_odds": self.prior_log_odds,
            "current_posterior": self.current_posterior,
            "hyperparameters": {
                "alpha_plus": self.hyperparameters.alpha_plus,
                "alpha_minus": self.hyperparameters.alpha_minus
            },
            "data_points": [vars(dp) for dp in self.data_points]
        }
        with open(filename, "w") as f:
            json.dump(state, f, indent=2)

    def load_state(self, filename: str):
        """Load state from file"""
        with open(filename, "r") as f:
            state = json.load(f)

        self.hypothesis = state["hypothesis"]
        self.prior_log_odds = state["prior_log_odds"]
        self.current_posterior = state["current_posterior"]
        self.hyperparameters = HyperParameters(
            state["hyperparameters"]["alpha_plus"],
            state["hyperparameters"]["alpha_minus"]
        )
        self.data_points = [DataPoint(**dp) for dp in state["data_points"]]

    # Abstract methods to be implemented by derived classes
    def _parse_data(self, data_file: str):
        raise NotImplementedError

    def _analyze_data(self, data) -> Tuple[float, float]:
        raise NotImplementedError
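
Several log-odds conversions in this notebook use the form `math.exp(x) / (1 + math.exp(x))`, which raises `OverflowError` once `x` exceeds roughly 709. The sketch below shows one overflow-safe way to write the same conversion with only the standard library. `log_odds_to_probability` is a hypothetical name, not a function from the prototype.

```python
import math

def log_odds_to_probability(log_odds: float) -> float:
    """Logistic function that only ever exponentiates a non-positive number."""
    if log_odds >= 0:
        # exp(-log_odds) is in (0, 1], so it cannot overflow
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)  # log_odds < 0, so z is in [0, 1)
    return z / (1.0 + z)

for x in (0.0, 5.0, -5.0, 1000.0, -1000.0):
    print(x, log_odds_to_probability(x))
```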

In [None]:
# @title
TEMPLATE = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>InFact Analysis: {{ hypothesis }}</title>
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
    <style>
        body {
            font-family: system-ui, -apple-system, sans-serif;
            line-height: 1.5;
            max-width: 1200px;
            margin: 0 auto;
            padding: 2rem;
            color: #1a1a1a;
        }

        .card {
            background: white;
            border-radius: 8px;
            box-shadow: 0 1px 3px rgba(0,0,0,0.1);
            padding: 1.5rem;
            margin-bottom: 1.5rem;
        }

        .hypothesis {
            font-size: 1.2rem;
            color: #4a5568;
            margin: 1rem 0;
            padding: 1rem;
            background: #f7fafc;
            border-radius: 6px;
        }

        .final-assessment {
            text-align: center;
            padding: 2rem;
            background: #ebf8ff;
        }

        .probability {
            font-size: 3rem;
            font-weight: bold;
            color: #2b6cb0;
        }

        .uncertainty {
            font-size: 1.2rem;
            color: #4a5568;
            margin-top: 0.5rem;
        }

        .evidence-point {
            border-left: 4px solid #4299e1;
            padding-left: 1rem;
            margin-bottom: 2rem;
        }

        .evidence-grid {
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
            gap: 1rem;
        }

        .stats-grid {
            display: grid;
            grid-template-columns: repeat(3, 1fr);
            gap: 1rem;
            margin: 1rem 0;
        }

        .stat-card {
            background: #f7fafc;
            padding: 1rem;
            border-radius: 6px;
            text-align: center;
        }

        .stat-value {
            font-size: 1.5rem;
            font-weight: bold;
            color: #2b6cb0;
        }

        .stat-label {
            font-size: 0.875rem;
            color: #4a5568;
            margin-top: 0.25rem;
        }

        .chart-container {
            height: 400px;
            margin: 2rem 0;
        }

        .confidence-high { color: #047857; }
        .confidence-medium { color: #b45309; }
        .confidence-low { color: #dc2626; }

        details {
            margin-top: 1rem;
        }

        summary {
            cursor: pointer;
            color: #2b6cb0;
            font-weight: 500;
        }

        pre {
            background: #f7fafc;
            padding: 1rem;
            border-radius: 6px;
            overflow-x: auto;
            font-size: 0.875rem;
        }
    </style>
</head>
<body>
    <header>
        <h1>Evidence Analysis</h1>
        <div class="hypothesis">{{ hypothesis }}</div>
    </header>

    <main>
        <section class="card final-assessment">
            <h2>Current Assessment</h2>
            <div class="probability">
                {{ "%.1f"|format(final_probability * 100) }}%
            </div>
            <div class="uncertainty">
                ({{ "%.1f"|format(ci_low * 100) }}%, {{ "%.1f"|format(ci_high * 100) }}%)
            </div>
            <div class="interpretation">
                {% if final_probability > 0.99 %}Virtually Certain
                {% elif final_probability > 0.95 %}Extremely Likely
                {% elif final_probability > 0.90 %}Very Likely
                {% elif final_probability > 0.66 %}Likely
                {% elif final_probability > 0.33 %}Uncertain
                {% elif final_probability > 0.10 %}Unlikely
                {% elif final_probability > 0.05 %}Very Unlikely
                {% elif final_probability > 0.01 %}Extremely Unlikely
                {% else %}Virtually Impossible{% endif %}
            </div>
        </section>

        <section class="card">
            <h2>Belief Evolution</h2>
            <div class="chart-container">
                <canvas id="beliefChart"></canvas>
            </div>
        </section>

        <section class="card">
            <h2>Evidence Analysis</h2>
            {% for point in evidence_points %}
            <div class="evidence-point">
                <h3>Evidence {{ loop.index }}: {{ point.file }}</h3>

                <div class="stats-grid">
                    <div class="stat-card">
                        <div class="stat-value">{{ "%.1f"|format(point.prior_prob * 100) }}%</div>
                        <div class="stat-label">Prior Probability</div>
                    </div>
                    <div class="stat-card">
                        <div class="stat-value">{{ "%.1f"|format(point.likelihood_ratio) }}×</div>
                        <div class="stat-label">Likelihood Ratio</div>
                    </div>
                    <div class="stat-card">
                        <div class="stat-value">{{ "%.1f"|format(point.posterior * 100) }}%</div>
                        <div class="stat-label">Posterior Probability</div>
                    </div>
                </div>

                <div class="evidence-grid">
                    <div>
                        <h4>Confidence Assessment</h4>
                        <div class="confidence-score
                            {% if point.confidence_assessment.confidence_score > 0.7 %}confidence-high
                            {% elif point.confidence_assessment.confidence_score > 0.4 %}confidence-medium
                            {% else %}confidence-low{% endif %}">
                            {{ "%.0f"|format(point.confidence_assessment.confidence_score * 100) }}% Confidence
                        </div>

                        {% if point.confidence_assessment.key_strengths %}
                        <h5>Key Strengths</h5>
                        <ul>
                            {% for strength in point.confidence_assessment.key_strengths %}
                            <li>{{ strength }}</li>
                            {% endfor %}
                        </ul>
                        {% endif %}

                        {% if point.confidence_assessment.key_limitations %}
                        <h5>Key Limitations</h5>
                        <ul>
                            {% for limitation in point.confidence_assessment.key_limitations %}
                            <li>{{ limitation }}</li>
                            {% endfor %}
                        </ul>
                        {% endif %}
                    </div>
                </div>

                <details>
                    <summary>Analysis Details</summary>
                    <div class="rationale">
                        {% if point.analysis_rationale %}
                        <pre><code>{{ point.analysis_rationale }}</code></pre>
                        {% else %}
                        <p>No detailed analysis rationale available.</p>
                        {% endif %}
                    </div>
                </details>
            </div>
            {% endfor %}
        </section>
    </main>

    <script>
        const ctx = document.getElementById('beliefChart').getContext('2d');
        new Chart(ctx, {
            type: 'line',
            data: {
                labels: ['Prior'].concat({{ evidence_points|map(attribute='file')|list|tojson }}),
                datasets: [{
                    label: 'Belief Probability',
                    data: [{{ prior_probability }}].concat({{ evidence_points|map(attribute='posterior')|list|tojson }}),
                    borderColor: '#2b6cb0',
                    backgroundColor: 'rgba(43, 108, 176, 0.1)',
                    tension: 0.1
                }]
            },
            options: {
                responsive: true,
                maintainAspectRatio: false,
                scales: {
                    y: {
                        beginAtZero: true,
                        max: 1,
                        ticks: {
                            callback: function(value) {
                                return (value * 100) + '%';
                            }
                        }
                    },
                    x: {
                        ticks: {
                            maxRotation: 45,
                            minRotation: 45
                        }
                    }
                },
                plugins: {
                    tooltip: {
                        callbacks: {
                            label: function(context) {
                                return (context.raw * 100).toFixed(1) + '%';
                            }
                        }
                    }
                }
            }
        });
    </script>
</body>
</html>
"""

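The `interpretation` block in the template maps the final probability to a verbal label through a chain of strict `>` comparisons. The sketch below restates that mapping in plain Python so it can be checked outside Jinja. `interpret_probability` is a hypothetical helper; the thresholds and labels are copied from the template.

```python
# Thresholds and labels copied from the template's {% if %} chain, highest first.
LABELS = [
    (0.99, "Virtually Certain"),
    (0.95, "Extremely Likely"),
    (0.90, "Very Likely"),
    (0.66, "Likely"),
    (0.33, "Uncertain"),
    (0.10, "Unlikely"),
    (0.05, "Very Unlikely"),
    (0.01, "Extremely Unlikely"),
]

def interpret_probability(p: float) -> str:
    """Return the first label whose threshold p strictly exceeds."""
    for threshold, label in LABELS:
        if p > threshold:
            return label
    return "Virtually Impossible"

for p in (0.9933, 0.95, 0.5, 0.0):
    print(p, interpret_probability(p))
```

Because the comparisons are strict, a probability exactly on a boundary falls into the lower band.
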
In [None]:
# @title
from pathlib import Path
import math
from jinja2 import Environment, FileSystemLoader, BaseLoader, Template

class InFactRenderer:
    def __init__(self, template_dir=None):
        """Initialize renderer with a template directory or the bundled TEMPLATE string"""
        if template_dir:
            # Expects a file named 'infact.html' inside template_dir
            self.env = Environment(loader=FileSystemLoader(template_dir))
            self.template = self.env.get_template('infact.html')
        else:
            self.env = Environment(loader=BaseLoader())
            self.template = self.env.from_string(TEMPLATE)  # TEMPLATE is the string defined in the cell above

    def render_analysis(self, node, output_file=None):
        """Render complete analysis visualization"""
        template = self.template

        # Prepare evidence points data
        evidence_points = []
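        # Fall back to the prior when the node holds no evidence yet
        current_probability = math.exp(node.prior_log_odds) / (1 + math.exp(node.prior_log_odds))
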
        for point in node.data_points:
            # Calculate probabilities
            posterior = point['posterior']
            likelihood_ratio = math.exp(point['l_plus'] - point['l_minus'])
            prior_prob = math.exp(posterior - (point['l_plus'] - point['l_minus'])) / \
                        (1 + math.exp(posterior - (point['l_plus'] - point['l_minus'])))
            posterior_prob = math.exp(posterior) / (1 + math.exp(posterior))

            evidence_points.append({
                'file': point['metadata'].get('filename', 'Unknown File'),
                'confidence_assessment': point.get('confidence_assessment', {
                    'confidence_score': 0,
                    'explanation': 'No confidence assessment available',
                    'key_strengths': [],
                    'key_limitations': []
                }),
                'prior_prob': prior_prob,
                'likelihood_ratio': likelihood_ratio,
                'posterior': posterior_prob,
                'analysis_rationale': point.get('analysis_rationale', 'No analysis rationale available')
            })

            current_probability = posterior_prob

        # Calculate prior probability from log odds
        prior_probability = math.exp(node.prior_log_odds) / (1 + math.exp(node.prior_log_odds))

        # Get confidence interval
        ci_low, ci_high = node._calculate_uncertainty()

        # Render template
        html = template.render(
            hypothesis=node.hypothesis,
            prior_probability=prior_probability,
            final_probability=current_probability,
            ci_low=ci_low,
            ci_high=ci_high,
            evidence_points=evidence_points
        )

        # Save to file if requested
        if output_file:
            output_path = Path(output_file)
            output_path.write_text(html, encoding="utf-8")  # Matches the template's <meta charset="UTF-8">

        return html

In [None]:
# @title
# List files in the evidence directory, skipping names that begin with "."
# Sorted so that evidence_files[i] refers to the same file on every run
EVIDENCE_DIR = '/content/drive/MyDrive/InFact Prototype/evidence'
evidence_files = sorted(str(p) for p in Path(EVIDENCE_DIR).glob('[!.]*.*'))
evidence_files

['/content/drive/MyDrive/InFact Prototype/evidence/climate-study.html',
 '/content/drive/MyDrive/InFact Prototype/evidence/emissions-analysis.html',
 '/content/drive/MyDrive/InFact Prototype/evidence/meta-analysis.html']

In [None]:
# @title
from google.colab import userdata
api_key = userdata.get('ANTHROPIC_API_KEY')

In [None]:
# @title
import logging

node = AnthropicInFactNode(
    hypothesis="Human-generated GHG emissions significantly increase global temperatures",
    api_key=api_key,
    log_level=logging.DEBUG  # For maximum detail
)

INFO:__main__:Initializing AnthropicInFactNode with hypothesis: Human-generated GHG emissions significantly increase global temperatures
INFO:__main__:Initialization complete


In [None]:
# @title
node.process_data(evidence_files[0])

INFO:__main__:Processing data file: /content/drive/MyDrive/InFact Prototype/evidence/climate-study.html
INFO:__main__:Parsing data file: /content/drive/MyDrive/InFact Prototype/evidence/climate-study.html
DEBUG:__main__:Processing text file of type .html
DEBUG:__main__:Prepared prompt: 
            Extract relevant data points for evaluating the hypothesis:
            "Human-generated GHG emissions significantly increase global temperatures"
            
            Provide your response as a JSON code block, like this:
            ```json
            {
                "numerical_values": [],
                "metadata": {},
                "issues": [],
                "confidence_assessment": {
                    "confidence_score": 0.75,
                    "explanation": "Detailed explanation of confidence level",
                    "key_strengths": [
                        "Strength 1",
                        "Strength 2"
                    ],
                    "key_limitat

Temperature change z-score: 15.74, p-value: 0.0000
Base log likelihoods - l_plus: 25.20, l_minus: -25.20
Quality-adjusted log likelihoods - l_plus: 24.60, l_minus: -24.60
Final capped log likelihoods - l_plus: 2.50, l_minus: -2.50


(5.0, (0.9753088124811299, 0.9998365728296873))

In [None]:
# @title
node.process_data(evidence_files[1])

INFO:__main__:Processing data file: /content/drive/MyDrive/InFact Prototype/evidence/emissions-analysis.html
INFO:__main__:Parsing data file: /content/drive/MyDrive/InFact Prototype/evidence/emissions-analysis.html
DEBUG:__main__:Processing text file of type .html
DEBUG:__main__:Prepared prompt: 
            Extract relevant data points for evaluating the hypothesis:
            "Human-generated GHG emissions significantly increase global temperatures"
            
            Provide your response as a JSON code block, like this:
            ```json
            {
                "numerical_values": [],
                "metadata": {},
                "issues": [],
                "confidence_assessment": {
                    "confidence_score": 0.75,
                    "explanation": "Detailed explanation of confidence level",
                    "key_strengths": [
                        "Strength 1",
                        "Strength 2"
                    ],
                    "k

Correlation component: 0.664
R-squared component: 1.685
Regression component: -2.745
Quality adjustment: -0.163
Uncertainty penalty: 1.000

Final log likelihoods:
l_plus: -10.875
l_minus: -15.875


(10.0, (0.9996816261099933, 1.0))

In [None]:
# @title
node.process_data(evidence_files[2])

INFO:__main__:Processing data file: /content/drive/MyDrive/InFact Prototype/evidence/meta-analysis.html
INFO:__main__:Parsing data file: /content/drive/MyDrive/InFact Prototype/evidence/meta-analysis.html
DEBUG:__main__:Processing text file of type .html
DEBUG:__main__:Prepared prompt: 
            Extract relevant data points for evaluating the hypothesis:
            "Human-generated GHG emissions significantly increase global temperatures"
            
            Provide your response as a JSON code block, like this:
            ```json
            {
                "numerical_values": [],
                "metadata": {},
                "issues": [],
                "confidence_assessment": {
                    "confidence_score": 0.75,
                    "explanation": "Detailed explanation of confidence level",
                    "key_strengths": [
                        "Strength 1",
                        "Strength 2"
                    ],
                    "key_limitat

ECS likelihood: 1.7077806340693132e-28
Warming likelihood: 2.6649317405792824e-13
Average quality score: 0.8300000000000001
Confidence score: 0.85
Final l_plus: -5.204815281782804
Final l_minus: -0.20481528178280323


(4.999999999999999, (0.9813645501788257, 0.9991985884489322))
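
The three return values above follow from the additive log-odds update `posterior = previous + l_plus - l_minus`. The sketch below replays that arithmetic with the final log-likelihoods printed in the logs. It assumes a prior log-odds of 0.0 (the base-class default, consistent with the first result of 5.0), and the third pair is rounded to four decimals.

```python
import math

prior_log_odds = 0.0  # assumed: base-class default prior
# (l_plus, l_minus) pairs taken from the log output; the last pair is rounded
updates = [(2.50, -2.50), (-10.875, -15.875), (-5.2048, -0.2048)]

posterior = prior_log_odds
history = []
for l_plus, l_minus in updates:
    posterior += l_plus - l_minus  # additive update in log-odds space
    probability = 1.0 / (1.0 + math.exp(-posterior))
    history.append((posterior, probability))
    print(f"log-odds = {posterior:.3f}, probability = {probability:.5f}")
```

The third evidence file lowers the log-odds from 10 back to about 5 because its `l_minus` is larger than its `l_plus`.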

In [None]:
# @title
node.save("/content/drive/MyDrive/InFact Prototype/results/analysis.json")

INFO:__main__:Saving node data to /content/drive/MyDrive/InFact Prototype/results/analysis.json


In [None]:
# @title
from IPython.display import HTML, display
# Render visualization
renderer = InFactRenderer()
output_file = "/content/drive/MyDrive/InFact Prototype/results/analysis.html"
html = renderer.render_analysis(node, output_file)
display(HTML(html))