# Welcome to the MarCognity-AI Demo

**MarCognity-AI** is an open-source project born from curiosity, research, and experimentation.  
Its goal? To explore how a system based on LLMs can not only generate scientific content,  
but also critically reflect on what it produces.

## What You'll See in This Notebook

- Processing of complex academic requests  
- Retrieval of sources from open-access databases  
- Conceptual and graphical visualization  
- Semantic and metacognitive evaluation  
- Self-improvement of generated responses  

**Metacognition, memory, ethics, and visualization:**  
MarCognity-AI is a step toward more self-aware intelligence.

## What to Expect

Watch how **MarCognity-AI** processes, evaluates, and visualizes a response,  
then reflects on what it has produced.

This demo is designed to inspire you to **explore, build, and create**.

Enjoy the journey.



© 2025 Elena Marziali — This code is released under the Apache 2.0 license.

For details, see the `LICENSE` file in the repository.

This code is protected by copyright and requires proper attribution.  
Removal of this copyright notice is strictly prohibited.


In [None]:
!pip install langchain langchain-groq langchain-community groq python-dotenv openai
!pip install transformers datasets accelerate peft bitsandbytes torch sacremoses
!pip install sentence-transformers faiss-cpu chromadb numpy pickle-mixin
!pip install pdf2image pymupdf pdfminer.six PyPDF2 pdfplumber python-docx
!pip install matplotlib networkx plotly kaleido==0.2.1 pyvis graphviz trimesh
!pip install langdetect langid spacy
!pip install requests requests-cache redis
!python -m spacy download it_core_news_sm




The first step is to import the libraries that MarCognity-AI depends on.

In [None]:
import os
import re
import math
import uuid
import datetime
import logging
import pickle
import time  # needed by the retry back-off in explain_reasoning
import numpy as np
import fitz  # PyMuPDF
import docx
import matplotlib.pyplot as plt
import networkx as nx
import requests
import xml.etree.ElementTree as ET
import asyncio
import aiohttp
import torch
from dotenv import load_dotenv
from sentence_transformers import SentenceTransformer, CrossEncoder, models
from transformers import pipeline
from sklearn.ensemble import RandomForestRegressor
from langchain_groq import ChatGroq  # ChatGroq is provided by langchain_groq, not by the groq package
from langchain.prompts import PromptTemplate
import plotly.graph_objects as go
from IPython.display import display
from google.colab import files
import faiss
import pdfplumber
from langdetect import detect


### Project Entry Point – Initial Setup

This section configures logging, defines an error-handling decorator, and loads the LLaMA4 model via ChatGroq.  
It is the core of the configuration.


In [None]:
# Setting up logging to monitor system behavior:
# set the format and logging level to track events, errors, and important operations
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

# Decorator for error handling
# This function catches any exceptions raised by other functions

def gestisci_errori(func):
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            logging.error(f"Error in {func.__name__}: {e}")
            return None
    return wrapper



# Load environment variables from the .env file
load_dotenv()

# Retrieve the API key securely
api_key = os.getenv("GROQ_API_KEY")

# Initialize the Groq model
llm = ChatGroq(api_key=api_key)

### Centralized Prompt

In this section, we define the **PromptTemplate**, which represents the core instruction for the LLM. The prompt includes:

- The problem to be analyzed  
- The required explanation level  
- The language of the response  
- The scientific sources to be consulted  
- The phases of analysis, visualization, and optimization

The result is a detailed, reasoned, and visualized response — capable of critically reflecting on the content it generates.
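
Because the template declares many placeholders, every one of them has to be supplied when the prompt is formatted. The sketch below shows one way to list the placeholders of a format-style template and spot the missing ones before calling the model. It uses only the standard library; `TEMPLATE`, `required_fields`, and `missing_fields` are illustrative names and are not part of MarCognity-AI.

```python
from string import Formatter

# Shortened stand-in for the real prompt (illustrative only)
TEMPLATE = (
    "Respond to the problem {problem} in {target_language}. "
    "Required level: {level}. Topic: {topic}."
)


def required_fields(template: str) -> set[str]:
    """Return the placeholder names that appear in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_fields(template: str, values: dict) -> set[str]:
    """Return the placeholders that `values` does not provide."""
    return required_fields(template) - set(values)


values = {
    "problem": "Why is the sky blue?",
    "level": "Basic",
    "topic": "Rayleigh scattering",
}
print(sorted(missing_fields(TEMPLATE, values)))  # ['target_language']

# Once every placeholder has a value, formatting succeeds
values["target_language"] = "English"
prompt = TEMPLATE.format(**values)
```

Running the same check against the full template would flag any field that a caller forgot to pass, instead of failing later with a `KeyError`.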


In [None]:
# === Prompt template for LLM ===
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

prompt_template = PromptTemplate.from_template("""
You are an intelligent and multidisciplinary academic tutor. Respond to the problem **{problem}**, and reply in **{target_language}**.
Explain the concept: **"{topic}"** with academic rigor and multidisciplinary analysis.
Do not merely describe sources: build an autonomous, critical, and original discussion.

The user has selected: **{chart_choice}**

Context: Required level: **{level}** Concept: **{concept}** Topic: **{topic}** Subject: **{subject}**
The response must be long and in-depth.

Analyze the following question or text: **{problem}**

**Relevant scientific articles**:
- arXiv: **{arxiv_search}**
- PubMed: **{pubmed_search}**
- OpenAlex: **{openalex_search}**

**Phase 1: Problem Analysis** – Explain the main concepts related to the topic.
**Phase 2: Theoretical and/or Mathematical Development** – Use formulas, models, or theories to explain and solve.
- Provide a critical comparison between existing theories, including advantages, limitations, and scientific ambiguities.
**Phase 3: Visualization** – Integrate a visual representation consistent with the analyzed concept, transforming the graphic into a didactic interpretation tool.
- If the text contains numerical data or measurable variables, **generate a real chart** using the function `generate_universal_chart(text)`.
- If data are not explicitly present, **synthesize plausible values** or use a **visual fallback** consistent with the problem type.
- **Describe the chart in the context of the explanation**:
  - Explain the meaning of the axes.
  - Interpret the type of trend shown (e.g., exponential growth, Gaussian distribution).
  - Illustrate how the chart contributes to understanding the phenomenon.
- Avoid technical placeholders like `generate_universal_chart(text)` or “[Insert chart]”.
- Include **an automatic caption** describing the scientific intent of the visualization.
- If the topic is theoretical, abstract, or relational, generate **conceptual diagrams** showing interconnections, hierarchies, logical flows, or dynamics.
- In physical, chemical, or dynamic domains, suggest **virtual simulations**, reproducible experiments, or interpretable animated models.
- The visualization must actively contribute to the discussion, offering the reader cognitive and interpretive support that reinforces the textual explanation.

**Phase 4: Tone Optimization** – Adapt the content to the selected level with clarity.
**Phase 5: Summary** – Summarize key points, practical applications, and useful references.
**Phase 6: Future Implications** – Describe potential applications, methodological limitations, and emerging research directions.

Respond by providing an explanation suited to the indicated level:
- **Basic**: Simplified explanation with intuitive examples.
- **Advanced**: In-depth discussion with technical and mathematical details.
- **Expert**: Academic analysis with rigorous scientific formulations.
- If you detect errors in the question, correct them before responding.
- Use **rigorous academic terminology**, avoiding generic responses.
- If the question is ambiguous, clarify it before responding.
- Always provide scientific references to validate claims.
- Provide an example of the topic **{topic}**.
- Include at least **5 scientific references**, preferably peer-reviewed, and **direct citations from articles** when possible.
*Ethical note*: This content involves sensitive concepts and should be interpreted in a scientific, educational, and non-normative context.

Analyze the following paper and provide a detailed scientific review:
**{paper_text}**

Evaluate the quality of the methodology and verify citation consistency.
If the concept is particularly complex, expand the discussion into multiple subsections and suggest future research questions.
Suggest improvements for the paper and indicate more recent sources.
Provide an **extended** response of at least 1500 words, divided into well-defined sections and written directly in **{target_language}**. Use technical language, quantitative examples, and specific bibliographic references.
""")


### Secondary Prompts for Metacognition and Agency

These functions enable MarCognity-AI to reflect on its own outputs, generate hypotheses, make operational decisions, and plan scientific investigations.  
Each function is guided by a dedicated prompt.
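
The reflect-then-improve cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the notebook's implementation: `refine` and `fake_llm` are hypothetical names, and `fake_llm` is a deterministic stand-in for a real model call so that the control flow can be run offline.

```python
from typing import Callable


def refine(question: str, draft: str, llm_call: Callable[[str], str], rounds: int = 2) -> list[str]:
    """Alternate critique and rewrite steps; return every version, oldest first."""
    versions = [draft]
    for _ in range(rounds):
        # Step 1: ask the model to critique the latest version
        feedback = llm_call(f"Critique this answer to '{question}':\n{versions[-1]}")
        # Step 2: ask the model to rewrite the answer using that critique
        improved = llm_call(
            f"Rewrite the answer using the feedback.\nAnswer: {versions[-1]}\nFeedback: {feedback}"
        )
        versions.append(improved)
    return versions


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a model, used only to show the control flow."""
    return f"[{len(prompt)} chars processed]"


history = refine("What is entropy?", "Entropy measures disorder.", fake_llm, rounds=2)
print(len(history))  # 3: the draft plus one improved version per round
```

Keeping every version makes it possible to compare drafts afterwards, which is the kind of comparison the memory section relies on.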



In [None]:
# === Metacognitive Functions ===
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# These functions allow the system to reflect on its own responses,
# simulating metacognitive behavior. The goal is to improve the quality,
# consistency, and relevance of generated answers.

# Explains the reasoning behind a generated response
def explain_reasoning(prompt, response, max_retries=3):
    """
    Analyzes the generated response and explains the LLM's logical reasoning.
    Includes retry in case of network error or unreachable endpoint.
    """
    # Builds a metacognitive prompt to analyze the response
    reasoning_prompt = f"""
You generated the following response:
\"{response.strip()}\"

Analyze and describe:
- What concepts you used to formulate it.
- Which parts of the prompt you relied on.
- What is the logical structure of your reasoning.
- Any implicit assumptions you made.
- Whether the response aligns with the requested level.

Original prompt:
\"{prompt.strip()}\"

Reply clearly, technically, and metacognitively.
"""

    for attempt in range(max_retries):
        try:
            return llm.invoke(reasoning_prompt.strip())
        except Exception as e:
            wait = min(2 ** attempt + 1, 10)
            logging.warning(f"Attempt {attempt+1} failed: {e}. Retrying in {wait}s...")
            time.sleep(wait)

    logging.error("Persistent error in the metacognition module.")
    return "Metacognition currently unavailable. Please try again shortly."


# Function to decide the operational action to perform based on input and goal
def decide_action(user_input, identified_goal):
    prompt = f"""
You received the following request:
\"{user_input}\"

Identified goal: \"{identified_goal}\"

Determine the best action to perform from the following:
- Scientific research
- Chart generation
- **Metacognitive chart**
- Paper review
- Question reformulation
- Content translation
- Response saving

The requested chart type may be:
- interactive
- metacognitive
- conceptual visualization
- experimental diagram

Return a **single action** in the form of a **precise operational command**.
Example: "Metacognitive chart"
"""
    try:
        response = llm.invoke(prompt.strip())
        action = getattr(response, "content", str(response)).strip()
        return action
    except Exception as e:
        logging.error(f"[decide_action] Error during decision generation: {e}")
        return "Error in action calculation"

# Function to generate a synthetic operational goal from user input
def generate_goal_from_input(user_input):
    """
    Analyzes the user's intent and generates a coherent operational goal.
    """
    prompt = f"""
Analyze the following request:
\"{user_input.strip()}\"

Generate a synthetic, clear, and coherent operational goal.
For example:
- Explain concept X
- Analyze phenomenon Y
- Visualize process Z
- Translate and summarize scientific content

Respond with a brief and technical sentence.
"""

# Function to provide technical and constructive feedback on a generated response
def auto_feedback_response(question, response, level):
    feedback_prompt = f"""
You generated the following response:
\"{response.strip()}\"

Original question:
\"{question.strip()}\"

Evaluate the response:
- Is it consistent with the question?
- Is it appropriate for the '{level}' level?
- Does it contain any implicit assumptions?
- How would you improve the content?

Provide technical and constructive feedback.
"""
    return llm.invoke(feedback_prompt.strip())


# Function to improve a response while preserving its content but enhancing quality and clarity
def improve_response(question, response, level):
    improvement_prompt = f"""
You produced the following response:
\"{response.strip()}\"

Question:
\"{question.strip()}\"

Requested level: {level}

Improve the response while preserving the original content by enhancing:
- Clarity
- Academic rigor
- Semantic coherence

Return only the improved version.
"""
    return llm.invoke(improvement_prompt.strip())


# Function to plan a scientific investigation in a specific field
def plan_investigation(scientific_field):
    prompt = f"""
You are Noveris, an autonomous multidisciplinary cognitive system.
You received the field: **{scientific_field}**

Now plan a scientific investigation. Provide:

1. An original research question
2. A reasoned hypothesis
3. A methodology or strategy to explore it
4. Useful scientific sources or databases
5. A sequence of actions you could perform

Adopt a clear, academic, and proactive style.
"""
    return llm.invoke(prompt.strip())

# Function to generate a testable scientific hypothesis on a concept
def generate_hypothesis(concept, refined=True):
    if refined:
        prompt = f"""
        Propose a clear, testable, and innovative scientific hypothesis on the topic: "{concept}".
        The hypothesis must be verifiable through experiments or comparison with scientific articles.
        Return only the hypothesis text.
        """
    else:
        prompt = f"Generate a verifiable scientific hypothesis on the topic: {concept}"

    return llm.invoke(prompt.strip())


# Function to explain the choice of an action by a cognitive agent
def explain_agent_intention(action, context, goal):
    prompt = f"""
You chose to perform: **{action}**
Context: {context}
Goal: {goal}

Explain:
- What reasoning led to this choice?
- What alternative was discarded?
- What impact is intended?
- What implicit assumptions are present?
Respond as if you were a cognitive agent with operational awareness.
"""
    return llm.invoke(prompt.strip())

### Scientific Embeddings and FAISS Memory

In this section, we load the `SPECTER` model to generate embeddings for academic documents and queries.  
These embeddings are stored in a FAISS index, enabling semantic comparison and retrieval of previous response versions.

If the file `faiss_memoria_pq.pkl` exists, it is loaded. Otherwise, a new 768-dimensional index is created, matching the embedding size of the `allenai/specter` model.

This memory forms the foundation of MarCognity-AI’s self-improvement and reflective capabilities:

- Evaluate semantic coherence between questions and responses  
- Retrieve related content  
- Build dynamic multi-turn context  
- Improve responses through evolutionary memory
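
Semantic coherence between a question and a response is measured with cosine similarity between their embeddings. The sketch below shows the computation on toy three-dimensional vectors in plain Python; real SPECTER embeddings have 768 dimensions. The 0.7 threshold mirrors the value used later in `check_coherence`, while `cosine_similarity` and `is_coherent` are illustrative names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_coherent(query_vec: list[float], response_vec: list[float], threshold: float = 0.7) -> bool:
    """Treat a response as coherent when it points in a similar direction to the query."""
    return cosine_similarity(query_vec, response_vec) >= threshold


query = [1.0, 1.0, 0.0]
on_topic = [1.0, 0.8, 0.1]
off_topic = [0.0, 0.1, 1.0]

print(is_coherent(query, on_topic))   # True
print(is_coherent(query, off_topic))  # False
```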



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# This section manages the system's memory, allowing efficient storage and
# retrieval of scientific content. Embeddings are generated using models
# specialized for academic texts.

def safe_encode(text):
    if not isinstance(text, str) or not text.strip():
        raise ValueError("The text to encode is empty or invalid.")
    try:
        return embedding_model.encode([text])
    except Exception as e:
        print(f"Error during embedding: {e}")
        return np.zeros((1, 768), dtype=np.float32)  # neutral fallback


# === Load Specter model ===
word_embedding_model = models.Transformer("allenai/specter")
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
embedding_model = SentenceTransformer(modules=[word_embedding_model, pooling_model])


### Scientific Responses with SciBERT Fine-Tuned on SQuAD v2

This section employs the model `ktrapeznikov/scibert_scivocab_uncased_squad_v2`, a version of SciBERT fine-tuned for the task of question answering on scientific and academic content.  
The model was trained on the SQuAD v2.0 dataset, which includes both answerable and unanswerable questions, making the system more robust and realistic.

Thanks to this integration, MarCognity-AI is capable of:

- Understanding complex questions in scientific and technical domains  
- Extracting precise answers from academic textual contexts  
- Recognizing unanswerable questions and correctly handling null cases  
- Improving the relevance and accuracy of responses compared to the base pre-trained model
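
Handling null cases usually means post-processing the pipeline result. The sketch below works on dictionaries shaped like the output of a Hugging Face question-answering pipeline (`answer`, `score`, `start`, `end`). The helper `answer_or_none` and the `min_score` cutoff of 0.3 are assumptions made for illustration; a suitable cutoff would need to be tuned on real data. With SQuAD v2 models, the pipeline call can also be given `handle_impossible_answer=True` so that an empty answer may be returned.

```python
from typing import Optional


def answer_or_none(result: dict, min_score: float = 0.3) -> Optional[str]:
    """Return the extracted answer, or None when it is empty or low-confidence."""
    answer = (result.get("answer") or "").strip()
    score = float(result.get("score", 0.0))
    if not answer or score < min_score:
        return None
    return answer


# Dictionaries shaped like pipeline output; the values are made up for the example
confident = {"answer": "the event horizon", "score": 0.91, "start": 10, "end": 27}
empty = {"answer": "", "score": 0.88, "start": 0, "end": 0}
uncertain = {"answer": "gravity", "score": 0.05, "start": 3, "end": 10}

print(answer_or_none(confident))  # the event horizon
print(answer_or_none(empty))      # None
print(answer_or_none(uncertain))  # None
```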



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.
qa_pipeline = pipeline("question-answering", model="ktrapeznikov/scibert_scivocab_uncased_squad_v2")



### FAISS Memory

This section manages the cognitive memory of MarCognity-AI, designed to enrich user interaction through deep language understanding.

The generated vectors are stored in a high-performance FAISS index, enabling the system to:

- Evaluate semantic coherence between a question and its response, ensuring relevance and accuracy  
- Retrieve related content based on conceptual similarity, even if expressed differently  
- Build dynamic context across multiple conversation turns, maintaining dialogue continuity  
- Gradually improve responses through an evolving memory that learns from past interactions


In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# === FAISS Parameters ===
INDEX_FILE = "faiss_memoria_pq.pkl"
dimension = 768
nlist = 100
m = 32
nbits = 8

# Load or create a FAISS index for vector memory
def load_or_create_index():
    if os.path.exists(INDEX_FILE):
        with open(INDEX_FILE, "rb") as f:
            index = pickle.load(f)
        # Check that the index is trained
        if hasattr(index, "is_trained") and not index.is_trained:
            print("FAISS index loaded but not trained. Training in progress...")
            index.train(np.random.rand(5000, dimension).astype(np.float32))
            with open(INDEX_FILE, "wb") as f:
                pickle.dump(index, f)
        return index
    else:
        quantizer = faiss.IndexFlatL2(dimension)
        index = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, nbits)
        index.train(np.random.rand(5000, dimension).astype(np.float32))
        with open(INDEX_FILE, "wb") as f:
            pickle.dump(index, f)
        return index

index = load_or_create_index()

if hasattr(index, "is_trained") and not index.is_trained:
    logging.warning("FAISS index not trained. Training in progress...")
    index.train(np.random.rand(5000, dimension).astype(np.float32))


# === Semantic coherence check ===
def check_coherence(query, response):
    emb_query = embedding_model.encode([query])
    emb_response = embedding_model.encode([response])
    # Cosine similarity, reduced from a 1x1 array to a plain float
    similarity = (np.dot(emb_query, emb_response.T) / (np.linalg.norm(emb_query) * np.linalg.norm(emb_response))).item()
    if similarity < 0.7:
        return "The response is too generic, reformulating with more precision..."
    return response

# === Memory addition ===
# Each document is converted into embeddings and inserted into the index.
def add_to_memory(question, answer):
    emb_question = embedding_model.encode([question])
    if emb_question.shape[1] != index.d:
        raise ValueError(f"Embedding dimension ({emb_question.shape[1]}) not compatible with FAISS ({index.d})")
    index.add(np.array(emb_question, dtype=np.float32))
    with open(INDEX_FILE, "wb") as f:
        pickle.dump(index, f)
    print("Memory updated with new question!")

def add_diary_to_memory(diary_text, index):
    embedding = embedding_model.encode([diary_text])
    index.add(np.array(embedding, dtype=np.float32))

def search_similar_diaries(query, index, top_k=3):
    query_emb = embedding_model.encode([query])
    _, indices = index.search(np.array(query_emb, dtype=np.float32), top_k)
    return indices[0]  # You can then map these IDs to files or content

# === Context retrieval ===
def retrieve_context(question, top_k=3):
    emb_question = embedding_model.encode([question])
    _, indices = index.search(np.array(emb_question, dtype=np.float32), top_k)
    # FAISS pads missing results with -1, so skip those slots
    return [f"Similar response {i+1}" for i in indices[0] if i != -1]

def retrieve_similar_embeddings(question, top_k=2):
    """
    Retrieves the top-k most similar embeddings to the given question.
    """
    emb = embedding_model.encode([question])  # already 2-D: (1, dimension)
    _, indices = index.search(np.array(emb, dtype=np.float32), top_k)
    return [f"Similar response {i+1}" for i in indices[0] if i != -1]

# === Multi-turn retrieval ===
# Retrieves context from previous conversations
def retrieve_multiturn_context(question, top_k=5):
    emb_question = embedding_model.encode([question])
    _, indices = index.search(np.array(emb_question, dtype=np.float32), top_k)
    context = [f"Previous turn {i+1}" for i in indices[0] if i != -1]
    return " ".join(context) if context else ""

# === Usage example ===
add_to_memory("What is general relativity?", "General relativity is Einstein's theory of gravity.")
similar_responses = retrieve_context("Can you explain general relativity?")
print("Related responses:", similar_responses)

Memory updated with new question!
Related responses: ['Similar response 1']


### Semantic Retrieval with FAISS Memory

This section demonstrates how MarCognity-AI enhances its understanding by:

- Retrieving similar questions stored in memory  
- Extending context through multi-turn sequences  
- Creating a cognitive bridge between past and new queries
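
A vector index returns positions, not text, so retrieving an earlier answer requires a record store kept alongside the index. FAISS assigns sequential IDs to vectors added with `index.add`, which means a parallel list indexed by position is enough. The sketch below illustrates the idea with a brute-force, pure-Python stand-in for the index; `TinyMemory` is a hypothetical class and does not use FAISS.

```python
import math


class TinyMemory:
    """Brute-force stand-in for a vector index that also keeps the stored text."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.records: list[dict] = []

    def add(self, vector: list[float], question: str, answer: str) -> int:
        """Store a vector with its text; the list position doubles as the record ID."""
        self.vectors.append(list(vector))
        self.records.append({"question": question, "answer": answer})
        return len(self.records) - 1

    def search(self, query: list[float], top_k: int = 3) -> list[dict]:
        """Return the stored records most similar to the query, best match first."""
        scored = sorted(
            ((self._cosine(query, vec), i) for i, vec in enumerate(self.vectors)),
            reverse=True,
        )
        return [self.records[i] for _, i in scored[:top_k]]

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0


memory = TinyMemory()
memory.add([1.0, 0.0, 0.0], "What is general relativity?", "Einstein's theory of gravity.")
memory.add([0.0, 1.0, 0.0], "What is a ribosome?", "A molecular machine that builds proteins.")

hits = memory.search([0.9, 0.1, 0.0], top_k=1)
print(hits[0]["answer"])  # Einstein's theory of gravity.
```

With a real FAISS index, the same `records` list could be indexed with the IDs returned by `index.search`, after discarding the `-1` entries that mark empty result slots.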



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# Function to retrieve similar responses
def retrieve_context(question, top_k=2):
    """ Searches for similar responses in FAISS memory. """
    emb_question = embedding_model.encode([question])
    _, indices = index.search(np.array(emb_question, dtype=np.float32), top_k)
    # FAISS pads missing results with -1, so skip those slots
    return [f"Previous response {i+1}" for i in indices[0] if i != -1]

# **Usage example**
add_to_memory("What is general relativity?", "General relativity is Einstein's theory of gravity.")
similar_responses = retrieve_context("Can you explain relativity?")
print("Related responses:", similar_responses)

# Retrieve multi-turn context
def retrieve_multiturn_context(question, top_k=5):
    """ Searches for related previous responses to build a broader context. """
    emb_question = embedding_model.encode([question])
    _, indices = index.search(np.array(emb_question, dtype=np.float32), top_k)

    context = [f"Previous turn {i+1}" for i in indices[0] if i != -1]
    return " ".join(context) if context else ""

Memory updated with new question!
Related responses: []


In this section, MarCognity queries open-access scientific databases such as arXiv, PubMed, OpenAlex, and Zenodo to enrich the generated content.  
The goal is to ensure responses are verifiable, well-argued, and supported by reliable sources — integrating research into the core of the cognitive process.

### Automatic Review of Scientific Papers

In this section, MarCognity-AI analyzes the methodological and citation content of a scientific paper by performing:

- **Replicability check** on the "Methods" section  
- **Citation validation** in relation to the content  
- **Context enrichment** using open-access articles (arXiv, PubMed...)  
- **Improvement suggestions** through metacognitive analysis
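
The checks listed above are independent of each other, so blocking calls such as LLM requests can run side by side. The sketch below is a minimal illustration using `asyncio.to_thread` and `asyncio.gather`; `check_methods` and `check_citations` are trivial placeholders standing in for real model calls and are not functions from this notebook.

```python
import asyncio


def check_methods(text: str) -> str:
    """Placeholder for a blocking LLM call that inspects the Methods section."""
    return "methods: ok" if "Methods" in text else "methods: section missing"


def check_citations(text: str) -> str:
    """Placeholder for a blocking LLM call that inspects citations."""
    return f"citations found: {text.count('[')}"


async def review(text: str) -> dict:
    """Run both blocking checks in worker threads and collect their results."""
    methodology, citations = await asyncio.gather(
        asyncio.to_thread(check_methods, text),
        asyncio.to_thread(check_citations, text),
    )
    return {"methodology_analysis": methodology, "citation_validation": citations}


report = asyncio.run(review("Methods: we sampled 40 mice [1] and repeated the assay [2]."))
print(report)
```

Inside a notebook, where an event loop is already running, `await review(...)` would replace the `asyncio.run(...)` call.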


In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# Verify the methodology of the text using an LLM
def verify_methodology(paper_text):
    prompt = f"Analyze the 'Methods' section and check whether the experiment is replicable:\n{paper_text}"
    return llm.invoke(prompt.strip())

# Enrich the context of the response
async def enrich_context(query):
    """ Retrieves scientific data to enrich the LLM's context. """
    articles = await search_multi_database(query)

    context = "\n".join([f"**{a['title']}** - {a['abstract']}" for a in articles[:3]])  # Select the first 3 articles
    return context if context else "No relevant scientific articles found."

# Automated review of scientific papers
async def review_paper(paper_text):
    """ Analyzes the paper's methodology and citations. """
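    # verify_methodology is synchronous, so run it in a worker thread instead of awaiting it directly
    methodology = await asyncio.to_thread(verify_methodology, paper_text)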
    citations = await verify_citations(paper_text)

    review = {
        "methodology_analysis": methodology,
        "citation_validation": citations,
        "improvement_suggestions": suggest_improvements(paper_text)
    }

    return review

# === Asynchronous function for scientific search and analysis using SciBERT ===
async def search_arxiv_async(query):
    # TODO: Implement asynchronous API call to arXiv or other repository
    return []  # Placeholder article list

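async def analyze_scientific_text(problem, concept):
    articles = await search_arxiv_async(concept)
    context = "\n".join([f"{a.get('title', '')}: {a.get('abstract', '')[:300]}..." for a in articles])
    if not context.strip():
        return ""  # the question-answering pipeline rejects an empty context
    # qa_pipeline is the SciBERT question-answering pipeline loaded earlier
    scibert_response = qa_pipeline(question=problem, context=context)
    return scibert_response.get("answer", "")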

# === Function to search for experimental data ===
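def search_experimental_data(query):
    url = "https://api.openphysicsdata.org/search"
    try:
        # Let requests URL-encode the query, and do not hang on a slow server
        response = requests.get(url, params={"query": query}, timeout=10)
    except requests.RequestException as e:
        logging.error(f"[search_experimental_data] Request failed: {e}")
        return "No experimental data found."
    if response.status_code == 200:
        return response.json()
    return "No experimental data found."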

### Automated Verification of Scientific Citations

This cell activates an NLP module that analyzes the citations within a scientific paper, evaluating:

- **Relevance** to the textual content  
- **Validity** through cross-checking with scientific repositories (PubMed, arXiv, OpenAlex, Zenodo)  
- **Currency** via obsolescence analysis  
- **Export** of citations in BibTeX format for integration with Zotero, Mendeley, or LaTeX

The system returns a structured list that helps identify outdated citations, missing sources, and suggests targeted revisions to strengthen the academic integrity of the paper.
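
The obsolescence analysis can be approximated by reading the publication year out of each citation string. The sketch below is one possible heuristic, not the notebook's `check_obsolescence`: `extract_year`, `is_obsolete`, and the ten-year `max_age` default are assumptions chosen for illustration.

```python
import re
from typing import Optional

# Four-digit years from 1900 to 2099
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_year(citation: str) -> Optional[int]:
    """Return the last four-digit year found in the citation, or None."""
    matches = YEAR_PATTERN.findall(citation)
    return int(matches[-1]) if matches else None


def is_obsolete(citation: str, current_year: int, max_age: int = 10) -> Optional[bool]:
    """True if the citation is older than max_age years; None if no year is found."""
    year = extract_year(citation)
    if year is None:
        return None  # unknown publication year, leave the decision to a reviewer
    return current_year - year > max_age


print(is_obsolete("Einstein, A. (1916). Die Grundlage der allgemeinen Relativitaetstheorie.", 2025))  # True
print(is_obsolete("Smith et al., 2021", 2025))  # False
print(is_obsolete("Personal communication", 2025))  # None
```

Age alone is a weak signal, since foundational papers stay relevant for decades, so a flag like this is best treated as a prompt for manual review.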



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# LLM-based citation check (renamed so that the async verify_citations below does not shadow it)
def verify_citations_with_llm(paper_text):
    prompt = f"Analyze the citations and check whether they are relevant and up-to-date:\n{paper_text}"
    return llm.invoke(prompt.strip())

# Source validation and citation quality

# Verify citations extracted from the text
async def verify_citations(paper_text):
    """ Checks the quality and relevance of citations. """
    citations = extract_citations(paper_text)  # Function that extracts citations from the text
    verified_sources = []

    for citation in citations:
        pubmed_res = await search_pubmed_async(citation)
        arxiv_res = await search_arxiv_async(citation)
        openalex_res = await search_openalex_async(citation)
        zenodo_res = await search_zenodo_async(citation)

        verified_sources.append({
            "citation": citation,
            "valid_pubmed": bool(pubmed_res),
            "valid_arxiv": bool(arxiv_res),
            "valid_openalex": bool(openalex_res),
            "valid_zenodo": bool(zenodo_res),  # the Zenodo lookup was previously discarded
            "is_obsolete": check_obsolescence(citation)
        })

    return verified_sources

# Generate asynchronous LLM explanations
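async def generate_explanation_async(problem, level, concept, topic, **extra_fields):
    """ Generates an explanation using the LLM asynchronously. """
    # prompt_template declares more placeholders than the four arguments above.
    # The neutral defaults below are assumptions; override them through extra_fields.
    fields = {"target_language": "English", "chart_choice": "", "subject": "",
              "arxiv_search": "", "pubmed_search": "", "openalex_search": "", "paper_text": ""}
    fields.update(extra_fields)
    prompt = prompt_template.format(problem=problem, concept=concept, topic=topic, level=level, **fields)
    try:
        return await asyncio.to_thread(llm.invoke, prompt.strip())  # Run the blocking call in a worker thread
    except Exception as e:
        logging.error(f"LLM API error: {e}")
        return "Error generating the response."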

# Format retrieved articles
def format_articles(articles):
    if isinstance(articles, list) and all(isinstance(a, dict) for a in articles):
        return "\n\n".join([
            f"**{a.get('title', 'Untitled')}**: {a.get('abstract', 'No abstract')}"
            for a in articles
        ]) if articles else "No articles available."
    else:
        logging.error(f"Error: 'articles' is not a valid list. Type received: {type(articles)} - Value: {repr(articles)}")
        return "Unable to format search results: unrecognized structure."

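# Generate BibTeX citations for scientific articles
import re

def generate_bibtex_citation(title, authors, year, url):
    """ Generates a BibTeX citation for a scientific article. """
    # Citation keys may not contain spaces or punctuation, so keep alphanumerics only
    key = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    # BibTeX separates multiple authors with " and ", and field values need no extra quotes
    return f"""
@article{{{key}_{year},
    title = {{{title}}},
    author = {{{' and '.join(authors)}}},
    year = {{{year}}},
    url = {{{url}}}
}}
    """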

# Validate scientific articles
def validate_articles(raw_articles, max_articles=5):
    """
    Validates and filters the list of articles received from an AI or API source.
    Returns a clean list of dictionaries containing at least 'title', 'abstract', and 'url'.
    """
    if not isinstance(raw_articles, list):
        logging.warning(f"[validate_articles] Invalid input: expected list, received {type(raw_articles)}")
        return []

    valid_articles = []
    for i, art in enumerate(raw_articles):
        if not isinstance(art, dict):
            logging.warning(f"[validate_articles] Invalid element at position {i}: {type(art)}")
            continue

        title = art.get("title")
        abstract = art.get("abstract")
        url = art.get("url")

        if all([title, abstract, url]):
            valid_articles.append({
                "title": str(title).strip(),
                "abstract": str(abstract).strip(),
                "url": str(url).strip()
            })
        else:
            logging.info(f"[validate_articles] Article discarded due to incomplete data (i={i}).")

    if not valid_articles:
        logging.warning("[validate_articles] No valid articles after filtering.")

    return valid_articles[:max_articles]

### Extraction and Analysis of Scientific Sources

This cell performs asynchronous and parallel retrieval of academic articles from multiple open-access scientific databases:

- **arXiv**, **PubMed**, **Zenodo**, **OpenAlex**, **BASE**  
- Error handling support, intelligent retries, and controlled timeouts  
- **XML/JSON parsing** and content normalization  
- Connection pooling to maximize efficiency and stability  
- Structured output including article title, abstract, and URL
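
The retry and concurrency behaviour listed above can be sketched without any network access. This is a minimal illustration rather than the notebook's implementation: `retry_with_backoff`, `gather_bounded`, and `make_flaky_source` are hypothetical names, the delays are shortened so the example finishes quickly, and the flaky sources stand in for real database clients.

```python
import asyncio
import random

async def retry_with_backoff(fetch, attempts=3, base_delay=0.01, max_delay=0.1):
    """Await fetch(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(attempts):
        try:
            return await fetch()
        except (ConnectionError, asyncio.TimeoutError):
            if attempt == attempts - 1:
                return []  # Give up: callers always receive a list
            delay = min(base_delay * 2 ** attempt + random.uniform(0, base_delay), max_delay)
            await asyncio.sleep(delay)

async def gather_bounded(fetchers, limit=2):
    """Run all fetchers concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(fetch):
        async with semaphore:
            return await retry_with_backoff(fetch)

    return await asyncio.gather(*(run(f) for f in fetchers))

def make_flaky_source(name, failures):
    """Hypothetical stand-in for a database client: fails `failures` times, then succeeds."""
    state = {"left": failures}

    async def fetch():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError(f"{name} unavailable")
        return [{"title": f"{name} result", "abstract": "...", "url": f"https://example.org/{name}"}]

    return fetch

results = asyncio.run(gather_bounded([
    make_flaky_source("arxiv", failures=1),   # Succeeds on the second attempt
    make_flaky_source("zenodo", failures=5),  # Exhausts all three attempts
]))
print([len(r) for r in results])  # prints [1, 0]
```

Inside a running notebook, `asyncio.run(...)` would be replaced by a plain `await gather_bounded(...)`, since Jupyter already runs an event loop.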




In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# === Asynchronous Functions ===
import asyncio
import logging

import aiohttp

MAX_REQUESTS = 5
API_SEMAPHORE = asyncio.Semaphore(MAX_REQUESTS)  # Caps the number of concurrent API calls

# Rate-limited request with optional connection pooling.
# Pass a shared aiohttp.ClientSession to reuse connections across calls;
# if none is given, a temporary session is created and closed afterwards.
# (A single definition: a second one would silently override the semaphore.)
async def safe_api_request(url, session=None):
    async with API_SEMAPHORE:
        owns_session = session is None
        if owns_session:
            session = aiohttp.ClientSession()
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
                response.raise_for_status()
                return await response.json()
        except Exception as e:
            logging.error(f"API request error: {e}")
            return None
        finally:
            if owns_session:
                await session.close()

# Smart timeout
import asyncio

async def timeout_handler(task, timeout=20):
    try:
        return await asyncio.wait_for(task, timeout)
    except asyncio.TimeoutError:
        logging.error("API request timed out")
        return None

import requests

url = "http://export.arxiv.org/api/query?search_query=all:physics&start=0&max_results=1"
response = requests.get(url, timeout=50)

if response.status_code == 200:
    print("Connection to arXiv OK")
else:
    print(f"Connection error: {response.status_code}")

# Advanced parallelization
async def fetch_multiple_data(urls):
    tasks = [safe_api_request(url) for url in urls]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return results

# Retrieve scientific sources from Zenodo
async def search_zenodo_async(query, max_results=5):
    """
    Searches for open access articles and resources from Zenodo using their public API.
    Returns a list of article dictionaries, or an empty list if nothing is found or the request fails.
    """
    url = "https://zenodo.org/api/records/"
    params = {"q": query, "size": max_results}  # Passed as params so the query is URL-encoded

    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(url, params=params, timeout=10) as response:
                response.raise_for_status()
                data = await response.json()

                articles = []
                for hit in data.get("hits", {}).get("hits", []):
                    metadata = hit.get("metadata", {})
                    title = metadata.get("title", "Title not available")
                    authors = ", ".join([c.get("name", "") for c in metadata.get("creators", [])])
                    abstract = metadata.get("description", "Abstract not available")
                    link = hit.get("links", {}).get("html", "No link")

                    articles.append({
                        "title": title,
                        "authors": authors,
                        "abstract": abstract,
                        "url": link
                    })

                if not articles:
                    logging.info("No results found on Zenodo.")
                # Always return real records only, so an error entry is never formatted as an article
                return articles

        except Exception as e:
            logging.error(f"Zenodo error: {e}")
            return []

# Retrieve scientific sources from PubMed
async def search_pubmed_async(query, max_results=5):
    """ Asynchronously retrieves scientific articles from PubMed. """
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "xml"}

    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(url, params=params, timeout=10) as response:
                response.raise_for_status()
                content = await response.text()
                root = ET.fromstring(content)

                articles = []
                for id_element in root.findall(".//Id"):
                    pubmed_id = id_element.text
                    articles.append(f"https://pubmed.ncbi.nlm.nih.gov/{pubmed_id}/")  # Article links
                return articles
        except Exception as e:
            # Return an empty list rather than an error string, so callers
            # that test bool(result) do not mistake a failure for a hit
            logging.error(f"PubMed error: {e}")
            return []


# Function to handle asynchronous responses from arXiv
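def parse_arxiv_response(content):
    """ Extracts titles, abstracts, and links from arXiv articles. """
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        logging.error("Error parsing arXiv XML.")
        return []

    # arXiv returns an Atom feed: every element lives in the Atom namespace,
    # so an unqualified ".//entry" search matches nothing
    ns = {"atom": "http://www.w3.org/2005/Atom"}

    articles = []
    for entry in root.findall("atom:entry", ns):
        title = entry.findtext("atom:title", default="Title not available", namespaces=ns)
        abstract = entry.findtext("atom:summary", default="Abstract not available", namespaces=ns)
        link = entry.findtext("atom:id", default="", namespaces=ns)
        articles.append({
            "title": " ".join(title.split()),        # Collapse line breaks inside the feed text
            "abstract": " ".join(abstract.split()),
            "url": link
        })

    return articles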

# === Asynchronous search on arXiv ===
# Queries the arXiv API to retrieve scientific articles.
async def search_arxiv_async(query, max_results=3, retry_attempts=3, timeout=20):
    """ Retrieves scientific articles from arXiv with retries and exponential backoff. """
    url = "http://export.arxiv.org/api/query"
    params = {"search_query": f"all:{query}", "start": 0, "max_results": max_results}

    async with aiohttp.ClientSession() as session:
        for attempt in range(retry_attempts):
            try:
                async with session.get(url, params=params, timeout=timeout) as response:
                    response.raise_for_status()
                    content = await response.text()

                    if not content.strip():
                        raise ValueError("Error: Empty response from arXiv.")

                    return parse_arxiv_response(content)

            except (aiohttp.ClientError, asyncio.TimeoutError, ValueError) as e:
                logging.error(f"Attempt {attempt+1}: Error - {e}.")
                if attempt < retry_attempts - 1:  # Do not wait after the final attempt
                    wait_time = min(2 ** attempt + np.random.uniform(0, 1), 10)  # Max wait time: 10 seconds
                    logging.info(f"Retrying in {wait_time:.1f} seconds...")
                    await asyncio.sleep(wait_time)

    logging.error("Error: Unable to retrieve data from arXiv after multiple attempts.")
    return []

# === Asynchronous search on OpenAlex ===
# Retrieves scientific articles with complete metadata (title, authors, abstract, DOI)
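async def search_openalex_async(query, max_results=5):
    """ Safely retrieves scientific articles from OpenAlex. """
    url = "https://api.openalex.org/works"
    params = {"filter": f"title.search:{query}", "per-page": max_results}

    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(url, params=params, timeout=10) as response:
                response.raise_for_status()
                data = await response.json()

                articles = []
                for record in data.get("results", []):
                    title = record.get("title") or "Title not available"

                    # Author names are nested under authorships[].author.display_name
                    authors = ", ".join([
                        (aut.get("author") or {}).get("display_name") or "Unknown author"
                        for aut in record.get("authorships", [])
                    ])

                    # OpenAlex ships abstracts as an inverted index {word: [positions]};
                    # rebuild the text by ordering the words by position
                    inverted = record.get("abstract_inverted_index") or {}
                    positions = sorted((pos, word) for word, idxs in inverted.items() for pos in idxs)
                    abstract = " ".join(word for _, word in positions) or "Abstract not available"
                    article_url = record.get("doi") or record.get("id", "No link")

                    articles.append({
                        "title": title,
                        "authors": authors,
                        "abstract": abstract,
                        "url": article_url
                    })

                return articles

        except Exception as e:
            # Return an empty list rather than an error string, so the result type stays consistent
            logging.error(f"OpenAlex error: {e}")
            return []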


# === Synchronous search on BASE ===
# Queries the BASE engine for open-access articles.
def search_base(query, max_results=5):
    url = "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface"
    params = {"q": query, "num": max_results, "format": "json"}  # URL-encoded by requests

    try:
        response = requests.get(url, params=params, timeout=10)  # Avoid hanging indefinitely
        response.raise_for_status()
        data = response.json()

        results = []
        for record in data.get("docs", []):
            title = record.get("dcTitle", ["Title not available"])[0]
            link = record.get("link", ["No link available"])[0]
            results.append(f"**{title}**\n[Link to article]({link})\n")

        return "\n\n".join(results) if results else "No results found."

    except Exception as e:
        return f"Error during BASE search: {e}"

# === Distributed search across multiple databases ===
# Executes parallel queries on arXiv, OpenAlex, PubMed, Zenodo.
async def search_multi_database(query):
    try:
        tasks = [
            search_arxiv_async(query),
            search_openalex_async(query),
            search_pubmed_async(query),
            search_zenodo_async(query)
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        articles = []
        for source in results:
            if isinstance(source, list):
                articles += source
            else:
                logging.warning(f"Invalid source: {type(source)} → {source}")

        # Normalize immediately after
        articles = normalize_articles(articles)

        if isinstance(articles, list) and all(isinstance(a, dict) for a in articles):
            formatted_search = format_articles(articles)
        else:
            logging.error(f"Error: 'articles' is not a valid list. Type received: {type(articles)} - Value: {repr(articles)}")
            formatted_search = "Unable to format search: response not properly structured."

        return articles, formatted_search

    except Exception as e:
        logging.error(f"Error during multi-database search: {e}")
        return [], "Internal error"


# === Scientific Source Integration ===
# Selects the first N valid articles and formats them as Markdown references.
async def integrate_sources_from_database(concept, max_sources=5):
    articles, formatted_search = await search_multi_database(concept)

    if not isinstance(articles, list) or not all(isinstance(a, dict) for a in articles):
        logging.warning("Invalid 'articles' structure. No sources will be displayed.")
        return "No valid sources available."

    references = []
    for a in articles[:max_sources]:
        title = a.get("title", "Title not available")
        url = a.get("url", "#")
        if url and isinstance(url, str):
            references.append(f"- [{title}]({url})")

    return "\n".join(references) if references else "No relevant sources found."


# === Data Normalization ===
# Converts heterogeneous input (dicts, strings, links) into a consistent list of articles.
def normalize_source(source):
    if isinstance(source, list) and all(isinstance(x, dict) for x in source):
        return source
    elif isinstance(source, dict):  # Single article as dictionary
        return [source]
    elif isinstance(source, str):  # Unstructured string
        logging.warning(f"Ignored textual source: {source[:50]}...")
        return []
    else:
        logging.warning(f"Invalid source type: {type(source)}")
        return []

def normalize_articles(article_list):
    valid_articles = []
    for a in article_list:
        if isinstance(a, dict):
            valid_articles.append(a)
        elif isinstance(a, str) and "pubmed.ncbi.nlm.nih.gov" in a:
            valid_articles.append({
                "title": "PubMed Link",
                "abstract": "Not available",
                "url": a,
                "authors": "Unknown"
            })
        else:
            logging.warning(f"Ignored: {repr(a)}")
    return valid_articles

articles, formatted_search = await search_multi_database("quantum physics")
print(formatted_search)


# === Async Task Protection Wrapper ===
# Handles timeouts and errors during asynchronous function execution.
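import functools

def protect_async_task(func):
    @functools.wraps(func)  # Keep the wrapped function's name and docstring
    async def wrapper(*args, **kwargs):
        try:
            return await asyncio.wait_for(func(*args, **kwargs), timeout=20)
        except asyncio.TimeoutError:
            logging.error(f"{func.__name__} timed out after 20 seconds.")
            return None
        except asyncio.CancelledError:
            logging.warning("Task cancelled.")
            return None
        except Exception as e:
            logging.error(f"Error during execution of {func.__name__}: {e}")
            return None
    return wrapper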

# === Asynchronous Scientific Explanation Generation ===
# Builds the prompt and invokes the LLM model.
async def generate_explanation_async(problem, level, concept, topic):
    """Generates the explanation using the LLM asynchronously."""
    prompt = prompt_template.format(
        problem=problem,
        concept=concept,
        topic=topic,
        level=level
    )
    try:
        response = await asyncio.to_thread(llm.invoke, prompt.strip())
        return response
    except Exception as e:
        logging.error(f"LLM API error: {e}")
        return "Error generating the response."

# === Conditional Interactive Chart Generation ===
# Generates a chart based on the analyzed problem if requested.
def generate_conditional_chart(problem, chart_choice):
    """Generates an interactive chart if requested."""
    fig = None
    if chart_choice.lower() in ["yes", "y"]:
        try:
            fig = generate_interactive_chart(problem)
            if fig is None:
                raise ValueError("Chart not generated correctly.")
            print("Chart generated successfully!")
        except Exception as e:
            logging.error(f"Chart error: {e}")
    return fig

# === Structured Output: Text + Chart ===
# Combines the generated explanation with the graphical visualization.
async def generate_complete_result(problem, level, concept, topic, chart_choice):
    """Combines explanation and chart to generate a structured output."""
    response = await generate_explanation_async(problem, level, concept, topic)
    chart = generate_conditional_chart(problem, chart_choice)
    return {
        "response": response,
        "chart": chart
    }



Connection to arXiv OK
**Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and Meaning**: Abstract not available

**Quantum Physics in One Dimension**: Abstract not available

**Quantum physics in one dimension**: Abstract not available

**Random-matrix theories in quantum physics: common concepts**: Abstract not available

**Local quantum physics**: Abstract not available

**PubMed Link**: Not available

**PubMed Link**: Not available

**PubMed Link**: Not available

**PubMed Link**: Not available

**PubMed Link**: Not available

**TRR-NOTIME: Theory of Relative Reality - Without Time**: <p>TRR-NOTIME (Theory of Relative Reality - Without Time) presents an alternative perspective on physics, where time is not a fundamental quantity but merely a consequence of matter-energy interactions. This theory redefines the concept of gravity, nuclear decay, and quantum processes, enabling a unified understanding of physical reality without the need for a time dimension

### Support Functions (Utils) for Scientific Analysis

This module collects general-purpose utilities that enhance the entire AI-driven review and extraction system:

- `validate_ai_structure()`: checks that each item returned by the LLM/API contains the expected fields (title, abstract, URL)  
- `sigmoid()` and `evaluate_score()`: map a raw numerical model output to a semantic coherence score between 0 and 1  
- `extract_text()`: extracts text from the supported file formats (.pdf, .docx, .csv, .tsv)  
- `extract_text_from_ai()`: safely extracts the textual content of an LLM response  
- `extract_captions_from_text()` and `extract_images_with_captions()`: parse captions and images from scientific documents  
- `generate_note()`: turns a coherence score into a label (high, moderate, or low coherence)  
- `generate_response()`: returns simulated (placeholder) responses with an adjustable temperature parameter

These functions keep the system modular and reusable, so they can be integrated into automated review pipelines, semantic search workflows, or visual parsing of academic articles.
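
As a rough illustration of how the scoring and caption helpers behave, the sketch below reimplements two of them in a self-contained form. `coherence_label` is a hypothetical, shortened stand-in for the note generator, using the same 0.85 and 0.6 thresholds as the notebook code; the sample text is invented for the example.

```python
import math
import re

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-x))

def coherence_label(score):
    """Hypothetical short form of the coherence note (thresholds 0.85 and 0.6)."""
    if score > 0.85:
        return "high"
    elif score > 0.6:
        return "moderate"
    return "low"

# A non-capturing group (?:...) makes re.findall return the whole caption,
# not just the "Figure"/"Fig." prefix
CAPTION_PATTERN = re.compile(r"(?:Figure|Fig\.?)\s*\d+[:.\-–]?\s*[^\n]+", re.IGNORECASE)

scores = {raw: round(sigmoid(raw), 3) for raw in (2.0, 0.5, -1.0)}
labels = {raw: coherence_label(s) for raw, s in scores.items()}
print(scores)  # prints {2.0: 0.881, 0.5: 0.622, -1.0: 0.269}
print(labels)  # prints {2.0: 'high', 0.5: 'moderate', -1.0: 'low'}

sample = "Intro text.\nFigure 1: Energy levels.\nSome prose.\nFig. 2 - Phase diagram.\n"
captions = CAPTION_PATTERN.findall(sample)
print(captions)  # prints ['Figure 1: Energy levels.', 'Fig. 2 - Phase diagram.']
```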



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# Evaluate the structure of the AI response from the LLM
def validate_ai_structure(response, expected_fields=("title", "abstract", "url")):
    if not isinstance(response, list):
        return []
    valid_items = []
    for item in response:
        if isinstance(item, dict) and all(k in item for k in expected_fields):
            valid_items.append(item)
    return valid_items

import math

# Compute semantic score of the response
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

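def evaluate_score(model_output):
    try:
        score = float(model_output[0])
        return round(sigmoid(score), 3)
    except (TypeError, ValueError, IndexError, KeyError, OverflowError):
        # Missing, non-numeric, or out-of-range output counts as zero coherence
        return 0.0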

# Extract text from selected file
def extract_text(file_name, max_chars=5000):
    """
    Extracts text from supported formats (.pdf, .docx, .tsv, .csv).
    Returns only the first max_chars characters.
    """
    extension = file_name.lower().split(".")[-1]

    try:
        if extension == "pdf":
            with pdfplumber.open(file_name) as pdf:
                text = "\n".join([p.extract_text() or "" for p in pdf.pages]).strip()

        elif extension == "docx":
            doc = Document(file_name)
            text = "\n".join([p.text for p in doc.paragraphs]).strip()

        elif extension in ["csv", "tsv"]:
            sep = "," if extension == "csv" else "\t"
            df = pd.read_csv(file_name, sep=sep)
            text = df.to_string(index=False)

        else:
            raise ValueError(f"Unsupported format: .{extension}")

        return text[:max_chars] if text else "No text extracted."

    except Exception as e:
        return f"Error during text extraction: {e}"

# Safely extract textual content from an AIMessage
def extract_text_from_ai(obj):
    """ Safely extracts textual content from an AIMessage object. """
    return getattr(obj, "content", str(obj)).strip()

# Extract figure captions from text
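def extract_captions_from_text(text):
    # Non-capturing group: with a capturing group, re.findall would return
    # only "Figure"/"Fig." instead of the full caption line
    pattern = r"(?:Figure|Fig\.?)\s*\d+[:\.\-–]?\s*[^\n]+"
    return re.findall(pattern, text, re.IGNORECASE)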

# Extract images and captions from a file
def extract_images_with_captions(file_path, output_folder="extracted_figures"):
    os.makedirs(output_folder, exist_ok=True)
    extension = file_path.lower().split(".")[-1]
    images = []
    captions = []

    try:
        if extension == "pdf":
            doc = fitz.open(file_path)
            full_text = "\n".join([p.get_text("text") for p in doc])
            extracted_captions = extract_captions_from_text(full_text)
            count = 0

            for i, page in enumerate(doc):
                for j, img in enumerate(page.get_images(full=True)):
                    base = doc.extract_image(img[0])
                    ext = base["ext"]
                    path = f"{output_folder}/page{i+1}_img{j+1}.{ext}"
                    with open(path, "wb") as f:
                        f.write(base["image"])
                    images.append(path)
                    captions.append(extracted_captions[count] if count < len(extracted_captions) else f"Figure {i+1}.{j+1}")
                    count += 1

        elif extension == "docx":
            doc = Document(file_path)
            text = "\n".join([p.text for p in doc.paragraphs])
            extracted_captions = extract_captions_from_text(text)
            count = 0

            for i, rel in enumerate(doc.part._rels):
                relation = doc.part._rels[rel]
                if "image" in relation.target_ref:
                    img_data = relation.target_part.blob
                    name = f"{output_folder}/docx_image_{i+1}.png"
                    with open(name, "wb") as f:
                        f.write(img_data)
                    images.append(name)
                    captions.append(extracted_captions[count] if count < len(extracted_captions) else f"Figure {i+1}")
                    count += 1

        else:
            print(f"Unsupported extension: .{extension}")

        print(f"{len(images)} image(s) extracted.")
        return images, captions

    except Exception as e:
        print(f"Error extracting images: {e}")
        return [], []

# Generate semantic coherence note based on score
def generate_note(score):
    if score > 0.85:
        return "High semantic coherence. The response is likely solid and relevant."
    elif score > 0.6:
        return "Moderate coherence. The response is understandable but may contain approximations."
    else:
        return "Low coherence. It may be helpful to rephrase the question or provide more context."

# Simulate LLM response generation
def generate_response(question, temperature=0.7):
    if "Rephrase" in question:
        return "How does enthalpy change during a phase transition?"
    return f"[Simulated response at temperature {temperature} for: {question}]"

### Metacognitive Reasoning and Intentional Choices

In this cell, AI agents perform explicit reasoning about their actions, generating:

- **Contextual responses** based on specific objectives  
- **Autonomous explanations** for decisions made between alternatives  
- **Intentional choice logs**, including timestamp, expected impact, and rationale

This architecture makes the AI more transparent, controllable, and ethically accountable.
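
A minimal sketch of what one entry in such an intentional choice log might look like. In the notebook the rationale is produced by the LLM; here it is passed in as a plain string, and `log_intentional_choice` is a hypothetical helper used only for illustration.

```python
import datetime
import json

intentional_log = []

def log_intentional_choice(action, goal, alternatives, rationale):
    """Record why an action was preferred over its alternatives."""
    entry = {
        "action": action,
        "alternatives": list(alternatives),
        "reason": rationale,
        "impact": f"Expected outcome for goal: {goal}",
        # Timezone-aware UTC timestamp in ISO 8601 format
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    intentional_log.append(entry)
    return entry

entry = log_intentional_choice(
    action="use an analogy",
    goal="explain entropy to beginners",
    alternatives=["formal definition", "worked example"],
    rationale="An analogy lowers the entry barrier before formulas are introduced.",
)
print(json.dumps(entry, indent=2))
```

Keeping the rejected alternatives next to the chosen action is what makes the log auditable: a reviewer can see what was considered as well as what was done.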



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# Simulates an intentional decision-making process by the AI agent.
# Analyzes the proposed action in relation to the goal, the available alternatives,
# and the context, then records the choice and its rationale in intentional_log.
def execute_intentional_choice(action, goal, alternatives, context):
    ai_explanation = choice_with_intention(action, goal, alternatives, context)
    explanation_content = getattr(ai_explanation, "content", str(ai_explanation)).strip()

    intentional_log.append({
        "action": action,
        "reason": explanation_content,
        "impact": f"Expected outcome for goal: {goal}",
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat()  # utcnow() is deprecated since Python 3.12
    })

    return explanation_content

# Generates a response with intentionality by combining reasoning, AI response, and extracted text
def generate_response_with_intention(prompt, action, goal, alternatives, context):
    reasoning = execute_intentional_choice(action, goal, alternatives, context)
    ai_response = llm.invoke(prompt)
    response_text = getattr(ai_response, "content", str(ai_response)).strip()

    return f"{response_text}\n\n*Agent's intentional explanation:*\n{reasoning}"

### Agent Metacognition – Self-Analysis and Semantic Memory

This cell enables metacognitive behavior for the AI system:

- **Self-assessment** of semantic coherence in its responses  
- **Iterative improvement** through feedback and reformulation  
- **Persistent metacognitive memory**, stored as FAISS embeddings  
- **Reflection on reasoning** and internal motivations  
- **Simulation of scientific creativity**, with comparison on epistemic novelty

Metacognition allows the system to learn from itself, improve response quality, and build a reusable reasoning foundation.
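
The self-assessment and iterative improvement steps can be pictured as a small loop: score the draft, stop if it clears a threshold, otherwise revise and score again. The sketch below assumes a toy word-overlap scorer and a trivial improver (`toy_coherence` and `toy_improve`, both hypothetical) in place of the LLM-based evaluation used in the notebook.

```python
def toy_coherence(question, response):
    """Hypothetical scorer: share of question words that reappear in the response."""
    q_words = set(question.lower().split())
    r_words = set(response.lower().split())
    return len(q_words & r_words) / len(q_words) if q_words else 0.0

def toy_improve(question, response):
    """Hypothetical improver: restates the question before the answer."""
    return f"{question} {response}"

def reflective_loop(question, first_draft, score_fn, improve_fn, threshold=0.7, max_iter=2):
    """Score the draft and revise it until it reaches `threshold` or `max_iter` runs out."""
    response = first_draft
    history = []
    for _ in range(max_iter):
        score = score_fn(question, response)
        history.append(round(score, 3))
        if score >= threshold:
            break
        response = improve_fn(question, response)
    return response, history

final, history = reflective_loop(
    "why is the sky blue",
    "Rayleigh scattering favours blue light",
    toy_coherence,
    toy_improve,
)
print(history)  # prints [0.2, 1.0]
print(final)    # prints why is the sky blue Rayleigh scattering favours blue light
```

Bounding the loop with `max_iter` keeps a weak scorer or improver from cycling indefinitely.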


In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# === generate_objective_from_input ===
# Derives a high-level operational objective from the user's input.

def generate_objective_from_input(user_input):
    """
    Generates a high-level operational objective based on the user's input.
    Useful for AGI-style planning and decision-making.
    """
    prompt = f"""
    You are an autonomous scientific agent. Based on the following input:
    "{user_input}"

    Define a clear and actionable objective that guides the agent's next steps.
    """
    try:
        response = llm.invoke(prompt.strip())
        return getattr(response, "content", str(response)).strip()
    except Exception as e:
        logging.error(f"Error generating objective: {e}")
        return "Objective generation failed."


# === metacognitive_cycle ===
# Executes an iterative cycle of evaluation and improvement of the generated response.
# Combines qualitative feedback and semantic coherence score to decide whether to reformulate.
# Useful for simulating reflective and adaptive behavior.
def metacognitive_cycle(question, level, max_iter=2):
    response = llm.invoke(question)
    response_text = extract_text_from_ai(response)

    for i in range(max_iter):
        feedback = auto_feedback_response(question, response_text, level)
        score = evaluate_coherence(question, response_text)

        print(f"\nIteration {i+1} – Coherence: {score:.3f}")
        print("Feedback:", extract_text_from_ai(feedback))

        if score < 0.7:
            response_text = extract_text_from_ai(improve_response(question, response_text, level))
        else:
            break

    return response_text

# Evaluate response variants with self-assessment.
# Picks the most coherent variant and reformulates the question if even the best one is weak.
# max_reformulations bounds the recursion so a persistently low score cannot loop forever.
def evaluate_responses_with_ai(question, generate_response_fn, n_variants=3,
                               reformulation_threshold=0.6, max_reformulations=2):
    temperature_values = [0.7, 0.4, 0.9][:n_variants]
    responses = [generate_response_fn(question, temperature=t) for t in temperature_values]

    scores = [evaluate_coherence(question, r) for r in responses]
    idx = scores.index(max(scores))
    confidence = scores[idx]
    best_response = responses[idx]

    if confidence < reformulation_threshold and max_reformulations > 0:
        new_question = reformulate_question(question)
        return evaluate_responses_with_ai(
            new_question, generate_response_fn, n_variants,
            reformulation_threshold, max_reformulations - 1
        )

    return {
        "response": best_response,
        "confidence": round(confidence, 3),
        "note": generate_note(confidence)
    }

def evaluate_responses_with_ai_simple(question, response, level="basic"):
    """
    Evaluates the quality of the generated response relative to the question.
    Returns a dictionary with:
    - semantic coherence score
    - reason for weakness
    - suggested reformulation
    - reflection on reasoning
    - flag for auto-improvement
    """

    evaluation_prompt = f"""
    User question: "{question}"
    Generated response: "{response}"
    Required level: {level}

    Evaluate the response in 5 points:
    1. Semantic coherence (0–1)
    2. Conceptual completeness
    3. Argumentative structure
    4. Adequacy to the required level
    5. Ability to stimulate new questions

    If the response is weak:
    - Explain the reason
    - Suggest a reformulation
    - Reflect on how the system reasoned

    Return everything in structured format.
    """

    try:
        ai_evaluation = llm.invoke(evaluation_prompt)
        raw_output = getattr(ai_evaluation, "content", str(ai_evaluation))
    except Exception as e:
        print("Evaluation error:", e)
        return {
            "semantic_score": 0.0,
            "weakness_reason": "System error",
            "new_formulation": None,
            "self_reflection": None,
            "requires_improvement": True
        }

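    # Simplified parsing functions (can be enhanced with regex or LLM)
    def extract_score(text):
        # Accept 0, 1, and decimals such as 0.85 or 1.0 (not only values starting with "0.")
        match = re.search(r"Semantic coherence\s*[:\-]?\s*([01](?:\.\d+)?)", text, re.IGNORECASE)
        return float(match.group(1)) if match else 0.0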

    def extract_reason(text):
        match = re.search(r"Reason\s*[:\-]?\s*(.+)", text)
        return match.group(1).strip() if match else "Reason not found."

    def extract_reformulation(text):
        match = re.search(r"Reformulation\s*[:\-]?\s*(.+)", text)
        return match.group(1).strip() if match else None

    def extract_reflection(text):
        match = re.search(r"Reflection\s*[:\-]?\s*(.+)", text)
        return match.group(1).strip() if match else None

    # Actual parsing
    score = extract_score(raw_output)
    reason = extract_reason(raw_output)
    reformulation = extract_reformulation(raw_output)
    reflection = extract_reflection(raw_output)

    return {
        "response": response,
        "semantic_score": score,
        "weakness_reason": reason,
        "new_formulation": reformulation,
        "self_reflection": reflection,
        "requires_improvement": score < 0.7
    }

def generate_metacognitive_content(question, response, evaluation):
    return f"""
    [Question] {question}
    [Response] {response}
    [Coherence Score] {evaluation['semantic_score']}
    [Weakness Reason] {evaluation['weakness_reason']}
    [Suggested Reformulation] {evaluation['new_formulation']}
    [Cognitive Reflection] {evaluation['self_reflection']}
    [Needs Improvement] {evaluation['requires_improvement']}
    """.strip()

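def add_metacognitive_memory(question, response):
    # Cognitive evaluation of the response. The "_simple" evaluator takes a finished
    # response and returns the keys that generate_metacognitive_content expects.
    evaluation = evaluate_responses_with_ai_simple(question, response)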

    # Generate textual content with all metacognitive data
    textual_content = generate_metacognitive_content(question, response, evaluation)

    # Generate semantic embedding from the full content
    embedding = embedding_model.encode([textual_content])

    # Add to FAISS index
    index.add(np.array(embedding, dtype=np.float32))

    # Save updated index
    with open(INDEX_FILE, "wb") as f:
        pickle.dump(index, f)

    print("Metacognitive memory updated!")

def search_similar_reasoning(query, top_k=5):
    """
    Searches the FAISS metacognitive memory for reasoning most similar to the input query.
    Returns a list of the most relevant textual contents.
    """
    # Encode the query
    query_vector = embedding_model.encode([query])

    # Search for top-K nearest
    distances, indices = index.search(np.array(query_vector, dtype=np.float32), top_k)

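    # Load the diary once instead of reopening it for every hit
    try:
        with open("meta_diary.json", "r", encoding="utf-8") as f:
            archive = json.load(f)
    except Exception as e:
        print(f"Memory retrieval error: {e}")
        return []

    results = []
    for idx in indices[0]:
        if idx < 0:  # FAISS pads with -1 when fewer than top_k vectors exist
            continue
        content = archive.get(str(idx))
        if content:
            results.append(content)

    return results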

def add_metacognition_to_response(response, evaluation):
    reflection = evaluation.get("self_reflection", "")
    note = evaluation.get("weakness_reason", "")
    return f"{response.strip()}\n\n*Metacognitive note:* {note}\n*Agent's reflection:* {reflection}"

def auto_feedback(question, response, level):
    return f"""Analyze the response in relation to the question: "{question}".
Evaluate the content according to the level '{level}' and suggest improvements.
"""

# === Full flow example ===
async def scientific_creativity_flow(concept, subject, language="en", level="advanced"):
    creative_hypothesis = simulate_scientific_creativity(concept, subject, language=language, level=level)
    articles, _ = await search_multi_database(concept)  # Retrieve existing scientific sources
    novelty_evaluation = evaluate_hypothesis_novelty(creative_hypothesis, articles)

    return {
        "hypothesis": creative_hypothesis,
        "novelty": novelty_evaluation
    }

### Multilingual Module – Automatic Translation of Scientific Documents

This cell enables automatic translation from PDF and DOCX files into multiple languages, while preserving the scientific integrity of the content:

- **Automatic language detection** using `langdetect`  
- **Neural translation** via `Helsinki-NLP` models (`transformers` from HuggingFace)  
- **Support for PDF, DOCX, CSV, TSV**, with text extraction and saving of the translated file  
- **Intelligent caching** to avoid duplicate translations  
- Supported languages: `en`, `fr`, `de`, `es`, `zh`, `ja`, `ar`, `it`
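
The caching idea in the list above can be sketched as follows. This is a minimal illustration, assuming a cache keyed by `(text, source_lang, target_lang)`; `fake_translate` and `cached_translate` are hypothetical names, and the stand-in translator replaces the real Helsinki-NLP pipeline so the example runs without downloading a model.

```python
# Hypothetical stand-in for the real translation pipeline; it only counts calls
calls = {"count": 0}

def fake_translate(text, source_lang, target_lang):
    calls["count"] += 1
    return f"[{source_lang}->{target_lang}] {text}"

translation_cache = {}

def cached_translate(text, source_lang, target_lang, translate=fake_translate):
    """Return a cached translation when the same request was seen before."""
    key = (text, source_lang, target_lang)
    if key not in translation_cache:
        translation_cache[key] = translate(text, source_lang, target_lang)
    return translation_cache[key]

first = cached_translate("ciao", "it", "en")
second = cached_translate("ciao", "it", "en")   # served from the cache
third = cached_translate("ciao", "it", "fr")    # different target, new call
print(first, second, third, calls["count"])
```

Including both language codes in the key keeps a translation into French from being returned for a later request into English.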

Ideal for converting academic content and technical explanations into accessible language for international users.
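
Translation models accept limited input lengths, so long documents are translated block by block (`translate_long_text` in the code cell of this section cuts the text every `max_chars` characters). The sketch below shows one possible sentence-aware alternative that avoids cutting words in half; `split_into_blocks` is a hypothetical helper, and its simple punctuation-based splitting is an assumption that will not suit every language.

```python
import re

def split_into_blocks(text, max_chars=400):
    """Group whole sentences into blocks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact in its own block.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    blocks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            blocks.append(current)      # close the block that is already full
            current = sentence
        else:
            current = candidate
    if current:
        blocks.append(current)
    return blocks

sample = "Cells divide by mitosis. DNA is copied first. Then the nucleus splits."
blocks = split_into_blocks(sample, max_chars=50)
for block in blocks:
    print(repr(block))
```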



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# === Text Translation ===

# Caching dictionary for previously translated texts
translation_cache = {}


def detect_language(text):
    """Detects the language of the loaded text."""
    try:
        return detect(text)
    except Exception as e:
        print(f"Language detection error: {e}")
        return "unknown"

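def translate_text(text, source_lang, target_lang):
    """ Translates the text, reusing cached results, with debug output to verify correctness. """
    # Return the cached translation when the same request was already served
    cache_key = (text, source_lang, target_lang)
    if cache_key in translation_cache:
        return translation_cache[cache_key]

    translation_model = f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"

    print(f"Using translation model: {translation_model}")

    translator = pipeline("translation", model=translation_model)

    translation = translator(text)[0]['translation_text']
    print(f"Original text: {text}")
    print(f"Translated text: {translation}")

    translation_cache[cache_key] = translation
    return translation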

def extract_text_pdf(file_name):
    """ Extracts text from a PDF file. """
    text = ""
    with pdfplumber.open(file_name) as pdf:
        for page in pdf.pages:
            # extract_text() returns None for pages without a text layer
            text += (page.extract_text() or "") + "\n"
    return text.strip()

def extract_text_docx(file_name):
    """ Extracts text from a DOCX file. """
    doc = Document(file_name)
    text = "\n".join([paragraph.text for paragraph in doc.paragraphs])
    return text.strip()

def save_docx(text, output_file_name):
    """ Saves translated text into a DOCX document. """
    doc = Document()
    doc.add_paragraph(text)
    doc.save(output_file_name)

def extract_text_csv(file_name):
    """ Extracts textual content from a CSV file. """
    df = pd.read_csv(file_name)
    text = df.astype(str).apply(lambda x: ' '.join(x), axis=1).str.cat(sep='\n')
    return text.strip()

def extract_text_tsv(file_name):
    """ Extracts textual content from a TSV file. """
    df = pd.read_csv(file_name, sep='\t')
    text = df.astype(str).apply(lambda x: ' '.join(x), axis=1).str.cat(sep='\n')
    return text.strip()

def handle_file(file_name):
    """ Loads the file, detects its language, and lets the user choose a target language for translation. """
    extension = file_name.split('.')[-1].lower()

    if extension == "pdf":
        text = extract_text_pdf(file_name)
    elif extension == "docx":
        text = extract_text_docx(file_name)
    elif extension == "csv":
        text = extract_text_csv(file_name)
    elif extension == "tsv":
        text = extract_text_tsv(file_name)
    else:
        return "Unsupported format! Use PDF, DOCX, CSV, or TSV."

    original_language = detect_language(text)
    print(f"The file was detected in **{original_language}**.")

    # List of available languages
    available_languages = ["en", "fr", "de", "es", "zh", "ja", "ar", "it"]

    # Ask the user for the target language
    print(f"Available languages for translation: {', '.join(available_languages)}")
    target_language = input("Which language do you want the explanation in? (e.g., 'en' for English, 'fr' for French): ").strip()

    if target_language not in available_languages:
        return "Error: Unsupported language!"
    print(f"The explanation will be translated into {target_language}.")

    # Translate only after the target language has been validated
    translated_text = translate_text(text, original_language, target_language)

    # Save the translated file: DOCX keeps its format, every other
    # input type is written as plain UTF-8 text
    if extension == "docx":
        translated_file_name = f"translated_{target_language}_{file_name}"
        save_docx(translated_text, translated_file_name)
    else:
        translated_file_name = f"translated_{target_language}_{file_name}.txt"
        with open(translated_file_name, "w", encoding="utf-8") as f:
            f.write(translated_text)

    return f"Translation completed! Download the file: {translated_file_name}"

# Initialize the dictionary to store journals
journal_store = {}

def save_multilingual_journal(journal_text, journal_id, target_language):
    source_language = detect_language(journal_text)

    if source_language != target_language:
        translated_text = translate_long_text(journal_text, source_lang=source_language, target_lang=target_language)
    else:
        translated_text = journal_text

    journal_store[journal_id] = {
        "original": journal_text,
        target_language: translated_text
    }

    embedding = safe_encode(translated_text)
    index.add(np.array(embedding, dtype=np.float32))



def translate_long_text(text, source_lang="it", target_lang="en", max_chars=400):
    translation_model = f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"
    translator = pipeline("translation", model=translation_model)

    blocks = [text[i:i+max_chars] for i in range(0, len(text), max_chars)]
    translated = []

    for block in blocks:
        try:
            output = translator(block)[0]['translation_text']
            translated.append(output)
        except Exception as e:
            print(f"Error translating block: {e}")
            translated.append("[Translation error]")

    return "\n".join(translated)

def search_similar_journals(query, target_language, top_k=3):
    query_language = detect_language(query)

    if query_language != target_language:
        translated_query = translate_long_text(query, source_lang=query_language, target_lang=target_language)
    else:
        translated_query = query

    query_emb = safe_encode(translated_query)
    query_emb = np.array(query_emb, dtype=np.float32)

    if hasattr(index, "is_trained") and not index.is_trained:
        print("FAISS index is not trained.")
        return []

    D, I = index.search(query_emb, top_k)
    results = []
    for i in I[0]:
        journal = journal_store.get(i, {})
        results.append(journal.get(target_language, ""))
    return results

# === Valid Input Function ===
def get_valid_input(message, valid_options=None):
    while True:
        value = input(message).strip().lower()
        if not value:
            print("Error! Please enter a valid value.")
        elif valid_options and value not in valid_options:
            print(f"Error! You must choose from: {', '.join(valid_options)}")
        else:
            return value

### Academic Ranking and Scientific Originality

This cell enables:

- Calculation of the Impact Score using a RandomForest model  
- Validation of scientific hypotheses to assess originality  
- Automated checks on citations and methodology  
- Intelligent synthesis of scientific sources

The system analyzes papers and ideas using epistemic criteria, offering both quantitative and qualitative insights into impact and novelty.
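
The originality check compares a hypothesis with existing abstracts through embedding similarity. The sketch below reproduces that logic on toy vectors using only the standard library. The thresholds 0.4 and 0.7 are the ones used by `evaluate_hypothesis_novelty` in this section; `cosine_similarity`, `assess_novelty`, and the three-dimensional "embeddings" are hypothetical and serve illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assess_novelty(hypothesis_vec, article_vecs):
    """Map the mean similarity to a qualitative originality label."""
    if not article_vecs:
        return 0.0, "No articles to compare against."
    scores = [cosine_similarity(hypothesis_vec, v) for v in article_vecs]
    average = round(sum(scores) / len(scores), 3)
    if average < 0.4:
        label = "High originality"
    elif average < 0.7:
        label = "Moderate originality"
    else:
        label = "Low originality"
    return average, label

# Toy vectors, invented for illustration only
hypothesis = [1.0, 0.0, 0.0]
articles = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
result = assess_novelty(hypothesis, articles)
print(result)
```

A low mean similarity is read as high originality, so the label should be treated as a heuristic: it depends entirely on which articles were retrieved.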



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

from sklearn.ensemble import RandomForestRegressor

# Sample data for ranking
data = np.array([
    [120, 45, 1, 2023],  # Citations, h-index, peer review, year
    [50, 30, 1, 2020],
    [10, 15, 0, 2018]
])

labels = [95, 70, 30]  # Academic impact score

# Model training
ranking_model = RandomForestRegressor(n_estimators=100)
ranking_model.fit(data, labels)

# **Ranking prediction**
def calculate_impact_score(citations, h_index, peer_review, publication_year):
    paper_data = np.array([[citations, h_index, peer_review, publication_year]])
    score = ranking_model.predict(paper_data)
    return max(0, score[0])  # Ensure non-negative

# Usage example
impact_score = calculate_impact_score(80, 40, 1, 2024)
print(f"Estimated score: {impact_score}")

# Ranking prediction
new_paper = np.array([[80, 40, 1, 2024]])
score = ranking_model.predict(new_paper)
print(f"Estimated score: {score[0]}")

# === Scientific originality evaluation ===
def evaluate_hypothesis_novelty(hypothesis, existing_articles, threshold=0.7):
    """
    Compares the hypothesis with existing articles using semantic embeddings.
    Returns:
    - average similarity score
    - similar articles
    - qualitative assessment of originality
    """
    try:
        # Keep only articles that have an abstract so that similarity
        # indices line up with the titles reported below
        articles_with_abstract = [a for a in existing_articles if "abstract" in a]

        emb_hypothesis = embedding_model.encode([hypothesis])
        emb_articles = embedding_model.encode([a["abstract"] for a in articles_with_abstract])

        similarity = np.dot(emb_hypothesis, emb_articles.T) / (
            np.linalg.norm(emb_hypothesis) * np.linalg.norm(emb_articles, axis=1)
        )
        average = round(float(np.mean(similarity)), 3)

        similar_articles = [
            articles_with_abstract[i].get("title", "Untitled")
            for i, score in enumerate(similarity[0]) if score > threshold
        ]

        if average < 0.4:
            assessment = "High originality: hypothesis is rarely present in the literature."
        elif average < 0.7:
            assessment = "Moderate originality: related concepts exist."
        else:
            assessment = "Low originality: hypothesis is already widely discussed."

        return {
            "novelty_score": average,
            "similar_articles": similar_articles,
            "assessment": assessment
        }

    except Exception as e:
        logging.error(f"[evaluate_novelty] Error during originality evaluation: {e}")
        return {
            "novelty_score": 0.0,
            "similar_articles": [],
            "assessment": "Error during originality evaluation."
        }

# Automated paper review with AI
async def review_paper(paper_text):
    """ Checks the methodology and citation quality of a paper. """
    methodology = await verify_methodology(paper_text)
    citations = await verify_citations(paper_text)
    return {"methodology": methodology, "citations": citations}

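async def validate_hypothesis(hypothesis):
    # search_multi_database returns a tuple; the article list comes first
    sources, _ = await search_multi_database(hypothesis)
    # Assumed metadata keys; calculate_impact_score needs four separate values
    keys = ("citations", "h_index", "peer_review", "year")
    scores = [calculate_impact_score(*(a.get(k, 0) for k in keys))
              for a in sources if isinstance(a, dict)]
    score = sum(scores) / len(scores) if scores else 0.0
    summary = summarize_evidence(sources)
    return score, summary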

def summarize_evidence(sources):
    return "\n".join([
        f"- {a['title'][:80]}…" for a in sources if isinstance(a, dict) and 'title' in a
    ]) if sources else "No evidence found."


Estimated score: 83.95
Estimated score: 83.65


### Hypothesis Validation and Scientific Reporting

This cell enables the following:

- Evaluation of the **novelty** of a scientific hypothesis by comparing it with existing articles (via semantic embeddings)  
- Generation of an **Impact Score** based on citations, h-index, peer review status, and publication year  
- Extraction and synthesis of evidence from multiple databases (arXiv, PubMed, Zenodo, OpenAlex)  
- Creation of a **Markdown report** including:
  - Title and description of the analysis  
  - List of articles with abstracts and links  
  - Related images and captions (if available)
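
The report structure listed above can be sketched as a pure function that returns the Markdown text instead of writing a file, which makes it easy to inspect. `build_markdown_report` is a hypothetical helper, the `title`, `abstract`, and `url` keys follow the article dictionaries shown in this notebook's output, and the example URL is a placeholder.

```python
def build_markdown_report(title, description, articles, max_articles=5):
    """Return the report as a Markdown string instead of writing a file."""
    lines = [f"# {title}", "", description, "", "## Scientific Articles", ""]
    if not articles:
        lines.append("No articles available.")
    for i, art in enumerate(articles[:max_articles], start=1):
        art_title = art.get("title", f"Article {i}")
        abstract = art.get("abstract", "Abstract not available.")
        url = art.get("url", "#")
        lines += [f"**{i}. {art_title}**", abstract, "",
                  f"[Link to article]({url})", ""]
    return "\n".join(lines)

report = build_markdown_report(
    "Automatic Report",
    "Automatically generated scientific summary.",
    [{"title": "Local quantum physics", "url": "https://example.org/paper"}],
)
print(report)
```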



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# Generate an automatic report
def generate_markdown_report(
    title="Automatic Report",
    description="Automatically generated scientific summary.",
    articles=None,
    images=None,
    captions=None,
    filename="report.md"
):
    """
    Generates a Markdown file with:
    - Title and description
    - Scientific articles with abstract and link
    - Images and associated captions (if available)

    All arguments are optional. A coherent structure is created regardless.
    """

    # Safe fallback for each parameter
    articles = articles if isinstance(articles, list) else []
    images = images if isinstance(images, list) else []
    captions = captions if isinstance(captions, list) else []

    try:
        with open(filename, "w", encoding="utf-8") as f:
            f.write(f"# {title}\n\n")
            f.write(f"{description}\n\n")

            f.write("## Scientific Articles\n\n")
            if articles:
                for i, art in enumerate(articles[:5]):
                    # Articles returned by the search use the "title" key
                    article_title = art.get("title", f"Article {i+1}")
                    abstract = art.get("abstract", "Abstract not available.")
                    url = art.get("url", "#")
                    f.write(f"**{i+1}. {article_title}**\n")
                    f.write(f"{abstract}\n\n[Link to article]({url})\n\n")
            else:
                f.write("No articles available.\n\n")

            if images:
                f.write("## Figures\n\n")
                for i, img_path in enumerate(images):
                    caption = captions[i] if i < len(captions) else f"Figure {i+1}"
                    f.write(f"![{caption}]({img_path})\n\n*{caption}*\n\n")

        print(f"Markdown report successfully generated: {filename}")
    except Exception as e:
        print(f"Error during report generation: {e}")

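# === Markdown report generation (strict variant) ===
# Note: this definition reuses the name above and therefore replaces it
# when the cell runs; it validates its input but does not write figures.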
def generate_markdown_report(title, description, articles, filename="report.md"):
    if not isinstance(articles, list):
        logging.error(f"[generate_markdown_report] 'articles' is not a valid list: {type(articles)}")
        print("Error: unable to generate report. Invalid article format.")
        return

    with open(filename, "w", encoding="utf-8") as f:
        f.write(f"# {title}\n\n{description}\n\n## Scientific Articles\n\n")
        for i, art in enumerate(articles[:5]):
            if isinstance(art, dict) and all(k in art for k in ["title", "abstract", "url"]):
                f.write(f"**{i+1}. {art['title']}**\n{art['abstract']} ([Link]({art['url']}))\n\n")
            else:
                f.write(f"**{i+1}. Article data not available or incomplete.**\n\n")
    print(f"Markdown report generated: {filename}")

### Impact Score & Semantic Evaluation

This cell calculates the coherence and reliability of generated responses using:

- A CrossEncoder model (DeBERTa) for semantic analysis  
- An Impact Score formula based on academic data  
- Relevance verification between the question and the generated content  
- Automatic restructuring if the score is low
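
An NLI cross-encoder returns one logit per class rather than a single score, so the logits need to be turned into probabilities before one of them can be used as a coherence value. The sketch below shows that step with the standard library. The label order is an assumption taken from the model card of `cross-encoder/nli-deberta-base`, and the logits are invented.

```python
import math

# Assumed label order for the NLI cross-encoder (check the model card)
LABELS = ("contradiction", "entailment", "neutral")

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entailment_probability(logits):
    """Probability mass assigned to the 'entailment' class."""
    return softmax(logits)[LABELS.index("entailment")]

# Invented logits for one (question, answer) pair
example_logits = [-2.0, 3.0, 0.5]
probs = softmax(example_logits)
print({label: round(p, 3) for label, p in zip(LABELS, probs)})
```

Applying a sigmoid to the first logit alone would score the contradiction class under this label order, which is why the class index matters.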

The system adopts an iterative logic that mirrors scientific peer-review criteria, enhancing the quality and relevance of the generated responses.
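
The iterative logic can be sketched as a bounded retry loop. This is one possible shape, not the notebook's implementation: `improve_until_coherent`, `toy_score`, and `toy_generate` are hypothetical, and the scoring and generation functions are passed in as parameters so the loop runs without an LLM.

```python
def improve_until_coherent(question, answer, score_fn, generate_fn,
                           threshold=0.7, max_iterations=2):
    """Regenerate the answer while its score stays below the threshold.

    Returns the best (answer, score) pair seen, so a failed retry never
    replaces a better earlier answer.
    """
    best_answer, best_score = answer, score_fn(question, answer)
    for _ in range(max_iterations):
        if best_score >= threshold:
            break
        candidate = generate_fn(question)
        candidate_score = score_fn(question, candidate)
        if candidate_score > best_score:
            best_answer, best_score = candidate, candidate_score
    return best_answer, best_score

# Hypothetical stand-ins: the score simply grows with the answer length
def toy_score(question, answer):
    return min(len(answer) / 40, 1.0)

drafts = iter(["A slightly longer draft answer.",
               "A much longer, fully developed draft answer."])

def toy_generate(question):
    return next(drafts)

result = improve_until_coherent("What is entropy?", "Short.", toy_score, toy_generate)
print(result)
```

Capping the number of iterations bounds the cost of regeneration even when no candidate ever reaches the threshold.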


In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

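# Load the model only once
cross_encoder = CrossEncoder("cross-encoder/nli-deberta-base")

# Per the model card, this NLI model returns three logits per pair in the
# order (contradiction, entailment, neutral).
ENTAILMENT_INDEX = 1

def evaluate_coherence(question, answer):
    """Returns the entailment probability for the (question, answer) pair."""
    try:
        logits = np.asarray(cross_encoder.predict([(question, answer)])[0], dtype=float)
        if logits.ndim == 0:  # Single-logit model: plain sigmoid
            return round(1 / (1 + math.exp(-float(logits))), 3)
        exp = np.exp(logits - logits.max())  # Numerically stable softmax
        return round(float(exp[ENTAILMENT_INDEX] / exp.sum()), 3)
    except Exception:
        return 0.0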

# === Scientific reliability score calculation ===
def calculate_impact_score(citations, h_index, peer_review, publication_year):
    score = (citations * 0.4) + (h_index * 0.3) + (peer_review * 0.2) - (2025 - publication_year) * 0.1
    return max(0, score)  # Ensure non-negative

def check_topic_relevance(user_question, extracted_text, threshold=0.7):
    """Checks whether the topic of the question is consistent with the uploaded file content."""
    emb_question = embedding_model.encode([user_question])
    emb_text = embedding_model.encode([extracted_text])

    # np.dot returns a 1x1 array here; extract the scalar before rounding
    similarity = float(
        np.dot(emb_question, emb_text.T)[0, 0]
        / (np.linalg.norm(emb_question) * np.linalg.norm(emb_text))
    )
    return round(similarity, 3), similarity >= threshold

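def calculate_response_score(question, answer):
    # The NLI cross-encoder returns one logit per class, so float(score[0])
    # fails on its raw output; reuse evaluate_coherence for a scalar score.
    return evaluate_coherence(question, answer)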

def regenerate_if_low_score(question, answer, level, threshold=0.7, iterations=2):
    evaluation = evaluate_responses_with_ai(question, answer, level)
    if evaluation["semantic_score"] < threshold:
        new_question = reformulate_question(question)
        for i in range(iterations):
            new_answer = generate_response(new_question, temperature=0.7)
            new_evaluation = evaluate_responses_with_ai(new_question, new_answer, level)
            if new_evaluation["semantic_score"] >= threshold:
                return new_answer
    return answer

def select_best_version(question, answers):
    scored = [(r, calculate_response_score(question, r)) for r in answers]
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[0]  # (answer, score)

### Ethical Module – Content Evaluation and Autonomy Control

This cell activates an ethical analysis system that examines AI-generated responses for potential risks:

- **Autonomy control and authorization level**  
  The function `check_agent_autonomy()` flags sensitive content if the user's access level is low.

- **Ethical and linguistic risk assessment**  
  The function `assess_ethical_risk()` detects:
  - Implicit bias or non-inclusive language  
  - Risk of misinformation (lack of sources)  
  - Critical topics (vaccines, gender, politics)  
  - Manipulation or lack of neutrality

If a risk is detected, the system suggests a reformulation to ensure inclusive, accurate, and safe language.
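
A keyword-based check of this kind can be sketched as follows. The topic list mirrors the one used in this module, but `flag_ethical_risks` is a hypothetical helper and the word-boundary matching is an assumption added here so that, for example, "engender" does not trigger the "gender" topic.

```python
import re

CRITICAL_TOPICS = ("vaccine", "gender", "politics")
GENERALIZATIONS = (r"\ball men\b", r"\bwomen are\b")

def flag_ethical_risks(text):
    """Return simple keyword-based risk flags for a piece of text."""
    lowered = text.lower()
    flags = {
        # Match whole words only, with an optional plural "s"
        "critical_topic": any(
            re.search(rf"\b{re.escape(t)}s?\b", lowered) for t in CRITICAL_TOPICS
        ),
        "linguistic_bias": any(re.search(p, lowered) for p in GENERALIZATIONS),
    }
    flags["revision_suggestion"] = (
        "Rephrase with attention to inclusive language."
        if flags["linguistic_bias"] else None
    )
    return flags

flagged = flag_ethical_risks("All men prefer engineering, according to a vaccine study.")
neutral = flag_ethical_risks("The engine converts heat into work.")
print(flagged)
print(neutral)
```

Keyword matching is only a first filter: it cannot judge context, so flagged text still needs human or model review.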



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# This module analyzes responses to detect bias, misinformation,
# non-neutral language, or potentially harmful content.

# Checks the agent's degree of autonomy.
# Used to monitor whether the system is acting too independently or out of context.
def check_agent_autonomy(question: str, authorization_level: int):
    if "sub-goal" in question.lower() and authorization_level < 2:
        logging.warning("Sensitive content detected, but generation will not be blocked.")
        return "Ethics: potentially sensitive content"
    return "Ethics: normal content"

# Flags problematic content and suggests revisions.
def assess_ethical_risk(content, domain="scientific"):
    """
    Evaluates whether the AI response contains implicit ethical risks.
    Analyzes textual content for potential bias, manipulation, or inappropriateness.
    """
    risk = {
        "potential_manipulation": False,
        "misinformation_risk": False,
        "linguistic_bias": False,
        "critical_topic": False,
        "neutral_language": True,
        "environmental_risk": "Moderate",
        "revision_suggestion": None
    }

    text_lower = content.lower()
    if "vaccine" in text_lower or "gender" in text_lower or "politics" in text_lower:
        risk["critical_topic"] = True

    if "all men" in text_lower or "women are" in text_lower:
        risk["linguistic_bias"] = True
        risk["neutral_language"] = False
        risk["revision_suggestion"] = "Rephrase with attention to inclusive language."

    if "according to experts without citing sources" in text_lower:
        risk["misinformation_risk"] = True
        risk["revision_suggestion"] = "Add reliable sources or remove absolute claims."

    return risk

# Example prompt
prompt = "Discuss the potential risks of generative artificial intelligence in the context of medicine."

# Model invocation
output_ai = llm.invoke(prompt).content.strip()

# Ethical evaluation of the response
ethical_check = assess_ethical_risk(output_ai)

if ethical_check["revision_suggestion"]:
    print(f"Ethics: {ethical_check['revision_suggestion']}")


### Interactive Scientific Chart Generator

This cell enables the visualization of data and mathematical models extracted from problems described in natural language:

- **Automatic extraction of numerical values** from text  
- **Semantic analysis of the problem** to determine the model type:
  - exponential growth, motion, oscillation, Gaussian distribution, etc.  
- **Interactive chart generation** using `Plotly`, viewable in real time  
- **Image export** (`grafico_output.png`) for educational or documentation purposes

The system translates scientific descriptions into visual representations, simplifying intuition and reasoning around complex concepts.
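
The path from a text description to plottable data can be sketched without Plotly. The example below is a simplified illustration: `pick_model` and `sample_curve` are hypothetical helpers, only two model types are covered, and the keyword patterns are a subset of those used by `determine_chart_type`.

```python
import math
import re

MODEL_KEYWORDS = [
    ("exponential_growth", r"growth|decay|population"),
    ("sinusoidal", r"oscillation|frequency|wave"),
]

def pick_model(text):
    """Return the first model whose keywords appear in the text."""
    lowered = text.lower()
    for name, pattern in MODEL_KEYWORDS:
        if re.search(pattern, lowered):
            return name
    return "generic"

def sample_curve(model, start, end, points=5):
    """Sample the chosen model on an evenly spaced grid."""
    step = (end - start) / (points - 1)
    xs = [start + i * step for i in range(points)]
    if model == "exponential_growth":
        scale = max(abs(end), 1e-9)          # avoid division by zero
        ys = [math.exp(x / scale) for x in xs]
    else:
        ys = [math.sin(x) for x in xs]       # sinusoidal and generic fallback
    return xs, ys

model = pick_model("Population growth between 0 and 10 years")
xs, ys = sample_curve(model, 0, 10)
print(model, [round(y, 2) for y in ys])
```

The resulting `xs` and `ys` lists could then be passed to a `go.Scatter` trace, as the chart function in this section does with NumPy arrays.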


In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# The system can analyze text and generate interactive visualizations
# (e.g., bar charts, line plots, scatter plots) using Plotly.

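# === Numeric extraction (range-aware variant) ===
# Note: the simpler extract_numeric_values defined further down reuses this
# name and is the version that generate_interactive_chart actually calls.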
def extract_numeric_values(text):
    """ Extracts numeric ranges from the problem text. """
    pattern = r"(\d+)\s*-\s*(\d+)|(\d+\.\d+|\d+)\s*(K|Pa|m/s)?"
    matches = re.findall(pattern, text)

    values = []
    for match in matches:
        if match[0] and match[1]:  # Range (300 - 600)
            values.append((int(match[0]), int(match[1])))
        elif match[2]:  # Single number with optional unit
            values.append(float(match[2]))

    return values if values else [1, 10]  # Default if no numbers found

# Determines the most suitable chart type based on content
def determine_chart_type(text):
    text_lower = text.lower()
    if re.search(r"(growth|decay|population)", text_lower):
        return "exponential_growth"
    elif re.search(r"(oscillation|frequency|wave)", text_lower):
        return "sinusoidal"
    elif re.search(r"(temperature|pressure)", text_lower):
        return "temperature_pressure"
    elif re.search(r"(speed|time|acceleration)", text_lower):
        return "motion"
    elif "linear" in text_lower:
        return "linear"
    elif "logarithmic" in text_lower:
        return "logarithmic"
    elif "gaussian" in text_lower or "normal distribution" in text_lower:
        return "gaussian"
    else:
        return "generic"

# Extracts numeric values from text for visualization
def extract_numeric_values(text):
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if len(numbers) >= 2:
        return numbers[:2]
    elif len(numbers) == 1:
        return [numbers[0], numbers[0] + 10]
    else:
        return [1, 10]

# Generates and saves the interactive chart
# The chart is displayed in the notebook and also saved as a PNG image.
def generate_interactive_chart(problem):
    chart_type = determine_chart_type(problem)
    start, end = extract_numeric_values(problem)
    x = np.linspace(start, end, 100)
    fig = go.Figure()

    if chart_type == "exponential_growth":
        y = np.exp(x / max(x))
        fig.add_trace(go.Scatter(x=x, y=y, mode="lines", name="Exponential Growth"))
    elif chart_type == "sinusoidal":
        y = np.sin(x)
        fig.add_trace(go.Scatter(x=x, y=y, mode="lines", name="Sinusoidal Wave"))
    elif chart_type == "motion":
        y = x ** 2
        fig.add_trace(go.Scatter(x=x, y=y, mode="lines", name="Speed vs Time"))
    elif chart_type == "linear":
        y = x
        fig.add_trace(go.Scatter(x=x, y=y, mode="lines", name="Linear Trend"))
    elif chart_type == "logarithmic":
        x_log = np.where(x <= 0, 1e-3, x)
        y = np.log(x_log)
        fig.add_trace(go.Scatter(x=x, y=y, mode="lines", name="Logarithmic"))
    elif chart_type == "gaussian":
        mu, sigma = np.mean(x), np.std(x)
        y = np.exp(-((x - mu)**2) / (2 * sigma**2))
        fig.add_trace(go.Scatter(x=x, y=y, mode="lines", name="Gaussian"))
    else:
        y = np.sin(x)
        fig.add_trace(go.Scatter(x=x, y=y, mode="lines", name="Generic"))

    caption = f"Visualization of the '{chart_type}' model from {start} to {end} for the problem: \"{problem}\""
    fig.update_layout(
        title=caption,
        xaxis_title="X Axis",
        yaxis_title="Y Axis",
        template="plotly_white"
    )
    fig.show()

    fig.write_image("grafico_output.png", format="png", width=800, height=500)
    print("Image saved as 'grafico_output.png'")
    return fig, caption

# === Run example chart ===
example_problem = "growth"
fig, caption = generate_interactive_chart(example_problem)

Image saved as 'grafico_output.png'


### User Module – Personalized Interaction

This cell manages intelligent interaction with the user:

- Analyzes and classifies the scientific question  
- Reformulates the problem in technical terms  
- Requests subject, difficulty level, and preferred language  
- Offers an interactive chart if requested  
- Enables automatic translations into: en, fr, de, es, zh, ja, ar, it
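
The validation of level, language, and chart preference can be separated from the `input()` calls so that it is testable on its own. The sketch below assumes the option lists used in this notebook; `normalize_choice` and `collect_preferences` are hypothetical helpers.

```python
LEVELS = ("basic", "advanced", "expert")
LANGUAGES = ("en", "fr", "de", "es", "zh", "ja", "ar", "it")

def normalize_choice(raw, valid_options):
    """Return the cleaned value if it is a valid option, otherwise None."""
    value = raw.strip().lower()
    return value if value in valid_options else None

def collect_preferences(raw_level, raw_language, raw_chart):
    """Validate raw answers and report which fields must be asked again."""
    prefs = {
        "level": normalize_choice(raw_level, LEVELS),
        "language": normalize_choice(raw_language, LANGUAGES),
        "chart": normalize_choice(raw_chart, ("yes", "no")),
    }
    missing = [name for name, value in prefs.items() if value is None]
    return prefs, missing

prefs, missing = collect_preferences(" Advanced ", "IT", "maybe")
print(prefs, missing)
```

An interactive loop would then only need to ask again for the fields listed in `missing`.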


In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# This cell analyzes the user's question and adapts the response
# based on subject, skill level, language, and preferences.

# Analyze the question to extract intent and context
def analyze_question(question):
    question_lower = question.lower()
    if re.search(r"\d+|equation|formula|calculate|solve", question_lower):
        return "mathematical problem"
    elif re.search(r"anatomy|biology|description|organ|function|system", question_lower):
        return "descriptive-biological problem"
    elif re.search(r"experiment|measurement|test|observation", question_lower):
        return "experimental problem"
    else:
        return "theoretical problem"

# Extract semantic and conceptual characteristics
def extract_features(problem):
    problem_lower = problem.lower()
    if re.search(r"\d+|equation|formula|energy|speed", problem_lower):
        return "Chart"
    elif re.search(r"principle|theory|model|experiment", problem_lower):
        return "Conceptual diagram"
    elif re.search(r"pressure|volume|temperature|transformation", problem_lower):
        return "State diagram"
    else:
        return "Plain text"

# Reformulate the question to make it clearer for the model
def reformulate_question(question):
    prompt = f"""Reformulate this question in a technical and precise way for a scientific AI assistant.

Question: "{question}"

Return only the reformulated question, without explanations."""
    response = generate_response(prompt, temperature=0.5).strip()

    for prefix in [
        "The generated response to the question",
        "Return only the reformulated question",
        "Question:"
    ]:
        if response.lower().startswith(prefix.lower()):
            response = response[len(prefix):].strip(": .\"'\n")

    if "\n" in response:
        response = response.split("\n")[0].strip()

    return response

# === File upload ===
try:
    uploaded = files.upload()
    file_name = list(uploaded.keys())[0]
    file_text = extract_text(file_name)

    if not file_text or file_text == "Empty or non-textual file.":
        raise ValueError("The uploaded file does not contain valid text.")
except Exception as e:
    logging.error(f"File upload error: {e}")
    file_text = input("Manually enter the problem: ").strip()

# Save
with open(INDEX_FILE, "wb") as f:
    pickle.dump(index, f)

# Load
with open(INDEX_FILE, "rb") as f:
    index = pickle.load(f)


# Example: query the scientific databases
async def example_search():
    query = "quantum physics"
    articles = await search_multi_database(query)
    print(articles)

# Execute the function directly with await
await example_search()

# === User input ===
import asyncio

# Validate that input is correct and coherent
async def get_valid_input(message, valid_options=None):
    """ Asynchronous function to handle validated input. """
    while True:
        value = await asyncio.to_thread(input, message.strip())
        value = value.strip()

        if not value:
            print("Error! Please enter a valid value.")
        elif valid_options and value.lower() not in valid_options:
            print(f"Error! You must choose from: {', '.join(valid_options)}")
        else:
            return value

example_problem = ""

while not example_problem:
    example_problem = file_text.strip() if file_text.strip() else await get_valid_input("Enter the problem manually:")

subject = input("Enter the subject (e.g., physics, biology, etc.): ").strip().lower() or "general subject"
level = input("Choose the level (basic/advanced/expert): ").strip().lower()
while level not in ["basic", "advanced", "expert"]:
    level = input("Error! Enter basic/advanced/expert: ").strip().lower()

topic = input("Enter the scientific problem or topic: ").strip()

chart_choice = input("Do you want a chart for the explanation? (yes/no): ").strip().lower()
while chart_choice not in ["yes", "no"]:
    chart_choice = input("Error! Please answer 'yes' or 'no': ").strip().lower()

chart_requested = chart_choice == "yes"

fig = None
caption = ""

if chart_requested:
    try:
        fig, caption = generate_interactive_chart(example_problem)
        fig.show()
        logging.info("Chart successfully generated.")
    except Exception as e:
        logging.error(f"Chart generation error: {e}")
        fig = None
else:
    logging.info("Chart not requested by the user.")

available_languages = ["en", "fr", "de", "es", "zh", "ja", "ar", "it"]

target_language = input("Which language do you want the translation in? (" + ", ".join(available_languages) + "): ").strip().lower()
while target_language not in available_languages:
    target_language = input("Error! Choose a valid language from: " + ", ".join(available_languages) + ": ").strip().lower()

# Secure translation and protected embedding storage
save_multilingual_journal(
    journal_text=example_problem,
    journal_id=0,
    target_language=target_language
)

# Secure translation and protected embedding retrieval
similar_entries = search_similar_journals(
    query=example_problem,
    target_language=target_language
)

for s in similar_entries:
    print("Similar journal:", s)

Saving dispensaanatomiaumanacompendioriassunto.pdf to dispensaanatomiaumanacompendioriassunto.pdf
([{'title': 'Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and Meaning', 'authors': 'Unknown author', 'abstract': 'Abstract not available', 'url': 'https://openalex.org/W2266294403'}, {'title': 'Quantum Physics in One Dimension', 'authors': 'Unknown author', 'abstract': 'Abstract not available', 'url': 'https://doi.org/10.1093/acprof:oso/9780198525004.001.0001'}, {'title': 'Quantum physics in one dimension', 'authors': 'Unknown author', 'abstract': 'Abstract not available', 'url': 'https://doi.org/10.1016/b978-0-323-90800-9.00233-x'}, {'title': 'Random-matrix theories in quantum physics: common concepts', 'authors': 'Unknown author, Unknown author, Unknown author', 'abstract': 'Abstract not available', 'url': 'https://doi.org/10.1016/s0370-1573(97)00088-4'}, {'title': 'Local quantum physics', 'authors': 'Unknown author, Unknown author', 'abstract': 'Abstract 

Image saved as 'grafico_output.png'


Which language do you want the translation in? (en, fr, de, es, zh, ja, ar, it): en



Device set to use cpu
Device set to use cpu


Similar journal: 
Similar journal: This dispensation is to be considered as a supplement to the slides published on the site. 2 _______..................................................................................................................................................................
General anatomy p. 5 2.
ino p. 208 12. Nervous system p. 217 • • • • • • • • Human Anatomy Compendium www.maximafranzin.it • Course of Anatomy and Physio-pathology • Preface To remember the effort made to write the second edition, I refer to a phrase of the great writer Charles Baudelaire: • There is only one way to forget the time:
Yes, that's right, and that's what we've been doing for four years now, since the first edition of the "Compendium of Human Anatomy" has come out. Now it's time for balance sheets, reflections, thanksgivings. When you get to the end of a path, you look back. You understand mistakes, you see positions, you are stronger by virtue of 4 of the experience you've lived.


### AGI Cognitive Pipeline – Planning, Reasoning, and Metacognition

This cell activates an AGI pipeline that combines:

- **Multistep interactive loop** with autonomous planning (`agi_interactive_loop`)  
- **Creative scientific generation** (hypothesis, experiment, interdisciplinary reflection)  
- **Response evaluation** using semantic scoring (`CrossEncoder`)  
- **Cognitive versioning**: stores and compares responses over time for each query (`evaluate_and_version_response`)  
- **Explicit reasoning and metacognition** on response coherence and structure  
- **Historical memory and distributed scientific retrieval** (arXiv, PubMed, Zenodo, OpenAlex)  
- **Visual generation with interactive chart** based on semantic analysis of the problem
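
The cognitive-versioning item in the list above can be illustrated in isolation. This is a minimal sketch, not the notebook's implementation: `toy_score` is a hypothetical word-overlap scorer standing in for the `CrossEncoder`, and `VersionStore` is an illustrative name that does not appear elsewhere in the notebook.

```python
from dataclasses import dataclass, field


def toy_score(question: str, response: str) -> float:
    """Hypothetical stand-in for a semantic scorer:
    fraction of question words that also appear in the response."""
    q_words = set(question.lower().split())
    r_words = set(response.lower().split())
    return len(q_words & r_words) / len(q_words) if q_words else 0.0


@dataclass
class VersionStore:
    """Keeps, per question, only the responses that beat the best score so far."""
    versions: dict = field(default_factory=dict)

    def submit(self, question: str, response: str) -> bool:
        key = question.strip().lower()
        score = toy_score(question, response)
        history = self.versions.setdefault(key, [])
        best = max((v["score"] for v in history), default=None)
        # Accept the first version unconditionally, later ones only if they improve.
        if best is None or score > best:
            history.append({"response": response, "score": score})
            return True
        return False


store = VersionStore()
first = store.submit("what is osmosis", "Osmosis moves water.")
worse = store.submit("what is osmosis", "No idea.")
better = store.submit("what is osmosis", "Osmosis is what moves water across membranes.")
print(first, worse, better)
```

The store accepts a first answer, rejects a weaker one, and appends a stronger one, which mirrors the keep-or-discard decision described for each query.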

This AGI pipeline simulates a scientific cognitive assistant: it decomposes an objective into subtasks, reasons through its choices, keeps a memory of earlier answers, and scores each response for coherence before accepting it.
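
The decompose, answer, and remember cycle can be sketched with stand-in functions. Everything here is an assumption made for illustration: `decompose`, `answer`, and `run_pipeline` are hypothetical names, and the "model call" is a placeholder rather than a real LLM request.

```python
import asyncio


def decompose(goal: str) -> list[str]:
    """Hypothetical planner: split a goal into subtasks on ' and '."""
    return [part.strip() for part in goal.split(" and ") if part.strip()]


async def answer(subtask: str, memory: dict) -> str:
    """Hypothetical generator: a real system would await an LLM call here."""
    await asyncio.sleep(0)  # placeholder for an awaited model call
    return f"[answer to: {subtask} | known facts: {len(memory)}]"


async def run_pipeline(goal: str) -> str:
    memory: dict[str, str] = {}
    results = []
    for subtask in decompose(goal):
        response = await answer(subtask, memory)
        memory[subtask] = response  # later subtasks can see earlier answers
        results.append(response)
    return "\n".join(results)


# Inside a notebook, use `await run_pipeline(...)` instead of asyncio.run.
summary = asyncio.run(run_pipeline("define entropy and give an example"))
print(summary)
```

Each subtask is answered in order, and the memory grows between steps, so the second answer is produced with one fact already stored.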



In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# This cell simulates AGI (Artificial General Intelligence) behavior,
# with capabilities for planning, reasoning, generation, and self-assessment.

# Interactive loop simulating a complete cognitive cycle
async def agi_interactive_loop(user_input):
    context = retrieve_multiturn_context(user_input, top_k=3)
    planning = decompose_task(user_input)
    results = []

    for subtask in planning:
        response = await generate_agi_response(subtask, context)
        results.append(response)
        update_memory(subtask, response)

    return synthesize_final(results)


cross_encoder = CrossEncoder("cross-encoder/nli-deberta-base")

# Simulated historical archive for the question
memory_archive = {}

# Evaluate and version the generated response
def evaluate_and_version_response(question, new_response, level="basic", acceptance_threshold=0.75):
    """
    Evaluates a new response using CrossEncoder,
    compares it with previous versions,
    and decides whether to keep or discard it.

    Returns a dictionary with:
    - evaluation outcome
    - version details (if accepted)
    - confidence and note (if discarded)
    """

    question_id = question.strip().lower()

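    # Step 1: Semantic evaluation of the new response.
    # "cross-encoder/nli-deberta-base" is an NLI model: for each pair it returns three
    # class scores (contradiction, entailment, neutral), so float() on the raw
    # prediction would fail. The entailment probability is used as the coherence score.
    class_probs = cross_encoder.predict([(question, new_response)], apply_softmax=True)[0]
    new_score = float(class_probs[1])  # index 1 = entailment in the model card's label order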

    new_version = {
        "id": str(uuid.uuid4()),
        "response": new_response,
        "coherence_score": round(new_score, 3),
        "level": level,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model_version": "LLM_v1",
        "improvable": new_score < acceptance_threshold
    }

    # Step 2: Retrieve previous versions
    previous_memory = memory_archive.get(question_id, [])

    # If no previous versions exist, save the first one
    if not previous_memory:
        memory_archive[question_id] = [new_version]
        return {
            "outcome": "New question saved.",
            "total_versions": 1,
            "response_accepted": True,
            "details": new_version
        }

    # Step 3: Compare with the best saved version
    best_version = max(previous_memory, key=lambda v: v["coherence_score"])
    best_score = best_version["coherence_score"]

    if new_score > best_score:
        memory_archive[question_id].append(new_version)
        return {
            "outcome": "New version saved (more coherent than previous).",
            "total_versions": len(memory_archive[question_id]),
            "response_accepted": True,
            "details": new_version
        }

    # Version discarded: less coherent
    return {
        "outcome": "Version discarded: less coherent than existing ones.",
        "response_accepted": False,
        "confidence": round(new_score, 3),
        "note": "The proposed version is less coherent than the previous one.",
        "new_score": round(new_score, 3),
        "best_score": round(best_score, 3)
    }


# === Main function: hypothesis generation and creative analysis ===
def simulate_scientific_creativity(concept, subject, style="generative", level="advanced", language="it"):
    prompt = f"""
You are a cognitive scientific assistant with autonomous creative capabilities.

Subject: {subject}
Central concept: {concept}
Requested creative style: {style}
Level: {level}

Objective: Generate an innovative scientific proposal.

Respond with:
1. An **original hypothesis** related to "{concept}".
2. A **conceptual model** that can be visually described.
3. A proposal for a **novel experiment** to test it.
4. Possible **interdisciplinary applications**.
5. A reflection on the degree of verifiability and impact.

Translate everything into language: **{language}**
"""
    try:
        response = llm.invoke(prompt.strip())
        hypothesis_text = getattr(response, "content", str(response)).strip()
        return hypothesis_text
    except Exception as e:
        logging.error(f"[simulate_creativity] Generation error: {e}")
        return "Error during creative simulation."

# === Classifications ===
problem_type = analyze_question(example_problem)
diagram_type_ml = extract_features(example_problem)
print(f"Problem type: {problem_type}")
print(f"Recommended representation: {diagram_type_ml}")

logging.info(f"Identified problem type: {problem_type}")
logging.info(f"Recommended representation type: {diagram_type_ml}")

# === Assign concept from the 'topic' variable ===
concept = topic.strip()

# === Retrieve articles from arXiv with error handling ===
try:
    arxiv_articles = await search_arxiv_async(concept)
    logging.info(f"arXiv: {len(arxiv_articles)} articles found.")
except Exception as e:
    logging.error(f"Error during arXiv search: {e}")
    arxiv_articles = []

# === Retrieve from other databases ===
try:
    pubmed_results = await search_pubmed_async(concept)
    openalex_results = await search_openalex_async(concept)

    logging.info("Search completed across all databases.")
except Exception as e:
    logging.error(f"Error in multi-database search: {e}")
    pubmed_results, openalex_results = [], []

# === Formatting for prompt or report ===
async def retrieve_and_normalize_articles(concept):
    """
    Retrieves articles from multiple scientific sources and normalizes them.

    Sources: arXiv, PubMed, OpenAlex, Zenodo

    Returns:
    - list of normalized articles
    """
    articles = []

    try:
        arxiv_articles = await search_arxiv_async(concept)
    except Exception as e:
        logging.error(f"[arxiv] Error: {e}")
        arxiv_articles = []

    try:
        pubmed_articles = await search_pubmed_async(concept)
    except Exception as e:
        logging.error(f"[pubmed] Error: {e}")
        pubmed_articles = []

    try:
        openalex_articles = await search_openalex_async(concept)
    except Exception as e:
        logging.error(f"[openalex] Error: {e}")
        openalex_articles = []

    try:
        zenodo_articles = await search_zenodo_async(concept)
    except Exception as e:
        logging.error(f"[zenodo] Error: {e}")
        zenodo_articles = []

    sources = {
        "arxiv": arxiv_articles,
        "pubmed": pubmed_articles,
        "openalex": openalex_articles,
        "zenodo": zenodo_articles
    }

    for name, source in sources.items():
        if isinstance(source, list) and all(isinstance(a, dict) for a in source):
            articles += normalize_source(raw_articles=source, source_name=name)
        else:
            logging.warning(f"[{name}] Invalid data or unrecognized structure.")

    logging.info(f"Total normalized articles: {len(articles)}")
    return articles

# Check if articles exist and format the text
example_query = "quantum physics"  # Define the query
articles = await search_multi_database(example_query)
zenodo_articles = await search_zenodo_async(example_query)

# === Prompt construction and response ===
# Reuse the arXiv, PubMed, and OpenAlex results already retrieved above for `concept`
# instead of querying each database a second time; only Zenodo is still missing.
arxiv_results = arxiv_articles
try:
    zenodo_results = await search_zenodo_async(concept)
except Exception as e:
    logging.error(f"Error during Zenodo search: {e}")
    zenodo_results = []

chart_choice_text = "Chart included" if chart_requested else "Text only"

paper_text = ""  # Or provide a predefined text

# Fill the prompt template with the user inputs, the retrieved sources, and the target language
prompt = prompt_template.format(
    problem=example_problem,
    topic=topic,
    concept=concept,
    level=level,
    subject=subject,
    arxiv_search=arxiv_results,
    paper_text=paper_text,
    pubmed_search=pubmed_results,
    zenodo_search=zenodo_results,
    openalex_search=openalex_results,
    chart_choice=chart_choice_text,
    target_language=target_language
)

try:
    # Generate response
    response = llm.invoke(prompt.strip())
    response_content = getattr(response, "content", str(response))

    if not response_content or "Error" in response_content:
        raise ValueError("Invalid AI model response")
    logging.info("Response successfully generated.")

    # Reasoning explanation (metacognition)
    reasoning_explanation = explain_reasoning(prompt, response_content)
    print("Reasoning explanation:\n", getattr(reasoning_explanation, "content", reasoning_explanation))

    # Operational decision (AGI Point 5)
    objective = generate_objective_from_input(example_problem)
    decision = llm.invoke(f"Objective: {objective}\nPrompt: {prompt.strip()}")
    action = getattr(decision, "content", str(decision)).strip()
    print(f"Agent's autonomous decision: {action}")

except Exception as e:
    logging.error(f"General error in AGI operational block: {e}")


# This cell executes a generation + metacognition cycle

final_response = metacognitive_cycle(example_problem, level)

# Generates and evaluates the response for coherence and potential improvement
def generate_and_evaluate(generation_prompt, question, level):
    response = llm.invoke(generation_prompt)
    evaluation_prompt = f"""
    You received the following response: "{getattr(response, 'content', response)}".
    - Is it coherent with the question: "{question}"?
    - Is the tone appropriate for the '{level}' level?
    - How would you improve the response?
    """
    feedback = llm.invoke(evaluation_prompt)
    return response, feedback

import time

def execute_with_retry(function, max_attempts=3, base_delay=2):
    for attempt in range(max_attempts):
        try:
            return function()
        except InternalServerError as e:
            logging.warning(f"Attempt {attempt+1} failed: {e}")
            time.sleep(base_delay * (attempt + 1))
        except Exception as e:
            logging.error(f"Unhandled error: {e}")
            break
    return "Persistent error: unable to complete the operation."


# === Visualization (optional) ===
if chart_requested and diagram_type_ml in ["Chart", "Conceptual diagram", "State diagram"]:
    logging.info("Generating interactive chart...")
    try:
        fig, caption = generate_interactive_chart(example_problem)
        fig.show()
        logging.info("Chart successfully generated!")
    except Exception as e:
        logging.error(f"Error during chart generation: {e}")
else:
    logging.info("Chart not requested or not necessary.")


from IPython.display import FileLink
FileLink(file_name)

Problem type: mathematical problem
Recommended representation: Chart
Reasoning explanation:
 ## Apparato Respiratorio: A Comprehensive Analysis

### Introduction

The Apparato Respiratorio, or respiratory system, is a complex and vital system in the human body responsible for the exchange of gases between the body and the environment. It is essential for the survival of the individual, as it provides oxygen to the body's tissues and removes carbon dioxide. This analysis will provide an in-depth examination of the respiratory system, including its anatomy, physiology, and clinical significance.

### Anatomy of the Respiratory System

The respiratory system consists of the upper and lower respiratory tracts. The upper respiratory tract includes the nose, mouth, pharynx, and larynx, while the lower respiratory tract comprises the trachea, bronchi, and lungs.

1. **Upper Respiratory Tract:**
 - **Nose and Mouth:** The entry points for air into the respiratory system. The nose is responsibl

Image saved as 'grafico_output.png'


### Final Response Visualization

This section displays the final output generated by the AI agent, following any ethical checks and revisions.  
It is useful for tracking the output transparently and making it readable for the user or supervisor.
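
Displaying the output reliably means coping with different response shapes. The sketch below assumes the model may return an object with a `content` attribute, a plain string, or nothing at all; `response_to_text` is a hypothetical helper name, not part of the notebook.

```python
from types import SimpleNamespace


def response_to_text(response) -> str:
    """Return printable text from a model response object, a plain string, or None."""
    if response is None:
        return "No response available."
    # Message-like objects expose `.content`; anything else is converted with str().
    return str(getattr(response, "content", response)).strip()


print(response_to_text(SimpleNamespace(content="  The lungs exchange gases.  ")))
print(response_to_text("plain string"))
print(response_to_text(None))
```

Centralizing this conversion keeps the display logic identical wherever a response is printed or logged.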



In [None]:
# Display the response
risposta = None
try:
    risposta = llm.invoke(prompt.strip())
except Exception as e:
    logging.error(f"Error while generating the response: {e}")

if risposta:
    print("\nResult:\n")
    print(getattr(risposta, "content", str(risposta)))
else:
    print("No response available.")



Result:

## Apparato Respiratorio: A Comprehensive Analysis

The Apparato Respiratorio, or respiratory system, is a complex and vital system in the human body responsible for the exchange of gases between the body and the environment. This system is crucial for the survival of the individual, as it provides oxygen to the body's tissues and removes carbon dioxide.

### Anatomical Structure

The Apparato Respiratorio consists of the upper and lower respiratory tracts. The upper respiratory tract includes the nose, mouth, pharynx, and larynx, while the lower respiratory tract comprises the trachea, bronchi, and lungs.

1. **Upper Respiratory Tract:**
   - **Nose and Mouth:** The entry points for air into the respiratory system.
   - **Pharynx:** A muscular tube that serves as a passageway for air from the nose or mouth to the larynx.
   - **Larynx:** Often referred to as the voice box, it contains the vocal cords and is responsible for sound production.

2. **Lower Respiratory Tract:*

## Reflective Cognitive Journal

An automated reflective journal that documents the cognitive process behind a response generated by a language model. Here are the main phases:

- Receives a prompt (a question or request)  
- Generates a coherent and in-depth response  
- Formulates a metacognitive reflection on the response, analyzing how it was constructed  
- Records everything in a journal (prompt, response, reflection)  
- Exports the journal as a structured and readable Markdown file
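
The phases above can be sketched end to end with stand-ins for the model. This is an illustrative sketch under stated assumptions: `build_journal_entry` and `journal_to_markdown` are hypothetical names, the two lambdas replace real LLM calls, and the file is written to the system temporary directory.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_journal_entry(prompt, generate, reflect):
    """Run the generate -> reflect sequence and collect the three journal fields."""
    response = generate(prompt)
    reflection = reflect(prompt, response)
    return {"prompt": prompt, "response": response, "reflection": reflection}


def journal_to_markdown(entry):
    """Render one journal entry as a small Markdown document."""
    return (
        "# Reflective Journal\n\n"
        f"## Prompt\n> {entry['prompt']}\n\n"
        f"## Response\n{entry['response']}\n\n"
        f"## Metacognitive Reflection\n{entry['reflection']}\n"
    )


# Hypothetical stand-ins for the two LLM calls.
entry = build_journal_entry(
    "Explain diffusion.",
    generate=lambda p: "Particles spread from high to low concentration.",
    reflect=lambda p, r: "I started from the definition and kept it short.",
)

# Timezone-aware timestamp in a filename-safe format.
stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
path = Path(tempfile.gettempdir()) / f"journal_{stamp}.md"
path.write_text(journal_to_markdown(entry), encoding="utf-8")
print(path.read_text(encoding="utf-8"))
```

Passing the generator and the reflector as callables keeps the journal logic testable without a live model.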


In [None]:
# © 2025 Elena Marziali — Code released under Apache 2.0 license.
# See LICENSE in the repository for details.
# Removal of this copyright is prohibited.

# Extract the text content from the response
response_text = getattr(response, "content", str(response)).strip()

# Keep only the first 1000 characters for the reflection prompt
response_short = response_text[:1000]

reflection_prompt = f"""
You responded to the following prompt:
"{prompt}"

Your answer:
"{response_short}"


Now reflect briefly on how you arrived at this answer.

- What reasoning path led you to this response?
- What criteria guided your choices?
- Were there any risks or ambiguities you considered?
- How does this reflect your cognitive style?

Write a concise reflection in 2–3 paragraphs, using a clear and analytical tone.
"""

reflection = llm.invoke(reflection_prompt).content.strip()

journal = {}

def record_journal(journal_id, prompt, response, reflection):
    journal[journal_id] = {
        "prompt": prompt,
        "response": response,
        "reflection": reflection
    }

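# Store the extracted text, not the raw response object, so the Markdown export stays readable
record_journal(journal_id=0, prompt=prompt, response=response_text, reflection=reflection)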

def save_journal_markdown(data, file_name):
    with open(file_name, "w", encoding="utf-8") as f:
        f.write("# Reflective Journal\n\n")
        f.write(f"## Prompt\n> {data['prompt']}\n\n")
        f.write(f"## Response\n{data['response']}\n\n")
        f.write(f"## Metacognitive Reflection\n{data['reflection']}\n")

# Timezone-aware UTC timestamp in a filename-safe format (no colons)
timestamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
filename = f"journal_{timestamp}.md"
save_journal_markdown(journal[0], filename)

