<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/IDVSCI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install  torch transformers accelerate xformers peft bitsandbytes unsloth -q
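
The framework defined below makes one `_call_llm` request for each agent action. In Stage 2, every scientist reviews every other scientist's idea, so the number of generation calls grows quadratically with `num_scientists`. The sketch below tallies the calls per stage as read from the code. `count_llm_calls` is a hypothetical planning helper. It assumes that each scientist's idea receives its own leader synthesis.

```python
def count_llm_calls(num_scientists: int) -> dict:
    """Tally the LLM generation calls one run_scientific_research() makes.

    Hypothetical planning helper: the per-stage counts mirror the loops in
    IDVSCIFramework, assuming each scientist's idea gets its own synthesis.
    """
    n = num_scientists
    agents = n + 1  # the scientists plus the leader
    stages = {
        "topic_discussion": agents,                     # one topic proposal per agent
        "knowledge_exchange": n + n * (n - 1) + n + n,  # ideas, peer reviews, syntheses, reflections
        "dual_diversity_review": n,                     # one ranking call per scientist
        "abstract_generation": agents + (agents - 1),   # one draft per agent, then sequential refinements
    }
    stages["total"] = sum(stages.values())  # closed form: n**2 + 6*n + 2
    return stages


if __name__ == "__main__":
    calls = count_llm_calls(3)
    print(calls)
    # Upper bound on generated tokens with max_new_tokens=200 per call
    print(calls["total"] * 200)
```

With the configuration used here (3 scientists plus the leader), this gives 29 calls. At `max_new_tokens=200`, that is at most 5,800 new tokens. Going to 5 scientists raises the count to 57 calls.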

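The model loaded in the main cell, `unsloth/DeepSeek-R1-Distill-Llama-8B`, is a reasoning distill. It typically writes its chain of thought first and closes it with a `</think>` tag before giving the final answer. This is why the Stage 1 output below reads like deliberation rather than a topic. The minimal post-processing sketch below keeps only the text after the last `</think>`. It assumes the tag survives `decode(..., skip_special_tokens=True)`. `strip_reasoning` is a hypothetical helper that could be applied to the return value of `_call_llm`.

```python
def strip_reasoning(text: str, close_tag: str = "</think>") -> str:
    """Return the answer that follows the last reasoning close tag.

    Hypothetical helper; assumes the model marks the end of its reasoning
    with close_tag and that decoding keeps the tag in the text.
    """
    _, found, answer = text.rpartition(close_tag)
    if not found:
        # No close tag: the reasoning may have been cut off by max_new_tokens,
        # so there is no separate answer to extract. Return the text as-is.
        return text.strip()
    return answer.strip()


if __name__ == "__main__":
    print(strip_reasoning("<think>\nweighing options...\n</think>\nFinal answer."))
```

If the tag never appears, for example because a 200-token budget runs out mid-reasoning, raising `max_new_tokens` is the more direct fix.
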
In [None]:
# FULL CODE FOR IDVSCI FRAMEWORK WITH LLM INTEGRATION

# Import necessary libraries
import torch  # Tensor library the model runs on
from unsloth import FastLanguageModel  # Unsloth's loader for (4-bit) Hugging Face LLMs

# --- LLM Configuration ---
MAX_SEQ_LENGTH = 2048 # Maximum context length (prompt + generated tokens); adjust as needed
DTYPE = None # None lets Unsloth auto-detect the dtype (e.g., torch.float16 or torch.bfloat16)
LOAD_IN_4BIT = True # Load 4-bit quantized weights to reduce memory use

# --- Agent Class ---
class Agent:
    def __init__(self, agent_id, knowledge_base, role, initial_prompt, llm_model, llm_tokenizer):
        self.agent_id = agent_id
        self.knowledge_base = knowledge_base
        self.role = role # e.g., 'leader', 'scientist'
        self.current_prompt = initial_prompt
        self.ideas = []
        self.llm_model = llm_model      # The actual loaded LLM model
        self.llm_tokenizer = llm_tokenizer # The actual loaded LLM tokenizer

    def _call_llm(self, prompt_text):
        """
        Make a real LLM call with the shared model and tokenizer and return only
        the newly generated text (without the echoed prompt).
        """
        inputs = self.llm_tokenizer(prompt_text, return_tensors="pt").to(self.llm_model.device)

        # Generate a response; max_new_tokens, do_sample, top_k, top_p and temperature can be tuned
        outputs = self.llm_model.generate(
            **inputs,
            max_new_tokens=200, # Generate up to 200 new tokens
            do_sample=True,     # Sample instead of greedy decoding for more diverse outputs
            top_k=50,           # Sample only from the 50 most likely tokens
            top_p=0.95,         # Nucleus sampling
            temperature=0.7,    # Values below 1.0 make sampling less random
            num_return_sequences=1,
            pad_token_id=self.llm_tokenizer.eos_token_id # Pad with EOS (avoids a warning when no pad token is set)
        )

        # For decoder-only models, generate() returns the prompt tokens followed by the new ones.
        # Slicing off the prompt by token count is more reliable than searching the decoded text
        # for the prompt string, since decoding does not always reproduce the prompt verbatim.
        prompt_length = inputs["input_ids"].shape[1]
        generated_text = self.llm_tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

        return generated_text.strip()

    def generate_idea(self):
        prompt = f"As a {self.role} with expertise in {self.knowledge_base} and considering the goal: '{self.current_prompt}', generate a highly novel and impactful scientific idea. Be concise."
        idea = self._call_llm(prompt)
        self.ideas.append(idea)
        return idea

    def revise_idea(self, idea_to_revise, secondary_prompt):
        prompt = f"Given the idea to revise: '{idea_to_revise}', as a {self.role} with knowledge in {self.knowledge_base}, '{secondary_prompt}'. Provide a constructive critique and suggest specific improvements or alternative perspectives to enhance its novelty and feasibility. Be detailed."
        revised_idea = self._call_llm(prompt)
        return revised_idea

    def reflect_on_feedback(self, initial_idea, synthesized_feedback):
        prompt = f"You initially proposed: '{initial_idea}'. The team leader has synthesized feedback: '{synthesized_feedback}'. Reflect on this feedback and provide a refined, final version of your scientific idea, incorporating the suggestions effectively. Ensure the refined idea is innovative and well-justified."
        final_idea = self._call_llm(prompt)
        return final_idea

    def generate_abstract_draft(self, final_idea):
        prompt = f"Based on the final scientific idea: '{final_idea}', write an initial draft of a scientific abstract (around 200-250 words). Include an introduction, methods, potential results, and conclusion, highlighting the novelty and impact."
        abstract_draft = self._call_llm(prompt)
        return abstract_draft

    def refine_abstract(self, current_abstract):
        prompt = f"As a {self.role} with your scientific background, review and refine the following abstract draft for clarity, conciseness, scientific rigor, and impact: '{current_abstract}'. Focus on improving flow and highlighting key contributions. Make it sound professional and publishable."
        refined_abstract = self._call_llm(prompt)
        return refined_abstract

# --- IDVSCIFramework Class ---
class IDVSCIFramework:
    def __init__(self, num_scientists, leader_knowledge, past_paper_dataset, llm_model, llm_tokenizer):
        # Pass the actual loaded LLM model and tokenizer to all agents
        self.llm_model = llm_model
        self.llm_tokenizer = llm_tokenizer
        self.leader = Agent("Leader", leader_knowledge, "leader", "Synthesize and evaluate scientific ideas and abstracts", self.llm_model, self.llm_tokenizer)
        self.scientists = [
            Agent(f"Scientist{i+1}", f"Knowledge_Domain_{i+1}", "scientist", "Generate innovative scientific ideas", self.llm_model, self.llm_tokenizer)
            for i in range(num_scientists)
        ]
        self.all_agents = [self.leader] + self.scientists
        self.past_paper_dataset = past_paper_dataset # Simulated database of papers for the Dual-Diversity Review (Stage 3)
        self.selected_topic = None

    def _select_topic(self):
        # Stage 1: Topic Discussion
        print("\n--- Stage 1: Topic Discussion ---")
        proposed_topics = []
        for agent in self.all_agents:
            # Each agent proposes a topic based on their initial prompt/knowledge
            topic_prompt = f"Given your background ({agent.knowledge_base}), suggest a compelling research topic for a multi-agent AI research team focused on scientific discovery. Be brief."
            agent.current_prompt = topic_prompt # Update agent's prompt for this task
            proposed_topic = agent.generate_idea() # Use generate_idea for topic proposal
            proposed_topics.append(proposed_topic)
            print(f"{agent.agent_id} proposed: {proposed_topic}")

        # In a real system, this would involve a selection mechanism (e.g., voting or a leader decision).
        # Simplification: adopt the first proposal, which is the leader's, since the leader is first in self.all_agents.
        self.selected_topic = proposed_topics[0]
        print(f"\nSelected Research Topic for the team: '{self.selected_topic}'")
        # Update initial prompts for next stage based on selected topic
        for agent in self.all_agents:
            agent.current_prompt = f"Develop an innovative idea related to: '{self.selected_topic}'"

    def _dynamic_knowledge_exchange(self):
        # Stage 2: Idea Generation (Dynamic Knowledge Exchange)
        print("\n--- Stage 2: Idea Generation (Dynamic Knowledge Exchange) ---")
        initial_ideas = {}
        revised_ideas_by_others = {}

        # 1. Scientists generate initial ideas based on the selected topic
        for scientist in self.scientists:
            initial_ideas[scientist.agent_id] = scientist.generate_idea()
            print(f"{scientist.agent_id} initial idea: {initial_ideas[scientist.agent_id]}")

        # 2. Cross-agent modification (scientists review each other's ideas)
        for original_scientist_id, idea_i in initial_ideas.items():
            revised_ideas_by_others[idea_i] = []
            for reviewing_scientist in self.scientists:
                if reviewing_scientist.agent_id != original_scientist_id:
                    # P2 is a secondary prompt for peer review
                    secondary_review_prompt = "Critique this idea for its novelty, feasibility, and potential impact. Suggest specific improvements or alternative angles."
                    revised_version = reviewing_scientist.revise_idea(idea_i, secondary_review_prompt)
                    revised_ideas_by_others[idea_i].append(revised_version)
                    print(f"  {reviewing_scientist.agent_id} reviewed {original_scientist_id}'s idea: {revised_version[:70]}...") # Truncate for display

        # 3. Leader aggregates and synthesizes revisions
        synthesized_feedbacks = {}
        for idea_i, revisions in revised_ideas_by_others.items():
            synthesis_prompt = f"You are the leader. Consolidate and synthesize the following peer review comments for the idea '{idea_i}': {'; '.join(revisions)}. Provide actionable feedback for the original author."
            synthesized_feedback = self.leader._call_llm(synthesis_prompt) # Leader uses direct LLM call for synthesis
            synthesized_feedbacks[idea_i] = synthesized_feedback
            print(f"Leader synthesized feedback for '{idea_i}': {synthesized_feedback[:100]}...") # Truncate for display

        # 4. Original scientists reflect and refine their ideas
        final_ideas = []
        for scientist in self.scientists:
            initial_idea = initial_ideas[scientist.agent_id]
            synthesized_feedback = synthesized_feedbacks[initial_idea]
            final_idea = scientist.reflect_on_feedback(initial_idea, synthesized_feedback) # Reflect() integrates feedback
            final_ideas.append(final_idea)
            print(f"{scientist.agent_id} final refined idea: {final_idea[:100]}...") # Truncate for display
        return final_ideas

    def _dual_diversity_review(self, ideas_to_check):
        # Stage 3: Check Novelty (Dual-Diversity Review)
        print("\n--- Stage 3: Check Novelty (Dual-Diversity Review) ---")

        # Simulate prompt refinement based on relevant literature
        # In a real system, Faiss library or similar vector search would retrieve top-k papers.
        # For this example, we'll just simulate the integration of "relevant paper info".
        for agent in self.all_agents:
            # Conceptually retrieve relevant papers for the current ideas
            relevant_paper_info = f"Based on prior research: {', '.join(self.past_paper_dataset)}. "
            # Update agent's prompt to include this context for evaluation
            agent.current_prompt = f"Evaluate scientific ideas for novelty and impact, considering the following literature: {relevant_paper_info} Your evaluation should assign a rank and confidence (1-10)."
            # print(f"{agent.agent_id}'s review prompt updated.") # Too verbose, uncomment for debugging

        # Simulate weighted Borda count voting
        scores = {idea: 0.0 for idea in ideas_to_check} # Initialize scores

        n_ideas = len(ideas_to_check)
        if n_ideas == 0:
            print("No ideas to review.")
            return None

        import re

        # Number the ideas so evaluators can refer to them unambiguously
        # (the idea texts contain commas, so joining them with ", " is ambiguous)
        numbered_ideas = "\n".join(f"Idea {k + 1}: {idea}" for k, idea in enumerate(ideas_to_check))
        line_pattern = re.compile(
            r"Idea\s*#?(\d+)\D*?Rank\s*:?\s*(\d+)\D*?Confidence\s*:?\s*(\d+(?:\.\d+)?)",
            re.IGNORECASE,
        )

        # Each scientist provides a rank and a confidence for each idea
        for scientist in self.scientists:
            # current_prompt (set above) carries the literature context into the evaluation
            evaluation_prompt = (
                f"{scientist.current_prompt}\n{numbered_ideas}\n"
                f"Rank these {n_ideas} ideas by novelty and impact (1 = best) and give a confidence score (1-10) for each ranking. "
                "Answer with one line per idea in the format: Idea <number>: Rank <rank>, Confidence <confidence>"
            )
            llm_evaluation_raw = scientist._call_llm(evaluation_prompt)

            print(f"\n{scientist.agent_id}'s raw evaluation:\n{llm_evaluation_raw[:200]}...") # Show part of LLM output

            # --- Parsing LLM output for ranking and confidence (crucial real-world step) ---
            # A regex tolerates extra punctuation and decimal confidences (e.g. 7.5); matches that
            # name an unknown idea number or an out-of-range rank are skipped.
            parsed_evaluations = {}
            for match in line_pattern.finditer(llm_evaluation_raw):
                idea_index = int(match.group(1)) - 1
                rank = int(match.group(2))
                if 0 <= idea_index < n_ideas and 1 <= rank <= n_ideas:
                    parsed_evaluations[ideas_to_check[idea_index]] = {
                        "rank": rank,
                        "confidence": float(match.group(3)),
                    }
            # --- End of parsing ---

            for idea_k in ideas_to_check:
                if idea_k in parsed_evaluations:
                    r_jk = parsed_evaluations[idea_k]["rank"]
                    c_jk = parsed_evaluations[idea_k]["confidence"]
                    # Clamp confidence to 1-10 range if LLM generates outside
                    c_jk = max(1, min(10, c_jk))

                    # Borda score calculation: B_k = sum((n - r_jk) * (c_jk / 10))
                    # (n_ideas - r_jk) gives more points for higher ranks (lower rank number)
                    score_contribution = (n_ideas - r_jk) * (c_jk / 10.0)
                    scores[idea_k] += score_contribution
                    print(f"  {scientist.agent_id} contributes {score_contribution:.2f} to '{idea_k}' (Rank {r_jk}, Conf {c_jk})")
                else:
                    print(f"  {scientist.agent_id} did not provide a valid evaluation for '{idea_k}'. Skipping contribution.")


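        # Select the idea with the highest Borda score.
        # scores always holds one entry per idea, so check for an all-zero tally instead of emptiness.
        if max(scores.values()) == 0.0:
            print("Warning: no positive Borda scores (no parsable evaluations?); falling back to the first idea.")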

        best_idea = max(scores, key=scores.get)
        print(f"\nBest idea selected by Weighted Borda Count: '{best_idea}' (Total Score: {scores[best_idea]:.2f})")
        return best_idea

    def _abstract_generation(self, final_idea):
        # Stage 4: Abstract Generation
        print("\n--- Stage 4: Abstract Generation ---")
        if not final_idea:
            print("No final idea to generate abstract from.")
            return "No abstract generated."

        abstract_drafts = []
        for agent in self.all_agents:
            draft = agent.generate_abstract_draft(final_idea)
            abstract_drafts.append(draft)
            print(f"{agent.agent_id} initial abstract draft: {draft[:100]}...") # Truncate for display

        # Sequential refinement: start from the leader's draft (self.all_agents[0]) and let
        # each remaining agent refine it in turn; the other initial drafts are only printed.
        current_abstract = abstract_drafts[0]
        print(f"\nStarting abstract refinement with: {current_abstract[:100]}...")

        for agent in self.all_agents[1:]:
            current_abstract = agent.refine_abstract(current_abstract)
            print(f"Abstract after {agent.agent_id} refinement: {current_abstract[:100]}...") # Truncate for display

        print("\n--- Final Abstract after Refinement ---")
        print(current_abstract)
        return current_abstract

    def run_scientific_research(self):
        print("--- Starting IDVSCI Simulated Research ---")
        self._select_topic()
        final_ideas = self._dynamic_knowledge_exchange()
        best_idea = self._dual_diversity_review(final_ideas)
        final_abstract = self._abstract_generation(best_idea)
        print("\n--- Research Complete ---")
        print(f"Final Abstract: {final_abstract[:300]}...") # Truncate for final display

# --- Main Execution Block ---
if __name__ == "__main__":
    print("Initializing LLM. This may take some time depending on model size and hardware.")
    try:
        # --- Load your Unsloth LLM and Tokenizer here ---
        # Ensure you have logged in to Hugging Face if the model requires authentication
        # For example: huggingface-cli login
        llm_model, llm_tokenizer = FastLanguageModel.from_pretrained(
            model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
            max_seq_length=MAX_SEQ_LENGTH,
            dtype=DTYPE,
            load_in_4bit=LOAD_IN_4BIT,
            # token=True # Uncomment if you have logged in and want to use your token
        )
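        FastLanguageModel.for_inference(llm_model) # Enable Unsloth's faster inference mode before calling generate()
        print("LLM loaded successfully!")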

    except ImportError:
        print("Unsloth or other required libraries not found. Please install them:")
        print("pip install unsloth[cu121] torch transformers accelerate xformers peft bitsandbytes")
        # raise SystemExit instead of exit(), which is intended for interactive sessions and may stop the notebook kernel
        raise SystemExit("Exiting as LLM cannot be loaded.")
    except Exception as e:
        print(f"Error loading LLM: {e}")
        print("Ensure you have sufficient VRAM and necessary dependencies are installed.")
        raise SystemExit("Exiting.")

    # Simulate initial conditions
    leader_knowledge_base = "Expert in AI Ethics, Multi-Agent Systems, and Large Language Model applications."
    past_papers_data = [
        "The Role of Generative AI in Scientific Discovery Automation",
        "Ethical Considerations in Autonomous AI Research Agents",
        "Novelty Detection Algorithms in Scientific Literature",
        "Frameworks for Multi-Agent Collaboration in Research",
        "Impact of Peer Review on Scientific Breakthroughs"
    ]

    # Initialize the framework, passing the loaded LLM objects
    idvsci_system = IDVSCIFramework(
        num_scientists=3, # Example: 3 scientists + 1 leader
        leader_knowledge=leader_knowledge_base,
        past_paper_dataset=past_papers_data,
        llm_model=llm_model,
        llm_tokenizer=llm_tokenizer
    )
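
Stage 3 currently pastes every title in `past_paper_dataset` into the review prompt. Its comments note that a real system would instead retrieve only the top-k relevant papers with Faiss or a similar vector index. The sketch below is a dependency-free stand-in that ranks titles by the cosine similarity of bag-of-words counts. `bag_of_words`, `cosine`, and `top_k_papers` are hypothetical names. A production version would replace the word counts with sentence embeddings and the sort with an index search.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a crude, hypothetical stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_papers(query: str, papers: list, k: int = 2) -> list:
    """Return the k titles most similar to the query, best first (ties keep input order)."""
    query_vector = bag_of_words(query)
    ranked = sorted(papers, key=lambda title: cosine(query_vector, bag_of_words(title)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    papers = [
        "The Role of Generative AI in Scientific Discovery Automation",
        "Ethical Considerations in Autonomous AI Research Agents",
        "Novelty Detection Algorithms in Scientific Literature",
        "Frameworks for Multi-Agent Collaboration in Research",
        "Impact of Peer Review on Scientific Breakthroughs",
    ]
    for title in top_k_papers("novelty detection in scientific literature", papers, k=2):
        print(title)
```

In `_dual_diversity_review`, the query could be the idea under review. The returned titles would then replace the full `', '.join(self.past_paper_dataset)` list in `relevant_paper_info`.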

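The ranking step in `_dual_diversity_review` has to turn free-form LLM text into (rank, confidence) pairs before it can apply the weighted Borda count `B_k = sum_j (n - r_jk) * c_jk / 10`. The standalone sketch below covers both steps. It assumes the evaluation prompt numbers the ideas and asks for lines of the form `Idea <number>: Rank <rank>, Confidence <confidence>`. `LINE_RE`, `parse_evaluation`, and `weighted_borda` are hypothetical names and are not part of the original framework.

```python
import re

# Hypothetical line format: "Idea <number>: Rank <rank>, Confidence <confidence>".
# \D*? skips punctuation such as ":" or "," between the fields.
LINE_RE = re.compile(
    r"Idea\s*#?(\d+)\D*?Rank\s*:?\s*(\d+)\D*?Confidence\s*:?\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_evaluation(raw: str, n_ideas: int) -> dict:
    """Map 0-based idea index -> (rank, confidence) parsed from one evaluator's text."""
    parsed = {}
    for match in LINE_RE.finditer(raw):
        index = int(match.group(1)) - 1
        rank = int(match.group(2))
        confidence = float(match.group(3))
        if 0 <= index < n_ideas and 1 <= rank <= n_ideas:  # drop out-of-range entries
            parsed[index] = (rank, min(10.0, max(1.0, confidence)))  # clamp to 1-10
    return parsed


def weighted_borda(evaluations: list, n_ideas: int) -> list:
    """B_k = sum over evaluators j of (n - r_jk) * c_jk / 10; unranked ideas get nothing."""
    scores = [0.0] * n_ideas
    for evaluation in evaluations:
        for index, (rank, confidence) in evaluation.items():
            scores[index] += (n_ideas - rank) * confidence / 10.0
    return scores


if __name__ == "__main__":
    raw_a = "Idea 1: Rank 2, Confidence 8\nIdea 2: Rank 1, Confidence 9\nIdea 3: Rank 3, Confidence 5"
    raw_b = "Idea 1: Rank: 1, Confidence: 7.5.\nIdea 2: Rank: 2, Confidence: 10\nIdea 3: Rank: 3, Confidence: 12"
    evaluations = [parse_evaluation(raw, 3) for raw in (raw_a, raw_b)]
    scores = weighted_borda(evaluations, 3)
    best = max(range(3), key=scores.__getitem__)
    print([round(s, 2) for s in scores], "best: Idea", best + 1)
```

Parsing confidences as floats keeps a value like 7.5 intact instead of stripping its decimal point. Matching ideas by number avoids fragile substring comparisons against long idea texts. As in the framework, a last-place rank contributes zero points regardless of confidence.
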
In [2]:
# Run the simulated research process
idvsci_system.run_scientific_research()

--- Starting IDVSCI Simulated Research ---

--- Stage 1: Topic Discussion ---
Leader proposed: 'transformative' means it should have the potential to change how we approach a field.

Okay, so I'm trying to come up with a research topic for a multi-agent AI team focused on scientific discovery. The leader has expertise in AI Ethics, Multi-Agent Systems, and Large Language Model applications. The goal is to suggest a topic that's transformative, meaning it could change how we approach a field. Let me break this down.

First, I need to think about what's currently happening in scientific discovery with AI. I know that AI is used for things like drug discovery, materials science, predictive modeling, etc. But maybe there's a limit in how these systems handle complex, dynamic, or interdisciplinary problems. So perhaps the problem is that existing AI systems aren't collaboration well across different domains or aren't as ethical as they should be.

Given that the leader is an expert in AI Et