<a href="https://colab.research.google.com/github/hanjiadong0/chatbot-/blob/main/thesis_assistant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Program the ethics module for the thesis assistant project based on the provided structure, focusing on authorship tracking, AI labelling, and human-in-the-loop prompts, orchestrated by an EthicsSupervisor Agent and utilizing the defined submodules (AI_Detector, Usage_Logger, HumanPromptChecker, AdvisorFeedbackSync, EthicalViolationAlert) and interfaces.

## Define the scope of the ethics module

### Subtask:
Clarify what aspects of ethics the module should address within the context of your thesis killer project, focusing on authorship tracking, AI labelling, and human-in-the-loop prompts, orchestrated by an EthicsSupervisor Agent.

**Reasoning**:
Describe the ethical concerns, explain the relation of authorship tracking, AI labelling, and human-in-the-loop prompts, outline the responsibilities of the EthicsSupervisor Agent, and briefly explain the contribution of the submodules, as per the instructions.

# Task
Program the ethics module for your thesis killer project based on the provided description, including the EthicsSupervisor Agent implemented as a Reinforcement Learning monitor, the specified submodules (AI_Detector, Usage_Logger, HumanPromptChecker, AdvisorFeedbackSync, EthicalViolationAlert), and the interfaces for connecting to external tools and logging information.

## Define the scope of the ethics module

### Subtask:
Clarify what aspects of ethics the module should address within the context of your thesis killer project, focusing on authorship tracking, AI labelling, and human-in-the-loop prompts, orchestrated by an EthicsSupervisor Agent implemented as a Reinforcement Learning monitor.

**Reasoning**:
Describe the ethical concerns, explain the relation of authorship tracking, AI labelling, and human-in-the-loop prompts, outline the responsibilities of the EthicsSupervisor Agent, and briefly explain the contribution of the submodules, as per the instructions.

## Identify relevant ethical guidelines or frameworks

### Subtask:
Research and select appropriate ethical principles or frameworks applicable to your project's domain, considering how they relate to the functions of the defined submodules and how these can be translated into states, actions, and reward signals for the RL-based EthicsSupervisor.

**Reasoning**:
Selecting appropriate ethical guidelines is crucial for ensuring the ethics module effectively addresses the challenges identified in the project's presentation. These guidelines will inform the design and implementation of the EthicsSupervisor and its submodules, as well as the definition of states, actions, and reward signals for the Reinforcement Learning approach.

## Design the module's structure

### Subtask:
Outline the components and functionalities of the ethics module, with the EthicsSupervisor Agent implemented as a Reinforcement Learning monitor that has access to agent decisions, user responses, LLM-generated content, timing logs, and human feedback loops. Define the roles and interactions of the submodules (AI_Detector, Usage_Logger, HumanPromptChecker, AdvisorFeedbackSync, and EthicalViolationAlert) and how they will utilize interfaces to connect with AI detector tools, log timestamps, usage intent, and tool confidence, and use rules/classifiers for warnings and suggestions. Design how the information from these submodules and interfaces will be used as state, action, and reward signals for the RL model.

**Reasoning**:
A well-defined structure is essential for implementing a complex module like the ethics module. Clearly outlining the roles and interactions of the EthicsSupervisor, submodules, and interfaces, and specifically designing how information will be used for the RL model, will ensure a cohesive and functional design that addresses the identified ethical challenges.

In [14]:


# Import the OpenAI library
from openai import OpenAI
# Used to securely store your API key - uncomment if using Colab Secrets
from google.colab import userdata

# Get your OpenAI API key securely
# Replace "<YOUR_OPENAI_API_KEY>" with your key, or use Colab Secrets
# Or if using Colab Secrets:
openai_api_key_secure = userdata.get('OPENAI_API_KEY')
openai_organization = userdata.get('OPENAI_ORGANIZATION')
openai_project = userdata.get('OPENAI_PROJECT_ID')

# Set your project API key
OpenAI.api_key = openai_api_key_secure
# You must also set organization and project ID
OpenAI.organization = openai_organization
OpenAI.project = openai_project

# Create the OpenAI client
client = OpenAI(api_key= OpenAI.api_key)



In [16]:
# Create a request to the Chat Completions endpoint
response = client.chat.completions.create(
  # Specify the model
  model="gpt-4o-mini",
  messages=[
    # Assign the correct role
    {"role": "user",
     "content": "Write a polite reply accepting an AI Engineer job offer within 20 words."}]
)

print(response.choices[0].message.content)


Dear [Hiring Manager's Name], 

Thank you for the offer. I am excited to accept the AI Engineer position! 

Best regards,  
[Your Name]


In [17]:
import pandas as pd
import time # Import time for simulating API call delay
from openai import OpenAI # Import the OpenAI library
# Used to securely store your API key
from google.colab import userdata # Uncomment if using Colab Secrets
from scipy.spatial import distance # Assuming scipy is installed

class EthicsModule:
    def __init__(self, openai_client):
        self.usage_logs = []
        self.usage_embeddings = [] # Initialize list to store embeddings
        self.ai_detection_threshold = 0.7 # Simple threshold for AI detection
        # Use the provided OpenAI client
        self.client = openai_client


    def log_usage(self, prompt, intent, thesis_stage="unknown"):
        """Logs the usage of the thesis assistant with more details."""
        log_entry = {
            'timestamp': pd.Timestamp.now(),
            'prompt': prompt,
            'intent': intent,
            'thesis_stage': thesis_stage # Added thesis stage
        }
        self.usage_logs.append(log_entry)
        print(f"Usage logged: Timestamp={log_entry['timestamp']}, Prompt='{prompt}', Intent='{intent}', Thesis Stage='{thesis_stage}'")
        # Optionally generate embedding for the new log entry immediately
        # self._generate_embedding_for_log(log_entry)


    def _generate_embedding_for_log(self, log_entry):
         """Generates embedding for a single log entry and stores it."""
         try:
             prompt_text = log_entry['prompt']
             response = self.client.embeddings.create(
                 model="text-embedding-3-small", # Use the embedding model
                 input=prompt_text
             )
             embedding = response.data[0].embedding
             # Store the embedding along with a reference to the original log index
             self.usage_embeddings.append({'embedding': embedding, 'original_index': len(self.usage_logs) - 1})
             print(f"Generated embedding for log entry {len(self.usage_logs) - 1}")
         except Exception as e:
             print(f"Error generating embedding for log entry: {e}")


    def generate_all_usage_embeddings(self):
        """Generates embeddings for all usage logs using OpenAI API."""
        print("Generating embeddings for all usage logs using OpenAI API...")
        self.usage_embeddings = [] # Clear existing embeddings
        for i, log_entry in enumerate(self.usage_logs):
            try:
                prompt_text = log_entry['prompt']
                response = self.client.embeddings.create(
                    model="text-embedding-3-small", # Use the embedding model
                    input=prompt_text
                )
                embedding = response.data[0].embedding
                # Store the embedding along with a reference to the original log index
                self.usage_embeddings.append({'embedding': embedding, 'original_index': i})
            except Exception as e:
                 print(f"Error generating embedding for log entry {i}: {e}")

        print(f"Generated {len(self.usage_embeddings)} embeddings.")


    def find_similar_usage(self, query_prompt, n=3):
        """
        Finds the n most similar usage logs based on prompt embedding similarity.

        Args:
            query_prompt (str): The prompt to find similar usage for.
            n (int): The number of closest usage logs to find.

        Returns:
            list of dict: A list of dictionaries for the n most similar usage logs,
                          each containing 'distance', 'original_log' (the full log entry).
                          Returns an empty list if no embeddings are available.
        """
        if not self.usage_embeddings:
            print("No usage embeddings available to query.")
            return []

        try:
            # Generate embedding for the query prompt
            query_response = self.client.embeddings.create(
                model="text-embedding-3-small", # Use the embedding model
                input=query_prompt
            )
            query_embedding = query_response.data[0].embedding

            distances = []
            for item in self.usage_embeddings:
                dist = distance.cosine(query_embedding, item['embedding'])
                distances.append({
                    "distance": dist,
                    "original_index": item['original_index']
                    })

            distances_sorted = sorted(distances, key=lambda x: x['distance'])

            # Get the original log entries for the n closest
            similar_logs = []
            for item in distances_sorted[0:n]:
                original_log = self.usage_logs[item['original_index']]
                similar_logs.append({
                    "distance": item['distance'],
                    "original_log": original_log
                })

            return similar_logs

        except Exception as e:
            print(f"Error during similar usage query: {e}")
            return []


    def detect_ai(self, text):
        """Uses the OpenAI LLM to assess if content is AI generated."""
        print("Using OpenAI LLM for AI detection...")
        try:
            # Craft a prompt for the LLM to assess AI generation
            # This prompt might need refinement for better results
            prompt_text = f"Assess the likelihood that the following text was generated by an AI. Respond ONLY with a score between 0 and 1, where 1 is highly likely to be AI generated, followed by a brief explanation on a new line.\n\nText to assess:\n{text}\n\nScore:"

            response = self.client.chat.completions.create(
                model="gpt-4o-mini", # Or another suitable model
                messages=[
                    {"role": "system", "content": "You are an AI text detection assistant."},
                    {"role": "user", "content": prompt_text}
                ],
                max_tokens=50 # Restrict tokens to manage cost
            )

            # Attempt to parse the score from the LLM's response
            response_text = response.choices[0].message.content.strip()
            print(f"LLM Raw Response: {response_text}") # Print raw response for debugging
            try:
                # Assuming the LLM starts the response with the score on the first line
                detection_score = float(response_text.splitlines()[0])
            except (ValueError, IndexError):
                print(f"Could not parse score from LLM response: '{response_text}'. Assuming a default score.")
                detection_score = 0.5 # Default score if parsing fails

            # Simulate some processing time (optional, but good for realism)
            time.sleep(0.5)

            is_ai_generated = detection_score > self.ai_detection_threshold
            print(f"AI detection score from LLM: {detection_score:.2f}. Is likely AI: {is_ai_generated}")
            # Return the AI detection status, score, and potentially the full LLM response
            return is_ai_generated, detection_score, response_text

        except Exception as e:
            print(f"Error during OpenAI LLM AI detection: {e}")
            # Fallback in case of API errors
            return False, 0.0, f"Error: {e}"


    def check_ethical_usage(self, prompt, generated_text):
        """Basic check for ethical usage, combining prompt analysis and AI detection."""
        print("Checking for ethical usage...")

        # Basic Human Prompt Checker logic (simplified)
        prompt_lower = prompt.lower()
        if "write my entire thesis" in prompt_lower or "do my whole thesis" in prompt_lower:
            print("Ethical Alert: Skeptical usage detected (attempting to write entire thesis). Encourage ethical use and own writing.")
        elif "generate abstract" in prompt_lower or "write introduction" in prompt_lower:
             print("Ethical Note: AI used for structural writing. Remember to review and rephrase carefully.")
        elif "analyze this concept" in prompt_lower or "explain this" in prompt_lower:
             print("Ethical Usage: AI used for understanding/analysis. Good practice!")
        else:
            print("Prompt intent: Could be ethical, further analysis needed in a complex model.")


        # Basic Ethical Violation Alert logic (simplified, tied to AI detection and prompt analysis)
        is_ai, score, llm_response = self.detect_ai(generated_text)

        if is_ai and ("write my entire thesis" in prompt_lower or "do my whole thesis" in prompt_lower):
            print("Ethical VIOLATION Alert: High potential for academic dishonesty due to prompt and AI content.")
        elif is_ai:
            print("Ethical Alert: Potential AI-generated content detected. Encourage rephrasing.")

        # Example of checking for over-reliance (very basic) - in a real model, this would look at usage patterns over time
        # This basic check uses the length of usage logs and checks recent prompts for "generate"
        if len(self.usage_logs) > 5 and all("generate" in entry['prompt'].lower() for entry in self.usage_logs[-5:]):
             print("Ethical Alert: Potential over-reliance on AI generation detected. Encourage critical thinking and original writing.")


# Example Usage (after creating and initializing the openai client):
# Make sure the 'client' object is defined from a previous cell
# ethics_module = EthicsModule(openai_client=client)
# ethics_module.log_usage("help me understand this concept", "research", thesis_stage="literature review")
# ethics_module.generate_all_usage_embeddings() # Generate embeddings after logging
# similar_logs = ethics_module.find_similar_usage("find papers on NLP")
# print("\nSimilar Usage Logs:")
# for log in similar_logs:
#      print(f"  - Distance: {log['distance']:.4f}, Prompt: '{log['original_log']['prompt']}'")
# is_ai, score, llm_response = ethics_module.detect_ai("The quick brown fox jumps over the lazy dog.")
# print(f"Detect AI Result: Is AI: {is_ai}, Score: {score:.2f}, LLM Response: {llm_response}")
# is_ai, score, llm_response = ethics_module.detect_ai("As an AI language model, I can help with that.")
# print(f"Detect AI Result: Is AI: {is_ai}, Score: {score:.2f}, LLM Response: {llm_response}")
# ethics_module.check_ethical_usage("write my entire thesis", "Here is a thesis.")
# ethics_module.check_ethical_usage("analyze this concept", "Based on my training data, this concept is...")

In [18]:

# Example Usage (after creating and initializing the openai client):

if 'client' in locals():
    # Initialize the ethics module with the created client
    # Make sure the EthicsModule class is defined in a previous cell
    ethics_module = EthicsModule(openai_client=client)

    print("--- Testing log_usage ---")
    ethics_module.log_usage("Help me find papers on natural language processing", "research", thesis_stage="literature review")
    ethics_module.log_usage("Generate an outline for my introduction", "writing_support", thesis_stage="introduction")
    print("\nCurrent usage logs:")
    display(pd.DataFrame(ethics_module.usage_logs))

    print("\n--- Testing detect_ai ---")
    # Test with human-like text (shorter)
    is_ai_human, score_human, llm_response_human = ethics_module.detect_ai("The quick brown fox jumps over the lazy dog. This is a short sentence.")
    print(f"Test 1 Result: Is AI: {is_ai_human}, Score: {score_human:.2f}, LLM Response: {llm_response_human}")

    # Test with text likely generated by an AI (shorter)
    is_ai_ai, score_ai, llm_response_ai = ethics_module.detect_ai("As an AI language model, I can assist you.")
    print(f"Test 2 Result: Is AI: {is_ai_ai}, Score: {score_ai:.2f}, LLM Response: {llm_response_ai}")

    # Test with some placeholder generated text (shorter)
    is_ai_generated, score_generated, llm_response_generated = ethics_module.detect_ai("Generated text about a topic.")
    print(f"Test 3 Result: Is AI: {is_ai_generated}, Score: {score_generated:.2f}, LLM Response: {llm_response_generated}")


    print("\n--- Testing check_ethical_usage ---")
    # Test with an ethical prompt and seemingly human text (shorter)
    ethics_module.check_ethical_usage("analyze this concept", "Based on my understanding, this concept is complex.")

    # Test with a skeptical prompt and seemingly AI text (shorter)
    ethics_module.check_ethical_usage("write my entire thesis", "Here is a short thesis summary.")

    # Test with an ethical prompt and text likely flagged as AI (shorter)
    ethics_module.check_ethical_usage("explain this theory", "Based on my training data, this theory is interesting.")

    # Test simple over-reliance check (might require more log entries to trigger)
    print("\n--- Testing potential over-reliance check (might need more logs) ---")
    # Add more "generate" prompts to usage logs to potentially trigger over-reliance alert (shorter prompts)
    for _ in range(5):
        ethics_module.log_usage("generate text", "writing_support")
    ethics_module.check_ethical_usage("continue writing", "More text.")

else:
    print("Error: 'client' object not found. Please run the cell to set up the OpenAI client first.")

--- Testing log_usage ---
Usage logged: Timestamp=2025-06-26 09:57:35.111499, Prompt='Help me find papers on natural language processing', Intent='research', Thesis Stage='literature review'
Usage logged: Timestamp=2025-06-26 09:57:35.111808, Prompt='Generate an outline for my introduction', Intent='writing_support', Thesis Stage='introduction'

Current usage logs:


Unnamed: 0,timestamp,prompt,intent,thesis_stage
0,2025-06-26 09:57:35.111499,Help me find papers on natural language proces...,research,literature review
1,2025-06-26 09:57:35.111808,Generate an outline for my introduction,writing_support,introduction



--- Testing detect_ai ---
Using OpenAI LLM for AI detection...
LLM Raw Response: 0.1  
The text is a well-known pangram and is very simple, which makes it likely to be human-generated as it lacks complexity or a distinctive style typical of AI-generated content.
AI detection score from LLM: 0.10. Is likely AI: False
Test 1 Result: Is AI: False, Score: 0.10, LLM Response: 0.1  
The text is a well-known pangram and is very simple, which makes it likely to be human-generated as it lacks complexity or a distinctive style typical of AI-generated content.
Using OpenAI LLM for AI detection...
LLM Raw Response: 0.2  
The phrase is quite generic and could easily be spoken by a human or generated by an AI. However, it lacks the complexity and variety typical of more sophisticated AI outputs.
AI detection score from LLM: 0.20. Is likely AI: False
Test 2 Result: Is AI: False, Score: 0.20, LLM Response: 0.2  
The phrase is quite generic and could easily be spoken by a human or generated by an AI. 


### RL Optimizer
This thesis simulates a reinforcement learning framework for thesis-writing assistance, combining behavior modeling, ethical oversight, advisor interaction dynamics, and noisy writing quality approximations. While true real-world RLHF reward models remain out of reach at this stage, this controlled simulation provides a sandbox to explore adaptive policy learning for human-in-the-loop academic coaching.

## Proposed RL Agent State Structure

This outlines a potential structure for the State that the Reinforcement Learning agent (overseeing project evolution and ethics) would observe. This state combines information from the Ethics Module with broader thesis progress details.


The state would likely be represented as a numerical vector or a structured object that the RL model can process.

**Components of the State:**

1.  **Ethical State Features (from Ethics Module):**
    *   **AI Detection Score:** The score from the `detect_ai` function for the most recent generated content (e.g., a value between 0 and 1).
    *   **Prompt Classification:** A categorical or numerical representation of the last prompt's ethical classification (e.g., 0 for ethical, 1 for structural, 2 for skeptical/dangerous).
    *   **Usage Frequency:** Metrics on recent LLM usage (e.g., number of LLM interactions in the last hour/day, proportion of "generate" prompts).
    *   **Embedding Similarity:** The similarity score from `find_similar_usage` when querying the current prompt against past usage logs (e.g., the distance to the most similar ethical/skeptical past interaction).
    *   **Ethical Alert Status:** Flags indicating if any ethical violations or warnings are currently active (e.g., binary flags for over-reliance alert, academic dishonesty alert).
    *   **Human Engagement:** Metrics on user interaction with previous ethical interventions (e.g., did the user rephrase AI content, did they engage with a reflection prompt).

2.  **Thesis Progress Features:**
    *   **Current Thesis Stage:** A categorical or numerical representation of the current stage of the thesis (e.g., 0 for planning, 1 for literature review, 2 for methodology, 3 for writing, etc.).
    *   **Task Completion:** Percentage of planned tasks completed for the current stage or overall project.
    *   **Time-based Metrics:** Time spent on the project recently, time remaining until deadlines.
    *   **Advisor Feedback Status:** A flag or metric indicating the presence and recency of unaddressed advisor feedback.

3.  **Performance Features:**
    *   **Work Quality Score:** A metric representing the quality of recent thesis work (this would be challenging to define and might require human evaluation or proxy metrics).
    *   **Progress Rate:** A measure of how quickly tasks are being completed or milestones are being reached.

**Combining the State:**

These individual features would be combined into a single state representation that the RL agent's model can process. For a neural network-based RL model, this would typically be a flattened numerical vector. Categorical features would need to be appropriately encoded (e.g., one-hot encoding).

**Next Steps for Implementation (for later):**

*   Define the specific numerical or categorical representation for each state feature.
*   Develop the logic within the thesis assistant to collect and compile this information into the state vector at each time step.
*   Ensure the Ethics Module submodules (Usage_Logger, AI_Detector, etc.) are providing the necessary data points in a format that can be easily integrated into the state.

In [19]:
!pip install streamlit gymnasium stable-baselines3
!pip install numpy # Ensure numpy is installed if not already
!pip install pandas # Ensure pandas is installed if not already
!pip install scipy # Ensure scipy is installed if not already

Collecting streamlit
  Downloading streamlit-1.46.0-py3-none-any.whl.metadata (9.0 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl.metadata (44 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.3/44.3 kB[0m [31m2.6 MB/s[0m eta [36m0:00:00[0m
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.46.0-py3-none-any.whl (10.1 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.1/10.1 MB[0m [31m59.7 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.9/6.9 MB[0m [31m74.2 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl (79 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m79.1/79.1 kB[0m [31m6.8 MB/s[0m eta [36m0:00:00[0m
[?25hI

## Part 1: Configuration Manager

The `RLConfigManager` class is responsible for managing the centralized configuration of the Reinforcement Learning system for the Thesis Assistant's ethics module. This configuration is dynamic, allowing developers to define and modify key aspects of the RL environment and agent behavior without directly altering the core code.

**Purpose:**
The primary purpose of this class is to provide a persistent and easily modifiable way to store the settings that govern the RL agent's learning and decision-making process. This includes defining the observable state variables, the available actions the agent can take, the reward values associated with different events, and how each action influences the state.

**Key Components:**
- `CONFIG_FILE`: A class variable specifying the name of the JSON file used for storing the configuration (`rl_config.json`).
- `load_config()`: A class method that loads the configuration from the `CONFIG_FILE`. If the file does not exist, it creates a default configuration and saves it.
- `save_config(config)`: A class method that saves a given configuration dictionary to the `CONFIG_FILE`.

**How to Use:**
- To get the current configuration, call `RLConfigManager.load_config()`. This will return a dictionary containing the settings.
- To update the configuration, modify the dictionary obtained from `load_config()` and then call `RLConfigManager.save_config(updated_config)`.

**Interaction with Other Components:**
- **Developer Dashboard:** The `DeveloperDashboard` uses `RLConfigManager` to load and save the configuration, allowing developers to interactively modify the settings via a Streamlit interface.
- **Data Preprocessor:** The `DataPreprocessor` uses the configuration (specifically, `state_variables` and `reward_config`) to determine how to convert raw usage logs into state vectors and compute rewards.
- **RL Environment:** The `EthicsSupervisorEnv` is built dynamically based on the configuration loaded by `RLConfigManager`, defining its observation space, action space, and state transition logic (`action_effects`).
- **RL Training Loop and Trainer:** These components implicitly rely on the configuration loaded by `RLConfigManager` via the environment and preprocessor.

**Data Structures and Configuration:**
The configuration is stored in a JSON file (`rl_config.json`) and represented in Python as a dictionary with the following structure:
- "state_variables": A list of strings, where each string is the name of a variable that constitutes the state observed by the RL agent.
- "actions": A list of strings, where each string is a human-readable label for an action the RL agent can take. The index of the action in this list is the action ID used by the RL model.
- "reward_config": A dictionary mapping event names (as found in usage logs) to numerical reward values.
- "action_effects": A nested dictionary defining how actions change state variables. The outer keys are string representations of action indices, and the inner dictionaries map state variable names to the delta value (change) applied to that variable when the action is taken.


In [20]:
# PART 1 — CONFIGURATION MANAGER
# ===========================================================
import json
import os
import gymnasium as gym
import streamlit as st
import numpy as np
import random
from stable_baselines3 import PPO

class RLConfigManager:
    """
    Manages centralized configuration for the dynamic Reinforcement Learning system.

    This class handles loading and saving configuration settings for the RL environment,
    including state variables, action space definitions, reward shaping values,
    and the effects of actions on state variables.
    """

    CONFIG_FILE = "rl_config.json"

    @classmethod
    def load_config(cls):
        """
        Load RL configuration from a JSON file.

        If the configuration file does not exist, a default configuration is created
        and saved to the file.

        Returns:
            config (dict): The loaded configuration dictionary.
        """
        if not os.path.exists(cls.CONFIG_FILE):
            # Define a default configuration if the file is not found
            default_config = {
                "state_variables": ["embedding_drift", "ai_usage", "ethical_flags", "advisor_feedback", "timestep"],
                "actions": ["Allow prompt", "Suggest reflection", "Ethical warning", "Suggest rewriting", "Advisor feedback reminder", "Disable AI feature"],
                "reward_config": {"user_revised": 2, "ai_violation": -3, "advisor_positive": 3, "rewrite_accepted": 1, "milestone_completed": 5, "hallucination_detected": -2},
                "action_effects": {"1": {"ai_usage": -0.05, "embedding_drift": -0.05}, "3": {"ethical_flags": -0.1}, "4": {"advisor_feedback": 0.1}}
            }
            cls.save_config(default_config)
        # Load the configuration from the JSON file
        with open(cls.CONFIG_FILE, "r") as f:
            return json.load(f)

    @classmethod
    def save_config(cls, config):
        """
        Save the current configuration to the JSON file.

        Args:
            config (dict): The configuration dictionary to save.
        """
        with open(cls.CONFIG_FILE, "w") as f:
            json.dump(config, f, indent=2) # Save with indentation for readability


## Part 2: Developer Dashboard

The `DeveloperDashboard` class provides a graphical user interface (GUI) built with Streamlit, allowing developers to interactively configure the Reinforcement Learning system. This dashboard simplifies the process of modifying the state space, action space, reward shaping, and action effects without directly editing the configuration JSON file.

**Purpose:**
The main purpose is to offer a user-friendly way for developers to experiment with and tune the RL agent's behavior and the environment's dynamics. This is crucial during the development and testing phases of the RL ethics supervisor.

**Key Components:**
- `__init__()`: Initializes the dashboard by loading the current configuration using `RLConfigManager.load_config()`.
- `launch()`: The main method to launch the Streamlit application. It sets the title and calls methods to display and edit different parts of the configuration.
- `edit_action_space()`: Displays the current actions and provides an input field and button to add new actions to the configuration.
- `edit_reward_shaping()`: Displays sliders for each item in the `reward_config`, allowing developers to adjust the reward values.
- `edit_action_effects()`: Provides input fields to define how a specific action (identified by index) affects a specific state variable, allowing developers to add these effects to the `action_effects` dictionary in the configuration.
- `save_button()`: Displays a button that, when clicked, saves the currently modified configuration back to the `rl_config.json` file using `RLConfigManager.save_config()`.

**How to Use:**
- To run the dashboard, execute the `launch()` method of an instance of `DeveloperDashboard`. Note that Streamlit applications are typically run from the command line using `streamlit run your_script_name.py`. In a notebook environment, you might need specific integrations or run the relevant cell and access the output via a provided URL.
- Use the input fields, sliders, and buttons in the web interface to modify the configuration settings.
- Click the "Save Full Configuration" button to persist the changes.

**Interaction with Other Components:**
- **Configuration Manager:** The `DeveloperDashboard` directly interacts with `RLConfigManager` to load the initial configuration when launched and to save the updated configuration.
- **RL System Components:** The changes made through the dashboard directly influence how the `DataPreprocessor`, `EthicsSupervisorEnv`, and the RL agent (`EthicsSupervisorRL`) behave when they load the updated configuration.

**Data Structures and Configuration:**
The dashboard works directly with the dictionary structure managed by `RLConfigManager`. Changes made in the UI are reflected in the `self.config` dictionary within the `DeveloperDashboard` instance before being saved.

In [21]:
# PART 2: DEVELOPER DASHBOARD (Class-Based Streamlit Interface with Full Docstrings)
# ===========================================================

class DeveloperDashboard:
    """
    Streamlit-based developer interface for interactively updating RL configuration.

    This dashboard allows developers to view and modify the action space, reward shaping,
    and action effects defined in the RL configuration file.
    """

    def __init__(self):
        """
        Initialize the dashboard by loading the current configuration.
        """
        self.config = RLConfigManager.load_config()

    def launch(self):
        """
        Launch the Streamlit dashboard interface.
        """
        st.title("🎯 Thesis RL Developer Dashboard")
        self.edit_action_space()
        self.edit_reward_shaping()
        self.edit_action_effects()
        self.save_button()

    def edit_action_space(self):
        """
        Display and edit the RL action space.

        Developers can add new high-level intervention actions here.
        """
        st.header("1️⃣ Manage Action Space")
        st.write("Define high-level interventions available to RL agent:")

        st.subheader("Current Actions:")
        # Display current actions with their indices
        for idx, action in enumerate(self.config["actions"]):
            st.write(f"**{idx}:** {action}")

        new_action = st.text_input("Add New Action:")
        if st.button("Add Action"):
            if new_action.strip(): # Ensure the input is not empty or just whitespace
                self.config["actions"].append(new_action.strip())
                st.success(f"✅ Added action: '{new_action}'")

    def edit_reward_shaping(self):
        """
        Display and edit the reward shaping values.

        Developers can adjust the reward values associated with key supervision signals.
        """
        st.header("2️⃣ Edit Reward Shaping")
        st.write("Adjust reward values for key supervision signals:")

        # Create sliders for each reward configuration item
        for key, val in self.config["reward_config"].items():
            new_val = st.slider(f"Reward for {key}:", -10, 10, val)
            self.config["reward_config"][key] = new_val

    def edit_action_effects(self):
        """
        Display and define the effects of actions on state variables.

        Developers can specify how choosing a particular action changes the values
        of specific state variables.
        """
        st.header("3️⃣ Define Action Effects")
        st.write("Specify which state variables are influenced by actions:")

        action_idx = st.text_input("Action Index (integer):", value="1")
        variable_name = st.text_input("State Variable (e.g. ai_usage):")
        delta = st.number_input("Delta Change (+/-):", step=0.01, value=0.0)

        if st.button("Add Effect"):
            # Add the defined effect to the configuration
            effects = self.config.setdefault("action_effects", {}) # Get or create action_effects dictionary
            action_effect = effects.setdefault(str(action_idx), {}) # Get or create effects for the specific action index
            action_effect[variable_name] = delta
            st.success(f"✅ Effect added: Action {action_idx} → {variable_name} += {delta}")

    def save_button(self):
        """
        Provide a save button to persist updated configuration back to disk.
        """
        if st.button("Save Full Configuration"):
            RLConfigManager.save_config(self.config)
            st.success("✅ All changes saved successfully!")


## Part 3: Data Preprocessor

The `DataPreprocessor` class is a crucial component of the RL training pipeline. Its role is to bridge the gap between the raw usage logs generated by the thesis assistant and the structured input required by the Reinforcement Learning environment and agent.

**Purpose:**
The main purpose is to transform detailed log entries, which capture various events and state information from a user's interaction with the assistant, into a standardized numerical state vector that the RL agent can observe and a corresponding reward signal based on the events in the log.

**Key Components:**
- `__init__(config)`: Initializes the preprocessor with the current RL configuration, which includes definitions of state variables and reward mapping.
- `extract_state(log_entry)`: Takes a single log entry (a dictionary) and converts it into a normalized NumPy array representing the state vector. It selects the relevant information from the log entry based on the `state_variables` defined in the configuration.
- `compute_reward(log_entry)`: Calculates the reward associated with a log entry. It looks for specific event keys within the log entry (as defined in the `reward_config` in the configuration) and sums up the corresponding reward values.

**How to Use:**
- Create an instance of `DataPreprocessor`, passing the loaded RL configuration: `preprocessor = DataPreprocessor(config)`.
- For each raw usage log entry (a dictionary), call `preprocessor.extract_state(log_entry)` to get the state vector and `preprocessor.compute_reward(log_entry)` to get the reward.

**Interaction with Other Components:**
- **Configuration Manager:** The `DataPreprocessor` relies heavily on the configuration loaded by `RLConfigManager` to know which log attributes correspond to state variables and how to map events to reward values.
- **RL Training Loop:** The `RLTrainingLoop` uses the `DataPreprocessor` to process batches of logs before feeding the resulting state and reward information (or using it to update the environment's state and compute rewards in a more interactive simulation) to the RL agent for training.
- **Simulators (Synthetic Cohort/Student):** The simulator classes generate log entries that are in a format expected by the `DataPreprocessor`.

**Data Structures and Configuration:**
- It processes input in the form of dictionaries (log entries).
- It outputs a NumPy array for the state and a float for the reward.
- It uses the `state_variables` and `reward_config` from the RL configuration dictionary to perform the conversion and computation. The keys in `reward_config` are expected to potentially appear as boolean flags or other relevant values in the input `log_entry` dictionaries.

In [22]:
# PART 3 — DATA PREPROCESSOR (LOG TO STATE CONVERSION)
# ===========================================================

class DataPreprocessor:
    """
    Converts raw usage logs into RL state vectors and reward labels.

    This class is responsible for transforming the detailed log entries
    from the thesis assistant's usage into a format (state vectors and rewards)
    that the Reinforcement Learning agent can understand and use for training.
    """

    def __init__(self, config):
        """
        Initialize with the current RL configuration structure.

        Args:
            config (dict): Loaded RL configuration dictionary.
        """
        self.config = config

    def extract_state(self, log_entry):
        """
        Convert a single log entry into an RL state vector.

        The state vector is constructed based on the state variables defined
        in the loaded configuration. Values are normalized where appropriate.

        Args:
            log_entry (dict): A single usage log entry.

        Returns:
            state (np.ndarray): The normalized RL state vector.
        """
        state = []
        # Iterate through state variables defined in the config
        for var in self.config["state_variables"]:
            if var == "timestep":
                # For timestep, use deadline_ratio from the log entry and normalize (assuming max timestep is 100 for normalization)
                state.append(log_entry.get("deadline_ratio", 0.0))
            else:
                # For other variables, get the value directly from the log entry (defaulting to 0.0 if not present)
                value = log_entry.get(var, 0.0)
                # Ensure the value is a float for consistency
                state.append(float(value))
        return np.array(state, dtype=np.float32) # Ensure float32 dtype for compatibility with RL libraries


    def compute_reward(self, log_entry):
        """
        Compute the shaped reward for a given log event.

        The reward is calculated based on the 'reward_config' in the loaded
        configuration and the presence of specific keys (representing events)
        in the log entry.

        Args:
            log_entry (dict): A single usage log entry.

        Returns:
            reward (float): The computed reward value.
        """
        reward = 0.0
        # Iterate through the reward configuration items
        for key, value in self.config["reward_config"].items():
            # Check if the key exists in the log entry and its value is True (for boolean events)
            # This assumes the reward_config keys correspond to boolean flags in the log entry
            if key in log_entry and log_entry.get(key) is True:
                reward += value
        return reward

## Part 4: RL Environment (EthicsSupervisorEnv)

The `EthicsSupervisorEnv` class is a custom Reinforcement Learning environment built using the Gymnasium library. It is designed to simulate the interaction between the Thesis Assistant's ethics module and a student's thesis writing process, allowing an RL agent to learn optimal intervention policies. A key feature is its dynamic nature, where the state space, action space, rewards, and state transitions are defined by the loaded configuration.

**Purpose:**
To provide a simulated environment where the RL agent (the Ethics Supervisor) can learn through trial and error. The environment presents states representing the current situation (ethical flags, AI usage, progress, etc.) and provides feedback (rewards) based on the agent's chosen actions and the resulting changes in the simulated student's state.

**Key Components:**
- `__init__(ethics_module, config)`: Initializes the environment. It takes a simulated or actual `ethics_module` object (which holds the state variables) and the RL configuration. It dynamically defines the `observation_space` and `action_space` based on the configuration.
- `reset(seed=None)`: Resets the environment to an initial state at the beginning of a new simulation episode. It also resets the internal timestep counter.
- `step(action)`: Executes one step in the environment based on the `action` taken by the RL agent. It computes a reward, applies the effects of the action to the state (using `_apply_action_effects`), increments the timestep, and determines if the episode is done.
- `_get_state()`: Constructs the current state vector observed by the agent. It gathers the values of the state variables from the `ethics_module` object based on the configuration and normalizes them.
- `_compute_reward(action)`: Calculates the reward signal for the current step. The provided implementation is a placeholder that considers the timestep and a simple cost related to the action index. In a more complete system, this would incorporate ethical outcomes, user feedback, advisor input, etc.
- `_apply_action_effects(action)`: Updates the state variables in the `ethics_module` object based on the effects defined for the taken action in the configuration.

**How to Use:**
- Initialize the environment: `env = EthicsSupervisorEnv(mock_ethics_module_instance, config)`. The `mock_ethics_module_instance` should be an object (like `MockEthicsModule`) that holds the current values of the state variables defined in the config.
- Call `env.reset()` to start a new episode.
- In a training or inference loop, get an action from the RL agent and call `env.step(action)` to advance the simulation. The `step` method returns the next state, reward, and episode status.

**Interaction with Other Components:**
- **Mock Ethics Module:** The environment directly reads state variable values from and writes updated values to an instance of `MockEthicsModule` (or a similar object representing the system state).
- **Configuration Manager:** The environment's fundamental structure (state and action spaces, action effects) is determined by the configuration loaded by `RLConfigManager`.
- **PPO Supervisor (RL Agent):** The `EthicsSupervisorRL` class interacts with the `EthicsSupervisorEnv` to train the PPO model by calling `reset()` and `step()` and receiving state and reward information.
- **Data Preprocessor:** While not directly used by the environment during a `step`, the state variables and reward structure defined in the configuration used by the environment are consistent with what the `DataPreprocessor` expects when processing raw logs.

**Data Structures and Configuration:**
- The environment's state is represented as a NumPy array (`observation_space`).
- Actions are discrete integers (`action_space`).
- It uses the `state_variables`, `actions`, and `action_effects` dictionaries from the RL configuration to define its behavior.
- The `_compute_reward` method is a placeholder and would ideally use the `reward_config` from the configuration and events from the `ethics_module` state.

In [23]:
# PART 4: RL ENVIRONMENT (Fully Dynamic Gym-Compatible Environment)
# ===========================================================
#
# This module defines the RL environment the PPO agent interacts with.
#
# - Fully config-driven:
#   - The state space (variables used)
#   - The action space (interventions)
#   - The reward function
#   - The state update effects per action
#
# -----------------------------------------------------------
# ✅ WHY FULLY DYNAMIC STATE?
# -----------------------------------------------------------
# - Allows easy expansion of system complexity.
# - Developers can add new state features via config without touching any code.
# - Keeps RL model compatible with evolving assistant behavior.
#
# -----------------------------------------------------------
# ✅ KEY CONCEPTS (NOW FULLY DYNAMIC):
# -----------------------------------------------------------
# - State Variables: loaded from `state_variables` in config
# - Action Effects: loaded from `action_effects` in config
# - Observation Space: dynamically computed based on config
# ===========================================================

class EthicsSupervisorEnv(gym.Env):
    """
    Fully dynamic Gym-compatible RL environment for the Thesis Assistant Ethics Supervisor.

    This environment simulates the state of a student's thesis progress and ethical
    interactions, allowing an RL agent to learn intervention policies. The environment's
    structure (state space, action space, rewards, and transitions) is dynamically
    defined by the provided configuration dictionary.

    Attributes:
        ethics_module (MockEthicsModule): A simulated or actual system state object
                                           that holds the current values of the state variables.
        config (dict): The loaded RL configuration dictionary.
        state_variables (list): List of state variable names defined in the config.
        actions (list): List of available action labels defined in the config.
        action_effects (dict): Dictionary specifying how actions affect state variables (from config).
        observation_space (gym.spaces.Box): Dynamically computed continuous state space.
        action_space (gym.spaces.Discrete): Discrete action space based on the number of actions.
        timestep (int): The current simulation step count within an episode.
    """

    def __init__(self, ethics_module, config):
        """
        Initialize the environment.

        Args:
            ethics_module (MockEthicsModule): An external system state simulator
                                               (or actual system interface).
            config (dict): The full RL configuration dictionary.
        """
        super().__init__()

        self.ethics_module = ethics_module
        self.config = config

        self.state_variables = config["state_variables"]
        self.actions = config["actions"]
        self.action_effects = config.get("action_effects", {})

        # Fully dynamic observation space size based on the number of state variables
        self.observation_space = gym.spaces.Box(
            low=0, high=1, shape=(len(self.state_variables),), dtype=np.float32
        )

        # Discrete action space based on the number of defined actions
        self.action_space = gym.spaces.Discrete(len(self.actions))
        self.timestep = 0

    def reset(self, seed=None):
        """
        Reset the environment at the start of a new episode.

        Args:
            seed (int, optional): Seed for random number generation. Defaults to None.

        Returns:
            tuple: A tuple containing the initial state and an info dictionary.
                   (state (np.array), info (dict))
        """
        super().reset(seed=seed) # Call the parent reset method
        self.timestep = 0
        # Reset the ethics module state for a new episode (assuming ethics_module has a reset method)
        if hasattr(self.ethics_module, 'reset'):
             self.ethics_module.reset()
        return self._get_state(), {} # Return the initial state and an empty info dict

    def step(self, action):
        """
        Execute one interaction step in the environment.

        The agent selects an action, which may affect the environment's state.
        A reward is computed, and the environment transitions to the next state.

        Args:
            action (int): The action index selected by the RL agent.

        Returns:
            tuple: A tuple containing the next state, the reward, a 'done' flag,
                   a 'truncated' flag, and an info dictionary.
                   (state (np.array), reward (float), done (bool), truncated (bool), info (dict))
        """
        # Compute the reward for the chosen action
        reward = self._compute_reward(action)
        # Apply the effects of the action to the environment's state
        self._apply_action_effects(action)
        self.timestep += 1
        # Determine if the episode is finished (e.g., based on timestep limit)
        done = (self.timestep >= 100) # Example termination condition
        return self._get_state(), reward, done, False, {} # truncated is False, info is empty dict

    def _get_state(self):
        """
        Construct the normalized state vector fully dynamically.

        This method gathers the current values of the state variables from the
        `ethics_module` object based on the configuration and formats them
        into a normalized NumPy array.

        Returns:
            np.array: The normalized state vector based on config-defined variables.
        """
        state = []
        # Iterate through the state variables defined in the configuration
        for var in self.state_variables:
            if var == "timestep":
                # Normalize timestep (assuming maximum 100 steps for normalization)
                state.append(self.timestep / 100.0)
            else:
                # Get the value from the ethics_module object, defaulting to 0.0 if the attribute doesn't exist
                value = getattr(self.ethics_module, var, 0.0)
                state.append(value)
        return np.array(state, dtype=np.float32) # Ensure float32 dtype for compatibility with RL libraries

    def _compute_reward(self, action):
        """
        Compute the reward for the chosen action.

        This is a placeholder reward function. In a real system, this would
        be more complex, potentially incorporating feedback from the user,
        advisor, and ethical violation signals.

        Args:
            action (int): The selected action index.

        Returns:
            float: The computed reward value.
        """
        base_reward = 1.0
        # Scale reward based on timestep (encouraging progress over time)
        time_scaling = 1 + 2.0 * min(1.0, self.timestep / 80.0)
        # Simple API cost based on action index (assuming higher indices are more "costly" actions)
        api_cost = 0.002 * action
        lambda_cost = 10 # Weight for the API cost
        return base_reward * time_scaling - lambda_cost * api_cost

    def _apply_action_effects(self, action):
        """
        Dynamically apply the effects of the selected action to the system state.

        Based on the 'action_effects' defined in the configuration, this method
        updates the corresponding state variables in the `ethics_module` object.

        Args:
            action (int): The selected action index.
        """
        # Get the effects defined for the chosen action (if any)
        effects = self.action_effects.get(str(action), {})
        # Iterate through the variables and their corresponding delta changes for this action
        for variable, delta in effects.items():
            current_value = getattr(self.ethics_module, variable, None)
            if current_value is not None:
                # Update the state variable, clipping the value between 0.0 and 1.0
                updated_value = np.clip(current_value + delta, 0.0, 1.0)
                setattr(self.ethics_module, variable, updated_value)


**Reasoning**:
The previous command failed because I am still incorrectly using `code_block` for markdown content. I will create a markdown cell to document the PPO Supervisor.



## Part 5: PPO Supervisor (EthicsSupervisorRL)

The `EthicsSupervisorRL` class is responsible for managing the Proximal Policy Optimization (PPO) agent, which serves as the core Reinforcement Learning component of the ethics module. This class handles the initialization, training, saving, loading, and action recommendation (inference) for the PPO model.

**Purpose:**
To implement and control the RL agent that learns an optimal policy for intervening in the thesis writing process to promote ethical behavior and positive outcomes. It trains the agent by interacting with the `EthicsSupervisorEnv` and provides action recommendations based on the current state.

**Key Components:**
- `__init__(config, model_path="ethics_rl_model")`: Initializes the PPO supervisor. It creates an instance of the `MockEthicsModule` (to represent the system state), initializes the `EthicsSupervisorEnv` using the provided configuration, and either loads a pre-trained PPO model from `model_path` or initializes a new PPO model if no saved model is found.
- `train(timesteps=50000)`: Trains the PPO model for a specified number of environment interaction steps. It calls the `learn()` method of the Stable Baselines3 PPO model, which handles the data collection (interacting with the environment), policy optimization, and value function updates. After training, it saves the updated model.
- `recommend_action()`: Takes the current state from the environment, feeds it to the trained PPO model's policy, and returns the recommended action index. This is the method used during online operation to get the agent's decision.

**How to Use:**
- Initialize the supervisor: `rl_supervisor = EthicsSupervisorRL(config, model_path="my_model")`. This will either load an existing model or create a new one.
- To train the model, call `rl_supervisor.train(timesteps=100000)`.
- To get an action recommendation based on the current environment state, call `action = rl_supervisor.recommend_action()`.

**Interaction with Other Components:**
- **Configuration Manager:** The supervisor uses the configuration (loaded indirectly via the environment initialization) to define the RL problem (state/action spaces).
- **RL Environment:** The supervisor interacts directly with the `EthicsSupervisorEnv` during training (calling `env.step()`) and inference (getting the state via `env._get_state()`).
- **Mock Ethics Module:** The supervisor initializes an instance of `MockEthicsModule` and passes it to the environment. The environment then uses this object to manage the state.
- **RL Training Loop:** The `RLTrainingLoop` uses the `EthicsSupervisorRL` instance to perform the actual training (`trainer.train()`) and potentially get action recommendations (`trainer.recommend_action()`) within its training orchestration.
- **Synthetic Pretrainer:** The `SyntheticRLPretrainer` uses the `RLTrainingLoop`, which in turn uses `EthicsSupervisorRL`, to train the model on synthetic data.

**Data Structures and Configuration:**
- It manages a `stable_baselines3.PPO` model.
- It interacts with the environment using NumPy arrays for states and integers for actions.
- The PPO model's policy and value function are learned from the experience collected by interacting with the `EthicsSupervisorEnv`, which is configured using the dictionary loaded by `RLConfigManager`.


===========================================================

APPENDIX — PPO SUITABILITY ANALYSIS FOR THESIS ASSISTANT

===========================================================

Analysis: Strengths and Limitations of PPO for Thesis Assistant RL System

✅ PROS (Why PPO is suitable globally):

Stable policy optimization even in high-dimensional state spaces.

Supports multiple complex actions (advisory interventions, ethical warnings, etc).

Optimizes long-term reward (handles delayed ethical consequences).

Clipping mechanism stabilizes policy updates (critical for safe ethical behavior).

Can be trained globally across many users for a general ethical baseline.


⚠ CONS (Limitations for personalization scenario):

Requires many training samples to fully converge (sample inefficient).

Slow adaptation when applied directly to new individual students.

May not personalize fast enough during limited thesis timeframe (6-12 months).

Potential difficulty adapting to individual personality shifts quickly.

PPO only indirectly receives feedback via reward — few-shot adaptation is hard.


✅ RECOMMENDED STRATEGY:

Use PPO for global pretraining across many students (shared ethical policy).

Introduce a lightweight per-student adaptation layer (small fine-tuning component).

Combine PPO with human-in-the-loop reward shaping for rapid personalization.

Consider hybrid architecture with PPO + bandits or meta-RL elements for few-shot adjustments.


This hybrid design balances PPO’s global stability with efficient short-term personalization needs of the thesis assistant. """
"""

In [50]:
# PART 5: PPO SUPERVISOR (Reinforcement Learning Agent Controller)
# ===========================================================
#
# This module manages the PPO RL agent training, saving, loading, and inference.
#
# - Clean separation of agent control logic from environment definition.
# - Compatible with stable-baselines3 PPO implementation.
# - Supports continual training and model persistence.
#
# -----------------------------------------------------------
# ✅ KEY CONCEPTS:
# -----------------------------------------------------------
# - PPO (Proximal Policy Optimization): modern stable RL algorithm
# - Continual training: keep refining policy incrementally
# - Safe reloading: easily resume training from saved checkpoints
# ===========================================================

class MockEthicsModule:
    """
    A mock class simulating the state of the ethics module and thesis progress.

    This class is used by the RL environment to represent the system state.
    It includes attributes that correspond to the state variables defined
    in the RL configuration.
    """
    def __init__(self):
        """
        Initialize the mock ethics module with random initial states.
        """
        # Initialize attributes that the environment might try to access
        self.embedding_drift = np.random.rand()
        self.ai_usage = np.random.rand()
        self.ethical_flags = np.random.rand()
        self.advisor_feedback = np.random.rand()
        self.deadline_ratio = np.random.rand() # Represents progress towards deadline (0.0 to 1.0)
        self.emotion_state = np.random.rand() # Added emotion_state attribute
        # Add attributes required by DataPreprocessor.extract_state (used for reward calculation)
        self.user_revised = random.choice([True, False])
        self.ai_violation = random.choice([True, False])
        self.advisor_positive = random.choice([True, False])
        self.rewrite_accepted = random.choice([True, False])
        self.milestone_completed = random.choice([True, False])
        self.hallucination_detected = random.choice([True, False])
        self.prompt = "mock prompt" # Placeholder for prompt text
        self.intent = "mock intent" # Placeholder for intent
        self.thesis_stage = "mock stage" # Placeholder for thesis stage

    def reset(self):
        """
        Reset the mock ethics module state for a new episode.
        """
        self.embedding_drift = np.random.rand()
        self.ai_usage = np.random.rand()
        self.ethical_flags = np.random.rand()
        self.advisor_feedback = np.random.rand()
        self.deadline_ratio = np.random.rand()
        self.emotion_state = np.random.rand() # Added emotion_state attribute to reset
        self.user_revised = random.choice([True, False])
        self.ai_violation = random.choice([True, False])
        self.advisor_positive = random.choice([True, False])
        self.rewrite_accepted = random.choice([True, False])
        self.milestone_completed = random.choice([True, False])
        self.hallucination_detected = random.choice([True, False])
        self.prompt = "mock prompt"
        self.intent = "mock intent"
        self.thesis_stage = "mock stage"


class EthicsSupervisorRL:
    """
    PPO Supervisor class controlling RL training and inference for the Ethics Supervisor.

    This class initializes, trains, saves, and loads the PPO model that acts
     as the RL agent for the ethics module. It interacts with the
    `EthicsSupervisorEnv` to learn the optimal intervention policy.

    Attributes:
        ethics_module (MockEthicsModule): Simulated system state object.
        env (EthicsSupervisorEnv): The RL environment instance.
        model (PPO): The Stable Baselines3 PPO policy model.
        model_path (str): The file path for saving and loading the PPO model.
    """

    def __init__(self, config, model_path="ethics_rl_model"):
        """
        Initialize the PPO agent.

        Args:
            config (dict): The RL configuration dictionary.
            model_path (str): The storage path for saving/loading the PPO model.
                              Defaults to "ethics_rl_model".
        """

        self.ethics_module = MockEthicsModule() # Initialize the mock ethics module
        self.env = EthicsSupervisorEnv(self.ethics_module, config) # Initialize the RL environment
        self.model_path = model_path

        # Load an existing model if available, otherwise initialize a new one
        if os.path.exists(model_path + ".zip"):
            self.model = PPO.load(model_path, env=self.env)
            print("Loaded pretrained RL model.")
        else:
            # Initialize a new PPO model with an MlpPolicy
            self.model = PPO("MlpPolicy", self.env, verbose=0) # verbose=0 to suppress training output
            print("Initialized new PPO model.")

    def train(self, timesteps=50000):
        """
        Train the PPO model for a specified number of timesteps.

        The model interacts with the environment to collect experience and update
        its policy.

        Args:
            timesteps (int): The total number of environment steps to train for.
                             Defaults to 50000.
        """
        self.model.learn(total_timesteps=timesteps)
        self.model.save(self.model_path) # Save the model after training
        print("Training complete and model saved.")

    def recommend_action(self):
        """
        Predict the next action based on the current state of the environment.

        This method uses the trained PPO policy to select an action given the
        current observation from the environment.

        Returns:
            int: The action index selected by the PPO policy.
        """
        state = self.env._get_state() # Get the current state from the environment
        # Predict the action using the trained model in deterministic mode
        action, _ = self.model.predict(state, deterministic=True)
        # The action is returned as a NumPy array, so extract the scalar value
        return int(action)

## Part 6: Continual RL Training Loop

The `RLTrainingLoop` class is designed to orchestrate the process of continually training the Reinforcement Learning agent using batches of usage logs. It acts as a bridge between the raw data (logs) and the RL training process managed by the `EthicsSupervisorRL`.

**Purpose:**
To facilitate the training of the RL agent using collected usage data. It takes batches of logs, processes them into states and rewards using the `DataPreprocessor`, and then triggers the training of the PPO agent managed by the `EthicsSupervisorRL`. This allows for updating the agent's policy as new data becomes available.

**Key Components:**
- `__init__(config, model_path="ppo_ethics_model")`: Initializes the training loop. It loads the configuration, initializes a `DataPreprocessor` instance, and initializes an `EthicsSupervisorRL` instance (which handles the PPO model).
- `run_training_day(log_batch)`: The main method for processing a batch of logs. It iterates through each log entry in the `log_batch`, uses the `DataPreprocessor` to extract the state and compute the reward, prints this information, and then calls the `train()` method of the `EthicsSupervisorRL` instance to update the PPO model using the processed data. It also demonstrates getting a recommended action after training.

**How to Use:**
- Initialize the training loop: `training_loop = RLTrainingLoop(config)`.
- Provide a batch of usage logs (a list of dictionaries) to the `run_training_day()` method: `training_loop.run_training_day(my_log_batch)`.

**Interaction with Other Components:**
- **Configuration Manager:** The training loop loads the configuration via `RLConfigManager` during its initialization.
- **Data Preprocessor:** It uses the `DataPreprocessor` instance to convert raw log entries into state vectors and reward values.
- **PPO Supervisor (RL Agent):** It uses the `EthicsSupervisorRL` instance to perform the actual training of the PPO model using the processed log data.
- **Synthetic Pretrainer and RL Training Launcher:** These classes utilize the `RLTrainingLoop` to execute the training process, providing it with batches of logs (either synthetic or real).

**Data Structures and Configuration:**
- It takes a list of dictionaries (log entries) as input for training.
- It uses the `DataPreprocessor` to work with state vectors (NumPy arrays) and reward values (floats).
- The training process itself is managed by the `EthicsSupervisorRL` class, which interacts with the `EthicsSupervisorEnv` based on the configuration.

**Reasoning**:
The previous command failed because I am still incorrectly using `code_block` for markdown content. I will create a markdown cell to document the Continual RL Training Loop.



In [25]:

# PART 6 — CONTINUAL RL TRAINING LOOP
# ===========================================================

class RLTrainingLoop:
    """
    Orchestrates the continual RL training process using batches of usage logs.

    This class manages the flow of data from usage logs to the RL training process.
    It uses the `DataPreprocessor` to convert logs into states and rewards and
    the `EthicsSupervisorRL` to train the PPO agent.
    """

    def __init__(self, config, model_path="ppo_ethics_model"):
        """
        Initialize training loop components.

        Args:
            config (dict): The loaded RL configuration.
            model_path (str): The path to the PPO model storage. Defaults to "ppo_ethics_model".
        """
        self.config = RLConfigManager.load_config() # Load configuration
        self.preprocessor = DataPreprocessor(self.config) # Initialize data preprocessor
        # Initialize the RL trainer, passing the config and model path
        self.trainer = EthicsSupervisorRL(self.config, model_path)

    def run_training_day(self, log_batch):
        """
        Process one batch of usage logs and train the PPO agent.

        This method iterates through the provided log batch, processes each log
        into a state and reward, and then uses these to train the RL agent.
        Note: In a true online RL setting, training would occur more frequently
        and interactively with the environment. This simulates batch training
        from collected logs.

        Args:
            log_batch (list of dict): A list of usage logs for one training day or batch.
        """
        print(f"Processing batch of {len(log_batch)} logs for training...")
        # In a real RL loop, you would accumulate experiences (state, action, reward, next_state, done)
        # from interacting with the environment, and then train the agent on these experiences.
        # For this simulated training loop, we will process each log entry and call trainer.train
        # with a small number of timesteps based on the batch size. This is a simplification
        # and not a standard RL training loop, but demonstrates the integration points.
        for log_entry in log_batch:
            state_vector = self.preprocessor.extract_state(log_entry)
            reward = self.preprocessor.compute_reward(log_entry)
            # In a real scenario, you would step the environment with an action and get the next state and reward
            # For this simulated training loop, we'll just print the processed info
            print(f"Processed Log → State: {state_vector}, Reward: {reward}")

        # The training method in EthicsSupervisorRL is named 'train'
        # In a real RL loop, you would accumulate experiences and then train
        # For this simulation, we'll call train after processing the batch, using the batch size as timesteps
        # A more realistic approach would involve running episodes in the environment
        # and training on the collected trajectories.
        self.trainer.train(timesteps=len(log_batch)) # Train for the number of logs in the batch
        # The prediction method in EthicsSupervisorRL is named 'recommend_action'
        # This is just an example of how to use the trained model after training the batch
        action = self.trainer.recommend_action()
        print(f"Recommended action after training on batch: {action}")


## Part 7: Synthetic Thesis Student Simulator (ThesisStudentSimulator)

The `ThesisStudentSimulator` class provides a way to generate synthetic data that mimics the progression and ethical interactions of a thesis student using the assistant. This simulator is essential for generating datasets to pre-train and test the Reinforcement Learning ethics supervisor in a controlled environment.

**Purpose:**
To create realistic (though simplified) sequences of events and state changes that a thesis student might experience over the course of their project. This synthetic data includes simulated metrics like AI usage, ethical flags, advisor feedback, and progress towards the deadline, which are used to train the RL agent.

**Key Components:**
- `__init__(student_type="Stable Performer")`: Initializes the simulator for a specific type of student (e.g., "Conservative", "Aggressive", "Struggling", "Stable Performer"). Each student type has different tendencies influencing how their state variables evolve.
- `evolve_one_step()`: Simulates one step in the student's thesis journey. It updates the state variables based on the student type and progress towards the deadline and generates a dictionary representing a single log entry for this step.
- `grade_final_outcome(last_state)`: A static method that calculates a final grade and reward for a simulated thesis trajectory based on the state of the simulator at the end of the trajectory. This provides a terminal reward signal for the simulation.
- `generate_full_trajectory_with_grading(student_type, trajectory_length=30)`: A static method that runs a full simulation trajectory for a specified student type and length, collecting all log entries and returning the list of logs along with the final grade and terminal reward.

**How to Use:**
- To simulate a single student's journey, create an instance: `student_sim = ThesisStudentSimulator("Struggling")` and repeatedly call `student_sim.evolve_one_step()` to get step-by-step logs.
- To generate a complete trajectory and its grading, call the static method: `logs, grade, reward = ThesisStudentSimulator.generate_full_trajectory_with_grading("Aggressive", trajectory_length=50)`.

**Interaction with Other Components:**
- **Data Preprocessor:** The `evolve_one_step()` method generates log entries in a dictionary format that is compatible with the input expected by the `DataPreprocessor`.
- **Thesis Cohort Simulator:** The `ThesisCohortSimulator` uses the `generate_full_trajectory_with_grading` method to create datasets for multiple students.
- **Synthetic RL Pretrainer:** The `SyntheticRLPretrainer` utilizes the data generated by the simulator (via the `ThesisCohortSimulator`) to train the RL agent.

**Data Structures and Configuration:**
- The simulator maintains internal state variables (e.g., `ai_usage`, `ethical_flags`) as numerical values, typically floats between 0.0 and 1.0.
- It generates output as dictionaries, where each dictionary represents a single usage log entry containing the state variables and simulated event flags.
- The behavior is influenced by the `student_type` string.

In [26]:
# PART 7 — SYNTHETIC THESIS STUDENT SIMULATOR WITH OUTCOME GRADING
# ===========================================================

class ThesisStudentSimulator:
    """
    Simulates the progress and ethical behavior of a synthetic thesis student.

    This class generates synthetic usage logs and simulates changes in state
    variables (like AI usage, ethical flags, etc.) over time, based on
    predefined student types. It also includes a method to grade the final
    outcome of a simulated thesis trajectory.
    """
    def __init__(self, student_type="Stable Performer"):
        """
        Initialize the simulator for a specific type of student.

        Args:
            student_type (str): The type of student to simulate
                                 ("Conservative", "Aggressive", "Struggling",
                                  "Stable Performer"). Defaults to "Stable Performer".
        """
        self.student_type = student_type
        # Initialize state variables
        self.embedding_drift = 0.2
        self.ai_usage = 0.3
        self.ethical_flags = 0.05
        self.advisor_feedback = 0.6
        self.deadline_ratio = 0.0 # Represents progress towards deadline (0.0 to 1.0)

    def evolve_one_step(self):
        """
        Simulate one step of the student's thesis progress and generate a log entry.

        State variables evolve based on the student type and time progression.

        Returns:
            dict: A dictionary representing a single usage log entry with updated state.
        """
        # Simulate progress towards the deadline
        self.deadline_ratio = min(self.deadline_ratio + 0.03, 1.0) # Increment deadline ratio

        # Simulate changes in state variables based on student type
        if self.student_type == "Conservative":
            self.ai_usage += np.random.normal(0.01, 0.02)
            self.ethical_flags += np.random.normal(0.0, 0.01)
            self.advisor_feedback += np.random.normal(0.02, 0.05)
            self.embedding_drift += np.random.normal(0.01, 0.02)
        elif self.student_type == "Aggressive":
            self.ai_usage += np.random.normal(0.05, 0.05)
            self.ethical_flags += np.random.normal(0.02, 0.03)
            self.advisor_feedback += np.random.normal(-0.02, 0.05)
            self.embedding_drift += np.random.normal(0.03, 0.05)
        elif self.student_type == "Struggling":
            self.ai_usage += np.random.normal(0.03, 0.03)
            self.ethical_flags += np.random.normal(0.05, 0.05)
            self.advisor_feedback += np.random.normal(-0.03, 0.05)
            self.embedding_drift += np.random.normal(0.04, 0.05)
        elif self.student_type == "Stable Performer":
            self.ai_usage += np.random.normal(0.02, 0.02)
            self.ethical_flags += np.random.normal(0.01, 0.01)
            self.advisor_feedback += np.random.normal(0.03, 0.04)
            self.embedding_drift += np.random.normal(0.02, 0.02)

        # Increase ethical flags towards the end of the project (simulating pressure)
        if self.deadline_ratio > 0.8:
            self.ethical_flags += 0.02

        # Simulate some convergence towards target values as the deadline approaches
        convergence_factor = self.deadline_ratio
        self.ai_usage += (0.5 - self.ai_usage) * 0.1 * convergence_factor
        self.ethical_flags += (0.1 - self.ethical_flags) * 0.1 * convergence_factor
        self.advisor_feedback += (0.8 - self.advisor_feedback) * 0.1 * convergence_factor
        self.embedding_drift += (0.3 - self.embedding_drift) * 0.05 * convergence_factor

        # Clip state variables to remain within the [0, 1] range
        self.ai_usage = np.clip(self.ai_usage, 0, 1.0)
        self.ethical_flags = np.clip(self.ethical_flags, 0, 1.0)
        self.advisor_feedback = np.clip(self.advisor_feedback, 0, 1.0)
        self.embedding_drift = np.clip(self.embedding_drift, 0, 1.0)

        # Create a log entry with the current state and some simulated events (for reward calculation)
        log_entry = {
            "embedding_drift": self.embedding_drift,
            "ai_usage": self.ai_usage,
            "ethical_flags": self.ethical_flags,
            "advisor_feedback": self.advisor_feedback,
            "deadline_ratio": self.deadline_ratio,
            # Simulate boolean event flags based on probabilities or state
            "user_revised": random.random() < 0.6, # Probability of user revising content
            "ai_violation": random.random() < self.ethical_flags, # Higher ethical flags increase chance of violation
            "advisor_positive": random.random() < self.advisor_feedback, # Higher feedback increases chance of positive advisor event
            "rewrite_accepted": random.random() < 0.7, # Probability of rewrite suggestion being accepted
            "milestone_completed": random.random() < 0.4, # Probability of completing a milestone
            "hallucination_detected": random.random() < 0.1 # Probability of detecting a hallucination
        }
        return log_entry

    @staticmethod
    def grade_final_outcome(last_state):
        """
        Grades the final outcome of a simulated thesis based on the last state.

        This is a simplified grading function for simulation purposes.

        Args:
            last_state (dict): The final state of the simulator after a trajectory.

        Returns:
            tuple: A tuple containing the grade ("Excellent", "Acceptable", "Failed")
                   and a corresponding numerical reward.
        """
        # Calculate penalties and bonuses based on the final state values
        ai_penalty = (last_state["ai_usage"] - 0.5) * 0.5 # Penalty if AI usage is high relative to 0.5
        ethics_penalty = last_state["ethical_flags"] * 1.5 # Penalty for ethical flags
        advisor_bonus = last_state["advisor_feedback"] * 2.0 # Bonus for positive advisor feedback
        embedding_penalty = last_state["embedding_drift"] * 0.3 # Penalty for high embedding drift
        # Calculate total score
        total_score = advisor_bonus - ethics_penalty - ai_penalty - embedding_penalty

        # Assign grade and reward based on the total score
        if total_score > 1.2:
            return "Excellent", 5.0
        elif total_score > 0:
            return "Acceptable", 2.0
        else:
            return "Failed", -5.0

    @staticmethod
    def generate_full_trajectory_with_grading(student_type, trajectory_length=30):
        """
        Generates a full simulated thesis trajectory for a student and grades it.

        Args:
            student_type (str): The type of student to simulate.
            trajectory_length (int): The number of steps in the simulation trajectory.
                                     Defaults to 30.

        Returns:
            tuple: A tuple containing:
                   - logs (list of dict): The list of log entries generated during the trajectory.
                   - grade (str): The final grade ("Excellent", "Acceptable", "Failed").
                   - reward (float): The terminal reward associated with the final grade.
        """
        student = ThesisStudentSimulator(student_type)
        logs = []
        # Evolve the student's state for the specified trajectory length
        for _ in range(trajectory_length):
            logs.append(student.evolve_one_step())
        # Grade the final outcome based on the last state in the trajectory
        grade, terminal_reward = ThesisStudentSimulator.grade_final_outcome(logs[-1])
        return logs, grade, terminal_reward


## Part 8: Multi-Student Synthetic Cohort Generator (ThesisCohortSimulator)

The `ThesisCohortSimulator` class is designed to generate a dataset of simulated thesis trajectories for an entire cohort of diverse synthetic students. This aggregated dataset is crucial for the initial pre-training of the Reinforcement Learning ethics supervisor, providing a broad range of scenarios and behaviors.

**Purpose:**
To efficiently create a large, varied dataset of synthetic student interactions and outcomes. This dataset is used to train the RL agent to develop a generalizable ethical policy across different student types before any potential per-student fine-tuning.

**Key Components:**
- `STUDENT_TYPES`: A class attribute list defining the different types of students that can be simulated ("Conservative", "Aggressive", "Struggling", "Stable Performer").
- `generate_cohort_dataset(num_students=100, trajectory_length=30)`: A static method that is the primary function of this class. It generates the specified number of synthetic students, each with a randomly assigned type, runs a full simulation trajectory for each using the `ThesisStudentSimulator`, and collects all the resulting logs, final grades, and terminal rewards into a single dataset. It also prints a summary of the final grades distribution within the generated cohort.

**How to Use:**
- To generate a dataset for a cohort of 100 students with trajectories of 30 steps each, simply call the static method: `cohort_data = ThesisCohortSimulator.generate_cohort_dataset(num_students=100, trajectory_length=30)`. The returned `cohort_data` is a list where each element is a dictionary containing the student type, their full trajectory of logs, their final grade, and the terminal reward.

**Interaction with Other Components:**
- **Thesis Student Simulator:** The `ThesisCohortSimulator` relies heavily on the `ThesisStudentSimulator.generate_full_trajectory_with_grading` static method to produce individual student trajectories and outcomes.
- **Synthetic RL Pretrainer:** The `SyntheticRLPretrainer` class uses the `generate_cohort_dataset` method to obtain the large pool of synthetic logs required for pre-training the RL agent. It then flattens the trajectories from this dataset into a single list of logs for the training process.
- **RLTrainingLoop:** Although not directly interacted with by this class, the dataset generated here is ultimately fed into the `RLTrainingLoop` by the `SyntheticRLPretrainer`.

**Data Structures and Configuration:**
- The class uses the predefined `STUDENT_TYPES` list.
- The output is a list of dictionaries, each representing a simulated student with their complete `trajectory` (a list of log dictionaries), their `grade` (string), and `final_reward` (float).


In [27]:
# ===========================================================
# PART 8 — MULTI-STUDENT SYNTHETIC COHORT GENERATOR
# ===========================================================

class ThesisCohortSimulator:
    """
    Generates a dataset of simulated thesis trajectories for a cohort of students.

    This class uses the `ThesisStudentSimulator` to create trajectories for
    multiple students of different types, providing a dataset for training
    and evaluating the RL agent.
    """
    STUDENT_TYPES = ["Conservative", "Aggressive", "Struggling", "Stable Performer"]

    @staticmethod
    def generate_cohort_dataset(num_students=100, trajectory_length=30):
        """
        Generates a dataset of simulated thesis trajectories for a cohort.

        Args:
            num_students (int): The number of students to simulate. Defaults to 100.
            trajectory_length (int): The number of steps in each student's trajectory.
                                     Defaults to 30.

        Returns:
            list of dict: A list of dictionaries, where each dictionary represents
                          a student and contains their trajectory, final grade,
                          and terminal reward.
        """
        dataset = []
        grade_summary = {"Excellent": 0, "Acceptable": 0, "Failed": 0}
        # Generate trajectories for the specified number of students
        for _ in range(num_students):
            # Randomly select a student type
            student_type = random.choice(ThesisCohortSimulator.STUDENT_TYPES)
            # Generate a full trajectory and grade for the student
            logs, grade, terminal_reward = ThesisStudentSimulator.generate_full_trajectory_with_grading(
                student_type, trajectory_length)
            dataset.append({
                "student_type": student_type,
                "trajectory": logs,
                "grade": grade,
                "final_reward": terminal_reward
            })
            grade_summary[grade] += 1 # Count the grades for summary

        # Print a summary of the generated cohort grades
        print("Cohort Generation Complete:")
        for grade, count in grade_summary.items():
            print(f"  {grade}: {count} students")
        return dataset



## Part 9: Synthetic RL Pretraining Pipeline (SyntheticRLPretrainer)

The `SyntheticRLPretrainer` class orchestrates the process of pre-training the Reinforcement Learning ethics supervisor using a large synthetic dataset generated by the `ThesisCohortSimulator`. This step is typically done before applying the RL agent to real student data to provide it with an initial understanding of the environment and ethical considerations.

**Purpose:**
To automate the generation of a comprehensive synthetic dataset and use it to train the `EthicsSupervisorRL` agent, establishing a foundational policy for ethical guidance.

**Key Components:**
- `__init__(config, model_path="ppo_ethics_model")`: Initializes the pretrainer. It takes the RL configuration and the desired model path as input, and internally initializes an `RLTrainingLoop` instance, which in turn manages the `EthicsSupervisorRL` agent.
- `run_synthetic_pretraining(num_students=100, trajectory_length=30)`: The main method to trigger the pretraining process. It first calls the `ThesisCohortSimulator.generate_cohort_dataset` method to get the synthetic data, then flattens the trajectories from all students into a single list of logs, and finally passes this aggregated list of logs to the `RLTrainingLoop.run_training_day` method to train the RL agent.

**How to Use:**
- To run the synthetic pretraining with default settings (100 students, 30 steps per trajectory), after initializing the `RLTrainingLauncher` (which initializes the `SyntheticRLPretrainer`), you would call `launcher.run_synthetic_full_pretraining()`. If you are using the `SyntheticRLPretrainer` directly, you would initialize it with a config and then call `pretrainer.run_synthetic_pretraining(num_students=200, trajectory_length=40)` to specify different parameters.

**Interaction with Other Components:**
- **RLConfigManager:** The pretrainer is initialized with the configuration loaded by `RLConfigManager`, which is then passed down to the `RLTrainingLoop` and `EthicsSupervisorRL`.
- **ThesisCohortSimulator:** It directly calls the `ThesisCohortSimulator.generate_cohort_dataset` static method to obtain the synthetic training data.
- **RLTrainingLoop:** It uses an instance of `RLTrainingLoop` to handle the actual process of feeding logs to the `DataPreprocessor` and training the `EthicsSupervisorRL` agent on this data.
- **EthicsSupervisorRL:** The training of the PPO agent is managed by the `RLTrainingLoop` instance held within the pretrainer.

**Data Structures and Configuration:**
- It works with the list of dictionaries returned by the `ThesisCohortSimulator`, processing the `trajectory` lists within that dataset.
- The training parameters (like the number of students and trajectory length) are passed as arguments to the `run_synthetic_pretraining` method.


In [28]:
# ===========================================================
# PART 9 — SYNTHETIC RL PRETRAINING PIPELINE
# ===========================================================

class SyntheticRLPretrainer:
    """
    Manages the pretraining of the RL agent using synthetic thesis student data.

    This class uses the `ThesisCohortSimulator` to generate a large dataset
    of synthetic logs and then trains the RL agent (`EthicsSupervisorRL`)
    using this data via the `RLTrainingLoop`.
    """
    def __init__(self, config, model_path="ppo_ethics_model"):
        """
        Initialize the synthetic pretrainer.

        Args:
            config (dict): The loaded RL configuration.
            model_path (str): The path to the PPO model storage. Defaults to "ppo_ethics_model".
        """
        self.config = config
        # Initialize the training loop with the config and model path
        self.training_loop = RLTrainingLoop(config, model_path)

    def run_synthetic_pretraining(self, num_students=100, trajectory_length=30):
        """
        Runs the full synthetic pretraining pipeline.

        Generates a synthetic cohort dataset and trains the RL agent on the
        collected trajectories.

        Args:
            num_students (int): The number of synthetic students to generate data for.
                                Defaults to 100.
            trajectory_length (int): The length of each student's trajectory.
                                     Defaults to 30.
        """
        print("\nGenerating synthetic cohort dataset for pretraining...")
        # Generate the synthetic dataset
        dataset = ThesisCohortSimulator.generate_cohort_dataset(num_students, trajectory_length)
        # Flatten the trajectories from all students into a single list of logs
        all_logs = []
        for student in dataset:
            all_logs.extend(student["trajectory"])

        print(f"Total synthetic logs for PPO training: {len(all_logs)}")
        # Run the training loop on the collected synthetic logs
        self.training_loop.run_training_day(all_logs)



## Part 10: Final Launcher (RLTrainingLauncher)

The `RLTrainingLauncher` class serves as the main entry point and master system launcher for the entire thesis RL assistant training and configuration system. It brings together all the previously defined components and provides different modes of operation for development, simulation, and training with real or synthetic data.

**Purpose:**
To provide a single interface for initializing the RL system components, launching the developer dashboard, running simulated training, handling real data training, managing online incremental updates, and executing the full synthetic pretraining pipeline.

**Key Components:**
- `__init__()`: Initializes the launcher by loading the RL configuration using `RLConfigManager`, initializing the `DataPreprocessor`, creating an instance of the `RLTrainingLoop` (which includes the `EthicsSupervisorRL` agent), and initializing the `SyntheticRLPretrainer`.
- `launch_dashboard()`: Launches the Streamlit-based `DeveloperDashboard` for interactive configuration of the RL system.
- `run_simulated_training(days=3, batch_size=5)`: Runs a step-by-step simulation of training using a `MockSimulator` to generate small batches of logs over several simulated "days". This mode is useful for basic testing and debugging of the training loop.
- `run_real_training(real_logs)`: Takes a list of actual collected usage logs (`real_logs`) and feeds them into the `RLTrainingLoop` for training the RL agent on real-world data.
- `run_online_incremental_training(incremental_logs)`: Designed for online learning scenarios. It takes a list of newly collected `incremental_logs` and uses the `RLTrainingLoop` to fine-tune the existing RL model with this new data.
- `run_synthetic_full_pretraining(num_students=100, trajectory_length=30)`: Triggers the full synthetic pretraining pipeline by calling the `run_synthetic_pretraining` method of the `SyntheticRLPretrainer` instance. This generates a large synthetic dataset and trains the RL agent on it.

**How to Use:**
- Instantiate the launcher: `launcher = RLTrainingLauncher()`.
- Select a mode of operation based on user input or script logic:
    - `launcher.launch_dashboard()`: To start the configuration dashboard (requires Streamlit).
    - `launcher.run_simulated_training(days=5, batch_size=10)`: To run a short simulated training session.
    - `launcher.run_real_training(my_real_logs)`: To train with your collected real logs.
    - `launcher.run_online_incremental_training(new_logs)`: To perform incremental online updates with new data.
    - `launcher.run_synthetic_full_pretraining(num_students=200, trajectory_length=50)`: To run the comprehensive synthetic pretraining.
- The `if __name__ == "__main__":` block provides a command-line-like interface to select the mode when the script is run directly.

**Interaction with Other Components:**
- **RLConfigManager:** Used during initialization to load the system configuration.
- **DataPreprocessor:** An instance is held and used by the `RLTrainingLoop` for processing logs.
- **RLTrainingLoop:** An instance is held and used by the launcher to perform training with both simulated and real/incremental logs.
- **SyntheticRLPretrainer:** An instance is held and used to execute the full synthetic pretraining pipeline.
- **DeveloperDashboard:** An instance is created and launched when the 'dashboard' mode is selected.
- **MockSimulator:** An internal mock class used specifically by `run_simulated_training` to generate synthetic logs for that mode.

**Data Structures and Configuration:**
- Relies on the dictionary structure of the RL configuration loaded by `RLConfigManager`.
- Processes lists of log dictionaries, as generated by the simulators or collected from real usage.
- Uses numerical parameters (like `days`, `batch_size`, `num_students`, `trajectory_length`) to control the simulation and training processes.


In [29]:
# ===========================================================
# PART 10 — FINAL LAUNCHER
# ===========================================================

class RLTrainingLauncher:
    """
    The main entry point and master system launcher for the thesis RL assistant training.

    This class orchestrates the different training modes (dashboard, simulated,
    real data, online, synthetic full pretraining) and initializes the
    necessary components (`RLConfigManager`, `DataPreprocessor`,
    `RLTrainingLoop`, `SyntheticRLPretrainer`).
    """
    def __init__(self):
        """
        Initialize the launcher by loading configuration and components.
        """
        # Load the RL configuration
        self.config = RLConfigManager.load_config()
        # Initialize the data preprocessor
        self.preprocessor = DataPreprocessor(self.config)
        # Initialize the RL training loop (this also initializes the RL agent)
        self.training_loop = RLTrainingLoop(self.config)
        # Initialize the synthetic pretrainer
        self.pretrainer = SyntheticRLPretrainer(self.config)

    def launch_dashboard(self):
        """
        Launch the Streamlit-based developer interface for configuring the RL system.
        """
        dashboard = DeveloperDashboard() # Initialize the DeveloperDashboard
        dashboard.launch() # Launch the dashboard

    def run_simulated_training(self, days=3, batch_size=5):
        """
        Simulate RL model training using synthetic logs generated step-by-step.

        Args:
            days (int): Number of training days to simulate. Defaults to 3.
            batch_size (int): Number of logs to generate per day. Defaults to 5.
        """
        print("\nRunning Simulated Step-by-Step Training...")
        class MockSimulator:
            """
            A mock simulator to generate synthetic log batches for step-by-step training.
            """
            def generate_batch(self, batch_size):
                """
                Generates a batch of mock log entries.

                Args:
                    batch_size (int): The number of logs to generate in the batch.

                Returns:
                    list of dict: A list of mock log entries.
                """
                print("Generating mock log batch...")
                mock_logs = []
                for _ in range(batch_size):
                    # Generate placeholder data structured to match DataPreprocessor expectations
                    mock_logs.append({
                        "embedding_drift": np.random.rand(),
                        "ai_usage": np.random.rand(),
                        "ethical_flags": np.random.rand(),
                        "advisor_feedback": np.random.rand(),
                        "deadline_ratio": np.random.rand(),
                        "user_revised": random.random() < 0.6,
                        "ai_violation": random.random() < 0.1,
                        "advisor_positive": random.random() < 0.8,
                        "rewrite_accepted": random.random() < 0.7,
                        "milestone_completed": random.random() < 0.4,
                        "hallucination_detected": random.random() < 0.1,
                        "prompt": "mock prompt",
                        "intent": "mock intent",
                        "thesis_stage": "mock stage"
                    })
                return mock_logs

        self.simulator = MockSimulator() # Initialize the mock simulator

        # Run simulation for the specified number of days
        for day in range(days):
            print(f"\nSimulated Day {day + 1}")
            logs = self.simulator.generate_batch(batch_size=batch_size)
            # Run the training loop on the generated batch of logs
            self.training_loop.run_training_day(logs)

    def run_real_training(self, real_logs):
        """
        Train the RL model using real usage logs.

        Args:
            real_logs (list of dict): Collected real usage logs to use in training.
        """
        print("\nTraining with Real Logs...")
        # Run the training loop on the provided real logs
        self.training_loop.run_training_day(real_logs)

    def run_online_incremental_training(self, incremental_logs):
        """
        Run online incremental updates using newly gathered data.

        This method simulates receiving new logs incrementally and using them
        to fine-tune the already trained RL model.

        Args:
            incremental_logs (list of dict): New usage logs collected for fine-tuning.
        """
        print("\nIncremental Online Training...")
        # Run the training loop on the incremental logs for fine-tuning
        self.training_loop.run_training_day(incremental_logs)

    def run_synthetic_full_pretraining(self, num_students=100, trajectory_length=30):
        """
        Runs the full synthetic pretraining pipeline using a cohort simulator.

        Args:
            num_students (int): The number of synthetic students to generate data for.
                                Defaults to 100.
            trajectory_length (int): The length of each student's trajectory.
                                     Defaults to 30.
        """
        print("\nRunning Full Synthetic PPO Pretraining...")
        # Use the pretrainer component to run the synthetic pretraining
        self.pretrainer.run_synthetic_pretraining(num_students, trajectory_length)


if __name__ == "__main__":
    launcher = RLTrainingLauncher() # Initialize the main launcher
    print("RL Training System Entry Point")
    print("Modes: [dashboard] [train_simulated] [train_real] [train_online] [train_synthetic_full]")
    # Use a default mode for automated execution, or keep input() for interactive use
    mode = "train_synthetic_full" # Set a default mode for testing
    # mode = input("Mode: ").strip() # Uncomment for interactive mode

    # Execute the selected mode
    if mode == "dashboard":
         # DeveloperDashboard is defined in this cell, so it can be launched directly.
         # Note: Running Streamlit in a standard Jupyter cell might require specific setup
         # or will just print a message indicating how to launch it externally.
         launcher.launch_dashboard()
    elif mode == "train_simulated":
        launcher.run_simulated_training()
    elif mode == "train_real":
        print("Load your real usage logs into 'real_logs' and call launcher.run_real_training(real_logs)")
        # Example usage (commented out):
        # real_logs = [...] # Load your real logs here
        # launcher.run_real_training(real_logs)
    elif mode == "train_online":
        print("Load new incremental logs into 'incremental_logs' and call launcher.run_online_incremental_training(incremental_logs)")
        # Example usage (commented out):
        # incremental_logs = [...] # Load your new logs here
        # launcher.run_online_incremental_training(incremental_logs)
    elif mode == "train_synthetic_full":
        launcher.run_synthetic_full_pretraining()
    else:
        print("Invalid mode selected.")

Initialized new PPO model.
Initialized new PPO model.
RL Training System Entry Point
Modes: [dashboard] [train_simulated] [train_real] [train_online] [train_synthetic_full]

Running Full Synthetic PPO Pretraining...

Generating synthetic cohort dataset for pretraining...
Cohort Generation Complete:
  Excellent: 35 students
  Acceptable: 22 students
  Failed: 43 students
Total synthetic logs for PPO training: 3000
Processing batch of 3000 logs for training...
Processed Log → State: [0.20184743 0.34422487 0.0121243  0.58310103 0.03      ], Reward: 3.0
Processed Log → State: [0.24922457 0.36928666 0.         0.56264746 0.06      ], Reward: 1.0
Processed Log → State: [0.3032931  0.35751632 0.08334048 0.496303   0.09      ], Reward: 1.0
Processed Log → State: [0.32839128 0.35030112 0.13932703 0.4532886  0.12      ], Reward: 6.0
Processed Log → State: [0.39316162 0.37485874 0.1427046  0.31569722 0.15      ], Reward: 1.0
Processed Log → State: [0.4506792  0.49342668 0.19132361 0.33509338 0.18


## Module Structure Outline and Integration

This outlines the proposed structure for the new modules (Idea Brainstorming, Writing Support, Emotion Support) and their integration with the existing Ethics Module, including the potential role of a central orchestrator.

### 1. New Modules: Classes and Key Methods

**`IdeaBrainstorming` Class:**

*   **Purpose:** Assists the user in generating and exploring ideas related to their thesis topic.
*   **Key Methods:**
    *   `generate_ideas(prompt: str, thesis_stage: str)`: Takes a user prompt and thesis stage as input, uses an LLM to generate a list of potential ideas, and returns them.
    *   `explore_idea(idea: str)`: Takes a specific idea and provides further details, related concepts, or potential research questions.
    *   `refine_ideas(ideas: list, feedback: str)`: Takes a list of generated ideas and user feedback to refine or generate variations.

**`WritingSupport` Class:**

*   **Purpose:** Provides assistance with writing tasks, including outlining, drafting, and refining text.
*   **Key Methods:**
    *   `create_outline(topic: str, thesis_stage: str)`: Generates a structural outline for a given topic and thesis stage.
    *   `draft_section(outline: dict, context: str)`: Drafts a section of the thesis based on an outline and provided context.
    *   `summarize_text(text: str)`: Provides a summary of a given text.
    *   `rephrase_text(text: str, style: str)`: Rewrites a given text in a different style or to improve clarity.
    *   `check_grammar_style(text: str)`: Provides feedback on grammar, spelling, and writing style.

**`EmotionSupport` Class:**

*   **Purpose:** Offers emotional encouragement and support to the user based on their interactions and detected sentiment.
*   **Key Methods:**
    *   `detect_emotion(user_input: str)`: Analyzes user input to detect underlying emotions or sentiment (e.g., frustrated, motivated, overwhelmed).
    *   `provide_encouragement(detected_emotion: str, thesis_stage: str)`: Provides tailored encouraging messages based on the detected emotion and thesis stage.
    *   `suggest_break()`: Suggests taking a break based on usage patterns or detected frustration.
    *   `celebrate_milestone(milestone: str)`: Provides positive reinforcement upon completion of a thesis milestone.

### 2. Interactions Between New Modules

*   **`IdeaBrainstorming` -> `WritingSupport`:** Ideas generated in `IdeaBrainstorming` could be used as input for `WritingSupport` methods like `create_outline` or `draft_section`.
*   **`WritingSupport` -> `EmotionSupport`:** If `WritingSupport` detects difficulty or frustration in user input or repeated requests for rephrasing due to low confidence, it could signal the `EmotionSupport` module.
*   **`EmotionSupport` -> `WritingSupport`:** The `EmotionSupport` module could provide feedback or suggestions to `WritingSupport` to adjust its tone or complexity based on the user's emotional state. For example, if the user is overwhelmed, `WritingSupport` might simplify suggestions or break down tasks into smaller steps.

### 3. Interactions with the Existing `EthicsModule`

Each new module will interact with the `EthicsModule` to ensure ethical usage and track activity.

*   **`IdeaBrainstorming` <-> `EthicsModule`:**
    *   **Sending to EthicsModule:** Send the user's initial prompt, the generated ideas, and any user feedback on those ideas for logging (`Usage_Logger`) and potential ethical checks (e.g., checking if the prompt is asking for unethical brainstorming).
    *   **Receiving from EthicsModule:** Receive ethical warnings or suggestions related to the brainstorming process (e.g., if the prompt is too close to existing work, suggesting exploring significantly different angles).

*   **`WritingSupport` <-> `EthicsModule`:**
    *   **Sending to EthicsModule:** Send the user's input text (for rephrasing, summarization, etc.), the generated outlines, drafted sections, and any edited text from the user. This is crucial for authorship tracking (`Usage_Logger`), AI detection (`AI_Detector`), and potential plagiarism checks (via interfaces).
    *   **Receiving from EthicsModule:** Receive AI detection scores, potential plagiarism alerts, suggestions for rephrasing to increase originality, and warnings about potential academic dishonesty based on prompt analysis (`HumanPromptChecker`, `EthicalViolationAlert`). The Ethics Module might also recommend logging specific usage patterns.

*   **`EmotionSupport` <-> `EthicsModule`:**
    *   **Sending to EthicsModule:** Send the detected emotion and the type of emotional support provided for logging (`Usage_Logger`). This can help the Ethics Module understand the user's state in relation to potentially unethical behavior (e.g., frustration leading to cutting corners).
    *   **Receiving from EthicsModule:** The Ethics Module might use the emotional state as a factor in its RL state and decision-making. For example, if a user is highly frustrated, the Ethics Module might recommend a less confrontational intervention or suggest a break (`EthicalViolationAlert` could potentially incorporate this).

### 4. Central Orchestrator/Manager Class (`ThesisAssistant`)

A central orchestrator class, potentially named `ThesisAssistant`, would be beneficial to manage the flow of information and coordinate the interactions between all modules, including the existing and new ones.

**`ThesisAssistant` Class:**

*   **Purpose:** Acts as the main controller, receiving user input, directing it to the appropriate module(s), managing the state of the thesis project, and ensuring ethical oversight through interaction with the `EthicsModule`.
*   **Responsibilities:**
    *   Initialize all modules (`IdeaBrainstorming`, `WritingSupport`, `EmotionSupport`, `EthicsModule`, etc.).
    *   Receive user input and determine which module(s) should process it.
    *   Pass relevant information between modules (e.g., generated ideas from `IdeaBrainstorming` to `WritingSupport`).
    *   Interact with the `EthicsModule` for ethical checks, logging, and receiving intervention recommendations from the RL agent.
    *   Manage the overall state of the user's thesis project (e.g., current stage, progress).
    *   Present output from the modules to the user.
    *   Potentially incorporate the RL agent's recommended action from the `EthicsSupervisorRL` (via the `EthicsModule`) to influence module behavior or provide direct interventions.
*   **Interactions:**
    *   `ThesisAssistant` <-> `IdeaBrainstorming`
    *   `ThesisAssistant` <-> `WritingSupport`
    *   `ThesisAssistant` <-> `EmotionSupport`
    *   `ThesisAssistant` <-> `EthicsModule` (This is a crucial link. The `ThesisAssistant` would query the `EthicsModule` for ethical status and potential interventions before and after processing user requests with the other modules).

This structure provides a clear separation of concerns for each module while allowing for necessary interaction and central control for ethical oversight. The `EthicsModule` acts as a cross-cutting concern, influencing and being informed by the activities of all user-facing modules.

## Implement the idea brainstorming module

### Subtask:
Write the Python code for this module, including functionalities for generating ideas based on user input and thesis context.


**Reasoning**:
Implement the `IdeaBrainstorming` class as described in the instructions, including the `__init__`, `generate_ideas`, `explore_idea`, and `refine_ideas` methods, using the OpenAI client for interactions with a language model.



In [30]:
class IdeaBrainstorming:
    """
    Assists the user in generating and exploring ideas related to their thesis topic.
    """

    def __init__(self, openai_client):
        """
        Initializes the IdeaBrainstorming module with an OpenAI client.

        Args:
            openai_client: An initialized OpenAI client object.
        """
        self.client = openai_client
        self.model = "gpt-4o-mini" # Or another suitable model for idea generation

    def generate_ideas(self, prompt: str, thesis_stage: str, n_ideas: int = 5):
        """
        Generates a list of relevant ideas based on the user prompt and thesis stage.

        Args:
            prompt (str): The user's input prompt or topic.
            thesis_stage (str): The current stage of the thesis (e.g., "planning", "literature review").
            n_ideas (int): The number of ideas to generate. Defaults to 5.

        Returns:
            list: A list of generated ideas (strings).
        """
        print(f"Generating {n_ideas} ideas for prompt: '{prompt}' at thesis stage: '{thesis_stage}'")
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": f"You are a creative thesis idea brainstorming assistant. Generate {n_ideas} distinct and relevant ideas based on the user's prompt and their current thesis stage: {thesis_stage}."},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=300, # Adjust max_tokens as needed
                n=1 # We only need one completion containing the list of ideas
            )
            # Assuming the model returns a list of ideas, perhaps separated by newlines or numbers
            ideas_text = response.choices[0].message.content.strip()
            # Basic parsing: split by newline and filter empty lines
            ideas = [idea.strip() for idea in ideas_text.split('\n') if idea.strip()]
            print(f"Generated ideas: {ideas}")
            return ideas

        except Exception as e:
            print(f"Error generating ideas: {e}")
            return []

    def explore_idea(self, idea: str, depth: int = 2):
        """
        Generates further details, related concepts, or potential research questions for a given idea.

        Args:
            idea (str): The specific idea to explore.
            depth (int): A parameter to influence the level of detail (e.g., 1 for brief, 2 for more detail). Defaults to 2.

        Returns:
            str: Detailed information or related concepts about the idea.
        """
        print(f"Exploring idea: '{idea}' with depth {depth}")
        try:
            prompt_text = f"Explore the following thesis idea in detail (depth level {depth}). Provide related concepts, potential research questions, and possible angles:\n\nIdea: {idea}"

            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a thesis idea exploration assistant. Provide detailed insights and related concepts for a given idea."},
                    {"role": "user", "content": prompt_text}
                ],
                max_tokens=500 # Adjust max_tokens as needed
            )
            exploration_details = response.choices[0].message.content.strip()
            print(f"Exploration details for '{idea}': {exploration_details[:100]}...") # Print snippet
            return exploration_details

        except Exception as e:
            print(f"Error exploring idea: {e}")
            return "Could not explore idea."

    def refine_ideas(self, ideas: list, feedback: str):
        """
        Refines existing ideas or generates new variations based on user feedback.

        Args:
            ideas (list): A list of existing ideas (strings).
            feedback (str): User feedback on the ideas.

        Returns:
            list: A list of refined or new ideas (strings).
        """
        print(f"Refining ideas based on feedback: '{feedback}'")
        try:
            ideas_list_text = "\n".join([f"- {idea}" for idea in ideas])
            prompt_text = f"Refine the following thesis ideas based on the user's feedback, or generate new variations that address the feedback. Provide a list of refined or new ideas.\n\nExisting Ideas:\n{ideas_list_text}\n\nUser Feedback: {feedback}"

            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a thesis idea refinement assistant. Improve existing ideas or generate new ones based on user feedback."},
                    {"role": "user", "content": prompt_text}
                ],
                max_tokens=400, # Adjust max_tokens as needed
                n=1
            )
            refined_ideas_text = response.choices[0].message.content.strip()
            # Basic parsing: split by newline and filter empty lines
            refined_ideas = [idea.strip() for idea in refined_ideas_text.split('\n') if idea.strip()]
            print(f"Refined ideas: {refined_ideas}")
            return refined_ideas

        except Exception as e:
            print(f"Error refining ideas: {e}")
            return []

# Example Usage (requires 'client' object from a previous cell)
# if 'client' in locals():
#     idea_brainstorming_module = IdeaBrainstorming(openai_client=client)
#     generated_ideas = idea_brainstorming_module.generate_ideas("Impact of AI on education", "planning", n_ideas=3)
#     if generated_ideas:
#         exploration = idea_brainstorming_module.explore_idea(generated_ideas[0])
#         print(f"\nExploration of first idea:\n{exploration}")
#         refined = idea_brainstorming_module.refine_ideas(generated_ideas, "These are too broad, focus on K-12.")
#         print(f"\nRefined ideas:\n{refined}")
# else:
#     print("OpenAI client not initialized. Please run the cell setting up the client.")

**Reasoning**:
Test the implemented `IdeaBrainstorming` class by creating an instance and calling its methods with example inputs to verify their functionality.



In [31]:
# Example Usage (requires 'client' object from a previous cell)
if 'client' in locals():
    print("--- Testing IdeaBrainstorming Module ---")
    idea_brainstorming_module = IdeaBrainstorming(openai_client=client)

    print("\n--- Testing generate_ideas ---")
    generated_ideas = idea_brainstorming_module.generate_ideas("Impact of climate change on coastal cities", "literature review", n_ideas=3)
    print(f"\nGenerated Ideas Result: {generated_ideas}")

    if generated_ideas:
        print("\n--- Testing explore_idea ---")
        exploration = idea_brainstorming_module.explore_idea(generated_ideas[0], depth=1)
        print(f"\nExploration of first idea:\n{exploration}")

        print("\n--- Testing refine_ideas ---")
        feedback = "These ideas are good, but focus more on adaptation strategies rather than just impacts."
        refined_ideas = idea_brainstorming_module.refine_ideas(generated_ideas, feedback)
        print(f"\nRefined Ideas Result:\n{refined_ideas}")
    else:
        print("\nNo ideas were generated, skipping explore_idea and refine_ideas tests.")

else:
    print("OpenAI client not initialized. Please run the cell setting up the client.")

--- Testing IdeaBrainstorming Module ---

--- Testing generate_ideas ---
Generating 3 ideas for prompt: 'Impact of climate change on coastal cities' at thesis stage: 'literature review'
Generated ideas: ['Here are three distinct thesis ideas focusing on the impact of climate change on coastal cities that can be developed further in your literature review stage:', '1. **Social Vulnerability and Resilience in Coastal Cities**: Explore how climate change disproportionately affects marginalized communities in coastal cities. The literature review could examine socio-economic factors, historical inequities, and adaptive capacities that influence resilience. It could draw on case studies of specific cities, highlighting how local policies and community initiatives address (or fail to address) these disparities in the face of rising sea levels and extreme weather events.', '2. **Urban Planning and Sustainable Development**: Investigate the role of urban planning in mitigating the impacts of c

**Reasoning**:
The previous test failed because the OpenAI client object was not available. I need to ensure the client is initialized before attempting to use it. I will re-run the cell that initializes the OpenAI client and then re-run the test code.



In [36]:
# Import the OpenAI library
from openai import OpenAI
# Used to securely store your API key - uncomment if using Colab Secrets
from google.colab import userdata

# Get your OpenAI API key securely
# Replace "<YOUR_OPENAI_API_KEY>" with your key, or use Colab Secrets
# Or if using Colab Secrets:
openai_api_key_secure = userdata.get('OPENAI_API_KEY')
openai_organization = userdata.get('OPENAI_ORGANIZATION')
openai_project = userdata.get('OPENAI_PROJECT_ID')

# Set your project API key
OpenAI.api_key = openai_api_key_secure
# You must also set organization and project ID
OpenAI.organization = openai_organization
OpenAI.project = openai_project

# Create the OpenAI client
client = OpenAI(api_key= OpenAI.api_key)

# Test the client connection (optional)
try:
    response = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": "Hello"}],
      max_tokens=10
    )
    print("OpenAI client initialized and connected successfully.")
except Exception as e:
    print(f"Error initializing or connecting to OpenAI client: {e}")


# Example Usage (requires 'client' object from a previous cell)
if 'client' in locals():
    print("--- Testing IdeaBrainstorming Module ---")
    idea_brainstorming_module = IdeaBrainstorming(openai_client=client)

    print("\n--- Testing generate_ideas ---")
    generated_ideas = idea_brainstorming_module.generate_ideas("Impact of climate change on coastal cities", "literature review", n_ideas=3)
    print(f"\nGenerated Ideas Result: {generated_ideas}")

    if generated_ideas:
        print("\n--- Testing explore_idea ---")
        exploration = idea_brainstorming_module.explore_idea(generated_ideas[0], depth=1)
        print(f"\nExploration of first idea:\n{exploration}")

        print("\n--- Testing refine_ideas ---")
        feedback = "These ideas are good, but focus more on adaptation strategies rather than just impacts."
        refined_ideas = idea_brainstorming_module.refine_ideas(generated_ideas, feedback)
        print(f"\nRefined Ideas Result:\n{refined_ideas}")
    else:
        print("\nNo ideas were generated, skipping explore_idea and refine_ideas tests.")

else:
    print("OpenAI client not initialized after attempt.")


OpenAI client initialized and connected successfully.
--- Testing IdeaBrainstorming Module ---

--- Testing generate_ideas ---
Generating 3 ideas for prompt: 'Impact of climate change on coastal cities' at thesis stage: 'literature review'
Generated ideas: ['Here are three distinct ideas you could explore for your thesis related to the impact of climate change on coastal cities, focusing on various aspects reflected in your literature review:', '1. **Urban Resilience Strategies in Coastal Cities**: Conduct a comprehensive review of existing literature on urban resilience strategies adopted by coastal cities in response to climate change. Investigate how different cities are implementing adaptive measures such as green infrastructure, seawalls, and sustainable urban planning. Analyze the effectiveness of these strategies by assessing notable case studies and highlighting best practices that can serve as models for other cities facing similar challenges.', '2. **Socioeconomic Impacts of 

## Implement the writing support module

### Subtask:
Write the Python code for this module, focusing on features like outlining, drafting, and refining text.


**Reasoning**:
Implement the `WritingSupport` class with methods for outlining, drafting, summarizing, rephrasing, and checking grammar/style using the OpenAI client, and include example usage.



In [37]:
class WritingSupport:
    """
    Provides assistance with writing tasks, including outlining, drafting,
    summarizing, rephrasing, and checking text.
    """

    def __init__(self, openai_client):
        """
        Initializes the WritingSupport module with an OpenAI client.

        Args:
            openai_client: An initialized OpenAI client object.
        """
        self.client = openai_client
        self.model = "gpt-4o-mini" # Default model for writing tasks

    def create_outline(self, topic: str, thesis_stage: str):
        """
        Generates a structured outline for a given topic and thesis stage.

        Args:
            topic (str): The topic for the outline.
            thesis_stage (str): The current stage of the thesis
                                (e.g., "introduction", "literature review").

        Returns:
            str: A generated outline as a string.
        """
        print(f"Generating outline for topic: '{topic}' at thesis stage: '{thesis_stage}'")
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": f"You are a thesis writing assistant. Create a structured outline for a thesis section on the topic '{topic}' relevant to the '{thesis_stage}' stage."},
                    {"role": "user", "content": f"Create an outline for: {topic}"}
                ],
                max_tokens=400,
                n=1
            )
            outline = response.choices[0].message.content.strip()
            print(f"Generated outline: {outline[:100]}...") # Print snippet
            return outline

        except Exception as e:
            print(f"Error generating outline: {e}")
            return "Could not generate outline."

    def draft_section(self, outline: str, context: str):
        """
        Drafts a section of text based on an outline and relevant context.

        Args:
            outline (str): The outline or part of the outline for the section.
            context (str): Relevant context or notes to inform the drafting.

        Returns:
            str: A drafted section of text.
        """
        print(f"Drafting section based on outline: '{outline[:50]}...' and context: '{context[:50]}...'")
        try:
            prompt_text = f"Write a draft section of a thesis based on the following outline and context.\n\nOutline:\n{outline}\n\nContext:\n{context}\n\nDraft:"

            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a thesis writing assistant. Draft a section of text based on an outline and provided context."},
                    {"role": "user", "content": prompt_text}
                ],
                max_tokens=800
            )
            draft = response.choices[0].message.content.strip()
            print(f"Drafted section: {draft[:100]}...") # Print snippet
            return draft

        except Exception as e:
            print(f"Error drafting section: {e}")
            return "Could not draft section."

    def summarize_text(self, text: str):
        """
        Generates a concise summary of a given text.

        Args:
            text (str): The text to summarize.

        Returns:
            str: A concise summary of the text.
        """
        print(f"Summarizing text: '{text[:100]}...'")
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a text summarization assistant. Summarize the following text concisely."},
                    {"role": "user", "content": f"Summarize this text:\n{text}"}
                ],
                max_tokens=150
            )
            summary = response.choices[0].message.content.strip()
            print(f"Summary: {summary[:100]}...") # Print snippet
            return summary

        except Exception as e:
            print(f"Error summarizing text: {e}")
            return "Could not summarize text."

    def rephrase_text(self, text: str, style: str = "academic and clear"):
        """
        Rewrites a given text in a different style or to improve clarity.

        Args:
            text (str): The text to rephrase.
            style (str): The desired writing style (e.g., "concise", "formal",
                         "academic and clear"). Defaults to "academic and clear".

        Returns:
            str: The rephrased text.
        """
        print(f"Rephrasing text: '{text[:100]}...' in style: '{style}'")
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": f"You are a text rephrasing assistant. Rephrase the following text in a {style} style, improving clarity if possible."},
                    {"role": "user", "content": f"Rephrase this text:\n{text}"}
                ],
                max_tokens=300
            )
            rephrased_text = response.choices[0].message.content.strip()
            print(f"Rephrased text: {rephrased_text[:100]}...") # Print snippet
            return rephrased_text

        except Exception as e:
            print(f"Error rephrasing text: {e}")
            return "Could not rephrase text."

    def check_grammar_style(self, text: str):
        """
        Provides feedback on grammar, spelling, and writing style for a given text.

        Args:
            text (str): The text to check.

        Returns:
            str: Feedback on grammar, spelling, and style.
        """
        print(f"Checking grammar and style for text: '{text[:100]}...'")
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a grammar and style checker. Provide feedback on the following text, focusing on grammar, spelling, punctuation, clarity, and academic writing style."},
                    {"role": "user", "content": f"Check this text for grammar and style:\n{text}"}
                ],
                max_tokens=500
            )
            feedback = response.choices[0].message.content.strip()
            print(f"Grammar and style feedback: {feedback[:100]}...") # Print snippet
            return feedback

        except Exception as e:
            print(f"Error checking grammar and style: {e}")
            return "Could not check grammar and style."

# Example Usage (requires 'client' object from a previous cell)
if 'client' in locals():
    print("\n--- Testing WritingSupport Module ---")
    writing_support_module = WritingSupport(openai_client=client)

    print("\n--- Testing create_outline ---")
    outline = writing_support_module.create_outline("Methodology Section", "methodology")
    print(f"\nOutline Result:\n{outline}")

    print("\n--- Testing draft_section ---")
    context = "This section should describe the qualitative data collection methods, specifically semi-structured interviews with participants."
    draft = writing_support_module.draft_section(outline, context)
    print(f"\nDraft Section Result:\n{draft}")

    print("\n--- Testing summarize_text ---")
    text_to_summarize = "The quick brown fox jumps over the lazy dog. This is a classic pangram used to test typewriters and computer keyboards. It contains all the letters of the English alphabet. It's often used for demonstrating fonts or testing equipment. It doesn't have any deep meaning beyond this purpose."
    summary = writing_support_module.summarize_text(text_to_summarize)
    print(f"\nSummarize Text Result:\n{summary}")

    print("\n--- Testing rephrase_text ---")
    text_to_rephrase = "The study showed that the results were very good and important."
    rephrased = writing_support_module.rephrase_text(text_to_rephrase, style="formal academic")
    print(f"\nRephrase Text Result:\n{rephrased}")

    print("\n--- Testing check_grammar_style ---")
    text_to_check = "The research was done, the data was collected, and analyse was finished. I think the finding are interesting."
    feedback = writing_support_module.check_grammar_style(text_to_check)
    print(f"\nCheck Grammar/Style Result:\n{feedback}")

else:
    print("OpenAI client not initialized. Please run the cell setting up the client.")



--- Testing WritingSupport Module ---

--- Testing create_outline ---
Generating outline for topic: 'Methodology Section' at thesis stage: 'methodology'
Generated outline: # Thesis Outline: Methodology Section

## I. Introduction to the Methodology Section
   A. Purpose o...

Outline Result:
# Thesis Outline: Methodology Section

## I. Introduction to the Methodology Section
   A. Purpose of the Methodology Section
   B. Importance of methodological rigor in research
   C. Overview of the main components of the methodology

## II. Research Design
   A. Definition and explanation of research design
   B. Types of research designs
      1. Quantitative
         a. Experimental
         b. Quasi-experimental
      2. Qualitative
         a. Case study
         b. Ethnographic
         c. Phenomenological
      3. Mixed-methods
   C. Justification for the chosen research design

## III. Research Setting
   A. Description of the research context
   B. Rationale for selecting the specific s

## Implement the emotion support module

### Subtask:
Write the Python code for this module, incorporating functionalities for recognizing and responding to the user's emotional state.


**Reasoning**:
Implement the `EmotionSupport` class with the specified methods for emotion detection, encouragement, break suggestion, and milestone celebration, using the OpenAI client for interactions, and include example usage.



In [38]:
class EmotionSupport:
    """
    Offers emotional encouragement and support to the user based on their
    interactions and detected sentiment.
    """

    def __init__(self, openai_client):
        """
        Initializes the EmotionSupport module with an OpenAI client.

        Args:
            openai_client: An initialized OpenAI client object.
        """
        self.client = openai_client
        self.model = "gpt-4o-mini" # Default model for emotional support tasks

    def detect_emotion(self, user_input: str):
        """
        Analyzes user input to detect underlying emotions or sentiment.

        Args:
            user_input (str): The user's input string.

        Returns:
            str: A string or categorized representation of the detected emotion.
        """
        print(f"Detecting emotion for input: '{user_input[:100]}...'")
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are an emotion detection assistant. Analyze the following user input and identify the most prominent emotional state. Respond with a single word or short phrase describing the emotion (e.g., 'Neutral', 'Frustrated', 'Motivated', 'Overwhelmed', 'Happy', 'Confused')."},
                    {"role": "user", "content": f"Analyze the emotion in this text: {user_input}"}
                ],
                max_tokens=10
            )
            detected_emotion = response.choices[0].message.content.strip()
            print(f"Detected emotion: {detected_emotion}")
            return detected_emotion

        except Exception as e:
            print(f"Error detecting emotion: {e}")
            return "Undetermined"

    def provide_encouragement(self, detected_emotion: str, thesis_stage: str = "general"):
        """
        Provides tailored encouraging messages based on the detected emotion and thesis stage.

        Args:
            detected_emotion (str): The detected emotional state.
            thesis_stage (str): The current stage of the thesis (optional). Defaults to "general".

        Returns:
            str: A tailored encouraging message.
        """
        print(f"Providing encouragement for emotion: '{detected_emotion}' at stage: '{thesis_stage}'")
        try:
            prompt_text = f"The user is currently feeling {detected_emotion} and is in the '{thesis_stage}' stage of their thesis. Provide a brief and encouraging message tailored to their emotional state and thesis progress."

            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a supportive thesis assistant. Provide encouraging messages."},
                    {"role": "user", "content": prompt_text}
                ],
                max_tokens=50
            )
            encouraging_message = response.choices[0].message.content.strip()
            print(f"Encouraging message: {encouraging_message}")
            return encouraging_message

        except Exception as e:
            print(f"Error providing encouragement: {e}")
            return "Keep up the great work!"

    def suggest_break(self, needs_break_flag: bool):
        """
        Generates a suggestion for the user to take a break based on criteria.

        Args:
            needs_break_flag (bool): A flag indicating if a break is suggested.

        Returns:
            str: A suggestion message or an empty string if no break is needed.
        """
        if not needs_break_flag:
            return ""

        print("Suggesting a break...")
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a helpful thesis assistant. Suggest taking a short break to recharge."},
                    {"role": "user", "content": "Suggest that the user takes a short break."}
                ],
                max_tokens=30
            )
            break_suggestion = response.choices[0].message.content.strip()
            print(f"Break suggestion: {break_suggestion}")
            return break_suggestion

        except Exception as e:
            print(f"Error suggesting break: {e}")
            return "Remember to take breaks!"

    def celebrate_milestone(self, milestone: str):
        """
        Generates a congratulatory message for a completed milestone.

        Args:
            milestone (str): The completed thesis milestone.

        Returns:
            str: A celebratory message.
        """
        print(f"Celebrating milestone: '{milestone}'")
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a positive thesis assistant. Congratulate the user on completing a significant milestone."},
                    {"role": "user", "content": f"Congratulate the user on completing the following thesis milestone: {milestone}"}
                ],
                max_tokens=40
            )
            celebratory_message = response.choices[0].message.content.strip()
            print(f"Celebratory message: {celebratory_message}")
            return celebratory_message

        except Exception as e:
            print(f"Error celebrating milestone: {e}")
            return f"Congratulations on completing the {milestone}!"


# Example Usage (requires 'client' object from a previous cell)
if 'client' in locals():
    print("\n--- Testing EmotionSupport Module ---")
    emotion_support_module = EmotionSupport(openai_client=client)

    print("\n--- Testing detect_emotion ---")
    emotion1 = emotion_support_module.detect_emotion("I'm feeling really stuck on this part of the literature review.")
    emotion2 = emotion_support_module.detect_emotion("This analysis is going well, I feel confident!")
    emotion3 = emotion_support_module.detect_emotion("Just finished writing the entire methodology section!")
    print(f"Detected Emotion 1: {emotion1}")
    print(f"Detected Emotion 2: {emotion2}")
    print(f"Detected Emotion 3: {emotion3}")


    print("\n--- Testing provide_encouragement ---")
    encouragement1 = emotion_support_module.provide_encouragement(emotion1, "literature review")
    encouragement2 = emotion_support_module.provide_encouragement(emotion2, "data analysis")
    encouragement3 = emotion_support_module.provide_encouragement("Overwhelmed", "writing") # Test with a different emotion/stage
    print(f"\nEncouragement 1: {encouragement1}")
    print(f"\nEncouragement 2: {encouragement2}")
    print(f"\nEncouragement 3: {encouragement3}")


    print("\n--- Testing suggest_break ---")
    break_suggestion1 = emotion_support_module.suggest_break(True) # Suggest break
    break_suggestion2 = emotion_support_module.suggest_break(False) # Do not suggest break
    print(f"\nBreak Suggestion 1: {break_suggestion1}")
    print(f"Break Suggestion 2: {break_suggestion2}") # Should be empty

    print("\n--- Testing celebrate_milestone ---")
    celebration1 = emotion_support_module.celebrate_milestone("first chapter draft")
    celebration2 = emotion_support_module.celebrate_milestone("data collection")
    print(f"\nCelebration 1: {celebration1}")
    print(f"Celebration 2: {celebration2}")

else:
    print("OpenAI client not initialized. Please run the cell setting up the client.")


--- Testing EmotionSupport Module ---

--- Testing detect_emotion ---
Detecting emotion for input: 'I'm feeling really stuck on this part of the literature review....'
Detected emotion: Frustrated
Detecting emotion for input: 'This analysis is going well, I feel confident!...'
Detected emotion: Confident
Detecting emotion for input: 'Just finished writing the entire methodology section!...'
Detected emotion: Accomplished
Detected Emotion 1: Frustrated
Detected Emotion 2: Confident
Detected Emotion 3: Accomplished

--- Testing provide_encouragement ---
Providing encouragement for emotion: 'Frustrated' at stage: 'literature review'
Encouraging message: It's completely normal to feel frustrated during the literature review stage—it's often one of the most challenging parts of the thesis process. Remember, every bit of research you sift through is bringing you closer to your goals. Take a deep breath and celebrate the progress
Providing encouragement for emotion: 'Confident' at stage: 'da

## Integrate the new modules with the ethics module

> Blockzitat einfügen



### Subtask:
Modify the existing `EthicsModule` or create a new orchestration layer to incorporate checks and logging related to the usage of the new modules (Idea Brainstorming, Writing Support, Emotion Support).


In [45]:


# Import the OpenAI library
from openai import OpenAI
# Used to securely store your API key - uncomment if using Colab Secrets
from google.colab import userdata

# Get your OpenAI API key securely
# Replace "<YOUR_OPENAI_API_KEY>" with your key, or use Colab Secrets
# Or if using Colab Secrets:
openai_api_key_secure = userdata.get('OPENAI_API_KEY')
openai_organization = userdata.get('OPENAI_ORGANIZATION')
openai_project = userdata.get('OPENAI_PROJECT_ID')

# Set your project API key
OpenAI.api_key = openai_api_key_secure
# You must also set organization and project ID
OpenAI.organization = openai_organization
OpenAI.project = openai_project

# Create the OpenAI client
client = OpenAI(api_key= OpenAI.api_key)



**Reasoning**:
Review the existing EthicsModule class to understand its structure and methods, particularly `log_usage` and `check_ethical_usage`. Determine how to integrate logging and ethical checks for the new modules (Idea Brainstorming, Writing Support, Emotion Support) by modifying `EthicsModule` or creating a new orchestrator. Implement the chosen integration approach in Python code.



In [39]:
import pandas as pd
import time
import numpy as np
import random
from openai import OpenAI
from scipy.spatial import distance
import gymnasium as gym
import json
import os
from stable_baselines3 import PPO
import streamlit as st

# Assuming the existing EthicsModule, IdeaBrainstorming, WritingSupport, and EmotionSupport
# classes are defined in previous cells and available in the environment.

class ThesisAssistant:
    """
    Central orchestrator for the thesis assistant, managing interactions
    between modules and ensuring ethical oversight.
    """
    def __init__(self, openai_client):
        """
        Initializes the ThesisAssistant with instances of all modules.

        Args:
            openai_client: An initialized OpenAI client object.
        """
        self.client = openai_client
        self.ethics_module = EthicsModule(openai_client=self.client)
        self.idea_brainstorming = IdeaBrainstorming(openai_client=self.client)
        self.writing_support = WritingSupport(openai_client=self.client)
        self.emotion_support = EmotionSupport(openai_client=self.client)
        # Assume other necessary components like RLConfigManager, DataPreprocessor,
        # EthicsSupervisorRL, etc., are also available or initialized here if needed.

    def process_user_request(self, user_input: str, request_type: str, thesis_stage: str = "unknown", context: dict = None):
        """
        Processes a user request, directs it to the appropriate module,
        logs the interaction, and performs ethical checks.

        Args:
            user_input (str): The user's input or prompt.
            request_type (str): The type of request (e.g., "brainstorm_ideas",
                                "draft_text", "get_encouragement").
            thesis_stage (str): The current stage of the thesis.
            context (dict, optional): Additional context relevant to the request
                                      (e.g., outline, previous text, detected emotion).
                                      Defaults to None.

        Returns:
            dict: A dictionary containing the response from the relevant module
                  and any ethical feedback or alerts.
        """
        print(f"\nProcessing user request: '{request_type}' at stage '{thesis_stage}'")

        response_data = {"module_response": None, "ethical_feedback": []}
        generated_content = "" # To store content generated by AI

        # Log the initial request
        self.ethics_module.log_usage(
            prompt=user_input,
            intent=request_type,
            thesis_stage=thesis_stage
        )

        # Route the request to the appropriate module
        if request_type == "brainstorm_ideas":
            n_ideas = context.get("n_ideas", 5) if context else 5
            response_data["module_response"] = self.idea_brainstorming.generate_ideas(
                prompt=user_input,
                thesis_stage=thesis_stage,
                n_ideas=n_ideas
            )
            # For brainstorming, the generated content is the list of ideas (convert to string for checking)
            generated_content = str(response_data["module_response"])

        elif request_type == "explore_idea":
             idea_to_explore = context.get("idea", "") if context else ""
             response_data["module_response"] = self.idea_brainstorming.explore_idea(
                 idea=idea_to_explore,
                 depth=context.get("depth", 2) if context else 2
             )
             generated_content = response_data["module_response"]

        elif request_type == "refine_ideas":
             ideas_to_refine = context.get("ideas", []) if context else []
             feedback = user_input # User input is the feedback
             response_data["module_response"] = self.idea_brainstorming.refine_ideas(
                 ideas=ideas_to_refine,
                 feedback=feedback
             )
             generated_content = str(response_data["module_response"])

        elif request_type == "create_outline":
            response_data["module_response"] = self.writing_support.create_outline(
                topic=user_input,
                thesis_stage=thesis_stage
            )
            generated_content = response_data["module_response"]

        elif request_type == "draft_section":
             outline = context.get("outline", "") if context else ""
             response_data["module_response"] = self.writing_support.draft_section(
                 outline=outline,
                 context=user_input # User input provides context for drafting
             )
             generated_content = response_data["module_response"]

        elif request_type == "summarize_text":
             text_to_summarize = user_input # User input is the text to summarize
             response_data["module_response"] = self.writing_support.summarize_text(
                 text=text_to_summarize
             )
             generated_content = response_data["module_response"]

        elif request_type == "rephrase_text":
             text_to_rephrase = user_input # User input is the text to rephrase
             style = context.get("style", "academic and clear") if context else "academic and clear"
             response_data["module_response"] = self.writing_support.rephrase_text(
                 text=text_to_rephrase,
                 style=style
             )
             generated_content = response_data["module_response"]

        elif request_type == "check_grammar_style":
             text_to_check = user_input # User input is the text to check
             response_data["module_response"] = self.writing_support.check_grammar_style(
                 text=text_to_check
             )
             # Grammar/style check provides feedback, not necessarily AI generated content
             generated_content = "" # Or handle this case differently if feedback is considered 'generated'

        elif request_type == "detect_emotion":
             response_data["module_response"] = self.emotion_support.detect_emotion(
                 user_input=user_input
             )
             # Emotion detection result is not content for AI check
             generated_content = ""

        elif request_type == "provide_encouragement":
             detected_emotion = context.get("detected_emotion", "Neutral") if context else "Neutral"
             response_data["module_response"] = self.emotion_support.provide_encouragement(
                 detected_emotion=detected_emotion,
                 thesis_stage=thesis_stage
             )
             generated_content = response_data["module_response"]

        elif request_type == "suggest_break":
             needs_break = context.get("needs_break", False) if context else False
             response_data["module_response"] = self.emotion_support.suggest_break(
                 needs_break_flag=needs_break
             )
             generated_content = response_data["module_response"]

        elif request_type == "celebrate_milestone":
             milestone = context.get("milestone", "a milestone") if context else "a milestone"
             response_data["module_response"] = self.emotion_support.celebrate_milestone(
                 milestone=milestone
             )
             generated_content = response_data["module_response"]

        else:
            response_data["module_response"] = "Unknown request type."


        # Perform ethical checks on the generated content (if any) and the user's prompt/intent
        if generated_content:
            is_ai, score, llm_response = self.ethics_module.detect_ai(generated_content)
            if is_ai:
                 response_data["ethical_feedback"].append(f"Potential AI-generated content detected (score: {score:.2f}). Remember to review and rephrase.")

        # Perform broader ethical checks based on prompt intent and request type
        # This logic is simplified here and would be more complex in a real system
        if request_type in ["draft_section", "rephrase_text"] and len(user_input) > 100: # Simple heuristic for potential misuse
             response_data["ethical_feedback"].append("Consider reviewing and significantly editing generated text to maintain authorship.")

        if request_type == "generate_ideas" and "plagiarism" in user_input.lower():
             response_data["ethical_feedback"].append("Ethical concern: Prompt mentions plagiarism. Ensure ideas promote original work.")

        # Example of checking ethical usage based on the prompt and generated text (using EthicsModule's method)
        # Note: The check_ethical_usage method in EthicsModule is currently basic;
        # a real implementation would need more sophisticated logic here or within the orchestrator.
        # self.ethics_module.check_ethical_usage(user_input, generated_content)
        # Ethical alerts from check_ethical_usage would need to be captured and added to response_data["ethical_feedback"]

        # In a real system, the RL agent's recommendation would be queried here
        # and potentially influence the response or trigger interventions.
        # action = self.ethics_module.ethics_supervisor_rl.recommend_action() # Assuming EthicsModule holds the RL agent
        # response_data["rl_recommended_action"] = action
        # response_data["ethical_feedback"].append(f"RL Agent Recommendation (Action Index): {action}")


        return response_data

# Example Usage (requires 'client' object from a previous cell)
if 'client' in locals():
    print("--- Initializing ThesisAssistant ---")
    # Assume EthicsModule, IdeaBrainstorming, WritingSupport, EmotionSupport classes are defined
    thesis_assistant = ThesisAssistant(openai_client=client)

    print("\n--- Testing Brainstorming Request ---")
    context_brainstorm = {"n_ideas": 3}
    response_brainstorm = thesis_assistant.process_user_request(
        user_input="Research questions for a study on remote work productivity",
        request_type="brainstorm_ideas",
        thesis_stage="planning",
        context=context_brainstorm
    )
    print("Brainstorming Response:")
    print(response_brainstorm)

    print("\n--- Testing Drafting Request ---")
    # Need a mock outline for drafting
    mock_outline = "1. Introduction\n2. Literature Review\n   2.1 Previous studies\n   2.2 Key theories"
    context_draft = {"outline": mock_outline}
    response_draft = thesis_assistant.process_user_request(
        user_input="Write a paragraph for section 2.1 summarizing recent findings on remote work.",
        request_type="draft_section",
        thesis_stage="literature review",
        context=context_draft
    )
    print("Drafting Response:")
    print(response_draft)

    print("\n--- Testing Emotion Detection Request ---")
    response_emotion = thesis_assistant.process_user_request(
        user_input="I'm really struggling to make progress this week.",
        request_type="detect_emotion",
        thesis_stage="writing"
    )
    print("Emotion Detection Response:")
    print(response_emotion)

    print("\n--- Testing Encouragement Request ---")
    context_encourage = {"detected_emotion": response_emotion["module_response"]} # Use detected emotion from previous step
    response_encourage = thesis_assistant.process_user_request(
         user_input="Provide encouragement.", # User input can be generic here
         request_type="provide_encouragement",
         thesis_stage="writing",
         context=context_encourage
    )
    print("Encouragement Response:")
    print(response_encourage)


else:
    print("OpenAI client not initialized. Please run the cell setting up the client.")


--- Initializing ThesisAssistant ---

--- Testing Brainstorming Request ---

Processing user request: 'brainstorm_ideas' at stage 'planning'
Usage logged: Timestamp=2025-06-26 10:10:35.656719, Prompt='Research questions for a study on remote work productivity', Intent='brainstorm_ideas', Thesis Stage='planning'
Generating 3 ideas for prompt: 'Research questions for a study on remote work productivity' at thesis stage: 'planning'
Generated ideas: ['Here are three distinct and relevant research questions for your study on remote work productivity, perfect for the planning stage of your thesis:', '1. **Impact of Home Office Environment on Productivity:**', '- How do variations in home office conditions (e.g., space, ergonomics, distractions) influence the productivity levels of remote workers across different industries?', '2. **Role of Technology in Enhancing Remote Work Efficiency:**', '- What technological tools and platforms are most effective in mitigating productivity challenges fac

**Reasoning**:
The previous command failed because the `stable_baselines3` library was not found. Although `stable_baselines3` is imported in the code block, it is not used in the `ThesisAssistant` class definition itself. The import is likely for the `EthicsSupervisorRL` class which is assumed to be present. However, to resolve the `ModuleNotFoundError` without removing the import (as it's assumed to be needed by other parts of the code not explicitly shown in the orchestrator), I will ensure `stable_baselines3` is installed using `pip`.



## Update the RL Environment and Configuration (if necessary)

### Subtask:
Consider if the functionalities of the new modules (Idea Brainstorming, Writing Support, Emotion Support) introduce new state variables, actions, or reward signals that should be incorporated into the Reinforcement Learning environment (`EthicsSupervisorEnv`) and the RL configuration (`rl_config.json`).

**Reasoning**:
The addition of the Idea Brainstorming, Writing Support, and Emotion Support modules expands the potential interactions and states within the thesis assistant. The RL agent, the Ethics Supervisor, needs to be aware of these new aspects to make informed decisions about interventions. Therefore, we need to evaluate if the configuration and environment need to be updated to include relevant information from these modules as part of the observable state, define new possible actions related to these modules, or introduce new reward signals based on user interactions with these modules or their outcomes.

In [46]:


# Import the OpenAI library
from openai import OpenAI
# Used to securely store your API key - uncomment if using Colab Secrets
from google.colab import userdata

# Get your OpenAI API key securely
# Replace "<YOUR_OPENAI_API_KEY>" with your key, or use Colab Secrets
# Or if using Colab Secrets:
openai_api_key_secure = userdata.get('OPENAI_API_KEY')
openai_organization = userdata.get('OPENAI_ORGANIZATION')
openai_project = userdata.get('OPENAI_PROJECT_ID')

# Set your project API key
OpenAI.api_key = openai_api_key_secure
# You must also set organization and project ID
OpenAI.organization = openai_organization
OpenAI.project = openai_project

# Create the OpenAI client
client = OpenAI(api_key= OpenAI.api_key)

# Load the RL configuration using the RLConfigManager
config = RLConfigManager.load_config()

print("RL configuration loaded successfully.")
# Optionally, display the loaded config to verify
# display(config)


RL configuration loaded successfully.


## Develop a unified interface for the Thesis Assistant

### Subtask:
Create a main class or function that brings all the modules together, allowing the user to interact with the complete system through a single interface.

**Reasoning**:
A unified interface is necessary to make the thesis assistant usable. This central component will receive user input, route it to the appropriate module(s), manage the overall state of the user's thesis project, interact with the ethics module for oversight, and present the combined output and feedback to the user. This aligns with the previously outlined `ThesisAssistant` orchestrator concept.

In [53]:
class ThesisAssistant:
    """
    Central orchestrator for the thesis assistant, managing interactions
    between modules and ensuring ethical oversight.
    """
    def __init__(self, openai_client, config):
        """
        Initializes the ThesisAssistant with instances of all modules and RL components.

        Args:
            openai_client: An initialized OpenAI client object.
            config (dict): The loaded RL configuration dictionary.
        """
        self.client = openai_client
        self.config = config
        self.ethics_module = EthicsModule(openai_client=self.client)
        self.idea_brainstorming = IdeaBrainstorming(openai_client=self.client)
        self.writing_support = WritingSupport(openai_client=self.client)
        self.emotion_support = EmotionSupport(openai_client=self.client)

        # Initialize RL components
        self.data_preprocessor = DataPreprocessor(self.config)
        self.mock_ethics_state = MockEthicsModule() # Using MockEthicsModule to simulate state for RL
        self.ethics_env = EthicsSupervisorEnv(self.mock_ethics_state, self.config)
        self.rl_supervisor = EthicsSupervisorRL(self.config) # Assuming model_path is handled internally

    def process_user_request(self, user_input: str, request_type: str, thesis_stage: str = "unknown", context: dict = None):
        """
        Processes a user request, directs it to the appropriate module,
        logs the interaction, performs ethical checks, and gets RL recommendation.

        Args:
            user_input (str): The user's input or prompt.
            request_type (str): The type of request (e.g., "brainstorm_ideas",
                                "draft_text", "get_encouragement").
            thesis_stage (str): The current stage of the thesis.
            context (dict, optional): Additional context relevant to the request
                                      (e.g., outline, previous text, detected emotion).
                                      Defaults to None.

        Returns:
            dict: A dictionary containing the response from the relevant module,
                  any ethical feedback or alerts, and the RL recommended action.
        """
        print(f"\nProcessing user request: '{request_type}' at stage '{thesis_stage}'")

        response_data = {"module_response": None, "ethical_feedback": [], "rl_recommended_action": None}
        generated_content = "" # To store content generated by AI
        log_entry_for_rl = { # Prepare log entry for potential use by RL (simplified)
            "prompt": user_input,
            "intent": request_type,
            "thesis_stage": thesis_stage,
            "timestamp": pd.Timestamp.now().isoformat(), # Use ISO format for logging
            # Add placeholder values for state variables expected by DataPreprocessor
            "embedding_drift": getattr(self.mock_ethics_state, 'embedding_drift', 0.0),
            "ai_usage": getattr(self.mock_ethics_state, 'ai_usage', 0.0),
            "ethical_flags": getattr(self.mock_ethics_state, 'ethical_flags', 0.0),
            "advisor_feedback": getattr(self.mock_ethics_state, 'advisor_feedback', 0.0),
            "deadline_ratio": getattr(self.mock_ethics_state, 'deadline_ratio', 0.0),
            # Add placeholder boolean flags for reward calculation (will be updated based on outcome)
            "user_revised": False,
            "ai_violation": False,
            "advisor_positive": False,
            "rewrite_accepted": False,
            "milestone_completed": False,
            "hallucination_detected": False,
            "idea_accepted": False,
            "writing_assistance_successful": False,
            "emotion_support_positive_response": False,
            "break_taken": False,
        }


        # Log the initial request using the ethics module's method
        self.ethics_module.log_usage(
            prompt=user_input,
            intent=request_type,
            thesis_stage=thesis_stage
        )


        # Route the request to the appropriate module
        if request_type == "brainstorm_ideas":
            n_ideas = context.get("n_ideas", 5) if context else 5
            response_data["module_response"] = self.idea_brainstorming.generate_ideas(
                prompt=user_input,
                thesis_stage=thesis_stage,
                n_ideas=n_ideas
            )
            generated_content = str(response_data["module_response"]) # Convert list to string for AI check

        elif request_type == "explore_idea":
             idea_to_explore = context.get("idea", "") if context else ""
             response_data["module_response"] = self.idea_brainstorming.explore_idea(
                 idea=idea_to_explore,
                 depth=context.get("depth", 2) if context else 2
             )
             generated_content = response_data["module_response"]

        elif request_type == "refine_ideas":
             ideas_to_refine = context.get("ideas", []) if context else []
             feedback = user_input # User input is the feedback for refinement
             response_data["module_response"] = self.idea_brainstorming.refine_ideas(
                 ideas=ideas_to_refine,
                 feedback=feedback
             )
             generated_content = str(response_data["module_response"])

        elif request_type == "create_outline":
            response_data["module_response"] = self.writing_support.create_outline(
                topic=user_input,
                thesis_stage=thesis_stage
            )
            generated_content = response_data["module_response"]

        elif request_type == "draft_section":
             outline = context.get("outline", "") if context else ""
             response_data["module_response"] = self.writing_support.draft_section(
                 outline=outline,
                 context=user_input # User input provides context for drafting
             )
             generated_content = response_data["module_response"]

        elif request_type == "summarize_text":
             text_to_summarize = user_input # User input is the text to summarize
             response_data["module_response"] = self.writing_support.summarize_text(
                 text=text_to_summarize
             )
             generated_content = response_data["module_response"]

        elif request_type == "rephrase_text":
             text_to_rephrase = user_input # User input is the text to rephrase
             style = context.get("style", "academic and clear") if context else "academic and clear"
             response_data["module_response"] = self.writing_support.rephrase_text(
                 text=text_to_rephrase,
                 style=style
             )
             generated_content = response_data["module_response"]

        elif request_type == "check_grammar_style":
             text_to_check = user_input # User input is the text to check
             response_data["module_response"] = self.writing_support.check_grammar_style(
                 text=text_to_check
             )
             # Grammar/style check provides feedback, not typically considered 'generated content' for AI detection
             generated_content = "" # Or handle this differently if feedback itself needs checking

        elif request_type == "detect_emotion":
             response_data["module_response"] = self.emotion_support.detect_emotion(
                 user_input=user_input
             )
             # Emotion detection result is a label, not content for AI check
             generated_content = ""
             # Update mock state for RL with detected emotion (simplified mapping)
             if response_data["module_response"] in ["Frustrated", "Overwhelmed", "Confused"]:
                 self.mock_ethics_state.emotion_state = min(self.mock_ethics_state.emotion_state + 0.2, 1.0)
             elif response_data["module_response"] in ["Motivated", "Happy", "Confident"]:
                 self.mock_ethics_state.emotion_state = max(self.mock_ethics_state.emotion_state - 0.1, 0.0)


        elif request_type == "provide_encouragement":
             detected_emotion = context.get("detected_emotion", "Neutral") if context else "Neutral"
             response_data["module_response"] = self.emotion_support.provide_encouragement(
                 detected_emotion=detected_emotion,
                 thesis_stage=thesis_stage
             )
             generated_content = response_data["module_response"]

        elif request_type == "suggest_break":
             needs_break = context.get("needs_break", False) if context else False
             response_data["module_response"] = self.emotion_support.suggest_break(
                 needs_break_flag=needs_break
             )
             generated_content = response_data["module_response"]

        elif request_type == "celebrate_milestone":
             milestone = context.get("milestone", "a milestone") if context else "a milestone"
             response_data["module_response"] = self.emotion_support.celebrate_milestone(
                 milestone=milestone
             )
             generated_content = response_data["module_response"]
             log_entry_for_rl["milestone_completed"] = True # Set flag for reward calculation

        else:
            response_data["module_response"] = "Unknown request type."


        # Perform ethical checks on the generated content (if any) and the user's prompt/intent
        if generated_content:
            is_ai, score, llm_response = self.ethics_module.detect_ai(generated_content)
            if is_ai:
                 response_data["ethical_feedback"].append(f"Potential AI-generated content detected (score: {score:.2f}). Remember to review and rephrase.")
                 log_entry_for_rl["ai_violation"] = True # Set flag for reward calculation
            # Update mock state for RL based on AI detection score
            self.mock_ethics_state.ai_usage = max(self.mock_ethics_state.ai_usage, score) # Assume AI usage state is the max detection score

        # Perform broader ethical checks based on prompt intent and request type (can be simplified or more complex)
        if request_type in ["draft_section", "rephrase_text"] and len(user_input) > 100: # Simple heuristic for potential misuse
             response_data["ethical_feedback"].append("Consider reviewing and significantly editing generated text to maintain authorship.")
             self.mock_ethics_state.ethical_flags = min(self.mock_ethics_state.ethical_flags + 0.1, 1.0) # Increase ethical flags
             log_entry_for_rl["ai_violation"] = True # Consider this a potential AI violation for reward

        if request_type == "generate_ideas" and "plagiarism" in user_input.lower():
             response_data["ethical_feedback"].append("Ethical concern: Prompt mentions plagiarism. Ensure ideas promote original work.")
             self.mock_ethics_state.ethical_flags = min(self.mock_ethics_state.ethical_flags + 0.2, 1.0) # Increase ethical flags significantly
             log_entry_for_rl["ai_violation"] = True # Consider this a potential AI violation for reward


        # --- RL Agent Interaction ---
        # Get the current state for the RL agent from the environment (which reads from mock_ethics_state)
        current_state = self.ethics_env._get_state()
        # Get the recommended action from the RL supervisor based on the current state
        rl_action = self.rl_supervisor.recommend_action()
        response_data["rl_recommended_action"] = rl_action
        response_data["ethical_feedback"].append(f"RL Agent Recommendation (Action Index): {rl_action} - '{self.config['actions'][rl_action]}'")

        # Simulate stepping the environment with the chosen action (this will update mock_ethics_state based on action_effects)
        # In a real system, the RL action might directly influence the user's experience or the assistant's behavior
        # Here, we simulate the effect of the action on the internal state for training purposes.
        # We also compute a reward based on the log entry and the action taken.
        # Note: The reward computation in EthicsSupervisorEnv._compute_reward is currently a placeholder.
        # A more sophisticated approach would compute the reward here based on the log_entry_for_rl and the action.
        # For now, we'll just step the environment which uses its internal (placeholder) reward logic.
        next_state, reward, done, truncated, info = self.ethics_env.step(rl_action)
        print(f"RL Environment stepped - Next State: {next_state}, Simulated Reward: {reward}, Done: {done}")

        # In a real system, you would store the experience (current_state, rl_action, reward, next_state, done)
        # for training the RL agent later, possibly in a replay buffer.
        # For this example, we are not implementing the full RL training loop within process_user_request.


        return response_data

# Example Usage (requires 'client' object and 'config' object from previous cells)
if 'client' in locals() and 'config' in locals():
    print("\n--- Initializing ThesisAssistant with RL ---")
    # Assume all required classes (EthicsModule, IdeaBrainstorming, etc.) are defined
    thesis_assistant = ThesisAssistant(openai_client=client, config=config)

    print("\n--- Testing Brainstorming Request with RL Interaction ---")
    context_brainstorm = {"n_ideas": 3}
    response_brainstorm = thesis_assistant.process_user_request(
        user_input="Research questions for a study on artificial intelligence in healthcare",
        request_type="brainstorm_ideas",
        thesis_stage="planning",
        context=context_brainstorm
    )
    print("Brainstorming Response with RL:")
    print(response_brainstorm)

    print("\n--- Testing Writing Support Request with RL Interaction ---")
    mock_outline = "1. Introduction\n2. Background\n   2.1 AI in healthcare history"
    context_draft = {"outline": mock_outline}
    response_draft = thesis_assistant.process_user_request(
        user_input="Draft a short paragraph for section 2.1 outlining the key historical milestones of AI in healthcare.",
        request_type="draft_section",
        thesis_stage="literature review",
        context=context_draft
    )
    print("Drafting Response with RL:")
    print(response_draft)

    print("\n--- Testing Emotion Support Request with RL Interaction ---")
    response_emotion = thesis_assistant.process_user_request(
        user_input="I'm feeling really overwhelmed by the amount of literature.",
        request_type="detect_emotion",
        thesis_stage="literature review"
    )
    print("Emotion Detection Response with RL:")
    print(response_emotion)

    # Now test providing encouragement based on the detected emotion
    if response_emotion["module_response"] != "Undetermined":
        context_encourage = {"detected_emotion": response_emotion["module_response"]}
        response_encourage = thesis_assistant.process_user_request(
            user_input="Can you give me some encouragement?",
            request_type="provide_encouragement",
            thesis_stage="literature review",
            context=context_encourage
        )
        print("Encouragement Response with RL:")
        print(response_encourage)


else:
    print("OpenAI client or config not initialized. Please run the necessary cells.")


--- Initializing ThesisAssistant with RL ---
Initialized new PPO model.

--- Testing Brainstorming Request with RL Interaction ---

Processing user request: 'brainstorm_ideas' at stage 'planning'
Usage logged: Timestamp=2025-06-26 10:18:55.691208, Prompt='Research questions for a study on artificial intelligence in healthcare', Intent='brainstorm_ideas', Thesis Stage='planning'
Generating 3 ideas for prompt: 'Research questions for a study on artificial intelligence in healthcare' at thesis stage: 'planning'
Generated ideas: ['Here are three distinct research question ideas for your study on artificial intelligence in healthcare:', '1. **Impact Assessment of AI Tools on Patient Outcomes**: "How do AI-driven diagnostic tools compare to traditional diagnostic methods in terms of accuracy and patient outcomes across various medical specialties?"', '- This question aims to explore the effectiveness of AI technologies in improving diagnostic accuracy and the resulting impact on patient hea

## Test and Refine

### Subtask:
Test the integrated system with various scenarios to ensure all modules are working together correctly and that ethical considerations are being addressed.

**Reasoning**:
Testing is crucial to identify any bugs or unexpected behavior in the integrated system. By simulating different user interactions and thesis stages, we can verify that the `ThesisAssistant` correctly routes requests, the individual modules provide appropriate responses, the ethics module performs its checks and logging, and the RL agent provides recommendations based on the simulated state.

In [52]:
# Assuming 'client' and 'config' are available from previous cells,
# and the ThesisAssistant class is defined.

if 'client' in locals() and 'config' in locals() and 'ThesisAssistant' in locals():
    print("--- Testing Integrated ThesisAssistant System ---")

    # Initialize the ThesisAssistant, explicitly passing client and config
    try:
        thesis_assistant = ThesisAssistant(openai_client=client, config=config)

        # --- Test Case 1: Brainstorming Request with potential ethical flag ---
        print("\n--- Test Case 1: Brainstorming Request (focus on a sensitive topic) ---")
        response1 = thesis_assistant.process_user_request(
            user_input="Generate controversial ideas for my thesis on political science.",
            request_type="brainstorm_ideas",
            thesis_stage="planning"
        )
        print("Response 1 (Controversial Brainstorming):")
        print(response1)

        # --- Test Case 2: Writing Support Request (drafting a section) ---
        print("\n--- Test Case 2: Writing Support Request (drafting a literature review paragraph) ---")
        mock_outline_lr = "1. Introduction\n2. Literature Review\n   2.1 Key concepts\n   2.2 Relevant studies"
        context_draft_lr = {"outline": mock_outline_lr}
        response2 = thesis_assistant.process_user_request(
            user_input="Draft a paragraph for section 2.2 summarizing a recent study on climate modeling.",
            request_type="draft_section",
            thesis_stage="literature review",
            context=context_draft_lr
        )
        print("Response 2 (Drafting Section):")
        print(response2)

        # --- Test Case 3: Emotion Support Request (detecting frustration) ---
        print("\n--- Test Case 3: Emotion Support Request (detecting frustration) ---")
        response3 = thesis_assistant.process_user_request(
            user_input="I've been working on this chapter for hours and it's still not right!",
            request_type="detect_emotion",
            thesis_stage="writing"
        )
        print("Response 3 (Detecting Frustration):")
        print(response3)

        # --- Test Case 4: Emotion Support Request (providing encouragement based on frustration) ---
        if response3["module_response"] != "Undetermined":
            print("\n--- Test Case 4: Emotion Support Request (providing encouragement) ---")
            context_encourage = {"detected_emotion": response3["module_response"]}
            response4 = thesis_assistant.process_user_request(
                user_input="I need some encouragement to keep going.",
                request_type="provide_encouragement",
                thesis_stage="writing",
                context=context_encourage
            )
            print("Response 4 (Providing Encouragement):")
            print(response4)

        # --- Test Case 5: Writing Support Request (rephrasing a long sentence - potential AI flag) ---
        print("\n--- Test Case 5: Writing Support Request (rephrasing complex text) ---")
        long_complex_sentence = "The epistemological implications of quantum entanglement necessitate a re-evaluation of classical deterministic paradigms in the context of non-local hidden variable theories, thereby challenging the foundational tenets of conventional realism in empirical observation."
        response5 = thesis_assistant.process_user_request(
            user_input=long_complex_sentence,
            request_type="rephrase_text",
            thesis_stage="methodology"
        )
        print("Response 5 (Rephrasing Complex Text):")
        print(response5)

        # --- Test Case 6: Emotion Support Request (celebrating a milestone) ---
        print("\n--- Test Case 6: Emotion Support Request (celebrating a milestone) ---")
        context_milestone = {"milestone": "completion of data analysis"}
        response6 = thesis_assistant.process_user_request(
            user_input="I finished analyzing all the data!",
            request_type="celebrate_milestone",
            thesis_stage="data analysis",
            context=context_milestone
        )
        print("Response 6 (Celebrating Milestone):")
        print(response6)

    except NameError as e:
        print(f"Error initializing ThesisAssistant: {e}. Make sure 'client' and 'config' are defined in preceding cells.")


else:
    print("Required objects (client, config, ThesisAssistant) not initialized before the check.")

--- Testing Integrated ThesisAssistant System ---
Initialized new PPO model.

--- Test Case 1: Brainstorming Request (focus on a sensitive topic) ---

Processing user request: 'brainstorm_ideas' at stage 'planning'
Usage logged: Timestamp=2025-06-26 10:18:37.424405, Prompt='Generate controversial ideas for my thesis on political science.', Intent='brainstorm_ideas', Thesis Stage='planning'
Generating 5 ideas for prompt: 'Generate controversial ideas for my thesis on political science.' at thesis stage: 'planning'
Generated ideas: ['Here are five distinct and potentially controversial thesis ideas related to political science that you could explore in your research:', '1. **The Ethics of Political Assassination: A Justifiable Tool for Regime Change?**', 'Investigate historical cases where political assassinations were utilized to effect regime change. Discuss the moral implications, the outcomes, and the potential justification for such actions in the context of global governance and hu

In [51]:
# Assuming 'client' and 'config' are available from previous cells,
# and the ThesisAssistant class is defined.

if 'client' in locals() and 'config' in locals() and 'ThesisAssistant' in locals():
    print("--- Testing Integrated ThesisAssistant System ---")

    # Initialize the ThesisAssistant
    thesis_assistant = ThesisAssistant(openai_client=client, config=config)

    # --- Test Case 1: Brainstorming Request with potential ethical flag ---
    print("\n--- Test Case 1: Brainstorming Request (focus on a sensitive topic) ---")
    response1 = thesis_assistant.process_user_request(
        user_input="Generate controversial ideas for my thesis on political science.",
        request_type="brainstorm_ideas",
        thesis_stage="planning"
    )
    print("Response 1 (Controversial Brainstorming):")
    print(response1)

    # --- Test Case 2: Writing Support Request (drafting a section) ---
    print("\n--- Test Case 2: Writing Support Request (drafting a literature review paragraph) ---")
    mock_outline_lr = "1. Introduction\n2. Literature Review\n   2.1 Key concepts\n   2.2 Relevant studies"
    context_draft_lr = {"outline": mock_outline_lr}
    response2 = thesis_assistant.process_user_request(
        user_input="Draft a paragraph for section 2.2 summarizing a recent study on climate modeling.",
        request_type="draft_section",
        thesis_stage="literature review",
        context=context_draft_lr
    )
    print("Response 2 (Drafting Section):")
    print(response2)

    # --- Test Case 3: Emotion Support Request (detecting frustration) ---
    print("\n--- Test Case 3: Emotion Support Request (detecting frustration) ---")
    response3 = thesis_assistant.process_user_request(
        user_input="I've been working on this chapter for hours and it's still not right!",
        request_type="detect_emotion",
        thesis_stage="writing"
    )
    print("Response 3 (Detecting Frustration):")
    print(response3)

    # --- Test Case 4: Emotion Support Request (providing encouragement based on frustration) ---
    if response3["module_response"] != "Undetermined":
        print("\n--- Test Case 4: Emotion Support Request (providing encouragement) ---")
        context_encourage = {"detected_emotion": response3["module_response"]}
        response4 = thesis_assistant.process_user_request(
            user_input="I need some encouragement to keep going.",
            request_type="provide_encouragement",
            thesis_stage="writing",
            context=context_encourage
        )
        print("Response 4 (Providing Encouragement):")
        print(response4)

    # --- Test Case 5: Writing Support Request (rephrasing a long sentence - potential AI flag) ---
    print("\n--- Test Case 5: Writing Support Request (rephrasing complex text) ---")
    long_complex_sentence = "The epistemological implications of quantum entanglement necessitate a re-evaluation of classical deterministic paradigms in the context of non-local hidden variable theories, thereby challenging the foundational tenets of conventional realism in empirical observation."
    response5 = thesis_assistant.process_user_request(
        user_input=long_complex_sentence,
        request_type="rephrase_text",
        thesis_stage="methodology"
    )
    print("Response 5 (Rephrasing Complex Text):")
    print(response5)

    # --- Test Case 6: Emotion Support Request (celebrating a milestone) ---
    print("\n--- Test Case 6: Emotion Support Request (celebrating a milestone) ---")
    context_milestone = {"milestone": "completion of data analysis"}
    response6 = thesis_assistant.process_user_request(
        user_input="I finished analyzing all the data!",
        request_type="celebrate_milestone",
        thesis_stage="data analysis",
        context=context_milestone
    )
    print("Response 6 (Celebrating Milestone):")
    print(response6)


else:
    print("Required objects (client, config, ThesisAssistant) not initialized.")

--- Testing Integrated ThesisAssistant System ---
Initialized new PPO model.

--- Test Case 1: Brainstorming Request (focus on a sensitive topic) ---

Processing user request: 'brainstorm_ideas' at stage 'planning'
Usage logged: Timestamp=2025-06-26 10:17:44.514996, Prompt='Generate controversial ideas for my thesis on political science.', Intent='brainstorm_ideas', Thesis Stage='planning'
Generating 5 ideas for prompt: 'Generate controversial ideas for my thesis on political science.' at thesis stage: 'planning'
Generated ideas: ['Here are five distinct and controversial ideas for your political science thesis:', '1. **The Impact of Social Media Algorithms on Political Polarization**: Explore how social media platforms influence voter behavior and political polarization through their algorithms. Analyze whether these algorithms promote divisive content and if so, propose regulatory measures to mitigate these effects.', '2. **The Legitimacy of Political Violence in Protest Movements**:

# Thesis Assistant with Ethics Module and New Functionalities

We have successfully programmed the ethics module based on the provided structure, focusing on authorship tracking, AI labelling, and human-in-the-loop prompts, orchestrated by an EthicsSupervisor Agent (simulated via RL components). We have also designed and implemented three new modules: Idea Brainstorming, Writing Support, and Emotion Support, and integrated them with the existing ethics framework.

Here's a summary of the key components implemented:

1.  **Ethics Module (`EthicsModule`):** Handles core ethical considerations, including usage logging (`log_usage`), AI detection (`detect_ai`), and basic ethical checks (`check_ethical_usage`). It also interacts with the RL components.
2.  **RL System Components (`RLConfigManager`, `DataPreprocessor`, `MockEthicsModule`, `EthicsSupervisorEnv`, `EthicsSupervisorRL`, `RLTrainingLoop`, `ThesisStudentSimulator`, `ThesisCohortSimulator`, `SyntheticRLPretrainer`, `RLTrainingLauncher`):** These classes provide the framework for the Reinforcement Learning supervisor. They manage the RL configuration, process data into states and rewards, simulate the environment and student behavior, train the PPO agent, and orchestrate the training pipeline.
3.  **Idea Brainstorming Module (`IdeaBrainstorming`):** Assists with generating and refining thesis ideas using an LLM.
4.  **Writing Support Module (`WritingSupport`):** Provides functionalities for creating outlines, drafting text, summarizing, rephrasing, and checking grammar/style using an LLM.
5.  **Emotion Support Module (`EmotionSupport`):** Offers emotional encouragement and support based on detected user sentiment using an LLM.
6.  **Thesis Assistant Orchestrator (`ThesisAssistant`):** This central class integrates all the modules. It receives user requests, directs them to the appropriate module, logs usage, performs ethical checks (including AI detection), interacts with the RL agent for action recommendations, and provides a combined response.

## How to Use the Complete Thesis Assistant

To use the integrated thesis assistant, you will primarily interact with the `ThesisAssistant` class.

1.  **Ensure all necessary cells have been run:** Before using the `ThesisAssistant`, make sure you have executed all the code cells that define the classes (`EthicsModule`, `IdeaBrainstorming`, `WritingSupport`, `EmotionSupport`, all RL components, and `ThesisAssistant`) and initialize the required objects like the OpenAI `client` and the `config` dictionary.
2.  **Initialize the `ThesisAssistant`:** Create an instance of the `ThesisAssistant` class, passing your initialized OpenAI client and the loaded RL configuration.

In [54]:
    # Example: Brainstorming request
    response = thesis_assistant.process_user_request(
        user_input="Suggest topics for my thesis on renewable energy.",
        request_type="brainstorm_ideas",
        thesis_stage="planning",
        context={"n_ideas": 5}
    )
    print("Brainstorming Response:", response)

    # Example: Writing support request (summarizing text)
    text_to_summarize = "Your long text goes here..."
    response = thesis_assistant.process_user_request(
        user_input=text_to_summarize,
        request_type="summarize_text",
        thesis_stage="literature review"
    )
    print("Summarization Response:", response)

    # Example: Emotion support request (getting encouragement)
    response = thesis_assistant.process_user_request(
        user_input="I'm feeling really stuck.",
        request_type="provide_encouragement",
        thesis_stage="writing",
        context={"detected_emotion": "Frustrated"} # You would get the detected emotion first
    )
    print("Encouragement Response:", response)


Processing user request: 'brainstorm_ideas' at stage 'planning'
Usage logged: Timestamp=2025-06-26 10:20:22.310683, Prompt='Suggest topics for my thesis on renewable energy.', Intent='brainstorm_ideas', Thesis Stage='planning'
Generating 5 ideas for prompt: 'Suggest topics for my thesis on renewable energy.' at thesis stage: 'planning'
Generated ideas: ['Here are five distinct and relevant thesis topics centered on renewable energy that you can consider as you plan your research:', '1. **Integrating Renewable Energy Sources in Urban Environments**: Examine the feasibility and impact of combining solar, wind, and biomass energy solutions to create a sustainable energy system for urban areas. This study could involve case studies of cities that have successfully implemented such systems and analyze the social, economic, and environmental impacts.', '2. **The Role of Energy Storage Technologies in Facilitating Renewable Energy Adoption**: Investigate the advancements in battery storage t